Physiological Genomics. 2009 Nov 17;40(3):128–140. doi: 10.1152/physiolgenomics.90403.2008

Dynamism in gene expression across multiple studies

Alexander A Morgan 1, Joel T Dudley 2, Tarangini Deshpande 3, Atul J Butte 1,2
PMCID: PMC2825768  PMID: 19920211

Abstract

In this study we develop methods of examining gene expression dynamics, how and when genes change expression, and demonstrate their application in a meta-analysis involving over 29,000 microarrays. By defining measures across many experimental conditions, we have a new way of characterizing dynamics, complementary to measures looking at changes in absolute variation or breadth of tissues showing expression. We show conservation in overall patterns of dynamism across three species (human, mouse, and rat) and show associations with known disease-related genes. We discuss the enriched functional properties of the sets of genes showing different patterns of dynamics and show that the differences in expression dynamics are associated with the variety of different transcription factor regulatory sites. These results can influence thinking about the selection of genes for microarray design and the analysis of measurements of mRNA expression variation in a global context of expression dynamics across many conditions, as genes that are rarely differentially expressed between experimental conditions may be the subject of increased scrutiny when they significantly vary in expression between experimental subsets.

Keywords: microarray, dynamics, housekeeping genes, meta-analysis


Over 30 years ago, researchers began to realize that in mammals, the major variations in physiology and behavior, indeed the phenotypes upon which natural selection operates, were being predominantly driven not by mutations in the genes themselves, but through variation in gene expression (23, 32). Recent advances in high-throughput measurements of mRNA expression, such as DNA microarrays, have given the biomedical research community the ability to examine these hypotheses in greater detail to obtain a “bird's eye view” of processes at a molecular level (6). The subsequent collection of the measurements of >300,000 samples in repositories such as the National Center for Biotechnology Information's Gene Expression Omnibus (GEO) (3) affords us an opportunity to seek an even loftier view of gene “dynamism,” what might be jocularly termed a “spy satellite view” of the variations in mRNA expression across all genes and all measured conditions. Here we investigate the properties of variability by looking at those genes whose mRNA levels vary the most across experiments (hyperdynamic genes) and the least (hypodynamic genes) and examining the physiological and functional implications of high and low dynamism.

Deciding whether or not a particular mRNA is varying in expression level based on the results of a microarray is a surprisingly difficult problem; this has been the focus of a large amount of work (16, 22, 33, 36, 37, 42). Previous work looking at dynamism of many genes across many experiments has principally focused on two different methods for characterizing dynamics. The first is a measure of breadth of expression across tissues, by looking at the variety of tissues in which a gene is expressed (12, 29, 30, 35). The second is to look at variations of the absolute level of expression across multiple measurements (11, 25, 27, 28).

The examination of expression dynamics on the large scale has been motivated in two different ways. The first has been the search for improved candidate housekeeping genes (11, 25, 27, 28). These efforts typically used much smaller datasets than are now possible and have concentrated on parametric measures of variation. The second has been the study of the actual properties of hypo- and hyperdynamic genes. These studies have primarily focused on sequence-based features of conservation and parametric measures of variation, looking at such features as the relationship between evolutionary conservation of the gene and dynamics of expression (12, 30, 39, 48); the conservation of proximal, possibly regulatory regions (14, 26); gene size (18, 39); and conflicting reports on the relationship of GC content to dynamics (29, 35). Our own previous work in this area has focused on making the association between high expression dynamism and the likelihood of having a genetic variation associated with disease (9).

To examine dynamism, we need to consider it from a few different perspectives (7). The simplest and most obvious way to look at the dynamism of gene expression is to look at the level of expression measured many times by different researchers drawn from a variety of tissues and under different conditions, and then use the standard summary statistics (e.g., mean, variance) and compare these values. Unfortunately, when one is looking at microarray data, two sets of problems with this approach arise immediately. The first set of problems comes from the properties of microarray data itself, and the second set arises from the practice of analyzing microarray experiments individually.

The nature of raw microarray data makes comparing experimental results difficult. Even when looking at data from only a single microarray platform, absolute measurements need to be carefully normalized for each microarray within each experiment, and there is no consensus on the best method for doing this (33). Also, gene expression measurements vary across an enormous range of values and can exhibit multimodal distributions, making it very difficult to robustly compare parametric summary statistics (e.g., mean) directly. Concomitant with these problems, microarray results are typically analyzed by comparing expression levels of each gene between differing experimental conditions and identifying those genes that vary statistically significantly between groups. That means that an important measure of dynamism is how often a gene is seen to be varying between experimental conditions.

To address these issues, we developed a variety of nonparametric measures that are less sensitive to both the very wide range in values and the extreme outliers. We selected three measures to summarize mRNA expression and dynamism. Two (rank median and rank width) seek to measure how expression varies from microarray to microarray directly. The third is experimental variation ratio (EVR), which is used to measure expression variation across group-vs.-group comparisons, such as comparisons of treated and control samples. To enable these kinds of analyses, we have developed a data repository using all the microarray data in GEO (3), mapping the probes used on the microarrays to EntrezGene identifiers (8). To compute the rank median and rank width summary statistics, we focus on the single most commonly used microarray platform for each of three species (human, mouse, and rat) to enable a direct platform-to-platform comparison of dynamics. The rank median is the median of the rank of expression of a gene across every microarray sample. The rank width is the difference between the first and third quartiles of the expression ranks (i.e., interquartile range). Note that these nonparametric measures are measures of dynamics across many microarrays and are not being used to identify differentially expressed genes between experimental sets, as there are already nonparametric methods which have been developed to address this problem including rank products (5) and rank difference analysis (31).

To compute the EVR, we use annotated group-vs.-group microarray comparisons from GEO. In a number of GEO datasets, microarray samples are grouped into subsets that are annotated with terms describing their experimental characteristics, such as “mutant” or “wild type.” This allows for group-vs.-group comparison by pooling all the microarrays of one subset and comparing with the pooled microarrays in another subset. The group-vs.-group comparisons in these datasets range over a number of different experimental variables such as tissue type, chemical treatment, age, infection status, and/or presence/absence of a mutation. These subsets represent an extensive exploration of the space of factors that can influence gene expression and give rise to the dynamic range of variation. We analyzed all these group-vs.-group comparisons in a consistent, automated manner to develop lists of significantly differentially expressed genes within each dataset. We then define the EVR as the percent of experiments (datasets with group-vs.-group comparisons) showing significant changes in expression for a gene between conditions. These three measures are explained in more detail in methods and shown schematically in Fig. 1.

Fig. 1.


This is a schematic showing how the three summary statistics are calculated. Expression measurements are ranked within each microarray (A), and then the distribution of ranks (B) is used to calculate a median, the rank median, and a width between the 1st and 3rd quartiles, the rank width. The ratio of differential expression measurements to total measurements (C) provides the experimental variation ratio (EVR). Because the rank median and rank width depend on the number of probes measured, they are further scaled to reach a maximum of 1.0 for comparison purposes.

Here, enabled by our three statistics and our comprehensive collection of genes and how they change across the 1,613 group-vs.-group experimental datasets, we asked four questions to characterize genome-wide dynamism. First, are genes that demonstrate a wide range of expression level more likely to be found significantly differentially expressed between sample groups (e.g., treated vs. control) in an experiment? Rephrased, the question is whether the two measures of variation, rank width and EVR, are equivalent. A second question is whether the level of dynamism for a gene is conserved between organisms. Our third question is what the known functional properties are of those genes that show extreme values in our dynamism measures. Finally, we asked whether we can determine what genetic signals are responsible for these expression dynamics. We found that our measures of variation are substantially different from one another, although both are conserved across species, and that genes that have different dynamic properties are strongly associated with different functional categories. There is also strong indication that the dynamic properties of a gene are regulated by the range and variety of transcription factor sites associated with it, and that genes with differing levels of dynamics are under different selective pressures. In addition, as a potentially important result, for every gene, we now have an empirical prior expectation of dynamics as an additional feature to consider when analyzing microarray results.

METHODS

Microarray datasets were taken from those publicly available as part of the GEO (3). All mapping from microarray level measurements to gene identifiers was performed using AILUN (8), and all mapping of orthologs from gene identifiers was done using Homologene (45).

Here, we focus on three nonparametric measures of mRNA expression and variation: the rank median, the rank width, and the EVR. The first statistic, rank median, is the median value of the rank in measured expression within each sample for a particular microarray platform. The expression value of each gene in a sample is ranked, and then the distribution of those ranks can be investigated as depicted in Fig. 1, A and B. The median represents the “typical” rank of expression, and the rank width is the width between the first and third quartiles (where 50% of the data lies) of the ranks of expression (ranked within each microarray) across all samples of a particular microarray platform. We use ranks because of the large range of variation in measured expression and the fact that the presence of extreme outliers can strongly influence parametric statistics (such as standard deviation and mean). For example, if an mRNA A was measured in four identical samples in which 1,000 genes were measured in parallel, and transcript A had the 20th lowest expression in the first sample, and the 10th, 600th, and 30th rank in the other three, it would have a rank median of 25 (the average between 20 and 30 because the set has an even number) and a rank width of 10 (from 30–20). We then rescale these measures (rank median and rank width) to range from 0.0 to 1.0 because the actual ranks depend on the number of genes measured on each platform and do not allow easy comparison between platforms. Note that rank median reports a typical relative value of expression. The actual raw expression values might have varied by several orders of magnitude. The particular choice of microarray platform type to use for this analysis was based on relative abundance of data in GEO.
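The ranking procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names, the toy expression values, the tie handling, the quartile convention (`method="inclusive"`), and the rescaling by the number of genes are all assumptions made for the example.

```python
from statistics import median, quantiles


def rank_within_sample(expression):
    """Rank genes within one sample; rank 1 is the lowest expression.

    Ties are broken by input order, which is a simplification.
    """
    ordered = sorted(expression, key=expression.get)
    return {gene: position for position, gene in enumerate(ordered, start=1)}


def rank_summary(samples, gene):
    """Return (rank median, rank width) for one gene, both rescaled to 0-1."""
    ranks = [rank_within_sample(sample)[gene] for sample in samples]
    n_genes = len(samples[0])
    # Quartiles of the gene's ranks across all samples.
    q1, _, q3 = quantiles(ranks, n=4, method="inclusive")
    # Assumption: rescale by dividing by the number of genes measured.
    return median(ranks) / n_genes, (q3 - q1) / n_genes


# Hypothetical toy data: four samples, four genes, arbitrary intensity units.
samples = [
    {"A": 5.0, "B": 120.0, "C": 30.0, "D": 900.0},
    {"A": 7.0, "B": 80.0, "C": 95.0, "D": 400.0},
    {"A": 60.0, "B": 50.0, "C": 10.0, "D": 700.0},
    {"A": 2.0, "B": 15.0, "C": 40.0, "D": 3000.0},
]

for gene in "ABCD":
    print(gene, rank_summary(samples, gene))
```

In this toy data, gene D has raw values from 400 to 3000 but is always the top-ranked gene, so its rank median is 1.0 and its rank width is 0.0. With very few samples the quartiles depend on the interpolation convention, so a width computed this way can differ from a hand calculation on a four-sample illustration.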

The third measure, the EVR, is the number of experiments (GEO datasets, GDS) in which a gene was detected to have a difference in expression between any two experimental subsets at a false discovery rate (FDR) <1%, divided by the total number of experiments in which the gene's expression was measured. In other words, it is the proportion of microarray experiments that showed a significant difference in expression of a gene between two different sample types (e.g., experimental conditions), as shown in Fig. 1C. Significance analysis of microarrays (42) was performed between each pair of experimental subsets of the dataset. Any measured difference in expression of a gene, as identified by an Entrez Gene identifier, with an FDR <1% between any pair of subsets was taken as an experimental measurement of variation in that dataset. The ratio is the count of differential measurements of expression over the number of datasets in which that gene was measured. Genes measured in fewer than half the total datasets (e.g., in <333 datasets for the human experiments) were excluded from further analysis because of their insufficient representation.

The human EVR was calculated by searching for all annotated datasets (GDS) that were indicated to have been performed on a human sample on any human-specific microarray platform (not only the specific platform used to calculate the previous two summary statistics); the process was repeated for mouse and rat. Human, mouse, and rat datasets were chosen based on the abundance of data available in GEO. Within each organism, the summary statistics of genes mapping to the same Homologene identifier were averaged, providing a common point for comparing orthologs.
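The averaging step that merges gene-level statistics onto homolog groups might look like the sketch below. The identifiers and the mapping are invented for illustration; a real analysis would take the gene-to-group mapping from Homologene.

```python
from collections import defaultdict
from statistics import mean


def merge_to_homolog_groups(gene_stats, gene_to_group):
    """Average a per-gene statistic over genes sharing a homolog group."""
    grouped = defaultdict(list)
    for gene, value in gene_stats.items():
        group = gene_to_group.get(gene)
        if group is not None:  # genes without a homolog group are skipped
            grouped[group].append(value)
    return {group: mean(values) for group, values in grouped.items()}


# Hypothetical identifiers: two human genes map to the same homolog group.
human_evr = {"geneA": 0.25, "geneB": 0.75, "geneC": 0.125, "geneD": 0.5}
human_to_group = {"geneA": "HG1", "geneB": "HG1", "geneC": "HG2"}
print(merge_to_homolog_groups(human_evr, human_to_group))
```

Running the same function on the mouse and rat tables would give per-group values that can be joined on the group identifier for the cross-species comparison.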

The analysis of gene set enrichment was performed on the ranked lists obtained from the three summary statistics for the human genes, mapped to their gene symbols from their Entrez Gene identifiers, using the Gene Set Enrichment Analysis (GSEA) Java-based tool and the MSigDB (38). The functional analysis of the four gene sets was performed using PANTHER (40), with the other genes studied serving as the background for functional annotation. Known disease-associated genes were taken by combining the gene lists from OMIM, the Human Gene Mutation Database, and the Genetic Association Database and mapped to Entrez Gene identifiers.

Gene conservation scores were computed under the assumption of Dollo parsimony (15) using phylogenetic profiles obtained from the PhyloPat database (20). Phylogenetic profiles in the PhyloPat database are constructed using gene ortholog information computed by the Ensembl compara database. We computed a degree-conservation score (conservation breadth): the proportion of species in which orthologs for the human gene were present. Under this scheme, scores x ranged over 0.0 < x ≤ 1.0, where a fully conserved (omnipresent) gene was given a degree-conservation score of 1.0.
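As a rough illustration of the conservation breadth score, the proportion can be computed directly from a presence/absence profile. This sketch covers only the final proportion; it does not reconstruct gains and losses under Dollo parsimony, and the species list and profile are made up for the example.

```python
def conservation_breadth(profile):
    """Proportion of species in which an ortholog of the gene is present.

    profile: dict mapping species name -> True/False (ortholog present).
    """
    return sum(profile.values()) / len(profile)


# Hypothetical phylogenetic profile for one human gene.
profile = {"human": True, "mouse": True, "zebrafish": True, "fruit fly": False}
print(conservation_breadth(profile))  # 0.75
```

Because the human gene itself is always counted as present, the score is strictly greater than 0.0, and a gene found in every species scores 1.0.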

Transcription factor annotations were taken from the molecular signatures database (MSigDB) (38). For each gene we summed the total number of transcription factor annotations, with each annotation corresponding to a particular transcription factor or motif. All statistical analyses were done using the R statistical programming language (34).

RESULTS

We calculated three measures to summarize dynamism: experimental variation, along with rank width and rank median. Note that the 1,613 total human, mouse, and rat GEO datasets (29,588 microarrays) with annotated subsets enabling group-vs.-group comparison derive from a multitude of different array platforms, while many of the 22,053 microarray samples from the Affymetrix HG-U133A, MG-U74A, and RG-U34A platforms used to compute rank median and rank width are not part of a dataset with annotated subsets. The two datasets used to compute the dynamics statistics, while highly overlapping, therefore each take advantage of additional data not in the intersection. Table 1 gives details on the numbers of microarrays included in the different analyses. The full data are available as supplementary material¹ or from the lab website.²

Table 1.

The principal datasets compared in this paper and the Pearson correlation coefficient (with P values) between our three summary statistics for variation within each species

| | Human | Mouse | Rat |
| --- | --- | --- | --- |
| GEO datasets used to compute EVR, n | 666 | 731 | 216 |
| Arrays in datasets, n | 14,979 | 10,128 | 4,481 |
| Platform used to compute rank median and rank width | GPL96, Affymetrix set HG-U133A | GPL81, Affymetrix set MG-U74A | GPL85, Affymetrix set RG-U34A |
| Arrays of platform, n | 14,738 | 4,733 | 2,582 |
| EVR vs. rank median | cor = 0.649, P < 10⁻¹⁵ | cor = 0.627, P < 10⁻¹⁵ | cor = 0.592, P < 10⁻¹⁵ |
| EVR vs. rank width | cor = 0.0062, P < 10⁻¹⁵ | cor = 0.107, P < 10⁻¹⁵ | cor = 0.104, P = 6.3 × 10⁻¹¹ |
| Rank median vs. rank width | cor = −0.312, P < 10⁻¹⁵ | cor = −0.277, P < 10⁻¹⁵ | cor = −0.217, P < 10⁻¹⁵ |
| Genes, n | 13,802 | 5,086 | 9,417 |

Experimental variation ratio (EVR) is the relative number of Gene Expression Omnibus (GEO) datasets showing significant variation of expression between experimental conditions. Rank median and rank width measure the average and range of variation in absolute measured expression, respectively. Note the similar relationships (value of correlation coefficients) in the correlation between measures across species.
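For readers who want to reproduce pairwise comparisons of this kind on their own gene tables, the Pearson correlation coefficient can be computed as in the sketch below. The function name and the toy vectors are assumptions for illustration; the values are not taken from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical per-gene statistics, listed in the same gene order.
evr = [0.05, 0.10, 0.15, 0.20, 0.25]
rank_median = [0.10, 0.35, 0.40, 0.70, 0.90]
print(round(pearson(evr, rank_median), 3))
```

Each input list must hold one value per gene in the same order, so that position i in both lists refers to the same gene.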

The distribution of EVR, our measure for the percentage of datasets showing significant variation between subsets, showed a minimum of 0.028, a mean of 0.134 and a maximum of 0.263, and was surprisingly normally distributed across the genome as can be seen in Fig. 2. Formally, the distribution is relatively mesokurtic (kurtosis = −0.54) and symmetric (skewness = −0.013). This means that the distribution has marked symmetry around the mean and is neither extremely peaked, nor does it show spread with “fat tails.” In other words, we now have a prior expectation that any given gene in the genome will show a significant change in expression level in 13.4% of gene expression experiments.
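The two shape statistics quoted above can be computed from a list of per-gene EVR values as sketched below. The paper does not state which estimator was used, so this sketch assumes the simple population-moment definitions, with kurtosis reported as excess kurtosis (0 for a normal distribution). The function name is hypothetical.

```python
from statistics import fmean


def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (population definitions)."""
    n = len(values)
    mu = fmean(values)
    m2 = sum((v - mu) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mu) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mu) ** 4 for v in values) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skewness, excess_kurtosis


# Toy example: a flat, symmetric set of values.
print(skewness_and_excess_kurtosis([1, 2, 3, 4, 5]))
```

A symmetric distribution gives a skewness near 0, and a negative excess kurtosis indicates lighter tails than a normal distribution.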

Fig. 2.


The 3 panels along the diagonal of the figure show histograms of the distribution of these 3 test statistics for the 13,802 genes measured in the human microarray datasets. The 3 panels above the diagonal are scatter plots showing the pairwise relationships between these 3 summary statistics. The points in the upper and lower quartiles of the EVR and rank width summary statistics are shown in distinct colors (blue, green, orange, and red). The bottom panels show the value of the Pearson correlation between the summary statistics, with the font size scaled to the correlation coefficient. Correlation coefficients are highly significant (P < 10⁻¹⁵). The crescent-shaped structure apparent in the relationship between rank median and rank width is interpreted to show that genes with consistently low or high average expression do not change in their relative expression position very much and stay at the top or the bottom of the ranking. In some sense they have “less room to move” relative to the other genes, as they can only change rank in one direction.

The rank median and rank width were calculated across the 22,283 genes measured on all 14,738 microarrays in GEO using the Affymetrix HG-U133A microarray, with the results summarized in Table 1 and shown graphically in Fig. 2, with these statistics rescaled to range from 0.0 to 1.0 to aid comparison across species and platforms. We find that the EVR is highly correlated with the rank median but is uncorrelated with the rank width, indicating that EVR and rank width are independent measures of dynamics. We also find that the rank width is slightly negatively correlated with the rank median, demonstrating that highly expressed genes vary less in absolute expression level. The very strong correlation between EVR and rank median may partly be attributed to the relatively poor ability of microarrays to resolve differences in expression at low levels of expression. The full table of genes with associated measures of variation is available in the Supplementary Materials, while selected example genes and their measures of dynamism are shown in Table 2.

Table 2.

Example genes with extreme levels of variation

| Gene | EVR | Rank Median | Rank Width | Name |
| --- | --- | --- | --- | --- |
| RPL41 | 0.132 | 1.000 | 0.001 | ribosomal protein L41 |
| RPL23A | 0.171 | 1.000 | 0.001 | ribosomal protein L23a |
| EEF1A1 | 0.211 | 0.999 | 0.002 | eukaryotic translation elongation factor 1 α1 |
| IFNA8 | 0.037 | 0.023 | 0.059 | interferon, α8 |
| HNRPA3 | 0.250 | 0.841 | 0.132 | heterogeneous nuclear ribonucleoprotein A3 |
| SAT1 | 0.248 | 0.784 | 0.137 | solute carrier family 26 (sulfate transporter), member 1 |
| NEU2 | 0.034 | 0.133 | 0.163 | sialidase 2 (cytosolic sialidase) |
| ILF3 | 0.238 | 0.767 | 0.198 | interleukin enhancer binding factor 3 |
| THRA | 0.242 | 0.371 | 0.201 | thyroid hormone receptor-α |
| MGP | 0.157 | 0.512 | 0.883 | matrix Gla protein |
| LUM | 0.136 | 0.244 | 0.982 | lumican |
| KRT19 | 0.138 | 0.134 | 1.000 | keratin 19 |
| Expected | 0.134 | 0.505 | 0.227 | mean (total) |

The measures of expression variation for a selection of genes chosen because they exhibit particularly high/low levels of variation. EVR is the relative number of GEO datasets showing significant variation of expression between experimental conditions. Rank median and rank width measure the average and range of variation in absolute measured expression, respectively. For comparison purposes, the average value of these measures for all genes is shown at the bottom.

We show the relationship of these summary statistics to expression variation in individual experiments in Fig. 3. We selected four published gene expression studies to reflect a variety of tissues and experimental conditions: multiple myeloma (1), breast cancer tumors (13), HIV infection in T-cells (21), and ischemic vs. nonischemic cardiomyopathy (24). Figure 3 shows a selection of genes, showing, respectively, high and low EVR and rank width. The distribution of ranked expression across all samples in GEO for one gene in each category is shown as a histogram to the right, with the horizontal axis representing rank expression. Peaks to the right represent higher average rank expression. The genes with high rank width show a greater variation in expression across the four experiments in the heat map, with a correspondingly wider distribution across all of GEO, as can be observed by the greater width of the histograms for CCND1 and MMP3 compared with those for HMGB2 and C19orf62. The genes with a higher EVR tend to separate the experimental subsets (here differentiated by a different shade on the colored strip at the top of the figure) slightly better than the genes with low EVR. Note that not every gene with high EVR has a different fold change in each experiment, but they are more likely to have a fold difference between experimental conditions than the genes with low EVR.

Fig. 3.


The ranked expression for genes in 4 arbitrarily chosen microarray experiments is shown in this heat map. The colored bar at the top differentiates 2 experimental subsets (such as ischemic vs. nonischemic cardiomyopathy) in each experiment, and the colored bar on the left indicates the dynamic group the genes belong to (corresponding to the colors used in Fig. 2). One gene in each of the 4 classes has been selected and a histogram to the right shows the ranked expression of that gene across all experiments in Gene Expression Omnibus. In the histograms, high rank width genes show a broad distribution, reflecting the large amount of absolute variation in expression of these genes. This appears in the heat map as a wide range of expression values across the rows for high rank width genes (red and orange categories). In contrast, the low rank width genes have a much more sharply peaked distribution of expression in the histograms and tend to have more uniform levels of expression in the heat map. High EVR genes (blue and red classes) tend to be significantly differentially expressed between experimental conditions, and we see that genes like IFIT1 and CROP are indeed differentially expressed between some of the conditions.

To examine these measures of dynamism in different species, we also looked at microarrays for mouse and rat samples. Not only are the relationships between these three summary statistics consistent between the three different organisms investigated (Table 1), but the value of these summary statistics and the general properties of expression variation are conserved in the same genes across species. This is particularly true for the EVR. By mapping orthologous genes, we unexpectedly see very strong correlation between the same genes (Fig. 4), thus implying conservation of these measures of variation across species. In other words, the genes that often vary in expression between experimental conditions are conserved across species. This result is surprising, considering the many different kinds of experiments used to study human samples, compared with experiments on mice and rats.

Fig. 4.


These 9 panels show scatter plots of the relationship between the indicated summary statistics for the 2,421 homologous genes shared between each organism measured in the microarray experiments. The Pearson correlation coefficient is shown at the top. All the reported correlation values are highly statistically significant, P < 10⁻¹⁵. We can see that orthologous genes show very high conservation of these variation properties (rank width, rank median, and EVR) across all 3 species.

An important functional relationship can be seen by examining the value of these statistics for known human disease-related genes as obtained from OMIM (19), The Human Gene Mutation Database (10), and the Genetic Association Database (4) as shown in Table 3. We observe a weak but statistically significant relationship between these summary statistics and disease association. Although the effect size is small, these results suggest that disease-related genes are more likely to show changes across experimental conditions (higher EVR), lower overall expression levels (lower rank median), but greater overall variation in measured expression (higher rank width). This finding is particularly surprising, as in general high EVR tends to be correlated with high rank median (Table 1), the opposite of the relationship shown here for disease-related genes. Curiously, the same pattern was seen for the measures of dynamism of drug target genes, using a list of all the drug target genes from DrugBank (46).

Table 3.

Average value of summary statistics for disease and nondisease-related genes along with calculated P value from t-test comparing these values between disease and nondisease-related genes

| | n | Mean EVR | Mean Rank Median | Mean Rank Width |
| --- | --- | --- | --- | --- |
| Disease-related genes | 3,025 | 0.136 | 0.461 | 0.247 |
| Nondisease-related genes | 9,513 | 0.134 | 0.526 | 0.224 |
| P value from t-test | | 0.00794 | 2.2 × 10⁻¹⁶ | 2.2 × 10⁻¹⁶ |
| Drug target genes | 1,471 | 0.141 | 0.496 | 0.240 |
| Nondrug target genes | 11,067 | 0.133 | 0.512 | 0.228 |
| P value from t-test | | 1.44 × 10⁻¹¹ | 0.019 | 0.00037 |

Disease-related genes tend to have slightly less than average absolute expression (rank median), while greater than average variation in expression (rank width and EVR). The column ‘n’ indicates the number of genes in each category. A similar pattern is shown for genes that are known drug targets.
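A two-group comparison of this kind can be sketched as below. The text does not say which variant of the t-test was used, so this example assumes Welch's unequal-variance form and returns only the test statistic and the approximate degrees of freedom; a P value would additionally require the t distribution's cumulative distribution function, which the Python standard library does not provide. The function name and the toy numbers are hypothetical.

```python
from math import sqrt
from statistics import fmean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard errors of each group mean (sample variance / n).
    se2_a = variance(group_a) / len(group_a)
    se2_b = variance(group_b) / len(group_b)
    t = (fmean(group_a) - fmean(group_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(group_a) - 1) + se2_b ** 2 / (len(group_b) - 1)
    )
    return t, df


# Hypothetical rank-median values for two small groups of genes.
disease = [1.0, 2.0, 3.0, 4.0, 5.0]
nondisease = [2.0, 3.0, 4.0, 5.0, 6.0]
print(welch_t(disease, nondisease))  # (-1.0, 8.0)
```

With group sizes in the thousands, as in Table 3, even small differences in means can yield very small P values, which is consistent with the small effect sizes noted in the text.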

We also considered the functional annotations associated with dynamism. We created ranked lists of genes for each of our three statistics from the human samples and then identified functional properties enriched for high and low values of these summary statistics using GSEA (38). Genes ranked by high EVR were very strongly (FDR < 0.0001) associated with the cell cycle (including the p53 pathway and the G1-to-S transition) and mitosis, the mitochondrion and oxidative phosphorylation, mRNA processing, the proteasome and degradation, and the ribosome. In other words, genes associated with these biological processes and cellular locations are the most likely to change across experiments. Additionally, high-EVR genes were strongly associated with predicted transcription factor binding sites for members of the E2F family (FDR < 0.0001). The E2F family is a group of associated transcription factors heavily involved in the G1-to-S transition in mammalian cells. Family members form heterodimers with TFDP1 and TFDP2 and can be inactivated in complex with Rb1. The E2F/DP complexes are known to act as transcriptional activators and repressors and are heavily involved in regulating the cell cycle (41).

Genes with low EVR were highly enriched for neurotransmitter receptors and voltage-gated ion channels, suggesting a strong association with genes only expressed in specific (likely neural) tissue. Low EVR genes were also associated with the presence of RE1-silencing transcription factor (REST or NRSF) binding sites in their promoter regions (FDR < 0.01). As REST works to silence neuronal genes in nonneuronal tissues, this is very much in accord with the characterization of these low variation genes as being neuron-specific.

Several important pathways and cellular subunits had significantly high median rank expression including constituents of the ribosome, the electron transport chain, the subunits of the proteasome, and the tRNA synthetases. Also enriched for high median rank were multiple subunits of the eukaryotic translation initiation factors. Genes associated with cell-cell signaling, the ion transporters (including voltage gated ion channels), and collagen all showed significantly low median rank expression, along with genes associated with developmental processes such as muscle cell differentiation. The interpretation of these findings is that genes involved in signaling, particularly in the brain and in development, along with genes responsible for producing the extracellular matrix are expressed at low levels in most samples.

In genes with very low rank width many other subunits and pathways were significantly overenriched, including the proteasome, the tRNA synthetases, mRNA processing, the mitochondrion and ATP synthesis. Conversely, high rank width was associated with genes responsible for the immune response including the complement cascade, general cell communication and adhesion pathways, and the extracellular matrix.

Further insight can be gained by looking at those specific genes with high or low EVR and rank width. By selecting those genes in the upper and lower ranges of each (Fig. 2 and Table 4), we searched for overenrichment of functional annotations, e.g., Gene Ontology (GO) codes, associated with these genes. The genes showing high EVR and high rank width (red in Fig. 2) are associated with large-scale responses such as remodeling of the extracellular space, inflammation, cell migration, and apoptosis. This group is also enriched for cancer association annotations. The opposing group, with low EVR and low rank width (green) is associated with signaling in specialized instances such as between neurons or in the endocrine system. In general these genes show very low expression overall (rank median), as they are only expressed under limited conditions. The genes with high EVR and low rank width (blue) reflect a different type of dynamism. This group is highly enriched for genes responsible for turning DNA into protein product, a relatively nonspecific response between treatment and control. This represents a group of genes that are often differentially expressed under different conditions (high EVR) but may merely represent a generic change in the protein production capacity of the cell. The specifics of which other protein products are changing expression is what differentiates one response from another, in contrast with this more general set of genes. Finally, the fourth category of investigation (orange) shows a broad involvement with cell signaling for immunity.
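The selection of the four groups can be sketched as a simple quartile filter on the two statistics. This is an illustrative sketch under assumptions: it uses the upper and lower quartiles as cutoffs (as described for the colored points in Fig. 2), the default quartile convention of Python's `statistics.quantiles`, and invented gene names and values.

```python
from statistics import quantiles

# Color labels follow the four groups named in the text and in Table 4.
GROUP_COLORS = {
    ("high", "high"): "red",
    ("low", "low"): "green",
    ("high", "low"): "blue",
    ("low", "high"): "orange",
}


def tail(value, lower_cut, upper_cut):
    """Return 'low' or 'high' for values in an outer quartile, else None."""
    if value <= lower_cut:
        return "low"
    if value >= upper_cut:
        return "high"
    return None


def dynamism_groups(evr, rank_width):
    """Label genes that fall in an outer quartile of both statistics."""
    genes = sorted(set(evr) & set(rank_width))
    evr_q1, _, evr_q3 = quantiles([evr[g] for g in genes], n=4)
    rw_q1, _, rw_q3 = quantiles([rank_width[g] for g in genes], n=4)
    groups = {}
    for g in genes:
        key = (tail(evr[g], evr_q1, evr_q3), tail(rank_width[g], rw_q1, rw_q3))
        if key in GROUP_COLORS:  # genes in a middle quartile are left out
            groups[g] = GROUP_COLORS[key]
    return groups


# Hypothetical values for eight genes (key: gene, value: statistic).
evr = {"g1": 0.05, "g2": 0.06, "g3": 0.10, "g4": 0.12,
       "g5": 0.14, "g6": 0.16, "g7": 0.22, "g8": 0.24}
rank_width = {"g1": 0.01, "g2": 0.90, "g3": 0.20, "g4": 0.30,
              "g5": 0.25, "g6": 0.35, "g7": 0.02, "g8": 0.95}
print(dynamism_groups(evr, rank_width))
```

Genes in the middle half of either statistic are not assigned to any group, which is why the four groups together cover only a subset of the genes studied.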

Table 4.

Select functional annotations showing positive enrichment for the genes colored in Fig. 2, the four quartiles of the EVR vs. rank width plot, relative to the other genes studied

Red: High EVR, High Rank Width, 828 Genes
Green: Low EVR, Low Rank Width, 582 Genes
Blue: High EVR, Low Rank Width, 1,085 Genes
Orange: Low EVR, High Rank Width, 542 Genes

P = Bonferroni corrected P value.

| Function (Red) | P | Function (Green) | P | Function (Blue) | P | Function (Orange) | P |
|---|---|---|---|---|---|---|---|
| Cell cycle | 1.03E-11 | sensory perception | 0.000001 | protein biosynthesis | 1.78E-22 | receptor | 0.000016 |
| Cell structure and motility | 1.11E-08 | G protein-coupled receptor | 0.000 | protein metabolism and modification | 1.62E-19 | defense and immunity protein | 0.000062 |
| Oncogenesis | 0.000001 | ion channel | 0.000047 | ribosomal protein | 3.58E-12 | natural killer cell-mediated immunity | 0.008 |
| Immunity and defense | 0.000001 | G protein-mediated signaling | 0.000071 | nucleic acid binding | 1.50E-10 | immunoglobulin receptor family member | 0.018 |
| Signal transduction | 0.000003 | peptide hormone | 0.000084 | oxidative phosphorylation | 3.95E-09 | immunity and defense | 0.024 |
| Cytoskeletal protein | 0.000003 | homeobox transcription factor | 0.001090 | ubiquitin proteasome pathway | 2.35E-08 | | |
| p53 pathway | 0.000430 | cell surface receptor mediated signal transduction | 0.002 | chaperone | 1.81E-07 | | |
| Cell adhesion | 0.0007 | chemosensory perception | 0.003690 | Parkinson disease | 5.75E-07 | | |
| Extracellular matrix | 0.001 | receptor | 0.003710 | protein complex assembly | 5.96E-07 | | |
| Mesoderm development | 0.0013 | ligand-gated ion channel | 0.005360 | translation factor | 8.69E-07 | | |
| T cell activation | 0.012 | vision | 0.007530 | PremRNA processing | 1.57E-06 | | |
| Extracellular matrix glycoprotein | 0.014 | ion transport | 0.011 | mRNA splicing | 0.000013 | | |
| Growth factor homeostasis | 0.017 | signaling molecule | 0.022 | translation initiation factor | 0.000165 | | |
| Apoptosis | 0.026 | | | intracellular protein traffic | 0.0000056 | | |
| Cell motility | 0.008 | | | chaperonin | 0.006 | | |
| Actin binding cytoskeletal protein | 0.008 | | | rRNA metabolism | 0.007 | | |
| Mitosis | 0.0024 | | | aminoacyl-tRNA synthetase | 0.013 | | |
| Cell adhesion molecule | 0.0025 | | | Huntington disease | 0.020 | | |
| Protein biosynthesis | 0.0034 | | | cell cycle | 0.020 | | |

The annotations and Bonferroni corrected P values are derived from the PANTHER gene annotations and gene list analysis tools (40). Note that some categories are significantly enriched in both high and low ranges implying that we are capturing a different type of gene characteristic than some of these basic functions.
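
As a rough illustration of what a Bonferroni corrected enrichment P value such as those in Table 4 represents, the sketch below computes a one-sided overrepresentation P value and multiplies it by the number of categories tested. It is not the PANTHER implementation: the choice of a hypergeometric test, the function names, and every count in the example are assumptions made only for illustration.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the annotation."""
    total = comb(N, n)
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def bonferroni(p, n_tests):
    """Bonferroni correction: scale by the number of tests, capped at 1."""
    return min(1.0, p * n_tests)


# Hypothetical counts (not taken from the study):
N = 10000  # genes studied
K = 200    # genes carrying some annotation
n = 828    # genes in one group, e.g. the size of the red group
k = 40     # annotated genes observed in that group

p_raw = hypergeom_upper_tail(k, K, n, N)
p_adj = bonferroni(p_raw, n_tests=250)  # 250 categories is an assumed number
print(f"raw P = {p_raw:.3g}, Bonferroni corrected P = {p_adj:.3g}")
```

Because each group is tested against many categories, the correction keeps a category that is only nominally overrepresented from being reported as significant.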

Looking at the variation measures for 11 genes commonly used as mRNA controls, also called “housekeeping genes” (Table 5), shows that although they have very low rank width and sufficiently high levels of expression (rank median), they have one type of dynamism opposite to what we would want in housekeeping genes: they show greater than average EVR. This means that under varying experimental conditions, they are more likely than other genes to be found as significantly differentially expressed. This may be associated with their high levels of expression. With our measures, we can find a better set of candidate housekeeping genes. Table 6 shows such a set of candidate genes, which differ only slightly from the commonly used housekeeping genes in terms of rank median but have much lower EVR. This set of candidate housekeeping genes is much less likely to be found as changing within a gene expression experiment.

Table 5.

Measures of dynamism for several commonly used housekeeping genes

Commonly Used Housekeeping Genes
| Gene | EVR | Rank Median | Rank Width | Name |
|---|---|---|---|---|
| ACTB | 0.224 | 0.997 | 0.006 | actin-β |
| B2M | 0.181 | 0.995 | 0.007 | β2-microglobulin |
| GAPDH | 0.212 | 0.995 | 0.009 | glyceraldehyde-3-phosphate dehydrogenase |
| ALDOA | 0.170 | 0.988 | 0.012 | aldolase A, fructose-bisphosphate |
| LDHA | 0.203 | 0.988 | 0.012 | lactate dehydrogenase A |
| VIM | 0.209 | 0.990 | 0.020 | vimentin |
| PGAM1 | 0.196 | 0.974 | 0.024 | phosphoglycerate mutase 1 |
| PGK1 | 0.224 | 0.813 | 0.102 | phosphoglycerate kinase 1 |
| HPRT1 | 0.174 | 0.881 | 0.122 | hypoxanthine phosphoribosyltransferase 1 |
| PFKP | 0.184 | 0.864 | 0.200 | phosphofructokinase, platelet |
| G6PD | 0.132 | 0.684 | 0.301 | glucose-6-phosphate dehydrogenase |
| Expected | 0.134 | 0.505 | 0.227 | mean (total) |
| Housekeeping | 0.192 | 0.925 | 0.074 | mean (housekeeping) |

For comparison purposes, the average value of these measures is shown at the bottom both for all genes and for these commonly used housekeeping genes. Note that although they show high rank median and low rank width, they show relatively high EVR.
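
The summary rows of Table 5 can be rechecked from the per-gene entries. The short sketch below recomputes the housekeeping means and lists which of these genes exceed the all-gene mean EVR; the values are transcribed from Table 5, while the variable and function names are ours. Because the published entries are rounded to three decimals, a recomputed mean can differ from the table in the last digit.

```python
from statistics import mean

# (EVR, rank median, rank width) for each gene, transcribed from Table 5.
HOUSEKEEPING = {
    "ACTB": (0.224, 0.997, 0.006),
    "B2M": (0.181, 0.995, 0.007),
    "GAPDH": (0.212, 0.995, 0.009),
    "ALDOA": (0.170, 0.988, 0.012),
    "LDHA": (0.203, 0.988, 0.012),
    "VIM": (0.209, 0.990, 0.020),
    "PGAM1": (0.196, 0.974, 0.024),
    "PGK1": (0.224, 0.813, 0.102),
    "HPRT1": (0.174, 0.881, 0.122),
    "PFKP": (0.184, 0.864, 0.200),
    "G6PD": (0.132, 0.684, 0.301),
}
# "Expected" row of Table 5: means over all genes studied.
ALL_GENES_MEAN = (0.134, 0.505, 0.227)


def column_means(table):
    """Mean of each measure across the genes in the table."""
    return tuple(mean(col) for col in zip(*table.values()))


hk_evr, hk_median, hk_width = column_means(HOUSEKEEPING)

# Genes whose EVR is above the all-gene mean EVR.
above_average_evr = sorted(
    gene for gene, (evr, _, _) in HOUSEKEEPING.items() if evr > ALL_GENES_MEAN[0]
)

print(f"mean EVR {hk_evr:.3f}, rank median {hk_median:.3f}, rank width {hk_width:.3f}")
print(f"{len(above_average_evr)} of {len(HOUSEKEEPING)} genes exceed the all-gene mean EVR")
```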

Table 6.

Suggested housekeeping genes that have high rank median but relatively low EVR

Suggested Housekeeping Genes
| Symbol | EVR | Rank Median | Rank Width | Name |
|---|---|---|---|---|
| HMG1L1 | 0.098 | 0.861 | 0.107 | high-mobility group box 1-like 1 |
| IGKV1D-13 | 0.068 | 0.774 | 0.34 | immunoglobulin-κ variable 1D-13 |
| C19orf62 | 0.098 | 0.817 | 0.098 | chromosome 19 open reading frame 62 |
| HSD17B7 | 0.098 | 0.797 | 0.123 | hydroxysteroid (17-β) dehydrogenase 7 |
| PSENEN | 0.094 | 0.774 | 0.221 | presenilin enhancer 2 homolog |
| BIN3 | 0.094 | 0.751 | 0.14 | bridging integrator 3 |
| MGC3196 | 0.097 | 0.78 | 0.16 | transmembrane protein 223 |
| KIAA1975 | 0.099 | 0.817 | 0.113 | AGAP11 |
| HSD17B7P2 | 0.072 | 0.797 | 0.123 | hydroxysteroid (17-β) dehydrogenase 7 pseudogene 2 |
| UBE2NL | 0.076 | 0.805 | 0.112 | ubiquitin-conjugating enzyme E2N-like |
| GIYD1 | 0.098 | 0.81 | 0.14 | GIY-YIG domain containing 1 |

These genes are expressed at a consistently high level as seen by the rank median, comparable to commonly used housekeeping genes (Table 5); however, they are expected to appear as differentially expressed between experimental conditions roughly half as often (EVR).
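
The "roughly half as often" comparison can be checked directly from the tabulated values. The sketch below averages the EVR column of Table 6 and divides by the mean EVR of the commonly used housekeeping genes reported in Table 5 (0.192); the variable names are ours, and the numbers are the rounded published values.

```python
from statistics import mean

# EVR values transcribed from Table 6.
SUGGESTED_EVR = {
    "HMG1L1": 0.098, "IGKV1D-13": 0.068, "C19orf62": 0.098,
    "HSD17B7": 0.098, "PSENEN": 0.094, "BIN3": 0.094,
    "MGC3196": 0.097, "KIAA1975": 0.099, "HSD17B7P2": 0.072,
    "UBE2NL": 0.076, "GIYD1": 0.098,
}
COMMON_HOUSEKEEPING_MEAN_EVR = 0.192  # "Housekeeping" row of Table 5
ALL_GENES_MEAN_EVR = 0.134            # "Expected" row of Table 5

suggested_mean = mean(SUGGESTED_EVR.values())
ratio = suggested_mean / COMMON_HOUSEKEEPING_MEAN_EVR

print(f"mean EVR of suggested genes: {suggested_mean:.3f}")
print(f"ratio to commonly used housekeeping genes: {ratio:.2f}")
```

Every suggested gene also has an EVR below the all-gene mean of 0.134, whereas all but one of the commonly used genes in Table 5 lie above it.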

Finally, we investigated what might cause the differences in expression dynamics by looking at the number of regulatory sites associated with genes. Unfortunately, a comprehensive view of the regulatory sites for the expression of human mRNA, or even a complete view of the regulation of mRNA expression of a single gene, is lacking. To cover as many predictions as possible, we took all the annotations from the MSigDB (38). For each gene, we counted the total number of transcription factor annotations associated with that gene. A single count corresponds to a gene linked to a particular motif or transcription factor; multiple links between one particular motif and the same gene would be recorded as a single count, as only distinct annotations from the database are included. Although these annotations are mostly predicted through sequence analysis, and many may be redundant rather than independent, they give us a good view of the overall putative regulatory picture for each gene. We find that the number of transcription factor annotations associated with each gene ranged from 0 to 218, with a mean of 8.5.
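
The counting rule described above (each distinct motif or transcription factor counted once per gene) can be sketched as follows. This is a minimal illustration under assumptions: the input is taken to be a list of (gene, motif) pairs, which is not necessarily how the MSigDB files are organized, and the gene and motif names are placeholders.

```python
from collections import defaultdict


def count_distinct_tf_annotations(links, genes):
    """Count distinct motif/transcription factor annotations per gene.

    links: iterable of (gene, motif) pairs; repeated pairs count once.
    genes: all genes of interest, so unannotated genes receive a count of 0.
    """
    motifs_by_gene = defaultdict(set)
    for gene, motif in links:
        motifs_by_gene[gene].add(motif)
    return {gene: len(motifs_by_gene.get(gene, ())) for gene in genes}


# Placeholder annotations (hypothetical names, not real MSigDB entries).
links = [
    ("GENE_A", "MOTIF_1"),
    ("GENE_A", "MOTIF_1"),  # duplicate link, recorded as a single count
    ("GENE_A", "MOTIF_2"),
    ("GENE_B", "MOTIF_2"),
]
counts = count_distinct_tf_annotations(links, ["GENE_A", "GENE_B", "GENE_C"])
print(counts)
```

Using a set per gene makes the deduplication explicit and keeps genes without any annotation in the result with a count of zero.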

The relationship between the number of regulatory annotations and the EVR is shown in Fig. 5. Note the significant upward trend, where an increased number of regulatory annotations is associated with an increased EVR (Pearson correlation 0.13, P < 10⁻¹⁵). As intuition might suggest, this relationship implies that the dynamics captured in EVR are due to regulation through transcription factors. Genes regulated by a more diverse set of transcription factors are more likely to vary under different conditions than genes with fewer regulators. The rank median is essentially independent of the diversity of regulators, as can be seen in Fig. 5 (Pearson correlation −0.002, P = 0.788). This suggests that the absolute level of expression does not depend on the diversity of potential regulators. Finally, there is a slight concave-down relationship between the rank width and the number of transcription factors (Fig. 5). This suggests that the variation in absolute expression may be caused by the interplay of multiple transcription factor regulators. Expression is a dynamic process that must be balanced. Keeping a gene tuned within a narrower range of expression variation seems to require an increased range of regulators.
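
A correlation as small as 0.13 can still be highly significant when many genes are measured. The sketch below, offered only as an illustration, computes a Pearson correlation on toy data and the usual t statistic, t = r·sqrt((n − 2)/(1 − r²)), for r = 0.13. The toy values and the sample size of 10,000 genes are assumptions, not figures from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing a Pearson correlation r from n pairs against 0."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Toy data: transcription factor annotation counts and EVR for six genes.
tf_counts = [1, 3, 5, 8, 13, 21]
evr = [0.10, 0.12, 0.11, 0.15, 0.14, 0.19]
r_toy = pearson_r(tf_counts, evr)

# With an assumed 10,000 genes, r = 0.13 is many standard errors from zero.
t_weak = t_statistic(0.13, 10000)
print(f"toy r = {r_toy:.2f}; t for r = 0.13, n = 10000: {t_weak:.1f}")
```

The point of the second calculation is that statistical significance here reflects the large number of genes, while the size of the correlation indicates that the number of annotations accounts for only a small share of the variation in EVR.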

Fig. 5.

The base-10 logarithm of the number of transcription factor binding annotations is shown on the y-axis (zero counts are excluded), and the horizontal axis is broken into ranges of the EVR and other measures of dynamics in these box and whisker plots. The ranges are such that a gene with an EVR of 0.13, which means the gene varies in expression in 13% of experiments in which it is measured, would be counted in the range of 0.126–0.146. The vertical height of the box corresponds to the range of the central 50% of the data, with the whiskers and outlying points covering the rest. The horizontal width of the boxes scales with the square root of the number of genes in that group. Note that an increased number of different transcription factor annotations corresponds to a greater EVR, suggesting that the variation in expression under different conditions can be explained by regulation by a greater range of regulators. Note also that there is a slight downward concavity in the number of transcription factors associated with the rank width.
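
The grouping described in this legend can be sketched as follows. The bin width of 0.02 and the placement of a bin edge at 0.126 are assumptions inferred from the single example range given above (0.126–0.146); the function names and toy data are ours. The sketch excludes zero counts before taking logarithms and reports a relative box width proportional to the square root of the group size.

```python
import math
from collections import defaultdict

BIN_WIDTH = 0.02    # assumed from the example range 0.126-0.146
BIN_ANCHOR = 0.126  # assumed position of one bin edge


def evr_bin(evr):
    """Return the (low, high) EVR range that contains the given value."""
    # The small offset guards against floating-point error at bin edges.
    index = math.floor((evr - BIN_ANCHOR) / BIN_WIDTH + 1e-9)
    low = BIN_ANCHOR + index * BIN_WIDTH
    return (round(low, 3), round(low + BIN_WIDTH, 3))


def summarize(genes):
    """Group log10 annotation counts by EVR range.

    genes: iterable of (evr, tf_annotation_count) pairs.
    """
    groups = defaultdict(list)
    for evr, count in genes:
        if count > 0:  # zero counts have no logarithm and are excluded
            groups[evr_bin(evr)].append(math.log10(count))
    return {
        bin_range: {
            "n": len(values),
            "relative_box_width": math.sqrt(len(values)),
            "log10_counts": sorted(values),
        }
        for bin_range, values in sorted(groups.items())
    }


# Toy data: (EVR, number of transcription factor annotations).
summary = summarize([(0.13, 10), (0.14, 100), (0.13, 0), (0.20, 1)])
for bin_range, stats in summary.items():
    print(bin_range, stats)
```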

Although the variety of transcription factor regulation may be able to explain some of the mechanism of expression dynamics using these measures, an important question is how these dynamics are associated with evolutionary conservation. Previous studies have shown that highly expressed genes are more highly conserved (39, 48) and that genes that are expressed in a wider range of tissue types have increased conservation (30, 43). We found that there was a slight but highly significant negative association between conservation breadth, a measure of how widely a gene is conserved across species, and the rank width (correlation = −0.120, P < 2 × 10⁻¹⁵), which is in accord with those results (Fig. 6). This indicates that the absolute expression level of highly conserved genes is more stable than that of less conserved genes. Also in accord with previous studies, there was a positive association between conservation breadth and rank median (correlation = 0.176, P < 2 × 10⁻¹⁵), showing that highly expressed genes are more conserved. However, as might be expected from the strong correlation between rank median and EVR, there was also a very similar positive association between EVR and the conservation breadth (correlation = 0.175, P < 2 × 10⁻¹⁵), suggesting that there may also be selective pressure for the ability to dynamically alter expression in response to differing conditions, as genes that tend to vary under more differing conditions tend to be more highly conserved.

Fig. 6.

Conservation breadth and dynamics. The relationship between conservation breadth, a measure of how widely conserved a gene is, and the dynamics. Increased conservation as measured by conservation breadth is associated with increasing EVR (P < 10⁻¹⁵) and decreasing rank width (P < 10⁻¹⁵), suggesting that highly conserved genes are more likely to vary under different conditions yet overall have a more stable level of absolute expression.

DISCUSSION

In this work, we explored the space of genes and how they change across thousands of microarray measurements in hundreds of experiments, by defining three new, nonparametric summary statistics for looking at gene expression dynamics across microarray datasets. We show that these measures of gene expression dynamics are globally conserved across species. We have also shown that the percent of experiments in which a gene is found as differentially expressed, what we term the EVR, is distinct and associated with different functional properties compared with the width of expression level changes, what we term the rank width.

We have also explored the involvement with disease-associated genes and show that there is a statistically significant enrichment of known disease-associated genes in those genes showing more changes across experimental conditions but with lower overall expression levels. Because these measures of variation are significantly associated with known disease-associated genes, we suggest that this should be considered when viewing the properties of the genes and diseases in the full human “diseaseome” (17, 47). For example, disease genes that are highly dynamic in expression may be functionally distinct from those with little dynamism, and this can provide clues to the underlying molecular pathophysiology and suggest targets for sequencing based on these differences. As much of our information about gene-disease relationships comes from model organisms, it is an important result that we can see such conservation in dynamism between species both very close (mouse-rat) and far (human-rodent) in evolutionary distance. This conservation also suggests that we can use these measures to identify improved housekeeping genes, showing low levels of dynamism, that may be used in different organisms.

The strong correlation between rank median and EVR is worthy of further study as additional array data continue to be released. If this correlation is strongly influenced by the poor ability of microarrays to discriminate concerted variation in expression for genes at low levels of expression, then this is an important finding to consider when interpreting any microarray experiment, particularly any meta-analysis across multiple expression experiments. Improvements in measurement modalities such as sequencing of mRNA may be able to address this limitation.

A significant point is that although we found particular functional categories associated with extreme values of our measures of dynamism, none of the existing functional categories adequately covers our measures of dynamism. For example, “immunity and defense” is significantly associated with both high and low EVR for genes with high rank width, while “cell cycle” is associated with high and low rank width for genes with high EVR, suggesting that our measures of dynamism are capturing orthogonal information. Indeed, functional categories such as these are derived from the GO (2), originally designed for comparative genetics and not specifically for the analysis of gene expression changes by microarray. Although the molecular signatures present in the MSigDB (38) are derived from differential expression measurements, they are confined to individual experiments and are not global measures like our measures of dynamism.

The utility of this type of analysis extends beyond looking at the functional properties of genes showing these broad categories of variation. These measures of variation essentially serve as a natural source of prior expectation as to whether a gene is changing within an experiment. If a gene is known to have low dynamism but is seen as differentially expressed in a new experiment, the significance of that finding can now be enhanced. We suggest that looking at the measures of dynamism (and expected variation) is an alternate way to analyze the lists of significantly varying genes in a microarray experiment. In addition, measures of dynamism can serve as an improved method for the selection of housekeeping genes, another area for future investigation.
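
One hypothetical way to put a number on this idea, which is not a method used in this study, is to treat a gene's EVR as the prior probability that it is called differentially expressed in an arbitrary experiment and to report the surprisal, −log2(EVR), of observing such a call. The sketch below applies this to two EVR values taken from Tables 5 and 6; the function name is ours.

```python
import math


def surprisal_bits(evr):
    """Information content, in bits, of seeing a gene called differentially
    expressed, treating its EVR as the prior probability of that event."""
    if not 0 < evr <= 1:
        raise ValueError("EVR must be in the interval (0, 1]")
    return -math.log2(evr)


# EVR values transcribed from Table 5 (ACTB) and Table 6 (IGKV1D-13).
evr_by_gene = {"ACTB": 0.224, "IGKV1D-13": 0.068}
for gene, evr in evr_by_gene.items():
    print(f"{gene}: EVR = {evr}, surprisal = {surprisal_bits(evr):.2f} bits")
```

Under this reading, a differential expression call for a low-EVR gene carries more information than the same call for a gene that varies in many experiments, which is the sense in which the finding is "enhanced".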

An alternate definition of housekeeping genes based on variation between differing conditions, rather than variation in the absolute levels of expression, enables new ways to compare sets. Although we are just becoming able to measure transcriptional variation at the level of individual cells for many genes, already there is indication that expression is highly variable between cells, even for GAPDH (44), which has traditionally been one of the most important genes for normalization.

In addition to basic descriptions of these kinds of dynamics, we have an explanation for some of the mechanism for changes in EVR, as we can see that the EVR depends on the number of different transcription factors that regulate a given gene. Genes with many different types of regulators can respond differently to a greater range of conditions and have a correspondingly higher EVR. The dynamics of expression under different conditions depend on the variety of regulators associated with a given gene. The fact that rank median is independent of the number of transcription factor binding sites is important for two reasons. First, it shows that although EVR and rank median are highly correlated, the dynamics of variation in response to different conditions are controlled by the variety of regulators. Second, it suggests that the basic level of expression of each gene is unrelated to the variety of transcription factors that might regulate its expression. Finally, the relationship between rank width and the transcription factor sites implies that maintaining a moderate amount of absolute variation requires the greatest amount of regulation, whereas both high and low levels of absolute variation require less regulation.

Our investigation into the relationship with conservation suggests that conservation is associated with high average expression (rank median) and a narrower range of absolute expression variation (rank width), in good accord with previous work. A new result is that increased conservation is also associated with a greater response to differing conditions, as expressed by the positive association between increased conservation and increased EVR.

Finally, although we are currently very limited in our understanding of the complex relationships between sequence variations and expression variations in disease processes, it is promising and suggestive that these measures of dynamics may be associated with both disease genes and drug targets. The many genome-wide association studies being published demonstrate the explosive growth in the amount of information linking genes and disease. There is also a parallel exponential growth in the amount of expression data available, as well as constant refinement in expression measurement techniques. We anticipate that high-level methods of viewing the integration of that data, such as the ones we use here, will enable the elucidation of translational connections between basic transcriptional control and the causes of human disease.

GRANTS

Supported by the National Library of Medicine (K22 LM-008261 and T15 LM-007033), the National Institute of General Medical Sciences (R01 GM-079719), and the Lucile Packard Foundation for Children's Health.

DISCLOSURES

No conflicts of interest are declared by the authors.

Supplementary Material

Table S1
tableS1.txt (79KB, txt)
Table S2
tableS2.txt (7KB, txt)

ACKNOWLEDGEMENTS

Special thanks to Rong Chen for assistance in handling the large amounts of microarray data and Alex Skrenchuk for IT assistance.

Footnotes

1. The online version of this article contains supplemental material.

REFERENCES

1. Agnelli L, Bicciato S, Mattioli M, Fabris S, Intini D, Verdelli D, Baldini L, Morabito F, Callea V, Lombardi L, Neri A. Molecular classification of multiple myeloma: a distinct transcriptional profile characterizes patients expressing CCND1 and negative for 14q32 translocations. J Clin Oncol 23: 7296–7306, 2005.
2. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G. Gene ontology: tool for the unification of biology. The gene ontology consortium. Nat Genet 25: 25–29, 2000.
3. Barrett T, Troup DB, Wilhite SE, Ledoux P, Rudnev D, Evangelista C, Kim IF, Soboleva A, Tomashevsky M, Edgar R. NCBI GEO: mining tens of millions of expression profiles–database and tools update. Nucleic Acids Res 35: D760–D765, 2007.
4. Becker KG, Barnes KC, Bright TJ, Wang SA. The genetic association database. Nat Genet 36: 431–432, 2004.
5. Breitling R, Armengaud P, Amtmann A, Herzyk P. Rank products: a simple, yet powerful, new method to detect differentially regulated genes in replicated microarray experiments. FEBS Lett 573: 83–92, 2004.
6. Butte A. The use and analysis of microarray data. Nat Rev Drug Discov 1: 951–960, 2002.
7. Butte AJ, Dzau VJ, Glueck SB. Further defining housekeeping, or “maintenance,” genes. Focus on “A compendium of gene expression in normal human tissues”. Physiol Genomics 7: 95–96, 2001.
8. Chen R, Li L, Butte AJ. AILUN: reannotating gene expression data automatically. Nat Meth 4: 879, 2007.
9. Chen R, Morgan AA, Dudley J, Deshpande T, Li L, Kodama K, Chiang AP, Butte AJ. FitSNPs: highly differentially expressed genes are more likely to have variants associated with disease. Genome Biol 9: R170, 2008.
10. Cooper DN, Krawczak M. Human Gene Mutation Database. Hum Genet 98: 629, 1996.
11. De Jonge HJ, Fehrmann RS, de Bont ES, Hofstra RM, Gerbens F, Kamps WA, de Vries EG, van der Zee AG, te Meerman GJ, ter Elst A. Evidence based selection of housekeeping genes. PLoS ONE 2: e898, 2007.
12. Duret L, Mouchiroud D. Determinants of substitution rates in mammalian genes: expression pattern affects selection intensity but not mutation rate. Mol Biol Evol 17: 68–74, 2000.
13. Farmer P, Bonnefoi H, Becette V, Tubiana-Hulin M, Fumoleau P, Larsimont D, Macgrogan G, Bergh J, Cameron D, Goldstein D, Duss S, Nicoulaz AL, Brisken C, Fiche M, Delorenzi M, Iggo R. Identification of molecular apocrine breast tumours by microarray analysis. Oncogene 24: 4660–4671, 2005.
14. Farre D, Bellora N, Mularoni L, Messeguer X, Alba MM. Housekeeping genes tend to show reduced upstream sequence conservation. Genome Biol 8: R140, 2007.
15. Farris AD, Koelsch G, Pruijn GJ, van Venrooij WJ, Harley JB. Conserved features of Y RNAs revealed by automated phylogenetic secondary structure analysis. Nucleic Acids Res 27: 1070–1078, 1999.
16. Gentleman R, Carey VJ, Huber W, Irizarry RA, Dudoit S. Bioinformatics and Computational Biology Solutions Using R and Bioconductor. New York: Springer-Verlag, 2005.
17. Goh KI, Cusick ME, Valle D, Childs B, Vidal M, Barabasi AL. The human disease network. Proc Natl Acad Sci USA 104: 8685–8690, 2007.
18. Goncalves I, Duret L, Mouchiroud D. Nature and structure of human genes that generate retropseudogenes. Genome Res 10: 672–678, 2000.
19. Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res 33: D514–D517, 2005.
20. Hulsen T, de Vlieg J, Groenen PM. PhyloPat: phylogenetic pattern analysis of eukaryotic genes. BMC Bioinformatics 7: 398, 2006.
21. Hyrcza MD, Kovacs C, Loutfy M, Halpenny R, Heisler L, Yang S, Wilkins O, Ostrowski M, Der SD. Distinct transcriptional profiles in ex vivo CD4+ and CD8+ T cells are established early in human immunodeficiency virus type 1 infection and are characterized by a chronic interferon response as well as extensive transcriptional changes in CD8+ T cells. J Virol 81: 3477–3486, 2007.
22. Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4: 249–264, 2003.
23. King M, Wilson A. Evolution at two levels in humans and chimpanzees. Science 188: 107–116, 1975.
24. Kittleson MM, Minhas KM, Irizarry RA, Ye SQ, Edness G, Breton E, Conte JV, Tomaselli G, Garcia JG, Hare JM. Gene expression analysis of ischemic and nonischemic cardiomyopathy: shared and distinct genes in the development of heart failure. Physiol Genomics 21: 299–307, 2005.
25. Kouadjo KE, Nishida Y, Cadrin-Girard JF, Yoshioka M, St-Amand J. Housekeeping and tissue-specific genes in mouse tissues. BMC Genomics 8: 127, 2007.
26. Lawson MJ, Zhang L. Housekeeping and tissue-specific genes differ in simple sequence repeats in the 5′-UTR region. Gene 407: 54–62, 2008.
27. Lee PD, Sladek R, Greenwood CM, Hudson TJ. Control genes and variability: absence of ubiquitous reference transcripts in diverse mammalian expression studies. Genome Res 12: 292–297, 2002.
28. Lee S, Jo M, Lee J, Koh SS, Kim S. Identification of novel universal housekeeping genes by statistical analysis of microarray data. J Biochem Mol Biol 40: 226–231, 2007.
29. Lercher M, Urrutia A, Pavlicek A, Hurst L. A unification of mosaic structures in the human genome. Hum Mol Genet 12: 2411–2415, 2003.
30. Liao BY, Zhang J. Low rates of expression profile divergence in highly expressed genes and tissue-specific genes during mammalian evolution. Mol Biol Evol 23: 1119–1128, 2006.
31. Martin DE, Demougin P, Hall MN, Bellis M. Rank Difference Analysis of Microarrays (RDAM), a novel approach to statistical analysis of microarray expression profiling data. BMC Bioinformatics 5: 148, 2004.
32. Ohno S. An argument for the genetic simplicity of man and other mammals. J Hum Evol 1: 651–662, 1972.
33. Parmigiani G, Garrett ES, Irizarry RA, Zeger SL. The Analysis of Gene Expression Data: Methods and Software. New York: Springer-Verlag, 2003.
34. R Development Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing, 2009.
35. Semon M, Mouchiroud D, Duret L. Relationship between gene expression and GC-content in mammals: statistical significance and biological relevance. Hum Mol Genet 14: 421–427, 2005.
36. Shi L, Reid LH, Jones WD, Shippy R, Warrington JA, Baker SC, Collins PJ, de Longueville F, Kawasaki ES, Lee KY, Luo Y, Sun YA, Willey JC, Setterquist RA, Fischer GM, Tong W, Dragan YP, Dix DJ, Frueh FW, Goodsaid FM, Herman D, Jensen RV, Johnson CD, Lobenhofer EK, Puri RK, Scherf U, Thierry-Mieg J, Wang C, Wilson M, Wolber PK, Zhang L, Amur S, Bao W, Barbacioru CC, Lucas AB, Bertholet V, Boysen C, Bromley B, Brown D, Brunner A, Canales R, Cao XM, Cebula TA, Chen JJ, Cheng J, Chu TM, Chudin E, Corson J, Corton JC, Croner LJ, Davies C, Davison TS, Delenstarr G, Deng X, Dorris D, Eklund AC, Fan XH, Fang H, Fulmer-Smentek S, Fuscoe JC, Gallagher K, Ge W, Guo L, Guo X, Hager J, Haje PK, Han J, Han T, Harbottle HC, Harris SC, Hatchwell E, Hauser CA, Hester S, Hong H, Hurban P, Jackson SA, Ji H, Knight CR, Kuo WP, LeClerc JE, Levy S, Li QZ, Liu C, Liu Y, Lombardi MJ, Ma Y, Magnuson SR, Maqsodi B, McDaniel T, Mei N, Myklebost O, Ning B, Novoradovskaya N, Orr MS, Osborn TW, Papallo A, Patterson TA, Perkins RG, Peters EH, Peterson R, Philips KL, Pine PS, Pusztai L, Qian F, Ren H, Rosen M, Rosenzweig BA, Samaha RR, Schena M, Schroth GP, Shchegrova S, Smith DD, Staedtler F, Su Z, Sun H, Szallasi Z, Tezak Z, Thierry-Mieg D, Thompson KL, Tikhonova I, Turpaz Y, Vallanat B, Van C, Walker SJ, Wang SJ, Wang Y, Wolfinger R, Wong A, Wu J, Xiao C, Xie Q, Xu J, Yang W, Zhang L, Zhong S, Zong Y, Slikker W Jr. The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat Biotechnol 24: 1151–1161, 2006.
37. Storey JD, Tibshirani R. Statistical significance for genomewide studies. Proc Natl Acad Sci USA 100: 9440–9445, 2003.
38. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA 102: 15545–15550, 2005.
39. Subramanian S, Kumar S. Gene expression intensity shapes evolutionary rates of the proteins encoded by the vertebrate genome. Genetics 168: 373–381, 2004.
40. Thomas PD, Campbell MJ, Kejariwal A, Mi H, Karlak B, Daverman R, Diemer K, Muruganujan A, Narechania A. PANTHER: a library of protein families and subfamilies indexed by function. Genome Res 13: 2129–2141, 2003.
41. Trimarchi JM, Lees JA. Sibling rivalry in the E2F family. Nat Rev Mol Cell Biol 3: 11–20, 2002.
42. Tusher VG, Tibshirani R, Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA 98: 5116–5121, 2001.
43. Vinogradov AE. Compactness of human housekeeping genes: selection for economy or genomic design? Trends Genet 20: 248–253, 2004.
44. Warren L, Bryder D, Weissman IL, Quake SR. Transcription factor profiling in individual hematopoietic progenitors by digital RT-PCR. Proc Natl Acad Sci USA 103: 17807–17812, 2006.
45. Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, DiCuccio M, Edgar R, Federhen S, Geer LY, Kapustin Y, Khovayko O, Landsman D, Lipman DJ, Madden TL, Maglott DR, Ostell J, Miller V, Pruitt KD, Schuler GD, Sequeira E, Sherry ST, Sirotkin K, Souvorov A, Starchenko G, Tatusov RL, Tatusova TA, Wagner L, Yaschenko E. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 35: D5–D12, 2007.
46. Wishart DS, Knox C, Guo AC, Cheng D, Shrivastava S, Tzur D, Gautam B, Hassanali M. DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res 36: D901–D906, 2008.
47. Yildirim MA, Goh KI, Cusick ME, Barabasi AL, Vidal M. Drug-target network. Nat Biotechnol 25: 1119–1126, 2007.
48. Zhang L, Li WH. Mammalian housekeeping genes evolve more slowly than tissue-specific genes. Mol Biol Evol 21: 236–239, 2004.


Articles from Physiological Genomics are provided here courtesy of American Physiological Society
