Nucleic Acids Research. 2003 Sep 1;31(17):5212–5220. doi: 10.1093/nar/gkg699

Isochores and tissue-specificity

Alexander E Vinogradov
PMCID: PMC212799  PMID: 12930973

Abstract

The housekeeping (ubiquitously expressed) genes in the mammalian genome were shown here to be on average slightly GC-richer than tissue-specific genes. Both housekeeping and tissue-specific genes occupy similar ranges of GC content, but the former tend to concentrate in the upper part of the range. In the human genome, tissue-specific genes show two maxima, GC-poor and GC-rich. The strictly tissue-specific human genes tend to concentrate in the GC-poor region; their distribution is left-skewed and thus reciprocal to the distribution of housekeeping genes. The intermediately tissue-specific genes show an intermediate GC content and a right-skewed distribution. Both in the human and mouse, genes specific for some tissues (e.g., parts of the central nervous system) have a higher average GC content than housekeeping genes. Since they are not transcribed in the germ line (in contrast to housekeeping genes), and therefore have a lower probability of inheritable gene conversion, this finding contradicts the biased gene conversion (BGC) explanation for elevated GC content in the heavy isochores of the mammalian genome. Genes specific for germ-line tissues (ovary, testes) show a low average GC content, which is also in contradiction to the BGC explanation. Both for the total data set and for most tissues taken separately, a weak positive correlation was found between gene GC content and expression level. The fraction of ubiquitously expressed genes is nearly 1.5-fold higher in the mouse than in the human. This suggests that mouse tissues are comparatively less differentiated (on the molecular level), which can be related to a less pronounced isochoric structure of the mouse genome. In each separate tissue (in both species), tissue-specific genes do not form a clear-cut frequency peak (in contrast to housekeeping genes), but constitute a continuum with a gradually increasing degree of tissue-specificity, which probably reflects the path of cell differentiation and/or an independent use of the same protein in several unrelated tissues.

INTRODUCTION

Genomes of warm-blooded vertebrates consist of regions differing in gene concentration and GC content (both of genes and intergenic spacers), called isochores [reviewed in (1–4)]. The origin of gene-rich and GC-rich (heavy) isochores, which appeared late in evolution, is explained either in the neutralist sense, as a result of mutation bias (5–9) or biased gene conversion (BGC) (10–15), or, in the selectionist interpretations, as an adaptation to elevated temperature (2,16) or active gene transcription (17,18). Recently, population genetics data based on single-nucleotide polymorphisms were reported that provided evidence against the mutation bias but remained compatible with both the BGC and selectionist explanations (19–21).

The heavy isochores, constituting early replicated and less condensed chromatin, were initially supposed to harbor housekeeping genes, while the light (GC-poor) isochores were assumed to accommodate tissue-specific genes (22–25). However, later works arrived at the opposite conclusion (26,27). In other work, it was concluded that there is no difference in average GC content between narrowly expressed and widely expressed genes (28). The earlier works (22–25) were either based on limited data sets or used indirect methods for distinguishing between housekeeping and tissue-specific genes (analysis of association with CpG islands, features of 5′ UTRs and the AUG start codon context). The later studies (26,27) used frequencies of ESTs sampled in different tissues for identification of housekeeping versus tissue-specific genes, with the data set being >2000 human genes. Since there is a fundamental relation of transcription to recombination and gene conversion (29–36), the latter conclusion about housekeeping genes being preferentially accommodated in the GC-poor isochores contradicts the BGC explanation. In contrast to most tissue-specific genes, the housekeeping genes should be actively transcribed in the germ line and thus have a higher probability of inheritable gene conversion (and thereby of elevation of GC content according to the BGC hypothesis). Besides its relevance to the BGC model, the problem of accommodation of housekeeping versus tissue-specific genes in particular isochores is important because the emergence of the isochoric structure is supposed to be an advanced trait of higher vertebrates and therefore might have had a functional meaning (17,18).

Here the GC content of housekeeping versus tissue-specific genes was studied with much larger sets of human and mouse genes (>7000 and >6000 genes, respectively), using oligonucleotide microarray data (with the same platform for all tissues of a given species), which are superior to EST sampling with regard to standardization of detection of gene expression [reviewed in (37,38)].

MATERIALS AND METHODS

The data of oligonucleotide microarray experiments were extracted from the Gene Expression Atlas (39). The uniform platform for the human was Affymetrix U95A and for the mouse Affymetrix U74A. Only probes which represented characterized genes (with links to the RefSeq database) were used, with the signals from probes corresponding to the same gene being averaged. This gives 7708 human and 6078 mouse genes. Gene sequences were extracted from the RefSeq database. Only data for normal tissues were used; the samples and replicates representing the same tissue were averaged. This gives 32 human and 45 mouse tissues. As was recommended by Su et al. (39), a gene was regarded as expressed if its signal level exceeded a conservative threshold of 200 arbitrary units. The homology between human and mouse genes was established using the HomoloGene database (1791 homologous genes were found, all based on curated homology, i.e. not just automatic sequence matches). The homologous genes were used only in a special part of the analysis (comparison of fractions of housekeeping genes in the mouse versus human); in all other analyses the total data sets were used. The statistical analyses were done with the Statgraphics (Statistical Graphics Co.) and Statistica (StatSoft, Inc.) software.
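The gene-level processing described above can be sketched in a few lines of Python. This is only an illustrative reconstruction, not the author's actual procedure: the input layout (nested dictionaries), the function names `average_by_gene` and `expression_breadth`, and the toy signal values are assumptions. Only the per-gene averaging of probes and the 200-unit threshold come from the text.

```python
from collections import defaultdict
from statistics import mean

EXPRESSION_THRESHOLD = 200  # arbitrary units, the conservative cut-off from the text


def average_by_gene(probe_signals, probe_to_gene):
    """Average the signals of all probes that map to the same gene.

    probe_signals: {probe_id: {tissue: signal}}  (hypothetical input layout)
    probe_to_gene: {probe_id: gene_id}
    Returns {gene_id: {tissue: mean signal}}.
    """
    collected = defaultdict(lambda: defaultdict(list))
    for probe, per_tissue in probe_signals.items():
        gene = probe_to_gene.get(probe)
        if gene is None:  # probe without a characterized gene is skipped
            continue
        for tissue, signal in per_tissue.items():
            collected[gene][tissue].append(signal)
    return {
        gene: {tissue: mean(values) for tissue, values in tissues.items()}
        for gene, tissues in collected.items()
    }


def expression_breadth(gene_signals, threshold=EXPRESSION_THRESHOLD):
    """Count, for each gene, the tissues where its signal exceeds the threshold."""
    return {
        gene: sum(1 for signal in tissues.values() if signal > threshold)
        for gene, tissues in gene_signals.items()
    }


# Toy data, invented purely for illustration.
probes = {
    "p1": {"spleen": 150, "testis": 900, "liver": 100},
    "p2": {"spleen": 350, "testis": 700, "liver": 120},
    "p3": {"spleen": 500, "testis": 450, "liver": 610},
}
mapping = {"p1": "geneA", "p2": "geneA", "p3": "geneB"}

signals = average_by_gene(probes, mapping)
breadth = expression_breadth(signals)
print(breadth)  # {'geneA': 2, 'geneB': 3}
```

In this toy example, a gene whose breadth equals the total number of tissues would count as housekeeping. Passing `threshold=100` instead would reproduce the 'liberal' variant of the threshold.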

RESULTS

General picture

The bivariate distributions of GC content of third codon position (GC3) versus number of tissues where the gene is expressed are shown for the human and mouse in Figure 1. A similar picture can be seen on the total GC content of coding sequence (CDS) versus tissue number distribution, but GC3 is a better indicator of the isochore where a given gene belongs (1,4). It can be seen that there are two ‘mountain ridges’ (i.e., genes expressed in all tissues and genes expressed only in a few tissues) with a plateau between them that contains genes with an intermediate scope of expression. However, the peak of tissue-specific genes can be seen only in the total picture of genes expressed in all tissues pooled together (Fig. 2A). For each separate tissue, there is only a plateau (Fig. 2B), with only a single exception of testis where the plateau is indeed finished by a small peak (Fig. 2C). Consequently, the peak of tissue-specific genes seen in Figure 2A (and mountain ridge in Fig. 1) was formed by accumulation of strictly tissue-specific genes from all tissues. Thus, while the housekeeping genes constitute a comparatively clear-cut group, the tissue-specific genes in each separate tissue form a continuum with a gradually increasing degree of specificity.
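For readers unfamiliar with the two measures compared above, the following minimal sketch computes the total GC content of a coding sequence and its GC3. The function names and the toy sequence are hypothetical. Whether the stop codon was counted in the original analysis is not stated in the text, so it is simply included here.

```python
def gc_content(seq):
    """Percentage of G and C among the unambiguous bases of a sequence."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence has no A/C/G/T bases")
    return 100.0 * sum(base in "GC" for base in counted) / len(counted)


def gc3(cds):
    """GC percentage at the third codon positions of an in-frame CDS."""
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    third_positions = cds[2::3]  # every third base, starting at codon position 3
    return gc_content(third_positions)


cds = "ATGGCCAAGCTGTAA"  # toy CDS: ATG GCC AAG CTG TAA
print(round(gc_content(cds), 1), round(gc3(cds), 1))  # 46.7 80.0
```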

Figure 1. Bivariate distribution of GC3 versus number of tissues where a given gene is expressed. (A) Human, (B) mouse.

Figure 2. Histograms of human genes expressed in different numbers of tissues. (A) Cumulative for all tissues (as in Fig. 1), (B) genes expressed in the spleen (a picture typical for all tissues except testis), (C) genes expressed in the testis.

Both housekeeping and tissue-specific genes occupy the same range of GC content (Fig. 1). However, while the housekeeping genes tend to concentrate in the upper part of this range, the tissue-specific genes in the human genome show two maxima, GC-poor and GC-rich (Fig. 1A). In the strictly tissue-specific human genes, the GC-poor maximum is greater, therefore the overall distribution is left-skewed with regard to GC content and thus reciprocal to the right-skewed distribution of housekeeping genes (Fig. 3A). The distribution of intermediately tissue-specific genes is right-skewed (Fig. 3B). With the increase of tissue-specificity, the mean and median GC content decrease (Fig. 3C). In the mouse genome, the distribution is right-skewed for all types of genes (Fig. 4A and B), but a similar decrease in the mean and median GC content with the increase of tissue-specificity was observed (Fig. 4C).

Figure 3. Comparison of GC3 of genes expressed in different numbers of tissues in the human genome. (A) Histograms of housekeeping versus strictly tissue-specific genes, (B) histograms of housekeeping versus intermediately tissue-specific genes, (C) means with least significant difference (LSD) intervals. Medians of GC3 of housekeeping, intermediately and strictly tissue-specific genes also differ significantly (68.3, 63.1, 55.9%; Kruskal–Wallis, P < 10^-12). Standardized skewness indicates that the distributions of housekeeping and intermediately tissue-specific genes are significantly right-skewed (–2.5, P < 0.01; and –4.9, P < 10^-4, respectively), whereas the distribution of strictly tissue-specific genes is left-skewed (4.3, P < 10^-4).

Figure 4. Comparison of GC3 of genes expressed in different numbers of tissues in the mouse genome. (A) Histograms of housekeeping versus strictly tissue-specific genes, (B) histograms of housekeeping versus intermediately tissue-specific genes, (C) means with LSD intervals. Medians of GC3 of housekeeping, intermediately and strictly tissue-specific genes also differ significantly (63.4, 62.1, 60.8%; Kruskal–Wallis, P < 10^-4). Standardized skewness indicates that all distributions are significantly right-skewed (–4.3, P < 10^-4; –7.4, P < 10^-4; and –2.5, P < 0.05, respectively).
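The standardized skewness quoted in the captions of Figures 3 and 4 can be illustrated with a short sketch. The exact formula used by the statistical software is not given in the text. The version below assumes a common definition, the sample-adjusted skewness divided by its approximate standard error sqrt(6/n), and should be read as an assumption rather than the author's computation. The function name and the toy values are hypothetical.

```python
from math import sqrt


def standardized_skewness(values):
    """Adjusted sample skewness (G1) divided by sqrt(6/n).

    Assumed definition; values far outside roughly +/-2 suggest
    significant asymmetry under this approximation.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    if m2 == 0:
        raise ValueError("values are constant")
    g1 = m3 / m2 ** 1.5                       # population skewness
    adjusted = g1 * sqrt(n * (n - 1)) / (n - 2)  # sample-size correction
    return adjusted / sqrt(6.0 / n)


# Toy GC3 values with a long tail toward low GC3 (invented for illustration).
toy_gc3 = [30, 55, 60, 62, 64, 65, 66, 68, 70, 71]
print(standardized_skewness(toy_gc3) < 0)  # True: negative sign, as for housekeeping genes above
```

Note the sign convention used in the captions: negative values, which indicate a bulk of genes at high GC3 with a tail toward low GC3, are the ones labelled right-skewed there.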

In the human, a weak positive correlation was found between GC content and gene expression level averaged for all tissues (and log-transformed): for GC3 (r = 0.20, P < 10^-6) (Fig. 5A), GC CDS (r = 0.20, P < 10^-6), GC content of untranslated regions, 5′UTR (r = 0.17, P < 10^-6) and 3′UTR (r = 0.18, P < 10^-6), GC content of intronic sequence (r = 0.25, P < 10^-6) and average GC content of adjacent intergenic sequences (r = 0.24, P < 10^-6). The correlation coefficient for intronic sequence is significantly higher than for other parts of a gene (for difference, P < 0.01 at least). In the mouse (where intronic and intergenic sequences were not tested), the correlation held for GC3 (r = 0.11, P < 10^-6) (Fig. 5B), CDS (r = 0.10, P < 10^-6), 5′UTR (r = 0.08, P < 10^-6) and 3′UTR (r = 0.09, P < 10^-6).

Figure 5. The regression of (log-transformed) gene expression level averaged for all tissues on GC3. (A) Human (r = 0.20, P < 10^-6), (B) mouse (r = 0.11, P < 10^-6). Dashed lines, confidence limits; dotted lines, prediction limits.
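The correlation analysis above (expression averaged over tissues, then log-transformed, then correlated with GC3) can be sketched as follows. The data layout, the function names and the toy values are assumptions made for illustration. The base of the logarithm is not stated in the text, but Pearson's r does not depend on it, so base 10 is used arbitrarily.

```python
from math import log10, sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples of at least two points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def gc_expression_correlation(genes):
    """genes: list of (gc3_percent, {tissue: signal}) pairs (hypothetical layout)."""
    gc = [gc3 for gc3, _ in genes]
    # Average over tissues first, then log-transform, as described in the text.
    log_expr = [log10(sum(sig.values()) / len(sig)) for _, sig in genes]
    return pearson_r(gc, log_expr)


# Toy genes, invented for illustration only.
toy_genes = [
    (40.0, {"spleen": 100, "liver": 300}),
    (60.0, {"spleen": 400, "liver": 600}),
    (80.0, {"spleen": 900, "liver": 1100}),
]
r = gc_expression_correlation(toy_genes)
print(round(r, 3))
```

Real data give far weaker correlations than this toy set (r = 0.20 in the human and 0.11 in the mouse, as reported above).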

Genes specific for separate tissues

To investigate genes that are specific for separate tissues, various cut-off values for the number of tissues where a given gene is expressed were tested. As was already noted, for each separate tissue there is no clear-cut peak of tissue-specific genes (Fig. 2B), which makes establishing the cut-off value an arbitrary procedure. The results with a cut-off value equal to one-quarter of the total number of tissues in a given data set are shown in Tables 1 and 2. The results obtained with a lower number of tissues were similar, but for some tissues the number of tissue-specific genes was rather small, which would incur high statistical noise. The main result is that in certain tissues the tissue-specific genes are on average GC-richer than housekeeping genes, in others they have a similar GC content, while in others still they are GC-poorer (Tables 1 and 2). In both species, among tissues in which tissue-specific genes are GC-richer than housekeeping genes there are parts of the central nervous system, and among those in which tissue-specific genes are GC-poorer are germ-line tissues.
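The selection rule behind Tables 1 and 2 (genes expressed in a given tissue and in less than one-quarter of all tissues) can be written down as a small sketch. The function name, the dictionary-based inputs and the toy values are hypothetical. Only the one-quarter cut-off is taken from the text.

```python
def tissue_specific_mean_gc3(expressed_in, gc3_by_gene, all_tissues, fraction=0.25):
    """Mean GC3 of tissue-specific genes, per tissue.

    expressed_in: {gene: set of tissues where the gene is expressed}
    gc3_by_gene:  {gene: GC3 in percent}
    A gene counts as tissue-specific if it is expressed in fewer than
    `fraction` of all tissues. Tissues with no such genes map to None.
    """
    cutoff = fraction * len(all_tissues)
    result = {}
    for tissue in all_tissues:
        values = [
            gc3_by_gene[gene]
            for gene, tissues in expressed_in.items()
            if tissue in tissues and len(tissues) < cutoff
        ]
        result[tissue] = sum(values) / len(values) if values else None
    return result


# Toy example with 8 tissues, so the cut-off is 2 tissues (invented data).
tissues = [f"t{i}" for i in range(1, 9)]
expressed = {
    "geneA": {"t1"},
    "geneB": {"t1"},
    "geneC": {"t1", "t2"},   # 2 tissues: not below the cut-off
    "geneD": set(tissues),   # expressed everywhere: housekeeping
    "geneE": {"t2"},
}
gc3_values = {"geneA": 70.0, "geneB": 50.0, "geneC": 65.0, "geneD": 68.0, "geneE": 40.0}

means = tissue_specific_mean_gc3(expressed, gc3_values, tissues)
print(means["t1"], means["t2"], means["t3"])  # 60.0 40.0 None
```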

Table 1. The mean GC3 of tissue-specific human genes (expressed in a given tissue and in less than one-quarter of a total number of tissues), and coefficients of correlation between parameters of gene GC content (GC CDS, GC3, intronic GC) and the log-transformed level of gene expression in a given tissue (averaged for all genes expressed in a given tissue).

Columns: Tissue; mean GC3 (%) with its comparison to housekeeping genes (>, ∼ or <); correlation coefficients for GC CDS, GC3 and intronic GC
Spleen 71.45> 0.10 0.16 0.20
Whole brain 69.32> 0.06 0.12 0.12
Placenta 69.12> 0.10 0.13 0.15
Heart 68.98> 0.06 0.13 0.16
Cerebellum 65.46∼ 0.09 0.15 0.15
Cortex 64.68< 0.06 0.11 0.10
Pancreas 64.43< 0.05 0.10 0.10
Trachea 64.35∼ 0.06 0.11 0.14
Liver 64.04< 0.07 0.12 0.10
Fetal liver 64.04< NS 0.08 0.08
Kidney 63.91< 0.07 0.12 0.11
Salivary gland 63.51< NS 0.05 0.08
Lung 62.26< 0.08 0.14 0.14
Uterus 62.22< 0.10 0.14 0.14
CD34+Thy+ hematopoietic stem cells 61.28< 0.00 0.04 0.06
Amygdala 60.84< NS 0.09 0.09
Caudate nucleus 60.42< 0.07 0.11 0.11
Whole blood 60.33< NS 0.06 0.09
Thalamus 59.06< NS 0.08 0.06
CD34+Thy– progenitor cells 58.60< NS NS NS
Dorsal root ganglia 57.91< 0.06 0.10 0.10
Thymus 57.48< 0.04 0.07 0.08
Thyroid 57.38< 0.04 0.09 0.11
Pituitary gland 56.87< NS 0.07 0.09
Spinal cord 56.46< 0.07 0.09 0.10
Corpus callosum 56.04< NS 0.07 0.06
Adrenal gland 55.82< 0.04 0.09 0.10
Prostate 55.37< 0.05 0.09 0.11
Ovary 54.87< 0.09 0.11 0.17
Fetal brain 54.52< NS NS NS
Umbilical vein endothelial cells 54.36< NS 0.06 0.07
Testis 51.82< NS 0.05 0.09

Mean GC3: >, higher than in housekeeping genes; ∼, not significantly different; <, lower than in housekeeping genes.

Correlation coefficients: r ≥ 0.09, P < 10^-4; r ≥ 0.07, P < 10^-3; r ≥ 0.06, P < 0.01; r ≥ 0.05, P < 0.05; NS, not significant.

Table 2. The mean GC3 of tissue-specific mouse genes (expressed in a given tissue and in less than one-quarter of a total number of tissues), and coefficients of correlation between parameters of gene GC content (GC CDS, GC3) and the log-transformed level of gene expression in a given tissue (averaged for all genes expressed in a given tissue).

Columns: Tissue; mean GC3 (%) with its comparison to housekeeping genes (>, ∼ or <); correlation coefficients for GC CDS and GC3
Hippocampus 64.54> 0.06 0.10
Spinal cord lower 64.39> 0.04 0.08
Snout epidermis 63.10∼ 0.08 0.11
Spinal cord upper 63.02∼ NS 0.08
Frontal cortex 62.57∼ 0.06 0.09
Tongue 62.45∼ 0.07 0.08
Spleen 62.06∼ 0.08 0.09
Umbilical cord 61.88∼ 0.05 0.08
Adrenal gland 61.83∼ 0.04 0.09
Cerebellum 61.72∼ 0.05 0.09
Trigeminal 61.46∼ NS 0.06
Cortex 61.45∼ 0.08 0.12
Eye 61.29∼ 0.05 0.11
Lung 61.15∼ NS 0.06
Bone 61.13∼ NS NS
Stomach 61.13∼ 0.04 0.06
Uterus 60.87∼ 0.05 0.06
Digits 60.84∼ 0.05 0.08
Amygdala 60.58∼ 0.06 0.10
Epidermis 60.54∼ 0.07 0.10
Kidney 60.45∼ NS 0.04
Trachea 60.40∼ 0.05 0.07
Olfactory bulb 60.39∼ 0.06 0.08
Gall bladder 60.34< NS 0.04
Small intestine 60.25< NS 0.04
Thyroid 60.22< NS 0.05
Mammary gland 60.11< NS 0.05
Dorsal root ganglion 59.81< NS 0.07
Bone marrow 59.74< 0.04 0.05
Striatum 59.69< 0.05 0.10
Placenta 59.57< NS 0.07
Brown fat 59.55< 0.04 0.08
Heart 59.55< NS 0.05
Salivary gland 59.50< NS NS
Prostate 59.40< NS NS
Liver 59.12< NS 0.04
Skeletal muscle 59.09< NS 0.05
Lymph node 59.03< 0.04 0.07
Hypothalamus 58.82< 0.06 0.09
Adipose tissue 58.62< 0.05 0.09
Testis 58.22< NS NS
Bladder 57.70< NS 0.05
Thymus 57.44< 0.05 0.06
Large intestine 56.78< NS NS
Ovary 55.44< NS NS

Mean GC3: >, higher than in housekeeping genes; ∼, not significantly different; <, lower than in housekeeping genes.

Correlation coefficients: r ≥ 0.09, P < 10^-4; r ≥ 0.07, P < 10^-3; r ≥ 0.06, P < 0.01; r ≥ 0.05, P < 0.05; NS, not significant.

In >90% of separate human tissues, there is a weak positive correlation between gene GC3 and expression level in a given tissue (Table 1). It is noteworthy that coefficients of correlation are higher for GC3 as compared to GC CDS (P < 10^-8 for pair-wise comparison), and for intronic GC as compared to GC3 (P < 0.01). The correlation between gene GC content and expression level is valid also for >85% of mouse tissues (Table 2); the coefficients of correlation are also higher for GC3 as compared to GC CDS (P < 10^-8).

Fractions of housekeeping genes in human versus mouse

The number of genes that are expressed in all tissues is higher in the mouse than in the human (885 versus 539), notwithstanding the fact that a smaller number of mouse genes (6078 versus 7708) and a greater number of mouse tissues (45 versus 32) were present in the data sets. The fraction of housekeeping genes appeared to be 2-fold higher in the mouse as compared to the human (14.6 versus 7.0%, P < 10^-8). A part of this difference can be (conservatively) explained by an assumption that housekeeping genes are over-represented in the mouse because of a smaller gene data set (if one assumes that housekeeping genes are on average better studied and therefore have a higher probability of being present in the data set).
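The fractions quoted above follow directly from the gene counts, and the size of the difference can be checked with a standard test. The text does not name the test that was used, so the pooled two-proportion z-test below is an assumption chosen for illustration. The function name is hypothetical.

```python
from math import erfc, sqrt


def two_proportion_z_test(k1, n1, k2, n2):
    """Two-sided pooled z-test for the difference between two proportions.

    Returns (p1, p2, z, p_value). This is an assumed test, not necessarily
    the one applied in the original analysis.
    """
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided normal tail probability
    return p1, p2, z, p_value


# Counts from the text: 885 of 6078 mouse genes and 539 of 7708 human genes
# are expressed in all tissues.
mouse_frac, human_frac, z, p_value = two_proportion_z_test(885, 6078, 539, 7708)
print(f"mouse {100 * mouse_frac:.1f}%, human {100 * human_frac:.1f}%")  # mouse 14.6%, human 7.0%
print(p_value < 1e-8)  # True
```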

For a stricter comparison, only homologous genes (n = 1791) and homologous tissues (n = 20) were taken (Fig. 6). With this data set, the fraction of housekeeping genes was still significantly higher in the mouse than in the human (16.4 versus 11.3%, P < 10^-5). The spectrum of housekeeping genes was somewhat different in mouse as compared to human: there were 97 genes expressed in all tissues of both species, 106 genes expressed in all tissues of human but not mouse and 196 genes expressed in all tissues of mouse but not human. It is interesting that these groups of genes show a characteristic pattern of average GC content (Fig. 7). Genes behaving as housekeeping in the human but not mouse (HS+MM–) do not differ in mean (and median) GC3 from genes behaving as housekeeping in both species (HS+MM+), whereas genes behaving as housekeeping in the mouse but not human (HS–MM+) are GC-poorer and do not differ from genes behaving as tissue-specific in both species (HS–MM–) (Fig. 7). This pattern can be seen in the mouse genome as well, although the average difference between housekeeping and tissue-specific genes is lower here because of a lower GC3 of housekeeping genes (Fig. 8).

Figure 6. Bivariate distribution of GC3 versus number of tissues where a given gene is expressed for the homologous genes/homologous tissues data sets (1791 genes and 20 tissues). (A) Human, (B) mouse.

Figure 7. The GC3 for different groups of human genes from the homologous genes/homologous tissues data set (HS+MM+, genes expressed in all tissues of both species; HS+MM–, genes expressed in all tissues of human but not mouse; HS–MM+, genes expressed in all tissues of mouse but not human; HS–MM–, genes expressed in all tissues of neither species). (A) Means with LSD intervals, (B) box-plot. The values within the first and the second pairs of gene groups do not differ significantly while the first pair differs from the second both in parametric and non-parametric (Mann–Whitney and Kruskal–Wallis) tests (P < 10^-5).

Figure 8. The GC3 for different groups of mouse genes from the homologous genes/homologous tissues data set (HS+MM+, genes expressed in all tissues of both species; HS+MM–, genes expressed in all tissues of human but not mouse; HS–MM+, genes expressed in all tissues of mouse but not human; HS–MM–, genes expressed in all tissues of neither species). (A) Means with LSD intervals, (B) box-plot. The values within the first and the second pairs of gene groups do not differ significantly while the first pair differs from the second both in parametric and non-parametric (Mann–Whitney and Kruskal–Wallis) tests (P < 10^-4).
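The four gene groups compared in Figures 7 and 8 can be derived from per-gene tissue counts in the two species. The sketch below is a hypothetical illustration: the function name, the input dictionaries and the toy genes are assumptions, and the group labels use an ASCII hyphen for the minus sign. The final lines only re-derive the housekeeping fractions from the group sizes reported in the text.

```python
def classify_homologs(breadth_human, breadth_mouse, n_tissues):
    """Assign each homologous gene to HS+MM+, HS+MM-, HS-MM+ or HS-MM-.

    breadth_*: {gene: number of homologous tissues where it is expressed}
    '+' means expressed in all n_tissues tissues of that species.
    """
    groups = {"HS+MM+": [], "HS+MM-": [], "HS-MM+": [], "HS-MM-": []}
    for gene in sorted(breadth_human.keys() & breadth_mouse.keys()):
        hs = "+" if breadth_human[gene] == n_tissues else "-"
        mm = "+" if breadth_mouse[gene] == n_tissues else "-"
        groups[f"HS{hs}MM{mm}"].append(gene)
    return groups


# Toy breadths over 20 homologous tissues (invented for illustration).
human = {"g1": 20, "g2": 20, "g3": 7, "g4": 3}
mouse = {"g1": 20, "g2": 12, "g3": 20, "g4": 1}
groups = classify_homologs(human, mouse, n_tissues=20)
print(groups)

# Consistency check with the counts reported in the text (1791 homologous genes).
human_housekeeping = 97 + 106   # HS+MM+ plus HS+MM-
mouse_housekeeping = 97 + 196   # HS+MM+ plus HS-MM+
human_percent = round(100 * human_housekeeping / 1791, 1)
mouse_percent = round(100 * mouse_housekeeping / 1791, 1)
print(human_percent, mouse_percent)  # 11.3 16.4
```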

In the human, tissue-specific genes are more diverse in GC content than housekeeping genes: coefficients of variation of GC3 differ significantly (27.1 versus 20.3%, P < 10^-3). In the mouse, where peaks both of tissue-specific and housekeeping genes are narrower and unimodal (Fig. 6A and B), there is only a very small or no difference in coefficients of variation (18.4 versus 16.8%, P < 0.2).
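For reference, the coefficient of variation used in this comparison is the standard deviation expressed as a percentage of the mean. A minimal sketch follows, with invented GC3 values and a hypothetical function name. How the difference between two coefficients was tested is not stated in the text and is not covered here.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * stdev(values) / mean(values)


# Toy GC3 samples (invented): a broad group and a narrow group.
tissue_specific = [35, 45, 55, 70, 85]
housekeeping = [55, 60, 65, 70, 75]

cv_ts = coefficient_of_variation(tissue_specific)
cv_hk = coefficient_of_variation(housekeeping)
print(round(cv_ts, 1), round(cv_hk, 1))
```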

In the human, a gene from the homologous gene data set is expressed on average in 7.2 ± 0.3 tissues (median = 4) from the homologous tissues data set; in the mouse, in 8.3 ± 0.4 tissues (median = 5). This difference is statistically significant (P < 10^-4 both for means and in the Mann–Whitney test for medians). On average, an HS–MM+ gene is expressed in less than half of human tissues (7.8 ± 0.6, median = 5, interquartile range 2–15), which excludes the possibility that a lower fraction of human housekeeping genes was due to an underestimation of some low-expressed housekeeping genes in just a couple of tissues because of, for instance, some variation in experimental conditions. In support of the same argument, if instead of the conservative threshold of 200 arbitrary units a 'liberal' 100-unit threshold was taken as a signal of gene expression, the fractions of genes apparently expressed in all tissues increased consistently in both species, preserving nearly the same difference (27.5 versus 19.5%, P < 10^-7). It is noteworthy that in this case the pattern of mean (and median) GC3 for the four groups of genes (HS+MM+, HS+MM–, HS–MM+, HS–MM–) was still similar to Figures 7 and 8 (not shown).

DISCUSSION

The dichotomy of housekeeping versus tissue-specific genes, which is frequently featured in the literature, was visualized here by demonstration of the two clear-cut peaks (Figs 1 and 2A). However, in between these peaks there is a considerable plateau of intermediately expressed genes. Furthermore, in each tissue taken separately there is only the peak of housekeeping genes and a continuum of genes with a gradually increasing degree of tissue-specificity, which only very rarely ends in a tiny peak (Fig. 2B and C). These observations suggest a certain modification to the concept of housekeeping versus tissue-specific genes. The continuum of intermediately expressed genes, seen in the direction from the housekeeping to tissue-specific genes, probably reflects the path of cell differentiation: many tissues of a similar origin share the same genes with an intermediate specificity, while tissue-specific genes sensu stricto are expressed only in a few related tissues. Besides, some tissue-specific genes may be characteristic for different, even unrelated tissues because of a multiple, independent use of the same protein (40).

On average, the housekeeping genes are slightly GC-richer both in the human and mouse, which confirms the conclusion of the earlier works (22–25) and contradicts the later studies (26,27). This finding is in accordance with the BGC explanation for the elevated GC content in the heavy isochores of the mammalian genome. It is known that transcription is universally associated with (mitotic and meiotic) recombination and gene conversion (29–36). Up to a 20-fold increase of recombination and gene conversion rate was reported for actively transcribed genes (29). Because housekeeping genes are transcribed in the germ line (in contrast to most tissue-specific genes), they should be more prone to inheritable gene conversion. (Although mitotic recombination and gene conversion are relevant for somatic tissues as well, changes in the latter are not inheritable.) However, both in the human and mouse, genes specific for some tissues (e.g., parts of the central nervous system) have a higher average GC content than housekeeping genes (Tables 1 and 2). Since they are not transcribed in the germ line (in contrast to housekeeping genes), and therefore have a lower probability of inheritable gene conversion, this finding contradicts the BGC hypothesis. Moreover, genes specific for germ-line tissues (ovary, testes) show a low average GC content, which is also in contradiction to the BGC model. It is noteworthy that although recombination rate in the human genome was first reported to positively correlate with GC content, other, more significant sequence parameters were found, and after correction for them the correlation with GC content became negative (41). The authors concluded that “regions with the highest recombination rates tend to be those with high CpG fraction but low GC content and poly(A)/poly(T) fraction.” (41, p. 244). Thus, these data also seem to argue against the BGC model.

Both for the total data set and for most tissues taken separately, a positive correlation was found between gene GC content and expression level both in the human and mouse, which is in accordance with the previous result on the rat and mouse obtained with very limited data sets of 100 and 200 genes, respectively, each on a single tissue (42). It is important to note that correlation of expression level is stronger with GC3 than with GC CDS, and with intronic GC content than with GC3. This allows the exclusion of the possibility of artefactual correlation due to a (putatively) stronger hybridization of GC-rich probes. In the case of such an artefact, the coefficients of correlation would increase in the opposite direction (because GC3 represents only a third part of the probe, while intronic sequence does not participate in hybridization at all). Thus, it is the isochore affiliation (reflected in GC3 and intronic GC) that matters for this correlation, not the hybridization strength. This correlation is rather weak but it should be taken into account that the concentration of genes is ≥20-fold higher in the heavy isochores than in the GC-poor regions (1,43). Thus, both the higher gene concentration and the higher expression of individual genes are to be multiplied in the overall transcription of GC-rich isochores. (Besides, we probably still have very rough estimates of gene expression levels, especially given its natural variance in the inducible genes. This variation can only reduce the correlation coefficients. Therefore, the real correlation should be higher.)

The selectionist hypotheses explain the emergence of GC-rich isochores by adaptation to elevated temperature (2,16) or by requirements of transcription (17,18). However, the absence of correlation between habitat temperature and GC content in prokaryotes and ectothermic (cold-blooded) vertebrates strongly undermines the ‘thermal’ hypothesis (44–47). Furthermore, it was shown that with elevation of GC content, the thermostability and curvature of the DNA molecule in genomes of warm-blooded vertebrates grew slower than in random sequences, whereas the bendability and ability to undergo B-Z transition rose faster (17,18). The bendability is known to relate to open chromatin, and the ability to undergo B-Z transition is closely associated with transcription (48–50). In contrast, the curvature is known to facilitate chromatin condensation (48,49,51–53). Therefore, it was suggested that compositional heterogeneity of the genome arose in the higher vertebrates not as adaptation to elevated temperature but because of their advanced genomic organization, with physical properties of DNA in the gene-rich regions (heavy isochores) being optimized for active transcription and in the gene-poor regions (light isochores), for chromatin condensation (17,18). This is in agreement with the observation that in the interphase human nuclei, the gene-rich (GC-rich) chromosomal regions display a much more spread-out chromatin conformation as compared to the gene-poor (GC-poor) regions (54). Recently, it was shown that the thermostability of corresponding RNA/DNA and RNA/RNA duplexes in mammal genomes is on average higher than in random sequences and is rising faster with elevation of GC content (55), which also can be relevant to transcription. The data obtained here (the relation of gene GC content to expression level and tissue-specificity) are in agreement with the functional interpretation of the isochoric structure of the genome.
Besides relevance to physical properties of DNA molecule (bendability, B-Z transition and curvature), the functional significance of the isochoric structure can be realized through DNA methylation, which is related both to gene expression and GC content. Thus, it is known that methylation of DNA suppresses gene expression (56) and is relatively lower in the GC-rich isochores (57).

The comparison between human and mouse is also in accordance with the functional interpretation of the isochoric structure. The mouse has less pronounced isochores (58–62) and an apparently lower degree of tissue differentiation on the molecular level (reflected in the higher fraction of ubiquitously expressed genes found here). It is well recognized that vertebrate genes underwent multiple duplications both with regard to the whole genome (ancient polyploidization) and to the separate genes (63–65). As a result, most genes are present in families containing multiple paralogs. The duplicate genes often appear to have subdivided the roles of their single-gene ancestors (66). Therefore, it is possible that some paralogs, even performing the maintenance functions, became fine-tuned for separate tissues (at first stage, even without a change in amino acid sequence, only by modification of gene regulation which can be related to isochore affiliation). Genes behaving as housekeeping in the human but not mouse (HS+MM–) have the same average GC3 as unconditionally housekeeping (HS+MM+) genes, whereas the HS–MM+ genes have a lower GC3, not differing from the unconditionally tissue-specific (HS–MM–) genes (Figs 7 and 8). This suggests that a distinction between housekeeping and tissue-specific genes deteriorated in the mouse, which resulted in an increase of the fraction of genes expressed in all tissues. This assumption is supported by the fact that the human/mouse ratio of GC3 (for homologous genes) positively correlated with the human/mouse ratio of numbers of tissues where a given gene is expressed (Spearman r = 0.11, P < 10^-4). These findings are in accordance with the reports that mouse blastocysts show a retarded and lowered DNA de novo methylation (67), and that mouse has a deteriorated isochoric structure (60–62).

The less pronounced isochoric structure of the mouse genome can be seen already from the comparison of previously mentioned coefficients of variation of GC3 in the housekeeping and tissue-specific genes, both of which are higher in the human. The strictly tissue-specific human genes show a GC3 distribution reciprocal to that of housekeeping genes (Fig. 3A and C). There is no such reciprocity in the mouse genome, although tissue-specific genes are still slightly GC-poorer than housekeeping genes (Fig. 4). Generally, the bivariate distribution of GC3 versus number of tissues shows a simpler ‘epigenetic (epigenomic) landscape’ in the mouse as compared to the human (Fig. 6A and B). This is in agreement with a simpler design of r-selected, short-living rodents. In particular, they have less efficient systems of DNA repair (68,69).

ACKNOWLEDGEMENTS

I thank two anonymous reviewers for helpful comments. This work was supported by the Russian Foundation for Basic Research (RFBR).

REFERENCES

  • 1. Bernardi G. (2000) Isochores and the evolutionary genomics of vertebrates. Gene, 241, 3–17.
  • 2. Bernardi G. (2000) The compositional evolution of vertebrate genomes. Gene, 259, 31–43.
  • 3. Bernardi G. (2001) Misunderstandings about isochores. Part 1. Gene, 276, 3–13.
  • 4. Eyre-Walker A. and Hurst,L.D. (2001) The evolution of isochores. Nat. Rev. Genet., 2, 549–555.
  • 5. Sueoka N. (1988) Directional mutation pressure and neutral molecular evolution. Proc. Natl Acad. Sci. USA, 85, 2653–2657.
  • 6. Wolfe K., Sharp,P.M. and Li,W.H. (1989) Mutation rates differ among regions of the mammalian genome. Nature, 337, 441–456.
  • 7. Francino M.P. and Ochman,H. (1999) Isochores result from mutation not selection. Nature, 400, 30–31.
  • 8. Fryxell K.J. and Zuckerkandl,E. (2000) Cytosine deamination plays a primary role in the evolution of mammalian isochores. Mol. Biol. Evol., 17, 1371–1383.
  • 9. Sueoka N. and Kawanishi,Y. (2000) DNA G+C content of the third codon position and codon usage biases of human genes. Gene, 261, 53–62.
  • 10. Eyre-Walker A. (1993) Recombination and mammalian genome evolution. Proc. R. Soc. Lond. B, 252, 237–243.
  • 11. Galtier N., Piganeau,G., Mouchiroud,D. and Duret,L. (2001) GC content evolution in mammalian genomes, the biased gene conversion hypothesis. Genetics, 159, 907–911.
  • 12. Smith N.G. and Eyre-Walker,A. (2001) Synonymous codon bias is not caused by mutation bias in human. Mol. Biol. Evol., 18, 982–986.
  • 13. Duret L., Semon,M., Piganeau,G., Mouchiroud,D. and Galtier,N. (2002) Vanishing GC-rich isochores in mammalian genomes. Genetics, 162, 1837–1847.
  • 14. Lercher M.J., Smith,N.G., Eyre-Walker,A. and Hurst,L.D. (2002) The evolution of isochores. Evidence from snp frequency distributions. Genetics, 162, 1805–1810.
  • 15. Galtier N. (2003) Gene conversion drives GC content evolution in mammalian histones. Trends Genet., 19, 65–68.
  • 16. Bernardi G. and Bernardi,G. (1986) Compositional constraints and genome evolution. J. Mol. Evol., 24, 1–11.
  • 17. Vinogradov A.E. (2001) Bendable genes of warm-blooded vertebrates. Mol. Biol. Evol., 18, 2195–2200.
  • 18. Vinogradov A.E. (2003) DNA helix: the importance of being GC-rich. Nucleic Acids Res., 31, 1838–1844.
  • 19. Eyre-Walker A. (1999) Evidence of selection on silent site base composition in mammals: potential implications for the evolution of isochores and junk DNA. Genetics, 152, 675–683.
  • 20. Smith N.G. and Eyre-Walker,A. (2001) Synonymous codon bias is not caused by mutation bias in human. Mol. Biol. Evol., 18, 982–986.
  • 21. Lercher M.J., Smith,N.G., Eyre-Walker,A. and Hurst,L.D. (2002) The evolution of isochores. Evidence from snp frequency distributions. Genetics, 162, 1805–1810.
  • 22. Mouchiroud D., Fichant,G. and Bernardi,G. (1987) Compositional compartmentalization and gene composition in the genome of vertebrates. J. Mol. Evol., 26, 198–204.
  • 23. Bernardi G. (1993) The isochore organization of the human genome and its evolutionary history—a review. Gene, 135, 57–66.
  • 24. Bernardi G. (1995) The human genome: organization and evolutionary history. Annu. Rev. Genet., 29, 445–476.
  • 25. Pesole G., Bernardi,G. and Saccone,C. (1999) Isochore specificity of AUG initiator context of human genes. FEBS Lett., 464, 60–62.
  • 26. Goncalves I., Duret,L. and Mouchiroud,D. (2000) Nature and structure of human genes that generate retropseudogenes. Genome Res., 10, 672–678.
  • 27. Ponger L., Duret,L. and Mouchiroud,D. (2001) Determinants of CpG islands: expression in early embryo and isochore structure. Genome Res., 11, 1854–1860.
  • 28. D'Onofrio G. (2002) Expression patterns and gene distribution in the human genome. Gene, 300, 155–160.
  • 29. Voelkel-Meiman K. and Roeder,G.S. (1990) Gene conversion tracts stimulated by HOT1-promoted transcription are long and continuous. Genetics, 126, 851–867.
  • 30. Shenkar R., Shen,M.H. and Arnheim,N. (1991) DNase I-hypersensitive sites and transcription factor-binding motifs within the mouse E beta meiotic recombination hot spot. Mol. Cell Biol., 11, 1813–1819.
  • 31. Nickoloff J.A. (1992) Transcription enhances intrachromosomal homologous recombination in mammalian cells. Mol. Cell Biol., 12, 5311–5318.
  • 32. Droge P. (1993) Transcription-driven site-specific DNA recombination in vitro. Proc. Natl Acad. Sci. USA, 90, 2759–2763.
  • 33. Cook P.R. (1997) The transcriptional basis of chromosome pairing. J. Cell Sci., 110, 1033–1040.
  • 34. Bell S.J., Chow,Y.C., Ho,J.Y. and Forsdyke,D.R. (1998) Correlation of chi orientation with transcription indicates a fundamental relationship between recombination and transcription. Gene, 216, 285–292.
  • 35. Nicolas A. (1998) Relationship between transcription and initiation of meiotic recombination: towards chromatin accessibility. Proc. Natl Acad. Sci. USA, 95, 87–89.
  • 36. Koren A., Ben-Aroya,S. and Kupiec,M. (2002) Control of meiotic recombination initiation: a role for the environment? Curr. Genet., 42, 129–139.
  • 37. Brazma A. and Vilo,J. (2000) Gene expression data analysis. FEBS Lett., 480, 17–24.
  • 38. Schulze A. and Downward,J. (2001) Navigating gene expression using microarrays–a technology review. Nature Cell Biol., 3, E190–E195.
  • 39. Su A.I., Cooke,M.P., Ching,K.A., Hakak,Y., Walker,J.R., Wiltshire,T., Orth,A.P., Vega,R.G., Sapinoso,L.M., Moqrich,A. et al. (2002) Large-scale analysis of the human and mouse transcriptomes. Proc. Natl Acad. Sci. USA, 99, 4465–4470.
  • 40. Duboule D. and Wilkins,A.S. (1998) The evolution of ‘bricolage’. Trends Genet., 14, 54–59.
  • 41. Kong A., Gudbjartsson,D.F., Sainz,J., Jonsdottir,G.M., Gudjonsson,S.A., Richardsson,B., Sigurdardottir,S., Barnard,J., Hallbeck,B., Masson,G. et al. (2002) A high-resolution recombination map of the human genome. Nature Genet., 31, 241–247.
  • 42. Konu O.O. and Li,M.D. (2002) Correlations between mRNA expression levels and GC contents of coding and untranslated regions of genes in rodents. J. Mol. Evol., 54, 35–41.
  • 43. D'Onofrio G., Jabbari,K., Musto,H., Alvarez-Valin,F., Cruveiller,S. and Bernardi,G. (1999) Evolutionary genomics of vertebrates and its implications. Ann. N. Y. Acad. Sci., 870, 81–94.
  • 44. Galtier N. and Lobry,J.R. (1997) Relationships between genomic G+C content, RNA secondary structures, and optimal growth temperature in prokaryotes. J. Mol. Evol., 44, 632–636.
  • 45. Hurst L.D. and Merchant,A.R. (2001) High guanine–cytosine content is not an adaptation to high temperature, a comparative analysis amongst prokaryotes. Proc. R. Soc. Lond. B, 268, 493–497.
  • 46. Belle E.M., Smith,N. and Eyre-Walker,A. (2002) Analysis of the phylogenetic distribution of isochores in vertebrates and a test of the thermal stability hypothesis. J. Mol. Evol., 55, 356–363.
  • 47. Ream R.A., Johns,G.C. and Somero,G.N. (2003) Base compositions of genes encoding alpha-actin and lactate dehydrogenase-A from differently adapted vertebrates show no temperature-adaptive variation in G + C content. Mol. Biol. Evol., 20, 105–110.
  • 48. Anselmi C., Bocchinfuso,G., De Santis,P., Savino,M. and Scipioni,A. (1999) Dual role of DNA intrinsic curvature and flexibility in determining nucleosome stability. J. Mol. Biol., 286, 1293–1301.
  • 49. Anselmi C., Bocchinfuso,G., De Santis,P., Savino,M. and Scipioni,A. (2000) A theoretical model for the prediction of sequence-dependent nucleosome thermodynamic stability. Biophys. J., 79, 601–613.
  • 50. Herbert A. and Rich,A. (1999) Left-handed Z-DNA, structure and function. Genetica, 106, 37–47.
  • 51. Radic M.Z., Lundgren,K. and Hamkalo,B.A. (1987) Curvature of mouse satellite DNA and condensation of heterochromatin. Cell, 50, 1101–1108.
  • 52. Blomquist P., Belikov,S. and Wrange,O. (1999) Increased nuclear factor 1 binding to its nucleosomal site mediated by sequence-dependent DNA structure. Nucleic Acids Res., 27, 517–525.
  • 53. Kiyama R. and Trifonov,E.N. (2002) What positions nucleosomes?—A model. FEBS Lett., 523, 7–11.
  • 54. Saccone S., Federico,C. and Bernardi,G. (2002) Localization of the gene-richest and the gene-poorest isochores in the interphase nuclei of mammals and birds. Gene, 300, 169–178.
  • 55. Vinogradov A.E. (2003) Silent DNA: speaking RNA language? Bioinformatics, 19, 000–000.
  • 56. Jaenisch R. and Bird,A. (2003) Epigenetic regulation of gene expression: how the genome integrates intrinsic and environmental signals. Nature Genet., 33, 245–254.
  • 57. Jabbari K. and Bernardi,G. (1998) CpG doublets, CpG islands and Alu repeats in long human DNA sequences from different isochore families. Gene, 224, 123–127.
  • 58. Sabeur G., Macaya,G., Kadi,F. and Bernardi,G. (1993) The isochore patterns of mammalian genomes and their phylogenetic implications. J. Mol. Evol., 37, 93–108.
  • 59. Robinson M., Gautier,C. and Mouchiroud,D. (1997) Evolution of isochores in rodents. Mol. Biol. Evol., 14, 823–828.
  • 60. Galtier N. and Mouchiroud,D. (1998) Isochore evolution in mammals, a human-like ancestral structure. Genetics, 150, 1577–1584.
  • 61. Douady C., Carels,N., Clay,O., Catzeflis,F. and Bernardi,G. (2000) Diversity and phylogenetic implications of CsCl profiles from rodent DNAs. Mol. Phylogenet. Evol., 17, 219–230.
  • 62. Smith N.G. and Eyre-Walker,A. (2002) The compositional evolution of the murid genome. J. Mol. Evol., 55, 197–201.
  • 63. Furlong R.F. and Holland,P.W. (2002) Were vertebrates octoploid? Philos. Trans. R. Soc. Lond. B, 357, 531–544.
  • 64. Gu X., Wang,Y. and Gu,J. (2002) Age distribution of human gene families shows significant roles of both large- and small-scale duplications in vertebrate evolution. Nature Genet., 31, 205–209.
  • 65. McLysaght A., Hokamp,K. and Wolfe,K.H. (2002) Extensive genomic duplication during early chordate evolution. Nature Genet., 31, 200–204.
  • 66. Van de Peer Y., Taylor,J.S., Joseph,J. and Meyer,A. (2002) Wanda: a database of duplicated fish genes. Nucleic Acids Res., 30, 109–112.
  • 67. Dean W., Santos,F., Stojkovic,M., Zakhartchenko,V., Walter,J., Wolf,E. and Reik,W. (2001) Conservation of methylation reprogramming in mammalian development: aberrant reprogramming in cloned embryos. Proc. Natl Acad. Sci. USA, 98, 13734–13738.
  • 68. Adelman R., Saul,R.L. and Ames,B.N. (1988) Oxidative damage to DNA: relation to species metabolic rate and life span. Proc. Natl Acad. Sci. USA, 85, 2706–2708.
  • 69. Ames B.N. (1989) Endogenous DNA damage as related to cancer and aging. Mutat. Res., 214, 41–46.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
