Proceedings of the National Academy of Sciences of the United States of America. 2003 Apr 7;100(8):4369–4371. doi: 10.1073/pnas.0831050100

What happens to genes in duplicated genomes

Elizabeth A Kellogg
PMCID: PMC153560  PMID: 12682287

Introductory textbooks tell us that humans (and other metazoa and many of our agricultural plants) are diploid. They have two copies of each gene, one from each parent. Comparative genomic studies, however, suggest that we, along with yeast (Saccharomyces cerevisiae) (1), maize (2, 3), and Arabidopsis (4), may actually be ancient polyploids (5, 6) in which the chromosome complement doubled at some time in the past and then, through gene silencing, mutation, and loss, reverted to a diploid-like state.

Polyploidy raises a problem for gene regulation. The amount of gene product produced is often critical for proper cellular function, but with all genes copied, the complex regulatory network may be modified in peculiar ways. To solve this problem, cells turn off or at least turn down the expression of some copies of some genes. Adams et al. (7), in this issue of PNAS, now provide data to address how many genes are modified in their expression patterns and whether it matters which genome they come from.

Documentation of polyploidy is not new, particularly in plants. Winge (8) proposed that polyploids might be formed by interspecific hybridization followed by chromosome doubling. A well-known early example was provided by a cultivated primrose, a first-generation hybrid between Primula floribunda and Primula verticillata. The plant grew well and could be propagated from cuttings, but was sterile (9). Periodically, one of the cuttings would produce a shoot with larger leaves and flowers, and these flowers would be fertile. The explanation is outlined in Fig. 1. The genomes of the parents were similar enough to allow the plant to grow, but different enough that meiosis was not successful because the chromosomes were unable to pair properly. For an unknown reason, the chromosomes in one branch of the plant spontaneously doubled (somatic doubling). This restored fertility by providing each chromosome with an identical partner to pair with. The plant was now tetraploid and fully fertile.

Figure 1. Schematic diagram of allopolyploidy. Chromosome complements are shown for two imaginary organisms, both diploid and with two pairs of chromosomes. When crossed, they produce a sterile F1 hybrid in which chromosomes do not pair. Chromosome doubling leads to an allotetraploid, with one full (doubled) genome from each parent.
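The scheme in Fig. 1 can be sketched as a toy model. The code below is only an illustration under simplifying assumptions that are not in the original text: chromosomes are represented by labels, chromosomes from different parents never pair, and meiosis is treated as successful only when every chromosome has an identical partner. The names `can_pair`, `gamete`, and `double` are hypothetical.

```python
from collections import Counter

def can_pair(chromosomes):
    """Toy fertility check: every chromosome type must occur an even number of times."""
    counts = Counter(chromosomes)
    return all(n % 2 == 0 for n in counts.values())

def gamete(diploid):
    """Return a haploid set: one copy of each chromosome type."""
    return sorted(set(diploid))

def double(chromosomes):
    """Somatic (or colchicine-induced) doubling: duplicate every chromosome."""
    return sorted(chromosomes * 2)

# Two imaginary diploid parents, each with two pairs of chromosomes (as in Fig. 1).
parent_a = ["A1", "A1", "A2", "A2"]
parent_b = ["B1", "B1", "B2", "B2"]

# The F1 hybrid receives one haploid set from each parent.
f1_hybrid = gamete(parent_a) + gamete(parent_b)

# Doubling gives every chromosome an identical partner.
allotetraploid = double(f1_hybrid)

print("F1 hybrid fertile:", can_pair(f1_hybrid))
print("Allotetraploid fertile:", can_pair(allotetraploid))
```

In this sketch the F1 hybrid fails the pairing check because each chromosome is present once, whereas the doubled plant passes it, mirroring the restored fertility described for the primrose.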

In pioneering work at the beginning of the 20th century, Kihara (10, 11) and Percival (12) demonstrated the same phenomenon in wheat. Durum wheat (Triticum durum, used to make pasta) is an allotetraploid, produced from a cross between Einkorn wheat (Triticum monococcum) and a species of goat grass (presumed to be similar to Aegilops speltoides) that occurs naturally as a weed. Bread wheat (Triticum aestivum) is an allohexaploid that includes the two genomes from T. durum, as well as one from another species of goat grass, Aegilops tauschii.

The cotton genus (Gossypium) also includes both diploid and tetraploid species. Among the tetraploids are Gossypium hirsutum (upland cotton), which provides ≈95% of the cotton used commercially, and Gossypium barbadense, sold variously as Pima or Egyptian cotton and valued for its particularly long fibers. These are two of five tetraploid species resulting from a single event of polyploid formation that occurred ≈1.5 million years ago, combining the genome of an African species (genome designation A) with that of a North American species (genome designation D) (13).

What nature does spontaneously in fields and woods can be reproduced synthetically in the lab. Plants of disparate species can be crossed to produce sterile F1 hybrids. Chromosome doubling can then be induced by the use of colchicine, which inhibits spindle formation, thus creating polyploid cells from which whole plants can be regenerated. By using naturally occurring and synthetic polyploids, it is possible to determine which gene-level changes are an immediate result of chromosome doubling and which require thousands or millions of years of mutation and selection.

The effects of polyploidization on gene structure and function have been the basis of a considerable and accumulating body of theory. Haldane (14, 15) observed that a duplicate gene might easily become a pseudogene through accumulation of mutations. Because only one copy of the gene is necessary for function, selective pressure on the second copy is presumably reduced. Walsh (16) summarizes the extensive literature on how long gene silencing and loss may take; in short, the time will be proportional to the effective population size. Walsh also shows that the duplicate gene will become a pseudogene unless the product 4Nₑsρ (where Nₑ is effective population size, s is the selective advantage of the advantageous allele, and ρ is the ratio of the mutation rate to the advantageous allele to the mutation rate to the null allele) is very much greater than 1. [Walsh's result is related to that shown by Basten and Ohta (17) for multigene families under diversifying selection.]
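Walsh's criterion, as summarized above, can be made concrete with a short calculation. The sketch below simply evaluates the product of 4, Ne, s, and ρ and compares it with 1. The parameter values are hypothetical, and the cutoff used for "very much greater than 1" (`threshold=10.0`) is an arbitrary assumption for illustration, not a value taken from Walsh (16).

```python
def walsh_product(n_e, s, rho):
    """Return 4 * Ne * s * rho, the quantity compared with 1 in the text."""
    return 4 * n_e * s * rho

def likely_fate(n_e, s, rho, threshold=10.0):
    """Classify the expected fate of a duplicate gene.

    `threshold` stands in for "very much greater than 1"; its value is an
    assumption made only for this illustration.
    """
    if walsh_product(n_e, s, rho) > threshold:
        return "new function possible"
    return "pseudogene"

# Hypothetical parameter sets (not from the source).
examples = [
    {"n_e": 1e4, "s": 0.01, "rho": 1e-3},  # small population
    {"n_e": 1e6, "s": 0.01, "rho": 1e-3},  # large population
]

for params in examples:
    product = walsh_product(**params)
    print(f"Ne={params['n_e']:.0e}  4*Ne*s*rho={product:.2f}  ->  {likely_fate(**params)}")
```

With these made-up values, only the larger population pushes the product well above 1, which illustrates why pseudogene formation is expected to be the common outcome unless populations are large or advantageous mutations are relatively frequent.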

Although pseudogene formation is thus predicted to be very common, mutation to produce new function could theoretically occur. In the words of Ohno (18), “occasionally . . . [the duplicate gene] may acquire a new active site sequence, therefore a new function and emerge triumphant as a new gene locus,” an alternative to the “less glamorous fate of fixing a null allele and becoming a pseudogene” (16). Hughes (19) presents extensive data to argue against this possibility, at least in the form outlined by Ohno. He shows that pairs of duplicate genes in Xenopus laevis (a tetraploid frog) are both still subject to purifying selection and cites several examples of diversifying selection on members of multigene families. He observes that these examples are inconsistent with the hypothesis of relaxed selection on one copy of a duplicate gene pair. He then suggests, instead, that ancestral genes might be bifunctional, and that the descendant genes evolve to partition the ancestral function. This model would permit selection for functional specialization on the two gene copies immediately after they are formed.

Force et al. (20) and Lynch and Force (21), in a widely cited pair of papers, provide a plausible mechanism for the subfunctionalization hypothesis by focusing attention on the regulatory regions of genes, rather than on the coding sequence itself. They suggest, specifically, that purifying selection is maintained on all or part of the coding regions of duplicates, while mutations accumulate in the upstream regions, such that tissue or cell specificity is partitioned. Under this model, it is not necessary to invoke positive selection, which is presumed to be rare.

Adams et al. (7) now provide data to suggest that subfunctionalization indeed occurs in some genes and is an immediate product of polyploidization. Their first set of experiments investigates expression of 40 pairs of genes (homeologs), one copy from the A genome and one from the D genome, in natural and synthetic Gossypium tetraploids and in modern descendants of their presumed diploid progenitors. Average sequence divergence between the A and D copies of the genes is ≈1%, allowing the copies to be distinguished by assays of single-stranded conformation polymorphism (SSCP). In previous phylogenetic studies of Gossypium, Cronn et al. (13) showed that the genes were indeed single-copy, related by descent, and truly orthologous. This documentation is critical to their interpretation, and is missing from all comparable studies, most of which have assumed orthology based on similar segregation on an AFLP gel, or on high sequence similarity. In many cases, this will indeed identify orthologues, but, because the term “orthologous” is defined solely with reference to a gene tree (22), orthology can only be demonstrated with a molecular phylogeny.


Their focus was on genes expressed in ovules, which produce the valuable fibers of cotton. Of the 40 gene pairs whose expression was investigated in ovules, 30 (75%) showed approximately the same level of expression from both the A and the D genome copy. In other words, the two genomes in the polyploid had not differentiated for expression of these 30 genes. In contrast, 9 of the 40 genes investigated were biased in their expression pattern. Five produced more transcript from the A genome copy and four produced more from the D genome copy. For the remaining gene (adhE), the D genome copy was apparently completely silenced. This supports a hypothesis put forth by Haldane (14, 15), who predicted that genetic changes would affect the gene copies randomly with respect to genome of origin. As a corollary of this hypothesis, he suggested that the retained gene(s) might evolve to create new linkage groups, an intriguing possibility that will soon be testable by comparison of the gene contents of multiple disparate genomes.
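The ovule counts above can be tallied, and the claim that bias is random with respect to genome of origin can be checked informally. The sketch below is an illustration only: the exact two-sided binomial test against a 50:50 expectation is an assumption added here, not an analysis reported by Adams et al. (7), and the function name `two_sided_binomial_p` is hypothetical.

```python
from math import comb

def two_sided_binomial_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than the observed count k."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    return sum(q for q in probs if q <= observed * (1 + 1e-9))

# Counts as reported in the text for 40 gene pairs in ovules.
counts = {"equal": 30, "A_biased": 5, "D_biased": 4, "D_silenced": 1}

total = sum(counts.values())
share_equal = counts["equal"] / total

# Is 5 A-biased versus 4 D-biased compatible with no genome preference?
n_biased = counts["A_biased"] + counts["D_biased"]
p_value = two_sided_binomial_p(counts["A_biased"], n_biased)

print(f"gene pairs: {total}, equally expressed: {share_equal:.0%}")
print(f"two-sided p-value for 5 of {n_biased} A-biased: {p_value:.3f}")
```

A split of five to four is as even as nine genes allow, so the test gives no indication of a preference for either genome, which is in line with the interpretation in the text. With so few genes, however, such a test has little power to detect a modest preference.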

At first glance, these data on a single species in a single tissue type suggest that differential gene silencing may be a rather minor phenomenon, a possibility supported by other studies. In a synthetic tetraploid derived from a cross of Arabidopsis thaliana and Arabidopsis arenosa, differential silencing affected only ≈0.4% of the 700 genes investigated (23). A study of the naturally occurring tetraploid from the same parents (Arabidopsis suecica) suggested that differential silencing affected 2.5% of the genes (24). In a cross between a diploid wheat (Triticum monococcum) and goat grass (Aegilops sharonensis), methylation patterns were retained in ≈90% of the DNA fragments investigated (25). And in synthetic polyploids of cotton, no differences were found in methylation patterns between parents and polyploid offspring (26). These studies investigate different species and use different ways to assess expression, and, indeed, the wheat and cotton studies do not directly address gene expression at all. Together, though, they suggest that most genes in an allotetraploid are not affected by the polyploidization event.

In a second set of experiments, however, Adams et al. (7) show that differential gene silencing is not a minor phenomenon at all. They chose a set of 18 gene pairs, some of which were the same as those used in the ovule studies, and examined differential expression among different plant organs. When multiple organs are compared, differential expression is the rule rather than the exception. Eleven of the 18 genes (about two-thirds) were differentially expressed in at least one organ type, and the direction of the bias varied from organ to organ. Thus, for example, the A genome copy of adhD was more strongly expressed than the D genome copy in all organs except for stamens, in which the expression pattern was reversed. This is a striking departure from studies of single organs or pooled cDNAs. The phenomenon thus appears to be more important than might have been supposed.

All studies to date, including that of Adams et al., suggest that gene expression changes almost instantly on polyploid formation (7, 23, 25, 27). Adams et al. considered whether gene expression differences were present in the diploid ancestors and then simply perpetuated in the polyploid, whether the differences could have appeared immediately after the two genomes found themselves in a common nucleus, or whether biases evolved over the 1.5 million years between the original polyploid event and modern tetraploid cotton. They disentangled these possibilities by looking at diploid relatives of the A and D genome donors and also by looking at synthetic polyploids. If biased gene expression reflects the particular biology of the ancestral diploids then it should be detectable in modern diploid relatives; Adams et al. found no differences between the diploids, however. If biased gene expression requires millions of years of genomes coexisting in a common nucleus, then it should appear only in modern tetraploid cotton and not the diploids or synthetic tetraploids. And finally, if biases are a direct result of polyploidization, then they should appear in the synthetic polyploid but not in the diploid genomes. Investigation of synthetic polyploids included a more limited number of genes and organs than studies in the natural species. With this caveat, however, at least some of the expression patterns were similar to those found in the naturally occurring polyploids. This then suggests that expression changed immediately on formation of polyploids.
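The reasoning used to separate these three possibilities amounts to a small decision table. The sketch below encodes it under the simplifying assumption that each observation is a yes/no answer for a given gene; the function name `interpret_bias` and the returned labels are hypothetical and serve only to restate the logic of the paragraph above.

```python
def interpret_bias(in_diploids, in_synthetic, in_natural):
    """Map where biased expression is observed to the explanation it supports.

    in_diploids  -- bias seen between modern diploid relatives of the donors
    in_synthetic -- bias seen in newly made (synthetic) polyploids
    in_natural   -- bias seen in the natural tetraploid
    """
    if in_diploids:
        return "inherited from diploid ancestors"
    if in_synthetic:
        return "immediate result of polyploidization"
    if in_natural:
        return "evolved after polyploid formation"
    return "no bias detected"

# Pattern described in the text for at least some genes: no difference
# between diploids, bias in both synthetic and natural polyploids.
print(interpret_bias(in_diploids=False, in_synthetic=True, in_natural=True))
```

The order of the checks matters: bias already present in the diploids would also appear in both kinds of polyploid, so it is tested first, and bias that appears only in the natural tetraploid is the only pattern that points to change over evolutionary time.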

The mechanism underlying differential expression is unknown. Although rapid elimination of noncoding sequences has been demonstrated in some species (28), there is no evidence that mutation rate suddenly jumps with polyploid formation. Instead, it is more likely that initial expression changes are caused by epigenetic mechanisms. Expression bias and differential silencing have been well documented for high-copy-number genes, such as ribosomal RNA genes or transposable elements. Silencing of rRNA genes was observed as the failure of one genome to form a nucleolus in hybrid progeny (29). Recent data now suggest that this is caused by epigenetic mechanisms that somehow modify chromatin (27, 30). Such mechanisms might include methylation, changes in chromatin structure, changes in nuclear architecture, or some combination of these. Epigenetic changes may then be made permanent by subsequent mutations or, alternatively, may persist for many generations. Currently we have no idea how long such changes might last, what maintains the altered epigenetic state, or whether particular genes are more likely to be affected than others.

Expression and function of many genes have apparently been conserved over evolutionary time. This observation has led to suggestions that all organisms, at least those in the same kingdom, may work with the same basic genetic toolkit (31). Although this is undoubtedly true in general, polyploidy provides a way to diversify the basic set of tools, and plants in particular have taken advantage of this opportunity. By copying their genomes, they retain the tool kit and at the same time generate a garage full of spare parts. Gene duplication can provide the raw material for expression changes to occur, and polyploidy itself can trigger epigenetic changes. The next step is to connect differential gene expression to selectable changes that drive the origin of species.

Footnotes

See companion article on page 4649.
