Interface Focus. 2017 Aug 18;7(5):20160115. doi: 10.1098/rsfs.2016.0115

Biological action in Read–Write genome evolution

James A Shapiro
PMCID: PMC5566801  PMID: 28839913


Many of the most important evolutionary variations that generated phenotypic adaptations and originated novel taxa resulted from complex cellular activities affecting genome content and expression. These activities included (i) the symbiogenetic cell merger that produced the mitochondrion-bearing ancestor of all extant eukaryotes, (ii) symbiogenetic cell mergers that produced chloroplast-bearing ancestors of photosynthetic eukaryotes, and (iii) interspecific hybridizations and genome doublings that generated new species and adaptive radiations of higher plants and animals. Adaptive variations also involved horizontal DNA transfers and natural genetic engineering by mobile DNA elements to rewire regulatory networks, such as those essential to viviparous reproduction in mammals. In the most highly evolved multicellular organisms, biological complexity scales with ‘non-coding’ DNA content rather than with protein-coding capacity in the genome. Coincidentally, ‘non-coding’ RNAs rich in repetitive mobile DNA sequences function as key regulators of complex adaptive phenotypes, such as stem cell pluripotency. The intersections of cell fusion activities, horizontal DNA transfers and natural genetic engineering of Read–Write genomes provide a rich molecular and biological foundation for understanding how ecological disruptions can stimulate productive, often abrupt, evolutionary transformations.

Keywords: symbiogenesis, hybrid speciation, horizontal DNA transfer, mobile DNA, network rewiring, non-coding RNA regulation

1. Living organisms regularly facilitate their own evolution

The most basic questions in evolutionary biology concern the origins of taxonomic and adaptive innovations. How do new kinds of organisms arise and acquire heritable functionalities that enable them to proliferate in a changing ecology? While the details of each innovation are unique, genome analysis shows that diverse evolutionary novelties repeatedly originated through core generic activities:

  • horizontal DNA transfers between unrelated cell lineages [1–3];

  • symbiogenetic cell mergers [4,5];

  • interspecific hybridizations [6,7];

  • biochemical activities that cleave, splice, polymerize and otherwise modify cell DNA molecules (collectively, ‘natural genetic engineering’ or NGE activities) to generate novel sequence configurations and amplify mobile DNA elements [8].

The following sections present examples of these generic processes repeatedly achieving evolutionary innovations. These cases exemplify broader implications of biological agency (i.e. direct biochemical and biomechanical activity) in generating evolutionary variation:

  • (1.a.) Major classes of genome modification (horizontal transfer, symbiogenesis, hybridization) involve the interaction of organisms with distinct evolutionary histories. Consequently, a full account of many evolutionary innovations involves more than a single branching vertical line of descent, as commonly depicted [9]. A major fraction of evolutionary pedigrees have an obligatory networked or reticulate architecture [10].

  • (1.b.) Since cells have agency in generating organisms with new genome configurations, the genome functions as a read–write (RW) storage system in evolution [11]. The RW genome is subject to a range of generic inscriptions (e.g. acquisition of DNA molecules encoding complex metabolic capabilities, relocation of information-rich sequence cassettes) with greater functional consequences than the random copying errors postulated to occur in the conventional read-only memory (ROM) genome [12,13].

  • (1.c.) Like all classes of cellular biochemistry, NGE DNA transport and restructuring functions are subject to control by regulatory circuits and respond to changing conditions (electronic supplementary material, table S1). Among the conditions known to activate NGE functions are cell mergers, such as interspecific hybridizations [14,15]. By altering the regulatory context, one kind of genome-modification activity can stimulate other kinds in a positive feedback loop that amplifies evolutionary innovation.

  • (1.d.) Genome changes by symbiogenetic cell fusions, interspecific hybridizations and multilocus NGE activities typically affect multiple characters of the variant cell and organism. Consequently, major phenotypic transformations can occur in a single evolutionary episode and are not restricted to a gradual accumulation of ‘numerous, successive, slight modifications’ [9]. In other words, there is an empirical molecular–cellular basis for saltationist views of the evolutionary process, similar to those proposed by certain early evolutionists [16–19].

2. Many lineages acquire adaptive innovations through horizontal DNA transfers rather than reinventing each function de novo

A major fraction of all biomass on Earth consists of prokaryotic cells, which lack a nucleus and carry out most biogeochemical transformations in a complex web of interactions with nucleated eukaryotic microbes (protists) [20,21]. Thanks to the pioneering work of Carl Woese and colleagues on the ribosomal RNA (rRNA) sequences that characterize the cell's fundamental protein synthesis organelles, we know that there are two distinct evolutionary lineages of prokaryotes, Eubacteria and Archaea [22,23]. In addition to ribosomal differences, the two prokaryotic lineages differ in other core cell functionalities, such as DNA replication machinery, RNA transcription complexes and membrane structure.

Well before the discovery of Archaea in the 1970s, it was clear from basic bacterial genetics that prokaryotic cells have multiple molecular mechanisms for horizontal DNA transfer: uptake of naked DNA from the environment (‘transformation’), encapsidation of cellular DNA in infectious virus particles (‘transduction’), and direct contact-mediated transfer (‘conjugation’) [1,24]. Rapid appreciation of bacterial virtuosity in horizontal DNA transfer resulted from the need to understand the evolution and spread of infectious multiple antibiotic resistance in pathogenic bacteria [25] (electronic supplementary material, table S2). The discovery of transmissible drug resistance plasmids provided an early real-world object lesson on the limitations of evolutionary thinking based solely on spontaneous mutation and vertical inheritance of novel traits.

The lessons of antibiotic resistance evolution proved applicable to other bacterial adaptations. Diverse bacteria possess extended DNA structures (labelled ‘genome islands’ or ‘integrons’) encoding complex functionalities, like catabolic pathways or pathogenicity determinants, that they can transfer to unrelated cells, where they integrate into the genome and abruptly extend the recipient's adaptive capabilities [26–28].

The existence of transmissible DNA cassettes that generate sudden adaptive changes prompted rethinking of how an ecologically suitable genome evolves in the prokaryotic realm. The prevalence of horizontal DNA exchange gave rise to the idea of a vast supra-cellular pan-genome that bacterial cells sample in a facultative manner to evolve novel genomes suitable for adaptation to particular ecological niches [29]. This initially controversial notion gained wider credibility as metagenomic analysis revealed that extracellular environmental DNA samples contain unexpectedly large numbers of potentially adaptive coding elements in virus particles [30]. These environmental DNA elements include sequences encoding previously unknown functionalities [31]. Detailed single cell analysis of soil bacteria indicates virus infections and horizontal DNA transfers are active ongoing events [32].

Although the discrete origins of Eubacteria and Archaea remain highly speculative [33,34], patterns of biochemical specializations in the two kingdoms provide evidence for significant inter-kingdom DNA transfer. Horizontal transfer was initially apparent between hyperthermophilic archaeal and eubacterial cells that shared a common high-temperature ecology [35]. Subsequent genomics-based analysis of Archaea from more temperate ecologies showed that transfer of DNA encoding hundreds of metabolic functions from eubacteria correlated with the origination of novel mesophilic archaeal clades [36–39]. While there is some debate about the number and speed of Eubacteria–Archaea horizontal transfers [40], the sequence evidence clearly indicates that evolutionary innovation among Archaea only makes sense by invoking DNA exchanges with Eubacteria. These discoveries also fit with the Eubacteria–Archaea transfers now recognized as necessary to explain the origin of nucleated eukaryotic cells from prokaryotic ancestors [38,41,42].

Horizontal exchange of adaptively important DNA has also been important in eukaryotic evolution, although less pervasive than in prokaryotes [43–46] (electronic supplementary material, table S2). Many bacteria-to-eukaryote transfers originate from endosymbiotic bacteria [47,48]. Intriguing horizontal exchanges include the transfer from bacteria to primitive land plants (mosses) of xylem synthesis and other functions needed for land colonization [49] and transfers from both bacteria and fungi to plant parasitic nematode worms of biochemical activities for phytopolymer digestion [50–52]. In the latter case, we know there were multiple independent transfers because related nematode species use distinct microbe-derived enzymes to break down plant material.

3. Symbiogenetic cell fusions formed the ancestral eukaryote and major photosynthetic eukaryotic clades

We know that evolution of the ancestor of all nucleated eukaryotic cells involved a symbiogenetic event. Eukaryotic cells all possess an oxidative mitochondrion organelle or a defective non-oxidative derivative (hydrogenosome or mitosome) [42,53–55]. While mitochondria and derivatives are quite diverse, all contain organelle genomes encoding organelle-specific ribosomal RNA (rRNA) molecules. These rRNA molecules clearly identify the organelles as descendants of an ancestral alpha-proteobacterial endosymbiont [56,57]. Thus, a saltatory cell fusion event has to be added to horizontal DNA transfers as contributing to one of the most important evolutionary innovations in life history.

The ancestral eukaryote had two genomic compartments: (i) the nucleus, encoding eukaryotic rRNA and most proliferative functions, and (ii) the mitochondrion, encoding mitochondrial rRNA and oxidative metabolic functions from the alpha-proteobacterium endosymbiont. A major feature of subsequent eukaryotic evolutionary diversification has been loss from mitochondrial genomes of sequences encoding functions needed for autonomous cell reproduction as well as intracellular horizontal transfer to the nuclear genome of DNA sequences encoding mitochondrial proteins [56,58]. The processes of mitochondrial genome reduction and restructuring have been different in distinct eukaryotic lineages (including those that have lost oxidative metabolism), producing a great variety of taxonomically specific organelle genotypes [54,59–61]. Eukaryotic diversification thus resulted from biochemical NGE functions executing deletions, transfers and rearrangements of mitochondrial DNA.

Symbiogenetic cell fusions have been ongoing features of major steps in eukaryotic evolution. The origin of photosynthetic eukaryotes harbouring additional chloroplast/plastid organelles can be traced by rRNA sequencing to an ancestral cyanobacterial endosymbiont [53,62]. The cell product of this primordial symbiogenesis contained three genome compartments—nucleus, mitochondrion and plastid—and subsequent plastid–nuclear transfers and deletions gave rise to four primary photosynthetic eukaryotic lineages: (i) green algae, (ii) land plants (embryophytes), (iii) glaucophytes and (iv) red algae [53,62]. The evolutionary history of photosynthetic eukaryotes diversified further by higher-level symbiogenetic events, in which red or green algae became endosymbionts of distinct eukaryotic lineages (table 1). The cell arising from such a secondary endosymbiosis has four distinct genome compartments: (i) nucleus, (ii) mitochondrion, (iii) plastid and (iv) nucleomorph (derived from the nucleus of the algal endosymbiont). As with mitochondria, deletions, intracellular transfers and rearrangements of the DNA in all these compartments contribute to evolutionary innovation.

Table 1.

Photosynthetic eukaryotic lineages resulting from symbiogenesis [53,62].

taxonomic group | symbiogenetic origin
green algae (Chlorophyta) | primary cyanobacterial endosymbiosis
land plants (Embryophyta) | primary cyanobacterial endosymbiosis
Glaucophytes (order Chlorococcales) | primary cyanobacterial endosymbiosis
red algae (Rhodophyta) | primary cyanobacterial endosymbiosis
Euglenids (flagellated algae) | secondary green algal endosymbiosis
Chlorarachniophytes (marine algae) | secondary green algal endosymbiosis
Chromalveolates (multiple lineages including major photosynthetic organisms responsible for a large fraction of atmospheric oxygen, such as brown algae, dinoflagellates and diatoms) | secondary red algal endosymbiosis
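
The relationship between the kind of endosymbiosis and the number of genome compartments can be written down as a small lookup. The following Python sketch is only an illustration: the `Lineage` class, the `COMPARTMENTS` mapping and the function name are hypothetical, and the counts describe the cell at the time of the symbiogenetic event as stated in the text, not present-day organisms, in which later deletions and transfers may have altered or eliminated a compartment.

```python
from dataclasses import dataclass

# Genome compartments of the cell produced by each kind of event,
# as described in the text (illustrative encoding, not a database).
COMPARTMENTS = {
    "primary": ("nucleus", "mitochondrion", "plastid"),
    "secondary": ("nucleus", "mitochondrion", "plastid", "nucleomorph"),
}


@dataclass(frozen=True)
class Lineage:
    name: str
    kind: str          # "primary" or "secondary" endosymbiosis
    endosymbiont: str  # what was taken up, following table 1


# Rows of table 1, in order.
TABLE_1 = [
    Lineage("green algae", "primary", "cyanobacterium"),
    Lineage("land plants", "primary", "cyanobacterium"),
    Lineage("glaucophytes", "primary", "cyanobacterium"),
    Lineage("red algae", "primary", "cyanobacterium"),
    Lineage("euglenids", "secondary", "green alga"),
    Lineage("chlorarachniophytes", "secondary", "green alga"),
    Lineage("chromalveolates", "secondary", "red alga"),
]


def genome_compartments(lineage: Lineage) -> tuple:
    """Return the compartments present when the lineage originated."""
    return COMPARTMENTS[lineage.kind]


for lineage in TABLE_1:
    parts = genome_compartments(lineage)
    print(f"{lineage.name}: {len(parts)} compartments ({', '.join(parts)})")
```

The only difference between the two entries is the nucleomorph, which reflects the point made above: a secondary event brings in a whole eukaryotic alga, nucleus included, rather than a prokaryote.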

Among the dinoflagellates, there is a remarkable photosynthetic taxon labelled warnowiids, which comprises individual photosynthetic cells containing a light-harvesting organelle (the ‘ocelloid’) that resembles the camera eyes of multicellular animals [63]. The ocelloid has analogues to the cornea, lens, iris and retina. A noteworthy recent paper reports genomic analysis showing that two of these structures resulted from distinct endosymbiogenetic events, the cornea composed of mitochondria and the retina of red algal plastids [64]. The role of cell fusion events in the evolution of the ocelloid eye-like structure stands in sharp contrast to Darwin's conception of eyes evolving ‘by numerous, successive, slight modifications’ (p. 189) [9].

4. Interspecific hybridizations and whole genome duplications accelerate eukaryotic speciation and taxonomic diversification

Although Darwin emphasized repeated selection of individual traits as a source of new species, using human selective breeding as an illustration [9], major domesticated crop species with beneficial agricultural characters like wheat, cotton and tobacco actually arose by a completely different ‘cataclysmic evolution’ process: cross-breeding of different species to form genotypically and phenotypically novel hybrids [65,66]. For example, several thousand years ago in the Middle East, ‘the combination of chromosomes of a moderately useful plant, emmer wheat, and those of a completely useless and noxious weed {wild goat grass, Aegilops squarrosa} produced the world's most valuable crop plant,’ high-yielding bread wheat, and this hybrid speciation process can be repeated in real time [65].

Because organisms with two different haploid chromosome sets will not proceed through meiosis to reproduce sexually, successful fertile interspecific hybridizations also involve whole genome duplication (WGD) events so that the merged ‘allopolyploid’ genome has the necessary two copies of each parental chromosome. Genome doubling is a common response to interspecific hybridization, at least in plants, where it has been most studied [67–69]. The cytogenetic and genomic evidence for hybrid speciation among yeasts, plants and animals is abundant (table 2) [133–136]. Two rapidly evolving species groups in table 2 merit special emphasis: Darwin's Galapagos finches (Geospiza), often taken as models of gradualist selection-driven speciation, and East African cichlids (freshwater fish), recent examples of explosive adaptive radiations [137].
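
The bookkeeping behind allopolyploid formation can be sketched in a few lines of Python. This is a deliberately simplified model, not a description of real meiosis: a genome is reduced to a count of copies per chromosome set, and a genome is treated as able to complete meiosis only if every set is present in an even number of copies. The set labels A, B and D follow the conventional notation for wheat genomes and are an assumption added here, as are all function names.

```python
from collections import Counter


def can_pair_at_meiosis(genome: Counter) -> bool:
    """Simplified rule: every chromosome set needs an even copy number."""
    return all(copies % 2 == 0 for copies in genome.values())


def gamete(genome: Counter) -> Counter:
    """Halve a genome to form a gamete (only valid if pairing is possible)."""
    if not can_pair_at_meiosis(genome):
        raise ValueError("genome cannot complete meiosis in this model")
    return Counter({s: copies // 2 for s, copies in genome.items()})


def hybridize(gamete_a: Counter, gamete_b: Counter) -> Counter:
    """Merge the gametes of two different species into one hybrid genome."""
    return gamete_a + gamete_b


def whole_genome_duplication(genome: Counter) -> Counter:
    """Double every chromosome set (WGD)."""
    return Counter({s: 2 * copies for s, copies in genome.items()})


# Illustrative labels: emmer wheat as AABB, wild goat grass as DD.
emmer = Counter({"A": 2, "B": 2})
goat_grass = Counter({"D": 2})

hybrid = hybridize(gamete(emmer), gamete(goat_grass))  # one copy each of A, B, D
doubled = whole_genome_duplication(hybrid)             # two copies each

print("hybrid fertile in this model: ", can_pair_at_meiosis(hybrid))
print("doubled fertile in this model:", can_pair_at_meiosis(doubled))
```

In this toy model the initial hybrid carries single copies of three sets and fails the pairing rule, while the doubled genome passes it, which is the reason given above for WGD accompanying successful hybridization.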

Table 2.

Selected examples of speciation and adaptive radiation by interspecific hybridization and changes in chromosome number.

taxon | references
fungi | [70]
  Saccharomyces yeast | [71–77]
plants | [69,78–81]
  Tragopogon (Asteraceae) | [82,83]
  irises (Iris fulva, I. hexagona, and I. nelsonii) | [84–86]
  Nicotiana (Solanaceae) | [87–89]
  Brassica napus | [90–92]
  Arabidopsis | [93–95]
  potatoes (Solanum stoloniferum and S. hjertingii) | [96–98]
  wheats (Aegilops–Triticum group) | [65,66,99,100]
  cotton (Gossypium) | [101,102]
animals | [103–105]
  tephritid fruit flies | [106,107]
  Heliconius butterflies | [108–112]
  sailfin silversides (Teleostei) | [113,114]
  East African cichlids | [115–121]
  sparrows | [122,123]
  Galapagos finches (Geospiza) | [124–130]
  Clymene dolphin (Delphininae (Cetacea, Mammalia)) | [131,132]

An important feature of evolutionary innovation by interspecific hybridization is that the transformational event involves the entire genomes of both parental species. Thus, all organismal traits are simultaneously subject to modification. Molecular studies find dramatically altered genome-wide expression patterns in newly synthesized interspecific hybrids, resulting at least in part from epigenetic modifications across the hybrid genomes [138,139]. The same regulatory and epigenetic changes that alter coding sequence expression in hybrid nuclei also de-repress transcription of mobile DNA elements and other NGE functions [140]. Repetitive and mobile DNA elements play special roles in chromosome restructuring [141–144], as exemplified in primate evolution by the recently published gibbon genome [145]. Thus, in addition to global effects on genome expression, interspecific hybridization serves as a major stimulus to genome variability, including chromosome rearrangements and mobile DNA activation [14,15,82,146–149] (electronic supplementary material, table S3). Chromosome rearrangements have long been recognized as major features of speciation and taxonomic divergence [150–152]. Accordingly, hybrid organisms have a markedly elevated potential for generating novel DNA sequence configurations not present in either parent species genome, an added impact of biological agency on evolutionary innovation [45].

5. Amplification of mobile DNA accompanies increased organismal complexity and provides information-rich cassettes for adaptive innovations

The fact that interspecific hybridization activates the spread and amplification of mobile DNA elements (MEs) fits into a larger set of connections between this particular class of genome sequences and the origins of novel adaptations. All organisms contain repetitive DNA elements capable of transposing to novel genome locations [153]. Both transposition purely at the DNA level and retrotransposition through RNA intermediates tend to increase the copy number of a particular element, and MEs are collectively referred to as dispersed repetitive DNA.
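
As a toy illustration of why copy-and-paste mobility produces dispersed repetitive DNA, the sketch below models a genome as an ordered list of labelled segments and inserts a new copy of an element at a random position while leaving the original in place. The representation, the segment labels and the function names are hypothetical simplifications; the model ignores target-site preferences, element decay and the cut-and-paste behaviour of some DNA transposons.

```python
import random


def copy_number(genome: list, element: str) -> int:
    """Count how many copies of an element the genome carries."""
    return genome.count(element)


def copy_and_paste(genome: list, element: str, rng: random.Random) -> list:
    """Return a new genome with one extra copy of `element` at a random site.

    The source copy stays where it is, so each event raises the copy
    number by exactly one.
    """
    if element not in genome:
        raise ValueError(f"{element!r} is not present in the genome")
    position = rng.randrange(len(genome) + 1)
    return genome[:position] + [element] + genome[position:]


rng = random.Random(0)  # fixed seed so the run is repeatable
genome = ["gene1", "ME", "gene2", "gene3", "gene4"]

for _ in range(3):
    genome = copy_and_paste(genome, "ME", rng)

print(genome)
print("ME copy number:", copy_number(genome, "ME"))
```

After three events the element is present in four copies scattered among the unchanged genes, which is the sense in which such elements are called dispersed repeats.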

The class of mobile repetitive DNA has typically been considered functionally distinct from the relatively fixed, low copy DNA encoding the vast majority of organismal proteins. Hence, it was significant when the initial draft human genome sequence revealed that at least 44% of our genomes consisted of MEs [154]. (Today, our fraction of repetitive DNA is estimated to be as high as 67% [155].) In fact, the repetitive, so-called ‘non-coding’ content of the genome scales more closely with organismal complexity defined by the number of distinct cell types than does the protein-coding content, which plateaus in organisms of about a dozen different cell types [156]. In other words, MEs and other ‘non-coding’ sequences are especially prevalent in genomes of the most highly evolved organisms.

Focusing our attention on mammals, where the most in-depth genomic analysis has been performed, we can discern at least three major evolutionary roles for MEs [157]:

  • (1) MEs provide distributed signals, such as binding sites for structural proteins and replication factors, to format the genome as a physically organized self-replicating sequence database [158–160];

  • (2) Although characterized by some as non-functional ‘junk DNA’ [161,162], MEs provide cis-regulatory signals to rewire transcriptional networks determining multiple higher-level traits, including key mammalian innovations on both uterine and placental sides of the maternal–fetal interface in viviparous reproduction [163–166];

  • (3) MEs provide the majority of conserved sequences comprising taxonomically specific ‘long non-coding RNAs’ (lncRNAs) [167–169], a class of molecule that we increasingly recognize as modulating epigenetic control [170] and regulating complex traits like stem cell pluripotency [171–174], internal organ and nervous system development [175–177], and innate and adaptive immunity [178,179].

These three categories, plus dozens of other examples cited in the various tables of [157], provide robust support for the RW genome view of MEs as NGE tools (plug-in formatting cassettes) for adaptive genome restructuring. Viewing MEs as genome modification tools has been in the literature ever since McClintock showed transposable ‘controlling elements’ to alter developmental patterns [180] and molecular studies in bacteria and yeast revealed MEs to act as sources of regulatory innovation [181–183]. Significantly, ecological stress triggers mobile DNA activity [8] (electronic supplementary material, table S1).

6. Core biological capacities rewrite genomes for evolutionary innovation

The preceding discussion illustrates how generic biological activities have regularly played decisive roles in major episodes of evolutionary innovation:

  • horizontal DNA transfers in the origination of mesophilic archaeal taxa;

  • recurring symbiogenetic cell fusions in the origination of the ancestral eukaryotic cell and the major clades of photosynthetic eukaryotes;

  • interspecific hybridizations and changes in genome ploidy in speciation and adaptive radiations in yeast, plants and animals;

  • amplification and relocalization of mobile DNA elements in formatting mammalian genomes for replication, viviparous reproduction, and lncRNA regulation of nervous and immune system functions.

These examples show us that core biological capacities for self-modification in response to ecological challenge have been integral to the history of life on earth. That conclusion should not surprise us because extant organisms are descendants of multiple evolutionary episodes. Considering potential interactions between dynamic ecological conditions and the biological engines of cell and genome variation raises important questions about control and specificity in evolutionary innovation. The years to come likely hold surprising lessons about how cell fusions, genome doublings, and natural genetic engineering may operate non-randomly to enhance the probabilities of evolutionary success.

