Trends Genet. 2011 Jan;27(1):1–6. doi: 10.1016/j.tig.2010.10.004

DNA double-strand break repair and the evolution of intron density

Ashley Farlow, Eshwar Meduri, Christian Schlötterer
PMCID: PMC3020277  PMID: 21106271


The density of introns is both an important feature of genome architecture and a highly variable trait across eukaryotes. This heterogeneity has posed an evolutionary puzzle for the last 30 years. Recent evidence is consistent with novel introns being the outcome of the error-prone repair of DNA double-stranded breaks (DSBs) via non-homologous end joining (NHEJ). Here we suggest that deletion of pre-existing introns could occur via the same pathway. We propose a novel framework in which species-specific differences in the activity of NHEJ and homologous recombination (HR) during the repair of DSBs underlie changes in intron density.

Is intron density controlled by selection or mutation?

All eukaryotes hold in common a highly complex spliceosome devoted to the identification and removal of introns from the nascent mRNA. Although speculative, it is probable that both the very first introns and core components of the spliceosome arose from the mutational decay and cooption of self-splicing group II introns during early eukaryotic evolution [1–3]. Subsequent evolution has involved both extensive intron gain and loss, leading to the current distribution of intron density that varies by several orders of magnitude between species [4–6]. With as few as four introns in the entire Giardia intestinalis genome [7], and more than eight per gene in most mammals, intron density is a key determinant of genome architecture. Although this highly variable trait has important phenotypic consequences, both the mutational mechanisms and the evolutionary conditions that alter intron density have remained unclear.

The absence of group II introns from eukaryotic nuclear genomes, and the fact that novel spliceosomal introns do not share length or sequence characteristics with group II introns, have argued for a two-tier model of intron evolution in which different mechanisms underlie intron-density variation [8,9]. It now appears likely that intron gain is mediated by the capture of DNA fragments during NHEJ of DSBs [10]. The presence of short direct repeats overlapping the splice sites of a subset of novel introns in Daphnia [10], Drosophila [11], and Aspergillus [12] is consistent with the capture of an exogenous fragment at the overhanging ends of a staggered breakpoint (Box 1) [10–13]. Based on patterns of intron gain and loss in Drosophila, we propose that a major proportion of intron loss also occurs as an outcome of NHEJ (in addition to the previously established mechanism of HR-mediated intron loss).

Box 1. DNA double-strand break repair.

The cell utilises two major pathways to repair DNA double-strand breaks (DSBs) [19]. Non-homologous end joining (NHEJ) is the rapid (approx. 30 min) and error-prone religation of two free DNA ends (Figure I), whereas homologous recombination (HR) involves the accurate replacement of a broken segment by copying the homologous chromosome (or sister chromatid), a process that can take more than 7 h to complete [55]. Both pathways compete for the repair of a DSB and the ratio of NHEJ to HR activity is highly dependent on the type of damage, the stage of the cell cycle (HR is often restricted to S/G2-phase), the chromosomal location of the damage and the organism involved [43,55].

We suggest that a mechanism of intron turnover based on DSB repair could provide insight into the evolution of intron density. To date, much of this discussion has focused on the relative importance of selection and drift in shaping taxon-specific variation in intron density (e.g. [14,15]). An alternative explanation is that the dynamics of intron evolution depend largely on changes in the rates of mutations that generate or remove an intron [16]. A genome-wide change in intron density (as has occurred multiple times throughout eukaryotic evolution) requires non-randomness in either the introduction of variants (mutation bias) or in the transmission of variants (selection). As such, the equilibrium intron density can be expressed as:

$$\text{Intron density} = \frac{\mu_{\mathrm{GAIN}}}{\mu_{\mathrm{LOSS}}} \times \frac{p_{\mathrm{GAIN}}}{p_{\mathrm{LOSS}}} \qquad [1]$$

where μ is the mutation rate and p is the probability of fixation [17]. The acceptance factor (p) is a function of the selection coefficient (s) and the population size (which is equal for gain and loss within a species). Thus, differential acceptance only occurs if there is a systematic difference in s between the presence and absence of an intron. Numerous studies have sought such a difference (reviewed in [8]), proposing that in general novel introns are deleterious (e.g. [10,15]), or that introns are beneficial and that intron density is therefore adaptive (e.g. [18]).
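Equation 1 can be explored numerically. The following is a minimal sketch, not part of the original analysis: it assumes Kimura's diffusion approximation for the fixation probability of a new mutation in a diploid population (the text does not specify a form for p), and all function names and parameter values are hypothetical. It illustrates that, with equal population size for gain and loss, the acceptance ratio departs from 1 only when s is non-zero.

```python
import math


def fixation_probability(s, n_e):
    """Fixation probability of a new mutation (initial frequency 1/(2*n_e)).

    ASSUMPTION: Kimura's diffusion approximation for a diploid population,
    p = (1 - exp(-2s)) / (1 - exp(-4*n_e*s)); the article gives no formula.
    """
    if s == 0:
        return 1.0 / (2 * n_e)  # neutral limit
    exponent = -4 * n_e * s
    if exponent > 700:  # strongly deleterious: avoid float overflow, p is ~0
        return 0.0
    # expm1(x) = exp(x) - 1; the two minus signs cancel in the ratio
    return math.expm1(-2 * s) / math.expm1(exponent)


def equilibrium_intron_density(mu_gain, mu_loss, s_intron, n_e):
    """Equation 1: (mu_gain / mu_loss) * (p_gain / p_loss).

    s_intron is the selection coefficient of carrying the intron, so an
    intron-loss mutation has the opposite coefficient, -s_intron.
    """
    p_gain = fixation_probability(s_intron, n_e)
    p_loss = fixation_probability(-s_intron, n_e)
    return (mu_gain / mu_loss) * (p_gain / p_loss)


# Hypothetical values chosen only for illustration.
neutral = equilibrium_intron_density(2e-9, 1e-9, 0.0, 10_000)
deleterious = equilibrium_intron_density(1e-9, 1e-9, -1e-5, 10_000)
print("neutral introns, twofold mutation bias:", neutral)
print("slightly deleterious introns, no mutation bias:", deleterious)
```

In the neutral case the result reduces to the mutation-rate ratio alone, which is the situation the mutation-bias argument relies on.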

Here, we suggest that changes in mutation bias are a major factor underlying species-specific intron density. Given that both intron gain and loss might be outcomes of DSB repair, we propose that the relative importance of NHEJ and HR (a ratio that does vary between species) might alter the rate of mutations that generate novel introns or remove existing ones.

Does microhomology between the 5′ and 3′ splice sites promote NHEJ-mediated intron deletion?

Intron loss is thought to be mediated by either HR or genomic deletion [8]. HR typically utilises the homologous chromosome or sister chromatid as a template during the repair of a DSB [19]. However, in rare cases a reverse-transcriptase-generated cDNA copy of the same gene [20–22] or an intronless retrogene elsewhere in the genome could serve as a template. Both scenarios result in the precise deletion of the genomic intron because the repair template has previously undergone splicing (Figure I in Box 1).

Figure I.


Intron gain and loss as outcomes of DSB repair. (a) NHEJ repair is stabilised by short ‘microhomology’ (blue bases) after 5′ to 3′ resection to generate single-stranded overhangs. Repair could be clean, or lead to a deletion or insertion (as shown). Microhomology pairing within the overhangs results in the insertion of a short direct repeat (red arrows). (b) Microhomology between the overhangs and an exogenous/free DNA fragment can result in a large insertion [68,69] that might or might not be flanked by short direct repeats (as seen for novel introns [10,11]). (c) Microhomology pairing between the 5′ and 3′ splice sites flanking an intronic DSB will cause the precise deletion of the intron, leaving only the original AGGT motif. (d) DNA repair also occurs via homologous recombination. If the template for repair is an intronless cDNA then the genomic intron is lost.

Interestingly, the two ends of a full-length cDNA will resemble DSBs and can thus trigger HR [20]. This would generate a double-crossover event at each end of a gene leading to a long tract of gene conversion potentially spanning (and thus deleting) several introns at once. This model has a powerful advantage in that it offers an explanation for the simultaneous and precise deletion of multiple adjacent introns [12,22–24]. However, despite strong evidence for precise HR-mediated intron loss, several species show a large number of imprecise intron deletions (20% in Drosophila [11,25] and ∼28% in Caenorhabditis [26] for example) that are inconsistent with HR, indicating that more than one mechanism is causing intron loss.

Although it is generally accepted that genomic deletions are an alternative intron-loss pathway, mechanistic details are lacking. Does the error-prone repair of DSBs via NHEJ, which often generates deletions, offer an explanation [26]? The rejoining of single-stranded overhangs by NHEJ is stabilised by the pairing of short (1–6 bp) identical motifs (often referred to as microhomology) on either side of a breakpoint [19]. In many cases the first such similarity encountered on either side of an intronic DSB will be the consensus motif AG|GT of the 5′ and 3′ splice sites (where | indicates the splice site). If pairing occurs between the splice sites, the subsequent repair event will cause the precise deletion of the intronic sequence (Figure I in Box 1) [26]. If an alternative microhomology in the proximity of the DSB is used, this would generate an imprecise deletion. Therefore, NHEJ-mediated deletion is consistent with both the precise and imprecise deletion of one intron at a time.
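The microhomology argument above can be illustrated with a toy string model. This is a sketch under simplifying assumptions: the sequences are invented, the function name is hypothetical, and repair is reduced to joining the nearest copies of a chosen motif on either side of the break so that one copy of the motif is retained. It is not a model of the NHEJ machinery itself.

```python
def microhomology_deletion(seq, break_pos, motif="AGGT"):
    """Join the nearest copies of `motif` flanking a break, keeping one copy.

    Returns the repaired sequence, or None if the motif is not present on
    both sides of the break. ASSUMPTION: a toy model of microhomology-
    mediated end joining; real repair is far more complex.
    """
    left = seq.rfind(motif, 0, break_pos)   # last copy entirely left of the break
    right = seq.find(motif, break_pos)      # first copy starting at/after the break
    if left == -1 or right == -1:
        return None
    # Drop the left copy and everything up to the right copy.
    return seq[:left] + seq[right:]


# Invented example: exon 1 ends in AG, the intron runs GT...AG,
# and exon 2 begins with GT, so both splice sites read AG|GT.
exon1, intron, exon2 = "ATGCCAAG", "GTAAGTCTTTTCAG", "GTCTGA"
genomic = exon1 + intron + exon2
break_pos = len(exon1) + 7  # a break inside the intron

precise = microhomology_deletion(genomic, break_pos, motif="AGGT")
imprecise = microhomology_deletion(genomic, break_pos, motif="TC")

print(precise == exon1 + exon2)    # pairing at the splice sites removes exactly the intron
print(imprecise == exon1 + exon2)  # a closer alternative motif leaves intron sequence behind
```

Pairing at the AG|GT motifs reproduces the spliced sequence, whereas pairing at a closer motif within the intron yields an imprecise deletion, matching the two outcomes described in the text.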

Experimental support for NHEJ-mediated deletion comes from Caenorhabditis and Drosophila where lost introns show an overly strong adherence to the consensus motif AG|GT at both their 5′ and 3′ splice sites (based on the sequence in the closest neighbouring species) [25,26]. Furthermore, lineages with very few introns tend to have 5′ and 3′ splice sites with high sequence similarity, whereas intron-rich species show highly degenerate splicing motifs [27–29]. Within longer introns one might expect to encounter short identical motifs before reaching the splice sites, hence precise deletion should favour shorter introns, and this is in fact the case in mammals [30], Drosophila [11] and yeast [12].

Interestingly, recently lost introns in Drosophila are more likely to contain motifs imparting a higher twist angle on the DNA backbone than do stable introns (see supplementary material online). Such motifs have been associated with a high propensity to suffer DSBs [31,32] due to the formation of a non-canonical DNA secondary structure [33] and replication stress and instability [34]. Although indirect, this might suggest that some introns have a higher than average chance of undergoing intron loss via NHEJ or HR.

DSB repair: a common mechanism for intron gain and loss?

If repair of DSBs is the mechanistic basis of both intron gain (via NHEJ) and intron loss (via a combination of NHEJ and HR) then one might expect a positive correlation between the rates of intron gain and loss across species and over time. A survey of these rates across the ∼40 million years of Drosophila evolution shows just such a strong positive correlation (Spearman correlation coefficient = 0.89, P < 0.0001) (Figure 1) [11]. Likewise, the same positive relationship is observed over much deeper branches of eukaryotic evolution (Spearman correlation coefficient = 0.69, P < 0.003 [4]). Although it could be possible that other factors influence this positive correlation, we suggest that, in general, the rate of intron gain is linked to the rate of intron loss – because both processes are at least partly an outcome of DSB repair.
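For readers who wish to repeat this kind of test on their own branch-wise counts, the following sketch computes a Spearman rank correlation in plain Python. The per-branch numbers are invented placeholders, not the Drosophila data from [11], and the function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# PLACEHOLDER counts per branch (gains, losses); not real data.
gains_per_branch = [2, 5, 9, 14, 1]
losses_per_branch = [10, 30, 25, 60, 4]
rho = spearman(gains_per_branch, losses_per_branch)
print("Spearman correlation between gain and loss:", rho)
```

A positive coefficient indicates that branches with more gains also tend to have more losses; a significance test would require a permutation procedure or a statistics library.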

Figure 1.


A highly significant positive correlation between the rate of intron gain and intron loss is consistent with commonality in the underlying mutational mechanism. The number of intron gain and loss events that have occurred along each branch of the Drosophila clade was taken as previously published [11]. Each datapoint represents one branch of Drosophila evolution, allowing the level of intron gain and loss over the same time period to be compared (Figure S2 in the supplementary material online).

Several points in eukaryotic evolution have been marked by either a dramatic increase in intron density (leading to Metazoa and Deuterostomia for example) or the overwhelming loss of introns [4,35,36]. One common hypothesis is that selection drives changes in intron density, for example through selection for genome reduction or an increase in alternative splicing [5,6,37]. However, if selection is the dominant factor determining intron density then the rates of intron gain and loss should be negatively correlated because the same evolutionary process would drive both rates in opposite directions [5]. Likewise, a negative correlation is expected if changes in intron density are purely the result of changes in population size [38]. Although these expectations could be considered overly simplistic (by ignoring any number of more complex evolutionary scenarios), the observation of a positive correlation across several timescales and many species argues against a general role for selection in determining intron density. Furthermore, despite a great deal of effort attempting to link intron density to either adaptation (via selection for or against introns) or genetic drift (and population size), no strong connection has been established [4–6,35,36,39,40].

Changing intron density: the relative usage of NHEJ and HR differs between species

The large change in intron density that has taken place at several points in eukaryotic evolution requires a shift in the ratio of intron gain to loss. We suggest that this change could be modulated by differences in the activity of NHEJ and HR during DSB repair. Both pathways are largely separate and compete for the repair of DSBs [41,42] and, significantly, the relative contribution of the two pathways is highly species-specific (Table 1) [19]. NHEJ is predominantly used in mammals [43,44] and Drosophila [45,46], whereas HR is the major pathway in Saccharomyces cerevisiae [47], a species that has undergone almost complete intron loss. By contrast, NHEJ is the major repair pathway in two comparatively intron-rich fungal species, Schizosaccharomyces pombe [48] (∼1 intron/gene; Wellcome Trust Sanger Institute) and Cryptococcus neoformans [49] (5.3 introns/gene [50]), suggesting a correspondence between NHEJ usage and intron density.

Table 1.

The relative contributions of NHEJ and HR differ between species

| Species | Introns per gene | Estimated contribution (a): NHEJ | Estimated contribution (a): HR | Approximate ratio (NHEJ:HR) | DSB type (b) | Refs |
| --- | --- | --- | --- | --- | --- | --- |
| S. cerevisiae | 0.07 | <1% | >99% | 1:100 | Complex | [58] |
| S. cerevisiae | | 2–18% | 82–98% | 1:9 | HO | [19] |
| S. cerevisiae | | Minimal | Major | | Complex (bleomycin) | [59] |
| S. pombe (c) | 0.8 | ∼14% | 66% | 1:5 | HO | [60] |
| D. melanogaster | ∼4 | 19% | 11% | 3:2 | I-SceI | [45,61] |
| C. elegans | 4.7 | ∼0 | Dominant (d) | | Ionising radiation | [62] |
| C. neoformans | 4.7 | Major | 0–47% | >1:1 | Transgene integration | [49] |
| Mus musculus (e) | ∼8 | 57% | 19% | 3:1 | I-SceI | [63] |
| Homo sapiens | ∼9 | 86% | 14% | 9:1 | I-SceI | [55] |
| Homo sapiens | | 75% | 25% | 3:1 | Incompatible ends | [55] |

(a) Several studies also differentiate repair via single-strand annealing (SSA), which generates large deletions. There is no mechanistic basis for intron gain or loss via SSA [19]; therefore the contribution of this pathway is not included here.

(b) I-SceI and HO endonucleases generate 4 bp complementary overhangs.

(c) NHEJ becomes the dominant pathway during the G1 phase of the haploid cell cycle because sister chromatids are not available as templates for HR [64].

(d) Germline DSBs in C. elegans are exclusively repaired via HR. Interestingly, intron loss is 400-fold higher in nematodes than in mammals [4,65], and most novel introns arise via the unusual process of intronisation [66].

(e) Genetic differences in the efficiency of HR segregate within natural populations of mouse [67].

We propose a simple model in which the relative rate of these two DSB repair pathways (in combination with intron density) can produce either an increase or decrease in intron number (Figure 2). Given that the relative contribution of NHEJ and HR to DSB repair varies over several orders of magnitude between intron-rich and -poor species, it is difficult to know the relevant values for each parameter that existed within a species over evolutionary time. However, using a conservative range of values clearly demonstrates that intron turnover can shift from net gain to net loss. Although this simple model does not consider variables such as the rate of exogenous DNA capture during NHEJ, or variation in splicing efficiency and intron size between clades, it does illustrate how adjusting a single parameter (the relative activity of NHEJ and HR) could be sufficient to explain the heterogeneity in intron density among eukaryotic species, an observation which has puzzled researchers for three decades.

Figure 2.


Changes to the relative activity of NHEJ and HR could be sufficient to explain both positive and negative rates of intron turnover. We model the short-term intron turnover by considering gain to be an outcome of NHEJ, whereas intron loss is dependent on NHEJ, HR and intron density, such that:

Small changes to the relative roles of NHEJ and HR are sufficient to increase or decrease intron numbers depending on intron density. This simple model highlights the intuitive finding that intron density presents a limiting factor to intron proliferation, consistent with the excess of intron loss observed in intron-dense genomes [30,56,57]. Although the parameters used are largely arbitrary (Table S2), with the solid line representing a fivefold excess of NHEJ over HR and the broken line a twofold excess of HR, these values are reasonable considering the >100-fold preference of HR in budding yeast [47] and the general preference for NHEJ in metazoa [19,45] (Table 1) (see also Table S2 in the supplementary material online).
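The verbal description of the model in this caption can be turned into a small numerical sketch. The linear form below is an assumption made for illustration (gain proportional to NHEJ activity; loss proportional to intron density and to both NHEJ and HR activity). It is not the authors' equation, and the function names and rate constants are hypothetical rather than the values in Table S2.

```python
def intron_turnover(density, nhej, hr,
                    gain_per_nhej=1.0, loss_per_nhej=0.05, loss_per_hr=0.5):
    """Net intron turnover (gain minus loss) at a given intron density.

    ASSUMED functional form, for illustration only:
      gain = nhej * gain_per_nhej
      loss = density * (nhej * loss_per_nhej + hr * loss_per_hr)
    All rate constants are arbitrary placeholders.
    """
    gain = nhej * gain_per_nhej
    loss = density * (nhej * loss_per_nhej + hr * loss_per_hr)
    return gain - loss


def equilibrium_density(nhej, hr,
                        gain_per_nhej=1.0, loss_per_nhej=0.05, loss_per_hr=0.5):
    """Density at which gain and loss balance under the assumed form."""
    return (nhej * gain_per_nhej) / (nhej * loss_per_nhej + hr * loss_per_hr)


density = 4.0  # introns per gene, hypothetical starting point

# Fivefold excess of NHEJ over HR versus twofold excess of HR over NHEJ.
nhej_dominated = intron_turnover(density, nhej=5.0, hr=1.0)
hr_dominated = intron_turnover(density, nhej=1.0, hr=2.0)

print("NHEJ-dominated turnover:", nhej_dominated)  # positive: net gain
print("HR-dominated turnover:", hr_dominated)      # negative: net loss
print("equilibria:", equilibrium_density(5.0, 1.0), equilibrium_density(1.0, 2.0))
```

Under these placeholder values the same starting density gains introns when NHEJ dominates and loses them when HR dominates, and each regime settles at a different equilibrium density, which is the qualitative behaviour the caption describes.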

This model might partly explain the curious observation that, whereas almost all intron-poor species have a strong 5′ bias in intron position, intron-rich species do not [51,52]. The dominant action of cDNA-mediated HR in intron-poor species might lead to the preferential loss of 3′ [53] and internal introns [54] due to the directionality of reverse transcriptase. By contrast, a more dominant role of NHEJ-mediated intron loss in intron-rich species would not generate this 3′-biased loss.

Concluding remarks

Based on the suggestion that intron gain and loss are outcomes of DSB repair, we propose that intron density is consistent with the long-term efficiency of the two repair pathways, NHEJ and HR. Although this hypothesis is consistent with much of the available data, we hope our proposal could serve as a null hypothesis against which new genomic data can be tested. This points future work in two directions: experimental evolution and population genetic surveys to establish the molecular outcomes of NHEJ and HR, and estimation of the relative activity of these two pathways in species that have undergone recent changes in intron density. Whereas a full account of intron evolution will include examples where intron density is influenced by selection and drift, here we suggest a dominant role for genetic changes to the activity of NHEJ and HR in generating species-specific differential intron gain/loss mutation rates.


We are grateful to A. McGregor for raising our awareness of the different rates of NHEJ and HR in Drosophila, and thank A. Tucker and J. Heraud for feedback on this manuscript. Many thanks to other members of the Institute of Population Genetics for helpful discussion. This work was supported by grants from the FWF Austrian Science Fund (P19832).


Appendix A

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.tig.2010.10.004.



