Abstract
In many organisms, local deviations from Chargaff’s second parity rule are observed around replication and transcription start sites and within intron sequences. Here, we use expression data as well as a whole-genome data set of nearly 200 haplotypes to investigate such compositional skews in Drosophila melanogaster genes. We find a positive correlation between compositional skew and gene expression, comparable in strength to similar correlations between expression levels and genome-wide sequence features. This correlation is relatively stronger for germline, compared with somatic expression, consistent with the process of transcription-associated mutation bias. We also inferred mutation rates from alleles segregating at low frequencies in short introns, and show that, whereas the overall GC content of short introns does not conform to the equilibrium expectation, the level of the observed deviation from the second parity rule is generally consistent with the inferred rates.
Keywords: Chargaff’s second parity rule, compositional skew, transcription-associated mutation bias, base composition evolution
Introduction
Chargaff’s second parity rule, that is, the equal proportion of complementary nucleotide bases ([A] = [T] and [G] = [C]) along a strand of DNA, holds globally for most double-stranded DNA genomes (Mitchell and Bridge 2006). Nevertheless, local deviations from this rule are common, especially around replication origins and transcription start sites and within introns (Francino and Ochman 1997; Frank and Lobry 1999; Touchon et al. 2004). Compositional “skew” between strands may be introduced by DNA replication and transcription as a consequence of the directionality of DNA and RNA polymerization. Such skews have been regularly used to identify replication origins (oris) and termini in bacteria (Lobry 1996; Mrazek and Karlin 1998; Picardeau et al. 2000; Zawilak et al. 2001). Recent technological advances in nascent strand purification allowed identification of oris in metazoans as well, and revealed similar skews surrounding these regions (Cayrou et al. 2011, 2012; Comoglio et al. 2015). Compositional skew in transcribed regions has also been observed, which has generally been attributed to transcription-associated mutation bias (TAMB) (Green et al. 2003; Touchon et al. 2003, 2004; Mugal et al. 2009; McVicker and Green 2010). TAMB might arise due to conditions differing between strands during transcription as one strand is chemically associated with the transcriptional machinery and the other exposed in the nucleus, which might result in strand-specific mutation or repair processes (Svejstrup 2002; Fong et al. 2013).
Here, we investigate compositional skews associated with transcription in Drosophila melanogaster using developmental and tissue-specific expression data sets (Chintapalli et al. 2007; Vibranovski et al. 2009; Graveley et al. 2011) and sequence data from a large population sample of the ancestral range of the species (Lachaise et al. 1988; Lack et al. 2015). We find that noncoding regions within genes show strand-specific deviations from the second parity rule consistent with TAMB, whereas the overall GC content deviates from mutational equilibrium, as has been shown before (Kern and Begun 2005; Zeng and Charlesworth 2010; Clemente and Vogl 2012; Robinson et al. 2014).
Results
Association between Compositional Skews and Gene Expression
To investigate compositional skews between strands in transcribed regions, we calculated per gene estimates of CG skew (SCG) and TA skew (STA) from coding-strand intron sequences, for the 1,925 autosomal and 478 X-linked genes that passed our data filtering (see Materials and Methods). If transcription helps to shape skews, this should be reflected in correlations between skews and gene expression. We thus examined correlations between skews and gene expression across different D. melanogaster tissues and developmental stages (Chintapalli et al. 2007; Vibranovski et al. 2009; Graveley et al. 2011) for patterns consistent (or inconsistent) with TAMB.
The skew values calculated by concatenating all introns are SCG = 1.18% (95% CI: 0.97–1.37%) and STA = 0.82% (95% CI: 0.66–0.97%). When looking at per gene skew estimates, we find that both SCG and STA are positively, though weakly, correlated with gene expression (fig. 1), consistent with TAMB, and in keeping with the known preference of C and T content on the coding strand of Drosophila introns (Touchon et al. 2004). As expected, the skew parameters are also positively correlated with each other (Spearman’s ρ = 0.064, P = 0.002), as has been observed for humans (Touchon et al. 2003).
Fig. 1. Pearson’s coefficients (with 95% CIs) for the correlations between compositional skew and gene expression across different tissues and developmental stages. (A–C) Correlation of CG skew and gene expression. (D–F) Correlation of TA skew and gene expression. Although 0- to 2-h expression in embryos most likely reflects maternal transcription, which should not necessarily affect germline development, the correlation between maternal expression and later putative zygotic expression (2–4 h) is strong (Spearman’s ρ = 0.937, P < 0.001).
The spatial and temporal patterns of gene expression and skew are also broadly consistent with some effect of transcription. Specifically, only mutations occurring in the germline, or early in development (prior to the differentiation of germline tissues), are inherited and thus affect long-term base composition (Touchon et al. 2003; McVicker and Green 2010). We therefore asked how the strength of the correlation between skew and expression depends on whether expression occurs in the germline and on developmental stage. Indeed, the correlation between skew and expression is stronger for germ cells, in both ovaries and testes, than for somatic expression (fig. 1A and D), as is also observed in humans (McVicker and Green 2010) and in mouse spermatogonia (Arneodo et al. 2011). Further, in a data set consisting of gene expression for three different tissues of the Drosophila testes, the association between skew and expression during early spermatogenesis (in mitotic and meiotic cells) is stronger than that between skew and postmeiotic expression (fig. 1B and E). Similarly, the association between skew and gene expression is stronger for early developmental expression than for later developmental stages (fig. 1C and F). All correlation coefficients are listed in supplementary table S1, Supplementary Material online.
Although these trends are consistent with TAMB, there are a few caveats that require further analysis. The enrichment of C content on the coding strand could, in principle, be due to annotation errors if many annotated introns are in fact protein coding exons, as these tend to be C-rich in Drosophila (Akashi 1994). However, the correlation patterns remain qualitatively similar when excluding introns with lengths that are multiples of three (supplementary fig. S1, Supplementary Material online), which are the most likely to be misannotated exons, as they do not imply frame-shifts. In addition, none of the comparisons based on germline, somatic, or developmental stage expression individually show statistically significant differences (as indicated by the overlapping CIs in fig. 1), though all are stronger in the direction predicted by the TAMB hypothesis. Notably, the correlation between somatic expression and skew is positive (fig. 1A and D), which is not a prediction of the TAMB hypothesis. However, this correlation may be a by-product of the positive correlation between somatic and germline expression. We therefore calculated pairwise partial correlations for skew values and gene expression for ovaries, testes, and soma to estimate the independent effects of each. The results show that there is no relationship between skew values and somatic expression, whereas germline expression remains significantly correlated to skew (supplementary tables S2 and S3, Supplementary Material online). Furthermore, we compare the level of skews between concatenated introns from the 10% most highly and lowly expressed genes in ovaries and testes (table 1). As expected, the 10% most highly expressed genes have significantly higher skew values compared with the 10% most lowly expressed genes, for both ovaries and testes (as indicated by the nonoverlapping CIs in table 1). These results indicate that the level of skew is mainly driven by germline expression.
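The partial-correlation step can be illustrated with a minimal Python sketch. It uses the standard first-order formula, which removes the linear effect of a third variable from a pairwise correlation. Whether the analysis used exactly this estimator is an assumption, and the three input coefficients are hypothetical values chosen for illustration, not results from this study.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    return (r_xy - r_xz * r_yz) / denom

# Hypothetical pairwise coefficients (NOT values from the study).
r_skew_soma = 0.10   # skew vs. somatic expression
r_skew_germ = 0.20   # skew vs. germline expression
r_soma_germ = 0.50   # somatic vs. germline expression

# Skew vs. soma after removing the shared germline component.
skew_soma_given_germ = partial_corr(r_skew_soma, r_skew_germ, r_soma_germ)
# Skew vs. germline after removing the shared somatic component.
skew_germ_given_soma = partial_corr(r_skew_germ, r_skew_soma, r_soma_germ)

print(f"skew~soma | germline: {skew_soma_given_germ:.3f}")
print(f"skew~germline | soma: {skew_germ_given_soma:.3f}")
```

With these illustrative inputs the somatic association disappears once germline expression is controlled for, whereas the germline association largely remains. This is the qualitative pattern one would expect if a raw somatic correlation is a by-product of correlated expression.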
Table 1.
Skew Values with Their Corresponding 95% CIs (in Square Brackets) Calculated from Concatenating Introns in Genes with the 10% Highest and 10% Lowest Expression in Different Germline Tissues (Number of Genes in Each Category is n = 141)
| | Ovary Expression: High | Ovary Expression: Low | Testes Expression: High | Testes Expression: Low |
|---|---|---|---|---|
| SCG (%) | 4.19 | 0.67 | 4.33 | 0.55 |
| 95% CI | [3.24, 5.17] | [−0.04, 1.36] | [3.18, 5.39] | [−0.08, 1.19] |
| STA (%) | 2.31 | −0.45 | 2.58 | −0.16 |
| 95% CI | [1.54, 3.06] | [−0.99, 0.15] | [1.75, 3.39] | [−0.68, 0.35] |
Population Genetic Analysis
To further analyze the likely causes of strand-specific nucleotide composition, we analyzed sites in autosomal short introns (≤65 bp in length) in a sample of 197 chromosomes from the putatively ancestral D. melanogaster population in Zambia (Lack et al. 2015; supplementary table S4, Supplementary Material online). Short introns appear to be the least selectively constrained of all sequence classes (Halligan and Keightley 2006; Parsch et al. 2010; Clemente and Vogl 2012), and thus should most closely reflect mutational processes. We first focus on the sites that are fixed (i.e., monomorphic) in the population sample alignment for one of the four possible nucleotides. The proportion of sites fixed for complementary nucleotides on the coding strand of autosomal short introns differs from a 1:1 ratio (calculated from a total of n = 191,747 sites; supplementary table S4, Supplementary Material online), with an excess of C over G (17.35% C vs. 14.91% G; χ2 = 352.8, d.f. = 1, P < 0.001) and T over A nucleotides (34.49% T vs. 33.24% A; χ2 = 43.871, d.f. = 1, P < 0.001). The resulting skews in short introns are SCG = 7.55% (95% CI: 6.78–8.34%) and STA = 1.84% (95% CI: 1.35–2.41%). A similar pattern holds for all autosomal introns (n = 9,894,445 sites; 30.04% A, 19.87% C, 19.25% G, 30.84% T; supplementary table S4, Supplementary Material online), though the skew is weaker than for short introns (SCG = 1.55% [95% CI: 1.46–1.65%] and STA = 1.32% [95% CI: 1.24–1.40%]), probably because the sequence composition of long introns is more selectively constrained (Haddrill et al. 2005). Notably, the G:C ratio in short introns (≈0.86) is more extreme (χ2 = 202.03, d.f. = 1, P < 0.001) than the A:T ratio (≈0.96).
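The skew values and the 1:1 tests for fixed sites can be checked with a short Python sketch. The counts below are not given directly in the text. They are back-calculated from table 2 as Mi − Fij (for example, 66,885 − 753 for T), which is an assumption about the counts behind the reported percentages; they do sum to n = 191,747. The test shown is a plain goodness-of-fit chi-square against equal counts, which may or may not be the exact procedure used.

```python
import math

# Fixed-site counts on the coding strand of autosomal short introns,
# back-calculated from table 2 as M_i - F_ij (assumption, see lead-in).
fixed = {"A": 63745, "C": 33271, "G": 28599, "T": 66132}

def skew(x, y):
    """Compositional skew (x - y) / (x + y) for two complementary bases."""
    return (x - y) / (x + y)

def chi2_equal(x, y):
    """1-d.f. chi-square statistic and P value for counts x, y against 1:1."""
    stat = (x - y) ** 2 / (x + y)
    # Upper tail of a chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

s_cg = skew(fixed["C"], fixed["G"])
s_ta = skew(fixed["T"], fixed["A"])
chi_cg, p_cg = chi2_equal(fixed["C"], fixed["G"])
chi_ta, p_ta = chi2_equal(fixed["T"], fixed["A"])

print(f"S_CG = {100 * s_cg:.2f}%, chi2 = {chi_cg:.1f}, P = {p_cg:.3g}")
print(f"S_TA = {100 * s_ta:.2f}%, chi2 = {chi_ta:.3f}, P = {p_ta:.3g}")
```

Under these assumed counts the sketch gives skews and test statistics that agree with the values reported in the paragraph above.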
As an alternative to mutation, strand-specific selection could explain strand-specific skews. In particular, selection to avoid the canonical GT and AG splicing signals within intron sequences (Farlow et al. 2012) might lead to an excess of C content on the coding strand compared with the noncoding strand (though note that short introns are AT-rich overall). To test whether mutation alone is sufficient to explain the observed compositional patterns, we estimated mutation rates from singleton frequencies of the autosomal short introns, that is, from sites in the sample alignment that contain a single copy of the minor allele variant. The relatively young age of these low-frequency mutations makes it unlikely that their composition has been affected by directional selection (Kimura and Ohta 1973; Messer 2009); instead it should be predominantly influenced by mutation. The mutation rates estimated from singleton frequencies (table 2) agree with previous estimates (supplementary fig. S2, Supplementary Material online). These rates indicate that any mutational asymmetry between coding and noncoding strands is weak, similar to previous findings (Zeng 2010). Furthermore, we find no evidence for mutation-associated compositional skews when directly comparing singleton frequencies between the coding and noncoding strands for each of the complementary nucleotide pairs (supplementary table S5, Supplementary Material online). When we analyze the estimated strand-specific mutation rates overall, however, they do imply C- and T-biased composition on the coding strand: the G-to-C and A-to-T rates are 1.1 and 1.08 times higher than their corresponding reverse mutation rate estimates. Given the point estimates of the mutation rates from table 2, the equilibrium frequencies of fixed sites on the coding strand are (πA, πC, πG, πT) = (0.3653, 0.1245, 0.1231, 0.3871), resulting in equilibrium skews of SCG = 0.57% and STA = 2.90%.
The resulting expected A:T ratio based on mutation rates deviates from a 1:1 ratio (χ2 = 121.11, d.f. = 1, P < 0.001) in the direction of the observed T over A excess, but more extremely (with an expected T:A ratio of 1.060 vs. an observed T:A ratio of 1.037; χ2 = 14.592, d.f. = 1, P < 0.001). The expected G:C ratio is in the same direction as the observed C-bias for the coding strand, but does not differ significantly from a 1:1 ratio (χ2 = 1.518, d.f. = 1, P = 0.218). Importantly, these results have been obtained using only point estimates, when in reality there is uncertainty in the estimates. We therefore used a parameter search algorithm (see Materials and Methods) and asked whether there are combinations of mutation rates, within the 95% CIs of these estimates, that are consistent with the data. Specifically, we asked if mutation rates can explain both the skew and overall base composition. The results show that the estimated mutation rates can explain the levels of CG and TA skew observed in short intron sequences (fig. 2A and B). However, the combinations of mutation rates that give rise to the observed levels of skew cannot explain the overall base composition in short introns (fig. 2C and D): the observed GC content is too high to be consistent with these rates, that is, the GC content is not in mutational equilibrium, as noted by previous studies (Kern and Begun 2005; Zeng and Charlesworth 2010; Clemente and Vogl 2012; Robinson et al. 2014).
Table 2.
Mutation Rates qij from Nucleotide i to j with the Corresponding 95% CIs, Estimated from the Coding Strand of Autosomal Short Introns
| i → j | qij (Fij/Mi) | qij 95% CI |
|---|---|---|
| A → C | 0.0048 (308/64,053) | 0.0043–0.0053 |
| A → G | 0.0120 (776/64,521) | 0.0112–0.0128 |
| A → T | 0.0110 (709/64,454) | 0.0102–0.0118 |
| C → A | 0.0181 (615/33,886) | 0.0167–0.0195 |
| C → G | 0.0080 (270/33,541) | 0.0071–0.0089 |
| C → T | 0.0293 (1,008/34,359) | 0.0276–0.0310 |
| G → A | 0.0324 (957/29,556) | 0.0304–0.0344 |
| G → C | 0.0089 (256/28,885) | 0.0079–0.0099 |
| G → T | 0.0175 (508/29,107) | 0.0161–0.0190 |
| T → A | 0.0100 (667/66,799) | 0.0093–0.0107 |
| T → C | 0.0113 (753/66,885) | 0.0105–0.0121 |
| T → G | 0.0047 (314/66,446) | 0.0042–0.0052 |
Note: Fij is the frequency of singletons of type j with major allele i, and Mi is the sum of the frequency of sites fixed for nucleotide i and Fij.
Fig. 2. Distributions of skew estimates and nucleotide content obtained from 10,000 independent parameter search runs, conditional on the observed compositional skew in autosomal short introns and the 95% CIs of mutation rates in table 2. (A and B) Distributions of CG and TA skew, respectively; the red dashed line is the observed skew level. (C) The distribution of G (red) and C (black) content; the dashed lines are the observed values. (D) The distribution of A (red) and T (black) content; the dashed lines are the observed values.
Finally, we used a generalized linear model (GLM) to analyze the association between mutation rates and gene expression. Specifically, we analyzed the effect of expression on the frequency of singleton mutations from nucleotide i to j on the coding strand, using a GLM with a binomial response variable consisting of successes (singletons of type j at sites with major allele i) and failures (sites fixed for nucleotide i); the expression estimates from ovaries, testes, or soma (Chintapalli et al. 2007) were taken as explanatory variables. We first analyzed singletons in short introns only (supplementary table S6, Supplementary Material online) and then, to increase statistical power, repeated the analysis using singletons in all introns (supplementary table S7, Supplementary Material online). As singletons are unlikely to be affected by selection (Kimura and Ohta 1973; Messer 2009), restricting this analysis to putatively neutral short introns may unnecessarily limit power. The results show that, regardless of which data set is used, the correlations are negative with few exceptions, suggesting a possible role of transcription-coupled repair in reducing overall mutation rates (Svejstrup 2002; Fong et al. 2013). In cases where the GLM analyses indicate expression as a significant predictor of mutation rates, the associated coefficient is usually negative, implying that transcription is not mutagenic overall. Nonetheless, correlation coefficients associated with C or T singletons tend to be less negative than those associated with G or A singletons (supplementary fig. S3, Supplementary Material online), implying that mutation rates change with expression in a manner consistent with the observed directions of compositional skews.
Discussion
In eukaryotes, transcription appears to drive asymmetries in the frequencies of complementary nucleotides between the coding and noncoding strands of transcribed regions (Touchon et al. 2003, 2004; McVicker and Green 2010). Generally, T is more abundant than its complement A on the coding strand, whereas either G or C content is observed at higher frequencies in vertebrates or invertebrates, respectively (Touchon et al. 2004).
In D. melanogaster, we find that gene expression in different tissues and across development correlates with compositional skew in a manner consistent with TAMB (fig. 1). However, these correlations are weak and explain only a small proportion of the variance in skew levels between genes. The TAMB signal is likely weak in part because base composition in Drosophila introns changes with sequence length and is affected by both purifying and positive selection (Andolfatto 2005; Haddrill et al. 2005; Singh et al. 2005; Halligan and Keightley 2006; Haddrill and Charlesworth 2008). Nevertheless, weak genome-wide correlations can shed light on molecular processes shaping nucleotide base composition over evolutionary time: for example, the relationship between intronic GC content and recombination is similarly weak, but probably reflects the action of GC-biased gene conversion, now a well-established phenomenon of eukaryote genome evolution (Pessia et al. 2012).
Materials and Methods
Data Used in the Analyses
Expression data were taken from Chintapalli et al. (2007), Vibranovski et al. (2009), and Graveley et al. (2011). The raw expression estimates were transformed with log2(value + 1). For RNA-seq data, these values are FPKM values; for the microarray analyses, they are relative fluorescence intensities. Per gene expression values for soma and later developmental stages were calculated as averages across the nongermline and late developmental stage expression values, respectively. Replication start sites (RSS) were determined as peaks of the nascent strand signal or as a site of maximum coverage within a given ori region as identified in Cayrou et al. (2011) and Comoglio et al. (2015), respectively. We further analyzed a sample of the Zambian D. melanogaster population (Lack et al. 2015). In total, the data set consists of 197 sequences for each autosome and 196 sequences for the X chromosome. Sequences were annotated using the reference genome annotation of D. melanogaster (r5.57 from http://www.flybase.org/; last accessed March 10, 2017). For statistical analyses, R (R Core Team 2014) was used.
Calculation of the Skew Parameters
The skew parameters (SCG and STA) were calculated for each gene using the D. melanogaster reference sequence (r5.57 from http://www.flybase.org/; last accessed March 10, 2017). All intron sequences of the longest transcript of a gene were concatenated and estimates of skews per gene were calculated as: SCG = (C−G)/(C + G) and STA = (T−A)/(T + A). Additionally, seven bases were trimmed from the 5′ end and 35 bases from the 3′ end of each intron to exclude genomic regions where the nucleotide composition is affected by the presence of splicing sites (supplementary fig. S4, Supplementary Material online). Furthermore, genes overlapping regions ±500 bp around RSS were excluded from the analysis. Only genes containing ≥100 bp of concatenated intron sequence were considered. This filtering procedure left us with 1,925 autosomal and 478 X-linked genes available for analysis. The 95% CIs for each of the skew parameters were estimated from 1,000 bootstrap-resamples. Each resample consisted of the number of observations equal to the number of sites used to calculate the original skew parameter, and the probabilities of drawing a specific nucleotide equal to the observed relative frequencies of nucleotides.
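The trimming, skew calculation, and bootstrap described above can be sketched in Python as follows. This is a minimal illustration, not the authors' script: the function names, the toy intron, and the base counts are hypothetical, and the percentile method for the 95% CI is an assumption, since the text does not state how the interval was read off the resamples.

```python
import random
from collections import Counter

def trim_intron(seq, five=7, three=35):
    """Drop 7 bases from the 5' end and 35 from the 3' end of an intron."""
    return seq[five:len(seq) - three]

def skews(counts):
    """Return (S_CG, S_TA) from a mapping of base counts."""
    s_cg = (counts["C"] - counts["G"]) / (counts["C"] + counts["G"])
    s_ta = (counts["T"] - counts["A"]) / (counts["T"] + counts["A"])
    return s_cg, s_ta

def bootstrap_ci(counts, n_boot=1000, seed=1):
    """Percentile 95% CIs from resamples drawn with the observed base
    frequencies as probabilities, each of the same size as the data."""
    rng = random.Random(seed)
    bases = "ACGT"
    weights = [counts[b] for b in bases]
    n_sites = sum(weights)
    cg_values, ta_values = [], []
    for _ in range(n_boot):
        resample = Counter(rng.choices(bases, weights=weights, k=n_sites))
        s_cg, s_ta = skews(resample)
        cg_values.append(s_cg)
        ta_values.append(s_ta)
    cg_values.sort()
    ta_values.sort()
    lo, hi = int(0.025 * n_boot), int(0.975 * n_boot) - 1
    return (cg_values[lo], cg_values[hi]), (ta_values[lo], ta_values[hi])

# Hypothetical toy intron: 7 + 40 + 35 bases; only the middle 40 are kept.
toy_intron = "GTAAGTA" + "CCTT" * 10 + "T" * 33 + "AG"
kept = trim_intron(toy_intron)

# Hypothetical base counts for one gene's concatenated, trimmed introns.
gene_counts = {"A": 600, "C": 450, "G": 350, "T": 600}
point_cg, point_ta = skews(gene_counts)
ci_cg, ci_ta = bootstrap_ci(gene_counts)
print(point_cg, ci_cg, point_ta, ci_ta)
```

Resampling from the observed frequencies rather than from positions treats sites as independent draws, which matches the description of the resampling scheme but ignores any dependence between neighboring sites.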
Inference of Site Frequency Spectra and Mutation Rates
Site frequency spectra were inferred from the Zambian D. melanogaster sample (Lack et al. 2015) for all six possible pairwise combinations of nucleotides, for both autosomal short introns (≤65 bp in length; Halligan and Keightley 2006; Parsch et al. 2010; Clemente and Vogl 2012) and all introns (supplementary table S4, Supplementary Material online). Using custom Python scripts, we filtered out sites that overlapped coding sequences or contained an undefined nucleotide state in at least one of the sequences in the sample alignment. Furthermore, only sites belonging to the longest transcript of a gene were considered, and sites with more than two alleles were filtered out. Intron sequences were trimmed as described previously. Mutation rates were calculated from autosomal short intron sequences as qij = Fij/Mi, where qij indicates the mutation rate from nucleotide i to j, Fij is the frequency of singletons of type j with major allele i, and Mi is the sum of the frequency of sites fixed for nucleotide i and Fij. The CIs for the mutation rate estimates were determined by assuming binomial probabilities with the number of successes x = Fij and the number of corresponding observations n = Mi.
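The rate estimate qij = Fij/Mi and its binomial CI can be reproduced for a single entry of table 2 with a few lines of Python. The text states only that binomial probabilities were assumed. The normal (Wald) approximation used here is therefore an assumption, and other binomial intervals would give slightly different bounds.

```python
import math

def mutation_rate(f_ij, m_i, z=1.959964):
    """Return q_ij = F_ij / M_i with a 95% Wald (normal-approximation) CI.

    f_ij: number of singletons of type j at sites with major allele i
    m_i:  sites fixed for nucleotide i plus f_ij
    """
    q = f_ij / m_i
    half_width = z * math.sqrt(q * (1.0 - q) / m_i)
    return q, q - half_width, q + half_width

# A -> C entry from table 2: F_ij = 308, M_i = 64,053.
q_ac, lo_ac, hi_ac = mutation_rate(308, 64053)
print(f"q = {q_ac:.4f}, 95% CI: {lo_ac:.4f}-{hi_ac:.4f}")
```

For this entry the approximation agrees with the tabulated rate and interval to four decimal places.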
Analysis of Skew Level and Mutation Rate Estimates
We applied a parameter optimization algorithm to search for combinations of mutation rates, within their respective 95% CIs (table 2), which would recapitulate the observed levels of skew. To this end, we utilized the Sequential Least SQuares Programming (SLSQP) method as implemented in the Python library “scipy” (Jones et al. 2001). The parameters for each optimization run were randomly initialized within the 95% CIs of the inferred mutation rates.
GLM Analysis
The GLM analysis was conducted using the “glm” function in R (R Core Team 2014) with the response variable following a binomial distribution and the default logit link function. The response variable was given as a two-column matrix where the first column contained the number of singletons of a specific type (“successes”), whereas the second contained the number of corresponding fixed sites (“failures”). These frequencies were estimated per gene. The explanatory variables were gene expression estimates for either ovaries, testes, or soma provided in Chintapalli et al. (2007).
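To make the model structure concrete, the following Python sketch fits a binomial GLM with a logit link and one explanatory variable by Newton-Raphson. It is an illustration only: the analysis itself was run with R's "glm" (the exact call is not given in the text; a two-column response of this kind is typically written as `glm(cbind(successes, failures) ~ expression, family = binomial)`), and all counts and expression values below are hypothetical.

```python
import math

def fit_binomial_logit(successes, failures, x, max_iter=50, tol=1e-10):
    """Maximum-likelihood fit of logit(p) = b0 + b1 * x for binomial counts.

    Assumes x takes at least two distinct values; otherwise the Hessian
    below is singular.
    """
    # Start from the pooled log-odds with no expression effect.
    b0 = math.log(sum(successes) / sum(failures))
    b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for s, f, xi in zip(successes, failures, x):
            n = s + f
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            resid = s - n * p          # score contribution
            w = n * p * (1.0 - p)      # Fisher information weight
            g0 += resid
            g1 += resid * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical per-gene data: singleton counts, fixed-site counts, and
# log2-transformed expression (none of these are values from the study).
singletons = [12, 9, 7, 4]
fixed_sites = [988, 991, 993, 996]
expression = [0.5, 2.0, 4.0, 8.0]
intercept, slope = fit_binomial_logit(singletons, fixed_sites, expression)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

A negative slope in such a fit corresponds to fewer singletons per site in more highly expressed genes, which is the direction reported for most mutation types.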
Supplementary Material
Supplementary data are available at Genome Biology and Evolution online.
Acknowledgments
The authors thank all members of the Institute of Population Genetics for support and discussion. We also thank two anonymous reviewers whose suggestions helped to improve the article. The work was funded by the Austrian Science Fund (FWF; W1225-B20).
Literature Cited
- Akashi H. 1994. Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy. Genetics 136(3):927–935.
- Andolfatto P. 2005. Adaptive evolution of non-coding DNA in Drosophila. Nature 437(7062):1149–1152.
- Arneodo A, et al. 2011. Multi-scale coding of genomic information: from DNA sequence to genome structure and function. Phys Rep. 498(2-3):45–188.
- Cayrou C, et al. 2011. Genome-scale analysis of metazoan replication origins reveals their organization in specific but flexible sites defined by conserved features. Genome Res. 21(9):1438–1449.
- Cayrou C, et al. 2012. New insights into replication origin characteristics in metazoans. Cell Cycle 11(4):658–667.
- Chintapalli V, Wang J, Dow J. 2007. Using FlyAtlas to identify better Drosophila melanogaster models of human disease. Nat Genet. 39(6):715–720.
- Clemente F, Vogl C. 2012. Unconstrained evolution in short introns?-An analysis of genome-wide polymorphism and divergence data from Drosophila. J Evol Biol. 25(10):1975–1990.
- Comoglio F, et al. 2015. High-resolution profiling of Drosophila replication start sites reveals a DNA shape and chromatin signature of metazoan origins. Cell Rep. 11(5):821–834.
- Farlow A, Dolezal M, Hua L, Schlötterer C. 2012. The genomic signature of splicing-coupled selection differs between long and short introns. Mol Biol Evol. 29(1):21–24.
- Fong YW, Cattoglio C, Tjian R. 2013. The intertwined roles of transcription and repair proteins. Mol Cell 52(3):291–302.
- Francino PM, Ochman H. 1997. Strand asymmetries in DNA evolution. Trends Genet. 13(6):240–245.
- Frank AC, Lobry JR. 1999. Asymmetric substitution patterns: a review of possible underlying mutational or selective mechanisms. Gene 238(1):65–77.
- Graveley BR, et al. 2011. The developmental transcriptome of Drosophila melanogaster. Nature 471(7339):473–479.
- Green P, Ewing B, Miller W, Thomas PJ, Green ED. 2003. Transcription-associated mutational asymmetry in mammalian evolution. Nat Genet. 33(4):514–517.
- Haddrill PR, Charlesworth B, Halligan DL, Andolfatto P. 2005. Patterns of intron sequence evolution in Drosophila are dependent upon length and GC content. Genome Biol. 6(8):R67.
- Haddrill PR, Charlesworth B. 2008. Non-neutral processes drive the nucleotide composition of non-coding sequences in Drosophila. Biol Lett. 4(4):438–441.
- Halligan DL, Keightley PD. 2006. Ubiquitous selective constraints in the Drosophila genome revealed by a genome-wide interspecies comparison. Genome Res. 16(7):875–884.
- Jones E, Oliphant T, Peterson P. 2001. SciPy: open source scientific tools for python. Available from: http://www.scipy.org/, last accessed November 30, 2016.
- Kern AD, Begun DJ. 2005. Patterns of polymorphism and divergence from noncoding sequences of Drosophila melanogaster and D. simulans: evidence for nonequilibrium processes. Mol Biol Evol. 22(1):51–62.
- Kimura M, Ohta T. 1973. The age of a neutral mutant persisting in a finite population. Genetics 75(1):199–212.
- Lachaise D, et al. 1988. Historical biogeography of the Drosophila melanogaster species subgroup. Evol Biol. 22:159–225.
- Lack JB, et al. 2015. The Drosophila genome nexus: a population genomic resource of 623 Drosophila melanogaster genomes, including 197 from a single ancestral range population. Genetics 199(4):1229–1241.
- Lobry J. 1996. Asymmetric substitution patterns in the two DNA strands of bacteria. Mol Biol Evol. 13(5):660–665.
- McVicker G, Green P. 2010. Genomic signatures of germline gene expression. Genome Res. 20(11):1503–1511.
- Messer PW. 2009. Measuring the rates of spontaneous mutation from deep and large-scale polymorphism data. Genetics 182(4):1219–1232.
- Mitchell D, Bridge R. 2006. A test of Chargaff’s second rule. Biochem Biophys Res Commun. 340(1):90–94.
- Mrazek J, Karlin S. 1998. Strand compositional asymmetry in bacterial and large viral genomes. Proc Natl Acad Sci U S A. 95(7):3720–3725.
- Mugal CF, von Grünberg HH, Peifer M. 2009. Transcription-induced mutational strand bias and its effect on substitution rates in human genes. Mol Biol Evol. 26(1):131–142.
- Parsch J, Novozhilov S, Saminadin-Peter S, Wong K, Andolfatto P. 2010. On the utility of short intron sequences as a reference for the detection of positive and negative selection in Drosophila. Mol Biol Evol. 27(6):1226–1234.
- Pessia E, et al. 2012. Evidence for widespread GC-biased gene conversion in eukaryotes. Genome Biol Evol. 4(7):675–682.
- Picardeau M, Lobry JR, Hinnebusch BJ. 2000. Analyzing DNA strand compositional asymmetry to identify candidate replication origins of Borrelia burgdorferi linear and circular plasmids. Genome Res. 10(10):1594–1604.
- R Core Team. 2014. R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing.
- Robinson MC, Stone EA, Singh ND. 2014. Population genomic analysis reveals no evidence for GC-biased gene conversion in Drosophila melanogaster. Mol Biol Evol. 31(2):425–433.
- Singh ND, Davis JC, Petrov DA. 2005. Codon bias and noncoding GC content correlate negatively with recombination rate on the Drosophila X chromosome. J Mol Evol. 61(3):315–324.
- Svejstrup JQ. 2002. Mechanisms of transcription-coupled DNA repair. Nat Rev Mol Cell Biol. 3(1):21–29.
- Touchon M, Arneodo A, d’Aubenton-Carafa Y, Thermes C. 2004. Transcription-coupled and splicing-coupled strand asymmetries in eukaryotic genomes. Nucleic Acids Res. 32(17):4969–4978.
- Touchon M, Nicolay S, Arneodo A, d’Aubenton-Carafa Y, Thermes C. 2003. Transcription-coupled TA and GC strand asymmetries in the human genome. FEBS Lett. 555(3):579–582.
- Vibranovski MD, Lopes HF, Karr TL, Long M, Malik HS. 2009. Stage-specific expression profiling of Drosophila spermatogenesis suggests that meiotic sex chromosome inactivation drives genomic relocation of testis-expressed genes. PLoS Genet. 5(11):e1000731.
- Zawilak A, et al. 2001. Identification of a putative chromosomal replication origin from Helicobacter pylori and its interaction with the initiator protein DnaA. Nucleic Acids Res. 29(11):2251–2259.
- Zeng K, Charlesworth B. 2010. Studying patterns of recent evolution at synonymous sites and intronic sites in Drosophila melanogaster. J Mol Evol. 70(1):116–128.
- Zeng K. 2010. A simple multiallele model and its application to preferred-unpreferred codons using polymorphism data. Mol Biol Evol. 27(6):1327–1337.