Abstract
Rates of spontaneous mutation determine the ability of viruses to evolve, infect new hosts, evade immunity and develop drug resistance. In contrast to RNA viruses, few mutation rate estimates have been obtained for DNA viruses, because their high replication fidelity implies that new mutations typically fall below the detection limits of Sanger and standard next-generation sequencing. Here, we have used a recently developed high-fidelity deep sequencing technique (Duplex Sequencing) to score spontaneous mutations in human adenovirus 5 under conditions of minimal selection. Based on >200 single-base spontaneous mutations detected throughout the entire viral genome, we infer an average mutation rate of 1.3 × 10⁻⁷ per base per cell infection cycle. This value is similar to those of other large double-stranded DNA viruses, but an order of magnitude lower than those of single-stranded DNA viruses, consistent with the possible action of post-replicative repair. Although the mutation rate did not vary strongly along the adenovirus genome, we found several sources of mutation rate heterogeneity. First, two regions mapping to transcription units L3 and E1B-IVa2 were significantly depleted for mutations. Second, several point insertions/deletions located within low-complexity sequence contexts appeared recurrently, suggesting mutational hotspots. Third, mutation probability increased at GpC dinucleotides. Our findings suggest that host factors may influence the distribution of spontaneous mutations in human adenoviruses and potentially other nuclear DNA viruses.
Author Summary
Next-generation sequencing has provided a powerful tool for studying microbial genetic diversity but suffers from relatively low per-base accuracy, limiting our ability to detect low-frequency polymorphisms and spontaneous mutations. However, this limitation has recently been overcome by the development of high-fidelity deep sequencing techniques. Taking advantage of these advances, here we provide the first unbiased genome-wide characterization of the rate of spontaneous mutation of a human DNA virus (adenovirus 5) under controlled laboratory conditions. The adenovirus genome shows a relatively low mutation rate, consistent with high replication fidelity and the action of post-replicative repair. We also found evidence for mutation rate heterogeneities and regions of genetic instability in the viral genome. Together with previous reports, our findings indicate that DNA viruses with large double-stranded genomes mutate significantly more slowly than those with small single-stranded genomes.
Introduction
DNA viruses have traditionally been viewed as slowly evolving entities, but this notion has been challenged in the last decade after the discovery of several highly diverse and fast-evolving DNA viruses [1–6]. The pace of evolution should depend on the rate at which new spontaneous mutations are produced, yet it is currently accepted that DNA virus mutation rates are typically much lower than those of RNA viruses [7]. However, as opposed to RNA viruses, few mutation rate estimates have been obtained for DNA viruses, which include four bacteriophages (φX174, M13, λ, and T4), herpes simplex virus, and human cytomegalovirus [7–11]. Moreover, these estimates were derived from indirect, phenotype-based methods of mutation detection, used very small portions of the viral genome, or suffered from bias due to selection acting on population mutation frequencies. Therefore, we currently lack an unbiased, genome-wide view of how spontaneous mutations are produced in DNA viruses. Although next-generation sequencing (NGS) has made it possible to analyze genetic variation in full-length DNA virus genomes in unprecedented detail, its relatively low per-read accuracy has prevented detection of rare variants, including new spontaneous mutations. This problem has been solved by recently developed methods that increase the accuracy of NGS by orders of magnitude [12,13], now permitting an in-depth characterization of DNA virus spontaneous mutation rates.
Adenoviruses are non-enveloped, icosahedral viruses with double-stranded linear DNA genomes of 26–45 kbp. They infect a broad range of vertebrates, and over 60 serotypes of human adenoviruses have been identified and grouped into seven species (A–G) [14]. Human adenoviruses can cause a wide variety of diseases including eye, gut and respiratory infections that are typically not clinically relevant for healthy individuals, yet potentially life-threatening for immuno-compromised patients [15]. The adenovirus genome is flanked by inverted terminal repeats (ITR) containing the origins of replication and encodes early-expressed proteins required for DNA replication (transcription units E1A, E1B, E2, E3 and E4), late structural proteins (transcription units L1, L2, L3, L4 and L5), and intermediate-expressed proteins (IX and IVa2). Transcription units can be oriented in either direction and undergo extensive alternative splicing to yield diverse mRNA products, referred to as genes. Adenoviruses constitute an excellent model for studying the evolution of DNA viruses due to their high prevalence, broad tropism, tractability, and relatively simple genomes. Sequencing of isolates from patients has revealed peaks of genetic diversity at some regions of the hexon, fiber, and penton base genes and of E3, which encode exposed domains of the capsid required for host cell binding and entry, or proteins involved in virus-host interactions and immune evasion [16,17]. Some of these hypervariable regions are currently being used for genotyping adenovirus isolates by PCR and Sanger sequencing, as well as by NGS [18].
Here, we have used high-fidelity NGS to analyze the genome-wide rate of spontaneous mutation of human adenovirus C5 (HAdv5) under controlled laboratory conditions. After an endpoint dilution step to remove pre-existing diversity and short-term culturing to minimize the effects of natural selection, we sequenced >98% of the viral genome with >1000-fold coverage and extremely high accuracy. This allowed us to identify >200 spontaneous mutations produced at different positions of the viral genome. The estimated rate of spontaneous mutation was 1.3 × 10⁻⁷ per site per cell infection cycle, and the mutational spectrum was dominated by G-to-A and C-to-T base transitions. We found that different transcription units mutated at roughly similar rates, indicating that hypervariable regions originate mainly by selection and not by mutational hotspots. However, we found several sources of mutation rate heterogeneity. First, GpC dinucleotides showed increased mutation probability. Second, we identified two 5 kbp regions approximately mapping to E1B, IX, IVa2 and L3 which showed a significant reduction in the number of accumulated mutations. Third, several individual genome sites located within low-complexity sequence contexts exhibited the same mutations recurrently in independently replicating viruses, suggesting the presence of mutational hotspots. Some of these mutation rate heterogeneities correlated with changes in diversity in publicly available patient-derived sequences.
Results
Detection of HAdv5 mutations by high-fidelity deep sequencing
HAdv5 was subjected to three serial end-point dilution steps in HeLa cells and then re-amplified by two serial transfers in liquid culture at high multiplicity of infection (MOI) to obtain sufficient viral genome copies to carry out DNA extraction and NGS without PCR amplification, such that we could avoid PCR-driven sequencing errors (Fig 1).
The endpoint dilution steps ensured that viral growth was initiated from a single infectious unit. Since, after three serial endpoint dilutions, all pre-existing genetic diversity should have been removed, variants observed in the sequenced populations should correspond to newly produced, spontaneous mutations. HAdv5 DNA was purified from the cytoplasm of infected cells to avoid carrying over large amounts of nuclear cellular DNA, and directly subjected to Duplex Sequencing (DS) using the Illumina platform. DS relies on template tagging and strand complementarity to increase base call accuracy by orders of magnitude compared to conventional NGS [13,19]. For each of three independent biological replicates, 99% of the HAdv5 genome was sequenced with an average coverage >2500 (Fig 2A). We found 68, 78, and 62 different single-base substitutions in 93.2, 115.7, and 123.7 Mbp sequenced, respectively, yielding an average per-base mutation frequency of (6.4 ± 0.7) × 10⁻⁷ (Table 1).
Table 1. Spontaneous mutational spectrum of HAdv5.
Mutation | Replicate 1 | Replicate 2 | Replicate 3 | Total % |
---|---|---|---|---|
G→A | 11 | 22 | 13 | 22.1 |
C→T | 13 | 13 | 17 | 20.7 |
A→G | 9 | 17 | 6 | 15.4 |
T→C | 6 | 5 | 7 | 8.7 |
G→T | 8 | 2 | 1 | 5.3 |
C→A | 8 | 5 | 4 | 8.2 |
T→G | 2 | 3 | 3 | 3.8 |
A→C | 2 | 2 | 3 | 3.4 |
G→C | 2 | 4 | 3 | 4.3 |
C→G | 2 | 2 | 1 | 2.4 |
A→T | 4 | 1 | 1 | 2.9 |
T→A | 1 | 2 | 3 | 2.9 |
Total | 68 | 78 | 62 | 100.0 |
Mbp sequenced | 93.2 | 115.7 | 123.7 | - |
Mutation frequency (× 10⁶) | 0.73 | 0.67 | 0.50 | - |
These mutations were present at frequencies below 1% in sequence reads (S1 Table). Hence, it is highly unlikely that these were pre-existing polymorphisms carried forward from the initial population, because any variant that may have survived the three endpoint dilutions should have reached very high population frequencies after such drastic bottlenecks. To test whether the observed variants were sequencing errors, we performed DS of a purified E. coli plasmid, pUC18, which should exhibit low diversity given the high replication fidelity of bacteria. This yielded three single-nucleotide substitutions in 21.0 Mbp sequenced, which implies a maximal per-base sequencing error rate of 1.4 × 10⁻⁷ assuming that the plasmid contained no diversity. Thus, approximately 80% of the observed variants should correspond to real mutations. Based on the estimated sequencing error, the net mutation frequency was (6.4 − 1.4) × 10⁻⁷ = 5.0 × 10⁻⁷. In addition to single-nucleotide substitutions, the HAdv5 sequences contained 70 point insertions and deletions (S1 Table), but these types of mutations were present in the control plasmid at similar frequencies, indicating that most should be sequencing errors. Therefore, we did not use insertions and deletions for mutation rate estimation. To further check the nucleotide substitutions detected by DS, we set up qPCR assays aimed at selectively amplifying wild-type and mutant alleles at four genome sites. For mutations detected by DS, the Ct values obtained in qPCRs designed to amplify the mutant allele were delayed by approximately 6 to 12 cycles compared to those in which the wild-type sequence was amplified. In contrast, for mutations not detected by DS, this difference increased to 12–20 cycles (S1 Fig). This further supports the conclusion that most DS mutations were real.
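The arithmetic behind the mutation frequency and the error correction can be retraced with a short script. This is a minimal sketch, not the authors' analysis code: it takes the counts and sequencing depths from Table 1 and the pUC18 control, and it assumes that the "±" in the reported average is a standard error of the mean across the three replicates.

```python
# Counts and sequencing depth per replicate, copied from Table 1.
substitutions = [68, 78, 62]
megabases = [93.2, 115.7, 123.7]


def mean_and_sem(values):
    """Mean and standard error of the mean (assumed meaning of the '±')."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, (variance / n) ** 0.5


# Per-base mutation frequency in each replicate.
freqs = [k / (mb * 1e6) for k, mb in zip(substitutions, megabases)]
mean_freq, sem_freq = mean_and_sem(freqs)

# pUC18 control: 3 substitutions in 21.0 Mbp, treated as an upper bound
# on the per-base sequencing error rate.
error_rate = 3 / 21.0e6

# Error-corrected frequency and the share of variants that would be real.
net_freq = mean_freq - error_rate
fraction_real = 1 - error_rate / mean_freq

print(f"mean frequency : {mean_freq:.2e} +/- {sem_freq:.1e}")
print(f"error rate     : {error_rate:.2e}")
print(f"net frequency  : {net_freq:.2e}")
print(f"fraction real  : {fraction_real:.2f}")
```

With unrounded inputs the net frequency comes out slightly below the 5.0 × 10⁻⁷ obtained from the rounded values 6.4 and 1.4, and the share of real variants is just under 80%.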
HAdv5 spectrum and rate of spontaneous mutations
G→A transitions and the reverse complementary C→T transitions were the most abundant type of mutations detected by DS (43%), followed by A→G/T→C (24%), whereas transversions were 2.0 times less abundant than transitions (Table 1). We found that the nearly twofold excess of G→A and C→T changes over A→G/T→C was accounted for by a higher mutation probability at GpC motifs. Specifically, of the 3167 such dinucleotides in the HAdv5 genome, 50 contained G→A or C→T mutations, whereas the 13,380 G and C bases that were not part of GpC motifs showed only 39 mutations, revealing a 2.7-fold excess probability of G→A/C→T mutation at GpC dinucleotides (Fisher test: P < 0.001). In contrast, CpG motifs only showed a 1.3-fold excess C→T/G→A mutation probability compared to other G and C bases. The observed mutational pattern at GpC motifs is unlikely to be explained by selection, because HAdv5 was passaged only twice after the bottleneck, minimizing the ability of selection to purge spontaneous mutations. Furthermore, these passages were done at high MOI, which reduces the efficacy of selection [20–23]. The lack of significant selection was further supported by analysis of synonymous and non-synonymous variation. Based on the observed mutational spectrum, we expected 71.5% of base substitutions at coding regions to be non-synonymous under a neutral model. The observed fraction was 74.5%, a value that did not deviate significantly from this expectation (dN/dS = 1.16, chi-square test: P = 0.327). In the absence of selection, the observed mutation frequency should equal the mutation rate per cell infection cycle times the number of infection cycles elapsed [7]. Each of the two high-MOI transfers allowed for one infection cycle and, using our estimate of the HAdv5 burst size (ca. 10⁴ infectious units per cell), approximately two additional cycles were required for growth of the initial infectious unit isolated by endpoint dilution (see Methods). 
Hence, the calculated point mutation rate per cell infection cycle was 5.0 × 10⁻⁷ / 4 = 1.3 × 10⁻⁷.
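The summary statistics in this section follow from a few ratios, which can be checked as below. This is an illustrative sketch under stated assumptions: the GpC comparison counts two mutable bases (one G and one C) per dinucleotide, and dN/dS is computed as an odds ratio of the observed versus expected non-synonymous fractions. Both readings reproduce the reported values but are not confirmed by the text as the authors' exact procedure.

```python
# Substitution counts summed over the three replicates (Table 1).
counts = {
    "G>A": 46, "C>T": 43, "A>G": 32, "T>C": 18,
    "G>T": 11, "C>A": 17, "T>G": 8, "A>C": 7,
    "G>C": 9, "C>G": 5, "A>T": 6, "T>A": 6,
}
TRANSITIONS = {"G>A", "C>T", "A>G", "T>C"}

# Transition/transversion ratio.
ts = sum(n for mut, n in counts.items() if mut in TRANSITIONS)
tv = sum(counts.values()) - ts
ts_tv = ts / tv

# GpC enrichment. Assumption: each of the 3167 GpC dinucleotides
# contributes two target bases (its G and its C).
gpc_bases = 3167 * 2
gpc_rate = 50 / gpc_bases
other_rate = 39 / 13380
gpc_fold = gpc_rate / other_rate


def dn_ds(observed_nonsyn, expected_nonsyn):
    """Odds of a non-synonymous change, observed relative to neutral expectation."""
    observed_odds = observed_nonsyn / (1 - observed_nonsyn)
    expected_odds = expected_nonsyn / (1 - expected_nonsyn)
    return observed_odds / expected_odds


selection_ratio = dn_ds(0.745, 0.715)

# Mutation rate per cell infection cycle: net frequency divided by the
# number of cycles (two high-MOI transfers plus about two growth cycles).
net_frequency = 5.0e-7
cycles = 2 + 2
rate_per_cycle = net_frequency / cycles

print(f"Ts/Tv = {ts_tv:.2f}, GpC fold = {gpc_fold:.2f}")
print(f"dN/dS = {selection_ratio:.2f}, rate = {rate_per_cycle:.2e}")
```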
Distribution of spontaneous mutations along the HAdv5 genome
Genetic diversity is not uniformly distributed in adenovirus genomes [16,17], raising the question of whether mutation rates vary accordingly. Although mutations were generally well scattered along the HAdv5 genome, we found two regions of approximately 5 kbp showing significantly fewer mutations than expected under the assumption of a constant rate (Fig 2B). Specifically, the region encompassing genome sites 18,500 to 23,500 showed only seven total base substitutions in the three biological replicates, whereas the expected number was 29.17 (Binomial test: P < 0.001), the reduction being statistically significant in each of the three replicates (P ≤ 0.014). This region approximately maps to genes located in transcription unit L3, which encodes a protease and two capsid proteins (VI and hexon protein).
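The depletion test for the L3 region can be approximated with a one-sided binomial tail. The sketch below assumes that each of the 208 substitutions falls in the region with probability equal to the expected count divided by the total (29.17/208), and that the test is one-sided (lower tail). The text does not state these details, so the exact P value may differ from the authors'.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), by direct summation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


total_mutations = 208        # substitutions summed over replicates (Table 1)
expected_in_region = 29.17   # expected count under a constant rate
observed_in_region = 7

# Probability that any given mutation lands in the region under the null.
p_region = expected_in_region / total_mutations

# Lower-tail probability of seeing seven or fewer mutations in the region.
p_value = binom_cdf(observed_in_region, total_mutations, p_region)
print(f"one-sided P = {p_value:.2e}")
```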
Another low-mutation region encompassed genome sites 500 to 5500, which includes transcription units E1B, IX, and IVa2 (Binomial test: P = 0.001), although in this case the reduction was significant in only two of the three replicates. One possible factor driving mutation depletion may be purifying selection acting specifically at these regions. However, the percentage of non-synonymous mutations in L3/E1B-IVa2 was similar to the rest of the genome (76.9% versus 74.5%, respectively), suggesting no differential selection pressure at the protein-coding level. We also found no major differences in the frequency of GpC dinucleotides between the low-mutation regions and the rest of the genome, and the reduction in mutation frequency in L3 was still significant after removing all G→A /C→T changes from the analysis (Binomial test: P < 0.001). Hence, the mechanisms driving mutation rate variation along the viral genome remain unclear. Aside from the reduced mutation frequency in these two regions, the observed number of mutations per transcription unit correlated well with the expected number assuming a constant mutation rate (Fig 2C; Chi-square test: P = 0.672). Finally, to further test for mutation clustering, we obtained the empirical distribution of the distance between consecutive mutations. Assuming a constant rate, this distance should follow a geometric distribution with parameter equal to the per-site mutation probability. The data were in broad agreement with this null model, although there was a slight increase in the proportion of mutations showing small distances, suggesting some level of clustering.
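For the clustering check, the null model is a geometric distribution of distances between consecutive mutations. The sketch below only illustrates that null expectation. It assumes a genome length of 35,900 bp (the approximate size quoted in the Discussion) and uses the 68 substitutions of replicate 1 to set the per-site probability. It does not reproduce the empirical distribution, which requires the mutation coordinates in S1 Table.

```python
def geometric_cdf(distance, p_site):
    """P(gap <= distance) when each site mutates independently with probability p_site."""
    return 1 - (1 - p_site) ** distance


genome_length = 35_900   # approximate HAdv5 genome size (assumption)
mutations = 68           # substitutions in replicate 1 (Table 1)

p_site = mutations / genome_length
mean_distance = 1 / p_site                      # mean of the geometric distribution
frac_within_100 = geometric_cdf(100, p_site)    # share of gaps of 100 bp or less

print(f"expected mean gap: {mean_distance:.0f} bp")
print(f"expected share of gaps <= 100 bp: {frac_within_100:.3f}")
```

An excess of observed short gaps over this expected share would point to the kind of mild clustering described above.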
Candidate mutational hotspots
Interestingly, we identified 11 individual genome sites that exhibited the exact same mutation in at least two of the three biological replicates, suggesting the presence of mutational hotspots (Table 2). For instance, genome site 9417, which maps to transcription unit E2B (terminal protein precursor gene) showed a T-to-G substitution in 0.28%, 0.54% and 0.77% of the reads of each replicate, a frequency that exceeds the genome-wide mutation frequency by orders of magnitude. This site was flanked by a G-rich motif (GGTGGGG), such that the mutation produced a G heptamer. Most other mutations were point insertions. A common feature of these recurrently appearing mutations was a low-complexity sequence context with frequent homopolymeric runs. This type of sequence context reduces replication fidelity by inducing frequent polymerase slippage, but also elevates the sequencing error rates. Based on this, we cannot rule out the possibility that these mutations were sequencing artefacts. In genome site 14,073, which maps to the end of L1, we found a frequent insertion that was further accompanied by a marked decrease in sequencing coverage, from >1000 to approximately 200 (Fig 2A).
Table 2. Mutations appearing recurrently in independent experiments and polymorphism found at these sites in publicly available adenovirus C sequences.
Genome site | Replicates mutated | Mutation | Sequence context | Coding | Natural polymorphism |
---|---|---|---|---|---|
1165 | 1, 2, 3 | Insertion | TGGTGTGGTAATTTTTTTTTT | No | Insertions, deletions 1 |
1216 | 2, 3 | Insertion | TTGTATTGTGATTTTTTTAAA | No | Insertions, deletions 1 |
7096 | 1, 2, 3 | Insertion | TATCCTGTCCCTTTTTTTTCC | E2B | None |
8617 | 1, 2, 3 | Insertion | CCCCGGAGGTAGGGGGGGCTC | E2B | None |
9417 | 1, 2, 3 | T→G | CTGGCGGCGGTGGGGGAGGGG | E2B | None 2 |
11,222 | 2, 3 | A→C | CGGGCCCGGCACTACCTGGAC | L1 | None |
14,073 | 1, 2, 3 | Insertion | GAGAATGTTTTAAAAAAAAAA | L1/No | Insertions, deletions 3 |
16,602 | 1, 2 | Insertion | GAGATCTATGGCCCCCCGAAG | L2 | None |
34,336 | 1, 2, 3 | Insertion/deletion | GAAGAACCATGTTTTTTTTTT | No | Insertions, deletions 1 |
35,122 | 2, 3 | Insertion | AATAAAATAACAAAAAAACAT | No | Insertions, deletions, C/T 1 |
35,215 | 1, 2 | Insertion | GGCGTGACCGTAAAAAAACTG | E4 | None |
In contrast, other similar motifs such as a poly-T 11-mer at genome positions 34,337–34,347 and a poly-T 10-mer at genome positions 1161–1170 did not show a similarly marked decrease in sequencing coverage.
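The criterion used to flag candidate hotspots (the same mutation at the same site in at least two of three replicates) amounts to counting shared variants across replicates. In the minimal sketch below, four entries are taken from Table 2, while the singleton variants at sites 5000 and 20000 are invented purely to show that non-recurrent calls are filtered out.

```python
from collections import Counter


def recurrent_variants(replicates, min_replicates=2):
    """Return (site, change) pairs seen in at least min_replicates replicates."""
    seen = Counter()
    for variants in replicates:
        seen.update(set(variants))   # count each variant once per replicate
    return sorted(v for v, n in seen.items() if n >= min_replicates)


# Entries at 1165, 1216, 9417 and 11222 follow Table 2;
# sites 5000 and 20000 are hypothetical singletons.
rep1 = {(1165, "ins"), (9417, "T>G"), (5000, "A>G")}
rep2 = {(1165, "ins"), (1216, "ins"), (9417, "T>G"), (11222, "A>C")}
rep3 = {(1165, "ins"), (1216, "ins"), (9417, "T>G"), (11222, "A>C"), (20000, "C>T")}

hotspots = recurrent_variants([rep1, rep2, rep3])
print(hotspots)
```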
Relationship between mutation rate and sequence diversity in vivo
Under a null model in which the mutation rate shows no appreciable variation along the HAdv5 genome, mutated sites should represent a random sample of genome sites and, hence, should not show particularly elevated diversity or evolvability. To test this, we downloaded from GenBank 35–52 adenovirus C sequences for each gene (except the L3 hexon gene, for which only 15 well-aligning sequences could be retrieved; S1 Dataset) and calculated Nei and Li’s nucleotide diversity per site, defined as the probability that pairs of sequences differ at that particular site (Fig 3A). Of the 29,858 genome sites examined, 2463 (8.2%) were polymorphic. Overall, the distribution of diversity values was similar for genome sites showing mutations in our experimental system and for those showing no mutations (Fig 3B), supporting no major effects of mutation rate variation on in vivo diversity of HAdv5. However, when we focused on the low-mutating regions L3 and E1B-IVa2, the fraction of polymorphic sites in database sequences dropped to 5.2%, versus 9.1% outside these regions (Fisher test: P < 0.001; Fig 3C), suggesting that reduced mutation rate limits in vivo diversity in these regions. We also tested whether the 11 sites showing recurrent mutations in our experimental design exhibited the same mutations in database sequences. If these sites were true mutational hotspots, database sequences should tend to show variation at these sites too. In contrast, if these were DS artefacts, most should not be polymorphic or, alternatively, they may systematically show sequence changes similar to those detected by DS if low-complexity sequence contexts also led to errors in database sequences. Interestingly, recurrently mutated sites that mapped to non-coding regions showed variation in GenBank sequences, whereas those mapping to coding regions did not (Table 2; Fig 3D; S2 Fig). 
This strongly suggests that most of the proposed mutational hotspots are real and that, in patient-derived sequences, purifying selection has removed these mutations from coding regions, but not from non-coding regions.
Discussion
Per-base mutation rates correlate negatively with genome sizes over a broad range of DNA microorganisms including viruses, bacteria, and unicellular eukaryotes [7,24,25]. As a result, the genomic mutation rate varies weakly, and a quasi-constant rate of approximately 0.003 mutations per genome per round of copying was suggested [24]. For HAdv5, our calculated mutation rate per cell infection cycle is 1.3 × 10⁻⁷ or, equivalently, 0.0046 per 35.9 kbp genome, in good agreement with the suggested rule. In recent work with human cytomegalovirus, de novo mutations were identified in longitudinal patient samples and, using the estimated duration of the cell infection cycle for this virus in vivo, the calculated mutation rate was 2.0 × 10⁻⁷ [11]. Previous work with murine cytomegalovirus gave a very similar estimate of 1.4 × 10⁻⁷, although this mutation rate was measured per day instead of per cell infection cycle [8]. In herpes simplex virus, the mutation rate was estimated by scoring null mutations in the tk gene using ganciclovir [10]. This yielded an estimated rate of 5.9 × 10⁻⁸ per cell infection cycle [7]. Therefore, mutation rates for different human DNA viruses measured by widely different methods vary within approximately twofold around 10⁻⁷ mutations per base per cell infection cycle. Genome sizes range from 35.9 kbp for HAdv5 to 150 kbp for herpes simplex virus and 230–236 kbp for cytomegaloviruses. As a result, genomic mutation rates vary by approximately an order of magnitude, and are substantially higher than the 0.003 expected value for the largest DNA viruses (Fig 4).
This casts some doubt on the proposed relationship between genome sizes and mutation rates, at least for viruses. However, estimates for cytomegaloviruses and herpes simplex virus suffer from limitations imposed by scoring only a few, potentially unrepresentative, genome sites, and from biases associated with selection, and hence should be taken with caution, making it premature to draw conclusions. We suggest, though, that factors other than genome size per se determine DNA virus mutation rates, such as whether the virus encodes its own polymerase or whether the viral genomic DNA is single-stranded or double-stranded. For instance, viruses encoding their own polymerases should have greater capacity to optimize mutation rates in response to selective pressures such as mutational load, the costs of replication fidelity, or adaptation to novel environments, compared to viruses that use host-encoded polymerases. Also, single-stranded DNA should be more prone to spontaneous damage and host-mediated editing than double-stranded DNA. Future work should help clarify the molecular mechanisms and evolutionary processes underlying mutation rate variation across viruses.
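The comparison with the proposed quasi-constant genomic rate is a simple product of per-base rate and genome size. The sketch below uses only the per-base rates and genome sizes quoted in the preceding paragraph, taking 230 kbp (the lower end of the quoted range) for human cytomegalovirus. The dictionary layout and variable names are illustrative.

```python
EXPECTED_GENOMIC_RATE = 0.003   # proposed quasi-constant value [24]

# (per-base rate per cell infection cycle, genome size in bp), as quoted in the text.
viruses = {
    "HAdv5": (1.3e-7, 35_900),
    "herpes simplex virus": (5.9e-8, 150_000),
    "human cytomegalovirus": (2.0e-7, 230_000),
}

# Genomic mutation rate = per-base rate x genome size.
genomic_rates = {name: rate * size for name, (rate, size) in viruses.items()}

for name, u in genomic_rates.items():
    fold = u / EXPECTED_GENOMIC_RATE
    print(f"{name:22s} U = {u:.4f}  ({fold:.1f}x the 0.003 value)")

# Spread between the lowest and highest genomic rate.
spread = max(genomic_rates.values()) / min(genomic_rates.values())
print(f"max/min = {spread:.1f}")
```

With these inputs, all three products lie above 0.003, and the spread between the smallest and largest is close to tenfold.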
Whereas from an evolutionary standpoint mutation rates per cell infection cycle are meaningful because the cell infection cycle is the equivalent of a viral generation, from a biochemical perspective per-round-of-copying estimates better reflect the actual fidelity of replication. If approximately 10⁴ HAdv5 infectious units are produced per cell, there should be at least log₂(10⁴) = 13.3 rounds of semi-conservative replication per cell, giving a rate of 1.3 × 10⁻⁷ / 13.3 = 9.8 × 10⁻⁹ per round of copying. Notice that this estimate is robust to gross uncertainties in burst size measurements. If, for instance, we assume only 10³ HAdv5 infectious units per cell, the estimated mutation rate per round of copying would change only weakly, i.e. 1.3 × 10⁻⁷ / log₂(10³) = 13.0 × 10⁻⁹. The mutation rate of normal human cells is on the order of 10⁻⁹ per round of copying [25]. In contrast, typical mutation rates for tumoral cells are at least 100-fold higher [27], with recent estimations for various types of cancers ranging from 10⁻⁷ to 10⁻⁶ [28]. Here, we used human cervix tumor HeLa cells for HAdv5 growth, suggesting that the HAdv5 mutation rate per round of copying was similar to or even lower than that of the host cell. Whether adenoviruses use cellular post-replicative repair is unclear, and the fact that tumoral cells such as those used here typically show aberrant repair pathways precludes us from addressing this question here. Use of normal cells for HAdv5 growth may help clarify this point in future studies. It is well established that most DNA viruses cause genome instability and interact with repair and DNA damage response (DDR) pathways [29,30]. For instance, the adenovirus E4orf6 protein recruits a ubiquitin ligase and promotes the proteasomal degradation of TOPBP1, an activator of DDR via ATR [31], and defects in the adenoviral E4 gene lead to the formation of concatemers of viral genomes with heterogeneous junctions [32]. 
However, it remains unclear whether activation or suppression of the DDR determines DNA virus mutation rates.
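The conversion from a per-cycle rate to a per-round-of-copying rate, and its weak sensitivity to burst size, can be written as a one-line function. The sketch assumes, as the text does, that the minimum number of semi-conservative replication rounds is log₂ of the burst size. The function name is illustrative.

```python
from math import log2


def rate_per_round(rate_per_cycle, burst_size):
    """Per-base rate per round of copying, assuming log2(burst_size) rounds per cell."""
    return rate_per_cycle / log2(burst_size)


rate_cycle = 1.3e-7

# Burst sizes of 10^4 (estimated) and 10^3 (tenfold lower, for comparison).
for burst in (1e4, 1e3):
    print(f"burst {burst:.0e}: {rate_per_round(rate_cycle, burst):.2e} per round")
```

A tenfold change in burst size shifts the number of rounds only from about 13.3 to about 10, which is why the per-round estimate moves by roughly a third rather than by an order of magnitude.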
Although two 5 kbp regions with an over two-fold reduction in mutation rate were identified, we found no dramatic differences in mutation rate across adenovirus transcription units. This contrasts with the ability of some organisms to critically increase the mutation rate of some loci. Targeted hyper-mutation has been described in the immunoglobulin genes of B lymphocytes [33], contingency loci encoding surface proteins in some bacteria [34], and genes encoding tail fiber proteins in some DNA bacteriophages [35]. The mechanisms involved include DNA editing, polymerase slippage in DNA tandem repeats, and error-prone reverse transcription. Adenovirus genetic diversity shows ample across-gene variation and is highest in genes involved in virus-host interactions [16,17]. In principle, adenoviruses should also benefit from targeting mutations to these specific loci, but we found no such targeting in HAdv5. Potential hotspots were restricted to low-complexity sequence contexts and did not span entire genes but only specific nucleotide sites. Interestingly, one such possible hotspot, the poly-A motif in the 3′ end of gene L1, has also been identified as a recombination hotspot and is located between a relatively conserved genome region (3′ of the hotspot) and a more variable region (5′ of the hotspot) [17]. It can be speculated that frequent replication errors in this low-complexity region may recruit proteins of the post-replicative repair system, which would increase the likelihood of recombination events. Finally, we also found that GpC dinucleotides were 2.7 times more prone to mutation than G or C bases alone. In vertebrates, cytosines in CpGs are frequently methylated and their spontaneous deamination produces thymines, leading to high rates of C→T substitutions [36] but, in contrast, there is little evidence for methylation or elevated mutation rates at GpC motifs [37]. 
Furthermore, adenovirus DNA is poorly methylated [38,39] and hence this mutational pathway should be infrequent. As such, the potential mechanisms leading to increased mutation rate at GpC motifs in HAdv5 remain to be investigated.
Methods
Virus and cells
HAdv5 was a generous gift from Dr. Ramón Alemany (Bellvitge Biomedical Research Institute) and was propagated in HeLa cells from ATCC. Cells were free of mycoplasma as determined by a PCR test. Cells were cultured at 37°C and 5% CO₂ in Dulbecco’s modified Eagle’s medium (DMEM) supplemented with 10% FBS and antibiotics.
Plaque assay
HeLa H1 cells at 70–80% confluency were used for plaque assays in 6-well plates. After viral adsorption, cells were washed with PBS and incubated for 4–5 days at 37°C with 5% CO₂ in a semi-solid medium containing DMEM supplemented with 1% FBS, 1% penicillin-streptomycin, and 0.8% noble agar, overlaid with a nutrient medium layer of DMEM supplemented with 1% FBS, 1% penicillin-streptomycin, glucose and GOP supplement. Cell monolayers were fixed with 4% formaldehyde and stained with 2% crystal violet. Viral titers were expressed as plaque forming units (PFU)/mL.
HAdv5 DNA purification
Viral infections were done in DMEM supplemented with 5–10% FBS and 1% penicillin-streptomycin until cytopathic effect was observed. The virus was first subjected to three transfers at endpoint dilutions in 96-well plates (10 days per transfer). The last transfer was then further amplified in a 10-cm plate (3 days) and the supernatant was used to inoculate five 10-cm plates of confluent HeLa cells at an MOI of 5 PFU/cell (3 days). Cells were harvested by centrifugation and washed with PBS. A suspension of approximately 10⁷ cells was incubated for 30 min on ice with 1 mL cell lysis buffer (50 mM Tris-HCl pH 8.5, 150 mM NaCl and 1% Triton X-100). Cellular debris was removed by centrifugation at 14,000 rpm for 10 min at 4°C, and the supernatant was treated with 10 mg/mL RNase A for 1 h at 37°C. Then, viral lysis buffer (10% SDS, 0.5 M EDTA and 10 mg/mL proteinase K) was added and samples were incubated for 1 h at 56°C. Viral DNA was extracted with phenol/chloroform and resuspended in ddH₂O for direct DS. As a sequencing control, pUC18 DNA was purified using the NucleoSpin Plasmid Kit (Macherey-Nagel, Germany). All DNA samples were quantitated using the Qubit dsDNA BR Assay kit (Life Technologies, USA) prior to sequencing.
Duplex Sequencing
The increased accuracy of DS is based on ligation of sequencing adapters containing random yet complementary double-stranded nucleotide sequences [19]. These molecular tags allow tracing each strand of the original double-stranded template and removal of mutational artefacts that appear in only one of the two strands. DS adapters were constructed by annealing two oligonucleotides, one of which contained a 12-nt single-stranded randomized sequence tag. Annealed primers were extended using the Klenow fragment, digested to obtain cohesive ends, and used as the final DS adapters for library preparation as previously described [13]. The purified HAdv5 DNA was fragmented using a Covaris sonicator and size selection was performed with AMPure XP beads. Subsequent steps including sequencing library construction were performed exactly as detailed in previous work [13]. Samples were run on a NextSeq machine (Illumina) with 150 bp reads. FastQ files were processed with the DS software pipeline (https://github.com/loeblab/Duplex-Sequencing) using BWA 0.6.2, Samtools 0.1.19, Picard-tools 1.130 and GATK 3.3–0, and GenBank accession AY601635 as reference sequence. The computational workflow consists of three major steps: tag parsing and initial alignment, single-stranded consensus sequence (SSCS) assembly, and duplex consensus sequence (DCS) assembly. Finally, the processed DCS data were realigned to the reference genome to analyze each genomic position and count mutations. Default parameters were used except for family size, which was reduced from 3 to 2 to increase the number of reads. Analysis of HAdv5 and pUC18 sequencing outputs at family size 2 and 3 yielded nearly identical mutation frequencies, indicating that the reduced family size did not increase sequencing error appreciably (S2 Table). Accession AY601635 was used to extract HAdv5 gene coordinates, and deduced changes in protein sequences were obtained with a custom R script. 
The actual sequence of the virus used in the experiments, as obtained from the DS consensus, was identical to AY601635 except for two single-base substitutions: a T→C change at position 18,813, and a G→C change at position 30,598. The DCS-final.bam.pileup text file was used to visualize insertions/deletions. The DS output is available from the NCBI SRA database (www.ncbi.nlm.nih.gov/sra; accession SRP091328).
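The logic of the SSCS and DCS steps can be illustrated with a toy example. This is a deliberately simplified sketch and not the loeblab pipeline: it assumes reads are already aligned, of equal length, and oriented to the same strand. The strand labels `ab` and `ba`, the 0.7 per-position agreement cutoff, and the function names are illustrative assumptions.

```python
from collections import Counter, defaultdict


def consensus(seqs, cutoff=0.7):
    """Per-position majority call; 'N' where no base reaches the cutoff."""
    called = []
    for column in zip(*seqs):
        base, n = Counter(column).most_common(1)[0]
        called.append(base if n / len(column) >= cutoff else "N")
    return "".join(called)


def duplex_consensus(reads, min_family=2):
    """reads: iterable of (tag, strand, sequence) with strand in {'ab', 'ba'}."""
    # Group reads into families sharing the same tag and strand of origin.
    families = defaultdict(list)
    for tag, strand, seq in reads:
        families[(tag, strand)].append(seq)

    # SSCS: one consensus per family that meets the minimum family size.
    sscs = {key: consensus(seqs) for key, seqs in families.items()
            if len(seqs) >= min_family}

    # DCS: keep a base only where both strand consensuses agree.
    dcs = {}
    for (tag, strand), seq in sscs.items():
        if strand == "ab" and (tag, "ba") in sscs:
            partner = sscs[(tag, "ba")]
            dcs[tag] = "".join(a if a == b else "N" for a, b in zip(seq, partner))
    return dcs


# Hypothetical toy reads: tag GGT carries a change on one strand only.
reads = [
    ("AAC", "ab", "ACGT"), ("AAC", "ab", "ACGT"),
    ("AAC", "ba", "ACGT"), ("AAC", "ba", "ACGT"),
    ("GGT", "ab", "ACTT"), ("GGT", "ab", "ACTT"),
    ("GGT", "ba", "ACGT"), ("GGT", "ba", "ACGT"),
    ("TTA", "ab", "ACGT"),  # family too small, no partner strand
]
print(duplex_consensus(reads))
```

In this toy case the single-strand change in family GGT is masked in the duplex consensus, which is the property that removes artefacts present in only one strand of the original template.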
qPCRs
We set out to check mutations at the following genome sites: 191 (G→A), 7295 (A→G), 9417 (T→G), and 33,565 (A→G). For each, we designed three primers for SYBR green-based qPCR: one located approximately 200 bp upstream of the mutated site and two for which the 3′ base matched exactly the mutated site, one containing the mutant base and another containing the wild-type base (WT, see S3 Table for primer information). We performed two separate qPCR reactions under the same conditions, using the mutant or wild-type primer, and a modified Taq DNA polymerase as provided in Agilent’s Brilliant III Ultra-Fast SYBR Green qPCR Master Mix. Since pairing of the 3′ base is critical for DNA extension and Taq DNA polymerase has no 3′→5′ exonuclease activity, each of these two alternate qPCRs should selectively amplify the mutant or wild-type DNA. All reactions were carried out using 0.3 ng of the purified HAdv5 DNA and 200 nM of each primer. Preliminary gradient PCRs were carried out using WT primers to set the annealing temperature as high as possible (67°C, 69°C, 78°C, and 68.5°C for each of the sites, respectively). The thermal qPCR profile was as follows: an initial denaturation step (95°C 3 min), 40 cycles of amplification (95°C 5 s, annealing temperature 10 s, 72°C 10 s), and a final melting cycle (95°C 30 s, 65°C 30 s, 95°C 30 s). Data analysis and Ct estimation were carried out using the AriaMx software provided by the manufacturer. Each qPCR was performed in triplicate.
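As a rough guide to what a given Ct delay means, the difference between the mutant-specific and wild-type-specific reactions can be converted into an approximate template ratio. The sketch below rests on assumptions that are not part of the described protocol: both reactions amplify with the same efficiency (perfect doubling by default) and the allele-specific primers do not extend on mismatched template. In practice mismatch priming sets a background floor, so the result should be read as an upper bound on the mutant fraction.

```python
def mutant_to_wildtype_ratio(delta_ct, efficiency=1.0):
    """Approximate mutant/wild-type template ratio from the Ct delay.

    delta_ct   : Ct(mutant primer) - Ct(wild-type primer)
    efficiency : per-cycle amplification efficiency (1.0 means doubling)
    """
    return (1 + efficiency) ** (-delta_ct)


# Ct delays of 6 and 12 cycles bracket the range reported for DS-detected sites.
for delay in (6, 12):
    print(f"delta Ct = {delay:2d}: ratio ~ {mutant_to_wildtype_ratio(delay):.5f}")
```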
Burst size estimation
To estimate the number of PFUs produced per infected cell (burst size), HeLa cells were infected with HAdv5 at an MOI of 10 PFU/cell in a 24-well plate. The supernatant was collected at 3 and 7 days post inoculation and titrated by the plaque assay. The burst size was calculated as the number of PFUs produced per culture at day 3 divided by the number of cells, giving 9.86 × 10³ PFU/cell. Titers at day 7 were slightly higher, yielding a burst size of 2.60 × 10⁴ PFU/cell. Since each well of a 96-well plate contained approximately 3 × 10⁴ cells, we estimate that roughly two infectious cycles were required to fully infect a well initially containing a single PFU.
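The arithmetic behind the two-cycle estimate can be sketched as follows. This is one possible reading of the calculation, not a formula given in the text: it assumes that the initial PFU infects one cell in the first cycle, that every progeny PFU then infects a distinct uninfected cell, and that the number of infected cells therefore grows by a factor equal to the burst size in each later cycle. The function name is hypothetical.

```python
import math

def cycles_to_infect_well(cells_per_well, burst_size):
    """Infection cycles needed to infect all cells starting from one PFU.
    Cycle 1 infects a single cell; each later cycle multiplies the number
    of infected cells by the burst size (idealized assumption)."""
    return 1.0 + math.log(cells_per_well) / math.log(burst_size)

cells = 3e4  # approximate cells per well of a 96-well plate
cycles_day3 = cycles_to_infect_well(cells, 9.86e3)  # day-3 burst size
cycles_day7 = cycles_to_infect_well(cells, 2.60e4)  # day-7 burst size
```

Both burst size estimates give a value slightly above 2 under these assumptions, in line with the figure of roughly two infectious cycles.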
Analysis of GenBank sequences
To infer Nei and Li's nucleotide diversity, sequence AY601635 was used as the query in Blastn analyses on a per-gene basis, and all hits corresponding to human adenovirus C sequences were retrieved and aligned using the Muscle algorithm implemented in Mega 7 (www.megasoftware.net). The analysis was done gene by gene to facilitate alignment. For E3, since there are abundant alternative coding regions, we arbitrarily defined our query as the region spanning AY601635 genome positions 27,500 to 31,000. Similarly, our query for the E4 region encompassed AY601635 genome positions 32,500 to 35,119. The A, T, G, and C base frequencies (f) at each site were obtained using a custom script, and Nei and Li's diversity was calculated as $\pi = 1 - f_A^2 - f_T^2 - f_G^2 - f_C^2$. To analyze sequence polymorphism at sites showing recurrent mutations in our system, a region of approximately 300 bp flanking the site of interest was used as the query for a Blastn search. Human adenovirus C sequences were retrieved and visually inspected to assess polymorphism.
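The per-site diversity calculation can be sketched in a few lines of Python. This is not the custom script used in the study; it is a minimal illustration that assumes the aligned sequences are available as equal-length strings and that gaps and ambiguous characters are ignored when computing base frequencies. The function names are hypothetical.

```python
from collections import Counter

def site_diversity(column):
    """Diversity at one alignment column: 1 minus the sum of squared
    A, T, G, C frequencies. Gaps and ambiguous bases are ignored (assumption)."""
    counts = Counter(base for base in column.upper() if base in "ATGC")
    total = sum(counts.values())
    if total == 0:
        return float("nan")  # no informative bases at this site
    return 1.0 - sum((n / total) ** 2 for n in counts.values())

def alignment_diversity(sequences):
    """Per-site diversity for a list of aligned, equal-length sequences."""
    return [site_diversity("".join(column)) for column in zip(*sequences)]

aligned = ["ACGT", "ACGA", "ACG-", "ATGA"]
per_site = alignment_diversity(aligned)
```

Invariant sites score 0, and the value rises toward 0.75 as the four bases approach equal frequencies.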
Supporting Information
Acknowledgments
We thank Ramón Alemany for the virus and helpful comments, Silvia Torres for technical assistance, Manoli Torres for assistance with programming, and members of the Genomics facility of the Universitat de València for help with Duplex Sequencing.
Data Availability
Duplex sequencing output files are available from the NCBI SRA database (www.ncbi.nlm.nih.gov/sra; accession SRP091328).
Funding Statement
This work was financially supported by grants from the Spanish Ministerio de Economía y Competitividad (BFU2013-41329; www.mineco.gob.es) and the European Research Council (ERC-2011-StG-281191-VIRMUT; erc.europa.eu). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1. Biek R, Pybus OG, Lloyd-Smith JO, Didelot X. Measurably evolving pathogens in the genomic era. Trends Ecol Evol. 2015;30:306–13. doi:10.1016/j.tree.2015.03.009
- 2. Chateigner A, Bezier A, Labrousse C, Jiolle D, Barbe V, Herniou EA. Ultra deep sequencing of a baculovirus population reveals widespread genomic variations. Viruses. 2015;7:3625–46. doi:10.3390/v7072788
- 3. Duffy S, Shackelton LA, Holmes EC. Rates of evolutionary change in viruses: patterns and determinants. Nat Rev Genet. 2008;9:267–76. doi:10.1038/nrg2323
- 4. Firth C, Kitchen A, Shapiro B, Suchard MA, Holmes EC, Rambaut A. Using time-structured data to estimate evolutionary rates of double-stranded DNA viruses. Mol Biol Evol. 2010;27:2038–51. doi:10.1093/molbev/msq088
- 5. Sarker S, Patterson EI, Peters A, Baker GB, Forwood JK, Ghorashi SA, et al. Mutability dynamics of an emergent single stranded DNA virus in a naive host. PLoS One. 2014;9:e85370. doi:10.1371/journal.pone.0085370
- 6. Szpara ML, Tafuri YR, Parsons L, Shamim SR, Verstrepen KJ, Legendre M, et al. A wide extent of inter-strain diversity in virulent and vaccine strains of alphaherpesviruses. PLoS Pathog. 2011;7:e1002282. doi:10.1371/journal.ppat.1002282
- 7. Sanjuán R, Nebot MR, Chirico N, Mansky LM, Belshaw R. Viral mutation rates. J Virol. 2010;84:9733–48. doi:10.1128/JVI.00694-10
- 8. Cheng TP, Valentine MC, Gao J, Pingel JT, Yokoyama WM. Stability of murine cytomegalovirus genome after in vitro and in vivo passage. J Virol. 2010;84:2623–8. doi:10.1128/JVI.02142-09
- 9. Cuevas JM, Duffy S, Sanjuán R. Point mutation rate of bacteriophage ΦX174. Genetics. 2009;183:747–9. doi:10.1534/genetics.109.106005
- 10. Drake JW, Hwang CB. On the mutation rate of herpes simplex virus type 1. Genetics. 2005;170:969–70. doi:10.1534/genetics.104.040410
- 11. Renzette N, Pokalyuk C, Gibson L, Bhattacharjee B, Schleiss MR, Hamprecht K, et al. Limits and patterns of cytomegalovirus genomic diversity in humans. Proc Natl Acad Sci USA. 2015;112:E4120–E4128. doi:10.1073/pnas.1501880112
- 12. Acevedo A, Andino R. Library preparation for highly accurate population sequencing of RNA viruses. Nat Protoc. 2014;9:1760–9. doi:10.1038/nprot.2014.118
- 13. Kennedy SR, Schmitt MW, Fox EJ, Kohrn BF, Salk JJ, Ahn EH, et al. Detecting ultralow-frequency mutations by Duplex Sequencing. Nat Protoc. 2014;9:2586–606. doi:10.1038/nprot.2014.170
- 14. Benko M, Harrach B. Molecular evolution of adenoviruses. Curr Top Microbiol Immunol. 2003;272:3–35.
- 15. Kojaoghlanian T, Flomenberg P, Horwitz MS. The impact of adenovirus infection on the immunocompromised host. Rev Med Virol. 2003;13:155–71. doi:10.1002/rmv.386
- 16. Robinson CM, Seto D, Jones MS, Dyer DW, Chodosh J. Molecular evolution of human species D adenoviruses. Infect Genet Evol. 2011;11:1208–17. doi:10.1016/j.meegid.2011.04.031
- 17. Robinson CM, Singh G, Lee JY, Dehghan S, Rajaiya J, Liu EB, et al. Molecular evolution of human adenoviruses. Sci Rep. 2013;3:1812. doi:10.1038/srep01812
- 18. Ogorzaly L, Walczak C, Galloux M, Etienne S, Gassilloud B, Cauchie HM. Human adenovirus diversity in water samples using a next-generation amplicon sequencing approach. Food Environ Virol. 2015;7:112–121.
- 19. Schmitt MW, Kennedy SR, Salk JJ, Fox EJ, Hiatt JB, Loeb LA. Detection of ultra-rare mutations by next-generation sequencing. Proc Natl Acad Sci USA. 2012;109:14508–13. doi:10.1073/pnas.1208715109
- 20. Marriott AC, Dimmock NJ. Defective interfering viruses and their potential as antiviral agents. Rev Med Virol. 2010;20:51–62. doi:10.1002/rmv.641
- 21. Frensing T. Defective interfering viruses and their impact on vaccines and viral vectors. Biotechnol J. 2015;10:681–9. doi:10.1002/biot.201400429
- 22. Gutiérrez S, Michalakis Y, Blanc S. Virus population bottlenecks during within-host progression and host-to-host transmission. Curr Opin Virol. 2012;2:546–55. doi:10.1016/j.coviro.2012.08.001
- 23. Montville R, Froissart R, Remold SK, Tenaillon O, Turner PE. Evolution of mutational robustness in an RNA virus. PLoS Biol. 2005;3:e381. doi:10.1371/journal.pbio.0030381
- 24. Drake JW. A constant rate of spontaneous mutation in DNA-based microbes. Proc Natl Acad Sci USA. 1991;88:7160–4.
- 25. Lynch M. Evolution of the mutation rate. Trends Genet. 2010;26:345–52. doi:10.1016/j.tig.2010.05.003
- 26. Sanjuán R, Domingo-Calap P. Mechanisms of viral mutation. Cell Mol Life Sci. 2016;73:4433–48. doi:10.1007/s00018-016-2299-6
- 27. Bielas JH, Loeb KR, Rubin BP, True LD, Loeb LA. Human cancers express a mutator phenotype. Proc Natl Acad Sci USA. 2006;103:18238–42. doi:10.1073/pnas.0607057103
- 28. Williams MJ, Werner B, Barnes CP, Graham TA, Sottoriva A. Identification of neutral tumor evolution across cancer types. Nat Genet. 2016;48:238–44. doi:10.1038/ng.3489
- 29. Luftig MA. Viruses and the DNA damage response: activation and antagonism. Annu Rev Virol. 2014;1:605–25. doi:10.1146/annurev-virology-031413-085548
- 30. Weitzman MD, Lilley CE, Chaurushiya MS. Genomes in conflict: maintaining genome integrity during virus infection. Annu Rev Microbiol. 2010;64:61–81. doi:10.1146/annurev.micro.112408.134016
- 31. Blackford AN, Patel RN, Forrester NA, Theil K, Groitl P, Stewart GS, et al. Adenovirus 12 E4orf6 inhibits ATR activation by promoting TOPBP1 degradation. Proc Natl Acad Sci USA. 2010;107:12251–6. doi:10.1073/pnas.0914605107
- 32. Weiden MD, Ginsberg HS. Deletion of the E4 region of the genome produces adenovirus DNA concatemers. Proc Natl Acad Sci USA. 1994;91:153–7.
- 33. Teng G, Papavasiliou FN. Immunoglobulin somatic hypermutation. Annu Rev Genet. 2007;41:107–20. doi:10.1146/annurev.genet.41.110306.130340
- 34. Moxon R, Bayliss C, Hood D. Bacterial contingency loci: the role of simple sequence DNA repeats in bacterial adaptation. Annu Rev Genet. 2006;40:307–33. doi:10.1146/annurev.genet.40.110405.090442
- 35. Medhekar B, Miller JF. Diversity-generating retroelements. Curr Opin Microbiol. 2007;10:388–95. doi:10.1016/j.mib.2007.06.004
- 36. Hodgkinson A, Eyre-Walker A. Variation in the mutation rate across mammalian genomes. Nat Rev Genet. 2011;12:756–66. doi:10.1038/nrg3098
- 37. Pontecorvo G, De Felice B, Carfagna M. Novel methylation at GpC dinucleotide in the fish Sparus aurata genome. Mol Biol Rep. 2000;27:225–30.
- 38. Gunthert U, Schweiger M, Stupp M, Doerfler W. DNA methylation in adenovirus, adenovirus-transformed cells, and host cells. Proc Natl Acad Sci USA. 1976;73:3923–7.
- 39. Hoelzer K, Shackelton LA, Parrish CR. Presence and role of cytosine methylation in DNA viruses of animals. Nucleic Acids Res. 2008;36:2825–37. doi:10.1093/nar/gkn121