Abstract
DNA viruses can exploit host cellular epigenetic processes to their advantage; however, the epigenome status of most DNA viruses remains undetermined. Third generation sequencing technologies allow for the identification of modified nucleotides from sequencing experiments without specialized sample preparation, permitting the detection of non-canonical epigenetic modifications that may distinguish viral nucleic acid from that of their host, thus identifying attractive targets for advanced therapeutics and diagnostics. We present a novel nanopore de novo assembly pipeline used to assemble a misidentified Camelpox vaccine. Two confirmed deletions of this vaccine strain in comparison to the closely related Vaccinia virus strain modified vaccinia Ankara make it one of the smallest non-vector derived orthopoxvirus genomes to be reported. Annotation of the assembly revealed a previously unreported signal peptide at the start of protein A38 and several predicted signal peptides that were found to differ from those previously described. Putative epigenetic modifications around various motifs have been identified and the assembly confirmed previous work showing the vaccine genome to most closely resemble that of Vaccinia virus strain Modified Vaccinia Ankara. The pipeline may be used for other DNA viruses, increasing the understanding of DNA virus evolution, virulence, host preference, and epigenomics.
Subject terms: Epigenetics, Epigenomics, Genome, Microbial genetics, Sequencing
Introduction
DNA viruses are those that have DNA genomes and replicate using a DNA-dependent DNA polymerase. They are grouped into two classes: single-stranded DNA viruses and double-stranded DNA viruses. The latter group contains the infamous Variola virus (VARV), the causative agent of smallpox, which belongs to the family Poxviridae, subfamily Chordopoxvirinae and genus Orthopoxvirus. There are currently 12 accepted species within the genus. Other notable members include Vaccinia virus (VACV), the prototype Orthopoxvirus used as a vaccine to eradicate human smallpox and which has no known natural host1; Cowpox virus (CPXV), administered successfully by Edward Jenner as the first documented successful vaccine2; Monkeypox virus (MPXV), a zoonotic virus endemic to the African subcontinent3; and Camelpox virus (CMLV), the most genetically similar extant species to VARV4.
Poxviruses have linear, double-stranded DNA genomes that vary from 130 to 230 kbp5. The ends of the genome form covalently closed hairpin structures6. Each hairpin lies at the end of a long inverted terminal repetition (ITR) containing sets of short, tandemly repeated sequences5. For orthopoxviruses, the size of the ITRs ranges from approximately 200–500 base pairs for variola viruses to almost 12,000 base pairs for several vaccinia virus strains7. Large ITR regions can pose problems for first-generation Sanger sequencing8 and second-generation Illumina sequencing9, which are capable of producing sequence read lengths of up to around 1000 bp and 300 bp (or around 500 bp for linked paired-end reads), respectively. Such tracts of repetitive sequences in a genome can be resolved by third-generation long read sequencing technologies10–12, which are capable of producing read lengths in excess of 100,000 bp.
The central portions of most poxvirus genomes are highly conserved, and contain essential genes involved in key functions such as transcription, DNA replication and virion assembly13. In contrast, genes that cluster at the ends of the genome are usually species or host specific, and encode virulence factors that modulate the host immune system13,14. Various proteins encoded by the genome have been shown to interact with DNA or precursor nucleotides5. The K7 protein has been shown to promote histone methylation associated with heterochromatin formation15. Furthermore, vaccinia virus (VACV) C416, C617, C1618, B1419, E320, F1621, and N222 gene products can be detected in the host nucleus, thus implicating them in some form of transcriptional regulation. To our knowledge, no research has been aimed towards assessing whether these proteins epigenetically modify the viral DNA. Furthermore, despite what is known of the capability of DNA viruses to exploit host cellular epigenetic processes to their advantage during infection23,24, the epigenome status of most DNA viruses remains unknown.
Third generation sequencing technologies have advanced epigenomic research by providing platforms that allow for the identification of modified nucleotides from sequencing experiments without the need for specialized secondary sample preparation protocols25–27. Such a direct approach for interrogating an epigenome is particularly beneficial for viral epigenetic research, as samples often contain high amounts of contaminating host DNA, which can complicate specialized DNA methylation probing techniques such as bisulfite sequencing28 and antibody-based approaches29. Furthermore, motifs with non-canonical epigenetic modifications can be identified by detecting a deviation of the raw signal from that of a standard model at a given nucleotide sequence26,30. Such non-canonical epigenetic modifications would distinguish viral DNA from host DNA, making them attractive targets for advanced therapeutics and diagnostics31. A drawback of Nanopore sequencing technology is that reads generally suffer from a high error rate in comparison to other sequencing technologies (particularly in regions containing homopolymers), although advances in library preparation chemistry, pore technology and algorithms (basecalling, assembly and polishing) have greatly improved overall assembly error rates32.
In this study, we use nanopore sequencing to assemble the genome of a live attenuated CMLV strain, Ducapox, that was stated to comprise a CMLV isolate from the United Arab Emirates (CaPV298-2)33. The vaccine has since been found to contain two gene regions that more closely resemble those of VACV strain Modified Vaccinia Ankara (VACV-MVA)34. A separate study of the strain using second generation WGS found that the vaccine genome matches that of VACV-MVA, with the exception of two genomic deletions (5195 and 890 bp in size). However, the authors questioned the authenticity of these genomic deletions because of both the reference-based assembly approach adopted and the low sequencing coverage of the genome35. We present a sequencing and annotation pipeline for long read de novo assembly of Poxvirus genomes and identify putative epigenetic modifications within the genome. Using the latest version of signal peptide prediction software, we identify a predicted protein with a previously undescribed signal peptide, and present several predicted signal peptides that were found to differ from previously described sequences. The pipeline may be used for other DNA viruses, increasing the understanding of DNA virus epigenomics.
Results
Sequencing statistics and de novo assembly
A total of 405,925 base called sequences were produced from the MinION sequencing run, of which 16,059 (3.95%) remained after size filtering and removal of non-viral DNA (Table 1). Most of the non-viral DNA was found to be of simian origin, consistent with the virus having been propagated in Vero cells. The Flye assembler produced a viral contig that was 195,695 bp in length. After ITR correction and all polishing steps, the assembly was 159,696 bp in length. Read coverage was more uniformly distributed in the final assembly than in the initial assembly (Flye assembly using the > 3000 Viral DNA read set), which had uneven read coverage at the contig ends (Fig. 1). This indicates that the terminal repeat lengths in the final polished assembly more closely match the ground truth. Furthermore, when reads were mapped to the Ducapox short-read assembly, high read coverage was observed at the ITR at the 3′ end of the genome, indicative of poor ITR assembly (supplementary information 1a). These mappings highlight the shortcomings of reference-based assembly using short reads, as the same high coverage of reads mapped to the 3′ ITR region was observed when the same > 3000 Viral DNA read set was mapped to VACV Acambis 3000 MVA (supplementary information 1b).
Table 1.
Metric | Raw reads | > 3000 viral DNA read set |
---|---|---|
Number of reads | 405,925 | 16,059 |
Cumulative size (bp) | 828,487,274 | 94,298,074 |
Average read length (bp) | 2,041 | 5,872 |
N50 (bp) | 6,507 | 6,174 |
> Q12 | 301,538 (74.3%) | 13,599 (84.7%) |
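The read-set metrics in Table 1 can be reproduced from a list of read lengths. The following is a minimal sketch, assuming the common definition of N50 (the read length at which reads of that length or longer contain at least half of all bases); the function name and the toy read lengths are illustrative and are not taken from the study.

```python
def read_set_stats(lengths):
    """Return read count, cumulative size, mean length and N50 (all in bp)."""
    if not lengths:
        raise ValueError("at least one read length is required")
    total = sum(lengths)
    running = 0
    n50 = 0
    # Walk from the longest read downwards until half of all bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {
        "reads": len(lengths),
        "cumulative_bp": total,
        "mean_bp": total / len(lengths),
        "n50_bp": n50,
    }


# Hypothetical toy read lengths, for illustration only.
stats = read_set_stats([2000, 3000, 4000, 5000, 6000])

# Consistency check against the values reported in Table 1.
raw_mean = round(828_487_274 / 405_925)
viral_mean = round(94_298_074 / 16_059)
```

Applied to the cumulative sizes and read counts in Table 1, the same division gives the reported average read lengths of 2,041 bp and 5,872 bp.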
Whole genome sequence comparisons
A blast search of the final polished assembly revealed the genome to most closely match that of Vaccinia virus strain Acambis 3000 MVA (Genbank Accession: AY603355.1), with a blast percentage identity score of 99.99%. A dotplot comparison of the Ducapox long read assembly vs VACV Acambis 3000 MVA revealed genomic deletions of 5449 bp and 916 bp in the Ducapox genome, corresponding to VACV Acambis 3000 MVA genome positions 3735–9183 and 23,219–24,134, respectively (Fig. 2). These deletions were confirmed by visualizing the mapping of reads to the genome assembly and confirming that unbroken reads traversed the deletion sites (supplementary information 2a and 2b). The VACV Acambis 3000 MVA genome was also found to be 227 bp and 435 bp longer at its ends than the Ducapox genome. The deletions in the Ducapox genome are further illustrated by a multiple sequence alignment of the Ducapox long read genome assembly, the Ducapox short read genome assembly, and the VACV Acambis 3000 MVA genome (supplementary information 2c). Both average and median identity scores were higher, and error rates lower, when the > 3000 Viral DNA read set was mapped to the Ducapox genome than when it was mapped to VACV Acambis 3000 MVA (Table 2). Two proteins predicted in the initial long-read assembly were found to be a single protein in the short-read assembly. This resulted from a frameshift caused by an additional adenine residue in a homopolymer tract, which was 6 adenine residues long in the short-read assembly and 7 adenine residues long in the long-read assembly, truncating the first protein (supplementary information 2d). Remarkably, in the long-read protein set, the frameshift produced a second open reading frame within the first protein, yielding a second protein that was in-frame with the end portion of the truncated protein (supplementary information 2d).
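The reported deletion sizes follow directly from the reference coordinates when both end positions are counted. A short worked check, assuming 1-based inclusive coordinates on the VACV Acambis 3000 MVA genome (the dictionary keys are illustrative labels only):

```python
def inclusive_span(start, end):
    """Length of a region whose start and end positions are both included."""
    return end - start + 1


# Coordinates as reported for VACV Acambis 3000 MVA (AY603355.1).
deletions = {
    "deletion_1": (3735, 9183),
    "deletion_2": (23219, 24134),
}
sizes = {name: inclusive_span(*coords) for name, coords in deletions.items()}
print(sizes)
```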
Table 2.
Read-mapping metric | Ducapox long-read assembly | VACV Acambis 3000 MVA |
---|---|---|
Average percent identity | 95.3 | 85.9 |
Median percent identity | 96.7 | 84.3 |
Error rate (# mismatches/bases mapped) | 0.046 | 0.067 |
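The metrics in Table 2 can be illustrated with a small calculation. This is a sketch under stated assumptions: percent identity is taken here as matching bases divided by alignment length for each read, and the error rate as total mismatches divided by total bases mapped, as the row label indicates. The alignment tuples are hypothetical and the tool used to derive the published values may define identity differently.

```python
from statistics import mean, median


def percent_identity(matches, alignment_length):
    """Per-read identity, assumed here to be matches / alignment length."""
    return 100.0 * matches / alignment_length


def error_rate(total_mismatches, total_bases_mapped):
    """Pooled error rate: mismatches divided by bases mapped."""
    return total_mismatches / total_bases_mapped


# Hypothetical alignments: (matching bases, alignment length, mismatches).
alignments = [(950, 1000, 30), (1900, 2000, 70), (2700, 3000, 200)]

identities = [percent_identity(m, n) for m, n, _ in alignments]
summary = {
    "mean_identity": mean(identities),
    "median_identity": median(identities),
    "error_rate": error_rate(
        sum(mm for _, _, mm in alignments),
        sum(n for _, n, _ in alignments),
    ),
}
```

Reporting both the mean and the median is useful because a small number of poorly aligned reads can pull the mean down while leaving the median largely unchanged.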
Genome annotations and functional analyses
The Ducapox genome was found to contain a total of 186 predicted protein coding genes (Fig. 3). A total of 194 genes were initially predicted by Prodigal; however, 8 of these predicted genes contained no functional domain and had no significant percentage identity to any protein in the Swissprot database, and hence were removed from subsequent analyses. Of the 186 proteins, 13 were found to contain predicted signal peptides (Table 3, supplementary information 3). A comparison of the signal peptides predicted by SignalP v5.0 (the latest version) with those listed in the Uniprot database revealed that SignalP v5.0 predicted one previously unreported signal peptide, in the protein A38L. Two proteins (A39R and HA) had signal peptides predicted by SignalP v5.0 that matched those in the Uniprot database. The remaining 10 proteins contained signal peptides predicted by SignalP v5.0 that differed from those in the Uniprot database (predicted mature protein sequences in supplementary information 4). StructRNAfinder predicted a single structural RNA, the Pox_AX_element (RF00385), which is involved in directing the efficient production and orientation-dependent formation of late RNAs36,37. The predicted proteins from the long-read assembly were compared against those generated from the short-read assembly using a protein blast alignment of two or more sequences (BLOSUM62 comparison matrix; Gap costs: Existence 11, Extension 1). A total of 176 proteins had equal length and 100% identity between the two genome protein sets. An additional 7 proteins had equal lengths and 100% identity, except that the short-read protein set contained ambiguity letters that allowed multiple amino acids to occupy certain positions, bringing the total number of identical proteins to 183 (supplementary information 5a).
Of the remaining 3 proteins, 2 from the long-read assembly protein set had better hit scores to VACV proteins in the UniProt database, and a single protein from the short-read protein set had a better hit score to VACV proteins in the UniProt database (supplementary information 5a). Of the additional 10 proteins in the short-read protein set, 13 were found to either have no hit to VACV proteins in the UniProt database, or had hits that were less than half the length of a given protein.
Table 3.
Protein | Amino acid sequence | SignalP v5.0 prediction | Uniprot prediction |
---|---|---|---|
A38L | MSRVRISLIYLYTLVVITTTKTIEYTACNDTIIIPCTIDNPTKYIRWKLDNHDILTYNKTSKTTILSKWHTSARLHSLSDSDVSLIMEYKDILPGTYTCGDNTGIKSTVKLVQLHTNWFNDYQTMLMFIFTGITLFLLFLEITYTSISVVFSTNLGILQVFGCVIAMIELCGAFLFYPSMFTLRHIIGLLMMTLPSIFLIITKVFSFWLLCKSSCAVHLIIYYQLAGYILTVLGLGLSLKECVDGTLLLSGLGTIMVSEHFSLLFLVCFPSTQRDYY | MSRVRISLIYLYTLVVITTTKT | No signal peptide predicted |
C8L | MSAIRFIACLYLISIFGNCHEDPYYQPFDKLNITLDIYTYEDLVPYTVDNDTTSFVKIYFKNFWITVMTKWCAPFIDTVSVYTSHDNLNIQFYSRDEYDTQSEDKICTIDVKARCKHLTKREVTVQQEAYRYSLSSDLSCFDSIDLEIDLIETNSTDTTVLKSYELMLPKRAKSIHN | MSAIRFIACLYLISIFGNC | MSAIRFIACLYLISIFGNCHE |
HA | MTRLPILLLLISLVYATPFPQTSKKIGDDATLSCNRNNTNDYVVMSAWYKEPNSIILLAAKSDVLYFDNYTKDKISYDSPYDDLVTTITIKSLTARDAGTYVCAFFMTSPTNDTDKVDYEEYSTELIVNTDSESTIDIILSGSTHSPETSSEKPDYIDNSNCSSVFEIATPEPITDNVEDHTDTVTYTSDSINTVSASSGESTTDETPEPITDKEEDHTVTDTVSYTTVSTSSGIVTTKSTTDDADLYDTYNDNDTVPSTTVGGSTTSISNYKTKDFVEIFGITALIILSAVAIFCITYYIYNKRSRKYKTENKV | MTRLPILLLLISLVYA | MTRLPILLLLISLVYA |
B19R | MKMTMKMMVHIYFVSLLLLLFHSYAIDIENEITEFFNKMRDTLPAKDSKWLNPACMFGGTMNDIAALGEPFSAKCPPIEDSLLSHRYKDYVVKWERLEKNRRRQVSNKRVKHGDLWIANYTSKFSNRRYLCTVTTKNGDCVQGIVRSHIKKPPSCIPKTYELGTHDKYGIDLYCGILYAKHYNNITWYKDNKEINIDDIKYSQTGKKLIIHNPELEDSGRYNCYVHYDDVRIKM | MKMTMKMMVHIYFVSLLLLLFHSYA | MTMKMMVHIYFVSLLLLLF |
E10R | MNPKHWGRAVWTIIFIVLSQAGLDGNIEACKRKLYTIVSTLPCPACRRHATIAIEDNNVMSSDDLNYIYYFFIRLFNNLASDPKYAIDVTKVNPL | MNPKHWGRAVWTIIFIVLSQAGLDG | MNPKHWGR |
B8R | MRYIIILAVLFINSIHAKITSYKFESVNFDSKIEWTGDGLYNISLKNYGIKTWQTMYTNVPEGTYDISAFPKNDFVSFWVKFEQGDYKVEEYCTGPPTVTLTEYDDHPYATRGSKKIPIYKRGDMCDIYLLYTANFTFGDSKEPVPYDIDDYDCTSTGCSIDFVTTEKVCVTAQGATEGFLEKITPWSSKVCLTPKKSVYTCAIRSKEDVPNFKDKMARVIKRKFN | MRYIIILAVLFINSIHA | MRYIIILAVLFIN |
B7R | MYKKLITFLFVIGALASYSNNEYTPFNKLSVKLYIDGVDNIENSYTDDNNELVLNFKEYTISIITESCDVGFDSIDIDVINDYKIIDMSTIQRRGHTCRISTKLSCHYDKYPYIHKYDGDERQYSITAEGKCYKGIKYEISMINDDTLLRKHTLKIGSTYIFDRHGHSNTYYSKYDF | MYKKLITFLFVIGALASYS | MYKKLITFLFVIGALA |
A28L | MNSLSIFFIVVATAAVCLLFIQGYSIYENYGNIKEFNATHAAFEYSKSIGGTPALDRRVQDVNDTISDVKQKWRCVVYPGNGFVSASIFGFQAEVGPNNTRSIRKFNTMQQCIDFTFSDVININIYNPCVVPNINNAECQFLKSVL | MNSLSIFFIVVATAAVCLLFIQG | MNSLSIFFIVVATAAVCLLFI |
B16R | MSILPVIFLSIFFYSSFVQTFNAPECIDKGQYFASFMELENEPVILPCPQINTLSSGYNILDILWEKRGADNDRIIPIDNGSNMLILNPTQSDSGIYICITTNETYCDMMSLNLTIVSVSESNIDLISYPQIVNERSTGEMVCPNINAFIASNVNADIIWSGHRRLRNKRLKQRTPGIITIEDVRKNDAGYYTCVLEYIYGGKTYNVTRIVKLEVRDKIIPSTMQLPEGVVTSIGSNLTIACRVSLRPPTTDADVFWISNGMYYEEDDGDGDGRISVANKIYMTDKRRVITSRLNINPVKEEDATTFTCMAFTIPSISKTVTVSIT | MSILPVIFLSIFFYSSFVQT | MSILPVIFLSIFFYSSFV |
SPI-3 | MIALLILSLTCSASTYRLQGFTNAGIVAYKNIQDDNIVFSPFGYSFSMFMSLLPASGNTRIELLKTMDLRKRDLGPAFTELISGLAKLKTSKYTYTDLTYQSFVDNTVCIKPSYYQQYHRFGLYRLNFRRDAVNKINSIVERRSGMSNVVDSNMLDNNTLWAIINTIYFKGIWQYPFDITKTRNASFTNKYGTKTVPMMNVVTKLQGNTITIDDKEYDMVRLPYKDANISMYLAIGDNMTHFTDSITAAKLDYWSFQLGNKVYNLKLPKFSIENKRDIKSIAEMMAPSMFNPDNASFKHMTRDPLYIYKMFQNAKIDVDEQGTVAEASTIMVATARSSPEKLEFNTPFVFIIRHDITGFILFMGKVESP | MIALLILSLTCSA | MIALLILSLTCSAST |
A39 | MIPLLFILFYFANGIEWHKFETSEEIISTYLLDDVLYTGVNGAVYTFSNNKLNKTGLTNNNYITTSIKVEDAEPITEIPNVGK | MIPLLFILFYFANG | MIPLLFILFYFANG |
PS/HR | MKTISVVTLLCVLPAVVYSTCTVPTMNNAKLTSTETSFNNNQKVTFTCDQGYHSSDPNAVCETDKWKYENPCKKMCTVSDYISELYNKPLYEVNSTMTLSCNGETKYFRCEEKNGNTSWNDTVTCPNAECQPLQLEHGSCQPVKEKYSFGEYITINCDVGYEVIGASYISCTANSWNVIPSCQQKCDIPSLSNGLISGSTFSIGGVIHLSCKSGFILTGSPSSTCIDGKWNPILPTCVRSNEKFDPVDDGPDDETDLSKLSKDVVQYEQEIESLEATYHIIIVALTIMGVIFLISVIVLVCSCDKNNDQY | MKTISVVTLLCVLPAVVYS | MKTISVVTLLCVLPAVV |
A43R | MMMMKWIISILTMSIMPVLAYSSSIFRFHSEDVELCYGHLYFDRIYNVVNIKYNPHIPYRYNFINRTLTVDELDDNVFFTHGYFLKHKYGSLNPSLIVSLSGNLKYNDIQCSVNVSCLIKNLATSTSTILTSKHKTYSLHRSTCITIIGYDSIIWYKDINDIYDFTAICMLIASTLIVTIYVFKKIKMNS | MMMMKWIISILTMSIMPVLA | MMMMKWIISILTMSIMPVLAYS |
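The three categories described in the text (previously unreported, matching, and differing signal peptides) can be recovered from the two prediction columns of Table 3 by simple string comparison. The sketch below uses the signal peptide sequences exactly as listed in the table; the category labels and the use of None for "No signal peptide predicted" are illustrative choices.

```python
from collections import Counter

# (protein, SignalP v5.0 prediction, Uniprot prediction or None), from Table 3.
TABLE_3 = [
    ("A38L", "MSRVRISLIYLYTLVVITTTKT", None),
    ("C8L", "MSAIRFIACLYLISIFGNC", "MSAIRFIACLYLISIFGNCHE"),
    ("HA", "MTRLPILLLLISLVYA", "MTRLPILLLLISLVYA"),
    ("B19R", "MKMTMKMMVHIYFVSLLLLLFHSYA", "MTMKMMVHIYFVSLLLLLF"),
    ("E10R", "MNPKHWGRAVWTIIFIVLSQAGLDG", "MNPKHWGR"),
    ("B8R", "MRYIIILAVLFINSIHA", "MRYIIILAVLFIN"),
    ("B7R", "MYKKLITFLFVIGALASYS", "MYKKLITFLFVIGALA"),
    ("A28L", "MNSLSIFFIVVATAAVCLLFIQG", "MNSLSIFFIVVATAAVCLLFI"),
    ("B16R", "MSILPVIFLSIFFYSSFVQT", "MSILPVIFLSIFFYSSFV"),
    ("SPI-3", "MIALLILSLTCSA", "MIALLILSLTCSAST"),
    ("A39", "MIPLLFILFYFANG", "MIPLLFILFYFANG"),
    ("PS/HR", "MKTISVVTLLCVLPAVVYS", "MKTISVVTLLCVLPAVV"),
    ("A43R", "MMMMKWIISILTMSIMPVLA", "MMMMKWIISILTMSIMPVLAYS"),
]


def classify(signalp, uniprot):
    """Label a SignalP prediction relative to the Uniprot annotation."""
    if uniprot is None:
        return "novel"
    return "match" if signalp == uniprot else "differs"


labels = {name: classify(sp, up) for name, sp, up in TABLE_3}
counts = Counter(labels.values())
print(counts)
```

The tally agrees with the text: one previously unreported signal peptide, two matching predictions and ten differing predictions among the 13 proteins.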
Assessment of putative epigenetic modification sites
A total of three motifs were identified in the Ducapox genome that consistently produced raw signals that diverged from the standard model. The AGAAGRC motif was found at 31 regions within the genome of which 24 regions had a coverage > 50. Signal fluctuations differing from the canonical model were observed around the central AAG nucleotides (Fig. 4). A Tomtom search of the motif detected no similar known motifs. The AARRRGATKH motif was found at 61 regions within the genome of which 48 regions had a coverage > 50. Signal fluctuations differing from the canonical model were observed around the central GA nucleotides (Fig. 5). A Tomtom search of the motif showed the reverse-complement to most closely match MA0467.1 (Crx binding motif; Mus musculus) in the JASPAR database.
The WWAATGWC motif was found to be present at 114 regions within the genome of which 90 regions had a coverage > 50. Signal fluctuations differing from that of the canonical model were observed around the central TGT nucleotides (Fig. 6). A Tomtom search of the motif showed the reverse-complement to most closely match MA1112.1 (NR4A1; Homo sapiens) in the JASPAR database. For each putatively modified motif detected by Tombo, the coverage, genomic position, signal fluctuations compared to a standard model, and number of regions containing each motif can be found in the TomboResultsOutput folder of the project Git (https://github.com/zacksaud/Ducapox-Assembly-Project/tree/master/TomboResultsOutput). No methylation sites with a frequency above 0.5 were detected with Nanopolish (supplementary information 6). No evidence of 5mC methylation was detected by Megalodon (supplementary information 7).
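Counting the regions that contain a degenerate motif such as AGAAGRC requires expanding the IUPAC ambiguity codes (for example R = A or G, W = A or T) and searching both strands. The following is a minimal sketch of such a search, not the procedure used by Tombo or Meme; the function names and the toy sequence are hypothetical, and a motif that equals its own reverse complement would be reported on both strands.

```python
import re

# Standard IUPAC nucleotide ambiguity codes expressed as regex classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def motif_to_regex(motif):
    """Expand an IUPAC motif into a regular expression."""
    return "".join(IUPAC[base] for base in motif.upper())


def reverse_complement(sequence):
    return sequence.upper().translate(COMPLEMENT)[::-1]


def find_motif(sequence, motif):
    """Return 0-based forward-strand start positions of hits on each strand."""
    # A lookahead lets overlapping occurrences be reported.
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    seq = sequence.upper()
    forward = [m.start() for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    # Convert positions found on the reverse complement back to forward coordinates.
    reverse = sorted(len(seq) - m.start() - len(motif) for m in pattern.finditer(rc))
    return forward, reverse


# Hypothetical toy sequence: one forward-strand and one reverse-strand occurrence.
toy = "CCAGAAGACTTGCCTTCTGG"
hits = find_motif(toy, "AGAAGRC")
```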
Discussion
Except for two confirmed genomic deletions, the whole genome sequence of this vaccine was shown to closely resemble that of VACV-MVA, supporting our earlier study in which we reported that two gene regions of this vaccine most closely resembled those of the aforementioned strain34. Our findings also corroborate a previous study that used short read Illumina sequencing and a reference guided assembly to generate a partial Ducapox genome, wherein the authors noted the putative deletions but could not confirm their validity because of both the assembly pipeline and the sequencing technology used35. At 159,695 bp in length, the vaccine genome is, to our knowledge, the smallest amongst the non-vector derived orthopoxviruses. We postulate that the deletions may have been a result of passage of a misidentified VACV-MVA strain, as it is known that poxvirus genomes tend to decrease in size with serial passage38. It has been demonstrated that VACV has a defined origin of replication, which supports a model for poxvirus genome replication that involves leading and lagging strand synthesis39. Studies on poxvirus DNA replication described putative Okazaki fragments of about 1,000 nt in length (notably similar in size to the 916 bp deletion of the Ducapox sequence) and RNA primers on the 5′-ends of newly made chains of VACV DNA40,41.
We predicted a previously unreported signal peptide in protein A38L. The A38L gene product is a 33 kDa integral membrane glycoprotein42. Overexpression of the protein has been shown to promote Ca2+ influx into infected cells43. The latest version of SignalP predicted alternate signal peptides for 10 other proteins. These include the gene product of C8L, the function of which remains unknown; the gene product of B19R, a type 1 interferon decoy44; the gene product of E10R, which is associated with membranes of intracellular mature virions and plays a role in morphogenesis45; the gene product of B8R, another interferon decoy44; the gene product of B7R, which is involved in virulence46; the gene product of B16R, an IL-1β binding protein47; the gene product of SPI-3, a cell fusion inhibitor protein48; the gene product of PS/HR, which plays a role in the dissolution of the outermost membrane of extracellular enveloped virions to allow virion entry into host cells and also participates in wrapping mature virions to form enveloped virions49; and finally the gene product of A43R, which enhances intradermal lesion formation50. Signal peptides play a range of different roles within cells that include marking proteins for secretion, intracellular translocation, and keeping catalytic proteins in an inactive precursor form until the signal peptide is cleaved51. Further research is needed to determine whether biochemical analyses of these new mature proteins yield any further insight into protein function.
We have presented regions within the Ducapox genome that contain motifs wherein the Nanopore signal diverges from the standard model, which may indicate that bases within these regions carry epigenetic modifications. Although Nanopore sequencing is a valuable tool for identifying putative epigenetic sites within a genome, it identifies neither the individual base that is modified nor the modifying chemical group. Thus, further analyses are required to confirm the results, such as isolating and purifying the motifs containing the putative epigenetic modifications and generating amplicons that could be Nanopore sequenced to confirm reversion of the amplicon raw signal to that of the standard model. Modifications that distinguish viral DNA from that of the host may be targets for advanced therapeutics. Should these epigenetic modifications be confirmed and chemically characterized, another important question would be whether the modifications are the result of a viral protein or of a host protein, and whether the base modifications are exclusive to this isolate of Vaccinia virus or more widely distributed amongst poxviruses.
Given the relatively low cost of Nanopore sequencing, future research could investigate the evolutionary trajectory of orthopoxviruses with continued passage. Experiments such as determining whether different evolutionary trajectories occur when a seed stock of a virus is passaged in differing permissive cell lines would be of great interest. Furthermore, Nanopore sequencing would allow for the assessment of differing epigenome modifications with continued passage. Such studies would provide further evidence towards efforts to better understand the origins of Vaccinia virus52. Additionally, long read sequencing transcriptomics techniques have recently shed light on the high variation in transcript lengths at certain Vaccinia genome loci, termed chaotic regions53,54. Long read sequencing coupled with these transcriptomics techniques could provide greater insight into the loss of Poxvirus virulence with passage. Much research has gone into the elucidation of nucleic acid modifying proteins of Vaccinia virus; for instance, Vaccinia virus K7R protein has been shown to promote histone methylation associated with heterochromatin formation15. Furthermore, it is postulated that epigenetic and genetic mechanisms may also lead to VACV-induced transcription silencing, and VACV infection induces a global degradation of host and viral mRNA55. Also, VACV mRNA capping is carried out in three reactions performed by viral enzymes wherein guanine N-7 methylation occurs, and VACV encodes the VP39 protein (J3R) that is known to add a methyl group at the 2′-O position of the first transcribed nucleotide adjacent to the 5′ cap55. Poxviruses are unusual among DNA viruses in that DNA replication occurs in the cytoplasm, independent of the nucleus of the infected host cell, and accordingly, their genomes encode factors required for both cytoplasmic transcription and DNA replication5.
Hence should the putative epigenetic modifications of the viral DNA be validated, it would be likely that either viral proteins, or host cytoplasmic proteins would be implicated in the base modification process, as opposed to host nuclear proteins. Many mammalian cytoplasmic proteins are known to bind viral nucleic acids56.
To conclude, we have developed a novel assembly pipeline for long read sequencing of Poxvirus genomes that corrects the lengths of terminal ends. The two confirmed deletions of this vaccine strain in comparison to VACV-MVA make it one of the smallest non-vector derived orthopoxvirus genomes to be reported. We have used the latest software for signal peptide prediction to identify a predicted signal peptide in a VACV protein that has not been previously reported, as well as 10 alternate predicted signal peptides that differ from those previously reported. We have presented putative epigenetic modifications within the Ducapox genome, based on divergence of the raw signals from a standard model for given sequence motifs. The methods we have detailed may be used for other viral genomes, thus aiding the understanding of the molecular mechanisms underpinning viral virulence, evolution and host preferences.
Methods
Source and composition of vaccine
A commercial live attenuated ‘Ducapox’ vaccine was sourced from Al Bashayer Veterinary Supplies (Dubai, United Arab Emirates), manufactured by Design Biologix (Pretoria, South Africa) and commercialized by Highveld Biological Ltd (Johannesburg, South Africa). The CMLV strain CaPV298-2, the parent strain of this vaccine, was originally isolated in the United Arab Emirates and attenuated through serial passage in Vero cell culture33. Manufacture and expiry dates were July 2018 and June 2019, respectively, and the batch number was DPV0818.
DNA extraction
DNA was extracted using the QIAamp DNA Mini kit (Catalog # 51304, Qiagen, Hilden, Germany), following the DNA purification from tissues protocol: 180 μL of Buffer ATL was added to 25 mg of lyophilized vaccine and the manufacturer's guidelines were followed, with the addition of 5 μg of Carrier RNA Poly A (Catalog # 1017647, Qiagen, Hilden, Germany) to the 200 μL of Buffer AL solution. The DNA preparation was analyzed for purity on a nanodrop spectrophotometer (ThermoScientific, Rochester, USA), and the concentration was determined using a Qubit dsDNA assay kit (ThermoScientific, Rochester, USA) and a Qubit 4 fluorometer (ThermoScientific, Rochester, USA).
Preparation of nanopore library and sequencing
400 ng of genomic DNA was used for Nanopore library preparation using a Rapid Sequencing Kit (SQK-RAD004, Oxford Nanopore Technologies) and barcode 18 of the Native Barcoding Expansion kit (EXP-NBD114, Oxford Nanopore Technologies). Multiplexed sequencing was performed on a MinION device (Oxford Nanopore Technologies) equipped with an R9.4.1 MinION flow cell. Base calling was performed offline with ONT’s Guppy software pipeline version 4.0.11, enabling the --pt_scaling flag, setting --trim_strategy to DNA, loading the dna_r9.4.1_450bps_hac configuration file, and setting --barcode_kits to EXP-NBD114.
Long read pre-processing, assembly, and polishing
Long read adapter trimming was performed with Porechop version 0.2.4 (www.github.com/rrwick/Porechop), setting both --adapter_threshold and --barcode_threshold to 98. The trimmed long reads were filtered to remove reads under 3000 bases in length using NanoFilt version 2.6.057. The adapter trimmed, filtered long reads were assembled using Flye version 2.858 using the --nano-raw, --meta, --trestle and --keep-haplotypes flags. A fasta file of non-viral assembled contigs (identified using a blast search) was made from the assembly output using Bandage version 0.8.159. The adapter trimmed, filtered long reads were mapped to the non-viral assembled contigs using minimap2 version 2.17-r94160, and the unmapped reads were extracted from the alignment file and converted to FASTQ using samtools61, thus generating a read set exclusively containing viral DNA. The virus specific reads were assembled using Flye version 2.8, enabling --nano-raw, setting the minimum overlap to 5000 using the -m 5000 flag, and conducting 3 polishing iterations by setting the -i 3 flag. The assembly was polished, correcting the ITR regions, using the --only-polish flag of the tandemquast tool of the TandemTools package62. Long reads were mapped to the assembly using minimap2 version 2.17-r941, and the resulting alignment file was used to polish the assembly with Racon version v1.4.1363 using the following parameters: -m 8 -x -6 -g -8 -w 500 --no-trimming. A total of 3 rounds of mapping and polishing with Racon were performed on the assembly, after which no changes were observed. The corrected consensus was further polished with the same long read set using Medaka version 0.11.5 (https://github.com/nanoporetech/medaka), setting the -m r941_min_high_g360 flag. Figure 7 shows a graphical representation of the full assembly pipeline.
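The iterative mapping and polishing stage described above can be sketched as a list of command lines. The example below only builds and prints the commands and does not execute them. The Racon and Medaka parameters are those stated in the text; the file names, the minimap2 preset (-x map-ont), the use of minimap2's -o option and the Medaka output directory are assumptions made for illustration and are not reported in the study.

```python
import shlex


def racon_round(reads, draft, round_no):
    """Commands for one round of mapping followed by Racon polishing."""
    overlaps = f"round{round_no}.paf"            # hypothetical file name
    polished = f"racon_round{round_no}.fasta"    # hypothetical file name
    # Preset and -o are assumptions; the text gives only the minimap2 version.
    map_cmd = ["minimap2", "-x", "map-ont", "-o", overlaps, draft, reads]
    # Racon parameters as stated in the Methods; Racon writes to stdout.
    polish_cmd = ["racon", "-m", "8", "-x", "-6", "-g", "-8", "-w", "500",
                  "--no-trimming", reads, overlaps, draft]
    return [(map_cmd, None), (polish_cmd, polished)], polished


def build_plan(reads, draft, rounds=3):
    """Three Racon rounds followed by a Medaka consensus step."""
    plan = []
    for round_no in range(1, rounds + 1):
        steps, draft = racon_round(reads, draft, round_no)
        plan.extend(steps)
    medaka_cmd = ["medaka_consensus", "-i", reads, "-d", draft,
                  "-o", "medaka_out", "-m", "r941_min_high_g360"]
    plan.append((medaka_cmd, None))
    return plan


# Hypothetical input names for the viral read set and the ITR-corrected draft.
plan = build_plan("viral_reads.fastq", "tandemtools_polished.fasta")
for command, stdout_path in plan:
    line = shlex.join(command)
    print(line if stdout_path is None else f"{line} > {stdout_path}")
```

Each Racon round takes the previous round's output as its draft, which is what allows the stopping rule in the text (no further changes after 3 rounds) to be checked by comparing successive outputs.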
Assessment of assemblies and whole genome comparisons
The non-viral-DNA-free, adapter trimmed, filtered long reads were mapped to both the initial Flye assembly, and the final polished assembly in order to manually assess for the absence of read mapping breaks by plotting read mapping coverage of genome assemblies using pyGenomeTracks version 3.564. Genome comparisons were performed using the nucmer tool of Mummer 365. The final polished assembly was compared against the short-read Ducapox assembly (Genbank accession: MT648498.1) and Vaccinia virus strain Acambis 3000 MVA (Genbank accession: AY603355.1), the closest matching genome to the long-read assembly as determined by an online BLAST search.
Genome annotation
The polished assembly was annotated using Prodigal v2.6.366. The annotation gff3 file was loaded into GenSAS suite version 6.067, after which functional analyses were conducted in the suite using InterProScan version 5.25–68.068, and the ab initio predicted proteins were identified using blastp69 by conducting a protein vs protein search against the SwissProt protein data set to determine best matches. Protein sequences were analyzed for predicted signal peptides using SignalP v5.070. Non-coding RNAs were detected using StructRNAfinder71.
Assessment of putative epigenetic modification sites
A total of 2214 Fast5 files (599.9 MB) that mapped to the long-read assembly were extracted using the fast5seek tool (github.com/mbhall88/fast5seek). The Tombo suite26 was used to detect Nanopore raw signals that diverged from the standard model, which could signify epigenetic modification sites. After running Tombo’s resquiggle function using the final polished genome, the detect_modifications function was run using the de_novo model with default parameters (dampened fraction estimation [2, 0]). The results in the stats file were converted to a FASTA file using the text_output function of Tombo, setting --num-regions 1000 and --num-bases 15. The central 7 nucleotides of each entry of the fasta file were plotted using motif_with_stats in Tombo (plotting the standard model, with the default dampened fraction estimation [2, 0]), using the maximum --num-statistics number that would produce a plot for each fasta entry (determined empirically), for all entries with scores > 0.7 for “Frac. Alternate” in the fasta file. The motif_with_stats plots were assessed manually, and only the motifs from plots containing increases in the fraction of modified bases (− log10(P-value)) exclusively around the central motif were kept; these were used to create a separate fasta file containing all motifs for each of the four modified bases that were manually detected from the plots. Meme v5.1.172 was used on each individual fasta file with the -dna and -mod zoops flags to determine motifs. Motifs were compared to known motifs using Tomtom v5.1.173. Nanopolish v0.13.3 was used to assess for 5mC and 6mA epigenetic modifications (75), setting a methylation frequency above 0.5 as indicative of evidence for methylation. The presence of 5mC epigenetic modifications was also assessed using Megalodon (github.com/nanoporetech/megalodon).
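The methylation frequency threshold described above amounts to filtering a per-site table for rows whose frequency exceeds 0.5. A minimal sketch follows, assuming a tab-separated table with a methylated_frequency column; the column names are modelled on the output of Nanopolish's frequency helper script but should be treated as assumptions, and the example rows are invented.

```python
import csv
import io


def sites_above(tsv_text, threshold=0.5, column="methylated_frequency"):
    """Return rows of a per-site TSV whose frequency exceeds the threshold."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if float(row[column]) > threshold]


# Invented example rows; column names are assumed, not taken from the study.
example = (
    "chromosome\tstart\tend\tcalled_sites\tcalled_sites_methylated\tmethylated_frequency\n"
    "contig_1\t100\t100\t40\t4\t0.100\n"
    "contig_1\t250\t250\t50\t30\t0.600\n"
)
hits = sites_above(example)
print([row["start"] for row in hits])
```

An empty result from such a filter corresponds to the outcome reported in the Results, where no sites exceeded the 0.5 frequency threshold.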
Acknowledgements
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. We thank staff at the University of Sharjah for facilitating the use of their facilities for DNA extraction.
Author contributions
Z.S. performed DNA extraction and bioinformatics analyses, and wrote the manuscript. M.D.H. performed Nanopore sequencing. T.M.B. provided oversight, reviewed the manuscript and provided laboratory support.
Data availability
All data generated in this study have been deposited at the NCBI under BioProject PRJNA663037. Nanopore sequencing read data can be accessed at the NCBI SRA using the accession number SRR12667950. Sample information can be accessed at the NCBI BioSample repository using the accession number SAMN16115327. The long-read Ducapox genome assembly generated in this study can be accessed using GenBank accession number MT946551 (the 159,696 bp assembly as version MT946551.1 and the corrected 159,695 bp assembly as version MT946551.2). The short-read Ducapox assembly and protein sequences can be accessed using GenBank accession number MT648498.1. The Vaccinia virus strain Acambis 3000 MVA genome can be accessed using GenBank accession number AY603355.1. Gene and protein names, and functional annotations (GO terms, InterPro, PFAM), are included in the GenBank entries. Bioinformatics tool output files have been deposited in the following GitHub repository: https://github.com/zacksaud/Ducapox-Assembly-Project, as well as in the supplementary information.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Zack Saud, Matthew D. Hitchings and Tariq M. Butt.
Supplementary Information
The online version contains supplementary material available at 10.1038/s41598-021-97158-x.
References
- 1. Fenner, F., Henderson, D.A., Arita, I., Jezek, Z. & Ladnyi, I.D. Smallpox and its eradication. Geneva: World Health Organization; 1988. [March 14, 2003]. p. 1460. Reference out-of-print. See the World Health Organization, Communicable Disease Surveillance and Response Web site. www.who.int/emc/diseases/smallpox/smallpoxeradication.html.
- 2. Jenner, E. An inquiry into the causes and effects of the variolae vaccinae, a disease discovered in some of the Western Counties of England, particularly Gloucestershire, and known by the name of the cow-pox. London: Sampson Low, 1798.
- 3. Sklenovská N, Van Ranst M. Emergence of monkeypox as the most important orthopoxvirus infection in humans. Front. Public Health. 2018;6:241. doi: 10.3389/fpubh.2018.00241.
- 4. Gubser C, Smith GL. The sequence of camelpox virus shows it is most closely related to variola virus, the cause of smallpox. J. Gen. Virol. 2002;83:855–872. doi: 10.1099/0022-1317-83-4-855.
- 5. Moss B. Poxvirus DNA replication. Cold Spring Harb. Perspect. Biol. 2013;5(9):a010199. doi: 10.1101/cshperspect.a010199.
- 6. Winters E, Baroudy BM, Moss B. Molecular cloning of the terminal hairpin of vaccinia virus DNA as an imperfect palindrome in an Escherichia coli plasmid. Gene. 1985;37:221–228. doi: 10.1016/0378-1119(85)90276-8.
- 7. Hendrickson RC, Wang C, Hatcher EL, Lefkowitz EJ. Orthopoxvirus genome evolution: The role of gene loss. Viruses. 2010;2(9):1933–1967. doi: 10.3390/v2091933.
- 8. Sanger F, Nicklen S, Coulson AR. DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA. 1977;74(12):5463–5467. doi: 10.1073/pnas.74.12.5463.
- 9. Bennett S. Solexa Ltd. Pharmacogenomics. 2004;5(4):433–438. doi: 10.1517/14622416.5.4.433.
- 10. Kasianowicz JJ, Brandin E, Branton D, Deamer DW. Characterization of individual polynucleotide molecules using a membrane channel. Proc. Natl. Acad. Sci. USA. 1996;93(24):13770–13773. doi: 10.1073/pnas.93.24.13770.
- 11. Jain M, Olsen HE, Paten B, Akeson M. The Oxford Nanopore MinION: Delivery of nanopore sequencing to the genomics community. Genome Biol. 2016;17(1):239. doi: 10.1186/s13059-016-1103-0.
- 12. Eid J, Fehr A, Gray J, et al. Real-time DNA sequencing from single polymerase molecules. Science. 2009;323(5910):133–138. doi: 10.1126/science.1162986.
- 13. Gubser C, Hué S, Kellam P, Smith GL. Poxvirus genomes: A phylogenetic analysis. J. Gen. Virol. 2004;85(1):105–117. doi: 10.1099/vir.0.19565-0.
- 14. Moss B. Poxviridae: The viruses and their replication. In: Knipe DM, Howley PM, editors. Fields Virology. 4. Philadelphia: Lippincott Williams & Wilkins; 2001. pp. 2849–2883.
- 15. Teferi WM, Desaulniers MA, Noyce RS, Shenouda M, Umer B, Evans DH. The vaccinia virus K7 protein promotes histone methylation associated with heterochromatin formation. PLoS ONE. 2017;12(3):e0173056. doi: 10.1371/journal.pone.0173056.
- 16. Ember SW, Ren H, Ferguson BJ, Smith GL. Vaccinia virus protein C4 inhibits NF-κB activation and promotes virus virulence. J. Gen. Virol. 2012;93(10):2098–2108. doi: 10.1099/vir.0.045070-0.
- 17. Unterholzner L, Sumner RP, Baran M, Ren H, Mansur DS, Bourke NM, et al. Vaccinia virus protein C6 is a virulence factor that binds TBK-1 adaptor proteins and inhibits activation of IRF3 and IRF7. PLoS Pathog. 2011;7(9):e1002247. doi: 10.1371/journal.ppat.1002247.
- 18. Fahy AS, Clark RH, Glyde EF, Smith GL. Vaccinia virus protein C16 acts intracellularly to modulate the host response and promote virulence. J. Gen. Virol. 2008;89(10):2377–2387. doi: 10.1099/vir.0.2008/004895-0.
- 19. Benfield CT, Mansur DS, McCoy LE, Ferguson BJ, Bahar MW, Oldring AP, et al. Mapping the IkappaB kinase beta (IKKbeta)-binding interface of the B14 protein, a vaccinia virus inhibitor of IKKbeta-mediated activation of nuclear factor kappaB. J. Biol. Chem. 2011;286(23):20727–20735. doi: 10.1074/jbc.M111.231381.
- 20. Yuwen H, Cox JH, Yewdell JW, Bennink JR, Moss B. Nuclear localization of a double-stranded RNA-binding protein encoded by the vaccinia virus E3L gene. Virology. 1993;195(2):732–744. doi: 10.1006/viro.1993.1424.
- 21. Senkevich TG, Koonin EV, Moss B. Vaccinia virus F16 protein, a predicted catalytically inactive member of the prokaryotic serine recombinase superfamily, is targeted to nucleoli. Virology. 2011;417(2):334–342. doi: 10.1016/j.virol.2011.06.017.
- 22. Ferguson BJ, Benfield CT, Ren H, Lee VH, Frazer GL, Strnadova P, et al. Vaccinia virus protein N2 is a nuclear IRF3 inhibitor that promotes virulence. J. Gen. Virol. 2013;94(9):2070–2081. doi: 10.1099/vir.0.054114-0.
- 23. Knipe DM. Nuclear sensing of viral DNA, epigenetic regulation of herpes simplex virus infection, and innate immunity. Virology. 2015;479–480:153–159. doi: 10.1016/j.virol.2015.02.009.
- 24. Tsai K, Cullen BR. Epigenetic and epitranscriptomic regulation of viral replication. Nat. Rev. Microbiol. 2020;1:1. doi: 10.1038/s41579-020-0382-3.
- 25. Flusberg BA, Webster DR, Lee JH, Travers KJ, Olivares EC, Clark TA, Korlach J, Turner SW. Direct detection of DNA methylation during single-molecule, real-time sequencing. Nat. Methods. 2010;7(6):461–465. doi: 10.1038/nmeth.1459.
- 26. Stoiber MH, Quick J, Egan R, Lee JE, Celniker SE, Neely R, Loman N, Pennacchio L, Brown JB. De novo identification of DNA modifications enabled by genome-guided nanopore signal processing. BioRxiv. 2017;2017:094672. doi: 10.1101/094672.
- 27. Amarasinghe SL, Su S, Dong X, Zappia L, Ritchie ME, Gouil Q. Opportunities and challenges in long-read sequencing data analysis. Genome Biol. 2020;21(1):30. doi: 10.1186/s13059-020-1935-5.
- 28. Frommer M, McDonald LE, Millar DS, Collis CM, Watt F, Grigg GW, Molloy PL, Paul CL. A genomic sequencing protocol that yields a positive display of 5-methylcytosine residues in individual DNA strands. Proc. Natl. Acad. Sci. 1992;89(5):1827–1831. doi: 10.1073/pnas.89.5.1827.
- 29. Feederle R, Schepers A. Antibodies specific for nucleic acid modifications. RNA Biol. 2017;14(9):1089–1098. doi: 10.1080/15476286.2017.1295905.
- 30. Müller CA, Boemo MA, Spingardi P, et al. Capturing the dynamics of genome replication on individual ultra-long nanopore sequence reads. Nat. Methods. 2019;16:429–436. doi: 10.1038/s41592-019-0394-y.
- 31. Nehme Z, Pasquereau S, Herbein G. Control of viral infections by epigenetic-targeted therapy. Clin. Epigenet. 2019;11:55. doi: 10.1186/s13148-019-0654-9.
- 32. Kono N, Arakawa K. Nanopore sequencing: Review of potential applications in functional genomics. Dev. Growth Differ. 2019;61(5):316–326. doi: 10.1111/dgd.12608.
- 33. Kaaden, D. R., Walz, C.P. Czerny, U. Wernery, U. & Allen., W. R. Progress in the development of a camel pox vaccine. Proceeding of the 1st Int. Camel Conference, 47–49 (1992)
- 34. Saud Z, Butt TM. Another case of mistaken identity? Vaccinia virus in another live Camelpox vaccine. Biologicals. 2020;65:39–41. doi: 10.1016/j.biologicals.2020.04.002.
- 35. Marcacci M, Khalafalla AIA, Hammadi ZM, Monaco F, Cammà C, Yusof MFA, Yammahi SM, Mangone I, Valleriani F, Alhosani MA, Decaro N, Lorusso A, Almuhairi SS, Savini G. Genome sequencing of a camelpox vaccine reveals close similarity to modified vaccinia virus Ankara (MVA). Viruses. 2020;12(8):E786. doi: 10.3390/v12080786.
- 36. Howard ST, Ray CA, Patel DD, Antczak JB, Pickup DJ. A 43-nucleotide RNA cis-acting element governs the site-specific formation of the 3′ end of a poxvirus late mRNA. Virology. 1999;255:190–204. doi: 10.1006/viro.1998.9547.
- 37. D'Costa SM, Antczak JB, Pickup DJ, Condit RC. Post-transcription cleavage generates the 3′ end of F17R transcripts in vaccinia virus. Virology. 2004;319(1):1–11. doi: 10.1016/j.virol.2003.09.041.
- 38. Lefkowitz EJ, Upton C, Changayil SS, Buck C, Traktman P, Buller RM. Poxvirus bioinformatics resource center: A comprehensive Poxviridae informational and analytical resource. Nucleic Acids Res. 2005;33:D311–316. doi: 10.1093/nar/gki110.
- 39. Senkevich TG, Bruno D, Martens C, Porcella SF, Wolf YI, Moss B. Mapping vaccinia virus DNA replication origins at nucleotide level by deep sequencing. Proc. Natl. Acad. Sci. USA. 2015;112(35):10908–10913. doi: 10.1073/pnas.1514809112.
- 40. Esteban M, Holowczak JA. Replication of vaccinia DNA in mouse L cells. I. In vivo DNA synthesis. Virology. 1977;78(1):57–75. doi: 10.1016/0042-6822(77)90078-2.
- 41. Pogo BGT, O'Shea MT. The mode of replication of vaccinia virus DNA. Virology. 1978;84(1):1–8. doi: 10.1016/0042-6822(78)90213-1.
- 42. Parkinson JE, Sanderson CM, Smith GL. The vaccinia virus A38L gene product is a 33-kDa integral membrane glycoprotein. Virology. 1995;214(1):177–188. doi: 10.1006/viro.1995.9942.
- 43. Sanderson CM, Parkinson JE, Hollinshead M, Smith GL. Overexpression of the vaccinia virus A38L integral membrane protein promotes Ca2+ influx into infected cells. J. Virol. 1996;70(2):905–914. doi: 10.1128/JVI.70.2.905-914.1996.
- 44. Alcamí A, Symons JA, Smith GL. The vaccinia virus soluble alpha/beta interferon (IFN) receptor binds to the cell surface and protects cells from the antiviral effects of IFN. J. Virol. 2000;74(23):11230–11239. doi: 10.1128/jvi.74.23.11230-11239.2000.
- 45. Senkevich TG, Weisberg AS, Moss B. Vaccinia virus E10R protein is associated with the membranes of intracellular mature virions and has a role in morphogenesis. Virology. 2000;278(1):244–252. doi: 10.1006/viro.2000.0656.
- 46. Price N, Tscharke DC, Hollinshead M, Smith GL. Vaccinia virus gene B7R encodes an 18-kDa protein that is resident in the endoplasmic reticulum and affects virus virulence. Virology. 2000;267(1):65–79. doi: 10.1006/viro.1999.0116.
- 47. Meisinger-Henschel C, Späth M, Lukassen S, et al. Introduction of the six major genomic deletions of modified vaccinia virus Ankara (MVA) into the parental vaccinia virus is not sufficient to reproduce an MVA-like phenotype in cell culture and in mice. J. Virol. 2010;84(19):9907–9919. doi: 10.1128/JVI.00756-10.
- 48. Turner PC, Moyer RW. The vaccinia virus fusion inhibitor proteins SPI-3 (K2) and HA (A56) expressed by infected cells reduce the entry of superinfecting virus. Virology. 2008;380(2):226–233. doi: 10.1016/j.virol.2008.07.020.
- 49. Roberts KL, Breiman A, Carter GC, et al. Acidic residues in the membrane-proximal stalk region of vaccinia virus protein B5 are required for glycosaminoglycan-mediated disruption of the extracellular enveloped virus outer membrane. J. Gen. Virol. 2009;90(Pt 7):1582–1591. doi: 10.1099/vir.0.009092-0.
- 50. Sood CL, Moss B. Vaccinia virus A43R gene encodes an orthopoxvirus-specific late non-virion type-1 membrane protein that is dispensable for replication but enhances intradermal lesion formation. Virology. 2010;396(1):160–168. doi: 10.1016/j.virol.2009.10.025.
- 51. Owji H, Nezafat N, Negahdaripour M, Hajiebrahimi A, Ghasemi Y. A comprehensive review of signal peptides: Structure, roles, and applications. Eur. J. Cell Biol. 2018;97(6):422–441. doi: 10.1016/j.ejcb.2018.06.003.
- 52. Duggan AT, Klunk J, Porter AF, et al. The origins and genomic diversity of American Civil War Era smallpox vaccine strains. Genome Biol. 2020;21:175. doi: 10.1186/s13059-020-02079-z.
- 53. Tombácz D, Prazsák I, Szucs A, Dénes B, Snyder M, Boldogkoi Z. Dynamic transcriptome profiling dataset of vaccinia virus obtained from long-read sequencing techniques. Gigascience. 2018;7(12):1139. doi: 10.1093/gigascience/giy139.
- 54. Tombácz D, Prazsák I, Csabai Z, et al. Long-read assays shed new light on the transcriptome complexity of a viral pathogen. Sci. Rep. 2020;10(1):13822. doi: 10.1038/s41598-020-70794-5.
- 55. Dhungel P, Cantu FM, Molina JA, Yang Z. Vaccinia virus as a master of host shutoff induction: Targeting processes of the central dogma and beyond. Pathogens. 2020;9(5):400. doi: 10.3390/pathogens9050400.
- 56. Habjan M, Pichlmair A. Cytoplasmic sensing of viral nucleic acids. Curr. Opin. Virol. 2015;11:31–37. doi: 10.1016/j.coviro.2015.01.012.
- 57. De Coster W, D'Hert S, Schultz DT, Cruts M, Van Broeckhoven C. NanoPack: Visualizing and processing long-read sequencing data. Bioinformatics. 2018;34(15):2666–2669. doi: 10.1093/bioinformatics/bty149.
- 58. Kolmogorov M, Yuan J, Lin Y, Pevzner PA. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 2019;37(5):540–546. doi: 10.1038/s41587-019-0072-8.
- 59. Wick RR, Schultz MB, Zobel J, Holt KE. Bandage: Interactive visualisation of de novo genome assemblies. Bioinformatics. 2015;31(20):3350–3352. doi: 10.1093/bioinformatics/btv383.
- 60. Li H. Minimap2: Pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34(18):3094–3100. doi: 10.1093/bioinformatics/bty191.
- 61. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map (SAM) format and SAMtools. Bioinformatics. 2009;25(16):2078–2079. doi: 10.1093/bioinformatics/btp352.
- 62. Mikheenko A, Bzikadze AV, Gurevich A, Miga KH, Pevzner PA. TandemTools: Mapping long reads and assessing/improving assembly quality in extra-long tandem repeats. Bioinformatics. 2020;36(1):i75–i83. doi: 10.1093/bioinformatics/btaa440.
- 63. Vaser R, Sović I, Nagarajan N, Šikić M. Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res. 2017;27(5):737–746. doi: 10.1101/gr.214270.116.
- 64. Ramírez F, Bhardwaj V, Arrigoni L, Lam KC, Grüning BA, Villaveces J, Habermann B, Akhtar A, Manke T. High-resolution TADs reveal DNA sequences underlying genome organization in flies. Nat. Commun. 2018;9(1):189. doi: 10.1038/s41467-017-02525-w.
- 65. Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, et al. Versatile and open software for comparing large genomes. Genome Biol. 2004;5(2):R12. doi: 10.1186/gb-2004-5-2-r12.
- 66. Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: Prokaryotic gene recognition and translation initiation site identification. BMC Bioinform. 2010;11:119. doi: 10.1186/1471-2105-11-119.
- 67. Humann JL, Lee T, Ficklin S, Main D. Structural and functional annotation of eukaryotic genomes with GenSAS. Methods Mol. Biol. 2019;1962:29–51. doi: 10.1007/978-1-4939-9173-0_3.
- 68. Jones P, Binns D, Chang HY, et al. InterProScan 5: Genome-scale protein function classification. Bioinformatics. 2014;30(9):1236–1240. doi: 10.1093/bioinformatics/btu031.
- 69. Camacho C, Coulouris G, Avagyan V, et al. BLAST+: Architecture and applications. BMC Bioinform. 2009;10:421. doi: 10.1186/1471-2105-10-421.
- 70. Almagro Armenteros JJ, Tsirigos KD, Sønderby CK, et al. SignalP 5.0 improves signal peptide predictions using deep neural networks. Nat. Biotechnol. 2019;37(4):420–423. doi: 10.1038/s41587-019-0036-z.
- 71. Arias-Carrasco R, Vásquez-Morán Y, Nakaya HI, et al. StructRNAfinder: An automated pipeline and web server for RNA families prediction. BMC Bioinform. 2018;19:55. doi: 10.1186/s12859-018-2052-2.
- 72. Bailey TL, Elkan C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc. Int. Conf. Intell. Syst. Mol. Biol. 1994;2:28–36.
- 73. Gupta S, Stamatoyannopoulos JA, Bailey TL, Noble WS. Quantifying similarity between motifs. Genome Biol. 2007;8(2):R24. doi: 10.1186/gb-2007-8-2-r24.
- 74. Simpson JT, Workman RE, Zuzarte PC, David M, Dursi LJ, Timp W. Detecting DNA cytosine methylation using nanopore sequencing. Nat. Methods. 2017;14(4):407–410. doi: 10.1038/nmeth.4184.