Summary
Vertebrate genomes contain lower than expected frequencies of the CpG dinucleotide. Consequently, many vertebrate viruses have evolved to mimic this composition, possibly to evade host antiviral defences (Greenbaum et al., 2008). For example, the antiviral protein ZAP binds CpGs in viral single-stranded RNA with specific spacing requirements (Gonçalves-Carneiro et al., 2022), though CpGs are also likely depleted in viral genomes due to other selective pressures (Forni et al., 2023). Increasing CpG abundance by synonymous recoding, that is, replacing non-CpG codons with alternatives that introduce a CpG while leaving the amino acid sequence unchanged, could facilitate attenuation of viruses without compromising their epitope antigenicity (Gonçalves-Carneiro et al., 2022; Le Nouën et al., 2019; Sharp et al., 2023). There are three ways CpGs can be synonymously introduced in codons: at positions 1-2 for arginine (e.g. AGA → CGA), at positions 2-3 for several amino acids (e.g. ACA → ACG), or in a 3-1 split configuration, if the subsequent codon begins with a G (e.g. ATA-GCA → ATC-GCA).
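The three configurations can be illustrated with a minimal Python sketch. It is not taken from the Syn-CpG-Spacer source code; the function name `cpg_options` and the output format are hypothetical, and the sketch assumes an in-frame, upper-case DNA coding sequence translated with the standard genetic code. It only lists single-codon synonymous swaps that would create a CpG and does not apply any spacing constraint.

```python
from itertools import product

# Standard genetic code, built in TCAG order (assumption: no alternative tables).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def cpg_options(seq):
    """List (codon_index, old_codon, new_codon, kind) for every single-codon
    synonymous swap that creates a CpG at positions 1-2, 2-3, or 3-1."""
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    options = []
    for i, codon in enumerate(codons):
        next_codon = codons[i + 1] if i + 1 < len(codons) else ""
        for alt, aa in CODON_TABLE.items():
            # Keep only true synonyms of the current codon.
            if alt == codon or aa != CODON_TABLE[codon]:
                continue
            if alt[:2] == "CG" and codon[:2] != "CG":
                options.append((i, codon, alt, "1-2"))
            if alt[1:] == "CG" and codon[1:] != "CG":
                options.append((i, codon, alt, "2-3"))
            # Split CpG: C at codon position 3, G at position 1 of the next codon.
            if alt[2] == "C" and codon[2] != "C" and next_codon.startswith("G"):
                options.append((i, codon, alt, "3-1"))
    return options


if __name__ == "__main__":
    for option in cpg_options("ATAGCAAGAACA"):
        print(option)
```

Each reported swap considers only the original neighbouring codons, so a real recoding procedure would need to re-evaluate the sequence after every accepted change.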
Syn-CpG-Spacer is a Python progressive web app (PWA) (MDN Web Docs, 2023) made with the Panel library (Panel Development Team, 2024) that allows for consistent recoding of viral sequences under biologically relevant constraints. These include setting a minimum gap between CpGs, optimising for an average CpG gap, protecting cis-acting regulatory signals from modification, and modulating the A-content in the overall sequence. The app features a sequence viewer made with the Bokeh library (Bokeh Development Team, 2024) that highlights CpG dinucleotides, allowing for efficient analysis of the resulting distribution of CpGs. This is complemented by a statistical data table. Utilising Biopython (Cock et al., 2009) modules, the user can load their sequence as a FASTA file and download the outputs as an alignment in the same format. As a PWA running on Pyodide (The Pyodide development team, 2023), the code is executed only in the user’s browser, and the user can install the app onto their machine for offline use.
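To make the spacing constraints concrete, the following sketch shows one way CpG positions, gaps, and a minimum-gap filter could be computed. It is an illustration only and does not reproduce the app's algorithm: the function names are hypothetical, the gap is assumed to be the number of nucleotides lying between two consecutive CpGs, and the selection strategy is a simple greedy pass chosen for brevity.

```python
from statistics import mean


def cpg_positions(seq):
    """Return the 0-based index of the C in every CpG dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def cpg_gaps(positions):
    """Nucleotides between consecutive CpGs (assumed gap definition)."""
    return [b - a - 2 for a, b in zip(positions, positions[1:])]


def select_with_min_gap(existing, candidates, min_gap):
    """Greedily accept candidate CpG positions, left to right, that keep
    at least min_gap nucleotides between them and every CpG kept so far."""
    kept = sorted(existing)
    accepted = []
    for pos in sorted(candidates):
        if all(abs(pos - k) - 2 >= min_gap for k in kept):
            kept.append(pos)
            accepted.append(pos)
    return accepted


if __name__ == "__main__":
    sequence = "ACGTTCGAAATTTACGA"
    positions = cpg_positions(sequence)
    gaps = cpg_gaps(positions)
    print("CpG positions:", positions)
    print("Gaps:", gaps)
    if gaps:
        print("Average gap:", mean(gaps))
    # Hypothetical candidate positions where a synonymous swap could add a CpG.
    print("Accepted:", select_with_min_gap(positions, [9, 25, 27], min_gap=3))
```

A greedy pass is easy to follow but is not guaranteed to maximise the number of CpGs or to hit a target average gap; those goals would need a different search strategy.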
Statement of need
There are currently no published tools that allow specifying the spacing of CpG dinucleotides when synonymously recoding sequences. SSE can recode a sequence to a defined CpG frequency, and has a graphical interface (Simmonds, 2012). However, it cannot control the spacing of the CpGs. Newer tools, such as SynMut (Gu & Poon, 2023), require knowledge of R. Bioinformatics packages often present a high access barrier to users who are not familiar with programming languages. Here, the installation and update processes are streamlined thanks to the PWA approach.
With the current lack of CpG recoding tools, researchers may turn to in-house solutions, which can hamper the reproducibility of their results and introduce room for error. Syn-CpG-Spacer makes synonymous recoding of a sequence more efficient than doing so without any support tools.
Research applications
In recent years, there has been an increased research focus on introducing CpGs into viral genomes as a mechanism to create live attenuated virus vaccines, for example against influenza A virus (Gaunt et al., 2016; Sharp et al., 2023). However, the mechanism by which CpGs restrict the virus is unclear; it could involve sensitising the virus to antiviral proteins such as ZAP (Ficarelli et al., 2021) or other poorly characterised effects on viral gene expression. This tool can be used to introduce CpGs with specific spacing into different viral genomes, to determine whether this attenuates the virus in vitro or in vivo and to characterise the mechanism of attenuation, which will aid the development of live attenuated viral vaccines (Le Nouën et al., 2019).
Another potential application for the software is the creation of CpG islands, which are long stretches of DNA rich in CpG dinucleotides that allow for epigenetic control of transcription. While most vertebrate CpGs are methylated, and thus transcriptionally silent, the DNA in CpG islands is hypomethylated, facilitating transcription factor binding (Angeloni & Bogdanovic, 2021).
Acknowledgements
Aleksander Sulkowski is financially supported by the Association of Clinical Pathologists (UK). Clément Bouton and Chad Swanson are supported by the MRC grant MR/W018519/1.
References
- Angeloni A, Bogdanovic O. Sequence determinants, function, and evolution of CpG islands. Biochemical Society Transactions. 2021;49(3):1109–1119. doi: 10.1042/BST20200695.
- Bokeh Development Team. Bokeh: Python library for interactive visualization (Version 3.3.4) 2024. https://docs.bokeh.org/en/3.3.4/
- Cock PJA, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, Friedberg I, Hamelryck T, Kauff F, Wilczynski B, de Hoon MJL. Biopython: Freely available python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009;25(11):1422–1423. doi: 10.1093/bioinformatics/btp163.
- Ficarelli M, Neil SJD, Swanson CM. Targeted restriction of viral gene expression and replication by the ZAP antiviral system. Annual Review of Virology. 2021;8(1):265–283. doi: 10.1146/annurev-virology-091919-104213.
- Forni D, Pozzoli U, Cagliani R, Clerici M, Sironi M. Dinucleotide biases in RNA viruses that infect vertebrates or invertebrates. Microbiology Spectrum. 2023:e02529. doi: 10.1128/spectrum.02529-23.
- Gaunt E, Wise HM, Zhang H, Lee LN, Atkinson NJ, Nicol MQ, Highton AJ, Klenerman P, Beard PM, Dutia BM, Digard P, et al. Elevation of CpG frequencies in influenza A genome attenuates pathogenicity but enhances host response to infection. eLife. 2016;5:e12735. doi: 10.7554/eLife.12735.
- Gonçalves-Carneiro D, Mastrocola E, Lei X, DaSilva J, Chan YF, Bieniasz PD. Rational attenuation of RNA viruses with zinc finger antiviral protein. Nature Microbiology. 2022;7(10):1558–1567. doi: 10.1038/s41564-022-01223-8.
- Greenbaum BD, Levine AJ, Bhanot G, Rabadan R. Patterns of evolution and host gene mimicry in influenza and other RNA viruses. PLOS Pathogens. 2008;4(6):e1000079. doi: 10.1371/journal.ppat.1000079.
- Gu H, Poon LLM. SynMut: Designing synonymously mutated sequences with different genomic signatures (Version 1.16.0). Bioconductor version: Release (3.17). 2023. doi: 10.18129/B9.bioc.SynMut.
- Le Nouën C, Collins PL, Buchholz UJ. Attenuation of human respiratory viruses by synonymous genome recoding. Frontiers in Immunology. 2019;10. doi: 10.3389/fimmu.2019.01250.
- MDN Web Docs. Progressive web apps. 2023 Oct 25. https://developer.mozilla.org/en-US/docs/Web/Progressive_web_apps
- Panel Development Team. Holoviz/panel: The powerful data exploration & web app framework for python (Version v1.3.8). Zenodo; 2024.
- Sharp CP, Thompson BH, Nash TJ, Diebold O, Pinto RM, Thorley L, Lin Y-T, Sives S, Wise H, Hendry SC, Grey F, et al. CpG dinucleotide enrichment in the influenza A virus genome as a live attenuated vaccine development strategy. PLOS Pathogens. 2023;19(5):e1011357. doi: 10.1371/journal.ppat.1011357.
- Simmonds P. SSE: A nucleotide and amino acid sequence analysis platform. BMC Research Notes. 2012;5(1):1–10. doi: 10.1186/1756-0500-5-50.
- The Pyodide development team. Pyodide/pyodide: A python distribution for the browser and node.js based on WebAssembly (Version 0.24.1). Zenodo; 2023.
