Abstract
Bacillus Calmette Guerin (BCG) vaccines comprise a family of related strains. Whole genome sequencing has allowed the better characterisation of the differences between many of the BCG vaccines. As sequencing technologies improve, updating of publicly available sequence data becomes common practice. We hereby announce the draft genome of four commonly used BCG vaccines in Brazil, Argentina and Venezuela.
Key words: mycobacteria, BCG, whole genome sequencing.
Mycobacterium bovis Bacillus Calmette Guerin, commonly known as BCG, is the only vaccine against tuberculosis. The original BCG strain was obtained by serial passages of a M. bovis strain in potato-bile media. 1 Deletion of the region of difference (RD) 1 was later confirmed as one of the reasons for the attenuation of its virulence. 2 , 3 After its first use in humans, the vaccine was sent to different laboratories worldwide where different culturing conditions originated strains with different genetic compositions. 4
At present, there are more than 10 different vaccine strains being administered worldwide. 5 In two countries in Latin-America, namely Venezuela and Argentina, the strains BCG Danish 1331 (Statens serum Institut, Denmark), BCG Pasteur 1173P2 (Instituto Nacional de Producción de Biológicos ― ANLIS Carlos G Malbrán, Argentina) and BCG Sofia SL222 (BB NCIPD Ltd, Bulgaria) are licensed for use. The vaccine BCG Pasteur produced in Argentina is a secondary seed lot of the French BCG Pasteur strain 1173P2 and is administered in the Province of Buenos Aires, while the rest of the country is vaccinated either with the Sofia or the Danish strain. In Brazil, BCG Moreau RDJ (Fundação Ataulpho de Paiva, Brazil) was used as a vaccine until 2017, when it was replaced by the Russian strain.
Whole genome sequencing data of the strains Moreau, Pasteur and Danish are already available 6 , 7 , 8 ) and obtained either by using shotgun sequencing and specific primers designed to close the gaps in the assembly (for Moreau and Pasteur strains) or a combination of Illumina and PacBio technology (for the Danish strain). BCG Sofia has so far only been subjected to whole genome analysis using microarrays. 9 We sequenced the genome of these four vaccine strains with Illumina technology in an effort to update the sequencing data available and for BCG Sofia, we report the first sequence data obtained with newer technology.
Genome sequencing of the four vaccine strains was performed using the Nextera XT DNA Library preparation kit on an Illumina HiSeq 2500 platform. De novo assembly was done using Unicycler 10 and annotated with RAST. 11 To determine intra-strain genomic variability of each vaccine, we compared the genomes with previous assemblies obtained from the NCBI 6 , 7 , 8 using the software Artemis Comparison tool 12 and Snippy. 13 The strain BCG Sofia SL222 originated from the Russian vaccine BCG-1 and was chosen as a master seed at the BCG Bulgarian laboratory. 9 Because there is no whole genome assembly available for BCG Sofia SL222, we decided to use the assembly of its parental strain BCG-1 Russia for the comparative studies. 14
Among the four genomes, we obtained between 82 and 108 contigs, an average guanine-cytosine content (GC) of 65%, a size ranging between 4.2 and 4.3 Mb and the number of coding sequences (CDS) between 4205 and 4245 (Table I). The differences in the size of BCG strains genomes we noticed when compared to those available in public databases is probably due to variation in sequencing technologies and of assemblers used.
TABLE I. Assembly statistics for the four vaccine strains sequenced.
| Number of contigs | Moreau RDJ | Pasteur 1173P2 | Sofia SL222 | Danish 1331 |
| 82 | 93 | 102 | 108 | |
| Genome size (bp) | 4288.245 | 4.192.545 | 4.201.889 | 4.202.807 |
| Coverage | 414X | 107X | 101X | 94X |
| % GC | 65.62 | 65.48 | 65.45 | 65.47 |
| N50 | 197411 | 84414 | 70691 | 70718 |
| CDS | 4232 | 4205 | 4245 | 4227 |
| tRNAs | 47 | 47 | 47 | 47 |
bp: base-pairs; %GC: guanine-cytosine content; CDS: coding sequences; tRNA: transfer RNA.
The genome of BCG Moreau RDJ strain revealed 55 single nucleotide polymorphisms (SNPs) compared to that of the shotgun sequencing based genome of the same strain obtained in 2011, 28 of these SNPs are non-synonymous (ns) (Table II). We also detected five insertions and four deletions of 3-4 nucleotides (data not shown) and an inverted IS1608 transposase gene (position 3717335-3717826 bp).
TABLE II. Non-synonymous single nucleotide polymorphisms (SNPs) found in Bacillus Calmette Guerin (BCG) Moreau when using assembly NZ_AM412059* as a reference.
| Position | NZ_AM412059 | BCG Moreau | AA change | Gene |
| 404956 | T | G | Glu713Asp | Iron-sulphur-binding reductase |
| 555536 | C | T | Gly164Glu | FIG00821074: hypothetical protein |
| 555569 | G | A | Pro153Leu | FIG00821074: hypothetical protein |
| 570675 | T | G | Lys91Asn | Aliphatic amidase AmiE |
| 878380 | G | T | Gly233Val | Protease II (EC 3.4.21.83) |
| 1217552 | T | G | His65Gln | PE family protein |
| 1618404 | A | G | Asp284Gly | Anaerobic dimethyl sulfoxide reductase chain A |
| 1618472 | A | C | Met307Leu | Anaerobic dimethyl sulfoxide reductase chain A |
| 1618722 | G | C | Arg390Pro | Anaerobic dimethyl sulfoxide reductase chain A |
| 1618779 | T | G | Val409Gly | Anaerobic dimethyl sulfoxide reductase chain A |
| 1731072 | G | A | Ala234Thr | Sorbitol-6-phosphate 2-dehydrogenase |
| 1985896 | C | G | Pro114Ala | L-gulono-1,4-lactone oxidase |
| 2400116 | A | G | Leu184Pro | Cell division protein FtsL / proline rich membrane protein |
| 2651260 | C | G | Ala266Gly | PE family protein |
| 2701298 | G | T | Pro413Thr | Ribonuclease E |
| 2760281 | T | C | Ser266Gly | GTP-binding protein Obg |
| 2760610 | G | C | Ala156Gly | GTP-binding protein Obg |
| 2760682 | T | C | Glu132Gly | GTP-binding protein Obg |
| 3149570 | C | G | Leu224Val | Coenzyme F420-dependent oxidoreductase |
| 3273878 | A | G | Val602Ala | ATP-dependent DNA helicase RecG |
| 3365033 | A | C | Trp93Gly | Transcriptional regulator, TetR family |
| 3809510 | C | G | Gly67Ala | FIG00820542: hypothetical protein |
| 3879667 | A | G | Asn344Asp | GTP-binding protein Obg |
| 3881120 | T | G | Ile828Ser | GTP-binding protein Obg |
| 3881141 | C | A | Thr835Asn | GTP-binding protein Obg |
| 3891798 | T | G | Asp162Ala | Long-chain fatty-acid-CoA ligase Mycobacterial subgroup FadD19 |
| 3963021 | C | G | Val222Leu | Transcriptional regulator, LacI family |
| 4172275 | T | G | Met67Leu | Membrane proteins related to metalloendopeptidases |
*: accession number for the assembly of BCG Moreau reported by Gomes et al.(6) A: adenine; G: guanine; C: cytosine; T: thymine; Glu: glutamic acid; Asp: aspartic acid; Gly: glycine; Pro: proline; Leu: leucine; Lys: lysine; Asn: asparagine; Val: valine; His: histidine; Gln: glutamine; Met: methionine; Arg: arginine; Thr: threonine; Ser: serine; Ala: alanine; Trp: tryptophan; Ile: isoleucine.
Upon sequencing BCG Sofia SL222 and after comparison with the BCG-1 Russian strain, we observed one synonymous (s) SNP in the gene coding for an uridylyltransferase, in addition to three inverted regions of 42,965 bp, 17,778 bp and 6,765 bp in length. Furthermore, by mapping the reads obtained from the Sofia strain to the genome of the Danish vaccine strain, we confirmed the presence of the 1.6 kb deletion described by Stefanova et al. 9 This deletion affects part of the gene coding for type II toxin-antitoxin system VapC family toxin, the gene for the antitoxin VapB48 and part of the glutamate - cysteine ligase gene.
The genome of BCG Danish 1331 was the last to be assembled by using a combination of Illumina and PacBio reads. 7 One advantage of performing PacBio sequencing is that it generates longer reads that improves detection of repeated regions and duplications. Upon sequencing, we observed five SNPs including four nsSNP and a stop codon (Table III). We also observed a deletion of five nucleotides in a SRPBCC family protein gene and two inversions of 26,170 bp and 7,565 bp.
TABLE III. Non-synonymous single nucleotide polymorphisms (SNPs) found in Bacillus Calmette Guerin (BCG) Danish when using assembly NZ_CP039850* as a reference.
| Position | NZ_CP039850 | BCG Danish | AA change | Gene |
| 593769 | C | T | Gln323** | SDR family oxidoreductase |
| 2076695 | C | A | Ala142Ser | M56 family metallopeptidase |
| 2500583 | T | C | His260Arg | Sulfotransferase |
| 3745609 | G | T | Ser434Tyr | PPE family protein |
| 3839864 | T | G | Thr135Pro | IMP dehydrogenase |
*: accession number for the sequencing of BCG Danish reported by Borgers et al.(7); **: indicates a stop codon; A: adenine; G: guanine; C: cytosine; T: thymine; Gln: glutamine; Ala: alanine; Ser: serine; Thr: threonine; Pro: proline; Tyr: tyrosine.
Genome assembly of BCG Pasteur presented a nsSNP in the GTP-binding protein Obg gen (Asn599Asp) and two inframe insertions of three nucleotides each in the genes coding for NADPH epimerase/NADPH dehydratase and a probable cutinase. We also found one inverted region of 31,516 pb.
De novo sequencing of genomes deposited in public databases becomes imperative as new sequencing technologies arise. Recently, Abdallah et al. 15 reviewed the genomes and transcriptomes of fourteen BCG vaccine strains and together with the work of Borgers on the Danish vaccine comprise the most recent studies in BCG strains genealogy. We announce the initial draft genome of four of the most common BCG vaccines licensed worldwide in an effort to contribute to the update of publicly available data. The comparative analysis of BCG strains remains of crucial importance to trace their divergence in terms of genetic sequence, transcription and proteomic profile and, subsequently, to describe possible variation in the protective efficacy.
Accession numbers - The reads of each genome have been deposited under SRA accession PRJNA575846, BioProject ID: PRJNA575846.
ACKNOWLEDGEMENTS
To the sequencing platform of Fiocruz (RPT01J) and Ricardo Junqueira for assistance in the preparation of libraries. We also acknowledge Kamila Chagas Peronni from the Laboratory of Molecular Genetics and Bioinformatics from the Regional Centre of Haemotherapy and Professor Valdes Bollela from the School of Medicine, São Paulo University in Ribeirão Preto.
Footnotes
Financial support: This study was financed in part by the CAPES (Finance Code 001), CNPq, FAPERJ. PNS was supported by CNPq (grant PQ 310418/2016-0).
REFERENCES
- 1.Calmette A. Preventive vaccination against tuberculosis with BCG. Proc R Soc Med. 1931;24(11):1481–1490. doi: 10.1177/003591573102401109. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Behr MA. BCG - Different strains, different vaccines. Lancet Infect Dis. 2002;2(2):86–92. doi: 10.1016/s1473-3099(02)00182-2. [DOI] [PubMed] [Google Scholar]
- 3.Mahairas GG, Sabo PJ, Hickey MJ, Singh DC, Stover CK. Molecular analysis of genetic differences between Mycobacterium bovis BCG and virulent M bovis. J Bacteriol. 1996;178(5):1274–1282. doi: 10.1128/jb.178.5.1274-1282.1996. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Tran V, Liu JUN, Behr MA. BCG vaccines. Microbiol Spectr. 2014;2(1):1–11. doi: 10.1128/microbiolspec.MGM2-0028-2013. [DOI] [PubMed] [Google Scholar]
- 5.Zwerling A, Behr MA, Verma A, Brewer TF, Menzies D, Pai M. The BCG world atlas a database of global BCG vaccination policies and practices. PLoS Med. 2011;8(3):1–7. doi: 10.1371/journal.pmed.1001012. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Gomes LHF, Otto TD, Vasconcellos ÉA, Ferrão PM, Maia RM, Moreira AS. Genome sequence of Mycobacterium bovis BCG Moreau, the Brazilian vaccine strain against tuberculosis. J Bacteriol. 2011;193(19):5600–5601. doi: 10.1128/JB.05827-11. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Borgers K, Ou JY, Zheng PX, Tiels P, Van Hecke A, Plets E. Reference genome and comparative genome analysis for the WHO reference strain for Mycobacterium bovis BCG Danish, the present tuberculosis vaccine. BMC Genomics. 2019;20(1):1–14. doi: 10.1186/s12864-019-5909-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Brosch R, Gordon SV, Garnier T, Eiglmeier K, Frigui W, Valenti P. Genome plasticity of BCG and impact on vaccine efficacy. Proc Natl Acad Sci USA. 2007;104(13):5596–5601. doi: 10.1073/pnas.0700869104. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Stefanova T, Chouchkova M, Hinds J, Butcher PD, Inwald J, Dale J. Genetic composition of Mycobacterium bovis BCG substrain Sofia. J Clin Microbiol. 2003;41(11):5349–5349. doi: 10.1128/JCM.41.11.5349.2003. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Wick RR, Judd LM, Gorrie CL, Holt KE. Unicycler resolving bacterial genome assemblies from short and long sequencing reads. PLoS Comput Biol. 2017;13(6):1–22. doi: 10.1371/journal.pcbi.1005595. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Aziz RK, Bartels D, Best A, De Jongh M, Disz T, Edwards RA. The RAST server rapid annotations using subsystems technology. BMC Genomics. 2008;9:1–15. doi: 10.1186/1471-2164-9-75. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Carver TJ, Rutherford KM, Berriman M. Rajandream M-A.Barrell BG.Parkhill J ACT the Artemis Comparison Tool. Bioinformatics. 2005;21(16):3422–3423. doi: 10.1093/bioinformatics/bti553. [DOI] [PubMed] [Google Scholar]
- 13.Seemann T. Snippy: fast bacterial variant calling from NGS reads. V. 4.1. 2015 [Google Scholar]
- 14.Sotnikova EA, Shitikov EA, Malakhova MV, Kostryukova ES, Ilina EN, Atrasheuskaya AV. Complete genome sequence of Mycobacterium bovis strain BCG-1 (Russia) Genome Announc. 2016;4(2):1–2. doi: 10.1128/genomeA.00182-16. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Abdallah AM, Hill-Cawthorne GA, Otto TD, Coll F, Guerra-Assunção JA, Gao G. Genomic expression catalogue of a global collection of BCG vaccine strains show evidence for highly diverged metabolic and cell-wall adaptations. Sci Rep. 2015;5:15443–15443. doi: 10.1038/srep15443. [DOI] [PMC free article] [PubMed] [Google Scholar]
