Abstract
A highly invasive form of non-typhoidal Salmonella (iNTS) disease has been recently documented in many countries in sub-Saharan Africa. The most common Salmonella enterica serovar causing this disease is Typhimurium. We applied whole-genome sequence-based phylogenetic methods to define the population structure of sub-Saharan African invasive Salmonella Typhimurium and compared these to global Salmonella Typhimurium isolates. Notably, the vast majority of sub-Saharan invasive Salmonella Typhimurium fell within two closely-related, highly-clustered phylogenetic lineages that we estimate emerged independently ~52 and ~35 years ago, in close temporal association with the current HIV pandemic. Clonal replacement of isolates of lineage I by lineage II was potentially influenced by the use of chloramphenicol for the treatment of iNTS disease. Our analysis suggests that iNTS disease is in part an epidemic in sub-Saharan Africa caused by highly related Salmonella Typhimurium lineages that may have occupied new niches associated with a compromised human population and antibiotic treatment.
INTRODUCTION
S. enterica is a diverse bacterial species that remains a common cause of infectious disease in humans and animals throughout the world1. Human Salmonella infections are classically divided into diseases caused by typhoidal or non-typhoidal salmonella (NTS). The former category includes the human restricted S. enterica serovars Typhi and Paratyphi that cause the systemic disease typhoid, while NTS is comprised of the majority of other serovars that predominantly cause self-limiting gastroenteritis in humans2. Salmonella Typhi is a human restricted pathogen that is transmitted from human to human, whereas NTS disease is normally associated with zoonotic reservoirs, typically domesticated animals, with little or no sustained human-to-human transmission.
In contrast to this classical view, NTS are a frequent cause of invasive bacterial disease in many countries in sub-Saharan Africa3,4. This invasive form of NTS disease (iNTS) is common both in children with malnutrition, severe anemia, malaria or HIV4,5 and HIV-infected adults6, frequently surpassing Salmonella Typhi in many parts of the region as the dominant cause of invasive salmonellosis. The clinical presentation of this disease is distinct from both gastroenteritis and typhoid fever, characterized by a non-specific fever that can be indistinguishable from malaria and in rare cases is accompanied by diarrhea7. The frequency of NTS-associated case fatalities can be extremely high in both adults and children (22% – 45%)6,8-10.
S. enterica serovar Typhimurium (Salmonella Typhimurium) is one of the serovars most frequently associated with iNTS in the sub-Saharan region, although other serovars including S. enterica serovar Enteritidis have also been implicated 3,4,8. We previously reported that Salmonella Typhimurium isolates from Kenya and Malawi were predominantly of a novel multi-locus sequence type (ST) designated ST3137, rarely isolated from outside sub-Saharan Africa. The DNA sequence of representative multi-drug-resistant (MDR) ST313 isolates, D23580 and A130, identified genomic features distinct from previously characterized gastroenteritis-associated strains7. These features included evidence of partial genome degradation, with some parallels to that observed in the S. enterica serovars Typhi and Paratyphi A that has been linked to niche adaptation11,12.
Here we use single nucleotide polymorphism (SNP)-based phylogenetic methods based on whole genome sequences to determine the population structure of a geographically-diverse collection of invasive Salmonella Typhimurium isolates from different sub-Saharan African countries. These data are placed in the phylogenetic context of Salmonella Typhimurium isolates from other parts of the world. We provide evidence that two tightly clustered genetic lineages have emerged within the last sixty years to be a dominant cause of an epidemic invasive Salmonella Typhimurium disease in the region. We highlight the potential role of antibiotic-resistance acquisition in driving the epidemic and the temporal association with an increased prevalence of HIV.
RESULTS
Phylogenetic analysis of Salmonella Typhimurium
Salmonella Typhimurium represents an unstratified serologically-defined group within the broader species S. enterica13. Therefore, in order to place the invasive Salmonella Typhimurium isolates from sub-Saharan Africa into an evolutionary and phylogenetic context we exploited whole genome sequencing to discover potentially informative SNPs within a collection of 179 Salmonella Typhimurium isolates spanning the period from 1938 to 2010, collected from different parts of the world. Our collection included 129 invasive Salmonella Typhimurium from Malawi, Kenya, Mozambique, Uganda, the Democratic Republic of Congo (DRC), Nigeria and Mali (Supplementary Table 1). Data were available for 10,623 high-quality SNPs, corresponding to approximately 1 SNP for every 407 bp, that were distributed relatively uniformly across the genome of the reference Salmonella Typhimurium SL1344. To refine phylogenetic analysis, SNPs associated with repetitive sequences, mobile elements and phage sequences, representing ~4% of the genome, were excluded. We detected no evidence of extensive recombination within the remaining genomic sequences and consequently SNPs mapping to these regions were used to reconstruct a maximum-likelihood phylogenetic tree14 (Fig. 1).
Notably, invasive Salmonella Typhimurium isolates from sub-Saharan Africa fall predominantly into two distinct ST313 phylogenetic lineages designated as lineages I and II. Furthermore, these lineages form distinct and extremely tight clusters on separate branches from other Salmonella Typhimurium that were isolated elsewhere in the world. The tight clustering is illustrated by the fact that isolates within either lineage I or lineage II are separated by mean differences as few as 33 and 21 SNPs respectively. Both lineages are thus more closely related to each other than to any other Salmonella Typhimurium within the tree. The two invasive Salmonella Typhimurium lineages are joined to the main tree by relatively long branches, but there has been divergence at the branch tips suggesting recent clonal or population expansion. Multi-Locus-Sequence-Typing (MLST) analysis confirmed lineages I and II as ST313, although a single isolate, 5580 from lineage I, is ST394, a single locus variant of ST313 (Supplementary Fig. 1). All eight invasive Salmonella Typhimurium isolates from sub-Saharan Africa which fall outside of lineages I and II are ST19, a common ST to which 82% (41/50) of the 50 non-African Salmonella Typhimurium that we sequenced belong. Other STs represented in the non-sub-Saharan Salmonella Typhimurium lineages include ST34 (5/50), ST98 (1/50), ST128 (2/50) and ST568 (2/50) (Supplementary Fig. 1).
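The tightness of a cluster can be summarized as the mean pairwise SNP distance among its isolates. The sketch below shows one natural way to compute such a figure from an alignment of variable sites; the text does not state exactly how the reported means were derived, so the function names and the toy alignment are illustrative assumptions, not the study's pipeline or data.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count alignment columns at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def mean_pairwise_snps(alignment: dict[str, str]) -> float:
    """Average SNP distance over all unordered pairs of isolates."""
    pairs = list(combinations(alignment.values(), 2))
    if not pairs:
        raise ValueError("need at least two isolates")
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


# Toy, made-up alignment of variable sites (hypothetical, not real isolates).
toy_lineage = {
    "isolate_1": "ACGTAC",
    "isolate_2": "ACGTAT",
    "isolate_3": "ACCTAT",
}
toy_mean = mean_pairwise_snps(toy_lineage)
```

A small mean relative to the total number of SNP sites in the collection is what "tight clustering" amounts to numerically.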
Temporal and geographic distribution relative to phylogeny
We performed BEAST15 analysis on 129 sub-Saharan invasive Salmonella Typhimurium isolates from seven sub-Saharan African countries covering a 22-year period from 1988 to 2010. BEAST is designed to reconstruct evolutionary history within the context of geographical distribution over time from sampled DNA sequences16, and has been used extensively in bacterial17-20, viral21,22 and eukaryotic population studies23. From this analysis, a single maximum clade credibility tree was produced for each lineage (Fig. 2a & 2b). The mean evolutionary rates, assuming a Bayesian skyline model of population size change and a relaxed molecular clock, were estimated to be 1.9 × 10⁻⁷ and 3.9 × 10⁻⁷ substitutions site⁻¹ year⁻¹, for lineage II and I respectively. This corresponds to an accumulation of approximately 1-2 SNPs genome⁻¹ year⁻¹, which is similar to that calculated for the enteric pathogen Vibrio cholerae (8 × 10⁻⁷ site⁻¹ year⁻¹)24 and lies between the rates estimated for Yersinia pestis (2 × 10⁻⁸)25 and Staphylococcus aureus (3 × 10⁻⁶)26. The topologies of the BEAST and maximum-likelihood trees were congruent, and the recovered nodes were supported with high posterior probabilities and bootstrap values, respectively.
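The conversion from a per-site rate to SNPs per genome per year is a single multiplication by genome length. The sketch below reproduces that arithmetic; the genome length of roughly 4.88 Mb for the reference chromosome is an assumption introduced here for illustration and is not stated in the text.

```python
# Assumed approximate length of the reference chromosome in base pairs.
# This value is NOT given in the text; it is an illustrative assumption.
GENOME_LENGTH_BP = 4_880_000


def snps_per_genome_per_year(rate_per_site_per_year: float,
                             genome_length_bp: int = GENOME_LENGTH_BP) -> float:
    """Scale a per-site substitution rate to a whole-genome yearly rate."""
    return rate_per_site_per_year * genome_length_bp


# The two mean rates reported in the text.
low = snps_per_genome_per_year(1.9e-7)
high = snps_per_genome_per_year(3.9e-7)
```

Under this assumed genome length, the two rates bracket the "approximately 1-2 SNPs per genome per year" quoted above.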
A time-dependent phylogeographic reconstruction of lineage I, which is estimated to have emerged ~52 years ago (95% highest posterior density (HPD), 1920.4 – 1979.5; Fig. 2a), indicated that in our collection, isolates from Malawi diverged earliest from the last common ancestor for this lineage. Although we cannot completely eliminate the potential bias due to the number of Malawian isolates analyzed within this lineage, 25 permutation datasets using 10 randomly selected Malawi isolates (a different set of 10 isolates, an equivalent sample size to those from other countries, was used for each permutation) returned similar results to the complete data set. Thus, we are confident of our estimates of the age and geographical origin of the ancestral node of this lineage (Supplementary Fig. 2a & 2b). Analyses of the distribution of isolates from each country and the tree topology of lineage I are consistent with at least four independent transmission events or movements across Southeastern Africa, with Malawi having served as a potentially important early hub (Fig. 3a, Supplementary Fig. 3a). The earliest identifiable waves or transmissions were from Malawi to Kenya dated ~1982 (95% HPD, 1967.6 – 1990.2) and between Malawi and DRC dated ~1983 (95% HPD, 1974.8 – 1988.3). This same phylogenetically-linked wave was identified in Uganda ~1989 (95% HPD, 1980.0 – 1994.6) and a further onward wave was identifiable in Mozambique in ~1990 (95% HPD, 1981.0 – 1994.4) and this manifested as a second introduction into Uganda in ~2001 (95% HPD, 1981.0 – 1994.4). We cannot identify the specific geographical route that these bacterial lineages followed but the phylogenetic evidence clearly temporally links these outbreaks as a single epidemic. Our results also show evidence of geographical clustering after a transmission event introduced the lineage into a country. This suggests that the epidemic clone was introduced a limited number of times into each country, giving rise to localized epidemics or outbreaks.
Invasive Salmonella Typhimurium isolates of lineage I disappeared from our collection between 2003 and 2005 and were replaced by lineage II, and isolates after 2006 were found exclusively in this cluster. Lineage II is estimated to have emerged ~35 years ago (95% HPD, 1957.1 – 1986.8), making it genetically younger than lineage I (Fig. 2b). The spread of lineage II also appears to have occurred in several waves (Fig. 3b, Supplementary Fig. 3b). Our deepest-rooted isolates are from DRC with evidence for transmission onward to Uganda dated ~1985 (95% HPD, 1972.6 – 1990.6). This wave was detected in Kenya and Malawi between 1994 and 1996. Malawi likely represents a more recent hub for further dispersal of invasive Salmonella Typhimurium lineage II isolates between 1995 and 1998 to several countries, including neighboring Mozambique and further westwards, across the sub-Saharan region, to Mali and Nigeria. A more recent wave of this lineage appears to have spread from Kenya, arriving back in Malawi around 2002. We also see evidence for localized epidemics associated with the lineage II clones, as highlighted by clustering based on geography. Indeed, local epidemiology and molecular typing in Malawi and Kenya7,8 of invasive Salmonella Typhimurium isolates from 1997 to 2006 describe a local clonal replacement event of lineage I by lineage II associated with the emergence of chloramphenicol resistance in an 18-month period from 2001 to 2003.
Evolution of MDR and potential role of cat gene in clonal replacement
Previously, we characterized two distinct composite Tn21-like transposition elements encoding multiple antibiotic resistance determinants located on pSLT, the so-called virulence-associated plasmid, in two representative invasive Salmonella Typhimurium isolates, A130 (lineage I) and D23580 (lineage II)7. These Tn21 elements are inserted at different sites in the pSLT virulence plasmid in each isolate. Notably, in our phylogenetic analysis, we found these insertion sites to be identical within each lineage but different between lineages, suggesting that Tn21 element acquisition was an independent and early event in each lineage (Fig. 4, Supplementary Fig. 4). Interestingly, only one isolate from lineage I (A24924) and one from lineage II (254DRC) do not have a Tn21-like element (Fig. 4). Comparative analyses of these two isolates, which significantly are the most deeply rooted isolates in each lineage, revealed that although the relevant variant of the Tn21 element is absent in both isolates (Fig. 2a & 2b), they share the pSLT plasmid backbone with other isolates of the same lineage. This suggests that in each case they share a common ancestor with other isolates within the same lineage, but importantly the ancestor existed before the acquisition of the composite Tn21-like elements (see Supplementary Data). With the exception of a deletion in istA, a transposase of insertion sequence IS1326, in A16083, the lineage I-specific Tn21 locus is relatively highly conserved in most isolates of lineage I (Fig. 4b). In contrast, the Tn21-like locus encoded by lineage II appears somewhat unstable since isolates in different parts of the tree have lost subsets of genes (14DRC, 5582, J17 and A32751) (Fig. 4a; Supplementary Fig. 5).
One striking feature of the dataset is the absence of a chloramphenicol resistance (cat) gene from all isolates in lineage I. In contrast, the gene was present in > 97% of lineage II, being absent in two isolates (Fig. 4a). These isolates are 254DRC, which does not possess a Tn21 element and 5582, a 2005 Kenyan isolate where the loss of the cat gene is due to a simple deletion event (Fig. 4a). This strongly suggests an independent acquisition of the cat gene borne on a lineage II-specific Tn21 element early on in the genealogy, most likely about the time of expansion from DRC, as shown in Figure 2b (median node date 1984; 95% HPD, 1972.6 – 1990.6; state posterior probability = 0.78). The analysis of MDR acquisition is consistent with the antibiotic resistance profiles obtained for the isolates. In some of our sampling sites, such as Malawi, the acquisition of resistance to chloramphenicol was observed in invasive Salmonella Typhimurium isolates from around 2001-2004, consistent with the arrival of lineage II7. At this time, chloramphenicol was the drug of choice for treatment for suspected severe bacterial infections and cases of iNTS infection confirmed by blood culture. The acquisition of chloramphenicol resistance may have afforded lineage II clones a greater opportunity to survive treatment and transmit, which could have in turn contributed to the clonal replacement of lineage I isolates observed between 2003 and 2005 and the subsequent expansion of lineage II clones thereafter.
Transmission is temporally associated with HIV and the HIV pandemic
Time-dependent phylogeographic analysis revealed the clonal expansion of two distinct invasive Salmonella Typhimurium lineages within the last 40-50 years that was accompanied by spread across multiple countries of sub-Saharan Africa. Intriguingly, this emergence temporally coincides with the HIV epidemic in sub-Saharan Africa. Molecular clock analysis of HIV-1 genome sequences suggested that the pandemic began at the start of the twentieth century27-29, with prevalence peaking in the 1990s in many countries, including those represented within our strain collection (2% in Mali to over 15% in Malawi) (Fig. 2c; Supplementary Fig. 6). An association with the HIV status of the patients is also reflected in terms of the samples analyzed in this study. For example, where a test was conducted for HIV, all adult samples were positive. One of the first reported cases of HIV infection in Africa was from an adult in DRC30 and interestingly, the earliest geographic localization of epidemic clones from lineage II is within this country. Thus, the Congo basin provides a potential origin of invasive Salmonella Typhimurium lineage II31. It therefore appears possible that the epidemic of invasive Salmonella Typhimurium and transmission across the sub-Saharan region was potentiated by an increase in the critical population of susceptible and immunocompromised individuals, particularly more mobile adults.
DISCUSSION
The recent reporting of a remarkably high incidence of invasive Salmonella Typhimurium in various parts of the sub-Saharan African region makes it increasingly important to understand the evolutionary origins and spatio-temporal spread of these isolates. Recently, whole-genome sequencing methods have been used to trace intercontinental transmission of different recently emerged and closely related bacterial pathogens18,24,26,32, and we have therefore applied this high resolution analysis to determine the phylogenetic structure of invasive Salmonella Typhimurium. Here we revealed that the vast majority of Salmonella Typhimurium isolates associated with invasive disease from sub-Saharan Africa comprised just two highly conserved lineages of MLST type ST313 that are more closely related to each other than any other known Salmonella Typhimurium lineage. This is in contrast to the considerable phylogenetic variation of Salmonella Typhimurium isolates associated with gastroenteritis or invasive disease from outside sub-Saharan Africa. Thus, invasive Salmonella Typhimurium disease in this region is in part a previously unrecognized epidemic caused by the spread of the clones from these two lineages.
We show how invasive Salmonella Typhimurium transmission into a particular country or geographic area occurs as a discrete, temporally defined introduction that is followed by subsequent spread within that particular location (Fig. 2), although some local regions have experienced multiple introduction events. For example, it is evident that two independent introductions occurred in Mali between 1995 and 2000 (Fig. 2b). Considerable clonal expansion has occurred independently in each of these two lineages beginning around 1960. Independent acquisition of a Tn21 element encoding multiple drug resistance genes by both lineages may have facilitated their successful transmission across the sub-continent, within the susceptible host population. A later acquisition of a cat gene on the composite element within lineage II may have contributed to a clonal replacement event, which occurred between 2003 and 2005, and resulted in greater spatial dispersion of clones from this lineage over sub-Saharan Africa. An association between acquisition of chloramphenicol resistance and increased transmission has been observed in early epidemiological studies on chloramphenicol-resistant Salmonella Typhi in Mexico33 and is also supported by observations reported in Kenya7 and Malawi8.
HIV increases susceptibility to iNTS infections34 and this form of bacteraemia is an AIDS-defining opportunistic infection in adults35,36. Further, animal models of co-infection between iNTS and simian immunodeficiency virus (SIV)37 or malaria38 indicate that host immune status plays a critical role in determining the outcome of Salmonella infections. Indeed, sporadic human invasive disease is a feature of the non-ST313 lineages of Salmonella Typhimurium. Thus, although ST313 is the dominant form of invasive Salmonella disease in sub-Saharan Africa3,39, it is not unexpected that other S. enterica or indeed Salmonella Typhimurium lineages can also cause sporadic disease. Notably, supporting epidemiological evidence indicates that the ST313 Salmonella Typhimurium lineages may not have reached some parts of Africa, including the Gambia40,41 and Ethiopia42,43 where iNTS has been reported.
It is particularly noteworthy that we see a temporal association of clonal expansion of invasive Salmonella Typhimurium with the peaks in HIV prevalence, particularly in adults in the countries included in our study. The rapid expansion and spread of these clones may have been facilitated by the dramatic expansion of a mobile susceptible host population. Previous analysis has shown that HIV-1 arrived in East and Central Africa around the 1950s, followed by an eastwards expansion in the 1970s and early 1980s44. We find temporal parallels between this estimated HIV-1 expansion time frame and our estimate of the earliest detectable transmissions in lineage I around the early 1980s (95% HPD, 1967.6 – 1990.2). The continued expansion of the HIV-susceptible population until the peaks of prevalence in the 1990s (Fig. 2c), together with the acquisition of additional chloramphenicol resistance, likely contributed to the greater dispersal of lineage II clones. The association of iNTS disease with malaria, anemia and malnourishment in children is well-documented4,5,45-47 and we have isolates within our collection from children with these underlying conditions (Supplementary Table 1). Malnourished and malarial children thus present an additional ecological niche that co-exists with as well as precedes the HIV-positive population. Notably, we found no evidence of phylogenetic segregation between such isolates and those from HIV-positive children or adults within the two epidemic lineages. This is consistent with immunosuppression as a key predisposing factor in iNTS disease. However, the emergence of a large cohort of HIV-infected adults may also have facilitated the spread of the invasive Salmonella Typhimurium lineages, as adults are inevitably more mobile. This is especially pertinent since failure of immunological control of iNTS infections in HIV-positive African adults has been well-documented34,48.
The resulting large pool of immunosuppressed individuals may also facilitate an unusual human-to-human transmission (anthroponotic) component in invasive Salmonella Typhimurium disease, in contrast to most disease caused by NTS outside of Africa, where transmission is predominantly zoonotic49. There is a dearth of information on the specifics of NTS transmission in sub-Saharan Africa although independent, country-based studies have shown evidence of non-zoonotic transmission patterns39,49,50. It is perhaps noteworthy that we detected a similar pattern of genomic degradation in the form of gene loss and pseudogene formation to the human-adapted serovars S. Typhi12 and S. Paratyphi51 in the two fully sequenced African invasive Salmonella Typhimurium isolates, A130 and D23580, representative of lineages I and II respectively7. Taken together, these results suggest that the invasive clones may have adapted to facilitate direct person-to-person transmission within the human population. Further comparative studies on the virulence and transmission potential of different Salmonella Typhimurium lineages will be instrumental in closing this critical knowledge gap and are the focus of on-going investigations.
These results provide the first whole-genome based transmission study of this kind on iNTS from sub-Saharan Africa, and they highlight the power of these approaches to monitor emergence and spread of clonal bacterial populations associated with epidemics locally or globally over time. The transmission pathways hypothesized here suggest potential routes to the implementation of appropriate clinical intervention strategies.
ONLINE METHODS
Isolate Selection and genomic DNA preparation
129 isolates associated with invasive disease from Malawi, Mali, Kenya, and Nigeria were cultured from venous blood, cerebrospinal fluid or stool of febrile adults and children between 1988 and 2010. Gastrointestinal isolates were obtained from collections at the Salmonella Genetic Stock Centre (SGSC)52, Calgary, Canada, the Health Protection Agency, London, or as indicated in the Supplementary Table 113,53-57. Invasive Salmonella Typhimurium isolates were identified by standard serotyping methods, using the O- and H-antigen agglutination, based on the Kauffmann-White Scheme1. DNA samples were provided for invasive Salmonella Typhimurium isolates from the DRC, Mozambique and Uganda. Isolates were grown on Luria-Bertani (LB) medium; single colonies were incubated in LB Broth overnight at 37°C. Bacterial cells were pelleted by centrifugation (3,700 × g or 4,300 rpm, 5 minutes) and DNA extracted using either the Wizard® Genomic DNA kit (Promega) according to the manufacturer's instructions or by phenol/chloroform extraction protocol18. DNA quality and quantity were checked by gel electrophoresis and the Qubit® quantitation platform (Invitrogen). 20-50 ng/μL of DNA from each isolate was submitted for Illumina sequencing.
Genomic library preparation and sequencing
Multiplex libraries with a 200bp insert size were prepared using 12 unique index-tags, and sequenced to generate 54 or 76 base-pair (bp) paired-end reads. Cluster formation, primer hybridisation and sequencing reactions were based on reversible terminator chemistry using the Illumina Genome Analyser II System according to standard protocol26,58. Sequence data were submitted to the European Nucleotide Archive. Accession numbers are given in Supplementary Table 1.
Read alignment and SNP detection
Paired-end Illumina sequence data from each isolate were mapped to the reference genome Salmonella Typhimurium strain SL134457 using SSAHA259. Sequence reads mapped to an average of 97.7% of the reference with a mean depth of 56.5-fold in mapped regions across all isolates (Supplementary Table 1). SNPs were identified using samtools mpileup and filtered with a minimum mapping quality of 30 and quality ratio cut-off of 0.7518,24,26,59,60. SNPs called in phage sequences and repetitive regions of the Salmonella Typhimurium reference were excluded. Repetitive regions were defined as exact repetitive sequences of ≥ 20 bp, identified using repeat finding programs nucmer61, REPuter62 and repeat-match12,17. Recombinant segments of the genome were removed from the whole genome alignment as described previously18. Following the removal of recombinant segments, mobile elements and repetitive sequences of the genome, a concatenated alignment composed of 10,623 SNP sites from each sequenced isolate was produced. Small insertions and deletions (indels) were also identified from the SSAHA result output, but were not used for subsequent phylogenetic analyses.
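The filtering logic described above can be sketched as follows. This is a minimal illustration, not the scripts used in the study: the `SnpCall` record, the reading of "quality ratio" as a per-site value compared against 0.75, the inclusive thresholds, and the toy coordinates are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnpCall:
    position: int           # 1-based coordinate on the reference
    mapping_quality: float  # mapping quality at the site
    quality_ratio: float    # per-site ratio compared against the 0.75 cut-off (assumed meaning)


MIN_MAPPING_QUALITY = 30    # threshold from the text
MIN_QUALITY_RATIO = 0.75    # threshold from the text


def in_excluded(position: int, excluded: list[tuple[int, int]]) -> bool:
    """True if position lies in any excluded (start, end) interval, inclusive."""
    return any(start <= position <= end for start, end in excluded)


def filter_snps(calls: list[SnpCall],
                excluded: list[tuple[int, int]]) -> list[SnpCall]:
    """Keep calls that pass both thresholds and avoid excluded regions."""
    return [
        c for c in calls
        if c.mapping_quality >= MIN_MAPPING_QUALITY
        and c.quality_ratio >= MIN_QUALITY_RATIO
        and not in_excluded(c.position, excluded)
    ]


# Hypothetical calls and a hypothetical phage/repeat interval.
toy_calls = [
    SnpCall(100, 60, 0.98),   # passes
    SnpCall(250, 20, 0.99),   # fails mapping quality
    SnpCall(400, 60, 0.50),   # fails quality ratio
    SnpCall(1500, 60, 0.95),  # falls in the excluded region
]
toy_excluded = [(1000, 2000)]
kept = filter_snps(toy_calls, toy_excluded)
```

The sites surviving such a filter across all isolates would then be concatenated into the SNP alignment used for tree building.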
Phylogenetic analyses
A maximum-likelihood phylogenetic tree (Fig. 1) was constructed from the SNP alignment with RAxML v7.0.414 using a general time-reversible (GTR) substitution model with γ correction for among-site rate variation. Support for nodes on the trees was assessed using 100 bootstrap replicates. For the identified lineages I and II, 487 and 422 chromosomal SNP loci were identified, respectively. These within-cluster SNP alignments were then used to recalculate individual maximum-likelihood trees for each cluster, using the same parameters described above. These were used as input trees for subsequent analyses. The methods described above were also applied to obtain a maximum-likelihood phylogenetic reconstruction of plasmids from our isolate collection using 1,251 concatenated SNP sites, with the virulence plasmid pSLT-SL1344 from SL1344 as the reference.
MLST Analyses
Allele coordinates were obtained for the seven housekeeping genes used for the S. enterica MLST typing scheme (aroC, dnaN, hemD, hisD, purE, sucA and thrA) by manually marking the coordinates in the whole-genome alignment of the isolates. The marked-up regions were extracted and a multi-sequence alignment of each gene produced for all the isolates. These were used to determine the sequence type of each isolate using the Salmonella enterica MLST database.
Bayesian phylogeny, estimating dates of divergence and phylogeographical analyses of lineages
Estimation of rates of evolution, divergence times and phylogeography for our isolate collection as well as for each of the identified lineages was performed using the Bayesian MCMC framework, BEAST15, on SNP alignments. Various combinations of population size change model and molecular clock model were compared to find the model that best fit the data. In all cases Bayes Factors showed strong support (BF << 200) for the use of the skyline63 model of population size change and a relaxed uncorrelated lognormal clock64, which allows the evolutionary rates to change among the branches of the tree24, and a GTR substitution model with gamma correction for among-site rate variation.
Using the same parameters, the geographical locations of ancestral nodes were estimated using the discrete geospatial model implemented in BEAST (Supplementary Table 1)16. In all cases, three independent chains were run for 250 million steps each, and sampled every 10,000 steps. The three chains were combined with LogCombiner15 with the initial 25 million steps removed from each as a burn-in. Maximum clade credibility (MCC) trees were created and annotated using TreeAnnotator and viewed in FigTree15. We report estimates as median values within 95% highest posterior density (HPD) and report posterior probability values as support for identified ancestral node age and geographical location. For the latter, we report values greater than 0.7. Spatial reconstruction of MCC trees was carried out using the SPREAD65 software and visualized with Google Earth (Supplementary figure 3).
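The chain settings above imply a fixed number of posterior samples, which the following sketch works out. The constants come from the text; the helper name is hypothetical, and the count ignores the initial state that some tools also log.

```python
CHAINS = 3
STEPS_PER_CHAIN = 250_000_000
SAMPLE_EVERY = 10_000
BURN_IN_STEPS = 25_000_000


def retained_samples(steps: int, burn_in: int, every: int) -> int:
    """Number of logged samples left after discarding the burn-in."""
    return (steps - burn_in) // every


per_chain = retained_samples(STEPS_PER_CHAIN, BURN_IN_STEPS, SAMPLE_EVERY)
combined = CHAINS * per_chain
burn_in_fraction = BURN_IN_STEPS / STEPS_PER_CHAIN
```

So the burn-in is 10% of each chain, leaving 22,500 samples per chain and 67,500 in the combined posterior.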
HIV prevalence data extrapolation
HIV prevalence data for the sampled countries was modelled with a generalized logistic (or Richards’)66 curve using the grofit R package67. Curves were fit to all data points from the beginning of monitoring until stabilization or decline of the HIV-positive population. We then used these fitted models to extrapolate possible past population sizes.
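For orientation, a generalized logistic (Richards) curve can be written and evaluated as in the sketch below. This uses one common parametrization chosen for illustration; the grofit package has its own parametrization, which may differ, and every parameter value here is hypothetical rather than a fitted estimate from the study.

```python
import math


def richards(t: float, upper: float, growth_rate: float,
             midpoint: float, shape: float) -> float:
    """Generalized logistic curve.

    y(t) = upper / (1 + shape * exp(-growth_rate * (t - midpoint))) ** (1 / shape)
    With shape == 1 this reduces to the ordinary logistic curve.
    """
    return upper / (1.0 + shape * math.exp(-growth_rate * (t - midpoint))) ** (1.0 / shape)


# Hypothetical parameters: plateau of 15% prevalence, midpoint in 1990.
params = dict(upper=15.0, growth_rate=0.5, midpoint=1990.0, shape=1.0)

# Back-extrapolate to years before monitoring began (illustrative only).
extrapolated = {year: richards(year, **params) for year in (1960, 1970, 1980, 1990)}
```

Once such a curve has been fitted to the monitored years, evaluating it at earlier years gives the kind of back-extrapolated prevalence described above.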
Validation tests for the origin of lineage I
25 permutation datasets made up of 10 randomly selected Malawian isolates, together with the 7 DRC, 8 Kenyan, 8 Mozambican and 7 Ugandan isolates, were used to reconstruct Bayesian maximum clade credibility (MCC) phylogenetic trees. Each of the 25 datasets included a different set of 10 randomly selected Malawian isolates. The same parameters described above were applied in making the trees. Malawi was the ancestral state of all resulting 25 MCC trees with posterior probability values ranging from 0.58 - 0.92. The resulting phylogenetic trees and their root location state probability distributions are shown in supplementary figure 2b.
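The subsampling scheme can be sketched as below: 25 datasets, each combining a different random draw of 10 Malawian isolates with all 30 isolates from the other four countries. The isolate labels, the pool size of 20 Malawian isolates, and the seed are hypothetical; only the counts per country come from the text.

```python
import math
import random


def permutation_datasets(malawi: list[str], others: list[str],
                         n_datasets: int = 25, n_malawi: int = 10,
                         seed: int = 0) -> list[list[str]]:
    """Build datasets that each use a distinct random subset of Malawian isolates."""
    if math.comb(len(malawi), n_malawi) < n_datasets:
        raise ValueError("not enough distinct subsets available")
    rng = random.Random(seed)
    seen: set[frozenset[str]] = set()
    datasets: list[list[str]] = []
    while len(datasets) < n_datasets:
        subset = frozenset(rng.sample(malawi, n_malawi))
        if subset in seen:
            continue  # each dataset must use a different set of 10
        seen.add(subset)
        datasets.append(sorted(subset) + others)
    return datasets


# Hypothetical labels; counts for the other countries follow the text.
malawi_pool = [f"MWI_{i:02d}" for i in range(20)]
other_isolates = (
    [f"DRC_{i}" for i in range(7)]
    + [f"KEN_{i}" for i in range(8)]
    + [f"MOZ_{i}" for i in range(8)]
    + [f"UGA_{i}" for i in range(7)]
)
datasets = permutation_datasets(malawi_pool, other_isolates)
```

Each resulting dataset would then be run through the same Bayesian analysis, and the inferred root location compared across runs.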
Plasmid sequence analyses
Paired-end sequence reads of each isolate were mapped to the multi-fasta sequence features including the Tn21 locus of pSLT-BT, the reference plasmid from the invasive strain D23580, using the Burrows-Wheeler Aligner software BWA68, with minimum base call quality of 50, minimum mapping quality of 30, and minimum read depth of 4. Isolates from each of the three clusters were analyzed separately by cluster. Isolates with <30% of reads mapping to the length of the feature were interpreted as not having the feature, and those with >70% of reads mapping to the feature were interpreted as having the region of interest. A heatmap of the analysis based on the selected cut-off values was generated and aligned to the BEAST MCC tree of each cluster.
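The two cut-offs translate into a simple three-way call per feature, sketched below. The input is treated here as a fraction between 0 and 1 summarizing how much of the feature is supported by mapped reads, which is one interpretation of the wording above; the "uncertain" label for intermediate values is an assumption, since the text does not say how they were handled.

```python
ABSENT_BELOW = 0.30   # cut-off from the text
PRESENT_ABOVE = 0.70  # cut-off from the text


def call_feature(fraction: float) -> str:
    """Classify a feature as absent, present, or uncertain from its mapped fraction."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    if fraction < ABSENT_BELOW:
        return "absent"
    if fraction > PRESENT_ABOVE:
        return "present"
    return "uncertain"  # assumed label; not defined in the text


# Hypothetical per-feature values for a single isolate.
toy_isolate = {"cat": 0.02, "feature_a": 0.97, "feature_b": 0.55}
calls = {feature: call_feature(value) for feature, value in toy_isolate.items()}
```

A matrix of such calls, one row per isolate, is the kind of input that could be drawn as a heatmap alongside the tree.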
De novo sequence assembly and plasmid genome comparisons
Paired-end Illumina sequence data were assembled de novo using Velvet69 and the parameters optimized to give the highest N50 value. The multi-contig draft genomes generated for each isolate were ordered using either pSLT or pSLT-BT to confirm plasmid structure using Abacas70. The draft plasmid genomes were used to query pSLT and/or pSLT-BT sequences using BLASTN71 and comparison files generated and viewed using Artemis Comparison Tool (ACT)72.
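N50 is the contig length at which the cumulative length of contigs, taken from longest to shortest, first reaches half of the total assembly length. The sketch below computes it and picks the best of several candidate assemblies; the candidate labels and contig lengths are hypothetical, and the text does not say which assembler parameters were varied.

```python
def n50(contig_lengths: list[int]) -> int:
    """Length of the contig at which the running total reaches half the assembly."""
    if not contig_lengths:
        raise ValueError("no contigs supplied")
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Hypothetical contig lengths from three candidate parameter settings.
candidates = {
    "setting_a": [80, 70, 50, 40, 30, 20],
    "setting_b": [150, 60, 40, 40],
    "setting_c": [40, 40, 40, 40, 40, 40, 40],
}
best_setting = max(candidates, key=lambda name: n50(candidates[name]))
```

Selecting the setting with the highest N50 favors assemblies with fewer, longer contigs, although N50 alone says nothing about correctness.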
Supplementary Material
ACKNOWLEDGEMENTS
We thank J. Cheesborough for providing the DRC isolates, M. Okong, N. French and the Medical Research Council, Uganda, for providing the Uganda isolates, S. Nair for providing the Health Protection Agency (HPA) isolates, L. Barquist for modeling the pre-1990 HIV prevalence data and the Sequencing team at the Wellcome Trust Sanger Institute. This work was funded by a Wellcome Trust grant (098051).
C.A.M. was supported by a Tropical Research Fellowship from the Wellcome Trust and a Clinical Research Fellowship from GlaxoSmithKline.
Footnotes
URLs.
European Nucleotide Archive (ENA), http://www.ebi.ac.uk/ena
MLST database, http://mlst.ucc.ie/mlst/mlst/dbs/Senterica/
AIDSInfoOnline.mdb, http://www.aidsinfoonline.org/
UNAIDS, http://www.unaids.org/en/
UNAIDS Report on the Global AIDS Epidemic 2010, http://www.unaids.org/globalreport/global_report.htm
Google Earth, http://www.google.co.uk/intl/en_uk/earth/index.html
ACCESSION NUMBERS Referenced accession codes for data deposited in the NCBI Nucleotide database include: FQ312003, FN424405, HE654726, FN432031 and AE006471. The full set of primary accession codes for the Illumina sequence reads of 177 invasive and gastrointestinal Salmonella Typhimurium is given in Supplementary Table 1.
AUTHOR CONTRIBUTIONS Chinyere K. Okoro & Robert A. Kingsley
CKO and RAK contributed to collecting data and manuscript writing. CKO analyzed sequence data and performed phylogenetic, BEAST and comparative genomics analyses. TRC and SRH wrote the coding scripts for phylogenetic and Bayesian statistical analyses and contributed to manuscript writing. CMP, MNA-M, SK, CLM, MAG, ED-P, RSH, SO, PLA, IM, CAM, JW, MDT, MML and SMT contributed to studies from which isolates were drawn and to manuscript writing. GD, JP and RAK designed the study and GD supervised the study.
COMPETING FINANCIAL INTERESTS The authors declare no competing financial interests.
REFERENCES
- 1. Popoff MY, Bockemuhl J, Gheesling LL. Supplement 2002 (no. 46) to the Kauffmann-White scheme. Res Microbiol. 2004;155:568–570. doi: 10.1016/j.resmic.2004.04.005.
- 2. Langridge GC, Nair S, Wain J. Nontyphoidal Salmonella serovars cause different degrees of invasive disease globally. The Journal of infectious diseases. 2009;199:602–603. doi: 10.1086/596208.
- 3. Reddy EA, Shaw AV, Crump JA. Community-acquired bloodstream infections in Africa: a systematic review and meta-analysis. Lancet Infect Dis. 2010;10:417–432. doi: 10.1016/S1473-3099(10)70072-4.
- 4. Graham SM. Nontyphoidal salmonellosis in Africa. Curr Opin Infect Dis. 2010;23:409–414. doi: 10.1097/QCO.0b013e32833dd25d.
- 5. Berkley JA, et al. HIV infection, malnutrition, and invasive bacterial infection among children with severe malaria. Clinical infectious diseases : an official publication of the Infectious Diseases Society of America. 2009;49:336–343. doi: 10.1086/600299.
- 6. Gordon MA, et al. Non-typhoidal salmonella bacteraemia among HIV-infected Malawian adults: high mortality and frequent recrudescence. AIDS (London, England). 2002;16:1633–1641. doi: 10.1097/00002030-200208160-00009.
- 7. Kingsley RA, et al. Epidemic multiple drug resistant Salmonella Typhimurium causing invasive disease in sub-Saharan Africa have a distinct genotype. Genome Res. 2009;19:2279–2287. doi: 10.1101/gr.091017.109.
- 8. Gordon MA, et al. Epidemics of invasive Salmonella enterica serovar enteritidis and S. enterica Serovar typhimurium infection associated with multidrug resistance among adults and children in Malawi. Clin Infect Dis. 2008;46:963–969. doi: 10.1086/529146.
- 9. Gordon MA. Salmonella infections in immunocompromised adults. The Journal of infection. 2008;56:413. doi: 10.1016/j.jinf.2008.03.012.
- 10. Cheesbrough JS, Taxman BC, Green SD, Mewa FI, Numbi A. Clinical definition for invasive Salmonella infection in African children. Pediatr Infect Dis J. 1997;16:277–283. doi: 10.1097/00006454-199703000-00005.
- 11. Parkhill J, et al. Complete genome sequence of a multiple drug resistant Salmonella enterica serovar Typhi CT18. Nature. 2001;413:848. doi: 10.1038/35101607.
- 12. Holt KE, et al. High-throughput sequencing provides insights into genome variation and evolution in Salmonella Typhi. Nat Genet. 2008;40:987–993. doi: 10.1038/ng.195.
- 13. Beltran P, et al. Reference collection of strains of the Salmonella typhimurium complex from natural populations. J Gen Microbiol. 1991;137:601–606. doi: 10.1099/00221287-137-3-601.
- 14. Stamatakis A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 2006;22:2688–2690. doi: 10.1093/bioinformatics/btl446.
- 15. Drummond AJ, Rambaut A. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol. 2007;7:214. doi: 10.1186/1471-2148-7-214.
- 16. Lemey P, Rambaut A, Drummond AJ, Suchard MA. Bayesian phylogeography finds its roots. PLoS Comput Biol. 2009;5:e1000520. doi: 10.1371/journal.pcbi.1000520.
- 17. He M, et al. Evolutionary dynamics of Clostridium difficile over short and long time scales. Proceedings of the National Academy of Sciences of the United States of America. 2010;107:7527–7532. doi: 10.1073/pnas.0914322107.
- 18. Croucher NJ, et al. Rapid pneumococcal evolution in response to clinical interventions. Science. 2011;331:430–434. doi: 10.1126/science.1198545.
- 19. Holt KE, et al. Temporal fluctuation of multidrug resistant salmonella typhi haplotypes in the mekong river delta region of Vietnam. PLoS Negl Trop Dis. 2011;5:e929. doi: 10.1371/journal.pntd.0000929.
- 20. den Bakker HC, Bundrant BN, Fortes ED, Orsi RH, Wiedmann M. A population genetics-based and phylogenetic approach to understanding the evolution of virulence in the genus Listeria. Appl Environ Microbiol. 2010;76:6085–6100. doi: 10.1128/AEM.00447-10.
- 21. Smith GJ, et al. Origins and evolutionary genomics of the 2009 swine-origin H1N1 influenza A epidemic. Nature. 2009;459:1122–1125. doi: 10.1038/nature08182.
- 22. Smith GJ, et al. Dating the emergence of pandemic influenza viruses. Proceedings of the National Academy of Sciences of the United States of America. 2009;106:11709–11712. doi: 10.1073/pnas.0904991106.
- 23. Endicott P, Ho SY, Stringer C. Using genetic evidence to evaluate four palaeoanthropological hypotheses for the timing of Neanderthal and modern human origins. J Hum Evol. 2010;59:87–95. doi: 10.1016/j.jhevol.2010.04.005.
- 24. Mutreja A, et al. Evidence for several waves of global transmission in the seventh cholera pandemic. Nature. 2011;477:462–465. doi: 10.1038/nature10392.
- 25. Morelli G, et al. Yersinia pestis genome sequencing identifies patterns of global phylogenetic diversity. Nat Genet. 2010;42:1140–1143. doi: 10.1038/ng.705.
- 26. Harris SR, et al. Evolution of MRSA during hospital transmission and intercontinental spread. Science. 2010;327:469.
- 27. Korber B, et al. Timing the ancestor of the HIV-1 pandemic strains. Science. 2000;288:1789–1796. doi: 10.1126/science.288.5472.1789.
- 28. Lemey P, et al. The molecular population genetics of HIV-1 group O. Genetics. 2004;167:1059–1068. doi: 10.1534/genetics.104.026666.
- 29. Worobey M, et al. Direct evidence of extensive diversity of HIV-1 in Kinshasa by 1960. Nature. 2008;455:661–664. doi: 10.1038/nature07390.
- 30. Nahmias AJ, et al. Evidence for human infection with an HTLV III/LAV-like virus in Central Africa, 1959. Lancet. 1986;1:1279–1280. doi: 10.1016/s0140-6736(86)91422-4.
- 31. Sharp ER, et al. Immunodominance of HIV-1 specific CD8+ T-cell responses is related to disease progression rate in vertically infected adolescents. PloS one. 2011;6:e21135. doi: 10.1371/journal.pone.0021135.
- 32. Harris SR, et al. Whole-genome analysis of diverse Chlamydia trachomatis strains identifies phylogenetic relationships masked by current clinical typing. Nat Genet. 2012. doi: 10.1038/ng.2214.
- 33. Gangarosa EJ, et al. An epidemic-associated episome? The Journal of infectious diseases. 1972;126:215–218. doi: 10.1093/infdis/126.2.215.
- 34. MacLennan CA, et al. Dysregulated humoral immunity to nontyphoidal Salmonella in HIV-infected African adults. Science. 2010;328:508–512. doi: 10.1126/science.1180346.
- 35. Smith PD, et al. Salmonella typhimurium enteritis and bacteremia in the acquired immunodeficiency syndrome. Ann Intern Med. 1985;102:207–209. doi: 10.7326/0003-4819-102-2-207.
- 36. Levine WC, Buehler JW, Bean NH, Tauxe RV. Epidemiology of nontyphoidal Salmonella bacteremia during the human immunodeficiency virus epidemic. The Journal of infectious diseases. 1991;164:81–87. doi: 10.1093/infdis/164.1.81.
- 37. Raffatellu M, et al. Simian immunodeficiency virus-induced mucosal interleukin-17 deficiency promotes Salmonella dissemination from the gut. Nature medicine. 2008;14:421–428. doi: 10.1038/nm1743.
- 38. Roux CM, et al. Both hemolytic anemia and malaria parasite-specific factors increase susceptibility to Nontyphoidal Salmonella enterica serovar typhimurium infection in mice. Infect Immun. 2010;78:1520–1527. doi: 10.1128/IAI.00887-09.
- 39. Keddy KH, et al. Genotypic and demographic characterization of invasive isolates of Salmonella Typhimurium in HIV co-infected patients in South Africa. Journal of infection in developing countries. 2009;3:585. doi: 10.3855/jidc.549.
- 40. Ikumapayi UN, et al. Molecular epidemiology of community-acquired invasive non-typhoidal Salmonella among children aged 2–29 months in rural Gambia and discovery of a new serovar, Salmonella enterica Dingiri. J Med Microbiol. 2007;56:1479. doi: 10.1099/jmm.0.47416-0.
- 41. Dione MM, et al. Clonal differences between Non-Typhoidal Salmonella (NTS) recovered from children and animals living in close contact in the Gambia. PLoS Negl Trop Dis. 2011;5:e1148. doi: 10.1371/journal.pntd.0001148.
- 42. Beyene G, et al. Multidrug resistant Salmonella Concord is a major cause of salmonellosis in children in Ethiopia. Journal of infection in developing countries. 2011;5:23–33. doi: 10.3855/jidc.906.
- 43. Sibhat B, et al. Salmonella serovars and antimicrobial resistance profiles in beef cattle, slaughterhouse personnel and slaughterhouse environment in ethiopia. Zoonoses Public Health. 2011;58:102–109. doi: 10.1111/j.1863-2378.2009.01305.x.
- 44. Gray RR, et al. Spatial phylodynamics of HIV-1 epidemic emergence in east Africa. AIDS. 2009;23:F9–F17. doi: 10.1097/QAD.0b013e32832faf61.
- 45. Brent AJ, et al. Salmonella bacteremia in Kenyan children. Pediatr Infect Dis J. 2006;25:230–236. doi: 10.1097/01.inf.0000202066.02212.ff.
- 46. Mandomando I, et al. Invasive non-typhoidal Salmonella in Mozambican children. Trop Med Int Health. 2009;14:1467. doi: 10.1111/j.1365-3156.2009.02399.x.
- 47. Rosanova MT, Paganini H, Bologna R, Lopardo H, Ensinck G. Risk factors for mortality caused by nontyphoidal Salmonella sp. in children. Int J Infect Dis. 2002;6:187–190. doi: 10.1016/s1201-9712(02)90109-8.
- 48. Gordon MA, et al. Invasive non-typhoid salmonellae establish systemic intracellular infection in HIV-infected adults: an emerging disease pathogenesis. Clinical infectious diseases : an official publication of the Infectious Diseases Society of America. 2010;50:953. doi: 10.1086/651080.
- 49. Kariuki S, et al. Invasive multidrug-resistant non-typhoidal Salmonella infections in Africa: zoonotic or anthroponotic transmission? J Med Microbiol. 2006;55:585–591. doi: 10.1099/jmm.0.46375-0.
- 50. Fashae K, Ogunsola F, Aarestrup FM, Hendriksen RS. Antimicrobial susceptibility and serovars of Salmonella from chickens and humans in Ibadan, Nigeria. Journal of infection in developing countries. 2010;4:484–494. doi: 10.3855/jidc.909.
- 51. Holt KE, et al. Pseudogene accumulation in the evolutionary histories of Salmonella enterica serovars Paratyphi A and Typhi. BMC genomics. 2009;10:36. doi: 10.1186/1471-2164-10-36.
- 52. Zinder ND, Lederberg J. Genetic exchange in Salmonella. Journal of bacteriology. 1952;64:679–699. doi: 10.1128/jb.64.5.679-699.1952.
- 53. Helm RA, et al. Pigeon-associated strains of Salmonella enterica serovar Typhimurium phage type DT2 have genomic rearrangements at rRNA operons. Infect Immun. 2004;72:7338. doi: 10.1128/IAI.72.12.7338-7341.2004.
- 54. Beltran P, et al. Toward a population genetic analysis of Salmonella: genetic diversity and relationships among strains of serotypes S. choleraesuis, S. derby, S. dublin, S. enteritidis, S. heidelberg, S. infantis, S. newport, and S. typhimurium. Proceedings of the National Academy of Sciences of the United States of America. 1988;85:7753–7757. doi: 10.1073/pnas.85.20.7753.
- 55. Cooke FJ, et al. Characterization of the genomes of a diverse collection of Salmonella enterica serovar Typhimurium definitive phage type 104. Journal of Bacteriology. 2008;190:8155. doi: 10.1128/JB.00636-08.
- 56. Andrews-Polymenis HL, et al. Host restriction of Salmonella enterica serotype Typhimurium pigeon isolates does not correlate with loss of discrete genes. Journal of Bacteriology. 2004;186:2619. doi: 10.1128/JB.186.9.2619-2628.2004.
- 57. Hoiseth SK, Stocker BA. Aromatic-dependent Salmonella typhimurium are non-virulent and effective as live vaccines. Nature. 1981;291:238–239. doi: 10.1038/291238a0.
- 58. Bentley DR, et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature. 2008;456:53–59. doi: 10.1038/nature07517.
- 59. Ning Z, Cox AJ, Mullikin JC. SSAHA: a fast search method for large DNA databases. Genome Res. 2001;11:1725–1729. doi: 10.1101/gr.194201.
- 60. Li H, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352.
- 61. Kurtz S, et al. Versatile and open software for comparing large genomes. Genome Biol. 2004;5:R12. doi: 10.1186/gb-2004-5-2-r12.
- 62. Kurtz S, et al. REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res. 2001;29:4633–4642. doi: 10.1093/nar/29.22.4633.
- 63. Drummond AJ, Rambaut A, Shapiro B, Pybus OG. Bayesian coalescent inference of past population dynamics from molecular sequences. Mol Biol Evol. 2005;22:1185–1192. doi: 10.1093/molbev/msi103.
- 64. Drummond AJ, Ho SY, Phillips MJ, Rambaut A. Relaxed phylogenetics and dating with confidence. PLoS Biol. 2006;4:e88. doi: 10.1371/journal.pbio.0040088.
- 65. Bielejec F, Rambaut A, Suchard MA, Lemey P. SPREAD: spatial phylogenetic reconstruction of evolutionary dynamics. Bioinformatics. 2011;27:2910–2912. doi: 10.1093/bioinformatics/btr481.
- 66. Richards FJ. A flexible growth function for empirical use. Journal of Experimental Botany. 1959;10:290–301.
- 67. Kahm M, Hasenbrink G, Lichtenberg-Fraté H, Ludwig J, Kschischo M. grofit: Fitting Biological Growth Curves with R. Journal of Statistical Software. 2010;33.
- 68. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324.
- 69. Zerbino DR, Birney E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008;18:821–829. doi: 10.1101/gr.074492.107.
- 70. Assefa S, Keane TM, Otto TD, Newbold C, Berriman M. ABACAS: algorithm-based automatic contiguation of assembled sequences. Bioinformatics. 2009;25:1968–1969. doi: 10.1093/bioinformatics/btp347.
- 71. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. Journal of molecular biology. 1990;215:403–410. doi: 10.1016/S0022-2836(05)80360-2.
- 72. Carver T, et al. Artemis and ACT: viewing, annotating and comparing sequences stored in a relational database. Bioinformatics. 2008;24:2672–2676. doi: 10.1093/bioinformatics/btn529.