Proceedings of the National Academy of Sciences of the United States of America. 2011 Feb 28;108(11):4494–4499. doi: 10.1073/pnas.1019751108

Neisseria meningitidis is structured in clades associated with restriction modification systems that modulate homologous recombination

Sonia Budroni a,1, Emilio Siena a,1, Julie C Dunning Hotopp b, Kate L Seib a, Davide Serruto a, Chiara Nofroni a, Maurizio Comanducci a, David R Riley b, Sean C Daugherty b, Samuel V Angiuoli b, Antonello Covacci a, Mariagrazia Pizza a, Rino Rappuoli a,2, E Richard Moxon c, Hervé Tettelin b,3, Duccio Medini a,2,3
PMCID: PMC3060241  PMID: 21368196

Abstract

Molecular data on a limited number of chromosomal loci have shown that the population of Neisseria meningitidis (Nm), a deadly human pathogen, is structured in distinct lineages. Given that the Nm population undergoes substantial recombination, the mechanisms resulting in the evolution of these lineages, their persistence in time, and the implications for the pathogenicity of the bacterium are not yet completely understood. Based on whole-genome sequencing, we show that Nm is structured in phylogenetic clades. Through acquisition of specific genes and through insertions and rearrangements, each clade has acquired and remodeled specific genomic tracts, with the potential to impact on the commensal and virulence behavior of Nm. Despite this clear evidence of a structured population, we confirm high rates of detectable recombination throughout the whole Nm chromosome. However, gene conversion events were found to be longer within clades than between clades, suggesting a DNA cleavage mechanism associated with the phylogeny of the species. We identify 22 restriction modification systems, probably acquired by horizontal gene transfer from outside of the species/genus, whose distribution in the different strains coincides with the phylogenetic clade structure. We provide evidence that these clade-associated restriction modification systems generate a differential barrier to DNA exchange consistent with the observed population structure. These findings have general implications for the emergence of lineage structure and virulence in recombining bacterial populations, and they could provide an evolutionary framework for the population biology of a number of other bacterial species that show contradictory population structure and dynamics.

Keywords: meningococcus, population structure and dynamics, phylogenetic clades, pan-genome


A Gram-negative encapsulated bacterium, Neisseria meningitidis (Nm) is a common commensal, with a carriage prevalence of 8–25% in human populations (1). Despite its prevalence, Nm occasionally causes very severe meningococcal meningitis and septicemia. It behaves as an accidental pathogen that derives no long-term evolutionary benefit from the pathology it causes (2), in that the capacity to cause disease is apparently not coupled with transmission efficiency between hosts. A number of putative virulence factors have been identified or proposed (3), and some are significantly associated with invasive disease (4, 5). However, the search for a genetic basis that would explain why some lineages but not others have a high odds ratio for disease has proven elusive, even when investigated at the whole-genome level (6).

The Nm population consists of multiple genotypes that exhibit signatures of both clonal descent and genetic exchange (7, 8). Using multilocus sequence typing (MLST), Nm genotypes can be grouped into distinct lineages, previously identified as clone (or clonal) complexes (CC), based on the sequence similarity of neutral alleles (9). Despite high rates of recombination, these lineages persist and disseminate globally over many decades (10–12). The observation that distinct lineages exist in Nm is consistent with neutral evolution and does not necessarily require any special nonneutral process. However, patterns of variation within and among Nm lineages show evidence of deviations from neutrality, and several explanations have been used to reconcile these findings (11–13).

Analysis of multiple Nm genome sequences has shed light on a number of mechanisms that mediate genomic flexibility as a strategy for adapting to changes in the environment within and between human hosts (14, 15). Comparative genome analysis of disease and carriage strains suggests that the species originated from an unencapsulated ancestor by acquiring species-specific insertion sequences (ISs) that differentiate Nm from the rest of the genus, as well as the genes required for the production of a polysaccharide capsule (6). However, the limited number of isolates sequenced so far has not allowed a population study at the whole-genome level.

We report an analysis of 15 previously unpublished and 5 publicly available Nm genomes isolated from five continents, including strains of five different capsular serogroups. Our results show that the Nm species population has evolved into distinct phylogenetic clades (PCs) larger than the lineages identified by MLST. Each PC possesses a unique set of restriction modification systems (RMSs) that results in a differential barrier to genetic transfer and accounts for the observed structuring of the population of Nm genotypes.

Results

Nm Pan-Genome Grows Slowly Because Strain-Specific Genes Are Rare.

Fourteen Nm serogroup B strains and one Nm serogroup X strain were sequenced in this study and supplemented by five publicly available genomes, providing a dataset of 20 genomes that include strains isolated from five continents, five serogroups, and 10 CCs (Table S1).

The asymptotic core genome size was estimated to be 1,630 ± 62 genes (Fig. 1A). The pan-genome was confirmed as open (16) (Fig. 1B) but growing at a slow rate (Heaps' law exponent = 0.04 ± 0.01) compared with other nonclosed gene repertoires (17). A slightly higher exponent was obtained when correcting for the strain selection bias in the sample caused by CC or serogroup composition (0.06 ± 0.04), suggesting that the abundance in the dataset of strains belonging to serogroup B and to five disease-associated CCs might lead to an underestimate of the pan-genome growth rate. Extrapolation of the data indicates that, if 100 genomes were sequenced, the Nm pan-genome would consist of ~2,500 genes (~2,650 or ~2,700 when correcting for CC or serogroup bias, respectively), and each single isolate thereafter would contribute, on average, fewer than two new genes (Fig. 1C). Each meningococcal genome is expected to be composed, on average, of 79% core, 21% dispensable, and <0.1% specific genes. Although the pool of genes pertaining to Nm is open, the number of strain-specific genes is very small. As a consequence, species diversity will be determined by the presence/absence of sequences associated with groups of isolates, by nucleotide sequence variation of core genes, and only to a very limited extent by strain-specific genes.
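The growth-rate arithmetic above can be checked with a short sketch. Assuming the pan-genome follows a pure power law P(N) = κN^γ (the Heaps' law form of Fig. 1B), the expected number of new genes added by one more genome is approximately the derivative dP/dN = γP(N)/N. The code fits κ and γ by ordinary least squares on log-transformed data, a simplification substituted here for the nonlinear regression behind Fig. 1B, and then applies the derivative to the extrapolated sizes quoted in the text. Function names are illustrative only.

```python
import math


def fit_heaps_law(ns, pan_sizes):
    """Fit P(N) = kappa * N**gamma by least squares on log P vs. log N.

    Log-log linear regression is an assumption of this sketch, not
    necessarily the fitting procedure used for Fig. 1B.
    """
    xs = [math.log(n) for n in ns]
    ys = [math.log(p) for p in pan_sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gamma = sxy / sxx
    kappa = math.exp(mean_y - gamma * mean_x)
    return kappa, gamma


def new_genes_per_genome(pan_size, n_genomes, gamma):
    """Approximate P(N + 1) - P(N) by the derivative gamma * P(N) / N."""
    return gamma * pan_size / n_genomes


if __name__ == "__main__":
    # Extrapolated values quoted in the text (uncorrected, then corrected).
    print(round(new_genes_per_genome(2500, 100, 0.04), 2))  # 1.0
    print(round(new_genes_per_genome(2650, 100, 0.06), 2))  # 1.59
```

With γ = 0.04 and P(100) ≈ 2,500 the estimate is about one new gene per additional genome; with the corrected exponent (0.06) and ~2,650 to ~2,700 genes it is about 1.6, consistent with the statement that each isolate beyond 100 would add fewer than two new genes.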

Fig. 1.

Nm pan-genome analysis. For each distribution of values in the pan-genome sampling process, box plots represent medians and interquartile ranges, whiskers span the central 95% of values, minimums and maximums are marked with an “x,” and means are shown as circles. (A) The number of core genes is plotted vs. the number of genomes evaluated. The curve represents the least-squares regression of an exponential decay function to the data for more than five genomes; the dashed line indicates the asymptotic value predicted for the core genome: 1,630 genes. (B) The pan-genome size is plotted vs. the number of genomes evaluated. The curve represents the least-squares regression of a Heaps' law function to the data, obtained for α = 0.04 ± 0.01. (C) The number of new genes discovered is plotted vs. the number of genomes investigated. The curve represents the least-squares fit of a power-law function to the data, obtained for (1 − α) = 0.93, in good agreement with the pan-genome regression.
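The core-genome curve in panel A is an exponential decay toward an asymptote. A minimal sketch of such a fit, assuming the model core(N) = κ·exp(−N/τ) + Ω (the symbols κ, τ, and Ω are our labels; the caption does not name the parameters), scans a grid of candidate τ values and, for each, solves for κ and Ω by closed-form linear least squares; Ω is then the predicted asymptotic core size. This grid search is a substitute technique chosen for brevity, not the regression routine used for the figure.

```python
import math


def fit_exponential_decay(ns, values, taus):
    """Fit values ~ kappa * exp(-n / tau) + omega.

    For each candidate tau the model is linear in (kappa, omega), so those
    two are solved by ordinary least squares; the tau giving the smallest
    residual sum of squares is kept.
    """
    best = None
    mean_y = sum(values) / len(values)
    for tau in taus:
        xs = [math.exp(-n / tau) for n in ns]
        mean_x = sum(xs) / len(xs)
        sxx = sum((x - mean_x) ** 2 for x in xs)
        if sxx == 0:
            continue  # degenerate tau: the exponential term is constant
        kappa = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values)) / sxx
        omega = mean_y - kappa * mean_x
        sse = sum((kappa * x + omega - y) ** 2 for x, y in zip(xs, values))
        if best is None or sse < best[0]:
            best = (sse, kappa, tau, omega)
    if best is None:
        raise ValueError("no usable tau in the grid")
    _, kappa, tau, omega = best
    return kappa, tau, omega
```

Restricting `ns` to values greater than five mirrors the caption, which fits only the points obtained for more than five genomes.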

Nm Population Is Structured in Phylogenetic Clades Comprising Different CCs.

Phylogenetic networks based on variation within core and dispensable sequences (Fig. 2 and Fig. S1G) showed that strains from the same CC are closely related and form monophyletic groups. Strains from CC32/CC269 and CC8/CC11 also showed clear common ancestry, similar to the MLST-based relationship identified between ST-41 and ST-44 subcomplexes. This was confirmed by methods based on gene presence/absence, maximum likelihood, and distance-based phylogeny (Fig. S1). We propose a phylogenetic clade (PC) concept for Nm, where each PC can comprise more than one CC: PC32/269, PC8/11, and PC41/44. Five strains, not belonging to these PCs, did not show a phylogenetic structure with robust statistical support, forming a star-like topology (Fig. 2 and Fig. S1). Such topologies reflect an excess of rare alleles compared with neutral expectations and can be attributed to the simultaneous diversification of multiple lineages because of rapid population expansion after a bottleneck (18) and/or to the confounding effect of homologous recombination (19).

Fig. 2.

Phylogenetic network for Nm core genome. NeighborNet phylogenetic network obtained for the Nm core genome. PCs are color-coded in blue (PC41/44), green (PC8/11), and red (PC32/269). Strains not clustering into phylogenetic groups are shown in yellow. Colored boxes report, for each PC, the main properties of PC-specific DNA insertions. The box in the bottom right corner reports, for each PC and CC, the percentage of nucleotide identity (average ± SD) and the number of specific genes. The box in the bottom left corner reports the main properties of CC- and PC-specific rearrangements, which are mapped as circular arrows on the phylogeny. Details on DNA insertions and rearrangements are provided in SI Results, where each event is identified by the same label as reported here.

Chromosomal Inversions Produce PC-Specific Arrangements in Pathogenicity-Related Operons.

Ten major chromosomal rearrangements were identified among the 11 genomes sequenced to closure (SI Results and Fig. S2A). Breakpoints were mostly associated with duplicated repeat sequence 3 (dRS3) (20) (significant association, P < 0.01) and with IS elements (not significant because of the ubiquity of ISs). Three rearrangements seem to have happened more than once, revealing regions of particular instability and suggesting caution in the use of chromosomal rearrangements as phylogenetic markers. These recurrent inversions occurred within PCs, differentiating the chromosomal structure between CC8 and CC11 and between CC32 and CC269. The other seven rearrangements were found only once, and six were PC-specific (Fig. S1H). These results support the evolutionary consistency of PCs. Seven inversions have a potential biological impact on the chromosomal regions flanking the breakpoints (SI Results and Fig. S2B). In particular, four PC-specific rearrangements might influence the expression of restriction modification genes (Fig. 2, Rχ/Rδ) and of genes involved in the pathogenesis of the meningococcus (Fig. 2, Rγ/Rη) (SI Results and Fig. S2).
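An inversion reverses a chromosomal segment and flips the strand of every block inside it, creating a breakpoint at each end. The sketch below represents a chromosome as a signed order of collinear blocks, a common abstraction in rearrangement models such as the one in BADGER (this code does not reproduce BADGER's Bayesian inference). It also shows that inverting the same segment twice restores the ancestral order, one way in which recurrent inversions in an unstable region can mislead a phylogeny built from rearrangements. Block labels are hypothetical.

```python
def invert(order, i, j):
    """Invert blocks i..j (inclusive) of a signed block order: reverse the
    segment and flip the strand sign of each block in it."""
    if not 0 <= i <= j < len(order):
        raise ValueError("segment out of range")
    return order[:i] + [-block for block in reversed(order[i:j + 1])] + order[j + 1:]


def count_breakpoints(order):
    """Count adjacencies that are not (k, k + 1) after framing with 0 and n + 1."""
    framed = [0] + list(order) + [len(order) + 1]
    return sum(1 for left, right in zip(framed, framed[1:]) if right - left != 1)


if __name__ == "__main__":
    ancestral = [1, 2, 3, 4, 5]
    derived = invert(ancestral, 1, 3)
    print(derived)                              # [1, -4, -3, -2, 5]
    print(count_breakpoints(derived))           # 2
    print(invert(derived, 1, 3) == ancestral)   # True
```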

Clade-Specific Genes Are Highly Conserved and Mostly Involved in Host–Pathogen Interaction.

The biological relevance of PCs is further supported by the consistency of the PC-specific gene pool. Under a random model, more genes would be expected to be specific to smaller groups, such as CCs, and fewer genes to larger groups, such as PCs. Unexpectedly, we find considerably more PC-specific than CC-specific genes (Fig. 2), and all PC-specific DNA regions (eight regions for PC32/269, four regions for PC8/11, and eight regions for PC41/44) (SI Results) are highly conserved (nucleotide identity ≥ 99.3% for 18 of 20 regions), suggesting that each PC emerged recently. Most PC-specific genes are involved in restriction modification and Nm–host interplay, either contributing new functional genes or inserting within preexisting operons without disrupting resident genes (SI Results and Fig. S3). Through acquisition of specific genes and through insertions and rearrangements that have a potential effect on preexisting functions, each PC has acquired and remodeled specific genomic tracts that will likely influence host–pathogen interactions.

Homologous Recombination Is Pervasive and Correlates with the Density of DNA Uptake Sequences.

Phylogenetic networks (Fig. 2 and Fig. S1G) reveal homoplasy in the form of nontree-like edges. The two networks show only 67 and 58 splits (core and dispensable genome networks, respectively) of >1 million possible, indicating that the horizontal phylogenetic signal is confined to a very limited number of DNA donor–acceptor pairs. Homologous recombination was detected in 87% of each chromosome using a coalescent-based method (21) (Materials and Methods). The rate of detectable recombination ρ was in agreement with previous estimates [average = 11.1 (95% CI 1.0–41.0) vs. ρ = 13.6 (11)]. The ratio of recombination to mutation rate, ρ/θ, falls in the range of previous estimates [average = 1.59 (95% CI 0.06–7.31) vs. 0.16–7.7 (10–12, 22)]. No significant positional bias for recombination was detected along the chromosome, and ρ did not correlate positively with the degree of sequence conservation, suggesting that recombination acts similarly on most of the genome; ρ/θ was, however, lower than average in genes involved in DNA metabolism (Bonferroni-corrected Fisher P value < 0.01). A significant correlation was found between ρ and the density of DNA uptake sequences (DUSs; Spearman correlation coefficient = 0.14, P value < 10⁻⁷), and a smaller proportion of nonrecombining DNA was predicted in the core genome (11%) than in the dispensable genome (45%), where DUS density is much lower (23). These results confirm the link between DUSs and homologous recombination and the role of the latter in preserving genome stability rather than generating adaptive variation (23).
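The ρ–DUS comparison above combines two steps: counting DUSs in each sequence region and computing a rank correlation. The sketch below assumes the canonical 10-bp Neisseria DUS core 5′-GCCGTCTGAA-3′ (the text does not state which DUS variant was counted) and scans both strands. It then computes Spearman's coefficient as the Pearson correlation of average ranks, which handles ties. It returns only the coefficient; the reported P values came from R, and all names here are illustrative.

```python
DUS = "GCCGTCTGAA"     # assumed canonical 10-bp DUS core
DUS_RC = "TTCAGACGGC"  # its reverse complement


def count_occurrences(seq, motif):
    """Count possibly overlapping occurrences of motif in seq."""
    count, start = 0, seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def dus_per_kb(seq):
    """DUS occurrences on both strands per kilobase of sequence."""
    seq = seq.upper()
    hits = count_occurrences(seq, DUS) + count_occurrences(seq, DUS_RC)
    return 1000.0 * hits / len(seq)


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

In use, `x` would hold per-region ρ estimates and `y` the corresponding `dus_per_kb` values.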

Content and Organization of Insertion Sequences Do Not Segregate by PCs.

Thirty-nine IS prototypes, belonging to nine IS families, were present in variable copy numbers per genome (SI Materials and Methods and Fig. S4A). Each strain harbored an average of 41 ISs (range 32–51), with a rather homogeneous distribution across isolates. Clustering based on IS content separates N. meningitidis from the rest of the genus (6). The same analysis applied to our dataset revealed a very poor correlation between IS content and intraspecies population structure (Fig. S4B), and none of the PCs was reproduced as a monophyletic group. CCs that clustered together had weak bootstrap support (e.g., CC32, 50% support), whereas relationships incompatible with the established phylogeny had strong support (e.g., the FAM18–Z2491 pair, 80% support). Therefore, with a few exceptions, such as a Tn3-like element specific to PC32/269, ISs move quite freely within the species, frequently crossing PC borders.

Restriction Modification Is a Determinant of PCs.

RMSs were the only functional class showing specific genes or rearrangements in each PC (Fig. 2). We identified 22 putative RMSs (Table S2), including 14 Type II, 4 Type III (2 of which have two distinct alleles for the DNA methylase), and 2 Type I. No RMS is shared by all genomes, and each strain contains five to nine RMSs. Two RMSs are present in 19 of 20 isolates. Eight RMSs are each specific to a single isolate, and five are unique to the capsule-null strain α14. The remaining 12 systems are shared by 2–13 isolates each (Fig. 3). One-half of the RMSs are localized in five hotspots for integration of minimal mobile elements (MMEs) (24) (Table S3 and Fig. S5). The majority of RMSs have a GC content incompatible with Nm (Table S2), indicating that the RMS repertoire is primarily composed of MMEs acquired from other species and/or genera through horizontal gene transfer (HGT).
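The GC-content argument can be made concrete with a short sketch: compute the GC fraction of each RMS locus and flag loci that deviate from the genome-wide value by more than a chosen margin. The margin `max_dev` and the locus names are hypothetical; the text does not state the threshold used to judge GC content incompatible with Nm (see Table S2), and ambiguous bases are simply ignored here.

```python
def gc_fraction(seq):
    """GC fraction over unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no unambiguous bases")
    return (seq.count("G") + seq.count("C")) / acgt


def gc_outliers(loci, genome_gc, max_dev):
    """Names of loci whose GC fraction differs from genome_gc by more than max_dev.

    max_dev is a hypothetical, user-chosen cut-off.
    """
    return sorted(name for name, seq in loci.items()
                  if abs(gc_fraction(seq) - genome_gc) > max_dev)
```

A deviation flagged this way is only suggestive of HGT; codon usage and phylogenetic evidence are usually weighed alongside it.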

Fig. 3.

Presence/absence of 22 RMSs reconstructs PCs. A presence/absence matrix is shown, where each row represents one Nm genome analyzed, color-coded as in Fig. 2, and each column represents 1 of 22 RMSs identified (Table S2). Presence of an RMS in a strain is indicated by a dark square. The cladogram is a bootstrapped agglomerative hierarchical clustering of strains based on RMS presence/absence, and numbers indicate bootstrap support for each node. Color shades on the cladogram indicate support groups obtained with a distance threshold of one RMS.

In contrast to ISs, clustering of RMS presence/absence accurately reproduces the core genome phylogenetic structure of the species (Fig. 3), with high bootstrap support (92–100%) for the three PCs and poor support for the other nodes (21–63%) (Fig. S1E). Each PC is associated with a specific RMS pattern: PC32/269 with seven RMSs (except strain M13399 that lacks the EcoR124II Type I RMS), PC41/44 with nine RMSs (except the carriage strain OX99-30304 that lacks the EcoPI-ModD Type III RMS), and PC8/11 with seven RMSs. Not all of the RMSs are specific to a single PC, but each PC has a unique combination of RMSs.

A meta-analysis was performed to test 189 meningococcal strains for the presence/absence of five PC-specific RMSs (EcoPI-ModB1 for PC32/269, NmeDI for PC8/11, EcoPI-ModD and NmeSIM for PC41/44, and Nme18ORF1992P for CC213) and one RMS that is specific to two PCs (NmeBI for PC32/269 and PC41/44) (Materials and Methods and Table S4). Each RMS showed a highly significant association with the respective PC or combination of PCs (all Bonferroni-corrected Fisher association P values < 10⁻²), confirming our pan-genomic study on a larger epidemiological scale.
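Each association test above compares a 2 × 2 table (RMS present/absent vs. strain inside/outside the PC) with Fisher's exact test and then corrects for the number of RMSs tested. A minimal sketch follows, assuming a two-sided test that sums the hypergeometric probabilities of all tables no more likely than the observed one (the convention R's `fisher.test` uses for 2 × 2 tables) and a plain Bonferroni multiplication by the number of tests. The paper's analysis was run in R; this is an illustration, not a replacement.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    total = comb(n, col1)
    # Hypergeometric probability of every table sharing the observed margins.
    probs = {x: comb(row1, x) * comb(n - row1, col1 - x) / total
             for x in range(lo, hi + 1)}
    cutoff = probs[a] * (1 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(p for p in probs.values() if p <= cutoff))


def bonferroni(p_values):
    """Multiply each P value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

For six RMSs, `bonferroni` would receive six raw P values, one per RMS–PC table.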

Gene Conversion Events Are Longer Within PCs than Between PCs.

Gene conversion events within the 20 genomes were mapped using an inferential method (25) (Materials and Methods). The average length of within-PC imports was fivefold higher than that of between-PC imports (3.89 vs. 0.68 kb) (Fig. 4A). Short imports (<1 kb) were twofold more frequent among between-PC imports, whereas long imports (≥5 kb) were 40-fold more frequent among within-PC imports. Although the number of gene conversions is similar, the length of DNA exchanged between PCs is significantly smaller than within PCs (approximately fivefold smaller, P value < 10⁻⁶). Results were supported by an independent analysis of linkage-disequilibrium patterns (Fig. 4B), and a significant correlation was found between the number of RMSs shared by two isolates and the average length of exchanged DNA (Pearson correlation = −0.54, P value < 10⁻¹⁶).
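The summary statistics in this paragraph (mean import length and the share of imports at or above a length threshold, as plotted in Fig. 4A) are simple to compute from a list of predicted gene conversion lengths. The sketch below assumes lengths in kilobases and uses made-up inputs in its checks; it also confirms the fold difference implied by the two reported means.

```python
def reverse_cumulative(lengths, thresholds):
    """For each threshold t, the fraction of events with length >= t."""
    n = len(lengths)
    return [sum(1 for length in lengths if length >= t) / n for t in thresholds]


def mean(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    # Reported mean import lengths (kb): within PC vs. between PC.
    print(round(3.89 / 0.68, 2))  # 5.72
```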

Fig. 4.

Length of predicted gene conversion events within and between PCs. (A) Reverse cumulative distribution of gene conversion events for various lengths of the exchanged DNA (i.e., the proportion of events equal to or longer than the length indicated on the x axis). Within-PC events (donor and acceptor in the same PC) are shown as open squares. Between-PC events (donor and acceptor in different PCs) are shown as circles. (B) Linkage disequilibrium measure D′ (SI Materials and Methods) as a function of the physical distance along the chromosome: within PC (B1) and between PC (B2). Solid curves indicate least-squares regression of the exponential decay function κ × exp(−x/τ) + δ to the data. Best fits were obtained for 3τ = 2,430 (within PC) and 3τ = 729 (between PC), where 3τ is an approximate estimate of the average length of gene conversion events. Dashed lines indicate best-fit values for δ.
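Panel B relies on the pairwise linkage-disequilibrium statistic D′. A minimal sketch, assuming Lewontin's standard definition for two biallelic sites (D = p_AB − p_A·p_B, divided by its maximum attainable magnitude given the allele frequencies, and reported as an absolute value), is shown below; the exact variant described in SI Materials and Methods may differ. Haplotypes are two-character strings, one character per site. For the fitted curve, note that at x = 3τ the excess term κ·exp(−x/τ) has fallen to e⁻³ ≈ 5% of its value at x = 0, which is why 3τ works as a rough length scale.

```python
def d_prime(haplotypes):
    """|D'| for two biallelic sites, from haplotypes such as "AB", "ab"."""
    n = len(haplotypes)
    alleles_1 = sorted({h[0] for h in haplotypes})
    alleles_2 = sorted({h[1] for h in haplotypes})
    if len(alleles_1) != 2 or len(alleles_2) != 2:
        raise ValueError("both sites must be biallelic")
    a, b = alleles_1[0], alleles_2[0]  # reference allele at each site
    p_a = sum(h[0] == a for h in haplotypes) / n
    p_b = sum(h[1] == b for h in haplotypes) / n
    p_ab = sum(h[0] == a and h[1] == b for h in haplotypes) / n
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max
```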

A mechanism that generates a difference in the length but not in the number of imports would have to play its role after DNA uptake but before its integration into the recipient chromosome. Thus, results indicate the presence of a selective DNA cleavage mechanism, the typical signature of RMSs, whose activity is directly correlated with the population structure of the species.

Discussion and Outlook

Analysis of sequence variation in Nm strains at a limited number of core genome loci (≤10 kb) identified groups of strains (CCs) sharing a common ancestry (9). The search for common ancestry between CCs (i.e., the reconstruction of the evolutionary history at the species level rather than within restricted groups of isolates) proved elusive because of the limited number of loci investigated (22). Here, we show that the whole core genome (~1.4 Mb) provides sufficient power to identify robust phylogenetic relationships between isolates belonging to different CCs. The Nm population is structured in compartments larger than CCs, which we suggest to name PCs. CC32 and CC269 as well as CC8 and CC11, lineages considered for a long time as separate epidemiological entities, do merge into PC32/269 and PC8/11. Over time, as more complete genome sequences are determined, we anticipate a continuous improvement of the definition of meningococcal PCs, with the inclusion of additional CCs in these PCs and the identification of new PCs.

Each of the three PCs identified is associated with specific gene content/arrangements in functional compartments critical to bacterial–host interactions. Further experimental work using relevant in vitro, ex vivo, or in vivo models is needed to determine the role of PC-specific genes in Nm pathogenesis. Nevertheless, genomic results reported here, together with the lack of evidence for a general determinant of the commensal–pathogenic transition in the species (6), are suggestive of a multifactorial nature of Nm pathogenicity developed/acquired by different clones in multiple independent evolutionary events. Meningococcal PCs are not necessarily associated with a defined odds ratio for disease but constitute a simplifying tool for the identification of determinants of pathogenicity, allowing comparative and functional studies to be performed in large and diverse groups of strains whose evolutionary history is now resolved.

We also show that homologous recombination is detectable across the genome, with an average of 1.59 recombination events per mutation event, and we confirm the correlation between recombination and density of DUSs on the chromosome (23), suggesting a regenerative, stabilizing function for transformation as opposed to a diversifying one.

As previously observed in Nm, population structure and dynamics seem to provide contradictory evidence, because the presence and persistence of phylogenetic structures seem to contradict the high and pervasive HGT rate estimated for the species. A possible explanation comes from the dispensable genome of the species, where we identified 22 RMSs distributed very heterogeneously among the 20 strains. The simple presence/absence of RMSs was sufficient to accurately reconstruct the phylogenetic structure of the sample. Each PC was associated with a specific repertoire of RMSs, and highly significant associations with PCs were measured for six RMSs on a panel of 189 strains, as previously suggested by a microarray-based comparative genomic hybridization study (5).

The strong relationship between PCs and RMS could be interpreted either as the cause of the observed phylogenetic structure or as a mere consequence of a diversification process driven by other mechanisms, such as variation in fitness and transmissibility (26), neutral microepidemic evolution (11), or immune selection (12). In support of the first hypothesis, in silico analysis revealed that gene conversion events within a single PC are more than fivefold longer than events crossing PC barriers and that the length of exchanged DNA is significantly correlated with the number of RMSs shared between isolates. In principle, one might expect a higher rate of recombination among closely related isolates because of sequence similarity. However, no correlation was observed between recombination and degree of sequence polymorphism, and the substantial difference measured here is not in the rate but in the length of recombined DNA. This is suggestive of a PC-specific DNA cleavage mechanism, typical of RMSs, acting in the recipient cell after exogenous DNA is taken up but before it recombines with the recipient chromosome. A similar mechanism, first identified in Nm using cocultivation experiments, showed that a CC32-specific RMS was responsible for partial restriction of DNA transfer from CC11 to CC32 isolates (27) and suggested genetic isolation of the hypervirulent lineages of serogroup C meningococci (28). More recently, clustering of imported DNA endpoints observed in Helicobacter pylori suggested a role for RMSs in limiting recombination length (29), and a correlation between four RMSs and phylogenetic clusters was reported in Haemophilus influenzae, suggesting that heterogeneity in RMSs limits genetic exchange between randomly chosen strains (30).

Nm RMSs displayed characteristics of mobile elements but did not follow the random patterns observed for other mobile elements like ISs. Conversely, we observe that strains belonging to the same PC have one or no different RMSs, whereas the strains belonging to different PCs have on average more than seven different RMSs, providing multiple barriers to DNA exchange, even if some of these systems can phase vary to play a regulatory role (31). Therefore, we propose a model (Fig. 5) where homologous recombination, instead of being a force opposed to the population structure, is the very cause of the PCs observed in Nm. In a substantially pan-mictic population, homologous recombination wipes out phylogenetic relationships. When a clone acquires a new RMS, this is inherited by its offspring, whose ability to exchange DNA with other members of the species is reduced proportionally to the efficiency of the DNA restriction, because the effect of a single recombination event is directly proportional to the length of the converted DNA (32). As a result, (i) the progeny of the clone is less exposed to the confounding effect of homologous recombination, and (ii) efficient homologous recombination among the offspring of the originating clone plays a stabilizing regenerative role, giving rise to a new lineage in the population.
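The working model can be illustrated with a deliberately simplified sketch. It assumes that incoming donor DNA is protected only at recognition sites of RMSs the donor itself carries, so the recipient cuts at the sites of every system it has and the donor lacks, and that only the longest surviving fragment is available for recombination. RMS names and site positions are hypothetical, and the sketch ignores the ssDNA versus dsDNA question raised by the transformation mechanism. It reproduces the qualitative prediction: the more RMSs two strains share, the longer the stretch of DNA they can exchange.

```python
def longest_surviving_fragment(import_len, sites_by_rms, recipient_rms, donor_rms):
    """Longest uncut stretch of an imported molecule of length import_len.

    Toy-model assumption: donor DNA is methylated only by the donor's own
    systems, so the recipient cuts at sites of every RMS it carries that the
    donor lacks. Positions are offsets along the imported molecule.
    """
    unprotected = set(recipient_rms) - set(donor_rms)
    cuts = sorted({pos for rms in unprotected
                   for pos in sites_by_rms.get(rms, [])
                   if 0 < pos < import_len})
    edges = [0] + cuts + [import_len]
    return max(right - left for left, right in zip(edges, edges[1:]))


if __name__ == "__main__":
    sites = {"R1": [1000, 3000], "R2": [2000]}  # hypothetical systems and sites
    recipient = {"R1", "R2"}
    for donor in ({"R1", "R2"}, {"R1"}, set()):
        print(sorted(donor), longest_surviving_fragment(5000, sites, recipient, donor))
```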

Fig. 5.

Working model for RMS-driven origin and persistence of PCs in Nm (in the text).

The paradigm of DNA transformation in Gram-negative bacteria, in which dsDNA crosses the outer membrane and is then transported across the inner membrane as ssDNA (33), has posed conceptual challenges to this model, because ssDNA is often considered to be resistant to endonucleases. However, several restriction enzymes mediate cleavage of ssDNA (34), and restriction of ssDNA has been shown to be a barrier to natural transformation in Pseudomonas stutzeri (35). dsDNA has been detected after transformation of pathogenic Nm (36), and a model for processing of ssDNA and dsDNA in the cytoplasm, based on RecA and RecBCD, has been proposed for N. gonorrhoeae (37). Nm DNA uptake is coupled to pilus retraction (38). Although PilQ-mediated DNA uptake has been observed to be more efficient with ssDNA than with dsDNA (39), no preference was detected for the other DNA binding proteins identified, including PilG (38).

It has already been proposed that RMSs have a function in maintaining species identity and controlling speciation (40). Here, we propose the same mechanism as a coherent driver of the population structure within a single bacterial species. We envision a global pan-genomic effort in the meningococcal community, similar to the MLST effort in the early 2000s, which should include determination of the core genome to resolve the population structure and of the dispensable genome to validate the relationship between RMSs and PCs and to characterize PC-specific gene repertoires. Because the majority of pathogenic bacteria investigated to date occupy an intermediate position between the clonal and pan-mictic paradigms (41), this effort could be relevant to a broad spectrum of microbial species.

Materials and Methods

Sequencing and Annotation.

The genome sequences of 15 Nm strains (Table S1) were determined and annotated as reported in SI Materials and Methods. All gaps were closed in the G2136, H44/76, M01-240149, M01-240355, M04-240196, and NZ-05/33 genomes, whereas N1568, OX99-30304, M6190, M13399, M0579, ES14902, CU385, 961–5945, and M01-240013 remained as draft genomes. Genome sequences and annotations have been deposited in GenBank as CP002419, CP002420, CP002421, CP002422, CP002423, CP002424, AEQD00000000, AEQE00000000, AEQF00000000, AEQG00000000, AEQH00000000, AEQI00000000, AEQJ00000000, AEQK00000000, and AEQL00000000, respectively.

Pan-Genome Analysis.

Genome comparisons for the pan-genome analysis were performed as previously described (42) with the modifications reported in SI Materials and Methods. Four different datasets were analyzed: (i) 20 genomes (Table S1); (ii) 10 genomes, 1 genome per CC (G2136, M01-240355, N1568, NZ-05/33, α14, MC58, Z2491, 053442, M04-240196, and FAM18), to correct for the overrepresentation of five CCs; (iii) 7 genomes, 1 genome each for serogroups A, X, and cnl and 2 genomes each for serogroups C and B (α14, MC58, N1568, Z2491, NZ-05/33, 053442, and FAM18), to correct for the overrepresentation of serogroup B; and (iv) 7 serogroup B genomes (G2136, OX99-30304, M01-240355, MC58, M6190, NZ-05/33, and M04-240196), to compare with the previous dataset.

Multiple Genome Alignments.

Three different multiple genome alignments were produced with the progressive Mauve algorithm of the Mauve v2.3.1 toolkit using the seed-family option (43): (i) 20 Nm genomes (Table S1) plus the complete genome of N. gonorrhoeae strain FA1090 as an outgroup, (ii) 20 Nm genomes without FA1090, and (iii) the 11 Nm genomes sequenced to closure, used for positional analyses. Core and dispensable sequences were extracted from the first two alignments and realigned using the MAFFT v6.24 aligner (44) with options --ep 0 --genafpair --maxiterate 1000. The third alignment revealed 137 DNA regions internally free from genome rearrangement [locally collinear blocks (LCBs)], 25 of which included DNA from all genomes. These 25 core LCBs, accounting for 93% ± 2% of each genome, were selected for positional analysis.

Phylogenetic and Positional Analyses.

Core and dispensable genome-based phylogenetic networks of Fig. 2 and Fig. S1G were determined using the NeighborNet algorithm as implemented in SplitsTree v.4 (45) using the multiple sequence alignment of the concatenated core and dispensable genomes, respectively. Other phylogenetic analyses reported in Fig. S1 are described in SI Materials and Methods. Inversion events within the chromosomes of 11 closed Nm genomes were reconstructed with the Bayesian model of genome rearrangements implemented in BADGER version 1.01β (46) with default values for the parameters.

RMS Analysis.

A putative RMS was called when both the DNA methylase and the DNA endonuclease were present and adjacent on the chromosome, even if possibly frame-shifted or disrupted. An RMS presence/absence matrix was constructed (Fig. 3) and clustered with the bootstrapped agglomerative hierarchical clustering implemented in MeV 4.4.0 (47), using Euclidean distance, average linkage, and 100 bootstrap replicates. RMS presence/absence data were also collected for six RMSs across 189 Nm strains by merging heterogeneous evidence from various sources (Table S4 and SI Materials and Methods). A Fisher exact test, as implemented in R version 2.9.2 (http://www.r-project.org), was used with Bonferroni correction to test RMS–PC associations.
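The clustering step can be sketched in a few lines. Assuming each strain is a 0/1 vector over the RMSs, the code below performs average-linkage (UPGMA) agglomerative clustering on Euclidean distances and estimates bootstrap support by resampling RMS columns with replacement, the usual way of bootstrapping a character matrix. It is an illustrative reimplementation, not MeV's code; ties between equal distances are broken arbitrarily, which can affect support for poorly resolved nodes.

```python
import math
import random


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage_clusters(profiles):
    """Average-linkage (UPGMA) clustering on Euclidean distances; returns
    every cluster formed by a merge, as a frozenset of strain names."""
    names = sorted(profiles)
    dist = {(a, b): euclidean(profiles[a], profiles[b]) for a in names for b in names}
    active = [frozenset([name]) for name in names]
    formed = set()
    while len(active) > 1:
        best = None
        for i in range(len(active)):
            for j in range(i + 1, len(active)):
                # Average linkage: mean distance over all cross-cluster pairs.
                d = sum(dist[a, b] for a in active[i] for b in active[j])
                d /= len(active[i]) * len(active[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = active[i] | active[j]
        active = [c for k, c in enumerate(active) if k not in (i, j)] + [merged]
        formed.add(merged)
    return formed


def bootstrap_support(profiles, replicates=100, seed=0):
    """Percentage of column-resampled replicates in which each cluster of
    the reference clustering reappears."""
    rng = random.Random(seed)
    n_cols = len(next(iter(profiles.values())))
    reference = average_linkage_clusters(profiles)
    hits = dict.fromkeys(reference, 0)
    for _ in range(replicates):
        cols = [rng.randrange(n_cols) for _ in range(n_cols)]
        resampled = {name: [row[c] for c in cols] for name, row in profiles.items()}
        for cluster in average_linkage_clusters(resampled):
            if cluster in hits:
                hits[cluster] += 1
    return {cluster: 100.0 * k / replicates for cluster, k in hits.items()}
```

For a 0/1 matrix, the squared Euclidean distance between two strains equals the number of RMSs present in one but not the other, so the one-RMS threshold used for the color shades in Fig. 3 corresponds to a Euclidean distance of 1.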

Homologous Recombination Detection.

We estimated the population-scaled recombination rate ρ and the Watterson estimate of the population-scaled mutation rate θ with the pairwise algorithm implemented in LDHat 2.1 (48), using an average gene conversion length of 1.1 kb as in ref. 12. The multiple genome alignment of 20 Nm genomes was scanned with 1-kb sliding windows overlapping by 50%, and windows containing sequence data for five or more genomes were analyzed. ρ and θ were also estimated with the same algorithm on multiple alignments of Mugsy-based ortholog clusters (SI Materials and Methods). DUS density was determined for Mugsy clusters with 100-bp flanks, and its correlation with ρ was tested using the Spearman correlation test implemented in R 2.9.2. Individual gene conversion events were detected with GENECONV (25), with modifications (SI Materials and Methods), on each LCB ≥ 5 kb obtained from the multiple alignment of 20 Nm genomes.
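The Watterson estimator named above divides the number of segregating sites S in a window by a_n = Σ_{i=1}^{n−1} 1/i for n sequences. The sketch below computes a per-site θ_W for each 1-kb window with 50% overlap and skips windows with data for fewer than five genomes. Treating gaps and ambiguous bases as missing, and counting a column as segregating when it has at least two distinct unambiguous bases, are simplifying assumptions; the LDHat estimate of ρ is not reproduced here.

```python
VALID = set("ACGT")


def harmonic_a(n):
    """Watterson's constant a_n = sum_{i=1}^{n-1} 1/i."""
    return sum(1.0 / i for i in range(1, n))


def watterson_theta_per_site(alignment):
    """theta_W = S / a_n, divided by alignment length (uppercase input)."""
    n, length = len(alignment), len(alignment[0])
    segregating = 0
    for col in range(length):
        bases = {seq[col] for seq in alignment if seq[col] in VALID}
        if len(bases) > 1:
            segregating += 1
    return segregating / harmonic_a(n) / length


def sliding_windows(length, size=1000, step=500):
    """Yield (start, end) for windows of `size` bp advancing by `step` bp."""
    start = 0
    while start + size <= length:
        yield start, start + size
        start += step


def window_thetas(alignment, min_genomes=5):
    """theta_W per window, for windows with data for >= min_genomes sequences."""
    results = []
    for start, end in sliding_windows(len(alignment[0])):
        rows = [seq[start:end].upper() for seq in alignment]
        rows = [r for r in rows if any(ch in VALID for ch in r)]
        if len(rows) >= min_genomes:
            results.append((start, end, watterson_theta_per_site(rows)))
    return results
```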

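DUS density and its rank correlation with ρ could be computed along the lines below. The motif is an assumption: the 10-bp neisserial DUS core 5′-GCCGTCTGAA-3′, counted on both strands. Expressing density as occurrences per kb of cluster sequence plus flanks is also an assumption. `with_flanks`, `dus_density` and `spearman_coefficient` are hypothetical helpers.

Spearman's coefficient (not to be confused with the recombination rate ρ) is computed as the Pearson correlation of average ranks. The p-value that R's `cor.test` would report is not reproduced here.

```python
from math import sqrt

DUS = "GCCGTCTGAA"  # assumption: 10-bp neisserial DUS core, 5'->3'

def revcomp(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def with_flanks(genome: str, start: int, end: int, flank: int = 100) -> str:
    """genome[start:end] (0-based, end-exclusive) plus `flank` bp on each side, clipped."""
    return genome[max(0, start - flank):end + flank]

def count_motif(seq: str, motif: str) -> int:
    """Count (possibly overlapping) occurrences of motif in seq."""
    return sum(1 for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i))

def dus_density(seq: str, motif: str = DUS) -> float:
    """Motif occurrences on both strands per kb of sequence."""
    seq = seq.upper()
    hits = count_motif(seq, motif)
    if revcomp(motif) != motif:  # avoid double counting palindromic motifs
        hits += count_motif(seq, revcomp(motif))
    return 1000 * hits / len(seq)

def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_coefficient(x: list[float], y: list[float]) -> float:
    """Spearman's coefficient: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

In this setting `x` would hold the DUS densities of the ortholog clusters and `y` the corresponding ρ estimates.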
Supplementary Material

Supporting Information

Acknowledgments

We thank Stephen Bentley for advice on genome annotation and analysis of chromosomal rearrangements; Nicola Pacchiani for support with the computational cluster of Novartis Vaccines and Diagnostics Systems Biology; Stanley Sawyer for suggestions on the use of GENECONV; Heather Huot, Todd Creasy, and A. Scott Durkin for help with manual annotation; the sequencing, assembly, closure, and annotation teams at the Institute for Genomic Research and the Institute for Genome Sciences for genome data processing; Claire Fraser-Liggett and William Nierman for senior management and support; Gerd Pluschke, Julio Vazquez, Tanja Popovic, Leonard Mayer, the Active Bacterial Core Surveillance (ABCs) Team and the Emerging Infection Programs (EIP) Network, Ray Borrow, Ed Kaczmarski, Dominique Caugant, and Martin Maiden for providing Nm strains and suggestions on strain selection; Giorgio Corsi for artwork preparation; and Tim Perkins for critical reading of the manuscript.

Footnotes

Conflict of interest statement: M.C., A.C., M.P., R.R., D.S., K.L.S., and D.M. are employees of Novartis Vaccines; S.B. and E.S. have PhD grants from Novartis Vaccines; and E.R.M. serves on the Scientific Advisory Board of Novartis Vaccines.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1019751108/-/DCSupplemental.

References

1. Stephens DS, Greenwood B, Brandtzaeg P. Epidemic meningitis, meningococcaemia, and Neisseria meningitidis. Lancet. 2007;369:2196–2210. doi: 10.1016/S0140-6736(07)61016-2.
2. Maiden MC. Population structure of Neisseria meningitidis. In: Ferreirós C, Criado MT, Vázquez J, editors. Emerging Strategies in the Fight Against Meningitis: Molecular and Cellular Aspects. Wymondham, Norfolk, United Kingdom: Horizon Scientific Press; 2002. pp. 151–170.
3. Snyder LA, Saunders NJ. The majority of genes in the pathogenic Neisseria species are present in non-pathogenic Neisseria lactamica, including those designated as ‘virulence genes’. BMC Genomics. 2006;7:128–139. doi: 10.1186/1471-2164-7-128.
4. Bille E, et al. Association of a bacteriophage with meningococcal disease in young adults. PLoS ONE. 2008;3:e3885. doi: 10.1371/journal.pone.0003885.
5. Dunning Hotopp JC, et al. Comparative genomics of Neisseria meningitidis: Core genome, islands of horizontal transfer and pathogen-specific genes. Microbiology. 2006;152:3733–3749. doi: 10.1099/mic.0.29261-0.
6. Schoen C, et al. Whole-genome comparison of disease and carriage strains provides insights into virulence evolution in Neisseria meningitidis. Proc Natl Acad Sci USA. 2008;105:3473–3478. doi: 10.1073/pnas.0800151105.
7. Caugant DA, et al. Intercontinental spread of a genetically distinctive complex of clones of Neisseria meningitidis causing epidemic disease. Proc Natl Acad Sci USA. 1986;83:4927–4931. doi: 10.1073/pnas.83.13.4927.
8. Smith JM, Smith NH, O'Rourke M, Spratt BG. How clonal are bacteria? Proc Natl Acad Sci USA. 1993;90:4384–4388. doi: 10.1073/pnas.90.10.4384.
9. Maiden MC, et al. Multilocus sequence typing: A portable approach to the identification of clones within populations of pathogenic microorganisms. Proc Natl Acad Sci USA. 1998;95:3140–3145. doi: 10.1073/pnas.95.6.3140.
10. Feil EJ, et al. Recombination within natural populations of pathogenic bacteria: Short-term empirical estimates and long-term phylogenetic consequences. Proc Natl Acad Sci USA. 2001;98:182–187. doi: 10.1073/pnas.98.1.182.
11. Fraser C, Hanage WP, Spratt BG. Neutral microepidemic evolution of bacterial pathogens. Proc Natl Acad Sci USA. 2005;102:1968–1973. doi: 10.1073/pnas.0406993102.
12. Jolley KA, Wilson DJ, Kriz P, McVean G, Maiden MC. The influence of mutation, recombination, population history, and selection on patterns of genetic diversity in Neisseria meningitidis. Mol Biol Evol. 2005;22:562–569. doi: 10.1093/molbev/msi041.
13. Buckee CO, et al. Role of selection in the emergence of lineages and the evolution of virulence in Neisseria meningitidis. Proc Natl Acad Sci USA. 2008;105:15082–15087. doi: 10.1073/pnas.0712019105.
14. Bentley SD, et al. Meningococcal genetic variation mechanisms viewed through comparative analysis of serogroup C strain FAM18. PLoS Genet. 2007;3:e23. doi: 10.1371/journal.pgen.0030023.
15. Schoen C, Tettelin H, Parkhill J, Frosch M. Genome flexibility in Neisseria meningitidis. Vaccine. 2009;27(Suppl 2):B103–B111. doi: 10.1016/j.vaccine.2009.04.064.
16. Medini D, Donati C, Tettelin H, Masignani V, Rappuoli R. The microbial pan-genome. Curr Opin Genet Dev. 2005;15:589–594. doi: 10.1016/j.gde.2005.09.006.
17. Tettelin H, Riley D, Cattuto C, Medini D. Comparative genomics: The bacterial pan-genome. Curr Opin Microbiol. 2008;11:472–477. doi: 10.1016/j.mib.2008.09.006.
18. Fiala KL, Sokal RR. Factors determining the accuracy of cladogram estimation: Evaluation using computer simulation. Evolution. 1985;39:609–622. doi: 10.1111/j.1558-5646.1985.tb00398.x.
19. Holmes EC, Urwin R, Maiden MC. The influence of recombination on the population structure and evolution of the human pathogen Neisseria meningitidis. Mol Biol Evol. 1999;16:741–749. doi: 10.1093/oxfordjournals.molbev.a026159.
20. Parkhill J, et al. Complete DNA sequence of a serogroup A strain of Neisseria meningitidis Z2491. Nature. 2000;404:502–506. doi: 10.1038/35006655.
21. McVean G, Awadalla P, Fearnhead P. A coalescent-based method for detecting and estimating recombination from gene sequences. Genetics. 2002;160:1231–1241. doi: 10.1093/genetics/160.3.1231.
22. Didelot X, Urwin R, Maiden MC, Falush D. Genealogical typing of Neisseria meningitidis. Microbiology. 2009;155:3176–3186. doi: 10.1099/mic.0.031534-0.
23. Treangen TJ, Ambur OH, Tonjum T, Rocha EP. The impact of the neisserial DNA uptake sequences on genome evolution and stability. Genome Biol. 2008;9:1–17. doi: 10.1186/gb-2008-9-3-r60.
24. Saunders NJ, Snyder LA. The minimal mobile element. Microbiology. 2002;148:3756–3760. doi: 10.1099/00221287-148-12-3756.
25. Sawyer S. Statistical tests for detecting gene conversion. Mol Biol Evol. 1989;6:526–538. doi: 10.1093/oxfordjournals.molbev.a040567.
26. Lipsitch M, Moxon ER. Virulence and transmissibility of pathogens: What is the relationship? Trends Microbiol. 1997;5:31–37. doi: 10.1016/S0966-842X(97)81772-6.
27. Claus H, Friedrich A, Frosch M, Vogel U. Differential distribution of novel restriction-modification systems in clonal lineages of Neisseria meningitidis. J Bacteriol. 2000;182:1296–1303. doi: 10.1128/jb.182.5.1296-1303.2000.
28. Claus H, Stoevesandt J, Frosch M, Vogel U. Genetic isolation of meningococci of the electrophoretic type 37 complex. J Bacteriol. 2001;183:2570–2575. doi: 10.1128/JB.183.8.2570-2575.2001.
29. Lin EA, et al. Natural transformation of Helicobacter pylori involves the integration of short DNA fragments interrupted by gaps of variable size. PLoS Pathog. 2009;5:e1000337. doi: 10.1371/journal.ppat.1000337.
30. Erwin AL, et al. Analysis of genetic relatedness of Haemophilus influenzae isolates by multilocus sequence typing. J Bacteriol. 2008;190:1473–1483. doi: 10.1128/JB.01207-07.
31. Srikhanta YN, et al. Phasevarions mediate random switching of gene expression in pathogenic Neisseria. PLoS Pathog. 2009;5:e1000400. doi: 10.1371/journal.ppat.1000400.
32. Guttman DS, Dykhuizen DE. Clonal divergence in Escherichia coli as a result of recombination, not mutation. Science. 1994;266:1380–1383. doi: 10.1126/science.7973728.
33. Chen I, Dubnau D. DNA transport during transformation. Front Biosci. 2003;8:s544–s556. doi: 10.2741/1047.
34. Nishigaki K, Kaneko Y, Wakuda H, Husimi Y, Tanaka T. Type II restriction endonucleases cleave single-stranded DNAs in general. Nucleic Acids Res. 1985;13:5747–5760. doi: 10.1093/nar/13.16.5747.
35. Berndt C, Meier P, Wackernagel W. DNA restriction is a barrier to natural transformation in Pseudomonas stutzeri JM300. Microbiology. 2003;149:895–901. doi: 10.1099/mic.0.26033-0.
36. Jyssum K, Jyssum S, Gundersen WB. Sorption of DNA and RNA during transformation of Neisseria meningitidis. Acta Pathol Microbiol Scand B Microbiol Immunol. 1971;79:563–571. doi: 10.1111/j.1699-0463.1971.tb03813.x.
37. Mehr IJ, Seifert HS. Differential roles of homologous recombination pathways in Neisseria gonorrhoeae pilin antigenic variation, DNA transformation and DNA repair. Mol Microbiol. 1998;30:697–710. doi: 10.1046/j.1365-2958.1998.01089.x.
38. Lång E, et al. Identification of neisserial DNA binding components. Microbiology. 2009;155:852–862. doi: 10.1099/mic.0.022640-0.
39. Assalkhou R, et al. The outer membrane secretin PilQ from Neisseria meningitidis binds DNA. Microbiology. 2007;153:1593–1603. doi: 10.1099/mic.0.2006/004200-0.
40. Jeltsch A. Maintenance of species identity and controlling speciation of bacteria: A new function for restriction/modification systems? Gene. 2003;317:13–16. doi: 10.1016/s0378-1119(03)00652-8.
41. Smith JM, Feil EJ, Smith NH. Population structure and evolutionary dynamics of pathogenic bacteria. Bioessays. 2000;22:1115–1122. doi: 10.1002/1521-1878(200012)22:12<1115::AID-BIES9>3.0.CO;2-R.
42. Tettelin H, et al. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: Implications for the microbial “pan-genome”. Proc Natl Acad Sci USA. 2005;102:13950–13955. doi: 10.1073/pnas.0506758102.
43. Darling AC, Mau B, Blattner FR, Perna NT. Mauve: Multiple alignment of conserved genomic sequence with rearrangements. Genome Res. 2004;14:1394–1403. doi: 10.1101/gr.2289704.
44. Katoh K, Toh H. Recent developments in the MAFFT multiple sequence alignment program. Brief Bioinform. 2008;9:286–298. doi: 10.1093/bib/bbn013.
45. Huson DH, Bryant D. Application of phylogenetic networks in evolutionary studies. Mol Biol Evol. 2006;23:254–267. doi: 10.1093/molbev/msj030.
46. Larget B, Simon DL, Kadane JB, Sweet D. A Bayesian analysis of metazoan mitochondrial genome arrangements. Mol Biol Evol. 2005;22:486–495. doi: 10.1093/molbev/msi032.
47. Saeed AI, et al. TM4: A free, open-source system for microarray data management and analysis. Biotechniques. 2003;34:374–378. doi: 10.2144/03342mt01.
48. McVean GA, et al. The fine-scale structure of recombination rate variation in the human genome. Science. 2004;304:581–584. doi: 10.1126/science.1092500.

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences