PLOS One. 2021 Feb 25;16(2):e0242297. doi: 10.1371/journal.pone.0242297

Whole genome genetic variation and linkage disequilibrium in a diverse collection of Listeria monocytogenes isolates

Swarnali Louha 1,*, Richard J Meinersmann 2, Travis C Glenn 1,3
Editor: Yung-Fu Chang 4
PMCID: PMC7906370  PMID: 33630832

Abstract

We performed whole-genome multi-locus sequence typing for 2554 genes in a large and heterogeneous panel of 180 Listeria monocytogenes strains with diverse geographical and temporal origins. The subtyping data were used to characterize genetic variation and evaluate patterns of linkage disequilibrium in the pan-genome of L. monocytogenes. Our analysis revealed strong linkage disequilibrium in L. monocytogenes, with ~99% of genes showing significant non-random associations with a large majority of other genes in the genome. Twenty-seven loci with lower levels of association with other genes were considered potential “hot spots” for horizontal gene transfer (i.e., recombination via conjugation, transduction, and/or transformation). The patterns of linkage disequilibrium in L. monocytogenes suggest limited exchange of foreign genetic material in the genome and can be used as a tool for identifying new recombinant strains. This can help clarify the processes contributing to the diversification and evolution of this pathogenic bacterium, thereby facilitating the development of effective control measures.

Introduction

The bacterial genome is a dynamic structure. Characterizing patterns of genomic variation in bacterial pathogens can provide insights into the forces shaping their biology and evolutionary history [1]. Homologous recombination is an important driver of evolution and increases the adaptive potential of bacteria by allowing variation to be tested across multiple genomic backgrounds [2]. Recombination is mediated by three mechanisms: transformation, transduction, and conjugation. The availability and efficacy of these mechanisms, together with their biological consequences, play a major role in determining the frequency of recombination in a bacterial population [1, 3]. Recombination is variably distributed in bacterial genomes: sites that recombine at a higher or lower frequency than the genomic average are known as hot spots and cold spots, respectively [4]. Evidence for recombination and its effect on genomic variation can be obtained by detecting patterns of non-random association of genotypes at different loci within a given population, termed linkage disequilibrium [1, 3]. Various methods for detecting linkage disequilibrium have been used to study the extent of genetic recombination shaping the population structures of several bacterial species [1, 5–7].

Listeria monocytogenes, known for causing life-threatening infections in animals and in human populations at risk, is one of the bacterial species with the lowest rates of homologous recombination. Genetic diversity in this species is mainly driven by the accumulation of mutations over time, with alleles five times more likely to change by mutation than by recombination [8]. L. monocytogenes is generally considered to have a clonal genetic structure [9, 10]. The population structure of this bacterium consists of 4 evolutionary lineages (I, II, III and IV), and recombination has been observed between isolates of different lineages, suggesting that although recombination is rare in L. monocytogenes, this species is not completely clonal [8, 11, 12]. Interestingly, homologous recombination is not equally frequent among isolates of different lineages, with lineages II, III and IV showing higher rates of recombination and a lower degree of sequence similarity than lineage I [11, 13–15].

Whole-genome sequencing studies have shown that L. monocytogenes genomes are highly syntenic in their gene content and organization, with a majority of gene-scale differences occurring in the accessory genome and accumulated in a few hypervariable hotspots, prophages, transposons, scattered unique genes and genetic islands encoding proteins of unknown functions [14, 16–19]. Several other studies have detected evidence of recombination using a few genes [8, 11, 20] and indicated the presence of significant linkage disequilibrium in L. monocytogenes [21, 22]. However, these studies used a limited number of L. monocytogenes isolates and evaluated recombination present in a small fraction of the genome, mostly made up of house-keeping genes, which are assumed to be under negative selection and less subject to homologous recombination.

Prior to the advent of next-generation sequencing technologies, multi locus enzyme electrophoresis (MLEE) was used for generating large data sets for the statistical analysis of bacterial populations. MLEE differentiates organisms by assessing the relative electrophoretic mobilities of intracellular enzymes and indexes allelic variation in multiple chromosomal genes [23]. MLEE has been successfully used for studying the extent of linkage disequilibrium in a variety of bacterial species [5, 9, 24]. As sequencing data have become cheap and readily available over the last decade, MLEE has been replaced by an analogous technique called MLST (multi locus sequence typing) for subtyping bacterial genomes [22, 25]. We recently provided an approach that can generate whole-genome MLST (wgMLST) based characterization of L. monocytogenes isolates from whole-genome sequencing data [26]. In this study, we use this wgMLST-based approach for characterizing genomic variation and assessing genome-wide patterns of linkage disequilibrium in a large collection of L. monocytogenes isolates obtained from diverse ecological niches.

Materials and methods

Listeria monocytogenes isolate selection

We selected a large and diverse panel of 180 L. monocytogenes isolates collected from different ecological communities (S1 File). This set included (i) 20 isolates each from food, food contact surfaces (FCS), manure, milk, clinical cases, soil, and ready-to-eat (RTE) products, for which whole-genome sequencing data were obtained from the NCBI Pathogen Detection database, and (ii) 20 isolates from water and sediment samples in the South Fork Broad River watershed located in Northeast Georgia and 20 isolates from effluents from poultry processing plants (EFPP), for which whole-genome sequencing data were provided by the USDA and FSIS [26].

Whole-genome multi-locus sequence typing (wgMLST)

Whole-genome sequencing data for the 180 L. monocytogenes isolates were processed using Haplo-ST (S1 Fig, [26]) for allelic profiling of 2554 genes per isolate. Haplo-ST first cleaned raw Illumina whole-genome sequencing reads, obtained as previously described (S1 File), using the FASTX-Toolkit [27]. Reads were trimmed to remove all bases with a Phred quality score of < 20 from both ends and filtered such that 90% of bases in the clean reads had a quality of at least 20. After trimming and filtering, all remaining reads with lengths of < 50 bp were discarded. Next, Haplo-ST used YASRA [28] to assemble the cleaned reads into allele sequences and assigned wgMLST profiles to the assembled allele sequences using BIGSdb-Lm (available at http://bigsdb.pasteur.fr/listeria).
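The read-cleaning rules above can be illustrated with a minimal Python sketch. It is not the FASTX-Toolkit or Haplo-ST code; the function name `trim_and_filter` and its parameters are hypothetical, and the 90% rule is read here as "at least 90% of the remaining bases have a Phred score of 20 or more", which is an assumption.

```python
def trim_and_filter(quals, min_q=20, min_frac=0.90, min_len=50):
    """Return the (start, end) slice of a read to keep, or None to discard it.

    quals: list of per-base Phred quality scores for a single read.
    All thresholds mirror the values quoted in the text; the function
    itself is an illustrative sketch, not the pipeline's implementation.
    """
    start, end = 0, len(quals)
    # Trim low-quality bases from the 5' end, then from the 3' end.
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    kept = quals[start:end]
    # Quality filter: enough of the remaining bases must reach min_q.
    good = sum(1 for q in kept if q >= min_q)
    if good < min_frac * len(kept):
        return None
    # Length filter: discard reads that are too short after trimming.
    if len(kept) < min_len:
        return None
    return start, end
```

For example, a read with three poor bases at the start and two at the end keeps only its high-quality core, while a 40-base read is dropped regardless of quality.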

Analysis of linkage disequilibrium

First, the raw wgMLST profiles were filtered to remove paralogous loci, and genes were ordered according to their genomic position in the L. monocytogenes reference strain EGD-e (NCBI accession number NC_003210.1). Next, new alleles not defined in the BIGSdb-Lm database and reported as ‘closest matches’ to existing alleles in BIGSdb-Lm were assigned custom allele IDs with in-house Python scripts. The wgMLST profiles were further filtered to retain loci with < 5% missing data. The remaining loci were used to evaluate linkage disequilibrium (LD) between all pairs of loci with Arlequin v3.5.2 [29]. The LD test checks for a significant statistical association between a pair of loci and is based on an exact test. The procedure is analogous to Fisher’s exact test on a two-by-two contingency table but extended to a contingency table of arbitrary size [30]. For each pair of loci, a contingency table is first constructed. The k1 x k2 entries of this table are the observed haplotype frequencies, with k1 and k2 being the number of alleles at locus 1 and locus 2, respectively. The LD test consists of obtaining the probability of finding a table with the same marginal totals that has a probability equal to or less than that of the observed contingency table. Instead of enumerating all possible contingency tables, a Markov chain is used to explore the space of all possible tables. To start from a random initial position, the chain is first run for a pre-defined number of steps (the dememorization phase), which allows the Markov chain to forget its initial state and become independent of its starting point. The P-value of the test is then taken as the proportion of visited tables having a probability smaller than or equal to that of the observed contingency table. In our analysis, we used 100,000 Markov chain steps to estimate the P-value of the LD test and 10,000 dememorization steps to reach a random initial position on the Markov chain. The significance level of the LD test was set at a P-value of 0.05.
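The Markov chain procedure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Arlequin's source code: the function names are hypothetical, the proposal move (adding and subtracting one count in a random 2 x 2 sub-table so that the margins are preserved) and the Metropolis acceptance rule are a standard way to sample tables with fixed margins, and the default step counts simply mirror the values quoted in the text.

```python
import math
import random

def log_table_weight(table):
    """Log probability of a table, up to a constant, for fixed margins.

    With fixed row and column totals, P(table) is proportional to
    1 / prod(n_ij!), so the log weight is -sum(log(n_ij!)).
    """
    return -sum(math.lgamma(n + 1) for row in table for n in row)

def ld_exact_test(table, steps=100_000, dememorization=10_000, seed=0):
    """Estimate the exact-test P-value for a k1 x k2 table of haplotype counts."""
    rng = random.Random(seed)
    current = [list(row) for row in table]
    k1, k2 = len(current), len(current[0])
    if k1 < 2 or k2 < 2:
        return 1.0  # a monomorphic locus admits only one table
    log_obs = log_table_weight(current)
    log_cur = log_obs
    hits = 0
    for step in range(dememorization + steps):
        i1, i2 = rng.sample(range(k1), 2)
        j1, j2 = rng.sample(range(k2), 2)
        a, b = current[i1][j1], current[i2][j2]
        c, d = current[i1][j2], current[i2][j1]
        if c > 0 and d > 0:
            # Proposed table: a+1, b+1, c-1, d-1 (margins unchanged).
            # P(new) / P(old) = (c * d) / ((a + 1) * (b + 1)).
            log_ratio = (math.log(c) + math.log(d)
                         - math.log(a + 1) - math.log(b + 1))
            if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
                current[i1][j1] += 1
                current[i2][j2] += 1
                current[i1][j2] -= 1
                current[i2][j1] -= 1
                log_cur += log_ratio
        # After dememorization, count tables no more probable than the observed one.
        if step >= dememorization and log_cur <= log_obs + 1e-7:
            hits += 1
    return hits / steps
```

A table in which each allele at one locus always co-occurs with a single allele at the other, such as `[[10, 0], [0, 10]]`, yields a very small P-value, whereas a table matching its expected counts under independence yields a P-value of 1.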

Assessment of genetic diversity

Genetic differentiation between L. monocytogenes isolates collected from the different ecological niches listed as the isolate sources (S1 File) was computed as pairwise FST values in Arlequin. FST measures the proportion of the variance in allele frequencies attributable to variation between populations [31] and has long been used in population genetics as a measure of the level of differentiation between populations [32, 33]. Fifty thousand permutations were used to test the significance of the genetic distances at a significance level of 0.05.
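As an illustration of what a pairwise FST of this kind measures, the sketch below partitions the variance in pairwise allelic differences within and between two groups of wgMLST profiles, following the general one-way AMOVA scheme, and tests it by permuting isolates between groups. It is a simplified stand-in and not Arlequin's implementation; the function names are hypothetical, profiles are assumed to be equal-length lists of allele IDs with no missing data, and the two groups together must contain at least three isolates.

```python
import random

def allelic_distance(p, q):
    """Number of loci at which two wgMLST profiles carry different alleles."""
    return sum(1 for a, b in zip(p, q) if a != b)

def _sum_pairwise(profiles):
    """Sum of allelic distances over all unordered pairs of profiles."""
    total = 0
    for i in range(len(profiles)):
        for j in range(i + 1, len(profiles)):
            total += allelic_distance(profiles[i], profiles[j])
    return total

def pairwise_fst(pop_a, pop_b):
    """AMOVA-style fixation index for two populations (lists of profiles)."""
    n_a, n_b = len(pop_a), len(pop_b)
    n = n_a + n_b
    ssd_total = _sum_pairwise(pop_a + pop_b) / n
    ssd_within = _sum_pairwise(pop_a) / n_a + _sum_pairwise(pop_b) / n_b
    ssd_among = ssd_total - ssd_within
    msd_among = ssd_among / 1           # P - 1 = 1 degree of freedom
    msd_within = ssd_within / (n - 2)   # N - P degrees of freedom
    n0 = n - (n_a ** 2 + n_b ** 2) / n  # average sample size coefficient
    var_among = (msd_among - msd_within) / n0
    var_within = msd_within
    total_var = var_among + var_within
    return var_among / total_var if total_var > 0 else 0.0

def permutation_p_value(pop_a, pop_b, n_perm=1000, seed=0):
    """Proportion of label permutations with FST at least as large as observed."""
    rng = random.Random(seed)
    observed = pairwise_fst(pop_a, pop_b)
    pooled = pop_a + pop_b  # new list; the inputs are not modified
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if pairwise_fst(pooled[:len(pop_a)], pooled[len(pop_a):]) >= observed - 1e-12:
            hits += 1
    return hits / n_perm
```

Two groups that share no alleles give a value of 1, while groups with identical composition give a value at or below 0; negative estimates are a known property of variance-component estimators and are usually interpreted as no differentiation.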

The AMOVA procedure in Arlequin was used to compute the pairwise differences in allelic content between isolate wgMLST profiles as a matrix of squared Euclidean distances. This distance matrix was used to compute a minimum spanning tree (MST) between all isolates. The MST was visualized and annotated with iTOL v3 [34]. For better visualization, the MST was converted to circular format and annotations for the source of isolates were displayed in outer rings.
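The two steps in this paragraph, a distance matrix of pairwise allelic differences followed by a minimum spanning tree, can be sketched as follows. The sketch uses Prim's algorithm, which is one standard way to build an MST and is not necessarily the algorithm Arlequin uses; the function names and the toy isolate labels are hypothetical, and missing alleles are not handled.

```python
def allelic_distance(p, q):
    """Number of loci at which two wgMLST profiles carry different alleles."""
    return sum(1 for a, b in zip(p, q) if a != b)

def minimum_spanning_tree(profiles):
    """Build an MST over isolates with Prim's algorithm.

    profiles: dict mapping isolate name -> list of allele IDs (same locus order).
    Returns a list of (isolate_1, isolate_2, distance) edges.
    """
    names = list(profiles)
    n = len(names)
    dist = [[allelic_distance(profiles[a], profiles[b]) for b in names]
            for a in names]
    # best[j] = (distance from node j to the growing tree, closest tree node)
    best = {j: (dist[0][j], 0) for j in range(1, n)}
    edges = []
    while best:
        # Attach the isolate that is currently closest to the tree.
        j = min(best, key=lambda k: best[k][0])
        d, parent = best.pop(j)
        edges.append((names[parent], names[j], d))
        # The newly attached isolate may offer shorter links to the rest.
        for k in best:
            if dist[j][k] < best[k][0]:
                best[k] = (dist[j][k], j)
    return edges

# Toy example with four hypothetical isolates typed at four loci.
toy = {
    "iso_A": [1, 1, 1, 1],
    "iso_B": [1, 1, 1, 2],
    "iso_C": [1, 1, 2, 2],
    "iso_D": [3, 3, 3, 3],
}
tree = minimum_spanning_tree(toy)
```

Because the tree is built from counts of differing alleles rather than from nucleotide differences, two isolates whose alleles differ by a single base and two whose alleles differ at many sites contribute the same distance at that locus.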

Results

We performed whole-genome multi locus sequence typing for 180 L. monocytogenes isolates obtained from 9 different source populations. For each isolate, allele sequences were assembled for 2554 genes and assigned allele IDs based on the unified nomenclature available in the BIGSdb-Lm database (S2 File). This dataset was filtered to remove 133 paralogous loci identified by Haplo-ST and all loci with > 5% missing data (alleles not assigned IDs by Haplo-ST), and the remaining 2233 loci (S3 File) were ordered according to their position in the L. monocytogenes reference genome EGD-e. Fig 1 shows the minimum spanning tree of the 180 isolates inferred from allelic differences in the wgMLST profiles. Two results are apparent. First, a long branch (red) contains a majority of the isolates obtained from soil and manure clustered together, which suggests that these strains originated from a common ancestor. Interestingly, three clinical strains (SRR1030275, SRR974870, SRR974873) are also found in this cluster. Second, a large number of food-related isolates (~51%, obtained from food, FCS, RTE products and EFPP) clustered together in a single branch of the tree (blue) with short branch lengths to the tips, suggesting that these strains are closely related to each other. Although this is expected, it is interesting to find a few strains obtained from clinical cases (SRR1027103, SRR1030281), river water (SRR11051485, SRR11051480), and milk (SRR5085119, SRR5912760, SRR3571283, SRR3571297) in this cluster. The presence of isolates from unrelated ecological communities could be due to the technique used for constructing the dendrogram, which groups isolates based on pairwise differences in allelic content between isolate wgMLST profiles rather than on differences between all variants in the nucleotide sequences. For comparison with a reference strain of L. monocytogenes, the minimum spanning tree was rooted with EGD-e (S2 Fig).
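The locus filtering and ordering described at the start of this section can be sketched as below. This is an assumed reconstruction rather than the in-house scripts used in the study: the function name, the data layout and the toy values are hypothetical, a missing allele is represented by `None`, and the threshold is read as "retain loci with fewer than 5% missing alleles" (the text does not say how a locus with exactly 5% missing data is treated).

```python
def filter_loci(profiles, positions, paralogs=frozenset(), max_missing=0.05):
    """Drop paralogous and poorly typed loci, then order loci by genome position.

    profiles:  dict isolate -> dict locus -> allele ID (None when not assigned)
    positions: dict locus -> start coordinate in the reference genome
    paralogs:  set of loci flagged as paralogous
    """
    n_isolates = len(profiles)
    kept = []
    for locus in positions:
        if locus in paralogs:
            continue
        missing = sum(1 for alleles in profiles.values()
                      if alleles.get(locus) is None)
        if missing / n_isolates < max_missing:
            kept.append(locus)
    # Order the retained loci along the reference chromosome.
    return sorted(kept, key=positions.get)

# Toy input: two hypothetical isolates and four loci with made-up coordinates.
toy_profiles = {
    "iso1": {"locus_b": 1, "locus_a": 3, "locus_c": None, "locus_d": 7},
    "iso2": {"locus_b": 2, "locus_a": 3, "locus_c": 4, "locus_d": 7},
}
toy_positions = {"locus_b": 2000, "locus_a": 1000, "locus_c": 3000, "locus_d": 4000}
retained = filter_loci(toy_profiles, toy_positions, paralogs={"locus_d"})
```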

Fig 1. Patterns of genetic differentiation in the 180 L. monocytogenes isolates.

Fig 1

Minimum spanning tree based on a distance matrix measuring pairwise differences in allelic content between isolate wgMLST profiles. The isolation source of each isolate is indicated with colors on the outer ring. The majority of the isolates sampled from soil and manure cluster together in a distant branch (red), suggesting their recent emergence from a common ancestor. A large number of food-related isolates cluster together in a single branch of the tree (blue), suggesting their close relatedness.

The genetic differentiation test, which computes pairwise FST values between isolates collected from different ecological communities (Table 1), shows three patterns. First, isolates obtained from soil and manure show considerable genetic differentiation from isolates belonging to other communities, with the exception of isolates obtained from clinical cases. Second, the EFPP-RTE pairing has a lower FST than any other pairing involving EFPP. Third, the clustering dendrogram (Fig 1) and the FST test support each other in that isolates from RTE, FCS and food are not distinguished as separate populations.

Table 1. Pairwise genetic distances (FST) between groups of L. monocytogenes strains isolated from nine different ecological niches.

             clinical  food    FCS     manure  milk    RTE product  soil    River water
clinical     0
food         0.051*    0
FCS          0.062*    0.015   0
manure       0.067*    0.126*  0.137*  0
milk         0.047*    0.047*  0.073*  0.124*  0
RTE product  0.09*     0.004   0.007   0.159*  0.069*  0
soil         0.064*    0.11*   0.124*  0.019*  0.104*  0.135*       0
River water  0.094*    0.091*  0.107*  0.153*  0.069*  0.092*       0.113*  0
EFPP         0.165*    0.157*  0.137*  0.221*  0.146*  0.076*       0.189*  0.13*

(*P < 0.05).

We investigated LD between pairs of genes in the genome using an exact test, which measures non-random associations between alleles at two loci based on the difference between observed and expected haplotype frequencies. As expected, most gene pairs (~97%) in the genome of L. monocytogenes show significant LD (Fig 2, S4 File). A majority of genes (2205 of 2233, ~99%) were found to be in LD with at least 90% of other genes in the genome (S5 File). Of the remaining 27 genes (~1%) that were in LD with < 90% of genes (Table 2), 10 genes were found to be in LD with < 50% of genes. A single locus, lmo0046, was in LD with only 19 other genes.
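The per-gene summary reported here (the number and percentage of other genes with which each gene is in significant LD) can be derived from the pairwise P-values as in the following sketch. The function names and the input layout are hypothetical; the percentage is assumed to be taken over all other retained loci, which is consistent with Table 2, where 19 significant partners out of 2232 other loci gives 0.85%.

```python
def ld_summary(loci, pvalues, alpha=0.05):
    """Count, for each locus, how many other loci it is in significant LD with.

    loci:    list of locus names
    pvalues: dict mapping (locus_a, locus_b) -> exact-test P-value, one entry per pair
    Returns a dict locus -> (number of loci in LD, percentage of other loci in LD).
    """
    counts = {locus: 0 for locus in loci}
    for (a, b), p in pvalues.items():
        if p < alpha:
            counts[a] += 1
            counts[b] += 1
    others = len(loci) - 1
    return {locus: (c, 100.0 * c / others) for locus, c in counts.items()}

def candidate_hot_spots(summary, threshold=90.0):
    """Loci in LD with fewer than `threshold` percent of other loci, lowest first."""
    flagged = [locus for locus, (_, pct) in summary.items() if pct < threshold]
    return sorted(flagged, key=lambda locus: summary[locus][1])

# Toy example with four hypothetical loci and made-up P-values.
toy_pvalues = {
    ("a", "b"): 0.001, ("a", "c"): 0.001, ("a", "d"): 0.5,
    ("b", "c"): 0.01, ("b", "d"): 0.6, ("c", "d"): 0.7,
}
summary = ld_summary(["a", "b", "c", "d"], toy_pvalues)
```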

Fig 2. Heatmap of the extent of LD in the genome of L. monocytogenes.

Fig 2

Genes are ordered according to their genomic positions in the L. monocytogenes reference strain EGD-e along the x and y axes (for gene names see S4 File). A majority of genes show significant LD in the genome (indigo), while a few genes are at linkage equilibrium (yellow).

Table 2. Genes at LD with < 90% of genes in the genome of L. monocytogenes, showing significant evidence for horizontal genetic transfer.

Locus tag  Gene symbol  # Genes at LD  Percentage of genes at LD  Location in the chromosome (bp)  *Location in core/accessory genome w.r.t. BIGSdb-Lm  Function
lmo0046    rpsR         19             0.85                       50514..50753                     core                                                 small subunit ribosomal protein S18
lmo2624    rpmC         185            8.289                      2701254..2701445                 core                                                 large subunit ribosomal protein L29
lmo2856    rpmH         215            9.63                       2943569..2943703                 accessory                                            large subunit ribosomal protein L34
lmo1364    cspL         239            10.71                      1387014..1387214                 accessory                                            Cold shock protein
lmo1469    rpsU         454            20.34                      1501881..1502054                 core                                                 small subunit ribosomal protein S21
lmo2616    rplR         458            20.52                      2697988..2698347                 accessory                                            large subunit ribosomal protein L18
lmo1816    rpmB         484            21.69                      1890951..1891139                 core                                                 large subunit ribosomal protein L28
lmo0248    rplK         576            25.81                      265029..265454                   accessory                                            large subunit ribosomal protein L11
lmo1335    rpmG         880            39.43                      1363826..1363975                 core                                                 large subunit ribosomal protein L33
lmo0263    inlH         1006           45.07                      284365..286011                   accessory                                            internalin H
lmo0582    cwhA         1223           54.79                      618932..620380                   accessory                                            Invasion associated secreted endopeptidase
lmo2047    rpmF         1377           61.69                      2130228..2130401                 accessory                                            large subunit ribosomal protein L32
lmo2628    rpsS         1508           67.56                      2702909..2703187                 accessory                                            small subunit ribosomal protein S19
lmo2614    rpmD         1580           70.79                      2697267..2697446                 core                                                 large subunit ribosomal protein L30
lmo0758    -            1606           71.95                      783901..784788                   core                                                 Hypothetical protein
lmo0514    -            1699           76.12                      547520..549337                   accessory                                            Internalin
lmo0305    -            1709           76.57                      329923..330999                   core                                                 L-allo-threonine aldolase
lmo0659    -            1771           79.35                      699410..700306                   accessory                                            Transcriptional regulator
lmo2206    clpB         1791           80.24                      2294555..2297155                 accessory                                            Heat shock proteins
lmo0756    -            1797           80.51                      781896..782801                   core                                                 ABC Transporters
lmo0865    -            1859           83.29                      903837..905510                   core                                                 Amino sugar and nucleotide sugar metabolism
lmo2014    -            1888           84.59                      2088797..2091454                 accessory                                            Glycan biosynthesis and metabolism
lmo1611    -            1904           85.3                       1654902..1655975                 core                                                 Aminopeptidase
lmo0264    inlE         1913           85.71                      286219..287718                   accessory                                            Internalin E
lmo1839    pyrP         1925           86.25                      1916166..1917452                 accessory                                            Electrochemical potential-driven transporters
lmo2179    -            1968           88.17                      2264772..2268230                 accessory                                            Peptidoglycan binding protein
lmo0434    inlB         1981           88.75                      457021..458913                   accessory                                            Internalin B

*Location in core/accessory genome has been determined with respect to the core-genome MLST scheme developed by the Institut Pasteur [25].

Discussion

Our dataset reveals the presence of strong LD in the genome of L. monocytogenes. Among the 2233 genes tested for LD, 2205 genes (approx. 99%) were found to have pairwise LD with a large majority (at least 90%) of the other genes in the genome. High levels of LD can arise not only in highly clonal bacterial populations with low rates of recombination, but may also be temporarily present in bacteria with ‘epidemic’ population structures, in which high recombination rates randomize association between alleles, but adaptive clones emerge and diversify over the short term [3, 5]. Because Listeria has a clonal genetic structure, it is unlikely that this high level of LD can arise except as a consequence of low rates of recombination. This is consistent with studies which report recombination in chromosomal genes as an infrequent event in natural populations of L. monocytogenes [8, 9]. Because the extent of genetic linkage is a useful index of the horizontal transfer occurring within a species and can be presented as direct evidence for recombination [3], the remaining ~1% of genes (Table 2) that were in LD with < 90% of genes can be described as “hot spots” for the gain of horizontally acquired information. The extensive linkage disequilibrium that we describe in L. monocytogenes is in sharp contrast to other pathogenic bacteria that are naturally competent for transformation and recombine frequently to give rise to either weakly clonal or panmictic population structures [35–37].

The L. monocytogenes pan-genome is highly conserved but open to limited acquisition of foreign DNA or genetic variability through evolutionary forces such as mutation, duplication or recombination [14]. Evidence for homologous recombination between closely related strains of L. monocytogenes has been detected by multiple studies; however, non-homologous recombination seems to be rare [12, 13, 38]. Although recombination via conjugation and generalized transduction has been reported in L. monocytogenes [39–41], and most competence-related genes (which facilitate exogenous DNA uptake, e.g., comK, comE, comG) are present in all Listeria genomes [42], natural competence or induced competence under laboratory conditions has not been observed in L. monocytogenes [43, 44]. This lack of competence may partially explain the low levels of gene acquisition from external gene pools. Gene acquisition may also be limited by defense systems against foreign DNA and mobile elements, such as restriction-modification and/or CRISPR systems, both of which have been shown to restrict horizontal gene transfer in other bacterial genera [18].

The frequency of recombination in L. monocytogenes differs considerably in different regions of the genome and between isolates of different lineages [11, 19]. This may arise from differences in selective pressures in the environment and varying degrees of horizontal gene transfer. Several comparative genomic studies report a clustered distribution of accessory genes on the right replichore of the L. monocytogenes genome (approx. 500 Kb in the first 65°), indicating an area of high genome plasticity [14, 19]. In contrast, a study by Orsi et al. failed to find any evidence of spatial clustering in a large number of genes which show evidence for recombination in L. monocytogenes [13]. Further, a recent study described the presence of homologous recombination in nearly 60% of loci in the core genome of L. monocytogenes, although most of this variation was also found to be affected by purifying selection and was thus neutral [25]. This is consistent with results from our analysis which finds linkage equilibrium between only ~1% of gene pairs in the genome. Also, genes considered as potential recombination hot spots (Table 2) in our dataset are found to be scattered in the genome. A large number (~41%) of these “hot spot” genes (lmo0046, lmo2624, lmo2856, lmo1469, lmo2616, lmo1816, lmo0248, lmo1335, lmo2047, lmo2628, lmo2614) encode ribosomal proteins and their related subunits. According to the complexity theory [45], informational genes involved in complex biosystems and maintenance of basal cellular functions are usually conserved, as they might be less likely to be compatible with the systems of other species. Thus, housekeeping genes such as those encoding ribosomal proteins are generally considered to be relatively resistant to horizontal gene transfer. However, several reports suggest horizontal gene transfer of ribosomal proteins in many prokaryotic genomes [46–49]. Two other “hot spot” genes (lmo0865, lmo2014) are involved in carbohydrate and amino acid metabolism and have shown evidence for recombination in a prior study [13], indicating that the rapid diversification of these genes may enable L. monocytogenes to adapt to environments with varying nutrient availabilities. Some of the other genes encode a variety of internalins (lmo0263, lmo0514, lmo0264, lmo0434), transporters (lmo0756, lmo1839), transcriptional regulators (lmo0659), cell surface proteins (lmo2179), other invasion-associated proteins (lmo0582), and proteins involved in response to temperature fluctuations (lmo1364, lmo2206). Internalins are cell surface proteins with known and hypothesized roles in virulence [18, 50]. Evidence of recombination in internalins and these other genes suggests that L. monocytogenes is subjected to sustained selection pressures in the environment, and that it responds to these pressures by continuously regulating its transcriptional machinery and remodeling the cell surface, thereby facilitating adaptation within the host and as a saprophyte.

In conclusion, we have identified the presence of strong linkage disequilibrium in the genome of L. monocytogenes. Parts of the genome showing strong non-random association between genes are highly conserved regions, and are most likely affected by positive selection. The low levels of recombination within the L. monocytogenes genome suggest that the patterns of association observed between genes could be used to recognize newly emerging strains. As new strains are typed, their allelic configurations could be compared to those of previously characterized strains. Novel allelic configurations would indicate a previously unobserved strain and can provide insights into the processes involved in the diversification and evolution of L. monocytogenes. Determining the evolutionary relationships between emergent strains and previously characterized pathogenic strains can help assess the potential of an emergent strain for causing disease. Such investigations can ultimately help to develop better control measures for this pathogenic microbe.

Supporting information

S1 File. Panel of 180 L. monocytogenes isolates collected from different ecological communities.

(XLSX)

S2 File. Whole-genome MLST profiles of the 180 L. monocytogenes isolates.

(XLSX)

S3 File. Whole-genome MLST profiles of 2233 loci retained for AMOVA after filtering out paralogous loci and loci with > 5% of missing data.

(XLSX)

S4 File. Heatmap of LD in the genome of L. monocytogenes.

(XLSX)

S5 File. Percentage of genes at LD with each gene in the genome of L. monocytogenes.

(XLSX)

S1 Fig. Workflow diagram for Haplo-ST.

(PDF)

S2 Fig. Minimum spanning tree of 180 Listeria monocytogenes isolates rooted with reference strain EGD-e.

(PDF)

Acknowledgments

We thank USDA and FSIS for providing us with Listeria monocytogenes whole-genome sequencing samples from river water and effluents of poultry processing plants. The high-performance computing cluster at Georgia Advanced Computing Resource Center (GACRC) at the University of Georgia provided computational infrastructure and technical support throughout the work.

Data Availability

All relevant data are within the manuscript and its Supporting Information files. All data used in this study have been uploaded to GenBank and the Accession numbers have been recorded in S1 File.

Funding Statement

This research was supported by funding from USDA Agricultural Research Service Project Number 6040-32000-009-00-D. The funders helped in data collection and preparation of the manuscript.

References

1. Zwick ME, Thomason MK, Chen PE, Johnson HR, Sozhamannan S, Mateczun A, et al. Genetic variation and linkage disequilibrium in Bacillus anthracis. Sci Rep. 2011;1: 169. doi: 10.1038/srep00169
2. Yahara K, Didelot X, Jolley KA, Kobayashi I, Maiden MC, Sheppard SK, et al. The Landscape of Realized Homologous Recombination in Pathogenic Bacteria. Mol Biol Evol. 2016;33: 456–471. doi: 10.1093/molbev/msv237
3. Feil EJ, Spratt BG. Recombination and the population structures of bacterial pathogens. Annu Rev Microbiol. 2001;55: 561–590. doi: 10.1146/annurev.micro.55.1.561
4. Steiner WW, Smith GR. Natural meiotic recombination hot spots in the Schizosaccharomyces pombe genome successfully predicted from the simple sequence motif M26. Mol Cell Biol. 2005;25: 9054–9062. doi: 10.1128/MCB.25.20.9054-9062.2005
5. Smith JM, Smith NH, O’Rourke M, Spratt BG. How clonal are bacteria? Proc Natl Acad Sci USA. 1993;90: 4384–4388. doi: 10.1073/pnas.90.10.4384
6. Takuno S, Kado T, Sugino RP, Nakhleh L, Innan H. Population Genomics in Bacteria: A Case Study of Staphylococcus aureus. Mol Biol Evol. 2012;29: 797–809. doi: 10.1093/molbev/msr249
7. Vigué L, Eyre-Walker A. The comparative population genetics of Neisseria meningitidis and Neisseria gonorrhoeae. PeerJ. 2019;7: e7216. doi: 10.7717/peerj.7216
8. Ragon M, Wirth T, Hollandt F, Lavenir R, Lecuit M, Le Monnier A, et al. A new perspective on Listeria monocytogenes evolution. PLoS Pathog. 2008;4(9): e1000146. doi: 10.1371/journal.ppat.1000146
9. Piffaretti JC, Kressebuch H, Aeschbacher M, Bille J, Bannerman E, Musser JM, et al. Genetic characterization of clones of the bacterium Listeria monocytogenes causing epidemic disease. Proc Natl Acad Sci USA. 1989;86: 3818–3822. doi: 10.1073/pnas.86.10.3818
10. Wiedmann M, Bruce JL, Keating C, Johnson AE, McDonough PL, Batt CA. Ribotypes and Virulence Gene Polymorphisms Suggest Three Distinct Listeria monocytogenes Lineages With Differences in Pathogenic Potential. Infect Immun. 1997;65: 2707–2716. doi: 10.1128/IAI.65.7.2707-2716.1997
11. den Bakker HC, Didelot X, Fortes ED, Nightingale KK, Wiedmann M. Lineage Specific Recombination Rates and Microevolution in Listeria monocytogenes. BMC Evol Biol. 2008;8: 277. doi: 10.1186/1471-2148-8-277
12. Dunn KA, Bielawski JP, Ward TJ, Urquhart C, Gu H. Reconciling Ecological and Genomic Divergence Among Lineages of Listeria Under an "Extended Mosaic Genome Concept". Mol Biol Evol. 2009;26: 2605–2615. doi: 10.1093/molbev/msp176
13. Orsi RH, Sun Q, Wiedmann M. Genome-wide analyses reveal lineage specific contributions of positive selection and recombination to the evolution of Listeria monocytogenes. BMC Evol Biol. 2008;8: 233. doi: 10.1186/1471-2148-8-233
14. Kuenne C, Billion A, Mraheil MA, Strittmatter A, Daniel R, Goesmann A, et al. Reassessment of the Listeria monocytogenes Pan-Genome Reveals Dynamic Integration Hotspots and Mobile Genetic Elements as Major Components of the Accessory Genome. BMC Genomics. 2013;14: 47. doi: 10.1186/1471-2164-14-47
15. Meinersmann RJ, Phillips RW, Wiedmann M, Berrang ME. Multilocus sequence typing of Listeria monocytogenes by use of hypervariable genes reveals clonal and recombination histories of three lineages. Appl Environ Microbiol. 2004;70: 2193–2203. doi: 10.1128/aem.70.4.2193-2203.2004
16. Nelson KE, Fouts DE, Mongodin EF, Ravel J, DeBoy RT, Kolonay JF, et al. Whole Genome Comparisons of Serotype 4b and 1/2a Strains of the Food-Borne Pathogen Listeria monocytogenes Reveal New Insights Into the Core Genome Components of This Species. Nucleic Acids Res. 2004;32: 2386–2395. doi: 10.1093/nar/gkh562
17. Hain T, Chatterjee SS, Ghai R, Kuenne CT, Billion A, Steinweg C, et al. Pathogenomics of Listeria spp. Int J Med Microbiol. 2007;297: 541–557. doi: 10.1016/j.ijmm.2007.03.016
18. den Bakker HC, Cummings CA, Ferreira V, Vatta P, Orsi RH, Degoricija L, et al. Comparative genomics of the bacterial genus Listeria: Genome evolution is characterized by limited gene acquisition and limited gene loss. BMC Genomics. 2010;11: 688. doi: 10.1186/1471-2164-11-688
19. den Bakker HC, Desjardins CA, Griggs AD, Peters JE, Zeng Q, Young SK, et al. Evolutionary Dynamics of the Accessory Genome of Listeria monocytogenes. PLoS One. 2013;8: e67511. doi: 10.1371/journal.pone.0067511
20. Cantinelli T, Chenal-Francisque V, Diancourt L, Frezal L, Leclercq A, Wirth T, et al. "Epidemic clones" of Listeria monocytogenes are widespread and ancient clonal groups. J Clin Microbiol. 2013;51: 3770–3779. doi: 10.1128/JCM.01874-13
21. Call DR, Borucki MK, Besser TE. Mixed-genome Microarrays Reveal Multiple Serotype and Lineage-Specific Differences Among Strains of Listeria monocytogenes. J Clin Microbiol. 2003;41: 632–639. doi: 10.1128/jcm.41.2.632-639.2003
22. Salcedo C, Arreaza L, Alcalá B, de la Fuente L, Vázquez JA. Development of a multilocus sequence typing method for the analysis of Listeria monocytogenes clones. J Clin Microbiol. 2003;41: 757–762. doi: 10.1128/jcm.41.2.757-762.2003
23. Mallik S. IDENTIFICATION METHODS | Multilocus Enzyme Electrophoresis. In: Batt CA, Tortorello ML, editors. Reference Module in Food Science: Encyclopedia of Food Microbiology (Second Edition); 2014. pp. 336–343.
24. O’Rourke M, Stevens E. Genetic structure of Neisseria gonorrhoeae populations: a non-clonal pathogen. Journal of General Microbiology. 1993;139: 2603–2611. doi: 10.1099/00221287-139-11-2603
25. Moura A, Criscuolo A, Pouseele H, Maury MM, Leclercq A, Tarr C, et al. Whole genome-based population biology and epidemiological surveillance of Listeria monocytogenes. Nat Microbiol. 2016;2: 16185. doi: 10.1038/nmicrobiol.2016.185
26. Louha S, Meinersmann RJ, Abdo Z, Berrang ME, Glenn TC. An open-source program (Haplo-ST) for whole-genome sequence typing shows extensive diversity among Listeria monocytogenes isolates in outdoor environments and poultry processing plants. Appl Environ Microbiol. 2020;87: e02248–20. doi: 10.1128/AEM.02248-20
27. Hannon GJ. 2010. FASTX-Toolkit, FASTQ/A short-reads pre-processing tools. Repository http://hannonlab.cshl.edu/fastx_toolkit
28. Ratan A. Assembly algorithms for next generation sequence data. Ph.D. Dissertation, The Pennsylvania State University. 2009. Available from: https://etda.libraries.psu.edu/files/final_submissions/587
29. Excoffier L, Lischer HEL. Arlequin suite ver 3.5: A new series of programs to perform population genetics analyses under Linux and Windows. Mol Ecol Resour. 2010;10: 564–567. doi: 10.1111/j.1755-0998.2010.02847.x
30. Slatkin M. Linkage disequilibrium in growing and stable populations. Genetics. 1994;137: 331–336.
31. Charlesworth B, Charlesworth D. Elements of Evolutionary Genetics. Greenwood Village, Colorado: Roberts and Company publishers; 2010.
32. Siol M, Jacquin F, Chabert-Martinello M, Smýkal P, Le Paslier MC, Aubert G, et al. Patterns of Genetic Structure and Linkage Disequilibrium in a Large Collection of Pea Germplasm. G3. 2017;7: 2461–2471. doi: 10.1534/g3.117.043471
  • 33.Bahbahani H, Afana A, Wragg D. Genomic signatures of adaptive introgression and environmental adaptation in the Sheko cattle of southwest Ethiopia. PLoS One. 2018;13: e0202479. 10.1371/journal.pone.0202479 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Letunic I, Bork P. Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees. Nucleic Acids Res. 2016; 44: W242–245. 10.1093/nar/gkw290 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Duncan KE, Ferguson N, Kimura K, Zhou X, Istock CA. Fine-scale genetic and phenotypic structure in natural populations of Bacillus subtilis and Bacillus licheniformis: implications for bacterial evolution and speciation. Evolution. 1994;48: 2002–2025. 10.1111/j.1558-5646.1994.tb02229.x [DOI] [PubMed] [Google Scholar]
  • 36.Al Suwayyid BA, Coombs GW, Speers DJ, Pearson J, Wise MJ, Kahler CM. Genomic epidemiology and population structure of Neisseria gonorrhoeae from remote highly endemic Western Australian populations. BMC Genomics. 2018;19: 165. 10.1186/s12864-018-4557-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Suerbaum S, Smith JM, Bapumia K, Morelli G, Smith NH, Kunstmann E, et al. Free recombination within Helicobacter pylori. Proc Natl Acad Sci USA. 1998;95: 12619–12624. 10.1073/pnas.95.21.12619 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Nightingale K, Windham K, Wiedmann M. Evolution and molecular phylogeny of Listeria monocytogenes isolated from human and animal listeriosis cases and foods. J Bacteriol. 2005;187: 5537–5551. 10.1128/JB.187.16.5537-5551.2005 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Flamm RK, Hinrichs DJ, Thomashow MF. Introduction of pAM beta 1 into Listeria monocytogenes by conjugation and homology between native L. monocytogenes plasmids. Infect Immun. 1984;44: 157–161. 10.1128/IAI.44.1.157-161.1984 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Hodgson DA. Generalized transduction of serotype 1/2 and serotype 4b strains of Listeria monocytogenes. Mol. Microbiol. 2000;35: 312–323. 10.1046/j.1365-2958.2000.01643.x [DOI] [PubMed] [Google Scholar]
  • 41.Lebrun M, Loulergue J, Chaslus-Dancla E, Audurier A. Plasmids in Listeria monocytogenes in relation to cadmium resistance. Appl Environ Microbiol. 1992;58: 3183–3186. 10.1128/AEM.58.9.3183-3186.1992 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Buchrieser C. Biodiversity of the species Listeria monocytogenes and the genus Listeria. Microbes Infect. 2007;9: 1147–1155. 10.1016/j.micinf.2007.05.002 [DOI] [PubMed] [Google Scholar]
  • 43.Glaser P, Frangeul L, Buchrieser C, Rusniok C, Amend A, Baquero F, et al. Comparative genomics of Listeria species. Science. 2001;294: 849–852. 10.1126/science.1063447 [DOI] [PubMed] [Google Scholar]
  • 44.Borezee E, Msadek T, Durant L, Berche P. Identification in Listeria monocytogenes of MecA, a homologue of the Bacillus subtilis competence regulatory protein. J Bacteriol. 2000;182: 5931–5934. 10.1128/jb.182.20.5931-5934.2000 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Jain R, Rivera MC, Lake JA. Horizontal gene transfer among genomes: the complexity hypothesis. Proc Natl Acad Sci USA. 1999;96: 3801–3806. 10.1073/pnas.96.7.3801 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Garcia-Vallve S, Simo FX, Montero MA, Arola L, Romeu A. Simultaneous horizontal gene transfer of a gene coding for ribosomal protein l27 and operational genes in Arthrobacter sp. J Mol Evol. 2002;55: 632–637. 10.1007/s00239-002-2358-5 [DOI] [PubMed] [Google Scholar]
  • 47.Makarova KS, Ponomarev VA, Koonin EV. Two C or not two C: recurrent disruption of Zn-ribbons, gene duplication, lineage-specific gene loss, and horizontal gene transfer in evolution of bacterial ribosomal proteins. Genome Biol. 2001;2: RESEARCH 0033. 10.1186/gb-2001-2-9-research0033 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Chen K, Roberts E, Luthey-Schulten Z. Horizontal gene transfer of zinc and non-zinc forms of bacterial ribosomal protein S4. BMC Evol Biol. 2009;9: 179. 10.1186/1471-2148-9-179 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Brochier C, Philippe H, Moreira D. The evolutionary history of ribosomal protein RpS14: horizontal gene transfer at the heart of the ribosome. Trends Genet. 2000;16: 529–533. 10.1016/s0168-9525(00)02142-9 [DOI] [PubMed] [Google Scholar]
  • 50.Tsai YH, Maron SB, McGann P, Nightingale KK, Wiedmann M, Orsi RH. Recombination and positive selection contributed to the evolution of Listeria monocytogenes lineages III and IV, two distinct and well supported uncommon L. monocytogenes lineages. Infect Genet Evol. 2011;11: 1881–90. 10.1016/j.meegid.2011.08.001 [DOI] [PMC free article] [PubMed] [Google Scholar]

Decision Letter 0

Yung-Fu Chang

17 Dec 2020

PONE-D-20-33996

Whole genome genetic variation and linkage disequilibrium in a diverse collection of Listeria monocytogenes isolates

PLOS ONE

Dear Dr. Louha,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Your manuscript has been reviewed by two experts in your field. Based on their comments, a major revision is needed before a decision can be made.

Please submit your revised manuscript within 4 weeks. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Yung-Fu Chang

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that you are reporting an analysis of a microarray, next-generation sequencing, or deep sequencing data set. PLOS requires that authors comply with field-specific standards for preparation, recording, and deposition of data in repositories appropriate to their field. Please upload these data to a stable, public repository (such as ArrayExpress, Gene Expression Omnibus (GEO), DNA Data Bank of Japan (DDBJ), NCBI GenBank, NCBI Sequence Read Archive, or EMBL Nucleotide Sequence Database (ENA)). In your revised cover letter, please provide the relevant accession numbers that may be used to access these data. For a full list of recommended repositories, see http://journals.plos.org/plosone/s/data-availability#loc-omics or http://journals.plos.org/plosone/s/data-availability#loc-sequencing.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The manuscript by Louha et al. presented analysis of wgMLST data from 180 Listeria monocytogenes and evidence of linkage disequilibrium. The study is novel and has not been performed on L. monocytogenes, an important food-borne pathogen. The authors provided a thoughtful introduction and discussion of the utility of LD. Here are additional specific comments:

1. The references are not in the correct citation format within the text. The references should be numbered.

2. Line 82 – It is unclear whether the 180 L. monocytogenes were isolated by the authors, or that whole genome sequences data files were obtained for the analysis. Please clarify. If isolates were sequenced in-house, please provide protocols.

3. The Materials and Methods section should include the following information: a) isolation protocol for the 180 L. monocytogenes strains; b) DNA isolation method, library preparation, and sequencing; c) details of sequencing instrumentations, kit versions, sequencing data analysis software; and d) method of gene annotation and database used.

4. Line 127 – please provide reference for Fst ‘s “history of being used as a measure of the level of differentiation”.

5. Line 148 – please spell out “three”.

6. Line 148 and 152, - please list the SRA accession numbers for the isolates mentioned.

7. Figure 1 – Please add a reference L. monocytogenes (e.g. EGD-e) genome sequence to root the minimum spanning tree as a comparison.

8. Line 158 – Is the Figure 1 title supposed to be within the main text? It is unclear whether lines 158-164 are part of the Figure 1 title and description, or part of the main text.

9. Line 170 – What is the “Secondly” and “Thirdly” on lines 169 and 170, respectively, following? I do not see a “First”, in this paragraph.

10. Line 191 – Please re-word the Table 2 title. A table title should not include “thus”.

11. Table 2 – please re-label the column “Gene name” to “Locus Tag”. Add another column for “Gene Symbol”. The gene symbol is the 3-4 letter abbreviation (e.g. lmo0046 gene symbol is rpsR).

12. Line 200 – please provide reference for the statement “High levels of LD can not only arise in highly clonal bacterial populations with low rates of recombination”.

13. Line 204 – please clarify this statement without using the subjective word “difficult”. In addition, a lack of explanation does not provide convincing causal relationship between high level of LD and low rates of recombination. Please provide direct evidence for the cause.

14. Line 222 – please clarify what “competence related genes” are and provide examples of these genes.

15. Lines 267 – 172 – please explain further the notion of using LD in “identifying emerging strains of L. monocytogenes" and “developing better control measures”. The statement is general and not self-evident. Different DNA sequences in specific genes or LD ‘hot’ or ‘cold’ spots do not determine whether an isolate is a new strain. A comparison of whole genome sequences could identify new strains, and does not require the identification of LD. In addition, how would identification of LD lead to better ”control measures”?

16. Line 406 – S1 File – Please add SRA accession numbers to the isolates from River Water and Poultry Processing Plants

17. Figure 2 – Please provide X and Y axis labels.

Reviewer #2: The authors Louha, et al, of “Whole genome genetic variation and linkage disequilibrium in a diverse collection of Listeria monocytogenes isolates” examine whole genome sequence data from a set of L. monocytogenes from diverse sources to understand linkage disequilibrium and genetic variation in L. monocytogenes. The manuscript confirms existing knowledge in the field that L. monocytogenes has limited exchange of genetic information as a source for genetic variation. Overall the manuscript is well written and adds to our understanding of genetic variation in L. monocytogenes.

Major comments:

a. When listing panel of isolates in materials and methods – only list sources for 60 out of 180 isolates – would recommend listing sources for all isolates. How were the isolates selected that were included in this study if there were more than 20 isolates in the data set originally? Lineage, ST, and CC information should be provided for these isolates, as the authors note, there are differences between lineages and CC in terms of recombination. Also for clonal complex, important to understand if strains are of the same epidemic clone since that is also mentioned in the manuscript.

b. Line 96 – the authors used an unusual choice for assembly, the pipeline YASRA is not peer-reviewed (the reference given was from a dissertation), can the author provide further justification for using this assembler while there are new assemblers available that were built for Illumina data and longer paired end reads than the pipeline used in this manuscript.

c. Line 97 - The authors refer to the wgMLST scheme on the Institute Pasteur website for L. monocytogenes, a core genome MLST scheme is available there that has 1748 loci as part of the scheme, how are alleles for the remaining genes that are part of this manuscript’s wgMLST scheme called? Why are there 2 separate allele calling approaches? Overall this was very confusing and may benefit from a workflow diagram to be included in supplemental figures.

Minor comments:

a. Line 74 – could not find a Moura et al 2017 reference – please update

b. Line 260 – change “pleasures” to “pressures”

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2021 Feb 25;16(2):e0242297. doi: 10.1371/journal.pone.0242297.r002

Author response to Decision Letter 0


29 Jan 2021

Reviewer #1: The manuscript by Louha et al. presented analysis of wgMLST data from 180 Listeria monocytogenes and evidence of linkage disequilibrium. The study is novel and has not been performed on L. monocytogenes, an important food-borne pathogen. The authors provided a thoughtful introduction and discussion of the utility of LD. Here are additional specific comments:

1. The references are not in the correct citation format within the text. The references should be numbered.

>>>We have numbered the references in the correct citation format within the text.

2. Line 82 – It is unclear whether the 180 L. monocytogenes were isolated by the authors, or that whole genome sequences data files were obtained for the analysis. Please clarify. If isolates were sequenced in-house, please provide protocols.

>>> Whole-genome sequencing data for 140 isolates from food, FCS, manure, milk, clinical cases, soil, and RTE products were obtained from the NCBI Pathogen Detection database. For the remaining 40 isolates, from river water (n = 20) and poultry processing plants (n = 20), whole-genome sequencing data were provided to us by USDA and FSIS. These isolates were cultured and sequenced by USDA, and the protocols are given in our recent paper (Louha et al. 2020, DOI: 10.1128/AEM.02248-20, citation no. 26 in manuscript). We have clarified this point and cited our paper for protocols in lines 77-83.

3. The Materials and Methods section should include the following information: a) isolation protocol for the 180 L. monocytogenes strains; b) DNA isolation method, library preparation, and sequencing; c) details of sequencing instrumentations, kit versions, sequencing data analysis software; and d) method of gene annotation and database used.

>>> We obtained whole-genome sequencing data for 140 isolates from the NCBI Pathogen Detection database. The remaining 40 isolates were isolated and sequenced by USDA, and the protocols are described in detail in Louha et al. 2020 (DOI: 10.1128/AEM.02248-20). We have cited this paper (citation no. 26) in the Materials and Methods section, line 83.

Gene annotation, i.e. wgMLST, was performed with the tool Haplo-ST, and the database used is BIGSdb-Lm (bundled with Haplo-ST). This is described in detail in lines 86-94 of the manuscript. We have also cited the paper that introduces and describes Haplo-ST (Louha et al. 2020, DOI: 10.1128/AEM.02248-20) in line 87. The online version of the BIGSdb-Lm database is cited in line 94.

4. Line 127 – please provide reference for Fst ‘s “history of being used as a measure of the level of differentiation”.

>>> Two references (Siol et al. 2017, Bahbahani et al. 2018 [citation no. 32 and 33 in manuscript]) have been provided in line 125 for FST's “history of being used as a measure of the level of differentiation”.

5. Line 148 – please spell out “three”.

>>> Changes have been made in line 145.

6. Line 148 and 152, - please list the SRA accession numbers for the isolates mentioned.

>>> SRA accession numbers for the isolates mentioned in lines 145 and 150-151 have been listed in brackets (see highlighted text in these lines).

7. Figure 1 – Please add a reference L. monocytogenes (e.g. EGD-e) genome sequence to root the minimum spanning tree as a comparison.

>>> When the minimum spanning tree is rooted with EGD-e, its structure changes to some extent. While the majority of the soil and manure isolates still cluster together to form the red branch, the food-related isolates (in the blue branch) cluster in a different pattern. Because of this change in tree structure, we have included the minimum spanning tree rooted with EGD-e in the Supplemental files as Figure S2 (lines 155-156).
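
As a rough illustration of how a minimum spanning tree can be grown from allelic profiles starting at a chosen reference isolate, consider the sketch below. It is not the procedure the authors used to draw Figure 1 or Figure S2: the function names `allelic_distance` and `minimum_spanning_tree`, the isolate labels, and the four-locus toy profiles are all hypothetical, and the distance (number of loci with differing alleles) and Prim's algorithm are assumptions chosen for simplicity.

```python
from typing import Dict, List, Tuple


def allelic_distance(p: List[int], q: List[int]) -> int:
    """Number of loci at which two allelic profiles carry different alleles."""
    return sum(1 for a, b in zip(p, q) if a != b)


def minimum_spanning_tree(
    profiles: Dict[str, List[int]], root: str
) -> List[Tuple[str, str, int]]:
    """Grow a minimum spanning tree from `root` with Prim's algorithm.

    Returns edges as (parent, child, distance). Ties are broken by
    isolate name, which is an arbitrary choice made for this sketch.
    """
    in_tree = {root}
    edges: List[Tuple[str, str, int]] = []
    while len(in_tree) < len(profiles):
        # All edges that connect the current tree to an isolate outside it.
        candidates = [
            (allelic_distance(profiles[u], profiles[v]), u, v)
            for u in in_tree
            for v in profiles
            if v not in in_tree
        ]
        dist, u, v = min(candidates)
        edges.append((u, v, dist))
        in_tree.add(v)
    return edges


# Toy data only: "ref" stands in for a reference genome such as EGD-e.
toy_profiles = {
    "ref": [1, 1, 1, 1],
    "A": [1, 1, 1, 2],
    "B": [1, 1, 2, 2],
    "C": [3, 3, 3, 3],
}
tree = minimum_spanning_tree(toy_profiles, root="ref")
total_weight = sum(d for _, _, d in tree)
print(tree, total_weight)
```

In this toy data, isolate "C" is equally distant from every other isolate, so the edge that attaches it depends on the tie-breaking rule. The total weight of a minimum spanning tree is fixed, but its topology need not be unique when distances tie.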

8. Line 158 – Is the Figure 1 title supposed to be within the main text? It is unclear whether lines 158-164 are part of the Figure 1 title and description, or part of the main text.

>>> The PLOS ONE guidelines for figures state that:

“Figure captions must be inserted in the text of the manuscript, immediately following the paragraph in which the figure is first cited (read order). Do not include captions as part of the figure files themselves or submit them in a separate document.”

Hence, we have inserted the figure captions in the text, immediately following the paragraph in which each figure is first cited. We have left a blank line before and after each figure caption to separate it from the main text. For example, in the marked-up manuscript, lines 158-164 are part of the Figure 1 title and description.

9. Line 170 – What is the “Secondly” and “Thirdly” on lines 169 and 170, respectively, following? I do not see a “First”, in this paragraph.

>>> This paragraph lists three results apparent from the genetic differentiation test that computes pairwise FST values. The “First” result is listed in the first sentence (line 166…169) of this paragraph.

10. Line 191 – Please re-word the Table 2 title. A table title should not include “thus”.

>>> Removed “thus” in the Table 2 title.

11. Table 2 – please re-label the column “Gene name” to “Locus Tag”. Add another column for “Gene Symbol”. The gene symbol is the 3-4 letter abbreviation (e.g. lmo0046 gene symbol is rpsR).

>>> Changes have been made in Table 2 as suggested.

12. Line 200 – please provide reference for the statement “High levels of LD can not only arise in highly clonal bacterial populations with low rates of recombination”.

>>> The complete sentence in lines 199-203 (“High levels of LD can not only arise in highly clonal bacterial populations with low rates of recombination, but may also be temporarily present in bacteria with ‘epidemic’ population structures, in which high recombination rates randomize association between alleles, but adaptive clones emerge and diversify over the short-term.”) has two references: Smith et al. 1993, and Feil and Spratt 2001 (citation no. 3 and 5). We have listed these two references at the end of the sentence in line 203.

Salmonella enterica is a specific example of a species in which “high levels of LD arise in highly clonal bacterial populations with low rates of recombination”, and it is mentioned in both of the cited references.

13. Line 204 – please clarify this statement without using the subjective word “difficult”. In addition, a lack of explanation does not provide convincing causal relationship between high level of LD and low rates of recombination. Please provide direct evidence for the cause.

>>> We have replaced this statement in lines 203-204 with “Because Listeria has a clonal genetic structure, it is unlikely that this high level of LD can arise except as a consequence of low rates of recombination.” Here we are not trying to establish a causal relationship; instead, we are drawing a correlation between high levels of LD and low rates of recombination on the basis of our analysis. Similar correlations between high levels of LD and low rates of recombination have been made for other bacterial species in the literature (Smith et al. 1993, Feil and Spratt 2001 [citation no. 3 and 5]).
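
To make the notion of non-random association between loci concrete, the sketch below computes the textbook linkage disequilibrium coefficient D_ij = p_ij − p_i·q_j for every pair of alleles at two loci. It is an illustration only and is not the significance test the authors ran: the function name `ld_coefficients` and the toy allele IDs are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def ld_coefficients(
    locus_a: List[int], locus_b: List[int]
) -> Dict[Tuple[int, int], float]:
    """Return D_ij = p_ij - p_i * q_j for each allele pair (i at A, j at B).

    `locus_a[k]` and `locus_b[k]` are the allele IDs carried by isolate k.
    D_ij = 0 for every pair means the two loci are in linkage equilibrium.
    """
    n = len(locus_a)
    if n == 0 or n != len(locus_b):
        raise ValueError("both loci need the same non-zero number of isolates")
    freq_a = Counter(locus_a)                # allele counts at locus A
    freq_b = Counter(locus_b)                # allele counts at locus B
    freq_ab = Counter(zip(locus_a, locus_b)) # two-locus genotype counts
    return {
        (a, b): freq_ab[(a, b)] / n - (freq_a[a] / n) * (freq_b[b] / n)
        for a in freq_a
        for b in freq_b
    }


# Toy data: in a strictly clonal sample, allele 1 always travels with allele 7.
clonal = ld_coefficients([1, 1, 2, 2], [7, 7, 8, 8])
# Toy data: alleles at the two loci are shuffled independently of each other.
shuffled = ld_coefficients([1, 1, 2, 2], [7, 8, 7, 8])
print(clonal[(1, 7)], shuffled[(1, 7)])
```

In the clonal toy sample the coefficient for the pair (1, 7) is 0.25, while in the shuffled sample every coefficient is zero, which is the pattern expected when recombination has randomized the associations between alleles.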

14. Line 222 – please clarify what “competence related genes” are and provide examples of these genes.

>>> Many species of bacteria are able to take up genetic material from their surroundings. Occasionally, such absorbed DNA is recombined into the organism’s own genome, resulting in natural transformation. Natural competence for transformation is considered a primary mode of horizontal gene transfer in prokaryotes, together with conjugation (direct cell to cell transfer of DNA via a specialized conjugal pilus) and phage transduction (DNA transfer mediated by viruses). Competence related genes are those that facilitate DNA uptake; they include genes that encode the DNA uptake apparatus, the proteins that protect the incoming DNA within the bacterial cytoplasm, and the proteins that initiate recruitment of the recombination enzyme (Blokesch 2016). The DNA uptake machinery required for natural genetic competence is broadly conserved among species, including noncompetent bacteria. Examples of competence related genes include comK, comG, and comE (Rabinovich et al. 2012).

We have described competence related genes and provided examples in line 219: “(which facilitate exogenous DNA uptake, for e.g. comK, comE, comG etc.)”

References:

1) Blokesch M. Natural competence for transformation. Current Biology. 2016;26: R1119-R1136.

2) Rabinovich L, Sigal N, Borovok I, Nir-Paz R, Herskovits AA. Prophage excision activates Listeria competence genes that promote phagosomal escape and virulence. Cell. 2012;150: 792-802.

15. Lines 267 – 172 – please explain further the notion of using LD in “identifying emerging strains of L. monocytogenes" and “developing better control measures”. The statement is general and not self-evident. Different DNA sequences in specific genes or LD ‘hot’ or ‘cold’ spots do not determine whether an isolate is a new strain. A comparison of whole genome sequences could identify new strains, and does not require the identification of LD. In addition, how would identification of LD lead to better ”control measures”?

>>> We do not agree with “Different DNA sequences in specific genes or LD ‘hot’ or ‘cold’ spots do not determine whether an isolate is a new strain”. Through recombination, emergent strains can acquire novel DNA sequences in their genes, giving rise to previously unobserved alleles (with no curated allele IDs in databases). Such newly arising strains would therefore have novel allelic configurations (wgMLST profiles). Comparison of whole genome sequences can identify new strains, but it is more cumbersome than comparison of allelic profiles, which is the method currently popular among government laboratories for identifying new strains (Jagadeesan et al. 2019).

Determination of LD in the genome helps recognize genes that are non-randomly associated (and thus less prone to recombination) and genes in which recombination is more likely. While recombination in the accessory genome (highly prone to horizontal gene transfer) of bacteria is an important source of evolutionary novelty, adaptation in highly conserved core genes is critical for long-term survival and short-term response to new selection pressures such as resistance to antibiotics (Everitt et al. 2014). As new strains are typed, their allelic configurations can be compared against those of previously characterized strains. Novel allelic configurations (in both LD hot and cold spots) would indicate a previously unobserved emergent strain and can provide insights into the evolutionary changes and selection pressures in the environment that gave rise to it. Determining the evolutionary relationship between emergent strains and known pathogenic strains can help assess the potential of a new emergent strain to cause disease. Thus, knowledge of the pathogenic potential of a newly arising strain, and of the selection pressures that led to its emergence, can help in developing better control measures for this pathogen. We have added this information in lines 263-268.
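
The comparison of a newly typed strain's allelic configuration against previously characterized strains can be sketched as follows. This is a minimal illustration rather than the authors' pipeline or the Haplo-ST interface: the function names `novel_alleles` and `is_novel_configuration`, the dictionary layout, and the allele IDs are hypothetical toy values.

```python
from typing import Dict, List

# A profile maps a locus tag to an allele ID (toy values, not curated IDs).
Profile = Dict[str, int]


def novel_alleles(new: Profile, known: List[Profile]) -> List[str]:
    """Loci where the new strain carries an allele seen in no known profile."""
    novel = []
    for locus, allele in new.items():
        seen = {p[locus] for p in known if locus in p}
        if allele not in seen:
            novel.append(locus)
    return sorted(novel)


def is_novel_configuration(new: Profile, known: List[Profile]) -> bool:
    """True if the full allelic profile matches none of the known profiles."""
    return all(new != p for p in known)


known_profiles = [
    {"lmo0001": 1, "lmo0002": 4, "lmo0046": 2},
    {"lmo0001": 1, "lmo0002": 5, "lmo0046": 2},
]
new_profile = {"lmo0001": 1, "lmo0002": 4, "lmo0046": 9}

print(novel_alleles(new_profile, known_profiles))
print(is_novel_configuration(new_profile, known_profiles))
```

A profile can be a novel configuration without containing any novel allele, for example when recombination brings together alleles that were previously seen only in different strains, which is why the sketch reports the two checks separately.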

References:

1) Jagadeesan B, Baert L, Wiedmann M, Orsi RH. Comparative Analysis of Tools and Approaches for Source Tracking Listeria monocytogenes in a Food Facility Using Whole-Genome Sequence Data. Front Microbiol. 2019;10: 947.

2) Everitt RG, Didelot X, Batty EM, Miller RR, Knox K, Young BC et al. Mobile elements drive recombination hotspots in the core genome of Staphylococcus aureus. Nat Commun. 2014;5: 3956.

16. Line 406 – S1 File – Please add SRA accession numbers to the isolates from River Water and Poultry Processing Plants.

>>>SRA accession numbers for River Water and Poultry processing plant isolates have been added to File S1.

17. Figure 2 – Please provide X and Y axis labels.

>>> X and Y axis labels have been added to Figure 2.

Reviewer #2: The authors Louha, et al, of “Whole genome genetic variation and linkage disequilibrium in a diverse collection of Listeria monocytogenes isolates” examine whole genome sequence data from a set of L. monocytogenes from diverse sources to understand linkage disequilibrium and genetic variation in L. monocytogenes. The manuscript confirms existing knowledge in the field that L. monocytogenes has limited exchange of genetic information as a source for genetic variation. Overall the manuscript is well written and adds to our understanding of genetic variation in L. monocytogenes.

Major comments:

a. When listing the panel of isolates in Materials and Methods, the authors list sources for only 60 out of 180 isolates; I would recommend listing sources for all isolates. How were the isolates included in this study selected if there were more than 20 isolates in the data set originally? Lineage, ST, and CC information should be provided for these isolates since, as the authors note, there are differences between lineages and CCs in terms of recombination. For clonal complexes, it is also important to understand whether strains belong to the same epidemic clone, since that is also mentioned in the manuscript.

>>> Our dataset contains a total of 180 isolates. This consists of 20 isolates each from the following 9 sources: food, food contact surfaces (FCS), manure, milk, clinical cases, soil, ready-to-eat (RTE) products, river water, and poultry processing plants (20*9=180). Whole genome sequencing data for the 140 isolates from food, FCS, manure, milk, clinical cases, soil, and RTE products were obtained from the NCBI Pathogen Detection database. Whole genome sequencing data for the remaining 40 isolates, from river water and poultry processing plants, were provided to us by USDA and FSIS. We have clarified this information in lines 77-83.

We have provided isolation sources for all 180 isolates in File S1. We have also provided the ST, CC, and lineage of all isolates in File S1. Some isolates are new recombinant strains that have not been assigned an ST, CC, or lineage in the BIGSdb-Lm database, and hence this information has not been provided for these isolates.

b. Line 96 – The authors used an unusual choice for assembly: the pipeline YASRA is not peer-reviewed (the reference given was from a dissertation). Can the authors provide further justification for using this assembler when there are newer assemblers available that were built for Illumina data and longer paired-end reads than the pipeline used in this manuscript?

>>>We used YASRA for assembly because YASRA is a comparative assembler which uses a template to guide the assembly of a closely related target sequence and can accommodate high rates of polymorphism between the template and target. Hence, this assembler can be used to assemble an allelic variant of a gene by mapping to a reference sequence, even when the target allele has diverged considerably from the reference gene sequence. We did test other newer assemblers built for Illumina data, but these assemblers failed to assemble alleles that were highly divergent from the reference template. We used YASRA as an assembler within the tool Haplo-ST, which was used for assembly and allele calling of our isolates. Although YASRA has not been peer-reviewed, Haplo-ST has been published recently (Louha et al. 2020, DOI: 10.1128/AEM.02248-20, citation no. 26 in manuscript).

c. Line 97 – The authors refer to the wgMLST scheme on the Institut Pasteur website for L. monocytogenes. A core genome MLST scheme with 1748 loci is available there; how are alleles called for the remaining genes that are part of this manuscript’s wgMLST scheme? Why are there 2 separate allele calling approaches? Overall this was very confusing, and the manuscript may benefit from a workflow diagram included in the supplemental figures.

>>>We have used only one approach (the tool Haplo-ST) to assemble and call alleles for L. monocytogenes isolates. We mention this first in the Introduction, lines 68-73, and have modified the Materials and Methods, lines 86-94, to help resolve the confusion.

Haplo-ST first cleans raw whole-genome sequencing reads using the FASTX-Toolkit and then assembles alleles using YASRA. The assembled alleles are then assigned IDs according to the nomenclature of the Institut Pasteur Listeria monocytogenes database using BIGSdb (the tool used for calling alleles on the Institut Pasteur website). The pipeline has been automated with scripts and made portable by installing all software dependencies (i.e., FASTX, YASRA, and BIGSdb-Lm) within a local Linux virtual machine. A workflow diagram for Haplo-ST is included in the paper that describes it (Louha et al. 2020, DOI: 10.1128/AEM.02248-20). We have cited this paper in line 94 (citation no. 26) and also included the workflow diagram for Haplo-ST in Supplemental Fig S1.

We have used a wgMLST scheme consisting of 2554 genes for allele calling (mentioned in line 94). This scheme includes the 1748 loci in the core genome MLST scheme on the Institut Pasteur website as well as other genes present in the Institut Pasteur database. This is described in detail in the paper cited for the tool Haplo-ST.
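As a rough illustration of the final step described in this response (assigning IDs to assembled alleles for every locus in the scheme), the toy sketch below looks up each assembled sequence among the known alleles of its locus. It assumes exact sequence matching and is not the actual Haplo-ST or BIGSdb implementation; `call_profile`, the locus names, the sequences, and the "novel" and "missing" labels are all hypothetical.

```python
from typing import Dict, Union

# An allele call is either a known allele ID or a label for an unresolved locus.
AlleleCall = Union[int, str]


def call_profile(
    assembled: Dict[str, str],
    scheme: Dict[str, Dict[str, int]],
) -> Dict[str, AlleleCall]:
    """Assign an allele ID to each locus in the scheme.

    assembled: locus name -> assembled allele sequence for one isolate
    scheme:    locus name -> {known allele sequence -> allele ID}
    """
    profile: Dict[str, AlleleCall] = {}
    for locus, known_alleles in scheme.items():
        sequence = assembled.get(locus)
        if not sequence:
            # No allele was assembled for this locus.
            profile[locus] = "missing"
        else:
            # Exact-match lookup; unmatched sequences are flagged as novel.
            profile[locus] = known_alleles.get(sequence.upper(), "novel")
    return profile


# Toy scheme with placeholder loci and sequences (not real database entries).
toy_scheme = {
    "locus_1": {"ATGAAA": 1, "ATGAAG": 2},
    "locus_2": {"ATGCCC": 1},
    "locus_3": {"ATGTTT": 1},
}
toy_assembly = {"locus_1": "atgaag", "locus_2": "ATGCCA"}
toy_profile = call_profile(toy_assembly, toy_scheme)
```

Keeping "missing" distinct from "novel" matters downstream, since loci with missing data may be filtered out while novel alleles carry information about new variants.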

Minor comments:

a. Line 74 – could not find a Moura et al 2017 reference – please update

>>> Thanks for pointing this out. We have changed the reference from ‘Moura et al 2017’ to ‘Moura et al 2016’, in line 68, citation no. 25.

b. Line 260 – change “pleasures” to “pressures”

>>> Necessary changes have been made in line 255.

Attachment

Submitted filename: Response_to_Reviewers.pdf

Decision Letter 1

Yung-Fu Chang

12 Feb 2021

Whole genome genetic variation and linkage disequilibrium in a diverse collection of Listeria monocytogenes isolates

PONE-D-20-33996R1

Dear Dr. Louha,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Yung-Fu Chang

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: (No Response)

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

Acceptance letter

Yung-Fu Chang

17 Feb 2021

PONE-D-20-33996R1

Whole genome genetic variation and linkage disequilibrium in a diverse collection of Listeria monocytogenes isolates

Dear Dr. Louha:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Yung-Fu Chang

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 File. Panel of 180 L. monocytogenes isolates collected from different ecological communities.

    (XLSX)

    S2 File. Whole-genome MLST profiles of the 180 L. monocytogenes isolates.

    (XLSX)

    S3 File. Whole-genome MLST profiles of 2233 loci retained for AMOVA after filtering out paralogous loci and loci with > 5% of missing data.

    (XLSX)

    S4 File. Heatmap of LD in the genome of L. monocytogenes.

    (XLSX)

    S5 File. Percentage of genes at LD with each gene in the genome of L. monocytogenes.

    (XLSX)

    S1 Fig. Workflow diagram for Haplo-ST.

    (PDF)

    S2 Fig. Minimum spanning tree of 180 Listeria monocytogenes isolates rooted with reference strain EGD-e.

    (PDF)

    Data Availability Statement

    All relevant data are within the manuscript and its Supporting Information files. All data used in this study have been uploaded to GenBank and the Accession numbers have been recorded in S1 File.


    Articles from PLoS ONE are provided here courtesy of PLOS
