Journal of Clinical Microbiology. 2015 Aug 18;53(9):2869–2876. doi: 10.1128/JCM.01193-15

Defining and Evaluating a Core Genome Multilocus Sequence Typing Scheme for Whole-Genome Sequence-Based Typing of Listeria monocytogenes

Werner Ruppitsch a, Ariane Pietzka a, Karola Prior b, Stefan Bletz c, Haizpea Lasa Fernandez a, Franz Allerberger a, Dag Harmsen b, Alexander Mellmann c
Editor: D J Diekema
PMCID: PMC4540939  PMID: 26135865

Abstract

Whole-genome sequencing (WGS) has emerged as the ultimate typing tool to characterize Listeria monocytogenes outbreaks. However, data analysis and interlaboratory comparability of WGS data are still challenging for most public health laboratories. Therefore, we have developed and evaluated a new L. monocytogenes typing scheme based on genome-wide gene-by-gene comparisons (core genome multilocus sequence typing [cgMLST]) to allow for a unique typing nomenclature. Initially, we determined the breadth of the L. monocytogenes population based on MLST data with a Bayesian approach. Based on the genome sequence data of representative isolates for the whole population, cgMLST target genes were defined and reappraised with 67 L. monocytogenes isolates from two outbreaks and serotype reference strains. The Bayesian population analysis generated five L. monocytogenes groups. Using all available NCBI RefSeq genomes (n = 36) and six additionally sequenced strains, all genetic groups were covered. Pairwise comparisons of these 42 genome sequences resulted in 1,701 cgMLST targets present in all 42 genomes with 100% overlap and ≥90% sequence similarity. Overall, ≥99.1% of the cgMLST targets were present in 67 outbreak and serotype reference strains, underlining the representativeness of the cgMLST scheme. Moreover, cgMLST enabled clustering of outbreak isolates with ≤10 alleles difference and unambiguous separation from unrelated outgroup isolates. In conclusion, the novel cgMLST scheme not only improves outbreak investigations but also enables, due to the availability of the automatically curated cgMLST nomenclature, interlaboratory exchange of data, which is crucial especially for rapid responses during transsectorial outbreaks.

INTRODUCTION

Listeria monocytogenes is a Gram-positive, facultatively anaerobic, psychrophilic, salt-tolerant, facultative intracellular pathogen of humans and animals that causes clinical manifestations such as gastroenteritis, encephalitis, meningitis, and septicemia. A high hospitalization rate of >90% and a case-fatality rate of up to 30% make L. monocytogenes an important human pathogen (1). The characteristic traits of L. monocytogenes (growth at low temperatures, survival of freezing and of high-salt and nitrite preservation methods, and biofilm formation) represent a major issue for industrialized food production and facilitate food contamination at several stages of food production (2). Nearly all cases of listeriosis are caused by consumption or use of contaminated food or feed.

The traditional L. monocytogenes serotyping scheme allows the differentiation of 12 serotypes of which 4b, 1/2a, and 1/2b isolates cause about 96% of all reported human listeriosis cases (3). Low discriminatory power, insufficient reproducibility, and antigen sharing between serotypes impede the value of serotyping in outbreak investigations and necessitate more accurate and more discriminatory typing solutions (4).

Pulsed-field gel electrophoresis (PFGE) has been established as the current “gold standard” for L. monocytogenes typing by PulseNet (5, 6) and has been essential for outbreak investigation worldwide (7). However, PFGE is time-consuming, expensive, and difficult to standardize (8, 9). Methods based on DNA sequence analysis appear more promising for fast, accurate, and reproducible strain typing (10). Whereas multilocus sequence typing (MLST) (11, 12) and multi-virulence-locus sequence typing (MVLST) (13) schemes for L. monocytogenes share the characteristics of sequence-based methods, they both lack the discriminative power needed for outbreak investigation of this clonal pathogen (7, 14).

Nowadays, the recent and ongoing evolution of sequencing technologies from Sanger sequencing to next-generation sequencing enables sequence analysis on a whole-genome level. Several studies on a variety of bacterial species have already shown that whole-genome sequence (WGS)-based typing, based either on single nucleotide variants (SNVs) (15, 16) or on gene-by-gene allelic profiling of core genome genes, frequently named core genome MLST (cgMLST) or MLST+ (17, 18), currently represents the ultimate diagnostic tool for strain typing. Recently, we successfully applied a cgMLST typing approach to L. monocytogenes (19). Nevertheless, the broad use of WGS-based approaches is still hampered by the lack of standardized nomenclature that would facilitate global exchange of data, as has already been the reality for classical MLST data (20) for more than a decade.

To achieve a stable cgMLST scheme for L. monocytogenes that can form the basis of a standardized nomenclature for WGS-based L. monocytogenes typing, first we defined an L. monocytogenes core genome gene set representing the genetic diversity within the L. monocytogenes population based on well-characterized reference strains, and second we challenged this scheme for suitability in outbreak investigations using isolates from two outbreaks and sporadic cases.

MATERIALS AND METHODS

Microorganisms and DNA extraction.

All strains and genome sequences used for the development of the novel cgMLST L. monocytogenes scheme are listed in Table 1. For subsequent evaluation of the scheme, a total of 67 L. monocytogenes isolates were used (Table 2): isolates from two outbreaks (n = 42) (21–23), isolates from sporadic cases (n = 8, which also served as outgroups for the outbreaks, with matching serotypes and highly similar or even identical PFGE patterns), and reference strains for all serotypes (n = 17). All strains were cultured overnight at 37°C on RAPID'L.Mono agar (Bio-Rad, Vienna, Austria) for species confirmation and subcultivated on Columbia blood agar plates (bioMérieux, Marcy l'Étoile, France) prior to DNA extraction using the GenElute Bacterial Genomic DNA kit (Sigma, St. Louis, MO, USA) according to the manufacturer's instructions.

TABLE 1.

List of L. monocytogenes strains and genomes used for SeqSphere cgMLST L. monocytogenes target definition

Strain MLST ST(a) Lineage(b) BAPS partition Serogroup Average coverage (no. of contigs) NCBI RefSeq or ENA SRA accession number(s)
EGD-e (reference genome) 35 II Lm02 1/2a NA(c) NC_003210
07PF0776 4 I Lm01 4b NA NC_017728
08-5578 292 II Lm02 1/2a NA NC_013766
08-5923 120 II Lm02 1/2a NA NC_013768
10403S 85 II Lm02 1/2a NA NC_017544
ATCC 19117 2 I Lm01 4d NA NC_018584
C1-387 155 II Lm02 1/2a NA NC_021823
Clip81459 4 I Lm01 4b NA NC_012488
F2365 1 I Lm01 4b NA NC_002973
Finland 1998 155 II Lm02 3a NA NC_017547
FSL R2-561 9 II Lm02 1/2c NA NC_017546
HCC23 201 III Lm04 4a NA NC_011660
J0161 11 II Lm02 1/2a NA NC_017545
J1-220 2 I Lm01 4b NA NC_021830
J1776 6 I Lm01 4b NA NC_021839
J1816 6 I Lm01 4b NA NC_021829
J1817 6 I Lm01 4b NA NC_021827
J1926 6 I Lm01 4b NA NC_021840
J2-031 394 II Lm02 1/2c NA NC_021837
J2-064 5 I Lm01 1/2b NA NC_021824
J2-1091 1 I Lm01 1/2a NA NC_021825
L312 4 I Lm01 4b NA NC_018642
L99 201 III Lm04 4a NA NC_017529
LL195 1 I Lm01 4b NA NC_019556
M7 201 III Lm04 4a NA NC_017537
N1-011A 3 I Lm01 1/2b NA NC_021826
R2-502 3 I Lm01 1/2b NA NC_021838
SLCC0717 518 III Lm03 1/2a 163 (21) ERR664778
SLCC0759 481 III Lm03 1/2a 156 (23) ERR664779
SLCC1042 18 III Lm03 1/2a 124 (20) ERR664780
SLCC2372 122 II Lm02 1/2c NA NC_018588
SLCC2376 71 III Lm04 4c NA NC_018590
SLCC2378 73 I Lm01 4e NA NC_018585
SLCC2479 9 II Lm02 3c NA NC_018589
SLCC2482 3 I Lm01 7 NA NC_018591
SLCC2540 617 I Lm01 3b NA NC_018586
SLCC2755 66 I Lm01 1/2b NA NC_018587
SLCC3287 427 III Lm03 1/2a 132 (18) ERR664782
SLCC4771 467 IV Lm07 4c 162 (25) ERR664786, ERR664787
SLCC5850 12 II Lm02 1/2a NA NC_018592
SLCC6263 466 III Lm03 1/2a 180 (16) ERR664785
SLCC7179 91 II Lm02 3a NA NC_018593
(b) Lineage designation in accordance with Haase et al. (7).

(c) NA, not applicable.

TABLE 2.

List of L. monocytogenes isolates used for evaluation of the SeqSphere cgMLST L. monocytogenes scheme(a)

Sample identification Country of isolation Origin Collection year Serotype MLST ST(b) Lineage(c) BAPS partition % good cgMLST targets Coverage (no. of contigs) ENA accession no. Reference(s) or study Comment(d)
L3308 Austria Human 2008 4b 1 Lineage I Lm01 99.4 180 (25) ERR664375 21 JPO
L3808 Austria Human 2008 4b 1 Lineage I Lm01 99.4 164 (29) ERR664376 21 JPO
L3908 Austria Human 2008 4b 1 Lineage I Lm01 99.3 139 (31) ERR664377 21 JPO
L4008 Austria Human 2008 4b 1 Lineage I Lm01 99.4 133 (29) ERR664378 21 JPO
L4508 Austria Human 2008 4b 1 Lineage I Lm01 99.4 174 (30) ERR664379 21 JPO
L6708 Austria Human 2008 4b 1 Lineage I Lm01 99.4 180 (34) ERR664394, ERR664395 21 JPO
L6808 Austria Human 2008 4b 1 Lineage I Lm01 99.4 160 (29) ERR664380 21 JPO
W9508 Austria Food 2008 4b 1 Lineage I Lm01 99.4 180 (24) ERR664382 21 JPO
W9708 Austria Food 2008 4b 1 Lineage I Lm01 99.4 180 (27) ERR664384 21 JPO
L2708 Austria Human 2008 4b 249 Lineage I Lm01 99.4 151 (33) ERR664374 21 Outgroup of JPO
L7508 Austria Human 2008 4b 4 Lineage I Lm01 99.5 174 (22) ERR664381 21 Outgroup of JPO
3230TP5 Austria Food 2010 1/2a 403 Lineage II Lm02 99.8 106 (26) ERS482542 22, 23 ACCO I
L20-09 Austria Human 2009 1/2a 403 Lineage II Lm02 99.9 120 (23) ERS482565 22, 23 ACCO I
L21-09 Austria Human 2009 1/2a 403 Lineage II Lm02 99.8 120 (45) ERS482567 22, 23 ACCO I
L23-09 Austria Human 2009 1/2a 403 Lineage II Lm02 99.8 120 (25) ERS482568 22, 23 ACCO I
L27-09 Austria Human 2009 1/2a 777 Lineage II Lm02 99.8 117 (24) ERS482569 22, 23 ACCO I
L29-09 Austria Human 2009 1/2a 403 Lineage II Lm02 99.8 120 (46) ERS482570 22, 23 ACCO I
L31-09 Austria Human 2009 1/2a 777 Lineage II Lm02 99.7 120 (56) ERS482572 22, 23 ACCO I
L32-09 Austria Human 2009 1/2a 777 Lineage II Lm02 99.5 64 (76) ERS482573 22, 23 ACCO I
L33-09 Austria Human 2009 1/2a 403 Lineage II Lm02 99.9 120 (21) ERS482575 22, 23 ACCO I
L34-09 Austria Human 2009 1/2a 403 Lineage II Lm02 99.8 113 (36) ERS482577 22, 23 ACCO I
L35-09 Austria Human 2009 1/2a 777 Lineage II Lm02 99.9 98 (23) ERS482578 22, 23 ACCO I
L68-09 Austria Human 2009 1/2a 777 Lineage II Lm02 99.8 113 (35) ERS482582 22, 23 ACCO I
L71-09 Austria Human 2009 1/2a 403 Lineage II Lm02 99.9 120 (22) ERS482583 22, 23 ACCO I
L9-10 Austria Human 2010 1/2a 403 Lineage II Lm02 99.9 120 (19) ERS482585 22, 23 ACCO I
LD27-12 Germany Human 2012 1/2a 403 Lineage II Lm02 99.7 68 (49) ERS482587 This study Outgroup of ACCO I
MRL-13-00230 Germany Food 2013 1/2a 403 Lineage II Lm02 99.8 120 (34) ERS482588 This study Outgroup of ACCO I
Ro-015 Unknown Unknown 2010 1/2a 403 Lineage II Lm02 99.8 120 (22) ERS482589 This study Outgroup of ACCO I
16132 Austria Food 2009 1/2a 398 Lineage II Lm02 99.8 136 (17) ERS482539 This study ACCO II
2010-00770 Austria Food 2010 1/2a 398 Lineage II Lm02 99.8 120 (20) ERS482540 22, 23 ACCO II
3230TP3 Austria Food 2010 1/2a 398 Lineage II Lm02 99.8 146 (17) ERS482541 22, 23 ACCO II
4548TP4 Austria Food 2010 1/2a 398 Lineage II Lm02 99.8 160 (20) ERS482543 22, 23 ACCO II
K70-10 Unknown Food 2010 1/2a 398 Lineage II Lm02 99.8 120 (21) ERS482558 22, 23 ACCO II
L10-10 Austria Human 2010 1/2a 398 Lineage II Lm02 99.8 120 (16) ERS482559 22, 23 ACCO II
L14-10 Austria Human 2010 1/2a 398 Lineage II Lm02 99.8 113 (19) ERS482560 22, 23 ACCO II
L16-10 Austria Human 2010 1/2a 398 Lineage II Lm02 99.8 120 (19) ERS482561 22, 23 ACCO II
L17-10 Austria Human 2010 1/2a 398 Lineage II Lm02 99.8 120 (21) ERS482562 22, 23 ACCO II
L18-10 Austria Human 2010 1/2a 398 Lineage II Lm02 99.8 120 (20) ERS482563 22, 23 ACCO II
L19-10 Austria Human 2010 1/2a 398 Lineage II Lm02 99.8 120 (20) ERS482564 22, 23 ACCO II
L20-10 Austria Human 2010 1/2a 398 Lineage II Lm02 99.7 120 (19) ERS482566 22, 23 ACCO II
L30-10 Austria Human 2010 1/2a 398 Lineage II Lm02 99.8 120 (18) ERS482571 22, 23 ACCO II
L32-10 Austria Human 2010 1/2a 398 Lineage II Lm02 99.5 120 (18) ERS482574 22, 23 ACCO II
L33-10 Austria Human 2010 1/2a 398 Lineage II Lm02 99.8 120 (18) ERS482576 22, 23 ACCO II
L4-10 Austria Human 2009 1/2a 398 Lineage II Lm02 99.8 120 (17) ERS482580 22, 23 ACCO II
L42-10 Austria Human 2010 1/2a 398 Lineage II Lm02 99.8 120 (20) ERS482581 22, 23 ACCO II
L75-09 Austria Human 2009 1/2a 398 Lineage II Lm02 99.7 120 (26) ERS482584 22, 23 ACCO II
LD12-10 Germany Human 2010 1/2a 398 Lineage II Lm02 99.8 120 (18) ERS482586 22, 23 ACCO II
12025641 Austria Food 2012 1/2a 398 Lineage II Lm02 99.8 152 (18) ERS482537 This study Outgroup of ACCO II
12025647 Austria Food 2012 1/2a 398 Lineage II Lm02 99.8 142 (17) ERS482538 This study Outgroup of ACCO II
L38-11 Austria Human 2012 1/2a 398 Lineage II Lm02 99.7 113 (17) ERS482579 This study Outgroup of ACCO II
ATCC15313 United Kingdom Animal Unknown 1/2a 107 Lineage II Lm02 99.5 167 (15) ERS482544 This study Reference strain
CIP104794 United Kingdom Animal 1924 1/2a 12 Lineage II Lm02 99.4 150 (16) ERS482545 This study Reference strain
CIP105448 United Kingdom Human 1935 1/2c 122 Lineage II Lm02 99.8 112 (22) ERS482546 This study Reference strain
CIP105449 Unknown Animal 1967 1/2b 66 Lineage I Lm01 99.4 180 (22) ERS482547 This study Reference strain
CIP105457 New Zealand Animal 1931 4a 202 Lineage III Lm04 99.1 100 (29) ERS482548 This study Reference strain
CIP105458 USA Food 1971 4d 2 Lineage I Lm01 99.5 119 (27) ERS482549 This study Reference strain
CIP105459 USA Food 1959 4e 73 Lineage I Lm01 99.2 101 (28) ERS482550 This study Reference strain
CIP59-53 Germany Human 1953 4b 145 Lineage I Lm01 99.5 90 (29) ERS482551 This study Reference strain
CIP78-34 Denmark Human 1937 3a 98 Lineage II Lm02 99.4 120 (17) ERS482552 This study Reference strain
CIP78-35 USA Human 1956 3b 617 Lineage I Lm01 99.5 120 (28) ERS482553 This study Reference strain
CIP78-36 Unknown Unknown 1966 3c 9 Lineage II Lm01 99.9 112 (29) ERS482554 This study Reference strain
CIP78-39 United Kingdom Food Unknown 4c 71 Lineage III Lm04 99.4 120 (12) ERS482555 This study Reference strain
CIP78-43 Unknown Human 1966 7 3 Lineage I Lm01 99.5 97 (28) ERS482556 This study Reference strain
SLCC3280 Unknown Unknown Unknown 1/2a 18 Lineage III Lm03 99.6 117 (23) ERR664781 This study Reference strain
SLCC3961 Unknown Unknown Unknown 1/2a 18 Lineage III Lm03 99.7 141 (18) ERR664783 This study Reference strain
SLCC4163 Unknown Unknown Unknown 1/2a 18 Lineage III Lm03 99.8 159 (27) ERR664784 This study Reference strain
W9608 Austria Food 2008 1/2b 5 Lineage I Lm01 99.6 178 (43) ERR664383 This study Reference strain
(a) Epidemiological data with results of classical typing approaches and the percentage of good cgMLST targets (of all 1,701 cgMLST targets) are given; naming of the cgMLST targets is in accordance with L. monocytogenes reference strain EGD-e locus tags (GenBank accession number NC_003210).

(c) Lineage designation in accordance with Haase et al. (7).

(d) JPO, jellied pork outbreak; ACCO I, acid curd cheese outbreak clone I; ACCO II, acid curd cheese outbreak clone II.

Whole-genome sequencing and assembly.

Sequencing libraries were prepared using Nextera XT chemistry (Illumina Inc., San Diego, CA, USA) for a 250-bp paired-end sequencing run on an Illumina MiSeq sequencer. Samples were sequenced to aim for minimum coverage of 100-fold using Illumina's recommended standard protocols. The resulting FASTQ files were first quality trimmed and then de novo assembled using the Velvet assembler (24) integrated in Ridom SeqSphere+ software (25) (version 2.3; Ridom GmbH, Münster, Germany). Here, reads were trimmed at their 5′ and 3′ ends until an average base quality of 30 was reached in a window of 20 bases, and the assembly was performed with Velvet version 1.1.04 using optimized k-mer size and coverage cutoff values based on the average length of contigs with >1,000 bp.
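The end-trimming rule described above can be illustrated with a minimal sketch. It assumes Phred quality scores are already available as a list of integers per read; the function name `trim_read` and the choice to discard reads in which no 20-base window reaches the threshold are illustrative assumptions, not the documented behavior of SeqSphere+.

```python
def trim_read(quals, window=20, min_avg=30):
    """Return (start, end) slice bounds of the part of a read that is kept.

    Bases are removed from the 5' end until the leading window of `window`
    bases has a mean quality >= min_avg, then likewise from the 3' end.
    Returns (0, 0) when no window passes (assumption: such reads are dropped).
    """
    n = len(quals)
    start = 0
    # Advance the 5' boundary while the leading window is below threshold.
    while start + window <= n and sum(quals[start:start + window]) / window < min_avg:
        start += 1
    end = n
    # Pull back the 3' boundary while the trailing window is below threshold.
    while end - window >= start and sum(quals[end - window:end]) / window < min_avg:
        end -= 1
    if end - start < window:
        return (0, 0)
    return (start, end)


# Toy read: low-quality ends around a high-quality core.
example_quals = [10] * 5 + [35] * 30 + [10] * 5
kept = trim_read(example_quals)
```

Because the criterion is a window mean rather than a per-base cutoff, a few low-quality bases can remain at each end as long as the surrounding window still averages 30 or more.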

BAPS.

To determine the overall L. monocytogenes species variation, we applied a Bayesian analysis of population structure (BAPS) (26, 27). All multilocus sequence typing (MLST) data available as of 24 July 2014 (673 sequence types [STs]) were downloaded from the MLST website (14), and all allelic gene sequences per locus were multiple aligned using MUSCLE (28) and finally concatenated for each ST. The BAPS was carried out using the clustering of linked molecular data functionality. Ten runs were performed, setting an upper limit of 20 partitions. Admixture analysis was performed using the following parameters: minimum population size considered, 5; iterations, 50; number of reference individuals simulated from each population, 50; and number of iterations for each reference individual, 10.

cgMLST target gene definition.

To determine the cgMLST gene set (named MLST+ in the SeqSphere+ software), a genome-wide gene-by-gene comparison was performed using the MLST+ target definer (version 1.1) function of SeqSphere+ with default parameters. These parameters comprise the following filters to exclude certain genes of the EGD-e reference genome (GenBank accession number NC_003210.1, dated 26 March 2015) from the cgMLST scheme: a minimum length filter that discards all genes shorter than 50 bp; a start codon filter that discards all genes that contain no start codon at the beginning of the gene; a stop codon filter that discards all genes that contain no stop codon or more than one stop codon or that do not have the stop codon at the end of the gene; a homologous gene filter that discards all genes with fragments that occur in multiple copies within a genome (with identity of 90% and >100 bp overlap); and a gene overlap filter that discards the shorter gene from the cgMLST scheme if the two genes affected overlap >4 bp. The remaining genes were then used in a pairwise comparison with BLAST version 2.2.12 (parameters used were word size 11, mismatch penalty −1, match reward 1, gap open costs 5, and gap extension costs 2) with the query L. monocytogenes chromosomes. All genes of the reference genome that were common to all query genomes with a sequence identity of ≥90% and 100% overlap formed the final cgMLST scheme. The default stop codon percentage filter was turned on; it discards all genes that have internal stop codons in >20% of the query genomes.
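The length, start codon, and stop codon filters, together with the final "present in all query genomes" criterion, can be sketched as follows. This is only an illustration of the stated rules: the set of accepted start codons (ATG, GTG, TTG), the requirement that the gene length be a multiple of three, and the `hits` data layout are assumptions, and the homologous gene and gene overlap filters are not reproduced here.

```python
START_CODONS = {"ATG", "GTG", "TTG"}  # assumption: common bacterial start codons
STOP_CODONS = {"TAA", "TAG", "TGA"}


def passes_basic_filters(cds, min_len=50):
    """Apply the minimum length, start codon, and stop codon filters to one gene."""
    cds = cds.upper()
    if len(cds) < min_len:
        return False  # minimum length filter
    if cds[:3] not in START_CODONS:
        return False  # start codon filter
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    stops = [i for i, codon in enumerate(codons) if codon in STOP_CODONS]
    # Stop codon filter: exactly one in-frame stop, located at the end of the gene.
    return len(cds) % 3 == 0 and stops == [len(codons) - 1]


def core_genes(hits, n_genomes, min_identity=90.0):
    """Select genes found in every query genome with >=90% identity and 100% overlap.

    `hits` maps a gene name to a list of (identity %, overlap %) tuples,
    one tuple per query genome in which the gene was found (hypothetical layout).
    """
    return sorted(
        gene
        for gene, found in hits.items()
        if len(found) == n_genomes
        and all(ident >= min_identity and overlap == 100.0 for ident, overlap in found)
    )


good_gene = "ATG" + "GCT" * 20 + "TAA"
internal_stop = "ATG" + "GCT" * 10 + "TAG" + "GCT" * 10 + "TAA"
toy_hits = {
    "g1": [(95.0, 100.0), (99.0, 100.0)],
    "g2": [(95.0, 100.0)],                 # missing from one genome
    "g3": [(85.0, 100.0), (99.0, 100.0)],  # identity too low in one genome
}
```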

Evaluation of the cgMLST target gene set.

To evaluate the applicability and representativeness of the L. monocytogenes cgMLST target gene set, a total of 67 isolates (Table 2) were subsequently analyzed to determine the presence of these target genes. It was assumed that a well-defined cgMLST scheme should cover at least 95% of the cgMLST genes present in all isolates.

To extract the target genes, the default parameters were used in the SeqSphere+ software: (i) for processing options, “Ignore contigs shorter than 200 bases”; (ii) for scanning options, “Matching scanning thresholds for creating targets from assembled genomes” with “required identity to reference sequence of 90%” and “required alignment to reference sequence with 100%”; and (iii) for BLAST options, word size 11, mismatch penalty −1, match reward 1, gap open costs 5, and gap extension costs 2. In addition, the target genes were assessed for quality, i.e., the absence of frame shifts and ambiguous nucleotides. A core genome gene was considered a “good target” only if all of the above criteria were met, in which case the complete sequence was analyzed in comparison to the EGD-e reference. Alleles for each gene were assigned automatically by the SeqSphere+ software to ensure unique nomenclature. The combination of all alleles in each strain formed an allelic profile that was used to generate minimum spanning trees (MST) using the parameter “pairwise ignore missing values” during distance calculation.
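To make the distance calculation concrete, the following sketch computes the percentage of good targets, the pairwise allelic distance with missing values ignored, and a minimum spanning tree using Prim's algorithm. Profiles are represented as lists of allele numbers with `None` for a target that failed the quality criteria; this representation and all function names are assumptions for illustration, and SeqSphere+ may construct its trees differently.

```python
def percent_good_targets(profile):
    """Share of cgMLST targets with an assigned allele, in percent."""
    return 100.0 * sum(allele is not None for allele in profile) / len(profile)


def allelic_distance(p, q):
    """Number of targets with differing alleles; targets missing in either profile are ignored."""
    return sum(1 for a, b in zip(p, q) if a is not None and b is not None and a != b)


def minimum_spanning_tree(profiles):
    """Prim's algorithm on pairwise allelic distances.

    `profiles` maps isolate name -> allelic profile.
    Returns a list of (distance, isolate_in_tree, isolate_added) edges.
    """
    names = list(profiles)
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Cheapest edge connecting the tree to an isolate not yet in it.
        best = min(
            (allelic_distance(profiles[u], profiles[v]), u, v)
            for u in in_tree
            for v in names
            if v not in in_tree
        )
        edges.append(best)
        in_tree.add(best[2])
    return edges


toy_profiles = {"A": [1, 1, 1, 1], "B": [1, 1, 1, 2], "C": [1, 2, 2, 2]}
tree = minimum_spanning_tree(toy_profiles)
```

In this toy example the tree links A to B (one differing allele) and B to C (two differing alleles), which corresponds to the numbers printed on the connecting lines of such a tree.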

In order to maintain backwards compatibility with classical L. monocytogenes MLST, sequences of the seven genes comprising the allelic profile of the MLST scheme were extracted separately from the genome sequences and queried against the L. monocytogenes MLST database in order to assign classical STs in silico.
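The in silico assignment of a classical ST amounts to a lookup of the seven-locus allelic profile. The sketch below assumes the seven loci of the L. monocytogenes MLST scheme (abcZ, bglA, cat, dapE, dat, ldh, lhkA); the allele numbers and ST labels in the toy table are invented placeholders, not entries from the MLST database.

```python
MLST_LOCI = ("abcZ", "bglA", "cat", "dapE", "dat", "ldh", "lhkA")


def assign_st(alleles, st_table):
    """Look up the sequence type for a dict of locus -> allele number.

    Returns None when a locus is missing and "novel" when the profile
    is complete but not present in the table.
    """
    key = tuple(alleles.get(locus) for locus in MLST_LOCI)
    if None in key:
        return None
    return st_table.get(key, "novel")


# Hypothetical table: "ST-B" is a single-locus (bglA) variant of "ST-A".
toy_st_table = {
    (1, 1, 1, 1, 1, 1, 1): "ST-A",
    (1, 2, 1, 1, 1, 1, 1): "ST-B",
}
isolate = dict(zip(MLST_LOCI, (1, 2, 1, 1, 1, 1, 1)))
st = assign_st(isolate, toy_st_table)
```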

Nucleotide sequence accession number.

All raw reads generated were submitted to the European Nucleotide Archive (http://www.ebi.ac.uk/ena/) under the study accession number PRJEB6551.

RESULTS

BAPS partition and admixture analysis based on 673 STs resulted in seven partitions (see Table S1 in the supplemental material). As BAPS partitions Lm05 and Lm06 comprised exclusively Listeria innocua species isolates of 43 STs with significant admixtures, these two partitions were excluded from further analysis. Of the remaining five partitions, three (partitions Lm01, Lm02, and Lm04) were among the available NCBI RefSeq genome sequences of L. monocytogenes. To achieve complete coverage of the L. monocytogenes population, we sequenced six additional strains from Seeliger's Listeria culture collection (SLCC0717, SLCC0759, SLCC1042, SLCC3287, SLCC4771, and SLCC6263), representing the missing BAPS partitions Lm03 and Lm07 (Table 1). In total, 42 genome sequences, including L. monocytogenes strain EGD-e as reference for core genome gene definition, were fed into the MLST+ target definer and resulted in 1,701 genes out of 2,867 genes of strain EGD-e (53.2% of the EGD-e strain chromosome nucleotides) (see Table S2 in the supplemental material).

The cgMLST scheme was then challenged with two sets of strains: the first contained 17 serotype reference strains representing all serotypes, genetic lineages, and BAPS partitions to determine its ability to cover the whole L. monocytogenes diversity and the second consisted of 48 isolates from two published outbreaks, including eight outgroup isolates (Table 2). All 17 serotype reference strains had ≥99.1% good cgMLST targets (mean, 99.5%), and for all serotype representatives the correct MLST was obtained. Similarly, for the two outbreaks, all isolates had ≥99.3% good cgMLST targets (mean, 99.7%). The results are summarized in Table 2.

The cgMLST scheme was further evaluated for its usability in outbreak investigation, i.e., whether outbreak isolates could be attributed to the same clone, named cluster type (CT) in the context of cgMLST typing, and clearly separated from the outgroup isolates. Therefore, we determined the maximum number of differing genes within each outbreak that reflect putative microevolutionary events. To facilitate cluster investigations in the future, we finally defined the so-called CT threshold that gives the maximum number of differing alleles that are shared by the same CT. In the two retrospectively analyzed outbreaks, a jellied pork outbreak (JPO) in Austria in the year 2008 and two epidemiologically linked clusters forming the acid curd cheese (Quargel) outbreak (ACCO) in Austria, the Czech Republic, and Germany in the years 2009/2010 (Table 2), detailed analysis resulted in a maximum number of 10 differing alleles (see Table S3 in the supplemental material). cgMLST of seven human and two food isolates from the JPO correctly grouped these isolates together with a maximum of four allelic differences (Fig. 1). Outgroup isolates L2708 (ST249) and L7508 (ST4) exhibited more than 1,000 allelic differences, and reference strains F2365 and LL195 (both ST1) exhibited ≥32 allelic differences (Fig. 1). Extraction of classical MLST targets resulted in STs of all outbreak isolates that were identical to those of ST1 and confirmed the previous Sanger sequencing (Table 2).
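How a CT threshold separates outbreak isolates from outgroups can be sketched with single-linkage clustering: two isolates end up in the same cluster if they are connected by a chain of pairs that each differ in at most 10 alleles. Single linkage is an assumption made for this illustration; the text does not state how SeqSphere+ assigns cluster types, and the toy profiles use 20 targets instead of 1,701.

```python
def cluster_types(profiles, threshold=10):
    """Group isolates by single linkage on allelic distance (<= threshold).

    `profiles` maps isolate name -> list of allele numbers (None = missing).
    Returns a sorted list of clusters, each a sorted list of isolate names.
    """
    names = sorted(profiles)
    parent = {name: name for name in names}

    def find(name):
        # Union-find root lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, u in enumerate(names):
        for v in names[i + 1:]:
            # Allelic distance, ignoring targets missing in either isolate.
            distance = sum(
                1
                for a, b in zip(profiles[u], profiles[v])
                if a is not None and b is not None and a != b
            )
            if distance <= threshold:
                parent[find(u)] = find(v)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(clusters.values())


toy = {
    "iso1": [1] * 20,
    "iso2": [1] * 18 + [2, 2],  # 2 differing alleles from iso1
    "outgroup": [3] * 20,       # 20 differing alleles from both
}
clusters = cluster_types(toy, threshold=10)
```

A consequence of single linkage is that chaining can merge isolates whose direct distance exceeds the threshold, which is one reason the cluster assignment still has to be interpreted alongside epidemiological data.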

FIG 1.

Minimum-spanning tree based on cgMLST allelic profiles of 9 L. monocytogenes isolates (all share ST1) from the jellied pork outbreak (21) and two outgroup isolates L2708 (ST249) and L7508 (ST4) in comparison to reference strains F2365 (GenBank accession number NC_002973) and LL195 (NC_019556) (both ST1) exhibiting the same serotype 4b. Each circle represents an allelic profile based on sequence analysis of 1,701 cgMLST target genes. The numbers on the connecting lines illustrate the numbers of target genes with differing alleles. The different groups of strains are distinguished by the colors of the circles. Closely related genotypes (≤10 allele difference) are shaded in gray. NCBI RefSeq strains are marked with an asterisk.

cgMLST of 33 isolates from the ACCO correctly identified the two different clones (ACCO I and ACCO II) that caused this outbreak (Fig. 2). Within the ACCO I clone, nine isolates were ST403 and five were ST777, a single locus bglA variant of ST403. cgMLST revealed the same dichotomy as classical MLST; the right branch of the ACCO I tree comprised all ST777 isolates (L27-09, L31-09, L32-09, L35-09, and L68-09). All outgroup isolates (MRL-13-00230, LD27-12, and Ro-015) were ST403 with at least 16 allelic differences compared to the ACCO I isolates. ACCO I isolates displayed a maximum of 10 allelic differences from each other (Fig. 2). ACCO II isolates had a maximum of two allelic differences from each other. All ACCO II isolates were correctly assigned to ST398. The three epidemiologically unrelated outgroup isolates (L38-11, 12025641, and 12025647) with an identical PFGE band pattern (data not shown) also exhibited ST398 and had ≥23 allelic differences compared to the ACCO II food isolates (Fig. 2). ACCO I and ACCO II isolates differed in >1,000 alleles from each other (Fig. 2).

FIG 2.

Minimum-spanning tree illustrating the phylogenetic relationship based on the cgMLST allelic profiles of 33 L. monocytogenes isolates from the outbreak associated with acid curd cheese (ACCO) (22, 23) consisting of two clones (ACCO I and ACCO II). Three outgroup isolates per outbreak (with identical PFGE profiles and serotypes) are shown in comparison to the reference strain EGD-e (GenBank accession number NC_003210; ST35). ACCO I isolates L27-09, L31-09, L32-09, L35-09, and L68-09 were ST777; the remaining isolates, including the three ACCO I outgroup isolates, were ST403. ACCO II isolates, including the three ACCO II outgroup isolates, were all ST398. Each circle represents an allelic profile based on sequence analysis of 1,701 genes. The numbers on the connecting lines illustrate the numbers of target genes with differing alleles. The different groups of strains are distinguished by the colors of the circles. Closely related genotypes (≤10 allele difference) are shaded in gray. The NCBI RefSeq strain is marked with an asterisk.

DISCUSSION

In outbreak situations, a rapid, accurate, and standardized classification of bacterial isolates is essential. Since its introduction in 1998, MLST has become a proof-of-principle method for sequence-based typing methods with a unique centrally curated and thereby standardized nomenclature (20). Building on these experiences, it is now possible to analyze thousands of genes using next-generation sequencing, which dramatically increases discriminatory power and thereby enables outbreak investigations (18, 19, 29–32). In our study, we were able not only to show that our cgMLST typing scheme is representative for the breadth of the L. monocytogenes population with ≥99.1% successfully extracted cgMLST targets but also to differentiate outbreak from nonoutbreak isolates clearly.

The microevolutionary events within each outbreak and the CT threshold of ≤10 differences warrant further comments. Within the first outbreak, the JPO (21), very few allelic changes were detected and the maximum allelic distance within the outbreak was only four alleles. This high similarity reflects the outbreak situation without much time for intraoutbreak microevolution, because all patients belonged to one travel group and became ill after consuming contaminated jellied pork at an Austrian tavern (21). The ACCO cluster of listeriosis occurred from 2009 until 2010 in Austria, Germany, and the Czech Republic and was caused by contaminated acid curd cheese (Quargel) (22, 23). Further epidemiological and molecular outbreak investigations revealed that two different serotype 1/2a clones with distinct PFGE patterns and inlB STs were responsible for this outbreak (33). Interestingly, a recent study focusing on the comparative genomics of the two outbreak clones revealed significant differences in virulence (34). Again, cgMLST analysis corroborated these findings, and the number of differing alleles among the outbreak clones again reflected the outbreak length. Whereas the ACCO I isolates were found over a period of 8 months and up to 10 different alleles were detected, isolates of ACCO II were found only during a 3-month period, in which at most two different alleles were recorded. Therefore, we assume that ACCO I isolates are a representative microevolutionary model for the CT threshold determination to facilitate outbreak investigations using cgMLST. Although the software supports outbreak investigation by providing the CT, this does not release the epidemiologist from thorough investigation.

MLST and cgMLST both use alleles and not nucleotide polymorphisms as units of comparison. Irrespective of the number of nucleotide polymorphisms involved, each allelic change is numbered as a single event; i.e., an allelic change is related to at least one point mutation but can also contain several nucleotide changes. This principle covers the conflicting signals of horizontal and vertical transfer of genetic material and considers the higher frequency of recombination than point mutations in bacteria (30, 32). One major advantage of such an allele-based approach is easy storage and curating the nomenclature in a central database, which is obligatory to guarantee universal nomenclature. For classical MLST, this scenario was one of the key factors to success. However, manual curation of the current MLST databases frequently hampers the rapid use of novel allelic sequences as human intervention is necessary to assign new alleles and STs. With the software solution used here, it was already possible to automatically assign novel cgMLST alleles, after dedicated quality control of the read and assembly data. This automation is crucial as the vast amount of sequencing data is not humanly readable anymore in a reasonable time frame that is needed for effective implementation of hygiene measures during outbreaks. The immediate and automated assignment of novel alleles also enables any software user to access identical nomenclature for L. monocytogenes cgMLST typing, a prerequisite for successful interlaboratory exchange of data. In the future, it is desirable to have an open Internet-based nomenclature server that is able to be interrogated by any software or user (35). The SpaServer (http://spaserver.ridom.de), which automatically hosts the nomenclature of the Staphylococcus aureus protein A gene typing (spa) and now contains >300,000 typing entries originating from >100 countries, might serve as a blueprint for such service (36).
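The allele-based principle and the idea of automated allele assignment can be illustrated with a small registry that hands out the next free number whenever a previously unseen sequence appears for a locus. This is a toy in-memory sketch, not the SeqSphere+ nomenclature service; the class name, the sequences, and the use of `lmo0001` as an example locus tag are illustrative assumptions.

```python
class AlleleRegistry:
    """Assign consecutive allele numbers per locus to distinct sequences."""

    def __init__(self):
        self._ids = {}  # locus -> {sequence: allele number}

    def allele_id(self, locus, sequence):
        per_locus = self._ids.setdefault(locus, {})
        seq = sequence.upper()
        if seq not in per_locus:
            # A new sequence gets the next number, however many bases differ.
            per_locus[seq] = len(per_locus) + 1
        return per_locus[seq]


registry = AlleleRegistry()
first = registry.allele_id("lmo0001", "ATGAAATAA")
second = registry.allele_id("lmo0001", "ATGACCTAA")  # two nucleotide changes, one new allele
repeat = registry.allele_id("lmo0001", "atgaaataa")  # same sequence, same allele number
```

The second sequence differs from the first at two positions but still counts as a single allelic change, which is the behavior described above for both MLST and cgMLST. For the numbers to be comparable between laboratories, such a registry would have to be shared centrally rather than kept per installation.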

Our approach has one limitation. The analysis is reduced to coding regions only because the second-generation sequencing instruments currently in use produce only relatively short reads that do not assemble the frequently highly repetitive intergenic regions well, leading to faulty assemblies. Therefore, when second-generation sequencing machines are used, focusing on coding regions helps to improve the analytical quality. This might change when third-generation sequencing instruments that produce much longer reads from a single molecule are widely available, preferably as benchtop systems. Nevertheless, the current cgMLST approach will be sustainable as it will maintain backward compatibility with expansion of typing schemes to present typing as we see today with the in silico extraction of classical MLST STs from WGS data.

In conclusion, we established a highly representative cgMLST scheme for WGS-based typing of L. monocytogenes and demonstrated both a high discriminatory power and concordance to previous findings in different outbreak scenarios. The remaining challenge is to establish an Internet-based nomenclature server that can be interrogated like the current MLST servers to facilitate universal global nomenclature for any user.

ACKNOWLEDGMENTS

This work was funded by the European Community's Seventh Framework Program (grant FP7/2007-2013 to D.H.) under grant agreement 278864 in the framework of the EU PathoNGenTrace project, by the Medical Faculty of the University of Münster (grant BD9817044 to A.M.), and by a Leonardo da Vinci Global Training grant to H.L.F.

D.H. is codeveloper of the Ridom SeqSphere+ software mentioned in the paper, a development of Ridom GmbH (Münster, Germany), which is partially owned by him. The other authors declare no conflicts of interest.

Footnotes

Supplemental material for this article may be found at http://dx.doi.org/10.1128/JCM.01193-15.

REFERENCES

  • 1. Allerberger F, Huhulescu S. 2015. Pregnancy related listeriosis: treatment and control. Expert Rev Anti Infect Ther 13:395–403. doi: 10.1586/14787210.2015.1003809.
  • 2. Allerberger F, Bagó Z, Huhulescu S, Pietzka A. 2015. Listeriosis: the dark side of refrigeration and ensiling, p 249–286. In Sing A. (ed), Zoonoses—infections affecting humans and animals: focus on public health aspects. Springer Verlag, Heidelberg, Germany.
  • 3. Kasper S, Huhulescu S, Auer B, Heller I, Karner F, Würzner R. 2009. Epidemiology of listeriosis in Austria. Wien Klin Wochenschr 121:113–119. doi: 10.1007/s00508-008-1130-2.
  • 4. Liu D. 2006. Identification, subtyping and virulence determination of Listeria monocytogenes, an important foodborne pathogen. J Med Microbiol 55:645–659. doi: 10.1099/jmm.0.46495-0.
  • 5. Graves LM, Swaminathan B. 2001. PulseNet standardized protocol for subtyping Listeria monocytogenes by macrorestriction and pulsed-field gel electrophoresis. Int J Food Microbiol 65:55–62. doi: 10.1016/S0168-1605(00)00501-8.
  • 6. Murchan S, Kaufmann ME, Deplano A, de Ryck R, Struelens M, Zinn CE, Fussing V, Salmenlinna S, Vuopio-Varkila J, El Solh N, Cuny C, Witte W, Tassios PT, Legakis N, van Leeuwen W, van Belkum A, Vindel A, Laconcha I, Garaizar J, Haeggman S, Olsson-Liljequist B, Ransjo U, Coombes G, Cookson B. 2003. Harmonization of pulsed-field gel electrophoresis protocols for epidemiological typing of strains of methicillin-resistant Staphylococcus aureus: a single approach developed by consensus in 10 European laboratories and its application for tracing the spread of related strains. J Clin Microbiol 41:1574–1585. doi: 10.1128/JCM.41.4.1574-1585.2003.
  • 7. Haase JK, Didelot X, Lecuit M, Korkeala H, L. monocytogenes MLST Study Group, Achtman M. 2014. The ubiquitous nature of Listeria monocytogenes clones: a large-scale multilocus sequence typing study. Environ Microbiol 16:405–416. doi: 10.1111/1462-2920.12342.
  • 8. van Belkum A, van Leeuwen W, Kaufmann ME, Cookson B, Forey F, Etienne J, Goering R, Tenover F, Steward C, O'Brien F, Grubb W, Tassios P, Legakis N, Morvan A, El Solh N, de Ryck R, Struelens M, Salmenlinna S, Vuopio-Varkila J, Kooistra M, Talens A, Witte W, Verbrugh H. 1998. Assessment of resolution and intercenter reproducibility of results of genotyping Staphylococcus aureus by pulsed-field gel electrophoresis of SmaI macrorestriction fragments: a multicenter study. J Clin Microbiol 36:1653–1659.
  • 9. Tenover FC, Arbeit R, Archer G, Biddle J, Byrne S, Goering R, Hancock G, Hébert GA, Hill B, Hollis R, Jarvis WR, Kreiswirth B, Eisner W, Maslow J, McDougal LK, Miller JM, Mulligan M, Pfaller MA. 1994. Comparison of traditional and molecular methods of typing isolates of Staphylococcus aureus. J Clin Microbiol 32:407–415.
  • 10. Aires-de-Sousa M, Boye K, de Lencastre H, Deplano A, Enright MC, Etienne J, Friedrich A, Harmsen D, Holmes A, Huijsdens XW, Kearns AM, Mellmann A, Meugnier H, Rasheed JK, Spalburg E, Strommenger B, Struelens MJ, Tenover FC, Thomas J, Vogel U, Westh H, Xu J, Witte W. 2006. High interlaboratory reproducibility of DNA sequence-based typing of bacteria in a multicenter study. J Clin Microbiol 44:619–621. doi: 10.1128/JCM.44.2.619-621.2006.
  • 11. Jolley KA, Chan M-S, Maiden MCJ. 2004. mlstdbNet—distributed multi-locus sequence typing (MLST) databases. BMC Bioinformatics 5:86. doi: 10.1186/1471-2105-5-86.
  • 12. Salcedo C, Arreaza L, Alcalá B, de la Fuente L, Vázquez JA. 2003. Development of a multilocus sequence typing method for the analysis of Listeria monocytogenes clones. J Clin Microbiol 41:757–762. doi: 10.1128/JCM.41.2.757-762.2003.
  • 13. Zhang W, Jayarao BM, Knabel SJ. 2004. Multi-virulence-locus sequence typing of Listeria monocytogenes. Appl Environ Microbiol 70:913–920. doi: 10.1128/AEM.70.2.913-920.2004.
  • 14. Ragon M, Wirth T, Hollandt F, Lavenir R, Lecuit M, Le Monnier A, Brisse S. 2008. A new perspective on Listeria monocytogenes evolution. PLoS Pathog 4:e1000146. doi: 10.1371/journal.ppat.1000146.
  • 15. Turabelidze G, Lawrence SJ, Gao H, Sodergren E, Weinstock GM, Abubucker S, Wylie T, Mitreva M, Shaikh N, Gautom R, Tarr PI. 2013. Precise dissection of an Escherichia coli O157:H7 outbreak by single nucleotide polymorphism analysis. J Clin Microbiol 51:3950–3954. doi: 10.1128/JCM.01930-13.
  • 16. Eyre DW, Golubchik T, Gordon NC, Bowden R, Piazza P, Batty EM, Ip CL, Wilson DJ, Didelot X, O'Connor L, Lay R, Buck D, Kearns AM, Shaw A, Paul J, Wilcox MH, Donnelly PJ, Peto TE, Walker AS, Crook DW. 2012. A pilot study of rapid benchtop sequencing of Staphylococcus aureus and Clostridium difficile for outbreak detection and surveillance. BMJ Open 2:pii=e001124. doi: 10.1136/bmjopen-2012-001124.
  • 17. Mellmann A, Harmsen D, Cummings CA, Zentz EB, Leopold SR, Rico A, Prior K, Szczepanowski R, Ji Y, Zhang W, McLaughlin SF, Henkhaus JK, Leopold B, Bielaszewska M, Prager R, Brzoska PM, Moore RL, Guenther S, Rothberg JM, Karch H. 2011. Prospective genomic characterization of the German enterohemorrhagic Escherichia coli O104:H4 outbreak by rapid next generation sequencing technology. PLoS One 6:e22751. doi: 10.1371/journal.pone.0022751.
  • 18. Maiden MC, van Rensburg MJ, Bray JE, Earle SG, Ford SA, Jolley KA, McCarthy ND. 2013. MLST revisited: the gene-by-gene approach to bacterial genomics. Nat Rev Microbiol 11:728–736. doi: 10.1038/nrmicro3093.
  • 19. Schmid D, Allerberger F, Huhulescu S, Pietzka A, Amar C, Kleta S, Prager R, Preußel K, Aichinger E, Mellmann A. 2014. Whole genome sequencing as a tool to investigate a cluster of seven cases of listeriosis in Austria and Germany, 2011-2013. Clin Microbiol Infect 20:431–436. doi: 10.1111/1469-0691.12638.
  • 20. Maiden MC, Bygraves JA, Feil E, Morelli G, Russell JE, Urwin R, Zhang Q, Zhou J, Zurth K, Caugant DA, Feavers IM, Achtman M, Spratt BG. 1998. Multilocus sequence typing: a portable approach to the identification of clones within populations of pathogenic microorganisms. Proc Natl Acad Sci U S A 95:3140–3145. doi: 10.1073/pnas.95.6.3140.
  • 21. Pichler J, Much P, Kasper S, Fretz R, Auer B, Kathan J, Mann M, Huhulescu S, Ruppitsch W, Pietzka A, Silberbauer K, Neumann C, Gschiel E, de Martin A, Schuetz A, Gindl J, Neugschwandtner E, Allerberger F. 2009. An outbreak of febrile gastroenteritis associated with jellied pork contaminated with Listeria monocytogenes. Wien Klin Wochenschr 121:149–156. doi: 10.1007/s00508-009-1137-3.
  • 22. Fretz R, Sagel U, Ruppitsch W, Pietzka A, Stoger A, Huhulescu S, Heuberger S, Pichler J, Much P, Pfaff G, Stark K, Prager R, Flieger A, Feenstra O, Allerberger F. 2010. Listeriosis outbreak caused by acid curd cheese Quargel, Austria and Germany 2009. Euro Surveill 15(16):pii=19477 http://www.eurosurveillance.org/ViewArticle.aspx?ArticleId=19543.
  • 23. Fretz R, Pichler J, Sagel U, Much P, Ruppitsch W, Pietzka AT, Stöger A, Huhulescu S, Heuberger S, Appl G, Werber D, Stark K, Prager R, Flieger A, Karpísková R, Pfaff G, Allerberger F. 2010. Update: multinational listeriosis outbreak due to ‘Quargel,’ a sour milk curd cheese, caused by two different L. monocytogenes serotype 1/2a strains, 2009-2010. Euro Surveill 15(16):pii=19543 http://www.eurosurveillance.org/ViewArticle.aspx?ArticleId=19543.
  • 24. Zerbino DR, Birney E. 2008. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 18:821–829. doi: 10.1101/gr.074492.107.
  • 25. Jünemann S, Sedlazeck FJ, Prior K, Albersmeier A, John U, Kalinowski J, Mellmann A, Goesmann A, von Haeseler A, Stoye J, Harmsen D. 2013. Updating benchtop sequencing performance comparison. Nat Biotechnol 31:294–296. doi: 10.1038/nbt.2522.
  • 26. Corander J, Marttinen P, Sirén J, Tang J. 2008. Enhanced Bayesian modelling in BAPS software for learning genetic structures of populations. BMC Bioinformatics 9:539. doi: 10.1186/1471-2105-9-539.
  • 27. van Tonder AJ, Mistry S, Bray JE, Hill DM, Cody AJ, Farmer CL, Klugman KP, von Gottberg A, Bentley SD, Parkhill J, Jolley KA, Maiden MC, Brueggemann AB. 2014. Defining the estimated core genome of bacterial populations using a Bayesian decision model. PLoS Comput Biol 10:e1003788. doi: 10.1371/journal.pcbi.1003788.
  • 28. Edgar RC. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32:1792–1797. doi: 10.1093/nar/gkh340.
  • 29. Jolley KA, Maiden MCJ. 2014. Using multilocus sequence typing to study bacterial variation: prospects in the genomic era. Future Microbiol 9:623–630. doi: 10.2217/fmb.14.24.
  • 30. Kohl TA, Diel R, Harmsen D, Rothgänger J, Walter KM, Merker M, Weniger T, Niemann S. 2014. Whole-genome-based Mycobacterium tuberculosis surveillance: a standardized, portable, and expandable approach. J Clin Microbiol 52:2479–2486. doi: 10.1128/JCM.00567-14.
  • 31. Leopold SR, Goering RV, Witten A, Harmsen D, Mellmann A. 2014. Bacterial whole genome sequencing revisited: portable, scalable and standardized analysis for typing and detection of virulence and antibiotic resistance genes. J Clin Microbiol 52:2365–2370. doi: 10.1128/JCM.00262-14.
  • 32. Underwood AP, Jones G, Mentasti M, Fry NK, Harrison TG. 2013. Comparison of the Legionella pneumophila population structure as determined by sequence-based typing and whole genome sequencing. BMC Microbiol 13:302. doi: 10.1186/1471-2180-13-302.
  • 33. Pietzka AT, Stöger A, Huhulescu S, Allerberger F, Ruppitsch W. 2011. Gene scanning of an internalin B gene fragment using high-resolution melting curve analysis as a tool for rapid typing of Listeria monocytogenes. J Mol Diagn 13:57–63. doi: 10.1016/j.jmoldx.2010.11.002.
  • 34. Rychli K, Müller A, Zaiser A, Schoder D, Allerberger F, Wagner M, Schmitz-Esser S. 2014. Genome sequencing of Listeria monocytogenes “Quargel” listeriosis outbreak strains reveals two different strains with distinct in vitro virulence potential. PLoS One 9:e89964. doi: 10.1371/journal.pone.0089964.
  • 35. Aarestrup FM, Brown EW, Detter C, Gerner-Smidt P, Gilmour MW, Harmsen D, Hendriksen RS, Hewson R, Heymann DL, Johansson K, Ijaz K, Keim PS, Koopmans M, Kroneman A, Lo Fo Wong D, Lund O, Palm D, Sawanpanyalert P, Sobel J, Schlundt J. 2012. Integrating genome-based informatics to modernize global disease monitoring, information sharing, and response. Emerg Infect Dis 18:e1. doi: 10.3201/eid/1811.120453.
  • 36. Mellmann A, Friedrich AW, Rosenkötter N, Rothgänger J, Karch H, Reintjes R, Harmsen D. 2006. Automated DNA sequence-based early warning system for the detection of methicillin-resistant Staphylococcus aureus outbreaks. PLoS Med 3:e33. doi: 10.1371/journal.pmed.0030033.

Articles from Journal of Clinical Microbiology are provided here courtesy of American Society for Microbiology (ASM)