PLOS ONE. 2020 Dec 21;15(12):e0244227. doi: 10.1371/journal.pone.0244227

Whole genome sequencing of Clostridioides difficile PCR ribotype 046 suggests transmission between pigs and humans

Anders Werner 1,*, Paula Mölling 2, Anna Fagerström 2, Fredrik Dyrkell 3, Dimitrios Arnellos 3, Karin Johansson 2, Martin Sundqvist 2, Torbjörn Norén 2
Editor: Yung-Fu Chang 4
PMCID: PMC7751860  PMID: 33347506

Abstract

Background

A zoonotic association has been suggested for several PCR ribotypes (RTs) of Clostridioides difficile. In central parts of Sweden, RT046 was found to be dominant in neonatal pigs at the same time as an RT046 hospital C. difficile infection (CDI) outbreak occurred in the southern parts of the country.

Objective

To detect possible transmission of RT046 between pig farms and human CDI cases in Sweden and investigate the diversity of RT046 in the pig population using whole genome sequencing (WGS).

Methods

WGS was performed on 47 C. difficile isolates from pigs (n = 22), the farm environment (n = 7) and human cases of CDI (n = 18). Two different core genome multilocus sequence typing (cgMLST) schemes were used together with a single nucleotide polymorphism (SNP) analysis, and the results were related to the time and location of isolation.

Results

The pig isolates were closely related (≤6 cgMLST alleles differing in both cgMLST schemes) and conserved over time, and were clearly separated from isolates from the human hospital outbreak (≥76 and ≥90 cgMLST alleles differing in the two cgMLST schemes). However, two human isolates were closely related to the pig isolates, suggesting possible transmission. The SNP analysis was not more discriminatory than cgMLST.

Conclusion

No general pattern suggesting zoonotic transmission was apparent between pigs and humans, although contrasting results from two isolates still make transmission possible. Our results support the need for high resolution WGS typing when investigating hospital and environmental transmission of C. difficile.

Introduction

Clostridioides difficile (formerly Clostridium difficile) is a common cause of antibiotic-associated diarrhoea that causes significant mortality and morbidity as well as high costs for the healthcare system [1,2]. The spread of PCR ribotype (RT) 027 in healthcare settings at the beginning of the millennium put the spotlight on C. difficile infection (CDI) as a nosocomial disease [3]. In recent years, however, a rise in the incidence of community-associated CDI has been observed [4]. The most common methods for epidemiological investigation of C. difficile, such as PCR ribotyping and multilocus sequence typing (MLST), offer only a moderate resolution that is insufficient for outbreak investigation [5]. Use of core genome MLST (cgMLST) or single nucleotide polymorphism (SNP) analysis to analyse data produced by whole genome sequencing (WGS) is more discriminatory [6] and has revealed that transmission from symptomatic patients in the healthcare system accounts for only a small part of CDI cases. This suggests that asymptomatic carriage or environmental sources play an important role in the transmission of C. difficile [7].

Clostridioides difficile may also be carried by pigs and other livestock [8] and has been proposed as a cause of scouring among newborn piglets [9]. Potential transmission between livestock and humans has been described using WGS [10,11] and especially RT078 is considered to have a zoonotic potential [12]. Clusters of RT046 among humans have been reported in Poland and Chile [13,14], and this RT was one of the most frequently isolated among humans in Sweden in 2009–2013 [15]. In 2011 it was related to a hospital-based outbreak in southern Sweden [15]. During the same time, it was the only RT found among piglets at multiple breeding farms in central Sweden [16]. No zoonotic link has yet been established for RT046 and neither the clonal diversity, change over time within farms nor the relationship with human isolates is currently known.

The objective of this study was to detect possible transmission of RT046 between pig farms and human CDI cases in Sweden and investigate the RT046 diversity in the pig population using WGS. Multiple analysis strategies using two cgMLST schemes and one SNP analysis were performed.

Materials and methods

Sample selection

Forty-seven C. difficile isolates originating from pigs, the farm environment and human clinical cases were included (S1 Table). All human strains were retrieved prior to this study and PCR ribotyped as 046 in surveillance and routine programmes of the National Reference Laboratory for C. difficile either at the Public Health Agency of Sweden or at the Department of Laboratory Medicine, Clinical Microbiology, Örebro, as described elsewhere [15]. The pig and environmental isolates were all PCR ribotyped as previously described [16]. Twenty isolates (ten collected in 2012 (P1–P10) and ten in 2017 (P11–P20)) originated from piglets and sows on a pig breeding farm in central Sweden (farm A). The sampling procedure has been described elsewhere [16]. Six environmental isolates from the surroundings of farm A (E1–E3, 2013 and E4–E6, 2017), two isolates collected from pigs at two other pig farms in the same county (P21, farm B, 2012 and P22, farm C, 2012) and one isolate from a stream approximately 1 km from farm C (E7, 2013) were included. Soil samples were collected in sterile containers for transportation; for water samples, 100 mL was collected. At the laboratory, approximately 2 g of soil was dissolved in 5 mL of sterile 0.85% NaCl solution. Then 2 mL of the soil mixture, or 2 mL of water including sediment, was transferred to the enrichment broth used for the pig isolates and handled according to the protocol previously described [16].

Eighteen human CDI RT046 isolates (H1–H18), isolated between 2010 and 2017, were included. Ten were chosen from isolates sent for PCR ribotyping to the National Reference Laboratory for C. difficile because of suspicion of spread of C. difficile in healthcare settings. Three of these (H2–H4) were from a previously described hospital-based outbreak [15]. The remaining eight human isolates were a subset of isolates from the yearly national survey on C. difficile run by the Public Health Agency of Sweden [15] and were chosen from diverse geographical sites.

Ethical considerations

Clinical isolates were collected through routine programmes for typing of C. difficile. No patient information was collected. All other isolates had been previously collected and retrieved. Ethical approval was therefore not required.

Culture conditions and DNA extraction

Isolates had been stored frozen (-80°C) and were recovered on Fastidious Anaerobe Agar (Neogen®, Auchincruive, Scotland) with the addition of 5% horse blood. DNA from 24-hour cultures was manually extracted using the DNeasy UltraClean Microbial Kit (Qiagen, Hilden, Germany), with the following modified pre-step: one-third of a 10 μL loop of cultured bacteria was added to 300 μL PowerBead solution in a PowerBead tube (Qiagen). Thereafter, 50 μL SL solution (part of the DNeasy UltraClean Microbial Kit) was added, and the tube was vortexed briefly and incubated at 95°C for 5 minutes. The tubes were then vortexed for 20 minutes, centrifuged at 10,000×g for 2 minutes and processed according to the manufacturer’s instructions, and the DNA was eluted in 50 μL.

Library preparation and sequencing

Sequencing libraries were constructed using the Nextera XT DNA library prep kit (Illumina®, San Diego, CA, USA), with slight modifications to the protocol in order to optimize the average fragment length. The tagmentation was performed using lower input DNA, 0.75 ng, and the tagmentation time was increased to 7 minutes and 30 seconds. Amplification of the tagmented DNA was performed using index primers and the amplified products were automatically purified with the ACSIA NGS LibPrep Edition (PRIMADIAG, Romainville, France) using AMPure XP beads (Beckman Coulter, Brea, CA, USA). The normalization was performed using the ACSIA NGS LibPrep Edition. Pooling was performed manually based on the size of the fragments, determined by a TapeStation 4200 (Agilent, Santa Clara, CA, USA), and the DNA concentration, measured using a Qubit™ fluorometer (Thermo Fisher, Waltham, MA, USA).

Sequencing was performed using Illumina MiSeq™ (Illumina®) with MiSeq™ Reagent Kit v3 (600 cycles) (Illumina®), according to the manufacturer’s instructions. More than 50-fold average coverage over the whole genome in combination with ≥97% of good cgMLST targets in Ridom™ SeqSphere+ version 6.0.2 (Ridom™ GmbH, Münster, Germany) was considered acceptable sequence quality for inclusion in all analyses (results for all isolates are listed in S1 Table). For analysis in Ridom™ SeqSphere+, the reads were de novo assembled through a pipeline using Velvet version 1.1.04 [17] with default settings; sequences were trimmed before assembly until the average Phred quality score was 30 in a window of 20 bases. One MLST gene in isolate H4 was not assembled correctly by Velvet and was therefore assembled using SPAdes version 3.12.0 [18]; this assembly was used only for the MLST analysis in Ridom™ SeqSphere+. For the 1928D platform, the raw sequences were trimmed from the 3’ end until a Phred quality score of 25 was reached. The 1928D platform’s cgMLST method uses a custom-developed allele calling algorithm based on an alignment-free k-mer approach. For novel allele extraction and database acceptance, local assembly using SPAdes version 3.11.1 [18] is used to validate gene structure.
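The 3’ end trimming described for the 1928D platform can be illustrated with a minimal sketch. The function names, the Phred+33 encoding and the example read are assumptions made for illustration only; the platform's actual implementation is not described in this paper.

```python
def decode_phred33(quality_string: str) -> list[int]:
    """Convert a FASTQ quality string (Phred+33 encoding assumed) to integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def trim_three_prime(sequence: str, qualities: list[int],
                     min_quality: int = 25) -> tuple[str, list[int]]:
    """Drop bases from the 3' end until a base with quality >= min_quality is reached."""
    if len(sequence) != len(qualities):
        raise ValueError("sequence and qualities must have the same length")
    end = len(sequence)
    while end > 0 and qualities[end - 1] < min_quality:
        end -= 1
    return sequence[:end], qualities[:end]


# Hypothetical read: eight high-quality bases followed by two low-quality bases.
read = "ACGTACGTAC"
quals = decode_phred33("IIIIIIII#5")  # 'I' = 40, '#' = 2, '5' = 20
trimmed_seq, trimmed_quals = trim_three_prime(read, quals)
print(trimmed_seq)  # ACGTACGT
```

Only the 3’ end is examined here; low-quality bases inside the read are left in place, which is one plausible reading of the description above.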

Data analysis

The sequences were analysed by MLST and cgMLST using the software Ridom™ SeqSphere+, with cgMLST scheme version 2.0 based on 2,147 genes [19], and 1928D (1928 Diagnostics, Gothenburg, Sweden), with a cgMLST scheme based on 2,631 core genes (S2 Table), defined as genes that are present in 95% of 28 reference genomes (complete genomes available on National Center for Biotechnology Information (NCBI) by 11 July 2018). The seed genome for picking target genes belongs to strain 630 delta erm (NCBI RefSeq assembly accession number GCF_002080065.1). The 1928 cgMLST scheme was created to analyse sequence types (STs) 1, 35, 3 and 37. ST 1 corresponds to hypervirulent RT027 [3], ST 35 corresponds to RT046, which historically has been common in Sweden [15], and STs 3 and 37 are common in certain demographic groups [20]. The scheme has also been observed to perform well for STs 81, 41, 54 and 55. Further, the presence of genes in genomes was determined by a 90% similarity threshold. A limit of 97% (Ridom™ SeqSphere+) and 95% (1928D) of good cgMLST target genes was set for inclusion in the respective cgMLST analysis.
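As a rough illustration of the core gene definition above, the following sketch selects genes present in at least 95% of a set of reference genomes. It assumes that "present in 95%" means "at least 95%" and that gene presence (the 90% similarity step) has already been determined; the gene and genome names are hypothetical.

```python
def select_core_genes(presence: dict[str, set[str]], n_genomes: int,
                      min_percent: int = 95) -> list[str]:
    """Return genes found in at least min_percent of the reference genomes.

    presence maps a gene name to the set of genomes in which it was detected.
    """
    # Integer ceiling of min_percent * n_genomes / 100, avoiding float rounding.
    min_count = (min_percent * n_genomes + 99) // 100
    return sorted(gene for gene, genomes in presence.items()
                  if len(genomes) >= min_count)


# Hypothetical presence data for 28 reference genomes.
genomes = [f"ref{i:02d}" for i in range(1, 29)]
presence = {
    "geneA": set(genomes),       # present in 28/28
    "geneB": set(genomes[:27]),  # present in 27/28 (96.4%)
    "geneC": set(genomes[:26]),  # present in 26/28 (92.9%)
}
core_genes = select_core_genes(presence, n_genomes=len(genomes))
print(core_genes)  # ['geneA', 'geneB']
```

With 28 reference genomes, a 95% cut-off under this reading means a gene may be absent from at most one genome.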

The SNP analysis was performed using the 1928D platform as a core genome alignment. 1928D’s SNP analysis pipeline uses Burrows-Wheeler Aligner version 0.7.17-r1188 [21,22] for read alignment and FreeBayes version 1.3.2 [23] for variant calling. Isolate H10 was used as the reference genome as no public reference genome was available for RT046 [24]. SNPs were extracted from regions of the genomes that aligned across all samples and the reference genome. Variant calls were required to be homozygous, to have a quality score ≥60 and to be supported by at least 10 high quality reads. Recombination regions were not identified or excluded. Toxin genes were identified using the 1928D platform.
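The three variant call criteria above can be expressed as a simple filter. This is a sketch under stated assumptions: the `VariantCall` record and its field names are hypothetical and do not correspond to FreeBayes output fields or to the 1928D pipeline, and the count of supporting high quality reads is assumed to be available per call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantCall:
    """Hypothetical, simplified record of one variant call."""
    position: int
    ref: str
    alt: str
    is_homozygous: bool
    quality: float
    supporting_reads: int  # number of high quality reads supporting the call


def passes_filters(call: VariantCall, min_quality: float = 60.0,
                   min_reads: int = 10) -> bool:
    """Apply the three criteria from the text: homozygous, quality >= 60, >= 10 reads."""
    return (call.is_homozygous
            and call.quality >= min_quality
            and call.supporting_reads >= min_reads)


# Hypothetical calls: only the first satisfies all three criteria.
calls = [
    VariantCall(101, "A", "G", True, 250.0, 45),
    VariantCall(202, "C", "T", False, 300.0, 60),  # not homozygous
    VariantCall(303, "G", "A", True, 59.9, 30),    # quality too low
    VariantCall(404, "T", "C", True, 120.0, 9),    # too few supporting reads
]
kept_positions = [c.position for c in calls if passes_filters(c)]
print(kept_positions)  # [101]
```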

Results

All 47 isolates were categorized as ST35 by both Ridom™ SeqSphere+ and 1928D. Overall, the three analyses gave similar results regarding phylogenetic relationships (Figs 1–3). All isolates carried both toxin A and B genes (tcdA, tcdB). None of the isolates carried any of the binary toxin genes (cdtA, cdtB).

Fig 1. Ridom™ SeqSphere+ cgMLST scheme tree.

Neighbour-joining tree of all isolates based on the Ridom™ SeqSphere+ core genome multilocus sequence typing (cgMLST) scheme version 2 (2,147 genes) and MLST scheme (7 genes), ignoring missing alleles, with year of isolation presented after the isolate’s name and coloured according to the source of isolation.

Fig 3. SNP analysis tree.

Unweighted pair group method with arithmetic mean (UPGMA) tree including all isolates based on single nucleotide polymorphism (SNP) differences in the 1928D analysis, coloured according to the isolation source. Sequence type (multilocus sequence typing (MLST)), the presence of toxin genes (A = tcdA, B = tcdB, CDT = binary toxin), and the proportion of genome aligned (ALN) are presented after each isolate. The SNP analysis confirmed the clustering observed in both core genome (cg) MLST analyses.

Core genome multilocus sequence typing analysis by Ridom™ SeqSphere+

The 47 isolates differed by 0 to 186 alleles in pairwise comparisons (S3 Table) and the majority of the human isolates were not closely related to the pig or the environmental isolates (Fig 1). Conversely, 28 out of the 29 isolates from pigs or the environment were closely related, with ≤6 cgMLST alleles differing, and were therefore considered to belong to the same cgMLST cluster (Fig 4) [19]. Within this cluster the isolates were distributed independently of sampling time or location. The remaining isolate E7, isolated in 2013, originated from a stream 1 km from farm C. It differed by 72 alleles from the closest pig or environmental isolate P21 (S3 Table), but showed only 2 allelic differences from one of the human isolates, H13, isolated in 2016 (Fig 4). The isolates E7 and H13 originated from different geographical regions. Two human isolates H10 and H11, isolated in 2015 and 2016, differed by one and two alleles from the closest pig isolates P21 and P22 (Fig 4). These isolates were collected as part of the yearly national survey and not in the county where the pig farms are situated. All other human isolates differed by ≥65 (65–186) alleles from the 28 isolates in the pig and environmental cluster (S3 Table). Isolates H2–H4 from the previously described hospital outbreak [15] were all in the same cgMLST cluster. In addition, four clinical isolates were linked to this cluster (Fig 4). Three isolates were isolated during the time of the outbreak from a neighbouring county, while the last isolate was collected approximately 200 km away in the year before the outbreak was discovered. Among the other human isolates only one cluster, containing the two isolates H5 and H17, was observed. These isolates were collected 5 years apart in non-neighbouring counties. The greatest difference of 186 alleles was found between isolate H15 and isolates E1/E3/P13 (S3 Table).
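The grouping of isolates into cgMLST clusters at a threshold of ≤6 allelic differences can be sketched as single-linkage clustering over a pairwise distance table. Single linkage is an assumption made here for illustration; the isolate labels and distances below are toy values, not entries from S3 Table.

```python
def cluster_by_threshold(names: list[str],
                         distances: dict[tuple[str, str], int],
                         max_diff: int = 6) -> list[list[str]]:
    """Single-linkage clustering: isolates joined by any chain of pairs <= max_diff."""
    parent = {name: name for name in names}

    def find(x: str) -> str:
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), diff in distances.items():
        if diff <= max_diff:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters: dict[str, list[str]] = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Toy example with hypothetical isolates and allelic distances.
names = ["pig_a", "pig_b", "human_x", "env_y", "human_z"]
distances = {
    ("pig_a", "pig_b"): 1, ("pig_a", "human_x"): 5, ("pig_b", "human_x"): 7,
    ("pig_a", "env_y"): 72, ("pig_b", "env_y"): 73, ("human_x", "env_y"): 70,
    ("pig_a", "human_z"): 74, ("pig_b", "human_z"): 75,
    ("human_x", "human_z"): 71, ("env_y", "human_z"): 2,
}
clusters = cluster_by_threshold(names, distances)
print(clusters)  # [['env_y', 'human_z'], ['human_x', 'pig_a', 'pig_b']]
```

Under single linkage, `human_x` joins the pig cluster through `pig_a` even though it differs from `pig_b` by 7 alleles, which is why the choice of linkage matters when interpreting cluster membership.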

Fig 4. Ridom™ SeqSphere+ cgMLST analysis minimum spanning tree.

Minimum spanning tree of all isolates based on the Ridom™ SeqSphere+ core genome multilocus sequence typing (cgMLST) scheme version 2 (2,147 genes) and MLST scheme (7 genes) with numbers of allelic differences shown on connecting lines (distances not to scale), ignoring missing alleles. Year of isolation is presented after the isolate’s name, nodes are coloured according to the isolation source, and genetically closely related isolates (≤6 cgMLST alleles differing) are shaded grey.

Core genome multilocus sequence typing analysis by 1928D

The isolates differed from 0 to 238 alleles in pairwise comparisons (S4 Table) and the analysis showed very similar results to the Ridom™ SeqSphere+ analysis. All isolates belonging to the same cgMLST cluster in SeqSphere+ were grouped together in the analysis by 1928D and no new isolates were found to differ by ≤6 alleles (S4 Table). All the isolates in the large pig and environmental cluster, including the two intermingling human isolates, were grouped together and had ≤5 allelic differences. All seven isolates identified as related to the hospital outbreak were grouped by the 1928D analysis (≤2 allelic differences). Similarly, isolates E7/H13 and H5/H17 were also closely connected with 3 and 2 allelic differences, respectively (S4 Table). The greatest difference of 238 alleles was found between isolate H15 and isolates E1/E3 (S4 Table).
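A pairwise allelic difference of the kind reported here can be computed by comparing allele numbers locus by locus. The sketch below assumes that "ignoring missing alleles" means a locus is skipped when either isolate lacks a call; the locus names and profiles are hypothetical, and this is not the allele calling algorithm of either platform.

```python
from typing import Optional

AlleleProfile = dict[str, Optional[int]]  # locus name -> allele number (None if missing)


def allelic_distance(profile_a: AlleleProfile, profile_b: AlleleProfile) -> int:
    """Count loci with different alleles, skipping loci missing in either profile."""
    shared_loci = profile_a.keys() & profile_b.keys()
    return sum(
        1
        for locus in shared_loci
        if profile_a[locus] is not None
        and profile_b[locus] is not None
        and profile_a[locus] != profile_b[locus]
    )


# Hypothetical four-locus profiles; locus3 is missing in the first isolate.
isolate_1 = {"locus1": 1, "locus2": 4, "locus3": None, "locus4": 7}
isolate_2 = {"locus1": 1, "locus2": 5, "locus3": 2, "locus4": 9}
distance = allelic_distance(isolate_1, isolate_2)
print(distance)  # 2
```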

Single nucleotide polymorphism analysis

Good alignment quality was achieved for all isolates, with an average of 98.4% and a median of 98.6% of the genome aligned. This resulted in a core alignment of 3,666,694 core sites, corresponding to 90.3% of the reference genome H10, with 1,278 variant sites. The isolates had between 0 and 899 SNP differences (S5 Table). A visual comparison of the tree structures from both cgMLST analyses (Figs 1 and 2) and the SNP analysis (Fig 3) showed overall similarity and the main clusters were identified. The 28 closely related pig and environmental isolates with the two intermingling human isolates displayed <10 SNP differences. The cluster containing the hospital outbreak isolates showed ≤6 SNP differences, with the three outbreak isolates H2–H4 having ≤5 SNP differences. The SNP analysis also confirmed the close relationship within the isolate pairs E7/H13 and H5/H17, with 3 and 7 SNP differences, respectively. The greatest difference of 899 SNPs was found between isolate H15 and isolate P3 (S5 Table).
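Pairwise SNP differences over a core alignment amount to counting positions at which two aligned sequences carry different bases. The following sketch assumes equal-length aligned sequences and skips positions where either base is not A, C, G or T; the short sequences are hypothetical and the handling of ambiguous bases is an assumption, not a description of the 1928D pipeline.

```python
from itertools import combinations

VALID_BASES = frozenset("ACGT")


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count aligned positions where both bases are unambiguous and differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a in VALID_BASES and b in VALID_BASES
    )


def pairwise_snp_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], int]:
    """SNP distance for every unordered pair of isolates in the alignment."""
    return {
        (a, b): snp_distance(alignment[a], alignment[b])
        for a, b in combinations(sorted(alignment), 2)
    }


# Hypothetical eight-site core alignment; 'N' marks an ambiguous base.
alignment = {
    "iso1": "ACGTACGT",
    "iso2": "ACGTACGA",
    "iso3": "TCGTACNA",
}
matrix = pairwise_snp_matrix(alignment)
print(matrix)
# {('iso1', 'iso2'): 1, ('iso1', 'iso3'): 2, ('iso2', 'iso3'): 1}
```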

Fig 2. 1928D cgMLST analysis tree.

Unweighted pair group method with arithmetic mean (UPGMA) tree including all isolates with distances based on the 1928D core genome multilocus sequence typing (cgMLST) scheme (2,631 genes), ignoring missing alleles. Coloured according to the isolation source. Sequence type (MLST), the presence of toxin genes (A = tcdA, B = tcdB, CDT = binary toxin), and percentage of good cgMLST genes (CORE) are presented after each isolate.

Discussion

Our results confirm the previously reported low resolution of PCR ribotyping and MLST in C. difficile [5,10,25,26]. Core genome MLST separated the isolates into two distinct clusters, consistent with the epidemiological data available to us. One mainly consisted of isolates from the pig farms and the other was associated with the human hospital outbreak. In both clusters, additional isolates with no known epidemiological association were included. The two different cgMLST schemes showed the same structure and very similar numbers of allelic differences within the clusters, and therefore a similar performance within RT046, despite differences in the total number of alleles analysed and the algorithms used. The SNP analysis confirmed the close genetic similarity within the clusters and did not separate the isolates with known epidemiology from those with unknown epidemiology better than cgMLST. Applying the limit of >10 SNP differences for genetically distinct isolates, as suggested by Eyre et al [7], to our results corresponds to the limit of ≤6 allelic differences set for cgMLST complex types in Ridom™ SeqSphere+. Therefore, cgMLST seemed to offer the same resolution as SNP analysis. As the SNP analysis did not exclude recombination events, this indicates a low level of recombination within RT046. Similar performance of cgMLST and SNP analysis has been shown for other species [27–29], and cgMLST offers a simpler workflow and the advantage of a fixed nomenclature within the same scheme. However, it has been shown that the method used for assembly can significantly affect results of cgMLST and even cause the counterintuitive phenomenon of introducing false differences not supported by high quality SNP analysis [30]. We believe that cgMLST should be advocated for routine analysis of genetic relatedness in C. difficile, but researchers and laboratories should be aware of the need to also validate the assembler software and that unexpected results might need to be confirmed with SNP analysis. It would be beneficial if a fixed cgMLST scheme was set for C. difficile, and other species, and used by different software providers to increase our understanding of worldwide epidemiology.
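The correspondence between the two relatedness limits discussed above (≤6 allelic differences and ≤10 SNP differences) can be checked pair by pair, as in the following sketch. The first two pairs use the 1928D allelic and SNP differences reported in the Results; the third pair is hypothetical and included only to show a distant comparison.

```python
ALLELE_RELATED_MAX = 6  # cgMLST limit for closely related isolates
SNP_RELATED_MAX = 10    # isolates with >10 SNP differences treated as distinct [7]


def related_by_cgmlst(allelic_differences: int) -> bool:
    return allelic_differences <= ALLELE_RELATED_MAX


def related_by_snp(snp_differences: int) -> bool:
    return snp_differences <= SNP_RELATED_MAX


def concordant(allelic_differences: int, snp_differences: int) -> bool:
    """True when both methods classify the pair the same way."""
    return related_by_cgmlst(allelic_differences) == related_by_snp(snp_differences)


pairs = {
    "E7/H13": (3, 3),                # reported: 3 alleles (1928D), 3 SNPs
    "H5/H17": (2, 7),                # reported: 2 alleles (1928D), 7 SNPs
    "hypothetical distant pair": (150, 600),  # illustrative values only
}
results = {name: concordant(alleles, snps) for name, (alleles, snps) in pairs.items()}
print(results)
```

A pair classified as related by one method but distinct by the other would be a candidate for the kind of confirmatory analysis mentioned above.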

The close genetic similarity of the isolates from the pig farms, both between farms and over time, indicates a conserved population of RT046 in the sampled farms. The low rate of genetic mutations, indicated by the fact that isolates separated by 5 years were inseparable both in cgMLST and in SNP analysis, was striking and possibly due to dormant spore status. Importantly, the hospital outbreak [15] cluster was clearly separated from the pig and environmental cluster. The majority of the clinical isolates were more closely related to the hospital outbreak strains than to the pig and environmental strains (Fig 1), indicating that hospital transmission is an important route of acquisition of C. difficile RT046 in Sweden. However, two human isolates (H10 and H11) were intermingled in the pig and environmental cluster, indicating transmission of C. difficile between pigs and humans. This has also been observed for other RTs [10,12]. Knetsch et al suggest a bilateral transmission [12] of C. difficile, which in this study may be supported by the environmental isolate E7 which was far more similar to a human isolate than to the pig isolates. However, the high genetic stability over time in the population of C. difficile on the farms makes it difficult to draw any conclusions on the direction and possible time of transmission in our study.

The clonal spread and low mutation frequency [7,31] in C. difficile, combined with the ability to form spores, are factors that probably contribute to the similarity of isolates both over time and geographically. This similarity also includes isolates without known epidemiological links, as illustrated in our study by isolates H5/H17 and E7/H13. Unexplained close relationships like this have also been described by others [7,10,12]. Of course, unknown common sources and ways of transmission must be considered. The risk of unknown epidemiological connections is also one of the limitations of this study. Only basic epidemiological data was available, such as year of isolation, source of the isolate and whether some of the isolates were isolated during ongoing transmission. Additionally, a selection of the routinely collected isolates was performed to achieve a high diversity in the collection of human isolates, which may have hampered the possibility to reveal local spread. Another limitation of our study is that C. difficile has a relatively small proportion of core genome compared with other species [32] and that all the analyses focus on differences in the core genome (also for the SNP analysis, although with a larger core). Muñoz et al [33] found incongruence in the phylogenetic tree topology when comparing core genome with accessory genome analysis. Since the core genome is so stable over time [32], changes in the accessory genome could in theory be more discriminating. Long read sequencing, which is now becoming more accessible, could in the future help to further increase the discriminatory power of WGS in C. difficile by making it easier to track mobile elements in the genome [24].

To conclude, this study analysed RT046 C. difficile isolated from different sources in Sweden from 2010 to 2017. No close connection between isolates from pig farms and a large hospital outbreak was found, but two human isolates (not related to the outbreak) clustered together with pig isolates, suggesting zoonotic transmission. Two different cgMLST schemes showed similar results and an SNP analysis yielded similar genetic resolution. Our study confirms the need for high-resolution typing (i.e. WGS) to map transmission of C. difficile both in hospital and in the surrounding environment.

Supporting information

S1 Table. Summary of all isolates.

All isolates presented with location of isolation, year of isolation, European Nucleotide Archive accession, average coverage, and percentage of good cgMLST targets.

(XLSX)

S2 Table. 1928 Core genes.

List of core genes of the 1928 Diagnostics platform core genome multilocus sequence typing scheme. Gene names are the annotated genes from the strain 630 delta erm seed genome (NCBI RefSeq assembly accession number GCF_002080065.1), and all annotated genes and their exact positions can be found on the chromosomal unit of the genome (NCBI RefSeq accession number NZ_CP016318.1).

(XLSX)

S3 Table. Ridom SeqSphere+ cgMLST distance matrix.

Distance matrix showing pairwise comparison of allelic differences between all 47 isolates based on the 2,147 core genes in the Ridom SeqSphere+ core genome multilocus sequence typing (cgMLST) scheme and 7 MLST genes.

(XLSX)

S4 Table. 1928 Diagnostics cgMLST distance matrix.

Distance matrix showing pairwise comparison of allelic differences between all 47 isolates based on the 2,631 core genes in the 1928 Diagnostics core genome multilocus sequence typing (cgMLST) scheme.

(XLSX)

S5 Table. SNP matrix.

Single nucleotide polymorphism matrix of all isolates based on the 1928 Diagnostics analysis.

(XLSX)

Acknowledgments

We would like to thank the Public Health Agency of Sweden and especially Kristina Rizzardi and Thomas Åkerlund for access to isolates, as well as all laboratories in Sweden that contribute to the national surveillance programme for C. difficile. Our thanks also go to Andreas Matussek and colleagues at the Department of Clinical Microbiology of the County Hospital Ryhov, Jönköping, for access to isolates, and Theresa Ennefors for technical assistance.

Data Availability

All genome sequences in this study were deposited in the European Nucleotide Archive (ENA) under study accession number PRJEB34857. Accession of individual isolates is presented in S1 Table.

Funding Statement

K.J., P.M., T.N. and M.S. have received funding for this study from the Research Committee of Region Örebro County (grant OLL-674241). The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. D.A. and F.D. are employees of 1928 Diagnostics. 1928 Diagnostics provided support in the form of salaries for D.A. and F.D. and made the analysis available for free, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.

References

  • 1. Wiegand PN, Nathwani D, Wilcox MH, Stephens J, Shelbaya A, Haider S. Clinical and economic burden of Clostridium difficile infection in Europe: a systematic review of healthcare-facility-acquired infection. Journal of Hospital Infection. 2012;81(1):1–14. doi: 10.1016/j.jhin.2012.02.004
  • 2. Balsells E, Shi T, Leese C, Lyell I, Burrows J, Wiuff C, et al. Global burden of Clostridium difficile infections: a systematic review and meta-analysis. J Glob Health. 2019;9(1):010407. doi: 10.7189/jogh.09.010407
  • 3. Kuijper EJ, Coignard B, Tull P. Emergence of Clostridium difficile-associated disease in North America and Europe. Clin Microbiol Infect. 2006;12 Suppl 6:2–18. doi: 10.1111/j.1469-0691.2006.01580.x
  • 4. Lessa FC, Mu Y, Bamberg WM, Beldavs ZG, Dumyati GK, Dunn JR, et al. Burden of Clostridium difficile Infection in the United States. New England Journal of Medicine. 2015;372(9):825–34.
  • 5. Knetsch CW, Lawley TD, Hensgens MP, Corver J, Wilcox MW, Kuijper EJ. Current application and future perspectives of molecular typing methods to study Clostridium difficile infections. Euro Surveill. 2013;18(4):20381. doi: 10.2807/ese.18.04.20381-en
  • 6. Dominguez SR, Anderson LJ, Kotter CV, Littlehorn CA, Arms LE, Dowell E, et al. Comparison of Whole-Genome Sequencing and Molecular-Epidemiological Techniques for Clostridium difficile Strain Typing. J Pediatric Infect Dis Soc. 2016;5(3):329–32. doi: 10.1093/jpids/piv020
  • 7. Eyre DW, Cule ML, Wilson DJ, Griffiths D, Vaughan A, O'Connor L, et al. Diverse sources of C. difficile infection identified on whole-genome sequencing. N Engl J Med. 2013;369(13):1195–205. doi: 10.1056/NEJMoa1216064
  • 8. Koene MG, Mevius D, Wagenaar JA, Harmanus C, Hensgens MP, Meetsma AM, et al. Clostridium difficile in Dutch animals: their presence, characteristics and similarities with human isolates. Clin Microbiol Infect. 2012;18(8):778–84. doi: 10.1111/j.1469-0691.2011.03651.x
  • 9. Yaeger M, Funk N, Hoffman L. A survey of agents associated with neonatal diarrhea in Iowa swine including Clostridium difficile and porcine reproductive and respiratory syndrome virus. J Vet Diagn Invest. 2002;14(4):281–7. doi: 10.1177/104063870201400402
  • 10. Knight DR, Squire MM, Collins DA, Riley TV. Genome Analysis of Clostridium difficile PCR Ribotype 014 Lineage in Australian Pigs and Humans Reveals a Diverse Genetic Repertoire and Signatures of Long-Range Interspecies Transmission. Frontiers in Microbiology. 2017;7:2138. doi: 10.3389/fmicb.2016.02138
  • 11. Knetsch CW, Connor TR, Mutreja A, van Dorp SM, Sanders IM, Browne HP, et al. Whole genome sequencing reveals potential spread of Clostridium difficile between humans and farm animals in the Netherlands, 2002 to 2011. Euro Surveill. 2014;19(45):20954. doi: 10.2807/1560-7917.es2014.19.45.20954
  • 12. Knetsch CW, Kumar N, Forster SC, Connor TR, Browne HP, Harmanus C, et al. Zoonotic Transfer of Clostridium difficile Harboring Antimicrobial Resistance between Farm Animals and Humans. J Clin Microbiol. 2018;56(3). doi: 10.1128/JCM.01384-17
  • 13. Obuch-Woszczatynski P, Dubiel G, Harmanus C, Kuijper E, Duda U, Wultanska D, et al. Emergence of Clostridium difficile infection in tuberculosis patients due to a highly rifampicin-resistant PCR ribotype 046 clone in Poland. Eur J Clin Microbiol Infect Dis. 2013;32(8):1027–30. doi: 10.1007/s10096-013-1845-5
  • 14. Plaza-Garrido A, Barra-Carrasco J, Macias JH, Carman R, Fawley WN, Wilcox MH, et al. Predominance of Clostridium difficile ribotypes 012, 027 and 046 in a university hospital in Chile, 2012. Epidemiol Infect. 2016;144(5):976–9. doi: 10.1017/S0950268815002459
  • 15. Rizzardi K, Norén T, Aspevall O, Mäkitalo B, Toepfer M, Johansson Å, et al. National Surveillance for Clostridioides difficile Infection, Sweden, 2009–2016. Emerging Infectious Diseases. 2018;24(9):1617–25. doi: 10.3201/eid2409.171658
  • 16. Noren T, Johansson K, Unemo M. Clostridium difficile PCR ribotype 046 is common among neonatal pigs and humans in Sweden. Clin Microbiol Infect. 2014;20(1):O2–6. doi: 10.1111/1469-0691.12296
  • 17. Zerbino DR. Using the Velvet de novo assembler for short-read sequencing technologies. Curr Protoc Bioinformatics. 2010;Chapter 11:Unit 11.5. doi: 10.1002/0471250953.bi1105s31
  • 18. Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol. 2012;19(5):455–77. doi: 10.1089/cmb.2012.0021
  • 19. Bletz S, Janezic S, Harmsen D, Rupnik M, Mellmann A. Defining and Evaluating a Core Genome Multilocus Sequence Typing Scheme for Genome-Wide Typing of Clostridium difficile. J Clin Microbiol. 2018;56(6). doi: 10.1128/JCM.01987-17
  • 20. Tian T-T, Zhao J-H, Yang J, Qiang C-X, Li Z-R, Chen J, et al. Molecular Characterization of Clostridium difficile Isolates from Human Subjects and the Environment. PLoS One. 2016;11(3):e0151964. doi: 10.1371/journal.pone.0151964
  • 21. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25(14):1754–60. doi: 10.1093/bioinformatics/btp324
  • 22. Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv preprint arXiv:1303.3997. 2013.
  • 23. Garrison E, Marth G. Haplotype-based variant detection from short-read sequencing. arXiv preprint arXiv:1207.3907. 2012.
  • 24. Schurch AC, Arredondo-Alonso S, Willems RJL, Goering RV. Whole genome sequencing options for bacterial strain typing and epidemiologic analysis based on single nucleotide polymorphism versus gene-by-gene-based approaches. Clin Microbiol Infect. 2018;24(4):350–4. doi: 10.1016/j.cmi.2017.12.016
  • 25. Cairns MD, Preston MD, Hall CL, Gerding DN, Hawkey PM, Kato H, et al. Comparative Genome Analysis and Global Phylogeny of the Toxin Variant Clostridium difficile PCR Ribotype 017 Reveals the Evolution of Two Independent Sublineages. J Clin Microbiol. 2017;55(3):865–76. doi: 10.1128/JCM.01296-16
  • 26. Eyre DW, Golubchik T, Gordon NC, Bowden R, Piazza P, Batty EM, et al. A pilot study of rapid benchtop sequencing of Staphylococcus aureus and Clostridium difficile for outbreak detection and surveillance. BMJ Open. 2012;2(3). doi: 10.1136/bmjopen-2012-001124
  • 27. Siira L, Naseer U, Alfsnes K, Hermansen NO, Lange H, Brandal LT. Whole genome sequencing of Salmonella Chester reveals geographically distinct clusters, Norway, 2000 to 2016. Euro Surveill. 2019;24(4).
  • 28. de Been M, Pinholt M, Top J, Bletz S, Mellmann A, van Schaik W, et al. Core Genome Multilocus Sequence Typing Scheme for High-Resolution Typing of Enterococcus faecium. J Clin Microbiol. 2015;53(12):3788–97. doi: 10.1128/JCM.01946-15
  • 29.Pearce ME, Alikhan N-F, Dallman TJ, Zhou Z, Grant K, Maiden MCJ. Comparative analysis of core genome MLST and SNP typing within a European Salmonella serovar Enteritidis outbreak. Int J Food Microbiol. 2018;274:1–11. 10.1016/j.ijfoodmicro.2018.02.023 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Eyre DW, Peto TEA, Crook DW, Walker AS, Wilcox MH. Hash-Based Core Genome Multilocus Sequence Typing for Clostridium difficile. J Clin Microbiol. 2019;58(1):e01037–19. 10.1128/JCM.01037-19 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Didelot X, Eyre DW, Cule M, Ip CLC, Ansari MA, Griffiths D, et al. Microevolutionary analysis of Clostridium difficile genomes to investigate transmission. Genome biology. 2012;13(12):R118&R. 10.1186/gb-2012-13-12-r118 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Knight DR, Elliott B, Chang BJ, Perkins TT, Riley TV. Diversity and Evolution in the Genome of Clostridium difficile. Clin Microbiol Rev. 2015;28(3):721–41. 10.1128/CMR.00127-14 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Munoz M, Restrepo-Montoya D, Kumar N, Iraola G, Camargo M, Diaz-Arevalo D, et al. Integrated genomic epidemiology and phenotypic profiling of Clostridium difficile across intra-hospital and community populations in Colombia. Sci Rep. 2019;9(1):11293 10.1038/s41598-019-47688-2 [DOI] [PMC free article] [PubMed] [Google Scholar]

Decision Letter 0

Yung-Fu Chang

8 Jul 2020

PONE-D-20-18714

Whole genome sequencing of Clostridioides difficile PCR ribotype 046 suggests transmission between pigs and humans

PLOS ONE

Dear Dr. Werner,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Your manuscript has been reviewed by two experts in your field. Based on their comments, a major revision is needed before a decision can be made.

Please submit your revised manuscript within 4 weeks. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Yung-Fu Chang

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that you are reporting an analysis of a microarray, next-generation sequencing, or deep sequencing data set. PLOS requires that authors comply with field-specific standards for preparation, recording, and deposition of data in repositories appropriate to their field. Please upload these data to a stable, public repository (such as ArrayExpress, Gene Expression Omnibus (GEO), DNA Data Bank of Japan (DDBJ), NCBI GenBank, NCBI Sequence Read Archive, or EMBL Nucleotide Sequence Database (ENA)). In your revised cover letter, please provide the relevant accession numbers that may be used to access these data. For a full list of recommended repositories, see http://journals.plos.org/plosone/s/data-availability#loc-omics or http://journals.plos.org/plosone/s/data-availability#loc-sequencing.

3. Thank you for stating the following in the Competing Interests:

"D.A. and F.D. are employees of 1928 Diagnostics. A.W., P.M., A.F., K.J., M.S., and T.N. declare no conflict of interest."

We note that one or more of the authors are employed by a commercial company: 1928 Diagnostics.

3.1. Please provide an amended Funding Statement declaring this commercial affiliation, as well as a statement regarding the Role of Funders in your study. If the funding organization did not play a role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript and only provided financial support in the form of authors' salaries and/or research materials, please review your statements relating to the author contributions, and ensure you have specifically and accurately indicated the role(s) that these authors had in your study. You can update author roles in the Author Contributions section of the online submission form.

Please also include the following statement within your amended Funding Statement.

“The funder provided support in the form of salaries for authors [insert relevant initials], but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.”

If your commercial affiliation did play a role in your study, please state and explain this role within your updated Funding Statement.

3.2. Please also provide an updated Competing Interests Statement declaring this commercial affiliation along with any other relevant declarations relating to employment, consultancy, patents, products in development, or marketed products, etc. 

Within your Competing Interests Statement, please confirm that this commercial affiliation does not alter your adherence to all PLOS ONE policies on sharing data and materials by including the following statement: "This does not alter our adherence to  PLOS ONE policies on sharing data and materials.” (as detailed online in our guide for authors http://journals.plos.org/plosone/s/competing-interests) . If this adherence statement is not accurate and  there are restrictions on sharing of data and/or materials, please state these. Please note that we cannot proceed with consideration of your article until this information has been declared.

Please include both an updated Funding Statement and Competing Interests Statement in your cover letter. We will change the online submission form on your behalf.

Please know it is PLOS ONE policy for corresponding authors to declare, on behalf of all authors, all potential competing interests for the purposes of transparency. PLOS defines a competing interest as anything that interferes with, or could reasonably be perceived as interfering with, the full and objective presentation, peer review, editorial decision-making, or publication of research or non-research articles submitted to one of the journals. Competing interests can be financial or non-financial, professional, or personal. Competing interests can arise in relationship to an organization or another person. Please follow this link to our website for more details on competing interests: http://journals.plos.org/plosone/s/competing-interests

4. We note that you have included the phrase “data not shown” in your manuscript. Unfortunately, this does not meet our data sharing requirements. PLOS does not permit references to inaccessible data. We require that authors provide all relevant data within the paper, Supporting Information files, or in an acceptable, public repository. Please add a citation to support this phrase or upload the data that corresponds with these findings to a stable repository (such as Figshare or Dryad) and provide any URLs, DOIs, or accession numbers that may be used to access these data. Or, if the data are not a core part of the research being presented in your study, we ask that you remove the phrase that refers to these data.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: I Don't Know

Reviewer #2: N/A

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: PONE-D-20-18714

This paper compares genome sequences from C. difficile ribotype 046 isolates collected from pigs and humans by using SNP typing and two different schemes for cgMLST, one of which is novel. While some of the results may be interesting, the level of methodological detail given is insufficient to fully assess the conclusions presented.

Line 124: "For analysis in Ridom Seqsphere .... the reads were de novo assembled ... using Velvet ..." Please provide the version and parameter settings of the assembler software.

Line 127: How were the reads assembled for analysis on the 1928D platform?

Line 137: A "cgMLST scheme based on 2,631 core genes" apparently was used for analysis with the 1928D software. This cgMLST scheme appears to be novel. However, the structure of this cgMLST scheme is not reported anywhere in the manuscript, nor is any literature reference provided. Clearly, without more detailed information on this method, the results based on this new cgMLST which are presented here are of little use to the reader. At the least, information on the genes included in the scheme and the exact positions of gene sequences in relation to a publicly available reference genome sequence must be provided.

It is also not clear if the novel cgMLST scheme was evaluated in any way, e.g. by applying it to a reference set of genomes or strains.

Line 143: "The SNP analysis was performed using the 1928D platform" -- Unclear, which strategy was used for SNP analysis. Please provide transparent detail on the algorithm/software and parameters used.

Line 146: "Variants were called at 10x minimum coverage" -- Such low coverage is usually not considered sufficient for calling SNPs. Why was a different minimum coverage set in comparison to SeqSphere cgMLST analysis (50x, see Line 129)?

Line 147: Genomic regions affected by recombination may confound SNP-based phylogenetic analyses and therefore commonly are excluded. Why did the authors choose not to do so?

Line 155, Figure 1: The structure within the main clade, including all the pig isolates, cannot be discerned.

Line 159, Figure 2: It is difficult to compare the trees in Figure 1 (SeqSphere) and Figure 2 (1928D), because the former was drawn in a circular format and the other was drawn rectangular. Please provide both trees in the same format.

Line 233: "very similar numbers of allelic differences within the clusters" -- Are the numbers of allelic differences shown anywhere in the manuscript? This would be interesting.

Line 235: "differences in the algorithms used" [by SeqSphere and 1928D] -- The algorithm used by 1928D needs to be provided.

Line 242: "Similar performance of cgMLST and SNP analysis" -- Is this really true? Eyre et al. (J. Clin. Microbiol. 58: 01037-19) recently showed that cgMLST was inferior to SNP analysis for identifying closely related C. difficile isolates.

Line 257: "two human isolates were intermingled in the pig and environmental cluster" -- Where and when had these human specific isolates been collected? Had they been part of the national survey, with no connection to the area around the pig farms? Please provide this information in the manuscript.

The sequence data is not available at ENA under the accession number indicated.

Reviewer #2: In the present manuscript authors describe results of WGS comparison of 47 strains obtained from humans, pigs and the farm environment, using two different approaches, cgMLST, with two different schemes, and SNV analysis.

It is an interesting paper, but I do have some comments.

Add a more detailed description of cgMLST on the 1928D platform. How were the sequences assembled, and which assembly software was used?

There is an updated Ridom SeqSphere cgMLST scheme that was released recently; please re-run the analysis with the updated scheme.

Lines 126 and 127: when were sequences trimmed, before or after assembly?

Add more info on what parameters were used for assessing the SNV. How were the genomes assembled, were there any quality trimming applied before?

Figure 1 and 2. Add number of different alleles between genomes. Also, mark the CC (at least for SeqSphere I know that ST that differ in less than 6 alleles can be shaded).

To maybe improve the resolution you could also include part (core) of the accessory genome for the cgMLST analysis – this can be done using SeqSphere.

Line 53-54: change the sentence…in this paper Eyre et al. did not unequivocally show that transmission outside the healthcare system is an important way for acquisition of CD. They suggested that there is a large, genetically diverse reservoir outside the hospital setting.

Add info on ENA accession number to Materials/Method section.

Line 174 and 204: Change …by 0 and 204… to …from 0 to 204….

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Dec 21;15(12):e0244227. doi: 10.1371/journal.pone.0244227.r002

Author response to Decision Letter 0


18 Oct 2020

Response to reviewer and academic editor.

Thank you for reviewing our manuscript and for the constructive comments.

We now hope we have corrected the style requirements we missed the first time around.

Regarding the availability of data, all our data have always been deposited in the EMBL Nucleotide Sequence Database (ENA) but were held private until we felt that we were close to publication. We have now made all sequence data publicly available. As stated in the manuscript, the sequences can be found under project accession number PRJEB34857 (individual sequence accession numbers are available in S1 Table).

Thank you for clarifying how you want the competing interest and funding statement to be expressed. We would like to adjust them to the following:

Funding statement:

K.J., P.M., T.N. and M.S. have received funding for this study from the Research Committee of Region Örebro County (grant OLL-674241). The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

D.A. and F.D. are employees of 1928 Diagnostics. 1928 Diagnostics provided support in the form of salaries for D.A. and F.D. and made the analysis available for free, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.

Competing interest statement.

D.A. and F.D. are employees of 1928 Diagnostics. This does not alter our adherence to PLOS ONE policies on sharing data and materials. A.W., P.M., A.F., K.J., M.S., and T.N. declare no competing interests.

Because PLOS ONE does not allow references to data not shown, we have added two more distance matrices as supplementary files (S3 Table and S4 Table) covering all the data formerly referred to as data not shown.

Below we have responded to each question raised by the reviewers. For clarity, the reviewers’ questions are marked in bold lettering.

Reviewer #1:

This paper compares genome sequences from C. difficile ribotype 046 isolates collected from pigs and humans by using SNP typing and two different schemes for cgMLST, one of which is novel. While some of the results may be interesting, the level of methodological detail given is insufficient to fully assess the conclusions presented.

Additional methodological information has been inserted into the text, along with supplementary items, for better understanding. Care has also been taken not to let the methodological issues of comparing the different methods overshadow the findings on how the isolates relate to each other.

Line 124: "For analysis in Ridom Seqsphere .... the reads were de novo assembled ... using Velvet ..." Please provide the version and parameter settings of the assembler software.

We have changed the sentence to: “For analysis in Ridom™ SeqSphere+ the reads were de novo assembled through a pipeline using Velvet version 1.1.04 [17] using default settings and sequences were trimmed before assembly until the average Phred quality score was 30 in a window of 20 bases.”
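The trimming rule quoted above can be illustrated with a small sketch. This is not the SeqSphere+ implementation, which the response does not describe; it assumes one plausible reading of the rule, in which bases are removed from each end of a read until a 20-base window averages at least Phred 30. The function name `trim_read` and the handling of short reads are hypothetical choices made for illustration.

```python
def trim_read(quals, window=20, min_mean=30):
    """Return (start, end) of the slice kept after trimming both ends.

    Assumed rule (illustrative only): slide inward from each end until
    the mean Phred score of a `window`-base window reaches `min_mean`.
    """
    n = len(quals)
    if n < window:
        return (0, 0)  # too short to evaluate; this sketch drops the read

    # Trim the 5' end: advance until the leading window is good enough.
    start = 0
    while start + window <= n and sum(quals[start:start + window]) / window < min_mean:
        start += 1
    if start + window > n:
        return (0, 0)  # no window anywhere in the read passed

    # Trim the 3' end: retreat until the trailing window is good enough.
    end = n
    while end - window >= start and sum(quals[end - window:end]) / window < min_mean:
        end -= 1
    return (start, end)


# Toy read: low-quality ends around a high-quality core.
example_quals = [10] * 5 + [35] * 30 + [10] * 5
kept = trim_read(example_quals)
```

Because the criterion is a window average, a few low-quality bases can remain at the ends as long as the window they sit in still averages 30 or more.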

Line 127: How were the reads assembled for analysis on the 1928D platform?

We have added the following text to clarify the assembly in the 1928D platform: “The 1928 platform’s cgMLST method uses a custom developed allele calling algorithm based on an alignment free k-mer approach. For novel allele extraction, and database acceptance, local assembly is used to validate gene structure using SPAdes version 3.11.1 [18].”
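The 1928D allele-calling algorithm itself is custom and not described in detail here, so the following is only a toy illustration of what an alignment-free k-mer approach can look like in general. It assumes a known allele is called when every one of its k-mers is present in the reads; the function names, the value of k, and the decision rule are all hypothetical, and reverse complements and sequencing errors are ignored.

```python
def kmers(seq, k):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def call_allele(reads, alleles, k=5):
    """Call the allele whose k-mers are all observed in the reads.

    `alleles` maps allele id -> sequence for one locus. Returns the
    allele id if exactly one known allele is fully supported, else None
    (in a real pipeline, a candidate for novel-allele handling).
    """
    read_kmers = set()
    for read in reads:
        read_kmers |= kmers(read, k)

    supported = [
        allele_id
        for allele_id, seq in sorted(alleles.items())
        if kmers(seq, k) and kmers(seq, k) <= read_kmers
    ]
    return supported[0] if len(supported) == 1 else None


# Two toy alleles of one locus differing by a single base.
toy_alleles = {1: "ACGTACGGTCA", 2: "ACGTACGTTCA"}
toy_reads = ["ACGTACGG", "TACGGTCA"]
called = call_allele(toy_reads, toy_alleles)
```

No read is aligned to a reference at any point; the call rests entirely on set membership of short substrings, which is what makes this family of methods fast.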

Line 137: A "cgMLST scheme based on 2,631 core genes" apparently was used for analysis with the 1928D software. This cgMLST scheme appears to be novel. However, the structure of this cgMLST scheme is not reported anywhere in the manuscript, nor is any literature reference provided. Clearly, without more detailed information on this method, the results based on this new cgMLST which are presented here are of little use to the reader. At the least, information on the genes included in the scheme and the exact positions of gene sequences in relation to a publicly available reference genome sequence must be provided. It is also not clear if the novel cgMLST scheme was evaluated in any way, e.g. by applying it to a reference set of genomes or strains.

We have added a supplementary table (S2 Table) that contains a list of all core genes in the 1928D cgMLST scheme, with gene names derived from the strain 630 delta erm seed genome. We have also added the text: “The seed genome for picking target genes belongs to strain 630 delta erm (NCBI RefSeq assembly accession number GCF_002080065.1). The 1928 cgMLST scheme was created with the purpose to be able to analyse sequence types (STs) 1, 35, 3 and 37. ST 1 corresponds to hypervirulent RT027 [3], ST 35 corresponds to RT046 which historically has been common in Sweden [15] and STs 3 and 37 are common in certain demographic groups [20]. The scheme has also been observed to perform well for STs 81, 41, 54 and 55.” This text also contains a new reference.

Line 143: "The SNP analysis was performed using the 1928D platform" -- Unclear, which strategy was used for SNP analysis. Please provide transparent detail on the algorithm/software and parameters used.

We have added the text “1928D’s SNP analysis pipeline uses Burrows-Wheeler Aligner version 0.7.17-r1188 [21, 22] for read alignment and FreeBayes version 1.3.2 [23] for variant calling.” and references for the software used.
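For orientation, a read-mapping and variant-calling workflow built from the two named tools might be assembled roughly as below. These are not the command lines used on the 1928D platform, which are not given; the file names, the use of samtools for sorting and indexing, the haploid ploidy setting, and the choice to express the 10x threshold through FreeBayes' `--min-coverage` option are all assumptions made for illustration. The sketch only builds the argument lists and does not execute anything.

```python
import shlex


def build_snp_commands(ref, reads_1, reads_2, sample, min_cov=10, threads=4):
    """Build illustrative argv lists for a BWA-MEM + FreeBayes workflow.

    Assumptions (not from the source): samtools handles sorting and
    indexing, and the isolate is treated as haploid (-p 1).
    """
    bam = f"{sample}.sorted.bam"
    return [
        ["bwa", "index", ref],                                  # index the reference
        ["bwa", "mem", "-t", str(threads), ref, reads_1, reads_2],  # SAM on stdout
        ["samtools", "sort", "-o", bam, "-"],                   # SAM on stdin
        ["samtools", "index", bam],
        ["freebayes", "-f", ref, "-p", "1",
         "--min-coverage", str(min_cov), bam],                  # VCF on stdout
    ]


# Print the commands as shell-quoted strings for inspection.
for argv in build_snp_commands("ref.fasta", "iso_1.fastq", "iso_2.fastq", "iso"):
    print(shlex.join(argv))
```

The second and third commands would be connected by a pipe in practice, and the FreeBayes output redirected to a VCF file; both steps are left out to keep the sketch free of side effects.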

Line 146: "Variants were called at 10x minimum coverage" -- Such low coverage is usually not considered sufficient for calling SNPs. Why was a different minimum coverage set in comparison to SeqSphere cgMLST analysis (50x, see Line 129)?

We have changed some of the wording in the sentence in line 146 to clarify, but we maintain that 10x is sufficient for variant calling. The reference cited by reviewer #1 in the comment on line 242 used a minimum of 5 reads to call a SNP, and Bush et al. (Genomic diversity affects the accuracy of bacterial single-nucleotide polymorphism–calling pipelines, GigaScience, Volume 9, Issue 2, February 2020) also used a minimum of 5 reads, a lower threshold than ours. The minimum 50x coverage referred to in line 129 is the average coverage for the whole genome and is used as a measure of overall sequence quality, whereas the minimum 10x coverage is the coverage at a specific base and ensures that only high-quality SNPs are included in our analysis. To clarify this, we have also changed the wording in line 129.
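The distinction between the two thresholds can be made concrete with a few lines of code. This is a minimal sketch using made-up per-base depths; the function names are hypothetical, and only the 50x and 10x cut-offs come from the response.

```python
def mean_depth(depths):
    """Average read depth across all positions (genome-wide QC metric)."""
    return sum(depths) / len(depths)


def passes_sample_qc(depths, min_mean=50):
    """Sample-level check: is the average coverage at least `min_mean`?"""
    return mean_depth(depths) >= min_mean


def callable_positions(depths, min_depth=10):
    """Site-level check: positions deep enough for a variant to be called."""
    return [pos for pos, depth in enumerate(depths) if depth >= min_depth]


# Toy per-base depths for a five-position "genome".
toy_depths = [60, 80, 8, 55, 47]
sample_ok = passes_sample_qc(toy_depths)   # average is exactly 50x
usable = callable_positions(toy_depths)    # position 2 (8x) is excluded
```

A sample can therefore pass the 50x average requirement and still contain individual positions that are excluded from SNP calling because their local depth falls below 10x.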

Line 147: Genomic regions affected by recombination may confound SNP-based phylogenetic analyses and therefore commonly are excluded. Why did the authors choose not to do so?

Manual analysis of recombination filtering results using Gubbins (v2.3.4) showed minimal variant differences that affected neither the topology of the tree nor the relations between samples. The evaluation used the analysis results coming directly out of the 1928 platform.

Line 155, Figure 1: The structure within the main clade, including all the pig isolates, cannot be discerned.

Since the pig isolates are so closely related, it is hard to visualise them in a tree structure with all isolates included; it is for this reason that Fig 4 is a minimum spanning tree. We tried many different options and found that a minimum spanning tree was the best way to present the data. To clarify the relationship between the isolates, we also made Fig 4 include all isolates when we remade the figure with version 2 of the Ridom SeqSphere cgMLST scheme.
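As a rough illustration of why a minimum spanning tree suits closely related isolates, the sketch below builds one from a small table of pairwise allelic distances using Prim's algorithm. The isolate names and distances are invented, and this is not how Ridom SeqSphere+ constructs its trees internally; it only shows the general technique.

```python
def symmetric_distances(pairs):
    """Build a nested dict dist[a][b] from {(a, b): distance} pairs."""
    dist = {}
    for (a, b), d in pairs.items():
        dist.setdefault(a, {})[b] = d
        dist.setdefault(b, {})[a] = d
    return dist


def minimum_spanning_tree(names, dist):
    """Prim's algorithm: return a list of (from, to, distance) edges."""
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Cheapest edge joining a node in the tree to one outside it.
        d, u, v = min(
            (dist[u][v], u, v)
            for u in in_tree
            for v in names
            if v not in in_tree
        )
        edges.append((u, v, d))
        in_tree.add(v)
    return edges


# Invented example: three near-identical pig isolates, one distant human isolate.
toy_names = ["pig_A", "pig_B", "pig_C", "human_X"]
toy_dist = symmetric_distances({
    ("pig_A", "pig_B"): 1, ("pig_A", "pig_C"): 2, ("pig_B", "pig_C"): 1,
    ("pig_A", "human_X"): 80, ("pig_B", "human_X"): 79, ("pig_C", "human_X"): 81,
})
toy_edges = minimum_spanning_tree(toy_names, toy_dist)
```

Each edge carries its allelic distance as a label, so small differences within a tight cluster stay readable even when a distant isolate is present, which a branch-length-scaled tree would compress.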

Line 159, Figure 2: It is difficult to compare the trees in Figure 1 (SeqSphere) and Figure 2 (1928D), because the former was drawn in a circular format and the other was drawn rectangular. Please provide both trees in the same format.

We have tried both configurations of the tree for figure 1 and we feel that for the Ridom SeqSphere analysis the circular is easiest to read and shows the clusters in relation to each other in a good way. Unfortunately, it is not possible to make figure 2 circular. We feel that making each figure as easy to read as possible is preferable even with the small disadvantage of having the trees configured in different ways.

Line 233: "very similar numbers of allelic differences within the clusters" -- Are the numbers of allelic differences shown anywhere in the manuscript? This would be interesting.

All pairwise comparisons for all three analyses are now available in S3 Table, S4 Table and S5 Table.
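A pairwise distance table of this kind can be derived from cgMLST profiles by counting the loci at which two isolates carry different alleles. The sketch below assumes that loci missing in either isolate are ignored for that pair, which is a common convention but is not stated in the response; the profiles and names are invented for illustration.

```python
from itertools import combinations


def allele_distance(profile_a, profile_b):
    """Count loci with different alleles, skipping loci missing (None) in either."""
    return sum(
        1
        for a, b in zip(profile_a, profile_b)
        if a is not None and b is not None and a != b
    )


def distance_matrix(profiles):
    """Return a nested dict of pairwise allelic distances between isolates."""
    names = sorted(profiles)
    matrix = {n: {m: 0 for m in names} for n in names}
    for p, q in combinations(names, 2):
        d = allele_distance(profiles[p], profiles[q])
        matrix[p][q] = matrix[q][p] = d
    return matrix


# Invented five-locus profiles; None marks a locus that was not called.
toy_profiles = {
    "pig_A": [1, 1, 1, 1, None],
    "pig_B": [1, 1, 2, 1, 3],
    "human_X": [4, 5, 2, 6, 3],
}
toy_matrix = distance_matrix(toy_profiles)
```

With real schemes the same count runs over a few thousand loci, and thresholds such as the 6-allele cut-off mentioned by reviewer #2 are applied to the resulting numbers.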

Line 235: "differences in the algorithms used" [by SeqSphere and 1928D] -- The algorithm used by 1928D needs to be provided.

Please see our answers to lines 127 and 143.

Line 242: "Similar performance of cgMLST and SNP analysis" -- Is this really true? Eyre et al. (J. Clin. Microbiol. 58: 01037-19) recently showed that cgMLST was inferior to SNP analysis for identifying closely related C. difficile isolates.

The sentence in line 242 refers to species other than C. difficile, but we have read the article referred to and found it a good reference to include in the discussion section of our article. It highlights some of the difficulties in analysing whole genome sequencing data, and we think it strengthens our decision to include more than one analysis scheme in the article, including a SNP analysis.

Line 257: "two human isolates were intermingled in the pig and environmental cluster" -- Where and when had these human specific isolates been collected? Had they been part of the national survey, with no connection to the area around the pig farms? Please provide this information in the manuscript.

The isolates are discussed in the results section. To make the connection clearer for the reader, we have named the isolates in the discussion section. The time of isolation was already given in the results section, but we have also added the following text: “these isolates were collected as a part of the yearly national survey and not in the same county as the pig farms are situated”.

The sequence data is not available at ENA under the accession number indicated.

This has been corrected as described above as a response to the editor’s questions.

Reviewer #2:

Add a more detailed description of cgMLST on the 1928D platform. How were the sequences assembled, and which assembly software was used?

Please see our answer to reviewer #1 Line 137.

There is an updated Ridom SeqSphere cgMLST scheme that was released recently; please re-run the analysis with the updated scheme.

We have now rerun the Ridom SeqSphere analysis with version 2 of the cgMLST scheme, updated all numbers of allelic differences in the manuscript, and updated figures 1 and 4.

Lines 126 and 127: when were sequences trimmed, before or after assembly?

The sequences were trimmed before assembly; this information has been added to the manuscript.

Add more info on what parameters were used for assessing the SNV. How were the genomes assembled, were there any quality trimming applied before?

Please see our answer to reviewer #1 Line 143.

Figure 1 and 2. Add number of different alleles between genomes. Also, mark the CC (at least for SeqSphere I know that ST that differ in less than 6 alleles can be shaded).

Configuring the figures so that they are easy to read in the article is challenging. We have tried multiple configurations in Ridom SeqSphere and found that adding the requested information makes them harder to interpret. All allelic differences can be found in the newly added S3 and S4 Tables. In Fig 4 we display the number of allelic differences in the minimum spanning trees for the Ridom SeqSphere+ analysis, and the CC is shaded; we believe that the most important information can be extracted from this figure.

To maybe improve the resolution you could also include part (core) of the accessory genome for the cgMLST analysis – this can be done using SeqSphere.

We appreciate the suggestion and have looked into it, but we believe it to be outside the scope of this article, since it would shift the focus even further towards comparing different typing methods, whereas our primary focus is intended to be the relationship between the isolates. We have tried analysing the isolates with part of the accessory genome included in Ridom SeqSphere, and it did not change any of our findings.

Line 53-54: change the sentence…in this paper Eyre et al. did not unequivocally show that transmission outside the healthcare system is an important way for acquisition of CD. They suggested that there is a large, genetically diverse reservoir outside the hospital setting.

We have changed the sentence to “and has revealed that transmission from symptomatic patients in the healthcare system only accounts for a small part of CDI cases, this suggests that asymptomatic carriage or environmental sources play an important role in the transmission of C. difficile CDI [7]”.

Add info on ENA accession number to Materials/Method section.

As we interpret the submission guidelines, the data availability section will be typeset with the article, so repeating the accession number in the Methods section seems unnecessary; for now we have made no changes. If it is required, we will gladly add a paragraph with the accession number.

Line 174 and 204: Change …by 0 and 204… to …from 0 to 204….

We have changed this according to the reviewer’s suggestion.

We would like to thank the reviewers and editor for their input; we feel it has greatly improved our manuscript.

Yours sincerely

Anders Werner

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 1

Yung-Fu Chang

6 Nov 2020

PONE-D-20-18714R1

Whole genome sequencing of Clostridioides difficile PCR ribotype 046 suggests transmission between pigs and humans

PLOS ONE

Dear Dr. Werner,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Your revised manuscript has been re-reviewed by the original reviewers and a major revision is still suggested.

Please submit your revised manuscript within 3 weeks. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Yung-Fu Chang

Academic Editor

PLOS ONE

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: I Don't Know

Reviewer #2: N/A

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: My most important point of critique of this manuscript is that it is based on a novel algorithm for cgMLST analysis that is insufficiently explained. While the authors have now added a supplementary table (S2_Table.xlsx) in response to my previous request, this table merely provides a list of gene names, which is useless to any reader without information on the precise positions of the sequence stretches that were evaluated. I had requested the same information in my previous review (see below). It seems that the authors wish to keep this information proprietary for some reason, but in that case the method cannot be reproduced by any other researchers and the results cannot be usefully compared to those from previously published cgMLST methods for C. difficile.

Previous comment:

Line 137: A "cgMLST scheme based on 2,631 core genes" apparently was used for analysis with the 1928D software. This cgMLST scheme appears to be novel. However, the structure of this cgMLST scheme is not reported anywhere in the manuscript, nor is any literature reference provided. Clearly, without more detailed information on this method, the results based on this new cgMLST which are presented here are of little use to the reader. At the least, information on the genes included in the scheme and the exact positions of gene sequences in relation to a publicly available reference genome sequence must be provided. It is also not clear if the novel cgMLST scheme was evaluated in any way, e.g. by applying it to a reference set of genomes or strains.

Previous response:

We have added a supplementary table (S2 Table) that contains a list of all core genes in the 1928D cgMLST scheme with gene names derived from the strain 630 delta erm seed genome.

Reviewer #2: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Dec 21;15(12):e0244227. doi: 10.1371/journal.pone.0244227.r004

Author response to Decision Letter 1


25 Nov 2020

Thank you for reviewing our manuscript again.

If we have understood the critique correctly, you want us to specify the exact position in the reference genome of each gene, thereby providing the reader with the sequences of all genes used in the cgMLST scheme. The gene names we have enclosed in S2 Table are the annotated gene names from the reference genome, so the exact position of each gene is publicly available in NCBI. It has never been our intention to keep this information proprietary. On the contrary, the reasoning behind providing the annotated gene names is to take advantage of a publicly available nomenclature, both to reduce the amount of information that needs to be restated in the article and to make everything as transparent as possible.

To make this clearer to the reader we have changed the supporting information legend of S2 Table to “S2 Table. 1928 Core genes. List of core genes of the 1928 diagnostic platform core genome multilocus sequence typing scheme, gene names are the annotated genes from the strain 630 delta erm seed genome (NCBI RefSeq assembly accession number GCF_002080065.1). All annotated genes and their exact positions can be found on the chromosomal unit of the genome (NCBI RefSeq assembly accession number NZ_CP016318.1).” We added the reference number for the chromosomal unit of the genome since it has to be selected in order to browse through the genes.
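As an illustration of how a reader might go from the gene names in S2 Table to coordinates on the reference chromosome, the following Python sketch parses a GFF3 annotation and looks up a list of gene names. It assumes the annotation has been obtained in GFF3 format; the excerpt embedded in the code, the gene names `geneA`, `geneB` and `geneC`, the locus tags, and all coordinates are placeholders and not real annotations from NZ_CP016318.1.

```python
import csv
import io

# Hypothetical GFF3 excerpt; gene names and coordinates are placeholders,
# not real annotations from NZ_CP016318.1.
GFF3_TEXT = """\
##gff-version 3
NZ_CP016318.1\tRefSeq\tgene\t100\t1420\t.\t+\t.\tID=gene-X_0001;Name=geneA;locus_tag=X_0001
NZ_CP016318.1\tRefSeq\tCDS\t100\t1420\t.\t+\t0\tID=cds-1;Parent=gene-X_0001
NZ_CP016318.1\tRefSeq\tgene\t1500\t2600\t.\t-\t.\tID=gene-X_0002;Name=geneB;locus_tag=X_0002
"""


def gene_positions(gff3_text):
    """Map each gene name to (sequence id, start, end, strand) from GFF3 'gene' rows."""
    positions = {}
    for row in csv.reader(io.StringIO(gff3_text), delimiter="\t"):
        # Skip directives, comments, and any feature that is not a gene.
        if len(row) != 9 or row[0].startswith("#") or row[2] != "gene":
            continue
        attributes = dict(
            field.split("=", 1) for field in row[8].split(";") if "=" in field
        )
        # Fall back to the locus tag when a gene has no Name attribute.
        name = attributes.get("Name") or attributes.get("locus_tag")
        if name:
            positions[name] = (row[0], int(row[3]), int(row[4]), row[6])
    return positions


positions = gene_positions(GFF3_TEXT)
scheme_genes = ["geneA", "geneB", "geneC"]  # hypothetical entries from a gene list
found = {g: positions[g] for g in scheme_genes if g in positions}
missing = [g for g in scheme_genes if g not in positions]
```

Reporting the `missing` list alongside the matches makes it visible when a gene name in the scheme has no counterpart in the annotation being searched.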

If we have misunderstood the critique in any way, or if anything remains unclear, we will gladly try our best to explain our work.

Kind regards

Anders Werner

Decision Letter 2

Yung-Fu Chang

7 Dec 2020

Whole genome sequencing of Clostridioides difficile PCR ribotype 046 suggests transmission between pigs and humans

PONE-D-20-18714R2

Dear Dr. Werner,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Yung-Fu Chang

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Acceptance letter

Yung-Fu Chang

11 Dec 2020

PONE-D-20-18714R2

Whole genome sequencing of Clostridioides difficile PCR ribotype 046 suggests transmission between pigs and humans

Dear Dr. Werner:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Yung-Fu Chang

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Table. Summary of all isolates.

    All isolates presented with location of isolation, year of isolation, European Nucleotide Archive accession, average coverage, and percentage of good cgMLST targets.

    (XLSX)

    S2 Table. 1928 Core genes.

    List of core genes of the 1928 diagnostic platform core genome multilocus sequence typing scheme. Gene names are the annotated genes from the strain 630 delta erm seed genome (NCBI RefSeq assembly accession number GCF_002080065.1); all annotated genes and their exact positions can be found on the chromosomal unit of the genome (NCBI RefSeq assembly accession number NZ_CP016318.1).

    (XLSX)

    S3 Table. Ridom SeqSphere+ cgMLST distance matrix.

    Distance matrix showing pairwise comparison of allelic differences between all 47 isolates based on the 2,147 core genes in the Ridom SeqSphere+ core genome multilocus sequence typing (cgMLST) scheme and 7 MLST genes.

    (XLSX)

    S4 Table. 1928 Diagnostics cgMLST distance matrix.

    Distance matrix showing pairwise comparison of allelic differences between all 47 isolates based on the 2,631 core genes in the 1928 Diagnostics core genome multilocus sequence typing (cgMLST) scheme.

    (XLSX)

    S5 Table. SNP matrix.

    Single nucleotide polymorphism matrix over all isolates based on the 1928 Diagnostics analysis.

    (XLSX)

    Attachment

    Submitted filename: Response to Reviewers.docx

    Data Availability Statement

    All genome sequences in this study were deposited in the European Nucleotide Archive (ENA) under study accession number PRJEB34857. Accession of individual isolates is presented in S1 Table.


    Articles from PLoS ONE are provided here courtesy of PLOS
