Data in Brief. 2022 Jul 8;43:108446. doi: 10.1016/j.dib.2022.108446

Pediococcus pentosaceus IMI 507025 genome sequencing data

Ivana Nikodinoska, Jenny Makkonen, Daniel Blande, Colm Moran
PMCID: PMC9294475  PMID: 35864877

Abstract

The genome sequence data for Pediococcus pentosaceus IMI 507025, an isolate from pickled cucumbers, are reported. The raw reads and the analysed genome were deposited at NCBI under BioProject accession number PRJNA814992. The assembly contained 17 contigs before trimming and 12 contigs after trimming. The total size of the genome was 1,795,439 bp, containing 1,811 genes in total, of which 1,751 were coding sequences. The identity of IMI 507025 was determined via average nucleotide identity (ANI): an identity value of 99.5994% between IMI 507025 and the type strain P. pentosaceus ATCC 33316 identified the strain as P. pentosaceus. Screening the genome of IMI 507025 for antimicrobial resistance (AMR) and virulence genes returned no hits, confirming the safety of the tested strain. No plasmids were detected.

Keywords: Microbial genome sequencing, Lactic acid bacteria, Antimicrobial resistance, Search for genes of concern

Specifications Table

Subject: Microbiology
Specific subject area: Microbial genomics
Type of data: Raw reads and analysed genome of Pediococcus pentosaceus IMI 507025
How the data were acquired: Illumina NovaSeq 6000, Unicycler v 0.4.8, PGAP v6.0, NCBI Bacterial AMR Reference Gene Database v. 2021–06–01.1, ResFinder, Virulence Factor Database (VFDB), PlasmidFinder.
Data format: Raw; analysed
Description of data collection: Pediococcus pentosaceus IMI 507025 was isolated from pickled cucumbers. The DNA extracted from a pure culture was sequenced on the NovaSeq 6000 platform (Illumina) to obtain information about the strain identity and safety.
Data source location: Institution: Alltech Inc.; City/Town/Region: Nicholasville, Kentucky; Country: USA
Data accessibility: Bioproject Accession Number: PRJNA814992; NCBI GenBank Accession Number: JALBYI000000000; NCBI SRA Accession Number: SRR18325428

Value of the Data

  • Members of the genus Pediococcus are commonly associated with the microbiota of various forage crops and have an important impact on the fermentation characteristics of silage. Homofermentative Pediococcus pentosaceus isolates with safety-relevant traits, such as the absence of AMR genes, could be used successfully to improve silage fermentation.

  • The data reported here relate to the safety characteristics and strain identity of Pediococcus pentosaceus IMI 507025.

  • The sequencing data could be used for Pediococcus comparative genomics and for the evaluation of genes of concern among lactic acid bacteria.

1. Data Description

This section describes the whole genome sequencing data of Pediococcus pentosaceus (P. pentosaceus) IMI 507025, the taxonomic identification data, and the results of screening the genome for AMR genes, virulence factors and plasmids.

The whole genome sequencing coverage was 1020x. The annotated assembly consisted of 12 contigs with a total length of 1,794,629 bp, a GC content of 37.03% and a contig N50 of 354,566 bp. The annotation produced 1811 genes: 1751 coding sequences, 53 RNA genes (2 ribosomal RNAs, 47 transfer RNAs and 4 miscellaneous RNAs) and 7 pseudogenes.
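The annotation counts can be cross-checked with simple arithmetic. The short sketch below only re-adds the numbers reported in the paragraph above; the variable names are illustrative and are not part of the PGAP output.

```python
# Feature counts as reported in the annotation summary above.
annotation = {
    "coding_sequences": 1751,
    "rna_genes": {"rRNA": 2, "tRNA": 47, "misc_RNA": 4},
    "pseudogenes": 7,
}
reported_rna_genes = 53
reported_total_genes = 1811

# Sum the RNA classes, then add coding sequences and pseudogenes.
rna_total = sum(annotation["rna_genes"].values())
gene_total = (
    annotation["coding_sequences"] + rna_total + annotation["pseudogenes"]
)

print(f"RNA genes:   {rna_total} (reported {reported_rna_genes})")
print(f"Total genes: {gene_total} (reported {reported_total_genes})")
```

Both sums agree with the reported totals.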

The genome comparison returned Pediococcus pentosaceus CGMCC 7049 as the best hit, with the lowest Mash distance and the highest number of matching hashes (Table 1).

Table 1.

Taxonomic identification of IMI 507025 via MinHash.

| Strain | Mash distance | Statistically significant differences (p-value) | Matching hashes* | Assembly accession |
|---|---|---|---|---|
| Pediococcus pentosaceus CGMCC 7049 | 0.00671909 | 0.00 | 326/400 | GCF_000708635.1 |
| Pediococcus pentosaceus IE-3 | 0.00847159 | 0.00 | 310/400 | GCF_000285875.1 |
| Pediococcus pentosaceus ATCC 25745 | 0.0129347 | 0.00 | 274/400 | GCF_000014505.1 (complete) |
| Pediococcus pentosaceus SL4 | 0.0147554 | 0.00 | 261/400 | GCF_000496265.1 (complete) |
| Fusobacterium sp. CAG:649 | 0.195209 | 1.02262e-15 | 9/400 | GCF_000433695.1 |

* Selected genomes with an upper threshold of 400 hashes, available in the NCBI database, were used for comparison purposes.
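The relationship between the matching-hash counts and the Mash distances in Table 1 can be sketched with the distance formula from the Mash publication [6], D = -(1/k) ln(2j / (1 + j)), where j is the fraction of shared hashes. The k-mer size is not stated in the text; k = 16 is an assumption inferred here because it reproduces the tabulated distances, and the function name is illustrative.

```python
import math

def mash_distance(shared_hashes: int, sketch_size: int, kmer_size: int) -> float:
    """Mash distance estimated from the fraction of shared MinHash values."""
    if shared_hashes == 0:
        return 1.0  # no shared hashes: maximum distance
    jaccard = shared_hashes / sketch_size
    return -math.log(2 * jaccard / (1 + jaccard)) / kmer_size

# ASSUMPTION: k = 16 is inferred from the reported values, not stated in the text.
K = 16
SKETCH_SIZE = 400  # upper threshold of hashes given in the Table 1 footnote

# Matching hashes reported in Table 1.
table1_shared = {"CGMCC 7049": 326, "IE-3": 310, "ATCC 25745": 274, "SL4": 261}

for strain, shared in table1_shared.items():
    print(f"{strain}: {mash_distance(shared, SKETCH_SIZE, K):.8f}")
```

Under this assumption, 326/400 matching hashes gives a distance of about 0.006719, in line with the value listed for CGMCC 7049.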

The similarity between two genome sequences was assessed via average nucleotide identity (ANI) using the OrthoANI algorithm [1]. The ANI result (%) is usually approximately (1 − Mash distance) × 100 (see Table 1).
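As a rough illustration of this approximation, the sketch below converts the Mash distances from Table 1 into approximate ANI values and sets them beside the OrthoANI values reported for IMI 507025 in Table 3. The function name is hypothetical, and the comparison is only a plausibility check, not a substitute for the OrthoANI calculation.

```python
def approx_ani_from_mash(distance: float) -> float:
    """Approximate ANI (%) as (1 - Mash distance) x 100."""
    return (1.0 - distance) * 100.0

# (Mash distance from Table 1, OrthoANI % for IMI 507025 from Table 3)
comparisons = {
    "CGMCC 7049": (0.00671909, 99.6397),
    "IE-3": (0.00847159, 99.5911),
    "ATCC 25745": (0.0129347, 98.8932),
    "SL4": (0.0147554, 98.9198),
}

for strain, (mash, ortho_ani) in comparisons.items():
    approx = approx_ani_from_mash(mash)
    print(f"{strain}: approx {approx:.4f}%, OrthoANI {ortho_ani:.4f}%, "
          f"difference {ortho_ani - approx:+.4f}")
```

For these four strains the approximation stays within about half a percentage point of the OrthoANI result.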

Table 2 summarises the genomes that were included in the comparison study via OrthoANI.

Table 2.

Genome assemblies included in the OrthoANI and Roary calculations.

| Strain | Assembly accession | Contigs | Size (bp) | GC% |
|---|---|---|---|---|
| Pediococcus pentosaceus ATCC 33316 (T) | GCF_004354495.1 | 19 | 1,764,498 | 37.27 |
| Pediococcus pentosaceus ATCC 25745 | GCF_000014505.1 | 1 | 1,832,387 | 37.36 |
| Pediococcus pentosaceus SL001 | GCF_007923185.1 | 2 | 1,919,175 | 37.44 |
| Pediococcus pentosaceus SL4 | GCF_000496265.1 | 1 | 1,789,138 | 37.30 |
| Pediococcus pentosaceus SRCM 100892 | GCF_002173535.1 | 7 | 2,002,472 | 37.30 |
| Pediococcus pentosaceus KCCM 40703 | GCF_002982155.1 | 1 | 1,758,362 | 37.20 |
| Pediococcus pentosaceus SRCM 100194 | GCF_002202155.1 | 3 | 1,869,792 | 37.38 |
| Pediococcus pentosaceus SS1–3 | GCF_003429405.1 | 3 | 1,844,764 | 37.28 |
| Pediococcus pentosaceus wikim20 | GCF_001411765.2 | 4 | 1,830,629 | 37.29 |
| Pediococcus pentosaceus JQI-7 | GCF_006770865.1 | 1 | 1,732,880 | 37.25 |
| Pediococcus pentosaceus CGMCC 7049 | GCA_000708635.1 | 8 | 1,751,049 | 37.30 |
| Pediococcus pentosaceus IE-3 | GCA_000285875.1 | 91 | 1,802,376 | 37.22 |
| Pediococcus parvulus strain NBRC 100673 | GCA_007990205.1 | 111 | 1,968,745 | 38.62 |

Table 3 reports the outcome of the comparison of IMI 507025 with closely related P. pentosaceus strains. The pairwise comparisons showed 99.6397% identity between the IMI 507025 and P. pentosaceus CGMCC 7049 genomes. The ANI match with the P. pentosaceus type strain ATCC 33316 was 99.5994%. The species identification cut-off is set at 95% [2].

Table 3.

OrthoANI (%) calculations between IMI 507025 and selected Pediococcus strains.

| Strain | IE-3 | CGMCC 7049 | SRCM 100892 | NBRC 100673 | ATCC 25745 | SL4 | WIKIM20 | SRCM 100194 | KCCM 40703 | SS1–3 | ATCC 33316 | JQI-7 | SL001 | IMI 507025 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IE-3 | 100 | 99.5855 | 98.6183 | 69.6613 | 98.7894 | 98.8043 | 99.006 | 98.803 | 98.8491 | 98.8682 | 99.8011 | 98.9924 | 98.8655 | 99.5911 |
| CGMCC 7049 | 99.5855 | 100 | 98.6299 | 69.7265 | 98.941 | 98.7646 | 98.9261 | 98.7455 | 98.7238 | 98.8718 | 99.6489 | 98.928 | 98.7695 | 99.6397 |
| SRCM 100892 | 98.6183 | 98.6299 | 100 | 70.0009 | 98.5077 | 98.5249 | 98.8419 | 98.636 | 98.694 | 98.3403 | 98.8348 | 98.7515 | 98.5201 | 98.6664 |
| NBRC 100673 | 69.6759 | 69.7265 | 70.0009 | 100 | 69.4997 | 69.5623 | 69.7784 | 69.8436 | 69.4379 | 69.6443 | 69.3328 | 69.6737 | 69.7063 | 69.3435 |
| ATCC 25745 | 98.7894 | 98.941 | 98.5077 | 69.496 | 100 | 98.701 | 99.0151 | 98.7317 | 99.0686 | 98.6461 | 99.0889 | 98.8409 | 98.8845 | 98.8932 |
| SL4 | 98.8043 | 98.7646 | 98.5249 | 69.5623 | 98.701 | 100 | 98.8113 | 98.5947 | 98.7895 | 98.5489 | 98.9992 | 99.0609 | 98.7396 | 98.9198 |
| WIKIM20 | 99.006 | 98.9261 | 98.8419 | 69.7784 | 99.0151 | 98.8113 | 100 | 99.8005 | 99.3719 | 98.6075 | 99.072 | 98.9113 | 98.9053 | 98.9745 |
| SRCM 100194 | 98.803 | 98.7455 | 98.636 | 69.8436 | 98.7317 | 98.5947 | 99.8005 | 100 | 99.1816 | 98.5821 | 99.0155 | 98.8542 | 98.8621 | 98.8483 |
| KCCM 40703 | 98.8491 | 98.7238 | 98.694 | 69.4379 | 99.0686 | 98.7895 | 99.3719 | 99.1816 | 100 | 98.8299 | 98.9772 | 98.9322 | 98.9049 | 98.8254 |
| SS1–3 | 98.8682 | 98.8718 | 98.3403 | 69.6399 | 98.6461 | 98.5489 | 98.6075 | 98.5821 | 98.8299 | 100 | 99.0075 | 98.9297 | 98.7307 | 98.8575 |
| ATCC 33316 | 99.8011 | 99.6489 | 98.8348 | 69.3328 | 99.0889 | 98.9992 | 99.072 | 99.0155 | 98.9772 | 99.0075 | 100 | 98.9963 | 98.9475 | 99.5994 |
| JQI-7 | 98.9924 | 98.928 | 98.7515 | 69.6737 | 98.8409 | 99.0609 | 98.9113 | 98.8542 | 98.9322 | 98.9297 | 98.9963 | 100 | 99.7332 | 98.8202 |
| SL001 | 98.8655 | 98.7695 | 98.5201 | 69.7062 | 98.8845 | 98.7396 | 98.9053 | 98.8621 | 98.9049 | 98.7307 | 98.9475 | 99.7332 | 100 | 98.7146 |
| IMI 507025 | 99.5911 | 99.6397 | 98.6664 | 69.3296 | 98.8932 | 98.9198 | 98.9745 | 98.8483 | 98.8254 | 98.8575 | 99.5994 | 98.8202 | 98.7146 | 100 |
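The species assignment can be illustrated by applying the 95% cut-off to the IMI 507025 row of Table 3. This is a minimal sketch: the values are copied from that row (self-comparison omitted), and the variable names are illustrative.

```python
SPECIES_CUTOFF = 95.0  # ANI (%) species identification cut-off cited in the text [2]

# OrthoANI (%) of IMI 507025 against each comparison strain (Table 3, last row).
imi_507025_ani = {
    "IE-3": 99.5911, "CGMCC 7049": 99.6397, "SRCM 100892": 98.6664,
    "NBRC 100673": 69.3296, "ATCC 25745": 98.8932, "SL4": 98.9198,
    "WIKIM20": 98.9745, "SRCM 100194": 98.8483, "KCCM 40703": 98.8254,
    "SS1-3": 98.8575, "ATCC 33316": 99.5994, "JQI-7": 98.8202,
    "SL001": 98.7146,
}

# Strains at or above the cut-off, and the single closest strain.
same_species = {s for s, ani in imi_507025_ani.items() if ani >= SPECIES_CUTOFF}
best_match = max(imi_507025_ani, key=imi_507025_ani.get)

print(f"Best match: {best_match} ({imi_507025_ani[best_match]}%)")
print(f"Below cut-off: {sorted(set(imi_507025_ani) - same_species)}")
```

Only the P. parvulus strain NBRC 100673 falls below the cut-off; all twelve P. pentosaceus genomes, including the type strain ATCC 33316, lie above it.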

The threshold values used for AMR and virulence gene screening were those proposed by the European Food Safety Authority (EFSA), namely that sequences with above 80% identity and 70% coverage should be considered for further analysis [2]. The genome searches revealed no AMR genes and no virulence or pathogenicity factors in the sequenced genome of strain IMI 507025. The bioinformatic analysis did not identify putative plasmids in the sequenced data.
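A minimal sketch of how such thresholds might be applied to a list of database hits is shown below. The `Hit` class, the gene names and the example values are hypothetical and serve only to illustrate the filter; the study itself reported no hits.

```python
from dataclasses import dataclass

MIN_IDENTITY = 80.0  # percent identity threshold from the EFSA statement [2]
MIN_COVERAGE = 70.0  # percent coverage threshold from the EFSA statement [2]

@dataclass
class Hit:
    gene: str
    identity: float  # percent identity to the reference gene
    coverage: float  # percent of the reference gene length covered

def hits_of_concern(hits):
    """Keep hits above both thresholds, i.e. those needing further analysis."""
    return [
        h for h in hits
        if h.identity > MIN_IDENTITY and h.coverage > MIN_COVERAGE
    ]

# HYPOTHETICAL hits, for illustration only.
example_hits = [
    Hit("geneA", identity=92.5, coverage=98.0),   # passes both thresholds
    Hit("geneB", identity=85.0, coverage=40.0),   # coverage too low
    Hit("geneC", identity=65.0, coverage=100.0),  # identity too low
]

flagged = hits_of_concern(example_hits)
print([h.gene for h in flagged])
```

An empty result, as reported for IMI 507025, means that no sequence exceeded both thresholds.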

Based on the data presented above, strain IMI 507025 was unequivocally identified as Pediococcus pentosaceus. In addition, the safety-related data described confirm that P. pentosaceus IMI 507025 is safe and does not raise safety concerns.

2. Experimental Design, Materials and Methods

2.1. DNA Extraction

For the DNA extraction, 10 mL MRS Broth cultures were incubated aerobically at 30 °C for 16–17 h. The cells were centrifuged (1780 rcf, 10 min) and the pellet was used for DNA extraction according to a previously described procedure [8].

2.2. Whole Genome Sequencing, Assembly, and Annotation

The DNA was sequenced at Eurofins Genomics (Constance, Germany) on an Illumina NovaSeq 6000 using a 150 bp paired-end library, yielding 6,688,243 reads. Trimmomatic v.0.38.1 [3] was used to trim the reads and Unicycler v 0.4.8 [4] to assemble them. The average reference coverage (total number of bases / assembly length) of the assembly was 1020-fold. Gene predictions and functional annotations were performed using the NCBI Prokaryotic Genome Annotation Pipeline v6.0 [5].
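The coverage definition given above (total number of bases divided by assembly length) can be written out as a short sketch. The total base count used for the 1020-fold figure is not reported, so the example below only back-calculates the number of bases implied by the reported coverage and the annotated assembly length of 1,794,629 bp; it is an illustration of the formula, not a reconstruction of the authors' calculation.

```python
def average_coverage(total_bases: int, assembly_length: int) -> float:
    """Average coverage = total number of sequenced bases / assembly length."""
    return total_bases / assembly_length

ASSEMBLY_LENGTH = 1_794_629  # bp, annotated assembly (Data Description)
REPORTED_COVERAGE = 1020     # fold, as reported in the text

# Total bases implied by the reported coverage (derived here, not reported).
implied_total_bases = REPORTED_COVERAGE * ASSEMBLY_LENGTH

print(f"Implied total bases: {implied_total_bases:,}")
print(f"Coverage check: {average_coverage(implied_total_bases, ASSEMBLY_LENGTH):.0f}x")
```

The implied total is roughly 1.83 Gbp.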

2.3. Taxonomic Identification

Mash using MinHash v. 0.1.1 [6] and OrthoANI v. 1.40 [7] were used for strain identification via alignment-free genome distance estimation and calculation of average nucleotide identity, respectively.

2.4. Screening for AMR and Virulence Factors Related Genes

Two databases were used to search for AMR genes: the NCBI Bacterial AMR Reference Gene database (v. 2021–06–01.1) and the ResFinder database (downloaded on 20.04.2021). Screening for virulence factors was performed using the Virulence Factor Database (VFDB). Default parameters were used except where otherwise stated in a previously published study [8].

2.5. Screening for Plasmids

The PlasmidFinder database [9] and BLAST searches were used to search for plasmid-related contigs in the sequenced genome, and the assembly files were examined for the presence of circular contigs.
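One possible way to examine an assembly file for circular contigs is sketched below. It assumes that the assembler marks circularised contigs with a `circular=true` field in the FASTA header, as Unicycler does in its output; the exact procedure used in the study is not described, and the example records are hypothetical.

```python
def circular_contigs(fasta_text: str) -> list:
    """Return the names of contigs whose FASTA header contains 'circular=true'."""
    names = []
    for line in fasta_text.splitlines():
        if line.startswith(">") and "circular=true" in line:
            # The contig name is the first whitespace-separated header token.
            names.append(line[1:].split()[0])
    return names

# HYPOTHETICAL assembly records, for illustration only.
example_fasta = """>1 length=1200 depth=1.00x
ACGTACGTACGT
>2 length=300 depth=3.20x circular=true
ACGTACGT
"""

print(circular_contigs(example_fasta))
```

A circular contig that is much smaller than the chromosome would be a candidate plasmid to follow up with the PlasmidFinder and BLAST searches.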

CRediT authorship contribution statement

Ivana Nikodinoska: Writing – original draft, Data curation. Jenny Makkonen: Writing – review & editing, Methodology, Software. Daniel Blande: Writing – review & editing, Software, Formal analysis. Colm Moran: Writing – review & editing, Supervision, Project administration, Funding acquisition.

Declaration of Competing Interest

The authors I.N. and C.A.M. are employees of Alltech, which produces the Pediococcus pentosaceus IMI 507025 evaluated in this study.

Acknowledgments

The research was funded by Alltech SARL (France).

References

  • 1. Lee I., Kim Y.O., Park S.C., Chun J. OrthoANI: an improved algorithm and software for calculating average nucleotide identity. Int. J. Syst. Evol. Microbiol. 2016;66:1100–1103. doi: 10.1099/ijsem.0.000760.
  • 2. European Food Safety Authority (EFSA). EFSA statement on the requirements for whole genome sequence analysis of microorganisms intentionally used in the food chain. EFSA J. 2021;19(7):6506, 14 pp. doi: 10.2903/j.efsa.2021.6506.
  • 3. Bolger A.M., Lohse M., Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–2120. doi: 10.1093/bioinformatics/btu170.
  • 4. Wick R.R., Judd L.M., Gorrie C.L., Holt K.E. Unicycler: resolving bacterial genome assemblies from short and long sequencing reads. PLoS Comput. Biol. 2017;13. doi: 10.1371/journal.pcbi.1005595.
  • 5. Tatusova T., Di Cuccio M., Badretdin A., Chetvernin V., Nawrocki E.P., Zaslavsky L., Lomsadze A., Pruitt K.D., Borodovsky M., Ostell J. NCBI prokaryotic genome annotation pipeline. Nucleic Acids Res. 2016;44:6614–6624. doi: 10.1093/nar/gkw569.
  • 6. Ondov B.D., Treangen T.J., Melsted P., Mallonee A.B., Bergman N.H., Koren S., Phillippy A.M. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol. 2016;17:132. doi: 10.1186/s13059-016-0997-x.
  • 7. Lee I., Kim Y.O., Park S.C., Chun J. OrthoANI: an improved algorithm and software for calculating average nucleotide identity. Int. J. Syst. Evol. Microbiol. 2016;66:1100–1103. doi: 10.1099/ijsem.0.000760.
  • 8. Nikodinoska I., Makkonen J., Blande D., Moran C. Whole genome sequence data of Lactiplantibacillus plantarum IMI 507027. Data Brief. 2022;42. doi: 10.1016/j.dib.2022.108025.
  • 9. Carattoli A., Zankari E., García-Fernández A., Larsen M.V., Lund O., Villa L., Aarestrup F.M., Hasman H. PlasmidFinder and pMLST: in silico detection and typing of plasmids. Antimicrob. Agents Chemother. 2014;58:3895–3903. doi: 10.1128/AAC.02412-14.
