Draft Genome Sequence of Clinical Isolate USM026 of the Pathogenic Yeast Candida parapsilosis

Dina Yamin; Wan Khairunnisa Wan Juhari; Nur Waliyuddin Hanis Zainal Abidin; Shuhaila Mat-Sharani; Azian Harun

doi:10.1128/mra.00839-22

Microbiol Resour Announc. 2022 Oct 31;11(11):e00839-22. doi: 10.1128/mra.00839-22

Editor: Jason E. Stajich

PMCID: PMC9670998 PMID: 36314917

ABSTRACT

Here, we report the draft genome sequence of a Candida parapsilosis clinical isolate (USM026) that was recovered from a blood sample from a patient who was treated for a catheter-related bloodstream infection (CRBSI). The draft genome assembly, generated from 22,076,712 reads, is 12,839,916 bp in length and comprises 249 scaffolds and 5,537 predicted genes.

ANNOUNCEMENT

Candida parapsilosis is a yeast (superkingdom Eukaryota, kingdom Fungi, subkingdom Dikarya, phylum Ascomycota, subphylum Saccharomycotina, class Saccharomycetes, order Saccharomycetales, family Debaryomycetaceae, genus Candida) and a major cause of candidemia worldwide (1). In this announcement, we present the draft genome sequence of a clinical isolate of C. parapsilosis, which will contribute to our understanding of the genetic variability among C. parapsilosis isolates and provide clues to their genomic evolution. The isolate was recovered from a patient who had been admitted to Hospital Universiti Sains Malaysia (USM) (Kota Bharu, Kelantan, Malaysia) and later developed a catheter-related bloodstream infection (CRBSI). This work was approved by the Human Research Ethics Committee of USM (approval number JEPeM-USM-16040162).

The isolate was subcultured on a Sabouraud dextrose agar (SDA) plate and incubated at 37°C overnight. Pure colonies were harvested in the stationary phase. Genomic DNA (gDNA) was extracted using the conventional phenol-chloroform-isoamyl alcohol protocol (2). The species was confirmed by sequencing the internal transcribed spacer (ITS) region of ribosomal DNA (3).

The DNA sample was fragmented by sonication to a size of 350 bp. DNA fragments were end polished, A-tailed, and ligated with full-length adaptors for Illumina sequencing. Library preparation with further PCR amplification was performed using the NEBNext Ultra DNA library preparation kit for Illumina (New England Biolabs, USA) according to the manufacturer's protocol. PCR products were purified (AMPure XP system), and libraries were analyzed for size distribution with an Agilent 2100 Bioanalyzer and quantified using real-time PCR. Libraries that passed these checks were sequenced using an Illumina NovaSeq 150-bp paired-end protocol. The raw optical signal data produced by the Illumina platform were converted into sequence reads by CASAVA base calling and stored in FASTQ (fq) format. In total, the sequencing produced 22,076,712 raw paired-end reads. Quality control was performed using FastQC software, and adapter and low-quality sequences were removed with Trimmomatic (v0.36) (4) using sliding window trimming with a minimum average quality score of 20 and a minimum read length of 20 bases.
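To make the trimming criteria concrete, the following is a minimal Python sketch of sliding window quality trimming followed by a minimum-length filter. It is a simplified illustration, not Trimmomatic's implementation. The window width of 4 bases is an assumption (the announcement does not state the window size), Phred+33 quality encoding is assumed, and the function names are hypothetical.

```python
PHRED_OFFSET = 33  # assumed Phred+33 (Sanger / Illumina 1.8+) encoding


def decode_phred33(qual):
    """Convert a FASTQ quality string into a list of integer Phred scores."""
    return [ord(ch) - PHRED_OFFSET for ch in qual]


def sliding_window_trim(seq, qual, window=4, min_avg_q=20):
    """Scan from the 5' end and cut the read at the start of the first
    window whose mean quality falls below min_avg_q.

    window=4 is an assumed value; it is not given in the announcement.
    """
    scores = decode_phred33(qual)
    n = len(scores)
    if n == 0:
        return seq, qual
    width = min(window, n)  # reads shorter than the window form one window
    for start in range(n - width + 1):
        mean_q = sum(scores[start:start + width]) / width
        if mean_q < min_avg_q:
            return seq[:start], qual[:start]
    return seq, qual


def trim_and_filter(seq, qual, window=4, min_avg_q=20, min_len=20):
    """Trim a read, then drop it (return None) if it is shorter than min_len."""
    t_seq, t_qual = sliding_window_trim(seq, qual, window, min_avg_q)
    if len(t_seq) < min_len:
        return None
    return t_seq, t_qual


if __name__ == "__main__":
    read = "ACGT" * 6                  # hypothetical 24-base read
    short_bad_tail = "?" * 22 + "##"   # Q30 for 22 bases, then two Q2 bases
    long_bad_tail = "?" * 10 + "#" * 14
    print(trim_and_filter(read, short_bad_tail))
    print(trim_and_filter(read, long_bad_tail))
```

In this toy input, the first read loses only its low-quality tail and is kept, while the second is cut so early that it falls below the 20-base minimum and is discarded.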

Genome assembly and annotation were performed in the Galaxy platform using default parameters unless stated otherwise (5). The gDNA was sequenced to an average sequencing depth of 254× using the Illumina NovaSeq 6000 platform, producing 2 × 150-bp paired-end reads. The reads were assembled into contigs and scaffolds using SPAdes (v3.12.0) (6). The quality of the assembly was evaluated using QUAST (v5.0.2) (7). The final assembly for C. parapsilosis strain USM026 consists of 249 scaffolds (≥1,000 bp), with a total length of 12,839,916 bp, a G+C content of 38.62%, an N₅₀ value of 116,314 bp, and a longest scaffold of 343,659 bp. The assembly showed 98% alignment against the reference genome, C. parapsilosis CDC317 (GenBank assembly accession number GCA_000182765.2), obtained from the NCBI GenBank database. Interspersed repeats and low-complexity sequences were masked using RepeatMasker (v4.0.9) (http://repeatmasker.org) with combined Dfam (v3.0) and RepBase (release 20181026) databases, based on fungal species, which resulted in 2.28% of bases being masked (8). The final scaffolds were annotated using MAKER (v2.31.10) (9) with ab initio gene prediction using AUGUSTUS (v3.3.3) (10) and SNAP (v2013-11-29) (11), with soft masking in the repeat masking step. The training species was Candida albicans. C. parapsilosis strain USM026 was predicted to contain a total of 5,537 genes.
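For readers unfamiliar with the assembly statistics reported above, the following Python sketch shows how N₅₀, G+C content, and a rough sequencing depth can be computed. It is an illustration under stated assumptions, not the code used by QUAST or by the authors. The function names and the toy scaffolds are hypothetical, and the depth estimate assumes that every one of the 22,076,712 reported reads is 150 bases long and that the assembly length is used as the genome size.

```python
def n50(lengths):
    """Return the N50: the length L such that scaffolds of length >= L
    together contain at least half of the total assembled bases."""
    if not lengths:
        raise ValueError("no scaffold lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def gc_percent(sequences):
    """Return the G+C percentage over unambiguous bases (A, C, G, T only)."""
    gc = at = 0
    for seq in sequences:
        seq = seq.upper()
        gc += seq.count("G") + seq.count("C")
        at += seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("no unambiguous bases found")
    return 100.0 * gc / (gc + at)


def estimated_depth(n_reads, read_len, genome_len):
    """Rough average depth: total sequenced bases divided by genome length."""
    return n_reads * read_len / genome_len


if __name__ == "__main__":
    # Hypothetical toy scaffolds, for illustration only.
    toy_lengths = [100, 80, 60, 40, 20]
    print("toy N50:", n50(toy_lengths))
    print("toy GC%:", gc_percent(["GGCC", "ATAT", "NNNN"]))

    # Figures taken from the announcement; uniform 150-base reads are assumed.
    depth = estimated_depth(22_076_712, 150, 12_839_916)
    print(f"approximate depth: {depth:.1f}x")
```

The back-of-the-envelope depth obtained this way is close to, but not identical to, the reported 254×. The announcement does not state which genome size or read set was used for that figure, so the small difference should not be overinterpreted.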

Data availability.

This whole-genome shotgun project has been deposited in DDBJ/ENA/GenBank under BioProject accession number PRJNA610714 with GenBank accession number JADCQS000000000. The version described in this paper is the first version, JADCQS010000000. Raw reads are available in the NCBI Sequence Read Archive (SRA) under accession number SRR11249110.

ACKNOWLEDGMENTS

We acknowledge Poh Yang Ming, a bioinformatician from Perdana University, for his efforts in revising the genome annotation.

This study was supported by the Malaysia Ministry of Higher Education (MOHE) Fundamental Research Grant Scheme (grant FRGS/1/2019/SKK11/USM/02/3) and a USM RUI grant (grant 1001/PPSP/812206) awarded to A.H.

Contributor Information

Dina Yamin, Email: dinayamin@student.usm.my.

Azian Harun, Email: azian@usm.my.

Jason E. Stajich, University of California, Riverside

REFERENCES

1. Tóth R, Nosek J, Mora-Montes HM, Gabaldon T, Bliss JM, Nosanchuk JD, Turner SA, Butler G, Vágvölgyi C, Gácser A. 2019. Candida parapsilosis: from genes to the bedside. Clin Microbiol Rev 32:e00111-18. doi: 10.1128/CMR.00111-18.
2. Sambrook J, Russell D. 2001. Molecular cloning: a laboratory manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
3. White TJ, Bruns TD, Lee SB, Taylor JW. 1990. Amplification and direct sequencing of fungal ribosomal RNA genes for phylogenetics, p 315–322. In Innis MA, Gelfand DH, Sninsky JJ, White TJ (ed), PCR protocols: a guide to methods and applications. Academic Press, New York, NY.
4. Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30:2114–2120. doi: 10.1093/bioinformatics/btu170.
5. Afgan E, Baker D, Batut B, van den Beek M, Bouvier D, Cech M, Chilton J, Clements D, Coraor N, Grüning BA, Guerler A, Hillman-Jackson J, Hiltemann S, Jalili V, Rasche H, Soranzo N, Goecks J, Taylor J, Nekrutenko A, Blankenberg D. 2018. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update. Nucleic Acids Res 46:W537–W544. doi: 10.1093/nar/gky379.
6. Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA. 2012. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol 19:455–477. doi: 10.1089/cmb.2012.0021.
7. Gurevich A, Saveliev V, Vyahhi N, Tesler G. 2013. QUAST: quality assessment tool for genome assemblies. Bioinformatics 29:1072–1075. doi: 10.1093/bioinformatics/btt086.
8. Smit A, Hubley R, Green P. 2015. RepeatMasker Open v4.0. https://www.repeatmasker.org.
9. Campbell MS, Holt C, Moore B, Yandell M. 2014. Genome annotation and curation using MAKER and MAKER-P. Curr Protoc Bioinformatics 48:4.11.1–4.11.39. doi: 10.1002/0471250953.bi0411s48.
10. Stanke M, Keller O, Gunduz I, Hayes A, Waack S, Morgenstern B. 2006. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res 34:W435–W439. doi: 10.1093/nar/gkl200.
11. Korf I. 2004. Gene finding in novel genomes. BMC Bioinformatics 5:59. doi: 10.1186/1471-2105-5-59.
