ABSTRACT
Aduncisulcus paluster is a free-living, unicellular flagellate belonging to the eukaryotic lineage Fornicata, which includes free-living and commensal/parasitic organisms. Here, we report the draft genome sequence of A. paluster, which provides clues for elucidating the adaptation to microaerophilic/anaerobic environments and the transition between free-living and commensal/parasitic lifestyles in Fornicata.
ANNOUNCEMENT
The Fornicata include various microaerophiles/anaerobes, which possess mitochondrion-related organelles. Aduncisulcus paluster (1) is a free-living fornicate that shares a common ancestor with the parasitic fornicates (2). Previously, only transcriptome data were available for A. paluster, and these data contain many sequences derived from bacteria in the culture medium (2). Therefore, to obtain more accurate genetic information, we performed genome/transcriptome analyses on A. paluster cells purified by density gradient centrifugation (3). This is the third genomic analysis of free-living fornicates (3, 4).
We have maintained a laboratory culture of A. paluster NY0171, which is identical to strain NIES-1843 (1). The culture, which contains food bacteria, was kept at 17.5°C in modified TYGM-9 medium (10%) prepared with filtered seawater. We inoculated fresh medium in three flasks (1,650 mL in total) from a single A. paluster culture and cultured the cells for 1 week. The cells from two of the three flasks were combined and used for RNA extraction, and the cells from the remaining flask were used for DNA extraction. The cells were collected by density gradient centrifugation on a 0, 10, and 20% OptiPrep (Sigma) gradient at 17°C and 800 × g for 20 min (3). Purified cells (3.4 × 10⁷ and 1.56 × 10⁶) were used for RNA and DNA extractions, respectively. DNA was extracted using the SDS plus phenol-chloroform-isoamyl alcohol (25:24:1) method (5), while RNA was extracted using TRIzol reagent (Sigma) according to the manufacturer’s instructions. We constructed the libraries for transcriptome sequencing (RNA-seq) and genome sequencing (Genome-seq) using the Kapa stranded mRNA-seq kit for RNA and the Kapa HyperPrep kit (Kapa Biosystems) for DNA, respectively. Prior to library construction, the DNA sample was sheared into ≈500-bp fragments with an ultrasonicator (Covaris). Genome-seq and RNA-seq analyses (150-bp paired-end) were then performed on HiSeqX and NextSeq500 (Illumina) at a biotech company (SeibutsuGiken Inc.).
We obtained 463,267,674 reads/69.9 Gbp (Genome-seq) and 223,709,682 reads/31.8 Gbp (RNA-seq). We used the default parameters for subsequent analyses unless otherwise specified. Reads were preprocessed using fastp v0.21.0 with a >Q30 threshold (6), followed by de novo assembly with MaSuRCA v3.3.3 (Genome-seq) (7) and Trinity v2.12 (RNA-seq) (8). For gene prediction on the MaSuRCA assembly, we ran Braker2 v2.1.6 (9–15) with supporting information generated by aligning the cleaned RNA-seq reads to the assembly with TopHat v2.1.1 (16) (option: -g 1); for the Trinity assembly, we used TransDecoder v5.5.0 (https://github.com/TransDecoder/TransDecoder). We then searched the assembled/predicted sequences against the NCBI nt/nr databases (ver. Dec-21-2021/Feb-04-2022) with BLASTN/BLASTP (17, 18) and removed putative contaminant sequences, defined as those whose top database hit came from outside Metamonada (the phylum that includes the Fornicata) with an E value of <1e−10 and pident (percentage of identical matches) of >0.95. Next, we assessed the completeness of the predicted sequences with BUSCO v5.1.2 and the eukaryote ODB10 data set (19); completeness was 40.4% (complete, 30.2%; fragmented, 10.2%). The statistics are summarized in Table 1. Both sets of predicted protein sequences were annotated by InterProScan v5.52 (20, 21) and by BLASTP search against nr (the best-hit annotation was used if pident was >0.9 and qcov [query coverage] was >0.9).
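The two threshold rules described above (contaminant removal and best-hit annotation transfer) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `TopHit` container, the function names, and the sample records are hypothetical, the phylum of each subject sequence is assumed to have been looked up separately, and pident and qcov are assumed to be expressed as fractions between 0 and 1 to match the cutoffs quoted in the text.

```python
from dataclasses import dataclass
from typing import Optional

# Cutoffs quoted in the text. Treating pident/qcov as fractions (0-1) is an
# assumption about how the reported values were scaled.
CONTAM_EVALUE = 1e-10
CONTAM_PIDENT = 0.95
ANNOT_PIDENT = 0.9
ANNOT_QCOV = 0.9


@dataclass
class TopHit:
    """Best BLAST hit for one query (hypothetical container, not a BLAST+ type)."""
    query_id: str
    evalue: float
    pident: float        # fraction of identical matches, 0-1
    qcov: float          # fraction of the query covered, 0-1
    subject_phylum: str  # phylum of the subject sequence, looked up separately
    subject_title: str


def is_putative_contaminant(hit: TopHit, own_phylum: str = "Metamonada") -> bool:
    """True when the top hit is a close match to a sequence from another phylum."""
    return (
        hit.subject_phylum != own_phylum
        and hit.evalue < CONTAM_EVALUE
        and hit.pident > CONTAM_PIDENT
    )


def transferred_annotation(hit: TopHit) -> Optional[str]:
    """Return the best-hit title only when identity and coverage both exceed 0.9."""
    if hit.pident > ANNOT_PIDENT and hit.qcov > ANNOT_QCOV:
        return hit.subject_title
    return None


# Made-up records for illustration only.
hits = [
    TopHit("seq_1", 1e-50, 0.99, 0.97, "OtherPhylum", "example bacterial protein"),
    TopHit("seq_2", 1e-50, 0.99, 0.97, "Metamonada", "example fornicate protein"),
    TopHit("seq_3", 1e-5, 0.99, 0.50, "OtherPhylum", "weak foreign hit"),
]
kept = [h.query_id for h in hits if not is_putative_contaminant(h)]
annotations = {h.query_id: transferred_annotation(h) for h in hits if h.query_id in kept}
```

In this sketch, `seq_3` is retained because its E value does not pass the contaminant cutoff, but it receives no annotation because its query coverage is below 0.9.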
TABLE 1.
Overview of nuclear genome sequences for fornicates
| Description | Genome size (Mb) | No. of contigs | N50 (kbp) | GC (%) | No. of predicted proteins | Mean amino acid (aa) length | No. of introns | Reference(s) |
|---|---|---|---|---|---|---|---|---|
| Aduncisulcus paluster | 29.4 | 25,863 | 4.6 | 39.1 | 15,316 | 418.5 | 11,743 | This study |
| Carpediemonas membranifera | 24.2 | 69 | 905.8 | 57.1 | 8,300 | 467.2 | | 4; this study |
| Kipferlia bialata | 51.0 | 11,563 | 10.5 | 49.4 | 17,389 | 333.0 | 124,912 | 3 |
| Spironucleus salmonicida | 12.9 | 233 | 150.8 | 33.5 | 8,067 | 373.0 | 3 | 3, 4 |
| Giardia intestinalis | 12.8 | 211 | 2,762.4 | 49.2 | 5,901 | 530.0 | 8 | 3, 4 |
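For readers unfamiliar with the N50 column in Table 1, the following sketch shows the conventional definition: the contig length at which the cumulative length of contigs, taken from longest to shortest, first reaches half of the total assembly length. The function name and the toy lengths are illustrative assumptions; the source does not state which tool was used to compute the reported values.

```python
def n50(lengths):
    """Return the contig length at which the cumulative sum of lengths,
    taken from longest to shortest, first reaches half of the total."""
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Toy example: total = 20, half = 10; 6 + 5 = 11 reaches the halfway point.
example_n50 = n50([2, 3, 4, 5, 6])
```

Because N50 depends only on the length distribution, a highly fragmented assembly with many short contigs yields a small N50 even when the total assembly size is comparable to that of a more contiguous genome.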
This draft genome will help in the study of the genome evolution associated with the evolutionary transition between free-living and commensal/parasitic lifestyles in the Fornicata.
Data availability.
The genome/transcriptome sequences are available at DDBJ/ENA/GenBank (sra-run DRR351251/DRR353576) under accession no. BQXS01000001 to BQXS01023235/ICSK01000001 to ICSK01036959.
ACKNOWLEDGMENTS
Computations were partially performed on the NIG supercomputer at ROIS National Institute of Genetics.
This work was supported in part by grants from the Japan Society for the Promotion of Science (19KK0185 and 22K06368 awarded to T.H.) and by the “Tree of Life” research project of the University of Tsukuba.
Contributor Information
Keitaro Kume, Email: keitaro_kume@md.tsukuba.ac.jp.
Jason E. Stajich, University of California, Riverside
REFERENCES
- 1. Yubuki N, Huang SSC, Leander BS. 2016. Comparative ultrastructure of fornicate excavates, including a novel free-living relative of Diplomonads: Aduncisulcus paluster gen. et sp. nov. Protist 167:584–596. doi: 10.1016/j.protis.2016.10.001.
- 2. Leger MM, Kolisko M, Kamikawa R, Stairs CW, Kume K, Čepička I, Silberman JD, Andersson JO, Xu F, Yabuki A, Eme L, Zhang Q, Takishita K, Inagaki Y, Simpson AGB, Hashimoto T, Roger AJ. 2017. Organelles that illuminate the origins of Trichomonas hydrogenosomes and Giardia mitosomes. Nat Ecol Evol 1:0092. doi: 10.1038/s41559-017-0092.
- 3. Tanifuji G, Takabayashi S, Kume K, Takagi M, Nakayama T, Kamikawa R, Inagaki Y, Hashimoto T. 2018. The draft genome of Kipferlia bialata reveals reductive genome evolution in fornicate parasites. PLoS One 13:e0194487. doi: 10.1371/journal.pone.0194487.
- 4. Salas-Leiva DE, Tromer EC, Curtis BA, Jerlström-Hultqvist J, Kolisko M, Yi Z, Salas-Leiva JS, Gallot-Lavallée L, Williams SK, Kops GJPL, Archibald JM, Simpson AGB, Roger AJ. 2021. Genomic analysis finds no evidence of canonical eukaryotic DNA processing complexes in a free-living protist. Nat Commun 12:6003. doi: 10.1038/s41467-021-26077-2.
- 5. Green MR, Sambrook J. 2012. Molecular cloning: a laboratory manual, 4th ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
- 6. Chen S, Zhou Y, Chen Y, Gu J. 2018. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34:i884–i890. doi: 10.1093/bioinformatics/bty560.
- 7. Zimin AV, Marçais G, Puiu D, Roberts M, Salzberg SL, Yorke JA. 2013. The MaSuRCA genome assembler. Bioinformatics 29:2669–2677. doi: 10.1093/bioinformatics/btt476.
- 8. Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q, Chen Z, Mauceli E, Hacohen N, Gnirke A, Rhind N, di Palma F, Birren BW, Nusbaum C, Lindblad-Toh K, Friedman N, Regev A. 2011. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol 29:644–652. doi: 10.1038/nbt.1883.
- 9. Brůna T, Hoff KJ, Lomsadze A, Stanke M, Borodovsky M. 2021. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genom Bioinform 3:lqaa108. doi: 10.1093/nargab/lqaa108.
- 10. Hoff KJ, Lomsadze A, Borodovsky M, Stanke M. 2019. Whole-genome annotation with BRAKER. Methods Mol Biol 1962:65–95. doi: 10.1007/978-1-4939-9173-0_5.
- 11. Hoff KJ, Lange S, Lomsadze A, Borodovsky M, Stanke M. 2016. BRAKER1: unsupervised RNA-Seq-based genome annotation with GeneMark-ET and AUGUSTUS. Bioinformatics 32:767–769. doi: 10.1093/bioinformatics/btv661.
- 12. Stanke M, Diekhans M, Baertsch R, Haussler D. 2008. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics 24:637–644. doi: 10.1093/bioinformatics/btn013.
- 13. Stanke M, Schöffmann O, Morgenstern B, Waack S. 2006. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics 7:62. doi: 10.1186/1471-2105-7-62.
- 14. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup. 2009. The sequence alignment/map format and SAMtools. Bioinformatics 25:2078–2079. doi: 10.1093/bioinformatics/btp352.
- 15. Barnett DW, Garrison EK, Quinlan AR, Strömberg MP, Marth GT. 2011. BamTools: a C++ API and toolkit for analyzing and managing BAM files. Bioinformatics 27:1691–1692. doi: 10.1093/bioinformatics/btr174.
- 16. Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. 2013. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol 14:R36. doi: 10.1186/gb-2013-14-4-r36.
- 17. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J Mol Biol 215:403–410. doi: 10.1016/S0022-2836(05)80360-2.
- 18. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. 2009. BLAST+: architecture and applications. BMC Bioinformatics 10:421. doi: 10.1186/1471-2105-10-421.
- 19. Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. 2021. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol Biol Evol 38:4647–4654. doi: 10.1093/molbev/msab199.
- 20. Blum M, Chang H-Y, Chuguransky S, Grego T, Kandasaamy S, Mitchell A, Nuka G, Paysan-Lafosse T, Qureshi M, Raj S, Richardson L, Salazar GA, Williams L, Bork P, Bridge A, Gough J, Haft DH, Letunic I, Marchler-Bauer A, Mi H, Natale DA, Necci M, Orengo CA, Pandurangan AP, Rivoire C, Sigrist CJA, Sillitoe I, Thanki N, Thomas PD, Tosatto SCE, Wu CH, Bateman A, Finn RD. 2021. The InterPro protein families and domains database: 20 years on. Nucleic Acids Res 49:D344–D354. doi: 10.1093/nar/gkaa977.
- 21. Jones P, Binns D, Chang H-Y, Fraser M, Li W, McAnulla C, McWilliam H, Maslen J, Mitchell A, Nuka G, Pesseat S, Quinn AF, Sangrador-Vegas A, Scheremetjew M, Yong S-Y, Lopez R, Hunter S. 2014. InterProScan 5: genome-scale protein function classification. Bioinformatics 30:1236–1240. doi: 10.1093/bioinformatics/btu031.
