Microbiology Resource Announcements. 2023 Nov 1;12(12):e00506-23. doi: 10.1128/MRA.00506-23

Transcriptome data sets of free-living diplomonads, Trepomonas sp. and Hexamita sp.

Keitaro Kume 1, Tsubasa Gen 2, Kazutaka Abe 3, Hiroshi Komatsuzaki 4, Euki Yazaki 5,6, Goro Tanifuji 7, Ryoma Kamikawa 8, Yuji Inagaki 2,4,9, Tetsuo Hashimoto 2,3,4,10
Editor: Jason E. Stajich 11
PMCID: PMC10720568  PMID: 37909738

ABSTRACT

Most species belonging to the diplomonad genera Trepomonas and Hexamita are considered to have secondarily adapted to free-living lifestyles from a parasitic ancestor. Here, we report the annotated transcriptome data of Trepomonas sp. NIES-1444 and Hexamita sp. NIES-1440, the analysis of which will provide insights into these lifestyle transitions.

KEYWORDS: evolutionary biology, parasitology, protists, bioinformatics, Giardia, Trichomonas

ANNOUNCEMENT

Diplomonadida, a microaerophilic/anaerobic protist subgroup of Fornicata, comprises various parasitic/commensal genera. Interestingly, most species of the genera Trepomonas and Hexamita are considered to have secondarily adapted from parasitic to free-living lifestyles (1, 2). Since the sequence data of these species are crucial for understanding lifestyle transitions, we conducted RNA-seq analyses on Trepomonas sp. and Hexamita sp. and annotated these assemblies. We also annotated the existing transcriptome assembly of a free-living fornicate, Kipferlia bialata (3).

We obtained Trepomonas sp. NIES-1444 and Hexamita sp. NIES-1440 strains from the NIES collection. The cultures were maintained at 17°C in UYTS + rice medium (prepared with ultrapure water; https://mcc.nies.go.jp/02medium-e.html#uyts_rice). Each protist was mass-cultured for 1 week in flasks with fresh medium (500 mL), together with the diverse food bacteria maintained in the culture medium.

For RNA extraction, cells were collected by the following methods.

For Hexamita sp., density-gradient centrifugation was performed at 17°C and 800 × g for 20 min using a 0%, 10%, 20%, and 70% Optiprep (Sigma) gradient, modified from a previous method (3). Purified cells were processed for RNA extraction with TRIzol and the Turbo DNA-free Kit (Ambion), followed by poly(A) selection using the Dynabeads mRNA Purification Kit (Ambion). After processing with the SMART-Seq v.4 Ultra Low Input RNA Kit for Sequencing (Illumina), we constructed libraries for RNA-seq analysis with the Nextera XT DNA Library Prep Kit (Illumina) and sequenced them on a MiSeq (Illumina, 300-bp paired end).

For Trepomonas sp., cells were collected by two rounds of centrifugation at 4°C, first at 2,030 × g for 20 min and then at 4,670 × g for 5 min. We extracted total RNA using TRIzol and submitted 100 µg to Eurofins Genomics for sequencing [polyA-selected RNA-seq, HiSeq2000 (Illumina), TruSeq SBS Kit v.3, 100-bp paired end].

We obtained 50,863,282 and 291,258,186 reads for Hexamita sp. and Trepomonas sp., respectively. Default parameters were used for all software except where otherwise noted (see doi: 10.5281/zenodo.8246431). Reads were preprocessed using Trimmomatic v.0.39 [option: ILLUMINACLIP:(TruSeq3-PE-2.fa/NexteraPE-PE.fa):3:30:10; LEADING:30; TRAILING:30; SLIDINGWINDOW:4:25; MINLEN:36] (4) and assembled using Trinity v.2.15.0 (option: max_memory: 200 G, CPU: 128) (5). ORFs (open reading frames) were determined for both assemblies using TransDecoder v.5.5.0 (https://github.com/TransDecoder/TransDecoder) (option: genetic_code: Hexamita, m: 50). We removed ORFs as putative contaminants based on BLASTP (6, 7) against NCBI-nr v20221219 if the BLASTP top hit met all of the following criteria: e value of ≤1e-10, percentage of identical matches of ≥0.95, and query coverage per subject of ≥0.5. ORFs were retained, however, when Metamonada (NCBI: txid2611341) appeared in the hit list with an e value of ≤1e-10.
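The contaminant-removal rule above can be sketched as a small predicate. This is only an illustration under stated assumptions, not the authors' script: the `Hit` record layout and the `is_metamonada` flag are hypothetical, and the identity and coverage thresholds are read here as fractions (0 to 1) because the text gives them as 0.95 and 0.5.

```python
from dataclasses import dataclass
from typing import Sequence

EVALUE_MAX = 1e-10        # e value threshold from the text
IDENTITY_MIN = 0.95       # identity threshold, read as a fraction (assumption)
QUERY_COVERAGE_MIN = 0.5  # query coverage threshold, read as a fraction (assumption)


@dataclass(frozen=True)
class Hit:
    """One BLASTP hit for a query ORF (hypothetical record layout)."""
    evalue: float
    identity: float        # fraction of identical matches, 0-1
    query_coverage: float  # fraction of the query covered, 0-1
    is_metamonada: bool    # True if the subject falls under NCBI txid2611341


def is_putative_contaminant(hits: Sequence[Hit]) -> bool:
    """Apply the removal rule to a hit list sorted best-first."""
    if not hits:
        return False  # no hit at all: nothing to judge, keep the ORF
    # Exception: any significant Metamonada hit in the list rescues the ORF.
    if any(h.is_metamonada and h.evalue <= EVALUE_MAX for h in hits):
        return False
    top = hits[0]
    # The top hit must satisfy all three criteria for removal.
    return (
        top.evalue <= EVALUE_MAX
        and top.identity >= IDENTITY_MIN
        and top.query_coverage >= QUERY_COVERAGE_MIN
    )
```

Checking the Metamonada exception before the top-hit criteria keeps ORFs whose best hit is a near-identical non-metamonad sequence but which also match a metamonad sequence significantly.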

To compare the transcriptomes of these free-living diplomonads with that of another free-living fornicate, we also analyzed the Trinity assembly of K. bialata reported in the previous study (3). The three resultant assemblies were annotated with the best hit from InterProScan v.5.52 (8, 9) or from BLASTP against a local database (UniProt v.20230125 + Trichomonas/Giardia/Trepomonas/Spironucleus data: PRJNA16084/1439/288252/60811). If both BLASTP and InterProScan annotations were available, we used the former if its e value was ≤1e-40; otherwise, we chose the latter. Non-annotated CDSs (coding sequences) of ≥300 aa were added as hypothetical proteins. Final statistics, including assembly completeness (protein-mode BUSCO v.5.1.2) (10), are in Table 1.
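The annotation choice described above amounts to a short decision rule, sketched here for one CDS. The function name and arguments are hypothetical. Two behaviors are assumptions not stated in the text: when only one of the two annotations exists, it is used as-is, and unannotated CDSs shorter than 300 aa are dropped.

```python
from typing import Optional

BLASTP_EVALUE_MAX = 1e-40  # threshold from the text
MIN_HYPOTHETICAL_AA = 300  # length cutoff from the text


def choose_annotation(
    length_aa: int,
    blastp_desc: Optional[str],
    blastp_evalue: Optional[float],
    interpro_desc: Optional[str],
) -> Optional[str]:
    """Return the annotation to report for one CDS, or None to drop it."""
    has_blastp = blastp_desc is not None and blastp_evalue is not None
    has_interpro = interpro_desc is not None
    if has_blastp and has_interpro:
        # Both available: prefer BLASTP only when its hit is very strong.
        if blastp_evalue <= BLASTP_EVALUE_MAX:
            return blastp_desc
        return interpro_desc
    if has_blastp:
        return blastp_desc  # assumption: a lone annotation is used as-is
    if has_interpro:
        return interpro_desc  # assumption: a lone annotation is used as-is
    if length_aa >= MIN_HYPOTHETICAL_AA:
        return "hypothetical protein"
    return None  # assumption: short unannotated CDSs are not reported
```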

TABLE 1. Overview of transcriptome sequences for the three fornicates

| Organism | Kipferlia bialata | Trepomonas sp. NIES-1444 | Hexamita sp. NIES-1440 |
|---|---|---|---|
| Predominant bacterial genera (top 3) contaminated in culture (percentage of sequences per genus against all bacterial sequences) | Pseudoalteromonas (100) [b] | Undibacterium (38.9), Chryseobacterium (20.0), Sulfurospirillum (12.5) | Cutibacterium (85.5), Ralstonia (2.3), Alcanivorax (1.1) |
| No. of bases sequenced (Gb) | 27.6 [b] | 29.1 | 7.6 |
| No. of assembled transcripts | 20,294 [b] | 76,247 | 48,794 |
| No. of bases in assembled transcripts (kb) | 28,897 [b] | 61,267 | 38,374 |
| No. of annotated CDS regions | 10,283 | 37,247 | 24,666 |
| GC content of CDS regions (%) | 56.9 | 43.1 | 45.9 |
| N50 of CDS regions (bp) | 2,820 | 1,466 | 1,295 |
| Average length of annotated proteins (aa) | 516.3 | 370.1 | 328.9 |
| BUSCO completeness for annotated proteins (data set: eukaryota_odb10), Complete (%) [a] | 33.4 (S:31.8, D:1.6) | 20.4 (S:11.4, D:9.0) | 15.3 (S:10.6, D:4.7) |
| BUSCO, Fragment (%) | 12.2 | 7.5 | 9.0 |
| BUSCO, Missing (%) | 54.4 | 72.1 | 75.7 |
| Reference: Accession no. | PRJDB5457 and DRR083618 | PRJDB15716 and DRR460931 | PRJDB15717 and DRR460932 |
| Reference: Publication [b] | Reference (3) | – | – |

[a] D, duplicate; S, single.

[b] Numbers updated from original report; –, none, this study.
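For readers unfamiliar with the N50 statistic reported in Table 1, the following is a minimal sketch of the common definition: the length of the shortest sequence among the longest sequences that together cover at least half of the total bases. The tool the authors used to compute N50 is not stated, so this illustrates the metric rather than their exact procedure. The function name is hypothetical.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths (bp)."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50() needs at least one sequence length")
    half_total = sum(ordered) / 2
    running = 0
    # Walk from longest to shortest until half of all bases are covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # not reached for positive lengths; kept for safety
```

For example, for lengths 2, 2, 2, 3, 3, 4, 8, and 8 (32 bases in total), the two longest sequences already cover 16 bases, so the N50 is 8.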

ACKNOWLEDGMENTS

Trepomonas sp. NIES-1444 and Hexamita sp. NIES-1440 strains were isolated in 2005 by Dr. Naoji Yubuki from Hyotaro Pond on the University of Tsukuba campus and were deposited in the NIES collection at the National Institute for Environmental Studies, Japan. Computations were partially performed on the NIG supercomputer at ROIS National Institute of Genetics, Japan. This work was supported in part by grants from the Japan Society for the Promotion of Science (23K16986 awarded to K.K.; 19KK0185 and 22K06368 awarded to T.H.) and by the “Tree of Life” research project of the University of Tsukuba.

Contributor Information

Keitaro Kume, Email: keitaro_kume@md.tsukuba.ac.jp.

Jason E. Stajich, University of California, Riverside, California, USA

DATA AVAILABILITY

BioProject numbers are PRJDB15716/PRJDB15717 (Trepomonas/Hexamita). Raw sequence and TSA (Transcriptome Shotgun Assembly) data are available at DDBJ/ENA/GenBank under accession numbers DRR460931/DRR460932 (Trepomonas/Hexamita) and ICTH01000001-ICTH01037247/ICTI01000001-ICTI01024666/ICTJ01000001-ICTJ01010283 (Trepomonas/Hexamita/Kipferlia).
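As a quick consistency check, the sizes of the TSA accession ranges above can be compared with the numbers of annotated CDS regions in Table 1 (37,247, 24,666, and 10,283). The sketch below assumes the usual TSA accession layout of a four-letter prefix, a two-digit version, and a numeric serial. The helper name is hypothetical.

```python
import re

# Four-letter prefix, two-digit version, then the serial number (assumed layout).
_ACCESSION = re.compile(r"^([A-Z]{4})(\d{2})(\d+)$")


def accession_range_size(accession_range: str) -> int:
    """Count the records in a range such as 'ICTH01000001-ICTH01037247'."""
    first, last = accession_range.split("-")
    m_first = _ACCESSION.match(first)
    m_last = _ACCESSION.match(last)
    if m_first is None or m_last is None:
        raise ValueError(f"unrecognized accession in {accession_range!r}")
    if m_first.group(1, 2) != m_last.group(1, 2):
        raise ValueError("range endpoints belong to different TSA sets")
    return int(m_last.group(3)) - int(m_first.group(3)) + 1


ranges = {
    "Trepomonas": "ICTH01000001-ICTH01037247",
    "Hexamita": "ICTI01000001-ICTI01024666",
    "Kipferlia": "ICTJ01000001-ICTJ01010283",
}
sizes = {name: accession_range_size(r) for name, r in ranges.items()}
```

Each range size equals the corresponding count of annotated CDS regions, which suggests that one TSA record was deposited per annotated CDS.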

REFERENCES

  • 1. Xu F, Jerlström-Hultqvist J, Kolisko M, Simpson AG, Roger AJ, Svärd SG, Andersson JO. 2016. On the reversibility of parasitism: adaptation to a free-living lifestyle via gene acquisitions in the diplomonad Trepomonas sp. PC1. BMC Biol 14:62. doi: 10.1186/s12915-016-0284-z
  • 2. Leger MM, Kolisko M, Kamikawa R, Stairs CW, Kume K, Čepička I, Silberman JD, Andersson JO, Xu F, Yabuki A, Eme L, Zhang Q, Takishita K, Inagaki Y, Simpson AGB, Hashimoto T, Roger AJ. 2017. Organelles that illuminate the origins of Trichomonas hydrogenosomes and Giardia mitosomes. Nat Ecol Evol 1:0092. doi: 10.1038/s41559-017-0092
  • 3. Tanifuji G, Takabayashi S, Kume K, Takagi M, Nakayama T, Kamikawa R, Inagaki Y, Hashimoto T. 2018. The draft genome of Kipferlia bialata reveals reductive genome evolution in fornicate parasites. PLoS One 13:e0194487. doi: 10.1371/journal.pone.0194487
  • 4. Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30:2114–2120. doi: 10.1093/bioinformatics/btu170
  • 5. Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q, Chen Z, Mauceli E, Hacohen N, Gnirke A, Rhind N, di Palma F, Birren BW, Nusbaum C, Lindblad-Toh K, Friedman N, Regev A. 2011. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol 29:644–652. doi: 10.1038/nbt.1883
  • 6. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J Mol Biol 215:403–410. doi: 10.1016/S0022-2836(05)80360-2
  • 7. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. 2009. BLAST+: architecture and applications. BMC Bioinformatics 10:421. doi: 10.1186/1471-2105-10-421
  • 8. Blum M, Chang H-Y, Chuguransky S, Grego T, Kandasaamy S, Mitchell A, Nuka G, Paysan-Lafosse T, Qureshi M, Raj S, Richardson L, Salazar GA, Williams L, Bork P, Bridge A, Gough J, Haft DH, Letunic I, Marchler-Bauer A, Mi H, Natale DA, Necci M, Orengo CA, Pandurangan AP, Rivoire C, Sigrist CJA, Sillitoe I, Thanki N, Thomas PD, Tosatto SCE, Wu CH, Bateman A, Finn RD. 2021. The InterPro protein families and domains database: 20 years on. Nucleic Acids Res 49:D344–D354. doi: 10.1093/nar/gkaa977
  • 9. Jones P, Binns D, Chang H-Y, Fraser M, Li W, McAnulla C, McWilliam H, Maslen J, Mitchell A, Nuka G, Pesseat S, Quinn AF, Sangrador-Vegas A, Scheremetjew M, Yong S-Y, Lopez R, Hunter S. 2014. InterProScan 5: genome-scale protein function classification. Bioinformatics 30:1236–1240. doi: 10.1093/bioinformatics/btu031
  • 10. Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. 2021. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol Biol Evol 38:4647–4654. doi: 10.1093/molbev/msab199



Articles from Microbiology Resource Announcements are provided here courtesy of American Society for Microbiology (ASM)
