A new chromosome-level genome assembly and annotation of Cryptosporidium meleagridis

Lasya R Penumarthi; Rodrigo P Baptista; Megan S Beaudry; Travis C Glenn; Jessica C Kissinger

doi:10.1101/2024.02.16.580748

This is a preprint.

It has not yet been peer reviewed by a journal.

[Preprint]. 2024 Feb 17:2024.02.16.580748. [Version 1] doi: 10.1101/2024.02.16.580748

PMCID: PMC10888889 PMID: 38405792

Abstract

Cryptosporidium spp. are medically and scientifically relevant protozoan parasites that cause severe diarrheal illness in infants, immunosuppressed populations, and animals. Although most human Cryptosporidium infections are caused by C. parvum and C. hominis, several other species also infect humans, including C. meleagridis, which is commonly observed in developing countries. Here, we polished and annotated a long-read genome sequence assembly for C. meleagridis TU1867, a species that infects birds and humans. The genome sequence was generated using a combination of whole genome amplification (WGA) and long-read Oxford Nanopore Technologies sequencing, and the assembly was then polished with Illumina data. The chromosome-level genome assembly is 9.2 Mbp with a contig N50 of 1.1 Mbp. Annotation revealed 3,923 protein-coding genes. A BUSCO analysis indicates a completeness of 96.6% (n=446), comprising 430 (96.4%) single-copy and 1 (0.224%) duplicated conserved apicomplexan genes. The new C. meleagridis genome assembly is nearly gap-free and provides a valuable new resource for the Cryptosporidium community and for future studies of evolution and host specificity.

Keywords: Whole genome amplification, Oxford Nanopore, functional annotation, comparative genomics

Background & Summary

Cryptosporidium is an apicomplexan protozoan parasite of global medical, scientific, and veterinary significance that can cause moderate-to-severe diarrhea in humans and animals¹. It is the leading cause of waterborne disease outbreaks in the US^2,3. Although cryptosporidiosis causes illness in both immunocompromised and immunocompetent individuals, it is especially severe in immunocompromised and elderly people and in children, resulting in persistent infection, malnutrition, and, in some cases, death^3–5. In 2019, the Global Burden of Disease study estimated 133,422 deaths worldwide and an annual loss of 8.2 million disability-adjusted life years (DALYs) due to Cryptosporidium⁶. C. meleagridis is an avian- and mammalian-infecting Cryptosporidium species that was first described in turkeys^7,8. Human Cryptosporidium infections are caused predominantly by C. parvum and C. hominis, but other species can also infect humans; C. meleagridis is the third most common human-infecting Cryptosporidium species after these two⁹. Although generally less common, C. meleagridis infection has been reported to be as common as C. parvum infection in some parts of the world, and it can lead to death in rare cases^10,11.

At this time, 15 of the >30 reported Cryptosporidium species have assembled genome sequences, but only 8 have been annotated, including C. parvum, C. hominis, C. tyzzeri, and C. meleagridis¹². The latest release of the genome of C. meleagridis strain UKMEL1 (CmUKMEL1) contains gaps and is assembled into 57 contigs. Because the Cryptosporidium genome is small and highly compact, it is challenging to sequence the genome of a single parasite. Since cloning of Cryptosporidium parasites is not possible, sequencing a small pool of individuals is preferred over bulk sequencing to reduce heterozygosity. Recently, a whole genome amplification (WGA) approach was implemented to generate DNA sequences from Cryptosporidium; when tested on C. meleagridis strain TU1867 (CmTU1867), it provided sufficient DNA for library construction and generation of a high-quality genome through long-read sequencing¹³. Here we present a chromosome-level assembly of the C. meleagridis genome. The new CmTU1867 genome assembly is 201,275 base pairs longer than that of CmUKMEL1, and its largest contig is 632,735 base pairs (bp) longer than the largest contig in CmUKMEL1. The N50 of the new CmTU1867 assembly is 1,105,563 bp, compared with 322,908 bp for CmUKMEL1 (Table 1). This high-quality C. meleagridis genome results from a new experimental approach designed to generate whole genome sequences from limiting amounts of genomic DNA, and the new assembly and annotation provide a valuable resource for the Cryptosporidium community that will contribute to our understanding of Cryptosporidium evolution and host specificity.
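The N50 comparison above can be made concrete with a short sketch. N50 is the length L such that contigs of length at least L together cover at least half of the total assembly length. The function below is a minimal Python illustration of that definition; the contig lengths in the example are hypothetical, not the CmTU1867 contigs.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths (bp).

    Contigs are sorted from longest to shortest and their lengths summed;
    the N50 is the length of the contig at which the running sum first
    reaches at least half of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths or lengths[-1] <= 0:
        raise ValueError("contig lengths must be a non-empty set of positive integers")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    # With positive lengths the running sum always reaches half the total.
    raise AssertionError("unreachable")


# Hypothetical contig lengths (bp), for illustration only.
example = [1_000_000, 800_000, 500_000, 200_000, 100_000]
print(n50(example))
```

Here the total is 2,600,000 bp, so the running sum must reach 1,300,000 bp; it does so at the second contig, giving an N50 of 800,000 bp.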

Table 1.

Statistics of the C. meleagridis and CpBGF genome assemblies and annotation

Statistics	CmTU1867	CmUKMEL1	CpBGF
# of contigs	13	57	8
Largest contig (bp)	1365597	732862	1379130
# T2T chromosomes	0	0	8
Total length (bp)	9178485	8973200	9222738
N50 (bp)	1105563	322908	1108772
GC (%)	30.9	31.0	30.1
# N’s per 100 kbp	0.00	0.00	0.00
# of telomeres identified	1	10	16
# rRNA	17	8	15
# tRNA	45	45	45
# protein-coding genes	3923	3753	3932
Average gene length (bp)	1824	1885	2145
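Table 1 reports GC content and the number of ambiguous bases (N) per 100 kbp; in this study these statistics were generated with GenomeTools and AGAT. A minimal Python sketch of the two quantities follows. It assumes GC % is computed over unambiguous A/C/G/T bases only and the N rate is scaled to the total assembly length; tools may differ on whether Ns enter the GC denominator, so this is an illustration rather than a reproduction of those programs. The two-contig input is hypothetical.

```python
from typing import Iterable


def gc_and_n_stats(sequences: Iterable[str]) -> tuple[float, float]:
    """Return (GC %, Ns per 100 kbp) over a set of contig sequences.

    GC % uses only unambiguous A/C/G/T bases as the denominator; the N rate
    is scaled to the total sequence length, Ns included.
    """
    gc = acgt = n = total = 0
    for seq in sequences:
        s = seq.upper()
        total += len(s)
        gc += s.count("G") + s.count("C")
        acgt += sum(s.count(base) for base in "ACGT")
        n += s.count("N")
    if total == 0 or acgt == 0:
        raise ValueError("no sequence data")
    return 100 * gc / acgt, n * 100_000 / total


# Hypothetical two-contig assembly, for illustration only.
gc_pct, n_per_100kb = gc_and_n_stats(["ATGCAT", "GGNNAT"])
print(round(gc_pct, 1), round(n_per_100kb, 1))
```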


The initial genome assembly contained 13 contigs: 8 chromosomes and 5 smaller contigs (681–30,300 bp), 2 of which were later identified as contamination and removed (Figure 1). Two additional contigs (“contig_10” and “contig_11”) were created manually from the beginnings of chromosomes 2 and 6 after an assembly artifact was detected in these chromosomes. The final assembly therefore contains 13 contigs: 8 chromosomes and contigs 9–13. Contig_9 and contig_13 contain regions identical to regions of chromosomes 1 and 3, respectively, but assembled separately from the full chromosomes (Table 1). One drawback of the new assembly is that it contains fewer telomeres than CmUKMEL1: we detected only one telomere in CmTU1867, on chromosome 5. Searching the long reads, we found several telomere-containing reads that did not assemble, although the portions of these reads lacking the telomere repeat matched the assembly. By mapping these reads back to the assembly, we identified three additional telomere locations that could be placed manually (the beginning and end of chromosome 3 and the beginning of chromosome 4). At least 4 telomere-containing long reads mapped to these regions, including at least 1 long (>1 kb) read that extended into unique chromosomal sequence. However, because read support for these telomeres was relatively low, we did not extend the chromosome ends in the assembly with these telomere-containing reads.

Figure 1. — JupiterPlot comparing the previous CmUKMEL1 genome sequence with the new CmTU1867 genome sequence. Ribbons are colored with respect to the reference genome (CmTU1867).

The new CmTU1867 genome sequence has 10 additional ribosomal RNA genes compared to CmUKMEL1. The 16 rRNA genes (excluding the 5.8S) occur in clusters of 2–3 on chromosomes 1, 2, 3, 7, and 8 (Figure 2). The genes that RNAmmer¹⁴ detected as 5.8S rRNAs in CmTU1867 clustered separately from the 18S and 28S rRNAs, with the exception of one 5.8S (ID=cmbei_2001394) located on chromosome 2 adjacent to a 28S rRNA. In most apicomplexans, the 5.8S rRNA is annotated between the 18S and 28S rRNAs. Additional searches revealed that all of these putative 5.8S genes except the one on chromosome 2 were 5S rRNAs. The six 5S rRNAs in CmTU1867 are in 2 clusters of 3, one on chromosome 3 and the other on chromosome 7 (Figure 2). The chromosome 3 cluster of 5S rRNAs contains 2 rRNAs in CpBGF but 3 rRNAs in CpIOWA-ATCC and CmTU1867. These differences may reflect variation in 5S rRNA copy number within a parasite population or among Cryptosporidium species, or compression of repeats during genome assembly. When CmTU1867 reads were mapped to the 5S rRNA clusters on chromosomes 3 and 7, coverage was relatively even throughout each region. In contrast, the same test with the CpBGF reads and genome sequence revealed a 2–3x read compression (elevated read depth) precisely at the 5S regions on chromosomes 3 and 7.
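The read-depth comparison above can be illustrated with a minimal sketch that divides the mean depth inside a region by the median depth of the whole sequence. A ratio near 1 indicates even coverage, whereas a repeat present in two or three genomic copies but collapsed into one in the assembly would be expected to give a ratio of roughly 2–3. The per-base depth list is assumed to come from a tool such as `samtools depth`; the depth profile in the example is hypothetical.

```python
from statistics import median
from typing import Sequence


def depth_fold_change(depths: Sequence[int], start: int, end: int) -> float:
    """Mean depth over depths[start:end] divided by the median depth of the sequence.

    `depths` holds per-base read depth at 0-based positions; `end` is exclusive.
    """
    if not 0 <= start < end <= len(depths):
        raise ValueError("region must lie within the depth profile")
    background = median(depths)
    if background == 0:
        raise ValueError("background depth is zero")
    region = depths[start:end]
    return (sum(region) / len(region)) / background


# Hypothetical profile: 30x background with a 10-bp block at 75x.
profile = [30] * 50 + [75] * 10 + [30] * 40
print(round(depth_fold_change(profile, 50, 60), 2))
```

Using the median rather than the mean as the background keeps the high-depth block itself from inflating the baseline.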

Figure 2. — Circos plot. Rings, from the center outward, show shared ortholog clusters between CmTU1867 and CpBGF, chromosome position in 50,000 bp increments, a GC content histogram, and gene density. Locations of rRNA genes are indicated.

The annotation was validated by comparing the CmTU1867, CpBGF, and CmUKMEL1 protein-coding sequences using orthology-based algorithms. We initially found a group of 5 endonucleases in CmTU1867 that were absent from CpBGF and CmUKMEL1. In a blastp search, members of this group had no significant hits to any Cryptosporidium species except C. ubiquitum and C. felis. However, a tblastn search of this gene family against annotated transcripts in CryptoDB returned hits to C. parvum, C. hominis, C. tyzzeri, C. ryanae, and CmUKMEL1. Members of the gene family are annotated as 18S ribosomal or non-coding RNAs in C. parvum, C. hominis, C. tyzzeri, and C. ryanae, as a 5S rRNA and ncRNA in CmUKMEL1, and as protein-coding genes in C. ubiquitum. Upon manual inspection, we found that the genomic sequence for these proteins exists in varying copy number in the C. parvum, C. hominis, and C. tyzzeri genomes. We additionally found one putative open reading frame (ORF) predicted by AUGUSTUS in CmTU1867 Chr4 region 828518–828700 bp that was not present in CmUKMEL1, CpBGF, or any other species according to blastp and blastn searches. We removed this putative gene from our annotation because we could not validate it with RNA-seq data or evidence from any other species, but it may be a unique C. meleagridis gene revealed by the improved assembly.
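As a point of reference, the putative Chr4 gene spans 828700 − 828518 + 1 = 183 bp, or 61 codons, assuming inclusive coordinates. The sketch below is a deliberately simple forward-strand ORF scan (first ATG to the next in-frame stop codon) that only illustrates what an ORF is; it is not how AUGUSTUS predicts genes, since AUGUSTUS uses a probabilistic gene model. The input sequence is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def forward_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) of ATG...stop ORFs in the three forward frames.

    Coordinates are 0-based and end-exclusive, and include the stop codon.
    Only the first ATG before each stop is used (the longest ORF per stop),
    and ORFs shorter than `min_codons` codons (stop included) are skipped.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


# Hypothetical sequence: one 4-codon ORF (ATG AAA CCC TAG) in frame 1.
print(forward_orfs("GATGAAACCCTAGTT"))
```

A real scan would also examine the reverse complement; it is omitted here to keep the example short.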

Some of the orthogroups initially detected by OrthoVenn3 fell at C. parvum chromosome ends that extend beyond the ends of the CmTU1867 and CmUKMEL1 chromosomes. Others were unannotated in one species but present in its genome sequence. When we found unannotated CmTU1867 genes that Liftoff and AUGUSTUS had initially missed, we added these annotations manually. Ultimately, we found very few orthogroups that were unique to a single species (Figure 3). Their investigation is described in Table 3.

Figure 3. — Orthogroup comparison among the new CmTU1867, the previous CmUKMEL1, and the newly released reference genome, CpBGF. See Figure 5 for the pre-validation results. Arrows link gene IDs to their orthogroups.

Table 3.

Manual validation of orthogroups not present in all species examined in Figure 5.

Venn Group	CmTU1867	CmUKMEL1	CpBGF	Gene Product	Findings

3			cpbgf_7005600	GMP synthase	Subtelomeric in C. parvum – cannot conclude if in syntenic regions of CmTU1867 or CmUKMEL1
			cpbgf_1003900	GMP synthase	Subtelomeric in C. parvum, as above
			cpbgf_1003890	GMP synthase	Subtelomeric in C. parvum, as above
	Found cmbei_2003270	Found cmeUKMEL1_06080	cpbgf_2003270	phosphoglucomutase	OrthoFinder created two separate groups for this single gene (see below)
			cpbgf_1003950	tryptophan synthase beta chain	Subtelomeric in C. parvum – cannot conclude if in syntenic regions of CmTU1867 or CmUKMEL1
		cmeUKMEL1_09025	cpbgf_200460	unspecified product	Subtelomeric in C. parvum, as above
1	Found cmbei_7005006	cmeUKMEL1_03235	Not found	hypothetical protein	Exists in CmTU1867 – annotation added
11	cmbei_2003270	cmeUKMEL1_06080	Found: cpbgf_2003270	phosphoglucomutase/phosphomannomutase alpha/beta/alpha domain I family protein	OrthoFinder created two separate groups for this single gene (see above)
	cmbei_2003445	cmeUKMEL1_05990	Found	hypothetical protein, only 78 aa	No start methionine in C. parvum. Limited expression evidence in C. parvum
	cmbei_3002025	cmeUKMEL1_12210	Found cpbgf_3002017	Rpp14/Pop5 family protein	Exists as noncoding RNA in C. parvum IOWA BGF
	cmbei_4002857	cmeUKMEL1_09770	Found	hypothetical protein	Exists in C. parvum IOWA BGF, Unannotated
	cmbei_4003225	cmeUKMEL1_09950	Not found	hypothetical protein, only 71 aa	Gene cannot be found in C. parvum IOWA BGF syntenic region
	cmbei_400495	cmeUKMEL1_08280	Not found	hypothetical protein, only 78 aa	Exists in C. parvum IOWA BGF, Unannotated
	cmbei_500300	cmeUKMEL1_09795	Found	hypothetical protein	Exists in C. parvum IOWA BGF, Unannotated
	cmbei_110040	cmeUKMEL1_17820	Found	hypothetical protein	Exists in C. parvum IOWA BGF, Unannotated
	cmbei_6004835	cmeUKMEL1_05455	Found	hypothetical protein	Exists in C. parvum IOWA BGF, Unannotated
	cmbei_7001355	cmeUKMEL1_10685	Found	hypothetical protein, only 64 aa	There is a stop codon in the middle of the 3’ ORF
	cmbei_100030	cmeUKMEL1_09030	Found cpbgf_200445	putative integral membrane protein	Exists as long noncoding RNA in C. parvum IOWA BGF


aa = amino acids

While annotating the genome, we noticed several genes that were annotated as a single long transcript in CmUKMEL1 but as two distinct genes in CpBGF. Upon investigation, we found that the sizes of these gene annotations vary among Cryptosporidium species. In CmTU1867, each of the 20 cases described in Table 2 is annotated as a single long transcript, because it is unlikely that the annotation software would produce a complete, error-free ORF spanning the whole region by chance, and because the gene is annotated as a single gene in CmUKMEL1 and other Cryptosporidium spp. The lack of RNA-seq evidence for C. meleagridis made it challenging to validate whether these genes exist as single long genes in nature. In the CmTU1867 annotation file, we noted each case in which the gene is annotated as two or three distinct genes in other species (two of the 20 proteins are annotated as three proteins in CpBGF). Five C. meleagridis orthologs of C. parvum subtelomeric genes could not be found in the current assembly: cpbgf_1003890, cpbgf_1003900, cpbgf_1003950, and cpbgf_7005600, as well as cpbgf_200460, which was observed in CmUKMEL1 but not in CmTU1867 (Figure 3). These differences probably result from incomplete subtelomeric regions in the C. meleagridis TU1867 assembly, as we assembled few telomeres. However, it is also possible that these genes do not exist in C. meleagridis; resolving this will require a C. meleagridis T2T assembly.

Table 2.

Single large, annotated genes in CmTU1867 that are annotated as two or three distinct sequential genes in CpBGF and/or other Cryptosporidium spp.

Chr	Gene ID	Protein in CmTU1867	Gene IDs in CpBGF
1	cmbei_100150	Uncharacterized secreted protein (SKSR gene family)	cpbgf_100150, cpbgf_100160
1	cmbei_100730	Glutamine cyclotransferase domain containing protein	cpbgf_100730, cpbgf_100733
1	cmbei_1002800	Methyltransferase TRM13, MED7, Zinc finger domain-containing protein	cpbgf_1002800, cpbgf_1002810
2	cmbei_20010	SFI domain containing protein	cpbgf_20010, cpbgf_200470
3	cmbei_3002310	RNA recognition motif and AAA-type ATPase core domain containing protein	cpbgf_3002310, cpbgf_3002300, cpbgf_3002290
3	cmbei_3002700	Transport protein particle (TRAPP) domain containing protein	cpbgf_3002700, cpbgf_3002706
4	cmbei_4002100	PIG-A GPI anchor and glucosyltransferase domain containing protein	cpbgf_4002100, cpbgf_4002093
4	cmbei_4002180	Peptidase A1 and Dpy-19/Dpy19-like domain-containing protein	cpbgf_4002180, cpbgf_4002190
5	cmbei_500340	Signal peptide containing protein	cpbgf_500340, cpbgf_500350
5	cmbei_500470	Peptidase S9, prolyl oligopeptidase, catalytic domain containing protein	cpbgf_500470, cpbgf_500466
5	cmbei_5002280	Signal peptide and transmembrane domain containing protein	cpbgf_5002280, cpbgf_5002290
5	cmbei_5002830	Vacuolar protein sorting-associated protein 13 domain containing protein	cpbgf_5002830, cpbgf_5002840
5	cmbei_5004500	Vacuolar protein sorting-associated protein 13	cpbgf_5004500, cpbgf_5004490, cpbgf_5004480
5	cmbei_5003110	AAA+ ATPase and VWFA domain containing protein	cpbgf_5003110, cpbgf_5005540
6	cmbei_600540	Serine/threonine protein kinase domain containing protein	cpbgf_600540, cpbgf_600530
6	cmbei_6001250	Uncharacterized protein	cpbgf_6001250, cpbgf_6001260
6	cmbei_6002100	Uncharacterized protein	cpbgf_6002100, cpbgf_6002110
6	cmbei_6002140	Potassium channel domain containing protein	cpbgf_6002140, cpbgf_6002143
7	None
8	cmbei_800690	Signal peptide containing putative Formin J protein	cpbgf_800690, cpbgf_800680
8	cmbei_8002510	Putative cyclin dependent kinase	cpbgf_8002510, cpbgf_8002500
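The split-gene counts can be tallied directly from Table 2. This minimal sketch encodes the CmTU1867-to-CpBGF gene ID mapping as listed in the table and counts how many CmTU1867 genes correspond to two versus three CpBGF genes; it prints `20 {2: 18, 3: 2}`.

```python
from collections import Counter

# CmTU1867 gene ID -> CpBGF gene IDs, transcribed from Table 2.
TABLE2 = {
    "cmbei_100150": ["cpbgf_100150", "cpbgf_100160"],
    "cmbei_100730": ["cpbgf_100730", "cpbgf_100733"],
    "cmbei_1002800": ["cpbgf_1002800", "cpbgf_1002810"],
    "cmbei_20010": ["cpbgf_20010", "cpbgf_200470"],
    "cmbei_3002310": ["cpbgf_3002310", "cpbgf_3002300", "cpbgf_3002290"],
    "cmbei_3002700": ["cpbgf_3002700", "cpbgf_3002706"],
    "cmbei_4002100": ["cpbgf_4002100", "cpbgf_4002093"],
    "cmbei_4002180": ["cpbgf_4002180", "cpbgf_4002190"],
    "cmbei_500340": ["cpbgf_500340", "cpbgf_500350"],
    "cmbei_500470": ["cpbgf_500470", "cpbgf_500466"],
    "cmbei_5002280": ["cpbgf_5002280", "cpbgf_5002290"],
    "cmbei_5002830": ["cpbgf_5002830", "cpbgf_5002840"],
    "cmbei_5004500": ["cpbgf_5004500", "cpbgf_5004490", "cpbgf_5004480"],
    "cmbei_5003110": ["cpbgf_5003110", "cpbgf_5005540"],
    "cmbei_600540": ["cpbgf_600540", "cpbgf_600530"],
    "cmbei_6001250": ["cpbgf_6001250", "cpbgf_6001260"],
    "cmbei_6002100": ["cpbgf_6002100", "cpbgf_6002110"],
    "cmbei_6002140": ["cpbgf_6002140", "cpbgf_6002143"],
    "cmbei_800690": ["cpbgf_800690", "cpbgf_800680"],
    "cmbei_8002510": ["cpbgf_8002510", "cpbgf_8002500"],
}

# How many CmTU1867 genes are split into 2 vs. 3 CpBGF genes.
split_sizes = Counter(len(ids) for ids in TABLE2.values())
print(len(TABLE2), dict(sorted(split_sizes.items())))
```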


Methods

Whole Genome Sequencing and Assembly

C. meleagridis isolate TU1867 genomic DNA was obtained from BEI Resources (cat. no. NR-2521; ATCC, Manassas, VA). A total of 10 ng of C. meleagridis DNA was amplified by whole genome amplification using multiple displacement amplification (MDA), followed by T7 endonuclease debranching, yielding 400 ng of debranched DNA, following Agyabeng-Dadzie et al.¹³ (Figure 4). ONT library preparation was performed using the SQK-RBK004 Rapid Barcoding Sequencing Kit (Oxford Nanopore Technologies, Oxford, UK) according to the manufacturer’s instructions. Sequencing was performed on an ONT MinION device with R9.4.1 flow cells, and reads were basecalled with Guppy v6.4.2 using the high-accuracy basecalling model. The long reads (FASTQ) were assembled using Flye v2.8.2¹⁵ with the --nano-raw option and -g 9m. The draft long-read assembly was polished with Polypolish v0.5.0¹⁶ using default parameters and C. meleagridis strain TU1867 Illumina sequences generated by others (SRX253214) to increase base-call accuracy. Intermediate alignment files needed for Polypolish were generated using BWA v0.7.17¹⁷. The resulting contigs were ordered and oriented to match the reference CpBGF genome assembly using the AGAT¹⁸ v1.1.0 Perl script agat_sq_reverse_complement.pl and GenomeTools¹⁹. Contig orientation was determined using progressiveMauve alignments (Mauve v1.1.3²⁰) in Geneious Prime v2023.2.1²¹. Contamination was detected by searching the NCBI nr database with BLAST²² (blastx, default parameters) and with FCS-GX²³, and contaminant contigs were removed from further analysis. We searched the contigs for telomeres manually and, as for CpBGF²⁴, with the Python script FindTelomeres (https://github.com/JanaSperschneider/FindTelomeres), which locates the Cryptosporidium telomere repeat 5’-CCTAAA-3’ and its reverse complement (5’-TTTAGG-3’) at the ends of assembled contigs.
Unassembled ONT long reads were also searched for this telomere repeat with FindTelomeres, and reads containing telomeres were mapped back to the genome assembly using minimap2 (default parameters). Reads were mapped to the whole genome using minimap2 v2.26 with the option --secondary=no to suppress secondary alignments of multi-mapping reads. Genome statistics were generated with the GenomeTools v1.6.2¹⁹ programs gt stat and gt seqstat and with the AGAT v1.1.0¹⁸ Perl scripts agat_sq_stat_basic.pl and agat_sp_statistics.pl, using default parameters.
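The end-of-sequence telomere search can be sketched as follows. This is a simplified illustration of the idea behind FindTelomeres, not a reimplementation of that script: it assumes the C-rich repeat 5’-CCTAAA-3’ appears at the left (5’) end of a forward-strand sequence and its reverse complement 5’-TTTAGG-3’ at the right (3’) end, and it counts non-overlapping copies within a fixed window of each end. The window size and minimum copy number are hypothetical thresholds, and the example contig is made up.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T/N sequence."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def telomere_ends(seq: str, repeat: str = "CCTAAA", window: int = 120,
                  min_copies: int = 5) -> tuple[bool, bool]:
    """Report whether the left and right ends of `seq` look telomeric.

    Counts non-overlapping copies of `repeat` in the first `window` bases and
    of its reverse complement in the last `window` bases. The thresholds are
    illustrative assumptions, not FindTelomeres defaults.
    """
    seq = seq.upper()
    left_hit = seq[:window].count(repeat) >= min_copies
    right_hit = seq[-window:].count(reverse_complement(repeat)) >= min_copies
    return left_hit, right_hit


# Hypothetical contig with telomeric repeats at the left end only.
contig = "CCTAAA" * 8 + "ACGTTGCA" * 20
print(telomere_ends(contig))
```

The same function can be applied to individual long reads, which is how telomere-containing reads that failed to assemble could be flagged before mapping them back with minimap2.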

Figure 4. — Bioinformatics workflow for assembly and annotation of the DNA derived from CmTU1867 WGA. Green boxes represent main initial steps as well as new data used for parts of the pipeline and blue boxes represent subsequent downstream analyses of the data generated. Please refer to methods for additional details.

Genome Annotation

Tracks for manual annotation in a local Apollo2 server²⁵ were generated using two approaches: (1) orthology-based annotation transfer with Liftoff²⁶ and (2) ab initio gene prediction with AUGUSTUS²⁷, trained with C. parvum IOWA-ATCC and CmUKMEL1 protein sequences from CryptoDB release 50, with the -copies flag to look for extra gene copies and otherwise default parameters. Liftoff annotation tracks were created from the current CmUKMEL1, CpBGF, and CpIOWA-ATCC annotated genes with the -copies flag to look for extra gene copies. Where AUGUSTUS and Liftoff gene structures disagreed, the conflicting gene models were searched with BLASTp in CryptoDB to identify the gene structure most common among existing annotations. Because no RNA-seq data are available for C. meleagridis, gene predictions could not be confirmed experimentally, and UTRs were not annotated. Tracks for prediction and manual annotation of rRNAs were created using RNAmmer 1.2¹⁴ (-S euk -m lsu,ssu). tRNAscan-SE 2.0²⁸ was used to predict tRNAs with default parameters. Functional annotation was generated with Blast2GO²⁹ (blastp against the NCBI nr database, word size 5, e-value 1e-5) and compared with the functional annotation of the telomere-to-telomere CpBGF reference genome. Gene names in the CmTU1867 GFF file were edited with basic bash and awk commands.
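The gene-name edits were made with bash and awk; an equivalent edit can be sketched in Python as a rewrite of the ID and Parent attributes in column 9 of each GFF3 line. This is a minimal illustration, not the commands actually used: the prefixes in the example (`g` renamed to `cmbei_`) and the example records are hypothetical.

```python
def rename_gff_ids(line: str, old_prefix: str, new_prefix: str) -> str:
    """Replace old_prefix with new_prefix in the ID and Parent values of a GFF3 line.

    Comment/directive lines and lines without nine tab-separated columns are
    returned unchanged; only values that start with old_prefix are edited.
    """
    if line.startswith("#"):
        return line
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return line
    new_attrs = []
    for attr in fields[8].split(";"):
        key, sep, value = attr.partition("=")
        if key in ("ID", "Parent"):
            # Parent may hold several comma-separated IDs.
            value = ",".join(
                new_prefix + v[len(old_prefix):] if v.startswith(old_prefix) else v
                for v in value.split(",")
            )
        new_attrs.append(key + sep + value)
    fields[8] = ";".join(new_attrs)
    return "\t".join(fields) + ("\n" if line.endswith("\n") else "")


# Hypothetical AUGUSTUS-style gene record.
example = "chr4\tAUGUSTUS\tgene\t100\t400\t.\t+\t.\tID=g12;Name=g12\n"
print(rename_gff_ids(example, "g", "cmbei_"), end="")
```

Restricting the edit to ID and Parent leaves free-text attributes such as Name untouched, which avoids accidental renaming inside product descriptions.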

Comparative Genomics

Orthologous genes in the new C. meleagridis assembly, the previous C. meleagridis assembly³⁰, and CpBGF were compared using OrthoFinder v2.5.4³¹ with default parameters and visualized with OrthoVenn3³². Figure 3 shows the orthology results after extensive manual validation (Figure 5 and Table 3) of each orthogroup difference. Manual analyses used both NCBI BLASTp and CryptoDB³³ BLASTp. Orthology, genome, and rRNA comparison figures were created using Circos³⁴, TBtools³⁵, and JupiterPlot³⁶. rRNA clusters in CpBGF, CpIOWA-ATCC, and CmTU1867 were compared using RNAmmer¹⁴ with default parameters. CmTU1867 long reads were mapped back to contig regions containing 5S rRNA clusters using minimap2 with --secondary=no to account for multi-mapping reads.
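Assigning orthogroups to Venn regions reduces to recording which genomes have at least one gene in each orthogroup. A minimal sketch, assuming a tab-separated table with one row per orthogroup and one gene-count column per genome, similar in shape to OrthoFinder's per-orthogroup gene-count output; the column names and toy rows here are hypothetical.

```python
import csv
import io
from collections import Counter

GENOMES = ("CmTU1867", "CmUKMEL1", "CpBGF")


def venn_regions(tsv_text: str, genomes: tuple[str, ...] = GENOMES) -> Counter:
    """Count orthogroups per Venn region, keyed by the set of genomes present."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    regions = Counter()
    for row in reader:
        # A genome belongs to the orthogroup if it contributes >= 1 gene.
        present = frozenset(g for g in genomes if int(row[g]) > 0)
        regions[present] += 1
    return regions


# Hypothetical gene-count table with four orthogroups.
toy = (
    "Orthogroup\tCmTU1867\tCmUKMEL1\tCpBGF\tTotal\n"
    "OG0000001\t1\t1\t1\t3\n"
    "OG0000002\t1\t1\t1\t3\n"
    "OG0000003\t0\t0\t1\t1\n"
    "OG0000004\t1\t1\t0\t2\n"
)
for region, count in sorted(venn_regions(toy).items(), key=lambda kv: -kv[1]):
    print(" & ".join(sorted(region)), count)
```

Regions containing a single genome, such as CpBGF alone, are the ones that were examined manually in Table 3.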

Figure 5. — Orthogroup comparison among the new CmTU1867, the previous CmUKMEL1, and the newly released reference genome, CpBGF prior to validation and correction. Arrows point to orthogroups containing the indicated gene IDs.

Data Records

The genomic sequences, reads, and metadata for the Cryptosporidium meleagridis TU1867 strain have been deposited at NCBI under BioProject PRJNA1022047. The assembly was polished with reads from NCBI SRA run SRR793561. The fully assembled and annotated sequence was submitted to NCBI GenBank under submission ID SUB14212942 and will be released once processing is complete.

Technical Validation

CmTU1867 assembly completeness was evaluated using Benchmarking Universal Single-Copy Orthologs (BUSCO) v5.5.0³⁷ against the apicomplexan lineage dataset (apicomplexa_odb10), which contains 446 conserved single-copy orthologs. The overall completeness score was 96.6% (431 of 446 genes complete): 430 (96.4%) complete single-copy genes and 1 (0.224%) complete duplicated gene. These results indicate that the genome assembly is highly complete.
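The BUSCO percentages follow from counts over the 446 apicomplexa_odb10 genes, with complete BUSCOs equal to complete single-copy plus complete duplicated genes. A short arithmetic check:

```python
def busco_percentages(single: int, duplicated: int, total: int) -> dict[str, float]:
    """Percentages of complete, complete single-copy and complete duplicated BUSCOs."""
    complete = single + duplicated  # BUSCO "C" = "S" + "D"
    return {
        "complete": 100 * complete / total,
        "single": 100 * single / total,
        "duplicated": 100 * duplicated / total,
    }


pct = busco_percentages(single=430, duplicated=1, total=446)
print({key: round(value, 3) for key, value in pct.items()})
```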

Further analysis of the assembly and annotated protein-coding regions used an orthology comparison of CmTU1867, CmUKMEL1, and CpBGF with the OrthoFinder algorithm in OrthoVenn3 (Figure 5). Orthogroups found only in CmUKMEL1, only in CpBGF, only in CmUKMEL1 and CpBGF, and only in CmUKMEL1 and CmTU1867 were analyzed extensively (Table 3). Several genes found only in CpBGF were shown to lie in subtelomeric regions and are thus likely missing from the incomplete chromosome ends of CmTU1867. Several genes encoding short (<100 amino acid) proteins found in both CmTU1867 and CmUKMEL1 exist in CpBGF but are unannotated. Following these analyses, a new Venn diagram (Figure 3) was created to represent the revised, validated findings.

Acknowledgements

This work was funded by NIH R01AI14866 to JCK and TCG.

Footnotes

Competing interests

The authors declare no competing interests.

Code Availability

Pipelines and code involved in processing the data were executed by following the respective manuals of the bioinformatics software programs used. No custom scripts were generated in this study.

References

1. Putignani L. et al. Characterization of a mitochondrion-like organelle in Cryptosporidium parvum. Parasitology 129, 1–18 (2004).
2. Hlavsa M. C. et al. Outbreaks Associated with Treated Recreational Water - United States, 2000–2014. MMWR Morb Mortal Wkly Rep 67, 547–551 (2018). 10.15585/mmwr.mm6719a3
3. Kotloff K. L. et al. Burden and aetiology of diarrhoeal disease in infants and young children in developing countries (the Global Enteric Multicenter Study, GEMS): a prospective, case-control study. Lancet 382, 209–222 (2013). 10.1016/S0140-6736(13)60844-2
4. Girma M., Teshome W., Petros B. & Endeshaw T. Cryptosporidiosis and Isosporiasis among HIV-positive individuals in south Ethiopia: a cross sectional study. BMC Infect Dis 14, 100 (2014). 10.1186/1471-2334-14-100
5. MAL-ED Network Investigators. The MAL-ED study: a multinational and multidisciplinary approach to understand the relationship between enteric pathogens, malnutrition, gut physiology, physical growth, cognitive development, and immune responses in infants and children up to 2 years of age in resource-poor environments. Clin Infect Dis 59 Suppl 4, S193–206 (2014). 10.1093/cid/ciu653
6. Gilbert I. H. et al. Safe and effective treatments are needed for cryptosporidiosis, a truly neglected tropical disease. BMJ Glob Health 8 (2023). 10.1136/bmjgh-2023-012540
7. Akiyoshi D. E. et al. Characterization of Cryptosporidium meleagridis of human origin passaged through different host species. Infect Immun 71, 1828–1832 (2003). 10.1128/IAI.71.4.1828-1832.2003
8. Slavin D. Cryptosporidium meleagridis (sp. nov.). J Comp Pathol 65, 262–266 (1955). 10.1016/s0368-1742(55)80025-2
9. Fayer R. Taxonomy and species delimitation in Cryptosporidium. Exp Parasitol 124, 90–97 (2010). 10.1016/j.exppara.2009.03.005
10. Stensvold C. R., Beser J., Axen C. & Lebbad M. High applicability of a novel method for gp60-based subtyping of Cryptosporidium meleagridis. J Clin Microbiol 52, 2311–2319 (2014). 10.1128/JCM.00598-14
11. Cama V. A. et al. Cryptosporidium species and genotypes in HIV-positive patients in Lima, Peru. J Eukaryot Microbiol 50 Suppl, 531–533 (2003). 10.1111/j.1550-7408.2003.tb00620.x
12. Baptista R. P. et al. Long-read assembly and comparative evidence-based reanalysis of Cryptosporidium genome sequences reveal expanded transporter repertoire and duplication of entire chromosome ends including subtelomeric regions. Genome Res 32, 203–213 (2022). 10.1101/gr.275325.121
13. Agyabeng-Dadzie F. et al. Evaluating the benefits and limits of multiple displacement amplification with whole-genome Oxford Nanopore Sequencing. bioRxiv (2024). 10.1101/2024.02.09.579537
14. Lagesen K. et al. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res 35, 3100–3108 (2007). 10.1093/nar/gkm160
15. Kolmogorov M., Yuan J., Lin Y. & Pevzner P. A. Assembly of long, error-prone reads using repeat graphs. Nat Biotechnol 37, 540–546 (2019). 10.1038/s41587-019-0072-8
16. Wick R. R. & Holt K. E. Polypolish: Short-read polishing of long-read bacterial genome assemblies. PLoS Comput Biol 18, e1009802 (2022). 10.1371/journal.pcbi.1009802
17. Li H. & Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009). 10.1093/bioinformatics/btp324
18. Dainat J. AGAT: Another Gff Analysis Toolkit to handle annotations in any GTF/GFF format.
19. Gremme G., Steinbiss S. & Kurtz S. GenomeTools: a comprehensive software library for efficient processing of structured genome annotations. IEEE/ACM Trans Comput Biol Bioinform 10, 645–656 (2013). 10.1109/TCBB.2013.68
20. Darling A. E., Mau B. & Perna N. T. progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS One 5, e11147 (2010). 10.1371/journal.pone.0011147
21. Kearse M. et al. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28, 1647–1649 (2012). 10.1093/bioinformatics/bts199
22. Altschul S. F., Gish W., Miller W., Myers E. W. & Lipman D. J. Basic local alignment search tool. J Mol Biol 215, 403–410 (1990). 10.1016/S0022-2836(05)80360-2
23. Astashyn A. et al. Rapid and sensitive detection of genome contamination at scale with FCS-GX. bioRxiv (2023). 10.1101/2023.06.02.543519
24. Baptista R. P., Xiao R., Li Y., Glenn T. C. & Kissinger J. C. New T2T assembly of Cryptosporidium parvum IOWA annotated with reference genome gene identifiers. bioRxiv 2023.06.13.544219 (2023). 10.1101/2023.06.13.544219
25. Lee E. et al. Web Apollo: a web-based genomic annotation editing platform. Genome Biol 14, R93 (2013). 10.1186/gb-2013-14-8-r93
26. Shumate A. & Salzberg S. L. Liftoff: accurate mapping of gene annotations. Bioinformatics (2020). 10.1093/bioinformatics/btaa1016
27. Stanke M. & Morgenstern B. AUGUSTUS: a web server for gene prediction in eukaryotes that allows user-defined constraints. Nucleic Acids Res 33, W465–467 (2005). 10.1093/nar/gki458
28. Schattner P., Brooks A. N. & Lowe T. M. The tRNAscan-SE, snoscan and snoGPS web servers for the detection of tRNAs and snoRNAs. Nucleic Acids Res 33, W686–689 (2005). 10.1093/nar/gki366
29. Conesa A. et al. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21, 3674–3676 (2005). 10.1093/bioinformatics/bti610
30. Ifeonu O. O. et al. Annotated draft genome sequences of three species of Cryptosporidium: Cryptosporidium meleagridis isolate UKMEL1, C. baileyi isolate TAMU-09Q1 and C. hominis isolates TU502_2012 and UKH1. Pathog Dis 74 (2016). 10.1093/femspd/ftw080
31. Emms D. M. & Kelly S. OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol 16, 157 (2015). 10.1186/s13059-015-0721-2
32. Sun J. et al. OrthoVenn3: an integrated platform for exploring and visualizing orthologous data across genomes. Nucleic Acids Res 51, W397–W403 (2023). 10.1093/nar/gkad313
33. Warrenfeltz S., Kissinger J. C. & EuPathDB Team. Accessing Cryptosporidium Omic and Isolate Data via CryptoDB.org. Methods Mol Biol 2052, 139–192 (2020). 10.1007/978-1-4939-9748-0_10
34. Krzywinski M. et al. Circos: An information aesthetic for comparative genomics. Genome Res (2009). 10.1101/gr.092759.109
35. Chen C. et al. TBtools: An Integrative Toolkit Developed for Interactive Analyses of Big Biological Data. Mol Plant 13, 1194–1202 (2020). 10.1016/j.molp.2020.06.009
36. Chu J. JupiterPlot: A Circos-based tool to visualize genome assembly consistency (1.0). Zenodo (2018).
37. Hulsen T., Huynen M. A., de Vlieg J. & Groenen P. M. Benchmarking ortholog identification methods using functional genomics data. Genome Biol 7, R31 (2006).

Abstract

Background & Summary

Table 1.

Figure 1. DNA synteny plot of the eight chromosome-level contigs of CmTU1867 (left hemisphere) and CmUKMEL1 (right hemisphere).

Figure 2. Protein synteny analysis of the eight chromosome-level contigs of CmTU1867 (right hemisphere) and Cryptosporidium parvum, CpBGF (left hemisphere).

Figure 3. Venn diagram of ortholog search results following manual validation.

Table 3.

Table 2.

Methods

Whole Genome Sequencing and Assembly

Figure 4. Experimental workflow for genome sequencing, assembly, annotation, and analysis.

Genome Annotation

Comparative Genomics

Figure 5. Ortholog search results shown in a Venn diagram.

Data Records

Technical Validation

Acknowledgements

Footnotes

References
