Abstract
Cryptosporidium spp. are medically and scientifically relevant protozoan parasites that cause severe diarrheal illness in infants and immunosuppressed populations as well as in animals. Although most human Cryptosporidium infections are caused by C. parvum and C. hominis, several other species infect humans, including C. meleagridis, which is commonly observed in developing countries. Here, we polished and annotated a long-read genome sequence assembly for C. meleagridis strain TU1867; this species infects birds and humans. The genome sequence was generated using a combination of whole genome amplification (WGA) and long-read Oxford Nanopore Technologies sequencing. The assembly was then polished with Illumina data. The chromosome-level genome assembly is 9.2 Mbp with a contig N50 of 1.1 Mbp. Annotation revealed 3,923 protein-coding genes. A BUSCO analysis indicates a completeness of 96.6% (n=446), including 430 (96.4%) single-copy and 1 (0.224%) duplicated conserved apicomplexan genes. The new C. meleagridis genome assembly is nearly gap-free and provides a valuable new resource for the Cryptosporidium community and future studies on evolution and host-specificity.
Keywords: Whole genome amplification, Oxford Nanopore, functional annotation, comparative genomics
Background & Summary
Cryptosporidium is an apicomplexan protozoan parasite of global medical, scientific, and veterinary significance that can cause moderate-to-severe diarrhea in humans and animals1. It is the leading cause of waterborne disease outbreaks in the US2,3. Though cryptosporidiosis causes illness in both immunocompromised and immunocompetent individuals, it is especially severe in immunocompromised and elderly populations as well as in children, resulting in persistent infection, malnutrition, and, in some cases, death3–5. In 2019, the Global Burden of Disease study found 133,422 global deaths and an annual loss of 8.2 million disability-adjusted life years (DALYs) due to Cryptosporidium6. C. meleagridis is an avian and mammalian-infecting Cryptosporidium species that was first described in turkeys7,8. Human Cryptosporidium infections are caused predominantly by C. parvum and C. hominis, but species such as C. meleagridis can also infect humans. In fact, C. meleagridis is the third most common human-infecting Cryptosporidium species following C. parvum and C. hominis9. Though generally less common, C. meleagridis infection has been reported to be as common as C. parvum in some parts of the world and can lead to death in rare cases10,11.
At this time, 15 of the >30 reported Cryptosporidium species have assembled genome sequences. However, only 8 have been annotated, including C. parvum, C. hominis, C. tyzzeri, and C. meleagridis12. The latest release of the C. meleagridis genome strain UKMEL1 (CmUKMEL1) contains gaps and is assembled into 57 contigs. Due to a highly compact genome, it is challenging to sequence the genome of Cryptosporidium parasites from a single individual. Since cloning of Cryptosporidium parasites is not possible, sequencing a small pool of individuals is preferred over bulk sequencing to reduce heterozygosity. Recently, a new method was implemented to generate DNA sequences from Cryptosporidium using a whole genome amplification (WGA) approach. It was tested on C. meleagridis strain TU1867 (CmTU1867) and provided sufficient DNA for library construction and generation of a high-quality genome through long-read sequencing13. Here we share a chromosome-level assembly of the C. meleagridis genome. The new CmTU1867 genome assembly is 205,285 base pairs (bp) longer than that of CmUKMEL1 (Table 1). The largest contig in the new assembly is 632,735 bp longer than the largest contig in CmUKMEL1. The N50 value of the new CmTU1867 assembly, 1,105,563 bp, is also larger than that of CmUKMEL1, 322,908 bp (Table 1). The new CmTU1867 assembly and annotation provide a valuable resource to the Cryptosporidium community. The high-quality C. meleagridis genome results from a new experimental approach designed to help generate whole genome sequences from limiting amounts of genomic DNA and is an important resource that will contribute to our understanding of Cryptosporidium evolution and host specificity.
Table 1.
Statistics of the C. meleagridis and CpBGF genome assemblies and annotation
Statistics | CmTU1867 | CmUKMEL1 | CpBGF |
---|---|---|---|
# of contigs | 13 | 57 | 8 |
Largest contig (bp) | 1365597 | 732862 | 1379130 |
# T2T chromosomes | 0 | 0 | 8 |
Total length (bp) | 9178485 | 8973200 | 9222738 |
N50 (bp) | 1105563 | 322908 | 1108772 |
GC (%) | 30.9 | 31.0 | 30.1 |
# N’s per 100 kbp | 0.00 | 0.00 | 0.00 |
# of telomeres identified | 1 | 10 | 16 |
# rRNA | 17 | 8 | 15 |
# tRNA | 45 | 45 | 45 |
# protein-coding genes | 3923 | 3753 | 3932 |
Average gene length (bp) | 1824 | 1885 | 2145 |
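For readers less familiar with the contiguity statistic reported in Table 1, N50 is the contig length at which contigs of that length or longer hold at least half of the assembled bases. The following minimal Python sketch illustrates the calculation. The function name and the toy contig lengths are hypothetical and are not the CmTU1867 contig lengths; the values in Table 1 were produced with the tools named in the Methods.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (bp).

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy lengths for illustration only (not real contigs).
toy_contigs = [100, 80, 60, 40, 20]
toy_n50 = n50(toy_contigs)
```

Because N50 depends on how the bases are distributed among contigs, two assemblies of similar total length can have very different N50 values when one is far more fragmented.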
The initial genome assembly contained 13 contigs: 8 chromosomes and 5 smaller contigs (681–30,300 bp), 2 of which were later identified as contamination and removed (Figure 1). Two additional contigs (“contig_10” and “contig_11”) were created manually from the beginnings of chromosome 2 and chromosome 6 after an assembly artifact was detected in these chromosomes. Thus, the final assembly contains 13 contigs: 8 chromosomes and contigs 9–13. Contig_9 and contig_13 have regions identical to regions of chromosome 1 and chromosome 3, respectively, but assembled separately from the full chromosomes (Table 1). One drawback of the new assembly is its lack of telomeres in comparison to CmUKMEL1. We were only able to detect 1 telomere in CmTU1867, on chromosome 5. Searching through the long reads, we found several reads containing telomeres that did not assemble. Although these reads did not assemble, the regions of each read that did not contain the telomere pattern matched the assembly. Upon mapping these reads back to the assembly, we identified three additional telomere locations that could be placed manually (beginning and end of chromosome 3 and beginning of chromosome 4). At least 4 telomere-containing long reads mapped to each of these regions, with at least 1 long (>1 kb) read that extended into unique regions of the chromosome. However, due to relatively low read support for these telomeres, we did not extend the ends of chromosomes in the assembly with these telomere-containing reads.
Figure 1. DNA synteny plot of the eight chromosome-level contigs of CmTU1867 (left hemisphere) and CmUKMEL1 (right hemisphere).
JupiterPlot comparison of the previous CmUKMEL1 genome sequence and the new CmTU1867 genome sequence. Ribbons are colored with respect to the reference genome (CmTU1867).
The new CmTU1867 genome sequence has 9 additional ribosomal RNA genes compared to CmUKMEL1 (17 versus 8; Table 1). The 16 rRNA genes (excluding the 5.8S) are in clusters of 2–3 and are found on chromosomes 1, 2, 3, 7, and 8 (Figure 2). We noticed that what RNAmmer14 detected as 5.8S rRNAs in CmTU1867 clustered separately from the 18S and the 28S rRNAs, with the exception of one 5.8S (ID=cmbei_2001394) located on chromosome 2 adjacent to a 28S rRNA. In most apicomplexans, the 5.8S rRNA is annotated between the 18S and 28S rRNAs. Additional searches revealed that all but the 5.8S on chromosome 2 were 5S rRNAs. The six 5S rRNAs in CmTU1867 are in 2 clusters of 3, one on chromosome 3 and the other on chromosome 7 (Figure 2). In CpBGF the cluster of 5S rRNAs on chromosome 3 contains 2 rRNAs, whereas in CpIOWA-ATCC and CmTU1867 the cluster of 5S rRNAs on chromosome 3 contains 3 rRNAs. These patterns may arise from variation in the copy number of the 5S rRNA within a population of parasites or among different species of Cryptosporidium, or from compressions during genome assembly. When CmTU1867 reads were mapped to the assembly at regions where there are 5S rRNA clusters on chromosomes 3 and 7, we saw relatively even coverage throughout the region. However, when we did this test with the CpBGF reads and genome sequence, we found a 2–3x read compression at precisely the 5S regions on chromosomes 3 and 7.
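The coverage test described above can be illustrated with a small Python sketch. A collapsed (compressed) repeat typically shows read depth that is a multiple of the depth in its flanks, whereas a correctly resolved cluster shows roughly even depth. The sketch is illustrative only and is not the procedure used in this study; the function name, the flank size, and the toy depth values are assumptions made for the example.

```python
from statistics import median


def fold_coverage(depth, start, end, flank=1000):
    """Ratio of median read depth inside [start, end) to the median
    depth of the flanking windows on either side.

    `depth` is a per-base list of read depths for one contig.
    A ratio near 1 suggests even coverage; a ratio near 2 or 3 is
    what a collapsed 2- or 3-copy repeat might look like.
    """
    inside = depth[start:end]
    flanks = depth[max(0, start - flank):start] + depth[end:end + flank]
    if not inside or not flanks:
        raise ValueError("region or flanks are empty")
    background = median(flanks)
    if background == 0:
        raise ValueError("flanking depth is zero")
    return median(inside) / background


# Hypothetical per-base depths: a 20 bp region at 75x inside 30x flanks.
toy_depth = [30] * 50 + [75] * 20 + [30] * 50
toy_ratio = fold_coverage(toy_depth, 50, 70, flank=50)
```

Medians are used rather than means so that a few spuriously deep positions do not dominate the ratio.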
Figure 2. Protein synteny analysis of the eight chromosome-level contigs of CmTU1867 (right hemisphere) and Cryptosporidium parvum, CpBGF (left hemisphere).
Circos plot rings, moving from the center to the exterior illustrate shared ortholog clusters between CmTU1867 and CpBGF, number of base pairs in 50,000 bp increments, GC content histogram, and gene density. Locations of rRNA genes are as indicated.
The annotation was validated by comparing the CmTU1867, CpBGF, and CmUKMEL1 protein-coding sequences using orthology-based algorithms. We initially found a group of 5 endonucleases in CmTU1867 that was not in CpBGF or CmUKMEL1. In a blastp search, the members of this group had no significant hit to any Cryptosporidium species except C. ubiquitum and C. felis. However, when we ran tblastn of this gene family against annotated transcripts in CryptoDB, we found hits to C. parvum, C. hominis, C. tyzzeri, C. ryanae, and CmUKMEL1. The members of the gene family have been annotated as 18S ribosomal or non-coding RNAs in C. parvum, C. hominis, C. tyzzeri, and C. ryanae, as a 5S rRNA and ncRNA in CmUKMEL1, and as protein-coding genes in C. ubiquitum. Upon manual inspection we found that the genomic sequence for these proteins exists in the C. parvum, C. hominis, and C. tyzzeri genomes in varying copy number. We additionally found one putative open reading frame (ORF) predicted by AUGUSTUS in CmTU1867 Chr4 region 828518–828700 bp that was not in CmUKMEL1, CpBGF, or any other species according to blastp and blastn searches. We removed this putative gene from our annotations since we could not validate it with RNAseq data or evidence in any other species, but it may be a unique C. meleagridis gene detected by the improved assembly.
Some of the orthogroups initially detected by OrthoVenn3 fell at the ends of chromosomes in C. parvum that extended beyond the ends of the CmTU1867 and CmUKMEL1 chromosomes. In other cases, they were unannotated in one species or the other but present in the genome sequence. When we found unannotated proteins in CmTU1867 that were not initially detected by Liftoff or AUGUSTUS, we manually added these annotations. Ultimately, we found very few orthogroups that were unique to a species (Figure 3). A description of their investigation can be found in Table 3.
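To make the Venn-style comparison concrete, the short Python sketch below tallies orthogroups by the set of genomes that contribute at least one gene, which is the quantity summarized in each region of a Venn diagram. It is only an illustration: the data structure, the orthogroup names, and the gene IDs are hypothetical, and the actual analysis used OrthoFinder and OrthoVenn3 followed by manual validation.

```python
from collections import Counter


def venn_counts(orthogroups):
    """Count orthogroups by the set of genomes present in each one.

    `orthogroups` maps an orthogroup name to a dict of
    {genome name: list of gene IDs}. Genomes with an empty list
    are treated as absent from that orthogroup.
    """
    counts = Counter()
    for genes_by_genome in orthogroups.values():
        present = frozenset(
            genome for genome, genes in genes_by_genome.items() if genes
        )
        if present:
            counts[present] += 1
    return counts


# Hypothetical orthogroups and gene IDs, for illustration only.
toy_orthogroups = {
    "OG_a": {"CmTU1867": ["g1"], "CmUKMEL1": ["g2"], "CpBGF": ["g3"]},
    "OG_b": {"CmTU1867": ["g4", "g5"], "CmUKMEL1": ["g6"], "CpBGF": ["g7"]},
    "OG_c": {"CmTU1867": [], "CmUKMEL1": [], "CpBGF": ["g8"]},
    "OG_d": {"CmTU1867": ["g9"], "CmUKMEL1": ["g10"], "CpBGF": []},
}
toy_counts = venn_counts(toy_orthogroups)
```

Orthogroups that fall outside the three-way intersection, such as the toy `OG_c` and `OG_d`, are the ones that would be flagged for manual inspection.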
Figure 3. Venn diagram of ortholog search results following manual validation.
Orthogroup comparison among the new CmTU1867, the previous CmUKMEL1, and the newly released reference genome, CpBGF. See Figure 5 for the pre-validation results. Arrows link gene IDs to their orthogroups.
Table 3.
Manual validation of orthogroups not present in all species examined in Figure 5.
| Venn Group | CmBEI | CmUKMEL1 | CpBGF | Gene Product | Findings |
|---|---|---|---|---|---|
| 3 | | | cpbgf_7005600 | GMP synthase | Subtelomeric – cannot conclude if in syntenic regions of CmBEI or CmUKMEL1 |
| | | | cpbgf_1003900 | GMP synthase | Subtelomeric in C. parvum, as above |
| | | | cpbgf_1003890 | GMP synthase | Subtelomeric in C. parvum, as above |
| | Found cmbei_2003270 | Found cmeUKMEL1_06080 | cpbgf_2003270 | phosphoglucomutase | Orthofinder created two separate groups for this single gene (see below) |
| | | | cpbgf_1003950 | tryptophan synthase beta chain | Subtelomeric in C. parvum – cannot conclude if in syntenic regions of CmBEI or CmUKMEL1 |
| | | cmeUKMEL1_09025 | cpbgf_200460 | unspecified product | Subtelomeric in C. parvum, as above |
| 1 | Found cmbei_7005006 | cmeUKMEL1_03235 | Not found | hypothetical protein | Exists in CmBEI – annotation added |
| 11 | | | | | |
| | cmbei_2003270 | cmeUKMEL1_06080 | Found: cpbgf_2003270 | phosphoglucomutase/phosphomannomutase alpha/beta/alpha domain I family protein | Orthofinder created two separate groups for this single gene (see above) |
| | cmbei_2003445 | cmeUKMEL1_05990 | Found | hypothetical protein, only 78 aa | No start methionine in C. parvum. Limited expression evidence in C. parvum |
| | cmbei_3002025 | cmeUKMEL1_12210 | Found cpbgf_3002017 | Rpp14/Pop5 family protein | Exists as noncoding RNA in C. parvum IOWA BGF |
| | cmbei_4002857 | cmeUKMEL1_09770 | Found | hypothetical protein | Exists in C. parvum IOWA BGF, Unannotated |
| | cmbei_4003225 | cmeUKMEL1_09950 | Not found | hypothetical protein, only 71 aa | Gene cannot be found in C. parvum IOWA BGF syntenic region |
| | cmbei_400495 | cmeUKMEL1_08280 | Not found | hypothetical protein, only 78 aa | Exists in C. parvum IOWA BGF, Unannotated |
| | cmbei_500300 | cmeUKMEL1_09795 | Found | hypothetical protein | Exists in C. parvum IOWA BGF, Unannotated |
| | cmbei_110040 | cmeUKMEL1_17820 | Found | hypothetical protein | Exists in C. parvum IOWA BGF, Unannotated |
| | cmbei_6004835 | cmeUKMEL1_05455 | Found | hypothetical protein | Exists in C. parvum IOWA BGF, Unannotated |
| | cmbei_7001355 | cmeUKMEL1_10685 | Found | hypothetical protein, only 64 aa | There is a stop codon in the middle of the 3’ ORF |
| | cmbei_100030 | cmeUKMEL1_09030 | Found cpbgf_200445 | putative integral membrane protein | Exists as long noncoding RNA in C. parvum IOWA BGF |
aa = amino acids
While annotating the genome, we noticed several genes that were annotated as a single long transcript in CmUKMEL1 but as two distinct genes in CpBGF. Upon investigation, we discovered that these gene annotations vary in size in several Cryptosporidium species. In CmTU1867, the protein is annotated as one single long transcript for the 20 cases described in Table 2, because an uninterrupted ORF spanning the whole region is unlikely to result from an error in the annotation software and because it is annotated as a single gene in CmUKMEL1 and other Cryptosporidium spp. A lack of RNAseq evidence for C. meleagridis made it challenging to validate whether these genes exist as a single long gene in nature. In the annotation file for CmTU1867, we noted where a gene is annotated as two or three distinct genes in other species (two of the 20 proteins are annotated as 3 proteins in CpBGF). Five C. meleagridis orthologs of C. parvum sub-telomeric genes could not be found in the current assembly: cpbgf_1003890, cpbgf_1003900, cpbgf_1003950, and cpbgf_7005600, as well as cpbgf_200460, which was observed in CmUKMEL1 but not CmTU1867 (Figure 3). These gene differences probably result from the incomplete C. meleagridis TU1867 sub-telomeric regions, as we did not assemble many telomeres. However, it is also possible that those genes do not exist in C. meleagridis. This determination will require a C. meleagridis T2T assembly.
Table 2.
Single large, annotated genes in CmTU1867 that are annotated as two or three distinct sequential genes in CpBGF and/or other Cryptosporidium spp.
Chr | Gene ID | Protein in CmTU1867 | Gene IDs in CpBGF |
---|---|---|---|
1 | cmbei_100150 | Uncharacterized secreted protein (SKSR gene family) | cpbgf_100150, cpbgf_100160 |
1 | cmbei_100730 | Glutamine cyclotransferase domain containing protein | cpbgf_100730, cpbgf_100733 |
1 | cmbei_1002800 | Methyltransferase TRM13, MED7, Zinc finger domain-containing protein | cpbgf_1002800, cpbgf_1002810 |
2 | cmbei_20010 | SFI domain containing protein | cpbgf_20010, cpbgf_200470 |
3 | cmbei_3002310 | RNA recognition motif and AAA-type ATPase core domain containing protein | cpbgf_3002310, cpbgf_3002300, cpbgf_3002290 |
3 | cmbei_3002700 | Transport protein particle (TRAPP) domain containing protein | cpbgf_3002700, cpbgf_3002706 |
4 | cmbei_4002100 | PIG-A GPI anchor and glucosyltransferase domain containing protein | cpbgf_4002100, cpbgf_4002093 |
4 | cmbei_4002180 | Peptidase A1 and Dpy-19/Dpy19-like domain-containing protein | cpbgf_4002180, cpbgf_4002190 |
5 | cmbei_500340 | Signal peptide containing protein | cpbgf_500340, cpbgf_500350 |
5 | cmbei_500470 | Peptidase S9, prolyl oligopeptidase, catalytic domain containing protein | cpbgf_500470, cpbgf_500466 |
5 | cmbei_5002280 | Signal peptide and transmembrane domain containing protein | cpbgf_5002280, cpbgf_5002290 |
5 | cmbei_5002830 | Vacuolar protein sorting-associated protein 13 domain containing protein | cpbgf_5002830, cpbgf_5002840 |
5 | cmbei_5004500 | Vacuolar protein sorting-associated protein 13 | cpbgf_5004500, cpbgf_5004490, cpbgf_5004480 |
5 | cmbei_5003110 | AAA+ ATPase and VWFA domain containing protein | cpbgf_5003110, cpbgf_5005540 |
6 | cmbei_600540 | Serine/threonine protein kinase domain containing protein | cpbgf_600540, cpbgf_600530 |
6 | cmbei_6001250 | Uncharacterized protein | cpbgf_6001250, cpbgf_6001260 |
6 | cmbei_6002100 | Uncharacterized protein | cpbgf_6002100, cpbgf_6002110 |
6 | cmbei_6002140 | Potassium channel domain containing protein | cpbgf_6002140, cpbgf_6002143 |
7 | None | | |
8 | cmbei_800690 | Signal peptide containing putative Formin J protein | cpbgf_800690, cpbgf_800680 |
8 | cmbei_8002510 | Putative cyclin dependent kinase | cpbgf_8002510, cpbgf_8002500 |
Methods
Whole Genome Sequencing and Assembly
C. meleagridis isolate TU1867 genomic DNA was obtained from BEI Resources (cat. number NR-2521; ATCC, Manassas, VA). A total of 10 ng of C. meleagridis DNA was amplified through whole genome amplification using multiple displacement amplification (MDA), followed by T7 endonuclease debranching, yielding 400 ng of debranched DNA as described previously13 (Figure 4). ONT library preparation was performed using the SQK-RBK004 Rapid Barcoding Sequencing Kit (Oxford Nanopore Technologies, Oxford, UK) as per the manufacturer’s instructions. Sequencing was performed on an ONT MinION device with R9.4.1 flow cells, and reads were base called with Guppy v.6.4.2 using the high-accuracy base call model. The long-read fastq reads were assembled using Flye15 v.2.8.2 with the --nano-raw option and -g 9m. To increase the accuracy of the base calls, the draft long-read genome assembly was polished with Polypolish16 v.0.5.0 using default parameters and C. meleagridis strain TU1867 Illumina sequences (SRX253214) generated by others. Intermediate files needed for Polypolish were generated using BWA17 v.0.7.17. The resulting contigs were ordered and oriented to match the reference CpBGF genome assembly using the AGAT18 v.1.1.0 Perl script agat_sq_reverse_complement.pl and GenomeTools19. Contig orientation was checked using the progressiveMauve20 alignment v.1.1.3 in Geneious Prime21 v.2023.2.1. Contamination was detected by searching the NCBI nr database using BLAST22 (blastx, default parameters) and with FCS-GX23. Contaminant contigs were removed from further analysis. We manually searched the contigs for telomeres. Telomeres were also identified, as in CpBGF24, using the telomere-locating Python script FindTelomeres to find the Cryptosporidium telomere repeat 5’-CCTAAA-3’ and its complement at the ends of assembled contigs (https://github.com/JanaSperschneider/FindTelomeres).
The unassembled ONT long reads were also searched for this telomere repeat with FindTelomeres, and reads with telomeres were mapped back to the genome assembly using minimap2 (default parameters). Read mapping to the whole genome was done using minimap2 v.2.26 with the option --secondary=no to prevent multi-mapping. Genome statistics were generated using the GenomeTools19 v.1.6.2 programs gt stat and gt seqstat. The AGAT18 v.1.1.0 Perl scripts agat_sq_stat_basic.pl and agat_sp_statistics.pl were used to generate statistical information with default parameters.
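As an illustration of the idea behind the telomere search, the following Python sketch checks whether either end of a sequence carries tandem copies of the CCTAAA repeat or its reverse complement (TTTAGG). It is a simplified stand-in and not the FindTelomeres implementation; the function names, the window size, and the minimum repeat count are assumptions chosen for the example.

```python
import re

TELOMERE_REPEAT = "CCTAAA"  # Cryptosporidium telomere repeat (5'->3')
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def telomere_ends(seq, window=200, min_repeats=3):
    """Report whether the first and last `window` bases of `seq` contain
    at least `min_repeats` tandem copies of the repeat on either strand.
    """
    seq = seq.upper()
    motifs = (TELOMERE_REPEAT, revcomp(TELOMERE_REPEAT))
    # e.g. (?:CCTAAA){3,}|(?:TTTAGG){3,}
    pattern = re.compile(
        "|".join("(?:%s){%d,}" % (motif, min_repeats) for motif in motifs)
    )
    return {
        "start": bool(pattern.search(seq[:window])),
        "end": bool(pattern.search(seq[-window:])),
    }


# Hypothetical toy sequence: telomere repeats at the start only.
toy_contig = "CCTAAA" * 5 + "ACGT" * 100
toy_result = telomere_ends(toy_contig)
```

The same check can be applied to individual long reads, which is conceptually how telomere-bearing reads that failed to assemble could be picked out before mapping them back to the assembly.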
Figure 4. Experimental workflow for genome sequencing, assembly, annotation, and analysis.
Bioinformatics workflow for assembly and annotation of the DNA derived from CmTU1867 WGA. Green boxes represent main initial steps as well as new data used for parts of the pipeline and blue boxes represent subsequent downstream analyses of the data generated. Please refer to methods for additional details.
Genome Annotation
Tracks for manual annotation were generated using a local Apollo2 server25 using two approaches: (1) an orthology-based annotation transfer using the tool Liftoff26 and (2) an ab initio gene prediction using AUGUSTUS27 trained with C. parvum IOWA-ATCC and CmUKMEL1 protein sequences from CryptoDB v.50 with the -copies flag to look for extra gene copies and otherwise default parameters. Annotation Liftoff tracks were created from the current CmUKMEL1, CpBGF, and CpIOWA-ATCC annotated genes with the -copies flag to look for extra gene copies. In situations where AUGUSTUS and Liftoff gene structures disagreed, the conflicting gene models were searched with BLASTp in CryptoDB to identify the gene structure that was most abundant in existing annotations. Because no RNA-seq data are available for C. meleagridis, gene predictions could not be confirmed with expression evidence and UTRs are not annotated. Tracks for prediction and manual annotation of rRNAs were created using RNAmmer14 v.1.2 (-S euk, -m lsu,ssu). tRNAscan-SE28 v.2.0 was used to predict tRNAs using default parameters. Functional annotation was generated with Blast2GO29 (using blastp, the nr database, word size 5, and e-value 1e-5) and compared with results from the reference telomere-to-telomere CpBGF genome functional annotation. Edits to the CmTU1867 gff file gene names were performed with basic bash and awk commands.
Comparative Genomics
A comparison of orthologous genes between the new C. meleagridis assembly and the previous C. meleagridis assembly30 was completed using OrthoFinder31 v2.5.4 with default parameters and visualized using OrthoVenn332. Figure 3 represents the orthology results following extensive manual validation (Figure 5 and Table 3) of each orthogroup difference. Manual analyses utilized both NCBI BLASTp and CryptoDB33 BLASTp. Orthology, genome, and rRNA comparisons were created using Circos34, TBtools35, and JupiterPlot36. Comparisons of rRNA clusters in CpBGF, CpIOWA-ATCC, and CmTU1867 were performed using RNAmmer14 with default parameters. CmTU1867 long reads were mapped back to contig regions containing 5S rRNA clusters using minimap2 with --secondary=no to prevent multi-mapping.
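For readers who wish to reproduce the read-mapping step, a minimal Python sketch of how such a minimap2 call might be assembled is shown below. Only the `--secondary=no` option is taken from the text; the `map-ont` preset, SAM output (`-a`), the thread count, and all file names are assumptions for illustration and were not necessarily the settings used here.

```python
import subprocess


def minimap2_command(reference, reads, preset="map-ont", threads=4):
    """Build a minimap2 command line that suppresses secondary alignments.

    The preset and thread count are illustrative assumptions; only
    --secondary=no is stated in the Methods.
    """
    return [
        "minimap2",
        "-a",               # write alignments in SAM format
        "-x", preset,       # assumed preset for Nanopore reads
        "--secondary=no",   # do not report secondary alignments
        "-t", str(threads),
        reference,
        reads,
    ]


def run_mapping(reference, reads, out_sam):
    """Run minimap2 (must be installed and on PATH), writing SAM to out_sam."""
    with open(out_sam, "w") as handle:
        subprocess.run(
            minimap2_command(reference, reads), stdout=handle, check=True
        )


# Hypothetical file names, shown without executing minimap2.
example_cmd = minimap2_command("assembly.fasta", "reads.fastq")
```

Suppressing secondary alignments means each read contributes primary (and any supplementary) alignments only, so depth over repeated regions such as the 5S rRNA clusters is not inflated by the same read being reported at several copies.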
Figure 5. Ortholog search results shown in a Venn diagram.
Orthogroup comparison among the new CmTU1867, the previous CmUKMEL1, and the newly released reference genome, CpBGF prior to validation and correction. Arrows point to orthogroups containing the indicated gene IDs.
Data Records
The genomic sequences, reads, and metadata for the Cryptosporidium meleagridis TU1867 strain have been deposited in the NCBI GenBank under BioProject PRJNA1022047. The sequence was polished with reads from NCBI GenBank SRA SRR793561. The fully assembled and annotated sequence was submitted to the NCBI GenBank under submission ID SUB14212942 and will be released as soon as it completes processing.
Technical Validation
CmTU1867 assembly completeness was evaluated using the Benchmarking Universal Single-Copy Orthologs (BUSCO) software37 v.5.5.0 to search against the apicomplexan database (apicomplexa_odb10), which contains 446 orthologous single-copy genes in total. The results showed an overall completeness score of 96.6% (n=446), comprising 430 (96.4%) single-copy genes and 1 (0.224%) duplicated gene. These results indicate high completeness of the genome assembly.
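The percentages above can be checked with a few lines of Python. The sketch assumes that the reported completeness is the sum of complete single-copy and complete duplicated genes, as in the Abstract; the function name is hypothetical and the counts are those reported in the text.

```python
def busco_percentages(single_copy, duplicated, total):
    """Return BUSCO-style percentages from raw gene counts.

    Completeness is taken as single-copy plus duplicated genes
    divided by the number of genes in the lineage dataset.
    """
    complete = single_copy + duplicated
    return {
        "complete": 100.0 * complete / total,
        "single_copy": 100.0 * single_copy / total,
        "duplicated": 100.0 * duplicated / total,
    }


# Counts reported for CmTU1867 against apicomplexa_odb10 (n=446).
cm_busco = busco_percentages(single_copy=430, duplicated=1, total=446)
rounded = {
    "complete": round(cm_busco["complete"], 1),
    "single_copy": round(cm_busco["single_copy"], 1),
    "duplicated": round(cm_busco["duplicated"], 3),
}
```

With these counts, 431 of 446 genes are complete, which corresponds to the 96.6% completeness score.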
Further analysis of the assembly and annotated protein-encoding regions utilized an orthology comparison of CmTU1867, CmUKMEL1, and CpBGF with the OrthoFinder algorithm in OrthoVenn3 (Figure 5). Orthogroups belonging to CmUKMEL1 only, CpBGF only, CmUKMEL1 and CpBGF only, and CmUKMEL1 and CmTU1867 only were extensively analyzed (Table 3; CmTU1867 is labeled CmBEI in the table). Several genes found only in CpBGF are subtelomeric in C. parvum (Table 3) and thus are likely missing from the incomplete chromosome ends of CmTU1867. Several genes encoding short (<100 amino acid) proteins found in both CmTU1867 and CmUKMEL1 exist in CpBGF but are unannotated. Following these analyses, a new Venn diagram (Figure 3) was created that represents the revised, validated findings.
Acknowledgements
This work was funded by NIH R01AI14866 to JCK and TCG.
Footnotes
Competing interests
The authors declare no competing interests.
Code Availability
Pipelines and code involved in processing the data were executed by following the respective manuals of the bioinformatics software programs used. No custom scripts were generated in this study.
References
- 1. Putignani L. et al. Characterization of a mitochondrion-like organelle in Cryptosporidium parvum. Parasitology 129, 1–18 (2004).
- 2. Hlavsa M. C. et al. Outbreaks Associated with Treated Recreational Water - United States, 2000–2014. MMWR Morb Mortal Wkly Rep 67, 547–551 (2018). 10.15585/mmwr.mm6719a3
- 3. Kotloff K. L. et al. Burden and aetiology of diarrhoeal disease in infants and young children in developing countries (the Global Enteric Multicenter Study, GEMS): a prospective, case-control study. Lancet 382, 209–222 (2013). 10.1016/S0140-6736(13)60844-2
- 4. Girma M., Teshome W., Petros B. & Endeshaw T. Cryptosporidiosis and Isosporiasis among HIV-positive individuals in south Ethiopia: a cross sectional study. BMC Infect Dis 14, 100 (2014). 10.1186/1471-2334-14-100
- 5. MAL-ED Network Investigators. The MAL-ED study: a multinational and multidisciplinary approach to understand the relationship between enteric pathogens, malnutrition, gut physiology, physical growth, cognitive development, and immune responses in infants and children up to 2 years of age in resource-poor environments. Clin Infect Dis 59 Suppl 4, S193–206 (2014). 10.1093/cid/ciu653
- 6. Gilbert I. H. et al. Safe and effective treatments are needed for cryptosporidiosis, a truly neglected tropical disease. BMJ Glob Health 8 (2023). 10.1136/bmjgh-2023-012540
- 7. Akiyoshi D. E. et al. Characterization of Cryptosporidium meleagridis of human origin passaged through different host species. Infect Immun 71, 1828–1832 (2003). 10.1128/IAI.71.4.1828-1832.2003
- 8. Slavin D. Cryptosporidium meleagridis (sp. nov.). J Comp Pathol 65, 262–266 (1955). 10.1016/s0368-1742(55)80025-2
- 9. Fayer R. Taxonomy and species delimitation in Cryptosporidium. Exp Parasitol 124, 90–97 (2010). 10.1016/j.exppara.2009.03.005
- 10. Stensvold C. R., Beser J., Axen C. & Lebbad M. High applicability of a novel method for gp60-based subtyping of Cryptosporidium meleagridis. J Clin Microbiol 52, 2311–2319 (2014). 10.1128/JCM.00598-14
- 11. Cama V. A. et al. Cryptosporidium species and genotypes in HIV-positive patients in Lima, Peru. J Eukaryot Microbiol 50 Suppl, 531–533 (2003). 10.1111/j.1550-7408.2003.tb00620.x
- 12. Baptista R. P. et al. Long-read assembly and comparative evidence-based reanalysis of Cryptosporidium genome sequences reveal expanded transporter repertoire and duplication of entire chromosome ends including subtelomeric regions. Genome Res 32, 203–213 (2022). 10.1101/gr.275325.121
- 13. Agyabeng-Dadzie F. et al. Evaluating the benefits and limits of multiple displacement amplification with whole-genome Oxford Nanopore Sequencing. bioRxiv (2024). 10.1101/2024.02.09.579537
- 14. Lagesen K. et al. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res 35, 3100–3108 (2007). 10.1093/nar/gkm160
- 15. Kolmogorov M., Yuan J., Lin Y. & Pevzner P. A. Assembly of long, error-prone reads using repeat graphs. Nat Biotechnol 37, 540–546 (2019). 10.1038/s41587-019-0072-8
- 16. Wick R. R. & Holt K. E. Polypolish: Short-read polishing of long-read bacterial genome assemblies. PLoS Comput Biol 18, e1009802 (2022). 10.1371/journal.pcbi.1009802
- 17. Li H. & Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009). 10.1093/bioinformatics/btp324
- 18. Dainat J. AGAT: Another Gff Analysis Toolkit to handle annotations in any GTF/GFF format.
- 19. Gremme G., Steinbiss S. & Kurtz S. GenomeTools: a comprehensive software library for efficient processing of structured genome annotations. IEEE/ACM Trans Comput Biol Bioinform 10, 645–656 (2013). 10.1109/TCBB.2013.68
- 20. Darling A. E., Mau B. & Perna N. T. progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS One 5, e11147 (2010). 10.1371/journal.pone.0011147
- 21. Kearse M. et al. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28, 1647–1649 (2012). 10.1093/bioinformatics/bts199
- 22. Altschul S. F., Gish W., Miller W., Myers E. W. & Lipman D. J. Basic local alignment search tool. J Mol Biol 215, 403–410 (1990). 10.1016/S0022-2836(05)80360-2
- 23. Astashyn A. et al. Rapid and sensitive detection of genome contamination at scale with FCS-GX. bioRxiv (2023). 10.1101/2023.06.02.543519
- 24. Baptista R. P., Xiao R., Li Y., Glenn T. C. & Kissinger J. C. New T2T assembly of Cryptosporidium parvum IOWA annotated with reference genome gene identifiers. bioRxiv (2023). 10.1101/2023.06.13.544219
- 25. Lee E. et al. Web Apollo: a web-based genomic annotation editing platform. Genome Biol 14, R93 (2013). 10.1186/gb-2013-14-8-r93
- 26. Shumate A. & Salzberg S. L. Liftoff: accurate mapping of gene annotations. Bioinformatics (2020). 10.1093/bioinformatics/btaa1016
- 27. Stanke M. & Morgenstern B. AUGUSTUS: a web server for gene prediction in eukaryotes that allows user-defined constraints. Nucleic Acids Res 33, W465–467 (2005). 10.1093/nar/gki458
- 28. Schattner P., Brooks A. N. & Lowe T. M. The tRNAscan-SE, snoscan and snoGPS web servers for the detection of tRNAs and snoRNAs. Nucleic Acids Res 33, W686–689 (2005). 10.1093/nar/gki366
- 29. Conesa A. et al. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21, 3674–3676 (2005). 10.1093/bioinformatics/bti610
- 30. Ifeonu O. O. et al. Annotated draft genome sequences of three species of Cryptosporidium: Cryptosporidium meleagridis isolate UKMEL1, C. baileyi isolate TAMU-09Q1 and C. hominis isolates TU502_2012 and UKH1. Pathog Dis 74 (2016). 10.1093/femspd/ftw080
- 31. Emms D. M. & Kelly S. OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol 16, 157 (2015). 10.1186/s13059-015-0721-2
- 32. Sun J. et al. OrthoVenn3: an integrated platform for exploring and visualizing orthologous data across genomes. Nucleic Acids Res 51, W397–W403 (2023). 10.1093/nar/gkad313
- 33. Warrenfeltz S., Kissinger J. C. & EuPathDB Team. Accessing Cryptosporidium Omic and Isolate Data via CryptoDB.org. Methods Mol Biol 2052, 139–192 (2020). 10.1007/978-1-4939-9748-0_10
- 34. Krzywinski M. et al. Circos: An information aesthetic for comparative genomics. Genome Res (2009). 10.1101/gr.092759.109
- 35. Chen C. et al. TBtools: An Integrative Toolkit Developed for Interactive Analyses of Big Biological Data. Mol Plant 13, 1194–1202 (2020). 10.1016/j.molp.2020.06.009
- 36. Chu J. JupiterPlot: A Circos-based tool to visualize genome assembly consistency (1.0). Zenodo (2018).
- 37. Hulsen T., Huynen M. A., de Vlieg J. & Groenen P. M. Benchmarking ortholog identification methods using functional genomics data. Genome Biol 7, R31 (2006).