Abstract
Dinoflagellates are a diverse group of microplankton that include free-living, symbiotic, and parasitic species. Amoebophrya, a basal lineage of parasitic dinoflagellates, infects a variety of marine microorganisms, including harmful-bloom-forming algae. Although there are currently 3 published Amoebophrya genomes, this genus has considerable genomic diversity. We add to the growing genomic data for Amoebophrya with an annotated genome assembly for Amoebophrya sp. ex Karlodinium veneficum. This species appears to translate all 3 canonical stop codons contextually. Stop codons are present in the open reading frames of about half of the predicted gene models, including genes essential for cellular function. The in-frame stop codons are likely translated by suppressor tRNAs that were identified in the assembly. We also assembled the mitochondrial genome, which has remained elusive in the previous Amoebophrya genome assemblies. The mitochondrial genome assembly consists of many fragments with high sequence identity in the genes but low sequence identity in intergenic regions. Nuclear and mitochondrially-encoded proteins indicate that Amoebophrya sp. ex K. veneficum does not have a bipartite electron transport chain, unlike previously analyzed Amoebophrya species. This study highlights the importance of analyzing multiple genomes from highly diverse genera such as Amoebophrya.
Keywords: amoebophrya, syndineans, marine alveolates, nonstop genetic code, UGA, dinoflagellate, intracellular parasite
Introduction
Dinoflagellates (Alveolata, Myzozoa) are ecologically important microplankton found in most aquatic environments. The core dinoflagellates (sensu Janouškovec et al. 2017) are predominantly free-living or symbiotic, including common phytoplankton, coral symbionts (Freudenthal 1962; Blank and Trench 1986), and voracious microbial predators (Gaines and Taylor 1984; Jacobson and Anderson 1986). However, the base of the dinoflagellate tree is teeming with parasitic species; these parasites are predominantly known from environmental amplicon sequencing (Groisillier et al. 2006; Guillou et al. 2008) and collectively infect metazoans, ciliates, rhizarians, and other dinoflagellates (Cachon and Cachon 1987). Amoebophrya, a parasitic lineage in the MAGII/MALVII rDNA sequence clade (Guillou et al. 2008), is ubiquitous in marine environments and infects various dinoflagellate hosts (Cachon and Cachon 1987). The life cycle begins as a biflagellate infective dinospore, which adheres to the surface of its host and penetrates its membrane (Cachon and Cachon 1987; Miller et al. 2012). The parasite travels into the host nucleus or cytoplasm and develops into a multinucleate, beehive-like structure as it feeds on the cell (Cachon and Cachon 1987; Fritz and Nass 1992; Miller et al. 2012). Finally, the infection culminates when the host is violently lysed by plasmodial, multi-flagellate, vermiform Amoebophrya, which subsequently swims off as it undergoes cytokinesis to produce new dinospores (Cachon and Cachon 1987; Miller et al. 2012).
Amoebophrya is particularly noteworthy in its ability to infect dinoflagellates that cause harmful algal blooms (Coats et al. 1996; 1999; Park et al. 2004; Chambouvet et al. 2008, 2011; Place et al. 2012; Velo-Suárez et al. 2013). For example, Amoebophrya sp. ex K. veneficum (Amoebophrya sp. Kv), the target of this sequencing study, is a species that infects Karlodinium veneficum, a small ∼12 micron, photosynthetic alga that feeds on dinoflagellates, cryptophytes, and animals (Place et al. 2012; Yang et al. 2020). Karlodinium veneficum secretes ichthyotoxic, hemolytic, and cytotoxic compounds known as karlotoxins (Deeds et al. 2002, 2006), which decrease grazing pressure from zooplankton and aid in the capture of K. veneficum's prey (Adolf et al. 2007). During a bloom of K. veneficum, karlotoxin secretion and rapid oxygen consumption by the algal mass can potentially kill large amounts of fish, with the largest fish kill attributed to K. veneficum being 30,000–50,000 fish (Place et al. 2012). Amoebophrya sp. Kv evades these toxins (Place et al. 2006) and is thought to aid in the termination of the bloom (Place et al. 2012).
Three Amoebophrya genomes are currently publicly available (Amoebophrya sp. A25, A120, and AT5; John et al. 2019; Farhat et al. 2021), all from core dinoflagellate hosts. These genomes are strikingly diverse for organisms of the same genus. They contain a large number of species-specific genes and even closely related species like A25 and A120 share less than half of their predicted proteins (Farhat et al. 2021). The genomes of some species have canonical introns (John et al. 2019), while others have many non-canonical splice sites and invasive intronic elements (Farhat et al. 2021).
Although there is no publicly available genome assembly for Amoebophrya sp. Kv, Bachvaroff (2019) analyzed a short-read-based, fragmented genomic survey and transcriptome. This genome differed from the previously published ones in 2 significant ways. First, about half of the annotated, transcribed genes in Amoebophrya sp. Kv were seemingly interrupted by all 3 canonical stop codons (UAA, UAG, and UGA), including genes of large and well-studied protein families like dynein heavy chains (Bachvaroff 2019). The stop codons were not edited out of the transcripts of these stop-codon-containing genes (SCGs), and tRNAs recognizing UAA and UAG codons were identified (Bachvaroff 2019). These traits suggested that, unlike the other Amoebophrya species, Amoebophrya sp. Kv utilizes ambiguous stop codons; that is, all 3 canonical stop codons are contextually translated. Second, this assembly included the mitochondrial genome of Amoebophrya sp. Kv (Bachvaroff 2019), which has remained elusive in the other genome assemblies (Kayal and Smith 2021); it has even been tentatively suggested that Amoebophrya lacks a mitochondrial genome altogether and that complex III of the electron transport chain is missing, splitting the electron transport chain into 2 (John et al. 2019). However, this does not appear to be true for Amoebophrya sp. Kv, which was reported to encode cytochrome b in its mitochondrial genome (Bachvaroff 2019).
The original culture used by Bachvaroff (2019) has since been lost. The isolation of a new Amoebophrya strain and the development of Oxford Nanopore sequencing offer the opportunity to analyze a high-quality, independent genome of Amoebophrya sp. Kv. Therefore, we sought to validate the previous observations about Amoebophrya sp. Kv and add to the growing genomic data on Amoebophrya. Here, we present a draft genome of Amoebophrya sp. Kv, further evaluate the evidence for ambiguous stop codons in this species and provide the most complete Amoebophrya mitochondrial contigs to date.
Methods
Culturing
The Amoebophrya strain used in this study was isolated from the Inner Harbor, Baltimore, in June of 2011, where a K. veneficum bloom was observed. Vibrant green-blue fluorescence from within the K. veneficum cells was observed using fluorescence microscopy, indicating infection by Amoebophrya. Water samples were filtered across an 8 μm Millipore membrane, diluted, and added to naive cultured K. veneficum CCMP1975 in a 48-well plate. The naive host culture was grown at 20°C with 100 μmol photons m−2 s−1 and a 14 h light/10 h dark cycle in F/2 media (without silica) at a salinity of 15 made from locally collected estuarine water. After 1 week, several wells contained green fluorescent hosts and visible fluorescent dinospores. Individual infected cells were picked with a drawn glass pipette, washed in filtered media, and added to new naive hosts in 48-well plates. A single clonal culture from a washed infected host cell was established and maintained on K. veneficum.
DNA sequencing and assembly
At weekly culture changes, cultures with a 10:1 ratio of parasite spores to host cells (100,000 cells mL−1 vs 10,000 cells mL−1) were diluted in fresh media to concentrations of 50,000 cells mL−1 of Amoebophrya spores and 5,000 cells mL−1 of naive host cells. Cell concentrations were measured using a Coulter counter with a 20 μm orifice. Week-old cultures at 100,000 spore cells mL−1 in a total volume of 150 mL were centrifuged at 10,000 g for 20 minutes, and the cell pellets were used for DNA isolation. Cells were lysed with a 2% CTAB detergent solution and incubated for 20 minutes at 50°C. DNA was extracted from the crude lysate with 2 rounds of chloroform extraction and precipitation with 2 volumes of ethanol. The Short Read Eliminator kit from PacBio was also used to increase average fragment size.
Each sequencing library used 1 μg of starting DNA with polishing of the DNA using the NEB Nanopore Sequencing Companion Kit, followed by ligation and size selection using the Nanopore Ligation Sequencing Kit V14. Sequencing was performed on a MinION sequencer on an R10 chip with multiple library loadings when the read output declined. Basecalling was performed using the Dorado pipeline's “Super Accurate” model with duplex basecalling using the stereo model of predicted duplex reads on the ada GPU cluster at the University of Maryland Baltimore County. The genome was assembled using Canu 2.2 (Koren et al. 2017) with an expected genome size of 300 Mb and read parameters appropriate for an R10 Nanopore chip ('corMhapOptions=--threshold 0.8 --ordered-sketch-size 1000 --ordered-kmer-size 14' correctedErrorRate=0.105). ABySS-fac (Jackman et al. 2017) was used to generate assembly statistics.
Contigs were flagged as bacterial contaminants using MEGAN (Huson et al. 2007). The Amoebophrya subset of the metagenome assembly was determined by selecting non-bacterial contigs with at least 20× genomic coverage, at least 50% coverage from the Amoebophrya sp. Kv transcriptome, and an AT proportion of 63–67%. These thresholds were based on previous Amoebophrya genome assemblies showing a relatively compact genome with little intergenic space (Farhat et al. 2021) and the previous Amoebophrya sp. Kv assembly having an AT proportion of about 65%. The total metagenome assembly was visualized using the ggplot2 package (Wickham 2016). The Bachvaroff (2019) Amoebophrya sp. Kv assembly was mapped to the new assembly using minimap2 (Li 2018). A custom script counted the number of contigs from the old assembly that were at least 90% represented in the new assembly.
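The contig-selection thresholds described above can be sketched as a simple filter. This is a minimal illustration, not the script used in this study: the `Contig` record, its field names, and the function `is_parasite_contig` are hypothetical, and the per-contig statistics are assumed to have been computed beforehand.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    # Hypothetical per-contig summary; all values are assumed precomputed.
    name: str
    length: int                 # contig length in bp
    at_fraction: float          # AT proportion, 0-1
    read_coverage: float        # mean genomic read depth
    transcript_coverage: float  # fraction of bases covered by transcripts, 0-1
    is_bacterial: bool          # e.g. derived from a MEGAN classification


def is_parasite_contig(c, min_depth=20.0, min_tx_cov=0.50, at_range=(0.63, 0.67)):
    """Apply the thresholds stated in the text: non-bacterial, at least 20x
    genomic coverage, at least 50% transcriptome coverage, AT of 63-67%."""
    low, high = at_range
    return (
        not c.is_bacterial
        and c.read_coverage >= min_depth
        and c.transcript_coverage >= min_tx_cov
        and low <= c.at_fraction <= high
    )


def select_parasite_contigs(contigs):
    """Return the contigs passing every threshold, preserving input order."""
    return [c for c in contigs if is_parasite_contig(c)]
```

Keeping each threshold as a keyword argument makes it easy to check how sensitive the selected subset is to the chosen cut-offs.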
Genome annotation
Repeat libraries for the parasite contigs were generated with RepeatModeler and masked with RepeatMasker (Flynn et al. 2020). Transfer RNA genes were identified using tRNAscan-SE with maximum sensitivity (Lowe and Eddy 1997). The tRNAscan-SE predicted secondary structure of TrpCCA tRNAs was manually investigated for variation in the anticodon stem length. Transcripts were mapped to the genome using hisat2 (Kim et al. 2019), and intron/exon boundaries were extracted using Regtools with a minimum anchor length of 30 and a minimum read coverage of 3 (Cotto et al. 2023). The donor and acceptor splice site frequencies were visualized using WebLogo (Crooks et al. 2004).
Because one of the primary objectives of this study was to validate the alternative genetic code of Amoebophrya sp. Kv, we adopted a conservative approach to gene annotation. The genome was initially annotated using the MAKER pipeline (Cantarel et al. 2008), which cannot predict gene models containing in-frame stop codons. Consequently, MAKER only called genes that could plausibly lack in-frame stop codons. After that, SCGs in the intergenic regions were annotated using a custom pipeline utilizing miniprot (Li 2023).
The MAKER annotation pipeline was provided with de novo assembled Trinity transcripts from Bachvaroff et al. (2014), repeats from RepeatMasker, proteins from other species of Amoebophrya, and a variety of myzozoan proteomes previously used to assemble gene families for Amoebophrya sp. A120: Amoebophrya sp. A120 (Farhat et al. 2021; https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_905178155.1/) and A25 (Farhat et al. 2021; https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_905178165.1/), Perkinsus marinus (unpublished; https://protists.ensembl.org/Perkinsus_marinus_atcc_50983_gca_000006405/), Symbiodinium microadriaticum (Aranda et al. 2016; http://smic.reefgenomics.org/), Fugacium kawagutii (Lin et al. 2015; http://web.malab.cn/symka_new/), Breviolum minutum (Shoguchi et al. 2013), Plasmodium falciparum strain 3D7 (Aurrecoechea et al. 2009; http://phlasmodb.org/plasmo/), Toxoplasma gondii strain ME49 (Kissinger 2003; http://toxodb.org/), Chromera velia strain CCMP2878 (Woo et al. 2015; https://cryptodb.org/), Vitrella brassicaformis strain CCMP 3155 (Woo et al. 2015; https://cryptodb.org/), Theileria equi (Kappmeyer et al. 2012; https://piroplasmadb.org/), and Cryptosporidium parvum (Abrahamsen et al. 2004; https://cryptodb.org/).
Genomic regions lacking MAKER annotations were blasted against Amoebophrya sp. A120 and A25 proteomes with a maximum e-value of 1e-5. Regions surrounding the hits were extracted and aligned with the proteins with miniprot. Stop-codon-containing alignments were extracted from the miniprot output, and only the best-scoring alignment was kept when protein alignments from different queries overlapped. Any alignments containing frameshifts were discarded from further analyses. Miniprot alignments without in-frame stop codons were reanalyzed in MAKER to evaluate whether these represented missed gene predictions; however, no additional gene models were identified. To assess transcriptomic support of the SCGs, stop-codon-containing sequences were aligned to the Trinity transcriptome using BLASTn, with a minimum percent identity of 90%. Only alignments with at least 80% coverage by the transcriptome were considered supported.
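The overlap-resolution step can be illustrated with a short sketch. The text states only that the best-scoring alignment was kept where alignments from different queries overlapped; the greedy, score-ordered strategy below is an assumption, as are the `Aln` record and its field names. For simplicity, the sketch ignores strand and treats any coordinate overlap on the same contig as a conflict.

```python
from typing import NamedTuple


class Aln(NamedTuple):
    # Hypothetical summary of one protein-to-genome alignment.
    contig: str
    start: int        # 0-based, inclusive
    end: int          # exclusive
    score: float
    query: str        # name of the aligned protein
    frameshifts: int = 0


def best_nonoverlapping(alignments):
    """Discard alignments with frameshifts, then greedily keep the
    highest-scoring alignment at each locus (assumed strategy)."""
    kept = []
    candidates = [a for a in alignments if a.frameshifts == 0]
    for a in sorted(candidates, key=lambda x: x.score, reverse=True):
        overlaps = any(
            a.contig == k.contig and a.start < k.end and k.start < a.end
            for k in kept
        )
        if not overlaps:
            kept.append(a)
    # Report in genomic order.
    return sorted(kept, key=lambda x: (x.contig, x.start))


def is_transcript_supported(pct_identity, covered_fraction):
    """Thresholds from the text: at least 90% identity and 80% coverage."""
    return pct_identity >= 90.0 and covered_fraction >= 0.80
```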
In-frame stop codons of the alignments were masked by replacing them with the “X” character, and the MAKER predictions and alignments were annotated by InterProScan (Jones et al. 2014) and assigned KEGG orthology using BlastKOALA (Kanehisa et al. 2016). Completeness of the gene predictions was assessed with BUSCO using the alveolata_odb10 database (Simão et al. 2015). OrthoFinder (Emms and Kelly 2019) was used to find orthologous groups between the proteomes previously used in the MAKER annotation pipeline and the MAKER-predicted proteins and stop-codon-masked protein alignments. Proteins that are important for cellular processes were identified based on shared sequence identity, protein domains, and KEGG orthology. Select proteins and their in-frame stop codons were visualized using the gggenomes R package (Hackl et al. 2024). These same proteins were blasted against RefSeq's non-redundant database, aligned to their top ten hits using COBALT (Papadopoulos and Agarwala 2007), and their alignment was visualized with Jalview (Waterhouse et al. 2009). To ensure our predicted gene models had a comparable length to the other members of their orthogroups, each predicted protein was blasted to the other members of its predicted orthogroup, and the ratio between the predicted protein's length and its best hit was calculated.
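The masking of in-frame stop codons can be sketched as follows. This is an illustrative stand-in that translates a coding sequence with the standard code and replaces internal stops with “X”; the function name `translate_masked` is hypothetical, and the choice to drop a terminal stop codon rather than mask it is an assumption made for the example.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_masked(cds):
    """Translate a coding sequence and mask in-frame stops with 'X'.

    Returns (protein, n_internal_stops). A terminal stop codon, if present,
    is dropped rather than masked (an assumption of this sketch).
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    # Codons with ambiguous bases also translate to 'X'.
    residues = [CODON_TABLE.get(codon, "X") for codon in codons]
    if residues and residues[-1] == "*":
        residues.pop()
    n_internal = residues.count("*")
    protein = "".join("X" if r == "*" else r for r in residues)
    return protein, n_internal
```

Masking rather than removing the stops preserves alignment length, so downstream domain and orthology searches see a protein of the expected size.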
Mitochondrial analysis
The Hematodinium mitochondrial proteins COXI, COXIII, and CYTB (CCE53570.1, CCE53572.1, CCE53571.1) were queried against the assembly with tBLASTn using the protozoan mitochondrial code. The best hits were labeled as putative mitochondrial proteins and scanned for functional domains using InterProScan (Jones et al. 2014). These proteins were blasted back to the assembly, and contigs containing hits with a maximum e-value of 1e-75 and at least 95% shared identity were labeled as putative mitochondrial contigs. Mitochondrial contigs were searched for additional protein-encoding genes using InterProScan (Jones et al. 2014). Inverted repeats were identified using the repeat-match program packaged with MUMmer (Marçais et al. 2018). Nuclear-encoded proteins belonging to the mitochondrial respiratory chain were searched for in the MAKER-predicted gene models and unincorporated protein alignments. Select mitochondrial contigs were visualized using the gggenomes R package (Hackl et al. 2024).
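The labeling of putative mitochondrial contigs amounts to a threshold filter on BLAST hits. The sketch below assumes the hits are available in the standard 12-column tabular BLAST format (query, subject, percent identity, ..., e-value, bit score); the output format actually used is not stated in the text, and the function name `putative_mito_contigs` is hypothetical.

```python
def putative_mito_contigs(blast_lines, max_evalue=1e-75, min_identity=95.0):
    """Return the set of subject contigs with at least one hit passing the
    e-value and percent-identity thresholds given in the text.

    Assumes tab-separated, 12-column tabular BLAST output:
    column 2 = subject ID, column 3 = percent identity, column 11 = e-value.
    """
    contigs = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue                      # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        subject = fields[1]
        pct_identity = float(fields[2])
        evalue = float(fields[10])
        if evalue <= max_evalue and pct_identity >= min_identity:
            contigs.add(subject)
    return contigs
```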
Results
DNA extraction and assembly
The input for the assembly was 5.3 million Nanopore reads with an N50 of 9 kb and a total length of 27.7 Gb. The overall assembly of Amoebophrya, host, and bacterial contigs using Canu was 344.4 Mb in 3,018 contigs with an N50 of 3.41 Mb and an L50 of 32 (Fig. 1). The cultured metagenomic assembly included 3 distinct bins of data. A total of ∼121 Mb in 39 contigs were attributed to co-cultured bacteria based on strong BLAST hits of SSU ribosomal RNA to the 16S_microbial database from NCBI. The Amoebophrya subset of the metagenome assembly consisted of 48 contigs with a total length of 127 Mb and N50 of 4.1 Mb (Table 1).
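For readers unfamiliar with the contiguity statistics reported here, N50 is the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the assembly size, and L50 is the number of contigs needed to reach that point. A minimal sketch of the calculation follows; the assembly statistics in this study were generated with ABySS-fac, not with this function.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig lengths."""
    total = sum(lengths)
    running = 0
    for rank, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        # Stop at the first contig where half the assembly is covered.
        if running * 2 >= total:
            return length, rank
    raise ValueError("cannot compute N50 for an empty assembly")
```

For example, contigs of 10, 8, 6, 4, and 2 kb sum to 30 kb; the two longest together cover 18 kb, which exceeds half of the total, so the N50 is 8 kb and the L50 is 2.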
Fig. 1.
A plot of the base coverage against the AT content of the contigs in the Canu metagenome assembly. The side plots indicate the number of megabases represented in a region of the plot. The large high coverage genomic and smaller, lower coverage mitochondrial contigs of Amoebophrya sp. ex Karlodinium veneficum are displayed with different colors, respectively, and the remaining bacterial and K. veneficum contigs are presented with a single color.
Table 1.
Comparisons of the assemblies and annotations of Amoebophrya sp. ex Karlodinium veneficum and Amoebophrya sp. A120, A25, and AT5.
| | Kv | A120 | A25 | AT5 |
|---|---|---|---|---|
| Assembly | | | | |
| Number of scaffolds/contigs | 48 | 50 | 556 | 2351 |
| Cumulative size (Mb) | 127 | 115.5 | 116 | 87.7 |
| Scaffold/Contig N50 | 4.119 Mb | 9.243 Mb | 1.082 Mb | 83.9 kb |
| Scaffold/Contig max. size | 11.551 Mb | 16.512 Mb | 3.013 Mb | 1.914 Mb |
| AT content (%) | 65.2 | 48.8 | 52.2 | 54.5 |
| Protein-encoding genes | | | | |
| Number | 8,295 | 26,441 | 28,091 | 19,925 |
| Average gene length (bp) | 2,267 | 3,482 | 2,965 | 2,782 |
| Average protein length (aa) | 426 | 591 | 446 | 654 |
Genome annotation
Repeat masking using RepeatModeler and RepeatMasker masked 13.7 Mb (10.72%) of parasite contigs. No DNA transposons were identified, but 848 retroelements across 2.1 Mb were found. There were 2.0 Mb of LTR elements dispersed throughout the genome, but most repeats were unclassified. The 216 different unclassified repeats spanned 8.6 Mb (6.8%) of the genome and had a wide range of AT bias. Simple sequence repeats and low-complexity regions represented only 2.1 Mb of the assembled data.
The Amoebophrya sp. Kv contigs were scanned for tRNAs with maximum sensitivity using tRNAscan-SE. A total of 163 tRNAs and 24 pseudo-tRNAs were predicted. Four putative cognate suppressor tRNAs were found in the Amoebophrya sp. Kv assembly: 2 cognate tRNAs each for UAA and UAG. In the assemblies of Amoebophrya sp. A120, A25, and AT5, only one pseudo-tRNA identified as a stop-suppressor was detected, in A25. One of the 7 predicted TrpCCA tRNAs had a 4 bp anticodon stem, and the bases typically forming the fifth pair of the anticodon stem are U26 and C42 (Supplementary Fig. 1).
The Rfam website was used to identify 4 contigs with 6773 base rDNA repeats containing SSU, ITS1, 5.8S, ITS2, and LSU repeated from 6 to 16 times in a row. There were no credible sequence differences in the 45 complete or near complete SSU and LSU regions in the 4 contigs. Although tig00002270 starts with a variant SSU, read mapping does not support this result. After the LSU, there were consistently 5S RNA and SL sequences between the rDNA regions, followed by the next SSU region. For protein-coding genes, introns were predominantly GT/AG splicing (Fig. 2) and rarely AT/AC, GC/AG, or AT/TC splicing, which is supported by the identification of U1, U2, U4, U5, and U6 snRNAs in the parasite assembly.
Fig. 2.
A logo plot indicating the sequence motifs at the end of exons (E4 to E1), the beginning of introns (I1 to I4), the end of introns (−I4 to −I1), and the beginning of exons (−E1 to −E4).
Two complementary approaches were used for the prediction of protein-coding genes. One was based on MAKER, which was restricted to using the standard genetic code, and the second was based on miniprot for predicting genes with apparent stop codons in their open reading frames (SCGs). Based on protein identity and transcriptome-based evidence, MAKER predicted 4,668 protein-encoding genes, and 2,367 proteins contained predicted Pfam domains. The average MAKER gene was 2,063 bp long, contained 2.6 exons, and encoded a 330.8 amino acid protein (Table 1). Miniprot revealed 3,627 SCGs, with 2,462 represented in the previously published transcriptome (Supplementary data file) and 3,544 having predicted Pfam domains. The average SCG was 549 amino acids long, and the average number of stops in each SCG was 13.6 (58% UAA, 32.3% UAG, 9.7% UGA). When SCGs were excluded, the annotation only had 59.7% BUSCO completeness; including the SCGs raised the BUSCO completeness score to 79.5%. Of these SCG BUSCO genes, 136 out of 144 were full-length copies, and 124 were single-copy genes. For comparison, the alveolate BUSCO completeness of the predicted genes for AT5, A120, and A25 is 85.6%, 94.7%, and 90.6%, respectively. Of the 8,295 amino acid sequences provided to OrthoFinder, 6,837 were assigned to 4,798 orthogroups. Of the 4,798 orthogroups, 2,181 contained SCGs but no MAKER-predicted genes. When the lengths of Amoebophrya sp. Kv orthologs were compared with the best hit in their assigned orthogroups, the miniprot-predicted genes were, on average, 70.5% of the length of their best hit, while the MAKER-predicted genes were 84.1% of the length (Supplementary Fig. 2).
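The combined figures in Table 1 follow from the two gene sets reported above, which can be checked with a short calculation. The sketch below uses only numbers stated in the text; the function and variable names are illustrative.

```python
def combine_gene_sets(maker_n, maker_mean_aa, scg_n, scg_mean_aa):
    """Return (total genes, SCG fraction, count-weighted mean protein length)."""
    total = maker_n + scg_n
    scg_fraction = scg_n / total
    mean_aa = (maker_n * maker_mean_aa + scg_n * scg_mean_aa) / total
    return total, scg_fraction, mean_aa


# Values reported in the text: 4,668 MAKER genes averaging 330.8 aa and
# 3,627 SCGs averaging 549 aa.
total, scg_fraction, mean_aa = combine_gene_sets(4668, 330.8, 3627, 549)

# Average spacing of in-frame stops in an SCG: 549 codons / 13.6 stops.
codons_per_stop = 549 / 13.6
```

The total is 8,295 genes, of which about 44% are SCGs, and the weighted mean protein length rounds to 426 amino acids, matching Table 1. The stop density corresponds to roughly one in-frame stop every 40 codons in an average SCG.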
Mitochondrial analyses
The assembly contained a large number of high-quality tBLASTn hits for mitochondrially-encoded proteins from Hematodinium (CCE53570.1, CCE53572.1, CCE53571.1) corresponding to coxI, coxIII, and cytB that shared little identity with those of K. veneficum (A825.2, ABR15108.1, ABR15096.1). The best-scoring BLAST hits were selected as putative mitochondrial proteins for Amoebophrya sp. ex K. veneficum.
The mitochondrial proteins occur on 491 AT-rich contigs in the assembly (Fig. 1). However, the contigs are quite dissimilar; some only encode a single protein, while others encode all 3 (Fig. 3). The mitochondrial contigs have relatively low base coverage (1×–8.05×), with lengths ranging from 3,339 bp to 56,915 bp and little-to-no RNA-seq coverage (Supplementary Fig. 3). When contigs were searched for inverted repeats with lengths >50 nt, 3,196 pairs of varying lengths (50–5,332 bp) were detected; clustering with CD-HIT at 95% identity revealed 122 different clusters of inverted repeats.
Fig. 3.
Selected mitochondrial contigs from Amoebophrya sp. ex Karlodinium veneficum with fragmented and full copies of coxI, coxIII, and cytb. Grey blocks along the contigs indicate the location of inverted repeats.
In the nuclear genome, proteins from all major mitochondrial electron transport chain complexes except Complex I were detected. Complex II (succinate dehydrogenase) and the F1 subunit of Complex V (ATP synthase alpha, beta, gamma, delta, and delta OSCP subunits) were complete. Complex III (cytochrome c1, Rieske iron-sulfur protein, mitochondrial processing-peptidase subunits alpha and beta, and cytochrome b-c1 complex subunit 7) and Complex IV (cytochrome c oxidase subunit 2, 5b, 6b, 11, 15, and 19) were only partially represented, and the F0 subunit of Complex V was undetected. Genes encoding cytochrome c, alternative NADH dehydrogenase, and alternative oxidase were also found. All the nuclear-encoded proteins lacked in-frame stop codons except for the alternative NADH dehydrogenase and cytochrome c oxidase subunits 11, 15, and 19.
Discussion
The nuclear genome of Amoebophrya sp. ex K. veneficum
In eukaryotes, canonical stop codons typically terminate translation when the eRF1 protein recognizes the stop codon and releases the nascent peptide chain (Jackson et al. 2012). However, several eukaryotes have reassigned canonical stop codons to code for amino acids by using suppressor tRNAs that pair with stop codons. Such codon reassignment has been reported in diverse lineages, including mitochondria (Barrell et al. 1979), viruses (Borges et al. 2022), bacteria (Inamine et al. 1990), green algae (Schneider et al. 1989; Schneider and de Groot 1991), diplomonads (Keeling and Doolittle 1996), and ciliates (Hoffman et al. 1995; Lozupone et al. 2001). However, some species use ambiguous stop codons recognized by both suppressor tRNAs and eRF1, allowing the codon to code for an amino acid or terminate translation depending on the context. Organisms that utilize ambiguous stop codons include the trypanosomatid genus Blastocrithidia (Kachale et al. 2023), the heterotrich ciliate Condylostoma magnum (Swart et al. 2016), and karyorelict ciliates (Swart et al. 2016; Seah et al. 2022). Although a mechanism has been proposed for how certain ciliates define stop/coding contexts of ambiguous stop codons (Swart et al. 2016), it seems unlikely that this is conserved across the distantly related lineages that utilize this sort of genetic code.
Here, we present the genome assembly of a novel strain of Amoebophrya sp. Kv, which is the second most contiguous Amoebophrya genome assembly currently available (Table 1). This assembly is also a significant improvement on the prior assembly by Bachvaroff (2019), condensing 8,801 contigs into just 48. The genome of Amoebophrya sp. Kv is much more AT-rich than the other published Amoebophrya genomes (Table 1), and Amoebophrya sp. Kv primarily uses canonical GT/AG splicing patterns (Fig. 2). This splicing pattern is consistent with what has been reported for Amoebophrya sp. AT5 (John et al. 2019) and suggests that the strange intronic elements of A120 and A25 are confined to only a subset of the genus (Farhat et al. 2021).
Similar to the 2019 genomic survey, this assembly contained cognate tRNAs for UAA and UAG codons. Although we detected no UGA cognate tRNA, Blastocrithidia nonstop, a trypanosomatid that has repurposed all 3 canonical stop codons, also has no dedicated UGA cognate tRNA in its genome. Instead, it can translate UGA using a TrpCCA tRNA with a 4 bp anticodon stem (AS) rather than the typical 5 bp AS (Kachale et al. 2023). The suppression of UGA by 4 bp AS TrpCCA tRNAs was experimentally transferable to Trypanosoma brucei, Condylostoma magnum, and Saccharomyces cerevisiae, suggesting that this is a characteristic of these tRNAs in general (Kachale et al. 2023). Moreover, such 4 bp AS suppressor TrpCCA tRNAs appear to be present in Blepharisma and Loxodes (Swart et al. 2023), and precedent for 4 bp AS suppressor tRNAs in Escherichia coli dates back to 1994 (Schultz and Yarus 1994a, 1994b; Swart et al. 2023). Like these organisms, Amoebophrya sp. Kv also has a TrpCCA tRNA with a 4 bp AS; interestingly, the 2 bases that would otherwise form the fifth pair of the anticodon stem are identical to those of the tRNA that suppressed UGA in C. magnum (Kachale et al. 2023). We therefore consider this TrpCCA tRNA the likely mechanism for UGA suppression.
Amoebophrya sp. Kv possesses 2 eRF1 homologs that are identical to those of other Amoebophrya species at amino acid motifs associated with stop codon recognition or polypeptide release. However, in all Amoebophrya species analyzed, the conserved NIKS motif has been modified to RIKS. Mutations at this asparagine residue are known to modify the termination efficiency of UAA and UAG (Frolova et al. 2002), but the exact effect of this mutation is impossible to assess from sequence data alone. Furthermore, the modification of such a highly conserved residue in Amoebophrya species with a canonical genetic code indicates that it does not prevent termination at UAA and UAG. Combined with the previously mentioned suppressor tRNAs, this suggests that all 3 canonical stop codons are recognized by both tRNAs and eRF1 in Amoebophrya sp. Kv.
Nearly half of the predicted proteins in this assembly contained in-frame stop codons, which is consistent with the inference that Amoebophrya sp. Kv utilizes ambiguous stop codons. These SCGs are a genuine characteristic of this species, evidenced by their presence in genomes of different Amoebophrya sp. Kv isolates that were sequenced using different technologies. Furthermore, the SCGs are unlikely to be pseudogenes. This is supported by the observation that 2,181 (45% of the total) orthogroups contained an SCG but no MAKER-predicted proteins. If the SCGs in these orthogroups are pseudogenes, it would mean that Amoebophrya sp. Kv performs these orthogroups' functions via alternative mechanisms or has lost the need for that function altogether. Parasite genomes are often characterized by a loss of function and increased dependence on the host, and many orthogroups may be functionally redundant to some extent. However, several SCGs are critical for cellular processes that cannot be easily delegated to the host and are not known to be functionally redundant. This is especially true for proteins involved in DNA replication (e.g. RNase H, Top1–3, and the catalytic subunits of DNA polymerases alpha and delta), DNA repair (e.g. 8-oxoguanine DNA glycosylase, DNA polymerase kappa, and RAD54), and pre-mRNA splicing (e.g. splicing factor 3B subunits 1, 3, and 4; Fig. 4, Supplementary Figs. 4,5,6). If these SCGs are pseudogenes, this would imply a substantial loss of function; Amoebophrya sp. Kv would presumably be hindered in its ability to replicate DNA, repair common DNA lesions, and splice introns. The improbability of these scenarios suggests that the SCGs are indeed translated into proteins. Ultimately, the translation of all 3 stop codons is the most parsimonious explanation for our observations. 
From the isoforms of the suppressor tRNAs and previous codon usage analyses (Bachvaroff 2019), it seems likely that UGA codes for tryptophan and UAA/UAG code for glutamine, but this will need to be confirmed experimentally. Translation termination at one or more of these codons is likely contextual, but determining which codon(s) signal for termination and the required context will require additional study.
Fig. 4.
Select gene models from Amoebophrya sp. ex Karlodinium veneficum. Here, actin is a gene model that lacks in-frame stop codons. Stop codons present in the open reading frame are indicated by circles (TAA), triangles (TAG), and squares (TGA) above the gene model.
Ambiguous stop codons make it challenging to discriminate between translated and untranslated reading frames; this makes gene calling particularly difficult, especially in a lineage with many species-specific genes. There are likely more protein-encoding genes than the 8,295 we predicted. However, because these unidentified genes lack strong sequence identity with known myzozoan genes and likely contain many stop codons, we were unable to find an efficient way to identify them. Furthermore, our gene predictions may contain some pseudogenes, missed introns/exons, and other artifacts of high throughput annotations. Refining our gene models requires knowledge of what constitutes a true stop codon, which is difficult to determine with the available data.
The mitochondria of Amoebophrya sp. ex K. veneficum
Dinoflagellates and their sister group, the apicomplexans, are notable for having some of the most gene-poor mitochondrial genomes on the eukaryotic tree, encoding only 3 proteins (COXI, COXIII, and CYTB) and scattered fragments of rDNA (Gagat et al. 2017). The mitochondrial genome of Amoebophrya has not been detected in the AT5, A120, or A25 genome assemblies, except for a short fragment resembling coxI in A120 and AT5 (Kayal and Smith 2021). This caused a debate about whether Amoebophrya lacked a mitochondrial genome altogether (Kayal and Smith 2021; John et al. 2019). Although only mentioned in passing, the mitochondrial genome and base editing of Amoebophrya sp. ex K. veneficum have been previously reported (Bachvaroff 2019). The controversy surrounding Amoebophrya's mitochondrial genome prompted further examination here.
The present study detected nearly 500 contigs containing genes for Amoebophrya sp. Kv's coxI, coxIII, and cytb (Fig. 1, Fig. 3, and Supplementary Fig. 3). Although we lack localization data for these sequences and recognize that these contigs could be from the nuclear genome, we strongly favor their assignment to the mitochondrial genome. The gene content of these contigs is typical for a myzozoan mitochondrial genome, and despite the large size of some contigs, they lack any sign of nuclear-encoded genes.
The protein-encoding genes in each mitochondrial contig had high sequence identity, but outside of the genes, the contigs varied greatly. All 3 protein-encoding genes were present in some mitochondrial contigs, while others contained only a subset of the genes or fragmented copies. Additionally, the intergenic regions often consisted of inverted repeats, the exact sequence of which varied among 122 repeat families. The extreme variation in mitochondrial contigs raises the possibility that a heterogeneous population of mitochondrial fragments contributes to the functionality of a single mitochondrion in Amoebophrya sp. Kv. Alternatively, the mitochondrial genome of Amoebophrya sp. Kv may differ significantly from individual to individual and exist only at low copy numbers; this would make it difficult to assemble a complete mitochondrial genome from many different dinospores. However, given that the sequenced cultures descend from a single infected host and are passed through a population bottleneck with each transfer, it would be surprising to observe such a diversity of mitochondrial genomes among individuals. Ultimately, more work is needed to assess whether Amoebophrya's mitochondrial genome is fragmented and how these multiple contigs relate to one another.
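For readers unfamiliar with inverted repeats, the following Python sketch shows one simple way to seed their detection: it indexes every k-mer in a sequence and reports pairs in which the reverse complement of a k-mer occurs downstream without overlapping it. This is an illustrative exact-match approach under assumed parameters (the function names and `k=8` are hypothetical); it is not the method used in this study to define repeat families.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string (A, C, G, T only; other symbols kept)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def inverted_repeat_seeds(seq, k=8):
    """Return sorted (left_start, right_start, kmer) exact inverted-repeat seeds.

    A seed is a k-mer at left_start whose reverse complement begins at
    right_start, with right_start >= left_start + k (no overlap).
    """
    seq = seq.upper()
    positions = {}
    for i in range(len(seq) - k + 1):
        positions.setdefault(seq[i:i + k], []).append(i)
    hits = []
    for kmer, starts in positions.items():
        partner_starts = positions.get(reverse_complement(kmer), [])
        for i in starts:
            for j in partner_starts:
                if j >= i + k:
                    hits.append((i, j, kmer))
    return sorted(hits)


if __name__ == "__main__":
    # Left arm, a 4-bp spacer, then the reverse complement of the left arm.
    toy = "GATTACAG" + "CCCC" + "CTGTAATC"
    print(inverted_repeat_seeds(toy, k=8))
```

Exact k-mer seeds like these would normally be extended and clustered before any repeat families could be compared across contigs.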
Mitochondrial proteins from all electron transport chain (ETC) complexes besides complex I were encoded in the nuclear genome. The absence of complex I is consistent with previous Amoebophrya genomes; however, we cannot rule out the possibility that the available genomes and annotations are incomplete and that Amoebophrya possesses complex I. All complex III respiratory and core subunits were identified; these are notably missing from the gene predictions of Amoebophrya sp. AT5, A120, and A25. The absence of complex III proteins in these genomes led to the suggestion that Amoebophrya has a bipartite ETC (John et al. 2019). While the ETC may indeed be split in some Amoebophrya species, this does not appear to be the case for Amoebophrya sp. Kv; a bipartite ETC is, therefore, not a general property of the genus. Further investigations are required to determine if Amoebophrya has diverse mitochondrial biology or if it is more conventional than was previously thought.
Supplementary Material
Acknowledgments
We would also like to thank Shehre Banoo Malik, John Mattick, and Charles Francis Delwiche for their comments on the manuscript. WCD is currently a graduate student in CFD's lab. The BioAnalytical Services Laboratory (BASLab) at the Institute of Marine and Environmental Technology (IMET) was used for Nanopore GridION and MinION long-read sequencing. See https://www.umces.edu/baslab for more information on the BASLab capabilities.
Contributor Information
Wesley DeMontigny, Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, College Park, MD 20742, USA.
Tsvetan Bachvaroff, Institute for Marine and Environmental Technology, University of Maryland Center for Environmental Sciences, Baltimore, MD 21202, USA.
Data Availability
Analyses that are more complicated than single commands are available as scripts at https://github.com/Wesley-DeMontigny/AmexKv-Genome-Scripts. Supplemental figures and data files can be found at https://doi.org/10.6084/m9.figshare.26940256.v1. The NCBI BioProject accession is PRJNA1103200, the raw reads can be found at SRR28830816–8 and SRR28782276–8, and the genome accession number is JBKEIR000000000.
Supplemental material available at G3 online.
Funding
This project was funded by the G. Unger Vetlesen Foundation and the IMET Angel Investors Program. The hardware used in the computational studies is part of the UMBC High Performance Computing Facility (hpcf.umbc.edu; accessed on 7 July 2024). The facility is supported by the U.S. National Science Foundation through the MRI program (grant nos. CNS–0821258, CNS–1228778, OAC–1726023, and CNS–1920079) and the SCREMS program (grant no. DMS–0821311), with additional substantial support from the University of Maryland, Baltimore County (UMBC).
Literature cited
- Abrahamsen MS, Templeton TJ, Enomoto S, Abrahante JE, Zhu G, Lancto CA, Deng M, Liu C, Widmer G, Tzipori S, et al. 2004. Complete genome sequence of the apicomplexan, Cryptosporidium parvum. Science. 304(5669):441–445. doi: 10.1126/science.1094786.
- Adolf JE, Krupatkina D, Bachvaroff T, Place AR. 2007. Karlotoxin mediates grazing by Oxyrrhis marina on strains of Karlodinium veneficum. Harmful Algae. 6(3):400–412. doi: 10.1016/j.hal.2006.12.003.
- Aranda M, Li Y, Liew YJ, Baumgarten S, Simakov O, Wilson MC, Piel J, Ashoor H, Bougouffa S, Bajic VB, et al. 2016. Genomes of coral dinoflagellate symbionts highlight evolutionary adaptations conducive to a symbiotic lifestyle. Sci Rep. 6(1):39734. doi: 10.1038/srep39734.
- Aurrecoechea C, Brestelli J, Brunk BP, Dommer J, Fischer S, Gajria B, Gao X, Gingle A, Grant G, Harb OS, et al. 2009. PlasmoDB: a functional genomic database for malaria parasites. Nucleic Acids Res. 37(Database):D539–D543. doi: 10.1093/nar/gkn814.
- Bachvaroff TR. 2019. A precedented nuclear genetic code with all three termination codons reassigned as sense codons in the syndinean Amoebophrya sp. PLoS One. 14:e0212912. doi: 10.1371/journal.pone.0212912.
- Bachvaroff TR, Gornik SG, Concepcion GT, Waller RF, Mendez GS, Lippmeier JC, Delwiche CF. 2014. Dinoflagellate phylogeny revisited: using ribosomal proteins to resolve deep branching dinoflagellate clades. Mol Phylogenet Evol. 70:314–322. doi: 10.1016/j.ympev.2013.10.007.
- Barrell BG, Bankier AT, Drouin J. 1979. A different genetic code in human mitochondria. Nature. 282(5735):189–194. doi: 10.1038/282189a0.
- Blank RJ, Trench RK. 1986. Nomenclature of endosymbiotic dinoflagellates. Taxon. 35(2):286–294. doi: 10.2307/1221270.
- Borges AL, Lou YC, Sachdeva R, Al-Shayeb B, Penev PI, Jaffe AL, Lei S, Santini JM, Banfield JF. 2022. Widespread stop-codon recoding in bacteriophages may regulate translation of lytic genes. Nat Microbiol. 7(6):918–927. doi: 10.1038/s41564-022-01128-6.
- Cachon J, Cachon M. 1987. The biology of dinoflagellates. In: Taylor FJR, editor. Oxford: Botanical Monographs, Blackwell Scientific Publications. p. 571–610.
- Cantarel BL, Korf I, Robb SMC, Parra G, Ross E, Moore B, Holt C, Sánchez Alvarado A, Yandell M. 2008. MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 18(1):188–196. doi: 10.1101/gr.6743907.
- Chambouvet A, Alves-de-Souza C, Cueff V, Marie D, Karpov S, Guillou L. 2011. Interplay between the parasite Amoebophrya sp. (Alveolata) and the cyst formation of the red tide dinoflagellate Scrippsiella trochoidea. Protist. 162(4):637–649. doi: 10.1016/j.protis.2010.12.001.
- Chambouvet A, Morin P, Marie D, Guillou L. 2008. Control of toxic marine dinoflagellate blooms by serial parasitic killers. Science. 322(5905):1254–1257. doi: 10.1126/science.1164387.
- Coats DW. 1999. Parasitic life styles of marine dinoflagellates. J Eukaryot Microbiol. 46(4):402–409. doi: 10.1111/j.1550-7408.1999.tb04620.x.
- Coats D, Adam E, Gallegos C, Hedrick S. 1996. Parasitism of photosynthetic dinoflagellates in a shallow subestuary of Chesapeake Bay, USA. Aquat Microb Ecol. 11:1–9. doi: 10.3354/ame011001.
- Cotto KC, Feng Y-Y, Ramu A, Richters M, Freshour SL, Skidmore ZL, Xia H, McMichael JF, Kunisaki J, Campbell KM, et al. 2023. Integrated analysis of genomic and transcriptomic data for the discovery of splice-associated variants in cancer. Nat Commun. 14(1):1589. doi: 10.1038/s41467-023-37266-6.
- Crooks GE, Hon G, Chandonia J-M, Brenner SE. 2004. WebLogo: a sequence logo generator. Genome Res. 14(6):1188–1190. doi: 10.1101/gr.849004.
- Deeds JR, Reimschuessel R, Place AR. 2006. Histopathological effects in fish exposed to the toxins from Karlodinium micrum. J Aqua Anim Hlth. 18(2):136–148. doi: 10.1577/H05-027.1.
- Deeds JR, Terlizzi DE, Adolf JE, Stoecker DK, Place AR. 2002. Toxic activity from cultures of Karlodinium micrum (=Gyrodinium galatheanum) (Dinophyceae)—a dinoflagellate associated with fish mortalities in an estuarine aquaculture facility. Harmful Algae. 1(2):169–189. doi: 10.1016/S1568-9883(02)00027-6.
- Emms DM, Kelly S. 2019. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20(1):238. doi: 10.1186/s13059-019-1832-y.
- Farhat S, Le P, Kayal E, Noel B, Bigeard E, Corre E, Maumus F, Florent I, Alberti A, Aury J-M, et al. 2021. Rapid protein evolution, organellar reductions, and invasive intronic elements in the marine aerobic parasite dinoflagellate Amoebophrya spp. BMC Biol. 19(1):1. doi: 10.1186/s12915-020-00927-9.
- Flynn J, Hubley R, Goubert J, Rosen J, Clark A, Feschotte C, Smit A. 2020. RepeatModeler2 for automated genomic discovery of transposable element families. Proc Natl Acad Sci U S A. 117(17):9451–9457. doi: 10.1073/pnas.1921046117.
- Freudenthal HD. 1962. Symbiodinium gen. nov. and Symbiodinium microadriaticum sp. nov., a zooxanthella: taxonomy, life cycle, and morphology. J Protozool. 9(1):45–52. doi: 10.1111/j.1550-7408.1962.tb02579.x.
- Fritz L, Nass M. 1992. Development of the endoparasitic dinoflagellate Amoebophrya ceratii within host dinoflagellate species. J Phycol. 28(3):312–320. doi: 10.1111/j.0022-3646.1992.00312.x.
- Frolova L, Seit-Nebi A, Kisselev L. 2002. Highly conserved NIKS tetrapeptide is functionally essential in eukaryotic translation termination factor eRF1. RNA. 8(2):129–136. doi: 10.1017/S1355838202013262.
- Gagat P, Mackiewicz D, Mackiewicz P. 2017. Peculiarities within peculiarities—dinoflagellates and their mitochondrial genomes. Mitochondrial DNA B Resour. 2(1):191–195. doi: 10.1080/23802359.2017.1307699.
- Gaines G, Taylor FJR. 1984. Extracellular digestion in marine dinoflagellates. J Plankton Res. 6(6):1057–1061. doi: 10.1093/plankt/6.6.1057.
- Groisillier A, Massana R, Valentin K, Vaulot D, Guillou L. 2006. Genetic diversity and habitats of two enigmatic marine alveolate lineages. Aquat Microb Ecol. 42:277–291. doi: 10.3354/ame042277.
- Guillou L, Viprey M, Chambouvet A, Welsh RM, Kirkham AR, Massana R, Scanlan DJ, Worden AZ. 2008. Widespread occurrence and genetic diversity of marine parasitoids belonging to Syndiniales (Alveolata). Environ Microbiol. 10(12):3349–3365. doi: 10.1111/j.1462-2920.2008.01731.x.
- Hackl T, Ankenbrand M, van Adrichem B. 2024. gggenomes: effective and versatile visualizations for comparative genomics. arXiv. 2411.13556. doi: 10.48550/arXiv.2411.13556.
- Hoffman DC, Anderson RC, DuBois ML, Prescott DM. 1995. Macronuclear gene-sized molecules of hypotrichs. Nucleic Acids Res. 23(8):1279–1283. doi: 10.1093/nar/23.8.1279.
- Huson DH, Auch AF, Qi J, Schuster SC. 2007. MEGAN analysis of metagenomic data. Genome Res. 17(3):377–386. doi: 10.1101/gr.5969107.
- Inamine JM, Ho KC, Loechel S, Hu PC. 1990. Evidence that UGA is read as a tryptophan codon rather than as a stop codon by Mycoplasma pneumoniae, Mycoplasma genitalium, and Mycoplasma gallisepticum. J Bacteriol. 172(1):504–506. doi: 10.1128/jb.172.1.504-506.1990.
- Jackman SD, Vandervalk BP, Mohamadi H, Chu J, Yeo S, Hammond SA, Jahesh G, Khan H, Coombe L, Warren RL, et al. 2017. ABySS 2.0: resource-efficient assembly of large genomes using a bloom filter. Genome Res. 27(5):768–777. doi: 10.1101/gr.214346.116.
- Jackson RJ, Hellen CUT, Pestova TV. 2012. Termination and post-termination events in eukaryotic translation. In: Advances in protein chemistry and structural biology. Elsevier. p. 45–93.
- Jacobson DM, Anderson DM. 1986. Thecate heterotrophic dinoflagellates: feeding behavior and mechanisms. J Phycol. 22(3):249–258. doi: 10.1111/j.1529-8817.1986.tb00021.x.
- Janouškovec J, Gavelis GS, Burki F, Dinh D, Bachvaroff TR, Gornik SG, Bright KJ, Imanian B, Strom SL, Delwiche CF, et al. 2017. Major transitions in dinoflagellate evolution unveiled by phylotranscriptomics. Proc Natl Acad Sci U S A. 114(2):E171–E180. doi: 10.1073/pnas.1614842114.
- John U, Lu Y, Wohlrab S, Groth M, Janouškovec J, Kohli GS, Mark FC, Bickmeyer U, Farhat S, Felder M, et al. 2019. An aerobic eukaryotic parasite with functional mitochondria that likely lacks a mitochondrial genome. Sci Adv. 5(4):eaav1110. doi: 10.1126/sciadv.aav1110.
- Jones P, Binns D, Chang H-Y, Fraser M, Li W, McAnulla C, McWilliam H, Maslen J, Mitchell A, Nuka G, et al. 2014. InterProScan 5: genome-scale protein function classification. Bioinformatics. 30(9):1236–1240. doi: 10.1093/bioinformatics/btu031.
- Kachale A, Pavlíková Z, Nenarokova A, Roithová A, Durante IM, Miletínová P, Záhonová K, Nenarokov S, Votýpka J, Horáková E, et al. 2023. Short tRNA anticodon stem and mutant eRF1 allow stop codon reassignment. Nature. 613(7945):751–758. doi: 10.1038/s41586-022-05584-2.
- Kanehisa M, Sato Y, Morishima K. 2016. BlastKOALA and GhostKOALA: KEGG tools for functional characterization of genome and metagenome sequences. J Mol Biol. 428(4):726–731. doi: 10.1016/j.jmb.2015.11.006.
- Kappmeyer LS, Thiagarajan M, Herndon DR, Ramsay JD, Caler E, Djikeng A, Gillespie JJ, Lau AO, Roalson EH, Silva JC, et al. 2012. Comparative genomic analysis and phylogenetic position of Theileria equi. BMC Genomics. 13(1):603. doi: 10.1186/1471-2164-13-603.
- Kayal E, Smith DR. 2021. Is the dinoflagellate Amoebophrya really missing an mtDNA? Mol Biol Evol. 38(6):2493–2496. doi: 10.1093/molbev/msab041.
- Keeling PJ, Doolittle WF. 1996. A non-canonical genetic code in an early diverging eukaryotic lineage. EMBO J. 15(9):2285–2290. doi: 10.1002/j.1460-2075.1996.tb00581.x.
- Kim D, Paggi JM, Park C, Bennett C, Salzberg SL. 2019. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol. 37(8):907–915. doi: 10.1038/s41587-019-0201-4.
- Kissinger JC. 2003. ToxoDB: accessing the Toxoplasma gondii genome. Nucleic Acids Res. 31(1):234–236. doi: 10.1093/nar/gkg072.
- Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. 2017. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 27(5):722–736. doi: 10.1101/gr.215087.116.
- Li H. 2018. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 34(18):3094–3100. doi: 10.1093/bioinformatics/bty191.
- Li H. 2023. Protein-to-genome alignment with miniprot. Bioinformatics. 39(1):btad014. doi: 10.1093/bioinformatics/btad014.
- Lin S, Cheng S, Song B, Zhong X, Lin X, Li W, Li L, Zhang Y, Zhang H, Ji Z, et al. 2015. The Symbiodinium kawagutii genome illuminates dinoflagellate gene expression and coral symbiosis. Science. 350(6261):691–694. doi: 10.1126/science.aad0408.
- Lowe TM, Eddy SR. 1997. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25(5):955–964. doi: 10.1093/nar/25.5.955.
- Lozupone CA, Knight RD, Landweber LF. 2001. The molecular basis of nuclear genetic code change in ciliates. Curr Biol. 11(2):65–74. doi: 10.1016/S0960-9822(01)00028-8.
- Marçais G, Delcher AL, Phillippy AM, Coston R, Salzberg SL, Zimin A. 2018. MUMmer4: a fast and versatile genome alignment system. PLoS Comput Biol. 14(1):e1005944. doi: 10.1371/journal.pcbi.1005944.
- Miller JJ, Delwiche CF, Coats DW. 2012. Ultrastructure of Amoebophrya sp. and its changes during the course of infection. Protist. 163(5):720–745. doi: 10.1016/j.protis.2011.11.007.
- Papadopoulos JS, Agarwala R. 2007. COBALT: constraint-based alignment tool for multiple protein sequences. Bioinformatics. 23(9):1073–1079. doi: 10.1093/bioinformatics/btm076.
- Park MG, Yih W, Coats DW. 2004. Parasites and phytoplankton, with special emphasis on dinoflagellate infections. J Eukaryot Microbiol. 51(2):145–155. doi: 10.1111/j.1550-7408.2004.tb00539.x.
- Place AR, Bowers HA, Bachvaroff TR, Adolf JE, Deeds JR, Sheng J. 2012. Karlodinium veneficum—the little dinoflagellate with a big bite. Harmful Algae. 14:179–195. doi: 10.1016/j.hal.2011.10.021.
- Place A, Harvey H, Bai X, Coats D. 2006. Sneaking under the toxin surveillance radar: parasitism and sterol content. African J Marine Sci. 28(2):347–351. doi: 10.2989/18142320609504175.
- Schneider SU, De Groot EJ. 1991. Sequences of two rbcS cDNA clones of Batophora oerstedii: structural and evolutionary considerations. Curr Genet. 20(1–2):173–175. doi: 10.1007/BF00312782.
- Schneider SU, Leible MB, Yang X-P. 1989. Strong homology between the small subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase of two species of Acetabularia and the occurrence of unusual codon usage. Mol Gen Genet. 218(3):445–452. doi: 10.1007/BF00332408.
- Schultz DW, Yarus M. 1994a. tRNA structure and ribosomal function. II. Interaction between anticodon helix and other tRNA mutations. J Mol Biol. 235(5):1395–1405. doi: 10.1006/jmbi.1994.1096.
- Schultz DW, Yarus M. 1994b. tRNA structure and ribosomal function. I. tRNA nucleotide 27–43 mutations enhance first position wobble. J Mol Biol. 235(5):1381–1394. doi: 10.1006/jmbi.1994.1095.
- Seah BKB, Singh A, Swart EC. 2022. Karyorelict ciliates use an ambiguous genetic code with context-dependent stop/sense codons. Peer Community J. 2:e42. doi: 10.24072/pcjournal.141.
- Shoguchi E, Shinzato C, Kawashima T, Gyoja F, Mungpakdee S, Koyanagi R, Takeuchi T, Hisata K, Tanaka M, Fujiwara M, et al. 2013. Draft assembly of the Symbiodinium minutum nuclear genome reveals dinoflagellate gene structure. Curr Biol. 23(15):1399–1408. doi: 10.1016/j.cub.2013.05.062.
- Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. 2015. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 31(19):3210–3212. doi: 10.1093/bioinformatics/btv351.
- Swart EC, Emmerich C, Seah KBB, Singh M, Shulgina Y, Singh A. 2023. How did UGA codon translation as tryptophan evolve in certain ciliates? A critique of Kachale et al. 2023 Nature. bioRxiv. doi: 10.1101/2023.10.09.561518.
- Swart EC, Serra V, Petroni G, Nowacki M. 2016. Genetic codes with no dedicated stop codon: context-dependent translation termination. Cell. 166(3):691–702. doi: 10.1016/j.cell.2016.06.020.
- Velo-Suárez L, Brosnahan ML, Anderson DM, McGillicuddy DJ. 2013. A quantitative assessment of the role of the parasite Amoebophrya in the termination of Alexandrium fundyense blooms within a small coastal embayment. PLoS One. 8(12):e81150. doi: 10.1371/journal.pone.0081150.
- Waterhouse AM, Procter JB, Martin DMA, Clamp M, Barton GJ. 2009. Jalview version 2—a multiple sequence alignment editor and analysis workbench. Bioinformatics. 25(9):1189–1191. doi: 10.1093/bioinformatics/btp033.
- Wickham H. 2016. ggplot2: elegant graphics for data analysis. Cham: Springer International Publishing.
- Woo YH, Ansari H, Otto TD, Klinger CM, Kolisko M, Michálek J, Saxena A, Shanmugam D, Tayyrov A, Veluchamy A, et al. 2015. Chromerid genomes reveal the evolutionary path from photosynthetic algae to obligate intracellular parasites. eLife. 4:e06974. doi: 10.7554/eLife.06974.
- Yang H, Hu Z, Shang L, Deng Y, Tang YZ. 2020. A strain of the toxic dinoflagellate Karlodinium veneficum isolated from the East China Sea is an omnivorous phagotroph. Harmful Algae. 93:101775. doi: 10.1016/j.hal.2020.101775.