PLOS Pathogens. 2020 Aug 3;16(8):e1008717. doi: 10.1371/journal.ppat.1008717

Genomic and transcriptomic evidence for descent from Plasmodium and loss of blood schizogony in Hepatocystis parasites from naturally infected red colobus monkeys

Eerik Aunin 1, Ulrike Böhme 1, Theo Sanderson 2, Noah D Simons 3, Tony L Goldberg 4, Nelson Ting 3, Colin A Chapman 5,6,7, Chris I Newbold 1,8, Matthew Berriman 1, Adam J Reid 1,*
Editor: Tim JC Anderson 9
PMCID: PMC7425995  PMID: 32745123

Abstract

Hepatocystis is a genus of single-celled parasites infecting, amongst other hosts, monkeys, bats and squirrels. Although thought to have descended from malaria parasites (Plasmodium spp.), Hepatocystis spp. are thought not to undergo replication in the blood–the part of the Plasmodium life cycle which causes the symptoms of malaria. Furthermore, Hepatocystis is transmitted by biting midges, not mosquitoes. Comparative genomics of Hepatocystis and Plasmodium species therefore presents an opportunity to better understand some of the most important aspects of malaria parasite biology. We were able to generate a draft genome for Hepatocystis sp. using DNA sequencing reads from the blood of a naturally infected red colobus monkey. We provide robust phylogenetic support for Hepatocystis sp. as a sister group to Plasmodium parasites infecting rodents. We show transcriptomic support for a lack of replication in the blood and genomic support for a complete loss of a family of genes involved in red blood cell invasion. Our analyses highlight the rapid evolution of genes involved in parasite vector stages, revealing genes that may be critical for interactions between malaria parasites and mosquitoes.

Author summary

Hepatocystis parasites are single-celled organisms, closely related to the Plasmodium species which cause malaria. But Hepatocystis are distinct–unlike Plasmodium they are thought not to replicate in the blood and cause little or no disease in their mammalian hosts. They are transmitted from one host to the next, not by mosquitoes, but by biting midges. In this study we generated a genome sequence for Hepatocystis–the first time this data has ever been produced and analysed for this species. We compared genome sequences of Hepatocystis and Plasmodium, confirming that Hepatocystis is descended from Plasmodium. We strengthened support for the absence of replication in the blood and, in line with this finding, discovered that genes involved in interaction with red blood cells have been lost in Hepatocystis. Our analyses revealed rapid evolution of genes which are active when the parasite is in the insect vector, highlighting those which might be important for understanding interaction between malaria parasites and mosquitoes. Hepatocystis has a fascinating evolutionary story and is a powerful comparator for understanding malaria parasite biology.

Introduction

Species of the genus Hepatocystis are single-celled eukaryotic parasites infecting, amongst other hosts, Old World monkeys, fruit bats and squirrels [1]. Phylogenetically, they are thought to reside within a clade containing Plasmodium species, including the parasites causing malaria in humans [2]. They were originally considered distinct from Plasmodium and have remained in a different genus because they lack the defining feature of asexual development in the blood, known as erythrocytic schizogony [3]. The presence of macroscopic exoerythrocytic schizonts (merocysts) in the liver of the vertebrate host is the most prominent feature of Hepatocystis [3]. Similar to Plasmodium parasites, Hepatocystis merocysts yield many single-celled merozoites. However, unlike Plasmodium, Hepatocystis merocysts appear to be the only replication phase in the vertebrate host [4]. First generation merozoites of Plasmodium spp. are released from liver cells and invade red blood cells, where they multiply asexually, before erupting from red cells as secondary merozoites. These merozoites invade further red blood cells before some develop into stages that can be transmitted to the vector. In contrast, liver merozoites of Hepatocystis spp. are thought to commit to the development of gametocyte transmission stages directly upon invading red blood cells. They are then vectored not by mosquitoes, but by biting midges of the genus Culicoides [5]. After fertilisation, Hepatocystis ookinetes encyst in the head and thorax of the midge between muscle fibres, whereas Plasmodium ookinetes encyst in the midgut wall of mosquitoes. After maturation, oocysts of both Plasmodium and Hepatocystis rupture and release sporozoites that migrate to the salivary glands [1]. 
These discrete biological differences, in the face of phylogenetic similarity and many shared biological features, make Hepatocystis a potentially powerful comparator for understanding important aspects of malaria parasite biology, such as transmission and host specificity.

A population of red colobus monkeys (Piliocolobus tephrosceles) from Kibale National Park, Uganda, was previously shown to host Hepatocystis parasites, based on morphological identification of infected red blood cells and DNA sequencing of the cytochrome b gene [6]. In this work we use Hepatocystis sp. genome and transcriptome sequences derived from P. tephrosceles whole blood samples to generate a draft genome sequence and gain insights into Hepatocystis evolution. We go on to use these insights to explore key aspects of malaria parasite biology such as red blood cell invasion, gametocytogenesis and parasite-vector interactions.

Results

Genome assembly and annotation

While examining a published genomic sequence generated from the whole blood of a red colobus monkey (Piliocolobus tephrosceles) in Kibale National Park, Uganda (NCBI assembly ASM277652v1), we noticed the presence of sequences with significant similarity to Plasmodium spp. We hypothesised that this represented genomic material from a bloodborne parasite captured during the sequencing process. For each contig in the assembly we therefore examined the AT-content and sequence similarity to Plasmodium spp. and macaque (Fig 1A). We identified a substantial subset of contigs that appeared to be derived from an apicomplexan parasite. Phylogenetic analysis, using an orthologue of cytochrome b from these contigs, suggested that they represent the first substantial genomic sequence from the genus Hepatocystis (Fig 1B), a parasite previously reported in Kibale National Park that infects at least four species of Old World monkeys [6]. At least four species of Hepatocystis are known to infect African monkeys (H. kochi, H. simiae, H. bouillezi and H. cercopitheci) [6], but with little sequence data currently linked to morphological identification, it was not possible to determine the species. We have thus classified the parasite as Hepatocystis sp. ex Piliocolobus tephrosceles (hereafter Hepatocystis sp.; NCBI Taxonomy ID: 2600580). The extraction of the Hepatocystis sp. sequences from the P. tephrosceles assembly yielded a set of 11,877 scaffolds with a total size of 26.26 Mb and an N50 of 2.4 kb. Automated genome annotation with Companion [7] identified 2,967 genes and 1,432 pseudogenes in these scaffolds. To improve upon this assembly, we isolated putative Hepatocystis sp. reads from the original short read DNA sequencing data. These were assembled into a draft quality nuclear genome assembly of 19.95 Mb, comprising 2,439 contigs with an N50 of 18.3 kb (Table 1). The GC content (22.05%) was identical to that of Plasmodium spp. infecting rodents, and slightly higher than that of P. falciparum (19.34%). We identified 5,341 genes, compared to 5,441 in P. falciparum and 5,049 in P. berghei, suggesting a largely complete gene set (Table 1; S1 Table). Despite the fragmented nature of the assembly, we were able to identify synteny with P. falciparum around centromeres (S1 Fig) and evidence of clustering of contingency gene families (S2 Fig), as seen in most Plasmodium species.
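The contig screen and the assembly statistics quoted above can be illustrated with a short sketch. It is not the authors' pipeline: the function names, the toy contigs and the 70% AT cut-off are assumptions made for illustration, and a real screen would also use sequence similarity, as the text describes.

```python
def gc_content(seq):
    """Return G+C as a percentage of unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return 100.0 * (counts["G"] + counts["C"]) / total


def n50(lengths):
    """Return the N50: the largest length L such that contigs of
    length >= L together contain at least half of the assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def at_rich_contigs(contigs, min_at_percent=70.0):
    """Names of contigs whose AT-content reaches a (hypothetical) cut-off."""
    return [name for name, seq in contigs.items()
            if 100.0 - gc_content(seq) >= min_at_percent]


# Toy sequences, invented for illustration only.
toy_contigs = {
    "contig_host": "GCGCATGCGGCCATGC",
    "contig_parasite": "ATATTTAAATGATATTAAAT",
}
print(at_rich_contigs(toy_contigs))
print(n50([2, 3, 4, 5, 6]))
```

A single AT threshold is only a first pass. The bimodal distribution in Fig 1A suggests AT-content separates most contigs, but host regions of low complexity could still pass such a filter.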

Fig 1. An assembly of genomic sequencing reads from a red colobus blood sample contained significant amounts of sequence from the parasite Hepatocystis spp.

(A) Contigs from the red colobus (Piliocolobus tephrosceles) assembly had a bimodal distribution of AT-content and sequence similarity to Plasmodium spp. (B) A phylogenetic tree of cytochrome b indicated that the closest match for the apicomplexan parasite sequenced from red colobus blood is a Hepatocystis isolate from a monkey host. Parasite cytochrome b sequences derived from RNA-seq assemblies from red colobus blood samples are almost entirely identical to the cytochrome b sequence assembled from Hepatocystis DNA reads from a single monkey. Branches of the tree have been coloured by bootstrap support values from 15 (red) to 100 (green). Some bootstrap support values are also shown next to the nodes as text. Red arrows highlight the Hepatocystis samples from the current study. Blue place names indicate the African continent, green Australia, orange Asia.

Table 1. Features of the Hepatocystis sp. ex Piliocolobus tephrosceles assembly compared to P. falciparum 3D7, P. vivax P01 and P. berghei ANKA.

Feature | Hepatocystis sp. | P. falciparum 3D7 | P. vivax P01 | P. berghei ANKA
Nuclear genome
    Genome size (Mb) | 19.95 | 23.3 | 29.0 | 18.7
    G+C content (%) | 22.05 | 19.34 | 39.8 | 22.05
    Gaps within scaffolds | 979 | 0 | 431 | 0
    No. of scaffolds | 2439 | 14 | 240 | 19
    No. of chromosomes | ND | 14 | 14 | 14
    No. of genes* | 5,341 | 5,441 | 6,650 | 5,049
    No. of pseudogenes | 28 | 158 | 158 | 129
    No. of partial genes | 1,475 | 0 | 196 | 8
    No. of ncRNA | 19 | 103 | 35 | 47
    No. of tRNAs | 41 | 45 | 45 | 45
    No. of telomeres | 0** | 26 | 1 | 12
    No. of centromeres | 5 | 13 | 14 | 14
Mitochondrial genome
    Genome size (bp) | 6,595 | 5,967 | 5,989 | 5,957
    G+C content (%) | 30.99 | 31.6 | 30.5 | 30.9
    No. of genes | 3 | 3 | 3 | 3
Apicoplast genome
    Genome size (kb) | 27.0 | 34.3 | 29.6 | 34.3
    G+C content (%) | 13.29 | 14.22 | 13.3 | 15.1
    No. of genes | 28 | 30 | 30 | 30
Completeness
    CEGMA: complete | 63.31% | 69.35% | 68.15% | 70.16%
    CEGMA: at least partial | 69.35% | 71.77% | 71.77% | 73.39%

* including pseudogenes, duplications and partial genes, excluding non-coding RNA genes

** two small contigs have telomeric repeats (scaffold_2410: 5 telomeric repeats; scaffold_2364: 9 telomeric repeats)

Phylogenetic position of Hepatocystis sp. ex Piliocolobus tephrosceles

There is consensus that Hepatocystis spp. are nested within the Plasmodium genus [2,8,9]; however, their placement within the genus has not been robustly determined. Indeed, our cytochrome b phylogeny confirms that our assembled genome is that of a Hepatocystis species, but it provides little support for the placement of this genus in relation to Plasmodium spp. A phylogeny generated using all mitochondrially encoded protein sequences also provided little support for key nodes (S3 Fig; see GitHub page for all sequence alignment data). The mitochondrial genome is therefore not reliable for determining the species phylogeny. A phylogeny based on 18 apicoplast proteins was more robust and placed Hepatocystis sp. as an outgroup to the Plasmodium species infecting rodents (Vinckeia; S4 Fig). We wanted to reliably place Hepatocystis sp. relative to other Hepatocystis species. Limited sequence data are available for Hepatocystis outside of this study. However, 11 nuclear genes have been sequenced for H. epomophori, a parasite of bats [2]. Based on the sequence of these genes, we found that Hepatocystis sp. forms a sister group to H. epomophori (S5 Fig). Furthermore, the Hepatocystis genus again forms a sister group to the Vinckeia subgenus of Plasmodium, although the tree contains some ambiguous branch points. To improve the robustness of the placement of Hepatocystis sp. within Plasmodium, we used 2673 orthologous nuclear genes from each of 12 species across the Plasmodium genus, which robustly placed Hepatocystis sp. as a sister clade to the Plasmodium species infecting rodents (subgenus Vinckeia; Fig 2). Interestingly, some Vinckeia species (P. cyclopsi) also infect bats, supporting an earlier suggestion that the ancestor of Hepatocystis and Vinckeia might have infected bats [10]. However, whole-genome data also suggest that Vinckeia is derived from a group of monkey-infecting parasites [11]. Hepatocystis sp. has a long branch that could indicate rapid evolution after splitting from its ancestor with Vinckeia.

Fig 2. Whole genome phylogeny and key features of the Hepatocystis genome.

A whole-genome phylogenetic tree is combined with a graphical overview of key features of Hepatocystis and Plasmodium species (genome versions from August 2019). The maximum likelihood phylogenetic tree of Hepatocystis, Plasmodium and Haemoproteus species is based on an amino acid alignment of 2673 single copy orthologs encoded by the nuclear genome. Bootstrap support values of all nodes were 100, except for one node where the value was 79. The rooting of the tree at Hae. tartakovskyi is based on previously published Plasmodium phylogenetic trees [11,45]. TRAP—thrombospondin-related anonymous protein. RBP protein—reticulocyte binding protein.

In vivo transcriptome data supports a lack of erythrocytic schizogony

Transcriptome sequencing of blood samples from 29 individuals was performed as part of the red colobus monkey genome sequencing project [12]. We found evidence that each of these individuals was infected with the same species of Hepatocystis as found in the genomic reads, consistent with high prevalence of this parasite in Kibale red colobus monkeys [6]. The extremely low SNP density suggested that parasites from different red colobus individuals were highly related (Fig 3A). We identified an average of 1.36 SNPs (standard deviation = 0.47) and 0.43 indels (standard deviation = 0.15) per 10 kb of genome when calling variants using RNA-seq reads.
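As a minimal sketch of how a per-sample variant density and its summary statistics could be computed: the sample names, counts and the 1 Mb callable length below are hypothetical, and the denominator actually used in the study (whole genome or only expressed regions) is not stated in this section.

```python
from statistics import mean, stdev


def variants_per_10kb(n_variants, callable_bp):
    """Scale a variant count to a 10 kb window of callable sequence."""
    if callable_bp <= 0:
        raise ValueError("callable_bp must be positive")
    return n_variants * 10_000 / callable_bp


# Hypothetical SNP counts per sample over a hypothetical 1 Mb of callable sequence.
toy_snp_counts = {"sample_A": 100, "sample_B": 150, "sample_C": 200}
densities = [variants_per_10kb(n, 1_000_000) for n in toy_snp_counts.values()]

# Mean and sample standard deviation across samples.
print(mean(densities), stdev(densities))
```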

Fig 3. Hepatocystis sp. in vivo RNA-seq data supports a lack of erythrocytic schizogony and a variable sex ratio.

(A) Distributions of SNPs per 100 kb in each Hepatocystis sp. RNA-seq sample, relative to the genome assembly reference, highlight consistently low genetic diversity. Samples SAMN07757853, SAMN07757863, SAMN07757870 and SAMN07757873 have been excluded from the figure due to their low expression of Hepatocystis genes. (B) Deconvolution of RNA-seq samples to identify parasite stage composition shows no evidence for blood schizonts. Ring and trophozoite cells are assumed to relate to early stages of gametocyte development, which are not distinguishable from asexual rings and trophozoites. (C) Proportions of early blood stages (ring and trophozoite) are negatively correlated with mature female gametocytes; however, male and female gametocyte ratios are poorly correlated, suggesting that sex ratios vary among samples.

Although it is believed that Hepatocystis spp. do not undergo erythrocytic schizogony [3], this has been challenged by limited microscopic evidence for asexual stages in the blood [13]. To determine whether there was transcriptomic evidence for schizonts in the blood we deconvoluted the Hepatocystis sp. transcriptomes using transcriptome profiles representing different Plasmodium life stages. We found no such evidence, but observed varying proportions of cells identified as early blood stages (rings/trophozoites) and mature gametocytes (Fig 3B; S6 Fig; S2 Table). It is thought that chronic infections (of up to 15 months) may be maintained from continual development in the liver [3]. The presence of early blood stages in these individuals may therefore reflect this continual production of new blood forms, rather than recent infection. Proportions of rings and trophozoites were positively correlated and both these forms correlated negatively with female gametocytes (Fig 3C). Interestingly, the inferred proportions of male and female gametocytes were not strongly correlated suggesting there might be variation in commitment rates of gametocytes to male or female development.
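The idea behind deconvolution can be sketched as a non-negative least-squares fit of a bulk expression vector to a set of reference stage profiles. This is an illustrative stand-in and not necessarily the method used in the study: the function name, the projected gradient descent solver, the four marker genes and all expression values are assumptions.

```python
def deconvolve(bulk, profiles, n_iter=2000):
    """Estimate stage proportions from a bulk expression vector.

    bulk:     expression values, one per gene.
    profiles: dict of stage name -> reference expression values for the
              same genes in the same order.
    Minimises the squared error between the bulk vector and a
    non-negative mixture of the profiles by projected gradient descent,
    then rescales the weights to sum to 1.
    """
    stages = list(profiles)
    n_genes = len(bulk)
    squared_norm = sum(v * v for s in stages for v in profiles[s])
    if squared_norm == 0:
        raise ValueError("reference profiles are all zero")
    # 1 / trace(R^T R) never exceeds 1 / (largest eigenvalue of R^T R),
    # so this step size keeps gradient descent stable.
    step = 1.0 / squared_norm
    weights = {s: 0.0 for s in stages}
    for _ in range(n_iter):
        fitted = [sum(weights[s] * profiles[s][g] for s in stages)
                  for g in range(n_genes)]
        residual = [fitted[g] - bulk[g] for g in range(n_genes)]
        for s in stages:
            gradient = sum(residual[g] * profiles[s][g] for g in range(n_genes))
            # Projection step: negative weights are clipped to zero.
            weights[s] = max(0.0, weights[s] - step * gradient)
    total = sum(weights.values())
    if total == 0:
        return weights
    return {s: w / total for s, w in weights.items()}


# Invented reference profiles for four marker genes.
toy_profiles = {
    "ring": [10, 1, 0, 2],
    "schizont": [0, 8, 1, 2],
    "female_gametocyte": [1, 0, 9, 2],
}
# A bulk sample built as 60% ring + 40% female gametocyte, no schizonts.
toy_bulk = [6.4, 0.6, 3.6, 2.0]
print(deconvolve(toy_bulk, toy_profiles))
```

In this toy case the schizont weight is driven to zero because no mixture containing schizonts fits the bulk profile better. Real data are noisy, so small non-zero estimates would need to be judged against a background level.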

Expanded and novel gene families

The largest gene family in Hepatocystis sp. was a novel family, which we have named Hepatocystis-specific family 1 (hep1; Table 2). These 12 single-exon genes (plus four pseudogenes) each encode proteins of ~250 amino acids, beginning with a predicted signal peptide (S7A Fig). We could find no significant sequence similarity to genes from any other sequenced genome (using HHblits [14]). However, they contain a repeat region with striking similarity to that in Plasmodium kahrp, a gene involved in presenting proteins on the red blood cell surface [15]. Three members were highly expressed in in vivo blood stages, with one correlating well with the presence of early stages (S8A Fig; HEP_00211100, Pearson’s r = 0.80 with rings). Another novel family, hep2, contained N-terminal PEXEL motifs, suggesting it is exported (S7B Fig). Distinct parts of its sequence showed similarity (albeit with low significance) to exported proteins from P. malariae (PmUG01_00051800; probability: 77.88% HHblits) and P. ovale curtisi (PocGH01_00025800; 78.47%) as well as a gene in P. gallinaceum (PGAL8A_00461100; 69.52%). Three or four members were highly expressed in blood stages in vivo, with two members highly correlated with predicted proportions of early blood stages (S8B Fig; HEP_00165500, Pearson’s r = 0.83 with rings; HEP_00480100, Pearson’s r = 0.77 with rings).

Table 2. Size of known and novel gene families in Hepatocystis sp. in comparison to its relatives.

Gene family | Hepatocystis sp. | Plasmodium falciparum 3D7 | P. berghei ANKA | P. vivax P01 | Haemoproteus tartakovskyi
ApiAP2 transcription factors | 27 | 27 | 26 | 27 | 20
Hepatocystis-specific family 1 (hep1) | 16 (includes 4 pseudogenes) | 0 | 0 | 0 | 0
Hepatocystis-specific family 2 (hep2) | 10 (includes 4 pseudogenes) | 0 | 0 | 0 | 0
cpw-wpc | 8 | 9 | 9 | 9 | 8
6-cysteine proteins | 7 | 14 | 13 | 15 (includes 1 pseudogene) | 22
lccl | 6 | 6 | 6 | 6 | 6
Thrombospondin-Related Anonymous Protein (trap) | 6 | 1 | 1 | 1 | 1
pir | 5 (includes 1 pseudogene) | 0 | 218 (includes 83 pseudogenes) | 1212 (includes 109 pseudogenes) | 0
Serine repeat antigen (SERA) | 5 (includes 1 pseudogene) | 9 | 5 | 13 | 2
early transcribed membrane protein (etramp) | 4 | 15 | 7 | 10 | 1
exported protein 1 (exp1) | 4 (includes 2 pseudogenes) | 1 | 1 | 1 | 0
Tryptophan-rich antigen | 4 | 4 (includes 1 pseudogene) | 11 (includes 4 pseudogenes) | 40 | 0
Lysophospholipase | 2–4 | 6 (includes 1 pseudogene) | 5 (includes 2 pseudogenes) | 7 (includes 1 pseudogene) | 1
Phist domain-containing | 2 | 81 (includes 19 pseudogenes) | 3 | 82 (includes 6 pseudogenes) | 3
fam-a | 1 | 1 | 74 (includes 26 pseudogenes) | 1 | 1 (partial)
Reticulocyte binding protein (Rh/rbp) | 0 | 7 | 15 | 10 | 0

The gene encoding Thrombospondin-Related Anonymous Protein (trap) is involved in infection of salivary glands and liver cells by Plasmodium sporozoites. It is found strictly in a single copy in all Plasmodium species sequenced to date. However, it is present in six copies in Hepatocystis sp., suggesting that trap-mediated aspects of sporozoite-host interactions may be more complex. None of these genes were highly expressed in blood stage transcriptomes, consistent with their known role in sporozoites. Exported protein 1 (e.g. PF3D7_1121600) is a single copy gene in all Plasmodium species. It encodes a Parasitophorous Vacuolar Membrane (PVM) protein and is important for host-parasite interactions in the liver [16]. In Hepatocystis sp. it is expanded to four copies.

Missing orthologues tend to be involved in erythrocytic schizogony

The genomes of Plasmodium spp. each contain large families of genes known or thought to be involved in host-parasite interactions [17]. These include, amongst others, the var, rif and stevor genes in the Laverania subgenus, SICAvar genes in P. knowlesi and the pir genes across the genus. We find only four intact pir genes and a single pir pseudogene in Hepatocystis sp., compared to ~200–1000 in Plasmodium spp. infecting rodents. One was particularly highly expressed in the blood stages from most monkeys that were sampled (S8C Fig; HEP_00069900). All of these are most similar to the ancestral pir subfamily, present in single copy in Vinckeia species and in 19 copies in P. vivax P01 (Table 2; S9 Fig). The best described role for pir genes is in Vinckeia parasites, where they are involved in establishing chronic infections in mice [18]. Given that asexual Hepatocystis parasites are not thought to exist in the blood of monkeys or bats, there would be no need for this function of pir genes. However, in Vinckeia, pir genes are expressed in several other stages, including male gametocytes [19], which do feature in the Hepatocystis lifecycle. The function of the ancestral pir gene subfamily is unknown, although it is expressed at multiple stages of the lifecycle in P. berghei [20].

We wanted to determine, more generally, the types of genes that might have been lost in Hepatocystis sp. relative to Plasmodium spp. This is made difficult by uncertainty in determining missingness in a draft genome. We overcame this problem using clusters of genes identified as having common expression patterns across the lifecycle of the close Hepatocystis relative P. berghei [20]. Cluster 10 (late schizont expression; Fisher’s Exact test odds ratio = 0.09, FDR = 0.0002) tended to contain orthologues shared by P. berghei, P. ovale wallikeri and P. vivax, but not Hepatocystis sp. (Fig 4; S3 Table; S10 Fig). Eleven out of 25 orthologues missing in the late schizont cluster (including two pseudogenes) encoded Reticulocyte Binding Proteins (RBPs). In fact, we could not identify any RBPs in the Hepatocystis sp. genome or Hepatocystis sp. RNA-seq assemblies. Interestingly, counter to previous suggestions [21], we found no evidence for RBP sequences in the Hae. tartakovskyi genome sequence. However, we did find fragmentary sequences with significant sequence similarity to RBPs from malaria parasites with avian hosts in transcriptome assemblies of Leucocytozoon buteonis [22] and Hae. columbae [23] (S11 Fig). Also missing from this cluster was cdpk5, a kinase that regulates parasite egress from red cells [24]. The other principal gene families involved in erythrocyte binding and invasion by Plasmodium are the erythrocyte binding ligands (eh)/duffy-binding protein (dbp) and the merozoite surface protein (msp) families [25]. These are largely conserved relative to P. berghei. Thus, orthologues missing relative to Plasmodium spp. tend to be involved in erythrocytic schizogony, the part of the life cycle that also appears to be absent in Hepatocystis.
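The cluster-level test described above can be sketched with a small, dependency-free implementation of Fisher's exact test followed by a false discovery rate adjustment. The table orientation, the example counts and the choice of the Benjamini-Hochberg procedure are assumptions for illustration; this section does not state which FDR procedure was used for the cluster analysis.

```python
from math import comb


def fisher_exact(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns (sample odds ratio, p-value). The p-value sums the
    hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    p_value = sum(p for p in map(prob, range(low, high + 1))
                  if p <= p_observed * (1 + 1e-9))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(1.0, p_value)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts. Rows: genes in one cluster / genes in all other clusters.
# Columns: orthologue found in Hepatocystis sp. / orthologue not found.
odds_ratio, p_value = fisher_exact(2, 8, 90, 10)
print(odds_ratio, p_value)
```

With this orientation an odds ratio below 1 means that genes in the cluster are less likely than other genes to have an orthologue in Hepatocystis sp. One p-value per cluster would then be passed to `benjamini_hochberg` together.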

Fig 4. Hepatocystis sp. orthologues of genes which are highly expressed in late schizogony in P. berghei tend to be missing from the genome.

P. berghei gene expression clusters from the Malaria Cell Atlas were used to determine whether orthologous genes, conserved with P. vivax and P. ovale, but absent in Hepatocystis, tended to be expressed in particular parts of the life cycle. The only significant cluster was cluster 10, which includes genes most highly expressed in late schizont stages (highlighted by a red box). The top panels show the log2 observed/expected ratios for orthologous genes in each cluster which are shared between P. berghei and Hepatocystis and the same ratio for those which are shared between P. berghei, P. vivax and P. ovale, but not Hepatocystis sp. Clusters 1, 18, 19 and 20 were not tested because they contained fewer than 2 expected counts for either ratio. The asterisk indicates a Fisher’s exact test false discovery rate of ≤ 0.05.

The most rapidly evolving genes are often involved in vector biology and control of gene expression

Our whole-genome phylogeny (Fig 2) showed a long Hepatocystis sp. branch, suggesting that some genes have changed extensively in Hepatocystis compared to Plasmodium spp. This might indicate functional changes important for the particular biology of Hepatocystis. We found previously that the ratio of synonymous mutations to synonymous sites (dS) saturates between Plasmodium clades [45] and therefore considered the ratio of non-synonymous mutations to non-synonymous sites (dN) rather than the more commonly used dN/dS. We first looked for enrichment of conserved protein domain families in genes with the highest 1% of dN values. There was an enrichment for the AP2 domain (Pfam:PF00847.20; Fisher test with BH correction; p-value = 0.01). Most Plasmodium species possess 27 ApiAP2 transcription factors containing this domain, which are thought to be the key players in control of gene expression and parasite development across the life cycle. AP2-G plays an important role in exiting the cycle of schizogony and commitment to gametocytogenesis in Plasmodium spp. [26,27], whereas, as demonstrated, Hepatocystis sp. lacks erythrocytic schizogony. Hepatocystis spp. also form much larger cysts in the liver (giving the genus its name) and develop in different tissues within a different insect vector compared to Plasmodium spp. Our Hepatocystis sp. assembly contained orthologues of all 27 ApiAP2 genes present in P. falciparum (Table 2). This suggests that life cycle differences between Plasmodium and Hepatocystis spp. are not reflected in gain or loss of these key transcription factors. However, the relatively high rate of non-synonymous mutations suggests there may have been significant adjustment in how these transcription factors act. To determine parts of the life cycle that were enriched for the most rapidly evolving genes, we looked at whether particular gene expression clusters from the Malaria Cell Atlas [19,20] were enriched for genes with high dN (top 5% of values; Table 3; S12A Fig; S4 Table). 
We found that three clusters (2, 4 and 6) had fewer genes with high dN than expected by chance (Fisher’s exact test with Holm multiple hypothesis testing correction, p-value < 0.05) and that these contained genes expressed across much of the life cycle, especially growth phases. These clusters also tended to contain essential genes expected to be highly conserved in Hepatocystis sp. Although there was not a significant trend for gametocyte-associated genes having higher than average dN, the top 5% of genes ranked by dN contained several putative gametocyte genes (Table 3). Two of these encode putative 6-cysteine proteins P47 and P38, the first required for female gamete fertility [28]. Additionally, Merozoite TRAP-like protein (MTRAP), essential for Plasmodium gamete egress from erythrocytes [29] and two genes (HEP_00254800 and HEP_00195400) with orthologues involved in osmiophilic body formation [30,31] had high dN values. Overall, clusters involved in ookinete (15) and general mosquito stages (16) had significantly higher values than other clusters (Kolmogorov-Smirnov test: Cluster 15 vs all other clusters: D = 0.42, p-value = 1.05e-05. Cluster 16 vs all other clusters: D = 0.52, p-value = 4.50e-12; S12B Fig). This is also reflected in the Hepatocystis sp. genes with the highest dN values, which include oocyst rupture protein 2 (orp2; dN = 1.08), ap2-o, ap2-sp2, secreted ookinete protein (psop7) and osmiophilic body protein (g377). These genes provide clues about changes in the parasite that might relate to its adaptation to transmission by biting midges, rather than mosquitoes.
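Two steps from the analysis above lend themselves to a short sketch: selecting the top fraction of genes by dN, and computing the two-sample Kolmogorov-Smirnov statistic D that compares the dN values of one cluster with those of all other clusters. The function names and toy values are assumptions, and only D is computed here. The p-values reported in the text would need an additional calculation, for example from a statistics library.

```python
from bisect import bisect_right


def top_fraction(dn_by_gene, fraction=0.05):
    """Return the set of gene ids in the top `fraction` of dN values."""
    n_top = max(1, round(len(dn_by_gene) * fraction))
    ranked = sorted(dn_by_gene, key=dn_by_gene.get, reverse=True)
    return set(ranked[:n_top])


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D.

    D is the largest absolute difference between the two empirical
    cumulative distribution functions. Both are step functions, so it
    is enough to evaluate them at the observed values.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Invented dN values for a cluster of interest and for all other clusters.
toy_cluster_dn = [0.35, 0.42, 0.51, 0.62, 0.71]
toy_other_dn = [0.05, 0.10, 0.12, 0.20, 0.25, 0.30, 0.40]
print(ks_statistic(toy_cluster_dn, toy_other_dn))
```

The Kolmogorov-Smirnov comparison uses the whole distribution of dN values, whereas the top 5% enrichment test only asks whether a cluster is over-represented among the most extreme genes, which may explain why the two approaches can highlight different clusters.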

Table 3. Top 15 genes with functional annotations ranked by Hepatocystis sp. dN in comparison of Hepatocystis sp., P. berghei ANKA and P. ovale curtisi.

Genes with completely unknown function and genes with very little information on their possible functions have been left out from this table. The rank column indicates the Hepatocystis sp. dN rank of each gene in the complete table (with 4009 genes) that includes genes with unknown function (S4 Table).

Gene id (Hepatocystis sp. / P. berghei / P. ovale) | Hepatocystis dN | P. berghei dN | P. ovale dN | Annotations | Rank | Putative function
HEP_00146800 / PBANKA_1303400 / PocGH01_12075300 | 1.08 | 0.21 | 0.42 | Oocyst rupture protein 2 (ORP2) | 3 | Sporozoite egress from the oocyst [32]
HEP_00446500 / PBANKA_1003000 / PocGH01_03012800 | 0.71 | 0.19 | 0.34 | liver specific protein 2 (LISP2) | 19 | Involved in liver stage development [33]
HEP_00295100 / PBANKA_0905900 / PocGH01_09049300 | 0.71 | 0.31 | 0.63 | AP2-O | 20 | Essential for morphogenesis in ookinete stage in Plasmodium [34]
HEP_00035800 / PBANKA_1107600 / PocGH01_10033500 | 0.69 | 0.27 | 0.16 | 6-cysteine protein (p38) | 24 | P. berghei p38 is expressed in gametocytes and in asexual blood stages [28]
HEP_00213800 / PBANKA_1001800 / PocGH01_03011500 | 0.67 | 0.45 | 0.63 | AP2 domain transcription factor AP2-SP2 | 28 | Required for sporozoite production in Plasmodium [35]
HEP_00337100 / PBANKA_0112100 / PocGH01_11043100 | 0.62 | 0.36 | 0.42 | AP2 domain transcription factor ApiAP2 | 40 | Involved in blood stage replication [35,36]
HEP_00456700 / PBANKA_0512800 / PocGH01_06021700 | 0.62 | 0.61 | 0.38 | Merozoite TRAP-like protein (MTRAP) | 42 | Essential for gamete egress from erythrocytes [29]
HEP_00304800 / PBANKA_1353400 / PocGH01_12023100 | 0.62 | 0.32 | 0.26 | Secreted ookinete protein (PSOP7) | 43 | Secreted ookinete proteins are necessary for invasion of the mosquito midgut [37]
HEP_00254800 / PBANKA_1449000 / PocGH01_14060000 | 0.59 | 0.26 | 0.26 | Microgamete surface protein MiGS | 54 | Plays a critical role in male gametocyte osmiophilic body formation and exflagellation [30]
HEP_00115700 / PBANKA_0304400 / PocGH01_04023000 | 0.55 | 0.41 | 0.15 | Merozoite surface protein 4 (MSP4) | 62 | Merozoite surface proteins are involved in red blood cell invasion [38]
HEP_00166600 / PBANKA_0301000 / PocGH01_04026800 | 0.54 | 0.36 | 0.2 | Repetitive organellar protein (ROPE) | 64 | Localised to the apical end of merozoites, possibly involved in red blood cell invasion [39]
HEP_00195400 / PBANKA_1463000 / PocGH01_14074600 | 0.54 | 0.24 | 0.26 | Osmiophilic body protein (G377) | 67 | Female-specific protein, affects the size of the osmiophilic body and female gamete egress efficiency [31]
HEP_00130400 / PBANKA_1358000 / PocGH01_12018000 | 0.53 | 0.17 | 0.11 | Thioredoxin 2 (TRX2) | 75 | Part of a protein complex in parasitophorous vacuolar membrane, required for pathogenic protein secretion into host [40], important for maintaining normal blood-stage growth [41]
HEP_00391300 / PBANKA_0313400 / PocGH01_04013500 | 0.52 | 0.49 | 0.27 | Autophagy-related protein 11 (ATG11) | 76 | Predicted to be involved in cargo selection in selective autophagy [42]
HEP_00155500 / PBANKA_1302300 / PocGH01_12076400 | 0.52 | 0.18 | 0.17 | Metacaspase-2 | 80 | Protease with caspase-like activity [43]

Discussion

We have taken advantage of parasite reads captured as part of a primate genome sequencing study in order to assemble and annotate a draft quality genome sequence for a species of Hepatocystis. Our nuclear and apicoplast genome phylogenies confirm the recently proposed phylogenetic placement of this genus as an outgroup to the rodent-infecting Vinckeia subgenus of Plasmodium [2]. However, a distinct branching pattern and low bootstrap support for many nodes in our mitochondrial genome phylogeny highlight why some previous analyses have come to different conclusions about the placement of Hepatocystis. Thus, the use of mitochondrial genes to infer phylogenetic relationships between species within the Haemosporidia should be approached with caution. We found a long branch leading to Hepatocystis sp., suggesting a relatively deep split from the rodent-infecting species. In addition, we showed robustly that Hepatocystis sp. clusters, as expected, with Hepatocystis epomophori, which infects bats. This finding supports the polyphyly of the Plasmodium parasites infecting apes, monkeys and rodents but the monophyly of Hepatocystis itself. A close relative of rodent-infecting P. berghei (P. cyclopsi) has been found in bats [10] and thus the Hepatocystis/Vinckeia group represents a relatively labile group with respect to host preference. Indeed, the possibility of cross-species transmission of Hepatocystis sp. was reported previously [6].

The paraphyly of Plasmodium with respect to Hepatocystis exists because Hepatocystis spp. lack a defining characteristic of the Plasmodium genus, namely erythrocytic schizogony—asexual development in the blood. Thus, the very part of the Plasmodium life cycle which causes the symptoms of malaria is thought to be absent in Hepatocystis. While multiple lines of enquiry have failed to identify these forms (Garnham, 1966) there has remained some doubt, with reports of cells with schizont-like morphology in the blood for some species [13,44]. We were able to take advantage of bulk RNA-seq data collected from blood samples of a number of red colobus monkeys, all apparently infected with very closely related Hepatocystis parasites. By comparing to single-cell RNA-seq data from known cell types, we found no evidence for schizont stages in the blood. The apparent lack of schizonts could be due to sequestration of these stages away from the bloodstream. This is seen in some Plasmodium species, so we cannot rule out schizogony away from peripheral circulation. However, in its current state, this lack of evidence supports the idea that erythrocytic schizogony has been lost three times: in Hepatocystis, Polychromophilus and Nycteria [2]. Ancestrally, gametocytogenesis must have been the default developmental pathway, being required for transmission. However, in Plasmodium, it seems that erythrocytic schizogony became the default developmental pathway with epigenetic control of the ApiAP2-G transcription factor, required for development into sexual stages [26,27]. Perhaps the simplest explanation for a loss of erythrocytic schizogony would be that ApiAP2-G is no longer under control, but is constitutively expressed in parasites leaving the liver. In line with this idea we find ApiAP2-G present and highly expressed in blood stage Hepatocystis sp. 
Furthermore, all Plasmodium ApiAP2 transcription factors are conserved in Hepatocystis sp., indicating that changes in its life cycle are not associated with their loss or gain.

The lack of erythrocytic schizogony is supported by a tendency for orthologous genes missing in Hepatocystis sp. (relative to Plasmodium spp.) to be those expressed in blood schizonts. The most noticeable example is the complete absence of the Reticulocyte Binding Protein (RBP) family, found across all Plasmodium spp. examined so far, including those which infect birds [45]. RBP proteins are known to function as essential red blood cell invasion ligands in Plasmodium falciparum [46] and multiple copies are thought to provide alternative invasion pathways [25]. However, previous transcriptomic data have suggested that, while rbp genes in P. berghei are highly expressed in schizonts, they are less abundant or have a distinct repertoire in liver stages [20,47,48]. This implies that RBPs are less important, or at least that distinct invasion pathways are used by first generation merozoites. Also missing were cdpk5 (involved in schizont egress) and msp9 (an invasion related gene enriched in blood vs. liver schizonts in Caldelari et al. [47]). Taken together, these results underscore the increasing realisation that first generation merozoites have distinct properties from merozoites that have developed in the blood and suggest the way in which first generation merozoites invade red blood cells may be distinct. Looking more widely across the Haemosporidia, we found scant evidence for RBPs outside of the Plasmodium genus. There were no matches at all in the draft genome sequence of the relatively close Plasmodium outgroup Haemoproteus tartakovskyi. However, there were fragmentary matches to RBPs from Plasmodium species infecting birds in transcriptome assemblies of Hae. columbae and the more distant Leucocytozoon buteonis. The Hae. columbae fragment aligned to a conserved region of this divergent family, suggesting it could encode some core function of RBPs. 
More complete genome sequences from across the haemosporidians will be needed to understand the evolution of this gene family, which seems to be key for understanding the host specificity of Plasmodium parasites [49].

The genomes of Plasmodium spp. each contain large, rapidly evolving gene families that are known, or thought, to be involved in host-parasite interactions, principally in asexual blood stages. Their large numbers may reflect a bet-hedging strategy, providing the diversity necessary for evading adaptive immune responses or dealing with unpredictable host variation. Although the Hepatocystis sp. genome contains two novel multigene families, we identified only 10–15 copies of each. The largest gene families in the closely related rodent-infecting Plasmodium species (pir and fam-a) are present here only as their ancestral orthologues, also conserved in the monkey-infecting Plasmodium species. We should be cautious in noting a lack of expansion in such families in Hepatocystis sp., as previous draft Plasmodium genome sequences have been shown to under-represent these genes. However, it is perhaps not surprising that pir genes are poorly represented. They are thought to play a role in the maintenance of chronic infections mediated by asexual stages in the blood [18] and Hepatocystis infection does not involve this stage of development. Given that Hepatocystis can sustain long chronic infections [1], presumably from the liver, this parasite may help us to better understand how Plasmodium survives in the liver.

A striking feature of the Hepatocystis life cycle is its vector—a biting midge rather than a mosquito. We found evidence of rapid evolution amongst orthologues of Plasmodium genes involved in mosquito stages of development, suggesting that adaptation to a new insect vector was a major evolutionary force. These rapidly evolving genes provide insights into parasite-vector interactions and may provide avenues for the development of interventions to prevent transmission of the malaria parasite.

Overall, our findings demonstrate the insights that can be gained into malaria parasite biology from relatives of Plasmodium, even with draft-quality genome sequences. In the future, we expect that high-quality genome sequences of Hepatocystis spp., and additional relatives from genera such as Nycteria, Haemoproteus, and Polychromophilus, will be of great value for understanding the evolution and molecular biology of one of humanity's greatest enemies.

Methods

Ethics statement

All animal research was approved by the Uganda Wildlife Authority (permit number UWA/TDO/33/02), the Uganda National Council for Science and Technology (permit number HS 364), and the University of Wisconsin-Madison Animal Care and Use Committee (V005039) prior to initiation of the study. Biological materials were shipped internationally under CITES permit #002290 (Uganda). The animal was anesthetized with a combination of Ketamine (5 mg/kg) and Xylazine (2 mg/kg) administered intramuscularly using a variable-pressure air rifle (Pneudart, Inc, Williamsport, PA, USA). After sampling, the animal was given a reversal agent (Atipamezole, 0.5 mg/kg), and released after recovery back to its social group. All animal use followed the guidelines of the Weatherall Report on the use of non-human primates in research.

Sample collection and data generation

The sequence data used in this study were part of a project originally designed to generate a reference genome for the red colobus monkey (genus Piliocolobus). Biomaterials used were from wild Ugandan (or Ashy) red colobus monkey (Piliocolobus tephrosceles) individuals from Kibale National Park, Uganda. These animals reside in a habituated group that has been a focus of long-term studies in health, ecology, and disease [12,50,51]. Red colobus individuals were immobilised in the field as previously described [52]. Whole blood was collected using a modified PreAnalytiX PAXgene Blood RNA System protocol as described in Simons et al. [12]. Additionally, whole blood was collected into BD Vacutainer Plasma Preparation Tubes, blood plasma and cells were separated via centrifugation, and both were subsequently aliquoted into cryovials and stored in liquid nitrogen. Samples were transported to the United States in an IATA-approved liquid nitrogen dry shipper and then transferred to −80 °C for storage until further processing.

Methods for DNA extraction, library preparation, and whole genome sequencing are described in Simons [53]. Briefly, high molecular weight DNA was extracted from the blood cells of one red colobus monkey individual and size selected for fragments larger than 50,000 base pairs. A 10X Genomics Chromium System library preparation was performed and subsequently sequenced on two lanes of a 150 bp paired-end Illumina HiSeq X run as well as two lanes of a 150 bp paired-end Illumina HiSeq 4000 run.

Methods for RNA extraction and library preparation are described in Simons et al. (2019). Briefly, RNA was extracted from 29 red colobus individuals using a modified protocol for the PreAnalytiX PAXgene Blood RNA Kit. Total RNA extracts were concentrated, depleted of alpha and beta globin mRNA, and assessed for integrity (RIN mean: 8.1, range: 6.6–9.2). Sequencing libraries were prepared using the KAPA Biosystems Stranded mRNA-seq Kit and sequenced on four partial lanes of a 150 bp paired-end Illumina HiSeq 4000 run. These data were uploaded to NCBI as part of BioProject PRJNA413051.

Separation of Hepatocystis and Piliocolobus scaffolds

The Piliocolobus tephrosceles genome assembly (ASM277652v1) was downloaded from the NCBI database. Scaffolds were first sorted by their GC% and Diamond 0.9.22 [54] BLASTX hits against a database of representative apicomplexan and Old World monkey proteomes. The sorting was improved by examining mapping scores of the scaffolds mapped to Plasmodium species and Macaca mulatta genomes (Mmul_8.0.1, GenBank assembly accession GCA_000772875.3) using Minimap2 2.12 [55]. The separation of scaffolds was further verified and refined by running NCBI BLAST of 960 bp fragments of all scaffolds against the NCBI nt database (Jul 18 2017 version) [56]. To predict genes in the apicomplexan scaffolds, Companion automatic annotation software [7] was run with these scaffolds as input and the P. vivax P01 genome as the reference.

Identification of Hepatocystis sequences in Piliocolobus RNA-seq data

Illumina HiSeq 4000 RNA-seq reads from the study PRJNA413051 were downloaded from the European Nucleotide Archive. In order to find out if the RNA-seq data contained apicomplexan sequences, mapping of these reads to apicomplexan scaffolds from Piliocolobus tephrosceles genome assembly (ASM277652v1) was done using HISAT2 2.1.0 [57].

Hepatocystis genome assembly

Filtering of reads for assembly

Minimap2 [55] and Kraken 2.0.8-beta [58] were used to identify the best matching species for each 10x Chromium genomic DNA read (from Illumina HiSeq X and HiSeq4000 platforms). Our Kraken database contained 17 Old World monkey genomes and 19 Plasmodium genomes downloaded from NCBI FTP in June 2018 [56]. The Kraken database also included the contigs of the P. tephrosceles assembly ASM277652v1, separated into P. tephrosceles and Hepatocystis sp. Plasmodium malariae UG01 (from PlasmoDB [59] version 39) and Macaca mulatta (Mmul_8.0.1) assemblies were used as reference genomes for the assignment of reads based on Minimap2 mapping scores. Reads that were unambiguously identified as monkey sequences using Kraken and Minimap2 were excluded from subsequent assemblies. The Supernova assembler manual [60] warns against exceeding 56x coverage in assemblies. Reads selected for Supernova assemblies were therefore divided into 34 batches, with ~10 million reads in each batch. Reads were ordered by their barcodes so that those with the same barcode would preferentially occur in the same batch.
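The batching step can be illustrated with a minimal Python sketch. It is not the script used in this study: the function name, the (barcode, read identifier) input format and the rule of starting a new batch rather than splitting a barcode group are all assumptions made for illustration.

```python
from itertools import groupby


def batch_reads_by_barcode(reads, batch_size):
    """Split reads into batches, keeping reads with the same barcode together.

    reads: iterable of (barcode, read_id) tuples (hypothetical input format).
    batch_size: approximate target number of reads per batch.
    Returns a list of batches, each a list of read identifiers.
    """
    ordered = sorted(reads, key=lambda read: read[0])
    batches, current = [], []
    for _, group in groupby(ordered, key=lambda read: read[0]):
        group_ids = [read_id for _, read_id in group]
        # Start a new batch instead of splitting a barcode group where possible.
        if current and len(current) + len(group_ids) > batch_size:
            batches.append(current)
            current = []
        current.extend(group_ids)
    if current:
        batches.append(current)
    return batches
```

In this sketch a barcode group larger than the target batch size is placed in a batch of its own and not split, so batch sizes are only approximate.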

Supernova and SPAdes assemblies

We generated 34 assemblies with Supernova v2.1.1 with default settings. In addition, two SPAdes v3.11.0 [61] assemblies with default settings were generated with Hepatocystis reads: one with HiSeq X reads and another with HiSeq 4000 reads. Chromium barcodes were removed from the reads before the SPAdes assemblies.

Deriving the mitochondrial sequence

Hepatocystis Supernova and SPAdes assembly contigs were mapped to the P. malariae UG01 genome from PlasmoDB version 40 with Minimap2. The sequences of contigs that mapped to the P. malariae mitochondrion were extracted using SAMTools 0.1.19-44428cd [62] and BEDtools v2.17.0 [63]. The contigs were oriented and then aligned using Clustal Omega 1.2.4 [64]. Consensus sequence of aligned contigs was derived using Jalview 2.10.4b1 [65]. The consensus sequence was circularised with Circlator minimus2 [66].

Canu assembly

Scaffolds from the Supernova assemblies were broken into contigs. All contigs from the Supernova and SPAdes assemblies were pooled and used as the input for Canu assembler 1.6 [67] in place of long reads. Canu assembly was done without read correction and trimming stages. The settings for Canu were as follows: -assemble genomeSize=23000k minReadLength=300 minOverlapLength=250 corMaxEvidenceErate=0.15 correctedErrorRate=0.16 stopOnReadQuality=false -nanopore-raw.

Processing of Canu unassembled sequences file

Selected contigs from the Canu unassembled sequences output file (*.unassembled.fasta) were recovered and pooled with assembled contigs (*.contigs.fasta). The first step in the filtering of the contigs of the unassembled sequences file was to exclude contigs that had a BLAST match in the assembled sequences output file (with E-value cutoff 1e-10). Next, contigs where low complexity sequence content exceeded 50% (detected using Dustmasker 1.0.0 [68]) were removed. Contigs with GC content higher than 50% were also removed. Diamond BLASTX (against a database of Macaca mulatta, P. malariae UG01, P. ovale wallikeri, P. falciparum 3D7 and P. vivax P01 proteomes) and BLAST (using the nt database from Jul 18 2017 and nr database from Jul 19 2017) were then used to exclude all contigs where the top hits were not an apicomplexan species. In total, 0.34% of contigs from the unassembled sequences file were selected to be included in the assembly.

Deduplication of contigs

Initial deduplication of contigs was done using BBTools dedupe [69] (Nov 20, 2017 version) and GAP5 v1.2.14-r3753M [70] autojoin. In addition, BUSCO 3.0.1 [71] was used to detect duplicated core genes with the protists dataset. Two contigs flagged by BUSCO as containing duplicated genes were removed. All vs all BLAST of contigs (with E-value cutoff 1e-20, minimum overlap length 100 bp, minimum identity 85%) was used to find possible cases of remaining duplicated contigs. Contigs yielding BLAST hits were aligned with MAFFT v7.205 [72] and the alignments were manually inspected. Contained contigs were deleted and contigs that had unique overlaps with high identity were merged into consensus sequences using Jalview.

Removal of contaminants after Canu assembly

All Canu assembly contigs were checked with Diamond against a database of Macaca mulatta, P. malariae UG01, P. ovale wallikeri, P. falciparum 3D7 and P. vivax P01 proteomes. The Diamond search did not detect any contaminants. Contigs not identified by Diamond were checked with BLAST against the nt database (Jul 18 2017 version). Contigs where the top BLAST hit was a human or monkey sequence were removed from the assembly.

A subset of contigs in the assembly was observed to consist of short sequences with low complexity, high GC% and low frequency of stop codons. These contigs did not match any sequences by BLAST search against nt and nr databases (with E-value cutoff 1e-10). Due to their difference from the rest of the contigs in the assembly, it was assumed that these contigs were contaminants rather than Hepatocystis sequences. In order to programmatically find these contigs, GC%, tandem repeats percentage, percentage of low complexity content and frequency of stop codons were recorded for all contigs in the assembly. Tandem Repeats Finder 4.04 [73] was used to assess tandem repeats percentage and Dustmasker 1.0.0 [68] was used to find low complexity sequence content. PCA and k-means clustering (using R version 3.5.1) showed that the assembly contigs separated into two groups based on these parameters. The group of contigs with low complexity (189 contigs) was removed from the assembly.
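Two of the per-contig features described above, GC% and stop codon frequency, can be computed as in the following Python sketch. The function names are hypothetical, and counting stop codons over the three forward reading frames is an assumption, since the exact definition of stop codon frequency is not given here. Tandem repeat and low complexity content came from Tandem Repeats Finder and Dustmasker and are not reproduced.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_percent(seq):
    """Percentage of G and C among unambiguous (A, C, G, T) bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def stop_codon_frequency(seq):
    """Fraction of codons that are stop codons, pooled over three forward frames.

    Assumption: the definition used in the study may differ (for example,
    it may also include the reverse strand).
    """
    seq = seq.upper()
    total = stops = 0
    for frame in range(3):
        for i in range(frame, len(seq) - 2, 3):
            total += 1
            if seq[i:i + 3] in STOP_CODONS:
                stops += 1
    return stops / total if total else 0.0
```

A table of such features, one row per contig, would then serve as input to PCA and k-means clustering.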

Scaffolding and polishing of Canu assembly contigs

Before scaffolding, contigs were filtered by size to remove sequences shorter than 200 bp. Hepatocystis RNA-seq reads were extracted from RNA-seq sample SAMN07757854 using Kraken 2. Canu assembly contigs were scaffolded with these reads using P_RNA_scaffolder [74]. To correct scaffolding errors, the scaffolds were processed with REAPR 1.0.18 [75] using 197,819,014 unbarcoded Hepatocystis DNA read pairs. REAPR was run with the perfectmap option and -break b=1. Next, the assembly was scaffolded using Scaff10x (https://github.com/wtsi-hpag/Scaff10X) version 3.1, run for 4 iterations with the following settings: -matrix 4000 -edge 1000 -block 10000 -longread 0 -link 3 -reads 5. 197,819,014 Hepatocystis DNA read pairs were used for Scaff10x scaffolding. After this, P_RNA_scaffolder was run again as above. This was followed by running Tigmint 1.1.2 [76] with 419,652,376 Hepatocystis read pairs to correct misassemblies. fill_gaps_with_gapfiller (https://github.com/sanger-pathogens/assembly_improvement/blob/master/bin/fill_gaps_with_gapfiller) was used to fill gaps in scaffolds, using 197,819,014 unbarcoded Hepatocystis DNA read pairs. After this, ICORN v0.97 [77] was run for 5 iterations with 4,608,740 Hepatocystis read pairs. This was followed by polishing the assembly with Pilon 1.19 [78] using 21,794,613 Hepatocystis read pairs. Assembly completeness was assessed with CEGMA v2.5 [79]. P. berghei ANKA, P. ovale curtisi and P. falciparum 3D7 genomes from PlasmoDB release 45 were also assessed with CEGMA with the same settings in order to compare the Hepatocystis assembly with Plasmodium assemblies.

Curation and annotation of the Hepatocystis genome assembly

The assembly was annotated using Companion [7]. The alignment of reference proteins to target sequence was enabled in the Companion run but all other parameters were left as default. A GTF file derived from mapping of Hepatocystis RNA-seq reads of three biological samples (SAMN07757854, SAMN07757861 and SAMN07757872) to the assembly was used as transcript evidence for Companion. To produce the GTF file, the RNA-seq reads were mapped to the assembly using 2-pass mapping with STAR RNA-seq aligner [80] (as described in the "Variant calling of RNA-seq samples" section) and the mapped reads were processed with Cufflinks [81]. All Plasmodium genomes available in the web version of Companion were tested as the reference genome for annotating the Hepatocystis genome, in order to find out which reference genome yields the highest gene density. For the final Companion run the P. falciparum 3D7 reference genome (version from June 2015) was used. The Companion output was manually curated using Artemis [82] and ACT [83] version 18.0.2. Manual curation was carried out to correct the overprediction of coding sequences, add missing genes and correct exon-intron boundaries. Altogether 680 gene models were corrected, 546 genes added and 221 genes deleted. RNA-seq data was used as supporting evidence. Non-coding RNAs were predicted with Rfam [84].

All genes were analysed for the presence of a PEXEL-motif using the updated HMM algorithm ExportPred v2.0 [85]. Distant homology to hep1 and hep2 gene families was sought by using the HHblits webserver with default options [14].

The reference genomes used to produce statistics on features of Plasmodium genomes in Fig 2 and Table 1 were as follows: P. relictum SGS1, P. gallinaceum 8A [45], P. malariae UG01, P. ovale wallikeri, P. ovale curtisi GH01 [11], P. knowlesi H [86], P. vivax P01 [87], P. cynomolgi M [88], P. chabaudi AS [18], P. berghei ANKA [89], P. reichenowi CDC [90], P. falciparum 3D7 [91].

For S1 Table, transmembrane domains of proteins were predicted using TMHMM 2.0 [92]. Conserved domains were detected in proteins using HMMER i1.1rc3 (http://hmmer.org/) and Pfam-A database release 28.0 [93], with E-value cutoff 1e-5. Besides predicting exported proteins with ExportPred 2 [85], matches to PEXEL consensus sequence (RxLxE/Q/D) were counted in protein sequences using string search in Python. Signal peptides were detected using SignalP-5 [94].
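Counting matches to the PEXEL consensus (RxLxE/Q/D) can be done with a short regular expression, as in the Python sketch below. The function name is hypothetical, and counting overlapping occurrences separately is an assumption; the original string search may have treated overlaps differently.

```python
import re

# RxLxE/Q/D: arginine, any residue, leucine, any residue, then E, Q or D.
# The lookahead counts overlapping occurrences separately (an assumption).
PEXEL_PATTERN = re.compile(r"(?=R.L.[EQD])")


def count_pexel_motifs(protein):
    """Return the number of PEXEL consensus matches in a protein sequence."""
    return len(PEXEL_PATTERN.findall(protein.upper()))
```

For example, count_pexel_motifs("MKRLLAEAAA") counts the single RLLAE match. A consensus match alone does not show that a protein is exported, which is why ExportPred predictions were used alongside these counts.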

Analysis of other Haemosporidian genomes and transcriptomes

Genomes and transcriptomes of other Haemosporidians

A Haemoproteus tartakovskyi genome assembly was downloaded from the Malavi database (http://130.235.244.92/Malavi/Downloads/Haemoproteus_tartakovskyi). The Companion annotation tool [7] was used to automatically annotate the assembly, using the P. falciparum 3D7 genome as the reference. The alignment of reference proteins to target sequence was enabled and the rest of the settings were left as default. The genome annotation was further edited manually using Artemis 18.1.0 [82].

A transcriptome assembly of Haemoproteus columbae [23] was downloaded from GenBank (GenBank ID GGWD00000000.1). Illumina MiSeq reads of Leucocytozoon and its host Buteo buteo were downloaded from the European Nucleotide Archive (study PRJEB5722). The reads from all four samples were processed with Cutadapt 2.7 (http://journal.embnet.org/index.php/embnetjournal/article/view/200) to remove artificial Illumina sequences and then assembled with SPAdes v3.13.1 [95] with the --rna flag. Contigs from the transcripts.fasta and soft_filtered_transcripts.fasta output files of SPAdes were pooled. Leucocytozoon contigs were separated from the contigs of Buteo buteo and bacterial contaminants using Diamond 0.9.24 [54], BLAST and Minimap2 2.17-r941 [55], similarly to how Hepatocystis contigs were isolated from the Piliocolobus assembly ASM277652v1. GAP5 1.2.14-r [70] was used to join the Leucocytozoon assembly contigs by unique overlaps. The Leucocytozoon assembly contigs were then polished using Pilon 1.23 [78] (3 iterations).

The completeness of the Haemoproteus and Leucocytozoon assemblies was assessed using CEGMA 2.5 [79]. The CEGMA completeness statistics for these Haemosporidian assemblies were as follows. Haemoproteus tartakovskyi genome assembly: Completeness Complete: 64.52%, Completeness Partial: 68.55%. Haemoproteus columbae transcriptome assembly: Completeness Complete: 37.90%, Completeness Partial: 57.26%. Leucocytozoon buteonis transcriptome assembly: Completeness Complete: 25.40%, Completeness Partial: 35.08%.

BLAST searches for RBPs

A BLAST database was made from Plasmodium RBPs downloaded from PlasmoDB [96] (release 46). NCBI blastx 2.9.0+ with e-value cutoff 1e-5 was run against this database, using the Haemoproteus genome and transcriptome assemblies and the Leucocytozoon transcriptome assembly as queries. The transcripts that yielded BLAST hits were further examined using BLAST against the NCBI nt and nr databases (April 2020 versions) and by sequence alignments with RBPs from PlasmoDB. One of the Haemoproteus transcripts (GGWD01016989.1) matched RBPs in the blastx search (the best match was with PRELSG_0014300: E-value 1.31e-23, score: 89.0) and did not yield non-RBP BLAST hits in searches against the nt and nr databases. One Leucocytozoon transcript also matched Plasmodium RBPs (top match: PRELSG_0013000, E-value 1.06e-09, score: 56.2). The sequence of the Haemoproteus transcript GGWD01016989.1 was translated to amino acids using ExPASy Translate (April 2020 version) [97] and then aligned with a selection of Plasmodium RBPs from PlasmoDB (release 46) using MAFFT 7 [98] with default settings. The resulting alignment was cropped in Jalview 2.10.4b1 to include only the region that contained the Haemoproteus sequence [99]. A phylogenetic tree was generated from the alignment as described in the next section.

Phylogenetic trees

Haemosporidian sequences were downloaded from NCBI FTP and PlasmoDB (release 43). The phylogenetic tree of cytochrome B and the tree that included 11 Hepatocystis epomophori genes were based on DNA alignments. The cytochrome B tree also included cytochrome B sequences from de novo assemblies of Hepatocystis RNA-seq reads derived from Piliocolobus tephrosceles blood. The trees of mitochondrial, apicoplast and nuclear proteomes were based on protein alignments. For apicoplast proteome and nuclear proteome trees, orthologous proteins were identified using OrthoMCL 1.4 [100]. The OrthoMCL run included the Haemoproteus tartakovskyi proteome that had been derived from the Haemoproteus tartakovskyi genome assembly using Companion, as previously described. All vs all BLAST for OrthoMCL was done using blastall 2.2.25 with E-value cutoff 1e-5. OrthoMCL was run with mode 3. Proteins with single copy orthologs across all the selected species were used for the protein phylogenetic trees. Sequences were aligned with MAFFT 7.205 [98] (with --auto flag) and the alignments were processed using Gblocks 0.91b [101] with default settings. Individual Gblocks-processed alignments were concatenated into one alignment. The phylogenetic trees were generated using IQ-TREE multicore version 1.6.5 [102] with default settings and plotted using FigTree 1.4.4 (https://github.com/rambaut/figtree/releases). Inkscape (https://inkscape.org) version 0.92 was used to edit text labels of the phylogenetic trees generated with FigTree.
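The selection of single copy orthologs can be sketched in Python as follows. The input structure (a dictionary of ortholog groups, each a list of species and protein identifier pairs) and the function name are assumptions for illustration and do not reflect the OrthoMCL output format. Members from species outside the selected set are ignored in this sketch.

```python
def single_copy_groups(groups, species):
    """Keep ortholog groups with exactly one protein from each selected species.

    groups: dict mapping group id -> list of (species, protein_id) tuples
            (hypothetical structure, not the OrthoMCL file format).
    species: iterable of species labels that must each appear exactly once.
    Returns a dict mapping group id -> {species: protein_id}.
    """
    wanted = set(species)
    selected = {}
    for group_id, members in groups.items():
        by_species = {}
        for sp, protein_id in members:
            if sp in wanted:
                by_species.setdefault(sp, []).append(protein_id)
        complete = set(by_species) == wanted
        if complete and all(len(ids) == 1 for ids in by_species.values()):
            selected[group_id] = {sp: ids[0] for sp, ids in by_species.items()}
    return selected
```

Each selected group would then be aligned separately before the alignments are concatenated.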

Clustering of pir proteins into subfamilies

Sequences of Plasmodium pir family proteins (including bir, cyir, kir, vir and yir proteins) were downloaded from PlasmoDB [59] (release 39). The sequences were clustered using MCL [103], following the procedures described in the section "Clustering similarity graphs encoded in BLAST results" in clmprotocols (https://micans.org/mcl/man/clmprotocols.html). The BLAST E-value cutoff used for clustering was 0.01 and the MCL inflation value was 2. The pir protein counts per subfamily in each species were plotted as a heatmap using the heatmap.2 function in gplots package version 3.0.1.1 in R version 3.5.1.

Mapping and assembly of Hepatocystis sp. RNA-seq data

To separate Hepatocystis reads from Piliocolobus reads, RNA-seq data from the ENA (study PRJNA413051) were mapped to a FASTA file containing genome assemblies of Hepatocystis and M. mulatta (NCBI assembly Mmul_8.0.1), using HISAT2 version 2.1.0 [57], with "--rna-strandness RF". BED files were generated from the mapped reads using BEDTools 2.17.0 [63]. Reads from each technical replicate were merged, resulting in a single set of read counts for each individual monkey. The BED files were filtered to remove multimapping reads and reads with mapping quality score lower than 10. Names of reads that specifically mapped to the Hepatocystis assembly were extracted from the BED file. SeqTK 1.0-r31 (https://github.com/lh3/seqtk) was used to isolate Hepatocystis FASTQ reads based on the list of reads from the previous step. The Hepatocystis reads were then mapped to the Hepatocystis genome assembly using HISAT2 2.1.0 with "--rna-strandness RF" flag. The SAM files with mapped reads were converted to sorted BAM files with SAMTools 0.1.19-44428cd [62]. The EMBL file of Hepatocystis genome annotations was converted to GFF format using Artemis 18.0.1 [82]. Htseq-count 0.7.1 [104] was used to count mapped reads per gene in the GFF file with "-t mRNA -a 0 -s reverse". Htseq-count files of individual RNA-seq runs were merged into a single file.
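The BED filtering step can be approximated with the Python sketch below. It assumes six-column BED records with the read name in column 4 and the mapping quality in column 5, and treats a read name that occurs more than once as multimapping. The function name and the contig-naming predicate are hypothetical, and the filtering used in this study may have differed in detail.

```python
from collections import Counter


def hepatocystis_read_names(bed_lines, is_parasite_contig, min_mapq=10):
    """Return names of reads mapping uniquely and confidently to parasite contigs.

    bed_lines: iterable of tab-separated BED records
               (chrom, start, end, name, score, strand).
    is_parasite_contig: function taking a contig name and returning True for
                        Hepatocystis contigs (hypothetical naming scheme).
    """
    records = [line.rstrip("\n").split("\t") for line in bed_lines if line.strip()]
    hits = Counter(record[3] for record in records)
    names = set()
    for chrom, _start, _end, name, score, *_rest in records:
        unique = hits[name] == 1
        if unique and int(score) >= min_mapq and is_parasite_contig(chrom):
            names.add(name)
    return names
```

The resulting set of names corresponds to the read list that would be passed to a tool such as SeqTK to extract the matching FASTQ records.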

In order to extract Hepatocystis cytochrome b sequences of each RNA-seq sample, Hepatocystis RNA-seq reads of each sample were isolated from Piliocolobus reads as described above and then assembled with the SPAdes assembler v3.11.0 [105] with the "--rna" flag. Hepatocystis cytochrome b contigs were identified in each of the 29 RNA-seq assemblies using BLAST against Hepatocystis cytochrome b from the DNA assembly (E-value cutoff 1e-10).

In addition to assemblies of individual RNA-seq samples, a pooled assembly of all RNA-seq samples was generated. The reads for this assembly were sorted by competitive mapping to P. ovale curtisi GH01 (from PlasmoDB release 45) and Macaca mulatta (Mmul_8.0.1, GenBank assembly accession GCA_000772875.3) genomes with Minimap2 (with the "-ax sr" flag). Reads mapping to the Macaca mulatta genome with minimum mapping score 20 were removed and the rest of the reads were assembled with the SPAdes assembler v3.13.1 [105] with the "--rna" flag. Hepatocystis contigs were identified by comparison of sequences with Plasmodium and Macaca mulatta reference genomes using Diamond, Minimap2 and BLAST, similarly to what is described in the section "Separation of Hepatocystis and Piliocolobus scaffolds". Further decontamination was done using Diamond and BLAST searches against 19,747 sequences from Ascomycota and 165,860 bacterial sequences downloaded from UniProt (release 2019_10) [106] and 3 Babesia proteomes from PiroplasmaDB (release 46) [107]. Selected contigs were also checked with BLAST against the NCBI nt database. The assembly was deduplicated using BBTools dedupe (Nov 20, 2017 version) and GAP5 v1.2.14-r3753M. Assembly completeness was assessed using CEGMA 2.5. In order to reduce the number of contigs so that they could be used as input for Companion, the assembly was scaffolded with RaGOO Version 1.1 [108], using the Hepatocystis DNA assembly as the reference. The assembly was then processed by the Companion annotation software (Glasgow server, November 2019 version, with P. falciparum 3D7 reference genome, with protein evidence enabled and the rest of the settings left as default). In order to detect proteins missed by Companion, EMBOSS Transeq (version 6.3.1) was used to translate the transcriptome assembly in all 6 reading frames. The output of Transeq was then filtered to keep sequences between stop codons with minimum length of 240 amino acids. 
Protein BLAST with E-value cutoff 1e-20 was used to detect sequences in Transeq output that were not present in the proteins annotated by Companion. These selected Transeq output sequences were checked for contaminants with BLAST similarly to what was described before. The sequences that passed the contaminant check were combined with the set of Hepatocystis RNA-seq assembly proteins that were detected by Companion. OrthoMCL was run with proteins from Hepatocystis RNA-seq assembly (Companion and selected Transeq sequences combined), Hepatocystis DNA assembly proteins, 20 Plasmodium proteomes from PlasmoDB release 43 and P. ovale wallikeri proteome (GenBank GCA_900090025.2). The settings for OrthoMCL were as described in the "Phylogenetic trees" section.
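The six-frame translation and length filter can be illustrated with the following Python sketch. It is a stand-in for EMBOSS Transeq and not a reimplementation: it assumes the standard genetic code, translates ambiguous codons to X, and uses hypothetical function names.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG codon ordering.
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate a DNA string in frame 0; unknown codons become X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_segments(dna, min_length=240):
    """Return translated segments between stop codons of at least min_length."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    segments = []
    for strand in (dna, reverse):
        for frame in range(3):
            protein = translate(strand[frame:])
            segments.extend(s for s in protein.split("*") if len(s) >= min_length)
    return segments
```

Segments retained in this way are candidates only; in the workflow above they were still compared with the Companion proteins and checked for contaminants.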

dN analysis

P. berghei ANKA and P. ovale curtisi protein and transcript sequences were retrieved from PlasmoDB [59] (release 45). One-to-one orthologs between Hepatocystis, P. berghei ANKA and P. ovale curtisi were identified using OrthoMCL [100] and a Newick tree of the three species was generated with IQ-TREE [102]. The settings for OrthoMCL and IQ-TREE were as described in the "Phylogenetic trees" section. Transcripts of one-to-one orthologs were aligned using command line version of TranslatorX [109] with "-p F -t T" flags, so that each alignment file contained sequences from three species. Gaps were removed from alignments while retaining the correct reading frame. Alignment regions where the nucleotide sequence surrounded by gaps was shorter than 42 bp were also removed. In addition, alignments were truncated at the last whole codon if a sequence ended with a partial codon due to a contig break. The alignments and the Newick tree of the 3 species were then used as input for codeml [110] in order to determine the dN and dN/dS of each alignment. The codeml settings that differed from default settings were: seqtype = 1, model = 1. P. berghei RNA-seq cluster numbers from Malaria Cell Atlas [20] were assigned to each alignment based on the P. berghei gene in the alignment. Transcriptomics-based gametocyte specificity scores of Plasmodium genes were taken from an existing study on this topic [111] (transcripts S2 Table of "Transcriptomics_all_studies" tab). The P. falciparum genes in the gametocyte specificity scores table were matched with equivalent Hepatocystis genes using OrthoMCL (run with the same settings as when used for phylogenetic trees). Statistical tests with the dN results (Kolmogorov-Smirnov test, Fisher test and Spearman correlation) were performed using the stats library in R.
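The alignment cleaning steps can be sketched in Python as below. This is an illustrative reconstruction and not the script used in this study: it assumes that the aligned sequences are in frame from the first column, removes any codon column in which at least one sequence has a gap, and then discards ungapped blocks shorter than the minimum length.

```python
def clean_codon_alignment(seqs, min_block_bp=42):
    """Remove gapped codons and short ungapped blocks from a codon alignment.

    seqs: list of aligned nucleotide strings, in frame from column 0
          (an assumption about the alignment layout).
    min_block_bp: minimum length in bp of an ungapped block to be retained.
    Returns the cleaned sequences in the same order.
    """
    length = min(len(s) for s in seqs)
    length -= length % 3  # truncate at the last whole codon
    blocks, current = [], []
    for start in range(0, length, 3):
        if any("-" in s[start:start + 3] for s in seqs):
            if current:
                blocks.append(current)
                current = []
        else:
            current.append(start)
    if current:
        blocks.append(current)
    kept = [
        start
        for block in blocks
        if len(block) * 3 >= min_block_bp
        for start in block
    ]
    return ["".join(s[i:i + 3] for i in kept) for s in seqs]
```

Working in whole codons keeps the reading frame intact, which codeml requires for codon-based models.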

Variant calling of RNA-seq samples

SNPs and indels were called in Hepatocystis RNA-seq reads that had been separated from Piliocolobus reads as described above. Four technical replicates of each RNA-seq sample were pooled. Variant calling followed the "Calling variants in RNAseq" workflow in GATK [112] user guide (https://gatkforums.broadinstitute.org/gatk/discussion/3891/calling-variants-in-rnaseq). First, the reads were mapped to the reference genome using 2-pass mapping with the STAR RNA-seq aligner [80] version 2.5.3a. 2-pass mapping consisted of indexing the genome with genomeGenerate command, aligning the reads with the genome, generating a new index based on splice junction information contained in the output of the first pass and then producing a final alignment using the new index. GATK [112] version 4.0.3.0 was used for the next steps. The mapped reads were processed with GATK MarkDuplicates and SplitNCigarReads commands. GATK HaplotypeCaller was then run with the following settings: --dont-use-soft-clipped-bases --emit-ref-confidence GVCF --sample-ploidy 1 --standard-min-confidence-threshold-for-calling 20.0. Joint genotyping of the samples was then done using GATK CombineGVCFs and GenotypeGVCFs commands. This was followed by running VariantFiltration with these settings: -window 35 -cluster 3 --filter-name FS -filter 'FS > 30.0' --filter-name QD -filter 'QD < 2.0'. SNPs were separated from indels using GATK SelectVariants. Samples SAMN07757853, SAMN07757863, SAMN07757870 and SAMN07757873 were excluded from further analysis due to their low expression of Hepatocystis genes (htseq-count reported below 50,000 reads mapped to the Hepatocystis assembly in each of these samples). The average filtered SNP counts per 10 kb of reference genome for each sample were calculated as the number of filtered SNPs divided by (genome size in kb / 10).

RNA-seq deconvolution

Deconvolution of a bulk RNA-seq transcriptome sequence aims to determine the relative proportions of different cell types in the original sample. This requires a reference dataset of transcriptomes from “pure” cell types. To create this, we used single-cell P. berghei transcriptome sequences from the Malaria Cell Atlas (MCA) [20]. For each cell type, single-cell transcriptome sequences were combined by summing read counts per gene to generate a set of pseudobulk transcriptome sequences (see our GitHub repository). The aim of summing across cells is to reduce the number of dropouts, which are common in individual single-cell transcriptome sequences. Bulk Hepatocystis RNA-seq transcriptome sequences, mapped and counted as above, were summed across replicates and filtered to exclude those with fewer than 100,000 reads. Hepatocystis and P. berghei pseudobulk read counts were converted to counts per million (CPM) and Hepatocystis gene ids were converted to those of P. berghei one-to-one orthologues. Genes without one-to-one orthologues (defined by OrthoMCL analysis) were excluded. CIBERSORT v1.06 [113] was used to deconvolute the Hepatocystis transcriptomes with the MCA pseudobulk as the signature matrix file. To test the accuracy of this deconvolution process we generated mixtures of the pseudobulk resulting in, e.g., equal representation of read counts from male gametocyte, female gametocyte, ring, trophozoite and schizont pseudobulk transcriptomes (see our GitHub repository). We also deconvoluted bulk RNA-seq transcriptomes from Otto et al. [89] processed as in Reid et al. [19].
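The pseudobulk and normalisation steps that precede deconvolution can be sketched as follows. This is a simplified illustration rather than the code in the authors' repository: the function names, cell labels, gene identifiers and counts are hypothetical, and the order in which CPM conversion and orthologue filtering are applied is an assumption, since the text does not specify it.

```python
from collections import defaultdict


def pseudobulk(cell_counts, cell_types):
    """Sum per-gene read counts over all single cells of each cell type."""
    totals = defaultdict(lambda: defaultdict(int))
    for cell, counts in cell_counts.items():
        for gene, count in counts.items():
            totals[cell_types[cell]][gene] += count
    return {cell_type: dict(genes) for cell_type, genes in totals.items()}


def to_cpm(counts):
    """Convert raw read counts to counts per million (CPM)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot normalise a profile with zero reads")
    return {gene: count * 1e6 / total for gene, count in counts.items()}


def rename_to_orthologues(counts, one_to_one):
    """Relabel genes with their one-to-one orthologue ids and drop the rest."""
    return {one_to_one[g]: c for g, c in counts.items() if g in one_to_one}


# Hypothetical single-cell counts: two ring cells and one male gametocyte.
cell_counts = {
    "cell1": {"pb_gene_a": 1, "pb_gene_b": 0},
    "cell2": {"pb_gene_a": 2, "pb_gene_b": 1},
    "cell3": {"pb_gene_a": 0, "pb_gene_b": 4},
}
cell_types = {"cell1": "ring", "cell2": "ring", "cell3": "male"}

signature = {t: to_cpm(c) for t, c in pseudobulk(cell_counts, cell_types).items()}

# Hypothetical bulk sample with one gene lacking a one-to-one orthologue.
bulk = rename_to_orthologues(
    {"hep_gene_a": 5, "hep_gene_x": 7}, {"hep_gene_a": "pb_gene_a"}
)
```

Summing counts before normalising means that a gene missed in one cell (a dropout) can still contribute signal from other cells of the same type, which is the motivation given above for building pseudobulk profiles.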

Enrichment of missing genes in Malaria Cell Atlas gene clusters

We wanted to determine whether there were functional patterns common to orthologues missing from the Hepatocystis genome relative to Plasmodium species. To do this we looked for orthologous groups (OrthoMCL as above) containing genes from P. berghei, P. ovale wallikeri and P. vivax P01, but not Hepatocystis. Genes from P. berghei have previously been assigned to 20 clusters based on their gene expression patterns across the whole life cycle [20]. We tested whether missing orthologues fell into particular clusters more often than expected by chance (see our GitHub repository). We used Fisher’s exact test with Benjamini-Hochberg correction to control the false discovery rate (FDR). We reported clusters with FDR <= 0.05.
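To make the enrichment test concrete, the following sketch implements a one-sided Fisher's exact test (as a hypergeometric upper tail) and the Benjamini-Hochberg adjustment using only the Python standard library. The authors' own analysis is in their GitHub repository; the choice of a one-sided test, the function names and the toy numbers here are assumptions for illustration.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test for over-representation.

    N: genes in the background, K: background genes in the cluster,
    n: missing genes, k: missing genes that fall in the cluster.
    Returns P(X >= k) under the hypergeometric distribution.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy example: 10 background genes, 3 in the cluster, 4 missing, 2 of them in the cluster.
p_toy = fisher_enrichment_p(k=2, n=4, K=3, N=10)

# Adjust a hypothetical set of per-cluster p-values and apply the FDR cut-off.
fdr = benjamini_hochberg([0.01, 0.04, 0.03])
significant = [i for i, q in enumerate(fdr) if q <= 0.05]
```

In practice one p-value would be computed per Malaria Cell Atlas cluster and the adjustment applied across all clusters at once.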

Supporting information

S1 Fig. Conservation of synteny in the core regions of the assembly.

ACT (Artemis Comparison Tool) screenshot showing a comparison of centromere-proximal regions of Hepatocystis scaffold 132, P. falciparum 3D7 (Pf3D7) chromosome 4 and P. vivax (PvP01) chromosome 5. The red blocks represent sequence similarity (tBLASTx). The centromere is shown in green. Coloured boxes represent genes. The graph shows the GC-content.

(TIF)

S2 Fig. Organization of putative subtelomeric regions of Hepatocystis scaffold 67, scaffold 211, P. knowlesi H chromosome 4 and P. falciparum 3D7 chromosome 9.

Exons are shown in coloured boxes with introns as linking lines. ‘//’ represents a gap. The shaded/grey areas in P. knowlesi and P. falciparum mark the start of the conserved, syntenic regions to other Plasmodium species. The presence of genes that are subtelomeric in Plasmodium species, i.e. PHIST proteins, suggests that the Hepatocystis scaffolds are also subtelomeric. A complete subtelomere that includes telomeric repeats is missing in our Hepatocystis assembly. Thus, whether Hepatocystis chromosomes retain the organisation common to most Plasmodium species remains unclear.

(TIF)

S3 Fig. Phylogenetic tree of Haemosporidian mitochondrial proteins.

Hepatocystis sp. ex Piliocolobus tephrosceles (this work, marked with a red arrow) appears next to a previously sequenced Hepatocystis sample from the flying fox Pteropus hypomelanus (NCBI accession FJ168565.1). Branches of the tree have been coloured by bootstrap support values from 45 (red) to 100 (green). Bootstrap values below 100 have also been added to the figure as text.

(TIF)

S4 Fig. Phylogenetic tree of 18 apicoplast protein sequences of Plasmodium spp. and Hepatocystis.

Branches of the tree have been coloured by bootstrap support values from 66 (red) to 100 (green). Bootstrap values below 100 have also been added to the figure as text.

(TIF)

S5 Fig. Phylogenetic tree of 11 nuclear genes of Hepatocystis and Plasmodium species.

Genes of Hepatocystis sp. ex Piliocolobus tephrosceles are highly similar to Hepatocystis epomophori genes sequenced in a different study [2]. The tree is based on the following genes: splicing factor 3B subunit 1, tubulin gamma chain, DNA polymerase delta catalytic subunit, eukaryotic translation initiation factor 2 gamma subunit, T-complex protein 1 subunit alpha, pantothenate transporter, ribonucleoside-diphosphate reductase large subunit, aminophospholipid-transporting P-ATPase, GCN20, transport protein Sec24A and RuvB-like helicase 3. Branches of the tree have been coloured by bootstrap values from 73 (red) to 100 (green). Bootstrap values below 100 have also been added to the figure as text. The red arrow points to the Hepatocystis sample from the current study.

(TIF)

S6 Fig. Deconvolution using CIBERSORT and the Malaria Cell Atlas accurately determines the presence and absence of different Plasmodium life stages in bulk RNA-seq data.

(A) Pre-defined mixtures of pseudobulk RNA-seq data were deconvoluted with very high accuracy. (B) Real samples of P. berghei bulk RNA-seq from Otto et al. (2014) [89] were deconvoluted, showing almost pure mixtures of gametocyte, ookinete or asexual stages as expected. The low proportions of expected parts of the intraerythrocytic developmental cycle (IDC) in each asexual sample may result from differences between what the MCA defines as a ring/trophozoite/schizont and what would microscopically be defined as such.

(TIF)

S7 Fig. Multiple sequence alignments of two Hepatocystis-specific gene families.

(A) Alignment of Hepatocystis-specific gene family 1 (hep1). Pseudogenes (HEP_00099300, HEP_00250500, HEP_00323900) were not included in the alignment. HEP_00353700 is 476 amino acids long and was truncated here. (B) Alignment of Hepatocystis-specific gene family 2 (hep2). This gene family contains a PEXEL motif (marked with a black box). Pseudogenes (HEP_00165000, HEP_00165200, HEP_00324000, HEP_00489100) were not included in the alignment.

(TIF)

S8 Fig. Heatmaps of Hepatocystis gene family expression in the blood of its mammalian host.

(A) Expression levels (log vst-normalised) of hep1 genes across blood samples from multiple red colobus monkeys. The estimated proportions of early blood stages (rings/trophozoites) and mature gametocytes are highlighted above. (B) Expression levels of hep2 genes. (C) Expression levels of pir genes.

(TIF)

S9 Fig. Heatmap of pir protein subfamilies in Hepatocystis and Plasmodium species.

Rows correspond to species and columns correspond to pir subfamilies. The columns have been ordered by the number of sequences in each subfamily and the order of rows is approximately based on phylogeny. Colours represent the numbers of proteins belonging to each subfamily for each species. All Hepatocystis pir proteins belong to the only subfamily conserved across all these species [114] (indicated with red arrow).

(TIF)

S10 Fig. Some orthologues missing in Hepatocystis sp. relative to Plasmodium species show common gene expression patterns across the Plasmodium life cycle.

(A) Malaria Cell Atlas (MCA) gene cluster 10 represents genes highly expressed in late schizonts. 25 genes from this cluster were conserved in P. ovale wallikeri and P. vivax, but were missing from our Hepatocystis genome assembly. Genes were clustered here by expression pattern and single-cells were ordered by pseudotime as in [20]. (B) MCA cluster 4 represents genes highly expressed across much of the life cycle—liver stages, trophozoites, female gametocytes and ookinetes/oocysts. 27 genes from this cluster were conserved in P. ovale wallikeri and P. vivax, but were missing from our Hepatocystis genome assembly.

(TIF)

S11 Fig. Alignment and phylogenetic tree of putative RBP-related gene fragment in Haemoproteus columbae.

(A) Alignment of the translation of a sequence (GGWD01016989.1) from Haemoproteus columbae transcriptome assembly (GenBank: GGWD00000000.1) [23] with Plasmodium reticulocyte-binding proteins (RBPs) from PlasmoDB [115]. The alignment has been cropped to the length of the Haemoproteus columbae sequence. (B) Phylogenetic tree of Plasmodium RBPs and the Haemoproteus columbae sequence GGWD01016989.1, based on the alignment in panel A. Branch colours indicate bootstrap support values, from 33 (red) to 100 (green).

(TIF)

S12 Fig. Distributions of Hepatocystis dN values in Malaria Cell Atlas (MCA) clusters.

Hepatocystis dN was calculated in a 3-way comparison between Hepatocystis, P. berghei ANKA and P. ovale curtisi using codeml. The Malaria Cell Atlas clusters are described in Fig 2B of the Malaria Cell Atlas article [20]. (A) Hepatocystis genes with dN in the top 5%: observed versus expected ratios for Malaria Cell Atlas clusters. Hepatocystis genes that correspond to Malaria Cell Atlas clusters 2, 4 and 6 have fewer genes with dN rank in the top 5% than expected by chance (Fisher exact test p-value < 0.05). None of the MCA clusters contain significantly more genes ranked in the top 5% of dN than expected by chance, although there is a trend towards clusters 15 and 16 having higher dN. (B) Boxplot of all Hepatocystis dN values for each Malaria Cell Atlas cluster. The distribution of values in clusters 15 and 16 differs from the rest of the clusters. Kolmogorov-Smirnov test statistics are as follows. Cluster 15 vs all other clusters: D = 0.42, p-value = 1.05e-05. Cluster 16 vs all other clusters: D = 0.52, p-value = 4.50e-12. Clusters 15 and 16 combined vs all other clusters: D = 0.46, p-value = 2.33e-15.

(TIF)

S1 Table. Summary of gene properties.

For each gene in the assembly, the following are listed: annotation, number of exons, gene length (bp), the presence or absence of start and stop codons (reflecting the completeness of the assembly of the gene) and RNA-seq expression level (mean FPKM with standard deviation) in sample SAMN07757854 (RC106R). For the proteins encoded by the genes, the table shows the number of transmembrane segments predicted by TMHMM, the ExportPred 2 score, one-to-one orthologs in P. berghei ANKA and P. ovale curtisi GH01 (based on OrthoMCL), PFAM domains, the number of matches to the PEXEL motif (RxLxE/Q/D) and the SignalP-5 signal peptide prediction.

(XLSX)

S2 Table. Raw and normalised Hepatocystis gene expression data.

(XLSX)

S3 Table. Plasmodium orthologues missing in the Hepatocystis genome assembly.

Plasmodium berghei genes that have an orthologue in P. ovale curtisi or P. vivax but not in the Hepatocystis sp. DNA (A) or RNA-seq (B) assemblies, and that are enriched in Malaria Cell Atlas gene clusters.

(XLSX)

S4 Table. Genes with Hepatocystis dN rank in the top 5% in codeml 3-way comparison between Hepatocystis, P. berghei ANKA and P. ovale curtisi GH01.

The total number of genes in the dN analysis was 4009, of which 200 correspond to the top 5%. The table includes Malaria Cell Atlas cluster numbers for each gene.

(XLSX)

Acknowledgments

We would like to thank Alan Tracey for advice on genome assembly and J. Byaruhanga, P. Katurama, A. Mbabazi, A. Nyamwija, J. Rusoke, D. Hyeroba, G. Weny and the staff of Makerere University Biological Field Station for assistance in the field. Manoj Duraisingh provided helpful comments on the manuscript.

Data Availability

The Hepatocystis sp. assembly can be retrieved from the European Nucleotide Archive, under the study PRJEB32891 and sample accession number ERS3649919. The individual accession numbers for the contigs are: CABPSV010000001-CABPSV010002439. Accession numbers for the apicoplast and the mitochondrion are LR699571-LR699572. Illumina HiSeq 4000 RNA-seq reads, containing a mix of Piliocolobus tephrosceles and Hepatocystis sp. sequences can be found in the European Nucleotide Archive under study accession PRJNA413051. Other data and code are available from our GitHub repository: https://github.com/adamjamesreid/hepatocystis-genome.

Funding Statement

This work was funded by the National Institutes of Health (NIH; https://www.nih.gov/), USA, grant TW009237 as part of the joint NIH-NSF Ecology of Infectious Disease program and the UK Economic and Social Research Council (TLG, NT, CAC), and by National Science Foundation grant BCS-1540459 (NT, NDS; https://www.nsf.gov/). The Wellcome Sanger Institute is funded by the Wellcome Trust (grant 206194/Z/17/Z; https://wellcome.ac.uk/), which supports EA, UB, MB, and AJR. AJR is also supported by funding from the UK Medical Research Council (MRC Programme grant #: MR/M003906/1; https://mrc.ukri.org/). TS is supported by a Wellcome Trust Sir Henry Wellcome Fellowship (210918/Z/18/Z). CIN is funded by a Wellcome Investigator Award (104792/Z/14/Z; https://wellcome.ac.uk/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Garnham PCC. Malaria parasites and other haemosporidia. Blackwell Scientific; 1966.
  • 2. Galen SC, Borner J, Martinsen ES, Schaer J, Austin CC, West CJ, et al. The polyphyly of Plasmodium: comprehensive phylogenetic analyses of the malaria parasites (order Haemosporida) reveal widespread taxonomic conflict. R Soc Open Sci. 2018;5: 171780. doi:10.1098/rsos.171780
  • 3. Garnham PCC. The developmental cycle of Hepatocystes (Plasmodium) kochi in the monkey host. Trans R Soc Trop Med Hyg. 1948;41: 601–616. doi:10.1016/s0035-9203(48)90418-0
  • 4. Perkins SL, Schaer J. A Modern Menagerie of Mammalian Malaria. Trends Parasitol. 2016;32: 772–782. doi:10.1016/j.pt.2016.06.001
  • 5. Garnham PCC, Heisch RB, Minter DM, et al. The Vector of Hepatocystis (= Plasmodium) kochi; the Successful Conclusion of Observations in Many Parts of Tropical Africa. Trans R Soc Trop Med Hyg. 1961;55: 497–502. doi:10.1016/0035-9203(61)90071-2
  • 6. Thurber MI, Ghai RR, Hyeroba D, Weny G, Tumukunde A, Chapman CA, et al. Co-infection and cross-species transmission of divergent Hepatocystis lineages in a wild African primate community. Int J Parasitol. 2013;43: 613–619. doi:10.1016/j.ijpara.2013.03.002
  • 7. Steinbiss S, Silva-Franco F, Brunk B, Foth B, Hertz-Fowler C, Berriman M, et al. Companion: a web server for annotation and analysis of parasite genomes. Nucleic Acids Res. 2016;44: W29–34. doi:10.1093/nar/gkw292
  • 8. Boundenga L, Ngoubangoye B, Mombo IM, Tsoubmou TA, Renaud F, Rougeron V, et al. Extensive diversity of malaria parasites circulating in Central African bats and monkeys. Ecol Evol. 2018;8: 10578–10586. doi:10.1002/ece3.4539
  • 9. Chang Q, Sun X, Wang J, Yin J, Song J, Peng S, et al. Identification of Hepatocystis species in a macaque monkey in northern Myanmar. Res Rep Trop Med. 2011;2: 141–146. doi:10.2147/RRTM.S27182
  • 10. Schaer J, Perkins SL, Decher J, Leendertz FH, Fahr J, Weber N, et al. High diversity of West African bat malaria parasites and a tight link with rodent Plasmodium taxa. Proc Natl Acad Sci U S A. 2013;110: 17415–17419. doi:10.1073/pnas.1311016110
  • 11. Rutledge GG, Böhme U, Sanders M, Reid AJ, Cotton JA, Maiga-Ascofare O, et al. Plasmodium malariae and P. ovale genomes provide insights into malaria parasite evolution. Nature. 2017;542: 101–104. doi:10.1038/nature21038
  • 12. Simons ND, Eick GN, Ruiz-Lopez MJ, Hyeroba D, Omeja PA, Weny G, et al. Genome-Wide Patterns of Gene Expression in a Wild Primate Indicate Species-Specific Mechanisms Associated with Tolerance to Natural Simian Immunodeficiency Virus Infection. Genome Biol Evol. 2019;11: 1630–1643. doi:10.1093/gbe/evz099
  • 13. Schaer J, Perkins SL, Ejotre I, Vodzak ME, Matuschewski K, Reeder DM. Epauletted fruit bats display exceptionally high infections with a Hepatocystis species complex in South Sudan. Sci Rep. 2017;7: 6928. doi:10.1038/s41598-017-07093-z
  • 14. Remmert M, Biegert A, Hauser A, Söding J. HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat Methods. 2011;9: 173–175. doi:10.1038/nmeth.1818
  • 15. Looker O, Blanch AJ, Liu B, Nunez-Iglesias J, McMillan PJ, Tilley L, et al. The knob protein KAHRP assembles into a ring-shaped structure that underpins virulence complex assembly. PLoS Pathog. 2019;15: e1007761. doi:10.1371/journal.ppat.1007761
  • 16. Sá E Cunha C, Nyboer B, Heiss K, Sanches-Vaz M, Fontinha D, Wiedtke E, et al. Plasmodium berghei EXP-1 interacts with host Apolipoprotein H during Plasmodium liver-stage development. Proc Natl Acad Sci U S A. 2017;114: E1138–E1147. doi:10.1073/pnas.1606419114
  • 17. Reid AJ. Large, rapidly evolving gene families are at the forefront of host–parasite interactions in Apicomplexa. Parasitology. 2015;142: S57–S70. doi:10.1017/S0031182014001528
  • 18. Brugat T, Reid AJ, Lin J, Cunningham D, Tumwine I, Kushinga G, et al. Antibody-independent mechanisms regulate the establishment of chronic Plasmodium infection. Nat Microbiol. 2017;2: 16276. doi:10.1038/nmicrobiol.2016.276
  • 19. Reid AJ, Talman AM, Bennett HM, Gomes AR, Sanders MJ, Illingworth CJR, et al. Single-cell RNA-seq reveals hidden transcriptional variation in malaria parasites. Elife. 2018;7. doi:10.7554/eLife.33105
  • 20. Howick VM, Russell AJC, Andrews T, Heaton H, Reid AJ, Natarajan K, et al. The Malaria Cell Atlas: Single parasite transcriptomes across the complete Plasmodium life cycle. Science. 2019;365. doi:10.1126/science.aaw2619
  • 21. Bensch S, Canbäck B, DeBarry JD, Johansson T, Hellgren O, Kissinger JC, et al. The Genome of Haemoproteus tartakovskyi and Its Relationship to Human Malaria Parasites. Genome Biol Evol. 2016;8: 1361–1373. doi:10.1093/gbe/evw081
  • 22. Pauli M, Chakarov N, Rupp O, Kalinowski J, Goesmann A, Sorenson MD, et al. De novo assembly of the dual transcriptomes of a polymorphic raptor species and its malarial parasite. BMC Genomics. 2015;16: 1038. doi:10.1186/s12864-015-2254-1
  • 23. Toscani Field J, Weinberg J, Bensch S, Matta NE, Valkiūnas G, Sehgal RNM. Delineation of the Genera Haemoproteus and Plasmodium Using RNA-Seq and Multi-gene Phylogenetics. J Mol Evol. 2018;86: 646–654. doi:10.1007/s00239-018-9875-3
  • 24. Dvorin JD, Martyn DC, Patel SD, Grimley JS, Collins CR, Hopp CS, et al. A plant-like kinase in Plasmodium falciparum regulates parasite egress from erythrocytes. Science. 2010;328: 910–912. doi:10.1126/science.1188191
  • 25. Wright GJ, Rayner JC. Plasmodium falciparum erythrocyte invasion: combining function with immune evasion. PLoS Pathog. 2014;10: e1003943. doi:10.1371/journal.ppat.1003943
  • 26. Sinha A, Hughes KR, Modrzynska KK, Otto TD, Pfander C, Dickens NJ, et al. A cascade of DNA-binding proteins for sexual commitment and development in Plasmodium. Nature. 2014;507: 253–257. doi:10.1038/nature12970
  • 27. Kafsack BFC, Rovira-Graells N, Clark TG, Bancells C, Crowley VM, Campino SG, et al. A transcriptional switch underlies commitment to sexual development in malaria parasites. Nature. 2014;507: 248–252. doi:10.1038/nature12920
  • 28. van Dijk MR, van Schaijk BCL, Khan SM, van Dooren MW, Ramesar J, Kaczanowski S, et al. Three members of the 6-cys protein family of Plasmodium play a role in gamete fertility. PLoS Pathog. 2010;6: e1000853. doi:10.1371/journal.ppat.1000853
  • 29. Bargieri DY, Thiberge S, Tay CL, Carey AF, Rantz A, Hischen F, et al. Plasmodium Merozoite TRAP Family Protein Is Essential for Vacuole Membrane Disruption and Gamete Egress from Erythrocytes. Cell Host Microbe. 2016;20: 618–630. doi:10.1016/j.chom.2016.10.015
  • 30. Tachibana M, Ishino T, Takashima E, Tsuboi T, Torii M. A male gametocyte osmiophilic body and microgamete surface protein of the rodent malaria parasite Plasmodium yoelii (PyMiGS) plays a critical role in male osmiophilic body formation and exflagellation. Cell Microbiol. 2018;20: e12821. doi:10.1111/cmi.12821
  • 31. Olivieri A, Bertuccini L, Deligianni E, Franke-Fayard B, Currà C, Siden-Kiamos I, et al. Distinct properties of the egress-related osmiophilic bodies in male and female gametocytes of the rodent malaria parasite Plasmodium berghei. Cell Microbiol. 2015;17: 355–368. doi:10.1111/cmi.12370
  • 32. Siden-Kiamos I, Pace T, Klonizakis A, Nardini M, Garcia CRS, Currà C. Identification of Plasmodium berghei Oocyst Rupture Protein 2 (ORP2) domains involved in sporozoite egress from the oocyst. Int J Parasitol. 2018;48: 1127–1136. doi:10.1016/j.ijpara.2018.09.004
  • 33. Gupta DK, Dembele L, Voorberg-van der Wel A, Roma G, Yip A, Chuenchob V, et al. The Plasmodium liver-specific protein 2 (LISP2) is an early marker of liver stage development. Elife. 2019;8. doi:10.7554/eLife.43362
  • 34. Yuda M, Iwanaga S, Shigenobu S, Mair GR, Janse CJ, Waters AP, et al. Identification of a transcription factor in the mosquito-invasive stage of malaria parasites. Mol Microbiol. 2009;71: 1402–1414. doi:10.1111/j.1365-2958.2009.06609.x
  • 35. Modrzynska K, Pfander C, Chappell L, Yu L, Suarez C, Dundas K, et al. A Knockout Screen of ApiAP2 Genes Reveals Networks of Interacting Transcriptional Regulators Controlling the Plasmodium Life Cycle. Cell Host Microbe. 2017;21: 11–22. doi:10.1016/j.chom.2016.12.003
  • 36. Jeninga MD, Quinn JE, Petter M. ApiAP2 Transcription Factors in Apicomplexan Parasites. Pathogens. 2019;8. doi:10.3390/pathogens8020047
  • 37. Langer RC, Li F, Vinetz JM. Identification of novel Plasmodium gallinaceum zygote- and ookinete-expressed proteins as targets for blocking malaria transmission. Infect Immun. 2002;70: 102–106. doi:10.1128/iai.70.1.102-106.2002
  • 38. Beeson JG, Drew DR, Boyle MJ, Feng G, Fowkes FJI, Richards JS. Merozoite surface proteins in red blood cell invasion, immunity and vaccines against malaria. FEMS Microbiol Rev. 2016;40: 343–372. doi:10.1093/femsre/fuw001
  • 39. Werner EB, Taylor WR, Holder AA. A Plasmodium chabaudi protein contains a repetitive region with a predicted spectrin-like structure. Mol Biochem Parasitol. 1998;94: 185–196. doi:10.1016/s0166-6851(98)00067-x
  • 40. Sharma A, Sharma A, Dixit S, Sharma A. Structural insights into thioredoxin-2: a component of malaria parasite protein secretion machinery. Sci Rep. 2011;1: 179. doi:10.1038/srep00179
  • 41. Matthews K, Kalanon M, Chisholm SA, Sturm A, Goodman CD, Dixon MWA, et al. The Plasmodium translocon of exported proteins (PTEX) component thioredoxin-2 is important for maintaining normal blood-stage growth. Mol Microbiol. 2013;89: 1167–1186. doi:10.1111/mmi.12334
  • 42. Navale R, Atul, Allanki AD, Sijwali PS. Characterization of the autophagy marker protein Atg8 reveals atypical features of autophagy in Plasmodium falciparum. PLoS One. 2014;9: e113220. doi:10.1371/journal.pone.0113220
  • 43. Vandana, Singh AP, Singh J, Sharma R, Akhter M, Mishra PK, et al. Biochemical characterization of unusual cysteine protease of P. falciparum, metacaspase-2 (MCA-2). Mol Biochem Parasitol. 2018;220: 28–41. doi:10.1016/j.molbiopara.2018.01.001
  • 44. Rodhain J. Plasmodium epomophori n. sp. parasite commun des Roussettes epaulieres au Congo Belge. Bull Soc Pathol Exot Filiales. 1926;19: 828–838.
  • 45. Böhme U, Otto TD, Cotton JA, Steinbiss S, Sanders M, Oyola SO, et al. Complete avian malaria parasite genomes reveal features associated with lineage-specific evolution in birds and mammals. Genome Res. 2018;28: 547–560. doi:10.1101/gr.218123.116
  • 46. Crosnier C, Bustamante LY, Bartholdson SJ, Bei AK, Theron M, Uchikawa M, et al. Basigin is a receptor essential for erythrocyte invasion by Plasmodium falciparum. Nature. 2011;480: 534–537. doi:10.1038/nature10606
  • 47. Caldelari R, Dogga S, Schmid MW, Franke-Fayard B, Janse CJ, Soldati-Favre D, et al. Transcriptome analysis of Plasmodium berghei during exo-erythrocytic development. Malar J. 2019;18: 330. doi:10.1186/s12936-019-2968-7
  • 48. Preiser PR, Khan S, Costa FTM, Jarra W. Stage-specific transcription of distinct repertoires of a multigene family during Plasmodium life cycle. 2002. Available: https://science.sciencemag.org/content/295/5553/342.short doi:10.1126/science.1064938
  • 49. Galaway F, Yu R, Constantinou A, Prugnolle F, Wright GJ. Resurrection of the ancestral RH5 invasion ligand provides a molecular explanation for the origin of P. falciparum malaria in humans. PLoS Biol. 2019;17: e3000490. doi:10.1371/journal.pbio.3000490
  • 50. Goldberg TL, Sintasath DM, Chapman CA, Cameron KM, Karesh WB, Tang S, et al. Coinfection of Ugandan red colobus (Procolobus [Piliocolobus] rufomitratus tephrosceles) with novel, divergent delta-, lenti-, and spumaretroviruses. J Virol. 2009;83: 11318–11329. doi:10.1128/JVI.02616-08
  • 51. Lauck M, Sibley SD, Hyeroba D, Tumukunde A, Weny G, Chapman CA, et al. Exceptional simian hemorrhagic fever virus diversity in a wild African primate community. J Virol. 2013;87: 688–691. doi:10.1128/JVI.02433-12
  • 52. Lauck M, Hyeroba D, Tumukunde A, Weny G, Lank SM, Chapman CA, et al. Novel, divergent simian hemorrhagic fever viruses in a wild Ugandan red colobus monkey discovered using direct pyrosequencing. PLoS One. 2011;6: e19056. doi:10.1371/journal.pone.0019056
  • 53. Simons N. The Role of Gene Regulation in Infectious Disease in the Ugandan Red Colobus Monkey (Piliocolobus tephrosceles). 2018. Available: https://scholarsbank.uoregon.edu/xmlui/handle/1794/23729
  • 54. Buchfink B, Xie C, Huson DH. Fast and sensitive protein alignment using DIAMOND. Nat Methods. 2015;12: 59–60. doi:10.1038/nmeth.3176
  • 55. Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34: 3094–3100. doi:10.1093/bioinformatics/bty191
  • 56. Sayers EW, Agarwala R, Bolton EE, Brister JR, Canese K, Clark K, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2019;47: D23–D28. doi:10.1093/nar/gky1069
  • 57. Kim D, Langmead B, Salzberg SL. HISAT: a fast spliced aligner with low memory requirements. Nat Methods. 2015;12: 357–360. doi:10.1038/nmeth.3317
  • 58. Wood DE, Salzberg SL. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol. 2014;15: R46. doi:10.1186/gb-2014-15-3-r46
  • 59. PlasmoDB: An integrative database of the Plasmodium falciparum genome. Tools for accessing and analyzing finished and unfinished sequence data. The Plasmodium Genome Database Collaborative. Nucleic Acids Res. 2001;29: 66–69. doi:10.1093/nar/29.1.66
  • 60. Weisenfeld NI, Kumar V, Shah P, Church DM, Jaffe DB. Direct determination of diploid genome sequences. Genome Res. 2017;27: 757–767. doi:10.1101/gr.214874.116
  • 61.Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol. 2012;19: 455–477. 10.1089/cmb.2012.0021 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25: 2078–2079. 10.1093/bioinformatics/btp352 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26: 841–842. 10.1093/bioinformatics/btq033 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64. Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, Li W, et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol. 2011;7: 539. doi:10.1038/msb.2011.75
  • 65. Waterhouse AM, Procter JB, Martin DMA, Clamp M, Barton GJ. Jalview Version 2—a multiple sequence alignment editor and analysis workbench. Bioinformatics. 2009;25: 1189–1191. doi:10.1093/bioinformatics/btp033
  • 66. Hunt M, Silva ND, Otto TD, Parkhill J, Keane JA, Harris SR. Circlator: automated circularization of genome assemblies using long sequencing reads. Genome Biol. 2015;16: 294. doi:10.1186/s13059-015-0849-0
  • 67. Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 2017;27: 722–736. doi:10.1101/gr.215087.116
  • 68. Morgulis A, Gertz EM, Schäffer AA, Agarwala R. A fast and symmetric DUST implementation to mask low-complexity DNA sequences. J Comput Biol. 2006;13: 1028–1040. doi:10.1089/cmb.2006.13.1028
  • 69. Bushnell B, Rood J, Singer E. BBMerge—Accurate paired shotgun read merging via overlap. PLoS One. 2017;12: e0185056. doi:10.1371/journal.pone.0185056
  • 70. Bonfield JK, Whitwham A. Gap5—editing the billion fragment sequence assembly. Bioinformatics. 2010;26: 1699–1703. doi:10.1093/bioinformatics/btq268
  • 71. Waterhouse RM, Seppey M, Simão FA, Manni M, Ioannidis P, Klioutchnikov G, et al. BUSCO applications from quality assessments to gene prediction and phylogenomics. Mol Biol Evol. 2017. doi:10.1093/molbev/msx319
  • 72. Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30: 772–780. doi:10.1093/molbev/mst010
  • 73. Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999;27: 573–580. doi:10.1093/nar/27.2.573
  • 74. Zhu B-H, Xiao J, Xue W, Xu G-C, Sun M-Y, Li J-T. P_RNA_scaffolder: a fast and accurate genome scaffolder using paired-end RNA-sequencing reads. BMC Genomics. 2018;19: 175. doi:10.1186/s12864-018-4567-3
  • 75. Hunt M, Kikuchi T, Sanders M, Newbold C, Berriman M, Otto TD. REAPR: a universal tool for genome assembly evaluation. Genome Biol. 2013;14: R47. doi:10.1186/gb-2013-14-5-r47
  • 76. Jackman SD, Coombe L, Chu J, Warren RL, Vandervalk BP, Yeo S, et al. Tigmint: correcting assembly errors using linked reads from large molecules. BMC Bioinformatics. 2018;19: 393. doi:10.1186/s12859-018-2425-6
  • 77. Otto TD, Sanders M, Berriman M, Newbold C. Iterative Correction of Reference Nucleotides (iCORN) using second generation sequencing technology. Bioinformatics. 2010;26: 1704–1707. doi:10.1093/bioinformatics/btq269
  • 78. Walker BJ, Abeel T, Shea T, Priest M, Abouelliel A, Sakthikumar S, et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One. 2014;9: e112963. doi:10.1371/journal.pone.0112963
  • 79. Parra G, Bradnam K, Korf I. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics. 2007;23: 1061–1067. doi:10.1093/bioinformatics/btm071
  • 80. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29: 15–21. doi:10.1093/bioinformatics/bts635
  • 81. Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol. 2010;28: 511–515. doi:10.1038/nbt.1621
  • 82. Carver T, Harris SR, Berriman M, Parkhill J, McQuillan JA. Artemis: an integrated platform for visualization and analysis of high-throughput sequence-based experimental data. Bioinformatics. 2012;28: 464–469. doi:10.1093/bioinformatics/btr703
  • 83. Carver TJ, Rutherford KM, Berriman M, Rajandream M-A, Barrell BG, Parkhill J. ACT: the Artemis Comparison Tool. Bioinformatics. 2005;21: 3422–3423. doi:10.1093/bioinformatics/bti553
  • 84. Kalvari I, Nawrocki EP, Argasinska J, Quinones-Olvera N, Finn RD, Bateman A, et al. Non-Coding RNA Analysis Using the Rfam Database. Curr Protoc Bioinformatics. 2018;62: e51. doi:10.1002/cpbi.51
  • 85. Boddey JA, Carvalho TG, Hodder AN, Sargeant TJ, Sleebs BE, Marapana D, et al. Role of plasmepsin V in export of diverse protein families from the Plasmodium falciparum exportome. Traffic. 2013;14: 532–550. doi:10.1111/tra.12053
  • 86. Pain A, Böhme U, Berry AE, Mungall K, Finn RD, Jackson AP, et al. The genome of the simian and human malaria parasite Plasmodium knowlesi. Nature. 2008;455: 799–803. doi:10.1038/nature07306
  • 87. Auburn S, Böhme U, Steinbiss S, Trimarsanto H, Hostetler J, Sanders M, et al. A new Plasmodium vivax reference sequence with improved assembly of the subtelomeres reveals an abundance of pir genes. Wellcome Open Res. 2016;1: 4. doi:10.12688/wellcomeopenres.9876.1
  • 88. Pasini EM, Böhme U, Rutledge GG, Voorberg-Van der Wel A, Sanders M, Berriman M, et al. An improved Plasmodium cynomolgi genome assembly reveals an unexpected methyltransferase gene expansion. Wellcome Open Res. 2017;2: 42. doi:10.12688/wellcomeopenres.11864.1
  • 89. Otto TD, Böhme U, Jackson AP, Hunt M, Franke-Fayard B, Hoeijmakers WAM, et al. A comprehensive evaluation of rodent malaria parasite genomes and gene expression. BMC Biol. 2014;12: 86. doi:10.1186/s12915-014-0086-0
  • 90. Otto TD, Rayner JC, Böhme U, Pain A, Spottiswoode N, Sanders M, et al. Genome sequencing of chimpanzee malaria parasites reveals possible pathways of adaptation to human hosts. Nat Commun. 2014;5: 4754. doi:10.1038/ncomms5754
  • 91. Böhme U, Otto TD, Sanders M, Newbold CI, Berriman M. Progression of the canonical reference malaria parasite genome from 2002–2019. Wellcome Open Res. 2019;4: 58. doi:10.12688/wellcomeopenres.15194.2
  • 92. Krogh A, Larsson B, von Heijne G, Sonnhammer EL. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol. 2001;305: 567–580. doi:10.1006/jmbi.2000.4315
  • 93. El-Gebali S, Mistry J, Bateman A, Eddy SR, Luciani A, Potter SC, et al. The Pfam protein families database in 2019. Nucleic Acids Res. 2019;47: D427–D432. doi:10.1093/nar/gky995
  • 94. Almagro Armenteros JJ, Tsirigos KD, Sønderby CK, Petersen TN, Winther O, Brunak S, et al. SignalP 5.0 improves signal peptide predictions using deep neural networks. Nat Biotechnol. 2019;37: 420–423. doi:10.1038/s41587-019-0036-z
  • 95. Bushmanova E, Antipov D, Lapidus A, Prjibelski AD. rnaSPAdes: a de novo transcriptome assembler and its application to RNA-Seq data. Gigascience. 2019;8. doi:10.1093/gigascience/giz100
  • 96. Tresguerres FGF, Torres J, López-Quiles, Hernández G, Vega JA, Tresguerres IF. Corrigendum to “The osteocyte: A multifunctional cell within the bone” [Ann. Anat. 227 (2020) 10.1016/j.aanat.2019.151422]. Ann Anat. 2020;230: 151510.
  • 97. Gasteiger E, Gattiker A, Hoogland C, Ivanyi I, Appel RD, Bairoch A. ExPASy: The proteomics server for in-depth protein knowledge and analysis. Nucleic Acids Res. 2003;31: 3784–3788. doi:10.1093/nar/gkg563
  • 98. Katoh K, Misawa K, Kuma K-I, Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 2002;30: 3059–3066. doi:10.1093/nar/gkf436
  • 99. Clamp M, Cuff J, Searle SM, Barton GJ. The Jalview Java alignment editor. Bioinformatics. 2004;20: 426–427. doi:10.1093/bioinformatics/btg430
  • 100. Li L, Stoeckert CJ Jr, Roos DS. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 2003;13: 2178–2189. doi:10.1101/gr.1224503
  • 101. Castresana J. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol. 2000;17: 540–552. doi:10.1093/oxfordjournals.molbev.a026334
  • 102. Nguyen L-T, Schmidt HA, von Haeseler A, Minh BQ. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol. 2015;32: 268–274. doi:10.1093/molbev/msu300
  • 103. Enright AJ, Van Dongen S, Ouzounis CA. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 2002;30: 1575–1584. doi:10.1093/nar/30.7.1575
  • 104. Anders S, Pyl PT, Huber W. HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics. 2015;31: 166–169. doi:10.1093/bioinformatics/btu638
  • 105. Nurk S, Bankevich A, Antipov D, Gurevich AA, Korobeynikov A, Lapidus A, et al. Assembling single-cell genomes and mini-metagenomes from chimeric MDA products. J Comput Biol. 2013;20: 714–737. doi:10.1089/cmb.2013.0084
  • 106. Consortium UniProt. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 2019;47: D506–D515. doi:10.1093/nar/gky1049
  • 107. Warrenfeltz S, Basenko EY, Crouch K, Harb OS, Kissinger JC, Roos DS, et al. EuPathDB: The Eukaryotic Pathogen Genomics Database Resource. Methods Mol Biol. 2018;1757: 69–113. doi:10.1007/978-1-4939-7737-6_5
  • 108. Alonge M, Soyk S, Ramakrishnan S, Wang X, Goodwin S, Sedlazeck FJ, et al. RaGOO: fast and accurate reference-guided scaffolding of draft genomes. Genome Biol. 2019;20: 224. doi:10.1186/s13059-019-1829-6
  • 109. Abascal F, Zardoya R, Telford MJ. TranslatorX: multiple alignment of nucleotide sequences guided by amino acid translations. Nucleic Acids Res. 2010;38: W7–13. doi:10.1093/nar/gkq291
  • 110. Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007;24: 1586–1591. doi:10.1093/molbev/msm088
  • 111. Meerstein-Kessel L, van der Lee R, Stone W, Lanke K, Baker DA, Alano P, et al. Probabilistic data integration identifies reliable gametocyte-specific proteins and transcripts in malaria parasites. Sci Rep. 2018;8: 410. doi:10.1038/s41598-017-18840-7
  • 112. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20: 1297–1303. doi:10.1101/gr.107524.110
  • 113. Newman AM, Liu CL, Green MR, Gentles AJ, Feng W, Xu Y, et al. Robust enumeration of cell subsets from tissue expression profiles. Nat Methods. 2015;12: 453–457. doi:10.1038/nmeth.3337
  • 114. Frech C, Chen N. Variant surface antigens of malaria parasites: functional and evolutionary insights from comparative gene family classification and analysis. BMC Genomics. 2013;14: 427. doi:10.1186/1471-2164-14-427
  • 115. Bahl A, Brunk B, Crabtree J, Fraunholz MJ, Gajria B, Grant GR, et al. PlasmoDB: the Plasmodium genome resource. A database integrating experimental and computational data. Nucleic Acids Res. 2003;31: 212–215. doi:10.1093/nar/gkg081

Decision Letter 0

Xin-zhuan Su, Tim JC Anderson

12 Mar 2020

Dear Dr. Reid,

Thank you very much for submitting your manuscript "Genomic and transcriptomic evidence for descent from Plasmodium and loss of blood schizogony in Hepatocystis parasites from naturally infected red colobus monkeys" for consideration at PLOS Pathogens. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

A brief editorial comment: I share reviewer 2's view that a complete genome assembly should not be a requirement for publication. The sparse parasite data gleaned from sequencing of the host has been effectively mined to answer the important questions posed. However, I agree with reviewer 1 that some comparison with the published Haemoproteus genome would strengthen the manuscript.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Tim J.C. Anderson

Guest Editor

PLOS Pathogens

Xin-zhuan Su

Section Editor

PLOS Pathogens

Kasturi Haldar

Editor-in-Chief

PLOS Pathogens

orcid.org/0000-0001-5065-158X

Michael Malim

Editor-in-Chief

PLOS Pathogens

orcid.org/0000-0002-7699-2064

***********************

Reviewer's Responses to Questions

Part I - Summary

Please use this section to discuss strengths/weaknesses of study, novelty/significance, general execution and scholarship.

Reviewer #1: The manuscript describes the first genomic and transcriptomic analysis of the haemosporidian parasite genus Hepatocystis. The authors generated a draft genome from sequence data that originally were designed to generate a reference genome of the primate host Piliocolobus tephrosceles and use transcriptomic data to investigate parasite life cycle characteristics in the blood stage in the vertebrate host.

Malaria parasites, with Plasmodium falciparum as the predominant species, are the causative agents of severe tropical disease in humans. No highly efficacious vaccine against Plasmodium parasites has been developed to date. Plasmodium parasites feature a complex life cycle with distinct invasive and replicating stages and host switches, and a single life cycle phase, the intra-erythrocytic replication of asexual blood stages, induces disease. This particular life cycle step is thought to be exclusive to species of the genus Plasmodium and represents the trait that is used to define the genus Plasmodium. Notably, all other haemosporidian parasite genera, even the closest relatives of Plasmodium, Hepatocystis parasites, lack the asexual multiplication in the blood. In the present manuscript the authors work towards an understanding of the evolution of haemosporidian parasites, and the evolution of erythrocytic merogony in particular, by studying the genome of Hepatocystis in combination with the transcriptome of Hepatocystis in a comparative approach with the published Plasmodium datasets. As Hepatocystis represents the closest relative of mammalian Plasmodium species, the study of this genus is of significance for the entire research field.

In the introduction the authors introduce the genus Hepatocystis very comprehensively and highlight the differences to Plasmodium (in e.g. life cycle, hosts). The authors confirm the sister relationship of Hepatocystis with species of the Vinckeia Plasmodium clade that infect rodents, which has been proposed in previous studies. Without question one of the most important findings of the study is the fact that no transcriptomic evidence for schizonts in the blood has been found. However, the authors point out that the schizonts could sequester away from the bloodstream as seen in some Plasmodium species. The detection of novel gene families is another essential finding and will certainly be the subject of many subsequent studies. The low number of pir genes in Hepatocystis in comparison to species in the Vinckeia clade and the complete absence of the Reticulocyte Binding Protein family again underline the numerous differences relating to the differing life cycle despite the close phylogenetic relationship. With the conclusion about changes in specific genes that might relate to its adaptation to the Culicoides vector, the authors support previous hypotheses that vector shifts into different dipteran families are associated with significant genomic changes.

The methods used are described in great detail and comply with the best standards in this field. Several authors on this manuscript have been involved in different Plasmodium genome publications in the past and are therefore among the experts in this field. This study, with its genome and transcriptome data, provides yet another valuable resource for the entire malaria parasite research community. Therefore, I consider it the greatest weakness of the study that the authors did not aim for publication of the full annotated genome and instead provide a draft genome. Without question, using the sequence data that had been generated to study the monkey genome to assemble the Hepatocystis draft genome was a great opportunity, and the interesting results of the study speak for themselves. But the effort to work towards a complete genome would have made the outcome even stronger, and uncertainties in some results, such as the lack of expansion of specific gene families, could be ruled out.

For the first time, a genomic study investigates a haemosporidian parasite species that infects mammals and does not belong to the genus Plasmodium. However, the authors do not compare their data to the published genome of Haemoproteus, another haemosporidian genus that also lacks the erythrocytic merogony and infects birds. In the last sentence of the manuscript, the authors point to the fact that genome sequences of Haemoproteus will provide more insights in the molecular biology and evolution of malaria parasites. Haemoproteus presents a basal taxon in the haemosporidian phylogeny and hence, an inclusion in the comparative genomic analysis is important and could reveal additional interesting findings. I recommend, if possible, to include the published Haemoproteus genome data in the comparative analysis.

Figures and Tables are comprehensive and especially the figures follow the design of previous Plasmodium publications (that have been authored/co-authored by some of the authors of the manuscript) and enable the reader to make comparisons.

Data deposition has been carried out to a great extent and a previous version of this manuscript had been made available on bioRxiv.

Overall, the study/manuscript presents novel insights into the evolution of a neglected haemosporidian parasite genus that infects mammals and features a greatly modified life cycle in comparison to Plasmodium and therefore the results allow conclusions about the evolution of the entire group of malaria parasites. I consider the data and results of the study a valuable resource for the large Plasmodium parasite research community.

Reviewer #2: Aunin et al. present research on a parasite in the genus Hepatocystis based on data from a project conducted on its monkey host. The genome assembly from only "parasite contamination" of host data cannot reach the standards of quasi-chromosome level genome reconstructions that have become customary in recent years. Additionally, the authors can't even determine the parasite species for which they assembled a genome. But - read on - Aunin et al. still present a fascinating manuscript, as these seeming shortcomings are irrelevant for a clearly defined research question.

Schizogony is the process of asexual replication of malaria parasites in the vertebrate host (e.g. human). During this process the host can suffer anaemia, as red blood cells are depleted. Aunin et al. seek an answer to the question whether Hepatocystis has lost blood schizogony. This has long been suggested, as asexual replicative forms (schizonts) have not been observed in parasites from this genus. If this question can be answered positively, the genetic correlates of presence/absence of such a process are interesting, as they could hint towards genetic pathways regulating blood schizogony in Plasmodium.

Genomic and transcriptomic analyses are presented based on samples from an undefined species. Does it matter? Not in my opinion! What matters is that we are presented with a parasite species sharing a more recent common ancestor with some Plasmodium species than those Plasmodium species share with each other. Aunin et al. resolve the phylogenetic placement of Hepatocystis as a sister to rodent Plasmodium species used as model systems in malaria research. But P. ovale and P. vivax, which infect humans, are also more closely related to Hepatocystis than to the most pathogenic human malaria parasite, P. falciparum. In phylogenetic terms this means that Hepatocystis species render the genus Plasmodium paraphyletic. Because of this phylogenetic context, we can learn about Plasmodium species from the single - even undetermined - Hepatocystis species analysed here.

A more complete and contiguous reconstruction of the genome would surely allow subtle questions on genome evolution. While this would certainly be beneficial for future research on other Hepatocystis species, I rather want to focus on whether the data is sufficient for a clear answer on the research question. This boils down to asking whether the genome is sufficiently well reconstructed to conclude that absence of certain gene families (i.e. those involved in blood schizogony) constitutes a "true negative" finding.

Relevant for this question are centromere-proximal and subtelomeric regions of the genome. Those contain not only replicated sequences notoriously hard to assemble, but also effector genes important for host-parasite interaction. The authors are open about the fact that telomeric repeat regions were not assembled and centromere-proximal regions were only assembled to a certain degree. Aunin et al., nevertheless, show that the fragmented genomic scaffolds recovered for those regions still contain an expected gene repertoire. They employ a "custom annotation" derived from single cell transcriptomics in a project called the malaria cell atlas (MCA) and demonstrate the absence (significant under-representation, more technically) of orthologues of "schizogony genes" in Hepatocystis sp.

These genomic data alone would not have been sufficient for strongly concluding the absence of blood schizogony. To strengthen their argument Aunin et al. were able to obtain Hepatocystis sp. transcriptomes from an impressive number of samples (25 deemed usable based on coverage, out of a total of 29 analysed). RNASeq data had also originally been produced for the annotation of the monkey host's genome. Aunin et al. then used a clever analysis called "deconvolution", which characterises expression profiles by lifecycle stages. Essentially - again - in comparison to single cell (MCA) transcriptome profiles, this shows that the schizont expression profiles are absent in the Hepatocystis sp. transcriptome.

I am convinced the research question has been convincingly answered based on these two analyses. The analyses are performed expertly and potential problems are resolved with scrutiny. Aunin et al. present a number of interesting findings as genetic correlates with the loss of blood schizogony: a completely novel Hepatocystis specific gene family, expansion of gene families relevant for sporozoites (the lifecycle stage in the insect vector transmitting the parasite and in liver cells) and evolutionary divergence - likely by positive selection on non-synonymous sites - of genes involved in sexual replication and/or vector infection.

**********

Part II – Major Issues: Key Experiments Required for Acceptance

Please use this section to detail the key new experiments or modifications of existing experiments that should be absolutely required to validate study conclusions.

Generally, there should be no more than 3 such required experiments or major modifications for a "Major Revision" recommendation. If more than 3 experiments are necessary to validate the study conclusions, then you are encouraged to recommend "Reject".

Reviewer #1: As pointed out in the summary (Part I), a high-quality full genome of Hepatocystis would have been desirable, given the experience and expertise of several of the authors of this manuscript. Could the authors consider investing additional work and aiming for publication of a full genome? Could the authors provide reasons why this is perhaps not feasible?

Again, as pointed out in the summary (Part I), why did the authors not compare their genomic Hepatocystis data to the published genome of the avian-infecting genus Haemoproteus? This basal haemosporidian taxon also lacks the asexual multiplication in the blood of their vertebrate hosts and a comparison to Hepatocystis could potentially reveal additional insights into the evolution of the entire malaria parasite group. Could the published Haemoproteus genome data be incorporated in the comparative analysis?

Reviewer #2: No major issues.

**********

Part III – Minor Issues: Editorial and Data Presentation Modifications

Please use this section for editorial suggestions as well as relatively minor modifications of existing data that would enhance clarity.

Reviewer #1: Please find below some minor issues/suggestions

The authors state: “Species of the genus Hepatocystis are single-celled eukaryotic parasites infecting Old World monkeys, fruit bats and squirrels”

Species of Hepatocystis have also been described from the mammalian order Artiodactyla (e.g. Hippopotamus and Tragulus). So, I recommend adding “amongst others” or also listing this mammalian host group.

The authors state: “In contrast, liver merozoites of Hepatocystis spp. are thought to commit to the development of transmission stages directly upon invading red blood cells.”

Just for clarity, I recommend specifying that the transmission stages are sexual (gametocyte) stages.

The authors state: “They are then vectored not by mosquitoes, but by biting midges of the genus Culicoides”

To highlight that the Culicoides midges belong to a different family, the authors could add the family name (Ceratopogonidae).

One additional comment on the vector: So far, Culicoides has only been confirmed as vector for the primate-infecting species Hepatocystis kochi. It still needs to be verified whether this or other Culicoides species transmit Hepatocystis in bats and the other mammalian groups.

The authors state: “At least four species of Hepatocystis are known to infect African monkeys – H. kochi, H. simiae, H. bouillezi and H. cercopitheci (6) – but with little sequence data currently linked to morphological identification, it was not possible to determine the species.”

Studies of the bat-infecting Hepatocystis species have proposed that bats in Africa and Australia are infected with parasites that belong to Hepatocystis species complexes rather than to different species. Could this also apply to the primate-infecting “species”? Perhaps the authors could comment on this.

The authors state: “Limited sequence data are available for Hepatocystis outside of this study, however 11 nuclear genes have been sequenced for H. epomophori, a parasite of bats (2). Based on the sequence of these genes, we found that HexPt forms a sister group to H. epomophori (S5 Fig; S1 Dataset).”

Limited sequence data for Hepatocystis is available and almost exclusively comprises short gene sequences for phylogenetic studies. However, it is probably still worth noting that some multiple-gene phylogenies investigated the phylogenetic placements within the Hepatocystis clade, showing that all Hepatocystis species from monkeys (from Asia and Africa) group in one monophyletic clade, sister to the Hepatocystis species that infect African bats. There is evidence that the Hepatocystis species of bats are not monophyletic and that Hepatocystis species from bat hosts of the genus Pteropus group basal to all other Hepatocystis species and therefore Hepatocystis species of monkeys might represent a derived clade.

The authors state: “In vivo transcriptome data supports a lack of erythrocytic schizogony. Transcriptome sequencing of blood samples from 29 individuals was performed as part of the red colobus monkey genome sequencing project (12). We found evidence that each of these individuals was infected with the same species of Hepatocystis as found in the genomic reads, consistent with high prevalence of this parasite in Kibale red colobus monkeys as previously reported (6)."

Mixed haplotype infections have been reported/investigated by Thurber et al. How did the authors deal with the mixed haplotype infections in their samples? Did the sequences that were detected within the monkey genome sequences show hints of a mixed infection?

The manuscript contains a few spelling errors/typos. In several cases, “spp.” is erroneously written in italics (e.g. abstract, results) and “Vinckei” should be “Vinckeia”.

“Sample collection and data generation”

Have permits and agreements been expanded to allow use of DNA samples for parasite analysis in addition to the originally planned analysis of the monkey genome? (Nagoya/CBD)

Reviewer #2: The issues I see in the manuscript are relatively minor. I first want to highlight my perception of the special character of the manuscript as an asset: I consider the (very successful) re-use of host-genome "contamination" for parasite research a central aspect of the study. In my opinion this could be more clearly stated as not only viable but attractive for parasitologists working with genomic methods. The first sentence of the discussion could e.g. easily include a statement on data provenance. I also find that the first sentence of the methods could be improved (missing "a" before "project"; instead of "a different project", rather "a project with the aim to generate" or similar). In my opinion the manuscript could generally be more explicit on this point.

Taxonomy: The data is from only one species of the genus Hepatocystis. This is made clear in the context of the whole manuscript (esp. in the introduction: "thus classified the parasite as Hepatocystis sp. ex Piliocolobus tephrosceles (HexPt; NCBI Taxonomy ID: 2600580)"). In my opinion, nevertheless, Hepatocystis _sp._ should be used to refer to a - for the purpose of this manuscript undetermined - species in this genus throughout the manuscript (I'd prefer this over the more technical HexPt, but this is a matter of taste). As stated above, I am very positive about the general value of findings from this one species, as loss of blood schizogony is likely a major, sufficiently rare, evolutionary transition. Using _sp._ throughout the manuscript would still be more conservative in my opinion.

Apart from this, I think that the figures could be improved to do full justice to the findings:

Fig 1 A. This figure and the corresponding analysis could include additional information on the coverage profiles for the contigs/scaffolds with the different sequence similarity and GC (see e.g. "blobtools"). The signal of (low) GC content is expectedly strong for a Plasmodium sp./Hepatocystis sp. data set, it would be great to have sequencing coverage as a second visually distinguishing feature. This could also be (an) additional panel(s) for different sequencing libraries.
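The suggestion above, pairing GC content with sequencing coverage for each contig, can be illustrated with a minimal sketch. It is not the authors' pipeline and does not use the blobtools API; the function names, contig sequences and coverage values are all hypothetical and chosen only to show how the two axes would be tabulated before plotting.

```python
def gc_fraction(seq):
    """Return the G+C fraction among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def gc_coverage_table(contigs, coverage):
    """Pair each contig's GC fraction with its mean read coverage."""
    return {name: (gc_fraction(seq), coverage[name])
            for name, seq in contigs.items()}


# Hypothetical toy data: an AT-rich (parasite-like) and a GC-richer
# (host-like) contig, with made-up mean coverage values.
contigs = {
    "contig_AT_rich": "ATATTTAAGATTATAC",
    "contig_GC_rich": "GCGGCCATGCGCGGAT",
}
coverage = {"contig_AT_rich": 12.0, "contig_GC_rich": 35.0}

table = gc_coverage_table(contigs, coverage)
for name, (gc, cov) in table.items():
    print(f"{name}\tGC={gc:.3f}\tcoverage={cov:.1f}")
```

Plotting the two columns against each other for all contigs would give the kind of second distinguishing axis the reviewer asks for.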

Fig 1 B. The presented colour legend should be changed/expanded to separate geographical information from host species information. Information on the latter should also be included for Plasmodium species, like in figure 2 (ideally using the same symbols).

Fig 2. In my opinion the phylogenetic tree - transporting the core message of the whole figure - should attract more attention in this. The "figure columns" "ap2 protein" and "tRNA" take (together) a similar amount of space as the tree but show relatively little variation, hence contain hardly any information.

- I am missing a figure for the paragraph "Missing orthologues tend to be involved in erythrocytic schizogony". This lacks representation in either main text figures or tables. Plotting expected/observed ratios for the number of orthologues found (like for SNPs in figure S11, but see comment below!) could potentially help to visualise missing or surplus orthologues in MCA clusters. Using this as an anchor (with Hepatocystis sp. data), visualisation of "Cluster 4" and "Cluster 10" from MCA relying on P. berghei data (Figure S10) could be included as additional panels. This might help to introduce the interpretation of MCA clusters for the present manuscript generally. MCA clusters play an important role as "custom annotation" in two parts of the analysis: enrichment of genes missing in Hepatocystis sp. and enrichment of high-dN genes in those clusters. It could be nice to introduce MCA more visually this way as part of a main manuscript figure!

Fig 3. The caption of this figure only refers to figure panel B (and somewhat to C), but not to A. For A, I am also missing the information (in the legend) that the SNPs are called relative to the genome assembly(?). I am wondering whether A could be a figure on its own (SNPs in the transcriptome), maybe with an additional panel presenting the enrichment of SNPs in the MCA clusters (currently S11)? Btw. Fig S11 A should make more clear that a ratio of 1 is the null expectation: rotation by 90° and plotting deviations from 1 (as bars or lines) would help to make this more obvious. Then current 3 B and C could be a separate figure committed to (deconvoluted) expression patterns in the transcriptome and all panels would fit the caption.

Table 2 should, in my opinion, contain context for the expected size of gene families. Listing the family size from Plasmodium species (e.g. P. berghei, P. vivax and P. falciparum; just like in the genome comparisons of Table 1) would add such context. Also: "Frequency of members" sounds a bit too technical to me, isn't this simply the "gene family size"?

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Emanuel Heitlinger

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc. For an example see here on PLOS Biology: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, PLOS recommends that you deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see http://journals.plos.org/plospathogens/s/submission-guidelines#loc-materials-and-methods

Decision Letter 1

Xin-zhuan Su, Tim JC Anderson

19 Jun 2020

Dear Dr. Reid,

We are pleased to inform you that your manuscript 'Genomic and transcriptomic evidence for descent from Plasmodium and loss of blood schizogony in Hepatocystis parasites from naturally infected red colobus monkeys' has been provisionally accepted for publication in PLOS Pathogens.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email.  A member of our team will be in touch with a set of requests. In addition, please address the minor edits requested by reviewer #2.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Pathogens.

Best regards,

Tim J.C. Anderson

Guest Editor

PLOS Pathogens

Xin-zhuan Su

Section Editor

PLOS Pathogens

Kasturi Haldar

Editor-in-Chief

PLOS Pathogens

orcid.org/0000-0001-5065-158X

Michael Malim

Editor-in-Chief

PLOS Pathogens

orcid.org/0000-0002-7699-2064

***********************************************************

Thanks for the care taken in thoroughly addressing the reviewers' comments. I have no further comments and think this is an excellent paper.

Reviewer Comments (if any, and for reference):

Reviewer's Responses to Questions

Part I - Summary

Please use this section to discuss strengths/weaknesses of study, novelty/significance, general execution and scholarship.

Reviewer #1: The manuscript is substantially improved and clarified in the revised version and the authors should be commended for responding so positively to the reviewer's comments. All the major issues have been clarified.

Reviewer #2: As mentioned in my first review I find that Aunin et al. present an outstanding manuscript and I recommend it for publication in its present form with very minor revisions, if any at all.

Reports finding that databases are rife with contamination wrongly assigned to target species are ample. Such contamination can be problematic for the assignment of taxonomic origin in metagenomics or when gene family evolution is reconstructed [1,2]. When not scrutinised in genome projects, even simple contamination from co-cultured organisms can lead to wrong conclusions, such as claims about astonishingly prevalent horizontal gene transfer into eukaryote genomes [3]. A study on contamination of animal genomes found that high amounts of sequences derived from parasites and symbionts are present in genome assemblies. When treated correctly, sequence data from such pathogens and symbionts, in contrast to culture contamination, presents a biological signal worth further investigation. Remarkably, the parasite contamination in animal genomes was found to be derived mainly from Apicomplexan parasites, providing opportunities especially for genomics in parasitology [4].

Aunin et al. take such genomic data re-use in parasitology to a novel and extraordinary level: they present research on a single parasite species reconstructing a relatively complete genome and transcriptome. Aunin et al. present the genome of Hepatocystis sp., a relative of Plasmodium species. They can - based on the genome of this one species - confirm previous findings that Hepatocystis is rendering the genus Plasmodium paraphyletic. This means that a common ancestor of species of the two genera is found more recently than common ancestors with e.g. P. falciparum, the causative agent of Malaria tropica. In contrast to the other species in this common clade, species in the genus Hepatocystis are believed to have lost asexual stages (schizonts) replicating in red blood cells.

From mere contamination of the red colobus monkey's genome Aunin et al. construct a parasite genome sufficiently complete to conclude on even the absence of gene families as a "true negative" finding: Reticulocyte binding proteins (RBPs), a gene family involved in schizogony (the process producing erythrocytic schizonts), is found absent from the Hepatocystis sp. genome.

Aunin et al. employ a "custom annotation" derived from single cell transcriptomics. Briefly, data from a project called the malaria cell atlas (MCA) is used to classify Plasmodium genes regarding their expression in particular life cycle stages. Using this classification Aunin et al. demonstrate the absence (significant under-representation, more technically) of orthologues of "schizogony genes" in genome and transcriptome assemblies of Hepatocystis sp. Taken together, the absence of "blood schizogony genes" in relatively complete genome assemblies, and the lack of expression of those genes in blood, the tissue those genes are expected to be expressed most in, provides sufficient evidence for absence. This delivers genomic correlates with the absence of blood schizogony. As this process is causing most virulence of Plasmodium species in their vertebrate hosts, this is an important finding.

[1] Florian P. Breitwieser, Mihaela Pertea, Aleksey V. Zimin, and Steven L. Salzberg. Human contamination in bacterial genomes has created thousands of spurious proteins. Genome Res. June 2019 29: 954-960; Published in Advance May 7, 2019, doi:10.1101/gr.245373.118

[2] Clementine M. Francois, Faustine Durand, Emeric Figuet and Nicolas Galtier. Prevalence and Implications of Contamination in Public Genomic Resources: A Case Study of 43 Reference Arthropod Assemblies. G3: Genes, Genomes, Genetics February 1, 2020 vol. 10 no. 2 721-730; https://doi.org/10.1534/g3.119.400758

[3] Georgios Koutsovoulos, Sujai Kumar, Dominik R. Laetsch, Lewis Stevens, Jennifer Daub, Claire Conlon, Habib Maroon, Fran Thomas, Aziz A. Aboobaker, Mark Blaxter. No evidence for extensive horizontal gene transfer in the genome of the tardigrade Hypsibius dujardini. Proceedings of the National Academy of Sciences May 2016, 113 (18) 5053-5058; DOI: 10.1073/pnas.1600338113

[4] Borner, J., Burmester, T. Parasite infection of public databases: a data mining approach to identify apicomplexan contaminations in animal genome and transcriptome assemblies. BMC Genomics 18, 100 (2017). https://doi.org/10.1186/s12864-017-3504-1

**********

Part II – Major Issues: Key Experiments Required for Acceptance

Please use this section to detail the key new experiments or modifications of existing experiments that should be absolutely required to validate study conclusions.

Generally, there should be no more than 3 such required experiments or major modifications for a "Major Revision" recommendation. If more than 3 experiments are necessary to validate the study conclusions, then you are encouraged to recommend "Reject".

Reviewer #1: (No Response)

Reviewer #2: None.

**********

Part III – Minor Issues: Editorial and Data Presentation Modifications

Please use this section for editorial suggestions as well as relatively minor modifications of existing data that would enhance clarity.

Reviewer #1: (No Response)

Reviewer #2: Individual points in the revision after my first review (all my points I don't comment on have been fully addressed, I also comment on one more than fully addressed point) :

- "We had some doubts about the use of HexPt and have altered this throughout the manuscript to the less intrusive Hepatocystis sp." I find this a good decision. However, I still find a few occasions of "Hepatocystis" (without the "sp.") referring clearly to only the one species studied. E.g. "In fact, we could not identify any RBPs in the Hepatocystis genome or Hepatocystis RNA-seq assemblies." It's probably pedantic, but I would appreciate clearly referring to the one species studied in those and similar cases using the "sp.". There are other occurrences where Hepatocystis clearly is meant as a genus, and a "twilight zone" where either the one species is meant or an extrapolation to mean the whole genus is made. Clearly the decision where this extrapolation is suitable can be left to the authors.

- I find that the 2D density plots integrating coverage profiles with density distributions of GC content add information. I assume the high coverage - low GC contigs are from the apicoplast genome of Hepatocystis sp. I fully understand that the authors want to use the figure in its present form to indicate what they used in this study as sufficient to sort out Hepatocystis contigs. I only remark that the visualisation of the very prominent high coverage apicoplast contigs could help guide others to detect apicomplexan "genome contaminants". This might be relevant in apicomplexans with higher GC content.

- "We have redrawn Figure 2B from the MCA paper, using the supplementary data and added the observed/expected ratios used in calculating the clusters with unexpectedly high levels of missingness in Hepatocystis. This has been added as a new Figure 3 in the main manuscript." It's rather Figure 4, not 3. I find this to be a very nice and informative figure now. I much appreciate how the enrichment scores (great to use log ratios!) are aligned with the heatmap columns! Much better than what I imagined when I recommended a figure in this style. Absolutely great figure.

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Juliane Schaer

Reviewer #2: Yes: Emanuel Heitlinger

Acceptance letter

Xin-zhuan Su, Tim JC Anderson

15 Jul 2020

Dear Dr. Reid,

We are delighted to inform you that your manuscript, "Genomic and transcriptomic evidence for descent from Plasmodium and loss of blood schizogony in Hepatocystis parasites from naturally infected red colobus monkeys," has been formally accepted for publication in PLOS Pathogens.

We have now passed your article onto the PLOS Production Department who will complete the rest of the pre-publication process. All authors will receive a confirmation email upon publication.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any scientific or type-setting errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript. Note: Proofs for Front Matter articles (Pearls, Reviews, Opinions, etc...) are generated on a different schedule and may not be made available as quickly.

Soon after your final files are uploaded, the early version of your manuscript, if you opted to have an early version of your article, will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting open-access publishing; we are looking forward to publishing your work in PLOS Pathogens.

Best regards,

Kasturi Haldar

Editor-in-Chief

PLOS Pathogens

orcid.org/0000-0001-5065-158X

Michael Malim

Editor-in-Chief

PLOS Pathogens

orcid.org/0000-0002-7699-2064

Associated Data


    Supplementary Materials

    S1 Fig. Conservation of synteny in the core regions of the assembly.

    ACT (Artemis Comparison Tool) screenshot showing a comparison of centromere-proximal regions of Hepatocystis scaffold 132, P. falciparum 3D7 (Pf3D7) chromosome 4 and P. vivax (PvP01) chromosome 5. The red blocks represent sequence similarity (tBLASTx). The centromere is shown in green. Coloured boxes represent genes. The graph shows the GC-content.

    (TIF)

    S2 Fig. Organization of putative subtelomeric regions of Hepatocystis scaffold 67, scaffold 211, P. knowlesi H chromosome 4 and P. falciparum 3D7 chromosome 9.

    Exons are shown in coloured boxes with introns as linking lines. ‘//’ represents a gap. The shaded/grey areas in P. knowlesi and P. falciparum mark the start of the conserved, syntenic regions to other Plasmodium species. The presence of genes that are subtelomeric in Plasmodium species, i.e. PHIST proteins, suggests that the Hepatocystis scaffolds are also subtelomeric. A complete subtelomere that includes telomeric repeats is missing in our Hepatocystis assembly. Thus, whether Hepatocystis chromosomes retain the organisation common to most Plasmodium species remains unclear.

    (TIF)

    S3 Fig. Phylogenetic tree of Haemosporidian mitochondrial proteins.

    Hepatocystis sp. ex Piliocolobus tephrosceles (this work, marked with red arrow) appears next to a previously sequenced Hepatocystis sample from the flying fox Pteropus hypomelanus (NCBI accession FJ168565.1). Branches of the tree have been coloured by bootstrap support values from 45 (red) to 100 (green). Bootstrap values below 100 have also been added to the figure as text.

    (TIF)

    S4 Fig. Phylogenetic tree of 18 apicoplast protein sequences of Plasmodium spp. and Hepatocystis.

    Branches of the tree have been coloured by bootstrap support values from 66 (red) to 100 (green). Bootstrap values below 100 have also been added to the figure as text.

    (TIF)

    S5 Fig. Phylogenetic tree of 11 nuclear genes of Hepatocystis and Plasmodium species.

    Genes of Hepatocystis sp. ex Piliocolobus tephrosceles are highly similar to Hepatocystis epomophori genes sequenced in a different study [2]. The tree is based on the following genes: splicing factor 3B subunit 1, tubulin gamma chain, DNA polymerase delta catalytic subunit, eukaryotic translation initiation factor 2 gamma subunit, T-complex protein 1 subunit alpha, pantothenate transporter, ribonucleoside-diphosphate reductase large subunit, aminophospholipid-transporting P-ATPase, GCN20, transport protein Sec24A and RuvB-like helicase 3. Branches of the tree have been coloured by bootstrap values from 73 (red) to 100 (green). Bootstrap values below 100 have also been added to the figure as text. The red arrow points to the Hepatocystis sample from the current study.

    (TIF)

    S6 Fig. Deconvolution using CIBERSORT and the Malaria Cell Atlas accurately determines the presence and absence of different Plasmodium life stages in bulk RNA-seq data.

    (A) Pre-defined mixtures of pseudobulk RNA-seq data were deconvoluted with very high accuracy. (B) Real samples of P. berghei bulk RNA-seq from Otto et al. (2014) were deconvoluted, showing almost pure mixtures of gametocyte, ookinete or asexual stages as expected. The low proportions of expected parts of the IDC in each asexual sample may result from differences between what the MCA defines as a ring/trophozoite/schizont and what would microscopically be defined as such.

    (TIF)

    S7 Fig. Multiple sequence alignments of two Hepatocystis-specific gene families.

    (A) Alignment of Hepatocystis-specific gene family 1 (hep1). Pseudogenes (HEP_00099300, HEP_00250500, HEP_00323900) were not included in the alignment. HEP_00353700 is 476 amino acids long and was truncated here. (B) Alignment of Hepatocystis-specific gene family 2 (hep2). This gene family contains a PEXEL motif (marked with a black box). Pseudogenes (HEP_00165000, HEP_00165200, HEP_00324000, HEP_00489100) were not included in the alignment.

    (TIF)

    S8 Fig. Heatmaps of Hepatocystis gene family expression in the blood of its mammalian host.

    (A) Expression levels (log vst-normalised) of hep1 genes across blood samples from multiple red colobus monkeys. The estimated proportions of early blood stages (rings/trophozoites) and mature gametocytes are highlighted above. (B) Expression levels of hep2 genes. (C) Expression levels of pir genes.

    (TIF)

    S9 Fig. Heatmap of pir protein subfamilies in Hepatocystis and Plasmodium species.

    Rows correspond to species and columns correspond to pir subfamilies. The columns have been ordered by the number of sequences in each subfamily and the order of rows is approximately based on phylogeny. Colours represent the numbers of proteins belonging to each subfamily for each species. All Hepatocystis pir proteins belong to the only subfamily conserved across all these species [114] (indicated with red arrow).

    (TIF)

    S10 Fig. Some orthologues missing in Hepatocystis sp. relative to Plasmodium species show common gene expression patterns across the Plasmodium life cycle.

    (A) Malaria Cell Atlas (MCA) gene cluster 10 represents genes highly expressed in late schizonts. 25 genes from this cluster were conserved in P. ovale wallikeri and P. vivax, but were missing from our Hepatocystis genome assembly. Genes were clustered here by expression pattern and single-cells were ordered by pseudotime as in [20]. (B) MCA cluster 4 represents genes highly expressed across much of the life cycle—liver stages, trophozoites, female gametocytes and ookinetes/oocysts. 27 genes from this cluster were conserved in P. ovale wallikeri and P. vivax, but were missing from our Hepatocystis genome assembly.

    (TIF)

    S11 Fig. Alignment and phylogenetic tree of putative RBP-related gene fragment in Haemoproteus columbae.

    (A) Alignment of the translation of a sequence (GGWD01016989.1) from Haemoproteus columbae transcriptome assembly (GenBank: GGWD00000000.1) [23] with Plasmodium reticulocyte-binding proteins (RBPs) from PlasmoDB [115]. The alignment has been cropped to the length of the Haemoproteus columbae sequence. (B) Phylogenetic tree of Plasmodium RBPs and the Haemoproteus columbae sequence GGWD01016989.1, based on the alignment in panel A. Branch colours indicate bootstrap support values, from 33 (red) to 100 (green).

    (TIF)

    S12 Fig. Distributions of Hepatocystis dN values in Malaria Cell Atlas (MCA) clusters.

    Hepatocystis dN was calculated in a 3-way comparison between Hepatocystis, P. berghei ANKA and P. ovale curtisi using codeml. The Malaria Cell Atlas clusters have been described in Fig 2B in the article on Malaria Cell Atlas [20]. (A) Hepatocystis genes with dN in the top 5%: observed versus expected ratios for Malaria Cell Atlas clusters. Hepatocystis genes that correspond to Malaria Cell Atlas clusters 2, 4 and 6 have fewer genes with dN rank in the top 5% than expected by chance (Fisher exact test p-value < 0.05). None of the MCA clusters contain significantly more genes ranked in the top 5% of dN than expected by chance, although there is a trend towards clusters 15 and 16 having higher dN. (B) Boxplot of all Hepatocystis dN values for each Malaria Cell Atlas cluster. Distribution of values in clusters 15 and 16 differs from the rest of the clusters. Kolmogorov-Smirnov test statistics are the following. Cluster 15 vs all other clusters: D = 0.42, p-value = 1.05e-05. Cluster 16 vs all other clusters: D = 0.52, p-value = 4.50e-12. Clusters 15 and 16 combined vs all other clusters: D = 0.46, p = 2.33e-15.

    (TIF)
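The observed versus expected ratios and the Fisher exact test named in the S12 Fig caption can be illustrated with a minimal sketch. The layout of the 2x2 table (cluster membership against top-5% dN membership), the function names and the example counts are assumptions made for illustration only; the authors' own code is in their GitHub repository and may well differ.

```python
from math import comb


def obs_exp_ratio(top_in_cluster, cluster_size, top_total, gene_total):
    """Observed/expected ratio of top-dN genes within one cluster.

    Expected count assumes top-dN genes are spread evenly across clusters.
    """
    expected = cluster_size * top_total / gene_total
    return top_in_cluster / expected


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def weight(x):
        # Unnormalised probability of a table whose top-left cell is x.
        return comb(row1, x) * comb(row2, col1 - x)

    w_obs = weight(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    extreme = sum(w for w in map(weight, range(lowest, highest + 1)) if w <= w_obs)
    return extreme / total


# Hypothetical counts: a cluster of 100 genes, 1 of which is in the top 200
# of 4009 genes analysed (the cluster size and the count of 1 are made up).
ratio = obs_exp_ratio(1, 100, 200, 4009)
p_value = fisher_exact_two_sided(1, 99, 199, 3710)
print(ratio, p_value)
```

A ratio below 1 means fewer top-dN genes than expected by chance, which is the direction the caption reports for clusters 2, 4 and 6.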

    S1 Table. Summary of gene properties.

    For each gene in the assembly, the following is listed: annotation, number of exons, gene length (bp), the presence or absence of start and stop codons (reflecting the completeness of the assembly of the gene) and RNA-seq expression level (mean FPKM with standard deviation) in sample SAMN07757854 (RC106R). For the proteins encoded by the genes, the table shows the number of transmembrane segments predicted by TMHMM, ExportPred 2 score, 1 to 1 orthologs in P. berghei ANKA and P. ovale curtisi GH01 (based on OrthoMCL), PFAM domains, the number of matches to PEXEL motif (RxLxE/Q/D) and SignalP-5 signal peptide prediction.

    (XLSX)
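The PEXEL motif count listed in S1 Table (pattern RxLxE/Q/D) can be sketched as a regular expression search. This is only an illustration: the function name and the example sequences are hypothetical, and whether the authors counted overlapping matches or restricted the search to part of the protein is not stated in the table description.

```python
import re

# RxLxE/Q/D: arginine, any residue, leucine, any residue, then E, Q or D.
# The lookahead lets overlapping occurrences be counted (an assumption).
PEXEL = re.compile(r"(?=R.L.[EQD])")


def count_pexel(protein):
    """Count matches to the RxLxE/Q/D pattern in a protein sequence."""
    return sum(1 for _ in PEXEL.finditer(protein.upper()))


# Made-up sequences, not taken from the assembly.
print(count_pexel("MKRKLSEQAA"))
print(count_pexel("MAAAGGG"))
```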

    S2 Table. Raw and normalised Hepatocystis gene expression data.

    (XLSX)

    S3 Table. Plasmodium orthologues missing in the Hepatocystis genome assembly.

    Plasmodium berghei genes that have an orthologue in P. ovale curtisi or P. vivax but not in the Hepatocystis sp. DNA (A) or RNA-seq (B) assemblies, and that are enriched in Malaria Cell Atlas gene clusters.

    (XLSX)

    S4 Table. Genes with Hepatocystis dN rank in the top 5% in codeml 3-way comparison between Hepatocystis, P. berghei ANKA and P. ovale curtisi GH01.

    The total number of genes in dN analysis was 4009, out of which 200 correspond to 5%. The table includes Malaria Cell Atlas cluster numbers for each gene.

    (XLSX)
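The S4 Table description states that 200 of the 4009 genes in the dN analysis correspond to the top 5%. A minimal sketch of that selection follows, assuming the cutoff is rounded down (5% of 4009 is 200.45) and genes are ranked by descending dN; the function name, the rounding rule and the toy values are assumptions, not details given in the source.

```python
from math import floor


def top_fraction(dn_by_gene, fraction=0.05):
    """Return gene IDs with the highest dN, keeping floor(n * fraction) genes."""
    n_top = floor(len(dn_by_gene) * fraction)
    ranked = sorted(dn_by_gene, key=dn_by_gene.get, reverse=True)
    return ranked[:n_top]


# Toy input with made-up dN values; real values would come from codeml output.
toy = {"gene_a": 0.10, "gene_b": 0.90, "gene_c": 0.50, "gene_d": 0.20}
print(top_fraction(toy, fraction=0.5))
```

How ties at the cutoff are handled is not described in the source; this sketch simply keeps whichever tied gene sorts first.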

    Attachment

    Submitted filename: Hepatocystis response 200520.docx

    Data Availability Statement

    The Hepatocystis sp. assembly can be retrieved from the European Nucleotide Archive, under the study PRJEB32891 and sample accession number ERS3649919. The individual accession numbers for the contigs are: CABPSV010000001-CABPSV010002439. Accession numbers for the apicoplast and the mitochondrion are LR699571-LR699572. Illumina HiSeq 4000 RNA-seq reads, containing a mix of Piliocolobus tephrosceles and Hepatocystis sp. sequences can be found in the European Nucleotide Archive under study accession PRJNA413051. Other data and code are available from our GitHub repository: https://github.com/adamjamesreid/hepatocystis-genome.


    Articles from PLoS Pathogens are provided here courtesy of PLOS
