Abstract
Background: Plasmodium cynomolgi, a non-human primate malaria parasite species, has been an important model parasite since its discovery in 1907. Similarities in the biology of P. cynomolgi to the closely related, but less tractable, human malaria parasite P. vivax make it the model parasite of choice for liver biology and vaccine studies pertinent to P. vivax malaria. Molecular and genome-scale studies of P. cynomolgi have relied on the current reference genome sequence, which remains highly fragmented with 1,649 unassigned scaffolds and little representation of the subtelomeres.
Methods: Using long-read sequence data (Pacific Biosciences SMRT technology), we assembled and annotated a new reference genome sequence, PcyM, sourced from an Indian rhesus monkey. We compare the newly assembled genome sequence with those of several other Plasmodium species, including a re-annotated P. coatneyi assembly.
Results: The new PcyM genome assembly is of significantly higher quality than the existing reference, comprising only 56 pieces, no gaps and an improved average gene length. Detailed manual curation has ensured a comprehensive annotation of the genome with 6,632 genes, nearly 1,000 more than previously attributed to P. cynomolgi. The new assembly also has an improved representation of the subtelomeric regions, which account for nearly 40% of the sequence. Within the subtelomeres, we identified more than 1,300 Plasmodium interspersed repeat (pir) genes, as well as a striking expansion of 36 methyltransferase pseudogenes that originated from a single copy on chromosome 9.
Conclusions: The manually curated PcyM reference genome sequence is an important new resource for the malaria research community. The high quality and contiguity of the data have enabled the discovery of a novel expansion of methyltransferase pseudogenes in the subtelomeres, and illustrate the new comparative genomics capabilities that are being unlocked by complete reference genomes.
Keywords: P. cynomolgi, PacBio assembly, P. coatneyi, methyltransferase
Introduction
Plasmodium cynomolgi, a non-human primate malaria parasite first mentioned by Mayer in 1907 1 and established as a separate species from P. inui by Mulligan in 1935 2, has been used as a model parasite species since its discovery. First used to establish the level of susceptibility of Malaysian Anophelines to non-human primate malaria 3, P. cynomolgi forms hypnozoites (a dormant liver stage), similar to those of human-infective P. vivax and P. ovale species. Other shared characteristics between P. cynomolgi and P. vivax include erythrocyte morphology (e.g. Schüffner's stippling), amoeboidity and the tertian periodicity of intraerythrocytic asexual development (48 h life cycle). P. cynomolgi is thus regarded as a powerful model for P. vivax and potentially P. ovale human malaria. The use of P. cynomolgi as a model organism is further reinforced by it being readily infective to and transmitted by a large number of mosquito species 4–7, and by having a wide range of natural 8–10 and experimental hosts 3, 11.
A particular strength of the P. cynomolgi system is access to chronic infections and to the developing and dormant liver stages in a parasite similar to P. vivax. An in vivo-in vitro shuttle system for the study of P. cynomolgi liver stages 12 is being exploited to better understand hypnozoite biology using molecular tools and genome-scale approaches, which rely on the availability of a complete and well-annotated P. cynomolgi reference genome sequence. However, the current P. cynomolgi B reference is very fragmented 13, and lacks large parts of the subtelomeric regions, which are thought to harbour genes involved in host-parasite interactions. Other closely related malaria parasite species have been sequenced, including P. coatneyi 14, which is closely related to P. knowlesi, and P. simiovale, which was sequenced but never systematically assembled 15.
In this paper, we describe the improved genome sequence assembly of the P. cynomolgi M strain and compare it to the genomes of five other Plasmodium species (P. vivax, P. falciparum, P. knowlesi, P. coatneyi, P. simiovale) that infect humans or monkeys, to uncover similarities and differences that may inform future studies aimed at harnessing P. cynomolgi as a model for P. vivax human malaria.
Methods
Samples
DNA was obtained from a blood stage infection of an Indian rhesus macaque donor with P. cynomolgi M strain stocks originally provided by Dr. Bill Collins from the Center for Disease Control, Atlanta. After PlasmodiPur filtration, parasites were matured in vitro overnight. Parasites were purified over a 15.1% (w/v) Nycodenz gradient and DNA was isolated using the Gentra Puregene Blood kit (Qiagen) and processed according to the manufacturers’ instructions. The material was handled carefully in order to ensure the integrity of the DNA was maintained.
Ethical approval
Ethical approval for the donor infection was provided under DEC750 following Dutch and European legislation in terms of animal experimentation. Prior to the start of the experiment, ethical approval for the donor monkey infection was provided by the local independent ethical committee, complying with Dutch law (BPRC Dier Experimenten Commissie, DEC; agreement number DEC# 750). The monkey was healthy as assessed by a veterinarian and as determined by clinical and hematological parameters measured before the start of the experiment. The experiment was performed according to Dutch and European laws. The Council of the Association for Assessment and Accreditation of Laboratory Animal Care (AAALAC International) has awarded BPRC full accreditation. Thus, BPRC is fully compliant with the international demands on animal studies and welfare as set forth by the European Council Directive 2010/63/EU, and Convention ETS 123, including the revised Appendix A as well as the ‘Standard for humane care and use of Laboratory Animals by Foreign institutions’ identification number A5539-01, provided by the Department of Health and Human Services of the United States of America’s National Institutes of Health (NIH) and Dutch implementing legislation.
The donor monkey (Macaca mulatta, male, age 5 years, Indian origin) used in this study was captive-bred and socially housed. Animal housing was according to international guidelines for nonhuman primate care and use. Besides the standard feeding regime, and drinking water ad libitum via an automatic watering system, the animal followed an environmental enrichment program in which, next to permanent and rotating non-food enrichment, an item of food enrichment was offered to the macaque daily. Parasitemia was monitored by thigh pricks, each followed by a reward. The intravenous injection and large blood collection were performed under ketamine sedation, and all efforts were made to minimize any suffering of the animal. The monkey was monitored daily for health and discomfort. Immediately after blood collection, the monkey was cured of malaria by intramuscular injection of chloroquine (7.5 mg/kg, on 3 consecutive days), and the absence of parasites was verified two weeks after treatment by microscopy of Giemsa-stained slides of thigh prick blood.
Sequencing, assembly and annotation of P. cynomolgi
Genomic DNA was sheared into 250–350 base-pair fragments by focused ultrasonication (Covaris Adaptive Focused Acoustics technology; AFA Inc., Woburn, USA), and amplification-free Illumina libraries were prepared 16. Paired 76-base reads were generated on the Illumina GAII platform according to the manufacturer’s standard sequencing protocol.
We also generated a SMRTbell template library using the Pacific Biosciences issued protocol (20 kb Template Preparation Using BluePippin Size-Selection System). Five SMRT cells were sequenced on the PacBio RS II platform using P5 polymerase and the chemistry version 3 (C3/P5).
Raw sequence data were deposited in the European Nucleotide Archive under accession number ERP000298.
Sequence data from the SMRT cells were assembled with HGAP 17 (version 2.3.0), assuming an assembly size of 30 Mb. The resulting draft assembly was further improved using the IPA script ( https://github.com/ThomasDOtto/IPA), version 1.0.1. This script performs the following steps:
1) deletes small contigs,
2) identifies overlapping contigs with low Illumina coverage,
3) orders contigs against the P. vivax P01 reference using ABACAS2 18 (version 1),
4) corrects errors with Illumina reads using iCORN2 19 (version 0.95),
5) circularizes the two plastid genomes with Circlator 20 (version 0.12.0); and
6) renames the chromosomes and contigs.
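To illustrate the first of these steps only, the following is a minimal Python sketch of length-based contig filtering. It is not the IPA implementation: the function names and the 5 bp threshold applied to the toy input are hypothetical, because the size cut-off used by the script is not stated here.

```python
from typing import Dict, Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def drop_small_contigs(records, min_length: int) -> Dict[str, str]:
    """Keep only contigs whose sequence length is at least min_length."""
    return {name: seq for name, seq in records if len(seq) >= min_length}


# Toy input; a real run would iterate over the lines of the draft assembly file.
toy = [">contig1 draft", "ACGT", "ACGT", ">contig2", "AC", ">contig3", "ACGTA"]
kept = drop_small_contigs(parse_fasta(toy), min_length=5)  # hypothetical threshold
print(sorted(kept))
```

In practice the threshold would be chosen relative to the read length and the expected size of the smallest genuine sequence (here, the mitochondrial genome of about 6 kb).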
Draft genome annotation was transferred from P. vivax P01 using RATT 21 (version 1), and supplemented with the output of the Augustus 22 gene finder, trained on P. vivax P01 as described in 23. This was followed by manual curation of the gene models in Artemis 24 (version from January 2015).
Re-annotation of P. coatneyi
The published P. coatneyi genome assembly 14 (accession numbers CP016239 to CP016252 from NCBI) contains several large open reading frames that appear to correspond to coding sequences, especially in the subtelomeric regions. Using the reference genomes of P. vivax P01 and P. knowlesi, we re-annotated P. coatneyi using Companion 25 (version 1.0.1). Default settings were used, with the exception of a cut-off of 0.2 for the “Augustus” parameter.
Analysis of P. simiovale
Short reads of P. simiovale were obtained from the SRA 15 (accession number SRR826495). The reads were assembled with MaSuRCA 26 (version 2.1.0), improved with PAGIT 27 (version 1) and annotated with Companion 25 (version 1.0.1), reference P. vivax P01 and default settings.
OrthoMCL
To identify orthologues, genes from the following eleven genome sequences were clustered using OrthoMCL 28 (version 1.4): the present P. cynomolgi M, P. vivax P01 29, P. falciparum 3D7 30, P. reichenowi CDC 31, the re-annotated P. coatneyi, the rodent malaria parasites (P. yoelii, P. chabaudi and P. berghei 32), P. knowlesi 33, P. malariae and P. ovale curtisi 34. We used the May 2016 version of the genome annotations, taken from GeneDB 35. The amino acid sequences were compared using a BLASTp all-against-all, with an E-value cut-off of 1e-6, and a Perl script ascribed the gene functions to each gene ID.
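The E-value cut-off described above can be sketched as a simple filter over tabular BLAST output. This is an illustration only, not the pipeline used in this work: it assumes the standard 12-column tabular layout (E-value in the 11th column), and the gene IDs in the toy input are invented.

```python
def filter_blast_hits(lines, max_evalue=1e-6):
    """Return (query, subject, evalue) for non-self hits passing the cut-off.

    Assumes 12-column tabular BLAST output with the E-value in column 11.
    """
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if query != subject and evalue <= max_evalue:
            hits.append((query, subject, evalue))
    return hits


# Toy all-against-all output with hypothetical gene IDs.
toy_blast = [
    "geneA\tgeneA\t100.0\t300\t0\t0\t1\t300\t1\t300\t0.0\t600",
    "geneA\tgeneB\t85.0\t290\t40\t1\t1\t290\t5\t294\t1e-50\t410",
    "geneA\tgeneC\t30.0\t60\t42\t0\t10\t69\t3\t62\t0.002\t35",
]
passing = filter_blast_hits(toy_blast)
print(passing)
```

Only the geneA/geneB pair survives: the self hit is dropped and the geneA/geneC hit has an E-value above 1e-6.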
MSP analysis
All the genes annotated as ‘merozoite surface protein’ from P. falciparum, P. reichenowi CDC, P. ovale curtisi, P. malariae, P. cynomolgi M, P. vivax P01, P. coatneyi and P. knowlesi were selected and compared with a BLASTp (E-value 1e-6 -F F). The results were visualized with Gephi 36 (version 0.9.1). Genes that clustered together in that analysis were aligned with MAFFT 37 (version 7.205, parameter --auto). The alignment was trimmed with Gblocks 38 (version 0.91b) in Seaview 39 (version 4.6.1) and the tree was built with RAxML 40 (version 8.0.24) using the PROTGAMMAGTR model and a bootstrap of 100. Visualization was done in FigTree 41 (version 1.4.2).
Methyltransferases
Genes with the product ‘methyltransferase’ were all selected as nucleotide sequences. A selection of these genes, based on sequence similarity, was aligned with MAFFT. The phylogenetic tree was generated as for the MSP tree, using the PROTGAMMAIGTR model. Potential transposons were analysed with http://www.girinst.org 42 (using the RepbaseSubmitter section).
PIR analysis
The amino acid sequences of the Plasmodium interspersed repeat (pir) genes were extracted from five genomes (PcyM, P. vivax P01, P. coatneyi, P. ovale curtisi and P. knowlesi). First, low complexity sequences were trimmed with seg 43. Next, proteins smaller than 250 aa were excluded. A BLASTp all-against-all comparison was run (E-value 1e-6, -F F, allowing for up to 4500 hits). The results were visualized in Gephi 36, clustered with the force field and the Fruchterman-Reingold algorithm. We also clustered the pir genes from the same BLAST with TribeMCL 44, using an inflation coefficient of 1.5.
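As a rough sketch of how a gene-gene network can be derived from such a comparison, the Python fragment below applies the 250 aa length filter and collapses reciprocal hits into undirected edges. It omits the seg trimming step, the function name and protein IDs are hypothetical, and it does not reproduce the Gephi layout or the TribeMCL clustering.

```python
def pir_edges(proteins, hits, min_length=250):
    """Build sorted undirected edges between proteins of >= min_length residues.

    proteins: dict mapping protein ID to amino acid sequence
    hits: iterable of (query, subject) pairs that passed the BLAST cut-off
    """
    long_enough = {name for name, seq in proteins.items() if len(seq) >= min_length}
    edges = set()
    for query, subject in hits:
        if query == subject:
            continue  # ignore self hits
        if query in long_enough and subject in long_enough:
            # Sorting the pair makes A-B and B-A the same edge.
            edges.add(tuple(sorted((query, subject))))
    return sorted(edges)


# Toy data with invented IDs and artificial sequences.
toy_proteins = {"pirA": "M" * 300, "pirB": "M" * 250, "pirC": "M" * 120}
toy_hits = [("pirA", "pirB"), ("pirB", "pirA"), ("pirA", "pirC"), ("pirA", "pirA")]
network = pir_edges(toy_proteins, toy_hits)
print(network)
```

Each resulting pair could then be written out as one row of an edge list for a network viewer.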
Results and discussion
Improved genome assembly and annotation
The existing P. cynomolgi reference (B strain, henceforth referred to as PcyB) is highly fragmented, with 1,649 unassigned scaffolds. We generated a new reference genome sequence (P. cynomolgi M strain, PcyM) using high-depth (>100x) Pacific Biosciences long-read sequence data and further improved it with Illumina sequencing reads. The new PcyM assembly is significantly larger than the PcyB assembly (31 versus 26 Mb; see Table 1), more contiguous (N90 of 370 kb versus 3.9 kb), and has no sequencing gaps (0 versus 1,943 gaps). The unassigned scaffolds have been reduced from 1,649 in PcyB to just 40 in the new PcyM assembly (see Figure 1).
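For readers unfamiliar with the N90 statistic quoted above, it is the length L such that sequences of length L or longer together contain at least 90% of the assembled bases. A minimal Python sketch follows; the function name and the toy lengths are illustrative and are not taken from either assembly.

```python
def nx(lengths, percent):
    """Return the Nx statistic for a list of sequence lengths.

    Sequences are visited from longest to shortest; the length at which the
    running total first reaches `percent` % of all bases is returned.
    """
    if not lengths:
        raise ValueError("no sequence lengths supplied")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer arithmetic avoids floating-point rounding at the boundary.
        if running * 100 >= percent * total:
            return length
    return min(lengths)  # only reached if percent exceeds 100


toy_lengths = [100, 80, 60, 40, 20]  # toy values, total = 300
n50 = nx(toy_lengths, 50)
n90 = nx(toy_lengths, 90)
print(n50, n90)
```

A higher N90 indicates that most of the assembly sits in long sequences, which is why the value rises sharply when thousands of short scaffolds are replaced by a few dozen contigs.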
Table 1. Comparison of P. cynomolgi M, P. cynomolgi B and P. vivax P01 genome features.
Genome features | PcyM | PcyB a | PvP01 b |
---|---|---|---|
Nuclear genome | | | |
Assembly size (Mb) | 30.6 | 26.2 | 29.0 |
Coverage (fold) | >150 | 161 | 212 |
G + C content (%) | 37.3 | 40.4 | 39.8 |
No. contigs assigned to chrom. | 14 | 14 | 14 |
No. unassigned contigs | 40 | 1,649 | 226 |
No. sequencing gaps | 0 | 1,943 | 560 |
No. genes c | 6,632 | 5,722 | 6,642 |
Average gene length (bp) d | 758 | 622 | 741 |
No. pir genes | 1,373 | 265 | 1,212 |
Mitochondrial genome c | | | |
Assembly size (bp) | 6,017 | 5,986 | 5,989 |
G + C content (%) | 30.3 | 30.3 | 30.5 |
Apicoplast genome | | | |
Assembly size (kb) | 34.5 | 29.3 | 29.6 |
G + C content (%) | 14.2 | 13.0 | 13.3 |
No. genes | 30 | 23 | 30 |
a, b: Published sequences.
c: Including pseudogenes and partial genes, excluding non-coding RNA genes.
d: Based on 1:1 orthologues.
These improvements in contiguity and reduction of gaps had a large impact on the quality of the gene models. Overall, genes in PcyM are similar in size to their orthologues in P. vivax P01, while those in PcyB are around 20% shorter. In terms of annotation, 966 new genes were found in the PcyM assembly compared to PcyB, with most of these genes being found in the subtelomeres (see Table 2). The new genes, however, also include 119 genes that are 1:1 orthologous to genes in P. vivax. Due to the manual curation, 12% more genes have been assigned a gene function in the new assembly. These systematic improvements make the PcyM genome sequence a better reference for the community to use when studying the biology of P. cynomolgi and relapsing malaria parasites in general.
The genome sequences were obtained from samples that were originally described as being two different strains, Mulligan (M strain) and Bastianelli (B-strain). However, a genome-wide comparison of the gene repertoires reveals that 67% of the 1:1 orthologues are identical, which is much more than the number of identical genes observed (32%) between two P. vivax isolates (P01 versus C01). This is in line with the findings in the original publication describing the PcyB genome assembly 13, suggesting that the two strains are likely derived from the same isolate. This was further confirmed by a recent study that analysed the diversity of several P. cynomolgi isolates 45. Although the authors proposed to call the isolate M/B, we will use the M(ulligan) nomenclature for continuity.
Table 2. Number of gene members of different (subtelomeric) multigene families in the genomes of P. cynomolgi B, P. cynomolgi M and P. vivax P01.
Gene family (subtelomeric genes *) | PcyM | PcyB ** | PvP01 ** | Other (previous) names |
---|---|---|---|---|
PIR protein | 1373 | 265 | 1212 | vir-like, kir-like |
tryptophan-rich protein | 39 | 36 | 40 | Pv-fam-a, TRAG, tryptophan-rich antigen |
methyltransferase, pseudogene | 36 | 26 *** | 0 | |
lysophospholipase | 8 | 9 | 10 | PST-A protein |
STP1 protein | 51 | 3 | 10 | PvSTP1 |
early transcribed membrane protein (ETRAMP) | 9 | 9 | 9 | |
Plasmodium exported protein (PHIST), unknown function | 54 | 48 | 84 | Phist protein (Pf-fam-b), RAD protein (Pv-fam-e) |
reticulocyte binding protein | 6 | 8 | 9 ** | reticulocyte-binding protein, RBP |
exported protein **** | 276 | 175 | 447 | |
Key:
*Numbers including pseudogenes and partial genes
**Published sequence
***annotated as hypothetical protein
****ExportPred
OrthoMCL clustering
To look for conserved orthologues between species, an OrthoMCL 28 clustering of genes from eleven genome assemblies was performed (see Methods and Supplementary Table 1). We used the clustering to look further into genes potentially involved in the formation and development of the dormant hypnozoite stage. There are 103 gene clusters (see Figure 2) that are common to the relapsing parasites, but absent in P. knowlesi and P. coatneyi. Of these, 73 gene clusters are uniquely shared between P. vivax P01, PcyM and P. ovale curtisi GH01. The remaining 30 clusters are either shared with various combinations of the other nine parasite species (see Supplementary Table 1) or only with P. malariae (20 out of the 30 clusters).
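The selection of clusters described above amounts to a set comparison on the species represented in each cluster. The sketch below shows one way to express it in Python; it assumes that "common to the relapsing parasites" means present in all three of P. vivax, P. cynomolgi and P. ovale curtisi, and the cluster IDs are invented for illustration.

```python
RELAPSING = {"P. vivax", "P. cynomolgi", "P. ovale curtisi"}
EXCLUDED = {"P. knowlesi", "P. coatneyi"}


def relapsing_clusters(clusters):
    """Return (shared, unique) lists of cluster IDs.

    shared: clusters containing all relapsing species and neither excluded species
    unique: the subset of shared clusters containing no other species at all
    """
    shared, unique = [], []
    for cluster_id, species in sorted(clusters.items()):
        present = set(species)
        if RELAPSING <= present and not (EXCLUDED & present):
            shared.append(cluster_id)
            if present == RELAPSING:
                unique.append(cluster_id)
    return shared, unique


# Toy cluster membership with hypothetical cluster IDs.
toy_clusters = {
    "c1": ["P. vivax", "P. cynomolgi", "P. ovale curtisi"],
    "c2": ["P. vivax", "P. cynomolgi", "P. ovale curtisi", "P. malariae"],
    "c3": ["P. vivax", "P. cynomolgi", "P. ovale curtisi", "P. knowlesi"],
    "c4": ["P. vivax", "P. cynomolgi"],
}
shared, unique = relapsing_clusters(toy_clusters)
print(shared, unique)
```

In the toy data, c1 and c2 are shared by the relapsing species, but only c1 is unique to them because c2 also contains P. malariae.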
The 73 clusters unique to the relapsing parasites include three tryptophan-rich protein clusters where the orthology is 1:1:1 with the exception of one cluster in which P. vivax presents an expansion to four genes; two PHIST protein (previously named RAD and Pv-fam-e) clusters containing 1:1:1 orthologs; 11 clusters featuring 1:1:1 orthologs annotated as ‘Plasmodium exported proteins’; three clusters of 1:1:1 hypothetical protein orthologs; one cluster annotated as MSP-7 or MSP-7-like; and 56 pir gene clusters showing different degrees of expansion in the three relapsing species. While their specificity is interesting, clusters corresponding to multigene families are probably less likely to have a direct function in dormancy. The hypothetical protein clusters (PcyM_0326800, PcyM_0423700 and PcyM_0904700), however, being specific to the three relapsing Plasmodium species, are intriguing, as is the MSP-like protein cluster.
Paralogous expansion of the merozoite surface protein (MSP) family. Although the specific function of the different merozoite surface proteins (MSPs) remains elusive, MSP-1 and MSP-3 are currently under evaluation as vaccine candidates. The OrthoMCL clustering shows that MSP-1, MSP-1 paralog, MSP-4, MSP-5, MSP-9 and MSP-10 are highly conserved and present across different Plasmodium species. MSP-2 and MSP-6 are present only in P. falciparum and P. reichenowi (see Figure 3A). In contrast, MSP-3 and MSP-7/7-like are highly expanded. MSP-3 is expanded in P. vivax, P. malariae, P. ovale and P. cynomolgi (see Figure 3B). Interestingly, while in P. malariae and P. ovale the MSP-3 paralogs seem to be species-specific, in P. cynomolgi, P. vivax, P. coatneyi and P. knowlesi many of the paralogs seem to predate speciation, indicating that MSP-3 duplicated in the common ancestor of the latter four species. These findings of MSP-3 expansions are in line with the finding of multi-allelic diversification reported previously 46, but also confirm the expansion in P. malariae and P. ovale. In addition to the pre-speciation expansion in P. cynomolgi, a species-specific expansion of MSP-3 genes (see area indicated with ‘*’ in Figure 3B) suggests ongoing evolutionary pressure on these genes.
We also observed an expansion of MSP-7/7-like genes. In the OrthoMCL clustering, the genes were distributed in nine different clusters: 108, 4913, 5404, 5550, 5065, 6376 and 5765–5767 (Supplementary Table 1). A phylogenetic tree of the MSP-7/7-like proteins revealed a complex evolutionary relationship (see Figure 3C), splitting the tree into three major clades. Across the tree we find paralogous expansions of different ages, some of which predate speciation. A particularly striking branch comprises only genes from the three hypnozoite-forming species. With the large number of genome sequences now available for different Plasmodium species, a complex pattern emerges in the MSP-7/7-like tree, suggesting that the different MSP-7 proteins likely have different functions.
Improved subtelomeres reveal insights into subtelomeric gene families
The new high-quality PcyM assembly has an improved representation of the subtelomeric regions of the genome, which now encompass nearly 40% of the genome sequence. Manual curation of the gene annotation enabled the complete set of subtelomeric genes to be resolved (see Table 2). In P. vivax, genes encoding the exported protein family ‘PHIST’, and exported proteins in general (as predicted by ExportPred 47), have paralogously expanded compared to P. cynomolgi (84 vs 54). It is tempting to speculate about the reason for the higher number of exported proteins in P. vivax. One hypothesis is that it could be due to differences between the blood cells of humans and those of non-human primates; another is that these proteins are involved in the regulation of genes involved in host-parasite interactions. In P. falciparum, it was suggested that PHISTb regulates var genes 48. In P. cynomolgi, we observed an expansion of the STP1 family (51 genes). STP1 proteins are common in P. malariae and P. ovale curtisi (166 and 70 genes, respectively), but are contracted in number in P. vivax (10 genes). One may also speculate that the expansion of PHIST and exported proteins in P. vivax compensates for the lack of STP1 proteins.
The largest multigene family in P. cynomolgi comprises pir genes. The pir superfamily occurs in all Plasmodium species 49, but its function remains poorly understood. Recent studies suggest a possible role in the regulation of the establishment of chronic infections 50 and pir genes have been found to be expressed in liver stage infections of rodent parasites 51. An extensive repertoire of 1373 pir genes was identified in the PcyM assembly, compared to 263 in PcyB. This updated number puts the P. cynomolgi pir gene repertoire at a similar size to that of P. vivax (1,216), while P. ovale curtisi has an even larger repertoire (1,949). Conversely, P. knowlesi has only 70 pir genes. Interestingly, the re-annotated P. coatneyi genome, which clusters closely to P. knowlesi, has 827 pir genes (see Figure 4B). In the published annotation it has just 256 pir genes.
As previously reported 29, 52, the pir genes can be grouped based on sequence similarity. We observe that the diversity of the pir repertoire is dramatically reduced in P. coatneyi and P. knowlesi. Most of the pir genes fall into the same cluster (cluster 0; Figure 4A). However, that cluster splits into two groups in the gene-gene network due to the different lengths of the pir genes in P. coatneyi and P. knowlesi (see Figure 4B). One hypothesis for the loss of other pir types might be the occurrence of sicaVAR genes in P. knowlesi and P. coatneyi 33. The reduction of the pir repertoire is an interesting parallel to the Laverania, where the number of rif genes (analogous to pir genes) is reduced but a new gene family evolved, the var genes. Additionally, in the Laverania the number of rif genes drops further when the parasite is in the human compared to the primate 31.
As for the other clusters, it seems that the underlying structure of the pir genes predates the speciation of P. ovale, P. vivax and P. cynomolgi. Depending on the type of pir, the number of genes can fluctuate, as can be seen by the large variance in number of genes per cluster. Some clusters are specific to P. ovale and some others contain just the two human malaria parasites, P. vivax and P. ovale. Interestingly, several pir genes have 1:1 orthologues across the different species (Supplementary Table 1, see Figure 4B). As those genes seem to be conserved across evolutionary time, it is unlikely that they are extracellular (where they would be under immune pressure); rather, they are likely to have more conserved core functions.
Expansion of methyltransferases. While paralogous expansions of pir genes and genes encoding MSPs have been described in other Plasmodium species, P. cynomolgi exhibits an unexpected expansion of 36 methyltransferase pseudogenes. These pseudogenes are found in the subtelomeres, and were annotated as encoding 26 hypothetical proteins in the PcyB assembly. The role of pseudogenes in Plasmodium is poorly understood, but in several malaria parasite species conserved pseudogenes are found in the subtelomeres. In the OrthoMCL clustering, all 36 methyltransferase pseudogenes cluster with one full-length core gene (PcyM_0947500, Figure 5A). This gene is found on chromosome 9 and has one conserved orthologue across all other Plasmodium species (cluster 51, Supplementary Table 1), and is found in many other species on OrthoMCL as cluster OG5_129798. The 36 copies are spread evenly throughout the subtelomeres, without evidence of spatial clustering.
The methyltransferase pseudogenes contain motifs of the Caulimovirus, a virus often found integrated into plant genomes, and of different retrotransposon families such as aedes aegypti, Gypsy, Helitron-5, CACTA-1, RTEX and CR1 (see Supplementary Table 2). While the Caulimovirus insert was mostly found to have occurred in an antisense orientation, hinting towards a role in stability, the LTR and non-LTR insertions were found most often to have occurred in a sense orientation 53. The hits were mostly to low complexity regions, suggesting that recombination in the subtelomeres may be a result of mechanisms similar to those used by retro elements.
We also found evidence of this methyltransferase duplication in P. simiovale, a close outgroup to P. cynomolgi, P. vivax, and P. knowlesi. Fewer copies were observed in the P. simiovale assembly (13), but this may be due to the fragmentation of the assembly. Although the P. simiovale copies are generally less degenerate at their 5’ ends, they are nevertheless pseudogenized.
To further understand the duplication, we mapped the reads of P. cynomolgi, P. simiovale and P. vivax P01 against the locus on chromosome 9 containing the ancestral methyltransferase in P. cynomolgi (see Figure 5B). Although the coverage is shown on a log scale, the coverage across the methyltransferase seems to be identical for P. simiovale and P. cynomolgi, but significantly lower for P. vivax. This leads us to speculate that the number of methyltransferases is roughly the same in both P. simiovale and P. cynomolgi. Further, the coverage plot also reveals that the next core gene of unknown function, PcyM_0947600, is also duplicated. In PcyM we find two further paralogous genes: PcyM_0054800 and PcyM_0012100. Furthermore, this gene appears to be duplicated more often in P. simiovale, as the coverage of that gene is high (Figure 5B). A search for structural similarity using I-TASSER 54 yielded no conclusive results.
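The reasoning behind this comparison is that reads from every paralogous copy pile up on the single reference locus, so read depth over the locus scales with copy number. The comparison here was made from coverage plots; the ratio below is only an illustrative assumption of how such an estimate could be put into numbers, with a hypothetical function name and toy depth values.

```python
from statistics import mean, median


def relative_copy_number(locus_depths, genome_depths):
    """Estimate copy number as mean locus depth over median genome-wide depth.

    The median is used for the baseline so that a few highly covered
    repeats do not inflate it.
    """
    baseline = median(genome_depths)
    if baseline <= 0:
        raise ValueError("baseline depth must be positive")
    return mean(locus_depths) / baseline


# Toy per-base depths, not measured values.
toy_genome = [98, 99, 100, 101, 102]
toy_locus = [400, 420, 380]
estimate = relative_copy_number(toy_locus, toy_genome)
print(estimate)
```

Such a ratio is only comparable between species when the reads map equally well to the reference locus, which is less likely for the more divergent copies.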
A phylogenetic tree (see Figure 5A) shows the methyltransferase paralogs in P. cynomolgi and P. simiovale compared to the orthologues in the other species. The genes generally follow the species tree, but they are expanded in P. cynomolgi and P. simiovale. As P. simiovale is thought to be an outgroup to P. cynomolgi and P. vivax, we expect that P. vivax has lost the expansions.
We compared the location of the ancestral methyltransferase between PcyM and P. vivax. To our surprise, we found a potential open reading frame inserted between two methyltransferases in P. cynomolgi. A tBLASTn of that CDS against the Nucleotide NCBI database revealed no significant similarity to any other sequence, except for the subtelomeres of P. vivax and P. cynomolgi. A very weak hit (E-value of 1e-4) to a DNA translocase FtsK is an interesting finding in light of the potential LTR transposon-like sequences discussed previously, but is to be taken with caution. This particular open reading frame is absent in P. simiovale; it seems to have arisen subsequent to the expansion and is not likely to be implicated in the expansion itself.
It remains speculative whether the paralogs of the methyltransferase genes and the adjacent gene were functional in the ancestor. Hypothetical roles of the methyltransferase could involve any of the following: 1) epigenetic control of differential pir gene expression in acute and chronic infections 50, 2) a role in genome stability and recombination, or 3) acting as a selfish gene that was able to transpose.
Conclusion
The availability of a new and improved P. cynomolgi reference genome sequence will enable in-depth studies of this widely used model parasite, including investigations into dormant stages and the selection of new drug targets and vaccine candidates. High-quality genomics-related studies will now be possible, including studies of previously missed core genes. In particular, the improved subtelomeres have enabled us to dissect the pir gene family further, and have revealed a novel and unexpected expansion of methyltransferase genes.
Data and software availability
The P. cynomolgi raw reads are deposited in the European Nucleotide Archive under accession number ERP000298. The submitted genome is available under project number PRJEB2243.
The chromosomes have the accession numbers LT841379-LT841394, and the scaffolds FXLJ01000001-FXLJ01000040.
The annotation can be found at: ftp://ftp.sanger.ac.uk/pub/project/pathogens/Plasmodium/cynomolgi/M/Jan2017/
The automated re-annotation of P. coatneyi and the draft assembly of P. simiovale can be found at: ftp://ftp.sanger.ac.uk/pub/project/pathogens/Plasmodium/coatneyi/ReAnnotation/ and ftp://ftp.sanger.ac.uk/pub/project/pathogens/Plasmodium/simiovale/May2017/, respectively.
The IPA software is available on GitHub: https://github.com/ThomasDOtto/IPA. Version 1.0.1 was used for this work.
The software is also available on Zenodo: https://doi.org/10.5281/zenodo.806818 55
License: GNU General Public License v3.0
Funding Statement
This work was supported by the Wellcome Trust (098051), EVIMalaR (contract number 242095) and Gates Foundation Project OPP1023583. GGR is supported by the Medical Research Council (MR/J004111/1).
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
[version 1; referees: 2 approved]
Supplementary material
Supplementary Table 1: Annotated OrthoMCL of 11 species.
Supplementary Table 2: Results of the search for motifs associated with transposons, http://www.girinst.org.
References
- 1. Mayer M: Über malaria beim Affen. Med Klin, Berl. 1907;579–580.
- 2. Mulligan HW: Descriptions of two species of monkey Plasmodium isolated from Silenus irus. Arch Protistenkunde. 1935;84(2):285–314.
- 3. Garnham PC: A new sub-species of Plasmodium cynomolgi. Rivista di Parassitologia. 1959;20(4):273–278.
- 4. Eyles DE: The species of simian malaria: taxonomy, morphology, life cycle, and geographical distribution of the monkey species. J Parasitol. 1963;49(6):866–887. doi:10.2307/3275712
- 5. Cheong WH, Coombs GL: Transmission of Plasmodium cynomolgi (Perlis strain) to man. Se Asian J Trop Med Pub Hlth. 1970;302.
- 6. Bennet GF, Warren M, Cheong WH: Biology of the simian malarias of Southeast Asia. II. The susceptibility of some Malaysian mosquitoes to infection with five strains of Plasmodium cynomolgi. J Parasitol. 1966;52(4):625–631. doi:10.2307/3276417
- 7. Warren M, Wharton RH: The vectors of simian malaria: identity, biology, and geographical distribution. J Parasitol. 1963;49(6):892–904. doi:10.2307/3275715
- 8. Prakash S, Chakrabarti SC: The isolation and description of Plasmodium cynomolgi and Plasmodium inui from naturally occurring mixed infections in Macaca radiata radiata monkeys of the Nilgiris, Madras state, India. Ind J Malariol. 1962;303–311.
- 9. Eyles DE, Laing AB, Warren MW, et al.: Malaria parasites of Malayan leaf monkeys of the genus Presbytis. Med J Malaya. 1962;85–86.
- 10. Dissanaike AS: Simian malaria parasites of Ceylon. Bull World Health Organ. 1965;32(4):593–597.
- 11. Wolfson F, Winter MW: Studies of Plasmodium cynomolgi in the rhesus monkey, Macaca mulatta. Am J Hyg. 1946;44(2):273–300. doi:10.1093/oxfordjournals.aje.a119097
- 12. Zeeman AM, van Amsterdam SM, McNamara CW, et al.: KAI407, a potent non-8-aminoquinoline compound that kills Plasmodium cynomolgi early dormant liver stage parasites in vitro. Antimicrob Agents Chemother. 2014;58(3):1586–1595. doi:10.1128/AAC.01927-13
- 13. Tachibana S, Sullivan SA, Kawai S, et al.: Plasmodium cynomolgi genome sequences provide insight into Plasmodium vivax and the monkey malaria clade. Nat Genet. 2012;44(9):1051–1055. doi:10.1038/ng.2375
- 14. Chien JT, Pakala SB, Geraldo JA, et al.: High-Quality Genome Assembly and Annotation for Plasmodium coatneyi, Generated Using Single-Molecule Real-Time PacBio Technology. Genome Announc. 2016;4(5): pii: e00883-16. doi:10.1128/genomeA.00883-16
- 15. Hester J, Chan ER, Menard D, et al.: De novo assembly of a field isolate genome reveals novel Plasmodium vivax erythrocyte invasion genes. PLoS Negl Trop Dis. 2013;7(12):e2569. doi:10.1371/journal.pntd.0002569
- 16. Kozarewa I, Ning Z, Quail MA, et al.: Amplification-free Illumina sequencing-library preparation facilitates improved mapping and assembly of (G+C)-biased genomes. Nat Methods. 2009;6(4):291–295. doi:10.1038/nmeth.1311
- 17. Chin CS, Alexander DH, Marks P, et al.: Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat Methods. 2013;10(6):563–569. doi:10.1038/nmeth.2474
- 18. Assefa S, Keane TM, Otto TD, et al.: ABACAS: algorithm-based automatic contiguation of assembled sequences. Bioinformatics. 2009;25(15):1968–1969. doi:10.1093/bioinformatics/btp347
- 19. Otto TD, Sanders M, Berriman M, et al. : Iterative Correction of Reference Nucleotides (iCORN) using second generation sequencing technology. Bioinformatics. 2010;26(14):1704–1707. 10.1093/bioinformatics/btq269 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20. Hunt M, Silva ND, Otto TD, et al. : Circlator: automated circularization of genome assemblies using long sequencing reads. Genome Biol. 2015;16:294. 10.1186/s13059-015-0849-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21. Otto TD, Dillon GP, Degrave WS, et al. : RATT: Rapid Annotation Transfer Tool. Nucleic Acids Res. 2011;39(9):e57. 10.1093/nar/gkq1268 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22. Stanke M, Keller O, Gunduz I, et al. : AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 2006;34(Web Server issue):W435–439. 10.1093/nar/gkl200 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23. Otto TD: From sequence mapping to genome assemblies. Methods Mol Biol. 2015;1201:19–50. 10.1007/978-1-4939-1438-8_2 [DOI] [PubMed] [Google Scholar]
- 24. Carver T, Berriman M, Tivey A, et al. : Artemis and ACT: viewing, annotating and comparing sequences stored in a relational database. Bioinformatics. 2008;24(23):2672–2676. 10.1093/bioinformatics/btn529 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25. Steinbiss S, Silva-Franco F, Brunk B, et al. : Companion: a web server for annotation and analysis of parasite genomes. Nucleic Acids Res. 2016;44(W1):W29–34. 10.1093/nar/gkw292 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26. Zimin AV, Marçais G, Puiu D, et al. : The MaSuRCA genome assembler. Bioinformatics. 2013;29(21):2669–2677. 10.1093/bioinformatics/btt476 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27. Swain MT, Tsai IJ, Assefa SA, et al. : A post-assembly genome-improvement toolkit (PAGIT) to obtain annotated genomes from contigs. Nat Protoc. 2012;7(7):1260–1284. 10.1038/nprot.2012.068 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28. Li L, Stoeckert CJ, Jr, Roos DS: OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 2003;13(9):2178–2189. 10.1101/gr.1224503 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29. Auburn S, Böhme U, Steinbiss S, et al. : A new Plasmodium vivax reference sequence with improved assembly of the subtelomeres reveals an abundance of pir genes [version 1; referees: 2 approved]. Wellcome Open Res. 2016;1:4. 10.12688/wellcomeopenres.9876.1 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30. Gardner MJ, Hall N, Fung E, et al. : Genome sequence of the human malaria parasite Plasmodium falciparum. Nature. 2002;419(6906):498–511. 10.1038/nature01097 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31. Otto TD, Rayner JC, Böhme U, et al. : Genome sequencing of chimpanzee malaria parasites reveals possible pathways of adaptation to human hosts. Nat Commun. 2014;5: 4754. 10.1038/ncomms5754 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32. Otto TD, Böhme U, Jackson AP, et al. : A comprehensive evaluation of rodent malaria parasite genomes and gene expression. BMC Biol. 2014;12:86. 10.1186/s12915-014-0086-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33. Pain A, Böhme U, Berry AE, et al. : The genome of the simian and human malaria parasite Plasmodium knowlesi. Nature. 2008;455(7214):799–803. 10.1038/nature07306 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34. Rutledge GG, Böhme U, Sanders M, et al. : Plasmodium malariae and P. ovale genomes provide insights into malaria parasite evolution. Nature. 2017;542(7639):101–104. 10.1038/nature21038 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35. Logan-Klumpler FJ, De Silva N, Boehme U, et al. : GeneDB--an annotation database for pathogens. Nucleic Acids Res. 2012;40(Database issue):D98–108. 10.1093/nar/gkr1032 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36. Bastian M, Heymann S, Jacomy M: In International AAAI Conference on Weblogs and Social Media2009. [Google Scholar]
- 37. Katoh K, Standley DM: MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30(4):772–780. 10.1093/molbev/mst010 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38. Talavera G, Castresana J: Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst Biol. 2007;56(4):564–577. 10.1080/10635150701472164 [DOI] [PubMed] [Google Scholar]
- 39. Gouy M, Guindon S, Gascuel O: SeaView version 4: A multiplatform graphical user interface for sequence alignment and phylogenetic tree building. Mol Biol Evol. 2010;27(2):221–224. 10.1093/molbev/msp259 [DOI] [PubMed] [Google Scholar]
- 40. Stamatakis A: RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30(9):1312–1313. 10.1093/bioinformatics/btu033 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41. FigTree v.1.4.2.2014. Reference Source [Google Scholar]
- 42. Kohany O, Gentles AJ, Hankus L, et al. : Annotation, submission and screening of repetitive elements in Repbase: RepbaseSubmitter and Censor. BMC Bioinformatics. 2006;7:474. 10.1186/1471-2105-7-474 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43. Wootton JC, Federhen S: Analysis of compositionally biased regions in sequence databases. Methods Enzymol.Academic Press,1996;266:554–571. 10.1016/S0076-6879(96)66035-2 [DOI] [PubMed] [Google Scholar]
- 44. Enright AJ, Van Dongen S, Ouzounis CA: An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 2002;30(7):1575–1584. 10.1093/nar/30.7.1575 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45. Sutton PL, Luo Z, Divis PC, et al. : Characterizing the genetic diversity of the monkey malaria parasite Plasmodium cynomolgi. Infect Genet Evol. 2016;40:243–252. 10.1016/j.meegid.2016.03.009 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46. Rice BL, Acosta MM, Pacheco MA, et al. : The origin and diversification of the merozoite surface protein 3 ( msp3) multi-gene family in Plasmodium vivax and related parasites. Mol Phylogenet Evol. 2014;78:172–184. 10.1016/j.ympev.2014.05.013 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47. Boddey JA, Carvalho TG, Hodder AN, et al. : Role of plasmepsin V in export of diverse protein families from the Plasmodium falciparum exportome. Traffic. 2013;14(5):532–550. 10.1111/tra.12053 [DOI] [PubMed] [Google Scholar]
- 48. Oberli A, Slater LM, Cutts E, et al. : A Plasmodium falciparum PHIST protein binds the virulence factor PfEMP1 and comigrates to knobs on the host cell surface. FASEB J. 2014;28:4420–4433. 10.1096/fj.14-256057 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49. Janssen CS, Phillips RS, Turner CM, et al. : Plasmodium interspersed repeats: the major multigene superfamily of malaria parasites. Nucleic Acids Res. 2004;32(19):5712–5720. 10.1093/nar/gkh907 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50. Brugat T, Reid AJ, Lin JW, et al. : Antibody-independent mechanisms regulate the establishment of chronic Plasmodium infection. Nat Microbiol. 2017;2: 16276. 10.1038/nmicrobiol.2016.276 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51. Fougère A, Jackson AP, Bechtsi DP, et al. : Variant Exported Blood-Stage Proteins Encoded by Plasmodium Multigene Families Are Expressed in Liver Stages Where They Are Exported into the Parasitophorous Vacuole. PLoS Pathog. 2016;12(11):e1005917. 10.1371/journal.ppat.1005917 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52. Lopez FJ, Bernabeu M, Fernandez-Becerra C, et al. : A new computational approach redefines the subtelomeric vir superfamily of Plasmodium vivax. BMC Genomics. 2013;14:8. 10.1186/1471-2164-14-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53. van de Lagemaat LN, Medstrand P, Mager DL: Multiple effects govern endogenous retrovirus survival patterns in human gene introns. Genome Biol. 2006;7(9):R86. 10.1186/gb-2006-7-9-r86 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54. Roy A, Kucukural A, Zhang Y: I-TASSER: a unified platform for automated protein structure and function prediction. Nat Protoc. 2010;5(4):725–738. 10.1038/nprot.2010.5 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 55. Otto TD: ThomasDOtto/IPA: Release which is in Zendon. Zenodo. 2017. Data Source [Google Scholar]