Proceedings of the National Academy of Sciences of the United States of America. 2019 Feb 5;116(8):3183–3192. doi: 10.1073/pnas.1810815116

Comparative 3D genome organization in apicomplexan parasites

Evelien M Bunnik a, Aarthi Venkat b,1, Jianlin Shao b,1,2, Kathryn E McGovern c,3, Gayani Batugedara d, Danielle Worth c, Jacques Prudhomme d, Stacey A Lapp e,f,g, Chiara Andolina h,i,4, Leila S Ross j, Lauren Lawres k, Declan Brady l, Photini Sinnis m, Francois Nosten h,i, David A Fidock j, Emma H Wilson c, Rita Tewari l, Mary R Galinski e,f,g, Choukri Ben Mamoun k, Ferhat Ay b,n,5,6, Karine G Le Roch d,5,6
PMCID: PMC6386730  PMID: 30723152

Significance

From yeast to human cells, genome organization in eukaryotes has a tight relationship with gene expression. We investigated the 3D organization of chromosomes in malaria parasites to identify possible connections between genome architecture and pathogenicity. Genome organization was dominated by the clustering of Plasmodium-specific gene families in 3D space. In particular, the two most pathogenic human malaria parasites shared unique features in the organization of gene families involved in antigenic variation and immune escape. Related human parasites Babesia microti and Toxoplasma gondii that are less virulent lacked the correlation between gene expression and genome organization observed in human Plasmodium species. Our results suggest that genome organization in malaria parasites has been shaped by parasite-specific gene families and correlates with virulence.

Keywords: malaria, genome organization, virulence, Hi-C, epigenomics

Abstract

The positioning of chromosomes in the nucleus of a eukaryotic cell is highly organized and has a complex and dynamic relationship with gene expression. In the human malaria parasite Plasmodium falciparum, the clustering of a family of virulence genes correlates with their coordinated silencing and has a strong influence on the overall organization of the genome. To identify conserved and species-specific principles of genome organization, we performed Hi-C experiments and generated 3D genome models for five Plasmodium species and two related apicomplexan parasites. Plasmodium species mainly showed clustering of centromeres, telomeres, and virulence genes. In P. falciparum, the heterochromatic virulence gene cluster had a strong repressive effect on the surrounding nuclear space, while this was less pronounced in Plasmodium vivax and Plasmodium berghei, and absent in Plasmodium yoelii. In Plasmodium knowlesi, telomeres and virulence genes were more dispersed throughout the nucleus, but its 3D genome showed a strong correlation with gene expression. The Babesia microti genome showed a classical Rabl organization with colocalization of subtelomeric virulence genes, while the Toxoplasma gondii genome was dominated by clustering of the centromeres and lacked virulence gene clustering. Collectively, our results demonstrate that spatial genome organization in most Plasmodium species is constrained by the colocalization of virulence genes. P. falciparum and P. knowlesi, the only two Plasmodium species with gene families involved in antigenic variation, are unique in the effect of these genes on chromosome folding, indicating a potential link between genome organization and gene expression in more virulent pathogens.


Apicomplexans are obligate intracellular parasites that can be highly pathogenic and are responsible for a wide range of diseases in humans and animals. The phylum consists of at least 6,000 species, with potentially many undiscovered members (1). Among apicomplexan parasites that infect humans, Plasmodium spp., the causative agents of malaria, have the highest health and economic impact. The most prevalent and deadly human malaria parasite is Plasmodium falciparum, responsible for an estimated 445,000 deaths per year (2). Other Plasmodium species that infect humans include Plasmodium vivax and Plasmodium knowlesi. P. vivax is widespread predominantly outside Africa, and P. knowlesi is a zoonosis in southeast Asia. The natural hosts of P. knowlesi are long-tailed and pig-tailed macaques. However, transmission from monkeys to humans through a mosquito vector has been widely reported in Malaysia and can cause severe, potentially lethal disease (3). Other human-relevant apicomplexans include Babesia microti (4), the causative agent of human babesiosis, a malaria-like illness endemic in the United States but with worldwide distribution, and Toxoplasma gondii, the causative agent of toxoplasmosis, an opportunistic infection commonly encountered among individuals with weakened immune systems (5).

During millions of years of coevolution with their hosts, apicomplexan parasites have developed species-specific large multigene families that are involved in host–parasite interactions (6). These gene families are important for parasite survival, pathogenesis, virulence, and immune evasion. All Plasmodium species contain virulence genes that belong to the Plasmodium interspersed repeat (pir) superfamily. In addition, P. falciparum and P. knowlesi have evolved unique gene families that orchestrate these parasites to undergo antigenic variation, called var and SICAvar, respectively (7, 8). During the process of antigenic variation, the parasite can escape from host immune responses by changing which members of these large families of parasite antigens are expressed. The ability of these parasites to switch their antigenic profile correlates with their high virulence and persistence in the face of adaptive immune responses.

The biological functions of pir genes are largely unknown, but it has been suggested that they have many different roles in virulence, signaling, trafficking, protein folding, adhesion, and establishment of chronic infections (9–11). The availability of genome sequences for selected apicomplexan parasites has revealed the genomic landscape of these virulence gene families. Copy numbers for the virulence gene families typically range between 150 and 300 genes per organism, although there are some exceptions (for example, 980 yir genes in Plasmodium yoelii 17X) (6, 12–14). These genes are located close to the telomere ends of most (sometimes all) of the 14 chromosomes of the various Plasmodium genomes. Similarly, the subtelomeric regions of B. microti chromosomes contain several small gene families encoding exported proteins. These proteins are targets of the antibody response in B. microti-infected humans and may be involved in antigenic variation (15–17). T. gondii also has multiple parasite-specific gene families involved in pathogenesis and immune evasion. Some of these are subtelomeric, but most are dispersed across the genome, either as individual genes or in smaller and larger arrays (18). T. gondii does not use classic antigenic variation, although some of these Toxoplasma-specific gene families may be involved in escape from immune responses (19).

First discovered in P. knowlesi (20–23), the P. falciparum var and P. knowlesi SICAvar gene families mediate antigenic variation and immune escape and may be one of the factors that make P. knowlesi and P. falciparum so lethal in humans. The var and SICAvar gene families encode P. falciparum Erythrocyte Membrane Protein 1 (PfEMP1) and Schizont Infected Cell Agglutination variant antigen, respectively, which are expressed on the surface of the infected red blood cell. As a result, these proteins are exposed to the host immune system and elicit strong antibody responses. Each parasite expresses a single PfEMP1 or a limited repertoire of SICAvar antigens (24, 25). Parasites can rapidly and efficiently escape from the host immune response by switching the gene variant that is expressed, resulting in successive cycles of antibody production and parasite escape (22, 26). In addition, PfEMP1 mediates adherence of infected red blood cells to the vasculature, which prevents clearance of the parasite by the spleen. Cytoadherence in vital organs causes tissue damage and is a major cause of pathology in P. falciparum malaria (27).

Similar to the pir genes, most var genes are located in the subtelomeric regions of all 14 P. falciparum chromosomes, although several chromosomes also harbor internal var genes. The SICAvar genes are distributed more evenly along the chromosomes, with only a few located in subtelomeric regions (7). One of the mechanisms involved in the complex network of clonal var gene expression is localization of these genes in perinuclear clusters of heterochromatin (28, 29). In previous studies (30, 31), we observed that the requirement for var genes to come together in 3D space has a strong influence on the overall organization of the P. falciparum genome. Chromosomes with internal var gene clusters form loops to accommodate the perinuclear localization of all var genes. In addition, we observed a strong association between 3D genome organization and gene expression.

Based on these observations, we hypothesized that gene families involved in antigenic variation need to be tightly regulated and therefore have a strong influence on genome organization. Organisms that do not undergo antigenic variation may thus have less-stringent requirements with respect to the structure of their genome in the nucleus. To test this hypothesis, we studied the genome architecture of five different Plasmodium species parasites, two that are known to undergo antigenic variation (P. falciparum and P. knowlesi), and three that are not (P. vivax, Plasmodium berghei, and P. yoelii). In addition, we studied two related apicomplexan parasites (B. microti and T. gondii) to identify characteristics of genome organization that are specific to the Plasmodium genus. When considering genome-wide clustering of all virulence genes, we observed that all organisms studied here, with the exception of T. gondii and P. knowlesi, showed significant colocalization for these genes. In P. knowlesi, even though contact counts between SICAvar genes show a moderate enrichment compared with random, these were found to be scattered throughout the nucleus. However, SICAvar loci showed cross-shaped patterns in intrachromosomal contact with enriched contact counts to other SICAvars on the same chromosome similar to interaction patterns created by P. falciparum var genes, suggesting that colocalization of these genes is a conserved feature and may play a role in mutually exclusive expression of SICAvars. T. gondii virulence genes are not located subtelomerically and did not cluster in 3D space, pointing toward differences in regulation of gene expression in this apicomplexan parasite.

Results

Profiling Genome Organization for Apicomplexan Parasites.

We performed Hi-C experiments for five different apicomplexan parasites using the in situ Hi-C methodology (32) (Materials and Methods): P. knowlesi, P. yoelii, P. berghei, T. gondii, and B. microti (SI Appendix, Table S1 and Fig. 1). Hi-C data for P. falciparum and P. vivax were available from previous studies (30, 31). For all Plasmodium species except P. vivax, we performed the Hi-C experiment on human blood-stage parasites. The only P. vivax sample available to this study was from mosquito salivary gland sporozoites, and we therefore also included P. falciparum sporozoites to allow comparisons between different developmental stages in different parasite species. Even though the global features of genome organization are comparable in P. falciparum trophozoites and gametocytes, we also included the P. falciparum gametocyte stage to allow a direct comparison with P. berghei gametocytes. B. microti tick isolate (LabS1) and clinical isolate (Bm1438) samples were obtained from nonsynchronized blood-stage infections in mice and from cultures that were predominantly at the last stage of intraerythrocytic development (tetrads). Finally, for T. gondii we included the rapidly replicating tachyzoite stage and the more dormant bradyzoite stage, both cultured in vitro.

Fig. 1.

Overview of samples and protocol. (A) Phylogenetic tree showing the genetic relationship between the seven different apicomplexan parasites used in this study. Adapted with permission from ref. 61; permission conveyed through Copyright Clearance Center, Inc. (B) Light microscopy images of the various parasites. (C) Schematic representation of the in situ Hi-C protocol.

For several of these samples, we performed Hi-C experiments for two or more biological replicates or highly related strains (SI Appendix, Table S1 and Materials and Methods), which showed a high degree of similarity as quantified by a Hi-C–specific reproducibility measure (33) (SI Appendix, Fig. S1). For P. falciparum, we observed higher reproducibility among replicates of the same stage compared with samples from different stages. Therefore, we combined the two gametocyte samples (early and late) as well as the two sporozoite replicates for subsequent analysis to obtain higher resolution. For P. vivax, the two sporozoite replicates showed very high reproducibility and were therefore also combined. For P. berghei, all three conditions (WT and Smc2/Smc4 mutants) were highly similar and, hence, were combined for further analysis. For B. microti, synchronous and asynchronous samples had higher reproducibility within compared with across these two groups regardless of whether the samples were from the tick or field isolates. Hence, we combined all synchronous samples together and similarly the asynchronous samples. After combining specimens with high correlations, we obtained a total of 11 samples from the various stages, strains, and conditions of the seven different apicomplexan parasites (SI Appendix, Table S1).

For each sample, the Hi-C reads were processed (i.e., mapping, filtering, pairing, and removing duplicates) using the HiC-Pro package (34), resulting in a total of ∼300 million unique valid read pairs out of more than 1.5 billion pairs sequenced (SI Appendix, Table S2). The observed intrachromosomal and interchromosomal contacts were aggregated into contact-count matrices at 10-kb resolution and were then normalized using the ICE method to correct for experimental and technical biases (35) (normalized contact-count heat maps are accessible at apicomplexan3d.lji.org/). Next, we inferred a consensus 3D genome structure for each of the 11 samples using the negative binomial model from the PASTIS 3D modeling toolbox (36) (https://github.com/hiclib/pastis) (Fig. 2 and SI Appendix, Fig. S2). When comparing this modeling approach to the multidimensional scaling method for PASTIS, all major hallmarks of genome organization are conserved (SI Appendix, Fig. S3). However, the negative binomial distribution model better captures the overdispersion of contact counts and was therefore used for inferring all models in this study. For assessing the stability of the consensus structures, we used 100 random initializations of the 3D coordinates and clustered the resulting 100 structures for each sample (SI Appendix, Fig. S4). For most of our samples, we did not find any subset of 3D models that distinctly cluster with each other, suggesting the resulting models are robust to differences in initialization. For P. vivax, which showed some sign of clustering, we sampled representative models from each potentially distinct cluster. Our comparison of 3D structures from the four most prominent clusters showed no striking differences in genome organization (SI Appendix, Fig. S5). Since we observed conservation of all of the main structural features among the different initializations in either case, we used a single representative model for each of our Hi-C samples. Raw and ICE-normalized contact-count heat maps, Fit-Hi-C P value heat maps (37), PASTIS Euclidean distance matrices, and PASTIS 3D models for each chromosome are shown in Dataset S1.
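
As a rough illustration of the bias-correction step described above, the following Python sketch balances a small symmetric contact-count matrix by iterative correction. It is a simplified stand-in under stated assumptions, not the implementation used in the study: the function name ice_normalize, the fixed iteration count, and the toy matrix are hypothetical, and production pipelines typically also filter low-coverage bins before balancing.

```python
def ice_normalize(counts, n_iter=50):
    """Balance a symmetric contact-count matrix by iterative correction.

    Returns the balanced matrix and the per-bin bias factors.
    Hypothetical helper for illustration; not the study's implementation.
    """
    n = len(counts)
    mat = [[float(v) for v in row] for row in counts]
    bias = [1.0] * n
    for _ in range(n_iter):
        row_sums = [sum(row) for row in mat]
        covered = [s for s in row_sums if s > 0]
        if not covered:
            break  # empty matrix: nothing to balance
        mean_sum = sum(covered) / len(covered)
        # Bins with above-average coverage get a factor > 1; empty bins keep 1.
        scale = [s / mean_sum if s > 0 else 1.0 for s in row_sums]
        for i in range(n):
            for j in range(n):
                mat[i][j] /= scale[i] * scale[j]
        bias = [b * s for b, s in zip(bias, scale)]
    return mat, bias


# Toy example (assumed data): uniform contacts distorted by bin biases 1, 2, 4.
bin_bias = [1.0, 2.0, 4.0]
raw = [[bi * bj for bj in bin_bias] for bi in bin_bias]
balanced, estimated_bias = ice_normalize(raw)
print([round(sum(row), 6) for row in balanced])
```

In this toy case every row of the balanced matrix ends up with the same total, and the ratios between the estimated bias factors match the biases used to distort the input.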

Fig. 2.

Hi-C data and 3D genome modeling. (A) Normalized interchromosomal contact-count heat maps at 10-kb resolution. Chromosomes are lined up in numerical order starting with chr1 in the bottom left corner. Individual chromosomes are delineated by dashed lines. Intrachromosomal contacts are not displayed, hence the white squares along the diagonal of each heat map. Larger versions of the interchromosomal heat maps as well as all intrachromosomal heat maps are accessible at apicomplexan3d.lji.org/. (B) Representative 3D models for genomes of all organisms studied here. Chromosomes are shown as transparent white ribbons. Centromeres are indicated with red spheres, telomeres with blue spheres, and virulence genes with orange spheres. Representative 3D models for asynchronous B. microti blood stages and T. gondii tachyzoites are shown in SI Appendix, Fig. S2.

Detection of Genome Assembly Problems and Their Correction with Hi-C Data.

Hi-C data have been used to detect translocation events or to improve genome assemblies based on Illumina and/or PacBio reads in many organisms, including Arabidopsis thaliana, Aedes aegypti, and humans (38–40). Therefore, we first scanned our samples using a metric developed to detect genome assembly errors in Hi-C data (38) to avoid biases in our downstream analysis of 3D genome organization. Small misassemblies were observed in P. yoelii chr9 and in B. microti chr1 for the tetrad sample (SI Appendix, Fig. S6), but these are unlikely to influence our results. For P. knowlesi, we recently published an assembly of the P. knowlesi genome based on our Hi-C data in combination with sequencing data generated using Illumina and PacBio platforms (41). All analyses presented here were performed using the updated version of the P. knowlesi genome assembly. All other Plasmodium and Babesia samples were error-free. However, several issues were detected for T. gondii that were handled as described below.

The current genome assembly of T. gondii consists of 14 chromosomes, the same number as for Plasmodium species and close apicomplexan relatives, such as Neospora caninum. For 13 chromosomes, the location of the centromere has previously been identified by chromatin immunoprecipitation of centromeric and pericentromeric proteins (42, 43). For chromosome VIIb, the centromere has thus far remained elusive. In the interchromosomal heat map using the original genome assembly with 14 chromosomes, it can be observed that chrVIIb and chrVIII have a higher number of interactions than any other combination of chromosomes, and that the number of contacts is highest between the right telomere of chrVIIb and the left telomere of chrVIII (Fig. 3A). These observations suggest that chrVIIb and chrVIII could be physically linked. After we computationally stitched chrVIIb and chrVIII together to create one large chromosome, the interchromosomal contact counts were at the expected levels, while no apparent discrepancies were observed in either the interchromosomal heat map (Fig. 3B) or the intrachromosomal heat map of the combined chrVIIb and chrVIII (Fig. 3C). This stitched chromosome showed a single centromere interaction with every other chromosome. In addition, we did not detect any signs of misassembly in the stitched chromosome (Fig. 3D). The small signal observed in the tachyzoite sample is not at the junction of the stitched chrVIIb and chrVIII. Based on these observations, we propose that chrVIIb and chrVIII are in fact a single chromosome. This would explain the unusual interaction pattern in our original interchromosomal interaction plot, as well as the apparent absence of a centromere in chrVIIb. We have used this stitched chromosome in all of our analyses and refer to this chromosome as chrVIII.

Fig. 3.

Correction of the T. gondii genome assembly by Hi-C data. (A) Normalized interchromosomal contact-count heat map plotted using the current version of the T. gondii genome showing unusually high levels of contact counts between chrVIIb and chrVIII (inside the red box). (B) Normalized interchromosomal contact-count heat map plotted using an updated version of the T. gondii genome in which chrVIIb and chrVIII have been stitched to form one large chrVIII. All interchromosomal contacts are at expected levels and the newly formed chromosome shows a single centromeric interaction with each other chromosome. (C) The intrachromosomal contact-count heat map of the newly formed chromosome chrVIII, showing no discrepancies in contact counts along the chromosome or at the junction. (D) Misassembly metric for the newly formed chrVIII, showing no signs of misassembly. The junction is indicated with a dashed line.

The misassembly metric detected several other issues in the T. gondii genome (SI Appendix, Fig. S6). Most prominently, chrXII showed an unusual pattern that is most likely caused by an inversion of a segment spanning from bin 272 to bin 499 (SI Appendix, Fig. S7A). The fraction of the parasite population harboring this inversion is ∼10% in the tachyzoite sample and is close to 100% in the bradyzoite population. The tachyzoite and bradyzoite samples were generated using an unmodified and a transgenic ME49 strain, respectively. This result indicates that this inversion most likely arose spontaneously in the ME49 strain and can be selected during bottleneck events such as the generation of a transgenic strain. For the purpose of this study, the contact-count patterns caused by this inversion were not considered a misassembly of the T. gondii genome.

Finally, chrIX showed an unexpected signal around bin 500 (SI Appendix, Fig. S7B). Upon inspection of the contact counts in this region, we noted that bins 498 and 505 showed a much higher number of interactions than expected based on the surrounding bins, suggesting amplification of genomic sequences within these two bins. This effect was observed in both the tachyzoite and bradyzoite samples and may point toward an error in the T. gondii genome assembly. Since Hi-C does not provide sufficient resolution to resolve such a potential tandem array of repeats, we did not attempt to correct the reference genome for this region. In addition, smaller potential misassemblies were observed in chrIV and chrV.

Colocalization of Functional Elements and Gene Families in Plasmodium.

In a previous study, we showed that during the blood stages in the human host the P. falciparum genome has a more complex organization than the Saccharomyces cerevisiae genome, which is organized in a classical Rabl conformation. In P. falciparum, similar to yeast, we observed clustering of telomeres and centromeres on opposite sides of the nucleus, but in addition we also observed domain-like structures (DLS), similar to multicellular organisms, surrounding genes involved in pathogenicity (30, 31). Clustering of telomeres and centromeres has previously been observed in budding and fission yeast (44, 45), plants, Drosophila, and recently also in mammalian cells using advanced single-cell Hi-C analysis and genome modeling (46). It can also clearly be observed in 3D models for the P. falciparum trophozoite and gametocyte stages (Fig. 2B).

Here, we analyzed the organization of these genomic hallmarks in all analyzed apicomplexan parasites by testing for an enrichment in interactions between loci of interest, as well as for colocalization in our 3D models (Table 1). The centromeres interacted with each other in P. knowlesi, P. berghei, and P. yoelii, although the interchromosomal heat maps showed that these interactions were relatively localized in P. knowlesi and P. yoelii (SI Appendix, Fig. S8). In agreement with this finding, the centromeres strongly clustered in the 3D models of these organisms. As highlighted in a previous study (31), and unlike the blood stages of any of the Plasmodium species where all pairs of centromeres significantly interact with each other, the centromeres in P. vivax salivary gland sporozoites showed weaker contacts and enrichment for only a subset of centromere pairs. Salivary gland sporozoites from P. falciparum showed no clustering of centromeres, suggesting that the loss of centromere interactions could be a general feature of the sporozoite stage. For P. vivax, the subset of significantly interacting centromere pairs was sufficient to drive significance in the contact-count-based Witten–Noble colocalization test (47), but not for strong colocalization in 3D models when evaluated visually or using 3D distances (Table 1).

Table 1.

Colocalization of centromeres, telomeres, and virulence genes

Centromeres Telomeres Virulence genes
Sample Contact counts 3D distance 3D visual Contact counts 3D distance 3D visual Contact counts 3D distance 3D visual
Pf trophozoites ++ ++ ++ ++ ++ ++ ++ ++ ++
Pf gametocytes ++ ++ + ++ ++ ++ ++ ++ ++
Pf sporozoites + ++ + + ++ +
Pv sporozoites ++ + ++ ++ + ++ ++ +
Pk trophozoites ++ ++ +
Pb gametocytes ++ ++ ++ + + + ++ +
Py mixed IDC ++ ++ + ++ ++ + ++ ++ +
Bm mixed IDC ++ ++ ++ ++ ++ ++ + +
Bm tetrads + ++ ++ ++ ++ ++ + +
Tg tachyzoites ++ ++ ++ ++ ++ +
Tg bradyzoites ++ ++ ++ ++ ++ +

++ denotes P < 0.001 for contact counts and P < 0.001 as well as mean and median pairwise distance less than 0.7 of the respective value from a randomized set of loci on 3D models; + denotes P < 0.05 for contact counts. Two different colocalization tests were used, one based on the statistical significance (37, 47) of contact counts between bins in the Hi-C data and the other based on the distance between bins in the 3D models. The “3D visual” column denotes whether the listed functional elements are tightly clustered (++), somewhat clustered (+), or not clustered at all (−) as a result of visual inspection of the 3D models. Bm, B. microti; IDC, intraerythrocytic developmental cycle; Pb, P. berghei; Pf, P. falciparum; Pk, P. knowlesi; Pv, P. vivax; Py, P. yoelii; Tg, T. gondii.
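
To make the 3D-distance criterion in the table note more concrete, the sketch below compares the mean pairwise distance among a set of loci with the same statistic for randomly drawn loci. It is only an illustration under assumptions: the function names, the permutation count, and the toy coordinates are hypothetical, only the mean (not the median) is computed, and the contact-count-based test (37, 47) is not reproduced here.

```python
import itertools
import math
import random


def mean_pairwise_distance(coords):
    """Average Euclidean distance over all pairs of 3D points (needs >= 2 points)."""
    pairs = list(itertools.combinations(coords, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def colocalization_test(all_coords, loci_idx, n_perm=1000, seed=0):
    """Return (observed / random mean distance, empirical P value).

    all_coords: 3D coordinates of every bin in the model (assumed input).
    loci_idx:   indices of the bins of interest, e.g. centromeres.
    """
    rng = random.Random(seed)
    observed = mean_pairwise_distance([all_coords[i] for i in loci_idx])
    k = len(loci_idx)
    null = []
    for _ in range(n_perm):
        sample = rng.sample(range(len(all_coords)), k)
        null.append(mean_pairwise_distance([all_coords[i] for i in sample]))
    ratio = observed / (sum(null) / len(null))
    # One-sided empirical P value with the usual +1 correction.
    p_value = (1 + sum(d <= observed for d in null)) / (n_perm + 1)
    return ratio, p_value


# Toy model (assumed data): 5 tightly clustered loci among 50 bins on a line.
coords = [(i * 0.01, 0.0, 0.0) for i in range(5)]
coords += [(10.0 + i, 0.0, 0.0) for i in range(45)]
ratio, p_value = colocalization_test(coords, list(range(5)))
print(ratio < 0.7, p_value)
```

A ratio below 0.7 together with a small P value would correspond to the "++" call for the 3D-distance column, under the simplifications stated above.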

The telomeres of P. yoelii and P. vivax showed strong enrichment in contact counts, while the telomeres in P. berghei and P. knowlesi did not. However, the telomeres in P. berghei did come together in 3D space, although to a lesser extent than those in P. yoelii and P. vivax. All Plasmodium species that were analyzed in this study harbor virulence genes at the subtelomeric regions of nearly every chromosome. In line with clustering of the telomeres, we also observed colocalization of virulence genes in all of these organisms, with the exception of the P. knowlesi SICAvar genes (Table 1). The strong clustering of these genes was recapitulated in the 3D models of P. falciparum, P. vivax, P. berghei, and P. yoelii. In conclusion, while we observed varying degrees of clustering of telomeres and centromeres, all Plasmodium genomes, except for P. knowlesi, showed colocalization of pir genes.

Antigenic Variation Genes Are Associated with Formation of DLS in Plasmodium Species.

In P. knowlesi, SICAvar genes are located in subtelomeric regions but are also found scattered throughout the genome, either individually or in small groups of up to four genes. Due to the highly repetitive nature of their sequences, 31 of the SICAvar genes have low mappability. We were unable to detect strong colocalization for the 136 remaining SICAvar genes. However, our colocalization test measures whether all loci in the genome cluster and does not pick up localized clusters that consist of only a limited subset of genes. Although Hi-C signals were weak for loci that consist of only one or a few genes, the contact-count heat maps showed additional interactions between large internal and subtelomeric SICAvar gene clusters, most clearly observed in chr4 (Fig. 4 and SI Appendix, Fig. S9). This interaction is reminiscent of var gene interaction patterns observed in P. falciparum that give rise to DLS (30). We calculated a domain score to quantify the insulation of each locus from its neighborhood (Materials and Methods) and observed a total of six DLS in P. knowlesi. Using the same metric, we identified three DLS in P. falciparum and two DLS in P. vivax, but none in any of the other organisms (Fig. 4 and Dataset S1). It should be noted that these DLS are distinct from topologically associating domains observed in metazoan genomes, both in size and in mechanism of formation. Homologs of CCCTC-binding factor (CTCF) have not been identified in Plasmodium spp., and it is therefore unlikely that chromatin loop extrusion or any similar mechanism is involved in the formation and maintenance of these DLS. These results suggest that subsets of SICAvar genes throughout the genome may interact with each other, similar to the var genes in P. falciparum. Such interactions may be crucial to coordinate mutually exclusive gene expression necessary for antigenic variation.
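
The domain score itself is defined in Materials and Methods, which is not reproduced in this section. As a hedged illustration of what "insulation of a locus from its neighborhood" can mean in practice, the sketch below computes a generic insulation-style ratio: contacts that cross a bin divided by contacts within the flanking windows. The function name, the window size, and the toy matrix are assumptions, and this is not claimed to be the score used in the study.

```python
def insulation_score(mat, window=3):
    """Generic insulation-style score per bin (illustrative, not the paper's metric).

    For bin i, divide the mean contact count between the upstream and
    downstream windows by the mean contact count inside those windows.
    Low values indicate that bin i separates two insulated neighborhoods.
    Bins too close to a chromosome end are returned as None.
    """
    if window < 2:
        raise ValueError("window must be at least 2 bins")
    n = len(mat)
    scores = [None] * n
    for i in range(window, n - window):
        up = range(i - window, i)
        down = range(i + 1, i + window + 1)
        cross = [mat[a][b] for a in up for b in down]
        within = [mat[a][b] for a in up for b in up if a < b]
        within += [mat[a][b] for a in down for b in down if a < b]
        mean_within = sum(within) / len(within)
        if mean_within > 0:
            scores[i] = (sum(cross) / len(cross)) / mean_within
    return scores


# Toy chromosome (assumed data): two 6-bin domains with strong internal contacts.
n_bins = 12
toy = [[10.0 if (a < 6) == (b < 6) else 1.0 for b in range(n_bins)]
       for a in range(n_bins)]
print(insulation_score(toy))
```

In the toy matrix the score dips at the two bins flanking the domain boundary, which is the kind of dip that a thresholded domain-score track would flag.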

Fig. 4.

Formation of DLS and chromosome loops by var and SICAvar genes. Top row: normalized intrachromosomal contact-count heat maps at 10-kb resolution for representative chromosomes, showing a canonical “X” shape for chromosomes of P. berghei and P. yoelii, and DLS in chromosomes with internal var and SICAvar genes in P. falciparum and P. knowlesi, respectively. Second row: domain score tracks. Dips in the tracks that reach the threshold of a DLS are marked with a black box and centromeres are marked with a black dashed line. Third row: mappability tracks. Bottom row: individual chromosome conformation extracted from the 3D model of the full genome. P. berghei and P. yoelii chromosomes show a folded structure anchored at the centromere, with both chromosome arms arranged in parallel. P. falciparum and P. knowlesi chromosomes show additional folding structures to bring virulence genes in close spatial proximity.

Genome Organization in the Apicomplexan Parasites B. microti and T. gondii.

B. microti has a relatively small genome with four chromosomes that showed strong interactions among the telomeres and the centromeres, resulting in a classic Rabl conformation (Fig. 2B and SI Appendix, Fig. S2). Consequently, subtelomeric multigene families were strongly clustered, but additional virulence genes were localized away from the telomeric cluster. The 3D models of the asynchronous and the synchronous sample were very similar (SI Appendix, Fig. S2). However, the contact-count heat maps showed additional interaction patterns within chromosomes for the synchronized samples obtained at the tetrad stage, but not for the asynchronous samples (SI Appendix, Fig. S10). For chr3, these patterns may partially be caused by a virulence gene of the BMN1 family located at position 272,813–273,799. These results suggest that these interactions may not be maintained during the complete cell cycle and underscore the importance of using tightly synchronized cell populations to be able to observe transient interactions.

In agreement with a previous microscopy study (42), T. gondii centromeres showed strong interactions and colocalized in the 3D models of both the tachyzoite and the bradyzoite stage. The telomeres also interacted, but to a much lower extent compared with the centromeres, and this interaction was not readily apparent from the 3D models (SI Appendix, Fig. S2). At the resolution used in our models, no significant differences were observed between tachyzoites and bradyzoites (SI Appendix, Fig. S2). Most virulence genes in T. gondii are not located in subtelomeric regions but are found on every chromosome arranged as single genes as well as smaller and larger arrays of up to 19 genes. As a consequence, virulence genes were scattered throughout the 3D model of the genome and we were unable to detect any significant colocalization pattern for these genes (Table 1).

Another difference between genome organization in Plasmodium species and T. gondii was observed for the strength of chromosome territories. The 3D models for T. gondii exhibited more territorialized chromosomes whereas Plasmodium chromosomes were more stretched out through the nucleus (SI Appendix, Fig. S11). To quantify these differences directly from Hi-C data, we interrogated the relationship between distance and contact probability, which has been shown to depend on the arrangement of chromosomes within the nucleus (48, 49). However, this scaling relationship has been mainly used to compare different samples of the same organism or organisms with similar genome sizes. Here we adapted this analysis for comparing all our samples ranging from ∼4.5 Mb (B. microti) to ∼70 Mb (T. gondii) in size to each other as well as to human (32) and budding yeast genomes (50) as reference points. To do that, at 10-kb resolution, we focused only on intrachromosomal interactions within 1-Mb distance and we computed the log2 fold change of the average contact count for each genomic distance with respect to the overall average contact count of all possible pairs within 1 Mb (SI Appendix, Fig. S12). Our results show that, similar to human genome organization, the contacts within 50 kb to 500 kb are highly enriched for both T. gondii samples compared with all other apicomplexan parasites and budding yeast. Interestingly, this genomic distance range corresponds to topologically associating domains (TADs) in the human genome, which are known to be enriched for within-domain interactions. However, we have not seen clear TAD patterns in the T. gondii genome, suggesting that this trend may be related to the larger chromosomes and genome size of T. gondii compared with other apicomplexan parasites. Further analysis may be necessary to understand the relationship between genome size and chromosome territory formation in the presence and absence of TAD-like structures.
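
The distance-normalized scaling analysis described above can be sketched as follows for a single chromosome. This is a minimal illustration, assuming a square intrachromosomal contact matrix as input; the function name and the toy matrix are hypothetical, and pooling across chromosomes and samples is left out.

```python
import math


def distance_decay_log2fc(mat, bin_size=10_000, max_dist=1_000_000):
    """log2 fold change of the mean contact count at each genomic distance
    relative to the mean over all bin pairs within max_dist.

    mat is one intrachromosomal contact matrix (assumed input format).
    Returns {distance in bp: log2 fold change}; distances whose mean
    contact count is zero are omitted.
    """
    n = len(mat)
    max_sep = min(max_dist // bin_size, n - 1)
    totals, pair_counts = {}, {}
    for sep in range(1, max_sep + 1):
        values = [mat[i][i + sep] for i in range(n - sep)]
        totals[sep] = sum(values)
        pair_counts[sep] = len(values)
    overall = sum(totals.values()) / sum(pair_counts.values())
    if overall <= 0:
        return {}
    return {
        sep * bin_size: math.log2((totals[sep] / pair_counts[sep]) / overall)
        for sep in totals
        if totals[sep] > 0
    }


# Toy chromosome (assumed data): contact counts decay as 1 / separation.
n_bins = 6
toy = [[0.0 if a == b else 1.0 / abs(a - b) for b in range(n_bins)]
       for a in range(n_bins)]
print(distance_decay_log2fc(toy, max_dist=30_000))
```

Because each distance is expressed relative to the sample's own average within 1 Mb, curves from genomes of very different sizes can be drawn on the same axis.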

Relation Between Gene Expression and Genome Organization.

P. falciparum gene expression has been shown to be associated with the position of genes in the nucleus (30, 31). To determine the relation between gene expression and genome organization in other apicomplexan parasites, we binned the genes of each organism into 20 groups based on their distance from the centroid of all telomeres. For each bin, we calculated the average gene expression using stage-specific transcriptomics datasets and plotted these values against the average distances from the telomere centroid (Fig. 5A). We also colored the 20 bins in the 3D models based on normalized average gene expression values (Fig. 5B).

Fig. 5.

Correlation between genome organization and gene expression. (A) Relation between gene expression and distance from the centroid of the telomeres. For each organism, genes were divided into 20 bins. For each bin, the average gene expression value was plotted. Error bars denote the range of expression values within each bin. (B) Average gene expression values of each bin were projected onto the 3D models, using a color scale ranging from dark blue (low gene expression) to white (high gene expression). Centromeres are shown as yellow spheres, while telomeres are depicted as red spheres.

As expected based on previous work, P. falciparum trophozoites showed the lowest gene expression in the bin closest to the centroid of the telomeres and a gradient of increasing gene expression in the next three bins. The remaining 16 bins showed relatively comparable levels of gene expression. The results for P. falciparum gametocytes were almost identical. For P. vivax sporozoites and P. berghei gametocytes, a similar but much weaker pattern was observed, with reduced gene expression only in the bin closest to the telomere centroid. P. yoelii and B. microti did not show any relation between gene expression and 3D location relative to the telomeres. Surprisingly, the P. knowlesi trophozoite stage displayed a gene expression gradient across the entire genome, with the lowest gene expression close to the centroid of the telomeres. In the absence of strong telomere and centromere clustering, the 3D model of P. knowlesi looked somewhat unorganized. However, the gene-expression gradient observed here suggests that genome organization of this parasite is in fact strongly correlated with expression, with highly expressed genes preferentially localizing on one side of the 3D genome and the telomeres localizing on the other side.

Finally, for the T. gondii samples, we observed a decrease in gene expression in the bin furthest away from the telomere centroid, in stark contrast to all other organisms analyzed here. This decrease was reproducible between the two stages. As can be observed in the 3D models, these genes are located at the nuclear periphery. In both tachyzoites and bradyzoites, the genes in this bin were strongly enriched for merozoite-specific gene expression (51) (P < 0.00001, two-tailed Fisher’s exact test), including 19 of 33 members of T. gondii family A proteins. Collectively, these results suggest that the organization of Plasmodium genomes is to a large extent driven by virulence genes. However, T. gondii has adopted different ways to organize gene expression in relation to its nuclear architecture, although the nuclear periphery may also function as a site of gene silencing.

Discussion

From yeast to human cells, genome organization in eukaryotes has a tight relationship to gene expression. In particular, evidence is accumulating that the compartmentalized architecture of a cell nucleus is critical to many biological functions. Evolutionarily conserved mechanisms in the compartmentalization and function of yeast and mammalian genomes have been identified, but genome organization in these organisms also shows differences due to the fact that the nuclei of single-cell organisms are typically 1,000 times smaller than those of mammals. While nuclear architecture in single-cell eukaryotes has been extensively studied in yeast, little is known about the genome organization of other unicellular organisms. We therefore investigated the genome architecture of various apicomplexan parasites. Our goal was to identify common features of genome organization and possible connections between genome architecture and pathogenicity.

In this study, we demonstrated that the Hi-C methodology can be employed to correct genome assemblies and study 3D genome organization at the same time. While Hi-C experiments detect many long-range interactions, genomic segments that are physically linked and in close proximity along the DNA strand are preferentially ligated to one another. This results in a highly reproducible relationship between genomic distance and contact probability. Using this property, we detected that chromosomes VIIb and VIII of the T. gondii genome are likely to be physically linked. Our proposed genome assembly provides a coherent explanation of the apparent absence of a centromere in our original chrVIIb contact-count matrix as well as the absence of centromeric protein TgCenH3 and pericentromeric protein TgChromo1 in chrVIIb in previous ChIP-on-chip analyses (42, 43). Our corrected T. gondii assembly has one centromere for each of the 13 chromosomes and the interaction between all of these centromeres is clearly visible in interchromosomal contact maps as well as 3D models.

Once genome assemblies were corrected, we focused our analysis on the identification of commonalities and differences in genome organization between members of the Plasmodium family, as well as related apicomplexan parasites. We previously showed (30) that although the genomes of P. falciparum and S. cerevisiae are similar in size, the P. falciparum genome has a more complex 3D structure. In particular, we noted that spatial complexity was added by the requirement for virulence genes of the var family to colocalize.

The results from this study demonstrate that the organization of other Plasmodium genomes is also largely driven by their virulence genes. However, in contrast to P. falciparum and P. knowlesi, the rodent malaria species harbor virulence genes only in subtelomeric regions. The organization of their genomes is therefore relatively simple and similar to fission and budding yeast, with clustering of pericentromeric and subtelomeric heterochromatin islands driving the overall structure. However, the internal localization of var and SICAvar genes necessitates the formation of chromosome loops to bring distal loci in close spatial proximity. Similarly, P. vivax chr5 harbors a locus that interacts with subtelomeric regions, generating similar domain-like patterns compared with internal var gene islands (31). This locus contains genes of the Pv-fam-e family, encoding exported proteins involved in erythrocyte remodeling. The precise function of this gene family and the reasons for its association with subtelomeric heterochromatin remain to be discovered. The clustered organization of virulence genes allows for increased rates of recombination to generate additional diversity and coordination of gene expression. We can only speculate why certain genes in the human malaria parasites are located away from subtelomeric regions, but the requirements for domain formation and complex chromosome structure highlight that additional layers of control are necessary to orchestrate mutually exclusive gene expression.

The pir genes of P. vivax and the rodent malaria parasites are not subject to the epigenetically driven mutually exclusive expression that is observed for the var and SICAvar gene families, pointing toward additional differences in regulation between these gene families. While clonal expression of var and SICAvar genes is thought to enable escape of the parasite from host antibody responses, the expression of certain pir genes has been associated with the establishment of acute and chronic infections, independent of adaptive immunity (9). This difference in virulence gene expression patterns between primate and rodent malaria parasites is reflected by the absence of several nuclear proteins in the genome of rodent malaria parasites. Of these, both the C-terminal extension of Rpb1 (52) and the histone methyltransferase PfSET2 (53, 54) have been shown to contribute to var gene regulation. These observations suggest that the various parasite lineages have evolved different strategies to survive in their respective hosts. One contributing factor could be the life span of the host. Parasites infecting long-lived primates may require escape from adaptive immune responses to establish chronic infections and ensure transmission to a susceptible host, while parasites infecting short-lived rodent hosts may benefit from more flexible expression of their virulence genes.

The unique features of Plasmodium genome organization as described above are highlighted by the analysis of B. microti and T. gondii. Both parasites show distinct differences in the structure of their genomes. The B. microti genome showed a classical Rabl organization, with colocalization of a subset of virulence genes, those located in subtelomeric regions, in 3D space. From this observation, we conclude that genome organization in B. microti is not as strongly involved in regulation of virulence gene expression as in Plasmodium parasites. In fact, we observed no association between genome structure and gene expression in general in this parasite. In T. gondii, virulence genes were scattered throughout the genome and did not colocalize in the nucleus. The only association between genome organization and gene expression that we could detect was the possible repression of stage-specific genes by the formation of perinuclear heterochromatin. Our observation that tachyzoites and bradyzoites showed similar genome organization is in line with the relatively small differences in gene expression between these two life-cycle stages, in particular compared with gene expression profiles of merozoites and oocysts (55). However, it should be mentioned that the bradyzoites used here were obtained from in vitro culture and may not fully represent bradyzoites in a tissue cyst. Attempts to prepare Hi-C libraries from bradyzoites isolated from mouse brain tissue cysts were not successful.

The distinct organization of virulence genes suggests that T. gondii has adopted different mechanisms for gene regulation compared with Plasmodium species, which may be related to its much broader host range and cell tropism. This is reflected in the larger number of ApiAP2 transcription factors encoded by the T. gondii genome (67 versus ∼27 for Plasmodium species), which are used for both transcriptional activation and repression (56). During the evolution from the ancestor of apicomplexan organisms, the Plasmodium lineage has lost many genes that were retained in T. gondii (57). Plasmodium species have thus evolved into highly specialized organisms that control their virulence through a highly restricted mechanism, whereas T. gondii has retained more properties of its free-living ancestor and is more flexible when it comes to gene regulation.

A limitation of this study is that Hi-C experiments were performed using millions of cells as input. The contact-count matrices and 3D models therefore represent the average of a population of possible genome organizations. In P. falciparum, various microscopy approaches have shown clustering of telomeres and virulence genes in two to five foci spread around the nuclear periphery (28, 29, 58–60). The nature of our data does not allow us to draw a conclusion about whether virulence genes are organized into a single large cluster as observed in our 3D model or into multiple clusters that vary in var gene content among the parasite population.

Another limitation is the dependency of our 3D modeling on strong contact-count signals to establish hallmarks of genome organization. In the 3D models, strongly interacting centromeres and telomeres (as seen for example in the trophozoite stage of P. falciparum and in B. microti) were organized as clusters located at the nuclear periphery. In organisms with weaker centromere interactions (P. yoelii and possibly P. knowlesi), the 3D models showed clustering of the centromeres in the center of the nucleus. Similarly, weaker telomere interactions (as seen in P. berghei and T. gondii) also formed clusters in the nuclear center. To further explore this behavior, we deleted the centromeres (all bins containing centromeric sequence) or the telomeres (40-kb region) from each genome and generated 3D models of these modified genomes. Removal of one hallmark (centromeres or telomeres) resulted in distortion of the 3D structure and loss of clustering of the other hallmark (SI Appendix, Fig. S11). For T. gondii, fluorescence in situ hybridization (FISH) experiments have shown the colocalization of telomeres at the nuclear periphery (43). We therefore believe that the central location of centromere and telomere clusters may be an artifact of our modeling approach and consider it possible that these clusters are in fact located at the nuclear periphery. For P. knowlesi, the telomeres were mostly not mappable and the absence of a signal from the telomeres may explain the unusual genome structure that we obtained for this parasite. Even though the centromeres showed localized interactions in the contact-count heat maps, the P. knowlesi 3D model displayed only weak centromere clustering. In the absence of data from telomeres due to mappability issues, the centromeres may be more dispersed in our 3D model than in reality. The mappable telomeres showed approximately threefold higher contact counts compared with background, and it is therefore not unthinkable that the telomeres in fact cluster as well. Additional approaches such as immunofluorescence microscopy or FISH may be necessary to confirm and further investigate our Hi-C based observations.

In conclusion, this study highlights the association between spatial organization of virulence gene families and gene expression in Plasmodium species. In P. falciparum and P. knowlesi, gene families involved in antigenic variation provide a potential link between genome organization and pathogenicity. In contrast, related human parasites B. microti and T. gondii lack the correlation between gene expression and genome organization observed in human Plasmodium species. Our results emphasize the importance of 3D genome organization in eukaryotes and suggest that genome organization in malaria parasites has been shaped by parasite-specific gene families that affect virulence and clinical phenotypes. Identifying the molecular components regulating these parasite-specific genes at the chromatin structure level will assist the identification of new targets for novel therapeutic strategies.

Materials and Methods

We provide a brief description of methods below. Full details are available in SI Appendix, Supplementary Materials and Methods.

In Situ Hi-C Procedure.

Parasites were cross-linked in 1.25% formaldehyde in warm PBS for 25 min on a rocking platform in a total volume between 1 and 10 mL, depending on the number of parasites harvested. Glycine was added to a final concentration of 150 mM, followed by 15 min of incubation at 37 °C and 15 min of incubation at 4 °C, both steps on a rocking platform. The parasites were centrifuged at 660 × g for 20 min at 4 °C, resuspended in five volumes of ice-cold PBS, and incubated for 10 min at 4 °C on a rocking platform. Parasites were centrifuged at 660 × g for 15 min at 4 °C, washed once in ice-cold PBS, and stored as a pellet at −80 °C. To map the inter- and intrachromosomal contact counts, cross-linked parasites were subjected to the in situ Hi-C procedure (32), using MboI for restriction digests.

Hi-C Data Processing.

For each sample, the Hi-C reads were processed (i.e., mapping, filtering, pairing, and removing duplicates) using the HiC-Pro package (34). The observed intrachromosomal and interchromosomal contacts were aggregated into contact-count matrices at 10-kb resolution and were then normalized using the ICE method to correct for experimental and technical biases (35).
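
The idea behind ICE normalization can be illustrated with a short sketch. This is not the HiC-Pro implementation: the function name `ice_normalize`, the fixed number of iterations, and the rule that bins without any contacts are left untouched are assumptions made for illustration only.

```python
import numpy as np

def ice_normalize(counts, n_iter=50):
    """Balance a symmetric contact-count matrix (illustrative sketch only).

    Each iteration divides every entry by the relative coverage of its row
    and of its column, so that all mappable bins converge toward the same
    total coverage. Bins with zero coverage are treated as unmappable.
    """
    mat = np.asarray(counts, dtype=float).copy()
    mappable = mat.sum(axis=0) > 0              # assumption: empty bins are skipped
    bias = np.ones(mat.shape[0])
    for _ in range(n_iter):
        coverage = mat.sum(axis=0)
        scale = np.ones_like(coverage)
        scale[mappable] = coverage[mappable] / coverage[mappable].mean()
        mat /= np.outer(scale, scale)           # correct rows and columns together
        bias *= scale                           # accumulate the per-bin bias
    return mat, bias
```

The returned `bias` vector satisfies `balanced * outer(bias, bias) == raw`, which makes it easy to check how strongly each 10-kb bin was corrected.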

Reproducibility Score for Samples from the Same Organisms.

To compute the reproducibility scores for Hi-C data from the same organisms, we used the GenomeDISCO method (33) from the 3DChromatin_ReplicateQC package with default parameters. GenomeDISCO employs graph diffusion and random walks for transformation and compares the smoothed contact maps between pairs to estimate global similarity. The reproducibility scores in SI Appendix, Fig. S1 are genome-wide concordance scores, averaged across all chromosomes.

Three-Dimensional Modeling and Visualization.

We inferred a consensus 3D genome structure for each organism using the negative binomial model from the PASTIS 3D modeling toolbox (36) (https://github.com/hiclib/pastis). For assessing the stability of the consensus structures, we used 100 random initializations of the 3D coordinates to generate a set of 11 3D models for Hi-C data from various stages, strains, and conditions of the seven different apicomplexan parasites. To check the robustness of inferred 3D models to initialization differences, we calculated a disparity score for each possible pair of 3D models (100 choose 2) by first performing Procrustes transformation to find the best alignment and then computing the sum of the squares of the pointwise differences between the pair of structures. To determine stability of disparity, we clustered these disparity scores in cluster maps and observed very little variation among models with different initializations as well as conservation of all of the main structural features among these initializations (SI Appendix, Fig. S3). This allowed us to use only a single representative model for each of our Hi-C samples, which we visualized using Jmol, an open-source Java viewer for chemical structures in 3D (jmol.sourceforge.net/). PDB files and a manual to recreate the structures as displayed in Fig. 2 are available at apicomplexan3d.lji.org/.
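
The pairwise disparity calculation can be sketched as follows. The sketch assumes that each model is an array of shape (number of bins, 3) with bins in the same order, and it uses `scipy.spatial.procrustes`, which centers and rescales both structures before finding the best rotation or reflection. Whether the original analysis used this exact function is not stated, and the name `pairwise_disparity` is hypothetical.

```python
from itertools import combinations

import numpy as np
from scipy.spatial import procrustes

def pairwise_disparity(models):
    """Return a symmetric matrix of Procrustes disparities between 3D models.

    models: list of (n_bins, 3) coordinate arrays, one per random initialization.
    The disparity is the sum of squared pointwise differences after alignment.
    """
    n = len(models)
    disparity = np.zeros((n, n))
    for i, j in combinations(range(n), 2):       # all "n choose 2" pairs
        _, _, d = procrustes(models[i], models[j])
        disparity[i, j] = disparity[j, i] = d
    return disparity
```

Because the structures are standardized before alignment, the score ignores translation, overall scale, and orientation, so a value close to 0 indicates that two initializations converged to the same shape.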

Colocalization Tests on 3D Distances.

For each set of genes or functional annotations such as centromeres or telomeres, we characterized each locus/gene/annotation by including every 10-kb bin that it overlaps with. For assessing the P value of colocalization of loci within a given set, we computed the median pairwise distance for all pairs of loci within the set. Then, we randomly generated the same number of bins on the same chromosomes while preserving the genomic distance relationships for loci on each chromosome. The latter part was done by selecting an initial random locus and pairing it with an anchor locus in the real set and then choosing all other random loci so that they are at the same distance offset from the random anchor locus as their counterparts are from the real anchor locus. This approach could be considered a generalization of the Witten–Noble colocalization test (47) to 3D models instead of contact counts and to intrachromosomal as well as interchromosomal pairs. We performed the randomization 100,000 times and computed the median pairwise distance between all pairs of loci for each random selection. This median was compared with the corresponding median from the real input set to compute the P value of observing a random set of loci that are at least as proximal to each other as the real set. The smaller the P value, the more significantly colocalized the real set of loci.
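
A minimal sketch of this randomization test is given below. It assumes that the 3D model is available as one coordinate array per chromosome (one row per 10-kb bin) and that loci are given as bin indices per chromosome. All function names are hypothetical, and the pseudo-count in the P value is a common convention that the original analysis may or may not have used.

```python
import numpy as np
from scipy.spatial.distance import pdist

def shifted_bins(bins_by_chrom, chrom_sizes, rng):
    """Move the loci of each chromosome by one random offset, keeping their spacing."""
    shifted = {}
    for chrom, bins in bins_by_chrom.items():
        bins = np.asarray(bins)
        lowest = -bins.min()                              # most negative valid shift
        highest = chrom_sizes[chrom] - 1 - bins.max()     # most positive valid shift
        shifted[chrom] = bins + rng.integers(lowest, highest + 1)
    return shifted

def colocalization_pvalue(coords_by_chrom, bins_by_chrom, n_perm=1000, seed=0):
    """P value that random loci are at least as close in 3D as the real set."""
    rng = np.random.default_rng(seed)
    chrom_sizes = {c: len(xyz) for c, xyz in coords_by_chrom.items()}

    def median_distance(bins):
        xyz = np.vstack([coords_by_chrom[c][b] for c, b in bins.items()])
        return np.median(pdist(xyz))                      # median over all locus pairs

    observed = median_distance(bins_by_chrom)
    hits = 0
    for _ in range(n_perm):
        random_bins = shifted_bins(bins_by_chrom, chrom_sizes, rng)
        hits += median_distance(random_bins) <= observed
    return (hits + 1) / (n_perm + 1)                      # assumption: pseudo-count
```

With 100,000 randomizations, as used in the study, the smallest P value this estimator can report is about 0.00001.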

Colocalization Tests on Contact Maps.

The colocalization test for contact counts was performed using the Witten–Noble test (47). Statistical confidences of pairwise contacts were computed using Fit-Hi-C (37) and thresholded at 1% false discovery rate to identify pairs of loci with significant interactions. The representation of loci by bins, the number of randomizations, and the process to get a randomized set of loci were identical to those of the 3D distance test. The only difference, in this case, is that the number of significant pairs was used for the calculation of P values, which corresponds to the chance of observing at least as many such contacts among pairs of a random set of loci compared with the real set.

Quantification of Insulation of Each Locus from Neighborhood.

We computed a coverage-based domain score to quantify the extent of insulation of each region from its nearest neighbors, which deviates from the expected or average value for regions of virulence genes in P. falciparum and several other parasites. For each bin x, this domain score was calculated as the average normalized contact count from bin x to upstream and downstream bins that were at a distance of 100 kb to 200 kb from x (to avoid near-diagonal interactions) and were mappable (not filtered out by ICE normalization). For bins that were unmappable (filtered by ICE), the domain score was linearly interpolated from the scores of neighboring mappable bins. The running median of the domain score was computed for each bin ± two bins around it (five bins total), and this smoothed score was further scaled to lie between 0 and 1 using the overall range of scores computed for each Hi-C dataset (organism + stage). The smoothed and scaled domain score was used for visualization and for calling of DLS using the find_peaks function of the scipy.signal package with default parameters (the score was multiplied by −1 to find dips instead of peaks). The dips corresponding to centromeric regions (±10 bins) and each telomeric end of a chromosome (10% of total chromosome length) were filtered out to identify internal dip regions. We then further filtered these internal dips using a prominence (that is, the amplitude of the dip relative to the scores of surrounding regions) of 0.25 and a maximum size of 10 adjacent bins.
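
The domain score and dip calling described above can be sketched for a single chromosome as follows. The function names, the edge handling of the running median (`mode="nearest"`), and the choice to apply the prominence and width filters directly through `find_peaks` arguments are assumptions. In particular, `find_peaks` measures width at half the prominence, which may differ from how dip size was measured in the original analysis. The centromere and telomere filters are omitted here.

```python
import numpy as np
from scipy.ndimage import median_filter
from scipy.signal import find_peaks

def domain_score(norm_counts, mappable, resolution=10_000,
                 near=100_000, far=200_000):
    """Smoothed, 0-1 scaled insulation score for one chromosome (sketch)."""
    mappable = np.asarray(mappable, dtype=bool)
    n = norm_counts.shape[0]
    offsets = np.arange(near // resolution, far // resolution + 1)
    score = np.full(n, np.nan)
    for x in np.flatnonzero(mappable):
        partners = np.concatenate([x - offsets, x + offsets])
        partners = partners[(partners >= 0) & (partners < n)]
        partners = partners[mappable[partners]]           # skip ICE-filtered bins
        if partners.size:
            score[x] = norm_counts[x, partners].mean()
    known = np.flatnonzero(~np.isnan(score))
    score = np.interp(np.arange(n), known, score[known])  # fill unmappable bins
    smoothed = median_filter(score, size=5, mode="nearest")
    span = smoothed.max() - smoothed.min()
    return (smoothed - smoothed.min()) / span if span > 0 else np.zeros(n)

def call_dips(score, min_prominence=0.25, max_width=10):
    """Return bin indices of dips, found as peaks of the negated score."""
    dips, _ = find_peaks(-np.asarray(score), prominence=min_prominence,
                         width=(None, max_width))
    return dips
```

A dip in this score marks a bin that contacts its 100-kb to 200-kb neighborhood less often than other bins on the same chromosome do.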

Distance Scaling Plots.

The relationship between the genomic distance separating a pair of loci and the expected number of contacts for this pair was computed using Fit-Hi-C’s equal occupancy binning of contact counts (before the spline fit) with up to 100 distance bins within the 30-kb to 1-Mb interval. To account for differences in genome size, chromosome length, and chromosome number among parasites, we normalized the average number of contact counts per pair at each distance bin by the overall average for the whole distance range up to 1 Mb. We then log2-transformed these values for the y axis; the x axis is simply the log10 of the genomic distance.
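
The normalization can be illustrated with a simplified sketch. Instead of Fit-Hi-C's equal occupancy binning, it uses one distance bin per matrix diagonal, which is an assumption made to keep the example short. The function name `distance_scaling` is hypothetical.

```python
import numpy as np

def distance_scaling(matrices, resolution=10_000,
                     min_dist=30_000, max_dist=1_000_000):
    """Return (log10 distance, log2 fold change) for intrachromosomal contacts.

    matrices: list of per-chromosome normalized contact-count matrices.
    The fold change compares the mean contact count at each distance with
    the mean over all pairs in the whole min_dist to max_dist range.
    """
    first = int(np.ceil(min_dist / resolution))
    last = max_dist // resolution
    sums = np.zeros(last + 1)
    pairs = np.zeros(last + 1)
    for mat in matrices:
        for d in range(first, min(last, mat.shape[0] - 1) + 1):
            diagonal = np.diagonal(mat, offset=d)   # all pairs d bins apart
            sums[d] += diagonal.sum()
            pairs[d] += diagonal.size
    used = pairs > 0
    mean_per_distance = sums[used] / pairs[used]
    overall_mean = sums[used].sum() / pairs[used].sum()
    distances = np.flatnonzero(used) * resolution
    # Distances without any contacts would give -inf on the log2 scale.
    return np.log10(distances), np.log2(mean_per_distance / overall_mean)
```

Dividing by the overall mean places genomes of very different sizes on a common scale, so the curves can be compared by shape and not by absolute contact counts.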

Correlation Between Gene Expression and 3D Models.

Gene expression was represented as a function of distance to telomeres. To generate these plots, all genes were first sorted by increasing distance to the centroid of telomeres (x axis). Then, the distance was binned into 20 equal width quantiles and the log average expression value together with the range of values in the bin (y axis) was plotted for genes in each quantile. For the coloring of 3D models, all 10-kb bins were first sorted by increasing distance to the centroid of telomeres (x axis) and these distances were also binned into 20 equal width quantiles. For each quantile, the representative expression value that was used as the color gradient in 3D visualization was computed as the overall average of the average value of gene expression for all 10-kb bins in the quantile.
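
The binning step can be sketched as shown below. The sketch interprets the 20 quantiles as 20 groups containing (nearly) equal numbers of genes, which is one possible reading of the description above, and it assumes that the expression values passed in are already on the desired (for example, log) scale. The function name and argument names are hypothetical.

```python
import numpy as np

def expression_by_telomere_distance(gene_xyz, telomere_xyz, expression, n_bins=20):
    """Summarize expression in groups of genes ranked by distance to telomeres.

    gene_xyz: (n_genes, 3) positions of genes in the 3D model.
    telomere_xyz: (n_telomeres, 3) positions of telomeric bins.
    expression: (n_genes,) expression values, assumed to be pre-transformed.
    """
    expression = np.asarray(expression, dtype=float)
    centroid = np.asarray(telomere_xyz).mean(axis=0)
    dist = np.linalg.norm(np.asarray(gene_xyz) - centroid, axis=1)
    groups = np.array_split(np.argsort(dist), n_bins)     # closest genes first
    mean_dist = np.array([dist[g].mean() for g in groups])
    mean_expr = np.array([expression[g].mean() for g in groups])
    expr_range = np.array([[expression[g].min(), expression[g].max()]
                           for g in groups])              # for the error bars
    return mean_dist, mean_expr, expr_range
```

The same function can be applied to 10-kb bins in place of genes to obtain the per-quantile values used for coloring the 3D models.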

Data Availability.

Sequence reads have been deposited in the NCBI Sequence Read Archive with accession number SRP151138. Hi-C data for P. falciparum trophozoites from a previous study are available from the NCBI Gene Expression Omnibus (GEO) under accession number GSE50199. Hi-C data for P. falciparum gametocytes and sporozoites, as well as P. vivax sporozoites from a previous study (31) are available from the NCBI Sequence Read Archive (SRA) under accession number SRP091967.

Supplementary Material

Supplementary File
pnas.1810815116.sapp.pdf (31.5MB, pdf)
Supplementary File
pnas.1810815116.sd01.pdf (49.5MB, pdf)

Acknowledgments

We thank Nelle Varoquaux (University of California, Berkeley) for help with generating 3D genome models; Kate Cook (University of Washington, Seattle) for help with implementation of the misassembly metrics; Clay Clark, John Weger, and Glenn Hicks (Institute for Integrative Genome Biology, University of California, Riverside) for their assistance in Illumina sequencing; the Insectary and Parasitology Core Facilities at the Johns Hopkins Malaria Research Institute, in particular Abai Tripathi, Godfree Mlambo, and Chris Kizito for their outstanding work; and The Bloomberg Family Foundation for supporting these facilities. Shoklo Malaria Research Unit is part of the Mahidol-Oxford University Research Unit, supported by the Wellcome Trust. The following reagent was obtained through the MR4 as part of the BEI Resources Repository, National Institute of Allergy and Infectious Diseases (NIAID), NIH: NF54 (MRA-1000), deposited by Megan Dowler, Walter Reed Army Institute of Research. This work was supported by NIH Grants R21 AI142506, R01 AI085077, R01 AI06775, and R01 AI136511 (to K.G.L.R.), R35 GM128938 (to F.A.), and R01 AI056840 (to P.S.); the NIAID/NIH Department of Health and Human Services (Contract HHSN272201200031C), which established the Malaria Host-Pathogen Interaction Center; the Office of Research Infrastructure Programs/Office of the Director Grant P51OD011132 (to M.R.G.); University of California, Riverside Grant NIFA-Hatch-225935 (to K.G.L.R.); Bill & Melinda Gates Foundation Grant OPP1040938 (to D.A.F.); Medical Research Council Grant MR/K011782/1 (to R.T.); and the University of Texas Health Science Center at San Antonio (E.M.B.).

Footnotes

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Data deposition: Sequence reads have been deposited in the NCBI Sequence Read Archive (accession no. SRP151138).

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1810815116/-/DCSupplemental.

References

  • 1.Adl SM, et al. Diversity, nomenclature, and taxonomy of protists. Syst Biol. 2007;56:684–689. doi: 10.1080/10635150701494127. [DOI] [PubMed] [Google Scholar]
  • 2.WHO 2017 World malaria report 2017. Available at https://www.who.int/malaria/publications/world-malaria-report-2017/en/. Accessed March 15, 2018.
  • 3.Cox-Singh J, et al. Plasmodium knowlesi malaria in humans is widely distributed and potentially life threatening. Clin Infect Dis. 2008;46:165–171. doi: 10.1086/524888. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Beugnet F, Moreau Y. Babesiosis. Rev Sci Tech. 2015;34:627–639. doi: 10.20506/rst.34.2.2385. [DOI] [PubMed] [Google Scholar]
  • 5.Flegr J, Prandota J, Sovičková M, Israili ZH. Toxoplasmosis–A global threat. Correlation of latent toxoplasmosis with specific disease burden in a set of 88 countries. PLoS One. 2014;9:e90203. doi: 10.1371/journal.pone.0090203. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Reid AJ. Large, rapidly evolving gene families are at the forefront of host-parasite interactions in Apicomplexa. Parasitology. 2015;142(Suppl 1):S57–S70. doi: 10.1017/S0031182014001528. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Galinski MR, et al. MAHPIC CONSORTIUM Plasmodium knowlesi: A superb in vivo nonhuman primate model of antigenic variation in malaria. Parasitology. 2018;145:85–100. doi: 10.1017/S0031182017001135. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Deitsch KW, Dzikowski R. Variant gene expression and antigenic variation by malaria parasites. Annu Rev Microbiol. 2017;71:625–641. doi: 10.1146/annurev-micro-090816-093841. [DOI] [PubMed] [Google Scholar]
  • 9.Brugat T, et al. Antibody-independent mechanisms regulate the establishment of chronic Plasmodium infection. Nat Microbiol. 2017;2:16276. doi: 10.1038/nmicrobiol.2016.276. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Yam XY, et al. Characterization of the Plasmodium Interspersed Repeats (PIR) proteins of Plasmodium chabaudi indicates functional diversity. Sci Rep. 2016;6:23449. doi: 10.1038/srep23449. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Cunningham D, Lawton J, Jarra W, Preiser P, Langhorne J. The pir multigene family of Plasmodium: Antigenic variation and beyond. Mol Biochem Parasitol. 2010;170:65–73. doi: 10.1016/j.molbiopara.2009.12.010. [DOI] [PubMed] [Google Scholar]
  • 12.Gardner MJ, et al. Genome sequence of the human malaria parasite Plasmodium falciparum. Nature. 2002;419:498–511. doi: 10.1038/nature01097. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Otto TD, et al. A comprehensive evaluation of rodent malaria parasite genomes and gene expression. BMC Biol. 2014;12:86. doi: 10.1186/s12915-014-0086-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Carlton JM, et al. Comparative genomics of the neglected human malaria parasite Plasmodium vivax. Nature. 2008;455:757–763. doi: 10.1038/nature07327. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Lodes MJ, et al. Serological expression cloning of novel immunoreactive antigens of Babesia microti. Infect Immun. 2000;68:2783–2790. doi: 10.1128/iai.68.5.2783-2790.2000. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Cornillot E, et al. Whole genome mapping and re-organization of the nuclear and mitochondrial genomes of Babesia microti isolates. PLoS One. 2013;8:e72657. doi: 10.1371/journal.pone.0072657. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Silva JC, et al. Genome-wide diversity and gene expression profiling of Babesia microti isolates identify polymorphic genes that mediate host-pathogen interactions. Sci Rep. 2016;6:35284. doi: 10.1038/srep35284. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Lorenzi H, et al. Local admixture of amplified and diversified secreted pathogenesis determinants shapes mosaic Toxoplasma gondii genomes. Nat Commun. 2016;7:10147. doi: 10.1038/ncomms10147. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19. Lekutis C, Ferguson DJ, Grigg ME, Camps M, Boothroyd JC. Surface antigens of Toxoplasma gondii: Variations on a theme. Int J Parasitol. 2001;31:1285–1292. doi: 10.1016/s0020-7519(01)00261-2.
  • 20. Howard RJ, Barnwell JW, Kao V. Antigenic variation of Plasmodium knowlesi malaria: Identification of the variant antigen on infected erythrocytes. Proc Natl Acad Sci USA. 1983;80:4129–4133. doi: 10.1073/pnas.80.13.4129.
  • 21. Eaton MD. The agglutination of Plasmodium knowlesi by immune serum. J Exp Med. 1938;67:857–870. doi: 10.1084/jem.67.6.857.
  • 22. Brown KN, Brown IN. Immunity to malaria: Antigenic variation in chronic infections of Plasmodium knowlesi. Nature. 1965;208:1286–1288. doi: 10.1038/2081286a0.
  • 23. al-Khedery B, Barnwell JW, Galinski MR. Antigenic variation in malaria: A 3′ genomic alteration associated with the expression of a P. knowlesi variant antigen. Mol Cell. 1999;3:131–141. doi: 10.1016/s1097-2765(00)80304-4.
  • 24. Lapp SA, et al. Spleen-dependent regulation of antigenic variation in malaria parasites: Plasmodium knowlesi SICAvar expression profiles in splenic and asplenic hosts. PLoS One. 2013;8:e78014. doi: 10.1371/journal.pone.0078014.
  • 25. Chen Q, et al. Developmental selection of var gene expression in Plasmodium falciparum. Nature. 1998;394:392–395. doi: 10.1038/28660.
  • 26. Brown KN, Hills LA. Antigenic variation and immunity to Plasmodium knowlesi: Antibodies which induce antigenic variation and antibodies which destroy parasites. Trans R Soc Trop Med Hyg. 1974;68:139–142. doi: 10.1016/0035-9203(74)90187-4.
  • 27. Autino B, Corbett Y, Castelli F, Taramelli D. Pathogenesis of malaria in tissues and blood. Mediterr J Hematol Infect Dis. 2012;4:e2012061. doi: 10.4084/MJHID.2012.061.
  • 28. Freitas-Junior LH, et al. Frequent ectopic recombination of virulence factor genes in telomeric chromosome clusters of P. falciparum. Nature. 2000;407:1018–1022. doi: 10.1038/35039531.
  • 29. Lopez-Rubio JJ, Mancio-Silva L, Scherf A. Genome-wide analysis of heterochromatin associates clonally variant gene regulation with perinuclear repressive centers in malaria parasites. Cell Host Microbe. 2009;5:179–190. doi: 10.1016/j.chom.2008.12.012.
  • 30. Ay F, et al. Three-dimensional modeling of the P. falciparum genome during the erythrocytic cycle reveals a strong connection between genome architecture and gene expression. Genome Res. 2014;24:974–988. doi: 10.1101/gr.169417.113.
  • 31. Bunnik EM, et al. Changes in genome organization of parasite-specific gene families during the Plasmodium transmission stages. Nat Commun. 2018;9:1910. doi: 10.1038/s41467-018-04295-5.
  • 32. Rao SS, et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2014;159:1665–1680. doi: 10.1016/j.cell.2014.11.021.
  • 33. Ursu O, et al. GenomeDISCO: A concordance score for chromosome conformation capture experiments using random walks on contact map graphs. Bioinformatics. 2018;34:2701–2707. doi: 10.1093/bioinformatics/bty164.
  • 34. Servant N, et al. HiC-Pro: An optimized and flexible pipeline for Hi-C data processing. Genome Biol. 2015;16:259. doi: 10.1186/s13059-015-0831-x.
  • 35. Imakaev M, et al. Iterative correction of Hi-C data reveals hallmarks of chromosome organization. Nat Methods. 2012;9:999–1003. doi: 10.1038/nmeth.2148.
  • 36. Varoquaux N, Ay F, Noble WS, Vert JP. A statistical approach for inferring the 3D structure of the genome. Bioinformatics. 2014;30:i26–i33. doi: 10.1093/bioinformatics/btu268.
  • 37. Ay F, Bailey TL, Noble WS. Statistical confidence estimation for Hi-C data reveals regulatory chromatin contacts. Genome Res. 2014;24:999–1011. doi: 10.1101/gr.160374.113.
  • 38. Dudchenko O, et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science. 2017;356:92–95. doi: 10.1126/science.aal3327.
  • 39. Jiao WB, et al. Improving and correcting the contiguity of long-read genome assemblies of three plant species using optical mapping and chromosome conformation capture data. Genome Res. 2017;27:778–786. doi: 10.1101/gr.213652.116.
  • 40. Kaplan N, Dekker J. High-throughput genome scaffolding from in vivo DNA interaction frequency. Nat Biotechnol. 2013;31:1143–1147. doi: 10.1038/nbt.2768.
  • 41. Lapp SA, et al. MaHPIC consortium PacBio assembly of a Plasmodium knowlesi genome sequence with Hi-C correction and manual annotation of the SICAvar gene family. Parasitology. 2018;145:71–84. doi: 10.1017/S0031182017001329.
  • 42. Brooks CF, et al. Toxoplasma gondii sequesters centromeres to a specific nuclear region throughout the cell cycle. Proc Natl Acad Sci USA. 2011;108:3767–3772. doi: 10.1073/pnas.1006741108.
  • 43. Gissot M, et al. Toxoplasma gondii chromodomain protein 1 binds to heterochromatin and colocalises with centromeres and telomeres at the nuclear periphery. PLoS One. 2012;7:e32671. doi: 10.1371/journal.pone.0032671.
  • 44. Duan Z, et al. A three-dimensional model of the yeast genome. Nature. 2010;465:363–367. doi: 10.1038/nature08973.
  • 45. Tanizawa H, et al. Mapping of long-range associations throughout the fission yeast genome reveals global genome organization linked to transcriptional regulation. Nucleic Acids Res. 2010;38:8164–8177. doi: 10.1093/nar/gkq955.
  • 46. Stevens TJ, et al. 3D structures of individual mammalian genomes studied by single-cell Hi-C. Nature. 2017;544:59–64. doi: 10.1038/nature21429.
  • 47. Witten DM, Noble WS. On the assessment of statistical significance of three-dimensional colocalization of sets of genomic elements. Nucleic Acids Res. 2012;40:3849–3855. doi: 10.1093/nar/gks012.
  • 48. Lieberman-Aiden E, et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science. 2009;326:289–293. doi: 10.1126/science.1181369.
  • 49. Mirny LA. The fractal globule as a model of chromatin architecture in the cell. Chromosome Res. 2011;19:37–51. doi: 10.1007/s10577-010-9177-0.
  • 50. Eser U, et al. Form and function of topologically associating genomic domains in budding yeast. Proc Natl Acad Sci USA. 2017;114:E3061–E3070. doi: 10.1073/pnas.1612256114.
  • 51. Hehl AB, et al. Asexual expansion of Toxoplasma gondii merozoites is distinct from tachyzoites and entails expression of non-overlapping gene families to attach, invade, and replicate within feline enterocytes. BMC Genomics. 2015;16:66. doi: 10.1186/s12864-015-1225-x.
  • 52. Kishore SP, Perkins SL, Templeton TJ, Deitsch KW. An unusual recent expansion of the C-terminal domain of RNA polymerase II in primate malaria parasites features a motif otherwise found only in mammalian polymerases. J Mol Evol. 2009;68:706–714. doi: 10.1007/s00239-009-9245-2.
  • 53. Ukaegbu UE, et al. Recruitment of PfSET2 by RNA polymerase II to variant antigen encoding loci contributes to antigenic variation in P. falciparum. PLoS Pathog. 2014;10:e1003854. doi: 10.1371/journal.ppat.1003854.
  • 54. Jiang L, et al. PfSETvs methylation of histone H3K36 represses virulence genes in Plasmodium falciparum. Nature. 2013;499:223–227. doi: 10.1038/nature12361.
  • 55. Behnke MS, Zhang TP, Dubey JP, Sibley LD. Toxoplasma gondii merozoite gene expression analysis with comparison to the life cycle discloses a unique expression state during enteric development. BMC Genomics. 2014;15:350. doi: 10.1186/1471-2164-15-350.
  • 56. Radke JB, et al. Transcriptional repression by ApiAP2 factors is central to chronic toxoplasmosis. PLoS Pathog. 2018;14:e1007035. doi: 10.1371/journal.ppat.1007035.
  • 57. Woo YH, et al. Chromerid genomes reveal the evolutionary path from photosynthetic algae to obligate intracellular parasites. eLife. 2015;4:e06974. doi: 10.7554/eLife.06974.
  • 58. Brancucci NMB, et al. Heterochromatin protein 1 secures survival and transmission of malaria parasites. Cell Host Microbe. 2014;16:165–176. doi: 10.1016/j.chom.2014.07.004.
  • 59. Flueck C, et al. A major role for the Plasmodium falciparum ApiAP2 protein PfSIP2 in chromosome end biology. PLoS Pathog. 2010;6:e1000784. doi: 10.1371/journal.ppat.1000784.
  • 60. Ralph SA, Scheidig-Benatar C, Scherf A. Antigenic variation in Plasmodium falciparum is associated with movement of var loci between subnuclear locations. Proc Natl Acad Sci USA. 2005;102:5414–5419. doi: 10.1073/pnas.0408883102.
  • 61. Bensch S, et al. The genome of Haemoproteus tartakovskyi and its relationship to human malaria parasites. Genome Biol Evol. 2016;8:1361–1373. doi: 10.1093/gbe/evw081.

Associated Data

Supplementary Materials

Supplementary File
pnas.1810815116.sapp.pdf (31.5MB, pdf)
Supplementary File
pnas.1810815116.sd01.pdf (49.5MB, pdf)

Data Availability Statement

Sequence reads have been deposited in the NCBI Sequence Read Archive with accession number SRP151138. Hi-C data for P. falciparum trophozoites from a previous study are available from the NCBI Gene Expression Omnibus (GEO) under accession number GSE50199. Hi-C data for P. falciparum gametocytes and sporozoites, as well as P. vivax sporozoites from a previous study (31) are available from the NCBI Sequence Read Archive (SRA) under accession number SRP091967.


Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences
