eLife. 2023 Feb 15;12:e81401. doi: 10.7554/eLife.81401

Antibodies to repeat-containing antigens in Plasmodium falciparum are exposure-dependent and short-lived in children in natural malaria infections

Madhura Raghavan 1, Katrina L Kalantar 2, Elias Duarte 3, Noam Teyssier 1, Saki Takahashi 1, Andrew F Kung 1, Jayant V Rajan 1, John Rek 4, Kevin KA Tetteh 5, Chris Drakeley 5, Isaac Ssewanyana 4,5, Isabel Rodriguez-Barraquer 1,6, Bryan Greenhouse 1,6, Joseph L DeRisi 1,6
Editors: Urszula Krzych7, Dominique Soldati-Favre8
PMCID: PMC10005774  PMID: 36790168

Abstract

Protection against Plasmodium falciparum, which is primarily antibody-mediated, requires recurrent exposure to develop. The study of both naturally acquired limited immunity and vaccine-induced protection against malaria remains critical for ongoing eradication efforts. Towards this goal, we deployed a customized P. falciparum PhIP-seq T7 phage display library containing 238,068 tiled 62-amino acid peptides, covering all known coding regions, including antigenic variants, to systematically profile antibody targets in 198 Ugandan children and adults from high and moderate transmission settings. Repeat elements – short amino acid sequences repeated within a protein – were significantly enriched in antibody targets. While breadth of responses to repeat-containing peptides was twofold higher in children living in the high versus moderate exposure setting, no such differences were observed for peptides without repeats, suggesting that antibody responses to repeat-containing regions may be more exposure-dependent and/or less durable in children than responses to regions without repeats. Additionally, short motifs associated with seroreactivity were extensively shared among hundreds of antigens, potentially representing cross-reactive epitopes. PfEMP1 shared motifs with the greatest number of other antigens, partly driven by the diversity of PfEMP1 sequences. These data suggest that the large number of repeat elements and potential cross-reactive epitopes found within antigenic regions of P. falciparum could contribute to the inefficient nature of malaria immunity.

Research organism: Human, P. falciparum

Introduction

Malaria, a disease caused by the single-celled eukaryotic parasite Plasmodium, led to an estimated 241 million cases and 627,000 deaths in 2020, mostly due to the species Plasmodium falciparum (P. falciparum; WHO Report 2021). Various strategies have been adopted for elimination of malaria, focusing on vector control, chemoprevention and vaccines. In 2021, the World Health Organization (WHO) made its first recommendation for widespread use of a malaria vaccine, RTS,S. While this is an encouraging step, there is nevertheless a need for improvement, as the efficacy of RTS,S is only 30–40% and protection wanes in a few months despite a four-dose regimen (Clinical Trials Partnership, 2015; Olotu et al., 2016). To design more effective vaccines, a deeper understanding of the nature of acquired immunity to malaria is critical.

Natural protection against malaria, particularly protection from uncomplicated malaria and ability to control parasitemia, requires multiple exposures and wanes upon cessation of exposure (Doolan et al., 2009). This naturally acquired immunity develops gradually with age and increasing cumulative exposure to P. falciparum in endemic settings, where adults may obtain substantial protection from disease, and children under 5 face the highest risk of death (Doolan et al., 2009; WHO Report 2020). While a comprehensive understanding of the factors influencing the slow development of immunity is still lacking, certain properties of parasite antigens have been proposed to contribute (Portugal et al., 2013). These include, amongst other properties, antigenic variation, antigens containing repeat elements and cross-reactive epitopes (Anders, 1986; Reeder and Brown, 1996; Schofield, 1991). While antigenic variation has been extensively studied in malaria, a systematic investigation of repeat-containing antigens and cross-reactive epitopes has been lacking.

Repeat elements are regions of a protein in which identical or similar motifs are repeated in tandem or with intervening gaps. Repeat elements are widely prevalent in the proteome of P. falciparum and have been described to be highly immunogenic in a few antigens (Davies et al., 2017), such as the short, linear ‘NANP’ repeats from circumsporozoite protein (CSP) present in the RTS,S and R21 vaccines (Cockburn and Seder, 2018). Due to increased valency, epitopes in repeat elements can behave differently in comparison to the presentation of the same epitope as a single copy and have the potential to alter the nature of the resulting response. For instance, increased valency may lead to increased plasmablast formation by increasing the strength of the antigen-B-cell receptor (BCR) interaction, potentially altering the T-dependent response and inducing a T-independent response (Feldmann and Easten, 1971; Kato et al., 2020; O’Connor et al., 2006; Ochiai et al., 2013; Paus et al., 2006; Schofield, 1991; Schwickert et al., 2011). Although a few repeat antigens in P. falciparum have been well characterized, there has not been a comprehensive investigation of repeat elements with respect to their seroreactivity and their associations with humoral development.

The presence of biochemically similar epitopes can lead to cross-reactivity with antibodies and BCR. While non-identical repeat elements may represent such potential cross-reactive epitopes within a protein, similar epitopes may also be present across different proteins. How the quality of humoral response may be impacted by the presence of cross-reactive epitopes remains largely unexplored, although a study with viral variant antigens points to a frustrated affinity maturation process due to conflicting selection forces from variant epitopes (Wang et al., 2015). A handful of cross-reactive epitopes have been reported in P. falciparum (Wåhlin et al., 1992) and have been proposed to negatively impact the affinity maturation process, although direct evidence is lacking (Anders, 1986). To obtain a deeper understanding of how cross-reactive epitopes influence B cell immunity to malaria, a comprehensive atlas of cross-reactive epitopes across the P. falciparum proteome is first needed.

A systematic proteome-wide investigation of the humoral response to P. falciparum would provide important insights into malaria immunity, including features such as repeat elements and cross-reactive epitopes. Specific technical challenges have impeded progress in this area. Although high-throughput approaches like protein arrays and alpha screens have reached a high coverage of the P. falciparum proteome (Camponovo et al., 2020; Morita et al., 2017), they do not allow for high-resolution characterization of regions within antigenic proteins. In contrast, peptide arrays offer high-resolution antigenic profiling but are inherently limited in the number of targets that can be produced and printed on an array, usually on the order of tens of proteins (Hou et al., 2020; Jaenisch et al., 2019).

Here, we customized a programmable phage display system (PhIP-seq; Larman et al., 2011), previously used for antigenic profiling in many diseases, including autoimmune disorders and viral infections (Mandel-Brehm et al., 2019; Rajan et al., 2021; Vazquez et al., 2020; Zamecnik et al., 2020), for interrogation of the humoral response to P. falciparum infection. We designed a custom library (‘Falciparome’) that features over 238,000 individual 62 amino acid peptides encoded in T7 Phage, tiled every 25 amino acids across all annotated P. falciparum open reading frames from 3D7/IT genomes with additional variant antigenic sequences. Importantly, PhIP-seq leverages advances in next-generation sequencing to effectively convert serological assays into digital sequence counts. Furthermore, programmable phage display allows iterative enrichment, driving a high signal to background ratio with high specificity and sensitivity (O’Donovan et al., 2020).

We performed PhIP-seq with the Falciparome phage library to characterize the targets of the naturally acquired antibody response to P. falciparum at high resolution, leveraging well-defined cohorts composed of 198 Ugandan children and adults from two different transmission settings, and compared these to a large set of anonymous US blood donors. The resulting high-resolution atlas of seroreactive peptides suggests that antibody responses to repeat-containing regions are more exposure-dependent and/or less durable in children, compared to antibody responses to regions without repeats. Further, an extensive presence of potential cross-reactive motifs was identified among antigenic peptides from many proteins highly targeted by the immune system. These results have important implications for understanding the nature of the humoral response in malaria and for future vaccine design.

Results

PhIP-seq was performed on plasma samples selected from two Ugandan cohorts with household level data on entomologic exposure as well as detailed individual characteristics (Table 1). For this study, we selected a single sample from each of 200 age-stratified individuals (children aged 2–11 years and adults) from two different sites in Uganda: Tororo, a region which had very high malaria transmission at the time of sampling (annual entomological inoculation rate [EIR] - 49 infective bites per person) and Kanungu, a region of moderately high transmission (EIR - 5 infective bites per person; Kamya et al., 2015). While the majority of individuals from Tororo were positive for infection at the time of sampling, those from Kanungu were sampled at a median of 100 days after their previous infection. We have previously shown that children acquire clinical immunity to malaria more rapidly in Tororo than Kanungu, consistent with higher rates of exposure (Rodriguez-Barraquer et al., 2018), and that adults at both sites have substantial immunity (Rek et al., 2016).

Table 1. Characteristics of the Ugandan cohorts.

| Region | Age group (yrs) | No. of people | Proportion positive for infection at the time of sample collection | Time since last infection (days), median (IQR) | Incidence of symptomatic malaria per year, median (IQR) | Household annual EIR* (infective bites/person), median (IQR) |
| --- | --- | --- | --- | --- | --- | --- |
| Tororo | 2–3 | 10 | 0.5 | 18.5 (0, 85) | 5.8 (2.9, 7.7) | 56 (33, 148) |
| Tororo | 4–6 | 30 | 0.66 | 0 (0, 45) | 3.6 (2.6, 4.8) | 59 (38, 84) |
| Tororo | 7–11 | 30 | 0.63 | 0 (0, 45) | 2.3 (2, 4.3) | 46 (30, 110) |
| Tororo | >18 | 30 | 0.7 | 0 (0, 45) | 1.2 (0.9, 1.6) | 49 (35, 94) |
| Kanungu | 2–3 | 10 | 0.1 | 155 (61, 190) | 1.7 (0.9, 2) | 4.3 (4, 14) |
| Kanungu | 4–6 | 30 | 0.2 | 114 (43, 289) | 1.5 (0.7, 2.3) | 7.3 (4.5, 15) |
| Kanungu | 7–11 | 30 | 0.13 | 121 (41, 263) | 1.5 (0.6, 2) | 5.2 (4, 7) |
| Kanungu | >18 | 30 | 0.2 | 109 (38, 223) | 1.1 (0.8, 1.3) | 6.8 (4.8, 15.4) |

* EIR – Entomological Inoculation Rate.

Falciparome library design

We constructed a T7 phage display library, referred to as the Falciparome, programmed to display the entire proteome of P. falciparum in 62-amino acid peptides with a 25-amino acid step size, resulting in a 37-amino acid overlap between adjacent peptides (Methods, Figure 1). The complete design files are available at Dryad doi:10.7272/Q69S1P9G and the protocol at 10.17504/protocols.io.j8nlkkrr5l5r/v1. Overall, the library includes 238,068 peptides from 8980 protein sequences, including all known protein sequences from 2 reference strains (3D7 and IT) and extensive diversity of variant sequences from key antigens including PfEMP1s, RIFINs and STEVORs (Table 2, Figure 1—figure supplement 1, Methods). Of these, 223,145 peptides from 7577 proteins are from P. falciparum. Greater than 99.5% of the programmed peptides were represented in the final packaged phage library with relatively uniform distribution of abundance, with 90% of the peptides within a 16-fold difference (Figure 2—figure supplement 1).
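
The tiling scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' library-design pipeline: the function names are hypothetical, and the handling of the protein C-terminus (adding one final window anchored at the last residue) is an assumption, since the text here does not say how sequence ends were treated.

```python
def tile_protein(sequence, peptide_len=62, step=25):
    """Split a protein sequence into overlapping peptides.

    Windows start every `step` residues. A final window anchored at the
    C-terminus is added so that trailing residues are still covered
    (an assumption; the source does not describe end handling).
    """
    if len(sequence) <= peptide_len:
        return [sequence]
    last_start = len(sequence) - peptide_len
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)
    return [sequence[s:s + peptide_len] for s in starts]


def tile_overlap(peptide_len=62, step=25):
    """Number of residues shared by two adjacent regular tiles."""
    return peptide_len - step
```

With the parameters from the text, `tile_overlap()` gives the 37-residue overlap, so any linear epitope shorter than that is fully contained in at least one regular tile.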

Figure 1. PhIP-seq overview and analysis pipeline.

Falciparome phage library displays the proteome of Plasmodium falciparum in 62-aa peptides with 25-aa step size on T7 phage and also includes variant sequences of many antigens, including major Variant Surface Antigens (VSA). PhIP-seq was performed with incubation of Falciparome library with human plasma, followed by IP of antibodies in the sample and enrichment of antibody binding phage. Two rounds of enrichment were performed and enriched phage were sequenced to obtain the identity of the encoded peptides. A filtering pipeline was then used to identify seroreactive peptides specific to the malaria cohort.

Figure 1—figure supplement 1. Pipeline for library construction.

Input sequences of different groups were filtered with CD-HIT to remove similar sequences with more than the indicated % identity in Table 2. The filtered sequences were then processed into peptides using the peptide processing pipeline and quality checks were performed as described in NT sequence verification.

Table 2. Composition of Falciparome phage library.

| Input sequences before collapsing on similarity | Identity threshold for collapsing by CD-HIT | No. of final collapsed protein sequences |
| --- | --- | --- |
| P. falciparum reference proteome: 3D7, IT (10,771 total) | 99% | 6372 |
| P. falciparum variant sequences: PfEMP1 (431 from 3D7, IT, IGH, RAJ116, PFCLIN, IT4, DD2 genomes); RIFIN (all 3D7+IT); STEVOR (all 3D7+IT); SURFIN (all 3D7+IT + 15); AMA1 (2); CSP (6); MSPDBL1 (6); MSPDBL2 (5); PfMC2TM (all 3D7+IT) | 100% (90% for CSP) | 1205 |
| Other variants: P. reichenowi PfEMP1 (PFREICH); Anopheles - CE5 (5), SG6 (5) | | |
| Anopheles salivary proteins: 53 proteins from 19 Anopheles species as described in Figure 1 of Arcà et al., 2017 | 98% | 708 |
| Vaccine/Viral/Toxin sequences: Tetanus; Diphtheria; Pertussis; EBV; Measles; Mumps; Rubella; Polio; RotoAB | 98% (90% for RotoAB) | 684 |
| Laboratory positive controls: GFAP; GFP; Gephyrin; MYC, NR1; Tubulin (alpha/beta) | 98% | 11 |
| Total proteins | | 8,980 |
| Total peptides | | 238,068 |

PhIP-seq using the Falciparome library robustly identifies peptides that differentiate individuals in malaria endemic areas from US controls

PhIP-seq was performed with Falciparome on less than 1 µl plasma from each of the 200 Ugandan cohort samples, and 86 samples from the New York Blood Center (US controls) were run for non-specific background correction, assuming most were unlikely to have been exposed. Two rounds of enrichment were performed. The scalability of the assay allowed for high-throughput processing of all 286 samples in replicates. The high correlation observed between technical replicates (Pearson r: median (IQR)=0.96 (0.92–0.98)) indicated that the technique was highly reproducible (Figure 2—figure supplement 2). Prior to any filtering, a clean separation of Ugandan samples and US controls was observed (Figure 2—figure supplement 2). Furthermore, expected target peptides were enriched in a sample-specific manner: PhIP-seq with a polyclonal control antibody (α-GFAP) strongly enriched for GFAP peptides, and seroreactivity against a common virus, Epstein-Barr virus (EBV), was higher across all human samples than in the control α-GFAP experiment (Figure 2—figure supplement 3). Two Ugandan samples were dropped due to low quality data, resulting in 198 Ugandan samples for further analysis.
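
To make the replicate comparison concrete, the sketch below scales raw peptide read counts to reads per 500,000 (the RP5K unit used in the figure supplements) and computes a Pearson correlation between two technical replicates. The function names and input layout are illustrative assumptions; the authors' exact depth-adjustment procedure is described in their Methods and is not reproduced here.

```python
import math


def reads_per_500k(counts):
    """Scale raw per-peptide read counts so that a sample sums to 500,000."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [c * 500_000 / total for c in counts]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)
```

Normalizing before correlating keeps differences in sequencing depth between replicates from being mistaken for differences in enrichment.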

A stringent analysis pipeline was implemented to identify malaria-specific enriched peptides (‘seroreactive peptides’) while minimizing the potential for false positives. A peptide was called seroreactive if its read counts were enriched relative to US controls (Z-score ≥ 3 in both replicates of a given sample) and this enrichment was present in at least five Ugandan samples (Materials and methods, Figure 1, Figure 2—figure supplement 4). Using this conservative approach, a total of 9927 peptides were identified as seroreactive across all samples, representing the identified targets of antibodies in this study (Supplementary file 1).
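
The two-part filter can be sketched as follows. This is a simplified illustration under stated assumptions: the Z-score is computed per peptide against the mean and sample standard deviation of the US control values, the data structures (`replicates`, `controls`) and function names are hypothetical, and any additional steps in the authors' pipeline are omitted.

```python
from statistics import mean, stdev


def z_score(value, control_values):
    """Z-score of one normalized count relative to the control distribution."""
    mu = mean(control_values)
    sd = stdev(control_values)
    if sd == 0:
        # Degenerate controls: treat any value above the mean as enriched.
        return float("inf") if value > mu else 0.0
    return (value - mu) / sd


def seroreactive_peptides(replicates, controls, z_min=3.0, min_samples=5):
    """Return {peptide: set of sample ids} for peptides passing both filters.

    replicates: {sample_id: (rep1, rep2)}, each rep a {peptide: count} dict.
    controls:   {peptide: [counts across control samples]}.
    """
    hits = {}
    for sample_id, (rep1, rep2) in replicates.items():
        for peptide, ctrl in controls.items():
            z1 = z_score(rep1.get(peptide, 0.0), ctrl)
            z2 = z_score(rep2.get(peptide, 0.0), ctrl)
            if z1 >= z_min and z2 >= z_min:  # enriched in both replicates
                hits.setdefault(peptide, set()).add(sample_id)
    # Keep only peptides enriched in at least `min_samples` samples.
    return {p: s for p, s in hits.items() if len(s) >= min_samples}
```

Requiring agreement between replicates and recurrence across samples trades some sensitivity for specificity, which matches the conservative intent stated in the text.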

Overview of the malaria-specific seroreactive peptides identified with PhIP-seq

The 9927 seroreactive peptides identified by the pipeline were derived from 1648 parasite proteins (‘seroreactive proteins’) and antigenic variants, representing approximately 30% of the 5400-member proteome of P. falciparum. Many of these peptides showed broad seroreactivity across pediatric and adult Ugandan samples but no or rare seroreactivity in US controls (Figure 2a). The number of peptides enriched (‘breadth’) in children from moderate transmission settings was less than half of that in children in high transmission settings or adults in either setting (Figure 2b), an observation that we examine in greater detail below.

Figure 2. PhIP-seq with Falciparome captures known, novel antigens and relationships between age, exposure and breadth of seroreactive regions.

(a) Heatmap of Z-score enrichment over US controls for seroreactive peptides (rows) with >10% seropositivity across different age groups in the moderate and high exposure cohorts. Peptides are sorted by protein name and samples (columns) are ordered by increasing age in each group. Examples of well-characterized (black labels) as well as under-characterized/novel (blue labels) antigens in Plasmodium falciparum identified with this approach are indicated. (b) Breadth of antibody reactivity, shown as number of seroreactive peptides in each person. Dotted red line and red text indicate median breadth for each population group. Children from the moderate transmission setting had significantly lower breadth than children from the high transmission setting as well as all adults (KS test p-value <0.05). (c) Number of proteins identified as seroreactive in this study that are specific to different stages. Stage classification is based on proteomic datasets in PlasmoDB (spectral count ≥ 1 for at least 1 peptide in a protein in a given stage is counted as expression) and shows enrichment of proteins from all life stages of Plasmodium falciparum in the human host. (d) Breadth of VSA reactivity, shown as number of variant proteins of RIFINs, STEVORs, and PfEMP1s seroreactive per person. In the moderate transmission setting, children had a significantly lower breadth than adults for PfEMP1 and both age groups poorly recognized RIFINs and STEVORs. In contrast, in the high transmission setting, children had a significantly (* KS test p-value <0.05) higher breadth than adults for all three families.

Figure 2—source data 1. GO analysis of top seroreactive proteins.

Figure 2—figure supplement 1. Histogram of read counts of Falciparome phage library.

Read counts corresponding to the 5th and 95th percentile in the distribution (indicated in blue) are within a 16-fold difference. Cumulative density plot of the distribution is shown in red.

Figure 2—figure supplement 2. Technical replicates are well correlated.

Top - Pearson correlation matrix of depth-adjusted read counts across all samples. Technical replicates are placed symmetrically on rows and columns. Bottom three - Representative scatter plots of reads per 500,000 (RP5K) of technical replicates of samples from Tororo, Kanungu and US.

Figure 2—figure supplement 3. Target peptides are enriched in a sample-specific manner.

Top panel - PhIP-seq with polyclonal anti-GFAP enriches for GFAP peptides and enrichment is specific to IP with anti-GFAP, but is observed rarely in the Ugandan cohort and US controls. Left - Scatter plot of Reads Per 500,000 (RP5K) of technical replicates of an IP with anti-GFAP. GFAP peptides are in red. Right – Heat map of RP5K of GFAP peptides (rows) in different samples (columns). Bottom panel - Heat map of RP5K of top 10 Epstein-Barr virus (EBV) peptides (rows) with highest read counts in human samples. Enrichment is observed across Ugandan and US samples, but not in the IP with anti-GFAP.

Figure 2—figure supplement 4. Moving threshold analysis to determine optimal thresholds for calling peptides as seroreactive based on minimum Z-score and enrichment in a minimum number of samples in a group.

Box plots of resultant number of seroreactive peptides for corresponding thresholds are shown for Ugandan samples and US controls. The final thresholds for calling seroreactivity were selected based on minimizing the number of peptides identified as seroreactive in the US controls and is indicated by the red box.

Figure 2—figure supplement 5. Breadth of non-redundant seroreactive peptide groups per person across age and exposure.

All seroreactive peptides in each person were collapsed based on sequence similarity (sharing of 7mer identical motifs). The resulting number of non-redundant groups was used as a measure of conservative non-shared breadth. Children from the moderate transmission setting had a significantly lower breadth than children from the high transmission setting and all adults. * indicates p-value <0.05 by KS-test. Median for each group is labeled on the side of the box.

Figure 2—figure supplement 6. Breadth of seroreactivity in the variable regions of RIFIN and PfEMP1.

Top - Box plot of number of domain variants seroreactive in the variable region V2 of RIFINs. Significantly different groups (KS test <0.05) are marked with an *. Bottom - Heatmap of proportion of variants from the library that are seroreactive in a given person for each PfEMP1 domain. Each column is a person. Schematic of domain structure of PfEMP1 is shown below the heatmap.

The 1648 seroreactive proteins identified here have reported expression across the lifecycle stages occurring in the human – sporozoite, asexual, and sexual blood stages (Figure 2c). Although liver stage proteomic P. falciparum datasets were not available for comparison, several known liver stage antigens in the dataset (LSA1, LSA3, etc.) were detected. Notably, very few of the proteins were exclusively detected in the mosquito oocyst stage. Among the 40 seroreactive proteins with the highest seropositivity (percent of people enriching for at least one seroreactive peptide in that protein), protective antibodies have been reported for 20 of them (Supplementary file 2). Moreover, as expected, and consistent with previous studies, the top seroreactive proteins (Supplementary file 3) were enriched for those at the host-parasite interface (GO analysis – Figure 2—source data 1).
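
The protein-level seropositivity used above (percent of people enriching for at least one seroreactive peptide in a protein) can be computed as in the following sketch. The input structures and names are hypothetical and chosen only for illustration.

```python
def protein_seropositivity(hits_by_person, peptide_to_protein):
    """Percent of people with at least one seroreactive peptide per protein.

    hits_by_person:     {person_id: set of seroreactive peptide ids}
    peptide_to_protein: {peptide_id: protein_id}
    """
    n_people = len(hits_by_person)
    if n_people == 0:
        return {}
    counts = {}
    for peptides in hits_by_person.values():
        # Each person counts once per protein, however many peptides hit.
        proteins = {peptide_to_protein[p] for p in peptides if p in peptide_to_protein}
        for protein in proteins:
            counts[protein] = counts.get(protein, 0) + 1
    return {protein: 100.0 * c / n_people for protein, c in counts.items()}
```

Counting each person once per protein prevents long or repeat-rich proteins, which contribute many tiled peptides, from inflating the measure.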

The proteins identified here overlapped substantially with antigenic proteins identified in previous protein array screens (28%, 49%, and 44% of those reported in Camponovo et al., 2020; Crompton et al., 2010; Helb et al., 2015, respectively). However, this whole-proteome approach also identified 952 proteins not identified in the above studies. Antigens identified in previous studies may not have been enriched here because of a known limitation of PhIP-seq: it detects predominantly linear epitopes, as opposed to conformational epitopes, which would account for loss of sensitivity with respect to particular proteins. In addition, prior studies were performed in different populations and may include false positives, for example by not using a large number of unexposed human samples to account for non-specific cross-reactivity. Furthermore, in vitro protein production for arrays may not guarantee full-length protein or correct folding.

Expected and new relationships between age, exposure, and breadth of seroreactive regions captured at high resolution by Falciparome

Since our cohort was stratified by age and exposure, we next set out to investigate how the overall breadth of seroreactive regions varied with age and exposure. Breadth was evaluated in two ways: (i) the total number of seroreactive peptides per person and (ii) the number of non-redundant seroreactive peptide groups in each person. The latter was calculated to minimize redundant counting of potential shared linear epitopes between seroreactive peptides due to the tiled nature of the library as well as common sequences across peptides (Materials and methods). Breadth increased with age in both settings, occurring more rapidly in the higher transmission setting such that children reached a breadth similar to that of adults by age 11 (Figure 2b, Figure 2—figure supplement 5). As a result, children in the higher transmission setting had a significantly higher breadth than children in the moderate transmission setting. In contrast, adults in both settings had comparable breadth. Overall, these results are consistent with expected expansion of the repertoire of antibody targets with recurrent exposure to P. falciparum (Crompton et al., 2010; Helb et al., 2015).
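
One way to count non-redundant peptide groups is to link any two peptides that share an identical 7-mer and count the resulting connected groups, as sketched below with a small union-find structure. The 7-mer criterion comes from the legend of Figure 2—figure supplement 5; the transitive (single-linkage) grouping and all names are assumptions, and the authors' exact collapsing procedure is given in their Materials and methods.

```python
def kmers(sequence, k=7):
    """Set of all length-k substrings of a peptide sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def nonredundant_groups(peptides, k=7):
    """Group peptides that share at least one identical k-mer (transitively)."""
    parent = list(range(len(peptides)))

    def find(i):
        # Path-halving find for the union-find structure.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_owner = {}  # k-mer -> index of the first peptide containing it
    for idx, seq in enumerate(peptides):
        for kmer in kmers(seq, k):
            if kmer in first_owner:
                root_a, root_b = find(idx), find(first_owner[kmer])
                if root_a != root_b:
                    parent[root_a] = root_b
            else:
                first_owner[kmer] = idx

    groups = {}
    for idx, seq in enumerate(peptides):
        groups.setdefault(find(idx), []).append(seq)
    return list(groups.values())
```

The conservative breadth for a person is then `len(nonredundant_groups(peptides))`, so adjacent tiles or repeat-bearing peptides that carry the same motif are counted once.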

Variant surface antigen (VSA) families are highly diverse, multi-member gene families in P. falciparum that are expressed on the surface of host erythrocytes and facilitate important functions of the parasite (Niang et al., 2014; Reeder and Brown, 1996; Saito et al., 2017; Tan et al., 2015; Xie et al., 2021). Expression of these genes is typically limited to one or a few members of each family per parasite, presumably to evade the host immune system. Multiple variants from three VSA families were represented in the library - PfEMP1s (431 members from seven strains), RIFINs (157 3D7+118 IT), and STEVORs (32 3D7+32 IT variants), and the breadth of seroreactive variants was investigated across age and exposure based on the number of variant proteins in each family to which the VSA seroreactive peptides belonged in each person (Materials and methods; Figure 2d, Figure 2—figure supplement 6). In the moderate transmission setting, adults had a significantly higher breadth of PfEMP1 variants recognized than children (fold increase in median breadth in adults over 4–6 and 7–11 year-old children: 1.69; KS-test p-value <0.05), suggesting an age and/or cumulative exposure-dependent increase in PfEMP1 breadth in this setting, as previously observed in Cham et al., 2009. We note that the majority of seroreactivity in the moderate setting was present in the conserved ATS domain of PfEMP1, as opposed to the variable domains (Figure 2—figure supplement 6). Because sequences of variable domains may differ between the PhIP-seq library and the parasites to which cohort members have been exposed, this may result in reduced sensitivity for these domains. On the other hand, both children and adults in this setting poorly recognized RIFINs and STEVORs.

In contrast, in the high transmission setting, children had a significantly higher breadth of variants recognized than adults for all three VSAs. Children of 2–6 years had the broadest responses to RIFINs (including in the variable V2 region) and STEVORs (fold increase in median breadth in 2- to 6-year-old children over adults for RIFINs: 17; KS-test p-value <0.05, for STEVORs: 21; KS-test p-value <0.05) and children of 4–11 years to PfEMP1s (including in the variable DBL domains; fold increase in median breadth in 4- to 11-year-old children over adults for PfEMP1: 2.13; KS-test p-value <0.05), suggesting a decline in responses to variants as children develop into adults in this setting. This is consistent with observations from a previous study investigating antibody responses to PfEMP1 DBLα domains in Papua New Guinea (Barry et al., 2011). The loss of VSA breadth in adults in the high transmission setting could be due to various reasons, including a decline in antibody levels to VSA variants due to reduced antigenic exposure, as adults have a lower parasitic load than children in this setting, or a shift in the focus of the immune response to less variable targets. It may also be possible that the decrease in VSA breadth in adults reflects a transition from recognition of linear epitopes to conformational epitopes, which may not be detected in this assay (or prior microarray assays).

Tiled design of library facilitates high-resolution characterization of seroreactive proteins

The short length and tiling design of the peptides in this library facilitated high-resolution characterization of antigenic regions within seroreactive proteins. Representative examples from previously characterized proteins, such as Falciparum Interspersed Repeat Antigen (FIRA), Circumsporozoite Protein (CSP) and Liver Stage Antigen 3 (LSA3), are shown (Figure 3a), where known epitopes consisted of short amino-acid motifs repeated multiple times within the proteins (‘repeat elements’). Comparison with a previous study using a high-density linear peptide array covering a subset of antigens showed substantial overlap of the seroreactive regions within these antigens (Jaenisch et al., 2019; Figure 3—figure supplement 1), although some differences were apparent. Differences in the length of peptides as well as the nature of display (linear 15-aa peptides on an array versus phage display of 62-aa peptides) are potential explanations for these discrepancies.

Figure 3. Tiled design of library facilitates high resolution characterization of seroreactive regions.

(a) Examples of previously well-characterized antigens and (b) novel/previously under-characterized antigens identified in this dataset. Average percentage of people seropositive at each residue (seropositivity) based on signal from peptides spanning it are shown for each protein for different groups in the cohort. The magnitude of exposure- and age-related differences in seropositivity varies by individual protein and even within different regions of specific proteins. Reddish bars underneath each protein represent repeat elements and blue bars represent examples of regions encompassing targets of protective antibodies described in previous studies. Snapshots of sequences of repeat elements present in a protein are represented beneath the protein.

Figure 3—figure supplement 1. Comparison of high-resolution localization of seroreactive regions identified in this study with regions identified through a peptide-array approach.

Location of seroreactive peptides identified in this dataset (red bar) and seroreactive 15-mer peptides identified using a high-density peptide array (black bar) in Jaenisch et al. (peptides with p-value <0.05 in (-) samples [malaria low parasitemia samples from Burkina Faso] over C [control - European samples]) for 12 vaccine candidates in that study. Average seropositivity per residue observed for moderate and high transmission samples in our study is also plotted.

Importantly, high-resolution maps of seroreactivity for over 1000 proteins were characterized for the first time in our dataset (Figure 3b). A notable example is PHISTc (PF3D7_0801000), which has previously been described as an antigenic protein, but not dissected at high resolution (Baum et al., 2013; Dent et al., 2015). It is exported from the parasite during the asexual blood stage and has unknown function, although mildly protective antibodies have been described against the N-terminal segment (Nagaoka et al., 2021). Another example is RON4 (Rhoptry Neck Protein 4), which is part of the moving junction during merozoite invasion of the host (Morahan et al., 2009) and is also critical for sporozoite invasion of hepatocytes (Giovannini et al., 2011).

Beyond the overall breadth of seroreactive peptides, the dataset facilitated a high-resolution lens for investigating the effect of age and exposure on seroreactivity to individual proteins. For instance, as expected (Kazmin et al., 2017), we observed exposure-dependent seropositivity at the B-cell epitope in CSP targeted by the RTS,S vaccine (NANP repeating sequence; Figure 3a). The magnitude of exposure- and age-related differences in proportion seropositive varied by individual protein and even within different regions of specific proteins (Figure 3, Supplementary file 4), highlighting the importance of dissecting responses to different antigenic regions within seroreactive proteins.

Seroreactive proteins contain more repeat elements than non-seroreactive proteins

A prominent feature that stood out following high-resolution characterization of seroreactive regions was the presence of repeat elements, where identical or similar motifs were repeated in tandem or with gaps within a given protein (Figure 3). Previous studies focused on individual or targeted subsets of antigens in P. falciparum have highlighted the immunogenic nature of short amino acid repeat sequences (Davies et al., 2017). The proteome of P. falciparum is notoriously rich in such sequences; however, their functions have remained enigmatic and their properties have been difficult to characterize. To systematically investigate the association of seroreactivity with these elements, repeats throughout all coding sequences were first identified using RADAR (Rapid Automatic Detection and Alignment of Repeats) (Madeira et al., 2019) and then compared between PhIP-seq seroreactive and non-seroreactive proteins. The number of repeats in each protein sequence was significantly higher in the seroreactive proteins than in the non-seroreactive proteins (median number of repeats per protein: seroreactive proteins – 20; non-seroreactive proteins – 6; p-value <0.001, based on 1000 iterations [1636 proteins per iteration] of random sampling of the non-seroreactive set; Figure 4a).
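The random-sampling comparison described above can be illustrated with a minimal Python sketch. It assumes that the statistic compared in each iteration is the median repeat count and that the p-value is the fraction of random subsets reaching the observed median; neither detail is stated in the text, and the function name, the add-one correction, and the toy numbers are hypothetical.

```python
import random
from statistics import median


def resampling_p_value(hit_counts, background_counts, n_iter=1000, seed=0):
    """Empirical one-sided p-value for 'the hit set has a higher median'.

    hit_counts        -- repeats per protein in the seroreactive set
    background_counts -- repeats per protein in the non-seroreactive set
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = median(hit_counts)
    size = min(len(hit_counts), len(background_counts))
    at_least_as_high = 0
    for _ in range(n_iter):
        # Draw a background subset of the same size, without replacement.
        subset = rng.sample(background_counts, size)
        if median(subset) >= observed:
            at_least_as_high += 1
    # Assumed add-one correction: keeps the estimate above zero when no
    # random subset reaches the observed median.
    return (at_least_as_high + 1) / (n_iter + 1)


# Toy numbers only, not values from the study.
hits = [20, 25, 18, 30, 22]
background = list(range(13)) * 10  # repeat counts between 0 and 12
p = resampling_p_value(hits, background)
```

With 1000 iterations, the smallest p-value this scheme can report is about 0.001, which is in line with the threshold quoted in the text.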

Figure 4. Repeat elements are more enriched in seroreactive peptides within seroreactive proteins than non-seroreactive peptides.

(a) Distribution of cumulative frequency of repeat elements per protein is significantly higher (KS test p-value <0.05) in the seroreactive protein set than a randomly sampled subset of non-seroreactive proteins (1000 iterations). (b) Pipeline to compute the representation of repeats in each peptide as repeat index. (c) Distribution of repeat indices is significantly higher (KS test p-value <0.05) in seroreactive peptides than a randomly sampled subset of non-seroreactive peptides within seroreactive proteins (1000 iterations). Distribution of repeat indices also significantly increases with increase in seropositivity (KS test p-value <0.05 between all successive distributions). (d) Seropositivity of all peptides (dots) colored by their repeat indices in the top 9 most seropositive repeat-containing proteins shows enrichment of repeat elements in peptides with high seropositivity.

Figure 4—figure supplement 1. Distribution of repeat indices of seroreactive and non-seroreactive peptides within hit proteins for different lengths and degeneracy of the repeating motif.

Left three: Conservative substitutions ([GA],[ST],[DE],[NQ],[RHK],[LVI],[YFW]) are allowed at all positions in the motif. Right three: Identical residues at all positions in the motif. For all six methods of defining repeats, all seroreactive regions were significantly different from the non-seroreactive set (p<0.01 based on 1000 random samplings of the non-seroreactive set).

Seroreactive peptides, within seroreactive proteins, contain more repeat elements than non-seroreactive peptides

Next, we investigated if seroreactive regions within seroreactive proteins were enriched for repeat elements. Because the Falciparome is composed of overlapping peptides tiled across each gene, the contribution of individual peptide sequences within each seroreactive protein can be further classified into those that are seroreactive vs. those from the same protein that are non-seroreactive. This enables a comparison of repeat elements among seroreactive and non-seroreactive peptides within each protein sequence.

To accomplish this, a k-mer approach was used to characterize repeat elements (Figure 4b, Methods). Briefly, the frequency of all biochemically similar k-mers of sizes 6–9 aa (approximately the size of a linear B-cell epitope) was calculated for each protein. Then, each peptide in the protein was assigned a repeat index based on the maximum intra-protein frequency of any repeat element it encompassed. To minimize redundant representation, multiple peptides from a given protein deriving their repeat indices from the same repeat element were collapsed such that a repeat element was represented only once for each protein (Figure 4b). In this manner, the set of all 5171 non-VSA seroreactive peptides was collapsed based on their repeat elements to a set of 3091 non-redundant seroreactive peptides. The non-seroreactive peptides within each seroreactive protein were also collapsed similarly.
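As an illustration of the repeat index, the following Python sketch counts k-mers within a protein and assigns each peptide the highest count among the k-mers it contains. It is one plausible reading of the description above, not the pipeline used in the study (see Methods for that): biochemical similarity is approximated by mapping residues onto the conservative substitution groups listed in Figure 4—figure supplement 1, overlapping windows are counted, and all function names and the toy sequence are hypothetical.

```python
from collections import Counter

# Conservative substitution groups from Figure 4-figure supplement 1.
CONSERVATIVE_GROUPS = ["GA", "ST", "DE", "NQ", "RHK", "LVI", "YFW"]
# Map every residue in a group to the group's first letter (assumed encoding).
REDUCE = {aa: group[0] for group in CONSERVATIVE_GROUPS for aa in group}


def reduce_sequence(seq):
    """Collapse biochemically similar residues onto one symbol."""
    return "".join(REDUCE.get(aa, aa) for aa in seq)


def kmer_counts(protein, k=7):
    """Count every overlapping k-mer of the reduced protein sequence."""
    reduced = reduce_sequence(protein)
    return Counter(reduced[i:i + k] for i in range(len(reduced) - k + 1))


def repeat_index(peptide, counts, k=7):
    """Highest intra-protein frequency of any k-mer found in the peptide."""
    reduced = reduce_sequence(peptide)
    kmers = [reduced[i:i + k] for i in range(len(reduced) - k + 1)]
    return max((counts[kmer] for kmer in kmers), default=0)


# Toy protein: five NANP units between non-repetitive flanks.
toy_protein = "MCPM" + "NANP" * 5 + "MCPM"
counts = kmer_counts(toy_protein)
index_in_repeat = repeat_index("NANPNANP", counts)
index_in_flank = repeat_index("MCPMNAN", counts)
```

Under this encoding a peptide carrying a conservative variant of the repeat (for example, Q in place of N) receives the same index as the exact repeat, which corresponds to the "substitutions allowed at all positions" setting of the supplement.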

Overall, seroreactive peptides yielded significantly higher repeat indices than non-seroreactive peptides from seroreactive proteins, and this trend was more pronounced as a function of seropositivity (Figure 4c). The median repeat index for non-seroreactive peptides was 1, while the median index for >10% and >40% seropositivity was 3 and 13, respectively, for a kmer of size 7 (KS test p-value <0.05 between successive distributions). These results suggest that seroreactive peptides are dominated by repeat elements and those with higher seropositivity also have progressively higher repeat indices. Examination of individual proteins, including well characterized repeat-containing antigens such as FIRA, LSA1, LSA3, MESA, and GLURP, illustrates the relationship between seropositivity and repeat index (Figure 4d). This relationship was consistently observed, regardless of kmer size from 6 to 9 aa, and was insensitive to the level of degeneracy or biochemical similarity used for determining repeat matches (Figure 4—figure supplement 1). However, the presence of a repeat element within any given peptide does not necessarily imply that the peptide will be seroreactive.

Taken together, these data indicate that seroreactive proteins tend to be repeat-containing proteins, and within these proteins, the individual seroreactive peptides tend to be those that contain the repeats. Furthermore, seroreactive regions that are shared widely among individuals tend to feature higher numbers of repeat elements.

Seropositivity is more exposure-dependent and short-lived in children for peptides containing repeat elements than those without repeats

To investigate whether the breadth of seroreactive repeat-containing peptides differed depending on exposure setting and age, seroreactive peptides were first binned into two categories: those with repeats, and those without. Specifically, seroreactive peptides with a 7-mer repeat index ≥ 3 were binned together as ‘repeat-containing peptides’ and those with a 7-mer repeat index ≤ 2 were binned as ‘non-repeat peptides’. For the set of non-repeat peptides, breadth (number of non-repeat peptides enriched per person) was significantly higher in adults than children in both exposure settings (percent increase in median breadth in adults over 4- to 6-year-old children: moderate setting – 28%; high setting – 20%; Figure 5a). However, within each age group, there was no significant difference in breadth between the two exposure settings.

Figure 5. Breadth of seroreactive repeat-containing peptides, but not non-repeat peptides, increases with exposure in children.

(a) Breadth of seroreactive non-repeat peptides per person is not significantly different (KS-test p-value >0.05) between the two exposure settings within each age group. (b) Breadth of seroreactive repeat-containing peptides per person is significantly higher (KS-test p-value <0.05) in the high exposure setting than in the moderate exposure setting within the three groups in children, but not adults.

Figure 5—figure supplement 1. Breadth of repeat-containing peptides per person using different repeat index thresholds for categorizing repeat-containing peptides.

Age groups showing significant difference between the two transmission settings are marked by * based on a KS-test p-value <0.05.
Figure 5—figure supplement 2. Seropositivity of individual seroreactive repeat elements increases with exposure in children, but not adults.

Each dot represents a seroreactive repeat element and seropositivity for the repeat element in a given group was calculated as the percent of people in that group enriching for any seroreactive peptide with that repeat element.
Figure 5—figure supplement 3. Controlling for time since infection status, breadth of seroreactive repeat-containing peptides, but not non-repeat peptides, still shows an increase with exposure in children.

Groups showing significant difference between the two transmission settings are marked by * based on a KS-test p-value <0.05.
Figure 5—figure supplement 4. Breadth of seroreactive repeat-containing peptides, but not non-repeat peptides, wanes with increased time since infection in the moderate exposure setting in children.

Groups showing significant differences are marked by * based on a KS-test p-value <0.05.

For repeat-containing seroreactive peptides, breadth was calculated as follows. Each repeat-containing seroreactive peptide was defined by the 7-mer (repeat element) that was used to calculate its repeat index as described above. To avoid redundant counting, all repeat-containing peptides from a given protein defined by the same repeat element were collapsed and counted only once. Similar to non-repeat peptides, breadth of these peptides was higher in adults than children, reaching a similar level in both exposure settings (percent increase in median breadth in adults over 4- to 6-year-old children: moderate setting – 193%; high setting – 56%). In contrast to non-repeat peptides however, there was an exposure dependence in the responses to repeat-containing peptides with age, such that children living in the high versus moderate exposure setting had twice the breadth of repeat-containing peptides, reaching the same level in adulthood in both settings (Figure 5b). These results were consistently observed with different thresholds for categorizing repeat-containing peptides (repeat index ≥ 4 or 5; Figure 5—figure supplement 1). Investigation of individual repeat elements recapitulated this trend and showed higher seropositivity in the high exposure setting compared to moderate exposure in children, but not adults (Figure 5—figure supplement 2). There were a small number of notable exceptions, including repeat elements from PHISTc (PF3D7_0801000), LSA3 (PF3D7_0220000), and FIRA (PF3D7_0501400), none of which showed a transmission setting-dependent response in children (Supplementary file 5).
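The breadth calculation for the two peptide categories can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the input structures, identifiers, and function name are hypothetical, and the threshold of 3 is the 7-mer repeat index cut-off given in the text.

```python
def breadth_per_person(enriched, peptide_info, min_repeat_index=3):
    """Count repeat-containing and non-repeat peptides enriched per person.

    enriched     -- dict: person -> iterable of enriched peptide ids
    peptide_info -- dict: peptide id -> (protein, repeat_element, repeat_index)
    """
    result = {}
    for person, peptides in enriched.items():
        repeat_elements = set()  # (protein, repeat element) pairs, counted once
        non_repeat = set()       # peptide ids below the repeat index threshold
        for peptide_id in peptides:
            protein, element, index = peptide_info[peptide_id]
            if index >= min_repeat_index:
                # Peptides sharing a repeat element within one protein collapse.
                repeat_elements.add((protein, element))
            else:
                non_repeat.add(peptide_id)
        result[person] = {
            "repeat": len(repeat_elements),
            "non_repeat": len(non_repeat),
        }
    return result


# Hypothetical toy data, not values from the study.
peptide_info = {
    "p1": ("protA", "EQQSDLE", 13),
    "p2": ("protA", "EQQSDLE", 13),  # same repeat element as p1
    "p3": ("protB", "KEKEKEK", 5),
    "p4": ("protC", "MTLSPQC", 1),
}
enriched = {"child1": ["p1", "p2", "p4"], "adult1": ["p1", "p3", "p4"]}
breadth = breadth_per_person(enriched, peptide_info)
```

Keying the set on the (protein, repeat element) pair is what prevents a long repeat region, covered by several overlapping peptides, from inflating a person's breadth.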

Samples from the two exposure settings differed not just by exposure, but also with respect to time since most recent infection, reflecting the differing epidemiology of infection in these settings. In the moderate exposure setting, the median number of days since last infection was 100, whereas over 65% of the samples from the high exposure setting were taken during periods of active infection. The difference in seroreactivity to repeat-containing peptides observed here between the settings could therefore emerge from two related mechanisms. In the first, the difference could be driven by a requirement for a minimum level of cumulative exposure to the target repeats to generate a robust response. In the second, the antibody response to repeats may be inherently less durable, leading to rapid waning in the absence of frequent exposure. Each of these two possibilities is investigated below.

First, to determine the effect of exposure while controlling for infection status, children (ages 2–11 years) were combined to afford sufficient statistical power and were classified as actively infected or 60–120 days from infection (those infected 1–59 days prior were excluded from this analysis to look at a time point well past infection). For repeat-containing peptides, breadth was significantly increased in the high exposure setting relative to moderate exposure, regardless of infection status at the time of sampling. This difference was not observed for non-repeat peptides (Figure 5—figure supplement 3). These data suggest a dependency on cumulative exposure for the observed differential antibody responses in children to repeat-containing peptides relative to non-repeat peptides.

Next, durability of responses was investigated by characterizing differences in breadth with time since last infection in each exposure setting (Figure 5—figure supplement 4). For repeat-containing peptides, a significant decrease in breadth was evident between the most recent infections (0–30 days) and the longest (240–360 days) in the moderate exposure setting, while the difference in the high exposure setting between 0–30 days and 60–120 days was not significant; notably, everyone in this setting was infected within the prior 120 days. No significant difference was evident for non-repeat peptides in either setting. Together, these results are consistent with the notion that antibody responses to repeat-containing peptides are short-lived relative to those to non-repeat peptides. Overall, the above data show that antibody responses to repeat-containing peptides may be less efficiently acquired and/or maintained in children than those to non-repeat peptides, but plateau at the same level of prevalence in adulthood.

Extensive sharing of motifs observed between seroreactive proteins, particularly the PfEMP1 family

While repeat elements within individual proteins were explored in the previous section, similar or identical motifs may also be shared among different proteins. If these motifs are part of an epitope, then antibodies and BCRs specific to a motif can potentially cross-react with the motif variants in different proteins, depending on accessibility and other factors. Identifying such shared motifs is the first step in exploring potential cross-reactivity between individual seroreactive proteins, and a systematic investigation was performed to identify them.

First, enriched kmers (6–9 amino acids) were identified by collecting those present in a significantly (FDR-adjusted p-value <0.001) higher number of seroreactive peptides (9927) than a random sampling (1000 iterations) of 9927 peptides from the whole library. From this collection, enriched kmers that were shared by different seroreactive proteins were identified as ‘inter-protein motifs’ (Figure 6a). Using a kmer size of 7, and allowing for up to two biochemically conservative substitutions, a total of 911 significantly enriched inter-protein motifs were identified, representing 509 seroreactive proteins (Supplementary file 6). Limiting the selection of inter-protein motifs to only the most significantly enriched motif per seroreactive peptide (the motif with the lowest p-value among all motifs in each peptide) yielded 417 significant inter-protein motifs, from a similar number of proteins (507). As expected, increasing the kmer size, or further constraining the number of allowed substitutions, resulted in fewer identified motifs (Supplementary file 7). For the subsequent analysis, we show results with a kmer size of 7, which is in the range of average length of a linear B-cell epitope (Buus et al., 2012). As expected, previously described cross-reactive epitopes between antigens were well represented, such as the glutamate-rich motifs in Pf11-1 and Ag332 (Mattei et al., 1989), among others (Figure 6a). Taken as a group, the collected motifs had a lower hydrophobicity index (mean Kyte-Doolittle = –1.95), a lower net charge (mean = –0.47; at pH 7), enrichment of charged glutamate, lysine, asparagine, and aspartate residues and depletion of cysteine and hydrophobic residues than a random set of motifs in the proteome (Figure 6—figure supplement 1). These biochemical characteristics are consistent with those observed in prior studies of residues in B cell epitopes (Akbar et al., 2021; Rubinstein et al., 2008).
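The two biochemical summaries quoted above can be illustrated with a short Python sketch that scores a single motif. The hydropathy values are the standard Kyte-Doolittle scale; the net charge is a deliberately crude approximation (K and R counted as +1, D and E as –1, histidine and termini ignored) and is an assumption of this sketch, so its values need not match those from whichever tool was used in the study. Function names and example motifs are hypothetical.

```python
# Standard Kyte-Doolittle hydropathy values per residue.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Assumed integer side-chain charges near pH 7 (histidine treated as neutral).
SIDE_CHAIN_CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def mean_hydropathy(motif):
    """Average Kyte-Doolittle value over the residues of a motif."""
    return sum(KYTE_DOOLITTLE[aa] for aa in motif) / len(motif)


def net_charge(motif):
    """Approximate net charge: basic minus acidic side chains."""
    return sum(SIDE_CHAIN_CHARGE.get(aa, 0) for aa in motif)


# Illustrative motifs only.
acidic_hydropathy = mean_hydropathy("EEEEEEE")
acidic_charge = net_charge("EEEEEEE")
mixed_charge = net_charge("KEKEKEK")
```

A glutamate-rich 7-mer scores as strongly hydrophilic and negatively charged under both measures, which is the direction of the trend reported for the inter-protein motifs.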

Figure 6. Extensive sharing of motifs observed among seroreactive proteins, with the most shared with PfEMP1 family.

(a) Pipeline to identify inter-protein motifs (6-9aa) significantly enriched (FDR <0.001) in seroreactive peptides from different seroreactive proteins (different colors) over background. Background for each motif was estimated based on the number of random peptides possessing the motif in 1000 random samplings of 9927 peptides. Examples of inter-protein motifs and seroreactive proteins sharing them are also shown. (b) Network of PfEMP1 sharing inter-protein motifs with other seroreactive proteins based on 7-aa motifs with up to two conservative substitutions. PfEMP1 shared inter-protein motifs with the greatest number of other seroreactive proteins.

Figure 6—figure supplement 1. Biochemical characteristics of inter-protein motifs.

Top - Histogram of net charge and hydrophobicity index of the 911 inter-protein motifs (7-aa motifs with at least five identical residues and up to two conservative substitutions) in comparison to a random set of 911 kmers of the same length from Pf proteome. Bottom - Distribution of amino acid frequencies in inter-protein and random motifs. All except Methionine (M) are significantly different between the two groups.
Figure 6—figure supplement 2. Inter-protein motifs are associated with seroreactivity.

(a) Design of the tiled peptide library showing segments in Peptide 4 overlapping with neighboring peptides. Start and end amino acid positions of each peptide are marked at either end. (b) Comparison of maximum seropositivity of overlapping peptides with and without inter-protein motifs. Each row in the heatmap pertains to a collection of overlapping peptides surrounding a consecutive set of seroreactive peptides with an inter-protein motif. (c) Same as in b, but for all 'enriched' motifs in seroreactive peptides.
Figure 6—figure supplement 3. Co-occurrence of reactivity to peptides containing inter-protein motifs from different proteins within individuals.

Each plot in orange depicts the Cumulative Distribution Function (CDF) for the proportion of people showing reactivity in >y proteins for the set of inter-protein motifs shared among n proteins. The background distribution (blue) is based on a random sampling of peptides without inter-protein motifs from different proteins and reflects the level of sharing observed by chance.
Figure 6—figure supplement 4. Histogram of number of other seroreactive proteins with which a seroreactive protein shares inter-protein motifs.

Figure 6—figure supplement 5. Network of seroreactive proteins outside the PfEMP1 network.

(a) All seroreactive proteins except PfEMP1 (b) Proteins with >30% seropositivity.

The design of the programmable phage display library used here features 62 amino acid peptides tiled with a 25 amino acid step size, yielding an overlap of 37 amino acids for sequential fragments, and 12 amino acids for every second fragment (Figure 6—figure supplement 2a). The design provides for localization of seroreactive sequences to the region of overlap when considering adjacent fragments. For all except the first and last two peptides in each protein (85% of peptides in the library), the seroreactive region can theoretically be narrowed down to a 12–13 aa segment within the peptide. Given that B cell linear epitopes are typically 5–12 amino acids in length (Buus et al., 2012), the 12–13 aa mapping provides a near-epitope resolution.
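The overlap and segment lengths quoted above follow directly from the peptide length and step size, as the short Python sketch below shows. It considers only an interior peptide flanked by two tiles on each side; how the library handles the ends of each protein is not covered here, and the function names are hypothetical.

```python
PEPTIDE_LEN = 62  # amino acids per tiled peptide
STEP = 25         # offset between the starts of consecutive peptides


def overlap(tiles_apart, peptide_len=PEPTIDE_LEN, step=STEP):
    """Residues shared by two peptides that are `tiles_apart` tiles apart."""
    return max(0, peptide_len - tiles_apart * step)


def interior_segments(peptide_len=PEPTIDE_LEN, step=STEP):
    """Split an interior peptide wherever a neighbouring tile starts or ends.

    Returns half-open (start, end) intervals in peptide coordinates.
    """
    cuts = {0, peptide_len}
    n = 1
    while n * step < peptide_len:
        cuts.add(n * step)                # a downstream tile starts here
        cuts.add(peptide_len - n * step)  # an upstream tile ends here
        n += 1
    ordered = sorted(cuts)
    return list(zip(ordered, ordered[1:]))


segments = interior_segments()
segment_lengths = [end - start for start, end in segments]
# segments        -> [(0, 12), (12, 25), (25, 37), (37, 50), (50, 62)]
# segment_lengths -> [12, 13, 12, 13, 12]
```

Each of the five segments is covered by a distinct combination of overlapping peptides, so the pattern of which neighbouring peptides are also seroreactive indicates which 12 or 13 residue segment carries the reactivity.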

To test the notion that the inter-protein motifs within each peptide are actually the elements associated with the observed seroreactivity, we leveraged the tiled peptide library design by comparing inter-protein motif carrying peptides with overlapping and adjacent peptides (Figure 6—figure supplement 2b). The maximum seropositivity among peptides containing an inter-protein motif was on average 54-fold higher than the maximum seropositivity among overlapping peptides not containing the motif (using a pseudo-seropositivity of 0.1% for peptides with 0% seropositivity to facilitate fold change calculation), suggesting a strong association between seroreactivity and the inter-protein motif itself, not just the whole peptide within which it resides (comparison of median seropositivity yielded a similar result). Furthermore, a similar result was observed when the same analysis was done with all the significantly enriched kmers (Figure 6—figure supplement 2c).
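The fold-change comparison with a pseudo-seropositivity can be written as a small Python sketch. The 0.1% floor is the value stated above; the function names, the grouping of peptides, and the example percentages are hypothetical, and the averaging step is an assumed reading of "on average".

```python
PSEUDO_SEROPOSITIVITY = 0.1  # percent, substituted for 0% seropositivity


def fold_change(with_motif, without_motif, pseudo=PSEUDO_SEROPOSITIVITY):
    """Ratio of maximum seropositivity (in percent) with vs. without the motif.

    with_motif    -- seropositivity of overlapping peptides carrying the motif
    without_motif -- seropositivity of overlapping peptides lacking the motif
    """
    top_with = max(with_motif)
    top_without = max(without_motif)
    # A maximum of 0% is replaced by the pseudo value so the ratio is defined.
    numerator = top_with if top_with > 0 else pseudo
    denominator = top_without if top_without > 0 else pseudo
    return numerator / denominator


def mean_fold_change(groups):
    """Average the fold change over (with_motif, without_motif) groups."""
    ratios = [fold_change(with_m, without_m) for with_m, without_m in groups]
    return sum(ratios) / len(ratios)


# Hypothetical percentages for two groups of overlapping peptides.
groups = [([30.0, 12.0], [5.0, 2.0]), ([18.0], [3.0, 0.0])]
average = mean_fold_change(groups)
```

Because the pseudo value sits in the denominator whenever the motif-free neighbours are entirely non-reactive, such groups contribute very large ratios to the mean, which is one reason the text also reports that a comparison of medians gave a similar result.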

Then, to evaluate potential cross-reactivity, we measured the co-occurrence of reactivity to peptides containing inter-protein motifs from different proteins within the same individual. A cumulative profile was then created from the aggregation of the results across all individuals in the study, and then compared to a background distribution drawn by random sampling of peptides without inter-protein motifs (Figure 6—figure supplement 3). We observed that shared seroreactivity of inter-protein motifs within individuals was significantly higher than in the randomly drawn peptides lacking these motifs (KS test p-value <0.01). Overall, these data suggest a possibility for cross-reactivity to these motifs within individuals. However, it is important to note that these data were derived from complex polyclonal responses in each person, and do not represent direct evaluation of the cross-reactivity of single antibodies.

On average, each inter-protein motif was shared by 3 seroreactive proteins (median = 2). Each of the 509 seroreactive proteins shared inter-protein motifs with six other proteins on average (median = 3; Supplementary file 8, Figure 6—figure supplement 4). Visualized as a network (Figure 6b), the PfEMP1 family of proteins formed a central hub to which a large number of other seroreactive proteins were connected. The PfEMP1 family of proteins possessed at least 90 shared inter-protein motifs, and this family shared those motifs with the greatest number of other seroreactive proteins (57) compared to all other proteins in this analysis. Approximately five times as many proteins shared connections with PfEMP1 as would be expected by chance (PfEMP1 shared motifs with 12–16 other proteins using a set of 9927 peptides consisting of PfEMP1 seroreactive peptides + random non-PfEMP1 peptides). Seroreactive proteins sharing motifs with PfEMP1 included many of the proteins with the highest measured seropositivity, such as RIFINs, SURFINs, FIRA, and PHISTc. This extent of sharing was driven, in part, by the number of PfEMP1 sequences included in the analysis. This was apparent when the same analysis, performed with a reduced diversity of PfEMP1 sequences in the seroreactive peptide set (using PfEMP1 peptides from only PF3D7 and PFIT genomes instead of 7 genomes), resulted in PfEMP1 sharing motifs with 32 seroreactive proteins instead of 57. This suggests that the extent of sharing for PfEMP1 observed in this study may only be a small fraction of that occurring in the extensive natural diversity of PfEMP1 variants in circulating parasites.
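The sharing statistics above correspond to simple quantities on a motif-to-protein mapping, sketched below in Python. The input structure, function names, and toy identifiers are hypothetical; in particular, treating a protein family such as PfEMP1 as a single node would require collapsing its variants to one label before calling these functions, which is an assumption of this sketch.

```python
from collections import defaultdict


def sharing_partners(motif_to_proteins):
    """Number of other proteins each protein shares at least one motif with."""
    partners = defaultdict(set)
    for proteins in motif_to_proteins.values():
        for protein in proteins:
            # Every other protein carrying this motif becomes a neighbour.
            partners[protein].update(p for p in proteins if p != protein)
    return {protein: len(others) for protein, others in partners.items()}


def mean_proteins_per_motif(motif_to_proteins):
    """Average number of proteins carrying each motif."""
    sizes = [len(proteins) for proteins in motif_to_proteins.values()]
    return sum(sizes) / len(sizes)


# Toy mapping: three motifs shared among four proteins.
toy = {
    "motif1": {"protA", "protB", "protC"},
    "motif2": {"protA", "protD"},
    "motif3": {"protB", "protC"},
}
degree = sharing_partners(toy)
hub = max(degree, key=degree.get)
```

In this toy network protA is the hub because it is linked to every other protein, mirroring at small scale the position of the PfEMP1 family in Figure 6b.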

Outside the main network driven by PfEMP1, 495 seroreactive proteins were also found to be highly connected to each other through motif sharing (Figure 6—figure supplement 5a). A large proportion of proteins with high seropositivity were connected (80% and 58% of proteins with >30% and 10–30% seropositivity, respectively). This included proteins like GARP, LSA3, Pf332, Pf11-1, and MESA (Figure 6b, Figure 6—figure supplement 5b). As observed for the full set of inter-protein motifs, motifs shared by the subset of proteins with >30% seropositivity also consisted predominantly of charged glutamate, lysine, asparagine, and aspartate residues (Figure 6—figure supplement 5b). Since the analysis used here to identify inter-protein motifs allowed only up to two conservative substitutions in 7-mer motifs, the similarity of motifs in the network in Figure 6—figure supplement 5b suggests that with a less stringent threshold for identifying motifs, these proteins would be even more highly connected. Moreover, 80% of proteins in this network had reported expression in the asexual blood stage of the lifecycle of P. falciparum (PlasmoDB), suggesting temporally concordant presence of proteins sharing motifs within their seroreactive regions.

These results indicate that the inter-protein motifs are strongly associated with seroreactivity and are extensively shared across seroreactive proteins, including among regions highly targeted by antibodies. Furthermore, PfEMP1 shares motifs with the greatest number of other seroreactive proteins, partly driven by the sequence diversity of PfEMP1 variants.

Discussion

Using a customized programmable phage display (PhIP-seq) library, we have evaluated the proteome-wide antigenic landscape of the malaria parasite P. falciparum, using the sera of 198 individuals living in two distinct malaria endemic areas. This approach readily identified previously known antigens, including proteins that are targets of protective antibodies, as well as novel antigens. In our study, we characterized features of P. falciparum antigens that could potentially contribute to the inefficient acquisition and maintenance of immunity to malaria. Repeat elements were found to be commonly targeted by antibodies, and had patterns of seropositivity that were less durable and more dependent on exposure than non-repeat peptides. Furthermore, extensive sharing of motifs associated with seroreactivity was observed among hundreds of parasite proteins, indicating potential for extensive cross-reactivity among antigens in P. falciparum. These data suggest that repeat elements, a common feature of the P. falciparum proteome, and shared motifs between antigenic proteins could have important roles in shaping the nature and development of the immune response to malaria.

To map the antigenic landscape, PhIP-seq for P. falciparum offers several attractive advantages. The library described here contains >99.5% of the proteome, including variants for several antigenic families, surpassing the coverage of other existing proteome-wide tools for P. falciparum (Camponovo et al., 2020; Morita et al., 2017), while simultaneously providing high-resolution characterization of antigens (up to 12–13 aa regions within peptides). Unlike peptide arrays, the platform converts a proteomic assay into a genomic assay, leveraging the massive scale and low-cost nature of next-generation short-read sequencing. The result is a cost-effective and scalable system, allowing for the processing of hundreds of samples in parallel. Finally, an important aspect of all phage display systems is the ability to sequentially enrich, release phage, and repeat, thus greatly amplifying the signal-to-noise ratio (O’Donovan et al., 2020; Smith and Petrenko, 1997). Only one published study to date evaluates responses to more than a quarter of the proteome (Camponovo et al., 2020), inherently limiting the scope of potential targets interrogated. Twenty-eight percent of the hits described in that study, which examined individuals living in Tanzania and exposed to various doses of PfSpz vaccine, overlapped with the hit proteins described in our study (Camponovo et al., 2020). The limited overlap may be due to multiple factors, including differences in the characteristics of individuals sampled, the use of vaccine, and determination of seropositivity based on technical as opposed to biological controls (sera from unexposed individuals).

The near-epitope resolution provided by this platform allowed a systematic investigation of targets of antibodies. Targets with high seropositivity were observed to be significantly enriched for repeat elements. In some previous reports, the elevated antigenic potential of repeat elements has been noted (Davies et al., 2017); however, the proteome-wide approach described here demonstrates that a large collection of proteins containing these elements are highly immunogenic. The high immunogenicity of repeat sequences observed here may be the result of competitive advantages that B cell clones could encounter when binding to higher valency epitopes, as opposed to single copy epitopes. Evidence from experimental inoculations of antigens with differing repeat numbers supports this notion (Kato et al., 2020). Moreover, tandem repeat regions are predicted to be intrinsically disordered, and such regions in turn have favorable predictions as linear B cell epitopes (Guy et al., 2015). Notably, this high immunogenicity can potentially restrict responses to other epitopes within the antigen, as has been reported for responses to protective non-repeat epitopes in the circumsporozoite protein (CSP; Chatterjee et al., 2021).

A key finding of this study is the difference in seroreactivity to repeat-containing peptides in children vs. other peptides, with the breadth of seroreactivity to repeat-containing peptides increasing more quickly with age in the high versus moderate exposure setting and decreasing with increasing time since infection. These results suggest that antibody responses to repeats are more likely to be exposure-dependent and/or short-lived in children than responses to non-repeats. There were a few exceptions, including repeats from FIRA, PHISTc, and LSA3, that did not show an exposure-setting dependent difference in seropositivity, suggesting that factors beyond the repeated nature of the epitope influence the nature of the response. Regardless, the predominance of repeat-containing peptides in antibody targets, along with the remarkable abundance of these peptides in the P. falciparum proteome, suggests a possible strategy evolved by the parasite for the purpose of diverting the humoral response towards short-lived or exposure-dependent responses.

The hypothesis of less durable antibody responses to repeat antigens in P. falciparum can be reconciled with a model in which repeating epitopes favor extrafollicular B cell responses, which are typically short-lived (Cockburn and Seder, 2018; Schofield, 1991). This is based on the potential of repeat epitopes in an antigen to interact with multiple BCRs on naive B-cells, thereby conferring high binding strength and sufficient activation to direct these cells into an early extrafollicular response and production of short-lived plasmablasts. Several studies provide support to this model, where strong binding of BCR to the antigen, including through increased valency, increases the production of plasmablasts (Kato et al., 2020; O’Connor et al., 2006; Ochiai et al., 2013; Paus et al., 2006; Schwickert et al., 2011). This could also happen via a T-cell-independent response, as has been reported for some repeat antigenic structures (Schofield, 1991; Schofield and Uadia, 1990). On the other hand, the effect on germinal centers (GC), which result in long-lived plasma cells (LLPCs) and isotype-switched memory cells, is unclear. While some studies have reported no change or a decrease in the formation of GCs (O’Connor et al., 2006; Ochiai et al., 2013; Paus et al., 2006) with increased strength of interaction between antigen and B-cells, some have reported an increase (Kato et al., 2020; Schwickert et al., 2011), although it is unclear whether the latter were productive GCs. More insights come from a few studies that measured the outcome of GCs following increased strength of BCR-antigen interaction and these have reported a decrease in LLPCs (Fink et al., 2007) and IgG-switched memory cells (Pape et al., 2018; Taylor et al., 2015). If repeat antigens in P. falciparum follow this pattern, an expected outcome would be defective formation of LLPCs and memory B cells.

Another major finding of this study is the extensive presence of inter-protein motifs among seroreactive proteins. Since a strong association with seroreactivity was observed for these motifs and there was evidence of shared reactivity among peptides containing the motif from different proteins, they may potentially represent cross-reactive epitopes. Definitive confirmation of cross-reactivity could be demonstrated with targeted studies using monoclonal antibodies. Furthermore, whether these inter-protein motifs are cross-reactive in vivo is unclear and may depend on expression timing and accessibility to the immune system, among other factors.

Analogously, seroreactive repeat elements with non-identical repeating units could represent cross-reactive epitopes within proteins. Extensive presence of potential cross-reactive epitopes in P. falciparum antigens may play an important role in influencing the quality of the immune response to malaria. While it could be advantageous for the host if multiple parasite proteins could be targeted by antibodies through cross-reactivity, simultaneous presence of cross-reactive epitopes could alternatively frustrate the affinity maturation process due to conflicting selection forces, as was observed for variant HIV antigens (Wang et al., 2015). Further, recurrent exposure may be necessary for the generation of strong cross-binding antibodies to cross-reactive epitopes (Murugan et al., 2020). Thus, the extensive presence of cross-reactive epitopes, both within and between antigenic proteins in P. falciparum, could represent an evolutionary strategy aimed at limiting high-affinity antibodies in favor of lower-affinity, cross-reactive antibodies. In essence, the large number of shared seroreactive sequences in P. falciparum may represent a complex immune countermeasure, resulting in inefficient acquisition of immunity that requires extensive exposure. The atlas of seroreactive repeat elements and inter-protein motifs from this study will be useful for future investigations into their impact on the quality of the immune response to malaria.

The PfEMP1 family shared inter-protein motifs with the greatest number of other antigens in this study. This was driven in part by the wide diversity of PfEMP1 variants, indicating that as one becomes naturally exposed to different PfEMP1 variants (Cham et al., 2009), there may be an increase in not only the sequence diversity, but the number of cross-reactive epitopes that the immune system encounters. Possessing cross-reactive epitopes with other antigens could result in binding of pre-existing antibodies to the new variants, which could be disadvantageous to the host if binding strength is weak. Further, cross-reactivity may inhibit generation of antibodies specific to the new variant due to original antigenic sin (Vatti et al., 2017). Thus, the mechanism of evolving PfEMP1 variants within a network of shared sequences with other antigens could be another strategy evolved by the parasite for immune evasion. On the other hand, binding of new variants to pre-existing antibodies may be advantageous to the host if those antibodies are effective against the new variants.

While phage display of small peptides yields high-resolution discrimination of linear epitopes, this approach may not capture antibodies binding to conformational epitopes. Therefore, such epitopes are likely to be missed by this assay, although polyclonal responses are frequently a mixture of linear, partially linear, and conformational epitopes. Reassuringly, we observed a large-scale enrichment of P. falciparum peptide sequences in exposed individuals when compared with control sera from the US. This suggests that the humoral immune system of exposed individuals acquires an extensive and diverse set of P. falciparum targets, including thousands of linear sequences. The bias towards linear epitopes may have increased the relative detectability of repeat regions by this assay since they often form intrinsically disordered regions. However, that would not account for the observed differences between exposure settings for children and adults. Furthermore, antibodies that depend on epitopes with disulfide linkages and post-translational modifications for binding would likely not be enriched using phage-presented peptides. Another limitation of our study is that the PhIP-seq technique does not inherently provide quantitative measures of antibody affinity or titer, as many factors influence the actual number of reads recovered after immunoprecipitation, such as starting copy number in the library and non-specific interactions with beads. Instead, our analysis relies on per-peptide relative (ratiometric) enrichments, using non-malaria control (n=86) sera as the basis for comparison, which also serves to remove non-specific enrichments. We imposed a stringent filter to minimize false positives by requiring that each seroreactive peptide be enriched in at least five Ugandan samples over control sera. While this excluded possible seroreactive peptides unique to a single sample, the resulting sequences that passed were those that exhibited a minimum level of sharing among multiple individuals, thereby enriching for those seroreactive peptides that represent common serological responses to malaria.

Findings from this study could have important implications for malaria vaccine design. Results from our study suggest that in natural infections in children, repeat regions in P. falciparum could lead to an exposure-dependent and/or short-lived antibody response to a higher degree than non-repeat regions. While we recognize that vaccine-induced immunity is distinct from naturally acquired immunity, this potential limitation should be considered when evaluating repeat-containing antigens as vaccine targets. Further, given that highly immunogenic regions in natural immunity to malaria are predominantly repeats and there is widespread presence of potential cross-reactive epitopes across many proteins, whole-parasite vaccines may also be susceptible to similar limitations. If the findings from this study translate to vaccine-induced immune responses, non-repetitive, unique antigenic regions may be more effective targets.

Materials and methods

Key resources table.

| Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information |
| --- | --- | --- | --- | --- |
| Strain, strain background (E. coli) | BLT5403 | Novagen/EMD Millipore, T7 Select Kit | Cat# 70550–3 | |
| Strain, strain background (T7 Bacteriophage) | T7 vector arms, Packaging extract | Novagen/EMD Millipore, T7 Select Kit | Cat# 70550–3 | |
| Genetic reagent (T7 Bacteriophage library) | Falciparome | Made in this study | | See Materials and Methods |
| Biological sample (Humans) | Ugandan cohort plasma | Kamya et al., 2015; Rek et al., 2016; Yeka et al., 2015 | | |
| Biological sample (Humans) | US control plasma | New York Blood Center | | |
| Antibody | Anti-Glial Fibrillary Associated Protein (rabbit, polyclonal) | Agilent | Cat# Z033429-2 | 1 ug used |
| Peptide, recombinant protein | Protein A conjugated magnetic beads | Invitrogen/Thermo Fisher Sci | Cat# 10008D | |
| Peptide, recombinant protein | Protein G conjugated magnetic beads | Invitrogen/Thermo Fisher Sci | Cat# 10009D | |
| Peptide, recombinant protein | BSA Fraction V | Sigma-Aldrich | Cat# 10735094001 | |
| Peptide, recombinant protein | T4 ligase | New England Bio | Cat# M0202S | |
| Peptide, recombinant protein | Phusion DNA Polymerase | New England Bio | Cat# M0530L | |
| Commercial assay or kit | T7 Select 10-3b Cloning kit | EMD Millipore | Cat# 70550–3 | |
| Commercial assay or kit | Ampure XP Beads | Beckman Coulter | Cat# A63881 | |
| Software, algorithm | CD-HIT | Fu et al., 2012; Li and Godzik, 2006 | http://weizhongli-lab.org/cd-hit/ | |
| Software, algorithm | numpy | Open Source | https://doi.org/10.1109/MCSE.2011.37 | |
| Software, algorithm | scipy | Open Source | https://www.nature.com/articles/s41592-019-0686-2 | |
| Software, algorithm | Matplotlib | Open Source | https://ieeeexplore.ieee.org/document/4160265 | |
| Software, algorithm | Cutadapt | Martin, 2011 | https://cutadapt.readthedocs.io/en/stable/ | |
| Software, algorithm | Cytoscape | Shannon et al., 2003 | https://cytoscape.org | |

Ethical approval

The study protocol was reviewed and approved by the Makerere University School of Medicine Research and Ethics Committee (Identification numbers 2011–149 and 2011–167), the London School of Hygiene and Tropical Medicine Ethics Committee (Identification numbers 5943 and 5944), the University of California, San Francisco, Committee on Human Research (Identification numbers 11–05539 and 11–05995) and the Uganda National Council for Science and Technology (Identification numbers HS-978 and HS-1019). Written informed consent was obtained from all participants in the study. For children, this was obtained from the parents or guardians.

Study sites and participants

Plasma samples for the study were obtained from the Kanungu and Tororo sites of the Program for Resistance, Immunology, and Surveillance of Malaria (PRISM) cohort studies, part of the East African International Centers of Excellence in Malaria Research (Kamya et al., 2015). Kihihi sub-county in Kanungu district is a rural highland area in southwestern Uganda characterized by moderate transmission; samples used from this region were collected between 2012 and 2016. Nagongera sub-county in Tororo district is a rural area in southeastern Uganda with high transmission; samples used from this region were collected between August and September 2012. Samples from Tororo were restricted to individuals with fewer than six malaria cases per year to exclude individuals with very high incidence. Entomological inoculation rates (EIR) used in the study were calculated for each household based on entomological surveys involving collection of mosquitoes with CDC light traps and quantifying the number of P. falciparum-containing female Anopheles mosquitoes along with sporozoite rates (Kilama et al., 2014). These cohorts and study sites have been described extensively in prior publications (Helb et al., 2015; Kamya et al., 2015; Rek et al., 2016; Yeka et al., 2015). Briefly, follow-up consisted of continuous passive surveillance for malaria at a study clinic open 7 days a week where all routine medical care was provided, routine active surveillance for parasitemia, and routine entomologic surveillance. One plasma sample was selected from each of 100 participants, stratified by age, from each of the two cohorts. The 86 US controls were de-identified plasma obtained from adults who donated blood to the New York Blood Center.

Bioinformatic construction of Falciparome Phage Library

The pipeline for library construction is shown in Figure 1—figure supplement 1. To construct the library, raw protein sequence files were downloaded from their respective public databases. Coding sequences from 3D7 and IT strains were downloaded from PlasmoDB (Amos et al., 2022) and vaccine/viral sequences were downloaded from the RefSeq database (O’Leary et al., 2016). Antigenic variant sequences were curated from multiple sources. The entire collection of protein sequences used as input for designing the peptides in the study can be found in the Dryad dataset doi:10.7272/Q69S1P9G. Pseudogenes were removed and any remaining stop codons within coding sequences were replaced with Alanine residues. These sequences were combined and filtered using CD-HIT (Fu et al., 2012; Li and Godzik, 2006) to remove sequences with >X% identity, where the threshold X varied for different sets of sequences and is listed in Table 2.

The final set of protein sequences (n=8980) was then merged and short sequences (<30 aa long) were removed prior to collapsing at 100% sequence identity (n=8534). Next, all sequences were split into 62-amino acid peptide fragments with a 25-amino acid step size. Fragments with homopolymer runs of ≥ 8 exact amino acid matches in a row were removed, X amino acids were replaced with Alanine and Z amino acids (Glutamic acid or Glutamine) with Q (Glutamine), and finally, LZW compression was used to identify and remove low-complexity sequences with a compression ratio less than 0.4. Lastly, sequence headers were renamed to remove spaces and the resulting peptide fragments were converted to nucleotide sequences. Adapter sequences were appended, with a library-specific linker on the 5’ end (GTGGTTGGTGCTGTAGGAGCA) and a 3’ linker sequence coding for two stop codons and a 17mer (TGATAA-GCATATGCCATGGCCTC). This file was then iteratively scanned for restriction enzyme sites (EcoRI, HindIII), which were eliminated by replacement with synonymous codons to facilitate cloning. The final set of nucleotide sequences was collapsed at 100% nucleotide sequence identity (n=238,068) and then ordered from Agilent Technologies.
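As an illustration only, the tiling and filtering steps described above can be sketched in Python as follows. This is not the code used in the study (that code is in the GitHub repository listed under Data availability). The function names are hypothetical, the LZW complexity filter is omitted because the exact definition of the compression ratio is not given here, and the handling of the final window (anchored at the C-terminus so that the end of the protein is covered) is an assumption.

```python
import re

PEPTIDE_LEN = 62      # fragment length stated in the text
STEP = 25             # step size stated in the text
MAX_HOMOPOLYMER = 8   # fragments with runs of >= 8 identical residues are dropped


def tile(seq, length=PEPTIDE_LEN, step=STEP):
    """Split a protein into overlapping fragments.

    Assumption: the last window is anchored at the C-terminus so the
    protein end is always covered; proteins shorter than one window
    are returned whole.
    """
    if len(seq) <= length:
        return [seq]
    starts = list(range(0, len(seq) - length + 1, step))
    if starts[-1] != len(seq) - length:
        starts.append(len(seq) - length)
    return [seq[s:s + length] for s in starts]


def has_homopolymer(fragment, run=MAX_HOMOPOLYMER):
    """True if the fragment contains `run` or more identical residues in a row."""
    return re.search(r"(.)\1{%d,}" % (run - 1), fragment) is not None


def clean_residues(fragment):
    """Replace ambiguous residues: X -> A (Alanine), Z -> Q (Glutamine)."""
    return fragment.replace("X", "A").replace("Z", "Q")


def design_peptides(protein):
    """Tile, drop homopolymer-containing fragments, then clean ambiguous residues."""
    return [clean_residues(f) for f in tile(protein) if not has_homopolymer(f)]
```

With these settings, adjacent fragments overlap by 37 residues, so any linear epitope of up to 38 residues is contained intact in at least one fragment.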

Cloning and packaging into T7 phage

A single vial of lyophilized DNA was received from Agilent. The lyophilized oligonucleotides were resuspended in 10 mM Tris–HCl-1 mM EDTA, pH 8.0 to a final concentration of 0.2 nM and PCR amplified for cloning into T7 phage vector arms (Novagen/EMD Millipore Inc T7 Select 10–3 Cloning kit). The detailed protocol can be found at 10.17504/protocols.io.j8nlkkrr5l5r/v1. Four 30 µl packaging reactions were performed and pooled. Plaque assays were performed on the pooled packaging reaction to determine the titer of infectious phage, which was estimated to be 2 × 10^8 pfu/mL. Phage libraries were then prepared and amplified fresh from packaging reactions. The resulting phage libraries were titered by plaque assay and adjusted to a working concentration of 10^10 pfu/mL before incubation with patient plasma.

Immunoprecipitation of antibody-bound phage

Plasma samples were first diluted 1:1 in storage buffer (0.04% NaN3, 40% Glycerol, 40 mM HEPES (pH 7.3), 1 x PBS (-Ca and –Mg)) to preserve antibody integrity. Then, a 1:2.5 dilution of that stock was made in 1 x PBS, resulting in a final 1:5 dilution, and 1 µl of this was used in PhIP-seq. The protocol was followed as in 10.17504/protocols.io.j8nlkkrr5l5r/v1. Forty µl of Pierce Protein A/G Bead slurry (ThermoFisher Scientific) were used per sample. After round 1 of IP, the eluted phage were amplified in E. coli and enriched through a second round of IP. The final lysate was spun and stored at –20 °C for NGS library prep. Immunoprecipitated phage lysate was heated to 70 °C for 15 min to expose DNA. DNA was then amplified in two subsequent reactions. All samples had a minimum of two technical replicates.

Bioinformatic analysis of PhIP-Seq data

Identification of seroreactive peptides

Sequencing reads were first trimmed to cut out adaptors with Cutadapt (Martin, 2011).

Trimmed reads were then aligned to the full Falciparome peptide library using GSNAP (Wu and Nacu, 2010) paired end alignment, outputting a SAM file.

For each aligned sequence, the CIGAR string was examined, and all alignments where the CIGAR string did not indicate a perfect match were removed. The final set of peptides was tabulated to generate counts for each peptide in each individual sample. Samples with fewer than 250,000 aligned reads were dropped from further analysis and any resulting samples with only one technical replicate were also dropped (2 of the 200 Ugandan samples were dropped). To keep the analysis restricted to P. falciparum peptides and limit the influence from non-P. falciparum peptides, reads mapping to all vaccine, viral and experimental control peptides were excluded from analysis. The remaining peptide counts were normalized for read depth and multiplied by 500,000, resulting in reads/500,000 total reads (RP5K) for each peptide. The null distribution for each peptide was modeled as a normal distribution using read counts from a set of 86 plasma samples from the US (New York Blood Center), with the assumption that most of these individuals were likely unexposed to malaria. To avoid inflation by division, if the standard deviation of read counts of any peptide in the US samples was <1, it was set to 1. Z-score enrichments ((x-mean US)/std. dev US) were then calculated for each peptide in each sample using the US distribution, and a Z-score ≥ 3 in both technical replicates (or more than 75% of the replicates if there were more than two technical replicates) of a sample was used to identify enriched peptides within a given sample. To call malaria-specific peptide enrichments (‘seroreactive peptides’), enrichment was required in a minimum of five Ugandan samples. Seropositivity for a peptide was calculated as the percent of Ugandan samples enriching for that peptide. Seropositivity for a protein was calculated as the percent of Ugandan samples enriching for any peptide within that protein.
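The normalization and enrichment-calling logic described above can be illustrated with a minimal Python sketch. It is not the study's code: the function names are hypothetical, the use of the population standard deviation is an assumption (the text does not say which estimator was used), and only the case where enrichment is required in every technical replicate is handled (the >75% rule for more than two replicates is left out).

```python
from statistics import mean, pstdev

SCALE = 500_000    # counts are expressed per 500,000 total reads
Z_CUTOFF = 3.0     # Z-score threshold stated in the text
MIN_SAMPLES = 5    # minimum number of Ugandan samples stated in the text


def normalize(counts):
    """Scale raw per-peptide read counts to reads per 500,000 total reads."""
    total = sum(counts.values())
    return {pep: c * SCALE / total for pep, c in counts.items()}


def null_model(controls):
    """Per-peptide (mean, sd) across normalized control samples.

    The sd is floored at 1 to avoid inflated Z-scores. Assumes every
    control dict contains the same peptide keys.
    """
    model = {}
    for pep in controls[0]:
        values = [c[pep] for c in controls]
        model[pep] = (mean(values), max(pstdev(values), 1.0))
    return model


def enriched(replicates, model):
    """Peptides with Z >= cutoff in every technical replicate of one sample."""
    hits = None
    for rep in replicates:
        rep_hits = {p for p, x in rep.items()
                    if (x - model[p][0]) / model[p][1] >= Z_CUTOFF}
        hits = rep_hits if hits is None else hits & rep_hits
    return hits or set()


def seroreactive(per_sample_hits, min_samples=MIN_SAMPLES):
    """Peptides enriched in at least `min_samples` samples."""
    tally = {}
    for hits in per_sample_hits:
        for pep in hits:
            tally[pep] = tally.get(pep, 0) + 1
    return {pep for pep, n in tally.items() if n >= min_samples}
```

In this sketch, `enriched` would be called once per Ugandan sample on its normalized replicates, and the resulting sets passed together to `seroreactive`.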

Calculation of breadth of non-redundant peptide groups per person

Seroreactive peptides in each person were collapsed based on shared sequences (7-mer identical motifs) using the network approach described in AVARDA (Monaco et al., 2021) to get a conservative estimate of the number of non-redundant peptide groups in each person.
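One simple way to picture this collapsing step is to treat peptides as nodes, link any two peptides that share an identical 7-mer, and count connected components. The sketch below does this with a union-find structure. It is a simplified stand-in under that assumption, not the AVARDA procedure itself, whose network-based grouping may differ; the function names are hypothetical.

```python
def kmers(seq, k=7):
    """All distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def count_groups(peptides, k=7):
    """Number of groups after linking peptides that share an identical k-mer.

    Peptides are merged transitively (connected components), which is
    an assumption made for this sketch.
    """
    parent = list(range(len(peptides)))

    def find(i):
        # Path-halving lookup of the representative of i's group.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # k-mer -> index of the first peptide containing it
    for idx, pep in enumerate(peptides):
        for km in kmers(pep, k):
            if km in first_seen:
                parent[find(idx)] = find(first_seen[km])
            else:
                first_seen[km] = idx
    return len({find(i) for i in range(len(peptides))})
```

Because merging is transitive, two peptides with no 7-mer in common still fall into the same group if a third peptide overlaps both, which keeps the breadth estimate conservative.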

Calculation of VSA breadth per person

VSA breadth was calculated as the number of variant proteins in each VSA family that were seroreactive in a given person and was calculated as follows. Since all these families possess conserved as well as variable regions, during library design, peptides across conserved regions from many variants that share identical sequences were filtered out to avoid redundant representation and only one representative peptide was retained in the final Falciparome library. Therefore, to accurately calculate the number of VSA proteins recognized in a person, all seroreactive VSA peptides were mapped back to the sequences from the full collection of VSA protein sequences to identify all the variant proteins each seroreactive peptide sequence mapped to. This information was then used to get the number of variant proteins reactive to a person’s plasma. Domain classification for PfEMP1s was done using the VarDom server (Rask et al., 2010). Domain classification for RIFINs was done based on Joannin et al., 2008.
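The mapping-back step can be sketched as below, assuming that "mapping" means exact substring matching of each seroreactive peptide against every sequence in the full VSA collection. The text does not specify the matching method, so this is an assumption, and the function name is hypothetical.

```python
def vsa_breadth(seroreactive_peptides, vsa_proteins):
    """Count variant proteins that contain at least one seroreactive peptide.

    seroreactive_peptides: iterable of peptide sequences for one person.
    vsa_proteins: dict mapping variant name -> full protein sequence.
    Matching is by exact substring (an assumption for this sketch).
    """
    peptides = list(seroreactive_peptides)
    recognized = {
        name
        for name, seq in vsa_proteins.items()
        if any(pep in seq for pep in peptides)
    }
    return len(recognized)
```

Under this scheme a single peptide from a conserved region counts towards every variant that carries the identical sequence, which is the behavior the paragraph above describes.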

Repeat analysis

Only unique 3D7/IT proteins in the library (if both 3D7 and IT homologs were present in the library, only the 3D7 homolog was considered) that were not members of Variant Surface Antigens (PfEMP1, RIFIN, STEVOR, SURFIN, Pfmc-2TM) were considered for all repeat analysis to avoid redundancy of representation.

Cumulative repeat frequency in proteins - For calculation of cumulative repeat frequency in proteins, amino acid sequences of proteins were input into the RADAR (Madeira et al., 2019) program for de novo identification of repeats using default settings. Cumulative frequency of repeats in a protein was then determined by adding the repeat counts of all reported repeats in the protein. To compare to the non-seroreactive set, the same number of proteins as the seroreactive protein set was randomly sampled from the total non-seroreactive protein set 1000 times, and the distributions of cumulative frequencies in the seroreactive and non-seroreactive sets were compared using a 2-sample KS-test in each iteration.
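The resampling comparison can be sketched as follows. This illustrative Python computes only the two-sample KS statistic D (the maximum gap between the two empirical cumulative distributions) for each random draw. A p-value would normally come from a statistics library such as scipy's `ks_2samp`. The function names and the fixed random seed are assumptions, not details from the study.

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: max |ECDF_a(x) - ECDF_b(x)| over all observed x."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in set(a) | set(b):
        fa = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        fb = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(fa - fb))
    return d


def resampled_ks(sero, non_sero, iterations=1000, seed=0):
    """KS statistic between the seroreactive set and size-matched random
    draws (without replacement) from the non-seroreactive set."""
    rng = random.Random(seed)
    return [
        ks_statistic(sero, rng.sample(non_sero, len(sero)))
        for _ in range(iterations)
    ]
```

Here `sero` and `non_sero` would be lists of cumulative repeat frequencies, one value per protein, and `non_sero` must contain at least as many values as `sero`.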

Repeat index calculation - To systematically compare the distribution of repeats between seroreactive and non-seroreactive peptides within seroreactive proteins, the following approach was adopted. First, for each protein, repeats and their frequencies within that protein were calculated using a k-mer approach. K-mers were fixed-length sequences (6/7/8/9-aa) with any number of conservative substitutions (AG, DE, RHK, ST, NQ, LVI, YFW) and did not include polymeric stretches of single amino acids from N/Q/D/E/R/H/K. For each protein sequence, all possible k-mers in the protein and their frequencies (number of non-overlapping occurrences) in the protein (intra-protein repeat frequency) were calculated. Then, for each peptide in the protein, all k-mers in the peptide sequence were taken and the k-mer with the highest intra-protein repeat frequency was identified. This frequency was assigned as the repeat index for the peptide. Once all peptides across all seroreactive proteins were assigned a repeat index, they were subsequently classified according to seropositivity. In each seropositivity group, since peptides from the same protein could have the same highest intra-protein repeat k-mer, to avoid redundancy of representation, peptides sharing the same highest k-mer were collapsed and counted only once. For the non-seroreactive peptide set, random sampling of peptides from all non-seroreactive peptides was performed (1000 iterations). The 2-sample KS test was then used to compare distributions.
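A minimal Python sketch of the repeat index is given below. It treats conservative substitutions by mapping every residue to a representative of its substitution group before counting, and it skips k-mers that are a homopolymer of N/Q/D/E/R/H/K in the original sequence. Both choices are one reading of the description above rather than the study's implementation, and the function names are hypothetical.

```python
GROUPS = ["AG", "DE", "RHK", "ST", "NQ", "LVI", "YFW"]
# Map each residue to the first letter of its substitution group.
REDUCE = {aa: group[0] for group in GROUPS for aa in group}
EXCLUDED_HOMOPOLYMER = set("NQDERHK")


def reduce_seq(seq):
    """Collapse conservatively substitutable residues onto one symbol."""
    return "".join(REDUCE.get(aa, aa) for aa in seq)


def is_excluded(raw_kmer):
    """True for a k-mer made of a single repeated residue from N/Q/D/E/R/H/K."""
    return len(set(raw_kmer)) == 1 and raw_kmer[0] in EXCLUDED_HOMOPOLYMER


def repeat_index(peptide, protein, k=7):
    """Highest intra-protein frequency among the k-mers of a peptide.

    Frequency is the number of non-overlapping occurrences of the
    reduced k-mer in the reduced protein (str.count is non-overlapping).
    """
    reduced_protein = reduce_seq(protein)
    best = 0
    for i in range(len(peptide) - k + 1):
        raw = peptide[i:i + k]
        if is_excluded(raw):
            continue
        best = max(best, reduced_protein.count(reduce_seq(raw)))
    return best
```

For example, with `k=4` a peptide containing NANP scores 3 against a protein carrying three tandem NANP units, and so does a peptide containing QGNP, since Q/N and G/A are treated as equivalent.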

Inter-protein motif analysis

First, all motifs with wildcards (any amino acid allowed at that position) or conservative substitutions (AG, DE, RHK, ST, NQ, LVI, YFW), shared by at least two seroreactive peptides were identified using the SLiMFinder program (Edwards et al., 2007), a part of the SLiMSuite package. The following parameters were used for running the program with the seroreactive peptide sequences as input: teiresias = T efilter = F blastf = F masking = F ftmask = F imask = F compmask = F metmask = F slimlen = 7 absmin = 2 absminamb = 2 slimchance = F maxwild = 1 maxseq = 10,000 walltime = 240 minocc = 0.0002 ambocc = 0.0003 wildvar = False equiv = <txt file that lists the allowed conservative substitutions - AG, DE, RHK, ST, NQ, LVI, YFW>.

Following this, a custom script was used to parse motifs with desired length and degeneracy threshold and identify those enriched over background. First, motifs of length K with at least N fixed positions and allowed number of conservative substitutions and wildcards were filtered depending on the degeneracy thresholds used. Motifs with homopolymeric stretches of KKKKK/NNNNN/EEEEE were not considered as this is a common feature in the proteome of Plasmodium. Then, for each motif, the number of seroreactive peptides possessing that motif was determined (frequency in the seroreactive set). Next, enrichment in the seroreactive set over background was estimated with the following approach. Random sampling was performed on the whole library to get the same number of random peptides as seroreactive peptides (n=9927) and the occurrence frequency of each motif was calculated in the random set each time. This was bootstrapped 1000 times, yielding the background frequency of each motif across 1000 iterations. A p-value for enrichment in the seroreactive set was then calculated using a Poisson model for the background frequency distribution. Significantly enriched motifs were then identified following multiple hypothesis correction (FDR of 0.1%). This set of motifs represented the final collection of significantly enriched motifs. From this set, those that were shared by at least two seroreactive proteins were identified as inter-protein motifs. Network visualizations were performed with Cytoscape (Shannon et al., 2003). For the analysis on PfEMP1 with a random set of peptides, all PfEMP1 peptides from the seroreactive set (n=3001) were combined with random peptides (n=6926) to a total of 9927 peptides. This was treated as the ‘seroreactive’ set and a similar analysis was performed to identify significantly enriched motifs in this set.
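The enrichment test described above can be illustrated with the following Python sketch. It assumes that the Poisson rate for each motif is the mean of its bootstrapped background counts and that multiple hypothesis correction uses the Benjamini-Hochberg procedure. Neither detail is stated in the text, so both are assumptions, and the function names are hypothetical.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - CDF(k - 1).

    Terms are evaluated in log space to avoid overflow. Very small
    p-values lose precision and may be returned as 0.0.
    """
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    cdf = sum(
        math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
        for i in range(k)
    )
    return max(0.0, 1.0 - cdf)


def benjamini_hochberg(pvalues, fdr=0.001):
    """Indices of hypotheses rejected at the given false discovery rate."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= fdr * rank / m:
            cutoff_rank = rank
    return {order[r] for r in range(cutoff_rank)}


def enriched_motifs(observed, background, fdr=0.001):
    """Motifs whose count in the seroreactive set exceeds background.

    observed: dict motif -> count in the seroreactive set.
    background: dict motif -> list of counts, one per bootstrap iteration.
    """
    motifs = list(observed)
    pvals = [
        poisson_sf(observed[m], sum(background[m]) / len(background[m]))
        for m in motifs
    ]
    return {motifs[i] for i in benjamini_hochberg(pvals, fdr)}
```

The default `fdr=0.001` corresponds to the FDR of 0.1% given in the text.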

Data availability

The data and sample metadata associated with this study can be accessed in the Dryad repository with the doi:10.7272/Q69S1P9G (https://datadryad.org/stash/share/YuYmQNKNvrWmoMX8n99wle_2bFyrtweAGclxYPHkPjY).

The code generated for the study is on GitHub and can be accessed at https://github.com/madhura-raghavan/phage-malaria-uganda200.git (copy archived at swh:1:rev:b49a3ec15d86048dd570b340487cee139fbeb445; Raghavan, 2023).

Acknowledgements

We thank all participants in this study and their families. We thank the New York Blood Center for providing us with the de-identified human control plasma samples. We thank Caleigh Mandel-Brehm for advice and discussions on PhIP-seq. We thank members of the DeRisi and Greenhouse labs for helpful discussions.

Joseph L DeRisi - Chan Zuckerberg Biohub. Bryan Greenhouse - CZB Investigator program, NIH/NIAID awards A1089674 (East Africa ICEMR), AI119019, and AI144048. The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Bryan Greenhouse, Email: bryan.greenhouse@ucsf.edu.

Joseph L DeRisi, Email: joe@derisilab.ucsf.edu.

Urszula Krzych, Walter Reed Army Institute of Research, United States.

Dominique Soldati-Favre, University of Geneva, Switzerland.

Funding Information

This paper was supported by the following grants:

  • Chan Zuckerberg Biohub to Joseph L DeRisi.

  • National Institutes of Health A1089674 (East Africa ICEMR) to Bryan Greenhouse.

  • Chan Zuckerberg Biohub Investigator program to Bryan Greenhouse.

  • National Institutes of Health AI119019 to Bryan Greenhouse.

  • National Institutes of Health AI144048 to Bryan Greenhouse.

Additional information

Competing interests

No competing interests declared.

Reviewing editor, eLife.

Paid scientific advisor for Allen & Co. Paid scientific advisor for the Public Health Company, Inc and holds stock options. Founder and holding stock options in VeriPhi Health, Inc.

Author contributions

Conceptualization, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing – original draft, Writing – review and editing.

Software, Methodology, Writing – review and editing.

Validation, Methodology.

Software, Formal analysis.

Resources, Writing – review and editing.

Software, Writing – review and editing.

Methodology.

Resources.

Resources.

Resources, Writing – review and editing.

Resources.

Conceptualization, Formal analysis, Investigation, Visualization, Writing – review and editing.

Conceptualization, Resources, Formal analysis, Supervision, Funding acquisition, Investigation, Visualization, Writing – original draft, Project administration, Writing – review and editing.

Conceptualization, Resources, Software, Formal analysis, Supervision, Funding acquisition, Investigation, Visualization, Methodology, Writing – original draft, Project administration, Writing – review and editing.

Ethics

The study protocol was reviewed and approved by the Makerere University School of Medicine Research and Ethics Committee (Identification numbers 2011-149 and 2011-167), the London School of Hygiene and Tropical Medicine Ethics Committee (Identification numbers 5943 and 5944), the University of California, San Francisco, Committee on Human Research (Identification numbers 11-05539 and 11-05995) and the Uganda National Council for Science and Technology (Identification numbers HS-978 and HS-1019). Written informed consent was obtained from all participants in the study. For children, this was obtained from the parents or guardians. The US control samples were from New York Blood Center and these samples came from volunteer blood donors who consented as follows, "I authorize NYBC to use or transfer my blood or portions of it for any purpose it deems appropriate, including transfusion, research, or commercial purposes.".

Additional files

Supplementary file 1. List of 9927 seroreactive peptides identified in this dataset with their sequences.
elife-81401-supp1.xlsx (985.1KB, xlsx)
Supplementary file 2. Top 40 proteins with highest seropositivity and associated literature.
elife-81401-supp2.zip (43.4KB, zip)
Supplementary file 3. List of top 100 proteins with highest seropositivity used for GO analysis.
elife-81401-supp3.zip (1.7KB, zip)
Supplementary file 4. Seropositivity rate (proportion of people seropositive) for all 9927 seroreactive peptides across different groups in the two exposure settings.
elife-81401-supp4.xls (3.8MB, xls)
Supplementary file 5. Seropositivity rate (proportion of people seropositive) for top repeat elements across different groups in the two exposure settings.
elife-81401-supp5.xlsx (13.1KB, xlsx)
Supplementary file 6. List of inter-protein motifs and the proteins sharing them.

Motifs reported here are 7-mers with at least 5 identical amino acids and up to two conservative substitutions (and no wildcards).

elife-81401-supp6.xlsx (192.4KB, xlsx)
Supplementary file 7. Table describing the number of interprotein motifs obtained with varied parameters for calling the motifs.
elife-81401-supp7.xlsx (10.3KB, xlsx)
Supplementary file 8. Gene network file for interprotein motifs (7-mers with at least 5 identical amino acids and up to two conservative substitutions (and no wildcards)).

Can be visualized on Cytoscape.

elife-81401-supp8.zip (13.6KB, zip)
MDAR checklist

Data availability

All data generated or analyzed during this study are included in the manuscript, supporting files and in the Dryad repository with the https://doi.org/10.7272/Q69S1P9G.

The following dataset was generated:

Raghavan M, Kalantar K, Duarte E, Teyssier N, Takahashi S, Kung A, Rajan J, Rek J, Tetteh K, Drakeley C, Ssewanyana I, Rodriguez-Barraquer I, Greenhouse B, DeRisi J. 2022. Proteome-wide antigenic profiling in Ugandan cohorts identifies associations between age, exposure intensity, and responses to repeat-containing antigens in Plasmodium falciparum. Dryad Digital Repository.

References

  1. Akbar R, Robert PA, Pavlović M, Jeliazkov JR, Snapkov I, Slabodkin A, Weber CR, Scheffer L, Miho E, Haff IH, Haug DTT, Lund-Johansen F, Safonova Y, Sandve GK, Greiff V. A compact vocabulary of paratope-epitope interactions enables predictability of antibody-antigen binding. Cell Reports. 2021;34:108856. doi: 10.1016/j.celrep.2021.108856. [DOI] [PubMed] [Google Scholar]
  2. Amos B, Aurrecoechea C, Barba M, Barreto A, Basenko EY, Bażant W, Belnap R, Blevins AS, Böhme U, Brestelli J, Brunk BP, Caddick M, Callan D, Campbell L, Christensen MB, Christophides GK, Crouch K, Davis K, DeBarry J, Doherty R, Duan Y, Dunn M, Falke D, Fisher S, Flicek P, Fox B, Gajria B, Giraldo-Calderón GI, Harb OS, Harper E, Hertz-Fowler C, Hickman MJ, Howington C, Hu S, Humphrey J, Iodice J, Jones A, Judkins J, Kelly SA, Kissinger JC, Kwon DK, Lamoureux K, Lawson D, Li W, Lies K, Lodha D, Long J, MacCallum RM, Maslen G, McDowell MA, Nabrzyski J, Roos DS, Rund SSC, Schulman SW, Shanmugasundram A, Sitnik V, Spruill D, Starns D, Stoeckert CJ, Tomko SS, Wang H, Warrenfeltz S, Wieck R, Wilkinson PA, Xu L, Zheng J. VEuPathDB: the eukaryotic pathogen, vector and host bioinformatics resource center. Nucleic Acids Research. 2022;50:D898–D911. doi: 10.1093/nar/gkab929. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Anders RF. Multiple cross-reactivities amongst antigens of Plasmodium falciparum impair the development of protective immunity against malaria. Parasite Immunology. 1986;8:529–539. doi: 10.1111/j.1365-3024.1986.tb00867.x. [DOI] [PubMed] [Google Scholar]
  4. Arcà B, Lombardo F, Struchiner CJ, Ribeiro JMC. Anopheline salivary protein genes and gene families: An evolutionary overview after the whole genome sequence of sixteen anopheles species. BMC Genomics. 2017;18:153. doi: 10.1186/s12864-017-3579-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Barry AE, Trieu A, Fowkes FJI, Pablo J, Kalantari-Dehaghi M, Jasinskas A, Tan X, Kayala MA, Tavul L, Siba PM, Day KP, Baldi P, Felgner PL, Doolan DL. The stability and complexity of antibody responses to the major surface antigen of Plasmodium falciparum are associated with age in a malaria endemic area. Molecular & Cellular Proteomics. 2011;10:M111. doi: 10.1074/mcp.M111.008326. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Baum E, Badu K, Molina DM, Liang X, Felgner PL, Yan G. Protein microarray analysis of antibody responses to Plasmodium falciparum in western kenyan highland sites with differing transmission levels. PLOS ONE. 2013;8:e82246. doi: 10.1371/journal.pone.0082246. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Buus S, Rockberg J, Forsström B, Nilsson P, Uhlen M, Schafer-Nielsen C. High-resolution mapping of linear antibody epitopes using ultrahigh-density peptide microarrays. Molecular & Cellular Proteomics. 2012;11:1790–1800. doi: 10.1074/mcp.M112.020800. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Camponovo F, Campo JJ, Le TQ, Oberai A, Hung C, Pablo JV, Teng AA, Liang X, Sim BKL, Jongo S, Abdulla S, Tanner M, Hoffman SL, Daubenberger C, Penny MA. Proteome-wide analysis of a malaria vaccine study reveals personalized humoral immune profiles in tanzanian adults. eLife. 2020;9:e53080. doi: 10.7554/eLife.53080. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Cham GKK, Turner L, Lusingu J, Vestergaard L, Mmbando BP, Kurtis JD, Jensen ATR, Salanti A, Lavstsen T, Theander TG. Sequential, ordered acquisition of antibodies to Plasmodium falciparum erythrocyte membrane protein 1 domains. Journal of Immunology. 2009;183:3356–3363. doi: 10.4049/jimmunol.0901331. [DOI] [PubMed] [Google Scholar]
  10. Chatterjee D, Lewis FJ, Sutton HJ, Kaczmarski JA, Gao X, Cai Y, McNamara HA, Jackson CJ, Cockburn IA. Avid binding by B cells to the plasmodium circumsporozoite protein repeat suppresses responses to protective subdominant epitopes. Cell Reports. 2021;35:108996. doi: 10.1016/j.celrep.2021.108996. [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. RTS,S Clinical Trials Partnership. Efficacy and safety of RTS,S/AS01 malaria vaccine with or without a booster dose in infants and children in Africa: final results of a phase 3, individually randomised, controlled trial. Lancet. 2015;386:31–45. doi: 10.1016/S0140-6736(15)60721-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Cockburn IA, Seder RA. Malaria prevention: From immunological concepts to effective vaccines and protective antibodies. Nature Immunology. 2018;19:1199–1211. doi: 10.1038/s41590-018-0228-6. [DOI] [PubMed] [Google Scholar]
  13. Crompton PD, Kayala MA, Traore B, Kayentao K, Ongoiba A, Weiss GE, Molina DM, Burk CR, Waisberg M, Jasinskas A, Tan X, Doumbo S, Doumtabe D, Kone Y, Narum DL, Liang X, Doumbo OK, Miller LH, Doolan DL, Baldi P, Felgner PL, Pierce SK. A prospective analysis of the Ab response to Plasmodium falciparum before and after a malaria season by protein microarray. PNAS. 2010;107:6958–6963. doi: 10.1073/pnas.1001323107. [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Davies HM, Nofal SD, McLaughlin EJ, Osborne AR. Repetitive sequences in malaria parasite proteins. FEMS Microbiology Reviews. 2017;41:923–940. doi: 10.1093/femsre/fux046. [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. Dent AE, Nakajima R, Liang L, Baum E, Moormann AM, Sumba PO, Vulule J, Babineau D, Randall A, Davies DH, Felgner PL, Kazura JW. Plasmodium falciparum protein microarray antibody profiles correlate with protection from symptomatic malaria in kenya. Journal of Infectious Diseases. 2015;212:1429–1438. doi: 10.1093/infdis/jiv224. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Doolan DL, Dobaño C, Baird JK. Acquired immunity to malaria. Clinical Microbiology Reviews. 2009;22:13–36. doi: 10.1128/CMR.00025-08. [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Edwards RJ, Davey NE, Shields DC, Thattai M. SLiMFinder: A probabilistic method for identifying over-represented, convergently evolved, short linear motifs in proteins. PLOS ONE. 2007;2:e967. doi: 10.1371/journal.pone.0000967. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Feldmann M, Easten A. The relationship between antigenic structure and the requirement for thymus-derived cells in the immune response. The Journal of Experimental Medicine. 1971;134:103–119. doi: 10.1084/jem.134.1.103. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Fink K, Manjarrez-Orduño N, Schildknecht A, Weber J, Senn BM, Zinkernagel RM, Hengartner H. B cell activation state-governed formation of germinal centers following viral infection. Journal of Immunology. 2007;179:5877–5885. doi: 10.4049/jimmunol.179.9.5877. [DOI] [PubMed] [Google Scholar]
  20. Fu L, Niu B, Zhu Z, Wu S, Li W. CD-HIT: Accelerated for clustering the next-generation sequencing data. Bioinformatics. 2012;28:3150–3152. doi: 10.1093/bioinformatics/bts565. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Giovannini D, Späth S, Lacroix C, Perazzi A, Bargieri D, Lagal V, Lebugle C, Combe A, Thiberge S, Baldacci P, Tardieux I, Ménard R. Independent roles of apical membrane antigen 1 and rhoptry neck proteins during host cell invasion by apicomplexa. Cell Host & Microbe. 2011;10:591–602. doi: 10.1016/j.chom.2011.10.012. [DOI] [PubMed] [Google Scholar]
  22. Guy AJ, Irani V, MacRaild CA, Anders RF, Norton RS, Beeson JG, Richards JS, Ramsland PA. Insights into the immunological properties of intrinsically disordered malaria proteins using proteome scale predictions. PLOS ONE. 2015;10:e0141729. doi: 10.1371/journal.pone.0141729. [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Helb DA, Tetteh KKA, Felgner PL, Skinner J, Hubbard A, Arinaitwe E, Mayanja-Kizza H, Ssewanyana I, Kamya MR, Beeson JG, Tappero J, Smith DL, Crompton PD, Rosenthal PJ, Dorsey G, Drakeley CJ, Greenhouse B, Rayner JC. Novel serologic biomarkers provide accurate estimates of recent Plasmodium falciparum exposure for individuals and communities. PNAS. 2015;112:E4438–E4447. doi: 10.1073/pnas.1501705112. [DOI] [PMC free article] [PubMed] [Google Scholar]
  24. Hou N, Jiang N, Ma Y, Zou Y, Piao X, Liu S, Chen Q. Low-complexity repetitive epitopes of Plasmodium falciparum are decoys for humoural immune responses. Frontiers in Immunology. 2020;11:610. doi: 10.3389/fimmu.2020.00610. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Jaenisch T, Heiss K, Fischer N, Geiger C, Bischoff FR, Moldenhauer G, Rychlewski L, Sié A, Coulibaly B, Seeberger PH, Wyrwicz LS, Breitling F, Loeffler FF. High-density peptide arrays help to identify linear immunogenic B-cell epitopes in individuals naturally exposed to malaria infection. Molecular & Cellular Proteomics. 2019;18:642–656. doi: 10.1074/mcp.RA118.000992. [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. Joannin N, Abhiman S, Sonnhammer EL, Wahlgren M. Sub-grouping and sub-functionalization of the RIFIN multi-copy protein family. BMC Genomics. 2008;9:19. doi: 10.1186/1471-2164-9-19. [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Kamya MR, Arinaitwe E, Wanzira H, Katureebe A, Barusya C, Kigozi SP, Kilama M, Tatem AJ, Rosenthal PJ, Drakeley C, Lindsay SW, Staedke SG, Smith DL, Greenhouse B, Dorsey G. Malaria transmission, infection, and disease at three sites with varied transmission intensity in uganda: Implications for malaria control. The American Journal of Tropical Medicine and Hygiene. 2015;92:903–912. doi: 10.4269/ajtmh.14-0312. [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Kato Y, Abbott RK, Freeman BL, Haupt S, Groschel B, Silva M, Menis S, Irvine DJ, Schief WR, Crotty S. Multifaceted effects of antigen valency on B cell response composition and differentiation in vivo. Immunity. 2020;53:548–563. doi: 10.1016/j.immuni.2020.08.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Kazmin D, Nakaya HI, Lee EK, Johnson MJ, van der Most R, van den Berg RA, Ballou WR, Jongert E, Wille-Reece U, Ockenhouse C, Aderem A, Zak DE, Sadoff J, Hendriks J, Wrammert J, Ahmed R, Pulendran B. Systems analysis of protective immune responses to RTS,S malaria vaccination in humans. PNAS. 2017;114:2425–2430. doi: 10.1073/pnas.1621489114. [DOI] [PMC free article] [PubMed] [Google Scholar]
  30. Kilama M, Smith DL, Hutchinson R, Kigozi R, Yeka A, Lavoy G, Kamya MR, Staedke SG, Donnelly MJ, Drakeley C, Greenhouse B, Dorsey G, Lindsay SW. Estimating the annual entomological inoculation rate for Plasmodium falciparum transmitted by anopheles gambiae s.l. using three sampling methods in three sites in uganda. Malaria Journal. 2014;13:1–13. doi: 10.1186/1475-2875-13-111. [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Larman HB, Zhao Z, Laserson U, Li MZ, Ciccia A, Gakidis MAM, Church GM, Kesari S, Leproust EM, Solimini NL, Elledge SJ. Autoantigen discovery with a synthetic human peptidome. Nature Biotechnology. 2011;29:535–541. doi: 10.1038/nbt.1856. [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Li W, Godzik A. Cd-hit: A fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006;22:1658–1659. doi: 10.1093/bioinformatics/btl158. [DOI] [PubMed] [Google Scholar]
  33. Madeira F, Park YM, Lee J, Buso N, Gur T, Madhusoodanan N, Basutkar P, Tivey ARN, Potter SC, Finn RD, Lopez R. The EMBL-EBI search and sequence analysis tools apis in 2019. Nucleic Acids Research. 2019;47:W636–W641. doi: 10.1093/nar/gkz268. [DOI] [PMC free article] [PubMed] [Google Scholar]
  34. Mandel-Brehm C, Dubey D, Kryzer TJ, O’Donovan BD, Tran B, Vazquez SE, Sample HA, Zorn KC, Khan LM, Bledsoe IO, McKeon A, Pleasure SJ, Lennon VA, DeRisi JL, Wilson MR, Pittock SJ. Kelch-like protein 11 antibodies in seminoma-associated paraneoplastic encephalitis. The New England Journal of Medicine. 2019;381:47–54. doi: 10.1056/NEJMoa1816721. [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.Journal. 2011;17:10. doi: 10.14806/ej.17.1.200. [DOI] [Google Scholar]
  36. Mattei D, Berzins K, Wahlgren M, Udomsangpetch R, Perlmann P, Griesser HW, Scherf A, Müller-Hill B, Bonnefoy S, Guillotte M. Cross-reactive antigenic determinants present on different Plasmodium falciparum blood-stage antigens. Parasite Immunology. 1989;11:15–29. doi: 10.1111/j.1365-3024.1989.tb00645.x. [DOI] [PubMed] [Google Scholar]
  37. Monaco DR, Kottapalli SV, Breitwieser FP, Anderson DE, Wijaya L, Tan K, Chia WN, Kammers K, Caturegli P, Waugh K, Roederer M, Petri M, Goldman DW, Rewers M, Wang LF, Larman HB. Deconvoluting virome-wide antibody epitope reactivity profiles. EBioMedicine. 2021;75:103747. doi: 10.1016/j.ebiom.2021.103747. [DOI] [PMC free article] [PubMed] [Google Scholar]
  38. Morahan BJ, Sallmann GB, Huestis R, Dubljevic V, Waller KL. Plasmodium falciparum: genetic and immunogenic characterisation of the rhoptry neck protein pfron4. Experimental Parasitology. 2009;122:280–288. doi: 10.1016/j.exppara.2009.04.013. [DOI] [PubMed] [Google Scholar]
  39. Morita M, Takashima E, Ito D, Miura K, Thongkukiatkul A, Diouf A, Fairhurst RM, Diakite M, Long CA, Torii M, Tsuboi T. Immunoscreening of Plasmodium falciparum proteins expressed in a wheat germ cell-free system reveals a novel malaria vaccine candidate. Scientific Reports. 2017;7:46086. doi: 10.1038/srep46086. [DOI] [PMC free article] [PubMed] [Google Scholar]
  40. Murugan R, Scally SW, Costa G, Mustafa G, Thai E, Decker T, Bosch A, Prieto K, Levashina EA, Julien JP, Wardemann H. Evolution of protective human antibodies against Plasmodium falciparum circumsporozoite protein repeat motifs. Nature Medicine. 2020;26:1135–1145. doi: 10.1038/s41591-020-0881-9. [DOI] [PubMed] [Google Scholar]
  41. Nagaoka H, Kanoi BN, Morita M, Nakata T, Palacpac NMQ, Egwang TG, Horii T, Tsuboi T, Takashima E. Characterization of a Plasmodium falciparum phistc protein, pf3d7_0801000, in blood-stage malaria parasites. Parasitology International. 2021;80:102240. doi: 10.1016/j.parint.2020.102240. [DOI] [PubMed] [Google Scholar]
  42. Niang M, Bei AK, Madnani KG, Pelly S, Dankwa S, Kanjee U, Gunalan K, Amaladoss A, Yeo KP, Bob NS, Malleret B, Duraisingh MT, Preiser PR. STEVOR is a Plasmodium falciparum erythrocyte binding protein that mediates merozoite invasion and rosetting. Cell Host & Microbe. 2014;16:81–93. doi: 10.1016/j.chom.2014.06.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  43. Ochiai K, Maienschein-Cline M, Simonetti G, Chen J, Rosenthal R, Brink R, Chong AS, Klein U, Dinner AR, Singh H, Sciammas R. Transcriptional regulation of germinal center B and plasma cell fates by dynamical control of IRF4. Immunity. 2013;38:918–929. doi: 10.1016/j.immuni.2013.04.009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  44. O’Connor BP, Vogel LA, Zhang W, Loo W, Shnider D, Lind EF, Ratliff M, Noelle RJ, Erickson LD. Imprinting the fate of antigen-reactive B cells through the affinity of the B cell receptor. The Journal of Immunology. 2006;177:7723–7732. doi: 10.4049/jimmunol.177.11.7723. [DOI] [PMC free article] [PubMed] [Google Scholar]
  45. O’Donovan B, Mandel-Brehm C, Vazquez SE, Liu J, Parent AV, Anderson MS, Kassimatis T, Zekeridou A, Hauser SL, Pittock SJ, Chow E, Wilson MR, DeRisi JL. High-resolution epitope mapping of anti-hu and anti-yo autoimmunity by programmable phage display. Brain Communications. 2020;2:fcaa059. doi: 10.1093/braincomms/fcaa059. [DOI] [PMC free article] [PubMed] [Google Scholar]
  46. O’Leary NA, Wright MW, Brister JR, Ciufo S, Haddad D, McVeigh R, Rajput B, Robbertse B, Smith-White B, Ako-Adjei D, Astashyn A, Badretdin A, Bao Y, Blinkova O, Brover V, Chetvernin V, Choi J, Cox E, Ermolaeva O, Farrell CM, Goldfarb T, Gupta T, Haft D, Hatcher E, Hlavina W, Joardar VS, Kodali VK, Li W, Maglott D, Masterson P, McGarvey KM, Murphy MR, O’Neill K, Pujar S, Rangwala SH, Rausch D, Riddick LD, Schoch C, Shkeda A, Storz SS, Sun H, Thibaud-Nissen F, Tolstoy I, Tully RE, Vatsan AR, Wallin C, Webb D, Wu W, Landrum MJ, Kimchi A, Tatusova T, DiCuccio M, Kitts P, Murphy TD, Pruitt KD. Reference sequence (refseq) database at NCBI: Current status, taxonomic expansion, and functional annotation. Nucleic Acids Research. 2016;44:D733–D745. doi: 10.1093/nar/gkv1189. [DOI] [PMC free article] [PubMed] [Google Scholar]
  47. Olotu A, Fegan G, Wambua J, Nyangweso G, Leach A, Lievens M, Kaslow DC, Njuguna P, Marsh K, Bejon P. Seven-year efficacy of RTS,S/AS01 malaria vaccine among young African children. The New England Journal of Medicine. 2016;374:2519–2529. doi: 10.1056/NEJMoa1515257. [DOI] [PMC free article] [PubMed] [Google Scholar]
  48. Pape KA, Maul RW, Dileepan T, Paustian AS, Gearhart PJ, Jenkins MK. Naive B cells with high-avidity germline-encoded antigen receptors produce persistent igm+ and transient igg+ memory B cells. Immunity. 2018;48:1135–1143. doi: 10.1016/j.immuni.2018.04.019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  49. Paus D, Phan TG, Chan TD, Gardam S, Basten A, Brink R. Antigen recognition strength regulates the choice between extrafollicular plasma cell and germinal center B cell differentiation. The Journal of Experimental Medicine. 2006;203:1081–1091. doi: 10.1084/jem.20060087. [DOI] [PMC free article] [PubMed] [Google Scholar]
  50. Portugal S, Pierce SK, Crompton PD. Young lives lost as B cells falter: What we are learning about antibody responses in malaria. Journal of Immunology. 2013;190:3039–3046. doi: 10.4049/jimmunol.1203067. [DOI] [PMC free article] [PubMed] [Google Scholar]
  51. Raghavan M. Phage-malaria-uganda200. swh:1:rev:b49a3ec15d86048dd570b340487cee139fbeb445. Software Heritage. 2023. https://archive.softwareheritage.org/swh:1:dir:b5eb627d24bae8bb5a30f89157db391dd95c080c;origin=https://github.com/madhura-raghavan/phage-malaria-uganda200;visit=swh:1:snp:905b76d7f2c528466e2faac9643cc288ab30115e;anchor=swh:1:rev:b49a3ec15d86048dd570b340487cee139fbeb445
  52. Rajan JV, McCracken M, Mandel-Brehm C, Gromowski G, Pollett S, Jarman R, DeRisi JL. Phage display demonstrates durable differences in serological profile by route of inoculation in primary infections of non-human primates with dengue virus 1. Scientific Reports. 2021;11:10823. doi: 10.1038/s41598-021-90318-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  53. Rask TS, Hansen DA, Theander TG, Gorm Pedersen A, Lavstsen T. Plasmodium falciparum erythrocyte membrane protein 1 diversity in seven genomes -- divide and conquer. PLOS Computational Biology. 2010;6:e1000933. doi: 10.1371/journal.pcbi.1000933. [DOI] [PMC free article] [PubMed] [Google Scholar]
  54. Reeder JC, Brown GV. Antigenic variation and immune evasion in Plasmodium falciparum malaria. Immunology and Cell Biology. 1996;74:546–554. doi: 10.1038/icb.1996.88. [DOI] [PubMed] [Google Scholar]
  55. Rek J, Katrak S, Obasi H, Nayebare P, Katureebe A, Kakande E, Arinaitwe E, Nankabirwa JI, Jagannathan P, Drakeley C, Staedke SG, Smith DL, Bousema T, Kamya M, Rosenthal PJ, Dorsey G, Greenhouse B. Characterizing microscopic and submicroscopic malaria parasitaemia at three sites with varied transmission intensity in uganda. Malaria Journal. 2016;15:470. doi: 10.1186/s12936-016-1519-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  56. Rodriguez-Barraquer I, Arinaitwe E, Jagannathan P, Kamya MR, Rosenthal PJ, Rek J, Dorsey G, Nankabirwa J, Staedke SG, Kilama M, Drakeley C, Ssewanyana I, Smith DL, Greenhouse B. Quantification of anti-parasite and anti-disease immunity to malaria as a function of age and exposure. eLife. 2018;7:e35832. doi: 10.7554/eLife.35832. [DOI] [PMC free article] [PubMed] [Google Scholar]
  57. Rubinstein ND, Mayrose I, Halperin D, Yekutieli D, Gershoni JM, Pupko T. Computational characterization of B-cell epitopes. Molecular Immunology. 2008;45:3477–3489. doi: 10.1016/j.molimm.2007.10.016. [DOI] [PubMed] [Google Scholar]
  58. Saito F, Hirayasu K, Satoh T, Wang CW, Lusingu J, Arimori T, Shida K, Palacpac NMQ, Itagaki S, Iwanaga S, Takashima E, Tsuboi T, Kohyama M, Suenaga T, Colonna M, Takagi J, Lavstsen T, Horii T, Arase H. Immune evasion of Plasmodium falciparum by RIFIN via inhibitory receptors. Nature. 2017;552:101–105. doi: 10.1038/nature24994. [DOI] [PMC free article] [PubMed] [Google Scholar]
  59. Schofield L, Uadia P. Lack of IR gene control in the immune response to malaria. I. A thymus-independent antibody response to the repetitive surface protein of sporozoites. Journal of Immunology. 1990;144:2781–2788. [PubMed] [Google Scholar]
  60. Schofield L. On the function of repetitive domains in protein antigens of plasmodium and other eukaryotic parasites. Parasitology Today. 1991;7:99–105. doi: 10.1016/0169-4758(91)90166-L. [DOI] [PubMed] [Google Scholar]
  61. Schwickert TA, Victora GD, Fooksman DR, Kamphorst AO, Mugnier MR, Gitlin AD, Dustin ML, Nussenzweig MC. A dynamic T cell-limited checkpoint regulates affinity-dependent B cell entry into the germinal center. The Journal of Experimental Medicine. 2011;208:1243–1252. doi: 10.1084/jem.20102477. [DOI] [PMC free article] [PubMed] [Google Scholar]
  62. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T. Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Research. 2003;13:2498–2504. doi: 10.1101/gr.1239303. [DOI] [PMC free article] [PubMed] [Google Scholar]
  63. Smith GP, Petrenko VA. Phage display. Chemical Reviews. 1997;97:391–410. doi: 10.1021/cr960065d. [DOI] [PubMed] [Google Scholar]
  64. Tan J, Pieper K, Piccoli L, Abdi A, Perez MF, Geiger R, Tully CM, Jarrossay D, Maina Ndungu F, Wambua J, Bejon P, Fregni CS, Fernandez-Rodriguez B, Barbieri S, Bianchi S, Marsh K, Thathy V, Corti D, Sallusto F, Bull P, Lanzavecchia A. A LAIR1 insertion generates broadly reactive antibodies against malaria variant antigens. Nature. 2015;529:105–109. doi: 10.1038/nature16450. [DOI] [PMC free article] [PubMed] [Google Scholar]
  65. Taylor JJ, Pape KA, Steach HR, Jenkins MK. Humoral immunity. apoptosis and antigen affinity limit effector cell differentiation of a single naïve B cell. Science. 2015;347:784–787. doi: 10.1126/science.aaa1342. [DOI] [PMC free article] [PubMed] [Google Scholar]
  66. Vatti A, Monsalve DM, Pacheco Y, Chang C, Anaya JM, Gershwin ME. Original antigenic sin: A comprehensive review. Journal of Autoimmunity. 2017;83:12–21. doi: 10.1016/j.jaut.2017.04.008. [DOI] [PubMed] [Google Scholar]
  67. Vazquez SE, Ferré EM, Scheel DW, Sunshine S, Miao B, Mandel-Brehm C, Quandt Z, Chan AY, Cheng M, German M, Lionakis M, DeRisi JL, Anderson MS. Identification of novel, clinically correlated autoantigens in the monogenic autoimmune syndrome aps1 by proteome-wide phip-seq. eLife. 2020;9:e55053. doi: 10.7554/eLife.55053. [DOI] [PMC free article] [PubMed] [Google Scholar]
  68. Wåhlin B, Sjölander A, Ahlborg N, Udomsangpetch R, Scherf A, Mattei D, Berzins K, Perlmann P. Involvement of pf155/RESA and cross-reactive antigens in Plasmodium falciparum merozoite invasion in vitro. Infection and Immunity. 1992;60:443–449. doi: 10.1128/iai.60.2.443-449.1992. [DOI] [PMC free article] [PubMed] [Google Scholar]
  69. Wang S, Mata-Fink J, Kriegsman B, Hanson M, Irvine DJ, Eisen HN, Burton DR, Wittrup KD, Kardar M, Chakraborty AK. Manipulating the selection forces during affinity maturation to generate cross-reactive HIV antibodies. Cell. 2015;160:785–797. doi: 10.1016/j.cell.2015.01.027. [DOI] [PMC free article] [PubMed] [Google Scholar]
  70. Wu TD, Nacu S. Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics. 2010;26:873–881. doi: 10.1093/bioinformatics/btq057. [DOI] [PMC free article] [PubMed] [Google Scholar]
  71. Xie Y, Li X, Chai Y, Song H, Qi J, Gao GF. Structural basis of malarial parasite RIFIN-mediated immune escape against LAIR1. Cell Reports. 2021;36:109600. doi: 10.1016/j.celrep.2021.109600. [DOI] [PubMed] [Google Scholar]
  72. Yeka A, Nankabirwa J, Mpimbaza A, Kigozi R, Arinaitwe E, Drakeley C, Greenhouse B, Kamya MR, Dorsey G, Staedke SG. Factors associated with malaria parasitemia, anemia and serological responses in a spectrum of epidemiological settings in uganda. PLOS ONE. 2015;10:e0118901. doi: 10.1371/journal.pone.0118901. [DOI] [PMC free article] [PubMed] [Google Scholar]
  73. Zamecnik CR, Rajan JV, Yamauchi KA, Mann SA, Loudermilk RP, Sowa GM, Zorn KC, Alvarenga BD, Gaebler C, Caskey M, Stone M, Norris PJ, Gu W, Chiu CY, Ng D, Byrnes JR, Zhou XX, Wells JA, Robbiani DF, Nussenzweig MC, DeRisi JL, Wilson MR. ReScan, a multiplex diagnostic pipeline, pans human sera for SARS-cov-2 antigens. Cell Reports. Medicine. 2020;1:100123. doi: 10.1016/j.xcrm.2020.100123. [DOI] [PMC free article] [PubMed] [Google Scholar]

Editor's evaluation

Urszula Krzych

Malaria immunity is complex, and this new platform, namely phage display of Plasmodium falciparum proteome-wide peptides for profiling antibody targets, provides a valuable addition to the toolkit for understanding humoral responses. The study, conducted using plasma from Ugandan children and adults, addresses an important aspect of naturally acquired antibodies: seroreactive responses to intra- and inter-protein repeat regions. The revised version solidly supports the authors' claims; it contains a reanalysis of cohort comparisons accounting for infection status, updated analyses of cross-reactive epitopes to account for within-individual effects, and it emphasizes the limitations of the conclusions.

Decision letter

Editor: Urszula Krzych

Our editorial process produces two outputs: i) public reviews designed to be posted alongside the preprint for the benefit of readers; ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "Proteome-wide antigenic profiling in Ugandan cohorts identifies associations between age, exposure intensity, and responses to repeat-containing antigens in Plasmodium falciparum" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Dominique Soldati-Favre as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

1) Please conduct analyses of the data stratified by BS/PCR status.

2) Please include analyses of antibody data to confirm your hypothesis that "inter-protein motifs" do cross-react.

3) Please revise the manuscript according to the additional suggestions, which are keyed to the line numbers of the manuscript text.

Reviewer #1 (Recommendations for the authors):

Specific comments

Interprotein motifs as potential cross-reactive epitopes:

Line 59: "short motifs associated with seroreactivity were extensively shared among hundreds of antigens, potentially representing cross-reactive epitopes." Please present analyses showing whether reactivity to these shared motifs indeed correlates between different proteins within the same individual. One presumes so but this can be assessed with available data.

Lines 396-479 – an extended passage is dedicated to the analysis of interprotein motifs, and their association with seroreactive proteins. The authors should analyze whether seroreactivities to different peptides containing the same/similar interprotein motif correlate or not within individuals. This would test the authors' hypothesis on L 399 "If these motifs are a part of an epitope, then antibodies and B-cell receptors (BCR) specific to a motif can potentially cross-react with the motif variants in different proteins".

L 608: "Another limitation of our study is that it did not provide quantitative measures of absolute antibody reactivity to individual peptides per person." This is a bit confusing, since Figure 2 —figure supplement 2 shows that "Technical replicates are well correlated" for seroreactivity to individual peptides, indicating the assay is quantitative (but perhaps not for "antibody reactivity", although that is the underlying selective force?). In any case, it should be possible to compare RP5K between peptides that share the same interprotein motif, assuming that these motifs represent important B cell epitopes as the authors hypothesize.
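The comparison requested above can be sketched as follows. This is a minimal illustration only, not the authors' analysis: it assumes a hypothetical table of RP5K values per peptide, with individuals listed in the same order for every peptide, and computes a Spearman rank correlation for each pair of peptides that share a motif. The peptide names, the values, and the choice of Spearman correlation are all assumptions made for the example.

```python
from itertools import combinations


def ranks(values):
    """Return 1-based ranks, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def motif_pair_correlations(rp5k, motif_peptides):
    """Correlate reactivity across individuals for every peptide pair."""
    return {
        (p, q): spearman(rp5k[p], rp5k[q])
        for p, q in combinations(sorted(motif_peptides), 2)
    }


# Hypothetical RP5K values for five individuals (illustrative, not study data).
rp5k = {
    "pepA": [0.0, 2.0, 5.0, 9.0, 1.0],
    "pepB": [0.5, 3.0, 4.0, 12.0, 1.0],
    "pepC": [7.0, 1.0, 0.0, 2.0, 6.0],
}
correlations = motif_pair_correlations(rp5k, ["pepA", "pepB", "pepC"])
for pair, rho in correlations.items():
    print(pair, round(rho, 2))
```

A consistently positive correlation among peptides sharing a motif, relative to peptide pairs without a shared motif, would be in line with the cross-reactivity hypothesis, although it would not by itself exclude co-exposure to the parent proteins.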

Effects of concurrent parasitemia on antibody:

Line 61: "PfEMP1 shared motifs with the greatest number of other antigens". This echoes prior studies showing short-term responses against different (heterologous) PfEMP1 variants during/after infection, which is an important potential confounder in the analyses presented here. The authors should address/analyze this issue of concurrent malaria infection effects more thoroughly, otherwise, any comparison between the two cohorts is hard to interpret.

Line 174 – please clarify if there was a rationale for the different characteristics of the sample sets between the two sites – "the majority of Tororo samples were positive for infection", while those in Kanungu were "100 days after their last infection". This will bias any comparisons between the two cohorts. For example, Helb et al. previously observed, with sera from these same study sites, that "Pf-Specific Antibody Profiles Showed Decreased Responses with Increased Days Since Infection."

Line 254 – "children in the higher transmission setting had a significantly higher breadth than children in the moderate transmission setting." Here and elsewhere in the Results, the analyses could be confounded by concurrent parasitemia during sampling (which is more common in the high transmission cohort), since acute parasitemia has acute effects on antibody profiles including cross-reactive antibodies (for example, see https://pubmed.ncbi.nlm.nih.gov/15175751/). The authors should compare individuals with patent parasitemia (by microscopy), subpatent parasitemia (by PCR), and without parasitemia at the time of sample collection, to understand these potential confounding effects.

L 532-536: "The difference in seroreactivity to repeat-containing peptides observed here between the [high versus moderate transmission] settings could therefore emerge from two related mechanisms. In the first, the difference could be driven by a requirement for a minimum level of cumulative exposure to the target repeats to generate a robust response. In the second, the antibody response to repeats may be inherently less durable, leading to rapid waning in the absence of frequent exposure." Please test the possibility that these differences among children are simply related to concurrent parasitemia at the time of sample collection.
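One way to picture the stratified comparison requested above is the sketch below. It assumes a hypothetical list of per-child records with a site, a parasitemia status at sampling (patent by microscopy, subpatent by PCR only, or negative), and a breadth value; the field names and numbers are invented for illustration and are not study data. The idea is simply to compare sites within each infection stratum rather than across the pooled cohorts.

```python
from collections import defaultdict
from statistics import median


def median_breadth_by_stratum(records):
    """Group breadth by (parasitemia status, site) and return group medians."""
    groups = defaultdict(list)
    for record in records:
        groups[(record["status"], record["site"])].append(record["breadth"])
    return {key: median(values) for key, values in groups.items()}


# Hypothetical records (illustrative values only).
records = [
    {"site": "Tororo", "status": "patent", "breadth": 40},
    {"site": "Tororo", "status": "patent", "breadth": 50},
    {"site": "Tororo", "status": "patent", "breadth": 60},
    {"site": "Kanungu", "status": "patent", "breadth": 20},
    {"site": "Kanungu", "status": "patent", "breadth": 30},
    {"site": "Tororo", "status": "subpatent", "breadth": 30},
    {"site": "Kanungu", "status": "subpatent", "breadth": 15},
    {"site": "Kanungu", "status": "subpatent", "breadth": 25},
    {"site": "Tororo", "status": "negative", "breadth": 35},
    {"site": "Tororo", "status": "negative", "breadth": 45},
    {"site": "Kanungu", "status": "negative", "breadth": 10},
    {"site": "Kanungu", "status": "negative", "breadth": 20},
    {"site": "Kanungu", "status": "negative", "breadth": 30},
]

medians = median_breadth_by_stratum(records)
for status in ("patent", "subpatent", "negative"):
    print(status, medians[(status, "Tororo")], medians[(status, "Kanungu")])
```

If the between-site difference persists within every stratum, concurrent parasitemia alone is unlikely to explain it; a formal analysis would add an appropriate statistical test per stratum.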

Platform limitations:

Line 228 – "none of the proteins expressed in the mosquito oocyst stage were identified as seroreactive". Kindly clarify this statement, since, for example, CSP is expressed during the oocyst stage.

Line 239 – "a known limitation of PhIP-seq – it detects predominantly linear epitopes". The passage including this statement may leave readers with the impression that the previous protein array studies cited here (which appear to have all used the same in vitro protein translation system) were generated using properly folded proteins with conformationally correct epitopes. However, this may be unlikely given the difficulty of expressing properly folded malaria antigens. Many of the antigens targeted by protective immunity are highly folded proteins, and functional antibodies against these antigens are frequently conformation-dependent. As such, one might expect the PhIP-seq platform (and the other protein array platforms they cite) to perform poorly at studying such antibodies. The authors should clarify these issues a bit more as context for readers. The limitation of this PhIP approach may be indicated by "Figure 2 —figure supplement 7 Breadth of seroreactivity in the variable regions of RIFIN and PfEMP1", which shows modest reactivity of adult sera to peptides of PfEMP1, whereas prior studies have established that adult sera in Africa cross-react to many/all blood-stage parasite surface variants circulating in a community (e.g., see Figure 1 in https://pubmed.ncbi.nlm.nih.gov/10882604/), presumably reflecting seroreactivity to the immunodominant PfEMP1 proteins.

Line 267 – "In the moderate transmission setting, adults had a significantly higher breadth of PfEMP1 variants recognized than children, suggesting an age and/or cumulative exposure dependent increase in PfEMP1 breadth in this setting, as previously observed in (Cham et al., 2009)." Please provide numbers and results of statistical analysis to substantiate this statement. Judging from the figure, this difference is not striking: it appears reasonably clear for the CIDR a/b/g domains, but less so for the DBL domains.

Line 274: "a decline in responses to variants as children develop into adults in this [high transmission] setting. This is consistent with observations from a previous study investigating antibody responses to PfEMP1 DBLa domains in Papua New Guinea (Barry et al., 2011)". While the results are consistent, it's worth noting that Barry used an array of in vitro translated proteins that may not be properly folded. Folding was never tested in Barry's study, which would therefore have limitations similar to those of the present study, which uses only linear epitopes.

Figure 1: please state in the figure the total number of falciparum proteins and corresponding peptides in the library. The number 8980 stated here includes >1400 non-malaria proteins (and these were apparently excluded from analysis; see line 725).

Reviewer #2 (Recommendations for the authors):

Overall, this paper was excellent. The writing was careful, and the experiment is exciting.

"Slow development of immunity" is a motivation described in the introduction for investigating repeat-containing elements. I would recommend adding some nuance to this description of natural immunity to avoid confusion for some readers. Immunity to severe disease in children (and in pregnancy) is acquired quite rapidly after only one or two infections. This is relevant here because it may differentiate the immunity to the types of antigens focused on in this paper from variant antigens and genes with completely different, but discrete, numbers of alleles (e.g., MSP1, EBA175).

A potential major confounder in the interpretation of results indicating responses to repeat elements that are shared amongst multiple proteins is that the assay wouldn't differentiate antibodies derived from one or many of these loci. The conclusions about the importance of inter-protein repeats on immunity are based on the assumption that any signal in these regions is based on antibodies derived from those proteins, rather than Ab's to one region causing the immunoprecipitation of the others. This seems especially relevant where seroreactivity is discussed. Is it possible to look at regions adjacent to the inter-protein repeats to discern this? I understand this was done when investigating seroreactivity within proteins, but this would be to confirm signals from different proteins. If it's just that I'm missing something obvious, some discussion clarifying why this isn't a confounder would be a good addition.

I thought the focus on repeat regions was important and wonderful to see, but some comparison to polymorphic loci feels like a lost opportunity to add significant value. It would be particularly interesting to know if an immunity to repeats was anticorrelated within individuals to other loci, and how this compares across transmission intensity. For example, does high repeat-element seroreactivity stymie development of antibodies to invasion receptors?

Another potential confounder would be if the parasites in one of your cohorts was more similar to 3D7/IT than the other. This might be assessed with a PCA plot of SNPs from your NGS reads, or perhaps with MalariaGen or PlasmoDB tools comparing different locations in Uganda.

Reviewer #3 (Recommendations for the authors):

Lines 218-220: "The 9,927 seroreactive peptides identified by the pipeline were derived from 1,648 parasite proteins and antigenic variants, many of which showed broad seroreactivity across pediatric and adult Ugandan samples"

The 9,927 seroreactive peptides were identified from 238,068 total peptides corresponding to 4.2% of the peptides. Similarly, 1,648 seroreactive proteins were identified from 8,980 total proteins corresponding to 18.4% of the proteins. Taken together, the seropositivity in this study was lower than the previous studies using protein array and α screen. Please discuss why this difference has happened.

The reader of this paper will be interested in the seropositivity ranking among the top 100 proteins (not only the list of Top protein names; Supplementary table 2). Please provide the ranking of the seropositive proteins and add a discussion about the selected proteins.

As I listed in point 1) of the public review, a strength of the phage display system is the ability to sequentially enrich and amplify the signal relative to noise. Although more rounds of IP definitely increase the S/N, an increased number of rounds of IP will have disadvantages such as biased results. Please add a description of why the authors settled on 2 rounds of IP rather than 1, 3, or more.

eLife. 2023 Feb 15;12:e81401. doi: 10.7554/eLife.81401.sa2

Author response


Essential revisions:

1) Please conduct analyses of the data stratified by BS/PCR status.

We thank the reviewers for this suggestion. We have added this analysis, taking into account both exposure and infection status (lines 409-435). Two new supplementary figures (Figure 5 Figure Supplement 3, Figure 5 Figure Supplement 4) accompany the new text to support these analyses.

2) Please include analyses of antibody data to confirm your hypothesis that "inter-protein motifs" do cross-react.

We thank the reviewers for this suggestion. We have added this analysis to the manuscript (lines 495-503). A new supplementary figure (Figure 6 Figure Supplement 3) accompanies the new text to support this analysis.

3) Please revise the manuscript according to the additional suggestion indicated by the numbered lines corresponding to the text in the manuscript.

We have revised the manuscript as indicated by the numbered line suggestions.

Reviewer #1 (Recommendations for the authors):

Specific comments

Interprotein motifs as potential cross-reactive epitopes:

Line 59: "short motifs associated with seroreactivity were extensively shared among hundreds of antigens, potentially representing cross-reactive epitopes." Please present analyses showing whether reactivity to these shared motifs indeed correlates between different proteins within the same individual. One presumes so but this can be assessed with available data.

We have performed the suggested analysis. This has been added to the Results section (lines 495-503), along with a new supplemental figure (Figure 6, Supplement 3) comparing seroreactivity of inter-protein motifs within individuals. We find significantly more sharing of seroreactivity to peptides with inter-protein motifs compared to peptides without inter-protein motifs, supporting the notion of cross-reactivity. Despite this association, we believe it prudent to be conservative about the conclusions, since the response of every individual is complex and polyclonal. Thus, we maintain our position that the data suggest widespread cross-reactivity but do not prove it. We have emphasized this point in the discussion.
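For readers who want a concrete picture of what "sharing an inter-protein motif" means, the following is a minimal sketch based on the motif definition given with Supplementary file 6 (7-mers with at least 5 identical amino acids and up to two conservative substitutions, no wildcards). The conservative-substitution groups and the function names are assumptions made for illustration; they are not taken from the manuscript or its code repository.

```python
# Hypothetical grouping of amino acids treated as conservative substitutions.
# The grouping actually used in the study is not stated in this text.
CONSERVATIVE_GROUPS = [
    set("ILVM"), set("FWY"), set("KRH"),
    set("DE"), set("NQ"), set("ST"), set("AG"),
]


def is_conservative(a: str, b: str) -> bool:
    """True if residues a and b fall in the same (assumed) substitution group."""
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def motifs_match(m1: str, m2: str, k: int = 7, min_identical: int = 5) -> bool:
    """Check whether two k-mers count as variants of the same motif.

    Requires at least `min_identical` identical positions; every remaining
    position must be a conservative substitution (no wildcards).
    """
    if len(m1) != k or len(m2) != k:
        raise ValueError(f"both motifs must be {k}-mers")
    identical = sum(a == b for a, b in zip(m1, m2))
    if identical < min_identical:
        return False
    return all(a == b or is_conservative(a, b) for a, b in zip(m1, m2))


# Toy 7-mers, invented for the example.
same = motifs_match("EEKKDEE", "EEKKDEE")          # identical
one_sub = motifs_match("EEKKDEE", "EEKRDEE")       # one K/R substitution
two_subs = motifs_match("EEKKDEE", "DEKRDEE")      # E/D and K/R substitutions
non_conservative = motifs_match("EEKKDEE", "EEKWDEE")  # K/W is not conservative here
too_divergent = motifs_match("EEKKDEE", "DDKRDEE")     # only 4 identical positions
```

With 5 of 7 positions required to be identical, at most two positions can differ, so the "up to two conservative substitutions" limit is enforced implicitly.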

Lines 396-479 – an extended passage is dedicated to the analysis of interprotein motifs, and their association with seroreactive proteins. The authors should analyze whether seroreactivities to different peptides containing the same/similar interprotein motif correlate or not within individuals. This would test the authors' hypothesis on L 399 "If these motifs are a part of an epitope, then antibodies and B-cell receptors (BCR) specific to a motif can potentially cross-react with the motif variants in different proteins".

As noted above, we have included a new analysis to address this question.

L 608: "Another limitation of our study is that it did not provide quantitative measures of absolute antibody reactivity to individual peptides per person." This is a bit confusing, since Figure 2 —figure supplement 2 shows that "Technical replicates are well correlated" for seroreactivity to individual peptides, indicating the assay is quantitative (but perhaps not for "antibody reactivity", although that is the underlying selective force?). In any case, it should be possible to compare RP5K between peptides that share the same interprotein motif, assuming that these motifs represent important B cell epitopes as the authors hypothesize.

While PhIPseq technical replicates are highly correlated, the assay is not considered quantitative in terms of titer or antibody affinity. As discussed in previous papers using PhIPseq (Vazquez et al. 2020; Yuan et al. 2018), the actual number of reads recovered from an IP may vary due to many factors. These include the starting copy number of phage targets in the library, the position of the epitope in the peptide, non-specific interactions with the magnetic beads, and so on. In this work, and previous papers, the fold enrichment (by RPK) of any given peptide species is evaluated relative to the same library applied to control samples (non-malaria exposed plasma in this case), allowing a semi-quantitative, relative assessment of seroreactivity. An important aspect of this work is our inclusion of a large number (n=86) of (non-malaria) US controls, which forms the basis for measuring the relative enrichment in the Ugandan cohort. We have added a clarification in the limitations section of the discussion to hopefully avoid confusion (Lines 677-681).
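The idea of a relative, semi-quantitative measure can be illustrated with a small sketch. It normalizes raw read counts per sample and then expresses each peptide as a fold enrichment over the mean of unexposed controls. The scaling constant, the pseudocount, the function names, and the toy counts are all assumptions for illustration; the pipeline used in the study may differ in every one of these choices.

```python
from statistics import mean


def normalize_reads(counts: dict, scale: float = 100_000) -> dict:
    """Rescale raw read counts so that each sample sums to `scale` reads.

    `scale` is an assumed constant; it cancels out in the fold enrichment
    apart from its interaction with the pseudocount.
    """
    total = sum(counts.values())
    return {peptide: scale * n / total for peptide, n in counts.items()}


def fold_enrichment(sample_counts: dict, control_counts: list,
                    pseudocount: float = 1.0) -> dict:
    """Fold enrichment of each peptide in one sample over the control mean."""
    sample = normalize_reads(sample_counts)
    controls = [normalize_reads(c) for c in control_counts]
    result = {}
    for peptide, value in sample.items():
        control_mean = mean(c.get(peptide, 0.0) for c in controls)
        # The pseudocount avoids division by zero for peptides absent in controls.
        result[peptide] = (value + pseudocount) / (control_mean + pseudocount)
    return result


# Toy read counts, invented for the example.
exposed_sample = {"pepA": 900, "pepB": 50, "pepC": 50}
unexposed_controls = [
    {"pepA": 10, "pepB": 45, "pepC": 45},
    {"pepA": 20, "pepB": 40, "pepC": 40},
]
enrichment = fold_enrichment(exposed_sample, unexposed_controls)
```

In this toy case `pepA` is enriched several-fold over controls while `pepB` and `pepC` fall below 1, which is the kind of relative signal the response describes; it says nothing about titer or affinity.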

As noted above, we have also included the requested analysis of per individual interprotein motif comparisons for cross-reactivity.

Effects of concurrent parasitemia on antibody:

Line 61: "PfEMP1 shared motifs with the greatest number of other antigens". This echoes prior studies showing short-term responses against different (heterologous) PfEMP1 variants during/after infection, which is an important potential confounder in the analyses presented here. The authors should address/analyze this issue of concurrent malaria infection effects more thoroughly, otherwise, any comparison between the two cohorts is hard to interpret.

We thank the reviewer for this comment. The analysis on inter-protein motifs was processed using the entire set of 9927 seroreactive peptides, regardless of cohort, to obtain a landscape of potential cross-reactive sequences. Note that the quoted statement refers to a comparison of responses to PfEMP1 to those of other, non-PfEMP1 proteins, mitigating the potential effect of broad responses to heterologous PfEMP1 variants. Therefore, the result on motif sharing with PfEMP1 only suggests that with the widely diverse PfEMP1 variants, there is a potential for cross-reactivity with different non-PfEMP1 antigens depending on the variant expressed within an individual. We claim only that the potential for cross-reactivity exists, by sequence similarity alone, among the seroreactive peptide sequences. We have made the limitations of this claim clear in the discussion (Lines 632-635).

We have, however, tested the important confounder of concurrent malaria with respect to breadth of reactivity to repeats, as suggested by the reviewer and as described in the public review.

Line 174 – please clarify if there was a rationale for the different characteristics of the sample sets between the two sites – "the majority of Tororo samples were positive for infection", while those in Kanungu were "100 days after their last infection". This will bias any comparisons between the two cohorts. For example, Helb et al. previously observed, with sera from these same study sites, that "Pf-Specific Antibody Profiles Showed Decreased Responses with Increased Days Since Infection."

The samples were selected based on age stratification and not based on positivity for infection per se – the descriptive characteristics of the sample sets mentioned by the reviewer partially reflect the fact that the individuals in Tororo were much more often infected and thus many more will be infected at the time of sampling.

Line 254 – "children in the higher transmission setting had a significantly higher breadth than children in the moderate transmission setting." Here and elsewhere in the Results, the analyses could be confounded by concurrent parasitemia during sampling (which is more common in the high transmission cohort), since acute parasitemia has acute effects on antibody profiles including cross-reactive antibodies (for example, see https://pubmed.ncbi.nlm.nih.gov/15175751/). The authors should compare individuals with patent parasitemia (by microscopy), subpatent parasitemia (by PCR), and without parasitemia at the time of sample collection, to understand these potential confounding effects.

As suggested by the reviewer and described in the public review, we have added new analyses with respect to current and recent parasitemia. However, the practical limitations in sample size in this study did not allow us to stratify by patent/subpatent parasitemia.

L 532-536: "The difference in seroreactivity to repeat-containing peptides observed here between the [high versus moderate transmission] settings could therefore emerge from two related mechanisms. In the first, the difference could be driven by a requirement for a minimum level of cumulative exposure to the target repeats to generate a robust response. In the second, the antibody response to repeats may be inherently less durable, leading to rapid waning in the absence of frequent exposure." Please test the possibility that these differences among children are simply related to concurrent parasitemia at the time of sample collection.

This is included in our new analyses, as described above.

Platform limitations:

Line 228 – "none of the proteins expressed in the mosquito oocyst stage were identified as seroreactive" Kindly clarify this statement, since for example CSP is expressed during the oocyst stage.

This is correct. Our original analysis was based on mass spectrometry data deposited in PlasmoDB until 2019, and using a spectral count >= 2 for calling expression, CSP was not called as expressed in the oocyst stage. Since then, newer datasets have been deposited and we have re-analyzed using all the currently available datasets, and we reduced the threshold to spectral count >= 1 to improve sensitivity, at the potential cost of specificity. Using this revised analysis, we now observe some oocyst sporozoite proteins (including CSP) identified in the seroreactive set. Notably though, the number of oocyst-only proteins is small, consistent with less representation of mosquito stage-specific proteins in the seroreactive set. We have updated the manuscript accordingly (lines 231-233 and Figure 2c).

Line 239 – "a known limitation of PhIP-seq – it detects predominantly linear epitopes". The passage including this statement may leave readers with the impression that the previous protein array studies cited here (which appear to have all used the same in vitro protein translation system) were generated using properly folded proteins with conformationally correct epitopes. However, this may be unlikely given the difficulty to express properly folded malaria antigens. Many of the antigens targeted by protective immunity are highly folded proteins and functional antibodies against these antigens are frequently conformation-dependent. As such, one might expect the PhIP-seq platform (and the other protein array platforms they cite) to perform poorly at studying such antibodies. The authors should clarify these issues a bit more as context for readers. The limitation of this PhIP approach may be indicated by "Figure 2 —figure supplement 7 Breadth of seroreactivity in the variable regions of RIFIN and PfEMP1" which shows modest reactivity of adult sera to peptides of PfEMP1, whereas prior studies have established that adult sera in Africa cross-react to many/all blood-stage parasite surface variants circulating in a community (eg, see Figure 1 in https://pubmed.ncbi.nlm.nih.gov/10882604/ ), presumably reflecting seroreactivity to the immunodominant PfEMP1 proteins.

We agree and thank the reviewer for raising the important point that the previous protein array studies may suffer from limitations similar to those of PhIP-seq in not displaying conformations properly, especially for malaria proteins. We have modified the text to convey this point (line 247).

With regards to PfEMP1, we acknowledge that our findings could just indicate a decrease in breadth of variant PfEMP1 recognition through linear epitopes in adults, and perhaps there is an increase in breadth of recognition through conformational epitopes that cannot be captured in our assay. This point has been added in lines 293-295.

Line 267 – "In the moderate transmission setting, adults had a significantly higher breadth of PfEMP1 variants recognized than children, suggesting an age and/or cumulative exposure dependent increase in PfEMP1 breadth in this setting, as previously observed in (Cham et al., 2009)." Please provide numbers and results of statistical analysis to substantiate this statement. This difference looking at the figure is not striking; it appears reasonably clear for CIDR a/b/g, but less so for DBL domains.

We have included this analysis in the Results section as suggested by the reviewer (lines 273-274) and we added additional detail to Figure 2, Supplement 6, including the ATS domain. In the moderate setting, the median number of PfEMP1 variants recognized in adults was 44, whereas the median in children was 26. For the moderate setting, the majority of seroreactivity was localized to the ATS domain, which is unsurprising, since this domain is conserved. The variable domains in the PhIPseq library are inherently unlikely to match perfectly the actual domains present in parasites from this cohort, which may result in less overall sensitivity in this assay. The text has been amended to clarify this point (lines 276-279).

Line 274: "a decline in responses to variants as children develop into adults in this [high transmission] setting. This is consistent with observations from a previous study investigating antibody responses to PfEMP1 DBLa domains in Papua New Guinea (Barry et al., 2011)". While the results are consistent, it's worth noting that Barry used an array of in vitro translated proteins that may not be properly folded; this was never tested in Barry's study and hence would have similar limitations to the study here that uses only linear epitopes.

Yes, as noted above, we have pointed out this limitation in the revision.

Figure 1: please state in the figure the total number of falciparum proteins and corresponding peptides in the library; the number 8980 stated here includes >1400 non-malaria proteins (and these were apparently excluded from analysis; see line 725).

We have now changed Figure 1 to show the total number of malaria proteins and peptides instead of the overall total, and have added this to the text (lines 189-190).

Reviewer #2 (Recommendations for the authors):

Overall, this paper was excellent. The writing was careful, and the experiment is exciting.

We thank the reviewer for these comments.

"Slow development of immunity" is a motivation described in the introduction for investigating repeat-containing elements. I would recommend adding some nuance to this description of natural immunity to avoid confusion for some readers. Immunity to severe disease in children (and in pregnancy) is acquired quite rapidly after only one or two infections. This is relevant here because it may differentiate the immunity to the types of antigens focused on in this paper from variant antigens and genes with completely different, but discrete numbers, of alleles (e.g., MSP1, EBA175, etc.).

Agreed. We have added a modification to the Introduction, as suggested by the reviewer (lines 101-102).

A potential major confounder in the interpretation of results indicating responses to repeat elements that are shared amongst multiple proteins is that the assay wouldn't differentiate antibodies derived from one or many of these loci. The conclusions about the importance of inter-protein repeats on immunity are based on the assumption that any signal in these regions is based on antibodies derived from those proteins, rather than Ab's to one region causing the immunoprecipitation of the others. This seems especially relevant where seroreactivity is discussed. Is it possible to look at regions adjacent to the inter-protein repeats to discern this? I understand this was done when investigating seroreactivity within proteins, but this would be to confirm signals from different proteins. If it's just that I'm missing something obvious, some discussion clarifying why this isn't a confounder would be a good addition.

Yes, thank you. There are two points here. First, we did indeed examine the regions immediately adjacent to the inter-protein repeat motifs, which is possible since our library consists of overlapping peptides, giving us an effective resolution of 12-13 amino acids (Figure 6, Supplement 2). Here, we observed that seroreactivity is indeed localized to the part of the peptide with the interprotein motif, as opposed to other parts of the peptide. Second, we have added a new analysis of cross-reactivity between shared interprotein motifs within each individual (see Lines 495-503, and response to Reviewers #1 and #2). There are limitations to this analysis, which we have made clear in the manuscript as well (Lines 632-635).

I thought the focus on repeat regions was important and wonderful to see, but some comparison to polymorphic loci feels like a lost opportunity to add significant value. It would be particularly interesting to know if an immunity to repeats was anticorrelated within individuals to other loci, and how this compares across transmission intensity. For example, does high repeat-element seroreactivity stymie development of antibodies to invasion receptors?

This is an interesting question, although difficult to directly address with this dataset. The question is whether repeats stymie or interfere with the development of immune responses to particular classes of proteins (like invasion proteins). As a whole, we observe more seroreactivity to both repeats and non-repeats together as a function of exposure and age (Author response image 1). That said, individuals vary widely with respect to seropositivity to specific repeats and non-repeats, and therefore it would require a much larger dataset to establish co-occurrence, or interference, since no two individuals are alike. This is graphically depicted in Figure 2, Supplement 2 (top panel). The Ugandan samples are clearly more correlated than the US samples, due to malaria exposure, but high correlation values between individuals are lacking. Furthermore, interference may be relieved in the form of antibody feedback – for instance, it has been shown that feedback from antibodies to the immunodominant repeat epitopes in CSP can lead to diversification of responses to subdominant epitopes in a vaccine setting (McNamara et al. 2020). A correlation analysis using this dataset is unlikely to capture these nuances.

Author response image 1.


However, within proteins, it is clear that the linear peptide regions that are immunodominant are the repeats themselves (Figure 3, and also Figure 4), which is consistent with the notion that repeats prevent or shield against the development of seroreactivity to the non-repeat portions of the same protein. This effect has been posited in (Schofield 1991) and elsewhere. As we’ve pointed out, one limitation on this conclusion is that PhIPseq detects largely non-conformational epitopes.

Another potential confounder would be if the parasites in one of your cohorts was more similar to 3D7/IT than the other. This might be assessed with a PCA plot of SNPs from your NGS reads, or perhaps with MalariaGen or PlasmoDB tools comparing different locations in Uganda.

In this dataset, the NGS reads are from our phage library, and not from Ugandan parasite sequences themselves, and thus we do not have the direct data to answer this question. However, parasite diversity in Uganda (and sub-Saharan Africa in general) is amongst the highest in the world, with little population structure within Uganda, as we have previously shown (Chang et al. 2017).

We thus expect any differences in genetic composition of parasites between the sites overall to be quite small in comparison to the extensive variation within each site and the large differences between parasites to which people are exposed and the reference parasites.

Reviewer #3 (Recommendations for the authors):

Lines 218-220: "The 9,927 seroreactive peptides identified by the pipeline were derived from 1,648 parasite proteins and antigenic variants, many of which showed broad seroreactivity across pediatric and adult Ugandan samples"

The 9,927 seroreactive peptides were identified from 238,068 total peptides corresponding to 4.2% of the peptides. Similarly, 1,648 seroreactive proteins were identified from 8,980 total proteins corresponding to 18.4% of the proteins. Taken together, the seropositivity in this study was lower than the previous studies using protein array and α screen. Please discuss why this difference has happened.

We thank the reviewer for this comment. We would like to point out that the 238,068 peptides and 8,980 proteins in the library represent the 3D7 and IT proteomes along with variant proteins as well as viral and experimental control proteins, as described in Table 2 (the complete peptide list and sequences are also available at Dryad doi:10.7272/Q69S1P9G). The set of 1,648 seroreactive proteins identified here represents unique proteins, to avoid double counting from 3D7/IT or variant antigens. Thus, these 1,648 proteins represent ~30% of the 5,400-protein proteome of P. falciparum, and we have clarified this in the Results (Lines 221-222).
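The proportions quoted in this exchange can be reproduced with simple arithmetic. The sketch below uses only the counts stated in the reviewer comment and in our response; the variable names are ours, and the proteome size of 5,400 is the approximate figure quoted above.

```python
def percent(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100 * part / whole


seroreactive_peptides = 9_927
library_peptides = 238_068
seroreactive_proteins = 1_648
library_proteins = 8_980      # includes variants and non-malaria control proteins
proteome_size = 5_400         # approximate P. falciparum proteome size

pct_of_library_peptides = percent(seroreactive_peptides, library_peptides)
pct_of_library_proteins = percent(seroreactive_proteins, library_proteins)
pct_of_proteome = percent(seroreactive_proteins, proteome_size)

print(f"{pct_of_library_peptides:.1f}% of library peptides")
print(f"{pct_of_library_proteins:.1f}% of library protein entries")
print(f"{pct_of_proteome:.1f}% of the proteome")
```

The first two values correspond to the reviewer's 4.2% and 18.4%; the third is the roughly 30% figure obtained when the denominator is the proteome rather than the full library.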

We note that we implemented highly conservative inclusion criteria for proteins in this set to minimize false positives (Figure 2, Supplement 4), by leveraging the large number of US control samples and requiring that seroreactivity be shared among at least 5 individuals. Previous studies either did not have a large control (unexposed) set and/or did not require sharing among at least 5 individuals. The comparison of the 1,648 proteins identified here with previous screens is discussed in detail in both the Results (Lines 239-247) and Discussion (Lines 565-571) sections.
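As a minimal sketch of the sharing requirement described above, the filter below keeps only peptides called seropositive in at least 5 individuals. It assumes that per-individual seropositivity calls have already been made relative to the US controls; the data structure, function name, and identifiers are hypothetical and do not come from the study's code.

```python
def shared_seroreactive(calls: dict, min_individuals: int = 5) -> set:
    """Return peptides called seropositive in at least `min_individuals` people.

    `calls` maps a peptide ID to the set of individual IDs in which that
    peptide was called seropositive (calls assumed to be made upstream).
    """
    return {
        peptide
        for peptide, individuals in calls.items()
        if len(individuals) >= min_individuals
    }


# Toy calls, invented for the example.
toy_calls = {
    "pep1": {"u01", "u02", "u03", "u04", "u05", "u06"},  # shared by 6
    "pep2": {"u01", "u02"},                              # shared by 2
    "pep3": {"u03", "u04", "u05", "u06", "u07"},         # shared by exactly 5
}
kept = shared_seroreactive(toy_calls)
```

Raising `min_individuals` trades sensitivity for specificity, which is one reason the seroreactive fraction reported here can be lower than in screens without such a requirement.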

The reader of this paper will be interested in the seropositivity ranking among the top 100 proteins (not only the list of Top protein names; Supplementary table 2). Please provide the ranking of the seropositive proteins and add a discussion about the selected proteins.

Yes, agreed. A “top ranking” summary is now provided (Supplementary table 2b), in addition to the full dataset.

As I listed in point 1) of the public review, a strength of the phage display system is the ability to sequentially enrich and amplify the signal relative to noise. Although more rounds of IP definitely increase the S/N, an increased number of rounds of IP will have disadvantages such as biased results. Please add a description of why the authors settled on 2 rounds of IP rather than 1, 3, or more.

We and others have measured the gain in signal as a function of IP round (O’Donovan et al., “High-resolution epitope mapping of anti-Hu and anti-Yo autoimmunity by programmable phage display.” Brain Communications 2020). As shown in the above reference, multiple rounds amplify the signal. Pilot experiments on Ugandan samples done with 1, 2 or 3 rounds using the Falciparome library showed that the largest fold-gain in S/N for known antibody targets in P. falciparum was realized between rounds 1 and 2, and thus we proceeded with 2 rounds to avoid potential bias from phage amplification.

References

Chang, Hsiao Han, Colin J. Worby, Adoke Yeka, Joaniter Nankabirwa, Moses R. Kamya, Sarah G. Staedke, Grant Dorsey, et al. 2017. “THE REAL McCOIL: A Method for the Concurrent Estimation of the Complexity of Infection and SNP Allele Frequency for Malaria Parasites.” PLoS Computational Biology 13 (1). https://doi.org/10.1371/JOURNAL.PCBI.1005348.

McNamara, Hayley A, Azza H Idris, Henry J Sutton, Mattia Bonsignori, Robert A Seder, Ian A Cockburn, Rachel Vistein, et al. 2020. “Antibody Feedback Limits the Expansion of B Cell Responses to Malaria Vaccination but Drives Diversification of the Humoral Response.” Cell Host & Microbe. https://doi.org/10.1016/j.chom.2020.07.001.

Schofield, L. 1991. “On the Function of Repetitive Domains in Protein Antigens of Plasmodium and Other Eukaryotic Parasites.” Parasitology Today (Personal Ed.) 7 (5): 99–105. https://doi.org/10.1016/0169-4758(91)90166-L.

Vazquez, Sara E., Elise M.N. Ferré, David W. Scheel, Sara Sunshine, Brenda Miao, Caleigh Mandel-Brehm, Zoe Quandt, et al. 2020. “Identification of Novel, Clinically Correlated Autoantigens in the Monogenic Autoimmune Syndrome APS1 by Proteome-Wide PhIP-Seq.” eLife 9 (May). https://doi.org/10.7554/ELIFE.55053.

Yuan, Tiezheng, Divya Mohan, Uri Laserson, Ingo Ruczinski, Alan N Baer, and H Benjamin Larman. 2018. “Improved Analysis of Phage ImmunoPrecipitation Sequencing (PhIP-Seq) Data Using a Z-Score Algorithm.” BioRxiv, April, 285916. https://doi.org/10.1101/285916.

Associated Data


    Data Citations

    1. Raghavan M, Kalantar K, Duarte E, Teyssier N, Takahashi S, Kung A, Rajan J, Rek J, Tetteh K, Drakeley C, Ssewanyana I, Rodriguez-Barraquer I, Greenhouse B, DeRisi J. 2022. Proteome-wide antigenic profiling in Ugandan cohorts identifies associations between age, exposure intensity, and responses to repeat-containing antigens in Plasmodium falciparum. Dryad Digital Repository. https://doi.org/10.7272/Q69S1P9G

    Supplementary Materials

    Figure 2—source data 1. GO analysis of top seroreactive proteins.
    Supplementary file 1. List of 9927 seroreactive peptides identified in this dataset with their sequences.
    elife-81401-supp1.xlsx (985.1KB, xlsx)
    Supplementary file 2. Top 40 proteins with highest seropositivity and associated literature.
    elife-81401-supp2.zip (43.4KB, zip)
    Supplementary file 3. List of top 100 proteins with highest seropositivity used for GO analysis.
    elife-81401-supp3.zip (1.7KB, zip)
    Supplementary file 4. Seropositivity rate (proportion of people seropositive) for all 9927 seroreactive peptides across different groups in the two exposure settings.
    elife-81401-supp4.xls (3.8MB, xls)
    Supplementary file 5. Seropositivity rate (proportion of people seropositive) for top repeat elements across different groups in the two exposure settings.
    elife-81401-supp5.xlsx (13.1KB, xlsx)
    Supplementary file 6. List of inter-protein motifs and the proteins sharing them.

    Motifs reported here are 7-mers with at least 5 identical amino acids and up to two conservative substitutions (and no wildcards).

    elife-81401-supp6.xlsx (192.4KB, xlsx)
    Supplementary file 7. Table describing the number of interprotein motifs obtained with varied parameters for calling the motifs.
    elife-81401-supp7.xlsx (10.3KB, xlsx)
    Supplementary file 8. Gene network file for interprotein motifs (7-mers with at least 5 identical amino acids and up to two conservative substitutions (and no wildcards)).

    Can be visualized on Cytoscape.

    elife-81401-supp8.zip (13.6KB, zip)
    MDAR checklist

    Data Availability Statement

    The data and sample metadata associated with this study can be accessed in the Dryad repository with the doi:10.7272/Q69S1P9G (https://datadryad.org/stash/share/YuYmQNKNvrWmoMX8n99wle_2bFyrtweAGclxYPHkPjY).

    The code generated for the study is on GitHub and can be accessed at https://github.com/madhura-raghavan/phage-malaria-uganda200.git (copy archived at swh:1:rev:b49a3ec15d86048dd570b340487cee139fbeb445; Raghavan, 2023).

    All data generated or analyzed during this study are included in the manuscript, supporting files and in the Dryad repository with the https://doi.org/10.7272/Q69S1P9G.

    The following dataset was generated:

    Raghavan M, Kalantar K, Duarte E, Teyssier N, Takahashi S, Kung A, Rajan J, Rek J, Tetteh K, Drakeley C, Ssewanyana I, Rodriguez-Barraquer I, Greenhouse B, DeRisi J. 2022. Proteome-wide antigenic profiling in Ugandan cohorts identifies associations between age, exposure intensity, and responses to repeat-containing antigens in Plasmodium falciparum. Dryad Digital Repository.


    Articles from eLife are provided here courtesy of eLife Sciences Publications, Ltd
