This is a preprint.

It has not yet been peer reviewed by a journal.

[Preprint]. 2025 Apr 18:rs.3.rs-6255613. [Version 1] doi: 10.21203/rs.3.rs-6255613/v1

A machine learning framework to identify complex physicochemical features of B cell epitopes

Simranjit Grewal 1, Uwa Iyamu 2, Daniel Vinals 3, Catherine Mitran 4, Nidhi Hegde 5, Stephanie Yanow 6
PMCID: PMC12047986  PMID: 40321766

Abstract

During infection with Plasmodium falciparum in pregnancy, parasites express a unique virulence factor, VAR2CSA, that mediates binding of infected red blood cells to the placenta. A major goal in designing vaccines to protect pregnant women from malaria is to elicit antibodies to VAR2CSA. The challenge is that VAR2CSA is highly polymorphic and identifying conserved epitopes is essential to elicit strain-transcending immunity. Unexpectedly, a mouse monoclonal antibody, 3D10, raised against the unrelated Duffy binding protein from P. vivax (DBPII) cross-reacts with diverse alleles of VAR2CSA in vitro. To identify these potentially conserved epitopes in VAR2CSA, we designed a machine learning framework to analyse 3D10 reactivity to peptides derived from two alleles of VAR2CSA, DBPII, and PvEBP2 (negative control). We used decision trees and a panel of 430 features to extract features correlated to 3D10 binding. We analysed patterns of these features in the dataset and designed mutant peptides to test complex sequence motifs. Features associated with 3D10 reactivity were mapped onto predicted 3D structures of Plasmodium proteins and validated based on 3D10 reactivity to the recombinant antigens. While the array data identified certain linear epitopes, the framework predicted other epitopes that are conformational. With this approach, peptide array data can be mined to extract physicochemical properties of epitopes recognized by polyreactive antibodies.

2. Introduction

Malaria caused by Plasmodium falciparum is among the deadliest infectious diseases and results in more than half a million deaths per year1. Pregnant women compose a large portion of the at-risk population (>36 million pregnancies) with over 12 million infected in sub-Saharan Africa annually2. Vaccines to protect pregnant women from malaria aim primarily to elicit antibodies to the P. falciparum virulence factor VAR2CSA that is uniquely expressed during infection in pregnancy3. VAR2CSA is expressed on the surface of infected erythrocytes and binds to chondroitin sulfate A (CSA)-specific proteoglycans on the placenta syncytiotrophoblast4, disrupting the exchange of oxygen and nutrients from the maternal blood to the fetus5. Placental malaria is associated with poor maternal and birth outcomes, including severe maternal anemia, low birthweight infants, pre-term birth and fetal growth restriction6, underscoring the importance of developing vaccines specific to this population.

Vaccines based on recombinant fragments of VAR2CSA showed promise in pre-clinical studies but failed to elicit broadly inhibitory antibodies in Phase I trials7,8. A key bottleneck for these vaccines is the extensive genetic diversity of the VAR2CSA alleles; thus, efforts are underway to identify conserved epitopes for next-generation vaccines7,8. One approach, common with genetically diverse pathogens like influenza and HIV, is to study the epitopes recognized by antibodies acquired naturally during infection9. Typically, VAR2CSA antibodies are acquired in a parity-dependent manner and are associated with protection from placental malaria10. However, antibodies to VAR2CSA were also reported in certain non-pregnant populations11. In Colombia and Brazil, men and children had antibodies to VAR2CSA at similar levels to pregnant women of all gravidities12,13. Based on the local epidemiology of malaria, we proposed that exposure to the Plasmodium vivax species, and specifically to the P. vivax Duffy binding protein (region 2; DBPII), can elicit antibodies that cross-react with VAR2CSA11,14. In support of this hypothesis, a mouse monoclonal antibody raised against DBPII, called 3D10, cross-reacts with VAR2CSA and moderately inhibits the adhesion to CSA of infected erythrocytes expressing different alleles of VAR2CSA14,15. These findings suggest a conserved epitope may be shared between these evolutionarily distinct proteins, which could be exploited to develop a vaccine against VAR2CSA14,16. A unifying trait between DBPII and VAR2CSA is the presence of Duffy binding like (DBL) domains, which are common to several malaria virulence factors used for invasion or sequestration.

Various studies describe the epitope of 3D10 in the DBPII region of PvDBP based on mutational analysis, phage display libraries, mimotopes, peptide arrays, and alanine scanning17–19. A discrete binding region was identified within subdomain 1 (SD1) that includes the amino acids NxxRKR and/or YK(R/Y/E). Yet a search for a similar motif in VAR2CSA failed to identify a homologous epitope, highlighting a common shortfall in mapping discrete epitopes recognized by cross-reactive antibodies20. Our screening of VAR2CSA by peptide array with 3D10 provided no clarity with regard to a conserved epitope between VAR2CSA and DBPII21. The array data revealed one major binding site along with several other clusters of highly reactive peptides within the VAR2CSA DBL domains, but none of the peptides contained the consensus 3D10 epitope from DBPII17,18.

To resolve the ambiguity of the peptide array data and characterize our cross-reactive antibody, we employed machine learning to analyse the conserved features of 3D10 epitopes within two diverse alleles of VAR2CSA. We developed a regression machine learning framework to analyse 430 features from a dataset of over 4300 peptides in 4 arrays derived from DBPII, two alleles of VAR2CSA, and PvEBP2 (3D10 non-reactive). We used feature selection to extract physicochemical characteristics that describe 3D10 binding and applied those features to generate a panel of mutated peptides to test more complex patterns of amino acids. Our analyses revealed a broader set of criteria for 3D10 epitopes in VAR2CSA that are also present, as linear and conformational epitopes, in other DBL proteins from Plasmodium. We propose our framework as a method to generate a physicochemically-based binding profile of cross-reactive antibody binding20.

3. Results

Machine learning and feature selection

The dataset used to develop the machine learning algorithm consists of the reactivity values (expressed as arbitrary units) of 3D10 to each peptide in the arrays derived from PvDBP, PvEBP2, and 2 alleles of VAR2CSA. We then mapped the peptide array data to 3D structures of DBPII, PvEBP2, and VAR2CSA (FCR3) to visualize 3D10 reactivity to these different proteins (Fig 1 A–D). As expected, 3D10 strongly recognized an epitope in SD1 of DBPII (Fig 1A). There were no epitopes in PvEBP2 (Fig 1B), consistent with previous findings that 3D10 does not recognize this recombinant protein by ELISA16,22. In VAR2CSA, 3D10 recognized several epitopes within DBL3 and 4 (Fig 1C) and DBL5 (Fig 1D). Based on the intensity of 3D10 binding, we classified peptides as 3D10 reactive or 3D10 non-reactive and identified the five most reactive peptides (Fig S1; Fig 1 E). Many of the strongest binding peptides contain a significant enrichment of lysine residues and of tyrosines adjacent to lysines. However, these traits alone are insufficient to distinguish reactive from non-reactive peptides, given that non-reactive peptides were also rich in lysines and that tyrosines next to lysines were commonly observed across the dataset (Fig S2).

Figure 1. 3D10 peptide array data mapped on their respective 3D protein structures.

Structures generated by AlphaFold 3 (A-B) are accessible in a GitHub repository described in our data availability section. VAR2CSA (FCR3) structures were resolved by cryo-EM and are accessible at PDB ID 7JGE (C) and 7JGF (D)23. Raw median values are represented according to the colour scale and mapped to DBPII (A), PvEBP2 (B), VAR2CSA (FCR3) core region (DBL 1 to 4) (C), and VAR2CSA (FCR3) flexible arm region (DBL 5 and 6) (D). One peptide within SD1 and four peptides recognized most strongly by 3D10 in VAR2CSA were selected for further analysis (E).

Next, we generated a panel of features (see methods) (Table S1) that we hypothesized were associated with 3D10 binding. These features focused on specific amino acids, plus properties such as charge, hydropathy and polarity. Structural features were included with the caveat that synthetic peptides can adopt heterogeneous structures in vitro that fail to replicate their structures within the native or recombinant protein24,25. The label (predicted variable) was the raw median 3D10 binding value of two replicates; this was selected to minimize the noise from non-reactive peptides. For feature selection, we applied the methods of feature elimination, feature utilization, and variance feature selection to our panel of 430 features, to select a small subset of features that correlated to 3D10 reactivity (see methods) (Table S2).
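As a rough illustration of how sequence-derived features of this kind can be computed and then filtered by variance, the following Python sketch builds a handful of features per peptide and drops those that do not vary across the panel. It is not the pipeline used in this study: the feature names, the charge convention (K/R counted as +1, D/E as −1, histidine ignored), the threshold, and the example sequences are all assumptions made for illustration.

```python
from statistics import pvariance

POSITIVE = set("KR")   # assumed +1 at neutral pH
NEGATIVE = set("DE")   # assumed -1 at neutral pH


def peptide_features(seq):
    """Return a small, hypothetical feature dict for one peptide sequence."""
    seq = seq.upper()
    return {
        "K_count": seq.count("K"),
        "R_count": seq.count("R"),
        "Y_count": seq.count("Y"),
        "KY_count": seq.count("KY"),  # lysine immediately followed by tyrosine
        "net_charge": sum(c in POSITIVE for c in seq)
                      - sum(c in NEGATIVE for c in seq),
    }


def variance_filter(rows, threshold=0.0):
    """Keep feature names whose population variance across peptides exceeds threshold."""
    names = list(rows[0].keys())
    return [n for n in names if pvariance([r[n] for r in rows]) > threshold]


# Made-up sequences, used only to exercise the functions.
panel = ["AKYKRD", "GGYGGG", "KKYRRA"]
rows = [peptide_features(s) for s in panel]
kept = variance_filter(rows)
```

In this toy panel every peptide has exactly one tyrosine, so `Y_count` carries no information and is dropped, while `K_count` and `net_charge` are retained. A variance filter of this kind only removes uninformative features; ranking the remaining features against the binding label would require a separate model-based step.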

Feature validation

As the features identified by our framework could be positively or negatively associated with 3D10 binding, the next step of validation was to discern which features define 3D10 epitopes. We analyzed the dataset using three methods: heatmaps of features, amino acid enrichment calculation, and analysis of amino acid distances within each peptide (see methods).

The first set of analyses focused on positive charge, hydropathy, and sidechain energy (Fig 2). In 3D10 reactive peptides, we observed that lysine and arginine residues were significantly enriched (Fig 2 A). There was also a tendency for positive residues to cluster at the C-terminus and for non-polar residues to cluster at the N-terminus of 3D10 reactive peptides (Fig 2 B)26,27. We hypothesized that N-terminal modifications of VAR2CSA peptides that preserved the C-terminal positive clusters observed in P2 to P5 may increase reactivity as they will be more like P1, the minimum binding peptide from DBPII. To test this hypothesis, we designed subsets of mutated peptides for four different tests (Fig 2 C). For each subset, we applied different approaches to adjust the relative hydropathy and sidechain energy of P2 to P5 to be closer to P1. In subset A, we tested if hybrid peptides that combined the N-terminal residues of P1 with the C-terminal positive clusters of P2-P5 were sufficient or enhanced binding to 3D10; this would indicate N-terminal sidechain energy and hydropathy contribute to the differences in reactivity between peptides. In tests B, C, and D, we mutated residues in peptides P2 to P5 to set the overall peptide sidechain energy and/or hydropathy to be similar to P1. We used either chemically related amino acids, or a minimal number of amino acids while avoiding (where possible) changes to positively charged residues (Fig 2 C). We tested mutated peptides by indirect ELISA and binding was compared to the non-mutated ‘wildtype’ peptides (Fig 2 D). We observed that the N-terminal modifications in P2 to P5 decreased binding in all subsets except when modifications decreased the total number of negative residues. Additionally, the clusters of positive residues at the affixed C-terminus, despite being strongly associated with 3D10 reactive peptides, were not sufficient to explain 3D10 reactivity to peptides P2 to P5. 
Moreover, sidechain energy and hydropathy patterns at the C-terminus that correlate to 3D10 reactive peptides involve a more complex signature than what is observed in our heatmaps (Fig 2 B)26,27.

Figure 2. Binding-correlated clusters of C-terminal lysines and arginines are insufficient for 3D10 reactivity.

(A) Frequency of charged residues in 3D10 reactive and non-reactive peptides was compared. Significance was calculated by two-way ANOVA assuming a Gaussian distribution. (B) Heatmap of positive charge, sidechain energy, and hydropathy in 3D10 reactive peptides. Each cell indicates a residue position from the N-terminus at the top to the C-terminus at the bottom. The colour scale for each table ranges from the maximum to minimum for each feature as indicated above each column. (C) Table of peptide sequences with mutated residues coloured red. Peptides are grouped by different tests. (D) Mutated peptide reactivity relative to wildtype. Bars are coloured relative to their test group.

Our analysis suggested that clusters of positive charge at the C-terminus were insufficient for 3D10 reactivity, and reactivity was affected by N-terminal mutations modifying net charge (Fig 2 C–D). To expand on these observations, we increased the depth of positive charge analysis and characterized sequence patterns relative to enriched positive residues (Fig 3 A). Consistent with the heatmaps of 3D10 reactive peptides, there was an increased likelihood of lysine residues positioned adjacent to arginine residues (Fig 3 A; Fig 2 B). Tyrosine residues were significantly more likely to occur adjacent to lysine residues and significantly less likely to occur distal from lysine residues; this corresponds with the selection of the ‘KY count’ feature (Fig 3 B). Further, lysine residues had a higher propensity to occur one amino acid away from other lysine residues (KxK) in 3D10 reactive peptides (Fig 3 D). ‘YK’ was not selected as a feature despite being present in P1. To determine if the absence of ‘YK’ in selected features was a function of the dataset, we swapped the position of lysine and tyrosine residues to create the opposite feature (Fig 3 D). We also decreased the ‘KY’ count by increasing the distance between lysine and arginine (test B). Again, we focused on C-terminal ‘K’ and ‘Y’ residues as our top-binding peptides consistently contained C-terminal ‘KY’ motifs. To test N-terminal positive residues, we focused on P3 as it contained positive residues with consistent spacing, allowing us to explore the interchangeability of lysine and arginine residues and the requirement for specific lysine residues (tests C, D, and E). Finally, we tested whether tyrosine could be interchanged with other aromatic residues (test F) (Fig 3 C, E). Consistently, we observed that despite the chemical similarities of arginine and lysine or phenylalanine and tyrosine, it was essential that lysine was adjacent to tyrosine for high 3D10 reactivity. This highlights that 3D10 binding requires the specific amino acid motif: ‘KY’/‘YK’.
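The residue-distance analysis described above can be sketched in Python as follows. This is a minimal illustration, not the calculation from the methods: it assumes that the "expected" proportion is taken from a background peptide set, that distances are measured as the difference in sequence position, and that the log odds ratio is left undefined (returned as `None`) whenever a distance never occurs. The function names and example sequences are hypothetical.

```python
import math
from collections import Counter


def pair_distances(seq, a, b):
    """Sequence distances between each residue `a` and each residue `b`.

    When a == b, each unordered pair is counted once.
    """
    pos_a = [i for i, c in enumerate(seq) if c == a]
    pos_b = [i for i, c in enumerate(seq) if c == b]
    if a == b:
        return [j - i for i in pos_a for j in pos_b if i < j]
    return [abs(i - j) for i in pos_a for j in pos_b]


def distance_proportions(peptides, a, b):
    """Proportion of a-b residue pairs found at each distance in a peptide set."""
    counts = Counter(d for p in peptides for d in pair_distances(p, a, b))
    total = sum(counts.values())
    return {d: n / total for d, n in counts.items()} if total else {}


def log_odds(observed, expected, distance):
    """Natural-log odds ratio at one distance; None if either proportion is 0 or 1."""
    p = observed.get(distance, 0.0)
    q = expected.get(distance, 0.0)
    if not (0 < p < 1 and 0 < q < 1):
        return None
    return math.log((p / (1 - p)) / (q / (1 - q)))


# Made-up peptide sets, used only to exercise the functions.
reactive = ["AKYA", "KAYK"]
background = ["KAAY", "AKYA"]
obs = distance_proportions(reactive, "K", "Y")
exp = distance_proportions(background, "K", "Y")
adjacent = log_odds(obs, exp, 1)  # K and Y directly adjacent
```

A positive value at distance 1 indicates that adjacent lysine and tyrosine are over-represented in the first set relative to the background; a `None` result corresponds to a missing data point of the kind described in the Figure 3 caption.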

Figure 3. Lysine adjacent to tyrosine and specific positive residue patterns result in strong 3D10 reactivity.

(A–C) Log odds ratio of distances between arginine and lysine, tyrosine and lysine, or lysine and lysine residues, comparing the observed proportion to the expected value in 3D10 reactive and non-reactive peptides. Missing data points indicate that a distance value occurred zero times and is significantly less common than expected (see methods). (D) Table of peptide sequences with mutated residues coloured red. Peptides are grouped by different tests. (E) Mutated peptide reactivity relative to wildtype. Bars are coloured relative to their test group.

For several of our selected features, the statistical analyses did not uncover significant differences between 3D10 reactive and non-reactive peptides (Table S2). However, their selection by the decision trees suggested they could contribute to 3D10 reactivity. These include non-polar residues, specifically VM, VN, VT, VQ, MT, alanine, and valine counts. Also, there was a tendency for alanine and asparagine to occur less frequently in 3D10 reactive peptides (Fig 4 A). It is unclear if the reduced frequency of these residues is due to the proportional increase in positive residues like lysine and arginine (Fig 2 A). Additionally, we were unable to determine a distinct pattern relative to hydropathy by broadly adjusting the hydropathy of 3D10 reactive peptides (Fig 2 B, D). This may highlight a more complex relationship that varies from peptide to peptide. To address this, we mutated minimal polar and non-polar residues into various positions to determine their importance. For tests A, B, and C, we used serine, methionine, and alanine: serine was present in all top-binding peptides, methionine was present in the frequently observed ‘VM’ feature and alanine was used to mutate polar residues (Fig 4 B). The results of this testing failed to establish a specific pattern of polar residues relative to other aspects of the 3D10 epitope (Fig 4 C). These results suggest that polarity is not a sufficient criterion to distinguish 3D10 reactive peptides from 3D10 non-reactive peptides.

Figure 4. Polar residues do not consistently predict binding of 3D10.

(A) The frequency of residues in 3D10 reactive and non-reactive peptides was compared. Significance was analysed by two-way ANOVA. (B) Table of peptide sequences with mutated residues coloured red. Peptides are grouped by different tests. (C) Mutated peptide reactivity relative to wildtype. Bars are coloured relative to their test group.

By integrating the results from all mutant peptide sets, we propose the following six criteria for highest 3D10 reactivity: 1) a lysine adjacent to a tyrosine; 2) a lysine one residue away from another lysine; 3) clusters of lysines and arginines; 4) a polar residue proximal to criteria 1, 2, or 3; 5) minimal negative residues within or between the above criteria; and 6) criteria 1–4 are proximal to one another. The first criterion, a tyrosine adjacent to lysine, is present in 40 of 43 3D10 reactive peptides and only one amino acid away in the remaining peptides. We found that this criterion is not restricted to the C-terminus (Fig 3 E). We also observed that arginine cannot replace lysine, nor can other aromatics replace tyrosine (Fig 3 E). We observed a significantly higher rate of lysine positioned one amino acid away from another lysine (Fig 3 D). In particular, we observed that binding was diminished or enhanced with the deletion or addition of the ‘KxK’ motif, respectively (Fig 2 C–D; Fig 3 D–E). We observed that arginine and lysine residues were positioned proximal to one another, positive charge was strongly correlated to binding, and these residues clustered at the C-terminus of reactive peptides (Table S2; Fig 2 A–B; Fig 3 A–C). Our data indicated that this clustering is not sufficient for 3D10 reactivity but modifications in this region resulted in a major loss of 3D10 reactivity (Fig 2 D; Fig 3 E). We observed that the addition or removal of certain polar residues had significant effects on reactivity, but we were unable to discern a specific positionality relative to other criteria necessary for reactivity (Fig 2 D; Fig 4 C). The addition of negative residues consistently decreased binding while the mutation of negative residues increased binding. Finally, because 3D10 strongly reacts to linear peptides, like P1, we believe that ideally, these criteria must be proximal to one another; in P1, all criteria are satisfied in the minimum possible 6 amino acids28.

3D10 binding criteria application and testing

To validate these criteria within the context of a protein with three-dimensional structure, we constrained the distance between criteria to a field of ~1000 Å² and restricted our analysis to epitopes on the protein surface29–31. Moreover, we defined strict standards to translate criteria for 3D structure mapping. We defined clusters of arginines and lysines as 3 of either residue present within a 6 amino acid span. Criterion 2 was defined as ‘KY’ or ‘YK’ occurrences in protein sequence, and criterion 3 was defined as any occurrence of ‘KxK’ excluding ‘KDK’ or ‘KEK’. ‘Proximal’ was defined as continuous on the surface structure without negative residues between criteria 1, 2, or 3. This standard eliminated ‘KEK’ and ‘KDK’ from consideration. Criterion 4 was interpreted to be the occurrence of a polar residue continuous with any of criteria 1, 2, or 3.
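The sequence-level parts of these definitions can be expressed as simple pattern scans. The Python sketch below is an illustration only and covers just the primary sequence: it does not model surface exposure, the ~1000 Å² field, or three-dimensional continuity, all of which require a structure. It assumes that "3 of either residue" means at least three K/R residues in any window of six, and the function names and example sequence are hypothetical.

```python
import re


def ky_sites(seq):
    """Start indices of every 'KY' or 'YK' dipeptide (overlaps allowed)."""
    return [m.start() for m in re.finditer(r"(?=KY|YK)", seq)]


def kxk_sites(seq):
    """Start indices of 'KxK' where x is any residue except D or E."""
    return [m.start() for m in re.finditer(r"(?=K[^DE]K)", seq)]


def positive_cluster_windows(seq, span=6, minimum=3):
    """Start indices of windows of `span` residues with >= `minimum` K/R residues."""
    last_start = max(len(seq) - span + 1, 1)
    return [i for i in range(last_start)
            if sum(c in "KR" for c in seq[i:i + span]) >= minimum]


# Made-up sequence, used only to exercise the scans.
example = "AKYKRKDKA"
hits = {
    "KY/YK": ky_sites(example),
    "KxK": kxk_sites(example),
    "K/R cluster": positive_cluster_windows(example),
}
```

In the example, the ‘KDK’ at the end of the sequence is skipped by `kxk_sites`, matching the exclusion of ‘KDK’ and ‘KEK’ described above. Lookahead patterns are used so that overlapping motifs such as ‘KYK’ (both ‘KY’ and ‘YK’) are each reported.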

We classified protein segments that satisfied all six criteria as ‘signature regions’. In our analysis of VAR2CSA (FCR3 allele), we observed 8 signature regions across the protein (Fig 5 A). We identified 7 signatures in the VAR2CSA core region DBL 1 to 4. One signature includes the sequences from P2 and P3. P4 was not associated with any of the signatures because it was not surface accessible and does not satisfy criterion 2 (ref. 21). One signature was identified in DBL5 within the VAR2CSA arm region and mapped to P5. Surprisingly, several of the signature regions are discontinuous and predicted to be conformational epitopes (Fig 5 A). To further validate both the specificity of this approach and the breadth of cross-reactivity of 3D10, we mapped the 3D10 binding criteria onto other DBL proteins from Plasmodium (Fig 5 B–E)32. We used the DBPII region of PvDBP as a reference, and as expected, the criteria mapped to the known epitope in SD1 (Fig 5 B)18. Consistent with our array data and experimental data, PvEBP2 was not predicted to have any signature regions despite containing a DBL domain (Fig 1B; Fig 5 C)16,22. We extended our analysis to PcDBP, an orthologous protein from Plasmodium chabaudi that is closely related to DBPII and PvEBP2. We identified a single conformational signature corresponding to the same region in DBPII (Fig 5 D). EBA-175 from Plasmodium falciparum also shares significant genetic similarity to the DBL domain of DBPII and does contain 2 signature regions: one predominantly linear and one conformational (Fig 5 E)22,33,34.

Figure 5. Mapping of proposed 3D10 binding criteria onto 3D structures of Plasmodium proteins.

Structures generated by AlphaFold 3 (A-D) are accessible in a GitHub repository described in our data availability section. The EBA-175 structure was resolved by crystallography and is accessible at PDB ID 1ZRL (E)35. (A-E) Surface-exposed regions containing a polar residue (magenta), a ‘KxK’ motif (yellow), a ‘KY’ / ‘YK’ motif (cyan), and a positive cluster region (green) that are uninterrupted by other residues or negative residues (grey and red) within a 28-Angstrom diameter were defined as satisfying the criteria. Criteria were mapped onto the structures of VAR2CSA (FCR3) (A), DBPII (B), PvEBP2 (C), PcDBP (D), and EBA-175 (E). Regions that satisfy the criteria are highlighted, numbered relative to their order in the protein sequence, and assigned a specific colour. Boxes around specific amino acids indicate homologous regions between DBPII (B), PvEBP2 (C), and PcDBP (D).

To validate the predicted binding signatures and test the specificity of these criteria, we tested 3D10 against a panel of recombinant proteins by ELISA (Fig 6 A). Except for PvEBP2, the presence of Duffy binding-like (DBL) domains consistently corresponded to 3D10 binding; DBL domains are present in 3D10 reactive proteins (DBPII, VAR2CSA (FV2), VAR2CSA (ID1-ID2), VAR2CSA (DBL5ε), EBA-175, and PcDBP). None of the other Plasmodium proteins were recognized by 3D10. Importantly, in all cases, 3D10 binding in vitro validated the predictions in silico (Fig 6 B). An exciting outcome of the algorithm is that many of the epitopes within these proteins were predicted to be conformational, specifically within ID1-ID2, DBL5ε and PcDBP, and one of two predicted epitopes in EBA-175. We tested whether disruption of the disulfide bonds in these DBL domains would reduce binding of 3D10 (Fig 6 C). As expected, there was no change in reactivity to DBPII, which contains a linear epitope. There was also no change in reactivity to DBL5ε, whereas reactivity increased to the full-length VAR2CSA and EBA-175, suggesting DTT treatment may reveal linear epitopes that are not surface-exposed. Of specific interest, DTT reduced binding of 3D10 to ID1-ID2 and PcDBP, consistent with these epitopes being conformational. To validate further, we examined the array data for the ID1-ID2 region and noted that only one weakly reactive linear peptide was recognized above our signal-to-noise threshold (Fig 6 D). We also tested 3D10 binding to an array of linear peptides spanning the DBL domain of PcDBP and no linear peptides were recognized (Fig 6 E).

Figure 6. Validation of epitopes predicted by 3D10 binding criteria.

(A) ELISA reactivity of 3D10 against a panel of recombinant proteins. (B) Comparison of experimental and predicted binding of 3D10 to various Plasmodium antigens. 3D10 binding to peptide arrays was defined by any peptide in the array that could be classified as 3D10 reactive, as characterized in Figure S1. Green shading indicates binding, or satisfaction of criteria and gray indicates no binding or failure to satisfy criteria. (C) Reactivity of 3D10 to DBL proteins treated with DTT to disrupt disulfide bonds. (D-E) Peptide array reactivity for peptides spanning the ID1-ID2 region of VAR2CSA (FCR3) (D) and PcDBP (E).

4. Discussion

We propose a framework for integrating machine learning to enhance epitope mapping with peptide arrays (Fig 7). In this pipeline, peptide array data were combined with feature selection to extract features associated with antibody binding (positively or negatively); statistical analysis and empirical testing down-selected specific binding criteria, which were then mapped onto 3D protein structures to predict linear and conformational epitopes. We identified six criteria for 3D10 binding which mapped to predicted epitopes only in those proteins that were recognized by 3D10 experimentally. These criteria are consistent with the 3D10 epitope in DBPII requiring the amino acids ‘NxxRKR’ and/or ‘YK(R/Y/E)’ determined by mutational analysis, arrays, and mimotopes17,18. Importantly, our criteria identified additional physicochemical features that broaden the epitope beyond this simple motif, which in turn, explains the cross-reactivity of 3D10 with VAR2CSA.

Fig 7. Machine learning peptide array analysis framework.

This flowchart describes the process to extract 3D10 binding properties from peptide array analysis. Divisions indicate different stages of experimentation. Assets partially generated with Biorender.com.

When we mapped the criteria to the VAR2CSA structure, we identified 8 predicted epitopes. Two of these correspond to a linear sequence in DBL3X that matches two peptides (P2 and P3) recognized most strongly by 3D10 in the array. Surprisingly, the other 6 epitopes are predicted to be conformational. These findings suggest that the framework can reveal conformational epitopes based on linear array data. This was evident with the related DBL protein, PcDBP, which was recognized by 3D10 by ELISA and western blot (data not shown) but failed to bind to any linear peptides in an array. Since DBL domains are structurally stabilized by disulphide bonds36,37, we used DTT to disrupt the sole conformational epitope and indeed, recognition by 3D10 was reduced. Other approaches such as HDX, cryo-EM and site-directed mutagenesis could further validate each of the predicted 3D10 epitopes in these various DBL proteins. While DTT reduced the binding of 3D10 to PcDBP and ID1-ID2, we noted that binding to DBL5ε was not affected and for EBA-175 and the full-length VAR2CSA, binding increased. We propose that DTT treatment can expose other linear peptides that are not surface-exposed. In VAR2CSA, the peptides P4 and P5 were among the most reactive in the array data yet did not meet the criteria from the algorithm because they did not map to the protein surface. When the disulfide bonds were disrupted, these linear epitopes may become accessible for binding to the mAb.

One concern is that 3D10 is a polyreactive antibody that binds promiscuously to all DBL proteins. Analysis of DBL domains from Plasmodium proteins suggests a high likelihood for specific signatures such as ‘KxK’ and positive clusters despite significant sequence differences and genetic diversity38,39. However, our data with PvEBP2 (which shares 36% sequence identity and 52% sequence similarity with DBPII (Fig S3)) support the specificity and accuracy of our 3D10 binding criteria. 3D10 does not bind to the PvEBP2 recombinant protein nor to any of its peptides within our array. When we applied the six criteria for binding to the 3D structure of PvEBP2, no epitopes were predicted. The implication is that the criteria generated by our framework and the requirement for those criteria to be satisfied within a defined physical space on the surface of the protein were specific enough to differentiate reactive and non-reactive recombinant DBL domains. If criteria are not sufficiently restrictive, the framework will be subject to a high rate of false positive signature prediction. For example, our criterion 4 concerning polar residues did not eliminate any signatures for our proteins, and this criterion’s low specificity was not meaningful for predicting 3D10 binding. If we established a definite pattern for the position of polar residues for 3D10 reactivity, we might eliminate one or several potential signatures in VAR2CSA.

The generation of restrictive criteria largely depends on the design and size of the peptide arrays used to train a machine learning model. Although we used arrays specific for target antigens, non-natural randomized peptide arrays could be screened with an antibody of interest, either alone or in addition to target antigen arrays, to extract features associated with binding. The ideal panel would have high variance with respect to the physicochemical features evaluated by machine learning and several peptides with varying levels of reactivity. Additionally, conformationally constrained peptide arrays could be incorporated into testing and feature space to enable the consideration of certain structural elements during criteria development40–42.

Based on our findings, we propose that 3D10 is an example of a cross-reactive monoclonal antibody, a type of antibody specific for a few antigens related via their DBL domains43–46. Recent work has labelled similar antibodies as “super antibodies” or “promiscuous” and noted that these antibodies form a population distinct from typical highly specific antibodies9,20,47,48. This type of antibody is associated with antigenically variable pathogens, much as DBL represents an antigenically variable domain present in many malaria proteins important for pathogenesis49–52. Despite consistent structural elements (such as interdomain and intradomain disulphide bonds), DBL domains of pathogenic Plasmodium proteins are genetically diverse36,52,53. A review by Walker et al. 2018 highlighted the association between antigenic variability and the generation of broadly neutralizing antibody responses9. For example, HIV and influenza often elicit antibodies with cross-reactivity to related antigens, supporting development of ‘universal’ vaccines for these immune-evasive pathogens51,54–57. These antibodies make up a fraction of the total pool of antibodies but have become a focus in the development of vaccines against immune-evading pathogens like HIV58. These efforts are tempered by concerns that these antibodies may be difficult to elicit consistently, and not all are broadly neutralizing59. Our framework and approach may serve as an effective platform for the comparison of these antibodies to one another and to identify common elements between them. As more broadly neutralizing antibodies are discovered against HIV, our framework can be used to determine physicochemical patterns that are associated with high breadth and potency of neutralization. The conserved structural and physicochemical properties of multiple binding regions for cross-reactive monoclonal antibodies can be exploited to effectively target antigenically variable pathogens with vaccines.

If applied to polyclonal sera, this framework can also be used to extract features associated with immunodominance. For specific pathogens such as influenza, the criteria could define specific immunodominant features within the HA head region (for example) that elicit only strain-specific antibodies, resulting in low vaccine efficacy60. Targeted epitope masking strategies could be applied to reduce the immunogenicity of these sites and stimulate responses to subdominant, protective epitopes.

The key utility in this framework is a new interpretation of peptide array data. Specifically, it can analyze peptide array data as a physicochemical characterization of the interface between an antibody and thousands of different antigens rather than a tool to map discrete epitopes by residue. Machine learning enables the simultaneous analysis of these antibody-antigen interfaces to reveal complex features of epitopes that extend beyond identifying discrete amino acid motifs. This approach to cross-reactive antibody characterization presents a marked shift from existing methods to better understand their target epitopes and can be applied to both linear and conformational epitopes.

5. Methods

5.1. Peptide array and data processing

Peptide arrays were synthesized by PEPperPRINT (Germany) using the PEPperCHIP protocol to develop overlapping arrays: VAR2CSA (FCR3) (GenBank: AAQ73926.1, accession: AY372123.1, residues 1–2659), VAR2CSA (NF54) (GenBank: EWC87419.1, accession: KE123842.1, residues 1–2652), DBPII, PcDBP, and PvEBP2. Peptides were 20 amino acids in length, with a 19 amino acid overlap between consecutive peptides in all arrays except VAR2CSA (NF54), for which the overlap was 18 amino acids.

For each data point, we used the average of the median foreground intensities for each spot (arbitrary units). For each peptide, binding was measured in duplicate (V1 and V2), and the average raw median intensity was used except when the lesser of the two values was less than or equal to 40% of the larger of the two values:

$$ \min(V_1, V_2) \le \max(V_1, V_2) \times 0.4 $$

Additionally, some peptides occurred in both VAR2CSA (FCR3) and VAR2CSA (NF54); in this case, the higher of the two averaged values was taken. This left a total of 4,378 data points; 215 duplicate data points were identified and removed. We employed a test-train split of 1/3 to 2/3 to randomly distribute the data. That is, 2/3 of the data points were randomly selected for the training and cross-validation phases of algorithm development, and the remaining 1/3 were used to test the algorithm.
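The processing steps in this subsection can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the record layout (sequence, averaged intensity), and the random seed are all assumptions. The text does not state how spots that meet the 40% criterion were handled, so the sketch only flags them.

```python
import random


def is_discordant(v1, v2, ratio=0.4):
    """True when the lesser duplicate is <= ratio * the larger duplicate."""
    return min(v1, v2) <= max(v1, v2) * ratio


def average_duplicates(v1, v2):
    """Average of the two duplicate-spot median intensities."""
    return (v1 + v2) / 2.0


def merge_shared_peptides(records):
    """Keep one value per peptide sequence: the highest averaged intensity.

    `records` is an iterable of (sequence, averaged_intensity) pairs, for
    example pooled from the FCR3 and NF54 arrays (hypothetical layout).
    """
    best = {}
    for seq, value in records:
        if seq not in best or value > best[seq]:
            best[seq] = value
    return best


def split_train_test(items, train_fraction=2 / 3, seed=0):
    """Randomly assign items to a training set and a held-out test set."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Fixing the seed makes the split reproducible; the manuscript does not say whether a fixed seed was used.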

5.2. Features

Features employed in machine learning can be placed into six categories: single amino acid counts, dipeptide counts, secondary structure, physicochemical values, charge, and disulphide bond potential (Table S1). These values are whole numbers, continuous values, or binary values. Single amino acid counts are the total counts of each of the 20 different amino acids occurring in protein sequences. For secondary structure, we used S4PRED to predict the secondary structure of the recombinant protein sequences, including the elongating “GSGSGSG” N-terminal and C-terminal linkers, and took values as the average of residue values over the respective peptide sequence segments61. There were three values for secondary structure, representing the odds of forming an alpha-helix, a beta-sheet, or a flexible coil. Physicochemical values are based on sums of values for amino acids from 3 different physicochemical scales: sidechain energy, hydropathy, and polarity26,27,62. Charge features were sums of positively charged residues (lysine, arginine, and histidine), negatively charged residues (glutamate and aspartate), and net charge. Disulphide bond potential is a binary feature that is set to ‘1’ if there are two cysteine residues in a peptide with two or more amino acids in between. Dipeptide counts, like single amino acid counts, are counts of the total occurrences of each combination of 2 consecutive amino acids, giving 400 features (one for each ordered pair of the 20 amino acids). This feature is adjusted by the hyperparameter ‘peptide trim.’ ‘Peptide trim’ defines the number of amino acids to skip before counting dipeptide features. This is employed because of the concern that the C-terminus of the peptides may be less accessible to 3D10, as this is the end attached to the glass slide.
We chose not to employ this hyperparameter for other features because dipeptide counts are intended to identify patterns of specific amino acids, whereas single amino acid counts, charge, etc., inform the physical properties of the whole peptide, which may affect its flexibility or be used in conjunction with other features.
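The sequence-derived features above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, and ‘peptide trim’ is assumed to remove residues from the C-terminal end, which matches the stated rationale but is not spelled out in the text. Secondary structure and physicochemical scale values are omitted because they depend on S4PRED output and published scales. As a consistency check, 20 single counts, 400 dipeptide counts, 3 secondary structure values, 3 physicochemical sums, 3 charge features, and 1 disulphide feature add up to the 430 features mentioned in the abstract.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
POSITIVE = set("KRH")  # lysine, arginine, histidine
NEGATIVE = set("DE")   # aspartate, glutamate


def amino_acid_counts(peptide):
    """Count of each of the 20 standard amino acids in the peptide."""
    return {aa: peptide.count(aa) for aa in AMINO_ACIDS}


def dipeptide_counts(peptide, peptide_trim=0):
    """Count consecutive residue pairs after dropping `peptide_trim` residues.

    Assumption: trimmed residues are removed from the C-terminal end.
    """
    usable = peptide[: max(0, len(peptide) - peptide_trim)]
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    for i in range(len(usable) - 1):
        pair = usable[i : i + 2]
        if pair in counts:  # ignore non-standard residue codes
            counts[pair] += 1
    return counts


def charge_features(peptide):
    """Positive count, negative count, and net charge (positive - negative)."""
    pos = sum(aa in POSITIVE for aa in peptide)
    neg = sum(aa in NEGATIVE for aa in peptide)
    return {"positive": pos, "negative": neg, "net": pos - neg}


def disulphide_potential(peptide, min_gap=2):
    """1 if two cysteines are separated by at least `min_gap` residues, else 0."""
    idx = [i for i, aa in enumerate(peptide) if aa == "C"]
    if len(idx) < 2:
        return 0
    # The outermost pair of cysteines has the largest gap.
    return int(idx[-1] - idx[0] - 1 >= min_gap)
```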

5.3. Feature selection

We employed three methods for feature selection: variance-based feature selection, feature utilization, and recursive feature elimination. Using an ensemble of methods accounts for the potential of each individual method to miss predictive features or to select nonpredictive ones. Both variance-based feature selection and recursive feature elimination are provided by scikit-learn63. We created the feature utilization method for this project.

Variance-based feature selection is an unsupervised method, meaning it is agnostic to 3D10 reactivity. The goal of this method is to establish which features have high variance and, therefore, are less likely to be selected due to overfitting. Because the method does not depend on 3D10 reactivity, selection by this method alone is not sufficient to qualify a feature for analysis. We used a threshold (p) of 1.

$$ \mathrm{Var}[X] = p(1 - p) $$
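The expression above is the variance of a binary feature that takes the value 1 with probability p. A minimal sketch of a variance filter is given below using only the standard library; the function names and the example thresholds are illustrative assumptions, and how the reported threshold maps onto a variance cut-off is not restated here. In scikit-learn the equivalent step is the `VarianceThreshold` transformer.

```python
from statistics import pvariance


def bernoulli_variance(p):
    """Variance of a binary feature equal to 1 with probability p."""
    return p * (1.0 - p)


def variance_filter(columns, threshold):
    """Names of feature columns whose population variance exceeds `threshold`.

    `columns` maps a feature name to its list of values across peptides.
    The filter never looks at the binding signal, so it is unsupervised.
    """
    return [
        name
        for name, values in columns.items()
        if pvariance(values) > threshold
    ]
```

A constant column has zero variance and is always removed, whatever the threshold.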

Feature utilization is not available in scikit-learn. Because of the random nature of our train-test splitting, this method quantifies how frequently features are used in the lowest-squared-error trees from 100 tournaments of 100 randomly generated trees each. A feature that is present in a tournament-winning tree is defined as ‘selected.’
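The tournament logic can be sketched as below. This is a hedged illustration only: tree fitting is abstracted behind a caller-supplied `build_tree` function (hypothetical), because the text does not describe how the random trees were generated. The toy stand-in and its feature names are invented purely to make the example runnable.

```python
import random
from collections import Counter


def feature_utilization(build_tree, n_tournaments=100,
                        trees_per_tournament=100, seed=0):
    """Count how often each feature appears in a tournament-winning tree.

    `build_tree(rng)` must return (squared_error, features_used) for one
    randomly generated tree. The tree with the lowest squared error wins
    its tournament, and every feature it uses is tallied once.
    """
    rng = random.Random(seed)
    usage = Counter()
    for _ in range(n_tournaments):
        candidates = [build_tree(rng) for _ in range(trees_per_tournament)]
        _, winner_features = min(candidates, key=lambda c: c[0])
        usage.update(set(winner_features))
    return usage


def selected_features(usage):
    """Features present in at least one tournament-winning tree."""
    return sorted(f for f, n in usage.items() if n > 0)


def toy_build_tree(rng):
    """Stand-in for tree fitting: trees using 'K_count' get a lower error."""
    names = ["K_count", "net_charge", "dipeptide_KK", "hydropathy"]
    features = rng.sample(names, 2)
    error = rng.random() + (0.0 if "K_count" in features else 1.0)
    return error, features


usage = feature_utilization(toy_build_tree, n_tournaments=10,
                            trees_per_tournament=50, seed=1)
print(selected_features(usage))
```

In a real run, `build_tree` would fit a depth-limited regression tree on a random training split and report its validation error together with the features at its decision nodes.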

Recursive feature elimination removes features one at a time until a cohort of predictive features of a given size is left. Starting with the whole feature set (n features), it eliminates a feature that correlates poorly with 3D10 reactivity, yielding n−1 features. It repeats this until it reaches our defined values of 5, 10, or 15 features. Fifteen is the maximum number of features that can be applied by a tree of depth 4.
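The limit of fifteen follows from counting decision nodes, assuming a binary tree in which each internal node tests one feature and a depth of 4 means four levels of splits:

```latex
\[
\sum_{d=0}^{3} 2^{d} \;=\; 1 + 2 + 4 + 8 \;=\; 2^{4} - 1 \;=\; 15
\]
```

A tree reaches this maximum only if every decision node uses a different feature.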

5.4. Statistical methods and feature analysis

We used 3 methods to analyze features in addition to their selection: heatmapping, enrichment, and sequence patterns. For enrichment analysis, we compared 3D10 reactive peptides to 3D10 non-reactive peptides (Fig S1).

For heat mapping, we determined average feature values for each of the 20 amino acid positions that span the length of the peptides. As the peptide array contained significantly overlapping peptides, the average value for each position in the non-reactive population was almost identical to the average for the whole sequence. This meant that the comparison between non-reactive peptides and reactive peptides was not meaningful and could diminish trends observed in 3D10 reactive peptides if they were reflected in the whole sequence. For example, 3D10 reactive peptides were more likely to contain positively charged residues than non-reactive peptides; the difference in their average positive charge for each residue position does not improve the characterization of positionality of positive residues. The data provided by the difference between 3D10 reactive and non-reactive peptides would be better characterized by enrichment analysis.
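The per-position averaging behind the heat maps can be sketched as follows. This is a minimal example under assumptions: the function names are hypothetical, and positive charge is used as the per-residue value only because it is the example given in the text.

```python
def positional_average(peptides, residue_value):
    """Average a per-residue value at each position across peptides.

    All peptides must have the same length (20 residues on the arrays).
    Returns one average per position, i.e. one row of a heat map.
    """
    if not peptides:
        return []
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    return [
        sum(residue_value(p[i]) for p in peptides) / len(peptides)
        for i in range(length)
    ]


def is_positive(aa):
    """1.0 for lysine, arginine, or histidine; 0.0 otherwise."""
    return 1.0 if aa in "KRH" else 0.0
```

Calling `positional_average(reactive_peptides, is_positive)` would give the fraction of reactive peptides carrying a positive residue at each position, where `reactive_peptides` is a placeholder for the 3D10 reactive set.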

Amino acid enrichment compared frequencies of amino acid occurrence in 3D10 reactive peptides and 3D10 non-reactive peptides. Statistical significance was determined by 2-way ANOVA, selecting for relevant comparisons and correcting for the number of comparisons (only comparing values for each amino acid between reactive and non-reactive peptide groups). For example, to compare three different amino acid occurrences in either group, we corrected for three comparisons.

The method for determining amino acid patterns is similar to the method for sequence analysis known as TMSTAT64. Like TMSTAT, our method does not require the assumption that amino acids are neither anti-segregated nor co-segregated relative to the features we seek to analyze, that is, that amino acid distributions are homogeneous across all populations. We do not need to make this assumption because our statistical comparisons are between observed odds and expected odds, where the expected odds are those of a homogeneously distributed population. For two types of amino acids, ‘A’ and ‘B’, we calculated the expected probability, $P_{Exp}$, of a given distance between an ‘A’ amino acid and the closest ‘B’ amino acid in 3D10 reactive or 3D10 non-reactive peptides. This value depended on the average number of ‘B’ amino acids in the respective population, $\bar{B}$. Expected odds for any distance were calculated by bootstrapping. We simulated all possible distributions of 1 to 5 ‘B’ amino acids across the peptide. We determined the shortest distance between an ‘A’ amino acid and the closest ‘B’ amino acid to develop a reference table (Table S3). For each possible distance, $D_X$, we obtained a probability as a function of $\bar{B}$, where $X$ denotes the discrete minimum distance between ‘A’ and the closest ‘B’ amino acid, $X \in \{1, 2, 3, \ldots, 19\}$. This probability function was calculated by performing second-order polynomial regression on the bootstrapped odds.

$$ P_{Exp} = f_{D_X}(\bar{B}) $$

This resulted in 19 equations, one for each possible value of $X$, and enabled the prediction of expected minimum distances between ‘A’ and ‘B’ type amino acids for cases where $\bar{B}$ was not a whole number. The observed probability, $P_{Obs}$, was the proportion of occurrences of each distance in the population. The log odds ratio was obtained as the natural logarithm of the ratio of observed distance proportions to expected distance probabilities.

$$ \ln\left(\frac{P_{Obs}}{P_{Exp}}\right) $$

We used 95% confidence intervals as a significance cut-off and calculated them using previously published methods that assume a normal distribution65. With this cut-off, we could establish whether a particular distance between amino acids of type ‘A’ and ‘B’ occurred significantly more often than expected. For situations where ‘A’ and ‘B’ were the same (like lysine-to-lysine distances), we employed a correction of $\bar{B}_{Cor} = \bar{B} - 1$ to account for the fact that the reference lysine serving as ‘A’ does not contribute to $\bar{B}$ in the expected probability of distances. We ignored cases in which only ‘A’ or only ‘B’ was present, as distances between them would be indeterminate. For a specific $D_X$, we did not plot data points for which $P_{Obs}$ was 0, because the log odds ratio is then $-\infty$ and the extent to which the distance is less common than expected cannot be determined. For graphing, we plotted $D \in [0, 5]$ because the 95% confidence interval widens as $P_{Exp} \to 0$.
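The observed side of this distance analysis can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, distance is taken as the difference in sequence position, and the expected probability is passed in as a number because it comes from the fitted reference functions (Table S3), which are not reproduced here. Confidence intervals are not computed.

```python
import math
from collections import Counter


def min_distances(peptide, a, b):
    """For every `a` residue, the distance to the closest `b` residue.

    When a == b, the reference residue itself is excluded. Peptides that
    lack either residue type contribute nothing (distance is undefined).
    """
    a_pos = [i for i, aa in enumerate(peptide) if aa == a]
    b_pos = [i for i, aa in enumerate(peptide) if aa == b]
    result = []
    for i in a_pos:
        gaps = [abs(i - j) for j in b_pos if j != i]
        if gaps:
            result.append(min(gaps))
    return result


def observed_proportions(peptides, a, b):
    """Proportion of each minimum distance across a peptide population."""
    counts = Counter(d for p in peptides for d in min_distances(p, a, b))
    total = sum(counts.values())
    return {d: n / total for d, n in counts.items()} if total else {}


def log_odds(p_obs, p_exp):
    """ln(P_Obs / P_Exp); None when P_Obs is 0 (the ratio is -infinity)."""
    if p_obs == 0:
        return None
    return math.log(p_obs / p_exp)
```

A positive log odds value means the distance occurs more often than expected for a homogeneous distribution, and a negative value means it occurs less often.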

5.5. Indirect ELISA

We measured the reactivity of 3D10, a mouse IgG mAb, by enzyme-linked immunosorbent assay (ELISA). 96-well MaxiSorp plates (catalog no. 439454; Thermo Fisher Scientific) were coated with synthetic peptides at 10 μg/mL and recombinant proteins at either 1 μg/mL (VAR2CSA ID1-ID2) or 0.5 μg/mL (all others) in a volume of 50 μL, diluted in 1X PBS, and incubated overnight at 4°C. We then washed wells once with 4% bovine serum albumin (BSA, catalog no. A7906; Sigma-Aldrich) and incubated wells with 4% BSA for 1 h at 37°C. We washed wells 4 times with 1X PBST (0.1% Tween 20). Plates were incubated with 50 μL of primary antibody for 1 h at room temperature (RT) and washed four times with 1X PBST. Primary antibody treatments were 3D10, mouse IgG1 isotype control (catalog no. MA5-14453; Invitrogen, Waltham, MA), or 2% BSA. We then incubated samples at RT for 1 h with 50 μL of HRP-conjugated secondary antibody (goat anti-mouse HRP, catalog no. 170-6516, Bio-Rad, Mississauga, Canada). After repeating washes, 50 μL of 3,3′,5,5′-tetramethylbenzidine (TMB, catalog no. T0440; Sigma-Aldrich) was added and incubated for 30 min or until reactions approached saturation. This consideration was relevant for cases in which the optimized antibody concentration for a wildtype peptide (P1 to P5) resulted in saturating OD values for mutated peptides with significantly increased reactivity. We stopped reactions with 50 μL H2SO4 (0.5 N). All samples were tested in duplicate on the same plate, and at least two plates were tested on different days. For ELISAs with protein segments treated with DTT, samples were incubated with DTT (10 mM) at 56°C for 10 min and then added to the plate.
The optical density (OD) values were corrected by subtracting the average of the blank wells from the raw measurements, and we confirmed that the IgG isotype control and the secondary antibody control had no reactivity (OD < 0.25).

$$ OD = OD_{measured} - OD_{blank} $$

To determine relative binding, OD values for mutated peptides were measured as a ratio to the wildtype peptide from which they are derived:

$$ \text{Relative binding} = \frac{OD_{mutated}}{OD_{wildtype}} \times 100 $$
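The two ELISA calculations can be expressed as a short sketch. The function names are hypothetical, and applying the 0.25 cut-off to blank-corrected control values is an assumption; the text does not say whether raw or corrected values were checked.

```python
def corrected_od(od_measured, blank_wells):
    """Subtract the mean of the blank wells from a raw OD reading."""
    blank = sum(blank_wells) / len(blank_wells)
    return od_measured - blank


def controls_pass(control_ods, cutoff=0.25):
    """True when every control OD (isotype, secondary-only) is below cutoff."""
    return all(od < cutoff for od in control_ods)


def relative_binding(od_mutated, od_wildtype):
    """Mutant reactivity as a percentage of its parent wildtype peptide."""
    return od_mutated / od_wildtype * 100.0
```

For example, a mutated peptide with a corrected OD of 0.5 against a wildtype OD of 1.0 has a relative binding of 50%.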

Several of the reagents used in this study were gifts or were generously provided by various sources. The 3D10 antibody was generously provided by Dr. John Adams. EBA-175 was obtained through BEI Resources, NIAID, NIH: Plasmodium falciparum Erythrocyte Binding Antigen-175 RII-Non-Glycosylated Protein, Recombinant from Pichia pastoris, MRA-1162, contributed by Annie Mo. Our reagents, RH2.A9 and RH4.9-HIS TAG sequences, have been previously described66–68. Pv200MSP119 was obtained through BEI Resources, NIAID, NIH: Plasmodium vivax yP30P2-Pv200 MSP119 Protein, Recombinant from Saccharomyces cerevisiae, Strain 2905/6, MRA-60, contributed by David C. Kaslow. Whole VAR2CSA (FCR3) and VAR2CSA (FCR3) (ID1-ID2) were generously provided by Dr. Ali Salanti. Collaborators E. Medawar and A. Jin produced recombinant VAR2CSA (FCR3) (DBL5ε) protein.

5.6. Criteria mapping and structure prediction

To map criteria onto 3D structures, we first generated structures of relevant protein segments in AlphaFold 3 (ref. 32). For proteins that had high-resolution structures available, we used the experimentally determined structures. Only the EBA-175 and PvMSP1 structures were sourced from previously resolved structures. We used AlphaFold 3 generated structures either because target protein structures were unavailable or, in the case of VAR2CSA, because the available protein structures were insufficient. More specifically, previously resolved structures for VAR2CSA (FCR3) and (NF54) had several surface-exposed segments excluded23,69. EBA-175 and PvMSP1 had near-atomic resolutions (<2.5 Å) because their structures were determined by high-accuracy methods: crystallography and solution NMR35,70,71. For PvMSP1, we chose 1 of the multiple experimentally determined conformations.

The AlphaFold 3 generated VAR2CSA (FCR3) core region was evaluated against cryo-EM resolved structures for the same region (Fig S4). Based on the pruned RMSD score, these structures were sufficiently similar to one another (RMSD < 3 Å). Low resolution and a high proportion of unstructured regions may have resulted in a higher whole RMSD score (5.894 Å)72,73. For a protein segment of more than 1,400 amino acids, this whole RMSD score suggests high accuracy74.
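For reference, the root-mean-square deviation between two superposed structures with N matched atom pairs at positions x_i and y_i is defined as follows. This is the standard definition; the atom selection and the pruning cut-off used for Fig S4 are not stated in the text and are not assumed here.

```latex
\[
\mathrm{RMSD} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \lVert \mathbf{x}_i - \mathbf{y}_i \rVert^{2}}
\]
```

A pruned RMSD applies the same formula to the subset of atom pairs retained after poorly matching pairs are discarded, whereas the whole RMSD uses all matched pairs, which is why flexible or unstructured regions raise the whole score more than the pruned one.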

Acknowledgements

We thank Dr. Michael Good for providing valuable feedback on our manuscript.

Funding

This research was funded by the National Institute of Allergy and Infectious Diseases of the National Institutes of Health (R01AI150944). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. We acknowledge funding from a project grant from the Canadian Institutes of Health Research (CIHR-IRSC Funding Reference Number 168944). We acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (NSERC), [RGPIN-2017-04176].

Footnotes

Additional Declarations: No competing interests reported.

Contributor Information

Simranjit Grewal, University of Alberta.

Uwa Iyamu, University of Alberta.

Daniel Vinals, University of Alberta.

Catherine Mitran, University of Alberta.

Nidhi Hegde, University of Alberta.

Stephanie Yanow, University of Alberta.

Citations

  • 1. Venkatesan P. The 2023 WHO World malaria report. Lancet Microbe 5, e214 (2024). 10.1016/S2666-5247(24)00016-8
  • 2. Anderson L. & Menach L. A. The 2024 WHO World malaria report. Global Malaria Programme 293 (2024).
  • 3. Lee W. C., Russell B. & Renia L. Sticking for a Cause: The Falciparum Malaria Parasites Cytoadherence Paradigm. Front Immunol 10, 1444 (2019). 10.3389/fimmu.2019.01444
  • 4. Fried M. & Duffy P. E. Adherence of Plasmodium falciparum to chondroitin sulfate A in the human placenta. Science 272, 1502–1504 (1996). 10.1126/science.272.5267.1502
  • 5. Ngai M. et al. Malaria in Pregnancy and Adverse Birth Outcomes: New Mechanisms and Therapeutic Opportunities. Trends Parasitol 36, 127–137 (2020). 10.1016/j.pt.2019.12.005
  • 6. Desai M. et al. Epidemiology and burden of malaria in pregnancy. Lancet Infect Dis 7, 93–104 (2007). 10.1016/S1473-3099(07)70021-X
  • 7. Mordmuller B. et al. First-in-human, Randomized, Double-blind Clinical Trial of Differentially Adjuvanted PAMVAC, A Vaccine Candidate to Prevent Pregnancy-associated Malaria. Clin Infect Dis 69, 1509–1516 (2019). 10.1093/cid/ciy1140
  • 8. Sirima S. B. et al. PRIMVAC vaccine adjuvanted with Alhydrogel or GLA-SE to prevent placental malaria: a first-in-human, randomised, double-blind, placebo-controlled study. Lancet Infect Dis 20, 585–597 (2020). 10.1016/S1473-3099(19)30739-X
  • 9. Walker L. M. & Burton D. R. Passive immunotherapy of viral infections: ‘super-antibodies’ enter the fray. Nat Rev Immunol 18, 297–308 (2018). 10.1038/nri.2017.148
  • 10. Doritchamou J. Y. A. et al. A single full-length VAR2CSA ectodomain variant purifies broadly neutralizing antibodies against placental malaria isolates. Elife 11 (2022). 10.7554/eLife.76264
  • 11. Gnidehou S. et al. Functional antibodies against VAR2CSA in nonpregnant populations from colombia exposed to Plasmodium falciparum and Plasmodium vivax. Infect Immun 82, 2565–2573 (2014). 10.1128/IAI.01594-14
  • 12. Beeson J. G. et al. Antibodies among men and children to placental-binding Plasmodium falciparum-infected erythrocytes that express var2csa. Am J Trop Med Hyg 77, 22–28 (2007).
  • 13. Oleinikov A. V. et al. A plasma survey using 38 PfEMP1 domains reveals frequent recognition of the Plasmodium falciparum antigen VAR2CSA among young Tanzanian children. PLoS One 7, e31011 (2012). 10.1371/journal.pone.0031011
  • 14. Gnidehou S. et al. Cross-Species Immune Recognition Between Plasmodium vivax Duffy Binding Protein Antibodies and the Plasmodium falciparum Surface Antigen VAR2CSA. J Infect Dis 219, 110–120 (2019). 10.1093/infdis/jiy467
  • 15. Mitran C. J. et al. Antibodies to Cryptic Epitopes in Distant Homologues Underpin a Mechanism of Heterologous Immunity between Plasmodium vivax PvDBP and Plasmodium falciparum VAR2CSA. mBio 10 (2019). 10.1128/mBio.02343-19
  • 16. Ntumngia F. B. et al. Conserved and variant epitopes of Plasmodium vivax Duffy binding protein as targets of inhibitory monoclonal antibodies. Infect Immun 80, 1203–1208 (2012). 10.1128/IAI.05924-11
  • 17. George M. T. et al. Identification of an Immunogenic Broadly Inhibitory Surface Epitope of the Plasmodium vivax Duffy Binding Protein Ligand Domain. mSphere 4 (2019). 10.1128/mSphere.00194-19
  • 18. Chen E. et al. Broadly neutralizing epitopes in the Plasmodium vivax vaccine candidate Duffy Binding Protein. Proc Natl Acad Sci U S A 113, 6277–6282 (2016). 10.1073/pnas.1600488113
  • 19. Mitran C. J., Higa L. M., Good M. F. & Yanow S. K. Generation of a Peptide Vaccine Candidate against Falciparum Placental Malaria Based on a Discontinuous Epitope. Vaccines (Basel) 8 (2020). 10.3390/vaccines8030392
  • 20. Laffy J. M. J. et al. Promiscuous antibodies characterised by their physico-chemical properties: From sequence to structure and back. Prog Biophys Mol Biol 128, 47–56 (2017). 10.1016/j.pbiomolbio.2016.09.002
  • 21. Iyamu U. et al. A conserved epitope in VAR2CSA is targeted by a cross-reactive antibody originating from Plasmodium vivax Duffy binding protein. Front Cell Infect Microbiol 13, 1202276 (2023). 10.3389/fcimb.2023.1202276
  • 22. Ntumngia F. B. et al. A Novel Erythrocyte Binding Protein of Plasmodium vivax Suggests an Alternate Invasion Pathway into Duffy-Positive Reticulocytes. mBio 7 (2016). 10.1128/mBio.01261-16
  • 23. Ma R. et al. Structural basis for placental malaria mediated by Plasmodium falciparum VAR2CSA. Nat Microbiol 6, 380–391 (2021). 10.1038/s41564-020-00858-9
  • 24. McDonald E. F., Jones T., Plate L., Meiler J. & Gulsevin A. Benchmarking AlphaFold2 on peptide structure prediction. Structure 31, 111–119 e112 (2023). 10.1016/j.str.2022.11.012
  • 25. Weng G. et al. Comprehensive Evaluation of Fourteen Docking Programs on Protein-Peptide Complexes. J Chem Theory Comput 16, 3959–3969 (2020). 10.1021/acs.jctc.9b01208
  • 26. Di Rienzo L. et al. Characterizing Hydropathy of Amino Acid Side Chain in a Protein Environment by Investigating the Structural Changes of Water Molecules Network. Front Mol Biosci 8, 626837 (2021). 10.3389/fmolb.2021.626837
  • 27. Liang S. & Grishin N. V. Effective scoring function for protein sequence design. Proteins 54, 271–281 (2004). 10.1002/prot.10560
  • 28. Regenmortel M. H. V. V. What Is a B-Cell Epitope? (2009).
  • 29. Ainavarapu S. R. et al. Contour length and refolding rate of a small protein controlled by engineered disulfide bonds. Biophys J 92, 225–233 (2007). 10.1529/biophysj.106.091561
  • 30. Ramaraj T., Angel T., Dratz E. A., Jesaitis A. J. & Mumey B. Antigen-antibody interface properties: composition, residue interactions, and features of 53 non-redundant structures. Biochim Biophys Acta 1824, 520–532 (2012). 10.1016/j.bbapap.2011.12.007
  • 31. Rubinstein N. D. et al. Computational characterization of B-cell epitopes. Mol Immunol 45, 3477–3489 (2008). 10.1016/j.molimm.2007.10.016
  • 32. Abramson J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024). 10.1038/s41586-024-07487-w
  • 33. Hester J. et al. De novo assembly of a field isolate genome reveals novel Plasmodium vivax erythrocyte invasion genes. PLoS Negl Trop Dis 7, e2569 (2013). 10.1371/journal.pntd.0002569
  • 34. de Assis G. M. P. et al. Profiling Humoral Immune Response Against Pre-Erythrocytic and Erythrocytic Antigens of Malaria Parasites Among Neotropical Primates in the Brazilian Atlantic Forest. Front Cell Infect Microbiol 11, 678996 (2021). 10.3389/fcimb.2021.678996
  • 35. Tolia N. H., Enemark E. J., Sim B. K. & Joshua-Tor L. Structural basis for the EBA-175 erythrocyte invasion pathway of the malaria parasite Plasmodium falciparum. Cell 122, 183–193 (2005). 10.1016/j.cell.2005.05.033
  • 36. Singh S. K., Hora R., Belrhali H., Chitnis C. E. & Sharma A. Structural basis for Duffy recognition by the malaria parasite Duffy-binding-like domain. Nature 439, 741–744 (2006). 10.1038/nature04443
  • 37. Howell D. P., Samudrala R. & Smith J. D. Disguising itself--insights into Plasmodium falciparum binding and immune evasion from the DBL crystal structure. Mol Biochem Parasitol 148, 1–9 (2006). 10.1016/j.molbiopara.2006.03.004
  • 38. Benavente E. D. et al. Global genetic diversity of var2csa in Plasmodium falciparum with implications for malaria in pregnancy and vaccine development. Sci Rep 8, 15429 (2018). 10.1038/s41598-018-33767-3
  • 39. Smith J. D., Subramanian G., Gamain B., Baruch D. I. & Miller L. H. Classification of adhesive domains in the Plasmodium falciparum erythrocyte membrane protein 1 family. Mol Biochem Parasitol 110, 293–310 (2000). 10.1016/s0166-6851(00)00279-6
  • 40. Wei S. et al. Binding epitope for recognition of human TRPM4 channel by monoclonal antibody M4M. Sci Rep 12, 19562 (2022). 10.1038/s41598-022-22077-4
  • 41. Ashworth J. M., Scott D., Yuvuraj F., Woolhouse M., Stoevesandt M., O. . Peptide microarray IgM and IgG screening of pre-SARS-CoV-2 human serum samples from Zimbabwe for reactivity with peptides from all seven human coronaviruses: a cross-sectional study. The Lancet Microbe 4, e215–e227 (2023).
  • 42. Vashisht K. et al. Cyclic constrained immunoreactive peptides from crucial P. falciparum proteins: potential implications in malaria diagnostics. Transl Res 249, 28–36 (2022). 10.1016/j.trsl.2022.06.008
  • 43. Roberts K. J. et al. Preclinical development of a bispecific TNFalpha/IL-23 neutralising domain antibody as a novel oral treatment for inflammatory bowel disease. Sci Rep 11, 19422 (2021). 10.1038/s41598-021-97236-0
  • 44. Cunningham O., Scott M., Zhou Z. S. & Finlay W. J. J. Polyreactivity and polyspecificity in therapeutic antibody development: risk factors for failure in preclinical and clinical development campaigns. MAbs 13, 1999195 (2021). 10.1080/19420862.2021.1999195
  • 45. Finlay W. J. J., Coleman J. E., Edwards J. S. & Johnson K. S. Anti-PD1 ‘SHR-1210’ aberrantly targets pro-angiogenic receptors and this polyspecificity can be ablated by paratope refinement. MAbs 11, 26–44 (2019). 10.1080/19420862.2018.1550321
  • 46. Sandeep, Shinde S. H. & Pande A. H. Polyspecificity - An emerging trend in the development of clinical antibodies. Mol Immunol 155, 175–183 (2023). 10.1016/j.molimm.2023.02.005
  • 47. Kaur H. & Salunke D. M. Antibody promiscuity: Understanding the paradigm shift in antigen recognition. IUBMB Life 67, 498–505 (2015). 10.1002/iub.1397
  • 48. Jain D. & Salunke D. M. Antibody specificity and promiscuity. Biochem J 476, 433–447 (2019). 10.1042/BCJ20180670
  • 49. Andrews S. F. et al. Immune history profoundly affects broadly protective B cell responses to influenza. Sci Transl Med 7, 316ra192 (2015). 10.1126/scitranslmed.aad0522
  • 50. Doria-Rose N. A. et al. Breadth of human immunodeficiency virus-specific neutralizing activity in sera: clustering analysis and association with clinical variables. J Virol 84, 1631–1636 (2010). 10.1128/JVI.01482-09
  • 51. Sather D. N. et al. Factors associated with the development of cross-reactive neutralizing antibodies during human immunodeficiency virus type 1 infection. J Virol 83, 757–769 (2009). 10.1128/JVI.02036-08
  • 52. Hodder A. N. et al. Insights into Duffy binding-like domains through the crystal structure and function of the merozoite surface protein MSPDBL2 from Plasmodium falciparum. J Biol Chem 287, 32922–32939 (2012). 10.1074/jbc.M112.350504
  • 53. Jagadeeshaprasad M. G. et al. Disulfide bond and crosslinking analyses reveal interdomain interactions that contribute to the rigidity of placental malaria VAR2CSA structure and formation of CSA binding channel. Int J Biol Macromol 226, 143–158 (2023). 10.1016/j.ijbiomac.2022.11.258
  • 54. Andrews S. F. et al. An influenza H1 hemagglutinin stem-only immunogen elicits a broadly cross-reactive B cell response in humans. Sci Transl Med 15, eade4976 (2023). 10.1126/scitranslmed.ade4976
  • 55. Centers for Disease Control and Prevention. Serum cross-reactive antibody response to a novel influenza A (H1N1) virus after vaccination with seasonal influenza vaccine. MMWR Morb Mortal Wkly Rep 58, 521–524 (2009).
  • 56. Mikell I. et al. Characteristics of the earliest cross-neutralizing antibody response to HIV-1. PLoS Pathog 7, e1001251 (2011). 10.1371/journal.ppat.1001251
  • 57. Li T. et al. Identification of a cross-neutralizing antibody that targets the receptor binding site of H1N1 and H5N1 influenza viruses. Nat Commun 13, 5182 (2022). 10.1038/s41467-022-32926-5
  • 58. Burton D. R. & Hangartner L. Broadly Neutralizing Antibodies to HIV and Their Role in Vaccine Design. Annu Rev Immunol 34, 635–659 (2016). 10.1146/annurev-immunol-041015-055515
  • 59. Sok D. & Burton D. R. Recent progress in broadly neutralizing antibodies to HIV. Nat Immunol 19, 1179–1188 (2018). 10.1038/s41590-018-0235-7
  • 60. Sanchez-de Prada L. et al. Immunodominance hierarchy after seasonal influenza vaccination. Emerg Microbes Infect 11, 2670–2679 (2022). 10.1080/22221751.2022.2135460
  • 61. Moffat L. & Jones D. T. Increasing the accuracy of single sequence prediction methods using a deep semi-supervised learning framework. Bioinformatics 37, 3744–3751 (2021). 10.1093/bioinformatics/btab491
  • 62. Grantham R. Amino acid difference formula to help explain protein evolution. Science 185, 862–864 (1974). 10.1126/science.185.4154.862
  • 63. Pedregosa F. et al. Scikit-learn: Machine Learning in Python. J Mach Learn Res 12, 2825–2830 (2011).
  • 64. Senes A., Gerstein M. & Engelman D. M. Statistical analysis of amino acid patterns in transmembrane helices: the GxxxG motif occurs frequently and in association with beta-branched residues at neighboring positions. J Mol Biol 296, 921–936 (2000). 10.1006/jmbi.1999.3488
  • 65. Bland J. M. & Altman D. G. Statistics notes. The odds ratio. BMJ 320, 1468 (2000). 10.1136/bmj.320.7247.1468
  • 66. Ahmed Ismail H. et al. Acquired antibodies to merozoite antigens in children from Uganda with uncomplicated or severe Plasmodium falciparum malaria. Clin Vaccine Immunol 20, 1170–1180 (2013). 10.1128/CVI.00156-13
  • 67. Tham W. H. et al. Antibodies to reticulocyte binding protein-like homologue 4 inhibit invasion of Plasmodium falciparum into human erythrocytes. Infect Immun 77, 2427–2435 (2009). 10.1128/IAI.00048-09
  • 68. Reiling L. et al. The Plasmodium falciparum erythrocyte invasion ligand Pfrh4 as a target of functional and protective human antibodies against malaria. PLoS One 7, e45253 (2012). 10.1371/journal.pone.0045253
  • 69. Wang K. et al. Cryo-EM reveals the architecture of placental malaria VAR2CSA and provides molecular insight into chondroitin sulfate binding. Nat Commun 12, 2956 (2021). 10.1038/s41467-021-23254-1
  • 70. Barth P., Schonbrun J. & Baker D. Toward high-resolution prediction and design of transmembrane helical protein structures. Proc Natl Acad Sci U S A 104, 15682–15687 (2007). 10.1073/pnas.0702515104
  • 71.Babon J. J. et al. Structural studies on Plasmodium vivax merozoite surface protein-1. Mol Biochem Parasitol 153, 31–40 (2007). 10.1016/j.molbiopara.2007.01.015 [DOI] [PubMed] [Google Scholar]
  • 72.Carugo O. How root-mean-square distance (r.m.s.d.) values depend on the resolution of protein structures that are compared. Journal of Applied Crystallography 36, 125–128 (2003). 10.1107/S0021889802020502 [DOI] [Google Scholar]
  • 73.Chothia C. & Lesk A. M. The relation between the divergence of sequence and structure in proteins. EMBO J 5, 823–826 (1986). 10.1002/j.1460-2075.1986.tb04288.x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74.Reva B. A., Finkelstein A. V. & Skolnick J. What is the probability of a chance prediction of a protein structure with an rmsd of 6 A? Fold Des 3, 141–147 (1998). 10.1016/s1359-0278(98)00019-4 [DOI] [PubMed] [Google Scholar]

Articles from Research Square are provided here courtesy of American Journal Experts
