eLife. 2021 Sep 23;10:e69808. doi: 10.7554/eLife.69808

Common host variation drives malaria parasite fitness in healthy human red cells

Emily R Ebel 1,2, Frans A Kuypers 3, Carrie Lin 2, Dmitri A Petrov 1, Elizabeth S Egan 2,4
Editors: Jenny Tung5, George H Perry6
PMCID: PMC8497061  PMID: 34553687

Abstract

The replication of Plasmodium falciparum parasites within red blood cells (RBCs) causes severe disease in humans, especially in Africa. Deleterious alleles like hemoglobin S are well-known to confer strong resistance to malaria, but the effects of common RBC variation are largely undetermined. Here, we collected fresh blood samples from 121 healthy donors, most with African ancestry, and performed exome sequencing, detailed RBC phenotyping, and parasite fitness assays. Over one-third of healthy donors unknowingly carried alleles for G6PD deficiency or hemoglobinopathies, which were associated with characteristic RBC phenotypes. Among non-carriers alone, variation in RBC hydration, membrane deformability, and volume was strongly associated with P. falciparum growth rate. Common genetic variants in PIEZO1, SPTA1/SPTB, and several P. falciparum invasion receptors were also associated with parasite growth rate. Interestingly, we observed little or negative evidence for divergent selection on non-pathogenic RBC variation between Africans and Europeans. These findings suggest a model in which globally widespread variation in a moderate number of genes and phenotypes modulates P. falciparum fitness in RBCs.

Research organism: Human, P. falciparum

Introduction

Malaria caused by the replication of Plasmodium falciparum parasites in red blood cells (RBCs) kills hundreds of thousands of children each year (WHO, 2019). In each 48-hr cycle of blood-stage malaria, parasites must deform RBC membranes to invade them (Koch, 2017; Kariuki et al., 2020); consume hemoglobin and tolerate the resulting oxidative stress (Francis et al., 1997); multiply to displace half the RBC volume (Hanssen et al., 2012); and remodel the RBC membrane to avoid immune detection (Zhang, 2015). Consequently, genetic disorders that alter aspects of RBC biology are well-known to influence malaria susceptibility (Kwiatkowski, 2005). For example, sickle cell trait impairs parasite growth by altering hemoglobin polymerization at low oxygen tension (Pasvol et al., 1978; Archer et al., 2018), while deficiency of the G6PD enzyme involved in oxidative stress tolerance is thought to make parasitized RBCs more susceptible to breakdown (Ruwende and Hill, 1998). Aside from these diseases, however, the genetic basis of RBC susceptibility to malaria remains mostly unknown.

Large genome-wide association studies (GWAS) have identified a few dozen loci that collectively explain up to 11% of the heritability of the risk of severe versus uncomplicated malaria (Timmann et al., 2012; Malaria Genomic Epidemiology Network, 2014; Band et al., 2015; Leffler et al., 2017; Ndila et al., 2018; Malaria Genomic Epidemiology Network, 2019). About 10 of the highest-confidence GWAS signals, including 6 loci known from earlier methods (Allison, 1954; Field et al., 1994; Ruwende and Hill, 1998; Lell et al., 1999; Rowe, 2007; Cao and Galanello, 2010; Galanello and Cao, 2011), are in or near genes expressed predominantly in RBCs. One new GWAS variant has since been shown to regulate expression of the ATP2B4 calcium pump (Zámbó et al., 2017) and to be associated with RBC dehydration (Li et al., 2013b), although a functional link between ATP2B4 and P. falciparum replication has yet to be demonstrated. Additional GWAS discoveries of RBC variation important for malaria are not expected without massive increases in sample size (Boyle et al., 2017; Malaria Genomic Epidemiology Network, 2019), in part because of the large number of hypotheses tested. Severe malaria is a complex phenotype that combines many factors from RBCs, the vascular endothelium, the immune system, the parasite, and the environment (Mackinnon et al., 2005; de Mendonça et al., 2012). Alternate approaches are therefore needed to discover more genetic variation that impacts the replication of malaria parasites in human RBCs.

Heritable RBC phenotypes like mean cell volume (MCV), hemoglobin content (HGB/MCH), and antigenic blood type vary widely within and between human populations (Whitfield et al., 1985; Evans et al., 1999; Pilia et al., 2006; Lo et al., 2011; Cooling, 2015; Canela-Xandri et al., 2018). Large GWAS conducted mostly in Europeans have demonstrated that many blood cell phenotypes are shaped by hundreds of small-effect loci distributed throughout the genome, consistent with polygenic or omnigenic models of complex trait genetics (van der Harst et al., 2012; Astle et al., 2016; Chami et al., 2016; Boyle et al., 2017; Chen et al., 2020; Vuckovic et al., 2020). Certain blood phenotypes like average hemoglobin levels, hematocrit, and RBC membrane fragility are also known to differ between African and European populations, although the differences are typically small in magnitude (Garn, 1981; Perry et al., 1992; Beutler and West, 2005; Kanias et al., 2017; Page et al., 2021). This variation across populations can largely be explained by a few RBC disease alleles that have been widely selected across Africa for their protective effects on malaria (Beutler and West, 2005; Lo et al., 2011; Kanias et al., 2017). Despite the importance of these population-specific variants, a much larger number of common variants with small individual effects on RBC phenotypes are expected to be globally widespread (Biddanda et al., 2020; Chen et al., 2020). It remains untested whether this extensive phenotypic and genetic diversity in RBCs influences malaria susceptibility and, if so, whether it has been shaped by malaria selection.

Here, we approach these questions by performing exome sequencing and extensive RBC phenotyping on blood samples from a diverse human cohort of 122 individuals. We show that P. falciparum fitness varies widely among donor cells in vitro, with the distribution of parasite phenotypes in ‘healthy’ RBCs overlapping those from RBCs carrying classic disease alleles. We apply LASSO variable selection to identify a small set of genes and phenotypes that strongly predict parasite fitness outside of the context of RBC disease, highlighting RBC dehydration and membrane properties as key to modulating P. falciparum fitness. We find little evidence that non-pathogenic alleles or phenotypes that confer parasite protection are associated with African ancestry, perhaps because P. falciparum is susceptible to RBC variation that exists for other selective or demographic reasons. Overall, these findings advance our understanding of the origin and function of common RBC variation and suggest new targets for therapeutic intervention for malaria.

Results

Many healthy blood donors with African ancestry carry alleles for RBC disease

We collected blood samples from 121 donors with no known history of blood disorders, most of whom self-identified as having recent African ancestry (Figure 1A). As a positive control, we also sampled a patient with hereditary elliptocytosis (HE), an inherited condition characterized by extremely fragile RBC membranes that strongly inhibit P. falciparum growth (Schulman et al., 1990; Facer, 1995; Dhermy et al., 2007; Gallagher, 2013). We performed whole-exome sequencing (Figure 1—source data 1), both to check for the presence of known RBC disease alleles and to confirm the population genetic ancestry of our donors. A principal component analysis of more than 35,000 exomic single-nucleotide polymorphisms (SNPs) showed that most donors fell along a continuum from African to European ancestry, as defined by data from the 1000 Genomes Project (Figure 1A). Pairwise kinship coefficients demonstrated that all donors were unrelated, apart from a six-member family with unique ancestry (Figure 1A, light borders). We found that 16% of the healthy donors carried pathogenic hemoglobin alleles (Figure 1B), including 5 heterozygotes for hemoglobin S (HbAS), 4 heterozygotes for hemoglobin C (HbAC), and 11 individuals with one or two copies of an HBA2 deletion causing α-thalassemia (Galanello and Cao, 2011). We also scored eight polymorphisms in G6PD that have been functionally associated with various degrees of G6PD deficiency (Yoshida et al., 1971; Clarke et al., 2017) and found that 32% of the study population carried at least one, including 12 of the 20 donors with hemoglobinopathies. Among those with wild-type hemoglobin, we identified 1 individual with polymorphisms associated with severe G6PD deficiency (>60% loss of function) and 23 with polymorphisms associated with mild to medium deficiency (<42% loss of function). We detected no alleles linked to other monogenic RBC disorders, including β-thalassemia or xerocytosis (Cao and Galanello, 2010; Glogowska et al., 2017).
We therefore classified the remaining 68 unrelated donors as ‘non-carriers’ of known disease alleles for the purposes of this work.
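Plotting study donors "on coordinate space derived from 1000 Genomes reference populations" (Figure 1A) means fitting principal components on the reference panel and then projecting new samples onto them. The minimal sketch below illustrates that idea only for PC1 and is not the pipeline used in this study. It assumes genotypes coded as allele dosages (0, 1, 2), standardizes each SNP with the reference panel's mean and SD, and finds PC1 by power iteration; all function names and the toy genotypes are hypothetical.

```python
import math
import random

def reference_scaling(ref):
    """Per-SNP mean and SD of allele dosages (0/1/2) in the reference panel."""
    n, m = len(ref), len(ref[0])
    means = [sum(row[j] for row in ref) / n for j in range(m)]
    sds = []
    for j in range(m):
        var = sum((row[j] - means[j]) ** 2 for row in ref) / n
        sds.append(math.sqrt(var) if var > 0 else 1.0)  # avoid dividing by zero
    return means, sds

def standardize(rows, means, sds):
    """Center and scale new or reference genotypes with the reference statistics."""
    return [[(x - mu) / sd for x, mu, sd in zip(row, means, sds)] for row in rows]

def first_pc_loadings(z, n_iter=200, seed=0):
    """Leading eigenvector of the SNP covariance Z^T Z / n, found by power iteration."""
    n, m = len(z), len(z[0])
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(m)]
    for _ in range(n_iter):
        scores = [sum(a * b for a, b in zip(row, v)) for row in z]  # Z v
        w = [sum(z[i][j] * scores[i] for i in range(n)) / n for j in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def project(rows, means, sds, loadings):
    """PC1 coordinates of samples in the reference-derived space."""
    return [sum(a * b for a, b in zip(row, loadings))
            for row in standardize(rows, means, sds)]

# Toy reference panel: rows 0-2 from hypothetical population A, rows 3-5 from B
ref = [[0, 0, 2, 2], [0, 1, 2, 2], [0, 0, 2, 1],
       [2, 2, 0, 0], [2, 1, 0, 0], [1, 2, 0, 0]]
means, sds = reference_scaling(ref)
loadings = first_pc_loadings(standardize(ref, means, sds))
print(project([[0, 0, 2, 2], [2, 2, 0, 0]], means, sds, loadings))
```

The two toy donors land at opposite ends of PC1, next to the reference population whose genotypes they share. The sign of a principal component is arbitrary, so only relative positions are meaningful.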

Figure 1. Overview of blood donors and study design.

(A) PCA of genetic variation across 35,759 unlinked exome SNPs. Donors from this study are plotted on coordinate space derived from 1000 Genomes reference populations. Points with white borders represent six related individuals, five of whom were excluded from the study. All exome variants passing quality filters are available in Figure 1—source data 1. (B) Over a third of donors carried alleles for RBC disorders linked to Plasmodium falciparum resistance. Individuals with >1 disease allele were classified by their most severe condition. non-carrier: Donor without any of the following alleles or conditions. G6PDlow: Mild to medium G6PD deficiency (<42% loss of function). G6PDhigh: Severe G6PD deficiency (>60% loss of function). −α/αα: heterozygous HBA2 deletion, or alpha thalassemia minima. −α/−α: homozygous HBA2 deletion, or α-thalassemia trait. HbAC: heterozygous HBB:E7K, or hemoglobin C trait. HbAS: heterozygous HBB:E7V, or sickle cell trait. HE: hereditary elliptocytosis. (C) Two components of P. falciparum fitness were measured with flow cytometry at three timepoints. Invasion is the change in parasitemia as schizonts egress from maintenance RBCs (green) and invade fresh acceptor RBCs from the blood donors (purple). Growth is the multiplication rate from a complete parasite cycle in the fresh acceptor RBCs. (D) RBC phenotypes were measured using complete blood counts with RBC indices, osmotic fragility tests, and ektacytometry on fresh samples. This figure was partially created with Biorender.com. RBC, red blood cell; SNP, single-nucleotide polymorphism.

Figure 1—source data 1. Individual genotypes, population frequencies, and protein annotations for exome variants passing quality filters (N~160,000).

P. falciparum replication rates vary widely among non-carrier RBCs

To determine the variation in P. falciparum fitness among samples with different genotypes, we performed invasion and growth assays with two parasite strains. The genome reference strain 3D7, which was originally isolated from a European, has been continuously cultured in academic labs for at least 40 years (Walliker et al., 1987; Moser, 2020). Th.026.09 is a drug-resistant strain collected from Senegal in 2009 that is minimally adapted to lab culture (Daniels et al., 2012). These divergent strains were selected in an attempt to balance biological realism with reliable in vitro data.

We observed a wide range of P. falciparum growth rates among RBC samples, especially among non-carriers that lacked known disease alleles (Figure 2A–C). Each strain’s growth rate is defined here as parasite multiplication over a full 48-hr cycle in donor RBCs (Figure 1C), with the mean value for non-carriers set to 100% after normalization. Briefly, we used a repeated control RBC sample (Figure 2, gray points) and other batch-specific factors to correct for variation in parasite growth across multiple experiments (Figure 2—figure supplement 1). Among non-carriers, growth rates ranged from 64% to 136% for 3D7 (SD=17.7%) and 76% to 128% for Th.026.09 (SD=10.6%) (Figure 2A–B). Per-sample growth rates were strongly correlated between the two strains (Figure 2C, R2=0.69, p<3×10⁻¹⁶) and positively correlated when measured in different weeks (ρ=0.35, Figure 2—figure supplement 2), demonstrating that these data capture meaningful variation among donor RBCs. Furthermore, as expected (Friedman, 1978; Ifediba et al., 1985; Greene, 1993; Facer, 1995), we detected reductions in mean growth rate for both strains in RBCs carrying known disease alleles. These included individuals with α-thalassemia trait (3D7 p=0.027; Th.026.09 p=0.077), HbAS (3D7 p=1.05×10⁻⁷; Th.026.09 p=1.2×10⁻⁴), and the single carriers of HE and severe G6PD deficiency. Notably, the wide distribution of growth rates for non-carrier RBCs had considerable overlap with the growth rates in carrier RBCs. Only the HbAS and HE samples fell entirely outside the non-carrier range. This observation implies the existence of previously unknown RBC variation that impacts P. falciparum growth, which may have cumulative effect sizes comparable to those of known disease alleles.
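The normalization described above, which expresses growth relative to the non-carrier mean, can be illustrated in a simplified form. This sketch assumes a single batch factor (the week of the experiment). It divides each sample's parasite multiplication rate (PMR) by the PMR of the repeated control sample measured in the same week, then rescales so that the non-carrier mean equals 100%. The study's actual correction used linear models of several batch-specific factors (Figure 2—figure supplement 1), which this sketch does not reproduce. The record fields and donor IDs are hypothetical.

```python
from statistics import mean

def normalize_growth(records, control_id):
    """Return {donor: growth as % of the non-carrier mean} after control-ratio correction.

    records: dicts with hypothetical keys 'donor', 'week', 'pmr', 'carrier' (bool).
    Assumes one measurement per donor (repeat donors would overwrite each other here).
    """
    # PMR of the repeated control sample in each week
    control = {r["week"]: r["pmr"] for r in records if r["donor"] == control_id}
    # Express every other sample relative to its own week's control
    rel = [dict(r, rel=r["pmr"] / control[r["week"]])
           for r in records if r["donor"] != control_id]
    # Rescale so that the average non-carrier sample equals 100%
    ref = mean(r["rel"] for r in rel if not r["carrier"])
    return {r["donor"]: 100.0 * r["rel"] / ref for r in rel}

records = [
    {"donor": "ctrl", "week": 1, "pmr": 8.0, "carrier": False},
    {"donor": "ctrl", "week": 2, "pmr": 4.0, "carrier": False},
    {"donor": "A", "week": 1, "pmr": 8.0, "carrier": False},
    {"donor": "B", "week": 2, "pmr": 4.0, "carrier": False},
    {"donor": "C", "week": 1, "pmr": 4.0, "carrier": True},
]
print(normalize_growth(records, "ctrl"))
```

In this toy example, donors A and B have different raw PMRs but the same value relative to their week's control, so both come out at 100%. Carrier C grows half as well as the week-1 control and comes out at 50%.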

Figure 2. Plasmodium falciparum replication rate varies widely among donor RBCs.

(A, B) Growth of P. falciparum lab strain 3D7 (A) or clinical isolate Th.026.09 (B) over a full 48-hr cycle in donor RBCs (see Figure 1C). Growth is presented relative to the average non-carrier rate after correction for batch effects (Figure 2—figure supplement 1; see Materials and methods), including comparison to a repeated RBC control shown in gray. Each carrier group was compared to unrelated non-carriers using Student’s t-test, except in cases where N=1, where asterisks instead indicate the percentile of the non-carrier distribution. Repeated measurements of 11 donors are shown in Figure 2—figure supplement 2. (C) Per-sample growth rates are correlated between the two P. falciparum strains. (D–F) As in (A–C) but for P. falciparum invasion efficiency (see Figure 1C). R2 and p-values are derived from OLS regression. *p<0.1; **p<0.05; ***p<0.01. RBC, red blood cell.
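The two kinds of comparison in this legend can be computed with the Python standard library alone. For carrier groups with N>1, the sketch computes a pooled-variance Student's t statistic. For a single carrier, it reports the percentile of that carrier's value within the non-carrier distribution. Converting t to a p-value requires the t distribution with n1+n2−2 degrees of freedom (available, for example, in SciPy), which this sketch omits. The percentile convention used here (the share of non-carriers strictly below the value) is an assumption, and the function names are hypothetical.

```python
import math
from statistics import mean, variance

def student_t(carriers, non_carriers):
    """Pooled-variance two-sample t statistic (Student's t-test, equal variances)."""
    n1, n2 = len(carriers), len(non_carriers)
    pooled = ((n1 - 1) * variance(carriers)
              + (n2 - 1) * variance(non_carriers)) / (n1 + n2 - 2)
    return (mean(carriers) - mean(non_carriers)) / math.sqrt(pooled * (1 / n1 + 1 / n2))

def percentile_in(value, non_carriers):
    """Percent of non-carrier values strictly below a single carrier's value."""
    return 100.0 * sum(v < value for v in non_carriers) / len(non_carriers)

print(student_t([1, 2, 3], [4, 5, 6]))
print(percentile_in(4.5, [4, 5, 6]))
```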

Figure 2—figure supplement 1. Linear models of batch effects on parasite fitness.

PMR: parasite multiplication rate.
Figure 2—figure supplement 2. Repeatability of parasite assays in the same donors over time.

Twelve participants (including the weekly control, 1111) donated blood in multiple weeks for independent experiments. Repeated donors were non-carriers except 6443, who carried the HbAC allele; and 7160, 8597, and 4278, who carried alleles for mild G6PD deficiency. Growth and invasion data are shown after standardization and batch correction, as described in Materials and methods. Pearson’s rho is calculated between the first and second assays and can range from –1 to 1.

We observed a similarly wide range in the efficiency of P. falciparum invasion into donor RBCs (Figure 2D–F). Invasion is defined here as the fold-change in parasitemia over the first 24 hours of the assay, when parasites previously maintained in standard culture conditions egressed and invaded new donor RBCs (Figure 1C). Among non-carriers, invasion rates ranged from 70% to 143% for 3D7 (SD=14.9%) and 41% to 193% for Th.026.09 (SD=29.1%) (Figure 2D–E). In contrast to growth, no disease allele reduced invasion enough to fall outside the broad non-carrier range. HbAC was associated with an 11% decrease in 3D7 invasion (p=0.008), while α-thalassemia trait was associated with a 22% increase in Th.026.09 invasion (p=0.091). Only HE had a strong effect on the invasion efficiency of both strains. The correlation of invasion efficiencies between strains was weaker than for growth (Figure 2F, R2=0.10, p=6×10⁻⁴), potentially reflecting strain-specific differences in the pathways used for invasion (Wright and Rayner, 2014). However, we also observed greater batch effects (Figure 2—figure supplement 1) and greater variability between repeated samples (Figure 2—figure supplement 2) for invasion than for growth, suggesting that invasion measurements are subject to greater experimental noise.
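Both fitness components reduce to ratios of flow-cytometry parasitemia readings. Their repeatability across weeks is summarized with Pearson's correlation (Figure 2—figure supplement 2). The sketch below assumes three parasitemia readings per sample: one at the start of the assay, one about 24 hours later, and one after a further full 48-hr cycle. The study's exact timing, any background subtraction, and its normalization may differ (see Materials and methods). The function names are hypothetical.

```python
import math

def fitness_components(p_start, p_24h, p_end):
    """Invasion and growth as fold-changes in parasitemia (fraction of infected RBCs)."""
    invasion = p_24h / p_start  # schizonts egress and invade the donor RBCs
    growth = p_end / p_24h      # multiplication over one full 48-hr cycle in donor RBCs
    return invasion, growth

def pearson(x, y):
    """Pearson correlation, e.g. between first and second assays of the same donors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

print(fitness_components(0.01, 0.02, 0.10))
print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```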

RBC phenotypes vary widely among non-carriers

To assess phenotypic variation across donor RBCs, we measured 22 common indices of RBC size and hemoglobin content from complete blood counts using an ADVIA hematology analyzer (Figure 3A–D; Figure 3—figure supplements 1 and 2). Mean corpuscular volume (MCV) and mean corpuscular hemoglobin (MCH, the mass of hemoglobin per cell) are closely related traits, which can be represented together as cellular hemoglobin concentration (CHCM) or as the fraction of RBCs with ‘normal’ hemoglobin and volume indices (M5). As expected, each known disease allele was associated with a distinct set of RBC abnormalities (Figure 3—figure supplement 3). These included elevated CHCM for HbAC (p=0.033), consistent with dehydration, and very low MCV (p=6.8×10⁻⁵) and MCH (p=2.5×10⁻⁷) for α-thalassemia trait (−α/−α), consistent with microcytic anemia (Galanello and Cao, 2011). RBCs from the HE patient also had very low MCV and MCH, reflecting the membrane breakage and volume loss characteristic of this disease. For all these phenotypic measures, we also observed broad distributions in non-carriers that overlapped the distributions of most carriers (Figure 3A–D; Figure 3—figure supplements 1 and 2). Notably, the breadth of the non-carrier distribution for each phenotype was large (e.g., a 24 fl range for MCV) compared to the average difference between Africans and Europeans (e.g., 3–5 fl; Beutler and West, 2005; Lo et al., 2011). This wide diversity and the substantial overlap between non-carrier and carrier traits suggest that healthy RBCs exist on the same phenotypic continuum as RBCs carrying known disease alleles.
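The relationship between cell volume, hemoglobin mass, and hemoglobin concentration can be made concrete with the standard calculated index MCHC = MCH / MCV. Because 1 pg/fl equals 100 g/dl, the conversion factor is 100. CHCM is measured cell by cell by the ADVIA analyzer rather than calculated this way, so it need not equal the calculated value exactly. The sketch shows only the calculated index, and the function name is hypothetical.

```python
def mchc_g_per_dl(mch_pg, mcv_fl):
    """Calculated mean corpuscular hemoglobin concentration (g/dl).

    MCH (pg per cell) / MCV (fl per cell) gives pg/fl, and 1 pg/fl = 1000 g/l = 100 g/dl.
    """
    return 100.0 * mch_pg / mcv_fl

# The same hemoglobin mass packed into a smaller cell gives a higher concentration
print(round(mchc_g_per_dl(30, 90), 1))  # 33.3
print(round(mchc_g_per_dl(30, 80), 1))  # 37.5
```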

Figure 3. Red cell phenotypes that are abnormal in carriers also vary widely among non-carriers.

(A–D) Red cell indices were measured by an ADVIA hematology analyzer. Additional indices are shown in Figure 3—figure supplement 1. MCV: mean corpuscular (RBC) volume; MCH: mean corpuscular hemoglobin; CHCM: cellular hemoglobin concentration mean; M5: fraction of RBCs with normal volume and normal hemoglobin (see Figure 3—figure supplement 2). Statistical tests as in Figure 2. (E, F) Osmotic fragility curves. Fragility is defined as the NaCl concentration at which 50% of RBCs lyse (see Figure 3—figure supplement 4). (G, H) Ektacytometry curves characterize RBC deformability and dehydration under salt stress (Figure 3—figure supplement 5). A heatmap of all phenotypes by carrier status is available in Figure 3—figure supplement 3. RBC, red blood cell.

Figure 3—figure supplement 1. RBC indices data from complete blood counts in donor RBCs.

Donor classifications and statistical tests as in Figure 3. HCT, hematocrit; HDW, hemoglobin distribution width; HGB, hemoglobin; MCHC, mean corpuscular hemoglobin concentration; MPV, mean platelet volume; PLT, platelet count; RBC, red blood cell; RDW, red cell distribution width.
Figure 3—figure supplement 2. RBC Matrix data from complete blood counts in donor RBCs.

Donor classifications and statistical tests as in Figure 3. RBC, red blood cell.
Figure 3—figure supplement 3. Heatmap of RBC phenotypes by carrier status.

Phenotypes were scaled and centered at 0 with the preProcess function from the caret package in R using all sample data (i.e., not limited to non-carriers). The mean value for each carrier group is shown. RBC, red blood cell.
Figure 3—figure supplement 4. Osmotic fragility diagram and summary data.

Donor classifications and statistical tests as in Figure 3.
Figure 3—figure supplement 5. Ektacytometry diagram and summary data.

Donor classifications and statistical tests as in Figure 3.

We observed similar patterns of variation in RBC membrane fragility (Figure 3E–F; Figure 3—figure supplement 4) and membrane deformability (Figure 3G–H; Figure 3—figure supplement 5), as measured with osmotic fragility tests and osmotic gradient ektacytometry. Both sets of curves represent RBC tolerance to osmotic stress, which can result in swelling and lysis (fragility, Figure 3E–F) or dehydration and decreased deformability (Ohyper, Figure 3G–H). Specific hemoglobinopathies were associated with moderate to strong reductions in fragility, deformability, and/or resistance to loss of deformability when dehydrated (Figure 3—figure supplements 3–5). HE cells were both extremely fragile and extremely non-deformable. In non-carriers, the distributions for all membrane measures were wide, continuous, and overlapped the distributions of most carriers (Figure 3E–H). Overall, these data demonstrate that multiple phenotypic alterations associated with RBC disease alleles are also present in non-carrier RBCs.
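Summary parameters such as fragility (the NaCl concentration at which 50% of RBCs lyse) and Ohyper are read off these curves. A minimal sketch follows. It assumes linear interpolation between measured points and the conventional ektacytometry definition of Ohyper: the osmolality on the hypertonic side of the curve at which the deformability index falls to half of DImax. Instrument software may smooth or fit the curves differently, and the function names are hypothetical.

```python
def interpolate_crossing(xs, ys, target):
    """First x (scanning upward) where the piecewise-linear curve through (xs, ys) hits target."""
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if y0 != y1 and (y0 - target) * (y1 - target) <= 0:
            return x0 + (target - y0) * (x1 - x0) / (y1 - y0)
    return None  # the curve never reaches the target

def o50(nacl_mm, lysis_pct):
    """Osmotic fragility: NaCl concentration (mM) at which 50% of RBCs are lysed."""
    return interpolate_crossing(nacl_mm, lysis_pct, 50.0)

def o_hyper(osmolality, di):
    """Osmolality above the DImax peak where the deformability index drops to DImax / 2."""
    dimax = max(di)
    k = di.index(dimax)
    # Restrict the search to the hypertonic (dehydrating) side of the curve
    return interpolate_crossing(osmolality[k:], di[k:], dimax / 2)

print(o50([60, 70, 80, 90], [100, 80, 40, 0]))                       # 77.5
print(o_hyper([100, 200, 300, 400, 500], [0.1, 0.4, 0.6, 0.4, 0.2]))
```

Under this definition, a larger Ohyper means that a cell keeps half of its maximal deformability until it reaches a higher osmolality, which matches the description of Ohyper as resistance to dehydration.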

Non-carrier variation in RBC phenotypes predicts P. falciparum replication rate

To identify sets of phenotypes associated with P. falciparum replication in non-carrier RBCs, we used LASSO (Least Absolute Shrinkage and Selection Operator), a penalized regression method that performs both regularization and variable selection (Tibshirani, 1994). Briefly, LASSO penalizes the absolute size of the regression coefficients, which shrinks the coefficients of weakly informative predictors to exactly zero and leaves a subset of predictors (in this case, phenotypes) that minimizes prediction error. This method is well suited to data sets in which predictors are correlated, as RBC size, hemoglobin, and membrane dynamics are, and in which the number of possible predictors is large relative to the number of measurements. To validate the RBC phenotypes associated with P. falciparum replication by LASSO, we performed 10-fold cross-validation (CV), randomly dividing the non-carrier data into 10 folds 10,000 times to generate training and test sets (see Materials and methods). To further control for overfitting, we also applied the same procedure to 1000 random permutations of the parasite data. Finally, for each trait selected by LASSO in at least 40% of training sets, we applied univariate OLS regression to estimate the sign of its effect on all measured components of parasite fitness. The highest-confidence results from this analysis are summarized in Figure 4A, with complete details provided in Figure 4—source data 1.
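The shrinkage step can be made concrete with a small implementation. The sketch below fits LASSO by cyclic coordinate descent, where soft-thresholding sets weak predictors exactly to zero. It then counts how often each predictor is selected across repeated random 10-fold splits, which is the quantity behind the 40% threshold described above. This is a simplified illustration, not the study's code. It assumes a fixed penalty `lam` (in practice the penalty is usually tuned by internal cross-validation, for example with `cv.glmnet` in R), assumes predictors are already on comparable scales, and uses far fewer repeats than the study's 10,000 divisions. All names are hypothetical.

```python
import random

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1 on centered data."""
    n, p = len(X), len(X[0])
    ybar = sum(y) / n
    resid = [v - ybar for v in y]  # residual equals centered y while all coefficients are 0
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    col_sq = [sum(row[j] ** 2 for row in Xc) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue  # a constant predictor stays at zero
            # Covariance of predictor j with the residual that excludes j's own contribution
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new
    return beta

def selection_frequency(X, y, lam, n_repeats=20, k=10, seed=1):
    """Fraction of training sets (k-1 of k folds) in which each predictor is nonzero."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts, fits = [0] * p, 0
    for _ in range(n_repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        for f in range(k):
            held_out = set(idx[f::k])  # one fold is held out as the test set
            train = [i for i in idx if i not in held_out]
            beta = lasso([X[i] for i in train], [y[i] for i in train], lam)
            counts = [c + (b != 0.0) for c, b in zip(counts, beta)]
            fits += 1
    return [c / fits for c in counts]

# Toy data: y depends only on the first predictor
X = [[i, i % 2] for i in range(1, 21)]
y = [2.0 * i for i in range(1, 21)]
print(selection_frequency(X, y, lam=1.0))  # [1.0, 0.0]
```

Predictors with a selection frequency of at least 0.4 would pass the 40% threshold used in the text. The same loop run on permuted `y` values gives the kind of overfitting control described above.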

Figure 4. RBC phenotypes predict Plasmodium falciparum fitness in non-carriers.

(A) Phenotypes selected by LASSO in at least 40% of train data sets (blue shading; see Materials and methods) in at least one of four models of parasite replication (columns). Each model was trained on ~90% of the data and tested on the remaining 10% (B, C). (+/−) shows the direction of effect if the phenotype was significantly correlated (p<0.1) with the parasite fitness component in a separate, univariate linear model (Figure 4—figure supplement 1). MCV: mean RBC volume (fl). MCH: mean corpuscular hemoglobin (pg/RBC). O50: Osmotic fragility (mM NaCl; see Figure 3—figure supplement 4). DImax: Maximum membrane deformability (arbitrary units; see Figure 3—figure supplement 5). Ohyper: Tendency to resist osmotic dehydration and loss of deformability. M4: fraction of RBCs with normal volume and low hemoglobin (see Figure 3—figure supplement 2). M6: fraction of RBCs with normal volume and high hemoglobin. M8: fraction of RBCs with low volume and normal hemoglobin. CHCM: cellular hemoglobin concentration mean (g/dl). MCHC: mean corpuscular hemoglobin concentration (g/dl). PLT: platelet number (×10³/µl). MPV: mean platelet volume (fl). RBC: red cell number (×10⁶/µl). HCT: hematocrit, or the fraction of blood volume composed of RBCs. RDW: red cell distribution width (%). (B, C) Variance in parasite fitness explained by RBC phenotypes in LASSO models. Dashed lines indicate average R2 for the measured test data. Each histogram shows the same procedure on 1000 permutations of the measured test data. RBC, red blood cell.

Figure 4—source data 1. Association statistics for individual phenotypic predictors with non-zero LASSO support.

Figure 4—figure supplement 1. Scatterplots of RBC phenotypes against parasite fitness in non-carriers.

Phenotypes from Figure 4A are shown if at least one OLS test had p<0.1. Lines of best fit are shown if p<0.1. All phenotypes except Fragility and Donor Age are clustered around the median as a result of normalization, which equalized the median value across weeks (see Materials and methods). Pink: 3D7; Orange: Th.026.09. RBC, red blood cell.

P. falciparum fitness in non-carrier RBCs was strongly predicted by variation in traits related to volume, hemoglobin, deformability, and dehydration (Figure 4A). Among 25 tested phenotypes, the most strongly predictive was the ektacytometry parameter Ohyper, which represents a cell’s tendency to retain deformability in the face of dehydration (Figure 3—figure supplement 5). In univariate models, non-carrier RBCs with the largest Ohyper values (i.e., those that retained more deformability when dehydrated) supported 22–46% faster parasite growth (3D7 p=0.003, Th.026.09 p=0.007) and 31–83% more effective invasion (3D7 p=0.008, Th.026.09 p=0.005) than RBCs with the smallest Ohyper values, which quickly lost deformability. Consistent with this result, P. falciparum replication was inhibited in RBCs that were more dehydrated at baseline (e.g., with higher CHCM; 3D7 invasion p=0.006, 3D7 growth p=0.024, Th.026.09 invasion p=0.004, Th.026.09 growth p=0.26). Parasites also grew faster in RBCs with larger mean volume (MCV; 3D7 p=0.001, Th.026.09 p=0.0009), a greater mass of hemoglobin per cell (MCH; 3D7 p=0.071, Th.026.09 p=0.016), and more deformable membranes (DImax; 3D7 p=0.0005, Th.026.09 p=0.008). 3D7 growth was also reduced in RBCs with more fragile membranes (O50; p=0.005). Additional phenotypes related to platelets and RBC density were selected for some models, but the direction of their effects was unclear when they were considered individually (Figure 4A, Figure 4—source data 1). These results indicate that common, non-pathogenic variation in RBC size, membrane dynamics, and other correlated traits has meaningful effects on P. falciparum replication rate in RBCs.
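Each univariate follow-up test is a least-squares fit of one fitness component on one phenotype. The sketch below returns the slope, whose sign gives the direction of effect, and the fitted difference in growth between the donors with the largest and smallest phenotype values. Summarizing an effect as the contrast between fitted extremes is an assumption made here for illustration. The percentages reported in the text may have been computed differently, and p-values are omitted. The function names are hypothetical.

```python
def ols_fit(x, y):
    """Slope and intercept of the ordinary least-squares line y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

def fitted_extreme_contrast(phenotype, growth_pct):
    """Fitted growth at the maximum minus at the minimum phenotype value (percentage points)."""
    slope, _ = ols_fit(phenotype, growth_pct)
    return slope * (max(phenotype) - min(phenotype))

ohyper = [1.0, 2.0, 3.0, 4.0]           # hypothetical phenotype values
growth = [90.0, 95.0, 100.0, 105.0]     # growth as % of the non-carrier mean
print(ols_fit(ohyper, growth))          # (5.0, 85.0): positive slope, faster growth
print(fitted_extreme_contrast(ohyper, growth))  # 15.0
```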

Taken together, the non-carrier phenotypes selected by LASSO from training data (N~61, Figure 4A) explained 3–9% of the variation in parasite growth in separate test data (N~7, Figure 4B; 3D7 p=0.008 and RMSE=18.0%; Th.026.09 p=0.079 and RMSE=10.8%). This fraction was significantly greater than expected from random permutations, which were centered on R2=0 in the test data (Figure 4B). Notably, prediction error was greater for individuals with parasite growth values farther away from the mean. In contrast, for invasion, RBC phenotypes did not explain more variation in the test data than expected from permutations (Figure 4C; 3D7 p=0.79 and RMSE=15.0%; Th.026.09 p=0.53 and RMSE=29.5%). All phenotype models were less predictive for the clinical isolate Th.026.09 than the lab strain 3D7, perhaps because clinical isolates are less adapted to laboratory conditions. Overall, these results demonstrate that multiple, variable phenotypes impact P. falciparum susceptibility in healthy RBCs. Non-carrier cells that are less hospitable to parasites share specific traits with RBCs that carry disease alleles, including smaller size, decreased deformability, and an increased tendency to lose deformability when dehydrating.
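Test-set performance in this paragraph is summarized by R2 and the root-mean-square error (RMSE), both in units of normalized growth. The minimal sketch below assumes that R2 is computed as 1 − SS_res/SS_tot on the held-out donors. Other definitions, such as the squared correlation between predicted and observed values, are also common. The function names are hypothetical.

```python
from statistics import mean

def r_squared(observed, predicted):
    """Coefficient of determination on held-out data; negative if worse than the mean."""
    m = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - m) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def rmse(observed, predicted):
    """Root-mean-square prediction error, in the units of the observations."""
    return (sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)) ** 0.5

# Predicting every test donor at the observed mean gives R2 = 0
print(r_squared([90, 100, 110], [100, 100, 100]))  # 0.0
print(rmse([90, 100, 110], [100, 100, 100]))
```

Under this definition, a model with no real signal, such as one fit to permuted data, predicts values close to the mean. Its R2 therefore lands near zero, consistent with the permutation histograms in Figure 4B.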

Common RBC alleles predict P. falciparum replication in non-carriers

Next, we tested whether non-carrier genotypes derived from exome sequencing could improve our predictions of P. falciparum replication rate. With a sample of 68 unrelated non-carriers, we lacked the power to perform the many thousands of tests that are typical in large genetic association studies (Fadista et al., 2016). Instead, our study design focused on 23 RBC proteins previously associated with malaria (Figure 5—source data 1), which we hypothesized are enriched for common variants impacting P. falciparum fitness, as compared to random control sets of RBC proteins. We used the same LASSO procedure described above to test 106 unlinked genetic variants (pairwise r2<0.1) in these 23 RBC proteins, along with RBC phenotypes, for association with P. falciparum fitness in non-carriers. To test for the effects of population structure, we also included the top 10 principal components (PCs) from 1000 Genomes as possible predictors. Notably, PC1 is equivalent to the exome-wide fraction of African ancestry, as determined by ADMIXTURE with K=4 from the 1000 Genomes reference populations (see Materials and methods). We again compared these results to permuted data, as well as to 1,000 sets of 23 genes drawn at random from the RBC proteome (Figure 5—source data 2).
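Restricting the analysis to unlinked variants (pairwise r2<0.1) can be sketched as a greedy filter on genotype dosages. This version assumes dosages coded 0/1/2, keeps variants in the order given, and skips monomorphic sites. Dedicated tools such as PLINK's `--indep-pairwise` option perform similar pruning within sliding windows, and the study's exact procedure may differ. The function names and toy variants are hypothetical.

```python
def r2(a, b):
    """Squared Pearson correlation between two dosage vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov * cov / (va * vb)

def prune_unlinked(variants, threshold=0.1):
    """Greedily keep variants whose r2 with every previously kept variant is < threshold."""
    kept = []
    for name, dosage in variants:
        if len(set(dosage)) == 1:
            continue  # monomorphic in this sample: uninformative
        if all(r2(dosage, other) < threshold for _, other in kept):
            kept.append((name, dosage))
    return [name for name, _ in kept]

v1 = [0, 1, 2, 0, 1, 2]
v3 = [0, 0, 0, 2, 2, 2]   # uncorrelated with v1 in this toy sample
print(prune_unlinked([("a", v1), ("b", list(v1)), ("mono", [1] * 6), ("c", v3)]))  # ['a', 'c']
```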

Taken together, genotypes and phenotypes selected by LASSO explained 7–15% of the variation in parasite growth rate in the test data (Figure 5B; 3D7 p=0.012 and RMSE=16.5%; Th.026.09 p=0.063 and RMSE=10.6%). Prediction error was greater for donors with parasite values farther away from the mean, though this trend was weaker than for phenotype-only models. The variance explained by models using real genotype and phenotype data was significantly larger than expected from permutation (Figure 5—figure supplement 1A) and random sets of RBC genes (Figure 5B), suggesting that the 23 malaria-related genes contain variation that influences P. falciparum development.
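Comparing the observed R2 with that of models built from random RBC gene sets, or from permuted data, yields an empirical p-value. The sketch below makes two assumptions. Gene sets are drawn without replacement from a proteome list. The p-value uses the common (1 + count) / (1 + N) correction, so it is never exactly zero. Fitting a LASSO model for each random set is omitted, and the gene names in the example are placeholders.

```python
import random

def random_gene_sets(proteome, set_size=23, n_sets=1000, seed=0):
    """Draw n_sets gene sets of set_size distinct genes each from the proteome list."""
    rng = random.Random(seed)
    return [rng.sample(proteome, set_size) for _ in range(n_sets)]

def empirical_p(observed, null_values):
    """One-sided empirical p-value: share of null statistics at least as large as observed."""
    exceed = sum(v >= observed for v in null_values)
    return (1 + exceed) / (1 + len(null_values))

proteome = [f"gene{i}" for i in range(100)]        # placeholder gene names
sets = random_gene_sets(proteome, n_sets=5)
print(len(sets), len(sets[0]))                     # 5 23
print(empirical_p(0.12, [0.01, 0.03, 0.15, 0.02])) # (1 + 1) / (1 + 4) = 0.4
```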

Figure 5. Common variation in malaria-associated genes predicts Plasmodium falciparum fitness in non-carrier RBCs.

(A) Variants in 23 malaria-related genes (Figure 5—source data 1) and genetic PCs selected by LASSO in at least 40% of train data sets. Each model was trained on ~90% of the measured data and tested on the remaining 10% (B, C). The following genes had no associated variants in non-carriers: CD55, EPB41, FPN, G6PD, GYPA, GYPE, HBA1/2, HBB, and HP. *The only significant PC association was driven by a single East Asian donor (Figure 5—figure supplement 5). (B, C) Variance in parasite fitness explained by LASSO models including 23 malaria-related genes, the top 10 PCs, and RBC phenotypes. Dashed lines indicate average R2 for models using the measured test data. Each histogram shows R2 for models including variants from 23 random genes in the RBC proteome (Figure 5—source data 2) instead of malaria-related genes. All predictors with non-zero LASSO support are shown in Figure 5—source data 3. Additional histograms from permuted data are shown in Figure 5—figure supplement 1. The variance explained by variants undiscovered by previous GWAS is shown in Figure 5—figure supplement 4. GWAS, genome-wide association studies; PC, principal component; RBC, red blood cell.

Figure 5—source data 1. Twenty-three RBC genes with strong links to malaria in the literature.
Figure 5—source data 2. Proteins present in mature RBCs.
This list was derived from the Red Blood Cell Collection database (rbcc.hegelab.org) using a medium-confidence filter.
Figure 5—source data 3. All genetic and phenotypic predictors with non-zero LASSO support.
Growth predictors selected in at least 40% of train data sets are indicated in bold. Genetic predictors are summarized in Figure 5A. NA indicates predictors that were only present as singletons in the smaller invasion data set.

Figure 5—figure supplement 1. Variance in parasite fitness explained by permuted data in LASSO models.

Each model was trained on ~90% of the measured data and tested on the remaining 10%. Dashed lines indicate average R2 for the measured test data. Each histogram shows the same procedure on 1,000 permutations of the measured test data.
Figure 5—figure supplement 2. Lack of association between RBC dehydration phenotypes and PIEZO1 rs59446030 or ATP2B4 rs1419114.


Each cell shows the p-value from a linear model between the genetic variant and trait in non-carriers. RBC, red blood cell.
Figure 5—figure supplement 3. Three non-carrier variants with potentially overdominant effects on 3D7 growth.


Homozygotes for the minor allele were ignored when estimating effect sizes for these alleles with OLS for Figure 6E. Effect size estimates that include all homozygotes are shown in Figure 5—source data 3.
Figure 5—figure supplement 4. Variants undiscovered by previous GWAS drive most of the association signal between parasite replication rate and the 23 malaria-related genes.


‘Variance explained’ is the R² of a linear model in non-carriers (excluding family members) using only these variants as predictors. Details on the variants and previous GWAS traits are provided in Figure 5—source data 3. GWAS, genome-wide association studies.
Figure 5—figure supplement 5. An outlier individual for PC2 drives an apparent association between PC2 and 3D7 growth.


See also Figure 1A.
Figure 5—figure supplement 6. A six-member family has unique ancestry and parasite susceptibility compared to other non-carrier donors.


Only PCs that distinguish the family from other non-carriers are shown. P-values are derived from t-tests. PC, principal component.

Nearly all of the 32 polymorphisms selected by LASSO in growth models occurred in (1) ion channels and pumps, which regulate RBC hydration; (2) components of the flexible RBC membrane backbone; or (3) red cell plasma membrane proteins, including known invasion receptors (Figure 5A). In the first category, the highly polymorphic ion channel PIEZO1 contained seven polymorphisms associated with small (<3.7%) to moderate (31%) reductions in P. falciparum growth rate. In practice, the smallest effect size that could be reliably determined for an allele with our data was ±3.7% (Figure 5—source data 3). The microsatellite variant PIEZO1-E756del, which has been a focus of several recent studies (Ilboudo et al., 2018; Ma et al., 2018; Rooks et al., 2019; Nguetse et al., 2020), predicted a moderate reduction in Th.026.09 growth (–7.9%, p=0.01) but was not related to RBC dehydration in these data (Figure 5—figure supplement 2). For 3D7, we also detected one growth-associated variant in ATP2B4 (–5.9%, p=0.075), which encodes PMCA4b, the primary RBC calcium pump. This variant tags an ATP2B4 haplotype implicated by GWAS in protection from severe malaria and in many RBC phenotypes (van der Harst et al., 2012; Li et al., 2013b; Lessard et al., 2017; Lin et al., 2020; Timmann et al., 2012; Zámbó et al., 2017). Notably, however, this variant has not previously been shown to be associated with P. falciparum fitness in a functional assay.
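The selection rule used for Figure 5A can be sketched in a few lines of Python. Under that rule, a predictor counts as selected if LASSO keeps it in at least 40% of models, each trained on ~90% of donors. This is a minimal illustration under stated assumptions and not the authors' pipeline. The penalty `alpha` is a fixed, hypothetical value rather than one tuned by cross-validation. Predictors are standardized within each training split. The solver is a plain coordinate-descent LASSO written with NumPy, and the function names are illustrative.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Soft-thresholding operator: shrinks z toward zero by gamma."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def lasso_cd(X, y, alpha, n_iter=100):
    """LASSO by cyclic coordinate descent.

    Minimizes (1 / (2n)) * ||y - X b||^2 + alpha * ||b||_1, assuming the
    columns of X are standardized and y is centered (so no intercept).
    """
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.astype(float)                  # invariant: resid = y - X @ beta
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:               # constant column: never selected
                continue
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, alpha) / col_sq[j]
            resid += X[:, j] * (beta[j] - new_bj)
            beta[j] = new_bj
    return beta


def selection_frequency(X, y, alpha=0.15, n_splits=100, train_frac=0.9, seed=0):
    """Fraction of random ~90% training splits in which each predictor is nonzero."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    n_train = int(round(train_frac * n))
    counts = np.zeros(p)
    for _ in range(n_splits):
        idx = rng.permutation(n)[:n_train]
        Xt, yt = X[idx], y[idx]
        mu, sd = Xt.mean(axis=0), Xt.std(axis=0)
        sd[sd == 0] = 1.0                    # avoid dividing by zero
        beta = lasso_cd((Xt - mu) / sd, yt - yt.mean(), alpha)
        counts += beta != 0
    return counts / n_splits


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # 68 simulated donors x 20 simulated variants coded 0/1/2; only column 0 matters.
    X = rng.binomial(2, 0.3, size=(68, 20)).astype(float)
    y = -0.8 * X[:, 0] + rng.normal(scale=0.5, size=68)
    freq = selection_frequency(X, y)
    print("selected columns:", np.flatnonzero(freq >= 0.4))
```

The 40% cutoff is applied to the returned frequencies, so raising or lowering it trades sensitivity against false selections without refitting any models.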

SPTA1 and SPTB, which encode the flexible spectrin backbone of RBCs, contained several variants associated with the growth of at least one P. falciparum strain, as did the structural linker genes ANK1, SLC4A1, and EPB42 (Figure 5A). We also identified a total of 10 polymorphisms in ABCB6, GYPB, GYPC, CR1, CD44, and basigin (BSG) that were associated with P. falciparum growth. These plasma membrane proteins have all been previously implicated in P. falciparum invasion by genetic deficiency studies (Mayer et al., 2009; Crosnier et al., 2011; Egan et al., 2015; Egan et al., 2018), and in some cases, studies of natural polymorphisms (Nagayasu et al., 2001; Leffler et al., 2017). Notably, two of the variants identified here are synonymous quantitative trait loci (QTL) for CD44 splicing (rs35356320) and BSG expression (rs4682) (GTEx Consortium et al., 2017), further supporting the possibility that they are functional. No associated variants were detected in the other 10 tested genes: the three hemoglobin genes (HBA1, HBA2, and HBB), G6PD, two glycophorins (GYPA and GYPE), CD55, EPB41, FPN, and HP. Taken together, these data indicate that dozens of host genetic variants shape the phenotypic distribution of red cell susceptibility to P. falciparum in non-carriers.

Eighteen of the 32 variants selected by LASSO were synonymous, a proportion not significantly different from that of the input set of 106 variants (p=0.72, two-sided binomial test). Over half of the growth-associated variants have previously been associated with gene expression traits, GWAS traits, or GWAS loci through linkage (Figure 5—source data 3), suggesting that they indeed tag functional polymorphisms. Novel variants nonetheless contribute substantially to the predictive power of these models (Figure 5—figure supplement 4), and nearly all the variants are novel in terms of association with P. falciparum growth rate.
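The synonymous-fraction comparison above is an exact two-sided binomial test. The stdlib-only Python sketch below follows the convention of R's `binom.test`. The null proportion `P_NULL` is a hypothetical placeholder, because the synonymous fraction of the 106-variant input set is not reported in this section. The printed value is therefore not expected to reproduce p=0.72.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binom_test_two_sided(k, n, p):
    """Exact two-sided binomial test.

    Sums the probabilities of every outcome that is no more likely than the
    observed count k; a tiny relative tolerance keeps floating-point ties.
    """
    observed = binom_pmf(k, n, p)
    total = sum(
        binom_pmf(i, n, p)
        for i in range(n + 1)
        if binom_pmf(i, n, p) <= observed * (1 + 1e-7)
    )
    return min(1.0, total)


if __name__ == "__main__":
    # 18 synonymous among 32 selected variants; P_NULL is a hypothetical
    # stand-in for the synonymous fraction of the 106-variant input set.
    P_NULL = 0.5
    print(binom_test_two_sided(18, 32, P_NULL))
```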

In contrast to growth, models of invasion that included genotypic predictors were no more accurate than expected by chance (Figure 5C, p≥0.3; Figure 5—figure supplement 1B, p≥0.15). However, six of the nine RBC invasion receptors contained variants associated with growth (Figure 5A), including a SNP in glycophorin B (GYPB) that has been linked to malaria risk in Brazil (Tarazona-Santos et al., 2011). These patterns likely stem from experimental noise in our measure of invasion (Figure 1C; Figure 2C and F; Figure 2—figure supplements 1 and 2), although we note that our definition of growth involves a re-invasion event (Figure 1C).
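A model can be called "no more accurate than expected by chance" by comparing its test-set R² with R² values obtained after shuffling the measured test data, as in the permutation analysis of 1,000 permuted test sets. The Python sketch below makes three assumptions. R² is taken as 1 − SS_res/SS_tot on held-out data. Only a single train/test split is handled, whereas the paper averages over many splits. The add-one correction to the p-value is also an assumption.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Out-of-sample R^2 = 1 - SS_res / SS_tot (can be negative for poor models)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def permutation_p_value(y_test, y_pred, n_perm=1000, seed=0):
    """Compare test-set R^2 with R^2 after shuffling the measured test values."""
    rng = np.random.default_rng(seed)
    observed = r_squared(y_test, y_pred)
    null = np.array(
        [r_squared(rng.permutation(y_test), y_pred) for _ in range(n_perm)]
    )
    # Add-one correction keeps the estimated p-value above zero.
    p_value = (1 + np.count_nonzero(null >= observed)) / (n_perm + 1)
    return observed, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    y_test = rng.normal(size=7)              # ~10% of a hypothetical 68-donor set
    y_pred = y_test + rng.normal(scale=1.0, size=7)
    print(permutation_p_value(y_test, y_pred))
```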

No PCs of population structure were significantly associated with P. falciparum growth rate (Figure 5—source data 3), including the PC that distinguishes Africans from other populations (PC1, Figure 1A). One PC was selected by LASSO for 3D7 growth, but this association was driven by a single donor with East Asian ancestry and relatively high susceptibility (Figure 5—figure supplement 5). We note that the unique ancestry and extreme phenotypes of the six-member family (Figure 5—figure supplement 6) would have driven additional correlations if family members had not been excluded from the LASSO models. Although the present study is limited by sample size, these apparent associations between global genetic PCs and P. falciparum growth, each driven by donors with distinctive ancestry, suggest that additional functional variants remain to be discovered in many populations.

African ancestry does not predict P. falciparum resistance in red cells

Based on evidence from balanced disease alleles like HbAS, it has been suggested that anti-malarial selection has shaped polygenic red cell phenotypes in African populations as a whole (Goheen et al., 2016; Kanias et al., 2017; Ma et al., 2018; Page et al., 2021). We tested this hypothesis by examining the correlation between African ancestry and P. falciparum fitness in non-carrier RBCs (Figure 6A–D). Surprisingly, we found no evidence that these traits were related, apart from a positive relationship between African ancestry and invasion rate of Th.026.09, the clinical Senegalese strain (p=0.004, R²=0.13, Figure 6D). To understand this result, we next examined how key RBC phenotypes identified in this study (Figure 4A) vary with African ancestry (Figure 6F–H; Figure 6—figure supplement 1). We found that greater African ancestry predicts reduced osmotic fragility (p=1.2×10⁻⁶), reduced RBC dehydration (CHCM p=0.009; MCHC p=0.089), and a greater fraction of ‘overhydrated’ RBCs with normal volume and low hemoglobin (M4 p=0.041). All of these traits actually predict greater red cell susceptibility to P. falciparum (Figure 4A), although together they explain less than 13% of the non-carrier variation in 3D7 growth. The remaining key phenotypes do not vary with African ancestry, which may explain why African ancestry itself is only weakly associated with P. falciparum fitness in non-carrier RBCs (Figure 6A–D).

Figure 6. Little evidence of widespread selection in Africa for slower Plasmodium falciparum replication, protective alleles, or protective phenotypes in non-carriers.

(A–D) Parasite replication versus the exome-wide fraction of African ancestry in non-carriers, determined with ADMIXTURE by comparison to 1000 Genomes reference populations. R² and p-values are shown for OLS regression. (E) Alleles in 23 malaria-related genes that predict slower P. falciparum growth in non-carriers (Figure 5A) are not enriched for higher frequencies in Africa versus Europe. Effect sizes are shown for one allele copy for 3D7 or Th.026.09 growth, whichever was greater. Effect sizes were determined from additive models except for three alleles that appeared overdominant (Figure 5—figure supplement 3). FST was calculated from African and European samples in gnomAD (see Materials and methods). HbAS and the HBA2 deletion are shown for comparison. (F–H) RBC phenotypes associated with P. falciparum growth versus the exome-wide fraction of African ancestry in non-carriers. Slower P. falciparum growth in RBCs is predicted by greater fragility (F), greater dehydration (G), and lower Ohyper (H) (Figure 4A). Additional phenotypes are shown in Figure 6—figure supplement 1.


Figure 6—figure supplement 1. Scatterplots of RBC phenotypes versus African ancestry in non-carriers.


Phenotypes from Figure 4A are shown. R² and p-values were estimated from OLS regression. RBC, red blood cell.

Next, we used allele frequency data from over 54,000 individuals in the gnomAD collection (Karczewski et al., 2020) to test whether the polymorphisms we associated with P. falciparum growth occur at different frequencies in African and European populations. Geographical differences in malaria selection are sometimes hypothesized to have increased the frequency of hundreds or thousands of undiscovered anti-malarial alleles in Africa (Mackinnon et al., 2005; Williams, 2006), as has been shown for several variants causing common RBC disorders (Kariuki and Williams, 2020). To address this hypothesis for non-carrier variation, we calculated FST between Africans and Europeans for 22 alleles with protective effects large enough to be estimated reliably in our sample (≥3.7%; Figure 5—source data 3). We found that 11 of these protective alleles (50%) are more common in Africans, which is no more than expected by chance (p=0.5, one-sided binomial test). The three protective variants with the largest absolute FST values are all more common in Europeans, including a synonymous SPTA1 allele with GWAS associations to several RBC and white blood cell traits. Two protective PIEZO1 variants are more common in Africans, including E756del and a synonymous variant of large effect. Overall, however, we find no evidence that African populations are enriched for non-pathogenic RBC polymorphisms or phenotypes associated with impaired P. falciparum growth in vitro.

Discussion

Healthy RBCs harbor extensive phenotypic and proteomic variation, both within and between human populations. In this study, we demonstrate that this variation underlies a wide range of RBC susceptibility to P. falciparum parasites. Our findings add to a growing understanding of the genetic and phenotypic basis of RBC resistance to P. falciparum, especially for RBCs that lack population-specific disease alleles. These findings suggest new targets for future malaria interventions, in addition to challenging assumptions about the role of malaria selection in shaping human RBC diversity.

Exponential replication of P. falciparum is a significant driver of malaria disease progression (Bejon et al., 2007). Therefore, the ample variation that we observed in this trait in vitro could be relevant for clinical outcomes in endemic regions. Growth inhibition from HbAS, for example, reduces the risk of death from malaria by reducing parasite density in the blood (Allison, 1954; Luzzatto, 2012). While HbAS has a uniquely extreme effect size, we found a threefold range of parasite replication rates among non-carrier RBCs that share substantial overlap with RBCs carrying other protective variants. Although the physiologically complex basis of severe malaria (Okwa, 2012) makes it difficult to estimate the precise contribution of RBC factors to severe malaria risk, the genotypes and phenotypes we have associated with P. falciparum fitness may contribute to malaria susceptibility.

We have shown here that widespread, ‘normal’ variation in RBC hydration and deformability traits is associated with P. falciparum fitness in non-carrier RBCs. Interestingly, the protective phenotypes we detect in non-carrier RBCs are also present in carriers, albeit to a stronger degree (Clark et al., 1983; Mockenhaupt, 2000; Pengon et al., 2018). These results are consistent with experimental manipulations that reduce P. falciparum growth, such as chemical or genetic dehydration of RBCs (Tiffert et al., 2005; Ma et al., 2018). They are also consistent with the protective effect conferred by Dantu, a rare glycophorin variant associated with increased membrane tension (Field et al., 1994; Leffler et al., 2017; Kariuki et al., 2020). Our data expand upon these prior findings by demonstrating for the first time that common, healthy phenotypic variation in RBC traits contributes meaningfully to P. falciparum growth.

In the last decade, several association studies have explored the genetic basis of common variation in RBC traits using large, mostly European cohorts (van der Harst et al., 2012; Astle et al., 2016; Chami et al., 2016; Canela-Xandri et al., 2018; Chen et al., 2020; Vuckovic et al., 2020). These studies agree that the broad distribution of RBC phenotypes in humans is shaped by a large number of common alleles, similar to other complex traits (Boyle et al., 2017). Although the effects of most individual alleles are likely too small to be considered pathogenic on their own, different combinations of alleles may underlie the broad phenotypic variation observed in non-carriers. We cannot rule out the possibility that some extreme phenotypes could be better explained by the presence of large-effect ‘disease’ alleles that remain undiscovered. In particular, our study was not powered to detect rare alleles, which could be an important source of missing heritability (Génin, 2020; Kierczak, 2021). Some RBC phenotypes are also shaped by environmental variation, such as diet and time of day (England et al., 1976; Sennels et al., 2011), which likely diminishes correlations between repeated samples. Although this study cannot distinguish among these explanations for phenotypic variation among non-carrier RBCs, it does suggest that this broad variation is both healthy and functional.

In our linear models of P. falciparum growth, genetic variation in a small number of RBC proteins outperformed measured phenotypic variation among RBCs. This result implies the existence of protective RBC phenotypes that we did not measure (or did not measure with sufficient accuracy), such as quantitative proteomic, transcriptomic, and metabolomic traits that could be addressed by future studies. Approximately half of the polymorphisms we identified are non-synonymous and may therefore exert direct effects on phenotypes like RBC membrane structure or ion transport. The other half were synonymous; these could be linked to coding variants but could also have direct effects on mRNA transcription, splicing, and stability (Sauna and Kimchi-Sarfaty, 2011). Indeed, silent and coding SNPs are equally likely to be associated with human disease (Chen et al., 2010), and many synonymous sites experience strong selection (Supek et al., 2014; Machado et al., 2020). Synonymous SNPs that impact splicing, like rs35356320 in CD44, may also impact protein structure. Some other conceivable RBC phenotypes, such as the dynamics of membrane modification during P. falciparum development, may only become evident in more detailed time-course experiments. The true number of RBC phenotypes that impact P. falciparum may be effectively infinite (Kinsler et al., 2020), so in practice it is useful that genetic variation is more predictive of parasite growth.

One reason our study could identify genetic associations with a modest sample size is that we focused on a relatively well-defined component of a larger disease that lends itself to controlled, in vitro experiments. Another important explanation is our use of LASSO variable selection on a restricted set of polymorphisms in genes with strong existing links to malaria (Flynn et al., 2017). Focusing our tests on a limited number of hypotheses obviated the need to meet an exome-wide significance threshold, while still allowing for the discovery of novel alleles. This approach relies directly on prior knowledge (Figure 5—source data 1) and cannot readily be expanded to explicitly test large numbers of anonymous genetic variants. However, testing fewer hypotheses that are more likely to be true helps ensure that ‘significant’ results are reliable (Ioannidis, 2005). Exome-wide data were still critical in this study for assessing population structure, as well as for performing permutation tests that confirmed an enrichment of signal in our 23 focal genes. However, future studies with many more than 68 non-carriers will be required to discover additional associations in unknown genes, non-genic regulatory variation, and any alleles with smaller effects. It is also important to note that genetic linkage complicates the identification of the exact functional polymorphisms in any population sample (Sohail et al., 2019); as in GWAS, we cannot rule out that some associated variants are merely linked to the true functional variants. Indeed, about half of our associated variants occur in linkage blocks containing other SNPs associated with RBC traits by GWAS. In this way, our evidence most strongly supports the conclusion that 13 specific RBC genes are strongly enriched for polymorphisms with impacts on P. falciparum growth.

The associations we observed for parasite growth were stronger and more significant than the associations for parasite invasion. While batch effects clearly played a role, this may also be due to missing invasion data in 10 non-carrier samples (see Materials and methods) that reduced statistical power. Both technical and biological reasons may drive the relatively greater noise observed in our invasion data. For example, invasion success may depend on the length of time spent outside the incubator during assay set-up as well as the genotypes of both donor and acceptor red cells. The reproducibility of our invasion data is also constrained by low and variable starting parasitemia and a 24-hr time point, which could be substantially improved in future studies using live-cell imaging focused on invasion. Despite these limitations, our ‘growth’ measurement includes a ‘re-invasion’ event, and our growth and invasion measurements are correlated. RBC deformability and dehydration are associated with both fitness components, and SNPs in several canonical invasion receptors are only associated with growth. The invasion data also allow us to highlight unique and interesting trends in the Senegalese clinical strain and in carriers of hemoglobin C.

We also observed weaker associations for the clinical strain Th.026.09 than for the lab strain 3D7. These strains display large differences in absolute growth rate, possibly because Th.026.09 carries costly alleles for drug resistance and 3D7 has had decades longer to adapt to lab conditions (Walliker et al., 1987; Daniels et al., 2012; Moser, 2020). Interestingly, African ancestry predicted higher invasion only for Th.026.09, which might indicate that this strain is better adapted to African RBCs. Despite these differences, we showed that normalized fitness values were significantly correlated between the two strains across donors. Several RBC phenotypes and genotypes that predicted fitness in one strain were also replicated in the other. These results suggest that our findings may be generalizable across divergent strains of P. falciparum, although future studies would benefit from testing many more lab strains and clinical isolates.

One of the unique aspects of our study is the participation of individuals with a range of African ancestry, defined by similarity to donors from five 1000 Genomes reference populations. We found that African ancestry was unexpectedly associated with RBC phenotypes that improved parasite fitness, particularly for Th.026.09. In the future, it would be very interesting to test for local parasite adaptation to human RBCs using P. falciparum strains and RBC samples from around the globe. We also found that the total set of polymorphisms associated with P. falciparum growth by LASSO is not enriched in African populations included in the gnomAD database of human variation. Notably, a recent test of data from a large GWAS for severe malaria (Malaria Genomic Epidemiology Network, 2019) was also unable to demonstrate that natural selection has driven many malaria-protective alleles to higher frequencies in African versus European populations. For the total set of alleles detectable in this study, we offer at least four possible explanations for this unexpected result. First, compared to large-effect disease alleles, the majority of non-pathogenic variants may not have had sufficient time to increase in frequency since P. falciparum began expanding in humans some 5000–10,000 years ago (Sundararaman et al., 2016; Otto et al., 2018). Second, the complexity of severe malaria could mean that the variants discovered here do not substantially impact disease outcome, especially relative to known disease variants. Third, the variants discovered here may have pleiotropic effects on other phenotypes, which are themselves subject to other selective pressures besides malaria resistance. Finally, human adaptation may be too local to detect with coarse-grain sampling of sub-Saharan African genetic diversity (e.g., Pankratov et al., 2020).
Overall, however, our data suggest that few RBC alleles remain to be discovered that are both particularly common in Africa and have large effects on P. falciparum proliferation in RBCs.

More broadly, these data show that it may be inaccurate to make assumptions about RBC susceptibility to P. falciparum based on a person’s race or continental ancestry. These kinds of hypotheses (Williams, 2006; Goheen et al., 2016; Kanias et al., 2017; Ma et al., 2018) are based on well-known examples of balanced disease alleles, which are notable exceptions to the overwhelming genetic similarity of all human populations (Rosenberg et al., 2002; Novembre and Di Rienzo, 2009). In our data, RBC variation that is associated with reduced P. falciparum fitness is clearly not limited to individuals with recent African ancestry. This result is an important reminder that >90% of the total genetic variation among humans occurs within populations, rather than across them (Lewontin, 1972; Rosenberg, 2011); and that the majority of common genetic variation is shared among all human populations (Biddanda et al., 2020).

In conclusion, this study demonstrates that substantial phenotypic and genetic diversity in healthy human RBCs impacts the replication of malaria parasites. Whether or not this diversity is shaped by malaria selection, a better understanding of how P. falciparum biology is impacted by natural RBC variation could help lead to new therapies for one of humanity’s most important infectious diseases.

Materials and methods

Key resources table.

Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information
Biological sample (Homo sapiens) | Primary whole blood samples | This paper | | Freshly drawn from de-identified human subjects into CPDA tubes (IRB #40479)
Strain, strain background (Plasmodium falciparum) | 3D7 | PMID:3299700 | | Obtained from Walter and Eliza Hall Institute, Melbourne, Australia
Strain, strain background (P. falciparum) | Th.026.09 | PMID:22430961 | | Gift from Daouda Ndiaye and Sarah Volkman, Senegal
Commercial assay or kit | DNeasy Blood and Tissue Kit | QIAGEN | |
Commercial assay or kit | KAPA Hyperplus Kit | Roche | |
Commercial assay or kit | SeqCap EZ Prime Exome Kit | Roche | |
Sequence-based reagent | Primers amplifying PIEZO1 exon 17 | PMID:32265284 | |
Software, algorithm | bwa mem | http://arxiv.org/abs/1303.3997 | | 0.7.17-r1188
Software, algorithm | GATK | https://gatk.broadinstitute.org/hc/en-us | | 4.0.0.0
Software, algorithm | vcftools | doi:10.1093/bioinformatics/btr330 | | 0.1.15
Software, algorithm | ANNOVAR | PMID:20601685 | | 2018-04-16
Software, algorithm | PLINK | PMID:17701901 | | v1.90b6.8 64-bit
Software, algorithm | ADMIXTURE | PMID:21682921 | | 1.3.0
Software, algorithm | R | https://www.R-project.org/ | | 3.5.1
Other | SYBR Green I nucleic acid stain | Invitrogen | S7563 |
Other | Drabkin’s Reagent | Ricca Chemical | 2660-32 |

Sample collection and preparation

One hundred and twenty-one subjects with no known history of RBC disorders were recruited to donate blood at the Stanford Clinical and Translational Research Unit. This study size was designed to sample multiple individuals carrying alleles of moderate frequency (5% or higher). Written informed consent was obtained from each subject and/or their parent as part of a protocol approved by the Stanford University Institutional Review Board (#40479). To help control for weekly batch effects, subject 1111 donated fresh blood for each parasite assay. Eleven other subjects donated blood in at least 2 different weeks, constituting biological replicates. Whole blood samples from an HE patient were obtained from Dr. Bertil Glader under a separate approved protocol (Stanford IRB #14004) that permitted sample sharing among researchers. All samples were de-identified upon collection by labeling with a random four- to six-digit code. Two samples were eventually removed from analysis because of a failed sequencing library (6449KD) and a history of stem cell transplant (8715), respectively.
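The design goal above can be checked with a back-of-the-envelope calculation: how many donors among 121 would be expected to carry an allele at 5% frequency? The Python sketch below assumes Hardy–Weinberg proportions and independent sampling of donors. These assumptions are ours. The text does not state how the study size was calculated, and allele frequencies differ across the ancestries represented.

```python
from math import comb


def carrier_probability(q):
    """P(a donor carries at least one copy) for allele frequency q under HWE."""
    return 1 - (1 - q) ** 2


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return 1 - sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k))


n_donors = 121
q = 0.05
p_carrier = carrier_probability(q)                  # 0.0975
expected_carriers = n_donors * p_carrier            # about 11.8 donors
p_two_or_more = prob_at_least(2, n_donors, p_carrier)
print(f"{p_carrier:.4f} {expected_carriers:.1f} {p_two_or_more:.5f}")
```

Under these assumptions, an allele at 5% frequency would be expected in roughly a dozen donors, and at least two carriers would be sampled almost surely.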

Whole blood was drawn into CPDA tubes and spun down within 36 hr to separate serum, buffy coat, and RBCs. RBCs were washed and stored in RPMI-1640 medium (Sigma-Aldrich) supplemented with 25 mM HEPES, 50 mg/L hypoxanthine, and 2.42 mM sodium bicarbonate at 4°C. Buffy coat was transferred directly to cryotubes and stored at –80°C.

Exome sequencing and genotype calling

Genomic DNA was isolated from frozen buffy coats using a DNeasy Blood and Tissue Kit (QIAGEN). Libraries were prepared using a KAPA Hyperplus Kit (Roche) and hybridized to human exome probes using the SeqCap EZ Prime Exome Kit (Roche). The resulting exome libraries were sequenced with paired-end 150 bp Illumina reads on the HiSeq or NextSeq platforms at Admera Health (South Plainfield, NJ).

Reads were aligned to the hg38 human reference genome using bwa mem (Li, 2013a), yielding an average coverage of 42X across targeted exome regions (excluding sample 6449KD). Variants were called using GATK best practices (Van der Auwera et al., 2013) and hard filtered with the following parameters: QD<2.0, FS>60.0, ReadPosRankSum<–2.5, SOR>2.5, MQ<55.0, MQRankSum<–1.0, and DP<500. To minimize the effects of sequencing errors, variants not present in 1000 Genomes, dbSNP_138, or the Mills indel collection (Mills et al., 2006) were discarded. Variants that were significantly more frequent in our sample than in gnomAD African and European populations (Karczewski et al., 2020) were also discarded to avoid false associations from miscalled variants. We also excluded singleton variants from all association analyses, which may have removed some variants unique to other populations. With the remaining variants, we calculated kinship coefficients among all pairs of donors using vcftools --relatedness2. Only the six members of the known family had pairwise coefficients >0.044 (approximately the lower bound for third-degree relatives on this kinship scale), indicating that no other donors were closely related.
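The hard-filtering and relatedness steps above reduce to simple threshold rules. The Python sketch below applies the quoted thresholds to a dictionary of site-level INFO annotations. It also flags donor pairs above the 0.044 kinship cutoff. It is an illustration only, not a replacement for GATK `VariantFiltration`. Treating a missing annotation as passing is an assumption. The input formats (an `info` dict per site and a `kinship` dict keyed by donor pairs) are hypothetical.

```python
# Hard-filter thresholds quoted in the text; a site failing ANY of them is removed.
HARD_FILTERS = {
    "QD": lambda v: v < 2.0,
    "FS": lambda v: v > 60.0,
    "ReadPosRankSum": lambda v: v < -2.5,
    "SOR": lambda v: v > 2.5,
    "MQ": lambda v: v < 55.0,
    "MQRankSum": lambda v: v < -1.0,
    "DP": lambda v: v < 500,
}


def failed_filters(info):
    """Names of the hard filters a site fails (missing annotations pass; assumption)."""
    return [name for name, fails in HARD_FILTERS.items()
            if name in info and fails(info[name])]


def passes_hard_filters(info):
    return not failed_filters(info)


def related_pairs(kinship, threshold=0.044):
    """Donor pairs whose kinship coefficient exceeds the threshold."""
    return sorted(pair for pair, phi in kinship.items() if phi > threshold)


if __name__ == "__main__":
    site = {"QD": 15.3, "FS": 1.2, "SOR": 0.7, "MQ": 60.0, "DP": 4200,
            "MQRankSum": 0.1, "ReadPosRankSum": 0.3}
    print(passes_hard_filters(site))                 # a well-supported site
    print(failed_filters({"QD": 1.5, "DP": 300}))    # fails two filters
```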

PIEZO1 E756del was genotyped via PCR and Sanger sequencing according to a previously published protocol (Nguetse et al., 2020). To call deletion variants that cause α-thalassemia in the paralogous genes HBA2 and HBA1, we extracted reads from each .bam file that lacked any mismatches or soft-clipping and had MAPQ≥13 (i.e., an estimated mapping-error probability of about 5% or less). Coverage with these well-mapped reads was calculated over the 73 and 81 bp of unique sequence in HBA2 and HBA1 and normalized to each sample’s exome-wide coverage. To determine which samples had unusually low coverage, we formed an ad hoc reference panel of seven donors who were unlikely to carry deletion alleles based on their normal MCH, MCV, and HGB and >96% exome-wide European ancestry (Weatherall, 2001). We called heterozygous HBA2 deletions when normalized coverage across three unique regions of the HBA2 gene was below the minimum reference value. Similarly, we called homozygous HBA2 deletions when normalized coverage across the same three regions was less than half of the minimum reference value. This approach resulted in an estimated HBA2 copy number of 2.0 in the reference panel, 0.95 in eight putative heterozygotes, and 0.12 in four putative homozygotes. The same method produced no evidence of HBA1 deletion in any sample.
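The α-thalassemia calling logic can be sketched as follows. Three assumptions apply. Normalized coverage is supplied per unique HBA2 region. A call requires all three regions to fall below the threshold, since the text does not specify how the regions were combined. The helper names are hypothetical. The MAPQ helper shows why MAPQ≥13 corresponds to roughly a 5% mapping-error probability: MAPQ is Phred-scaled, so the error probability is 10^(−MAPQ/10).

```python
def mapq_error_probability(mapq):
    """Phred-scaled MAPQ -> probability that the read is mis-mapped."""
    return 10 ** (-mapq / 10)


def normalized_coverage(region_depths, exome_mean_depth):
    """Well-mapped read depth per unique HBA2 region, scaled by exome-wide depth."""
    return [d / exome_mean_depth for d in region_depths]


def call_hba2(sample_cov, reference_panel):
    """Classify a sample as 'normal', 'het_deletion' or 'hom_deletion'.

    sample_cov: normalized coverage for each unique HBA2 region.
    reference_panel: one list of per-region normalized coverages per reference donor.
    Assumption: every region must fall below the threshold to make a call.
    """
    ref_min = [min(donor[i] for donor in reference_panel)
               for i in range(len(sample_cov))]
    if all(c < 0.5 * m for c, m in zip(sample_cov, ref_min)):
        return "hom_deletion"
    if all(c < m for c, m in zip(sample_cov, ref_min)):
        return "het_deletion"
    return "normal"


if __name__ == "__main__":
    panel = [[1.0, 1.1, 0.9], [1.2, 1.0, 1.0]]      # hypothetical reference donors
    print(round(mapq_error_probability(13), 4))     # about 0.05
    print(call_hba2([0.5, 0.5, 0.45], panel))
```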

Variant classification and linkage pruning

Exonic variants in RefSeq genes were identified using ANNOVAR (Wang et al., 2010). Variants were classified into three categories: those within 23 malaria-related genes (Figure 5—source data 1); those within 887 other RBC proteins (Figure 5—source data 2) derived with a medium-confidence filter from the Red Blood Cell Collection database (rbcc.hegelab.org); and those within any other gene.

Linkage between all pairs of bi-allelic, exonic variants in our 121 genotyped samples was calculated using the --geno-r2 and --interchrom-geno-r2 options in vcftools (Danecek et al., 2011). Variants in RBC genes that shared r²>0.1 with any variant in the 23-gene set were removed. Within the 23-gene set and RBC-gene set separately, non-carrier variants were ranked by the p-values of their OLS regression with all four parasite measures. Then, one variant was removed from each pair with r²>0.1, prioritizing retention in the following order: greater significance across models; non-synonymous protein change; higher frequency in our sample; and finally random sampling. We report results from additive genetic models (genotypes coded 0/1/2), which performed as well as or better than recessive (0/0/2) and dominant (0/2/2) models. For three variants, overdominant models (0/1/–) provided the best fit and were used to estimate effect sizes (Figure 5—figure supplement 3).
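The genotype codings and the priority-based pruning described above can be sketched in Python as follows. This is a greedy interpretation that keeps a variant only if it is not linked (r²>0.1) to any higher-priority variant already kept. Other pairwise-removal orders could retain a slightly different set. Two simplifications are assumptions. Significance "across models" is collapsed into a single summary p-value per variant. The input dictionaries are a hypothetical format.

```python
import random

# Genotype recodings; keys are minor-allele counts (0, 1, 2).
ENCODINGS = {
    "additive": {0: 0, 1: 1, 2: 2},
    "recessive": {0: 0, 1: 0, 2: 2},
    "dominant": {0: 0, 1: 2, 2: 2},
    "overdominant": {0: 0, 1: 1, 2: None},  # minor-allele homozygotes excluded
}


def encode(genotypes, model="additive"):
    """Recode 0/1/2 genotypes under one of the genetic models in the text."""
    return [ENCODINGS[model][g] for g in genotypes]


def prune_by_ld(variants, r2, threshold=0.1, seed=0):
    """Greedy LD pruning that keeps the higher-priority member of linked pairs.

    variants: dicts with keys 'id', 'p' (one summary p-value; hypothetical),
              'nonsyn' (bool) and 'freq' (frequency in the sample).
    r2: dict mapping frozenset({id_a, id_b}) -> r^2; missing pairs count as 0.
    """
    rng = random.Random(seed)
    tiebreak = {v["id"]: rng.random() for v in variants}
    # Retention priority: smaller p, then non-synonymous, then more common, then random.
    ranked = sorted(
        variants,
        key=lambda v: (v["p"], not v["nonsyn"], -v["freq"], tiebreak[v["id"]]),
    )
    kept = []
    for v in ranked:
        if all(r2.get(frozenset((v["id"], k)), 0.0) <= threshold for k in kept):
            kept.append(v["id"])
    return kept


if __name__ == "__main__":
    variants = [
        {"id": "a", "p": 0.01, "nonsyn": False, "freq": 0.2},
        {"id": "b", "p": 0.05, "nonsyn": True, "freq": 0.3},
        {"id": "c", "p": 0.20, "nonsyn": True, "freq": 0.1},
    ]
    r2 = {frozenset(("a", "b")): 0.5, frozenset(("b", "c")): 0.05}
    print(prune_by_ld(variants, r2))          # 'b' is linked to 'a' and is dropped
    print(encode([0, 1, 2], "overdominant"))
```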

Population analysis

The population ancestry of our donors was assessed by comparison with African, European, East Asian, and South Asian reference populations from the 1000 Genomes Project (Auton et al., 2015). Briefly, variants called from an hg38 alignment of the 1000 Genomes data (Lowy-Gallego et al., 2019) were filtered for concordance with the variants genotyped in this study. The --indep-pairwise command in PLINK (Purcell et al., 2007) was used to prune SNPs with r²>0.1 with any other SNP in a 50-SNP sliding window, producing 35,759 unlinked variants. These variants were analyzed with both PLINK --pca and ADMIXTURE (Alexander and Lange, 2011) with K=4 for the 121 genotyped individuals in this study, alongside 2458 individuals from 1000 Genomes. Pan-African and pan-European allele frequencies were obtained from gnomAD v3 (Karczewski et al., 2020). FST for specific alleles was calculated as (HT−HS)/HT, where HT is the expected heterozygosity of the pooled African and European populations and HS is the mean expected heterozygosity within them. FST was then polarized such that positive values indicate variants more common in Africa.
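For a single bi-allelic site, the FST calculation above can be written out in Python as follows. This sketch assumes the two populations are weighted equally when computing HT and HS, because the text does not say whether gnomAD sample sizes were used as weights. The sign convention follows the text: positive values mean the allele is more common in Africa.

```python
def expected_het(p):
    """Expected heterozygosity 2p(1 - p) for allele frequency p."""
    return 2 * p * (1 - p)


def polarized_fst(p_afr, p_eur):
    """(HT - HS) / HT for one bi-allelic site, signed by which population
    carries the allele more often (positive = more common in Africa).
    Assumes equal weighting of the two populations."""
    p_bar = (p_afr + p_eur) / 2
    h_t = expected_het(p_bar)
    if h_t == 0:
        return 0.0                      # monomorphic in both populations
    h_s = (expected_het(p_afr) + expected_het(p_eur)) / 2
    fst = (h_t - h_s) / h_t
    return fst if p_afr >= p_eur else -fst


if __name__ == "__main__":
    # Hypothetical allele at 50% in Africa and 10% in Europe.
    print(round(polarized_fst(0.5, 0.1), 4))
```

For example, frequencies of 0.5 and 0.1 give HT = 0.42 and HS = 0.34, so FST = 0.08/0.42 ≈ 0.19, reported as positive because the allele is more common in Africa.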

P. falciparum culture and assays

Our 3D7 strain of P. falciparum was obtained from the Walter and Eliza Hall Institute (Melbourne, Australia) and routinely cultured in human erythrocytes obtained from the Stanford Blood Center. Th.026.09 is a clinical strain isolated from a patient in Senegal in 2009 and kindly provided by Daouda Ndiaye and Sarah Volkman. 3D7 is drug-sensitive and has been lab-adapted for over 40 years, whereas Th.026.09 is drug-resistant and minimally lab-adapted (Walliker et al., 1987; Daniels et al., 2012; Moser, 2020). 3D7 was maintained at 2% hematocrit in RPMI-1640 supplemented with 25 mM HEPES, 50 mg/L hypoxanthine, 2.42 mM sodium bicarbonate, and 4.31 mg/ml Albumax (Invitrogen) at 37°C in 5% CO2 and 1% O2. Th.026.09 was maintained in the same conditions, except that half the Albumax was replaced with heat-inactivated human AB serum.

Parasite growth and invasion assays were performed using schizont-stage parasites isolated from routine culture with a MACS magnet (Miltenyi). Parasites were added at ~0.5% initial parasitemia to fresh erythrocytes suspended at 1% hematocrit in complete RPMI, as above. Parasites were cultured in each erythrocyte sample for 3–5 days in triplicate 100 µl wells. Parasitemia was determined as the average of the three technical replicates, excluding single outlier points, on day 0, day 1 (24 hr), day 3 (72 hr), and in some cases day 5 (120 hr). The fraction of infected RBCs was measured by staining with SYBR Green I nucleic acid stain (Invitrogen, Thermo Fisher Scientific, Eugene, OR) at 1:2000 dilution in PBS/0.3% BSA for 20 min, followed by flow cytometry analysis on a MACSQuant flow cytometer (Miltenyi). Raw invasion rate was defined as the day 1 parasitemia divided by the day 0 parasitemia; raw growth rate was defined as the day 3 (or day 5) parasitemia divided by the day 1 (or day 3) parasitemia. Day 0 parasitemia was not measured in weeks 1–3, so invasion rate estimates are absent for these samples (N=58 unrelated non-carriers with invasion data). The parasite assays failed for both strains in week 9 and for Th.026.09 in week 10, and so were repeated in weeks 10 and 11 with RBCs that had been stored for 1 or 2 weeks.
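The raw rate definitions above reduce to ratios of replicate-averaged parasitemias. In this Python sketch, the outlier rule in `replicate_mean` is a hypothetical stand-in, because the text says only that single outlier points were excluded. `max_rel_dev` is an invented parameter.

```python
from statistics import mean, median

def replicate_mean(values, max_rel_dev=0.5):
    """Mean of technical replicates after dropping at most one outlier.

    Hypothetical rule: drop the replicate farthest from the median if it
    deviates by more than max_rel_dev * median.
    """
    values = list(values)
    med = median(values)
    worst = max(values, key=lambda v: abs(v - med))
    if med > 0 and abs(worst - med) > max_rel_dev * med:
        values.remove(worst)
    return mean(values)

def raw_rates(day1, day3, day0=None):
    """Raw invasion rate (day 1 / day 0) and raw growth rate (day 3 / day 1)."""
    p1, p3 = replicate_mean(day1), replicate_mean(day3)
    invasion = p1 / replicate_mean(day0) if day0 else None  # day 0 was not measured in weeks 1-3
    return invasion, p3 / p1


inv, growth = raw_rates(day1=[2.0, 2.1, 1.9], day3=[8.0, 8.2, 20.0], day0=[0.5, 0.5, 0.5])
print(inv, growth)  # the 20.0 replicate is dropped as an outlier
```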

To correct for batch effects, including substantial week-to-week variation in P. falciparum replication rate, we extracted the residuals from a linear regression of the raw parasite values against up to four significantly related batch variables: (1) the raw values for control donor 1111 each week; (2) the parasitemia measured at the previous time point; (3) the age in weeks of the RBCs being measured; and (4) the experimenter performing the assays. Notably, there was no additional effect of ‘Week’ or the length of the experiment (i.e., 3 or 5 days) once the above variables were regressed out. To convert these residuals (mean 0%) to relative percentages (mean 100%), we first trained linear models for growth and invasion in each strain with data from control donor 1111 and carriers with extreme parasite values (HbAS and HE for growth; G6PDhigh and HE for invasion). For these models, relative percentages were calculated by normalizing the raw multiplication rates in these samples to the raw multiplication rate in the 1111 control from that week. These linear models were used to convert residuals to relative percentages for all samples. Finally, the relative percentages were arithmetically adjusted so that the mean invasion and growth values for non-carriers were 100%. Code for this normalization is available at https://github.com/emily-ebel/RBC (copy archived at swh:1:rev:31f953428a4ec5f0fa83201085ada0a0995facb2), Ebel, 2021.
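The two-step normalization above can be sketched with NumPy: first regress out the batch covariates, then map the residuals onto a relative-percentage scale anchored by calibration samples. This is not the authors' normalization code, which is in the linked repository. The function names, the straight-line calibration, and all data below are illustrative assumptions.

```python
import numpy as np

def batch_residuals(y, covariates):
    """Residuals of y after OLS on an intercept plus batch covariates.

    Categorical covariates such as the experimenter would be one-hot
    encoded (dropping one level) before being passed in.
    """
    X = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def residuals_to_relative(resid, calib_idx, calib_rel_pct, noncarrier):
    """Map residuals to relative % with a line fitted on calibration samples,
    then shift so the non-carrier mean is exactly 100%."""
    slope, intercept = np.polyfit(resid[calib_idx], calib_rel_pct, 1)
    rel = intercept + slope * resid
    return rel + (100.0 - rel[noncarrier].mean())


# Synthetic example (all values invented for illustration).
rng = np.random.default_rng(0)
n = 40
control = rng.normal(1.0, 0.2, n)              # weekly raw rate of the control donor
prev = rng.normal(1.5, 0.3, n)                 # parasitemia at the previous time point
rbc_age = rng.integers(0, 3, n).astype(float)  # RBC age in weeks
raw = 3.0 + 1.5 * control - 0.4 * prev + rng.normal(0, 0.1, n)

resid = batch_residuals(raw, np.column_stack([control, prev, rbc_age]))
calib_idx = np.arange(6)                        # stand-ins for the control and extreme carriers
calib_rel_pct = 100 + 60 * resid[calib_idx] + rng.normal(0, 1, 6)
noncarrier = np.arange(n) >= 6
rel = residuals_to_relative(resid, calib_idx, calib_rel_pct, noncarrier)
```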

Red cell phenotyping and normalization

Complete blood count (CBC) data for RBCs, reticulocytes, and platelets were obtained with an ADVIA 120 hematology analyzer (Siemens, Laguna Hills, CA) at the Red Cell Laboratory at Children’s Hospital Oakland Research Institute. These data were: RBC, HGB, HCT, MCV, MCH, MCHC, CHCM, RDW, HDW, PLT, MPV, Reticulocyte number and percentage, and the fraction of RBCs in each of nine cells of the RBC matrix (see Figure 3—figure supplement 2). Systematic biases were evident for some measures in certain weeks, but data from control donor 1111 were not available for all weeks. Therefore, CBC data were normalized such that the median value for non-carrier samples was equal across weeks.
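A minimal Python sketch of the week-wise median matching described above. It assumes an additive shift toward the pooled non-carrier median; the text does not say whether the adjustment was additive or multiplicative, or which common target value was used.

```python
from collections import defaultdict
from statistics import median

def median_normalize(values, weeks, is_noncarrier):
    """Shift each week's values so its non-carrier median equals the pooled
    non-carrier median (additive shift assumed; every week needs >= 1 non-carrier)."""
    week_nc = defaultdict(list)
    for v, w, nc in zip(values, weeks, is_noncarrier):
        if nc:
            week_nc[w].append(v)
    target = median(v for v, nc in zip(values, is_noncarrier) if nc)
    offset = {w: target - median(vs) for w, vs in week_nc.items()}
    return [v + offset[w] for v, w in zip(values, weeks)]


mcv = [90, 92, 80, 95, 97, 85]   # synthetic MCV values (fL)
weeks = [1, 1, 1, 2, 2, 2]
nc = [True, True, False, True, True, False]
print(median_normalize(mcv, weeks, nc))  # [92.5, 94.5, 82.5, 92.5, 94.5, 82.5]
```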

Osmotic fragility tests were performed in duplicate by incubating 20 μl of washed erythrocytes for 5 min in 500 μl solutions of NaCl in 14 concentrations: 7.17, 6.14, 5.73, 5.32, 4.91, 4.50, 4.30, 4.09, 3.89, 3.68, 3.27, 3.07, 2.66, and 2.46 g/L. Tubes were spun for 5 min at 1000 × g, and 100 μl of supernatant was transferred to a 96-well plate. Hemoglobin concentration was determined by adding 100 μl of Drabkin’s reagent (Ricca Chemical) to each well and measuring absorbance at 540 nm with a Synergy H1 Plate Reader (Biotek). Relative lysis was determined by normalizing to the maximum hemoglobin concentration in the 14-tube series for each sample. After outlier points were manually removed, sigmoidal osmotic fragility curves were fitted with a self-starting logistic model using the nls function in R. Curves were summarized by the relative tonicity at which 50% lysis occurred (see Figure 3—figure supplement 4) and normalized within weekly batches, such that this value was equal for control sample 1111 across weeks.
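The 50% lysis point can be illustrated with a simpler stand-in for the nonlinear logistic fit: a straight line fitted to the logit of relative lysis, whose zero crossing is the 50% point. This Python sketch approximates the approach and is not the `nls` fit used in the study. The exclusion thresholds `lo`/`hi` and the synthetic curve are assumptions, and the result is in g/L NaCl rather than relative tonicity.

```python
import math

def lysis_50(nacl, hb, lo=0.02, hi=0.98):
    """NaCl concentration (g/L) at 50% relative lysis.

    Swapped-in technique: linear least squares on the logit scale, not the
    nonlinear least-squares logistic fit described in the text.
    """
    top = max(hb)  # relative lysis is normalized to the series maximum
    pts = [(x, math.log((h / top) / (1 - h / top)))
           for x, h in zip(nacl, hb) if lo < h / top < hi]  # logit is infinite at 0 and 1
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    mz = sum(z for _, z in pts) / n
    slope = sum((x - mx) * (z - mz) for x, z in pts) / sum((x - mx) ** 2 for x, _ in pts)
    intercept = mz - slope * mx
    return -intercept / slope  # logit = 0 where relative lysis = 0.5


nacl = [7.17, 6.14, 5.73, 5.32, 4.91, 4.50, 4.30, 4.09, 3.89, 3.68, 3.27, 3.07, 2.66, 2.46]
# Synthetic absorbances from a logistic curve with its midpoint at 4.2 g/L.
hb = [0.9 / (1 + math.exp((x - 4.2) / 0.25)) for x in nacl]
print(round(lysis_50(nacl, hb), 2))
```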

Osmotic gradient ektacytometry (Clark et al., 1983; Kuypers, 1990) was performed at the Red Cell Laboratory at Children’s Hospital Oakland Research Institute. Red cell deformability estimates across a gradient of NaCl concentrations were fitted to a 20-parameter polynomial model to generate a smooth curve, which was manually verified to closely fit the data. Each curve was summarized with three standard points (Figure 3—figure supplement 5; Clark et al., 1983), which were normalized such that the median x- and y-values of the three points were equal for non-carrier samples across weeks.
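The curve-summary step above can be sketched with NumPy's `Polynomial.fit`. This sketch assumes that "20-parameter" means 20 polynomial coefficients (degree 19). It also assumes the three standard points are the conventional ektacytometry landmarks: minimum deformability on the hypotonic side, maximum deformability, and the hypertonic point where deformability falls to half its maximum. The curve is synthetic.

```python
import numpy as np
from numpy.polynomial import Polynomial

def ektacytometry_points(osm, di, deg=19, n_grid=2001):
    """Smooth an osmotic-gradient deformability curve and extract three points.

    deg=19 gives 20 coefficients (our reading of a "20-parameter polynomial").
    Returns (osmolality, DI) pairs under conventional names (assumed):
      Omin   - minimum DI on the hypotonic side of the peak
      Omax   - maximum DI
      Ohyper - first point above Omax where DI falls to half its maximum
    """
    fit = Polynomial.fit(osm, di, deg)  # x is rescaled to [-1, 1] internally for stability
    x = np.linspace(osm.min(), osm.max(), n_grid)
    y = fit(x)
    i_max = int(np.argmax(y))
    i_min = int(np.argmin(y[: i_max + 1]))
    below = np.nonzero(y[i_max:] <= y[i_max] / 2)[0]
    i_hyp = i_max + int(below[0]) if below.size else n_grid - 1
    return {name: (float(x[i]), float(y[i]))
            for name, i in (("Omin", i_min), ("Omax", i_max), ("Ohyper", i_hyp))}


# Synthetic curve: a shallow dip near 150 mOsm/kg, a peak at 290, a hypertonic decline.
osm = np.linspace(100, 500, 201)
di = 0.6 * np.exp(-((osm - 290) / 90) ** 2) + 0.15 * np.exp(-((osm - 100) / 40) ** 2)
pts = ektacytometry_points(osm, di)
print(pts)
```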

Statistical analysis

Student’s t-test was used to compare trait values between non-carriers and carriers where N>1. Given our modest sample sizes and the expected noise in parasite data, we defined statistical significance as p<0.1. Where N=1 (i.e., for G6PDhigh and HE), significance was assessed with the percentile of the non-carrier distribution. For all comparisons of two continuous variables, OLS linear regression was performed with the lm function in R unless otherwise specified. Adjusted R2 values are reported.
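The two significance rules above can be sketched with SciPy; `scipy.stats.ttest_ind` performs Student's t-test (equal variances) by default. For the N=1 case, the cutoff used here is an assumption, because the text does not state which percentiles counted as significant. A single carrier is flagged when it falls outside the central 90% of non-carriers, a two-sided reading of the p<0.1 threshold.

```python
from scipy import stats

def compare_carrier(carrier_vals, noncarrier_vals, alpha=0.1):
    """Student's t-test when a carrier group has N > 1; otherwise locate the
    single carrier within the non-carrier distribution.

    The two-sided percentile cutoff for N = 1 is an assumption of this sketch.
    """
    if len(carrier_vals) > 1:
        p = stats.ttest_ind(carrier_vals, noncarrier_vals).pvalue  # equal_var=True by default
        return p, p < alpha
    pct = stats.percentileofscore(noncarrier_vals, carrier_vals[0])
    return pct, pct <= 100 * alpha / 2 or pct >= 100 * (1 - alpha / 2)


noncarriers = list(range(90, 111))          # synthetic relative growth values (%)
print(compare_carrier([70, 72, 75], noncarriers))
print(compare_carrier([125], noncarriers))   # single carrier above every non-carrier
```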

LASSO regression (Tibshirani, 1994; Chatterjee, 2013) was performed in a k-folds CV framework with the glmnet and caret packages in R. For each of 1000 iterations, we used the createFolds function with k=10 to split the non-carrier data into 10 folds of roughly equal size. Each fold was used as a ‘test set’ for a LASSO model trained on the remaining nine folds. Each iteration therefore yielded 10 sets of predictors from the 10 train sets, one average R2 value for the 10 train sets, and one average R2 value for the 10 test sets. Each set of 1000 resulting R2 values was normally distributed, and their average is reported in Figures 4 and 5. The fraction of k-folds CV support per predictor is based on 10,000 total train models (1000 iterations × 10 folds each) and is reported in Figure 4A and Figure 5—source data 3.
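The outer repeated k-fold loop above can be sketched in NumPy, with a small coordinate-descent LASSO standing in for glmnet. This sketch makes several assumptions. Folds are purely random, whereas caret's createFolds balances folds on the outcome. Lambda is held fixed instead of being chosen by cv.glmnet inside each training set. The data are synthetic, and the iteration count is reduced from 1000 to keep the example fast.

```python
import numpy as np

def lasso_cd(X, y, lam, n_sweeps=100):
    """Coordinate-descent LASSO for (1/2n)||y - Xb||^2 + lam * ||b||_1 (no intercept)."""
    n, p = X.shape
    b = np.zeros(p)
    r = np.array(y, dtype=float)                 # residual vector (a copy of y)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        for j in range(p):
            r += X[:, j] * b[j]                  # take predictor j out of the fit
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            r -= X[:, j] * b[j]
    return b

def r2(y, yhat):
    return 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

def fit_predict(Xtr, ytr, Xte, lam):
    """Standardize with training statistics, fit LASSO, predict train and test."""
    mu, sd = Xtr.mean(axis=0), Xtr.std(axis=0)
    sd[sd == 0] = 1.0
    ym = ytr.mean()
    b = lasso_cd((Xtr - mu) / sd, ytr - ym, lam)
    predict = lambda X: ym + ((X - mu) / sd) @ b
    return b, predict(Xtr), predict(Xte)

def repeated_kfold_lasso(X, y, lam, k=10, n_iter=20, seed=0):
    """Fold-averaged train/test R2 per iteration, then averaged; plus selection frequency."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    train_r2, test_r2, support = [], [], np.zeros(p)
    for _ in range(n_iter):
        folds = np.array_split(rng.permutation(n), k)   # random, unstratified folds
        tr_fold, te_fold = [], []
        for test in folds:
            train = np.setdiff1d(np.arange(n), test)
            b, pred_tr, pred_te = fit_predict(X[train], y[train], X[test], lam)
            tr_fold.append(r2(y[train], pred_tr))
            te_fold.append(r2(y[test], pred_te))
            support += b != 0
        train_r2.append(np.mean(tr_fold))
        test_r2.append(np.mean(te_fold))
    return np.mean(train_r2), np.mean(test_r2), support / (n_iter * k)


rng = np.random.default_rng(1)
X = rng.normal(size=(60, 8))                     # synthetic RBC predictors
y = 2 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0, 0.5, 60)
train_r2, test_r2, support = repeated_kfold_lasso(X, y, lam=0.1)
print(round(train_r2, 3), round(test_r2, 3), support.round(2))
```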

To perform LASSO with each training set, we used the cv.glmnet function with α=1. This function fits the model 11 times: once on the full training data to establish a lambda sequence, and then once for each of 10 folds with that fold omitted. The lambda value with the minimal cross-validated error within the training data was then used to predict values in the independent test data described above. Since cv.glmnet selects folds at random, we performed this procedure five times for each train/test set (which we term ‘internal cross-validation’). We retained R2 values and selected predictors from the median model of these five internal CVs. Internal CVs did not otherwise contribute to the k-folds CV support reported in the main text.
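The internal cross-validation can be sketched the same way. The sketch chooses lambda from one random k-fold split of the training data, refits on all training data, and scores on the held-out test fold. It repeats this five times and keeps the median run. Ranking the five runs by test R2 is an assumption about how the "median model" was defined. `lasso_cd` is a simplified stand-in for glmnet, and the data are synthetic.

```python
import numpy as np

def lasso_cd(X, y, lam, n_sweeps=100):
    """Coordinate-descent LASSO for (1/2n)||y - Xb||^2 + lam * ||b||_1 (no intercept)."""
    n, p = X.shape
    b = np.zeros(p)
    r = np.array(y, dtype=float)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        for j in range(p):
            r += X[:, j] * b[j]
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]
    return b

def scale_stats(X):
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0                  # guard against constant predictors
    return mu, sd

def cv_lambda(X, y, lambdas, k, rng):
    """Lambda with the lowest mean held-out MSE over one random k-fold split
    (analogous in spirit to cv.glmnet's lambda.min)."""
    n = len(y)
    err = np.zeros(len(lambdas))
    for test in np.array_split(rng.permutation(n), k):
        train = np.setdiff1d(np.arange(n), test)
        mu, sd = scale_stats(X[train])
        ym = y[train].mean()
        for i, lam in enumerate(lambdas):
            b = lasso_cd((X[train] - mu) / sd, y[train] - ym, lam)
            err[i] += np.mean((y[test] - ym - ((X[test] - mu) / sd) @ b) ** 2) / k
    return lambdas[int(np.argmin(err))]

def median_internal_model(Xtr, ytr, Xte, yte, lambdas, n_internal=5, k=10, seed=0):
    """Repeat the internal CV n_internal times; keep the median run by test R2."""
    rng = np.random.default_rng(seed)
    mu, sd = scale_stats(Xtr)
    ym = ytr.mean()
    runs = []
    for _ in range(n_internal):
        lam = cv_lambda(Xtr, ytr, lambdas, k, rng)   # a new random fold split each time
        b = lasso_cd((Xtr - mu) / sd, ytr - ym, lam)
        pred = ym + ((Xte - mu) / sd) @ b
        r2 = 1 - np.sum((yte - pred) ** 2) / np.sum((yte - yte.mean()) ** 2)
        runs.append((r2, lam, b))
    runs.sort(key=lambda run: run[0])
    return runs[n_internal // 2]


rng = np.random.default_rng(2)
X = rng.normal(size=(60, 8))
y = 2 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0, 0.5, 60)
train, test = np.arange(54), np.arange(54, 60)
lambdas = np.geomspace(0.5, 0.01, 8)
r2_med, lam_med, b_med = median_internal_model(X[train], y[train], X[test], y[test], lambdas)
```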

To assess the significance of each LASSO result, we applied the same modeling procedure to 1000 data sets with randomly permuted parasite values, which preserved the original correlations among RBC predictors. We performed 10 iterations of fold creation for each permuted data set and retained the average R2 for each set of 10 folds, which generated 1000 fold-averaged R2 values for train sets and 1000 fold-averaged R2 values for test sets. Significance was determined by the percentile of the permuted distribution in which the real data fell. We also applied this same procedure to 1000 sets of 23 genes chosen at random from the RBC proteome (Figure 5—source data 2).
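The permutation scheme above shuffles only the parasite values, so the correlation structure among RBC predictors is preserved under the null. To keep this Python sketch fast, the statistic is an in-sample OLS R2, whereas the study ran each permutation through the full fold-averaged LASSO pipeline. `stat_fn` is a hypothetical hook for that pipeline.

```python
import numpy as np

def ols_r2(X, y):
    """In-sample R2 of an OLS fit with intercept (a cheap stand-in statistic)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

def permutation_test(X, y, stat_fn, n_perm=1000, seed=0):
    """Compare an observed statistic with its distribution under permuted y.

    Permuting only y keeps the correlations among the columns of X intact.
    Returns the observed value and the fraction of permuted values >= it.
    """
    rng = np.random.default_rng(seed)
    observed = stat_fn(X, y)
    null = np.array([stat_fn(X, rng.permutation(y)) for _ in range(n_perm)])
    return observed, float(np.mean(null >= observed))


rng = np.random.default_rng(3)
X = rng.normal(size=(50, 4))
y_signal = X[:, 0] + rng.normal(0, 0.5, 50)
obs, p = permutation_test(X, y_signal, ols_r2)
print(round(obs, 3), p)
```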

We noticed that LASSO effect size estimates for each predictor varied considerably across models. Therefore, we used univariate OLS regression on all non-carrier data (excluding five of the six family members) to estimate the effect size of each predictor selected at least once by LASSO. OLS p-values are reported as a measure of confidence in these effect size estimates, with p<0.1 considered sufficient evidence to report the effect size. However, because OLS regression was only performed for variants pre-selected by LASSO, these p-values cannot be interpreted on their own as evidence of significant associations.
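A short SciPy sketch of the univariate effect-size step. The predictor names and data are synthetic placeholders, and the resulting p-values carry the same post-selection caveat stated above.

```python
import numpy as np
from scipy import stats

def univariate_effects(X, y, names, selected):
    """Slope and p-value from a separate OLS fit of y on each LASSO-selected predictor."""
    out = {}
    for j, name in enumerate(names):
        if name in selected:
            fit = stats.linregress(X[:, j], y)
            out[name] = (fit.slope, fit.pvalue)
    return out


rng = np.random.default_rng(4)
X = rng.normal(size=(50, 3))
y = 2.0 * X[:, 0] + rng.normal(0, 1.0, 50)
effects = univariate_effects(X, y, ["MCHC", "RDW", "MCV"], selected={"MCHC", "MCV"})
reportable = {k: v for k, v in effects.items() if v[1] < 0.1}  # p < 0.1 threshold from the text
print(reportable)
```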

We compared groups of selected genetic variants using the binom.test function in R. For synonymous alleles, we used the proportion of synonymous alleles in the input set of 106 variants (53%) as the null hypothesis. For allele frequencies in Africa and Europe, we categorized protective variants as more common (to any absolute degree) among Africans (N=21,042) or non-Finnish Europeans (N=32,399) in the gnomAD database. The null hypothesis was that 50% of the alleles would be more common in Africans.
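R's `binom.test` has no direct equivalent in the Python standard library, so this sketch implements the exact two-sided binomial test with `math.comb`. It sums the probabilities of all outcomes no more likely than the observed count, which, as we understand it, mirrors the two-sided rule in R's `binom.test`. The counts in the example are hypothetical, not results from the study.

```python
from math import comb

def binom_test_two_sided(k, n, p=0.5):
    """Exact two-sided binomial test: total probability of all outcomes
    whose probability does not exceed that of the observed count k."""
    pmf = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-7)   # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= cutoff))


# Hypothetical counts, not the study's results:
print(binom_test_two_sided(15, 20, 0.5))   # e.g. 15 of 20 protective alleles more common in Africa
print(binom_test_two_sided(4, 15, 0.53))   # e.g. 4 synonymous among 15 selected variants vs. a 53% null
```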

Acknowledgements

The authors gratefully acknowledge the invaluable participation of all volunteer blood donors. Nick Bondy, Bertil Glader, Sandra Larkin, Brian Fleischer, Ashley Dunn, Talal Seddik, Trung Pham, David Vu, and Spectrum Child Health provided crucial assistance in donor coordination and sample processing. P. falciparum strain Th.026.09 was kindly provided by Daouda Ndiaye and Sarah Volkman. For quantitative advice, the authors thank Grant Kinsler, Jonathan Pritchard, Susan Holmes, and the Stanford Statistics Consulting Group. This study was primarily supported by a Pilot Early Career award from the Stanford Maternal Child Health Research Institute and a Gabilan Faculty Award from the Stanford University School of Medicine Office of Faculty Development and Diversity (ESE). ERE was an NSF Graduate Research Fellow (DGE-1247312) and received additional support from the Stanford Center for Computational, Evolutionary, and Human Genomics. DAP was funded through an NIH MIRA award 5R35GM118165-05. ESE is a Tashia and John Morgridge Endowed Faculty Scholar in Pediatric Translational Medicine through the Stanford Maternal Child Health Research Institute. Local blood samples were drawn at the Stanford Clinical and Translational Research Unit, which is supported by CTSA Grant UL1 TR001085.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Dmitri A Petrov, Email: dpetrov@stanford.edu.

Elizabeth S Egan, Email: eegan@stanford.edu.

Jenny Tung, Duke University, United States.

George H Perry, Pennsylvania State University, United States.

Funding Information

This paper was supported by the following grants:

  • Stanford University School of Medicine (Maternal Child Health Research Institute Pilot Early Career award) to Elizabeth S Egan.

  • Stanford University School of Medicine (Gabilan Faculty Award) to Elizabeth S Egan.

  • Stanford Center for Computational, Evolutionary and Human Genomics to Emily R Ebel.

  • National Institute of General Medical Sciences 5R35GM118165-05 to Dmitri A Petrov.

  • National Science Foundation DGE-1247312 to Emily R Ebel.

Additional information

Competing interests

No competing interests declared.

No competing interests declared.

Author contributions

Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Visualization, Writing – original draft, Writing – review and editing.

Investigation, Methodology, Resources, Writing – review and editing.

Investigation.

Conceptualization, Methodology, Resources, Supervision, Writing – review and editing.

Conceptualization, Investigation, Methodology, Resources, Supervision, Writing – review and editing.

Ethics

Human subjects: Written informed consent and consent to publish were obtained from each subject and/or their parent as part of a protocol approved by the Stanford University Institutional Review Board (#40479).

Additional files

Transparent reporting form

Data availability

All data generated or analyzed during this study are included in the manuscript and supporting files. Source data files have been provided for Figures 1, 4, and 5 and other raw data and normalization scripts are available at https://github.com/emily-ebel/RBC (copy archived at https://archive.softwareheritage.org/swh:1:rev:31f953428a4ec5f0fa83201085ada0a0995facb2).

The following dataset was generated:

Ebel ER, Kuypers FA, Lin C, Petrov DA, Egan ES. 2020. Exome Sequencing from Participants in RBC/Malaria Study. NCBI BioProject. PRJNA683732

References

  1. Alexander DH, Lange K. Enhancements to the admixture algorithm for individual ancestry estimation. BMC Bioinformatics. 2011;12:246. doi: 10.1186/1471-2105-12-246.
  2. Allison AC. Protection afforded by sickle-cell trait against subtertian malareal infection. British Medical Journal. 1954;1:290–294. doi: 10.1136/bmj.1.4857.290.
  3. Archer NM, Petersen N, Clark MA, Buckee CO, Childs LM, Duraisingh MT. Resistance to Plasmodium falciparum in sickle cell trait erythrocytes is driven by oxygen-dependent growth inhibition. PNAS. 2018;115:7350–7355. doi: 10.1073/pnas.1804388115.
  4. Astle WJ, Elding H, Jiang T, Allen D, Ruklisa D, Mann AL, Mead D, Bouman H, Riveros-Mckay F, Kostadima MA, Lambourne JJ, Sivapalaratnam S, Downes K, Kundu K, Bomba L, Berentsen K, Bradley JR, Daugherty LC, Delaneau O, Freson K, Garner SF, Grassi L, Guerrero J, Haimel M, Janssen-Megens EM, Kaan A, Kamat M, Kim B, Mandoli A, Marchini J, Martens JHA, Meacham S, Megy K, O’Connell J, Petersen R, Sharifi N, Sheard SM, Staley JR, Tuna S, van der Ent M, Walter K, Wang SY, Wheeler E, Wilder SP, Iotchkova V, Moore C, Sambrook J, Stunnenberg HG, Di Angelantonio E, Kaptoge S, Kuijpers TW, Carrillo-de-Santa-Pau E, Juan D, Rico D, Valencia A, Chen L, Ge B, Vasquez L, Kwan T, Garrido-Martín D, Watt S, Yang Y, Guigo R, Beck S, Paul DS, Pastinen T, Bujold D, Bourque G, Frontini M, Danesh J, Roberts DJ, Ouwehand WH, Butterworth AS, Soranzo N. The Allelic Landscape of Human Blood Cell Trait Variation and Links to Common Complex Disease. Cell. 2016;167:1415–1429. doi: 10.1016/j.cell.2016.10.042.
  5. Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, Abecasis GR, 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature. 2015;526:68–74. doi: 10.1038/nature15393.
  6. Band G, Rockett KA, Spencer CCA, Kwiatkowski DP, Malaria Genomic Epidemiology Network. A novel locus of resistance to severe malaria in a region of ancient balancing selection. Nature. 2015;526:253–257. doi: 10.1038/nature15390.
  7. Bejon P, Berkley JA, Mwangi T, Ogada E, Mwangi I, Maitland K, Williams T, Scott JAG, English M, Lowe BS, Peshu N, Newton CRJC, Marsh K. Defining childhood severe falciparum malaria for intervention studies. PLOS Medicine. 2007;4:e251. doi: 10.1371/journal.pmed.0040251.
  8. Beutler E, West C. Hematologic differences between African-Americans and whites: the roles of iron deficiency and alpha-thalassemia on hemoglobin levels and mean corpuscular volume. Blood. 2005;106:740–745. doi: 10.1182/blood-2005-02-0713.
  9. Biddanda A, Rice DP, Novembre J. A variant-centric perspective on geographic patterns of human allele frequency variation. eLife. 2020;9:e60107. doi: 10.7554/eLife.60107.
  10. Boyle EA, Li YI, Pritchard JK. An Expanded View of Complex Traits: From Polygenic to Omnigenic. Cell. 2017;169:1177–1186. doi: 10.1016/j.cell.2017.05.038.
  11. Canela-Xandri O, Rawlik K, Tenesa A. An atlas of genetic associations in UK Biobank. Nature Genetics. 2018;50:1593–1599. doi: 10.1038/s41588-018-0248-z.
  12. Cao A, Galanello R. Beta-thalassemia. Genetics in Medicine. 2010;12:61–76. doi: 10.1097/GIM.0b013e3181cd68ed.
  13. Chami N, Chen MH, Slater AJ, Eicher JD, Evangelou E, Tajuddin SM, Love-Gregory L, Kacprowski T, Schick UM, Nomura A, Giri A, Lessard S, Brody JA, Schurmann C, Pankratz N, Yanek LR, Manichaikul A, Pazoki R, Mihailov E, Hill WD, Raffield LM, Burt A, Bartz TM, Becker DM, Becker LC, Boerwinkle E, Bork-Jensen J, Bottinger EP, O’Donoghue ML, Crosslin DR, de Denus S, Dubé MP, Elliott P, Engström G, Evans MK, Floyd JS, Fornage M, Gao H, Greinacher A, Gudnason V, Hansen T, Harris TB, Hayward C, Hernesniemi J, Highland HM, Hirschhorn JN, Hofman A, Irvin MR, Kähönen M, Lange E, Launer LJ, Lehtimäki T, Li J, Liewald DCM, Linneberg A, Liu Y, Lu Y, Lyytikäinen LP, Mägi R, Mathias RA, Melander O, Metspalu A, Mononen N, Nalls MA, Nickerson DA, Nikus K, O’Donnell CJ, Orho-Melander M, Pedersen O, Petersmann A, Polfus L, Psaty BM, Raitakari OT, Raitoharju E, Richard M, Rice KM, Rivadeneira F, Rotter JI, Schmidt F, Smith AV, Starr JM, Taylor KD, Teumer A, Thuesen BH, Torstenson ES, Tracy RP, Tzoulaki I, Zakai NA, Vacchi-Suzzi C, van Duijn CM, van Rooij FJA, Cushman M, Deary IJ, Velez Edwards DR, Vergnaud AC, Wallentin L, Waterworth DM, White HD, Wilson JG, Zonderman AB, Kathiresan S, Grarup N, Esko T, Loos RJF, Lange LA, Faraday N, Abumrad NA, Edwards TL, Ganesh SK, Auer PL, Johnson AD, Reiner AP, Lettre G. EXOME genotyping identifies pleiotropic variants associated with red blood cell traits. American Journal of Human Genetics. 2016;99:8–21. doi: 10.1016/j.ajhg.2016.05.007.
  14. Chatterjee S. Assumptionless Consistency of the Lasso. arXiv. 2013 http://arxiv.org/abs/1303.5817
  15. Chen R, Davydov EV, Sirota M, Butte AJ. Non-synonymous and synonymous coding snps show similar likelihood and effect size of human disease association. PLOS ONE. 2010;5:e13574. doi: 10.1371/journal.pone.0013574.
  16. Chen MH, Raffield LM, Mousas A, Sakaue S, Huffman JE, Moscati A, Trivedi B, Jiang T, Akbari P, Vuckovic D, Bao EL, Zhong X, Manansala R, Laplante V, Chen M, Lo KS, Qian H, Lareau CA, Beaudoin M, Hunt KA, Akiyama M, Bartz TM, Ben-Shlomo Y, Beswick A, Bork-Jensen J, Bottinger EP, Brody JA, van Rooij FJA, Chitrala K, Cho K, Choquet H, Correa A, Danesh J, Di Angelantonio E, Dimou N, Ding J, Elliott P, Esko T, Evans MK, Floyd JS, Broer L, Grarup N, Guo MH, Greinacher A, Haessler J, Hansen T, Howson JMM, Huang QQ, Huang W, Jorgenson E, Kacprowski T, Kähönen M, Kamatani Y, Kanai M, Karthikeyan S, Koskeridis F, Lange LA, Lehtimäki T, Lerch MM, Linneberg A, Liu Y, Lyytikäinen LP, Manichaikul A, Martin HC, Matsuda K, Mohlke KL, Mononen N, Murakami Y, Nadkarni GN, Nauck M, Nikus K, Ouwehand WH, Pankratz N, Pedersen O, Preuss M, Psaty BM, Raitakari OT, Roberts DJ, Rich SS, Rodriguez BAT, Rosen JD, Rotter JI, Schubert P, Spracklen CN, Surendran P, Tang H, Tardif JC, Trembath RC, Ghanbari M, Völker U, Völzke H, Watkins NA, Zonderman AB, VA Million Veteran Program. Wilson PWF, Li Y, Butterworth AS, Gauchat JF, Chiang CWK, Li B, Loos RJF, Astle WJ, Evangelou E, van Heel DA, Sankaran VG, Okada Y, Soranzo N, Johnson AD, Reiner AP, Auer PL, Lettre G. Trans-ethnic and ancestry-specific blood-cell genetics in 746,667 individuals from 5 global populations. Cell. 2020;182:1198–1213. doi: 10.1016/j.cell.2020.06.045.
  17. Clark MR, Mohandas N, Shohet SB. Osmotic gradient ektacytometry: Comprehensive characterization of red cell volume and surface maintenance. Blood. 1983;61:899–910. doi: 10.1182/blood.V61.5.899.899.
  18. Clarke GM, Rockett K, Kivinen K, Hubbart C, Jeffreys AE, Rowlands K, Jallow M, Conway DJ, Bojang KA, Pinder M, Usen S, Sisay-Joof F, Sirugo G, Toure O, Thera MA, Konate S, Sissoko S, Niangaly A, Poudiougou B, Mangano VD, Bougouma EC, Sirima SB, Modiano D, Amenga-Etego LN, Ghansah A, Koram KA, Wilson MD, Enimil A, Evans J, Amodu OK, Olaniyan S, Apinjoh T, Mugri R, Ndi A, Ndila CM, Uyoga S, Macharia A, Peshu N, Williams TN, Manjurano A, Sepúlveda N, Clark TG, Riley E, Drakeley C, Reyburn H, Nyirongo V, Kachala D, Molyneux M, Dunstan SJ, Phu NH, Quyen NN, Thai CQ, Hien TT, Manning L, Laman M, Siba P, Karunajeewa H, Allen S, Allen A, Davis TM, Michon P, Mueller I, Molloy SF, Campino S, Kerasidou A, Cornelius VJ, Hart L, Shah SS, Band G, Spencer CC, Agbenyega T, Achidi E, Doumbo OK, Farrar J, Marsh K, Taylor T, Kwiatkowski DP, MalariaGEN Consortium. Characterisation of the opposing effects of g6pd deficiency on cerebral malaria and severe malarial anaemia. eLife. 2017;6:e15085. doi: 10.7554/eLife.15085.
  19. Cooling L. Blood groups in infection and host susceptibility. Clinical Microbiology Reviews. 2015;28:801–870. doi: 10.1128/CMR.00109-14.
  20. Crosnier C, Bustamante LY, Bartholdson SJ, Bei AK, Theron M, Uchikawa M, Mboup S, Ndir O, Kwiatkowski DP, Duraisingh MT, Rayner JC, Wright GJ. Basigin is a receptor essential for erythrocyte invasion by plasmodium falciparum. Nature. 2011;480:534–537. doi: 10.1038/nature10606.
  21. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, McVean G, Durbin R, 1000 Genomes Project Analysis Group. The variant call format and vcftools. Bioinformatics. 2011;27:2156–2158. doi: 10.1093/bioinformatics/btr330.
  22. Daniels R, Ndiaye D, Wall M, McKinney J, Séne PD, Sabeti PC, Volkman SK, Mboup S, Wirth DF. Rapid, field-deployable method for genotyping and discovery of single-nucleotide polymorphisms associated with drug resistance in Plasmodium falciparum. Antimicrobial Agents and Chemotherapy. 2012;56:2976–2986. doi: 10.1128/AAC.05737-11.
  23. de Mendonça VRR, Goncalves MS, Barral-Netto M. The host genetic diversity in malaria infection. Journal of Tropical Medicine. 2012;2012:940616. doi: 10.1155/2012/940616.
  24. Dhermy D, Schrével J, Lecomte MC. Spectrin-based skeleton in red blood cells and malaria. Current Opinion in Hematology. 2007;14:198–202. doi: 10.1097/MOH.0b013e3280d21afd.
  25. Ebel ER. RBC. swh:1:rev:31f953428a4ec5f0fa83201085ada0a0995facb2. Software Heritage. 2021 https://archive.softwareheritage.org/swh:1:dir:936eeefd426d26efb71dfd49bea1ccafaa03ac3f;origin=https://github.com/emily-ebel/RBC;visit=swh:1:snp:4e53482ed5ff50b01ed3179d405ec1ac9387b7a5;anchor=swh:1:rev:31f953428a4ec5f0fa83201085ada0a0995facb2
  26. Egan ES, Jiang RHY, Moechtar MA, Barteneva NS, Weekes MP, Nobre LV, Gygi SP, Paulo JA, Frantzreb C, Tani Y, Takahashi J, Watanabe S, Goldberg J, Paul AS, Brugnara C, Root DE, Wiegand RC, Doench JG, Duraisingh MT. Malaria. A forward genetic screen identifies erythrocyte CD55 as essential for plasmodium falciparum invasion. Science. 2015;348:711–714. doi: 10.1126/science.aaa3526.
  27. Egan ES, Weekes MP, Kanjee U, Manzo J, Srinivasan A, Lomas-Francis C, Westhoff C, Takahashi J, Tanaka M, Watanabe S, Brugnara C, Gygi SP, Tani Y, Duraisingh MT. Erythrocytes lacking the langereis blood group protein abcb6 are resistant to the malaria parasite plasmodium falciparum. Communications Biology. 2018;1:45. doi: 10.1038/s42003-018-0046-2.
  28. England JM, Ward SM, Down MC. Microcytosis, anisocytosis and the red cell indices in iron deficiency. British Journal of Haematology. 1976;34:589–597. doi: 10.1111/j.1365-2141.1976.tb03605.x.
  29. Evans DM, Frazer IH, Martin NG. Genetic and environmental causes of variation in basal levels of blood cells. Twin Research. 1999;2:250–257. doi: 10.1375/136905299320565735.
  30. Facer CA. Erythrocytes carrying mutations in spectrin and protein 4.1 show differing sensitivities to invasion by plasmodium falciparum. Parasitology Research. 1995;81:52–57. doi: 10.1007/BF00932417.
  31. Fadista J, Manning AK, Florez JC, Groop L. The (in)famous GWAS P-value threshold revisited and updated for low-frequency variants. European Journal of Human Genetics. 2016;24:1202–1205. doi: 10.1038/ejhg.2015.269.
  32. Field SP, Hempelmann E, Mendelow BV, Fleming AF. Glycophorin variants and plasmodium falciparum: Protective effect of the dantu phenotype in vitro. Human Genetics. 1994;93:148–150. doi: 10.1007/BF00210600.
  33. Flynn CJ, Hurvich CM, Simonoff JS. On the sensitivity of the lasso to the number of predictor variables. Statistical Science. Institute of Mathematical Statistics. 2017;32:88–105. doi: 10.1214/16-STS586.
  34. Francis SE, Sullivan DJ, Goldberg DE. Hemoglobin metabolism in the malaria parasite Plasmodium falciparum. Annual Review of Microbiology. 1997;51:97–123. doi: 10.1146/annurev.micro.51.1.97.
  35. Friedman MJ. Erythrocytic mechanism of sickle cell resistance to malaria. PNAS. 1978;75:1994–1997. doi: 10.1073/pnas.75.4.1994.
  36. Galanello R, Cao A. Alpha-thalassemia. Genetics in Medicine. 2011;13:83–88. doi: 10.1097/GIM.0b013e3181fcb468.
  37. Gallagher PG. Abnormalities of the erythrocyte membrane. Pediatric Clinics of North America. 2013;60:1349–1362. doi: 10.1016/j.pcl.2013.09.001.
  38. Garn SM. Lower hematocrit levels in blacks are not due to diet or socioeconomic factors. Pediatrics. 1981;67:580.
  39. Génin E. Missing heritability of complex diseases: case solved? Human Genetics. 2020;139:103–113. doi: 10.1007/s00439-019-02034-4.
  40. Glogowska E, Schneider ER, Maksimova Y, Schulz VP, Lezon-Geyda K, Wu J, Radhakrishnan K, Keel SB, Mahoney D, Freidmann AM, Altura RA, Gracheva EO, Bagriantsev SN, Kalfa TA, Gallagher PG. Novel mechanisms of piezo1 dysfunction in hereditary xerocytosis. Blood. 2017;130:1845–1856. doi: 10.1182/blood-2017-05-786004.
  41. Goheen MM, Wegmüller R, Bah A, Darboe B, Danso E, Affara M, Gardner D, Patel JC, Prentice AM, Cerami C. Anemia offers stronger protection than sickle cell trait against the erythrocytic stage of falciparum malaria and this protection is reversed by iron supplementation. EBioMedicine. 2016;14:123–130. doi: 10.1016/j.ebiom.2016.11.011.
  42. Greene LS. G6pd deficiency as protection against falciparum malaria: An epidemiologic critique of population and experimental studies. American Journal of Physical Anthropology. 1993;36:153–178. doi: 10.1002/ajpa.1330360609.
  43. GTEx Consortium. Laboratory, Data Analysis & Coordinating Center (LDACC)—Analysis Working Group. Statistical Methods groups—Analysis Working Group. Enhancing GTEx (eGTEx) groups. NIH Common Fund. NIH/NCI. NIH/NHGRI. NIH/NIMH. NIH/NIDA. Biospecimen Collection Source Site—NDRI. Biospecimen Collection Source Site—RPCI. Biospecimen Core Resource—VARI. Brain Bank Repository—University of Miami Brain Endowment Bank. Leidos Biomedical—Project Management. ELSI Study. Genome Browser Data Integration & Visualization—EBI. Genome Browser Data Integration & Visualization—UCSC Genomics Institute, University of California Santa Cruz. Lead analysts. Laboratory, Data Analysis & Coordinating Center (LDACC) NIH program management. Biospecimen collection. Pathology. eQTL manuscript working group. Battle A, Brown CD, Engelhardt BE, Montgomery SB. Genetic effects on gene expression across human tissues. Nature. 2017;550:204–213. doi: 10.1038/nature24277.
  44. Hanssen E, Knoechel C, Dearnley M, Dixon MWA, Le Gros M, Larabell C, Tilley L. Soft x-ray microscopy analysis of cell volume and hemoglobin content in erythrocytes infected with asexual and sexual stages of Plasmodium falciparum. Journal of Structural Biology. 2012;177:224–232. doi: 10.1016/j.jsb.2011.09.003.
  45. Ifediba TC, Stern A, Ibrahim A, Rieder RF. Plasmodium falciparum in vitro: Diminished growth in hemoglobin h disease erythrocytes. Blood. 1985;65:452–455. doi: 10.1182/blood.V65.2.452.452.
  46. Ilboudo Y, Bartolucci P, Garrett ME, Ashley-Koch A, Telen M, Brugnara C, Galactéros F, Lettre G. A common functional piezo1 deletion allele associates with red blood cell density in sickle cell disease patients. American Journal of Hematology. 2018;93:E362–E365. doi: 10.1002/ajh.25245.
  47. Ioannidis JPA. Why most published research findings are false. PLOS Medicine. 2005;2:e124. doi: 10.1371/journal.pmed.0020124.
  48. Kanias T, Lanteri MC, Page GP, Guo Y, Endres SM, Stone M, Keating S, Mast AE, Cable RG, Triulzi DJ, Kiss JE, Murphy EL, Kleinman S, Busch MP, Gladwin MT. Ethnicity, sex, and age are determinants of red blood cell storage and stress hemolysis: Results of the REDS-III Rbc-omics study. Blood Advances. 2017;1:1132–1141. doi: 10.1182/bloodadvances.2017004820.
  49. Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, Wang Q, Collins RL, Laricchia KM, Ganna A, Birnbaum DP, Gauthier LD, Brand H, Solomonson M, Watts NA, Rhodes D, Singer-Berk M, England EM, Seaby EG, Kosmicki JA, Walters RK, Tashman K, Farjoun Y, Banks E, Poterba T, Wang A, Seed C, Whiffin N, Chong JX, Samocha KE, Pierce-Hoffman E, Zappala Z, O’Donnell-Luria AH, Minikel EV, Weisburd B, Lek M, Ware JS, Vittal C, Armean IM, Bergelson L, Cibulskis K, Connolly KM, Covarrubias M, Donnelly S, Ferriera S, Gabriel S, Gentry J, Gupta N, Jeandet T, Kaplan D, Llanwarne C, Munshi R, Novod S, Petrillo N, Roazen D, Ruano-Rubio V, Saltzman A, Schleicher M, Soto J, Tibbetts K, Tolonen C, Wade G, Talkowski ME, Genome Aggregation Database Consortium. Neale BM, Daly MJ, MacArthur DG. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020;581:434–443. doi: 10.1038/s41586-020-2308-7.
  50. Kariuki SN, Marin-Menendez A, Introini V, Ravenhill BJ, Lin Y-C, Macharia A, Makale J, Tendwa M, Nyamu W, Kotar J, Carrasquilla M, Rowe JA, Rockett K, Kwiatkowski D, Weekes MP, Cicuta P, Williams TN, Rayner JC. Red blood cell tension protects against severe malaria in the Dantu blood group. Nature. 2020;585:579–583. doi: 10.1038/s41586-020-2726-6.
  51. Kariuki SN, Williams TN. Human genetics and malaria resistance. Human Genetics. 2020;139:801–811. doi: 10.1007/s00439-020-02142-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  52. Kierczak M. The contribution of rare whole genome sequencing variants to plasma protein levels and to the missing heritability. Research Square. 2021;1:625433/v1. doi: 10.21203/rs.3.rs-625433/v1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  53. Kinsler G, Geiler-Samerotte K, Petrov D. A Genotype-Phenotype-Fitness Map Reveals Local Modularity and Global Pleiotropy of Adaptation. bioRxiv. 2020 doi: 10.1101/2020.06.25.172197. [DOI] [PMC free article] [PubMed]
  54. Koch M. Plasmodium falciparum erythrocyte-binding antigen 175 triggers a biophysical change in the red blood cell that facilitates invasion. PNAS. 2017;114:4225–4230. doi: 10.1073/pnas.1620843114. [DOI] [PMC free article] [PubMed] [Google Scholar]
  55. Kuypers FA. Use of ektacytometry to determine red cell susceptibility to oxidative stress. The Journal of Laboratory and Clinical Medicine. 1990;116:527–534. [PubMed] [Google Scholar]
  56. Kwiatkowski DP. How malaria has affected the human genome and what human genetics can teach us about malaria. American Journal of Human Genetics. 2005;77:171–192. doi: 10.1086/432519. [DOI] [PMC free article] [PubMed] [Google Scholar]
  57. Leffler EM, Band G, Busby GBJ, Kivinen K, Le QS, Clarke GM, Bojang KA, Conway DJ, Jallow M, Sisay-Joof F, Bougouma EC, Mangano VD, Modiano D, Sirima SB, Achidi E, Apinjoh TO, Marsh K, Ndila CM, Peshu N, Williams TN, Drakeley C, Manjurano A, Reyburn H, Riley E, Kachala D, Molyneux M, Nyirongo V, Taylor T, Thornton N, Tilley L, Grimsley S, Drury E, Stalker J, Cornelius V, Hubbart C, Jeffreys AE, Rowlands K, Rockett KA, Spencer CCA, Kwiatkowski DP, Malaria Genomic Epidemiology Network. Resistance to malaria through structural variation of red blood cell invasion receptors. Science. 2017;356:1140–1152. doi: 10.1126/science.aam6393. [DOI] [PMC free article] [PubMed] [Google Scholar]
  58. Lell B, May J, Schmidt-Ott RJ, Lehman LG, Luckner D, Greve B, Matousek P, Schmid D, Herbich K, Mockenhaupt FP, Meyer CG, Bienzle U, Kremsner PG. The role of red blood cell polymorphisms in resistance and susceptibility to malaria. Clinical Infectious Diseases. 1999;28:794–799. doi: 10.1086/515193. [DOI] [PubMed] [Google Scholar]
  59. Lessard S, Gatof ES, Beaudoin M, Schupp PG, Sher F, Ali A, Prehar S, Kurita R, Nakamura Y, Baena E, Ledoux J, Oceandy D, Bauer DE, Lettre G. An erythroid-specific ATP2B4 enhancer mediates red blood cell hydration and malaria susceptibility. The Journal of Clinical Investigation. 2017;127:3065–3074. doi: 10.1172/JCI94378. [DOI] [PMC free article] [PubMed] [Google Scholar]
  60. Lewontin RC. The apportionment of human diversity. Evolutionary Biology. 1972;6:381–398. doi: 10.1007/978-1-4684-9063-3_14. [DOI] [Google Scholar]
  61. Li H. Aligning Sequence Reads, Clone Sequences and Assembly Contigs with BWA-MEM. arXiv. 2013a. http://arxiv.org/abs/1303.3997
  62. Li J, Glessner JT, Zhang H, Hou C, Wei Z, Bradfield JP, Mentch FD, Guo Y, Kim C, Xia Q, Chiavacci RM, Thomas KA, Qiu H, Grant SFA, Furth SL, Hakonarson H, Sleiman PMA. GWAS of blood cell traits identifies novel associated loci and epistatic interactions in Caucasian and African-American children. Human Molecular Genetics. 2013b;22:1457–1464. doi: 10.1093/hmg/dds534. [DOI] [PMC free article] [PubMed] [Google Scholar]
  63. Lin S-H, Brown DW, Machiela MJ. LDtrait: An online tool for identifying published phenotype associations in linkage disequilibrium. Cancer Research. 2020;80:3443–3446. doi: 10.1158/0008-5472.CAN-20-0985. [DOI] [PMC free article] [PubMed] [Google Scholar]
  64. Lo KS, Wilson JG, Lange LA, Folsom AR, Galarneau G, Ganesh SK, Grant SFA, Keating BJ, McCarroll SA, O’Donnell CJ, Palmas W, Tang W, Tracy RP, Reiner AP, Lettre G. Genetic association analysis highlights new loci that modulate hematological trait variation in Caucasians and African Americans. Human Genetics. 2011;129:307–317. doi: 10.1007/s00439-010-0925-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  65. Lowy-Gallego E, Fairley S, Zheng-Bradley X, Ruffier M, Clarke L, Flicek P, 1000 Genomes Project Consortium. Variant calling on the GRCh38 assembly with the data from phase three of the 1000 Genomes Project. Wellcome Open Research. 2019;4:50. doi: 10.12688/wellcomeopenres.15126.2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  66. Luzzatto L. Sickle cell anaemia and malaria. Mediterranean Journal of Hematology and Infectious Diseases. 2012;4:e2012065. doi: 10.4084/MJHID.2012.065. [DOI] [PMC free article] [PubMed] [Google Scholar]
  67. Ma S, Cahalan S, LaMonte G, Grubaugh ND, Zeng W, Murthy SE, Paytas E, Gamini R, Lukacs V, Whitwam T, Loud M, Lohia R, Berry L, Khan SM, Janse CJ, Bandell M, Schmedt C, Wengelnik K, Su AI, Honore E, Winzeler EA, Andersen KG, Patapoutian A. Common PIEZO1 allele in African populations causes RBC dehydration and attenuates Plasmodium infection. Cell. 2018;173:443–455. doi: 10.1016/j.cell.2018.02.047. [DOI] [PMC free article] [PubMed] [Google Scholar]
  68. Machado HE, Lawrie DS, Petrov DA. Pervasive strong selection at the level of codon usage bias in Drosophila melanogaster. Genetics. 2020;214:511–528. doi: 10.1534/genetics.119.302542. [DOI] [PMC free article] [PubMed] [Google Scholar]
  69. Mackinnon MJ, Mwangi TW, Snow RW, Marsh K, Williams TN. Heritability of malaria in Africa. PLOS Medicine. 2005;2:e340. doi: 10.1371/journal.pmed.0020340. [DOI] [PMC free article] [PubMed] [Google Scholar]
  70. Malaria Genomic Epidemiology Network. Reappraisal of known malaria resistance loci in a large multicenter study. Nature Genetics. 2014;46:1197–1204. doi: 10.1038/ng.3107. [DOI] [PMC free article] [PubMed] [Google Scholar]
  71. Malaria Genomic Epidemiology Network. Insights into malaria susceptibility using genome-wide data on 17,000 individuals from Africa, Asia and Oceania. Nature Communications. 2019;10:5732. doi: 10.1038/s41467-019-13480-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  72. Mayer DCG, Cofie J, Jiang L, Hartl DL, Tracy E, Kabat J, Mendoza LH, Miller LH. Glycophorin B is the erythrocyte receptor of Plasmodium falciparum erythrocyte-binding ligand, EBL-1. PNAS. 2009;106:5348–5352. doi: 10.1073/pnas.0900878106. [DOI] [PMC free article] [PubMed] [Google Scholar]
  73. Mills RE, Luttig CT, Larkins CE, Beauchamp A, Tsui C, Pittard WS, Devine SE. An initial map of insertion and deletion (INDEL) variation in the human genome. Genome Research. 2006;16:1182–1190. doi: 10.1101/gr.4565806. [DOI] [PMC free article] [PubMed] [Google Scholar]
  74. Mockenhaupt FP. Anaemia in pregnant Ghanaian women: importance of malaria, iron deficiency, and haemoglobinopathies. Transactions of the Royal Society of Tropical Medicine and Hygiene. 2000;94:477–483. doi: 10.1016/S0035-9203(00)90057-9. [DOI] [PubMed] [Google Scholar]
  75. Moser KA. Strains used in whole organism Plasmodium falciparum vaccine trials differ in genome structure, sequence, and immunogenic potential. Genome Medicine. 2020;12:1–17. doi: 10.1186/S13073-019-0708-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  76. Nagayasu E, Ito M, Akaki M, Nakano Y, Kimura M, Looareesuwan S, Aikawa M. CR1 density polymorphism on erythrocytes of falciparum malaria patients in Thailand. The American Journal of Tropical Medicine and Hygiene. 2001;64:1–5. doi: 10.4269/ajtmh.2001.64.1.11425154. [DOI] [PubMed] [Google Scholar]
  77. Ndila CM, Uyoga S, Macharia AW, Nyutu G, Peshu N, Ojal J, Shebe M, Awuondo KO, Mturi N, Tsofa B, Sepúlveda N, Clark TG, Band G, Clarke G, Rowlands K, Hubbart C, Jeffreys A, Kariuki S, Marsh K, Mackinnon M, Maitland K, Kwiatkowski DP, Rockett KA, Williams TN, MalariaGEN Consortium. Human candidate gene polymorphisms and risk of severe malaria in children in Kilifi, Kenya: A case-control association study. The Lancet Haematology. 2018;5:e333–e345. doi: 10.1016/S2352-3026(18)30107-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  78. Nguetse CN, Purington N, Ebel ER, Shakya B, Tetard M, Kremsner PG, Velavan TP, Egan ES. A common polymorphism in the mechanosensitive ion channel PIEZO1 is associated with protection from severe malaria in humans. PNAS. 2020;117:9074–9081. doi: 10.1073/pnas.1919843117. [DOI] [PMC free article] [PubMed] [Google Scholar]
  79. Novembre J, Di Rienzo A. Spatial patterns of variation due to natural selection in humans. Nature Reviews. Genetics. 2009;10:745–755. doi: 10.1038/nrg2632. [DOI] [PMC free article] [PubMed] [Google Scholar]
  80. Okwa OO. Malaria Parasites. InTech; 2012. [DOI] [Google Scholar]
  81. Otto TD, Gilabert A, Crellen T, Böhme U, Arnathau C, Sanders M, Oyola SO, Okouga AP, Boundenga L, Willaume E, Ngoubangoye B, Moukodoum ND, Paupy C, Durand P, Rougeron V, Ollomo B, Renaud F, Newbold C, Berriman M, Prugnolle F. Genomes of all known members of a Plasmodium subgenus reveal paths to virulent human malaria. Nature Microbiology. 2018;3:687–697. doi: 10.1038/s41564-018-0162-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  82. Page GP, Kanias T, Guo YJ, Lanteri MC, Zhang X, Mast AE, Cable RG, Spencer BR, Kiss JE, Fang F, Endres-Dighe SM, Brambilla D, Nouraie M, Gordeuk VR, Kleinman S, Busch MP, Gladwin MT, National Heart, Lung, and Blood Institute (NHLBI) Recipient Epidemiology Donor Evaluation Study–III (REDS-III) program. Multiple-ancestry genome-wide association study identifies 27 loci associated with measures of hemolysis following blood storage. The Journal of Clinical Investigation. 2021;131:146077. doi: 10.1172/JCI146077. [DOI] [PMC free article] [PubMed] [Google Scholar]
  83. Pankratov V, Montinaro F, Kushniarevich A, Hudjashov G, Jay F, Saag L, Flores R, Marnetto D, Seppel M, Kals M, Võsa U, Taccioli C, Möls M, Milani L, Aasa A, Lawson DJ, Esko T, Mägi R, Pagani L, Metspalu A, Metspalu M. Differences in local population history at the finest level: The case of the Estonian population. European Journal of Human Genetics. 2020;28:1580–1591. doi: 10.1038/s41431-020-0699-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  84. Pasvol G, Weatherall DJ, Wilson RJM. Cellular mechanism for the protective effect of haemoglobin S against P. falciparum malaria. Nature. 1978;274:701–703. doi: 10.1038/274701a0. [DOI] [PubMed] [Google Scholar]
  85. Pengon J, Svasti S, Kamchonwongpaisan S, Vattanaviboon P. Hematological parameters and red blood cell morphological abnormality of glucose-6-phosphate dehydrogenase deficiency co-inherited with thalassemia. Hematology/Oncology and Stem Cell Therapy. 2018;11:18–24. doi: 10.1016/j.hemonc.2017.05.029. [DOI] [PubMed] [Google Scholar]
  86. Perry GS, Byers T, Yip R, Margen S. Iron nutrition does not account for the hemoglobin differences between blacks and whites. The Journal of Nutrition. 1992;122:1417–1424. doi: 10.1093/jn/122.7.1417. [DOI] [PubMed] [Google Scholar]
  87. Pilia G, Chen W-M, Scuteri A, Orrú M, Albai G, Dei M, Lai S, Usala G, Lai M, Loi P, Mameli C, Vacca L, Deiana M, Olla N, Masala M, Cao A, Najjar SS, Terracciano A, Nedorezov T, Sharov A, Zonderman AB, Abecasis GR, Costa P, Lakatta E, Schlessinger D. Heritability of cardiovascular and personality traits in 6,148 Sardinians. PLOS Genetics. 2006;2:e132. doi: 10.1371/journal.pgen.0020132. [DOI] [PMC free article] [PubMed] [Google Scholar]
  88. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ, Sham PC. PLINK: A tool set for whole-genome association and population-based linkage analyses. American Journal of Human Genetics. 2007;81:559–575. doi: 10.1086/519795. [DOI] [PMC free article] [PubMed] [Google Scholar]
  89. Rooks H, Brewin J, Gardner K, Chakravorty S, Menzel S, Hannemann A, Gibson J, Rees DC. A gain of function variant in PIEZO1 (E756del) and sickle cell disease. Haematologica. 2019;104:e91–e93. doi: 10.3324/haematol.2018.202697. [DOI] [PMC free article] [PubMed] [Google Scholar]
  90. Rosenberg NA, Pritchard JK, Weber JL, Cann HM, Kidd KK, Zhivotovsky LA, Feldman MW. Genetic structure of human populations. Science. 2002;298:2381–2385. doi: 10.1126/science.1078311. [DOI] [PubMed] [Google Scholar]
  91. Rosenberg NA. A population-genetic perspective on the similarities and differences among worldwide human populations. Human Biology. 2011;83:659–684. doi: 10.3378/027.083.0601. [DOI] [PMC free article] [PubMed] [Google Scholar]
  92. Rowe JA. Blood group O protects against severe Plasmodium falciparum malaria through the mechanism of reduced rosetting. PNAS. 2007;104:17471–17476. doi: 10.1073/pnas.0705390104. [DOI] [PMC free article] [PubMed] [Google Scholar]
  93. Ruwende C, Hill A. Glucose-6-phosphate dehydrogenase deficiency and malaria. Journal of Molecular Medicine. 1998;76:581–588. doi: 10.1007/s001090050253. [DOI] [PubMed] [Google Scholar]
  94. Sauna ZE, Kimchi-Sarfaty C. Understanding the contribution of synonymous mutations to human disease. Nature Reviews. Genetics. 2011;12:683–691. doi: 10.1038/nrg3051. [DOI] [PubMed] [Google Scholar]
  95. Schulman S, Roth EF, Jr, Cheng B, Rybicki AC, Sussman II, Wong M, Wang W, Ranney HM, Nagel RL, Schwartz RS. Growth of Plasmodium falciparum in human erythrocytes containing abnormal membrane proteins. PNAS. 1990;87:7339–7343. doi: 10.1073/pnas.87.18.7339. [DOI] [PMC free article] [PubMed] [Google Scholar]
  96. Sennels HP, Jørgensen HL, Hansen ALS, Goetze JP, Fahrenkrug J. Diurnal variation of hematology parameters in healthy young males: The Bispebjerg study of diurnal variations. Scandinavian Journal of Clinical and Laboratory Investigation. 2011;71:532–541. doi: 10.3109/00365513.2011.602422. [DOI] [PubMed] [Google Scholar]
  97. Sohail M, Maier RM, Ganna A, Bloemendal A, Martin AR, Turchin MC, Chiang CW, Hirschhorn J, Daly MJ, Patterson N, Neale B, Mathieson I, Reich D, Sunyaev SR. Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies. eLife. 2019;8:e39702. doi: 10.7554/eLife.39702. [DOI] [PMC free article] [PubMed] [Google Scholar]
  98. Sundararaman SA, Plenderleith LJ, Liu W, Loy DE, Learn GH, Li Y, Shaw KS, Ayouba A, Peeters M, Speede S, Shaw GM, Bushman FD, Brisson D, Rayner JC, Sharp PM, Hahn BH. Genomes of cryptic chimpanzee Plasmodium species reveal key evolutionary events leading to human malaria. Nature Communications. 2016;7:11078. doi: 10.1038/ncomms11078. [DOI] [PMC free article] [PubMed] [Google Scholar]
  99. Supek F, Miñana B, Valcárcel J, Gabaldón T, Lehner B. Synonymous mutations frequently act as driver mutations in human cancers. Cell. 2014;156:1324–1335. doi: 10.1016/j.cell.2014.01.051. [DOI] [PubMed] [Google Scholar]
  100. Tarazona-Santos E, Castilho L, Amaral DRT, Costa DC, Furlani NG, Zuccherato LW, Machado M, Reid ME, Zalis MG, Rossit AR, Santos SEB, Machado RL, Lustigman S. Population genetics of GYPB and association study between GYPB*S/s polymorphism and susceptibility to P. falciparum infection in the Brazilian Amazon. PLOS ONE. 2011;6:e16123. doi: 10.1371/journal.pone.0016123. [DOI] [PMC free article] [PubMed] [Google Scholar]
  101. Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological). 1994;58:267–288. [Google Scholar]
  102. Tiffert T, Lew VL, Ginsburg H, Krugliak M, Croisille L, Mohandas N. The hydration state of human red blood cells and their susceptibility to invasion by Plasmodium falciparum. Blood. 2005;105:4853–4860. doi: 10.1182/blood-2004-12-4948. [DOI] [PMC free article] [PubMed] [Google Scholar]
  103. Timmann C, Thye T, Vens M, Evans J, May J, Ehmen C, Sievertsen J, Muntau B, Ruge G, Loag W, Ansong D, Antwi S, Asafo-Adjei E, Nguah SB, Kwakye KO, Akoto AOY, Sylverken J, Brendel M, Schuldt K, Loley C, Franke A, Meyer CG, Agbenyega T, Ziegler A, Horstmann RD. Genome-wide association study indicates two novel resistance loci for severe malaria. Nature. 2012;489:443–446. doi: 10.1038/nature11334. [DOI] [PubMed] [Google Scholar]
  104. Van der Auwera GA, Carneiro MO, Hartl C, Poplin R, Del Angel G, Levy-Moonshine A, Jordan T, Shakir K, Roazen D, Thibault J, Banks E, Garimella KV, Altshuler D, Gabriel S, DePristo MA. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Current Protocols in Bioinformatics. 2013;43:bi1110s43. doi: 10.1002/0471250953.bi1110s43. [DOI] [PMC free article] [PubMed] [Google Scholar]
  105. van der Harst P, Zhang W, Mateo Leach I, Rendon A, Verweij N, Sehmi J, Paul DS, Elling U, Allayee H, Li X, Radhakrishnan A, Tan S-T, Voss K, Weichenberger CX, Albers CA, Al-Hussani A, Asselbergs FW, Ciullo M, Danjou F, Dina C, Esko T, Evans DM, Franke L, Gögele M, Hartiala J, Hersch M, Holm H, Hottenga J-J, Kanoni S, Kleber ME, Lagou V, Langenberg C, Lopez LM, Lyytikäinen L-P, Melander O, Murgia F, Nolte IM, O’Reilly PF, Padmanabhan S, Parsa A, Pirastu N, Porcu E, Portas L, Prokopenko I, Ried JS, Shin S-Y, Tang CS, Teumer A, Traglia M, Ulivi S, Westra H-J, Yang J, Zhao JH, Anni F, Abdellaoui A, Attwood A, Balkau B, Bandinelli S, Bastardot F, Benyamin B, Boehm BO, Cookson WO, Das D, de Bakker PIW, de Boer RA, de Geus EJC, de Moor MH, Dimitriou M, Domingues FS, Döring A, Engström G, Eyjolfsson GI, Ferrucci L, Fischer K, Galanello R, Garner SF, Genser B, Gibson QD, Girotto G, Gudbjartsson DF, Harris SE, Hartikainen A-L, Hastie CE, Hedblad B, Illig T, Jolley J, Kähönen M, Kema IP, Kemp JP, Liang L, Lloyd-Jones H, Loos RJF, Meacham S, Medland SE, Meisinger C, Memari Y, Mihailov E, Miller K, Moffatt MF, Nauck M, Novatchkova M, Nutile T, Olafsson I, Onundarson PT, Parracciani D, Penninx BW, Perseu L, Piga A, Pistis G, Pouta A, Puc U, Raitakari O, Ring SM, Robino A, Ruggiero D, Ruokonen A, Saint-Pierre A, Sala C, Salumets A, Sambrook J, Schepers H, Schmidt CO, Silljé HHW, Sladek R, Smit JH, Starr JM, Stephens J, Sulem P, Tanaka T, Thorsteinsdottir U, Tragante V, van Gilst WH, van Pelt LJ, van Veldhuisen DJ, Völker U, Whitfield JB, Willemsen G, Winkelmann BR, Wirnsberger G, Algra A, Cucca F, d’Adamo AP, Danesh J, Deary IJ, Dominiczak AF, Elliott P, Fortina P, Froguel P, Gasparini P, Greinacher A, Hazen SL, Jarvelin M-R, Khaw KT, Lehtimäki T, Maerz W, Martin NG, Metspalu A, Mitchell BD, Montgomery GW, Moore C, Navis G, Pirastu M, Pramstaller PP, Ramirez-Solis R, Schadt E, Scott J, Shuldiner AR, Smith GD, Smith JG, Snieder H, Sorice R, Spector TD, Stefansson K, Stumvoll M, Tang WHW, Toniolo D, Tönjes A, Visscher PM, Vollenweider P, Wareham NJ, Wolffenbuttel BHR, Boomsma DI, Beckmann JS, Dedoussis GV, Deloukas P, Ferreira MA, Sanna S, Uda M, Hicks AA, Penninger JM, Gieger C, Kooner JS, Ouwehand WH, Soranzo N, Chambers JC. Seventy-five genetic loci influencing the human red blood cell. Nature. 2012;492:369–375. doi: 10.1038/nature11677. [DOI] [PMC free article] [PubMed] [Google Scholar]
  106. Vuckovic D, Bao EL, Akbari P, Lareau CA, Mousas A, Jiang T, Chen MH, Raffield LM, Tardaguila M, Huffman JE, Ritchie SC, Megy K, Ponstingl H, Penkett CJ, Albers PK, Wigdor EM, Sakaue S, Moscati A, Manansala R, Lo KS, Qian H, Akiyama M, Bartz TM, Ben-Shlomo Y, Beswick A, Bork-Jensen J, Bottinger EP, Brody JA, van Rooij FJA, Chitrala KN, Wilson PWF, Choquet H, Danesh J, Di Angelantonio E, Dimou N, Ding J, Elliott P, Esko T, Evans MK, Felix SB, Floyd JS, Broer L, Grarup N, Guo MH, Guo Q, Greinacher A, Haessler J, Hansen T, Howson JMM, Huang W, Jorgenson E, Kacprowski T, Kähönen M, Kamatani Y, Kanai M, Karthikeyan S, Koskeridis F, Lange LA, Lehtimäki T, Linneberg A, Liu Y, Lyytikäinen LP, Manichaikul A, Matsuda K, Mohlke KL, Mononen N, Murakami Y, Nadkarni GN, Nikus K, Pankratz N, Pedersen O, Preuss M, Psaty BM, Raitakari OT, Rich SS, Rodriguez BAT, Rosen JD, Rotter JI, Schubert P, Spracklen CN, Surendran P, Tang H, Tardif JC, Ghanbari M, Völker U, Völzke H, Watkins NA, Weiss S, VA Million Veteran Program, Cai N, Kundu K, Watt SB, Walter K, Zonderman AB, Cho K, Li Y, Loos RJF, Knight JC, Georges M, Stegle O, Evangelou E, Okada Y, Roberts DJ, Inouye M, Johnson AD, Auer PL, Astle WJ, Reiner AP, Butterworth AS, Ouwehand WH, Lettre G, Sankaran VG, Soranzo N. The polygenic and monogenic basis of blood traits and diseases. Cell. 2020;182:1214–1231. doi: 10.1016/j.cell.2020.08.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  107. Walliker D, Quakyi IA, Wellems TE, McCutchan TF, Szarfman A, London WT, Corcoran LM, Burkot TR, Carter R. Genetic analysis of the human malaria parasite Plasmodium falciparum. Science. 1987;236:1661–1666. doi: 10.1126/SCIENCE.3299700. [DOI] [PubMed] [Google Scholar]
  108. Wang K, Li M, Hakonarson H. ANNOVAR: Functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Research. 2010;38:e164. doi: 10.1093/nar/gkq603. [DOI] [PMC free article] [PubMed] [Google Scholar]
  109. Weatherall DJ. Phenotype-genotype relationships in monogenic disease: Lessons from the thalassaemias. Nature Reviews. Genetics. 2001;2:245–255. doi: 10.1038/35066048. [DOI] [PubMed] [Google Scholar]
  110. Whitfield JB, Martin NG, Rao DC. Genetic and environmental influences on the size and number of cells in the blood. Genetic Epidemiology. 1985;2:133–144. doi: 10.1002/gepi.1370020204. [DOI] [PubMed] [Google Scholar]
  111. WHO. Malaria eradication: Benefits, future scenarios and feasibility. Executive summary, WHO Strategic Advisory Group on Malaria Eradication. 2019. [Accessed August 7, 2021]. https://www.who.int/publications/i/item/who-cds-gmp-2019-10
  112. Williams TN. Human red blood cell polymorphisms and malaria. Current Opinion in Microbiology. 2006;9:388–394. doi: 10.1016/j.mib.2006.06.009. [DOI] [PubMed] [Google Scholar]
  113. Wright GJ, Rayner JC. Plasmodium falciparum erythrocyte invasion: Combining function with immune evasion. PLOS Pathogens. 2014;10:e1003943. doi: 10.1371/journal.ppat.1003943. [DOI] [PMC free article] [PubMed] [Google Scholar]
  114. Yoshida A, Beutler E, Motulsky AG. Human glucose-6-phosphate dehydrogenase variants. Bulletin of the World Health Organization. 1971;45:243–253. [PMC free article] [PubMed] [Google Scholar]
  115. Zámbó B, Várady G, Padányi R, Szabó E, Németh A, Langó T, Enyedi Á, Sarkadi B. Decreased calcium pump expression in human erythrocytes is connected to a minor haplotype in the ATP2B4 gene. Cell Calcium. 2017;65:73–79. doi: 10.1016/j.ceca.2017.02.001. [DOI] [PubMed] [Google Scholar]
  116. Zhang Y. Multiple stiffening effects of nanoscale knobs on human red blood cells infected with Plasmodium falciparum malaria parasite. PNAS. 2015;112:6068–6073. doi: 10.1073/pnas.1505584112. [DOI] [PMC free article] [PubMed] [Google Scholar]

Decision letter

Editor: Jenny Tung

Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Acceptance summary:

This paper finds that common red blood cell phenotypic and genetic variation predicts susceptibility to malarial parasites. Contrary to hypotheses about ancestry-associated malaria selection, however, these variants are not more common in African ancestry populations. Overall, this work presents convincing evidence that in vitro assays of malarial invasion and growth are a practical, effective complement to large-scale genome-wide association studies for understanding the genetics of malarial infection.

Decision letter after peer review:

Thank you for submitting your article "Common host variation drives malaria parasite fitness in healthy human red cells" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by George Perry as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

1. Demonstrate the robustness of the results to removing the children from the single large family (one mother and five children) currently included in the analysis. More generally, demonstrate that genotype-based prediction is not confounded with family membership/relatedness.

2. Demonstrate that the LASSO model retains high accuracy when predicting Plasmodium invasion and growth phenotypes out-of-sample. Here, it will be crucial to completely separate the training set for the model from the test set to which it is applied (beyond internal cross-validation), either by cleanly stratifying the current sample or ideally by extending to new samples.

3. Evaluate the invasion measurements at 72 hours (the "re-invasion" phenotype); consider whether this reduces the very large amount of noise associated with the original 24-hour invasion phenotype.

4. Discuss the generalizability of the current findings to strains beyond the two strains (one lab, one clinical isolate) used in this study, including the rationale for the choice of these strains and the differences in results between them.
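Essential revision 2 asks for a test set that is completely separated from the training data. The following is a minimal sketch of such a split, assuming 73 donors divided into 50 training and 23 test samples. The centring and scaling statistics are estimated from the training rows only and then applied unchanged to the test rows. To keep the example short, closed-form ridge regression stands in for LASSO (like LASSO, its penalty depends on feature scaling); the function names and the penalty `lam=1.0` are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def ridge_fit(Z, y_c, lam):
    """Closed-form ridge on centred data: b = (Z'Z + lam * I)^(-1) Z'y."""
    return np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ y_c)


def holdout_r2(X, y, n_train, lam=1.0, seed=0):
    """Fit on n_train random rows and return R^2 on the remaining rows.

    Scaling statistics (mu, sd, y_mu) come from the training rows only and
    are applied unchanged to the test rows, so no test value leaks into the fit.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    tr, te = idx[:n_train], idx[n_train:]
    mu, sd = X[tr].mean(axis=0), X[tr].std(axis=0)
    y_mu = y[tr].mean()
    b = ridge_fit((X[tr] - mu) / sd, y[tr] - y_mu, lam)
    pred = ((X[te] - mu) / sd) @ b + y_mu
    ss_res = np.sum((y[te] - pred) ** 2)
    ss_tot = np.sum((y[te] - y[te].mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

For example, `holdout_r2(X, y, n_train=50)` on a 73-row matrix fits on 50 rows and scores only the other 23.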

Reviewer #1 (Recommendations for the authors):

1. I'm most concerned about the estimates of predictive accuracy/generalization error. The permutations used to assess predictive accuracy confirm that the particular set of variants chosen by LASSO is more predictive than randomly chosen variants, in this particular sample. However, they don't provide insight into the predictive accuracy of the model out of sample. Although cross-validation should help with that problem, the CV procedure used in glmnet was not clear (also, I assume that α was fixed to 1 throughout, i.e., the authors only used LASSO, not the elastic net; it would be helpful to provide the exact parameters used). As reflected in my public review, I'm also surprised to see such strong predictive accuracy when the repeatability of growth and invasion measures from the same individuals sampled in different weeks (Figure S1) is modest to low. Is the repeatability much higher after controlling for batch and technical effects (which appear to be very substantial based on Table S1)?

2. Related, the predictive accuracy is so good that the results and methods would be very compelling if the model truly generalizes. Towards that end, I think it is essential to use a true out-of-sample test set. Ideally, this could be done by collecting additional samples and phenotyping/genotyping them. Minimally, a cleaner training/test split could be accomplished, e.g., by fitting the model with n = 50 (using internal CV) and then predicting out of sample in the remaining n = 23. Although this compromises sample size in the training set, the model prediction accuracy is so high that it should be robust (note that if this approach is used, it will be important not to leak information from the training set into the test set during data normalization; that is, the values from the training set should not be allowed to influence the values from the test set at all). An additional approach would be to predict the repeated sample phenotype values (from n = 11 donors) based on the n = 73 non-carrier donors with one sample represented per donor (not as good, because the samples in the test set would not be truly independent, but still instructive).

3. If the predictive accuracy does hold up, I think the remarkably large effect sizes need to be reconciled with the difficulty of identifying large effect hits in malaria GWAS. Is this expected based on the strength of the correlation between replication rates in vivo and malaria infection/progression? Are the variants identified in the LASSO model strongly enriched for low p-values in GWAS (beyond linkage to known hits for some subset of variants)?
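Reviewer 1 asks for the exact glmnet settings. In glmnet, `alpha = 1` selects the pure LASSO penalty, and the Gaussian-family objective reduces to (1/(2n))‖y − Xβ‖² + λ‖β‖₁. The sketch below does not call glmnet and is not the authors' pipeline; it is a minimal NumPy implementation of that objective by cyclic coordinate descent, with λ chosen by internal k-fold cross-validation and scaling re-estimated inside each training fold. The λ grid, iteration count, and function names are illustrative assumptions; `k=10` matches the default number of folds in `cv.glmnet`.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=100):
    """Pure-L1 lasso (glmnet alpha = 1) fitted by cyclic coordinate descent.

    Minimises (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1. Assumes every column
    of X is centred with unit variance and y is centred, so no intercept is fit.
    """
    n, p = X.shape
    b = np.zeros(p)
    r = y.astype(float)  # residual y - X b; b starts at zero (astype copies)
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]  # remove feature j's current contribution
            # With unit-variance columns, X[:, j] @ X[:, j] / n == 1.
            b[j] = soft_threshold(X[:, j] @ r / n, lam)
            r -= X[:, j] * b[j]  # restore it with the updated coefficient
    return b


def choose_lambda_by_cv(X, y, lambdas, k=10, seed=0):
    """Return the lambda with the lowest mean squared error over k folds.

    Means and SDs are re-estimated inside each training fold, so the held-out
    fold never influences the scaling or the coefficients it is scored against.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    cv_mse = np.zeros(len(lambdas))
    for te in folds:
        tr = np.setdiff1d(np.arange(len(y)), te)
        mu, sd = X[tr].mean(axis=0), X[tr].std(axis=0)
        y_mu = y[tr].mean()
        for i, lam in enumerate(lambdas):
            b = lasso_cd((X[tr] - mu) / sd, y[tr] - y_mu, lam)
            pred = ((X[te] - mu) / sd) @ b + y_mu
            cv_mse[i] += np.mean((y[te] - pred) ** 2) / k
    return lambdas[int(np.argmin(cv_mse))], cv_mse
```

Picking the λ with the lowest mean cross-validated error corresponds to `cv.glmnet`'s `lambda.min`. Because the same folds both choose λ and estimate its error, this internal figure is still optimistic, which is why a separate test set is requested.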

Reviewer #2 (Recommendations for the authors):

Awesome paper! It was a pleasure to read and very well written. The attention to detail was greatly appreciated. Most of my private recommendations are mainly suggestions for how to improve the presentation of data, but none of them are vital to the manuscript.

1. This isn't necessary, but I would like to suggest a figure that shows the association (pairwise) by carrier status for all of the RBC traits and invasion/growth rate statuses. This could be a heatmap where you would be able to show that certain carriers have a certain pattern of outcomes. You have this already in the text, but it may be easier for the reader to see it in figure format.

2. Most of my private recs are just about figures. Would it be possible to also include the association of RBC traits and African ancestry in Figure 6? I think these are really interesting and not having them in Figure 6 undersells the findings.

3. The scatterplots with the transparent dots are a little confusing to see. I would suggest something like a beeswarm plot for plots like Figure 2A-B, with a separate column for the replicates to show the tight distribution.

4. The first sentence of the discussion reads that "healthy red blood cells (RBCs) harbor extensive phenotypic and genetic variation,". RBCs have no nucleus and therefore no DNA.

Reviewer #3 (Recommendations for the authors):

1. The invasion measurements (fold change parasitaemia over 24 hours) were subject to a tremendous amount of variation, perhaps owing to culture conditions affecting schizont egress and subsequent merozoite invasion of RBCs. The authors acknowledged that these environmental effects could lead to greater experimental noise. How would their invasion measurements and analyses change if they took the parasitaemia measurements of parasites that had already gone through one life-cycle in the test RBCs, e.g. at 72 hours (re-invasion measurements)?

2. Limitations in targeted gene approach: could there be unidentified "disease alleles" in non-carriers that explain the overlap in RBC phenotypes and parasite fitness with carriers? They categorised carriers as those with known RBC disease alleles, mainly in haemoglobin and G6PD genes, and non-carriers as those not carrying these alleles. The genetic variants that they added to their analysis were limited to membrane protein genes. The non-carriers could carry a spectrum of additional gene variants that impact the RBC phenotypes observed, which could therefore influence parasite fitness.

3. The authors used one lab parasite strain and one field parasite isolate for their study. Wouldn't it have been beneficial to also select a variety of parasite strains representing different invasion pathways and growth patterns, to check whether these genetic and RBC phenotypic factors hold true across different strains? Given the limitations of the field isolate, wouldn't it be worthwhile to test other lab strains that use alternative invasion pathways? Also, it would be good to provide a sentence or two explaining the choice of lab and field strains in the study.

4. It was surprising that no variants in the glycophorins and haemoglobin genes were detected, given their important roles in the function of the RBC, and in parasite invasion (in the case of the glycophorins). They have previously been found to have large effect sizes in populations living in malaria endemic regions. Could the authors discuss this?

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Common host variation drives malaria parasite fitness in healthy human red cells" for further consideration by eLife. Your revised article has been evaluated by George Perry as the Senior Editor, and a Reviewing Editor.

The manuscript and response to the previous reviews address nearly all the original reviewer comments and concerns and, overall, represent an excellent contribution to the literature. Revisions to the LASSO prediction analysis now present convincing and realistic evidence that red blood cell phenotypes and common RBC alleles help predict in vitro growth phenotypes.

The remaining issue to be addressed is the inclusion of statistics on training set variance explained as a major result in the text, and as key parts of Figures 4 and 5 (parts B and C of each figure). As the reduction in explanatory power in the external test sets shows, these estimates are over-optimistic and likely a result of overfitting. We ask that you remove the training set statistics from the results and figures, as the test set results alone provide a clearer, more accurate view of model performance to readers, and likely mitigate concerns from readers who are experienced with (and concerned about) overfitting in predictive modeling.

eLife. 2021 Sep 23;10:e69808. doi: 10.7554/eLife.69808.sa2

Author response


Essential revisions:

1. Demonstrate the robustness of the results to removing the children from the single large family (one mother and five children) currently included in the analysis. More generally, demonstrate that genotype-based prediction is not confounded with family membership/relatedness.

We have repeated all the statistical analyses and updated the results after excluding the five children. The major genotypic and phenotypic predictors of P. falciparum replication remain basically the same, except for the family-specific PCs. We interpret this to mean that population structure from the family is unlikely to have biased the results, although some features of those samples are clearly associated with parasite fitness. We now note throughout the manuscript that the children are omitted from the figures and association analyses. To further protect from family confounding, we have also calculated pairwise kinship coefficients using ~400,000 SNPs (Methods) to confirm that no other related individuals were present in our data set.
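The response does not say which kinship estimator the authors used (it defers to their Methods), so the following is only an illustration of what a pairwise kinship check computes. It assumes a KING-robust-style estimator, genotypes coded as alternate-allele counts (0, 1, 2), and `None` for missing calls. The function name is hypothetical. Under this kind of estimator, duplicate samples score about 0.5, first-degree relatives roughly 0.25, and unrelated individuals near 0 or below.

```python
from typing import Optional, Sequence


def king_robust_kinship(g_i: Sequence[Optional[int]], g_j: Sequence[Optional[int]]) -> float:
    """KING-robust-style kinship estimate for two individuals (illustrative sketch).

    Genotypes are alternate-allele counts (0, 1, 2); None marks a missing call.
    phi = (N_het_het - 2 * N_opposite_hom) / (N_het_i + N_het_j),
    counted only over sites genotyped in both individuals.
    """
    het_het = opposite_hom = het_i = het_j = 0
    for a, b in zip(g_i, g_j):
        if a is None or b is None:
            continue  # use only sites called in both samples
        if a == 1:
            het_i += 1
        if b == 1:
            het_j += 1
        if a == 1 and b == 1:
            het_het += 1  # both heterozygous
        elif {a, b} == {0, 2}:
            opposite_hom += 1  # opposite homozygotes share no allele
    if het_i + het_j == 0:
        raise ValueError("no heterozygous sites; kinship is undefined")
    return (het_het - 2 * opposite_hom) / (het_i + het_j)
```

In practice such a score is computed for every donor pair, and pairs above a relatedness cutoff (the KING documentation uses roughly 0.177 for first-degree and 0.0884 for second-degree relatives) are flagged for exclusion.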

2. Demonstrate that the LASSO model retains high accuracy when predicting Plasmodium invasion and growth phenotypes out-of-sample. Here, it will be crucial to completely separate the training set for the model from the test set to which it is applied (beyond internal cross-validation), either by cleanly stratifying the current sample or ideally by extending to new samples.

As suggested, we have performed k-fold cross-validation on training and test sets to demonstrate that the LASSO model retains high accuracy. We have substantially revised the code, Methods, Results, and two main figures to reflect the suggested re-analysis. Over 1,000 random splits of the non-carrier data into 10 folds, we found that RBC traits/genotypes selected from ‘train’ data have prediction accuracy in the separate ‘test’ fold that is significantly higher than expected from shuffled data or random genes. Many of the specific RBC predictors remain the same as in the earlier analysis, although the absolute variance explained in the test data is much smaller than in the train data (maximum R² ≈ 15% with N ≈ 7 vs. ≈ 80% with N ≈ 60). We agree that this true cross-validation approach helps avoid overfitting while still identifying important RBC predictors of P. falciparum growth rate. Overall, the fact that our model retains significant predictive power in out-of-sample data provides strong support for our experimental approach and will stimulate additional research in this area using new and larger groups of samples.
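To make "variance explained in the test data" and "significantly higher than expected from shuffled data" concrete, here is a minimal sketch of the two quantities involved. It assumes R² on a held-out fold is computed against that fold's own mean, and that significance is judged with a one-sided empirical p-value over scores from models refit on shuffled growth values. Both definitions and the function names are assumptions, not the authors' code.

```python
from typing import Sequence


def out_of_sample_r2(y_test: Sequence[float], y_pred: Sequence[float]) -> float:
    """R^2 of predictions on held-out data: 1 - SS_res / SS_tot.

    SS_tot uses the test-fold mean, so predictions worse than that mean
    give a negative score.
    """
    mean = sum(y_test) / len(y_test)
    ss_tot = sum((y - mean) ** 2 for y in y_test)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_test, y_pred))
    return 1.0 - ss_res / ss_tot


def empirical_p_value(observed: float, null_scores: Sequence[float]) -> float:
    """One-sided permutation p-value with the usual +1 correction:
    (1 + number of null scores >= observed) / (1 + number of null scores).
    """
    exceed = sum(1 for s in null_scores if s >= observed)
    return (1 + exceed) / (1 + len(null_scores))
```

The null scores would come from repeating the whole select-and-predict procedure after permuting the growth values (or swapping in random genes), so that the comparison reflects the full pipeline rather than a single fitted model.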

3. Evaluate the invasion measurements at 72 hours (the "re-invasion" phenotype); consider whether this reduces the very large amount of noise associated with the original 24-hour invasion phenotype.

We agree that the manuscript would be improved if the measurements of P. falciparum invasion (24-hour timepoint) were less noisy. In line with the reviewer’s suggestion, we measured the parasitemia at 72 hours (the “re-invasion” phenotype), though please note that this was referred to as “growth” in the original manuscript because it reflects the growth of a ring stage parasite through a complete life cycle of development, egress, and re-invasion. We have altered our experimental diagram in Figure 1C to more clearly indicate that “re-invasion” is part of our “growth” measurement. Notably, we found genetic variants in well-known invasion receptors to be associated only with growth (Figure 5A). We also discuss how several RBC phenotypes are significantly associated with both invasion and growth (Figure 4A), although invasion is noisier for both technical and biological reasons. The discussion now acknowledges our limitations in measuring invasion and offers suggestions for future experiments (lines 551-564).

4. Discuss the generalizability of the current findings to strains beyond the two strains (one lab, one clinical isolate) used in this study, including the rationale for the choice of these strains and the differences in results between them.

We have included more details and citations for the two divergent strains in the results and methods (Lines 146-150, 747-749). We now discuss the strong correlations between the strains, including for specific phenotypes and genotypes, which suggest that our results may be generalizable (Lines 565-575). Given some interesting differences between the strains, we acknowledge that future work would benefit from assaying additional parasite diversity linked to specific invasion pathways.

Reviewer #1 (Recommendations for the authors):

1. I'm most concerned about the estimates of predictive accuracy/generalization error. The permutations used to assess predictive accuracy confirm that the particular set of variants chosen by LASSO are more predictive than randomly chosen variants, in this particular sample. However, they don't provide insight into the predictive accuracy of the model out of sample. Although cross-validation should help with that problem, the CV procedure used in glmnet was not clear (also, I assume that α was fixed to 1 throughout, i.e., the authors only used LASSO, not the elastic net; it would be helpful to provide the exact parameters used). As reflected in my public review, I'm also surprised to see such strong predictive accuracy when the repeatability of growth and invasion measures from the same individuals sampled in different weeks (Figure S1) is modest to low. Is the repeatability much higher after controlling for batch and technical effects (which appear to be very substantial based on Table S1)?

We thank the reviewer for this important critique. As suggested, we have performed k-fold cross validation on training and test sets to demonstrate that the LASSO model retains high accuracy out-of-sample. Over 1000 random splits of the non-carrier data into 10 folds, we found that RBC traits/genotypes selected from ‘train’ data have prediction accuracy in separate ‘test’ folds that is significantly higher than expected from shuffled data or random genes. This analysis adds insight into the predictive accuracy of the selected phenotypes and genotypes in true out-of-sample data, which as expected, is lower than in the same data used to train the models. We have updated Figures 4 and 5 and the accompanying Results text (Lines 253-354, 392-413) to reflect the change in analysis. We apologize for the lack of clarity in the Methods, which have been revised to better explain all cross-validation procedures and parameters used (Lines 824-850). We have also revised the discussion (Lines 507-509; 554-559) to clarify that both biological and technical variation are expected to produce noise across samples from the same individuals collected over weeks to months. Finally, we have clarified the caption of Figure 2—figure supplement 2 to indicate that the data are shown after batch correction.
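The reviewer's note about α can be made concrete. In glmnet, `alpha = 1` (the package default) gives the pure L1 (LASSO) penalty, and for a Gaussian response the objective is (1/2n)·RSS + λ·Σ|β_j|. The pure-Python sketch below fits that objective by cyclic coordinate descent with soft-thresholding. It is not the authors' code: it omits glmnet's default predictor standardization, the λ path with warm starts, and internal cross-validation, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def soft_threshold(z: float, gamma: float) -> float:
    """Soft-thresholding operator S(z, gamma) used in L1 coordinate descent."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X: Sequence[Sequence[float]], y: Sequence[float], lam: float,
                             n_iter: int = 1000, tol: float = 1e-10) -> Tuple[float, List[float]]:
    """Minimize (1/(2n)) * ||y - b0 - X b||^2 + lam * ||b||_1 (alpha = 1, pure LASSO).

    X is a list of n rows with p features each. Returns (intercept, coefficients).
    """
    n, p = len(X), len(X[0])
    # Center features and response so the intercept is left unpenalized.
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]

    beta = [0.0] * p
    resid = yc[:]  # residual yc - Xc @ beta, with beta = 0 at the start
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a constant feature carries no information
            # Correlation of feature j with the partial residual that excludes feature j.
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_means))
    return intercept, beta
```

Coefficients whose correlation with the residual never exceeds λ stay exactly zero, which is how LASSO performs the variable selection the response refers to; larger λ selects fewer predictors.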

2. Related, the predictive accuracy is so good that the results and methods would be very compelling if the model truly generalizes. Towards that end, I think it is essential to use a true out-of-sample test set. Ideally, this could be done by collecting additional samples and phenotyping/genotyping them. Minimally, a cleaner training/test split could be accomplished, e.g., by fitting the model with n = 50 (using internal CV) and then predicting out of sample in the remaining n = 23. Although this compromises sample size in the training set, the model prediction accuracy is so high that it should be robust (note that if this approach is used, it will be important not to leak information from the training set into the test set during data normalization; that is, the values from the training set should not be allowed to influence the values from the test set at all). An additional approach would be to predict the repeated sample phenotype values (from n = 11 donors) based on the n = 73 non-carrier donors with one sample represented per donor (not as good, because the samples in the test set would not be truly independent, but still instructive).

We appreciate this comment, which inspired a thorough re-analysis of the non-carrier data in a k-fold cross-validation framework. In addition to the previous responses, we note here that model performance on held-out test data was highly dependent on which samples were randomly allocated into the (small) test set. To avoid bias from a single randomization of the data into one test and one train set, we performed 1,000 random divisions of the data into 10 folds and treated each fold in turn as a left-out test set. We now report the mean performance in 10 × 1,000 = 10,000 test sets and 10,000 train sets, which should be more representative of the predictive power of the selected variables.
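The splitting scheme described above (1,000 random divisions into 10 folds, each fold held out once) can be sketched as a generator of train/test index pairs. This is an illustration of repeated k-fold cross-validation in general, not the authors' code; the function name, the seeding, and the round-robin dealing of shuffled indices into folds are assumptions.

```python
import random
from typing import Iterator, List, Tuple


def repeated_kfold(n: int, k: int = 10, repeats: int = 1000,
                   seed: int = 0) -> Iterator[Tuple[int, int, List[int], List[int]]]:
    """Yield (repeat, fold, train_idx, test_idx) for repeated random k-fold CV.

    Each repeat shuffles the n sample indices and deals them into k folds of
    near-equal size; every fold serves once as the held-out test set, so the
    generator yields k * repeats train/test pairs in total.
    """
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[f::k] for f in range(k)]  # round-robin assignment to folds
        for f, test in enumerate(folds):
            test_set = set(test)
            train = [i for i in idx if i not in test_set]
            yield r, f, train, sorted(test)
```

With n = 73 (the non-carrier count given in the reviewer's comment) and k = 10, each held-out fold contains 7 or 8 samples, and the default 1,000 repeats produce the 10,000 train/test pairs mentioned in the response. Model selection (for example, choosing λ by internal cross-validation) would happen inside each training set only.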

With respect to leaking information from the test set to the train set, we agree that it is critical to ensure that the parasite values in each set are not unduly influenced by each other. To that end, the parasite data from non-carriers besides the repeated control (1111) were not used to batch-correct the parasite data of other samples (see Figure 2—figure supplement 1 and the updated description of parasite normalization, Lines 776-785).
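The leakage concern applies to any normalization whose parameters are estimated from data. A generic way to respect a train/test split is to estimate those parameters on the training fold only and then apply them unchanged to the held-out fold. The z-scoring below is only an example of that fit/apply pattern, not the parasite or RBC normalization used in the study; the function names are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def fit_scaler(train: Sequence[float]) -> Tuple[float, float]:
    """Estimate centering and scaling parameters from the training fold only."""
    return mean(train), stdev(train)


def apply_scaler(values: Sequence[float], params: Tuple[float, float]) -> List[float]:
    """Transform any fold using parameters that were fit on the training fold."""
    mu, sd = params
    return [(v - mu) / sd for v in values]


train_fold = [1.0, 2.0, 3.0]
test_fold = [4.0, 0.0]
params = fit_scaler(train_fold)  # test values never enter the fit
train_z = apply_scaler(train_fold, params)
test_z = apply_scaler(test_fold, params)
```

Because `test_fold` plays no role in `fit_scaler`, changing any held-out value cannot alter how the training data are transformed, which is the property the reviewer asked for.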

Unlike with the parasite data, we performed a median-based normalization of RBC phenotypes only once using all the non-carrier data. This was a different type of normalization based on equalizing weekly medians, which we chose because we lacked weekly control data (from sample 1111) for about half the weeks for most RBC phenotypes. If this phenotype normalization produced significant leakage across test and train sets, we might expect a spurious signal in models combining phenotypes with random genes. Importantly, we do not detect this (Figure 5), suggesting any leakage from phenotype normalization is minor.
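A median-based normalization that equalizes weekly medians could look like the sketch below. It assumes an additive shift that moves every week's median onto the grand median of all samples; whether the authors shifted additively or rescaled multiplicatively is not stated here, and the function name is hypothetical. Because the grand median and the weekly medians are computed from all non-carrier samples at once, this is exactly the potential leakage path the paragraph describes; a stricter variant would compute the medians from each training fold and apply them to the held-out fold.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Hashable, List, Sequence


def equalize_weekly_medians(values: Sequence[float], weeks: Sequence[Hashable]) -> List[float]:
    """Shift each week's values so that every week has the same median (the grand median)."""
    grand = median(values)
    by_week: Dict[Hashable, List[float]] = defaultdict(list)
    for v, w in zip(values, weeks):
        by_week[w].append(v)
    offset = {w: grand - median(vs) for w, vs in by_week.items()}
    return [v + offset[w] for v, w in zip(values, weeks)]
```

A shift of this kind removes week-to-week differences in the median phenotype value while preserving each sample's rank within its week.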

Finally, we have added text to clarify that RBC phenotypes from the same individual are expected to vary over time (Lines 507-509). Unfortunately, we were unable to directly test our models on the repeated samples because we only collected parasite data (without other RBC phenotypes) for repeated samples after the first assay. We agree that intra-sample variability is an interesting avenue for future, more comprehensive research.

Overall, we appreciate this collection of important and thoughtful recommendations. They motivated a re-analysis that strongly increases our confidence in the RBC phenotypes and genotypes associated with P. falciparum replication and strengthens the manuscript.

3. If the predictive accuracy does hold up, I think the remarkably large effect sizes need to be reconciled with the difficulty of identifying large effect hits in malaria GWAS. Is this expected based on the strength of the correlation between replication rates in vivo and malaria infection/progression? Are the variants identified in the LASSO model strongly enriched for low p-values in GWAS (beyond linkage to known hits for some subset of variants)?

RBCs are certainly important for malaria, and we agree that future studies should take further advantage of existing GWAS data. Nevertheless, we would not expect large RBC effects on parasites in our experiments to translate to large effects in severe malaria GWAS (SM-GWAS). This is primarily because individual immune response, shaped by malaria exposure and age, is a major factor determining SM risk. Interestingly, the main new locus discovered by SM-GWAS (which changes expression of the RBC ion channel ATP2B4) has been associated with both immune cell and RBC phenotypes. A disconnect in ATP2B4 effect sizes for SM-GWAS and RBC traits has also been noted in recent work (Band et al., 2019; Villegas-Mendez et al., 2021). The present association study is the first of its kind in RBCs, and we are reassured that more than half of our growth-associated variants have already been associated with other RBC traits by GWAS, suggesting they are functional or linked to functional variants.

Reviewer #2 (Recommendations for the authors):

Awesome paper! It was a pleasure to read and very well written. The attention to detail was greatly appreciated. Most of my private recommendations are mainly suggestions for how to improve the presentation of data, but none of them are vital to the manuscript.

1. This isn't necessary, but I would like to suggest a figure that shows the association (pairwise) by carrier status for all of the RBC traits and invasion/growth rate statuses. This could be a heatmap where you would be able to show that certain carriers have a certain pattern of outcomes. You have this already in the text, but it may be easier for the reader to see it in figure format.

Thank you for this suggestion. We have added a heatmap showing the phenotypic patterns for each carrier group as a supplement for Figure 3.

2. Most of my private recs are just about figures. Would it be possible to also include the association of RBC traits and African ancestry in Figure 6? I think these are really interesting and not having them in Figure 6 undersells the findings.

We agree that the relationship between RBC traits and African ancestry is very interesting. We have added three of these panels to Figure 6, with the rest available in the figure supplement.

3. The scatterplots with the transparent dots are a little confusing to see. I would suggest something like a beeswarm plot for plots like Figure 2A-B, with a separate column for the replicates to show the tight distribution.

Thank you for this suggestion. We agree that beeswarm plots are very effective for comparing two groups, such as in the new Figure 5—figure supplement 2 and Figure 5—figure supplement 5. When comparing multiple groups, such as in Figures 2 and 3, we found beeswarm plots awkwardly wide when disallowing overlapping points. We now show similar points overlapping at the edges of each column (option “gutter”), which we agree offers a more accurate representation of the distributions than the transparent points.

4. The first sentence of the discussion reads that "healthy red blood cells (RBCs) harbor extensive phenotypic and genetic variation,". RBCs have no nucleus and therefore no DNA.

We have changed “genetic” to “proteomic” in this sentence to avoid confusion.

Reviewer #3 (Recommendations for the authors):

1. The invasion measurements (fold change parasitaemia over 24 hours) were subject to a tremendous amount of variation, perhaps owing to culture conditions affecting schizont egress and subsequent merozoite invasion of RBCs. The authors acknowledged that these environmental effects could lead to greater experimental noise. How would their invasion measurements and analyses change if they took the parasitaemia measurements of parasites that had already gone through one life-cycle in the test RBCs, e.g. at 72 hours (re-invasion measurements)?

We agree that the manuscript would be improved if the measurements of P. falciparum invasion (24-hour timepoint) were less noisy. In line with the reviewer’s suggestion, we measured the parasitemia at 72 hours (the “re-invasion” phenotype), though we referred to this as “growth” because it reflects the growth of a ring stage parasite through a complete life cycle of development, egress, and re-invasion. We have altered our experimental diagram in Figure 1C to more clearly indicate that “re-invasion” is part of our “growth” measurement.

In the discussion (Lines 551-564), we agree that re-invasion might be less noisy than invasion for many reasons, both experimental (e.g. recent time out of the incubator) and biological (e.g. invading host cells of a new genotype). New text in the discussion acknowledges our limitations in measuring invasion and offers suggestions for future experiments.

2. Limitations in targeted gene approach: could there be non-identified "disease alleles" in non-carriers that explain the overlap in RBC phenotypes and parasite fitness with carriers? They categorised carriers as those with known RBC disease alleles, mainly in haemoglobin and G6PD genes, while non-carriers as those not carrying these alleles. The genetic variants that they added to their analysis were limited to membrane protein genes. The non-carriers could carry a spectrum of additional gene variants that impact the RBC phenotypes observed, which could therefore influence parasite fitness.

We agree that unknown ‘disease alleles’ could contribute to the overlap between non-carriers and carriers (Lines 503-507). We analyzed 23 RBC proteins with strong links to P. falciparum in the literature, of which 5 were not membrane proteins (HBB, HBA1, HBA2, G6PD, and HP). We agree that other RBC proteins besides these 23 are likely to contain genetic variation that impacts parasite fitness, although we believe that testing many other, previously unknown proteins would require a much larger sample size (Lines 534-544).

3. The authors used one lab parasite strain and one field parasite isolate for their study, wouldn't it have been beneficial to also select a variety of parasite strains representing different invasion pathways and growth patterns, to check if these genetic and RBC phenotypic factors hold true across different strains? Given the limitations with the field isolate, wouldn't it be worthwhile to test other lab strains that use alternative invasion pathways? Also, it would be good to provide a sentence or two explaining the choice of lab and field strains in the study.

We agree that repeating these experiments with additional strains is a worthwhile endeavor. The current study was primarily limited by our ability to simultaneously culture multiple strains in a limited supply of fresh donor RBCs. For the two divergent strains used, we have included additional details and citations in the results and methods (Lines 146-150, 747-749). We also discuss the strong correlations between the strains, including for specific genotypes and phenotypes, that suggest that our results may be generalizable (Lines 565-575). Given the interesting differences between the strains in their preference for RBCs with African ancestry, we acknowledge that future work would benefit from assaying additional parasite diversity linked to specific invasion pathways.

4. It was surprising that no variants in the glycophorins and haemoglobin genes were detected, given their important roles in the function of the RBC, and in parasite invasion (in the case of the glycophorins). They have previously been found to have large effect sizes in populations living in malaria endemic regions. Could the authors discuss this?

In the hemoglobin genes, we did detect the variants that cause α-thalassemia, sickle cell disease, and hemoglobin C disease (Figure 1B). In non-carriers, we found one common, synonymous variant in HBB that was rarely selected by LASSO and was not associated with any GWAS data (Figure 5-Source data 3). Only one other common variant in HBB exons is present in gnomAD, and though it was not present in our filtered SNP set, it is classified as benign. To clarify the exome variants that we sequenced and analyzed, we have added Figure 1-Source Data 1, which contains annotations and frequencies for all exome variants that passed quality filters.

In the glycophorins, variants with large effect include GYPB*S (Tarazona-Santos et al., 2011), Dantu (Leffler et al., 2017; Field et al., 1994), and deficiencies of GYPA, GYPB, or GYPC. All but GYPB*S are rare in African populations, and we do detect GYPB*S (rs143997559) associated with 3D7 growth. We now point out this specific finding in the results (Lines 402-404).

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Common host variation drives malaria parasite fitness in healthy human red cells" for further consideration by eLife. Your revised article has been evaluated by George Perry as the Senior Editor, and a Reviewing Editor.

The manuscript and response to the previous reviews address nearly all the original reviewer comments and concerns and, overall, represent an excellent contribution to the literature. Revisions to the LASSO prediction analysis now present convincing and realistic evidence that red blood cell phenotypes and common RBC alleles help predict in vitro growth phenotypes.

The remaining issue to be addressed is the inclusion of statistics on training set variance explained as a major result in the text, and as key parts of Figures 4 and 5 (parts B and C of each figure). As the reduction in explanatory power in the external test sets shows, these estimates are over-optimistic and likely a result of overfitting. We ask that you remove the training set statistics from the results and figures, as the test set results alone provide a clearer, more accurate view of model performance to readers, and likely mitigate concerns from readers who are experienced with (and concerned about) overfitting in predictive modeling.

As suggested, we have removed the training set statistics from the results, Figures 4 and 5, and figure supplement 1 of Figure 5, and now only present the test set statistics.

Associated Data


    Data Citations

    1. Ebel ER, Kuypers FA, Lin C, Petrov DA, Egan ES. 2020. Exome Sequencing from Participants in RBC/Malaria Study. NCBI BioProject. PRJNA683732

    Supplementary Materials

    Figure 1—source data 1. Individual genotypes, population frequencies, and protein annotations for exome variants passing quality filters (N~160,000).
    Figure 4—source data 1. Association statistics for individual phenotypic predictors with non-zero LASSO support.
    Figure 5—source data 1. Twenty-three RBC genes with strong links to malaria in the literature.
    Figure 5—source data 2. Proteins present in mature RBCs.

    This list was derived from the Red Blood Cell Collection database (rbcc.hegelab.org) using a medium-confidence filter.

    Figure 5—source data 3. All genetic and phenotypic predictors with non-zero LASSO support.

    Growth predictors selected in at least 40% of train data sets are indicated in bold. Genetic predictors are summarized in Figure 5A. NA indicates predictors that were only present as singletons in the smaller invasion data set.

    Transparent reporting form

    Data Availability Statement

    All data generated or analyzed during this study are included in the manuscript and supporting files. Source data files have been provided for Figures 1, 4, and 5 and other raw data and normalization scripts are available at https://github.com/emily-ebel/RBC (copy archived at https://archive.softwareheritage.org/swh:1:rev:31f953428a4ec5f0fa83201085ada0a0995facb2).

    The following dataset was generated:

    Ebel ER, Kuypers FA, Lin C, Petrov DA, Egan ES. 2020. Exome Sequencing from Participants in RBC/Malaria Study. NCBI BioProject. PRJNA683732

