Proceedings of the National Academy of Sciences of the United States of America. 2012 Jul 23;109(32):13052–13057. doi: 10.1073/pnas.1210585109

Sequence-based association and selection scans identify drug resistance loci in the Plasmodium falciparum malaria parasite

Daniel J Park a,b,1, Amanda K Lukens a,c, Daniel E Neafsey a, Stephen F Schaffner a, Hsiao-Han Chang b, Clarissa Valim c, Ulf Ribacke c, Daria Van Tyne c, Kevin Galinsky a, Meghan Galligan c, Justin S Becker c, Daouda Ndiaye d, Souleymane Mboup d, Roger C Wiegand a, Daniel L Hartl a,b,1,2, Pardis C Sabeti a,b,2, Dyann F Wirth a,c,2, Sarah K Volkman a,c,e,1,2
PMCID: PMC3420184  PMID: 22826220

Abstract

Through rapid genetic adaptation and natural selection, the Plasmodium falciparum parasite—the deadliest of those that cause malaria—is able to develop resistance to antimalarial drugs, thwarting present efforts to control it. Genome-wide association studies (GWAS) provide a critical hypothesis-generating tool for understanding how this occurs. However, in P. falciparum, the limited amount of linkage disequilibrium hinders the power of traditional array-based GWAS. Here, we demonstrate the feasibility and power improvements gained by using whole-genome sequencing for association studies. We analyzed data from 45 Senegalese parasites and identified genetic changes associated with the parasites’ in vitro response to 12 different antimalarials. To further increase statistical power, we adapted a common test for natural selection, XP-EHH (cross-population extended haplotype homozygosity), and used it to identify genomic regions associated with resistance to drugs. Using this sequence-based approach and the combination of association and selection-based tests, we detected several loci associated with drug resistance. These loci included the previously known signals at pfcrt, dhfr, and pfmdr1, as well as many genes not previously implicated in drug-resistance roles, including genes in the ubiquitination pathway. Based on the success of the analysis presented in this study, and on the demonstrated shortcomings of array-based approaches, we argue for a complete transition to sequence-based GWAS for small, low linkage-disequilibrium genomes like that of P. falciparum.


The malaria parasite Plasmodium falciparum imposes a tremendous disease burden on human societies and is responsible for 1.2 million deaths annually (1). Current efforts to eradicate malaria depend on the continued success of antimalarial drugs (2); however, the emergence of drug-resistant parasites threatens to hamper global health efforts to control and eliminate the disease. Understanding the genetic basis of these adaptations will be necessary to maintain effective global health policies in the face of an ever-changing pathogen.

A key to elucidating the genetic basis of drug resistance is identifying the specific genes associated with the phenotype. In human studies of this kind, the genome-wide association study (GWAS) has overtaken the classic candidate gene approach, made affordable by the use of genotyping arrays (or SNP arrays) that measure only a subset of variants in the genome (3). This optimization is only possible because of the extensive correlation between genetic markers (linkage disequilibrium or LD) in the human genome, which allows the subset of SNPs on an array to act as proxies for other markers not present; this process is known as “tagging” (4).

In P. falciparum, however, array-based GWAS is severely limited by the relatively short extent of LD (5–8). Lacking that correlation between genetic markers, genotyping arrays usually cannot detect associations with untyped markers, effectively limiting inferences to markers actually present on the array; even the highest density P. falciparum array reported to date found that LD between adjacent markers on the array was too weak for tagging in African populations (6). Consequently, current P. falciparum arrays cannot confidently capture all causal variants for important phenotypes.

The rapidly decreasing cost of whole-genome sequencing offers a promising solution. In principle, working with a whole-genome sequence allows one to directly assay all mutations segregating in the population, obviating the detection problems associated with short LD. Discovering mutations directly also avoids the ascertainment bias inherent to arrays, bias that is exacerbated when SNP discovery and genotyping are performed in different populations (9). Additionally, the small size of the P. falciparum genome (23 Mb, roughly the size of a human exome), makes it potentially 100-fold cheaper than whole-genome sequencing in humans. As malaria sequencing projects become cost-competitive with genotyping arrays, whole-genome sequencing has the potential to become the most effective approach to performing association studies in malaria.

Here, we test the hypothesis that whole-genome sequencing will identify SNP associations not detected by classic array-based approaches. We apply this method to identify loci in the P. falciparum genome that are associated with antimalarial drug resistance and compare the approach to a standard array-based GWAS. We improve the statistical power of this analysis by adapting a commonly used selection test, the cross-population extended haplotype homozygosity (XP-EHH) test (10), and use it as an association test for positively selected phenotypes. These approaches identify a number of candidate loci associated with antimalarial drug resistance, including genes in the ubiquitination pathway, suggesting that alteration of the parasite's ability to modulate stress may contribute to evasion of drug pressure and development of resistance in P. falciparum.

Results

Forty-Five Parasite Genomes and the Absence of LD.

We chose a population in a West African region near Dakar, Senegal and culture-adapted 45 P. falciparum parasites recently isolated from malaria-infected patients. This population is particularly relevant for these studies because it has recently been exposed to multiple, changing drug regimens as clinical resistance to traditional drugs has emerged (11). We obtained whole-genome sequence data and generated high-quality consensus base calls for an average of 83% of each genome. This process produces 225,623 segregating SNPs, of which 25,757 met our call rate and minor allele frequency criteria for further study (see Methods). Sequence-based SNP calling in P. falciparum is technically challenging because of its extremely AT-rich genome (12, 13). In light of this finding, we validated our sequence-based approach against array-based methods by using a previously described SNP array (6) to genotype 24 of the 45 isolates. Of the 74,656 SNPs assayed by the array, 4,653 meet our call rate and minor allele frequency criteria. We observe nearly perfect concordance between Affymetrix genotypes and sequence genotypes (see Methods).

Our data demonstrate that SNPs in P. falciparum have very little ability to tag neighboring SNPs because of the short LD in the African population from which they were sampled. Although some portions of the genome exhibit significant LD, over 62% of the SNPs in the genome have no LD (r2 < 0.05) between adjacent SNPs, and 87% of the SNPs have insufficient LD to tag their neighbor (Fig. 1A) using the criterion derived from human GWAS (r2 < 0.8) (4). To measure tagging ability directly, we simulate genotyping arrays of various sizes by sampling random subsets of SNPs from our sequence data. We find that the simulated arrays are not able to tag a significant portion of unassayed markers, a result in stark contrast to the performance of human arrays (Fig. 1B). The tagging performance of our own Affymetrix array (tagging only 22.6% of segregating SNPs in Senegal) is even lower than simulated arrays of similar size (Fig. 1B), most likely because of population-based ascertainment biases (9) that were not modeled in our idealized approach. These findings lead us to conclude that array-based studies in P. falciparum will rarely be able to detect signals resulting from mutations not present on the array.
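The adjacent-SNP LD measurement described above can be illustrated with a minimal sketch. It assumes haploid, biallelic genotypes coded 0/1 with None for a missing call; the function names are hypothetical and this is not the code used in the study (the tagging analysis itself was run with PLINK, as stated in Methods).

```python
from typing import List, Optional, Sequence

Call = Optional[int]  # 0 or 1 for the two alleles, None for a missing call


def r_squared(a: Sequence[Call], b: Sequence[Call]) -> Optional[float]:
    """r^2 between two biallelic haploid SNPs, using samples called at both."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    if n == 0:
        return None
    p_a = sum(x for x, _ in pairs) / n            # frequency of allele 1 at SNP a
    p_b = sum(y for _, y in pairs) / n            # frequency of allele 1 at SNP b
    p_ab = sum(1 for x, y in pairs if x == 1 and y == 1) / n  # 1-1 haplotype frequency
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None  # one SNP is monomorphic among the shared calls
    d = p_ab - p_a * p_b                          # linkage disequilibrium coefficient D
    return d * d / denom


def adjacent_r2(snps: Sequence[Sequence[Call]]) -> List[Optional[float]]:
    """r^2 for each pair of neighboring SNPs, given SNPs in genomic order."""
    return [r_squared(snps[i], snps[i + 1]) for i in range(len(snps) - 1)]
```

Under the human-derived criterion quoted above, a SNP would be said to tag its neighbor only when the returned value is at least 0.8.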

Fig. 1.

Simulated P. falciparum arrays are unable to tag SNPs not present on the array. (A) A histogram of LD between adjacent SNPs from sequenced P. falciparum (black). The vast majority of markers have little to no LD with their neighbors (62% of SNPs have r2 ≤ 0.05, 76% have r2 ≤ 0.2, and 87% have r2 ≤ 0.8). This finding contrasts with human studies, where much more of the genome shows moderate to strong LD between neighboring SNPs (gray). (B) Simulated genotyping marker sets of various sizes are plotted against the percentage of the entire sequenced marker set that they are able to tag (with r2 ≥ 0.8). The dashed, identity line depicts the theoretical scenario where all SNPs are in complete linkage equilibrium and no SNP tags another. Because this is true of 87% of SNPs in the malaria sequence data, the increase is almost linear (black dots). This finding contrasts with the array tagging performance seen in human studies (gray dots), where only a small fraction of markers are needed to tag the bulk of the genome, a principle upon which the array-based GWAS depends. The open triangle depicts the actual performance of the Affymetrix-based Broad Institute P. falciparum SNP array (6).

Sequence-Based GWAS.

The goal of these studies is to identify genomic changes associated with changes in parasite response to antimalarial drugs, as measured in the set of 45 independent P. falciparum isolates. We assayed the cultured parasites for in vitro drug responses (measured by IC50) to 12 standard antimalarials: amodiaquine, artemisinin, atovaquone, chloroquine, dihydroartemisinin, halofantrine, lumefantrine, mefloquine, piperaquine, primaquine, pyrimethamine, and quinine. These antimalarials constitute the 12 phenotypes used in our association studies (Fig. S1). Not surprisingly, drugs with similar chemical structures (e.g., halofantrine, lumefantrine, and mefloquine) show a strong correlation in responses (Fig. S2), as has previously been observed (6, 7), and provide the opportunity for cross-validation of SNPs identified in association studies.

To test associations between SNP genotypes and drug response, we use efficient mixed-model association (EMMA). EMMA is a quantitative association approach well-suited for small sample sizes and partially inbred organisms, such as the malaria parasite (14). It is a commonly used tool among mixed-model GWAS approaches (15) and has recently demonstrated effectiveness with P. falciparum drug studies (6). After correcting for multiple testing (Bonferroni correction for 25,757 SNPs, P < 2 × 10⁻⁶), EMMA is able to detect a number of previously known markers of drug resistance, such as four nonsynonymous SNPs in pfcrt (conferring amino acid changes: N75E/K, K76T, Q271E, R371I) (16, 17) associated with chloroquine response, one pfmdr1 SNP (conferring amino acid change: N86Y) (18, 19) associated with halofantrine, lumefantrine, and mefloquine response, and three dhfr SNPs (conferring amino acid changes: N51I, C59R, S108N) (20) associated with pyrimethamine response. We note here that, although mitochondrial and apicoplast genomes were also sequenced, no significant associations were found and the known mitochondrial mutations associated with atovaquone resistance in cytochrome b (codons 268, 133, and 280) (21, 22) were fixed in all 45 individuals for the drug-sensitive alleles. In all, EMMA detects 34 significant SNPs associated with parasite response to five drugs (Fig. S3). Most of these SNPs are in or near previously known associations (8), and five are previously unknown associations with pyrimethamine response (Dataset S1).
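As a worked check of the multiple-testing threshold quoted above, the following sketch computes a Bonferroni cutoff for a family-wise error rate of 0.05 across 25,757 tests. The function name is illustrative only; the numbers are taken from the text.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test P-value cutoff that holds the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# 25,757 SNPs passed the call-rate and allele-frequency filters (from the text).
threshold = bonferroni_threshold(0.05, 25_757)

# Height of the corresponding significance line on a -log10(P) plot.
neg_log10_threshold = -math.log10(threshold)
```

The cutoff comes out just under 2 × 10⁻⁶, in line with the rounded value given in the text.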

Although these sequence-based findings validate the previously known relationship between the pfmdr1 gene and parasite responses to halofantrine, lumefantrine, and mefloquine, it is notable that this association is not detectable by our SNP array (Fig. 2), as the array lacks any markers in pfmdr1 with a sufficiently high minor allele frequency. This finding exemplifies the type of association that can be missed by arrays because of limited LD. Additionally, the agreement between these three drugs at this locus provides validation of this result with respect to structurally related drugs.

Fig. 2.

Mefloquine association signals around the known drug resistance locus pfmdr1. EMMA results are shown for all of chromosome 5 with P values for each SNP on a −log10 scale against physical position. The array-based study (Array 24) does not detect any association at the known pfmdr1 locus because the array lacks marker coverage within the gene and there is insufficient LD around the gene. The sequence-based study with the same 24 samples (Seq 24) detects the expected hit at 0.96 Mb. Including all samples from the sequence-based study (Seq 45) increases the strength of this signal. The dashed line indicates the Bonferroni-corrected significance threshold (P = 0.05, genome-wide SNP counts are 7,068, 17,278, and 25,159, respectively).

Using Haplotype-Based Selection Tests for Association.

To test the hypothesis that drug resistance is largely driven by positive selection, we searched for long haplotypes associated with selection for drug resistance using the XP-EHH test (10). This selection test has not previously been used as a GWAS tool, but it is well suited for this purpose when we presume that the phenotype we are studying is under positive selection. Although this assumption is not valid for most human-based GWAS for noncommunicable diseases, it is very likely to be the case when studying parasite genomes for resistance adaptations to widely used drugs, which represent a strong selective pressure. Used in this way, the XP-EHH test identifies areas in the genome where resistant parasites show much longer haplotypes than sensitive parasites, indicative of recent positive selection on the resistant population. In our data, the test detects a number of signals, including pfcrt and dhfr, as well as a number of other hits spanning a total of 32 genomic regions across 11 drugs (Fig. 3, Fig. S4, and Dataset S1). Seventeen of these regions are indicative of selection in the drug-resistant population, whereas 15 are consistent with selection in the drug-sensitive population. With the exception of the regions containing pfcrt and dhfr, none of these loci were detected by EMMA alone.
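The haplotype-length idea behind this test can be illustrated with a minimal sketch of the underlying EHH statistic: the probability that two randomly chosen haplotypes from a group are identical over the whole interval from a core SNP to a given marker. This is only an illustration under simplifying assumptions (fully imputed 0/1 haplotypes, marker indices instead of genetic distance); it is not the XP-EHH implementation used in the study, and it omits the integration of EHH over distance, the log-ratio between groups, and the normalization to z-scores.

```python
from collections import Counter
from math import comb
from typing import Sequence


def ehh(haplotypes: Sequence[Sequence[int]], core: int, pos: int) -> float:
    """Extended haplotype homozygosity between marker indices core and pos.

    Haplotypes are grouped by their alleles at every marker in the closed
    interval; EHH is the fraction of haplotype pairs that fall in one group.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    lo, hi = sorted((core, pos))
    classes = Counter(tuple(h[lo:hi + 1]) for h in haplotypes)
    identical_pairs = sum(comb(count, 2) for count in classes.values())
    return identical_pairs / comb(n, 2)
```

In a comparison of the kind described above, one would compute this decay separately for the resistant and the sensitive parasites; a much slower decay in one group is what suggests recent positive selection in that group.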

Fig. 3.

Significant signals of drug-associated selection across five antimalarial drugs. XP-EHH results are shown using a Manhattan-inspired plot, with SNP z-scores plotted against genomic position, with each chromosome colored separately. Positive z-scores suggest selection in drug-resistant parasites, negative z-scores suggest selection in sensitive parasites. The dashed lines indicate the two-sided Bonferroni significance thresholds (P = 0.025 and 0.975). Only drugs with significant hits are shown here; z-score and quantile-quantile plots for all drugs are shown in Fig. S4.

Although this approach does not detect the known pfmdr1 locus, this is consistent with our expectations because of the nature of the test. The N86Y mutation in pfmdr1 confers increased susceptibility (18, 19) to many drugs compared with the wild-type allele. As such, this SNP would not be an expected candidate for positive natural selection on a novel variant, the type of selection XP-EHH is designed to detect. Moreover, the absence of a pfmdr1 signal from the XP-EHH test is consistent with the lack of findings in this gene from previous genomic scans for positive selection based on the relative EHH, iHS (integrated haplotype statistic), and XP-EHH tests in multiple populations (5, 6, 23).

In searching for long haplotypes, the XP-EHH test typically identifies a large number of significant SNPs in close proximity to each other. These regions often span many tens of kilobases and several annotated genes. This result is expected because the process of positive natural selection increases the prevalence of both the selected variant as well as of nearby variants, generating local regions of extended haplotypes. Thus, although XP-EHH strongly implicates these 32 regions as areas of phenotype-associated positive selection, by itself it is usually unable to localize the source of this selection to a specific gene. We use P values from EMMA to improve signal localization by identifying the strongest signals of association within each region. This approach allows us to suggest a possible gene or mutation as a focus of phenotype-specific positive selection for each identified region (Dataset S1) and is reminiscent of earlier approaches that intersect selection and association results (23, 24).
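The localization step described above amounts to picking, within each selected region, the SNP with the smallest single-marker P value. A minimal sketch follows; the data layout (a dictionary keyed by chromosome and position) and the function name are assumptions made for illustration, and the example values in the usage note are invented.

```python
from typing import Dict, Optional, Tuple

Region = Tuple[str, int, int]      # (chromosome, start, end), inclusive coordinates
SnpKey = Tuple[str, int]           # (chromosome, position)


def strongest_snp_in_region(
    region: Region, emma_pvalues: Dict[SnpKey, float]
) -> Optional[Tuple[str, int, float]]:
    """Return (chromosome, position, P) of the lowest-P SNP inside the region."""
    chrom, start, end = region
    inside = [
        (p, pos)
        for (c, pos), p in emma_pvalues.items()
        if c == chrom and start <= pos <= end
    ]
    if not inside:
        return None  # no tested SNP falls in this region
    p, pos = min(inside)  # smallest P value; ties broken by position
    return (chrom, pos, p)
```

For example, with made-up P values at three positions on one chromosome, the call returns the position carrying the smallest value that lies inside the region bounds and ignores stronger signals outside them.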

A more comprehensive examination of the regions under drug-associated selection reveals discrete biological pathways and processes that may be particularly important as mediators of drug response in P. falciparum (SI Results). The 59 genes in these 32 regions can be functionally classified as surface molecules or transporters, genome maintenance or transcriptional regulation, metabolic enzymes including lipid metabolizers, and members of the ubiquitin proteasome system. Most surface molecule-associated mutations and intergenic mutations are localized to intrachromosomal clusters containing var, rifin, and stevor genes, and a number of genes are found among molecules modulating ubiquitination, lipid metabolism, or folate metabolism. Members of these pathways are also represented in the large region of pyrimethamine-specific selection on chromosome 6, where it is difficult to localize the focus of selection. Collectively, these findings argue that certain biological processes in general, and genes in the ubiquitination and lipid metabolism pathways in particular, play important roles in modulating drug responses in P. falciparum.

Discussion

Complete genome sequencing provides many advantages over array-based genotyping for association studies. These advantages include the ability to directly type the causal allele, the increased detection power from increased marker density, and the ability to overcome ascertainment biases that arise when studying different populations with a fixed marker set. In P. falciparum, the lack of tagging ability because of the near absence of long-range LD limits the utility of arrays for association studies. Furthermore, the small genome size of P. falciparum brings the cost of whole-genome sequencing to approximate parity with traditional genotyping arrays, and recent advances in pathogen-specific DNA-enrichment and host-specific DNA-depletion techniques for clinical samples make the sequence-based GWAS approach more accessible and cost-effective than ever before (13, 25).

We introduce a selection-association approach based on the XP-EHH selection test. Although this approach may not be appropriate for many association studies, it is sensible when the phenotype under study is under strong selection, which is likely the case for drug resistance in pathogens. As a haplotype-based test that takes advantage of multiple, adjacent SNPs, it has the advantage of being more sensitive than single-marker approaches like EMMA, given the same sample size (4). In addition to detecting new signals of drug-associated selection, we also find that the directional nature of the test statistic, a z-score, provides useful information about whether the selection is associated with drug sensitivity or resistance. Consequently, we also introduce an alternative visualization of the output: a Manhattan-like plot of z-scores, instead of −log10 P values, to illustrate the directionality of the signals (Fig. 3). In our data, we observed a tendency for many drugs (artemisinin, dihydroartemisinin, primaquine, halofantrine, lumefantrine, and mefloquine) to show highly significant signals of selection for drug sensitivity at pfcrt, the gene known to be responsible for chloroquine resistance (Fig. S4). Although, in principle, this type of signal may result from selection toward drug sensitivity, in this particular case it most likely results from the general pattern of anticorrelation between chloroquine and these six other drugs (Fig. S2). Additionally, the absence of a significant chloroquine sensitivity signal at pfcrt is consistent with reports that the return of chloroquine-sensitive parasites in Africa did not result from a classic selective sweep (26). In either case, the Manhattan-like z-score plots allow us to note the presence of these drug-sensitivity signals while keeping them visually separate from the drug resistance signals on which we wish to focus.

Our approaches identify a significant number of loci associated with changes in drug response (Dataset S1). The strongest of these loci contain previously known mediators of resistance, such as the mutations in pfcrt, pfmdr1, and dhfr. Curation of our remaining results using a variety of gene and protein prediction algorithms and literature searches (27) points to several cellular processes and pathways of potential interest, including the ubiquitin proteasome system, lipid metabolism, and folate metabolism (Dataset S1). We argue that these findings point to biological processes used by the parasite to survive drug pressure or circumvent the action of antimalarial compounds. Other genes of interest include those encoding three ABC transporters—a class of transporters known to modulate drug responses in other organisms (28)—and genes proposed to modulate chromatin (29, 30), DNA repair (31, 32), or RNA binding (33), pathways that have been shown to potentially be altered in response to drug pressure.

A number of the signals of recent positive selection are unique to pyrimethamine-resistant parasites. Although the known resistance locus, dhfr, is present among these, there are even stronger signals of pyrimethamine-associated selection on chromosome 6 and chromosome 12. The region on chromosome 6 contains two previously uncharacterized genes proposed to participate in folate metabolism (PFF1360w and PFF1490w), as well as five genes encoding proteins acting as either chaperones or in ubiquitination (PFF1365c, PFF1485w, PFF1445c, PFF1415c, and PFF1505w), and three genes encoding molecules likely to modulate lipid metabolism (PFF1350c, PFF1375c-a/b, and PFF1420w). In the chromosome 12 region, the XP-EHH test produces significant P values for eight SNPs over a 15-kb region spanning five adjacent genes. The extended haplotypes surrounding these SNPs continue even further, spanning 28 kb and 14 genes in total (Fig. 4A). These results present challenges for experimental validation, as the goal of association studies is to generate a small number of testable hypotheses about molecular mechanisms. Fortunately, the use of EMMA P values in this region can assist in localizing the signal. We find that the strongest EMMA SNP coincides with the strongest XP-EHH SNP, which is a nonsynonymous mutation in PFL2100w, a putative ubiquitin-conjugating enzyme (E2) (Fig. 4B). Additionally, a significant, pyrimethamine-specific selection signal on chromosome 8 is entirely contained within MAL8P1.23 [a putative HECT (homologous to the E6-AP carboxyl terminus) ubiquitin ligase E3] (Dataset S1), another gene in the ubiquitin-mediated pathway (34). Given the role of this pathway in directing protein degradation and recycling, it is possible that alterations in these genes create changes in stress responses or protein turnover of key resistance modulators that allow the parasite to survive under drug pressure.

Fig. 4.

Localizing the pyrimethamine-associated selection signal on chromosome 12. (A) Defining the region: XP-EHH identifies eight genome-wide significant SNPs in close proximity on chromosome 12. Each of these eight SNPs represents the center of an area of extended haplotype homozygosity, as measured by the EHH statistic. Haplotype decay for resistant parasites is plotted for each of these eight SNPs, which defines a larger region from 1.807 Mb to 1.835 Mb in which the causal mutation may exist. This region spans 28 kb and 14 genes. (B) Localizing the signal: focusing within this region, we use single-marker association signals from EMMA to localize the signal. The most significant EMMA SNP coincides with the most significant XP-EHH SNP and localizes to an E398D amino acid change in PFL2100w (ubiquitin conjugating enzyme E2).

The evolution of drug resistance in the natural setting is likely to be a multistep process and our work potentially identifies key pathways involved in this process. Field-based evidence has demonstrated a reduced fitness for drug-resistant parasites in the absence of drug pressure, and laboratory-based work has demonstrated the relative fitness of different mutational changes in target enzymes. Our findings point to potential compensatory mutations in a pathway related to protein stability and turnover, and it is tempting to speculate that such adaptations enable the “expression” of a resistant phenotype, such as has been observed in yeast (35). Although molecular approaches are required to validate the role of this pathway in modulating drug response, these results demonstrate the potential for sequence-based GWAS approaches to identify pathways, in addition to individual genes, that may be responsible for the phenotype of interest.

Ultimately, all association results require experimental validation and follow-up work to explore possible mechanisms of action. Association studies, even in their ideal form, simply generate hypotheses based on correlations. However, improved methods for association studies can significantly reduce the necessary validation work by reducing false-positive rates, increasing study-detection power, and improving localization ability. This study successfully pilots the use of whole-genome sequence data for association studies in malaria and demonstrates significant advantages in detection power over array-based studies. We strongly recommend that future association studies in low-LD, small-genome organisms adopt the sequence-based GWAS approach as well, given the relative costs. We additionally demonstrate the effectiveness of the XP-EHH selection test as an association test for phenotypes under positive selection. Finally, we combine data from both tests to localize long signals and reduce the number of hypotheses for follow-up validation. This combined approach identifies more candidate loci than with single-marker tests alone.

Materials and Methods

Sequencing.

Parasites were obtained from patients with uncomplicated mild malaria in Senegal from 2001 to 2009 under ethical approval from the Institutional Review Board at the Harvard School of Public Health under protocol #16330-106 with informed consent for the study. Parasites were culture-adapted by standard methods (36) and genomic DNA was extracted from 45 single-clone samples. Samples were determined to be monoclonal and genetically distinct by a 24-SNP molecular barcode (37). Genomic DNA was sequenced using Illumina Hi-Seq machines. The first 12 parasites were sequenced with 76-bp single-end reads and the remaining 33 were sequenced with paired-end reads ranging from 76 bp to 101 bp in length. The median sequence coverage depth was 144.8× after alignment (ranging from 32× to 400×). Reads were aligned with the Burrows-Wheeler Aligner (BWA) v0.5.9-r16 against the 3D7 reference assembly (PlasmoDB v7.1). A consensus sequence was called for each strain using the GATK Unified Genotyper v1.2.3-g61b89e2 (38) with the following parameters: -A AlleleBalance -stand_emit_conf 0 --output_mode EMIT_ALL_SITES. Bases were then removed if they exhibited poor quality (GQ less than 30 or QUAL less than 60) or if they called a heterozygous genotype. This process left consensus calls for 56–91% of the genome (83% median) for each of 45 individuals. Of these sites, 225,623 positions are polymorphic among the 45 individuals. Of these SNPs, only 25,757 had genotypes in at least 36 individuals (80% call rate) and were nonsingletons (i.e., minor allele count > 1 or minor allele frequency > 4%). All analyses are based on this set of 25,757 SNPs. SNP data are available in dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP/) as batch Pf_0004 from submitter BROAD-GENOMEBIO. SNPs have been deposited at PlasmoDB (27) v9.1 to allow easy searching and visualization in combination with other malaria genomic data sets. SNP data can also be found in ref. 39. Consensus calls for the whole genome are available in ref. 40.
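The call-rate and nonsingleton criteria above can be written as a short filter. The sketch below assumes each SNP is a list of per-sample allele calls with None for a filtered or missing call, and assumes biallelic sites; the function and parameter names are hypothetical and this is not the pipeline used in the study.

```python
from collections import Counter
from typing import Optional, Sequence


def passes_filters(
    calls: Sequence[Optional[str]],
    min_call_rate_pct: int = 80,   # e.g. at least 36 of 45 samples
    min_minor_count: int = 2,      # nonsingleton: minor allele seen more than once
) -> bool:
    """True if a SNP meets the call-rate and minor-allele-count criteria."""
    called = [c for c in calls if c is not None]
    # Integer arithmetic avoids floating-point edge cases at exactly 80%.
    if len(called) * 100 < min_call_rate_pct * len(calls):
        return False
    counts = sorted(Counter(called).values(), reverse=True)
    if len(counts) < 2:
        return False  # not polymorphic among the called samples
    return counts[1] >= min_minor_count  # count of the second most common allele
```

With 45 samples, a site called in 36 samples with two copies of the minor allele passes, whereas a site called in 35 samples, or one whose minor allele appears only once, does not.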

Principal component analysis was conducted using the program SMARTPCA (41) in the EIGENSOFT 3.0 package. We applied a local LD correction (nsnpldregress = 2) and found no significant eigenvectors in the population.

Tagging Analysis.

Tagging analysis in Fig. 1B was generated by using PLINK (42) to find tagging SNPs for each SNP that were within 10 kb and had r2 ≥ 0.8. We then simulated genotyping arrays by randomly sampling subsets of SNPs of varying subset sizes and calculating the fraction of total SNPs that are tagged by the subset. We first reduced the sequence data to 40 random individuals to simulate ascertainment bias against low allele-frequency markers, then randomly sampled markers that were still polymorphic among the smaller population size to simulate a genotyping array. We simulated 19 different array sizes, ranging from 5% of the sequenced SNPs (1,227) to 95% of the sequenced SNPs (22,087). Two-hundred simulations per array size were run and the result was highly consistent: 95% confidence intervals were too small to visualize on the figure. Simulations for the human genome were based on 60 diploid individuals of European descent (CEU) from Hapmap release 23a. Each iteration chose 54 random individuals to simulate ascertainment bias and filtered SNPs to an 80% call rate and to nonsingletons. Our Affymetrix array was able to tag 5,508 SNPs in our sequence data using the 4,894 SNPs on the array that overlapped with the 25,757 SNPs in our sequence data (open triangle in Fig. 1B). Histograms in Fig. 1A are binned into 20 evenly spaced bins of r2 from 0 to 1. The plot is normalized such that the sum of all bars in each histogram is equal to 1 to show the relative proportions of SNPs in each bin. Simulation data are provided at ftp://ftp.broadinstitute.org/pub/malaria/pnas-park-2012-suppfile-1.zip (39).
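The array simulation can be sketched as follows, assuming the tagging partners of each SNP (within 10 kb and r2 ≥ 0.8) have already been computed, for example from PLINK output. The names are hypothetical, a SNP on the simulated array is counted as tagging itself (consistent with the identity line in Fig. 1B), and the step that subsamples individuals to mimic ascertainment bias is omitted for brevity.

```python
import random
from typing import Dict, Hashable, Sequence, Set


def tagged_fraction(
    array_snps: Sequence[Hashable],
    partners: Dict[Hashable, Set[Hashable]],
    n_total: int,
) -> float:
    """Fraction of all SNPs that are on the array or tagged by an array SNP."""
    covered = set(array_snps)
    for snp in array_snps:
        covered |= partners.get(snp, set())
    return len(covered) / n_total


def simulate_array(
    all_snps: Sequence[Hashable],
    partners: Dict[Hashable, Set[Hashable]],
    fraction: float,
    rng: random.Random,
) -> float:
    """Draw a random array containing `fraction` of the SNPs and score it."""
    k = round(fraction * len(all_snps))
    chosen = rng.sample(list(all_snps), k)
    return tagged_fraction(chosen, partners, len(all_snps))
```

When no SNP tags any other, the tagged fraction equals the array fraction, which is the dashed identity line in Fig. 1B; repeating simulate_array many times per array size gives the spread of outcomes.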

Drug Assays.

Drug assays were performed as previously described (43) with slight modifications for 384-well format (SI Methods). The range of drug concentrations is shown in Fig. S1, and the IC50 data, along with raw input data for all association tests, are provided in ref. 39.

EMMA.

Single marker association tests were run using EMMA (14). Because not all drugs have complete phenotype data for all 45 individuals, SNPs are additionally filtered to those that met our previous call rate and minor allele frequency criteria among the subset of samples for which drug data exist. This filtering results in 23,000–25,180 SNPs for any given drug. Log10 (IC50) values were used for this quantitative test. Biological replicates of drug data were presented to EMMA as multiple individuals from the same genetic strain, which allows EMMA to use the additional data to discern heritable phenotypic variance from nonheritable variance (15) and mimics the use of clonally identical parasites in other studies (44, 45). SNPs were defined as significant if they passed a Bonferroni-corrected threshold of P < 0.05 and also survived 60% of jackknife simulations. EMMA results were jackknifed by performing 200 random subsets of 38 samples and requiring a false-discovery rate-corrected significance of Q < 0.1. SNPs that passed this threshold in 60% of jackknife simulations were considered to be robust against false-positives because of small sample-size effects.
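The jackknife robustness filter can be sketched generically. The association test itself is represented by a caller-supplied function (here named call_significant, a placeholder) that takes a subset of samples and returns the SNPs passing the per-subset criterion; nothing below reproduces EMMA, and all names are assumptions for illustration.

```python
import random
from collections import Counter
from typing import Callable, Dict, Hashable, Iterable, Sequence, Set


def jackknife_support(
    samples: Sequence[Hashable],
    call_significant: Callable[[Sequence[Hashable]], Iterable[Hashable]],
    n_iter: int = 200,       # number of random subsets, as in the text
    subset_size: int = 38,   # samples kept per subset, as in the text
    seed: int = 0,
) -> Dict[Hashable, float]:
    """Fraction of random subsets in which each SNP is called significant."""
    rng = random.Random(seed)
    hits: Counter = Counter()
    for _ in range(n_iter):
        subset = rng.sample(list(samples), subset_size)
        hits.update(set(call_significant(subset)))  # count each SNP once per subset
    return {snp: count / n_iter for snp, count in hits.items()}


def robust_snps(support: Dict[Hashable, float], min_support: float = 0.60) -> Set[Hashable]:
    """SNPs that stay significant in at least min_support of the subsets."""
    return {snp for snp, frac in support.items() if frac >= min_support}
```

A SNP whose signal depends on one or two particular samples will drop out of many subsets and fall below the 60% cutoff, which is the small-sample artifact the filter is meant to catch.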

XP-EHH.

Selection-association tests were run using the XP-EHH test (10). Each drug defined a partitioning of samples into two “subpopulations” (“sensitive” and “resistant”) based on cutoffs shown in Fig. S1 and provided at ftp://ftp.broadinstitute.org/pub/malaria/pnas-park-2012-suppfile-1.zip (39) (SI Methods). XP-EHH requires a recombination map as input, which we constructed with LDhat v2.1 (46) (SI Methods). XP-EHH also requires fully imputed genotypes. Imputation was performed using PHASE 2.1.1 (47), producing 29,605 nonsingleton SNPs (SI Methods).

XP-EHH computes a significance value for each SNP in the genome, assuming that the SNP forms the haplotype “core” of selection. Because the test identifies long haplotypes, it yields a large number of genome-wide significant SNPs (defined by Bonferroni-corrected P < 0.05) in clustered stretches of the genome. We reduced the set of significant SNPs to a set of significant genomic regions by taking each significant core SNP, computing a window around it within which EHH decayed to 0.05, and merging overlapping windows. This process resulted in a smaller list of significant regions for each drug (Dataset S1). Regions were further filtered by removing those that did not contain at least one core SNP that survived 50% of jackknife simulations. XP-EHH results were jackknifed by running the test on 200 random subsets of 38 samples and requiring a Bonferroni-corrected significance of P < 0.1.
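The window-merging step can be illustrated with a short Python sketch. It assumes each significant core SNP has already been assigned a window as a `(chromosome, start, end)` tuple where EHH decays to 0.05; the tuple layout and chromosome labels are assumptions made for illustration.

```python
def merge_windows(windows):
    """Merge overlapping (chrom, start, end) windows into regions.

    Windows on different chromosomes are never merged. Windows that
    share an endpoint are treated as overlapping.
    """
    merged = []
    for chrom, start, end in sorted(windows):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged
```

Sorting first means each window only has to be compared with the most recently built region, so the merge is a single pass over the sorted list.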

Genotyping Arrays.

A subset of 25 parasites was also hybridized to an Affymetrix array containing 74,656 markers (6). SNPs were called using BRLMM-P from Affy Power Tools v1.10.2 and filtered according to the same methods as Van Tyne et al. (6), resulting in 15,075 validated SNPs, 8,778 of which were polymorphic among the 25 individuals from Senegal. SNP coordinates were converted from PlasmoDB v5.0 coordinates to v7.1 coordinates using whole-genome nucmer alignments (48). Concordance between array and sequencing data was measured for the set of markers for which genotype calls existed by both methods. For 24 samples, nearly perfect concordance between Affymetrix genotypes and sequence genotypes was observed (averaging 99.2% concordance, with all 24 samples above 98.2% concordance). This level of concordance is similar to what is observed with technical replicate hybridizations of the same DNA sample (6). One sample, SenP19.04.c, reported a 28.2% mismatch rate, suggestive of a sample identification error, and was removed from the analysis. EMMA analyses were run on the array data using the same filters and procedures as for the sequence data described above, using 4,514–4,653 SNPs per drug phenotype. Results are shown in Fig. S5. Array data for these 24 samples are available from ref. 39.
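The per-sample concordance measure can be sketched in Python as below. The sketch assumes each sample's calls are stored as a dictionary from marker ID to allele, with `None` marking a missing call; this data layout and the function name are hypothetical.

```python
def concordance(array_calls, seq_calls):
    """Fraction of matching genotypes over markers called by both methods.

    Inputs (assumed layout): {marker_id: allele or None}. Markers missing
    from either dictionary, or with a None call, are ignored.
    """
    shared = [m for m in array_calls
              if array_calls[m] is not None and seq_calls.get(m) is not None]
    if not shared:
        return float("nan")
    matches = sum(array_calls[m] == seq_calls[m] for m in shared)
    return matches / len(shared)
```

A sample whose concordance falls far below the others, as with the 28.2% mismatch rate reported for SenP19.04.c, would stand out immediately in a per-sample table of these values.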

Acknowledgments

We thank the sample collection team in Senegal, including Younouss Diedhiou, Lamine Ndiaye, Amadou Moctar Mbaye, Baba Dieye, Moussa Dieng Sarr, Papa Diogoye Sene, and Ngayo Sy; the technical staff at the Harvard School of Public Health who maintained parasite cultures, including Kayla Barnes, Dave Rosen, Kate Fernandez, and Gilberto Ramirez; members of the P.C.S. laboratory for a careful review of our manuscript, including Kristian Andersen, Chris Edwards, Chris Matranga, Rachel Sealfon, Jesse Shapiro, Ilya Shlyakhter, Matt Stremlau, and Shervin Tabrizi; and those who made contributions to the community database, PlasmoDB.org, that facilitated biological curation of candidate genes presented in this work. This study is supported by the Bill and Melinda Gates Foundation; National Institutes of Health (NIH) Grant 1R01AI075080-01A1; the Ellison Medical Foundation; the Exxon-Mobil Foundation; the NIH Fogarty International Center; the National Institute of Allergy and Infectious Diseases, and Broad Scientific Planning and Allocation of Resources Committee (SPARC); a National Science Foundation Graduate Research Fellowship (to D.J.P.); and fellowships from the Burroughs Wellcome and Packard Foundations (to P.C.S.).

Footnotes

The authors declare no conflict of interest.

Data deposition: The SNP data have been deposited at dbSNP, www.ncbi.nlm.nih.gov/projects/SNP (batch id Pf_0004 from submitter BROAD-GENOMEBIO), are accessible via the Broad Institute, ftp://ftp.broadinstitute.org/pub/malaria/pnas-park-2012-suppfile-1.zip, and have also been deposited in PlasmoDB v9.1, http://plasmodb.org/. The consensus calls for the whole genome are available via the Broad Institute, ftp://ftp.broadinstitute.org/pub/malaria/pnas-park-2012-suppfile-2.vcf.gz.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1210585109/-/DCSupplemental.

References

  • 1. Murray CJL, et al. Global malaria mortality between 1980 and 2010: A systematic analysis. Lancet. 2012;379:413–431. doi: 10.1016/S0140-6736(12)60034-8.
  • 2. malERA Consultative Group on Drugs. A research agenda for malaria eradication: Drugs. PLoS Med. 2011;8:e1000402. doi: 10.1371/journal.pmed.1000402.
  • 3. Altshuler DM, Daly MJ, Lander ES. Genetic mapping in human disease. Science. 2008;322:881–888. doi: 10.1126/science.1156409.
  • 4. de Bakker PIW, et al. Efficiency and power in genetic association studies. Nat Genet. 2005;37:1217–1223. doi: 10.1038/ng1669.
  • 5. Mu J, et al. Plasmodium falciparum genome-wide scans for positive selection, recombination hot spots and resistance to antimalarial drugs. Nat Genet. 2010;42:268–271. doi: 10.1038/ng.528.
  • 6. Van Tyne D, et al. Identification and functional validation of the novel antimalarial resistance locus PF10_0355 in Plasmodium falciparum. PLoS Genet. 2011;7:e1001383. doi: 10.1371/journal.pgen.1001383.
  • 7. Yuan J, et al. Chemical genomic profiling for antimalarial therapies, response signatures, and molecular targets. Science. 2011;333:724–729. doi: 10.1126/science.1205216.
  • 8. Volkman SK, Neafsey DE, Schaffner SF, Park DJ, Wirth DF. Harnessing genomics and genome biology to understand malaria biology. Nat Rev Genet. 2012;13:315–328. doi: 10.1038/nrg3187.
  • 9. Albrechtsen A, Nielsen FC, Nielsen R. Ascertainment biases in SNP chips affect measures of population divergence. Mol Biol Evol. 2010;27:2534–2547. doi: 10.1093/molbev/msq148.
  • 10. Sabeti PC, et al.; International HapMap Consortium. Genome-wide detection and characterization of positive selection in human populations. Nature. 2007;449:913–918. doi: 10.1038/nature06250.
  • 11. Mouzin E, Thior PM, Diouf MB, Sambou B. Focus on Senegal. Progress & impact series, no. 4. Geneva, Switzerland: WHO; 2010. Available at http://www.path.org/publications/detail.php?i=2072.
  • 12. Oyola SO, et al. Optimizing Illumina next-generation sequencing library preparation for extremely AT-biased genomes. BMC Genomics. 2012;13:1. doi: 10.1186/1471-2164-13-1.
  • 13. Melnikov A, et al. Hybrid selection for sequencing pathogen genomes from clinical samples. Genome Biol. 2011;12:R73. doi: 10.1186/gb-2011-12-8-r73.
  • 14. Kang HM, et al. Efficient control of population structure in model organism association mapping. Genetics. 2008;178:1709–1723. doi: 10.1534/genetics.107.080101.
  • 15. Price AL, Zaitlen NA, Reich DE, Patterson N. New approaches to population stratification in genome-wide association studies. Nat Rev Genet. 2010;11:459–463. doi: 10.1038/nrg2813.
  • 16. Fidock DA, et al. Mutations in the P. falciparum digestive vacuole transmembrane protein PfCRT and evidence for their role in chloroquine resistance. Mol Cell. 2000;6:861–871. doi: 10.1016/s1097-2765(05)00077-8.
  • 17. Wootton JC, et al. Genetic diversity and chloroquine selective sweeps in Plasmodium falciparum. Nature. 2002;418:320–323. doi: 10.1038/nature00813.
  • 18. Duraisingh MT, et al. The tyrosine-86 allele of the pfmdr1 gene of Plasmodium falciparum is associated with increased sensitivity to the anti-malarials mefloquine and artemisinin. Mol Biochem Parasitol. 2000;108:13–23. doi: 10.1016/s0166-6851(00)00201-2.
  • 19. Nkhoma S, et al. Parasites bearing a single copy of the multi-drug resistance gene (pfmdr-1) with wild-type SNPs predominate amongst Plasmodium falciparum isolates from Malawi. Acta Trop. 2009;111:78–81. doi: 10.1016/j.actatropica.2009.01.011.
  • 20. Nair S, et al. A selective sweep driven by pyrimethamine treatment in southeast Asian malaria parasites. Mol Biol Evol. 2003;20:1526–1536. doi: 10.1093/molbev/msg162.
  • 21. Kessl JJ, Meshnick SR, Trumpower BL. Modeling the molecular basis of atovaquone resistance in parasites and pathogenic fungi. Trends Parasitol. 2007;23:494–501. doi: 10.1016/j.pt.2007.08.004.
  • 22. Dong CK, et al. Identification and validation of tetracyclic benzothiazepines as Plasmodium falciparum cytochrome bc1 inhibitors. Chem Biol. 2011;18:1602–1610. doi: 10.1016/j.chembiol.2011.09.016.
  • 23. Cheeseman IH, et al. A major genome region underlying artemisinin resistance in malaria. Science. 2012;336:79–82. doi: 10.1126/science.1215966.
  • 24. Kudaravalli S, Veyrieras JB, Stranger BE, Dermitzakis ET, Pritchard JK. Gene expression levels are a target of recent natural selection in the human genome. Mol Biol Evol. 2009;26:649–658. doi: 10.1093/molbev/msn289.
  • 25. Venkatesan M, et al. Using CF11 cellulose columns to inexpensively and effectively remove human DNA from Plasmodium falciparum-infected whole blood samples. Malar J. 2012;11:41. doi: 10.1186/1475-2875-11-41.
  • 26. Laufer MK, et al. Return of chloroquine-susceptible falciparum malaria in Malawi was a reexpansion of diverse susceptible parasites. J Infect Dis. 2010;202:801–808. doi: 10.1086/655659.
  • 27. Aurrecoechea C, et al. PlasmoDB: A functional genomic database for malaria parasites. Nucleic Acids Res. 2009;37(Database issue):D539–D543. doi: 10.1093/nar/gkn814.
  • 28. Leprohon P, Légaré D, Ouellette M. ABC transporters involved in drug resistance in human parasites. Essays Biochem. 2011;50:121–144. doi: 10.1042/bse0500121.
  • 29. Cui L, Miao J. Chromatin-mediated epigenetic regulation in the malaria parasite Plasmodium falciparum. Eukaryot Cell. 2010;9:1138–1149. doi: 10.1128/EC.00036-10.
  • 30. Coleman BI, Duraisingh MT. Transcriptional control and gene silencing in Plasmodium falciparum. Cell Microbiol. 2008;10:1935–1946. doi: 10.1111/j.1462-5822.2008.01203.x.
  • 31. Castellini MA, et al. Malaria drug resistance is associated with defective DNA mismatch repair. Mol Biochem Parasitol. 2011;177:143–147. doi: 10.1016/j.molbiopara.2011.02.004.
  • 32. Tarique M, Satsangi AT, Ahmad M, Singh S, Tuteja R. Plasmodium falciparum MLH is schizont stage specific endonuclease. Mol Biochem Parasitol. 2012;181:153–161. doi: 10.1016/j.molbiopara.2011.10.012.
  • 33. Meng X, et al. Cytoplasmic Metadherin (MTDH) provides survival advantage under conditions of stress by acting as RNA-binding protein. J Biol Chem. 2012;287:4485–4491. doi: 10.1074/jbc.C111.291518.
  • 34. Ponts N, et al. Deciphering the ubiquitin-mediated pathway in apicomplexan parasites: A potential strategy to interfere with parasite virulence. PLoS ONE. 2008;3:e2386. doi: 10.1371/journal.pone.0002386.
  • 35. Jarosz DF, Lindquist S. Hsp90 and environmental stress transform the adaptive value of natural genetic variation. Science. 2010;330:1820–1824. doi: 10.1126/science.1195487.
  • 36. Trager W, Jensen JB. Human malaria parasites in continuous culture. Science. 1976;193:673–675. doi: 10.1126/science.781840.
  • 37. Daniels R, et al. A general SNP-based molecular barcode for Plasmodium falciparum identification and tracking. Malar J. 2008;7:223. doi: 10.1186/1475-2875-7-223.
  • 38. McKenna A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20:1297–1303. doi: 10.1101/gr.107524.110.
  • 39. Broad Institute (2012) Tagging simulation data, drug data, PLINK-formatted input data for both sequence and array data, recombination maps, imputed genotypes, GWAS outputs, and R code for generating all figures. Available at ftp://ftp.broadinstitute.org/pub/malaria/pnas-park-2012-suppfile-1.zip.
  • 40. Broad Institute (2012) Consensus sequence calls for each of 45 strains and 23 million bases. VCF file is bgzip compressed and indexed by tabix and vcftools (.tbi and .vcfidx files are also in this directory). Available at ftp://ftp.broadinstitute.org/pub/malaria/pnas-park-2012-suppfile-2.vcf.gz.
  • 41. Patterson N, Price AL, Reich D. Population structure and eigenanalysis. PLoS Genet. 2006;2:e190. doi: 10.1371/journal.pgen.0020190.
  • 42. Purcell S, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81:559–575. doi: 10.1086/519795.
  • 43. Plouffe D, et al. In silico activity profiling reveals the mechanism of action of antimalarials discovered in a high-throughput screen. Proc Natl Acad Sci USA. 2008;105:9059–9064. doi: 10.1073/pnas.0802982105.
  • 44. Anderson TJC, et al. Inferred relatedness and heritability in malaria parasites. Proc Biol Sci. 2010;277:2531–2540. doi: 10.1098/rspb.2010.0196.
  • 45. Anderson TJC, et al. High heritability of malaria parasite clearance rate indicates a genetic basis for artemisinin resistance in western Cambodia. J Infect Dis. 2010;201:1326–1330. doi: 10.1086/651562.
  • 46. McVean G, Awadalla P, Fearnhead P. A coalescent-based method for detecting and estimating recombination from gene sequences. Genetics. 2002;160:1231–1241. doi: 10.1093/genetics/160.3.1231.
  • 47. Stephens M, Donnelly P. A comparison of bayesian methods for haplotype reconstruction from population genotype data. Am J Hum Genet. 2003;73:1162–1169. doi: 10.1086/379378.
  • 48. Kurtz S, et al. Versatile and open software for comparing large genomes. Genome Biol. 2004;5:R12. doi: 10.1186/gb-2004-5-2-r12.

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences
