Wellcome Open Research. 2021 Jul 13;6:42. Originally published 2021 Feb 24. [Version 2] doi: 10.12688/wellcomeopenres.16168.2

An open dataset of Plasmodium falciparum genome variation in 7,000 worldwide samples

MalariaGENa, Ambroise Ahouidi 1, Mozam Ali 2, Jacob Almagro-Garcia 2,3, Alfred Amambua-Ngwa 2,4, Chanaki Amaratunga 5, Roberto Amato 2,3, Lucas Amenga-Etego 6,7, Ben Andagalu 8, Tim J C Anderson 9, Voahangy Andrianaranjaka 10, Tobias Apinjoh 11, Cristina Ariani 2, Elizabeth A Ashley 12, Sarah Auburn 13,14, Gordon A Awandare 7,15, Hampate Ba 16, Vito Baraka 17,18, Alyssa E Barry 19,20,21, Philip Bejon 22, Gwladys I Bertin 23, Maciej F Boni 14,24, Steffen Borrmann 25, Teun Bousema 26,27, Oralee Branch 28, Peter C Bull 22,29, George B J Busby 3, Thanat Chookajorn 30, Kesinee Chotivanich 30, Antoine Claessens 4,31, David Conway 26, Alister Craig 32,33, Umberto D'Alessandro 4, Souleymane Dama 34, Nicholas PJ Day 12, Brigitte Denis 33, Mahamadou Diakite 34, Abdoulaye Djimdé 34, Christiane Dolecek 14, Arjen M Dondorp 12, Chris Drakeley 26, Eleanor Drury 2, Patrick Duffy 5, Diego F Echeverry 35,36, Thomas G Egwang 37, Berhanu Erko 38, Rick M Fairhurst 39, Abdul Faiz 40, Caterina A Fanello 12, Mark M Fukuda 41, Dionicia Gamboa 42, Anita Ghansah 43, Lemu Golassa 38, Sonia Goncalves 2, William L Hamilton 2,44, G L Abby Harrison 21, Lee Hart 3, Christa Henrichs 3, Tran Tinh Hien 24,45, Catherine A Hill 46, Abraham Hodgson 47, Christina Hubbart 48, Mallika Imwong 30, Deus S Ishengoma 17,49, Scott A Jackson 50, Chris G Jacob 2, Ben Jeffery 3, Anna E Jeffreys 48, Kimberly J Johnson 3, Dushyanth Jyothi 2, Claire Kamaliddin 23, Edwin Kamau 51, Mihir Kekre 2, Krzysztof Kluczynski 3, Theerarat Kochakarn 2,30, Abibatou Konaté 52, Dominic P Kwiatkowski 2,3,48, Myat Phone Kyaw 53,54, Pharath Lim 5,55, Chanthap Lon 41, Kovana M Loua 56, Oumou Maïga-Ascofaré 34,57,58, Cinzia Malangone 2, Magnus Manske 2, Jutta Marfurt 13, Kevin Marsh 14,59, Mayfong Mayxay 60,61, Alistair Miles 2,3, Olivo Miotto 2,3,12, Victor Mobegi 62, Olugbenga A Mokuolu 63, Jacqui Montgomery 64, Ivo Mueller 21,65, Paul N Newton 66, Thuy Nguyen 2, Thuy-Nhien Nguyen 24, Harald Noedl 67, Francois Nosten 14,68, Rintis 
Noviyanti 69, Alexis Nzila 70, Lynette I Ochola-Oyier 22, Harold Ocholla 71,72, Abraham Oduro 6, Irene Omedo 22, Marie A Onyamboko 73, Jean-Bosco Ouedraogo 74, Kolapo Oyebola 75,76, Richard D Pearson 2,3, Norbert Peshu 22, Aung Pyae Phyo 12,68, Chris V Plowe 77, Ric N Price 12,13,45, Sasithon Pukrittayakamee 30, Milijaona Randrianarivelojosia 78,79, Julian C Rayner 2, Pascal Ringwald 80, Kirk A Rockett 2,48, Katherine Rowlands 48, Lastenia Ruiz 81, David Saunders 41, Alex Shayo 82, Peter Siba 83, Victoria J Simpson 3, Jim Stalker 2, Xin-zhuan Su 5, Colin Sutherland 26, Shannon Takala-Harrison 84, Livingstone Tavul 83, Vandana Thathy 22,85, Antoinette Tshefu 86, Federica Verra 87, Joseph Vinetz 42,88, Thomas E Wellems 5, Jason Wendler 48, Nicholas J White 12, Ian Wright 3, William Yavo 52,89, Htut Ye 90
PMCID: PMC8008441  PMID: 33824913

Version Changes

Revised. Amendments from Version 1

We are grateful to the reviewers for their suggestions and have updated the manuscript in response. We now include gene IDs every time a gene is mentioned for the first time in the manuscript. We have replaced “complex rearrangements” in the results section with an explicit description of the event. We have added a paragraph to detail that sample collection is heterogeneous and due care is needed when interpreting the results. No changes have been made to the data or figures.

Abstract

MalariaGEN is a data-sharing network that enables groups around the world to work together on the genomic epidemiology of malaria. Here we describe a new release of curated genome variation data on 7,000 Plasmodium falciparum samples from MalariaGEN partner studies in 28 malaria-endemic countries. High-quality genotype calls on 3 million single nucleotide polymorphisms (SNPs) and short indels were produced using a standardised analysis pipeline. Copy number variants associated with drug resistance and structural variants that cause failure of rapid diagnostic tests were also analysed.  Almost all samples showed genetic evidence of resistance to at least one antimalarial drug, and some samples from Southeast Asia carried markers of resistance to six commonly-used drugs. Genes expressed during the mosquito stage of the parasite life-cycle are prominent among loci that show strong geographic differentiation. By continuing to enlarge this open data resource we aim to facilitate research into the evolutionary processes affecting malaria control and to accelerate development of the surveillance toolkit required for malaria elimination.

Keywords: malaria, plasmodium falciparum, genomics, genomic epidemiology, evolution, data resource, population genetics, drug resistance, rapid diagnostic test failure

Introduction

A major obstacle to malaria elimination is the great capacity of the parasite and vector populations to evolve in response to malaria control interventions. The widespread use of chloroquine and DDT in the 1950s led to high levels of drug and insecticide resistance, and the same pattern has been repeated for other first-line antimalarial drugs and insecticides. Over the past 15 years, mass distribution of pyrethroid-treated bednets in Africa and worldwide use of artemisinin combination therapy (ACT) have led to substantial reductions in malaria prevalence and mortality, but there are rapidly increasing levels of resistance to ACT in Southeast Asian parasites and of pyrethroid resistance in African mosquitoes. A deep understanding of local patterns of resistance and the continually changing nature of the local parasite and vector populations is necessary to manage the use of drugs and insecticides and to deploy public health resources for maximum sustainability and impact.

Current methods for genetic surveillance of the parasite population are largely based on targeted genotyping of specific loci, e.g. known markers of drug resistance. Whole genome sequencing of malaria parasites is currently more expensive and complex, particularly at the stage of data analysis, but it is an important adjunct to targeted genotyping, as it provides a more comprehensive picture of parasite genetic variation. It is particularly important for discovery of new drug resistance markers and for monitoring patterns of gene flow and evolutionary adaptation in the parasite population.

The Plasmodium falciparum Community Project (Pf Community Project) was established with the aim of integrating parasite genome sequencing into clinical and epidemiological studies of malaria (www.malariagen.net/projects). It forms part of the Malaria Genomic Epidemiology Network (MalariaGEN), a global data-sharing network comprising multiple partner studies, each with its own research objectives and led by a local investigator 1 . Genome sequencing was performed centrally, and partner studies were free to analyse and publish the genetic data produced on their own samples, in line with MalariaGEN’s guiding principles on equitable data sharing 1–3 . A programme of capacity building for research into parasite genetics was developed at multiple sites in Africa alongside the Pf Community Project 4 .

The first phase of the project focused on developing simple methods to obtain purified parasite genome DNA from small blood samples collected in the field 5, 6 and on establishing reliable computational methods for variant discovery and genotype calling from short-read sequencing data 7 . This presented a number of analytical challenges due to long tracts of highly repetitive sequence and hypervariable regions within the P. falciparum genome, and also because a single infection can contain a complex mixture of genotypes. Once a reliable analysis pipeline was in place, a process was established for periodic data releases to partners, with continual improvements in data quality as new analytical methods were developed.

Data from the Pf Community Project were initially released through a companion project called Pf3k, whose goal was to bring together leading analysts from multiple institutions to benchmark and standardise methods of variant discovery and genotype calling. A visual analytics web application was developed 8 for researchers to explore the data. The open dataset was enlarged in 2016 when multiple partner studies contributed to a consortial publication on 3,488 samples from 23 countries 9 .

Data produced by the Pf Community Project have been used to address a broad range of research questions, both by the groups that generated samples and data and by the wider research community, and have generated over 50 previous publications (refs 5–55). These data have become a key resource for the epidemiology and population genetics of antimalarial drug resistance 9–22 and an important platform for the discovery of new genetic markers and mechanisms of resistance through genome-wide association studies 23–27 and combined genome-transcriptome analysis 28 . The data have also been used to study gene deletions that cause failure of rapid diagnostic tests 29 ; to characterise genetic variation in malaria vaccine antigens 30, 31 ; to screen for new vaccine candidates 32 ; to investigate specific host-parasite interactions 33, 34 ; and to describe the evolutionary adaptation and diversification of local parasite populations 7, 9, 12, 35–40 .

The Pf Community Project data also provide an important resource for developing and testing new analytical and computational methods. A key area of methods development is quantification of within-host diversity 7, 41–46 , estimation of inbreeding 7, 47 , and deconvolution of mixed infections into individual strains 48, 49 . The data have also been used to develop and test methods for estimating identity by descent 50, 51 , imputation 52 , typing structural variants 53 , designing other SNP genotyping platforms 54 and data visualisation 8, 55 . In a companion study we performed whole genome sequencing of experimental genetic crosses of P. falciparum, and this provided a benchmark to test the accuracy of our genotyping methods, and to conduct an in-depth analysis of indels, structural variants and recombination events which are complicated to ascertain in these population genetic samples 56 .

Here we describe a new release of curated genome variation data on 7,113 samples of P. falciparum collected by 49 partner studies from 73 locations in Africa, Asia, South America and Oceania between 2002 and 2015 (Table 1, Supplementary Data; Supplementary Tables 1 and 2).

Table 1. Count of samples in the dataset.

Countries are grouped into eight geographic regions based on their geographic and genetic characteristics. For each country, the table reports: the number of distinct sampling locations; the total number of samples sequenced; the number of high-quality samples included in the analysis; and the percentage of analysis set samples collected between 2012 and 2015, the most recent sampling period in the dataset. Eight samples were obtained from travellers returning from an endemic country, for whom the precise site of infection could not be determined. These were reported from Ghana (3 sequenced samples/2 analysis set samples), Kenya (2/1), Uganda (2/1) and Mozambique (1/1). “Lab samples” contains all sequences obtained from long-term in vitro cultured and adapted isolates, e.g. laboratory strains. The breakdown by site is reported in Supplementary Table 1 and the list of contributing studies in Supplementary Table 2.

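Region | Country | Sampling locations | Sequenced samples | Analysis set samples | % analysis samples 2012–2015
South America (SAM) | Colombia | 4 | 16 | 16 | 0%
South America (SAM) | Peru | 2 | 23 | 21 | 0%
West Africa (WAF) | Benin | 1 | 102 | 36 | 100%
West Africa (WAF) | Burkina Faso | 1 | 57 | 56 | 0%
West Africa (WAF) | Cameroon | 1 | 239 | 235 | 100%
West Africa (WAF) | Gambia | 4 | 277 | 219 | 67%
West Africa (WAF) | Ghana | 3 | 1,003 | 849 | 56%
West Africa (WAF) | Guinea | 2 | 197 | 149 | 0%
West Africa (WAF) | Ivory Coast | 3 | 70 | 70 | 100%
West Africa (WAF) | Mali | 5 | 449 | 426 | 80%
West Africa (WAF) | Mauritania | 4 | 86 | 76 | 100%
West Africa (WAF) | Nigeria | 2 | 42 | 29 | 97%
West Africa (WAF) | Senegal | 1 | 86 | 84 | 100%
Central Africa (CAF) | Congo DR | 1 | 366 | 344 | 100%
East Africa (EAF) | Ethiopia | 2 | 34 | 21 | 100%
East Africa (EAF) | Kenya | 3 | 129 | 109 | 55%
East Africa (EAF) | Madagascar | 3 | 25 | 24 | 100%
East Africa (EAF) | Malawi | 2 | 351 | 254 | 0%
East Africa (EAF) | Tanzania | 5 | 350 | 316 | 85%
East Africa (EAF) | Uganda | 1 | 14 | 12 | 0%
South Asia (SAS) | Bangladesh | 2 | 93 | 77 | 64%
Western Southeast Asia (WSEA) | Myanmar | 5 | 250 | 211 | 71%
Western Southeast Asia (WSEA) | Western Thailand | 2 | 962 | 868 | 24%
Eastern Southeast Asia (ESEA) | Cambodia | 5 | 1,214 | 896 | 32%
Eastern Southeast Asia (ESEA) | Northeastern Thailand | 1 | 28 | 20 | 75%
Eastern Southeast Asia (ESEA) | Laos | 2 | 131 | 120 | 21%
Eastern Southeast Asia (ESEA) | Viet Nam | 2 | 264 | 226 | 11%
Oceania (OCE) | Indonesia | 1 | 92 | 80 | 73%
Oceania (OCE) | Papua New Guinea | 3 | 139 | 121 | 63%
Returning travellers | Various locations | 0 | 8 | 5 | 0%
Lab samples | Various locations | 0 | 16 | 0 | 0%
Total | | 73 | 7,113 | 5,970 | 52%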

Results

Variant discovery and genotyping

We used the Illumina platform to produce genome sequencing data on all samples and we mapped the sequence reads against the P. falciparum 3D7 v3 reference genome. The median depth of coverage was 73 sequence reads averaged across the whole genome and across all samples. We constructed an analysis pipeline for variant discovery and genotyping, including stringent quality control filters that took into account the unusual features of the P. falciparum genome, incorporating lessons learnt from our previous work 7, 56 and the Pf3k project, as outlined in the Methods section.

In the first stage of analysis we discovered variation at over six million positions, corresponding to about a quarter of the 23 Mb P. falciparum genome (Supplementary Data; Supplementary Table 3). These included 3,168,721 single nucleotide polymorphisms (SNPs): these were slightly more common in coding than non-coding regions and were mostly biallelic. The remaining 2,882,975 variants were predominantly short indels but also included more complex combinations of SNPs and indels: these were much more abundant in non-coding than coding regions, and mostly had at least three alleles. The predominance of indels in non-coding regions has been previously observed and is most likely a consequence of the extreme AT bias which leads to many short repetitive sequences 56, 57 .

For the purpose of this analysis, we excluded all variants in subtelomeric and internal hypervariable regions, mitochondrial and apicoplast genomes, and some other regions of the genome where the mapping of short sequence reads is prone to a high error rate due to extremely high rates of variation 56 . A total of 1,838,733 SNPs (of which 1,626,886 were biallelic) and 1,276,027 indels (or SNP/indel combinations) passed all these filters. The pass rate for SNPs in coding regions (66%) was considerably higher than that for SNPs in non-coding regions (47%), indels in coding regions (37%) and indels in non-coding regions (47%). Finally, we removed samples with a low genotyping success rate or other quality control issues. We also removed replicates and 41 samples with genetic markers of infection by multiple Plasmodium species, leaving 5,970 high-quality samples from 28 countries ( Table 1).

We used coverage and read pair analysis to determine duplication genotypes around mdr1 (PF3D7_0523000), plasmepsin2/3 (PF3D7_1408000 and PF3D7_1408100) and gch1 (PF3D7_1224000), each of which is associated with drug resistance. For each of these three genes we discovered many different sets of breakpoints (29, 10 and 3 pairs of breakpoints for mdr1, gch1, and plasmepsin 2/3, respectively), including a large and complex structural rearrangement involving a triplicated segment embedded within a duplication, in which the triplicated segment is inverted (“dup-trpinv-dup”) 58 that to the best of our knowledge has not been observed before in Plasmodium species (Supplementary Data; Supplementary Note, Supplementary Tables 4–6). We also used sequence read coverage to identify large structural variants that appear to delete or disrupt hrp2 (PF3D7_0831800) and hrp3 (PF3D7_1372200), an event that can cause rapid diagnostic tests to malfunction.

The population genetic analyses in this paper are based on the filtered dataset of high-quality SNP genotypes in 5,970 samples. These data are openly available, together with annotated genotyping data on 6 million putative variants in all 7,113 samples, plus details of partner studies and sampling locations, at www.malariagen.net/resource/26.

Global population structure

The genetic structure of the global parasite population reflects its geographic regional structure 7, 9, 10 as illustrated by a neighbour-joining tree and a principal component analysis of all samples based on their SNP genotypes ( Figure 1). Based on these observations we grouped the samples into eight geographic regions: West Africa, Central Africa, East Africa, South Asia, the western part of Southeast Asia, the eastern part of Southeast Asia, Oceania and South America. Each of these can be viewed as a regional sub-population of parasites, which is more or less differentiated from other regional sub-populations depending on rates of gene flow and other factors. The different regions encompass a range of epidemiological and environmental settings, varying in transmission intensity, vector species and history of antimalarial drug usage. Note these regional classifications are intentionally broad, and therefore overlook many interesting aspects of local population structure, e.g. a distinctive Ethiopian sub-population can be identified by more detailed analysis of African samples 12 .

Figure 1. Population structure.

(A) Genome-wide unrooted neighbour-joining tree showing population structure across all sites, with sample branches coloured according to country groupings (Table 1): South America (green, n=37); West Africa (red, n=2231); Central Africa (orange, n=344); East Africa (yellow, n=739); South Asia (purple, n=77); West Southeast Asia (light blue; n=1079); East Southeast Asia (dark blue; n=1262); Oceania (magenta; n=201). The circular inset shows a magnified view of the part of the tree where the majority of samples from Africa coalesce, showing that the three African sub-regions are genetically close but distinct. (B, C) First three components of a genome-wide principal coordinate analysis. The first axis (PC1) captures the separation of African and South American from Asian samples. The following two axes (PC2 and PC3) capture finer levels of population structure due to geographical separation and selective forces. Each point represents a sample and the colour legend is the same as above.

Genetically mixed infections were considerably more common in Africa than other regions, consistent with the high intensity of malaria transmission in Africa (Figure 2a). Analysis of F_WS, a measure of within-host diversity 7 , shows that most samples from Southeast Asia (1763/2341), South America (37/37) and Oceania (158/201) have F_WS >0.95, which to a first approximation indicates that the infection is dominated by a clonal population of parasites 41 . In contrast, nearly half of samples from Africa (1625/3314) have F_WS <0.95, indicating the presence of more complex infections. Genetically mixed infections were also common in Bangladesh (41/77 samples have F_WS <0.95), another area of high malaria transmission and the only South Asian country represented in this dataset, but did not reach the extremely high levels of within-host diversity (F_WS <0.2) observed in some samples from Africa.
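As a rough illustration of how a within-host diversity statistic such as F_WS separates clonal from mixed infections, the sketch below uses the simplified form F_WS = 1 − H_W/H_S, where H_W is the heterozygosity within one sample and H_S is the heterozygosity of the local population. This simplified form is an assumption made for illustration; the published method 7 and the pipeline used for this dataset may differ in detail. The function names and toy allele frequencies are hypothetical.

```python
def heterozygosity(p):
    """Expected heterozygosity of a biallelic SNP with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fws(within_sample_freqs, population_freqs):
    """Simplified F_WS = 1 - H_W / H_S, summed over a set of SNPs.

    within_sample_freqs: non-reference read fraction per SNP in one sample.
    population_freqs: non-reference allele frequency per SNP in the population.
    """
    h_w = sum(heterozygosity(p) for p in within_sample_freqs)
    h_s = sum(heterozygosity(p) for p in population_freqs)
    if h_s == 0:
        raise ValueError("population has no diversity at these SNPs")
    return 1.0 - h_w / h_s


# Hypothetical toy data, not taken from the dataset.
population = [0.5, 0.3, 0.2, 0.4]
clonal_sample = [1.0, 0.0, 0.0, 1.0]  # every SNP fixed within the host
mixed_sample = [0.5, 0.3, 0.0, 0.6]   # several SNPs polymorphic within the host

print(fws(clonal_sample, population))
print(fws(mixed_sample, population))
```

Under this sketch a sample with no within-host polymorphism scores exactly 1, and the score falls towards 0 as the within-host allele frequencies approach those of the surrounding population.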

Figure 2. Characteristics of the eight regional parasite populations.

(A) Distribution of within-host diversity, as measured by F_WS, showing that genetically mixed infections were considerably more common in Africa than other regions, consistent with the high intensity of malaria transmission in Africa. (B) Distribution of per site nucleotide diversity calculated in non-overlapping 25 kbp genomic windows. We only considered coding biallelic SNPs to reduce the ascertainment bias caused by poor accessibility of non-coding regions. In both previous panels, thick lines represent median values, boxes show the interquartile range, and whiskers represent the bulk of the distribution, discounting outliers. (C) Genome-wide median LD (y-axis, measured by r²) between pairs of SNPs as a function of their physical distance (x-axis, in bp), showing a rapid decay in all regional parasite populations. The inset panel shows a magnified view of the decay, showing that in all populations r² decayed below 0.1 (dashed horizontal line) within 500 bp. All panels utilise the same palette, with colours denoting each geographic region.

The average nucleotide diversity across the global sample collection was 0.040% (median=0.028%), i.e. two randomly-selected samples differ by an average of 4 nucleotide positions per 10kb. Levels of nucleotide diversity vary greatly across the genome 56 and also geographically ( Figure 2b). Distributions of values were highest in Africa, followed by Bangladesh, but the scale of regional differences was relatively modest, ranging from an average of 0.030% in Eastern Southeast Asia to 0.040% in West Africa (median=0.019% and 0.028% respectively; Figure 2b). In other words, the nucleotide diversity of each regional parasite population was not much less than that of the global parasite population. This is consistent with the idea that the global P. falciparum population has a common African origin and that historically there must have been significant levels of migration.
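To make the definition of nucleotide diversity concrete, the following sketch computes the mean number of pairwise differences per site for a handful of sequences, and converts a diversity expressed in percent into differences per 10 kb. It assumes one haplotype per sample and equal-length sequences, which is a simplification: the analysis described here works from genotype calls and has to handle mixed infections. The toy sequences and function names are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(haplotypes):
    """Mean number of pairwise differences per site across all sample pairs."""
    n_sites = len(haplotypes[0])
    pairs = list(combinations(haplotypes, 2))
    total_differences = sum(
        sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs
    )
    return total_differences / (len(pairs) * n_sites)


def differences_per_10kb(pi_percent):
    """Convert a diversity given in percent into differences per 10,000 sites."""
    return pi_percent / 100.0 * 10_000


# Hypothetical toy sequences, not taken from the dataset.
toy_haplotypes = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
print(nucleotide_diversity(toy_haplotypes))

# The global average of 0.040% quoted in the text, expressed per 10 kb.
print(differences_per_10kb(0.040))
```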

All regional sub-populations showed very low levels of linkage disequilibrium relative to human populations, e.g. r² decayed to <0.1 within 500 bp (Figure 2c). As expected, African populations had the highest rates of LD decay, implying the highest levels of haplotype diversity.
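For readers unfamiliar with the r² measure of linkage disequilibrium, a minimal sketch of the standard haplotype-based calculation for two biallelic SNPs is given below. It assumes phased, single-genotype samples coded 0 (reference) and 1 (alternate); how the authors handled mixed infections and missing calls is not described in this section, so this should be read as an illustration rather than the pipeline's method. The function name and toy data are hypothetical.

```python
def r_squared(snp_a, snp_b):
    """Squared correlation between alleles at two biallelic SNPs.

    snp_a, snp_b: equal-length lists of 0/1 alleles, one entry per haplotype.
    """
    n = len(snp_a)
    p_a = sum(snp_a) / n
    p_b = sum(snp_b) / n
    p_ab = sum(1 for a, b in zip(snp_a, snp_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denominator


# Hypothetical toy haplotypes.
print(r_squared([0, 0, 1, 1], [0, 0, 1, 1]))  # perfectly linked SNPs
print(r_squared([0, 0, 1, 1], [0, 1, 0, 1]))  # statistically independent SNPs
```

A decay curve like the one in Figure 2c would then be obtained by binning SNP pairs by physical distance and taking the median r² in each bin.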

Geographic patterns of population differentiation and gene flow

Parasite sub-populations in different locations naturally tend to differentiate over time unless there is sufficient gene flow to counterbalance genetic drift. Genome-wide estimates of F_ST provide an indicator of this process of genetic differentiation, which is partly determined by geographic distance (Figure 3). For example, we observe much greater genetic differentiation between South America and South Asia (genome-wide average F_ST 0.22) or between Africa and Oceania (0.20) than between sub-regions within Asia (<0.1) or within Africa (<0.02).
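The intuition behind F_ST can be sketched with a simple heterozygosity-based form, F_ST = (H_T − H_S)/H_T, computed for one biallelic SNP from its allele frequency in each population. This is an assumption chosen for brevity: the estimator actually used is described in the Methods, and it may weight populations or correct for sample size differently. The function name and frequencies below are hypothetical.

```python
def fst(freqs):
    """Heterozygosity-based F_ST for one biallelic SNP.

    freqs: allele frequency of the same allele in each population,
           with every population given equal weight.
    """
    p_bar = sum(freqs) / len(freqs)
    h_total = 2 * p_bar * (1 - p_bar)
    h_within = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    if h_total == 0:
        return 0.0  # monomorphic everywhere: no differentiation to measure
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies in two populations.
print(fst([0.5, 0.5]))  # identical frequencies
print(fst([0.4, 0.6]))  # mild differentiation
print(fst([0.0, 1.0]))  # fixed difference between the populations
```

A genome-wide value for a pair of regions would then be an average over SNPs of per-SNP quantities like these.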

Figure 3. Geographic patterns of population differentiation and gene flow.

Each point represents one pairwise comparison between two regional parasite populations. The x-axis reports the geographic separation between the two populations, measured as great-circle distance between the centre of mass of each population and without taking into account natural barriers. The y-axis reports the genetic differentiation between the two populations, measured as average genome-wide F_ST. Points are coloured based on the regional populations they represent: between African populations (red); between Asian populations (blue); between Southeast Asia (as a whole) and Oceania, Africa or South America (purple); all the rest (orange).
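The great-circle distance on the x-axis can be computed with the haversine formula. The sketch below is one common way to do this, assuming a spherical Earth with a mean radius of 6371 km; the coordinates are arbitrary test points, not the population centres used in the figure, and the function name is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    # min() guards against rounding pushing the argument just above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Arbitrary test points: a quarter of the way around the equator.
print(great_circle_km(0.0, 0.0, 0.0, 90.0))
```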

These data reveal some interesting exceptions to the general rule that genome-wide F_ST is correlated with geographic distance. For example, African parasites are more strongly differentiated from Southeast Asian parasites (genome-wide average F_ST 0.20) than they are from parasites in neighbouring Bangladesh (0.11). If this is examined in more detail, there is an unexpectedly steep gradient of genetic differentiation at the geographical boundary between South Asia and Southeast Asia, i.e. parasites sampled in Myanmar and Western Thailand are much more strongly differentiated from parasites sampled in Bangladesh (genome-wide F_ST 0.07) than would be expected given that these are neighbouring countries. As discussed later, Southeast Asia is the global epicentre of antimalarial drug resistance, and these observations add to a growing body of evidence that Southeast Asian parasites have acquired a wide range of genomic features that are likely due to natural selection rather than genetic drift 23, 40 .

It is noteworthy that the level of genetic differentiation between western and eastern parts of Southeast Asia (genome-wide F_ST 0.05) is greater than between West Africa and East Africa (0.02) although the geographic distances are much greater in Africa. This is likely due to the lower intensity of malaria transmission in Southeast Asia, and in particular the presence of a malaria-free corridor running through Thailand, which act as barriers to gene flow across the region 23, 40 .

Genes with high levels of geographic differentiation

The F_ST metric can also be calculated for individual variants to identify specific genes that have acquired high levels of geographic differentiation relative to the genome as a whole. This can be done either at the global level (to identify variants that are highly differentiated between different regions of the world) or at the local level (to identify variants that are highly differentiated between different sampling locations within a region).

To identify variants that are strongly differentiated at the global level, we began by estimating F_ST for each SNP across all of the eight regional sub-populations. The group of SNPs with the highest global F_ST levels were found to be strongly enriched for non-synonymous mutations, suggesting that the process of differentiation is at least in part due to natural selection (Figure 4). After ranking all SNPs according to their global F_ST value, we calculated a global differentiation score for each gene based on the highest-ranking non-synonymous SNP within the gene (see Methods). All genes are ranked according to their global differentiation score in the accompanying data release, and those with the highest score are listed in Supplementary Table 7 (Supplementary Data). The most highly differentiated gene, p47 (PF3D7_1346800), is known to interact with the mosquito immune system 59 and has two variants (S242L and V247A) that are at fixation in South America but absent in other geographic regions. Also among the five most highly differentiated genes are gig (PF3D7_0935600, implicated in gametocytogenesis 60 ), pfs16 (PF3D7_0406200, expressed on the surface of gametes 61 ) and ctrp (PF3D7_0315200, expressed on the ookinete cell surface and essential for mosquito infection 62 ). Thus, four of the five most highly differentiated parasite genes are involved in the process of transmission by the mosquito vector, raising the possibility that this reflects evolutionary adaptation of the P. falciparum population to the different Anopheles species that transmit malaria in different geographical regions.
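The ranking step described above can be sketched as follows: sort all SNPs by their global F_ST, then record for each gene the best rank achieved by any of its non-synonymous SNPs. The exact transformation from rank to differentiation score is given in the Methods and is not reproduced here, so this sketch stops at the rank. The gene names, F_ST values and function name are hypothetical placeholders.

```python
def best_nonsynonymous_rank(snps):
    """Map each gene to the rank of its highest-ranking non-synonymous SNP.

    snps: list of dicts with keys "gene", "nonsynonymous" and "fst".
    Rank 1 is the SNP with the highest F_ST across all SNPs.
    """
    ranked = sorted(snps, key=lambda snp: snp["fst"], reverse=True)
    best = {}
    for rank, snp in enumerate(ranked, start=1):
        if snp["nonsynonymous"] and snp["gene"] not in best:
            best[snp["gene"]] = rank
    return best


# Hypothetical placeholder data, not values from the dataset.
toy_snps = [
    {"gene": "geneA", "nonsynonymous": True, "fst": 0.90},
    {"gene": "geneB", "nonsynonymous": False, "fst": 0.80},
    {"gene": "geneB", "nonsynonymous": True, "fst": 0.30},
    {"gene": "geneC", "nonsynonymous": True, "fst": 0.60},
    {"gene": "geneA", "nonsynonymous": True, "fst": 0.10},
]

for gene, rank in sorted(best_nonsynonymous_rank(toy_snps).items(), key=lambda kv: kv[1]):
    print(gene, rank)
```

Note that geneB's synonymous SNP, although second overall, does not count towards its rank in this sketch; only its non-synonymous SNP does.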

Figure 4. SNPs geographic differentiation.

Coloured lines show the proportions of SNPs in ten F_ST bins, stratified by genomic regions: non-synonymous (red), synonymous (yellow), intronic (green) and intergenic (blue). F_ST is calculated between all eight regional parasite populations and the number of SNPs in each bin is indicated in the background histogram. The y-axis on the right-hand side refers to the histogram and is on a log scale.

It is more difficult to characterise variants that are strongly differentiated at the local level, due to smaller sample sizes and various sources of sampling bias, but a crude estimate can be obtained by analysis of each of the six geographical regions with samples from multiple countries. F_ST was estimated for each SNP across different sampling locations within each geographical region, and the results for different regions were combined by a heuristic approach to obtain a local differentiation score for each gene (see Methods). A range of genes associated with drug resistance (crt (PF3D7_0709000), dhfr (PF3D7_0417200), dhps (PF3D7_0810800), kelch13 (PF3D7_1343700), mdr1 (PF3D7_0523000), mdr2 (PF3D7_1447900) and fd (PF3D7_1318100)) were in the top centile of local differentiation scores (Supplementary Data; Supplementary Figure 1, Supplementary Table 8, Supplementary Note).

Geographic patterns of drug resistance

Classification of samples based on markers of drug resistance. Antimalarial drug resistance represents a major focus of research for many partner studies within the Pf Community Project, and this dataset therefore contains a significant body of data that have appeared in previous reports on drug resistance. Readers are referred to these publications for more detailed analyses of local patterns of resistance 9–14, 16–22 and of resistance to specific drugs including chloroquine 16, 21 , sulfadoxine-pyrimethamine 16, 19, 21 and artemisinin combination therapy 9–11, 13–15, 17, 18, 21, 22 .

Here we have classified all samples into different types of drug resistance based on published genetic markers and current knowledge of the molecular mechanisms (see www.malariagen.net/resource/26 for details of the heuristic used). Table 2 summarises the frequency of different types of drug resistance in samples from different geographical regions. Overall, we observed higher prevalence of samples classified as resistant in Southeast Asia than anywhere else, with multiple samples resistant to all drugs considered. Note that samples were collected over a relatively long time period (2002–15) during which there were major changes in global patterns of drug resistance, and that the sampling locations represented in a given year depended on which partner studies were operative at the time. To alleviate this problem, we have also divided the data into samples collected before and after 2011 (Supplementary Data; Supplementary table 10), but temporal trends in aggregated data should be interpreted with due caution.

Table 2. Cumulative frequency of different types of drug resistance in samples from different geographical regions.

All samples were classified into different types of drug resistance based on published genetic markers; the classifications represent a best attempt based on the available data. Each type of resistance was considered to be either present, absent or unknown for a given sample. For each resistance type, the table reports: the genetic markers considered; the drug they are associated with; and the proportion of samples in each region classified as resistant out of the samples where the type was not unknown. The number of samples classified as either resistant or not resistant varies for each type of resistance considered (e.g. due to different levels of genomic accessibility); numbers in brackets report the minimum and maximum number analysed, while the exact numbers considered are reported in Supplementary Table 9. SP: sulfadoxine-pyrimethamine; treatment: SP used for the clinical treatment of uncomplicated malaria; IPTp: SP used for intermittent preventive treatment in pregnancy; AS-MQ: artesunate + mefloquine combination therapy; DHA-PPQ: dihydroartemisinin + piperaquine combination therapy. Details of the rules used to infer resistance status from genetic markers can be found on the resource page at www.malariagen.net/resource/26.

Marker | Associated with resistance to | South America (n=33–37) | West Africa (n=1851–2231) | Central Africa (n=262–344) | East Africa (n=678–739) | South Asia (n=62–77) | Western Southeast Asia (n=906–1079) | Eastern Southeast Asia (n=867–1256) | Oceania (n=185–201)
crt 76T | Chloroquine | 100% | 41% | 66% | 14% | 93% | 100% | 97% | 99%
dhfr 108N | Pyrimethamine | 97% | 84% | 100% | 98% | 100% | 100% | 100% | 100%
dhps 437G | Sulfadoxine | 30% | 75% | 97% | 93% | 97% | 100% | 87% | 61%
mdr1 2+ copies | Mefloquine | 0% | 0% | 0% | 0% | 0% | 44% | 12% | 1%
kelch13 WHO list | Artemisinin | 0% | 0% | 0% | 0% | 0% | 28% | 46% | 0%
plasmepsin 2-3 2+ copies | Piperaquine | 0% | 0% | 0% | 0% | 0% | 0% | 17% | 0%
dhfr triple mutant | SP (treatment) | 0% | 75% | 82% | 91% | 43% | 90% | 92% | 0%
dhfr and dhps sextuple mutant | SP (IPTp) | 0% | 0% | 1% | 10% | 19% | 82% | 19% | 0%
kelch13 and mdr1 | AS-MQ | 0% | 0% | 0% | 0% | 0% | 13% | 9% | 0%
kelch13 and plasmepsin 2-3 | DHA-PPQ | 0% | 0% | 0% | 0% | 0% | 0% | 15% | 0%

Below we summarise the overall profile of drug resistance types in the regional sub-populations: this is intended simply to provide context for users of this dataset, and should not be regarded as a statement of the current epidemiological situation. The Supplementary Notes (Supplementary Data) contain a more detailed description of the geographical distribution of haplotypes, CNV breakpoints, interactions between genes, and variants associated with less commonly used antimalarial drugs. In the accompanying data release, we also identify samples with mdr1, plasmepsin2/3 and gch1 gene amplifications that can affect drug resistance.

Chloroquine resistance. Samples were classified as chloroquine resistant if they carried the crt 76T allele. As shown in Table 2, this was found in almost all samples from Southeast Asia, South America and Oceania. It was also found across Africa but at lower frequencies, particularly in East Africa where chloroquine resistance is known to have declined since chloroquine was discontinued 63–65 . Supplementary Table 11 (Supplementary Data) shows the geographical distribution of different crt haplotypes (based on amino acid positions 72–76) which is consistent with the theory that chloroquine resistance spread from Southeast Asia to Africa with multiple independent origins in South America and Oceania 66, 67 . The crt locus is also relevant to other types of drug resistance, e.g. crt variants that are relatively specific to Southeast Asia form the genetic background of artemisinin resistance, and newly emerging crt alleles have been associated with the spread of ACT failure due to piperaquine resistance 13, 14, 22, 68 .

Sulfadoxine-pyrimethamine resistance. Clinical resistance to sulfadoxine-pyrimethamine (SP) is determined by multiple mutations and their interactions, so following current practice 69 we classified SP resistant samples into four overlapping types: (i) carrying the dhfr 108N allele, associated with pyrimethamine resistance; (ii) carrying the dhps 437G allele, associated with sulfadoxine resistance; (iii) carrying the dhfr triple mutant, which is strongly associated with SP failure; (iv) carrying the dhfr/dhps sextuple mutant, which confers a higher level of SP resistance. As shown in Table 2, dhfr 108N was found in almost all samples in all regions apart from West Africa, while dhps 437G was at very high frequency throughout most of Africa and Asia, and at lower frequencies in South America and Oceania (see also Supplementary Data; Supplementary Table 12). Triple mutant dhfr parasites were common throughout Africa and Asia, whereas sextuple mutant dhfr/dhps parasites were at much lower frequency except in Western Southeast Asia. In the accompanying data release, we also identify samples with gch1 gene amplifications (Supplementary Data; Supplementary Table 4) that can modulate SP resistance 70 , although their effect on the clinical outcome and their interaction with mutations in dhfr and dhps are not fully established.

Resistance to artemisinin combination therapy. We classified samples as artemisinin resistant based on the World Health Organization classification of non-synonymous mutations in the propeller region of the kelch13 gene that have been associated with delayed parasite clearance 71 . By this definition, artemisinin resistance was confined to Southeast Asia but, as previously reported, this dataset contains a substantial number of non-synonymous kelch13 propeller SNPs occurring at <5% frequency in Africa and elsewhere 9 . The most common ACT formulations in Southeast Asia are artesunate-mefloquine (AS-MQ) and dihydroartemisinin-piperaquine (DHA-PPQ). We classified samples as mefloquine resistant if they had mdr1 amplification 72 or as piperaquine resistant if they had plasmepsin 2/3 amplification 25 . Mefloquine resistance was observed throughout Southeast Asia and was most common in the western part. Piperaquine resistance was confined to eastern Southeast Asia with a notable concentration in western Cambodia. Elsewhere 11, 13 we describe the kel1/pla1 lineage of artemisinin- and piperaquine-resistant parasites that expanded in western Cambodia during 2008–13, and then spread to other countries during 2013–18, causing high rates of DHA-PPQ treatment failure across eastern Southeast Asia: since the current dataset extends only to 2015 it captures only the first phase of the kel1/pla1 lineage expansion.

HRP2/3 deletions that affect rapid diagnostic tests

Rapid diagnostic tests (RDTs) provide a simple and inexpensive way to test for parasites in the blood of patients who are suspected to have malaria, and have become a vital tool for malaria control 73, 74 . The most widely used RDTs are designed to detect P. falciparum histidine-rich protein 2 and cross-react with histidine-rich protein 3, encoded by the hrp2 and hrp3 genes respectively. Parasites with gene deletions of hrp2 and/or hrp3 have emerged as an important cause of RDT failure in a number of locations 75–79 . It is difficult to devise a simple genetic assay to monitor for risk of RDT failure because hrp2 and hrp3 deletions comprise a diverse mixture of large structural variations with multiple independent origins, and both genes are located in subtelomeric regions of the genome with very high levels of natural variation 29, 80–83 . In the absence of a well-validated algorithmic method, we visually inspected sequence read coverage and identified samples with clear evidence of large structural variants that disrupted or deleted the hrp2 and hrp3 genes. We took a conservative approach: samples that appeared to have a mixture of deleted and non-deleted genotypes were classified as non-deleted.

Deletions were found at relatively high frequency in Peru (8 of 21 samples had hrp2 deletions, 14 had hrp3 deletions and 6 had both) but were not seen in samples from Colombia and were relatively rare outside South America. Oceania was the only other region where we observed hrp2 deletions, though at very low frequency (4%, n=3/80); hrp3 deletions were also present there (25%), but no combined deletions were seen. Deletions of hrp3 only were more geographically widespread than hrp2 deletions, being common in Ethiopia (43%, n=9/21) and in Senegal (7%, n=6/84), and at relatively low frequency (<5%) in Kenya, Cambodia, Laos, and Vietnam (Supplementary Data; Supplementary Table 13). Note that these findings might under-estimate the true prevalence of hrp2/hrp3 deletions, due to sampling bias (our samples were primarily collected from RDT-positive cases) and also because we focused on large structural variants and did not consider polymorphisms that might also cause RDT failure but would require more sophisticated analytical approaches. There is a need for more reliable diagnostics of hrp2 and hrp3 deletions, and we hope that these open data will accelerate this important area of applied methodological research.

Discussion

This open dataset comprises sequence reads and genotype calls on over 7,000 P. falciparum samples from MalariaGEN partner studies in 28 countries. After excluding variants and samples that failed to meet stringent quality control criteria, the dataset contains high-quality genotype calls for 3 million polymorphisms including SNPs, indels, CNVs and large structural variations, in almost 6,000 samples. The data can be analysed in their entirety or can be filtered to select for specific genes, or geographical locations, or samples with particular genotypes. This is twice the sample size of our previous consortial publication 9 and is the largest available data resource for analysis of P. falciparum population structure, gene flow and evolutionary adaptation. Each sample has been annotated to show its profile of resistance to six major antimalarial drugs and whether it carries structural variations that can cause RDT failure. The classification scheme is heuristic and based on a subset of known genetic markers, so it should not be treated as a failsafe predictor of the phenotype of a particular sample. Our purpose in providing these annotations is to make it easy for users without specialist training in genetics to explore the global dataset and to analyse any subset of samples for key features that are relevant to malaria control. Samples were collected by independent groups that were operative at a given time and in a given place with distinct objectives; while care needs to be taken when interpreting results spanning multiple years and geographical settings (e.g. aggregated trends of drug resistance prevalence), this heterogeneity also allows for the exploration of a wide range of epidemiological and transmission settings.

An important function of this curated dataset is to provide information on the provenance and key features of samples associated with each partner study, thus allowing the findings reported in different publications to be linked and compared. Data produced by the Pf Community Project have been analysed in more than 50 publications (refs 5–55) and a few examples will serve to illustrate the diverse ways in which the data are being used. An analysis of samples collected across Africa by Amambua-Ngwa, Djimde and colleagues found evidence that parasite population structure overlaps with historical patterns of human migration and that the P. falciparum population in Ethiopia is significantly diverged from other parts of the continent 12 . A series of studies by Amato, Miotto and colleagues have documented the evolution of a multidrug-resistant lineage of P. falciparum that originated in Western Cambodia over ten years ago and is now expanding rapidly across Southeast Asia, acquiring additional resistance mutations as it spreads 11, 13, 14 . McVean and colleagues have developed a computational method for deconvolution of the haplotypic structure of mixed infections, allowing analysis of the pedigree structure of parasites that are cotransmitted by the same mosquito 49 . Bahlo and colleagues have developed a different haplotype-based method to describe the relatedness structure of the parasite population and to identify new genomic loci with evidence of recent positive selection 50 .

A recent report from the World Health Organization highlights the need for improved surveillance systems in sustaining malaria control and achieving the long-term goal of malaria eradication 84 . To be of practical value for national malaria control programmes, genetic data must address well-defined use cases and be readily accessible 85 . Amplicon sequencing technologies provide a powerful new tool for targeted genotyping that could feasibly be implemented locally in malaria-endemic countries 86, 87 , but there remains a need for the international malaria control community to generate and share whole genome sequencing data, e.g. to monitor for newly emerging forms of drug resistance and to understand regional patterns of parasite migration. The next generation of long-read sequencing technologies will improve the precision of population genomic inference, e.g. by enabling analysis of hypervariable regions of the genome, and of pedigree structures within mixed infections. The accuracy with which the resistance phenotype of a sample can be predicted from genome sequencing data will also improve as we gain better functional understanding of the polygenic determinants of drug resistance.

Thus, the next few years are likely to see major advances in both the scale and information content of parasite genomic data. The practical value for malaria control will be greatly enhanced by the progressive acquisition of longitudinal time-series data, particularly if this is linked to other sources of epidemiological data and translated into reliable, actionable information with sufficient rapidity to allow control programmes to monitor the impact of their interventions on the parasite population in near real time. The Pf Community Project provides proof of concept that systems can be developed for groups in different countries to share data, to analyse it using standardised methods, and to make it readily accessible to other researchers and the malaria control community.

Methods

Here we summarise the bioinformatics methods used to produce and analyse the data; further details are available at www.malariagen.net/resource/26.

Ethical approval

All samples in this study were derived from blood samples obtained from patients with P. falciparum malaria, collected with informed consent from the patient or a parent or guardian. At each location, sample collection was approved by the appropriate local and institutional ethics committees. The following local and institutional committees gave ethical approval for the partner studies: Human Research Ethics Committee of the Northern Territory Department of Health & Families and Menzies School of Health Research, Darwin, Australia; National Research Ethics Committee of Bangladesh Medical Research Council, Bangladesh; Comite d'Ethique de la Recherche - Institut des Sciences Biomedicales Appliquees, Benin; Ministere de la Sante – Republique du Benin, Benin; Comité d'Éthique, Ministère de la Santé, Bobo-Dioulasso, Burkina Faso; Institutional Review Board Centre Muraz, Burkina Faso; Ministry of Health National Ethics Committee for Health Research, Cambodia; Institutional Review Board University of Buea, Cameroon; Comite Institucional de Etica de investigaciones en humanos de CIDEIM, Colombia; Comité National d'Ethique de la Recherche, Cote d’Ivoire; Comite d’Ethique Universite de Kinshasa, Democratic Republic of Congo; Armauer Hansen Research Institute Institutional Review Board, Ethiopia; Addis Ababa University, Aklilu Lemma Institute of Pathobiology Institutional Review Board, Ethiopia; Kintampo Health Research Centre Institutional Ethics Committee, Ghana; Ghana Health Service Ethical Review Committee, Ghana; University of Ghana Noguchi Medical Research Institute, Ghana; Navrongo Health Research Centre Institutional Review Board, Ghana; Comite d’Ethique National Pour la Recherché en Santé, Republique de Guinee; Indian Council of Medical Research, India; Eijkman Institute Research Ethics Commission, Eijkman Institute for Molecular Biology, Jakarta, Indonesia; KEMRI Scientific and Ethics Review Unit, Kenya; Ministry of Health National Ethics Committee For Health Research, Laos; Ethical Review Committee of University of Ilorin Teaching Hospital, Nigeria; Comité National d'Ethique auprès du Ministère de la Santé Publique, Madagascar; College of Medicine Regional Ethics Committee University of Malawi, Malawi; Faculté de Médecine, de Pharmacie et d'Odonto-Stomatologie, University of Bamako, Bamako, Mali; Ethics Committee of the Ministry of Health, Mali; Ethics committee of the Ministry of Health, Mauritania; Department of Medical Research (Lower Myanmar); Ministry of Health, Government of The Republic of the Union of Myanmar; Institutional Review Board, Papua New Guinea Institute of Medical Research, Goroka, Papua New Guinea; PNG Medical Research Advisory Council (MRAC), Papua New Guinea; Institutional Review Board, Universidad Nacional de la Amazonia Peruana, Iquitos, Peru; Ethics Committee of the Ministry of Health, Senegal; National Institute for Medical Research and Ministry of Health and Social Welfare, Tanzania; Medical Research Coordinating Committee of the National Institute for Medical Research, Tanzania; Ethics Committee, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand; Ethics Committee at Institute for the Development of Human Research Protections, Thailand; Gambia Government/MRC Joint Ethics Committee, Banjul, The Gambia; London School of Hygiene and Tropical Medicine Ethics Committee, London, UK; Oxford Tropical Research Ethics Committee, Oxford, UK; Walter Reed Army Institute of Research, USA; National Institute of Allergy and Infectious Diseases, Bethesda, MD, USA; Ethical Committee, Hospital for Tropical Diseases, Ho Chi Minh City, Vietnam; Ministry of Health Institute of Malariology-Parasitology-Entomology, Vietnam.

Standard laboratory protocols were used to determine DNA quantity and proportion of human DNA in each sample as previously described 7, 56 .

Data generation and curation

Reads mapping to the human reference genome were discarded before all analyses, and the remaining reads were mapped to the P. falciparum 3D7 v3 reference genome using bwa mem 88 version 0.7.15. “Improved” BAMs were created using the Picard tools CleanSam, FixMateInformation and MarkDuplicates version 2.6.0 and GATK v3 base quality score recalibration. All lanes for each sample were merged to create sample-level BAM files.

We discovered potential SNPs and indels by running GATK’s HaplotypeCaller 89 independently across each of the 7,182 sample-level BAM files and genotyped these for each of the 16 reference sequences (14 chromosomes, 1 apicoplast and 1 mitochondrion) using GATK’s CombineGVCFs and GenotypeGVCFs.

SNPs and indels were filtered using GATK’s Variant Quality Score Recalibration (VQSR). Variants with a VQSLOD score ≤ 0 were filtered out. Functional annotations were applied using snpEff 90 version 4.1. Genome regions were annotated using vcftools version 0.1.10 and masked if they were outside the core genome. Unless otherwise specified, we used biallelic SNPs that pass all quality filters for all analyses.
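
The filtering rules described above can be illustrated with a minimal Python sketch. The `Variant` record and its field names are hypothetical stand-ins for values that would be read from the VCF (the VQSLOD score and a core-genome flag); this is not the code used to produce the release.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Variant:
    # Hypothetical record; field names are illustrative only.
    chrom: str
    pos: int
    ref: str
    alts: List[str]
    vqslod: float
    in_core_genome: bool


def passes_filters(v: Variant) -> bool:
    """Keep variants with VQSLOD > 0 that lie inside the core genome."""
    return v.vqslod > 0 and v.in_core_genome


def is_biallelic_snp(v: Variant) -> bool:
    """One reference base and exactly one single-base alternative allele."""
    return (
        len(v.alts) == 1
        and v.ref in ("A", "C", "G", "T")
        and v.alts[0] in ("A", "C", "G", "T")
    )


def analysis_variants(variants: List[Variant]) -> List[Variant]:
    """Biallelic SNPs passing all filters, the default set used for analysis."""
    return [v for v in variants if passes_filters(v) and is_biallelic_snp(v)]
```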

We removed 69 samples from lab studies to create the release VCF files which contain 7,113 samples. VCF files were converted to ZARR format and subsequent analyses were mainly performed using scikit-allel version 1.1.18 and the ZARR files.

We identified species using nucleotide sequence from reads mapping to six different loci in the mitochondrial genome, using custom Java code (available at https://github.com/malariagen/GeneticReportCard). The loci were located within the cox3 gene (PF3D7_MIT01400), as described in a previously published species detection method 91 . Alleles at various mitochondrial positions within the six loci were genotyped and used for classification as shown in Supplementary Table 14 (Supplementary Data).

We created a final analysis set of 5,970 samples after removing replicates, low-coverage samples, samples with suspected contamination or mislabelling, and mixed-species samples.

Genotyping of drug resistance markers and sample classification

We used two complementary methods to determine tandem duplication genotypes around mdr1, plasmepsin2/3 and gch1, namely a coverage-based method and a method based on the position and orientation of reads near discovered duplication breakpoints. In brief, the outline algorithm is: (1) determine copy number at each locus using a coverage-based hidden Markov model (HMM); (2) determine breakpoints of identified duplications by manual inspection of reads and face-away read pairs around all sets of breakpoints; (3) for each locus in each sample, initially set copy number to that determined by the HMM if ≤ 10 CNVs were discovered in total, and otherwise consider it undetermined; (4) if face-away pairs provide self-sufficient evidence for the presence or absence of the amplification, override the HMM call; (5) for each locus in each sample, set the breakpoint to be that with the highest proportion of face-away reads.
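
Steps (3) to (5) of the outline can be sketched as follows. This is an illustration only: the function names, the simplification of copy number to an amplified/not-amplified flag, the tri-state encoding of the face-away evidence (True, False, or None when the read pairs are not conclusive on their own) and the reading of "in total" as a per-sample count are all assumptions, not the code used for the release.

```python
from typing import Dict, Optional

MAX_CNVS = 10  # threshold from the text: HMM calls are kept only if <= 10 CNVs were found


def call_amplification(
    hmm_amplified: bool,
    n_cnvs_discovered: int,
    face_away_evidence: Optional[bool],
) -> Optional[bool]:
    """Return True (amplified), False (not amplified) or None (undetermined)."""
    # Step 3: start from the HMM call unless too many CNVs were discovered.
    call: Optional[bool] = hmm_amplified if n_cnvs_discovered <= MAX_CNVS else None
    # Step 4: conclusive face-away read pairs override the HMM call.
    if face_away_evidence is not None:
        call = face_away_evidence
    return call


def pick_breakpoint(face_away_proportions: Dict[str, float]) -> Optional[str]:
    """Step 5: choose the breakpoint with the highest proportion of face-away reads."""
    if not face_away_proportions:
        return None
    return max(face_away_proportions, key=face_away_proportions.get)
```

Keeping "undetermined" as a distinct outcome, rather than forcing a call, matches the present/absent/unknown scheme used for the resistance classifications.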

We genotyped deletions in hrp2 and hrp3 by manual inspection of sequence read coverage plots.

The procedure used to map genetic markers to inferred resistance status classification is described in detail for each drug in the accompanying data release ( https://www.malariagen.net/resource/26).

In brief, we called amino acids at selected loci by first determining the reference amino acids and then, for each sample, applying all variations using the GT field of the VCF file. The amino acid and copy number calls generated were used to classify all samples into different types of drug resistance. Our methods of classification were heuristic and based on the available data and current knowledge of the molecular mechanisms. Each type of resistance was considered to be either present, absent or unknown for a given sample.
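
As an illustration of this procedure, the sketch below applies genotype calls to a reference codon, translates the result, and maps the amino acid to a three-state resistance call using the chloroquine rule stated earlier (crt 76T present). The data structures, the function names, the toy codon and the decision to leave missing or mixed calls undetermined are assumptions made for this example; the actual rules are those documented in the data release.

```python
from itertools import product
from typing import Dict, Optional, Sequence, Tuple

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def call_amino_acid(
    ref_codon: str,
    variants: Dict[int, Sequence[str]],
    genotypes: Dict[int, Optional[Tuple[int, int]]],
) -> Optional[str]:
    """Apply SNP genotypes to a reference codon and translate it.

    variants:  offset within the codon -> [ref_base, alt_base_1, ...]
    genotypes: offset within the codon -> GT as allele indices, or None if missing
    Returns None when any call is missing or mixed (an assumption of this sketch).
    """
    codon = list(ref_codon)
    for offset, alleles in variants.items():
        gt = genotypes.get(offset)
        if gt is None or len(set(gt)) != 1:
            return None
        codon[offset] = alleles[gt[0]]
    return CODON_TABLE["".join(codon)]


def chloroquine_resistance(crt_76: Optional[str]) -> str:
    """Classify resistance as present, absent or unknown from the crt 76 amino acid."""
    if crt_76 is None:
        return "unknown"
    return "present" if crt_76 == "T" else "absent"


# Toy example: reference codon AAA (K) with an A>C SNP at the second base gives ACA (T).
aa = call_amino_acid("AAA", {1: ["A", "C"]}, {1: (1, 1)})
print(aa, chloroquine_resistance(aa))
```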

Population-level analysis and characterisation

We calculated genetic distance between samples using biallelic SNPs that pass filters using a method previously described 9 . In addition to calculating genetic distance between all pairs of samples from the current data set, we also calculated the genetic distance between each sample and the lab strains 3D7, 7G8, GB4, HB3 and Dd2 from the Pf3k project.

The matrix of genetic distances was used to generate neighbour-joining trees and principal coordinates. Based on these observations we grouped the samples into eight geographic regions: South America, West Africa, Central Africa, East Africa, South Asia, the western part of Southeast Asia, the eastern part of Southeast Asia and Oceania, with samples assigned to region based on the geographic location of the sampling site. Five samples from returning travellers were assigned to region based on the reported country of travel.

FWS was calculated with custom Python scripts, following the method previously described 7 . Nucleotide diversity (π) was calculated in non-overlapping 25 kbp genomic windows, only considering coding biallelic SNPs to reduce the ascertainment bias caused by poor accessibility of non-coding regions. LD decay (r²) was calculated using the method of Rogers and Huff and biallelic SNPs with low missingness and regional allele frequency >10%. Mean FST between populations was calculated using Hudson’s method.
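
For readers unfamiliar with Hudson's estimator, the sketch below computes a mean FST between two populations from per-SNP allele counts. It uses the commonly cited "ratio of averages" form, in which per-SNP numerators and denominators are summed before dividing. Whether the authors combined SNPs in exactly this way is an assumption, and the function name and input format are illustrative.

```python
from typing import List, Optional, Tuple

AlleleCount = Tuple[int, int]  # (alternate allele count, total allele count) at one SNP


def hudson_fst(pop1: List[AlleleCount], pop2: List[AlleleCount]) -> Optional[float]:
    """Mean Hudson FST across SNPs, as sum of numerators over sum of denominators."""
    num_sum = 0.0
    den_sum = 0.0
    for (a1, n1), (a2, n2) in zip(pop1, pop2):
        if n1 < 2 or n2 < 2:
            continue  # the sample-size correction needs at least two alleles per population
        p1 = a1 / n1
        p2 = a2 / n2
        # Numerator: squared frequency difference, corrected for finite sample size.
        num_sum += (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
        # Denominator: probability that alleles drawn from different populations differ.
        den_sum += p1 * (1 - p2) + p2 * (1 - p1)
    if den_sum == 0:
        return None
    return num_sum / den_sum
```

With a fixed difference between two populations the estimate is 1, and with identical sample frequencies it is slightly negative because of the sample-size correction.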

Allele frequencies stratified by geographic regions and sampling sites were calculated using the genotype calls produced by GATK. FST was calculated between all 8 regions, and also between all sites with at least 25 QC pass samples. FST between different locations for individual SNPs was calculated using Weir and Cockerham’s method.

We defined the global differentiation score for a gene as $1 - \frac{N}{\max(N)}$, where N is the rank of the non-synonymous SNP with the highest global FST value within that gene. To define the local differentiation score, we first calculated, for each region containing multiple sites (WAF, EAF, SAS, WSEA, ESEA and OCE), FST for each SNP between sites within that region. For each gene, we then calculated the rank of the highest-FST non-synonymous SNP within that gene for each of the six regions. We defined the local differentiation score for each gene using the second highest of these six ranks (N), to ensure that the gene was highly ranked in at least two populations, i.e. to minimise the chance of artefactually ranking a gene highly due to a single variant in a single population. The final local differentiation score was normalised so that the range of possible scores was between 0 and 1: the local differentiation score was defined as $1 - \frac{N}{\max(N)}$.
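
The rank-based scores can be sketched in a few lines of Python. Two points are assumptions of this sketch rather than statements from the text: rank 1 denotes the SNP with the highest FST, and "second highest rank" is read as the second-best rank, i.e. the second-smallest rank number across the six regions. Function names and the example ranks are illustrative.

```python
from typing import Dict


def differentiation_score(rank: int, max_rank: int) -> float:
    """Score of the form 1 - N / max(N); a rank near 1 gives a score near 1."""
    return 1 - rank / max_rank


def local_rank(best_rank_by_region: Dict[str, int]) -> int:
    """Second-best rank across regions (assumed reading of 'second highest')."""
    return sorted(best_rank_by_region.values())[1]


# Hypothetical gene: highly ranked in two regions, unremarkable elsewhere.
ranks = {"WAF": 5, "EAF": 900, "SAS": 40, "WSEA": 1200, "ESEA": 700, "OCE": 3000}
n = local_rank(ranks)
print(n, differentiation_score(n, max_rank=5000))
```

Using the second-best rank means a gene with a single outlying SNP in one region does not score highly, which is the stated purpose of the definition.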

An earlier version of this article can be found on bioRxiv (DOI: https://doi.org/10.1101/824730).

Data availability

Underlying data

Data are available under the MalariaGEN terms of use for the Pf Community Project: https://www.malariagen.net/data/terms-use/p-falciparum-community-project-terms-use. Depending on the nature, format and content of the data, appropriate mechanisms have been utilised for data access, as detailed below.

This project contains the following underlying data that are available as an online resource: www.malariagen.net/resource/26. Data are also available from Figshare.

Figshare: Supplementary data to: An open dataset of Plasmodium falciparum genome variation in 7,000 worldwide samples. https://doi.org/10.6084/m9.figshare.13388603 92 .

  • Study information: Details of the 49 contributing partner studies, including description, contact information and key people.

  • Sample provenance and sequencing metadata: sample information including partner study information, location and year of collection, ENA accession numbers, and QC information for 7,113 samples from 28 countries.

  • Measure of complexity of infections: characterisation of within-host diversity (FWS) for 5,970 QC pass samples.

  • Drug resistance marker genotypes: genotypes at known markers of drug resistance for 7,113 samples, containing amino acid and copy number genotypes at six loci: crt, dhfr, dhps, mdr1, kelch13, plasmepsin 2–3.

  • Inferred resistance status classification: classification of 5,970 QC pass samples into different types of resistance to 10 drugs or combinations of drugs, and for RDT detection: chloroquine, pyrimethamine, sulfadoxine, mefloquine, artemisinin, piperaquine, sulfadoxine-pyrimethamine for treatment of uncomplicated malaria, sulfadoxine-pyrimethamine for intermittent preventive treatment in pregnancy, artesunate-mefloquine, dihydroartemisinin-piperaquine, and hrp2 and hrp3 gene deletions.

  • Drug resistance markers to inferred resistance status: details of the heuristics utilised to map genetic markers to resistance status classification.

  • Gene differentiation: estimates of global and local differentiation for 5,561 genes.

  • Short variant genotypes: genotype calls on 6,051,696 SNPs and short indels in 7,113 samples from 29 countries, available both as VCF and zarr files.

Extended data

This project contains the following underlying supplementary data available as a single document download: www.malariagen.net/resource/26. Extended data are also available from Figshare.

Figshare: Supplementary data to: An open dataset of Plasmodium falciparum genome variation in 7,000 worldwide samples. https://doi.org/10.6084/m9.figshare.13388603 92 .

‘File9_Pf_6_supplementary’ contains the Supplementary Note, Supplementary Tables and Supplementary Figure:

  • Supplementary Note

    • Analysis of local differentiation score

    • The classic 76T chloroquine resistance mutation in crt is found on multiple haplotypes

    • Sulphadoxine-pyrimethamine resistance is widespread and associated with many haplotypes

    • mdr1 duplications have many different breakpoints

    • Artemisinin, piperaquine, and mefloquine resistance

    • No evidence of resistance to less commonly used antimalarials

  • Supplementary Table 1. Breakdown of analysis set samples by geography.

  • Supplementary Table 2. Studies contributing samples.

  • Supplementary Table 3. Summary of discovered variant positions.

  • Supplementary Table 4. Breakpoints of duplications of gch1.

  • Supplementary Table 5. Breakpoints of duplications of mdr1.

  • Supplementary Table 6. Breakpoints of duplications of plasmepsin 2–3.

  • Supplementary Table 7. Genes ranked by global differentiation score.

  • Supplementary Table 8. Genes ranked by local differentiation score.

  • Supplementary Table 9. Number of samples used to determine proportions in Table 2.

  • Supplementary Table 10. Frequencies of mutations associated with mono- and multi-drug resistance pre- and post-2011.

  • Supplementary Table 11. Frequency of crt amino acid 72–76 haplotypes.

  • Supplementary Table 12. Frequencies of dhfr (51, 59, 108, 164) and dhps (437, 540, 581, 613) multi-locus haplotypes.

  • Supplementary Table 13. Frequency of HRP2 and HRP3 deletions by country.

  • Supplementary Table 14. Alleles at six mitochondrial positions used for the species identification.

  • Supplementary Figure 1. Histogram of local differentiation score for all genes.

Data hosted with Figshare are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).

Data analysis group

Pearson, RD*, Amato, R*, Hamilton, WL, Almagro-Garcia, J, Chookajorn, T, Kochakarn, T, Miotto, O, Kwiatkowski, DP

*Joint analysis lead

Local study design, implementation and sample collection

Ahouidi, A, Amambua-Ngwa, A, Amaratunga, C, Amenga-Etego, L, Andagalu, B, Anderson, TJC, Apinjoh, T, Ashley, EA, Auburn, S, Awandare, G, Ba, H, Baraka, V, Barry, AE, Bejon, P, Bertin, GI, Boni, MF, Borrmann, S, Bousema, T, Branch, O, Bull, PC, Chotivanich, K, Claessens, A, Conway, D, Craig, A, D’Alessandro, U, Dama, S, Day, N, Denis, B, Diakite, M, Djimdé, A, Dolecek, C, Dondorp, A, Drakeley, C, Duffy, P, Echeverry, DF, Egwang, TG, Erko, B, Fairhurst, RM, Faiz, A, Fanello, CA, Fukuda, MM, Gamboa, D, Ghansah, A, Golassa, L, Harrison, GLA, Hien, TT, Hill, CA, Hodgson, A, Imwong, M, Ishengoma, DS, Jackson, SA, Kamaliddin, C, Kamau, E, Konaté, A, Kyaw, MP, Lim, P, Lon, C, Loua, KM, Maïga-Ascofaré, O, Marfurt, J, Marsh, K, Mayxay, M, Mobegi, V, Mokuolu, OA, Montgomery, J, Mueller, I, Newton, PN, Nguyen, TN, Noedl, H, Nosten, F, Noviyanti, R, Nzila, A, Ochola-Oyier, LI, Ocholla, H, Oduro, A, Omedo, I, Onyamboko, MA, Ouedraogo, J, Oyebola, K, Peshu, N, Phyo, AP, Plowe, CV, Price, RN, Pukrittayakamee, S, Randrianarivelojosia, M, Rayner, JC, Ringwald, P, Ruiz, L, Saunders, D, Shayo, A, Siba, P, Su, X, Sutherland, C, Takala-Harrison, S, Tavul, L, Thathy, V, Tshefu, A, Verra, F, Vinetz, J, Wellems, TE, Wendler, J, White, NJ, Yavo, W, Ye, H

Sequencing, data production and informatics

Pearson, RD, Stalker, J, Ali, M, Amato, R, Ariani, C, Busby, G, Drury, E, Hart, L, Hubbart, C, Jacob, CG, Jeffery, B, Jeffreys, AE, Jyothi, D, Kekre, M, Kluczynski, K, Malangone, C, Manske, M, Miles, A, Nguyen, T, Rowlands, K, Wright, I, Goncalves, S, Rockett, KA

Partner study support and coordination

Simpson, VJ, Miotto, O, Amato, R, Goncalves, S, Henrichs, C, Johnson, KJ, Pearson, RD, Rockett, KA, Kwiatkowski, DP

Acknowledgements

This study was conducted by the MalariaGEN Plasmodium falciparum Community Project, and was made possible by clinical parasite samples contributed by partner studies, whose investigators are represented in the author list and in the associated data release ( https://www.malariagen.net/resource/26). This research was supported in part by the Intramural Research Programme of the NIH, NIAID. In addition, the authors would like to thank the following individuals who contributed to partner studies, making this study possible: Dr Eugene Laman for work in sample collection in the Republic of Guinea; Dr Abderahmane Tandia and Dr Yacine Deh and Dr Samuel Assefa for work in sample collection in Mauritania; Dr Ibrahim Sanogo for work in sample collection in Mali; Dr James Abugri and Dr Nicholas Amoako for work coordinating sample collection in Ghana. Genome sequencing was undertaken by the Wellcome Sanger Institute and we thank the staff of the Wellcome Sanger Institute Sample Logistics, Sequencing, and Informatics facilities for their contribution. The authors would like to thank Erin Courtier for her assistance with the journal submission. The views expressed here are solely those of the authors and do not reflect the views, policies or positions of the U.S. Government or Department of Defense. Material has been reviewed by the Walter Reed Army Institute of Research. There is no objection to its presentation and/or publication. The opinions or assertions contained herein are the private views of the author, and are not to be construed as official, or as reflecting true views of the Department of the Army or the Department of Defense. The investigators have adhered to the policies for protection of human subjects as prescribed in AR 70–25. PR is a staff member of the World Health Organization. PR alone is responsible for the views expressed in this publication and they do not necessarily represent the decisions, policy or views of the World Health Organization.

Funding Statement

The sequencing, analysis, informatics and management of the Community Project are supported by Wellcome through Sanger Institute core funding (098051), a Strategic Award (090770/Z/09/Z) and the Wellcome Centre for Human Genetics core funding (203141/Z/16/Z), by the MRC Centre for Genomics and Global Health which is jointly funded by the Medical Research Council and the Department for International Development (DFID) (G0600718; M006212), and by the Bill & Melinda Gates Foundation (OPP1204628).

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

[version 2; peer review: 2 approved]

Wellcome Open Res. 2021 Mar 25. doi: 10.21956/wellcomeopenres.17752.r42796

Reviewer response for version 1

Didier Menard 1

This manuscript from the MalariaGEN consortium, a data-sharing community of teams working on Plasmodium falciparum genomic epidemiology, presents the new release of curated P. falciparum genomes from isolates collected in 73 locations in Africa, Asia, South America and Oceania.

Based on robust and perfectly detailed methods (ranging from the treatment of the blood samples and the DNA extraction to the Illumina and computational platforms developed to produce genome sequencing for variant discovery and genotype calling), they analyzed 7000 P. falciparum genome sequences and provided numerous exciting data. For instance, they found that variations (SNPs and indels) in the P. falciparum genome affected about a quarter of the 23 Mb genome (and mostly coding regions), and that duplication genotypes are frequent around mdr1, plasmepsin2/3 and gch1, which are known to be associated with antimalarial drug resistance (including mefloquine, piperaquine and sulfadoxine/pyrimethamine).

Moreover, population genetic analyses conducted on this largest available data resource depict a comprehensive picture of P. falciparum parasite populations globally and of subpopulations at continental level. In the results, a large section is devoted to the description of the geographic patterns of validated molecular markers (SNPs and CNVs) associated with antimalarial drug resistance. By compiling data on all samples collected from 2002–2015, they present clear profiles of drug resistance by regional subpopulation for the most used antimalarial drugs. Finally, they reveal a global landscape of a major challenge for malaria elimination, namely deletions in the hrp2 and hrp3 genes, which are linked with false negative results of HRP2-based malaria RDTs.

The manuscript is written in a very clear way, and it must be pointed out that the authors have made huge efforts to make these data understandable for a general audience, especially non-experts in genomics and policy makers in malaria endemic countries. Their data effectively depict the main challenges currently encountered in the fight against malaria: monitoring the strategies deployed by assessing their impact on P. falciparum parasite populations, the geographical evolution of antimalarial drug resistance, and the effectiveness of diagnostic tools used in malaria endemic areas (i.e. malaria RDTs).

Of note, the authors fairly expose the main issues and drawbacks related to the methods used (i.e. the analytical challenges due to long tracts of highly repetitive sequence and hypervariable regions within the P. falciparum genome, and the challenges of studying a complex mixture of genotypes from polyclonal infections).

Although I am impressed by the work done by the consortium, I have several minor comments that could improve the manuscript:

  • Sample collection - The P. falciparum samples investigated are not from systematic sampling collections dedicated to this study but rather from multiple studies conducted by groups with different objectives and from heterogeneous populations (patients living in malaria endemic areas, travelers, etc.). I think this issue should be discussed in the manuscript.

  • Likewise, the long time period covered by the sample collection (2002–2015) is also a major source of bias that can alter the final results.

  • I guess that all samples were collected from symptomatic patients seen at health facility level? Unfortunately, this means that the data presented capture only the P. falciparum populations infecting this group. With the rise of new technologies, I wonder whether the MalariaGEN consortium could investigate samples collected from asymptomatic individuals and explore the genomic profiles of this hidden reservoir, which represents the major parasite biomass.

  • I am aware that the authors have performed a difficult and complex exercise by providing high quality genomic data and comprehensive description of their data for a large audience. The major challenge that is not addressed in the manuscript is how these important data can be translated into concrete actions in the field by health providers.

  • A final comment regarding the database: it would be helpful to provide, for each sample/genome sequence, the location (country) and the date of collection.

Is the work clearly and accurately presented and does it cite the current literature?

Yes

If applicable, is the statistical analysis and its interpretation appropriate?

I cannot comment. A qualified statistician is required.

Are all the source data underlying the results available to ensure full reproducibility?

Yes

Is the study design appropriate and is the work technically sound?

Yes

Are the conclusions drawn adequately supported by the results?

Yes

Are sufficient details of methods and analysis provided to allow replication by others?

Yes

Reviewer Expertise:

Expert in antimalarial drug resistance

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Wellcome Open Res. 2021 Jul 6.
Richard Pearson 1

We thank the reviewers for the extremely positive and supportive feedback. In their comments and suggestions both reviewers have well captured the spirit of this data resource and of the large collaborative network behind it. We are pleased to submit detailed responses and a revised version of the manuscript that addresses their comments.

2.1) Sample collection - P. falciparum samples investigated are not from systematic sampling collections dedicated to this study but rather from multiple studies conducted by groups with different objectives and from heterogeneous populations (patients living in malaria endemic areas, travelers, etc.). I think this issue should be discussed in the manuscript.

Thanks for raising this point. On the one hand, the heterogeneity of sampling approaches offers a unique opportunity to investigate questions in a variety of epidemiological settings in a systematic way. Specifics of each study are provided in ftp://ngs.sanger.ac.uk/production/malaria/pfcommunityproject/Pf6/Pf_6_partner_studies.pdf and users of the resource can contact individual investigators for further details. On the other hand, we agree that this heterogeneity can also act as a confounder in some analyses, which is why we have devoted significant time to curating the dataset to make it “analysis ready”.

As suggested, we have amended the manuscript in version 2 to include the considerations above in the paragraph: “Samples were collected by independent groups that were operative at a given time and in a given place with distinct objectives; while care needs to be taken when interpreting results spanning multiple years and geographical settings (e.g. aggregated trends of drug resistance prevalence), this heterogeneity also allows for the exploration of a wide range of epidemiological and transmission settings.”

2.2) Likewise, the long time period covered by the sample collection (2002–2015) is also a major source of bias that can alter the final results.

This is an important point, in particular for interpreting drug resistance results, and one we explicitly bring out in the paragraph: “Note that samples were collected over a relatively long time period (2002–15) during which there were major changes in global patterns of drug resistance, and that the sampling locations represented in a given year depended on which partner studies were operative at the time. To alleviate this problem, we have also divided the data into samples collected before and after 2011 (Supplementary Data; Supplementary table 10), but temporal trends in aggregated data should be interpreted with due caution.” Following the reviewer’s suggestion, we have now stressed this point further in the paragraph added in response to point (2.1) above.

2.3) I guess that all samples were collected from symptomatic patients seen at health facilities level? Unfortunately, this makes that data presented capture only P. falciparum populations infected this population. With the rise of new technologies, I am wondering whether the MalariaGEN consortium could investigate samples collected from asymptomatic individuals and explore the genomic profiles of this hidden reservoir but representing the major parasite biomass?

Asymptomatic infections are indeed a highly significant reservoir that needs to be explicitly considered to achieve a complete and accurate picture of the transmission landscape. New technologies have begun to probe this area, and initial results are encouraging, suggesting that good quality data can be obtained from asymptomatic and/or low parasitemia subjects. MalariaGEN would certainly be supportive of this kind of effort and we have active collaborations with partners exploring these questions. To the best of our knowledge, though, some of these methodologies are still of limited sensitivity and in part experimental, and will require further work before they can be deployed on the large scale required by this scientific question. This is certainly an area for future investigation.

2.4) I am aware that the authors have performed a difficult and complex exercise by providing high quality genomic data and comprehensive description of their data for a large audience. The major challenge that is not addressed in the manuscript is how these important data can be translated into concrete actions in the field by health providers.

This data resource represents a clear step towards the ultimate objective of translating genomic surveillance outputs into concrete actions, although it is fair to say that this is a long journey with many different components. The ability for multiple groups to share data, to analyse it using standardised methods, and to make it readily accessible is the foundation on which translational impact can mature.

In the discussion we highlighted a series of future translational directions which have been and will be facilitated by resources like this one (and future ones) but it is certainly true that these results require careful interpretation due to the caveats highlighted in the paper and by the reviewer, which inevitably limit their impact. At the same time this dataset does create a systematic framework to enact and contextualize future discoveries of that nature and, indirectly, contributes to them.

Ultimately, the practical value for malaria control will be greatly enhanced by the progressive acquisition of longitudinal time-series data and their integration with other sources of epidemiological data which will allow control programmes to monitor the impact of their interventions on the parasite population in near real time.

2.5) Last comment regarding the database. It will be helpful to provide for each sample/genome sequence, the location (country) and the date of collection.

This information is included in the “Sample provenance and sequencing metadata” file available at the resource page https://www.malariagen.net/resource/26

Wellcome Open Res. 2021 Mar 22. doi: 10.21956/wellcomeopenres.17752.r42794

Reviewer response for version 1

Maria Isabel Veiga 1, Nuno S Osório 1

The analysis of whole-genome sequences obtained from Plasmodium falciparum is particularly challenging due to the presence of hypervariable regions, highly repetitive sequences, and frequent mixture of parasites due to multiple infections of the host. The authors of this study describe a curated list of over three million high-confidence polymorphisms obtained from the genome sequence analysis of more than 7000 samples of P. falciparum collected by several studies in 73 locations in Africa, Asia, South America and Oceania.

This work, reporting a laudable effort to substantially enrich publicly available genome data of P. falciparum worldwide, is of paramount importance for the field. The contribution is in line with the authors' previous consortium publications, greatly extending the amount of available data that can be analysed via the web with powerful data analysis pipelines. By providing open access to a curated list of polymorphisms based on reproducible and high-quality protocols for the sequencing and analysis of P. falciparum genomes, this study is likely to reduce the difficulties that have delayed research on the genomic epidemiology and population genomics of P. falciparum. Among other advances, studies in this area are likely to have important implications for a better understanding of the evolution towards drug resistance of the different global parasite populations, ultimately contributing to better control of this devastating disease. The manuscript is very well written and clear. It presents eight genetically distinct populations of parasites, each endemic to a different world region: South America, West Africa, Central Africa, East Africa, South Asia, West Southeast Asia, East Southeast Asia and Oceania. An interesting genetic and geographic characterization of the eight parasite populations is also shown. Of note are the finding of higher within-host diversity in the parasite populations endemic to Africa, the identification of single nucleotide polymorphisms with high levels of geographic differentiation, and the further characterization of geographic patterns of drug resistance and of polymorphisms with potential impact on rapid diagnostic tests. We do not have major criticisms of the study.

Our minor suggestions for the improvement of the manuscript focus on:

  • Increasing the accessibility of the table listing polymorphisms in supplementary data. The authors do provide the data in VCF and zarr files, which are not very user friendly and do not allow a fast search for a specific polymorphism. We understand that developing a web interface for this purpose would be a challenge beyond this research article, but it might be possible to export the VCF file data into tables that could be made available in online repositories.

  • Add to the supplementary file 4, describing the drug resistance markers genotype, the PfMDR1 N86Y. This SNP is a well-known modulator of antimalarial response and considered a risk factor for the treatment of artemether-lumefantrine.

  • Add the ID of the genes most mentioned in the main article. The gene ID (PF3D7_xxxxxxx) is provided in supplementary file 7 but, for the reader's clarity, we recommend also adding it in the main article when first describing the genes.

  • In the results section, when describing gene amplification and different sets of breakpoints, the authors describe complex rearrangements that have not been observed before in Plasmodium species. Regarding pfmdr1, duplication events have been described to vary in size and to span different genes in different parasites 1, 2, 3, 4. Using a genome-walking-like approach, different amplicon sizes containing pfmdr1 have been described in clinical isolates from Southeast Asia, in a study that also investigated whether the type (i.e., which genes are included) and size of the amplicon influence drug susceptibility phenotypes 5.

Is the work clearly and accurately presented and does it cite the current literature?

Yes

If applicable, is the statistical analysis and its interpretation appropriate?

Yes

Are all the source data underlying the results available to ensure full reproducibility?

Yes

Is the study design appropriate and is the work technically sound?

Yes

Are the conclusions drawn adequately supported by the results?

Yes

Are sufficient details of methods and analysis provided to allow replication by others?

Yes

Reviewer Expertise:

Molecular epidemiology, antimalarial drug resistance

We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

References

  • 1. Amplification of the multidrug resistance gene in some chloroquine-resistant isolates of P. falciparum. Cell. 1989;57(6):921-930. doi: 10.1016/0092-8674(89)90330-9
  • 2. Recurrent gene amplification and soft selective sweeps during evolution of multidrug resistance in malaria parasites. Mol Biol Evol. 2007;24(2):562-73. doi: 10.1093/molbev/msl185
  • 3. Amplification of the multidrug resistance gene pfmdr1 in Plasmodium falciparum has arisen as multiple independent events. Mol Cell Biol. 1991;11(10):5244-50. doi: 10.1128/mcb.11.10.5244
  • 4. Genome wide gene amplifications and deletions in Plasmodium falciparum. Mol Biochem Parasitol. 2007;155(1):33-44. doi: 10.1016/j.molbiopara.2007.05.005
  • 5. pfmdr1 amplification is related to increased Plasmodium falciparum in vitro sensitivity to the bisquinoline piperaquine. Antimicrob Agents Chemother. 2012;56(7):3615-9. doi: 10.1128/AAC.06350-11
Wellcome Open Res. 2021 Jul 6.
Richard Pearson 1

We thank the reviewers for the extremely positive and supportive feedback. In their comments and suggestions both reviewers have well captured the spirit of this data resource and of the large collaborative network behind it. We are pleased to submit detailed responses and a revised version of the manuscript that addresses their comments.

1.1) Increasing the accessibility of the table listing polymorphisms in supplementary data. The authors do provide the data in VCF and zarr files, which are not very user friendly and do not allow a fast search for a specific polymorphism. We understand that developing a web interface for this purpose would be a challenge beyond this research article, but it might be possible to export the VCF file data into tables that could be made available in online repositories.

We thank the reviewer for this important feedback on how to increase the reach of this resource. Since the publication of this article, we have been working on an initial web interface that allows users to navigate some aspects of the data: please see https://www.malariagen.net/apps/pf6. The current version mainly focuses on epidemiologically relevant data and emphasises the community behind the project and at the moment doesn’t provide access to the genomic variation information, which will require further work.

Of course, accessibility is a relative criterion and, as such, it requires balancing different priorities. In the past we have provided tabular versions of the data (www.malariagen.net/data) but the benefits have been very limited. For example, handling multiallelic and non-SNP variants requires somewhat arbitrary encoding decisions that significantly reduce the simplicity and intuitiveness of the tabular format. Increasing the sample size has made these variants more common (e.g. in this release there are about 50% non-SNP variants and 50% multiallelic variants) to the point that there was no real advantage in maintaining the format. The decision to primarily use the VCF format comes from the recognition that these files are the de facto standard in the genomics community, which in turn has developed a large ecosystem of tools to handle them: please see the README at ftp://ngs.sanger.ac.uk/production/malaria/pfcommunityproject/Pf6/Pf_6_README_20191010.txt for some examples, e.g. of how to subset the data.

However, we agree this might still be limiting for some use cases and we are working towards a more integrated solution. As an example of our direction of travel, please see https://malariagen.github.io/vector-data/landing-page.html, which presents some simplified data access workflows for the MalariaGEN Anopheles gambiae 1000 Genomes Project.
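To illustrate the encoding decisions that this reply refers to, the following Python sketch flattens VCF-style data lines into a table with one row per alternate allele. It is only one of several possible conventions, which is the point being made: a multiallelic site no longer maps to a single table row, and a non-SNP allele needs its own labelling rule. The contig name, positions and alleles are invented, and the helper names `classify` and `flatten_record` are hypothetical; none of this is taken from the Pf6 files or README.

```python
from typing import Dict, List


def classify(ref: str, alt: str) -> str:
    """Label an allele as a SNP (one base replaced by one base) or non-SNP.

    Simplification: symbolic alleles such as '*' are not treated specially.
    """
    return "SNP" if len(ref) == 1 and len(alt) == 1 else "non-SNP"


def flatten_record(line: str) -> List[Dict[str, object]]:
    """Turn one tab-separated VCF data line into one row per ALT allele."""
    # The first five VCF columns are CHROM, POS, ID, REF and ALT.
    chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
    alt_list = alts.split(",")  # multiallelic sites list ALT alleles with commas
    return [
        {
            "chrom": chrom,
            "pos": int(pos),
            "ref": ref,
            "alt": alt,
            "type": classify(ref, alt),
            "multiallelic": len(alt_list) > 1,
        }
        for alt in alt_list
    ]


# Made-up example lines: one biallelic SNP and one multiallelic site
# that combines a SNP allele with a one-base insertion.
records = [
    "contig_A\t100\t.\tA\tG\t.\tPASS\t.",
    "contig_A\t200\t.\tT\tC,TA\t.\tPASS\t.",
]

rows = [row for record in records for row in flatten_record(record)]
for row in rows:
    print(row)
```

Here the second site expands into two rows, so any per-site summary computed from the table has to regroup rows by position first. Keeping the data in VCF avoids committing to one such convention.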

1.2) Add to the supplementary file 4, describing the drug resistance markers genotype, the PfMDR1 N86Y. This SNP is a well-known modulator of antimalarial response and considered a risk factor for the treatment of artemether-lumefantrine.

We recognise that there is growing evidence of the role of PfMDR1 N86Y in artemether-lumefantrine resistance. In particular, multiple studies have shown that lumefantrine appears to select for N86. Despite that, WHO still reports markers of resistance to lumefantrine as “Yet to be validated” (p. 22, https://www.who.int/publications/i/item/9789240012813). In this release, supplementary file 4 only contains validated markers, so it would be inconsistent to add this marker. However, we will consider adding putative markers in future releases where appropriate.

1.3) Add the ID of the genes most mentioned in the main article. The gene ID (PF3D7_xxxxxxx) is provided in supplementary file 7 but, for the reader's clarity, we recommend also adding it in the main article when first describing the genes.

We have implemented the recommendation and added gene IDs every time a gene is mentioned for the first time in the manuscript version 2.

1.4) In the results section, when describing gene amplification and different sets of breakpoints, the authors describe complex rearrangements that have not been observed before in Plasmodium species. Regarding pfmdr1, duplication events have been described to vary in size and to span different genes in different parasites 1, 2, 3, 4. Using a genome-walking-like approach, different amplicon sizes containing pfmdr1 have been described in clinical isolates from Southeast Asia, in a study that also investigated whether the type (i.e., which genes are included) and size of the amplicon influence drug susceptibility phenotypes 5.

The complex rearrangements that have not been observed before which we were referring to here are “dup-trpinv-dup” rearrangements that to the best of our knowledge have only previously been described in human data (see ref 58). This complex and large structural rearrangement involves a triplicated segment embedded within a duplication, in which the triplicated segment is inverted. We recognise that the original wording in the text was ambiguous and we’ve replaced “complex rearrangements” with an explicit description of the event.

Associated Data


    Data Availability Statement

    Underlying data

    Data are available under the MalariaGEN terms of use for the Pf Community Project: https://www.malariagen.net/data/terms-use/p-falciparum-community-project-terms-use. Depending on the nature, format and content of the data, appropriate mechanisms have been utilised for data access, as detailed below.

    This project contains the following underlying data that are available as an online resource: www.malariagen.net/resource/26. Data are also available from Figshare.

    Figshare: Supplementary data to: An open dataset of Plasmodium falciparum genome variation in 7,000 worldwide samples. https://doi.org/10.6084/m9.figshare.13388603 (ref 92).

    • Study information: Details of the 49 contributing partner studies, including description, contact information and key people.

    • Sample provenance and sequencing metadata: sample information including partner study information, location and year of collection, ENA accession numbers, and QC information for 7,113 samples from 28 countries.

    • Measure of complexity of infections: characterisation of within-host diversity (FWS) for 5,970 QC pass samples.

    • Drug resistance marker genotypes: genotypes at known markers of drug resistance for 7,113 samples, containing amino acid and copy number genotypes at six loci: crt, dhfr, dhps, mdr1, kelch13, plasmepsin 2–3.

    • Inferred resistance status classification: classification of 5,970 QC pass samples into different types of resistance to 10 drugs or combinations of drugs and to RDT detection: chloroquine, pyrimethamine, sulfadoxine, mefloquine, artemisinin, piperaquine, sulfadoxine-pyrimethamine for treatment of uncomplicated malaria, sulfadoxine-pyrimethamine for intermittent preventive treatment in pregnancy, artesunate-mefloquine, dihydroartemisinin-piperaquine, hrp2 and hrp3 gene deletions.

    • Drug resistance markers to inferred resistance status: details of the heuristics utilised to map genetic markers to resistance status classification.

    • Gene differentiation: estimates of global and local differentiation for 5,561 genes.

    • Short variants genotypes: Genotype calls on 6,051,696 SNPs and short indels in 7,113 samples from 29 countries, available both as VCF and zarr files.

    Extended data

    This project contains the following underlying supplementary data available as a single document download: www.malariagen.net/resource/26. Extended data are also available from Figshare.

    Figshare: Supplementary data to: An open dataset of Plasmodium falciparum genome variation in 7,000 worldwide samples. https://doi.org/10.6084/m9.figshare.13388603 (ref 92).

    ‘File9_Pf_6_supplementary’ contains the Supplementary Note, Supplementary Tables and Supplementary Figure:

    • Supplementary Note

      • Analysis of local differentiation score

      • The classic 76T chloroquine resistance mutation in crt is found on multiple haplotypes

      • Sulphadoxine-pyrimethamine resistance is widespread and associated with many haplotypes

      • mdr1 duplications have many different breakpoints

      • Artemisinin, piperaquine, and mefloquine resistance

      • No evidence of resistance to less commonly used antimalarials

    • Supplementary Table 1. Breakdown of analysis set samples by geography.

    • Supplementary Table 2. Studies contributing samples.

    • Supplementary Table 3. Summary of discovered variant positions.

    • Supplementary Table 4. Breakpoints of duplications of gch1.

    • Supplementary Table 5. Breakpoints of duplications of mdr1.

    • Supplementary Table 6. Breakpoints of duplications of plasmepsin 2–3.

    • Supplementary Table 7. Genes ranked by global differentiation score.

    • Supplementary Table 8. Genes ranked by local differentiation score.

    • Supplementary Table 9. Number of samples used to determine proportions in Table 2.

    • Supplementary Table 10. Frequencies of mutations associated with mono- and multi-drug resistance pre- and post-2011.

    • Supplementary Table 11. Frequency of crt amino acid 72–76 haplotypes.

    • Supplementary Table 12. Frequencies of dhfr (51, 59, 108, 164) and dhps (437, 540, 581, 613) multi-locus haplotypes.

    • Supplementary Table 13. Frequency of HRP2 and HRP3 deletions by country.

    • Supplementary Table 14. Alleles at six mitochondrial positions used for the species identification.

    • Supplementary Figure 1. Histogram of local differentiation score for all genes.

    Data hosted with Figshare are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).

    Data analysis group

    Pearson, RD * , Amato, R * , Hamilton, WL, Almagro-Garcia, J, Chookajorn, T, Kochakarn, T, Miotto, O, Kwiatkowski, DP

    *Joint analysis lead

    Local study design, implementation and sample collection

    Ahouidi, A, Amambua-Ngwa, A, Amaratunga, C, Amenga-Etego, L, Andagalu, B, Anderson, TJC, Apinjoh, T, Ashley, EA, Auburn, S, Awandare, G, Ba, H, Baraka, V, Barry, AE, Bejon, P, Bertin, GI, Boni, MF, Borrmann, S, Bousema, T, Branch, O, Bull, PC, Chotivanich, K, Claessens, A, Conway, D, Craig, A, D’Alessandro, U, Dama, S, Day, N, Denis, B, Diakite, M, Djimdé, A, Dolecek, C, Dondorp, A, Drakeley, C, Duffy, P, Echeverry, DF, Egwang, TG, Erko, B, Fairhurst, RM, Faiz, A, Fanello, CA, Fukuda, MM, Gamboa, D, Ghansah, A, Golassa, L, Harrison, GLA, Hien, TT, Hill, CA, Hodgson, A, Imwong, M, Ishengoma, DS, Jackson, SA, Kamaliddin, C, Kamau, E, Konaté, A, Kyaw, MP, Lim, P, Lon, C, Loua, KM, Maïga-Ascofaré, O, Marfurt, J, Marsh, K, Mayxay, M, Mobegi, V, Mokuolu, OA, Montgomery, J, Mueller, I, Newton, PN, Nguyen, TN, Noedl, H, Nosten, F, Noviyanti, R, Nzila, A, Ochola-Oyier, LI, Ocholla, H, Oduro, A, Omedo, I, Onyamboko, MA, Ouedraogo, J, Oyebola, K, Peshu, N, Phyo, AP, Plowe, CV, Price, RN, Pukrittayakamee, S, Randrianarivelojosia, M, Rayner, JC, Ringwald, P, Ruiz, L, Saunders, D, Shayo, A, Siba, P, Su, X, Sutherland, C, Takala-Harrison, S, Tavul, L, Thathy, V, Tshefu, A, Verra, F, Vinetz, J, Wellems, TE, Wendler, J, White, NJ, Yavo, W, Ye, H

    Sequencing, data production and informatics

    Pearson, RD, Stalker, J, Ali, M, Amato, R, Ariani, C, Busby, G, Drury, E, Hart, L, Hubbart, C, Jacob, CG, Jeffery, B, Jeffreys, AE, Jyothi, D, Kekre, M, Kluczynski, K, Malangone, C, Manske, M, Miles, A, Nguyen, T, Rowlands, K, Wright, I, Goncalves, S, Rockett, KA

    Partner study support and coordination

    Simpson, VJ, Miotto, O, Amato, R, Goncalves, S, Henrichs, C, Johnson, KJ, Pearson, RD, Rockett, KA, Kwiatkowski, DP


    Articles from Wellcome Open Research are provided here courtesy of The Wellcome Trust
