Mol Ecol Resour. Author manuscript; available in PMC: 2022 Aug 1.
Published in final edited form as: Mol Ecol Resour. 2022 May 3;22(6):2285–2303. doi: 10.1111/1755-0998.13622

Design and implementation of multiplexed amplicon sequencing panels to serve genomic epidemiology of infectious disease: a malaria case study

Emily LaVerriere 1,2,, Philipp Schwabl 1,2,, Manuela Carrasquilla 1,2,3,, Aimee R Taylor 2,4, Zachary M Johnson 1,2, Meg Shieh 1,2, Ruchit Panchal 1,2, Timothy J Straub 1,2, Rebecca Kuzma 1,2, Sean Watson 1, Caroline O Buckee 4, Carolina M Andrade 5, Silvia Portugal 3,5, Peter D Crompton 6, Boubacar Traore 7, Julian C Rayner 8, Vladimir Corredor 9, Kashana James 10, Horace Cox 11, Angela M Early 1,2, Bronwyn L MacInnis 2, Daniel E Neafsey 1,2,*
PMCID: PMC9288814  NIHMSID: NIHMS1819057  PMID: 35437908

Abstract

Multiplexed PCR amplicon sequencing (AmpSeq) is an increasingly popular application for cost-effective monitoring of threatened species and managed wildlife populations, and shows strong potential for genomic epidemiology of infectious disease. AmpSeq data from infectious microbes can inform disease control in multiple ways, including measuring drug resistance marker prevalence, distinguishing imported from local cases, and determining the effectiveness of therapeutics. We describe the design and comparative evaluation of two new AmpSeq assays for Plasmodium falciparum malaria parasites: a four-locus panel (‘4CAST’) composed of highly diverse antigens, and a 129-locus panel (‘AMPLseq’) composed of drug resistance markers, highly diverse loci for inferring relatedness, and a locus to detect Plasmodium vivax co-infection. We explore the performance of each panel in various public health use cases with in silico simulations as well as empirical experiments. The 4CAST panel appears highly suitable for evaluating the number of distinct parasite strains within samples (complexity of infection), showing strong performance across a wide range of parasitemia levels without a DNA pre-amplification step. For relatedness inference, the larger AMPLseq panel performs similarly to two existing panels of comparable size, despite differences in the data and approach used for designing each panel. Finally, we describe an R package (paneljudge) that facilitates the design and comparative evaluation of genetic panels for relatedness estimation, and we provide general guidance on the design and implementation of AmpSeq panels for genomic epidemiology of infectious disease.

Keywords: malaria, genotyping, genome, epidemiology, relatedness, amplicon sequencing

Introduction

Genetic data are a valuable resource for understanding microbial ecology and the epidemiology of infectious disease. The value of this data type has been highlighted by the COVID-19 pandemic, for which viral sequence analysis has greatly informed patterns of disease spread and evolution, influencing public health policy decisions around the world (Oude Munnink et al., 2021). Applications of genetic data in epidemiology extend from viral and bacterial outbreak management (Gardy & Loman, 2018) to the study of eukaryotic parasites underlying important diseases such as malaria, leishmaniasis, and cryptosporidiosis (Cantacessi, Dantas-Torres, Nolan, & Otranto, 2015; Nader et al., 2019; Neafsey, Taylor, & MacInnis, 2021).

Many use cases (applications) of genetic data have been identified for malaria (Dalmat, Naughton, Kwan-Gett, Slyker, & Stuckey, 2019), the leading parasitic killer worldwide (WHO, 2019). These include tracking the spread of drug/insecticide resistance markers and diagnostic resistance mutations (Chenet et al., 2016; Jacob et al., 2021; Kayiba et al., 2021; Lautu-Gumal et al., 2021; Miotto et al., 2020), assessing disease transmission levels (Daniels et al., 2015; Galinsky et al., 2015), identifying sources of infections and imported cases (Liu et al., 2020; Tessema et al., 2019), and estimating genetic connectivity among different populations (Taylor et al., 2017). Malaria parasite genetic data have also demonstrated utility in therapeutic efficacy studies, e.g., for distinguishing recrudescent infections, potentially indicative of low drug efficacy, from reinfections (Gruenberg, Lerch, Beck, & Felger, 2019; Jones et al., 2021). In the malaria field, these applications are served by different types of genetic data produced at varying resolution, technical complexity, and cost, ranging from genetic panels that comprise as few as 8 – 12 polymorphic microsatellites (MS) or 24 single nucleotide polymorphisms (SNPs) (Daniels et al., 2008; Yalcindag et al., 2012) to whole genome sequencing (WGS) data (Miotto et al., 2015; Takala-Harrison et al., 2015).

To be scalable and sustainable, genetic data should be produced at the minimum resolution that provides robust support for the intended analysis application. WGS data often provide the most complete population genomic perspective on an organism of interest. However, the cost and technical challenges of generating, storing, and interpreting WGS data are impediments to scalability and widespread implementation for organisms with large genomes, or microbes with small genomes in samples dominated by host DNA. Targeted sequencing approaches that focus deep coverage on select genomic regions of interest using multiplexed PCR amplification (AmpSeq) are finding increased application in areas of conservation biology, fisheries science, and evolutionary research (e.g., phylogenetics) (Baetscher, Clemento, Ng, Anderson, & Garza, 2018; Bybee et al., 2011; Dupuis et al., 2018; Hargrove, McCane, Roth, High, & Campbell, 2021; Natesh et al., 2019; O’Neill et al., 2013; Schmidt, Campbell, Govindarajulu, Larsen, & Russello, 2020). These approaches can also serve genomic epidemiology of infectious diseases by focusing sequencing coverage on the most informative regions of pathogen genomes.

While only a few AmpSeq protocols for eukaryotic parasites have been published to date, pioneer examples for malaria and trypanosomatid parasites have confirmed the viability of this approach with low-parasitemia host and vector samples, where parasite DNA comprises a very small fraction of the total sample (Jacob et al., 2021; Ruybal-Pesántez et al., 2021; Schwabl et al., 2020; Tessema et al., 2020). Furthermore, one recent study has confirmed the value of designing amplicons to capture multi-SNP P. falciparum ‘microhaplotypes’, which exhibit polyallelic rather than biallelic diversity to facilitate relatedness inference (Tessema et al., 2020). New relatedness-based analytical approaches for genomic epidemiology are currently being developed for malaria parasites and other sexually recombining pathogens (Henden, Lee, Mueller, Barry, & Bahlo, 2018; Schaffner, Taylor, Wong, Wirth, & Neafsey, 2018). The use of genomic data for estimation of recent common ancestry shared by pairs or clusters of parasites or mosquitoes has shown strong potential to provide epidemiologically useful insights over small geographic distances (10s to 100s of kilometers) and short time scales (weeks to months) relative to traditional population genetic parameters of population diversity and divergence (Cerqueira et al., 2017; Taylor et al., 2017). While many analyses of recent common ancestry in malaria parasites to date have used WGS data, targeted genotyping of as few as 200 biallelic SNPs or 100 polyallelic loci (e.g., microsatellites or microhaplotypes) may also be used to infer relatedness with necessary precision (Taylor, Jacob, Neafsey, & Buckee, 2019), making AmpSeq an excellent candidate to serve relatedness estimation.

However, there remains uncertainty in the molecular epidemiology field as to the suitability of existing panels for profiling pathogen populations in specific geographic locations that did not inform the original panel designs, and it is unclear which protocol features are most conducive to implementation in both high and low-resource settings. Should each disease field adopt a common multiplexed amplicon protocol and panel, or should bespoke panels be implemented regionally to address genetically distinct pathogen populations and specific use cases?

To address these questions, in this manuscript we describe the design and comparative evaluation of two new multiplexed amplicon assays for P. falciparum malaria parasites: a four-locus panel composed of highly diverse loci, useful for estimating the number of genetically distinct strains within an infection (complexity of infection; COI) as well as distinguishing between continuing and newly acquired infections in any geographic setting, and a 129-locus panel composed of drug resistance markers and many diverse loci for relatedness inference initially designed for application in South America (a region that did not inform previously published panel designs). Both assays use non-proprietary reagents (including standard PCR oligos) in order to maximize accessibility and affordability in malaria-endemic settings. The panels are supported by new open-source bioinformatic analysis pipelines to facilitate widespread use. We also show that the core sets of multiplexed PCR oligos can flexibly accommodate various new targets not included in the original designs, allowing for panel customization towards detecting locally relevant resistance markers, polymorphic loci, and co-infecting parasite species. We use WGS data to explore the degree to which our newly described and previously published genotyping panels can serve studies in diverse geographies, versus the alternative of customizing panels with targets that are locally informative but not globally useful. We suggest there is value in genotyping panels that can be flexibly adapted to incorporate informative targets from pathogen populations of interest. The analyses and resources described in this manuscript clarify the rapidly diversifying options for targeted microbial sequencing (Fig. 1) by providing tools and guidance for the comparative evaluation and refinement of AmpSeq approaches.

Figure 1. Amplicon sequencing and other genotyping approaches for genomic epidemiology of infectious diseases.


Schematic of three common approaches for molecular surveillance data generation. Genomic DNA can be extracted from clinical samples and then processed using any of the three methods shown: SNP barcoding, amplicon sequencing, or whole genome sequencing (WGS). Our two amplicon panels, AMPLseq and 4CAST, are shown with representations of their loci and amplification. Pre-amplification (selective whole genome amplification), which increases the ratio of parasite to human DNA in samples, is generally recommended for WGS and some amplicon sequencing panels (AMPLseq, but not 4CAST). SNP barcoding provides data in the form of variant calls at each SNP; amplicon sequencing provides extremely deep coverage at select, small regions of the genome; and WGS generally provides shallower coverage of the entire genome.

Materials and Methods

Panel designs

We developed a small multiplex of four highly polymorphic antigenic loci, dubbed ‘4CAST’: CSP, AMA1, SERA2, and TRAP (Fig. 2). All four amplicons use previously published primer sequences (Miller et al., 2017; Neafsey et al., 2015), as no modification was required for successful multiplexing.

Figure 2. Global characterization of loci in the 4CAST and AMPLseq panels.


Estimates of diversity of each locus in the 4CAST and AMPLseq panels, with one locus per row. We estimated haplotypic diversity from monoclonal P. falciparum WGS data from each country. The top 4 loci shown represent the 4CAST loci, which are also included in the AMPLseq panel. All 128 P. falciparum loci in the AMPLseq panel are shown; the single P. vivax locus is not shown.

In designing the larger multiplexed amplicon panel we call ‘AMPLseq’ (short for ‘Assorted Mix of Plasmodium Loci’), we first built a large pool of candidate loci, anticipating significant attrition of candidates due to primer incompatibility. We prioritized four classes of loci: loci within antigens of interest (Helb et al., 2015), loci with high population diversity for relatedness inference (Taylor et al., 2019), loci included in the SpotMalaria v1 panel (Chang et al., 2019; Jacob et al., 2021), and known drug resistance markers. We contracted the services of GTseek LLC (https://gtseek.com) to design multiplexed oligo panels according to the criteria previously described for the Genotyping-in-Thousands by sequencing (GT-seq) protocol (Campbell, Harmon, & Narum, 2015) (S1 Supporting information). We optimized the final primer set and reaction conditions through several sequencing runs and determined that the primers for the four 4CAST loci (CSP, AMA1, SERA2, TRAP) could be added to the panel without compromising amplification of the other loci. We also successfully added primers amplifying known markers of antimalarial drug resistance within the genes dhfr, dhps, mdr1, and kelch13 (S2 Table). Furthermore, we added previously described primers targeting a region within Pvdhfr (Lefterova, Budvytiene, Sandlund, Färnert, & Banaei, 2015) in order to identify P. falciparum / P. vivax co-infections undetected in preliminary screening by microscopy or rapid diagnostic test (RDT). The final panel contains this single P. vivax locus and 128 P. falciparum loci (Fig. 2), with a median length across all amplicons of 276 bp (S1 Fig.).

Panel protocols

To create the primer pool used in 4CAST PCR1, we combined 100 μM stocks of each 4CAST primer (S3 Table) and diluted the combined primer mix to 6.25 μM per primer in nuclease-free water (NF dH2O). Each 10.5 μl PCR1 reaction incorporated 1.5 μl combined primer mix, 5 μl KAPA HiFi HotStart ReadyMix (2x), and 4 μl sample template. Each 12.2 μl PCR2 reaction incorporated 2.2 μl unique dual index (10 μM Illumina Nextera DNA UD Indexes), 5 μl KAPA HiFi HotStart ReadyMix (2x), 2 μl NF dH2O and 3 μl PCR1 product. PCR cycling conditions are provided in S1 Protocol. We combined PCR2 products in equal volumes and subjected the resultant library to double-sided size selection using Agencourt AMPure XP beads (Beckman Coulter). We verified size selection via Agilent Bioanalyzer 2100 and sequenced the selected library at 6 pM with >10% PhiX in paired-end, 500-cycle format using MiSeq Reagent Kit v2 (S1 Protocol).
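The dilution arithmetic behind the 4CAST primer pool can be checked with a short sketch. It assumes (our assumption; the protocol does not state the pooling scheme) that the eight 4CAST primers, one forward and one reverse for each of the four loci, are pooled in equal volumes from their 100 μM stocks. Under that assumption each primer sits at 12.5 μM in the pool, so a 1:1 dilution with NF dH2O reaches 6.25 μM. The 80 μl pool volume is a hypothetical example.

```python
"""Dilution arithmetic for the 4CAST PCR1 primer pool (illustrative sketch).

Assumption: 8 primers (4 loci x forward/reverse) pooled in equal volumes
from 100 uM stocks. Reaction recipes are taken from the text.
"""


def pooled_concentration(stock_um: float, n_primers: int) -> float:
    """Per-primer concentration (uM) after mixing equal volumes of n stocks."""
    return stock_um / n_primers


def diluent_to_add(c_start_um: float, c_target_um: float, v_start_ul: float) -> float:
    """Volume of diluent (ul) needed to go from c_start to c_target (C1*V1 = C2*V2)."""
    v_final = c_start_um * v_start_ul / c_target_um
    return v_final - v_start_ul


def final_in_reaction(c_mix_um: float, v_mix_ul: float, v_rxn_ul: float) -> float:
    """Per-primer concentration (uM) once the mix is added to the full reaction."""
    return c_mix_um * v_mix_ul / v_rxn_ul


# PCR1: 1.5 ul primer mix + 5 ul 2x ReadyMix + 4 ul template
pcr1 = {"primer_mix": 1.5, "kapa_2x": 5.0, "template": 4.0}
# PCR2: 2.2 ul index + 5 ul 2x ReadyMix + 2 ul water + 3 ul PCR1 product
pcr2 = {"index": 2.2, "kapa_2x": 5.0, "water": 2.0, "pcr1_product": 3.0}

pool_um = pooled_concentration(100.0, 8)            # per-primer concentration in the pool
extra_water = diluent_to_add(pool_um, 6.25, 80.0)   # hypothetical 80 ul pool (8 x 10 ul)
pcr1_volume = sum(pcr1.values())
pcr2_volume = sum(pcr2.values())
per_primer_pcr1 = final_in_reaction(6.25, pcr1["primer_mix"], pcr1_volume)

print(f"pool: {pool_um:.2f} uM per primer; add {extra_water:.1f} ul water")
print(f"PCR1 volume {pcr1_volume:.1f} ul; PCR2 volume {pcr2_volume:.1f} ul")
print(f"per-primer concentration in PCR1: {per_primer_pcr1:.3f} uM")
```

The same helpers confirm that the listed component volumes sum to the stated 10.5 μl and 12.2 μl reaction sizes, and that each primer ends up at roughly 0.89 μM in PCR1 under the equal-volume pooling assumption.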

We followed a similar nested PCR and pooled clean-up procedure for AMPLseq library construction. Primer sequences, input volumes and concentrations are listed in S3 Table and PCR conditions and size selection steps are described in S2 Protocol. As detailed therein, AMPLseq library construction differs from 4CAST library construction in a few minor aspects. For example, primer input quantities vary slightly (800 pmol +/− 33%) to account for amplification rate differences among loci. PCR1 products are diluted prior to PCR2 and only single-sided (left-tailed) bead-based size selection is used to enhance yield. Sequencing also occurs via paired-end, 500-cycle MiSeq but with a higher final library loading concentration (12 pM) and a lower fraction of PhiX (8%).

Mock samples

We generated mock samples from parasite lines 3D7 and Dd2, cultured at 3% hematocrit in commercially obtained red blood cells as previously described (Trager & Jensen, 1976). We extracted genomic DNA using the Qiagen Blood and Tissue Kit on cells previously lysed with 0.15% saponin. We generated positive control template representing DNA extractions from whole human blood infected with 10000 monoclonal 3D7 parasites/μl by combining 13.76 ng/μl human genomic DNA (Promega, Madison, WI) with 0.92 ng/μl 3D7 genomic DNA at a ratio (v/v) of 2.66 to 1, resulting in a mixture containing 10 ng/μl human genomic DNA and 0.25 ng/μl 3D7 genomic DNA. We used 0.25 ng to represent the mass of 10000 P. falciparum genomes based on a 22.8 Mbp genome size and an average mass of 660 g per mol bp; this assumes one haploid parasite genome per infected cell, as expected for peripheral blood (the target profile). We generated further control templates representing 1000, 100, and 10 3D7 parasites/μl by serial 1:10 dilution of the 10000 3D7 parasites/μl control with 10 ng/μl human genomic DNA. We also generated a 10000 parasites/μl positive control as described above but using Dd2 instead of 3D7 strain genomic DNA. We generated mixed-strain control templates by combining the 10000 3D7 parasites/μl control with this 10000 Dd2 parasites/μl control at 3D7:Dd2 ratios of 1:1, 3:1, and 10:1. We serially diluted the 1:1 ratio to 1000, 100, and 10 parasites/μl concentrations and diluted the 3:1 and 10:1 ratios to 1000 and 100 parasites/μl concentrations using 10 ng/μl human genomic DNA diluent as before. We also applied selective whole genome amplification (sWGA) to all above control templates representing ≤ 1000 parasites/μl. The 50 μl sWGA reaction followed Oyola et al. (2016), with the exception of fixing template input volume to 10 μl. We purified sWGA products with Agencourt AMPure XP beads (Beckman Coulter) on the KingFisher Flex (S3 Protocol) and verified amplification success via NanoDrop (ThermoFisher Scientific).
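The mock-sample arithmetic can be reproduced with a few lines of Python. This sketch uses only the constants stated in the text (22.8 Mbp genome, 660 g per mol bp, the 2.66:1 v/v mix, 1:10 serial dilutions, and the 3D7:Dd2 mixture ratios) plus Avogadro's number; the helper names are hypothetical.

```python
AVOGADRO = 6.022e23          # molecules per mol
BP_MASS_G_PER_MOL = 660.0    # average mass of one base pair (from the text)
PF_GENOME_BP = 22.8e6        # P. falciparum haploid genome size (from the text)


def genome_mass_ng(genome_bp: float) -> float:
    """Mass (ng) of a single haploid genome copy."""
    grams = genome_bp * BP_MASS_G_PER_MOL / AVOGADRO
    return grams * 1e9


def mix_concentrations(c_a: float, c_b: float, ratio_a_to_b: float) -> tuple[float, float]:
    """Final concentrations of A and B after mixing stocks at ratio_a_to_b : 1 (v/v)."""
    total_parts = ratio_a_to_b + 1.0
    return c_a * ratio_a_to_b / total_parts, c_b / total_parts


def serial_dilutions(start: float, factor: float, steps: int) -> list[float]:
    """Parasite densities after repeated 1:factor dilutions.

    The diluent is 10 ng/ul human gDNA, so host background stays constant
    while parasite density drops by `factor` at each step.
    """
    return [start / factor**i for i in range(steps + 1)]


def minor_strain_fraction(major_parts: float, minor_parts: float) -> float:
    """Relative abundance of the minor strain in a major:minor mixture."""
    return minor_parts / (major_parts + minor_parts)


mass_10k = 10_000 * genome_mass_ng(PF_GENOME_BP)    # mass of 10000 genomes, ng
human, parasite = mix_concentrations(13.76, 0.92, 2.66)
densities = serial_dilutions(10_000, 10, 3)         # parasites/ul
dd2_fractions = {ratio: minor_strain_fraction(ratio, 1) for ratio in (1, 3, 10)}

print(f"10000 genomes = {mass_10k:.3f} ng")
print(f"mix: {human:.2f} ng/ul human, {parasite:.3f} ng/ul 3D7")
print("densities:", densities)
print({r: f"{f:.1%}" for r, f in dd2_fractions.items()})
```

The last line shows why the 1:1, 3:1 and 10:1 mixtures correspond to Dd2 relative abundances of 50%, 25% and about 9%.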

Clinical samples

We tested the panels on clinical dried blood spot (DBS) samples from Mali and Guyana. Tran et al. collected samples in Kalifabougou, Mali between 2011 and 2013 as previously described (Tran et al., 2013). The Kalifabougou cohort study was approved by the Ethics Committee of the Faculty of Medicine, Pharmacy and Dentistry at the University of Sciences, Technique and Technology of Bamako, and the Institutional Review Board of the National Institute of Allergy and Infectious Diseases, National Institutes of Health (NIH IRB protocol number: 11IN126; https://clinicaltrials.gov/; trial number NCT01322581). Written informed consent was obtained from parents or guardians of participating children before inclusion in the study. The Guyana Ministry of Health collected samples from Port Kaituma and Georgetown, Guyana between May and August 2020 by spotting participants’ whole blood onto Whatman FTA cards and storing the samples with individual desiccant packets at room temperature. Informed consent (or parental assent for minors) was obtained for all subjects according to protocols approved by ethical committees.

We punched DBS samples 3 – 5 times into a 96-well deep well plate using the DBS pneumatic card puncher (Analytical Sales and Services, Inc.) equipped with a 3 mm cutter. We then extracted gDNA following the DNA purification from buccal swab section of the KingFisher Ready DNA Ultra 2.0 Prefilled Plates user guide (ThermoFisher Scientific) with minor modifications (S4 Protocol). We used the same sWGA procedure as above on the extracted gDNA.

Whole genome sequencing and variant calling

We performed whole genome sequencing on clinical samples collected in Guyana to validate 4CAST and AMPLseq outcomes. We performed sWGA on DNA samples as described above to enrich parasite DNA. We used the enriched DNA to construct Illumina sequencing libraries using the NEBNext Ultra II FS DNA prep kit (NEB #E6177) prior to sequencing on an Illumina HiSeqX instrument at the Broad Institute, using 150 bp paired-end reads and targeting a sequencing depth of at least 50x. We aligned reads to the P. falciparum v3 reference genome assembly using BWA-MEM (Li, 2013) and called SNPs and INDELs using the GATK HaplotypeCaller (DePristo et al., 2011; McKenna et al., 2010; Van der Auwera et al., 2013) according to the best practices for P. falciparum as determined by the Pf3k consortium (https://www.malariagen.net/resource/34). Analyses were limited to the callable segments of the genome (Miles et al., 2016) and excluded sites at which over 20% of samples were multiallelic. Data from these samples were submitted to the NCBI Sequence Read Archive (http://www.ncbi.nlm.nih.gov/sra) under accession PRJNA758191.

Amplicon data analysis

We developed the application AmpSeQC (S2 Supporting information) to assess sequence quality and amplicon/sequence run success (S2 Fig.). We also used AmpSeQC for P. vivax detection by concatenating the P. falciparum 3D7 and P. vivax PvP01 reference genomes during the BWA-MEM alignment step. For in-depth assessment of P. falciparum sequence variation, we processed paired-end Illumina sequencing data in the form of FASTQ files using a custom analysis pipeline (S2 Supporting information) that leverages the Divisive Amplicon Denoising Algorithm (DADA2) tool (Callahan et al., 2016) to obtain microhaplotypes (S2 Fig.). We mapped microhaplotypes obtained from DADA2 against a custom-built database of 3D7 and Dd2 reference sequences for each amplicon locus and filtered microhaplotypes based on edit distance, length, and chimera detection using a custom R script (S2 Supporting information). We summarized observed sequence polymorphism into a concise format by converting individual microhaplotypes into pseudo-CIGAR strings using a custom Python script. Microhaplotypes were discarded if supported by fewer than 10 read-pairs or by less than 1% of total read-pairs within a locus, or if they exhibited other error features (S3 Supporting information).
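The pseudo-CIGAR encoding and the read-support filter can be illustrated with a minimal sketch. The exact pseudo-CIGAR syntax used by the pipeline is not given here, so the format below is hypothetical: 1-based substitution positions followed by the alternate base (e.g. `4A8C`), with `.` for an exact reference match, and no indel handling. The two thresholds (≥ 10 read-pairs and ≥ 1% of a locus's read-pairs) come from the text; computing the 1% share against the unfiltered locus total is our assumption, and the S3 Supporting information error-feature filters are not modelled.

```python
from collections import defaultdict

MIN_READ_PAIRS = 10         # absolute read-pair threshold (from the text)
MIN_LOCUS_FRACTION = 0.01   # minimum share of a locus's read-pairs (from the text)


def pseudo_cigar(reference: str, haplotype: str) -> str:
    """Encode substitutions relative to the locus reference (hypothetical format).

    Tokens are '<1-based position><alt base>'; '.' means identical to reference.
    Only ungapped, equal-length alignments are handled in this sketch.
    """
    if len(reference) != len(haplotype):
        raise ValueError("sketch handles substitution-only (equal-length) haplotypes")
    tokens = [
        f"{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference, haplotype), start=1)
        if ref != alt
    ]
    return "".join(tokens) or "."


def filter_calls(calls):
    """Keep (locus, pseudo_cigar, read_pairs) calls passing both thresholds.

    `calls` lists every microhaplotype observed in one sample.
    """
    locus_totals = defaultdict(int)
    for locus, _, reads in calls:
        locus_totals[locus] += reads
    return [
        (locus, cigar, reads)
        for locus, cigar, reads in calls
        if reads >= MIN_READ_PAIRS and reads >= MIN_LOCUS_FRACTION * locus_totals[locus]
    ]


calls = [
    ("CSP", ".", 5000),
    ("CSP", pseudo_cigar("ACGTACGT", "ACGAACGC"), 120),  # a minor haplotype
    ("CSP", "7T", 8),        # dropped: fewer than 10 read-pairs
    ("TRAP", ".", 20000),
    ("TRAP", "3C", 150),     # dropped: below 1% of the locus total
]
print(filter_calls(calls))
```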

We analyzed native and pre-amplified mock samples to determine precision and sensitivity of the DADA2 pipeline and filters. We defined a true positive (TP) as a microhaplotype with a pseudo-CIGAR string identical to the reference strain (either 3D7 or Dd2). We defined a false positive (FP) as a microhaplotype with a pseudo-CIGAR string not matching 3D7 (in the case of samples containing only 3D7) or not matching 3D7 or Dd2 (in the case of the mixtures), and we defined a false negative (FN) as a locus without any correct microhaplotype representation. We defined precision as TP/(TP + FP), and sensitivity (recall) as TP/(TP + FN). Forty-five of 128 P. falciparum loci in AMPLseq exhibit identical 3D7 and Dd2 reference sequences; we only included these in precision and sensitivity calculations for pure 3D7 controls (i.e., TP + FN = 128); precision and sensitivity calculations for strain mixtures considered only the 83 loci that differ between 3D7 and Dd2 reference sequences (i.e., TP + FN = 83).
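For single-strain controls, the precision and sensitivity definitions above can be sketched as follows. The sketch assumes one expected pseudo-CIGAR per locus (e.g. `.` for 3D7 under a reference-relative encoding) and counts one TP or FN per locus, so TP + FN equals the number of loci evaluated, matching TP + FN = 128 for pure 3D7 AMPLseq controls. Each non-matching call counts as one FP. Mixture scoring over the 83 dimorphic loci is not covered, and the function name is hypothetical.

```python
def evaluate_single_strain(observed, expected):
    """Precision and sensitivity for a single-strain control (e.g. pure 3D7).

    observed: dict mapping locus -> set of pseudo-CIGAR strings called in the sample
    expected: dict mapping locus -> pseudo-CIGAR string of the reference strain
    Calls at loci absent from `expected` are ignored.
    """
    tp = fp = fn = 0
    for locus, truth in expected.items():
        calls = observed.get(locus, set())
        if truth in calls:
            tp += 1                     # correct microhaplotype present at this locus
        else:
            fn += 1                     # locus lacks any correct microhaplotype
        fp += len(calls - {truth})      # every other call at the locus is spurious
    precision = tp / (tp + fp) if tp + fp else float("nan")
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    return tp, fp, fn, precision, sensitivity


expected = {"CSP": ".", "AMA1": ".", "SERA2": ".", "TRAP": "."}
observed = {"CSP": {"."}, "AMA1": {".", "45T"}, "SERA2": {"."}}  # TRAP dropped out
print(evaluate_single_strain(observed, expected))
```

In this toy example one locus dropped out and one spurious haplotype was called, giving TP = 3, FP = 1, FN = 1, and both precision and sensitivity equal to 0.75.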

All amplicon sequencing data were submitted to http://www.ncbi.nlm.nih.gov/sra under accession PRJNA758191.

Comparator Panels

We compared 4CAST and AMPLseq to two previously published AmpSeq panels for malaria molecular surveillance, Paragon HeOME v1 (Tessema et al., 2020) and SpotMalaria v2 (Jacob et al., 2021).

Paragon HeOME v1, designed via the CleanPlex algorithm (Paragon Genomics Inc, USA), contains 100 primer pairs in a single pool. Primer design focused on 150 genetic windows that show high diversity and differentiation (Jost’s D ≥ 0.21) among clinical isolates from throughout sub-Saharan Africa, resulting in 93 amplicon targets with high median heterozygosity in all malaria endemic regions of the world. The panel also targets seven drug resistance-associated loci. A distinctive feature of HeOME library construction is its requirement for bead-based clean-up and CleanPlex digestion of each sample between PCR1 and PCR2. The protocol therefore does not require sWGA prior to PCR1.

SpotMalaria v2, designed via Agena BioScience and MPprimer design software, contains 136 primer pairs divided into three different pools. A majority of target loci are intended for genetic barcode generation and contain biallelic sites with ≥ 0.01 minor allele frequency (MAF) in each of eight global parasite populations studied via WGS (MalariaGEN Plasmodium falciparum Community Project, 2016). These barcoding loci were chosen based on their ability to recapitulate WGS-based inferences on pairwise genetic distance, population differentiation, and sample heterozygosity at different spatial scales. Primers also target various drug resistance-associated loci and mitochondrial sequences with conserved primer binding sites among Plasmodium spp. Library construction requires sWGA prior to PCR1 but no special processing between PCR1 and PCR2.

We also compared our amplicon panels to a molecular barcode assay containing 24 SNPs (Daniels et al., 2008). The SNPs targeted by this Taqman qPCR-based assay were chosen principally for their high average MAF (> 0.35) across parasite sample collections from Thailand and Senegal, with further filtering to remove tightly linked SNPs and to minimize the generation of identical barcodes for closely related strains (Daniels et al., 2008).

paneljudge and in silico data simulations

We used WGS data to simulate genotypic panel data. This publication uses data from the MalariaGEN Plasmodium falciparum Community Project as described online pending publication and public release of dataset Pf7 (https://www.malariagen.net/resource/34). Specifically, we used genomic data from monoclonal samples collected in Mali, Malawi, Senegal, and Thailand (Zhu et al., 2019), and from Colombia and Venezuela (ENA accession numbers in S4 Table). We also used previously published monoclonal genomic data from Guyana (SRA BioProject PRJNA543530) (Mathieu et al., 2020) and French Guiana (SRA BioProject PRJNA242182) (Pelleau et al., 2015). We used the scikit-allel library (Miles et al., 2020) to process the data and then estimate microhaplotype frequency and diversity. Specifically, we used the ‘read_vcf’, ‘is_het’, ‘haploidify_samples’, and ‘distinct_frequencies’ functions to estimate haplotype frequencies (S1 Supporting information).
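The frequency step can be mirrored in plain Python for readers without scikit-allel. Here each monoclonal sample contributes one allele tuple per locus, assumed to be already haploidified (in scikit-allel, `GenotypeArray.haploidify_samples` does this and `HaplotypeArray.distinct_frequencies` returns the frequency of each distinct haplotype). The diversity statistic is an assumption on our part: the text does not name its estimator, so the sketch uses Nei's unbiased haplotype diversity, n/(n − 1) × (1 − Σ pᵢ²).

```python
from collections import Counter


def haplotype_frequencies(haplotypes):
    """Relative frequency of each distinct haplotype at one locus.

    `haplotypes` holds one allele tuple per monoclonal (haploid) sample.
    """
    counts = Counter(haplotypes)
    n = len(haplotypes)
    return {hap: count / n for hap, count in counts.items()}


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity, n/(n-1) * (1 - sum(p_i^2))."""
    n = len(haplotypes)
    if n < 2:
        return 0.0
    freqs = haplotype_frequencies(haplotypes)
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs.values()))


# Four samples typed at two SNPs within one amplicon (toy data)
locus = [("A", "C"), ("A", "C"), ("G", "C"), ("G", "T")]
print(haplotype_frequencies(locus))
print(round(haplotype_diversity(locus), 4))
```

Because microhaplotypes combine several SNPs, a locus can carry many distinct haplotypes, which pushes this diversity value toward 1 and makes the locus more informative for relatedness than a single biallelic SNP.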

We assessed the performance of different panels for relatedness inference using simulated data. We simulated data on pairs of haploid genotypes (equivalent to pairs of monoclonal malaria samples) using paneljudge, an R package that we built to simulate data under a hidden Markov model (HMM) (Taylor et al., 2019), which is the same HMM used in the single-population implementation of hmmIBD (Schaffner et al., 2018) (S2 Supporting information). For each panel, we calculated inter-locus distances from the median nucleotide position of each locus and set distances as infinite between chromosomes. For each panel and population of interest, we calculated haplotype frequency estimates using scikit-allel, as described above. Given these distances and frequency estimates, we simulated data using relatedness parameter values of 0.01 (unrelated), 0.50 (related), and 0.99 (clonal), and switch rate parameter values of 1, 5, 10, and 50. For each combination of panel, population, relatedness parameter, and switch rate parameter, we simulated data on 100 haploid genotype pairs using paneljudge. For each haploid genotype pair, we then generated, also using paneljudge, estimates of the relatedness parameter, estimates of the switch rate parameter, and 95% confidence intervals (CIs). The estimates were generated under the same parameterization of the HMM used to simulate the data, i.e., frequencies and distances were unchanged. We next performed classification from the estimates of the relatedness parameters and their CIs. For pairs simulated using a relatedness parameter of 0.01, we generously classified estimates as correct if the lower limit of the 95% confidence interval around the relatedness estimate (LCI) was below or equal to 0.01 and the upper limit of the 95% confidence interval (UCI) was below 0.99. For pairs simulated using a parameter value of 0.50, we classified estimates of relatedness as correct if the LCI was above 0.01 and the UCI was below 0.99. 
For pairs simulated using a parameter value of 0.99, we classified estimates of relatedness as correct if the LCI was above 0.01 and the UCI was above or equal to 0.99. If the 95% confidence interval spanned both 0.01 and 0.99 (i.e., LCI < 0.01 and UCI > 0.99), then we denoted the estimate as unclassified. To evaluate panel performance in COI estimation, we combined monoclonal WGS data to engineer in silico polyclonal samples using vcftools (Danecek et al., 2011). We counted the number of distinct microhaplotypes observed at each locus per sample and estimated COI as the maximum number of distinct microhaplotypes observed at any locus within a sample.
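The simulation and scoring logic described above can be outlined in Python. This is a sketch under stated assumptions, not paneljudge (which is an R package) and not its estimator. Inter-locus distances follow the text: median positions, infinite between chromosomes. The two-state IBD process uses a commonly used parameterization that we assume here: with probability exp(−kρd) the state persists, otherwise it is redrawn with P(IBD) = r. The recombination rate ρ = 7.4 × 10⁻⁷ Morgans per bp is our assumed value, and the genotyping-error term is omitted. Estimation of r, k and their CIs is not shown. The "incorrect" label for CIs that are neither correct nor unclassified is ours.

```python
import math
import random

RHO = 7.4e-7  # assumed P. falciparum recombination rate (Morgans per bp)


def interlocus_distances(loci):
    """Distances (bp) between consecutive loci; infinite across chromosomes.

    `loci` is a list of (chromosome, median_position) tuples sorted by position.
    """
    return [b[1] - a[1] if a[0] == b[0] else math.inf for a, b in zip(loci, loci[1:])]


def p_stay(k, d):
    """Probability that no IBD-state switch occurs over d bp at switch rate k."""
    return 0.0 if math.isinf(d) else math.exp(-k * RHO * d)


def simulate_pair(freqs, distances, r, k, rng):
    """Simulate one pair of haploid genotypes under a two-state IBD HMM.

    freqs: per-locus dicts allele -> frequency; r: relatedness; k: switch rate.
    """
    def draw(f):
        return rng.choices(list(f), weights=list(f.values()))[0]

    ibd = rng.random() < r                    # initial state from the stationary distribution
    pair = []
    for i, f in enumerate(freqs):
        if i > 0 and rng.random() >= p_stay(k, distances[i - 1]):
            ibd = rng.random() < r            # switch event: redraw the state
        first = draw(f)
        second = first if ibd else draw(f)    # IBD loci share an allele (no error model)
        pair.append((first, second))
    return pair


def classify(true_r, lci, uci):
    """Score a 95% CI for relatedness using the rules described above."""
    if lci < 0.01 and uci > 0.99:
        return "unclassified"
    if true_r == 0.01:
        ok = lci <= 0.01 and uci < 0.99
    elif true_r == 0.50:
        ok = lci > 0.01 and uci < 0.99
    elif true_r == 0.99:
        ok = lci > 0.01 and uci >= 0.99
    else:
        raise ValueError("rules are defined only for r in {0.01, 0.50, 0.99}")
    return "correct" if ok else "incorrect"


def estimate_coi(calls_by_locus):
    """COI = maximum number of distinct microhaplotypes observed at any locus."""
    return max((len(set(h)) for h in calls_by_locus.values()), default=0)


rng = random.Random(1)
loci = [("Pf3D7_01_v3", 100_000), ("Pf3D7_01_v3", 250_000), ("Pf3D7_02_v3", 80_000)]
freqs = [{"A": 0.6, "B": 0.4}, {"x": 0.5, "y": 0.3, "z": 0.2}, {"p": 0.9, "q": 0.1}]
print(simulate_pair(freqs, interlocus_distances(loci), r=0.5, k=5, rng=rng))
print(classify(0.50, 0.20, 0.80))                          # -> correct
print(estimate_coi({"CSP": ["a", "b", "a"], "AMA1": ["c"]}))  # -> 2
```

Setting r close to 1 makes most loci share alleles within a pair, while a larger k shortens IBD segments. This is why panels with more loci and well-spread inter-locus distances tighten the relatedness CIs.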

To evaluate panel performance for geographic attribution, we identified microhaplotypes at loci as described above. We used the microhaplotype sequences themselves and visualized these data using the Rtsne package (Krijthe, 2015), with 5000 iterations, θ of 0.0, and perplexity parameter of 10. We performed PCA analyses using the ‘prcomp’ function in base R version 4.1.2 (R Core Team, 2021).

Results

4CAST and AMPLseq validation

We validated assay precision (defined as TP/(TP + FP)), sensitivity (defined as TP/(TP + FN)), and depth of coverage using 3D7 mock samples representing parasitemia levels between 10 and 10000 parasites/μl in 10 ng/μl human DNA. Following automated and manual filtration steps (S3 Supporting information), both 4CAST and AMPLseq generated 3D7 microhaplotype calls with 100% precision for all parasitemia levels assessed, both with and without pre-amplification by sWGA. 4CAST achieved high sensitivity and depth without preliminary sWGA, generating a median of 43 read-pairs per locus from native templates representing 10 parasites/μl (Fig. 3A). Median depth increased to 443 and 1312.5 read-pairs per locus for native templates representing 100 and 1000 parasites/μl, respectively. Read-pair counts were also evenly distributed among 4CAST loci using native DNA (Fig. 3A).

Figure 3. 4CAST and AMPLseq panel validation with mock and clinical samples.


A) Boxplots of read-pairs per locus in the 4CAST panel. The first facet shows read-depth per locus across mock samples ranging from 10 – 10000 parasites/μl, using both native and sWGA DNA (n = 4 per condition). The second and third facets show read-depth per locus across two sets of clinical samples, < 1 year old and ~10 year old dried blood spots, respectively (n = 32 per sample set). B) The same 3D7 mock sample sets were used to assess AMPLseq sensitivity (top left panel of B) and read-depth (bottom left panel of B). Each point in the bottom left panel of B represents read-pair support for one AMPLseq locus. Positions on the y-axis indicate median read-pair support across replicate samples. Low sensitivity observed using native templates (grey) representing 10 and 100 3D7 parasites/μl suggests that clinical samples should be pre-amplified with sWGA (results at right). C) Ratio of 4CAST read-pairs from microhaplotypes assigned to 3D7 (x-axis) or Dd2 (y-axis) from 3D7 + Dd2 mock mixtures with strain ratios of 1:1 (tan), 3:1 (pink), and 10:1 (dark red), respectively. All samples contained 1000 parasites per μl in total, i.e., across both strains. Dashed lines represent the expected ratio, and each point represents a 4CAST locus per sample (n = 4 per condition). D) AMPLseq read-pair ratios observed in native mock mixtures (1000 parasites/μl) are plotted as above for 4CAST.

Unlike 4CAST, AMPLseq required sWGA for 3D7 mock samples representing 10 and 100 parasites/μl (Fig. 3B). Following sWGA on mock samples representing 10 parasites/μl, the assay generated ≥ 10 read-pairs at a median of 126 loci, with a median of 465 read-pairs after excluding loci with fewer than 10 read-pairs. Values were statistically similar for pre-amplified samples representing 100 parasites/μl and increased to 692 read-pairs for pre-amplified samples representing 1000 parasites/μl (S3 Fig.).

We also validated the sensitivity of 4CAST and AMPLseq for genotyping polyclonal infections by using mock samples containing both 3D7 and Dd2 templates (likewise in 10 ng/μl human DNA). These mixtures featured Dd2 at 50% (i.e., 1:1 3D7:Dd2 ratio), 25% (3:1), and 9% (10:1) relative abundance. For all parasitemia levels assessed in native and pre-amplified mixtures (between 10 and 10000 parasites/μl), both 4CAST and AMPLseq generated microhaplotype calls with 100% precision at the 83 loci that are dimorphic between the 3D7 and Dd2 references (including all four 4CAST loci and an additional 79 loci in AMPLseq). This metric excludes two microhaplotypes classified as false positives in post-pipeline screening prior to precision analysis (S3 Supporting information).

4CAST showed high sensitivity for Dd2 without the need for sWGA. At 1000 parasites/μl, the assay detected Dd2-specific microhaplotypes at each of its four loci in all 1:1, 3:1, and 10:1 mixture replicates (Fig. 3C). At 100 parasites/μl, median Dd2 sensitivity remained 100% at 1:1 and 3:1 ratios but was slightly lower (94%) at 10:1. At 10 parasites/μl, 1:1 mixtures yielded detections at a median of 3 target loci for 3D7 and 2 target loci for Dd2. Following pre-amplification with sWGA, these medians rose to 3.5 and 3 loci, respectively, but read-pair support became unbalanced between the two strains (S4A Fig.), possibly due to differential sWGA success on low-quality Dd2 vs. higher-quality 3D7 templates. By contrast, 4CAST read-pair ratios generated from native templates correlated strongly with input ratios at 100 parasites/μl (S4A Fig.) and 1000 parasites/μl (Fig. 3C), although ratios were less informative at 10 parasites/μl (S4A Fig.).

AMPLseq also detected Dd2-specific microhaplotypes, though at no more than 77 of the 83 dimorphic loci (the maximum, observed in 1:1 mixtures at 10000 parasites/μl). With sWGA, Dd2-specific sequences were detected at a minimum of two dimorphic loci for all three input ratios (1:1, 3:1, 10:1) and all parasitemia levels (≥ 10 parasites/μl) assessed. As with 4CAST, however, the use of sWGA decorrelated read-pair ratios from input ratios (S4B Fig.). This decorrelation was not observed with native templates at 1000 and 10000 parasites/μl (Fig. 3D), at which AMPLseq achieves high read-pair support without sWGA.

We also tested both panels on genomic DNA extracted from dried blood spots collected by the Guyana Ministry of Health in 2020 from individuals diagnosed as P. falciparum-positive by rapid diagnostic test (RDT). Ten Guyanese samples were tested with both panels, and an additional six were tested with AMPLseq only. Using 4CAST without sWGA, we observed coverage across all loci in all samples, with a median depth of 1162 read-pairs per locus (Fig. 3A). Using AMPLseq (with sWGA), we observed a median of 122 loci with ≥ 10 read-pairs and a median depth of 298 read-pairs per covered locus (Fig. 3B).

Additionally, we tested both panels on gDNA extracted from 16 dried blood spot samples collected in Mali in 2011 (Tran et al., 2013) and subsequently stored at room temperature for ten years. Using 4CAST (without sWGA), we observed a median depth of 407 read-pairs per locus (Fig. 3A). Using AMPLseq (with sWGA), we observed a median of 75 loci with ≥ 10 read-pairs and a median depth of 112 read-pairs per covered locus (Fig. 3B).

Evaluation of panel performance for relatedness

We used the R package paneljudge to assess in silico how the choice of genotyping panel affects relatedness inference. We evaluated relatedness estimation from data simulated on our 4CAST and AMPLseq panels, the SpotMalaria v2 (Jacob et al., 2021) and Paragon HeOME v1 (Tessema et al., 2020) amplicon panels, and a barcode of 24 SNPs (Daniels et al., 2008). When data were simulated using microhaplotype frequencies estimated from Senegalese parasites, almost all estimates for unrelated or clonal pairs were correctly classified, regardless of panel (Fig. 4A). All three large panels also performed similarly well in accurately identifying related (but not clonal) parasite pairs, despite being the product of three distinct design processes. Neither 4CAST nor the 24 SNP barcode estimated relatedness for partially related pairs as well as the larger panels. We also evaluated panel performance in less diverse parasite populations (Colombia and Thailand), including one population (Colombia) not used in the design of any panel, repeating the simulations with microhaplotype frequencies estimated from these data. Again, all panels performed well for estimating relatedness of clonal pairs, and estimates for non-clonal pairs were less likely to be correctly classified with 4CAST and the 24 SNP barcode. With data simulated using Colombian microhaplotype frequencies from the Pacific Coast region, all three large panels performed well at all three relatedness values, despite the Colombian data not having informed the design of any of the panels.
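The simulate-then-estimate logic behind these comparisons can be sketched in a few lines. The Python sketch below is a simplified stand-in, not the paneljudge implementation: it assumes loci are independent (ignoring linkage between loci), simulates a pair of haploid genotypes in which each locus is identical by descent with probability r, and estimates r by maximizing the per-locus likelihood (1 − r)·f_a·f_b + r·f_a·1(a = b) over a grid. All names are hypothetical.

```python
import math
import random


def simulate_pair(freqs, r, rng):
    """Simulate two haploid genotypes with relatedness r, assuming independent loci.

    freqs: list with one dict per locus, mapping microhaplotype -> population frequency.
    At each locus the second genotype is copied (identical by descent) with
    probability r, otherwise drawn independently from the same frequencies.
    """
    g1, g2 = [], []
    for f in freqs:
        haps, weights = list(f), list(f.values())
        a = rng.choices(haps, weights)[0]
        b = a if rng.random() < r else rng.choices(haps, weights)[0]
        g1.append(a)
        g2.append(b)
    return g1, g2


def log_likelihood(r, g1, g2, freqs):
    """Sum over loci of log[(1 - r) f_a f_b + r f_a 1(a == b)]."""
    total = 0.0
    for a, b, f in zip(g1, g2, freqs):
        p = (1 - r) * f[a] * f[b] + (r * f[a] if a == b else 0.0)
        if p == 0.0:
            return -math.inf  # e.g., r = 1 with a mismatching locus
        total += math.log(p)
    return total


def estimate_relatedness(g1, g2, freqs, grid_points=1001):
    """Maximum-likelihood relatedness by grid search over [0, 1]."""
    grid = [i / (grid_points - 1) for i in range(grid_points)]
    return max(grid, key=lambda r: log_likelihood(r, g1, g2, freqs))


# Usage: 100 hypothetical loci with four microhaplotypes each
rng = random.Random(42)
freqs = [{"h1": 0.4, "h2": 0.3, "h3": 0.2, "h4": 0.1} for _ in range(100)]
g1, g2 = simulate_pair(freqs, 0.5, rng)
print(estimate_relatedness(g1, g2, freqs))
```

Repeating this many times per relatedness level and panel, with frequencies from different populations, shows how the number and diversity of loci limit how tightly r can be estimated.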

Figure 4. In silico relatedness estimation comparisons among panels and empirical AMPLseq validation against WGS.

A) Evaluation of relatedness estimation from data simulated on genotyping panels using the paneljudge R package. Pairs of haploid genotypes were simulated at each locus of a panel, using microhaplotype frequencies estimated from a given parasite population (Colombia, Thailand, or Senegal, as shown in columns from left to right). Genotype pairs were simulated at three levels of relatedness: unrelated (relatedness = 0.01), related (relatedness = 0.50), and clonal (relatedness = 0.99), as shown in the rows from top to bottom. Relatedness estimates of these pairs were classified using their 95% confidence intervals (LCI = lower limit of the 95% confidence interval, UCI = upper limit of the 95% confidence interval). Estimates could be correctly classified, misclassified, or unclassified, as described in the grey box. Each bar represents the proportion of simulations per condition (n = 400) classified in each category. Bars filled with a color represent correctly classified simulations, hashed bars represent misclassified simulations, and bars filled with white represent simulations that could not be classified. Bar colors indicate the panel used in that set of simulations. B) Empirical AMPLseq results recapitulate WGS-based relatedness inference. Points represent relatedness estimates (hmmIBD (Schaffner et al., 2018) ‘fract_sites_IBD’ computed under default settings) for pairs of Guyanese samples using WGS (n = 9408 variants) vs. AMPLseq (n = 220 variants, from within 128 AMPLseq P. falciparum loci).

Pairwise relatedness estimates (Schaffner et al., 2018) from AMPLseq correlated highly with those from WGS data available for the Guyanese sample set (Pearson’s r = 0.86, slope = 1.01, p < 0.001) (Fig. 4B). Although patient travel history metadata suggested that infections occurred in various geographic regions of Guyana (S1 Table), AMPLseq relatedness estimates for the Guyanese sample set were significantly higher than those for the Malian sample set (Mann-Whitney U, p < 0.001), consistent with the lower parasite population diversity anticipated in the Guiana Shield (Carrasquilla et al., 2022; Yalcindag et al., 2012). Nevertheless, the wide range of AMPLseq relatedness estimates (0.007 – 1) among Guyanese sample comparisons, which correlated highly with WGS estimates, suggests that AMPLseq can reveal epidemiologically relevant microstructure even in relatively unstructured parasite populations, as is achievable via WGS (Mathieu et al., 2020). For example, even within the very small Guyanese sample set analyzed in this study, comparisons involving A2-GUY and C5-GUY were enriched in the first (lowest) quartile of pairwise relatedness estimates; in WGS analysis, these two highly related samples show 50% relatedness with a sample from Venezuela (SPT26229, see S4 Table). Larger AMPLseq sample sets from neighboring countries such as Guyana and Venezuela may thus reveal cases of parasite importation or introgression between spatially proximate regions in which disease ecology and management are distinct.

Geographic attribution

We again engineered amplicon data in silico, this time to evaluate the signal in each genotyping panel for geographic attribution of samples. We subsampled WGS variant calls to the coordinates of each panel, called microhaplotypes, and evaluated these data using principal component analysis (PCA) (S5 Fig.) and t-SNE visualizations (Fig. 5). All three larger panels (AMPLseq, SpotMalaria, and Paragon) distinguished non-African samples by country and separated East African (Malawi) from West African (Mali/Senegal) samples to varying degrees; no panel distinguished between Malian and Senegalese samples in these visualizations. Both the 24 SNP barcode (Daniels et al., 2008) and 4CAST distinguished samples by continent of origin, though not by country. We also added empirical AMPLseq data from 5 Guyanese samples (C3-GUY, C4-GUY, C5-GUY, C7-GUY, and C8-GUY) and WGS data subsampled to AMPLseq coordinates for Venezuelan sample SPT26229 (S6 Fig.). The AMPLseq samples formed a small cluster beside the WGS-based Guyanese and French Guianese samples. The Venezuelan sample SPT26229 was also placed on the perimeter of the Guyana/French Guiana sample cluster, sharing the same axis-2 position as the empirical AMPLseq points. These results suggest that empirical AMPLseq data can distinguish autochthonous samples from the Guiana Shield, and we expect geographic attribution in the region to improve as more data are collected from infections originating in Venezuela and other undersampled localities.
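One common way to feed microhaplotype calls into PCA is to one-hot encode each (locus, microhaplotype) combination as a column, center the matrix, and take the leading singular vectors. The NumPy sketch below illustrates that generic approach under stated assumptions (one haplotype per locus per sample, missing loci encoded as all zeros); it is not necessarily the encoding used for S5 Fig., and the locus and haplotype names are hypothetical.

```python
import numpy as np


def one_hot_microhaplotypes(samples):
    """Encode samples (dicts of locus -> microhaplotype) as a 0/1 matrix.

    Each column is one (locus, microhaplotype) pair; a locus missing from a
    sample simply leaves all of that locus's columns at zero.
    """
    columns = sorted({(locus, hap) for s in samples for locus, hap in s.items()})
    index = {col: j for j, col in enumerate(columns)}
    X = np.zeros((len(samples), len(columns)))
    for i, s in enumerate(samples):
        for locus, hap in s.items():
            X[i, index[(locus, hap)]] = 1.0
    return X, columns


def pca_scores(X, n_components=2):
    """Principal component scores via SVD of the column-centered matrix."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


# Two hypothetical populations that differ at both loci
samples = [{"L1": "x", "L2": "y"}, {"L1": "x", "L2": "y"},
           {"L1": "z", "L2": "w"}, {"L1": "z", "L2": "w"}]
X, cols = one_hot_microhaplotypes(samples)
scores = pca_scores(X)
```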

Figure 5. In silico geographic attribution comparison among panels.

Visualization of WGS data subset to the coordinates of each genotyping panel. Microhaplotypes called at each locus were visualized using a t-SNE representation, with parameter Θ of 0.0, 5000 iterations, and a perplexity of 30. Each point represents a single sample, with color and shape indicating its country of origin (which was not used by the t-SNE algorithm). One genotyping panel is visualized in each plot: (A) AMPLseq, (B) 4CAST, (C) SpotMalaria v2, (D) Paragon v1, (E) 24 SNP barcode.

COI estimation

We evaluated COI estimation based on 4CAST relative to estimation based on AMA1, a single locus commonly used for this purpose alone or with one additional locus (Lerch et al., 2017; Miller et al., 2017; Nelson et al., 2019). We engineered in silico samples with COI ranging from 2 to 10 (100 engineered samples per COI level) and used the maximum number of unique microhaplotypes present at any locus as a simple data summary to estimate COI. In these simulated data, 4CAST provided more accurate COI estimates than AMA1 alone, especially at simulated COI levels between 5 and 7 (Fig. 6A). Estimation at engineered COI = 8 improved using AMPLseq (S7 Fig.), but realizing the full benefit of the larger panel in practice will require an inferential approach that accounts both for chance sharing of alleles, using population allele frequencies, and for variable coverage/sensitivity among loci.
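The in silico mixtures and the naive COI summary described above can be sketched as follows. This minimal Python sketch assumes each monoclonal genotype is a dict of locus to microhaplotype and that every microhaplotype in a mixture is detected; the toy genotypes are hypothetical and deliberately give AMA1 only three alleles, to show how a low-diversity single locus caps the estimate when strains share alleles by chance.

```python
import random


def engineer_mixture(monoclonal_genotypes, coi, rng):
    """Combine `coi` distinct monoclonal genotypes into one in silico sample.

    Returns locus -> set of microhaplotypes present (perfect detection assumed).
    """
    strains = rng.sample(monoclonal_genotypes, coi)
    mixture = {}
    for genotype in strains:
        for locus, hap in genotype.items():
            mixture.setdefault(locus, set()).add(hap)
    return mixture


def naive_coi(mixture):
    """COI estimated as the maximum number of unique microhaplotypes at any locus."""
    return max(len(haps) for haps in mixture.values())


# Hypothetical genotypes: AMA1 has only 3 alleles, CSP is unique per strain
genotypes = [{"AMA1": f"a{i % 3}", "CSP": f"c{i}"} for i in range(10)]
mix = engineer_mixture(genotypes, 5, random.Random(0))
print(naive_coi(mix), naive_coi({"AMA1": mix["AMA1"]}))
```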

Figure 6. In silico and empirical complexity of infection (COI) inference.

A) Scatter plots of estimated COI for samples engineered in silico from combinations of monoclonal WGS data, subset to the loci of interest (AMA1 locus or 4CAST loci). The x-axis represents the number of monoclonal genomes combined into each simulation, and the y-axis represents the COI estimated using the simulated data. COI was naively estimated as the maximum number of unique microhaplotypes present at any locus per sample (n = 100 samples per condition). Each point represents a sample, jittered for visibility. Black bars represent the median and light grey boxes represent the 25th – 75th percentiles. B) Estimated COI for clinical samples sequenced using 4CAST. COI was again estimated as the maximum number of unique microhaplotypes present at any locus in the sample.

In the absence of such an algorithm, we used the simple data summary method above to classify COI in the Malian and Guyanese clinical samples. Only a single polyclonal infection (C6-GUY) occurred among Guyanese samples assayed by 4CAST. Repeated detection of two alleles at both CSP and SERA2, at depths ranging from 32 to 168 read-pairs, enabled unambiguous COI = 2 classification for this sample. WGS, by contrast, detected only moderately elevated SNP heterozygosity (1.9%) in C6-GUY, and this elevation was not sufficient to classify COI > 1 via The Real McCoil (Chang et al., 2017) (S8 Fig.). AMPLseq also identified COI = 2 for C6-GUY, but without consistent support between replicates (2 loci presenting 2 alleles in replicate 1 and 6 loci presenting 2 alleles in replicate 2). Of six additional Guyanese samples assayed by AMPLseq, one (A5-GUY) was classified as COI = 2. This sample gave a stronger minor-variant signal in both AMPLseq (15 loci presenting 2 alleles in both replicates) and WGS data (10.9% SNP heterozygosity) (S8 Fig.).

For the Malian sample set, 4CAST and AMPLseq both classified samples E5-PST030 and C6-PST063 as monoclonal and all other samples as polyclonal, based on the presence or absence of loci with multiple alleles. Sensitivity for minor strains was depth-dependent in both assays: reducing 4CAST depth via subsampling reduced estimated COI (Fig. 6B, S9A Fig.), and increasing AMPLseq depth by reducing sequencing batch size increased estimated COI (S9B–C Fig.). These results emphasize the importance of incorporating per-locus read-depth and expected diversity metrics into COI inference algorithms, especially when depth is not concentrated on, or evenly spread across, a panel’s most polymorphic loci.
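The depth dependence of minor-strain detection can be illustrated by randomly subsampling read-pairs at a locus and re-applying an allele-calling threshold. In the minimal sketch below, the 10 read-pair calling threshold and the allele counts are hypothetical assumptions for illustration, not parameters of the authors' pipeline.

```python
import random
from collections import Counter


def subsample_reads(allele_counts, depth, rng):
    """Randomly draw `depth` read-pairs without replacement from one locus."""
    pool = [allele for allele, n in allele_counts.items() for _ in range(n)]
    if depth >= len(pool):
        return Counter(pool)
    return Counter(rng.sample(pool, depth))


def alleles_called(allele_counts, min_read_pairs=10):
    """Alleles supported by at least `min_read_pairs` read-pairs (hypothetical threshold)."""
    return {allele for allele, n in allele_counts.items() if n >= min_read_pairs}


# Hypothetical locus with a 1% minor strain: both alleles are called at full depth,
# but after subsampling to 100 read-pairs the minor allele expects only ~1 read-pair.
locus = {"major": 9900, "minor": 100}
rng = random.Random(3)
print(alleles_called(locus), alleles_called(subsample_reads(locus, 100, rng)))
```

Because the maximum-allele-count COI summary takes the largest count over loci, losing minor alleles at the few loci where they were detected directly lowers the estimate.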

Longitudinal sampling: distinguishing continuing vs. newly acquired infections

We used 4CAST to examine longitudinal samples that were likely to be diverse and polyclonal. We sequenced samples from two asymptomatic individuals in the longitudinal Mali cohort (Tran et al., 2013) collected over multiple biweekly visits during the transmission season (July to December) (Fig. 7). In the first individual (Fig. 7A), we detected the same single microhaplotype at each locus at the first two time points, suggesting that the infection persisted over the two weeks between sample collections. At the third time point, we detected a single, distinct microhaplotype at each locus, suggesting that a new infection had occurred and that the original infection had disappeared or decreased below our limit of detection (< 10 parasites/μl). In the second individual (Fig. 7B), we detected a similar pattern of strain turnover: a monoclonal infection at the first time point, followed by a distinct, polyclonal infection at the second time point. We also examined two individuals over a longer series of visits (S10 Fig.), detecting a series of polyclonal infections, with some strains sustained over many time points, in one individual (S10A Fig.) and a series of distinct monoclonal infections in the other (S10B Fig.). In all cases, the individuals were asymptomatic and did not receive antimalarial treatment between visits; nonetheless, these simple examples demonstrate the clarity that 4CAST can bring to tracking infection turnover in longitudinal studies and suggest its potential for distinguishing recrudescence from reinfection in therapeutic efficacy studies.
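The visual reasoning used above (shared microhaplotypes at every locus suggest a persisting strain; no shared microhaplotypes suggest a new infection) can be expressed as a simple rule. The sketch below is a hypothetical heuristic for illustration only: it does not account for chance allele sharing between unrelated strains or for minor strains falling below the detection limit.

```python
def compare_visits(visit_a, visit_b):
    """Compare microhaplotypes between two visits from the same individual.

    visit_*: locus -> set of microhaplotypes detected at that visit.
    Returns 'persisting' if every shared locus has at least one microhaplotype
    in common (new strains may also be present), 'new' if no locus does,
    'mixed' if only some loci do, and 'undetermined' if no loci overlap.
    """
    loci = sorted(visit_a.keys() & visit_b.keys())
    shared = [bool(visit_a[locus] & visit_b[locus]) for locus in loci]
    if not shared:
        return "undetermined"
    if all(shared):
        return "persisting"
    if not any(shared):
        return "new"
    return "mixed"


# Hypothetical 4CAST-style calls at two loci over three visits
v1 = {"CSP": {"c1"}, "SERA2": {"s1"}}
v2 = {"CSP": {"c1"}, "SERA2": {"s1"}}
v3 = {"CSP": {"c2"}, "SERA2": {"s2"}}
print(compare_visits(v1, v2), compare_visits(v2, v3))  # persisting new
```

A 'mixed' result is the ambiguous case in which population allele frequencies matter, because a common microhaplotype may be shared by unrelated strains.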

Figure 7. Longitudinal tracking of infections using 4CAST.

Identification of distinct microhaplotypes present in samples from two individuals. The x-axis represents consecutive time points, and the y-axis represents the individual microhaplotypes identified, grouped by locus. Colored points represent the presence of that microhaplotype, connected when present in multiple visits.

Drug resistance profiling

AMPLseq loci in dhfr, mdr1, dhps, kelch13, and mdr2 contain ten sequence regions encompassing 18 amino acid (AA) positions at which polymorphisms have previously been associated with resistance to antimalarial drugs (Ariey et al., 2014; Miotto et al., 2015; Mita et al., 2007; Veiga et al., 2016). Fourteen of these 18 positions of interest contained nonsynonymous mutations in the Malian and Guyanese clinical samples of this study (Fig. 8). Positions of interest that lacked mutations across both sample sets were dhfr AA 164, dhps AA 613, kelch13 AA 580, and mdr2 AA 484. All Guyanese sequences shared the same mutant alleles at many positions, suggesting that these mutant alleles are fixed. Malian samples, by contrast, did not show fixed mutant alleles at any AA position of interest; a mix of mutant and wild-type alleles occurred among Malian samples at dhfr AA 51, 59, and 108; mdr1 AA 86, 184, and 1246; dhps AA 436 and 437; and mdr2 AA 492. A previously reported synonymous polymorphism at kelch13 AA 589 was observed in one Malian sample (Taylor et al., 2015).
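Distinguishing wild-type, synonymous, and nonsynonymous states at a position of interest comes down to translating the relevant codon in the reference and sample sequences. The sketch below uses the standard genetic code and assumes, as a simplification, that the sample sequence is an in-frame coding sequence aligned to the reference without indels or introns; the toy sequences and function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def codon_at(cds, aa_pos):
    """Codon for 1-based amino acid position `aa_pos` in an in-frame coding sequence."""
    start = 3 * (aa_pos - 1)
    return cds[start:start + 3].upper()


def classify_position(ref_cds, sample_cds, aa_pos):
    """Return (state, label) for one AA position: wildtype, synonymous, or nonsynonymous."""
    ref_codon, alt_codon = codon_at(ref_cds, aa_pos), codon_at(sample_cds, aa_pos)
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_codon == alt_codon:
        return "wildtype", ref_aa
    if ref_aa == alt_aa:
        return "synonymous", alt_aa
    return "nonsynonymous", f"{ref_aa}{aa_pos}{alt_aa}"


# Hypothetical three-codon reference (M-K-R) and two variant samples
ref = "ATGAAACGT"
print(classify_position(ref, "ATGAAGCGT", 2))  # ('synonymous', 'K')
print(classify_position(ref, "ATGAACCGT", 2))  # ('nonsynonymous', 'K2N')
```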

Figure 8. Drug resistance-associated sequence profiling in Guyanese and Malian clinical samples.

Bars in the top plot indicate the occurrence of various drug resistance-associated amino acid changes within AMPLseq loci. Positions of interest assayed by AMPLseq but without mutant alleles (see x-marks) in the clinical samples profiled here are labeled in parentheses. Positions 484 and 492 in mdr2 have been suggested to be involved in artemisinin resistance, although experimental data showing an association with a clinical phenotype are lacking (Chenet et al., 2017; Miotto et al., 2015). The bottom plot indicates chromosomal and amino acid (AA) positions of each drug resistance-associated AMPLseq locus (excluding primer binding sites). The asterisk indicates a synonymous mutation within kelch13.

P. falciparum and P. vivax co-infection detection

To test the ability of AMPLseq to detect P. vivax co-infections via co-amplification of PvDHFR, we included in the sample set two additional Guyanese blood spot samples that had been diagnosed by RDT as P. vivax-only (G4G430) or P. vivax + P. falciparum co-infection (G4G180). These samples did not undergo sWGA.

PvDHFR was detected at high depth in both samples (1068 – 1822 read-pairs for G4G430 and 234 – 560 read-pairs for G4G180) (Fig. 9). Only G4G180 also showed read-pair support at P. falciparum loci (> 10 read-pairs at 100 – 115 loci). PvDHFR was not detected in any native or pre-amplified 3D7 or mixed-strain (3D7 + Dd2) templates. This demonstrates high specificity of both PvDHFR and P. falciparum AMPLseq primers to their intended target species without any apparent amplification inhibition by the presence of congeneric DNA.
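A species call from AMPLseq output can be reduced to two checks: read-pair support at PvDHFR, and the number of P. falciparum loci with adequate support. The sketch below is illustrative only; both thresholds (10 read-pairs, 20 loci) are hypothetical assumptions rather than validated cutoffs. Note that a 10 read-pair threshold would miss the two PvDHFR read-pairs seen in one C7-GUY sWGA replicate, so threshold choice directly affects sensitivity for low-level co-infections.

```python
def detect_species(read_pairs, pv_locus="PvDHFR", min_read_pairs=10, min_pf_loci=20):
    """Return the set of Plasmodium species supported by per-locus read-pair counts.

    read_pairs: locus -> read-pair count for one sample.
    Thresholds are hypothetical and would need empirical calibration.
    """
    species = set()
    if read_pairs.get(pv_locus, 0) >= min_read_pairs:
        species.add("P. vivax")
    pf_supported = sum(
        1 for locus, n in read_pairs.items() if locus != pv_locus and n >= min_read_pairs
    )
    if pf_supported >= min_pf_loci:
        species.add("P. falciparum")
    return species


# Hypothetical samples resembling the Pv-only and Pv + Pf controls
pv_only = {"PvDHFR": 1068, **{f"pf_{i}": 0 for i in range(128)}}
co_infection = {"PvDHFR": 234, **{f"pf_{i}": (50 if i < 110 else 0) for i in range(128)}}
print(detect_species(pv_only), detect_species(co_infection))
```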

Figure 9.

Plasmodium vivax detection by AMPLseq. The left panel demonstrates strong read-pair support for PvDHFR (green circle) in native control samples previously suggested to contain P. vivax (Pv; G4G430) and P. vivax + P. falciparum (Pv + Pf; G4G180) via RDT. The right panel shows native and sWGA results for C7-GUY, a clinical sample that appears to have been misdiagnosed as Pf-only prior to AMPLseq. Blue circles represent read-pair support for P. falciparum loci. Positions on the y-axis indicate median read-pair support across two sample replicates. Box and whiskers indicate quartiles.

PvDHFR was also detected at low levels (16 – 30 read-pairs) in both native template replicates of C7-GUY, one of the sixteen Guyanese samples previously diagnosed as P. falciparum-only via RDT. Surprisingly, two PvDHFR read-pairs were also detected in one of the two sWGA replicates from the sample, despite the expectation that sWGA would primarily amplify P. falciparum sequences. Sensitivity of PvDHFR detection in pre-amplified samples could be enhanced by adding PvDHFR primers to the P. falciparum sWGA primer pool. PvDHFR detection did not occur in any Malian sample, consistent with low prevalence of P. vivax in West Africa relative to the Guiana Shield.

Discussion

The utility of AmpSeq for molecular surveillance of infectious diseases is evidenced by the growing number of protocols recently published or under development for Plasmodium and other pathogen taxa (Aydemir et al., 2018; Fola et al., 2020; Jacob et al., 2021; Mitchell et al., 2021; Moser et al., 2021; Ruybal-Pesántez et al., 2021; Schwabl et al., 2020; Tessema et al., 2020). Here, we demonstrate the performance of two new panels for P. falciparum, designed to serve different use cases and exhibiting different per-sample costs and levels of complexity. Our comparative analyses of these two new panels, 4CAST and AMPLseq, relative to previously published genotyping panels demonstrate that they perform comparably to existing panels of similar composition across use cases, in a diversity of geographic settings, despite different geographic representation in the population genomic data used to inform their designs. This suggests that de novo custom panel design may not be required for accurate COI and relatedness estimation in parasite populations from previously unstudied geographic regions. We therefore suggest that future implementation of these panels and future designs for other organisms should be guided by three criteria: 1) the intended use cases for the data; 2) protocol complexity and compatibility with available instruments and expertise; and 3) protocol customizability for locally relevant genetic loci.

Considering the first of these criteria, intended use case, our investigations above suggest a straightforward mapping of panels by size and features to use cases. The small 4CAST panel is well suited to COI estimation (Fig. 6), and it profiles four highly diverse antigens for the same effort and cost traditionally used to profile a single locus. Because the 4CAST loci are highly diverse in most parasite populations, this panel is also well suited to any application requiring genetic delineation of distinct parasite lineages (Fig. 7). In therapeutic efficacy studies, for example, it is essential to determine whether subjects who become parasitemic following drug treatment are exhibiting a recrudescence of an incompletely cleared strain from the initial infection (which could indicate treatment failure), or whether they have been reinfected with a distinct parasite strain after treatment. We suggest that the 4CAST panel would be substantially more informative than traditional genotyping approaches used in therapeutic efficacy studies, such as profiling length polymorphisms or allele-specific amplification in the msp1/msp2/glurp genes (Reeder & Marshall, 1994; Snounou, 2002), especially if coupled with an inferential approach that accounts for chance sharing of alleles given their population frequencies. 4CAST is also more cost-effective than independent monoplex amplification and Illumina sequencing of individual loci (Early et al., 2019; Gruenberg et al., 2019; Lerch et al., 2017).
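Chance sharing of alleles can be quantified under simple assumptions. If loci are independent and microhaplotype frequencies are known, the probability that an unrelated strain matches an observed monoclonal genotype at every locus is the product of the observed microhaplotypes' frequencies, and the per-locus probability that two random unrelated strains match is the expected homozygosity Σ p². The minimal sketch below uses hypothetical frequencies and ignores linkage and polyclonal mixtures.

```python
from math import prod


def chance_match_probability(observed, freqs):
    """Probability that an unrelated strain carries the observed microhaplotype
    at every locus, assuming independent loci and known population frequencies.

    observed: locus -> microhaplotype; freqs: locus -> {microhaplotype: frequency}
    """
    return prod(freqs[locus][hap] for locus, hap in observed.items())


def expected_homozygosity(freq):
    """Probability that two random unrelated strains share a microhaplotype at one locus."""
    return sum(p * p for p in freq.values())


# Hypothetical frequencies: a common allele at one locus, a rarer one at another
freqs = {"CSP": {"c1": 0.1, "c2": 0.9}, "SERA2": {"s1": 0.2, "s2": 0.8}}
print(chance_match_probability({"CSP": "c1", "SERA2": "s1"}, freqs))
```

Adding diverse loci multiplies these per-locus probabilities together, which is why a four-locus panel of highly diverse antigens can separate recrudescence from reinfection more reliably than a single marker.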

Our work demonstrates that the AMPLseq panel performs comparably to two existing multiplexed amplicon sequencing panels of similar size (Jacob et al., 2021; Tessema et al., 2020) for any use case reliant on estimation of parasite relatedness (Fig. 4), despite the different design criteria and datasets that informed the panels. Potential public health use cases that employ relatedness information include measuring the connectivity of parasites between locations to define units of control, and monitoring changes in the level of transmission (Cerqueira et al., 2017; Daniels et al., 2015; Knudson et al., 2020). The AMPLseq panel and its peers are well suited to distinguishing imported from local infections, given their capacity to distinguish parasites from distinct countries, as long as population genetic differentiation is sufficiently high (Fig. 5). Finally, the larger panels offer the capacity to monitor genetic markers associated with drug resistance (Fig. 8) or, in some panels, to detect co-infection with other Plasmodium species (Fig. 9).

The second panel selection criterion, protocol complexity and compatibility with available instruments, should be prefaced with a reminder that all of these protocols employ nested PCR reactions as the fundamental mechanism to produce sequencing libraries targeting small genomic regions of interest. Equipped with a few key instruments, most laboratories with access to pre- and post-PCR hood space are likely capable of nested PCR library construction. Key instruments needed are a centrifuge, thermocycler, vortexer, magnetic rack, fragment size analyzer (e.g., Bioanalyzer or TapeStation), and DNA quantitation device (e.g., Qubit fluorometer). The last two items are necessary for careful removal of inappropriately large or small DNA molecules and for precise quantification of libraries for optimal loading on the sequencing flow cell. Sequencing can be performed using a small instrument such as the Illumina iSeq 100 or on larger platforms shared by multiple groups. Batch sizes (i.e., the number of samples pooled in a library) will depend on the available platform and on the read-depth needed for the study objective and transmission system at hand. COI inference in high-transmission regions, for example, typically requires greater depth (and therefore potentially smaller batch sizes) than does relatedness analysis focusing on monoclonal parasites. Another important consideration for almost any Illumina-based amplicon sequencing protocol is the high level of precaution required to avoid sample contamination. PCR reactions should be conducted in dedicated hoods using dedicated pipettes, ideally also with downstream sample processing confined to rooms or locations physically removed from those where native templates are processed. A centrifuge (or plate spinner) is listed as a requirement above primarily for its role in moving sample liquids away from the top seals of plates, which should always be handled with special care.
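The trade-off between batch size and read-depth can be approximated with simple arithmetic: mean read-pairs per locus ≈ usable read-pairs per run / (batch size × number of loci). The sketch below treats the run output and usable fraction as hypothetical inputs rather than specifications of any instrument, and assumes reads are spread evenly across samples and loci, which is optimistic given the wide per-locus spread in depth shown in Fig. 3.

```python
def expected_depth_per_locus(read_pairs_per_run, batch_size, n_loci, usable_fraction=1.0):
    """Mean read-pairs per locus per sample if reads were split evenly (optimistic)."""
    return read_pairs_per_run * usable_fraction / (batch_size * n_loci)


def max_batch_size(read_pairs_per_run, n_loci, target_depth, usable_fraction=1.0):
    """Largest batch for which the even-split mean depth still reaches `target_depth`."""
    return int(read_pairs_per_run * usable_fraction // (n_loci * target_depth))


# Hypothetical run yielding 4 million usable read-pairs, 129-locus panel
print(expected_depth_per_locus(4_000_000, batch_size=96, n_loci=129))
print(max_batch_size(4_000_000, n_loci=129, target_depth=100))
```

In practice, a margin well above the even-split estimate is needed, since poorly amplifying loci and low-parasitemia samples receive far fewer reads than the mean.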

Though the AmpSeq protocols highlighted herein share many common features, they differ in other aspects that may impact implementation. Whereas the 4CAST and Paragon panels perform well on native DNA (both tested with samples as low as 10 parasites/μl), the AMPLseq and SpotMalaria panels require sWGA pre-amplification prior to the first PCR reaction to ensure adequate performance for low-parasitemia samples, which may represent a significant proportion of samples in some settings. The sWGA step occurs using an isothermal amplification protocol that is relatively simple to perform but requires an expensive phi29 DNA polymerase and a magnetic bead-based post-reaction cleanup of individual samples, for an approximate additional cost of $8 USD per sample at the time of writing. Though not large in absolute terms, this cost is comparable to the cost of the AMPLseq or 4CAST protocols themselves, which range from $5 – 10 USD per sample without sWGA, depending on details of implementation such as sequencing instrument and sample indexing per run. The larger panels additionally employ differing numbers of first-round PCR reactions and require a varying number of magnetic bead-based cleanups to tailor the length profile of intermediate products (summarized in S5 Table), which means that the local capacity for automating the bead-based cleanups is a relevant implementation consideration.

An additional limitation of AMPLseq demonstrated here is incomplete panel amplification from samples with relatively low parasitemia and/or older sample material. This limits potential application to older and/or degraded sample collections unless smaller batch sizes (S9C Fig.) or larger (more expensive) sequencing instruments are used. The impact of DNA integrity on panel performance should be further assessed in future work. Additional performance assessment using parasitemias formulated prior to DNA extraction (e.g., by diluting ring-stage parasites) would also strengthen future sensitivity tests.

The third criterion for panel selection, customizability, may be most relevant for the drug resistance surveillance use case, given differences in the geographic distribution of important drug resistance markers and varying coverage of known markers by the existing panels. All of the protocols are amenable to customization through the addition of independent target amplifications in the first round of PCR, which could be combined with other first-round PCR multiplex products prior to the second PCR. A more elegant customization approach would be to add targets to (or remove targets from) the first-round multiplex PCR reaction itself. While complicated bioinformatic pipelines are typically necessary in the design of large multiplexes, in our experience small multiplexes like 4CAST, which was assembled from pre-existing, independently designed primer pairs, may simply function without optimization and could presumably be augmented with a small number of additional loci. Though the AMPLseq multiplex of 129 PCR loci benefited from careful design of the original panel via GTseek, we successfully added 4CAST, drug resistance, and PvDHFR loci to the GTseek target set without any primer modifications. The AMPLseq panel is thus likely receptive to further augmentation. As the AMPLseq and 4CAST protocols use unmodified, commercially available oligos as primers, further customization should be feasible in any setting. However, not all targets are amenable to incorporation into the multiplex: despite multiple attempts, we failed to include amplicons targeting the pfcrt gene associated with chloroquine resistance (Fidock et al., 2000) or the hrp2/3 genes, which can contain deletions that lead to false-negative diagnosis via rapid diagnostic test (Gamboa et al., 2010).

The proliferation of new AmpSeq protocols for molecular surveillance of infectious and non-infectious organisms raises the important question of whether it is valuable for each field to converge on a single approach or common panel per organism. Factors precluding a completely homogeneous approach include varying instrumentation, expertise, and use cases for the data across settings, in addition to the elucidation of new drug resistance markers of interest and an anticipated onward diversification of targeted genotyping technologies. For example, in the Plasmodium field, other productive multiplexed genotyping approaches include molecular inversion probes (Aydemir et al., 2018; Moser et al., 2021) and multi-copy (VAR) gene metabarcoding analysis (Day et al., 2017). Factors favoring convergence within the AmpSeq domain include opportunities for coordinated procurement of instruments and reagents in low- and middle-income countries, and opportunities to directly compare observations between studies and surveillance efforts led by different groups. This latter factor, which we term portability of analyses, has the potential to provide regional or global insight through syntheses across studies. However, the portability of certain analyses is hampered by ascertainment bias, an inherent limitation of any targeted sequencing approach for analyses based on the genotypic state of select loci in different countries. That is, a panel designed based on observations of genetic diversity through WGS in countries A and B may not provide a fair means of comparing diversity in countries C vs. D if diversity there is distributed differently in the genome than in countries A and B. WGS is the ultimate tool for avoiding this bias. However, the problems of comparing populations profiled with different panels may be mitigated by comparing inferred relatedness levels within populations rather than actual genotypic diversity measures (Taylor et al., 2019).

Overlap of loci among panels would further facilitate direct estimation of relatedness between samples included in different studies, as in Carrasquilla et al. (2022), which highlights the importance of confidence intervals around relatedness estimates. The AMPLseq panel we describe here contains a significant number (n = 47) of targets from the SpotMalaria panel, and we expect that future P. falciparum panel designs will also tend to exhibit some degree of overlap with other panels, both by deliberate design and through blind convergence based on key genomic features, such as high diversity and sequence amenability to PCR primer design.

As molecular surveillance efforts for malaria parasites and other organisms are more widely adopted and become increasingly diverse, it will be essential for the community to develop standardized approaches for the design, validation, interpretation, and sharing of targeted amplicon sequencing data. The paneljudge R package described here provides a means to comparatively evaluate the performance of existing and hypothetical panels using data collected in previous population genomic surveys, and the bioinformatic analysis pipelines we have developed are suitable for interpreting Illumina data from diverse targets and panels in different organisms. We anticipate continued growth of this field and the development of new analytical tools to extract even more epidemiological and ecological insight from increasingly large AmpSeq datasets for diverse taxa.

Supplementary Material

Supplemental Material
Supplementary Tables

Acknowledgments

This project has been funded in whole or in part with Federal funds from the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services, under Grant Number U19AI110818 to the Broad Institute. This project was also supported by an NIH R01 award to DN (R01AI141544), an award from the Bill and Melinda Gates Foundation to DN and COB (OPP1213366), and a Broad Institute NextGen Award to BM. The Mali cohort study was funded by the Division of Intramural Research, National Institute of Allergy and Infectious Diseases, National Institutes of Health. The Colombian cohort study was supported by British Council Newton-Caldas Fund Institutional Links Award G1854. We thank MalariaGEN for use of the Colombian WGS data. We thank Annie Laws for project management. We thank Dr. Nathan Campbell for assistance in the AMPLseq panel design and evaluation.

Footnotes

Data Accessibility and Benefit Sharing

Data Accessibility: All amplicon sequencing data, as well as WGS data from 16 Guyana samples, were submitted to the NCBI Sequence Read Archive (http://www.ncbi.nlm.nih.gov/sra) under accession PRJNA758191. This publication uses data from the MalariaGEN Plasmodium falciparum Community Project as described online pending publication and public release of dataset Pf7 (https://www.malariagen.net/resource/34); additional ENA accession numbers are available in Table S4. Previously published data from Guyana and French Guiana can be found at SRA BioProjects PRJNA543530 and PRJNA242182, respectively. Software and documentation can be found at https://github.com/broadinstitute/AmpSeQC (AmpSeQC pipeline), https://github.com/broadinstitute/malaria-amplicon-pipeline.git (Malaria amplicon pipeline), and https://github.com/artaylor85/paneljudge (paneljudge).

Benefits Generated: A research collaboration was developed with scientists from the countries providing genetic samples, all collaborators are included as co-authors, the results of research have been shared with the provider communities and the broader scientific community (see above), and the research addresses a priority concern, in this case the public health concern of malaria. More broadly, our group is committed to international scientific partnerships, as well as institutional capacity building.

References

  1. [dataset] LaVerriere E, Schwabl P, Carrasquilla M, Taylor AR, Johnson ZM, Shieh M, Panchal R, Straub TJ, Kuzma R, Watson S, Buckee CO, Andrade CM, Portugal S, Crompton PD, Traore B, Rayner JC, Corredor V, James K, Cox H, Early AM, MacInnis BL, & Neafsey DE (2021). Data from: Design and implementation of multiplexed amplicon sequencing panels to serve genomic epidemiology of infectious disease: a malaria case study. NCBI Sequence Read Archive. https://www.ncbi.nlm.nih.gov/bioproject/PRJNA758191/
  2. Ariey F, Witkowski B, Amaratunga C, Beghain J, Langlois A-C, Khim N, Kim S, Duru V, Bouchier C, Ma L, Lim P, Leang R, Duong S, Sreng S, Suon S, Chuor CM, Bout DM, Ménard S, Rogers WO, … Ménard D (2014). A molecular marker of artemisinin-resistant Plasmodium falciparum malaria. Nature, 505(7481), 50–55. 10.1038/nature12876
  3. Aydemir O, Janko M, Hathaway NJ, Verity R, Mwandagalirwa MK, Tshefu AK, Tessema SK, Marsh PW, Tran A, Reimonn T, Ghani AC, Ghansah A, Juliano JJ, Greenhouse BR, Emch M, Meshnick SR, & Bailey JA (2018). Drug-Resistance and Population Structure of Plasmodium falciparum Across the Democratic Republic of Congo Using High-Throughput Molecular Inversion Probes. The Journal of Infectious Diseases, 218(6), 946–955. 10.1093/infdis/jiy223
  4. Baetscher DS, Clemento AJ, Ng TC, Anderson EC, & Garza JC (2018). Microhaplotypes provide increased power from short-read DNA sequences for relationship inference. Molecular Ecology Resources, 18(2), 296–305. 10.1111/1755-0998.12737
  5. Bybee SM, Bracken-Grissom H, Haynes BD, Hermansen RA, Byers RL, Clement MJ, Udall JA, Wilcox ER, & Crandall KA (2011). Targeted Amplicon Sequencing (TAS): A Scalable Next-Gen Approach to Multilocus, Multitaxa Phylogenetics. Genome Biology and Evolution, 3, 1312–1323. 10.1093/gbe/evr106
  6. Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJA, & Holmes SP (2016). DADA2: High-resolution sample inference from Illumina amplicon data. Nature Methods, 13(7), 581–583. 10.1038/nmeth.3869
  7. Campbell NR, Harmon SA, & Narum SR (2015). Genotyping-in-Thousands by sequencing (GT-seq): A cost effective SNP genotyping method based on custom amplicon sequencing. Molecular Ecology Resources, 15(4), 855–867. 10.1111/1755-0998.12357
  8. Cantacessi C, Dantas-Torres F, Nolan MJ, & Otranto D (2015). The past, present, and future of Leishmania genomics and transcriptomics. Trends in Parasitology, 31(3), 100–108. 10.1016/j.pt.2014.12.012
  9. Carrasquilla M, Early AM, Taylor AR, Knudson A, Echeverry DF, Anderson TJ, Mancilla E, Aponte S, Cárdenas P, Buckee CO, Rayner JC, Sáenz FE, Neafsey DE, & Corredor V (2022). Resolving drug selection and migration in an inbred South American Plasmodium falciparum population with identity-by-descent analysis. bioRxiv [Preprint], 2022.02.18.480973. 10.1101/2022.02.18.480973
  10. Cerqueira GC, Cheeseman IH, Schaffner SF, Nair S, McDew-White M, Phyo AP, Ashley EA, Melnikov A, Rogov P, Birren BW, Nosten F, Anderson TJC, & Neafsey DE (2017). Longitudinal genomic surveillance of Plasmodium falciparum malaria parasites reveals complex genomic architecture of emerging artemisinin resistance. Genome Biology, 18(1), 78. 10.1186/s13059-017-1204-4
  11. Chang H-H, Wesolowski A, Sinha I, Jacob CG, Mahmud A, Uddin D, Zaman SI, Hossain MA, Faiz MA, Ghose A, Sayeed AA, Rahman MR, Islam A, Karim MJ, Rezwan MK, Shamsuzzaman AKM, Jhora ST, Aktaruzzaman MM, Drury E, … Buckee C (2019). Mapping imported malaria in Bangladesh using parasite genetic and human mobility data. eLife, 8, e43481. 10.7554/eLife.43481
  12. Chang H-H, Worby CJ, Yeka A, Nankabirwa J, Kamya MR, Staedke SG, Dorsey G, Murphy M, Neafsey DE, Jeffreys AE, Hubbart C, Rockett KA, Amato R, Kwiatkowski DP, Buckee CO, & Greenhouse B (2017). THE REAL McCOIL: A method for the concurrent estimation of the complexity of infection and SNP allele frequency for malaria parasites. PLoS Computational Biology, 13(1), e1005348. 10.1371/journal.pcbi.1005348
  13. Chenet SM, Akinyi Okoth S, Huber CS, Chandrabose J, Lucchi NW, Talundzic E, Krishnalall K, Ceron N, Musset L, Macedo de Oliveira A, Venkatesan M, Rahman R, Barnwell JW, & Udhayakumar V (2016). Independent Emergence of the Plasmodium falciparum Kelch Propeller Domain Mutant Allele C580Y in Guyana. The Journal of Infectious Diseases, 213(9), 1472–1475. 10.1093/infdis/jiv752
  14. Chenet SM, Okoth SA, Kelley J, Lucchi N, Huber CS, Vreden S, Macedo de Oliveira A, Barnwell JW, Udhayakumar V, & Adhin MR (2017). Molecular Profile of Malaria Drug Resistance Markers of Plasmodium falciparum in Suriname. Antimicrobial Agents and Chemotherapy, 61(7), e02655-16. 10.1128/AAC.02655-16
  15. Dalmat R, Naughton B, Kwan-Gett TS, Slyker J, & Stuckey EM (2019). Use cases for genetic epidemiology in malaria elimination. Malaria Journal, 18(1), 1–11. 10.1186/s12936-019-2784-0
  16. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, McVean G, Durbin R, & 1000 Genomes Project Analysis Group. (2011). The variant call format and VCFtools. Bioinformatics, 27(15), 2156–2158. 10.1093/bioinformatics/btr330
  17. Daniels RF, Schaffner SF, Wenger EA, Proctor JL, Chang H-H, Wong W, Baro N, Ndiaye D, Fall FB, Ndiop M, Ba M, Milner DA, Taylor TE, Neafsey DE, Volkman SK, Eckhoff PA, Hartl DL, & Wirth DF (2015). Modeling malaria genomics reveals transmission decline and rebound in Senegal. Proceedings of the National Academy of Sciences of the United States of America, 112(22), 7067–7072. 10.1073/pnas.1505691112
  18. Daniels RF, Volkman SK, Milner DA, Mahesh N, Neafsey DE, Park DJ, Rosen D, Angelino E, Sabeti PC, Wirth DF, & Wiegand RC (2008). A general SNP-based molecular barcode for Plasmodium falciparum identification and tracking. Malaria Journal, 7(1), 223. 10.1186/1475-2875-7-223
  19. DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, Philippakis AA, del Angel G, Rivas MA, Hanna M, McKenna A, Fennell TJ, Kernytsky AM, Sivachenko AY, Cibulskis K, Gabriel SB, Altshuler D, & Daly MJ (2011). A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genetics, 43(5), 491–498. 10.1038/ng.806
  20. Dupuis JR, Bremer FT, Kauwe A, San Jose M, Leblanc L, Rubinoff D, & Geib SM (2018). HiMAP: Robust phylogenomics from highly multiplexed amplicon sequencing. Molecular Ecology Resources, 18(5), 1000–1019. 10.1111/1755-0998.12783
  21. Early AM, Daniels RF, Farrell TM, Grimsby J, Volkman SK, Wirth DF, MacInnis BL, & Neafsey DE (2019). Detection of low-density Plasmodium falciparum infections using amplicon deep sequencing. Malaria Journal, 18(1), 219. 10.1186/s12936-019-2856-1
  22. Fidock DA, Nomura T, Talley AK, Cooper RA, Dzekunov SM, Ferdig MT, Ursos LM, Sidhu AB, Naudé B, Deitsch KW, Su XZ, Wootton JC, Roepe PD, & Wellems TE (2000). Mutations in the P. falciparum digestive vacuole transmembrane protein PfCRT and evidence for their role in chloroquine resistance. Molecular Cell, 6(4), 861–871. 10.1016/s1097-2765(05)00077-8
  23. Fola AA, Kattenberg E, Razook Z, Lautu-Gumal D, Lee S, Mehra S, Bahlo M, Kazura J, Robinson LJ, Laman M, Mueller I, & Barry AE (2020). SNP barcodes provide higher resolution than microsatellite markers to measure Plasmodium vivax population genetics. Malaria Journal, 19(1), 375. 10.1186/s12936-020-03440-0
  24. Galinsky K, Valim C, Salmier A, de Thoisy B, Musset L, Legrand E, Faust A, Baniecki M, Ndiaye D, Daniels RF, Hartl DL, Sabeti PC, Wirth DF, Volkman SK, & Neafsey DE (2015). COIL: a methodology for evaluating malarial complexity of infection using likelihood from single nucleotide polymorphism data. Malaria Journal, 14(1), 4. 10.1186/1475-2875-14-4
  25. Gamboa D, Ho M-F, Bendezu J, Torres K, Chiodini PL, Barnwell JW, Incardona S, Perkins M, Bell D, McCarthy J, & Cheng Q (2010). A Large Proportion of P. falciparum Isolates in the Amazon Region of Peru Lack pfhrp2 and pfhrp3: Implications for Malaria Rapid Diagnostic Tests. PLoS One, 5(1). 10.1371/journal.pone.0008091
  26. Gardy JL, & Loman NJ (2018). Towards a genomics-informed, real-time, global pathogen surveillance system. Nature Reviews Genetics, 19, 9–20. 10.1038/nrg.2017.88
  27. Gruenberg M, Lerch A, Beck H-P, & Felger I (2019). Amplicon deep sequencing improves Plasmodium falciparum genotyping in clinical trials of antimalarial drugs. Scientific Reports, 9(1), 1–12. 10.1038/s41598-019-54203-0
  28. Hargrove JS, McCane J, Roth CJ, High B, & Campbell MR (2021). Mating systems and predictors of relative reproductive success in a Cutthroat Trout subspecies of conservation concern. Ecology and Evolution, 11(16). 10.1002/ece3.7914
  29. Helb DA, Tetteh KKA, Felgner PL, Skinner J, Hubbard A, Arinaitwe E, Mayanja-Kizza H, Ssewanyana I, Kamya MR, Beeson JG, Tappero J, Smith DL, Crompton PD, Rosenthal PJ, Dorsey G, Drakeley CJ, & Greenhouse B (2015). Novel serologic biomarkers provide accurate estimates of recent Plasmodium falciparum exposure for individuals and communities. Proceedings of the National Academy of Sciences, 112(32), E4438–E4447. 10.1073/pnas.1501705112
  30. Henden L, Lee S, Mueller I, Barry A, & Bahlo M (2018). Identity-by-descent analyses for measuring population dynamics and selection in recombining pathogens. PLoS Genetics, 14(5), e1007279. 10.1371/journal.pgen.1007279
  31. Jacob CG, Thuy-Nhien N, Mayxay M, Maude RJ, Quang HH, Hongvanthong B, Vanisaveth V, Ngo Duc T, Rekol H, van der Pluijm R, von Seidlein L, Fairhurst R, Nosten F, Hossain MA, Park N, Goodwin S, Ringwald P, Chindavongsa K, Newton P, … Miotto O (2021). Genetic surveillance in the Greater Mekong subregion and South Asia to support malaria control and elimination. eLife, 10, e62997. 10.7554/eLife.62997
  32. Jones S, Kay K, Hodel EM, Gruenberg M, Lerch A, Felger I, & Hastings I (2021). Should deep-sequenced amplicons become the new gold-standard for analysing malaria drug clinical trials? Antimicrobial Agents and Chemotherapy, AAC0043721. 10.1128/AAC.00437-21
  33. Kayiba NK, Yobi DM, Tshibangu-Kabamba E, Tuan VP, Yamaoka Y, Devleesschauwer B, Mvumbi DM, Okitolonda Wemakoy E, De Mol P, Mvumbi GL, Hayette M-P, Rosas-Aguirre A, & Speybroeck N (2021). Spatial and molecular mapping of Pfkelch13 gene polymorphism in Africa in the era of emerging Plasmodium falciparum resistance to artemisinin: a systematic review. The Lancet. Infectious Diseases, 21(4), e82–e92. 10.1016/S1473-3099(20)30493-X
  34. Knudson A, González-Casabianca F, Feged-Rivadeneira A, Pedreros MF, Aponte S, Olaya A, Castillo CF, Mancilla E, Piamba-Dorado A, Sanchez-Pedraza R, Salazar-Terreros MJ, Lucchi N, Udhayakumar V, Jacob C, Pance A, Carrasquilla M, Apráez G, Angel JA, Rayner JC, & Corredor V (2020). Spatio-temporal dynamics of Plasmodium falciparum transmission within a spatial unit on the Colombian Pacific Coast. Scientific Reports, 10(1), 3756. 10.1038/s41598-020-60676-1
  35. Krijthe JH (2015). Rtsne: T-Distributed Stochastic Neighbor Embedding using a Barnes-Hut Implementation. https://github.com/jkrijthe/Rtsne
  36. Lautu-Gumal D, Razook Z, Koleala T, Nate E, McEwen S, Timbi D, Hetzel MW, Lavu E, Tefuarani N, Makita L, Kazura J, Mueller I, Pomat W, Laman M, Robinson LJ, & Barry AE (2021). Surveillance of molecular markers of Plasmodium falciparum artemisinin resistance (kelch13 mutations) in Papua New Guinea between 2016 and 2018. International Journal for Parasitology. Drugs and Drug Resistance, 16, 188–193. 10.1016/j.ijpddr.2021.06.004
  37. Lefterova MI, Budvytiene I, Sandlund J, Färnert A, & Banaei N (2015). Simple real-time PCR and amplicon sequencing method for identification of Plasmodium species in human whole blood. Journal of Clinical Microbiology, 53(7), 2251–2257. 10.1128/JCM.00542-15
  38. Lerch A, Koepfli C, Hofmann NE, Messerli C, Wilcox S, Kattenberg JH, Betuela I, O’Connor L, Mueller I, & Felger I (2017). Development of amplicon deep sequencing markers and data analysis pipeline for genotyping multi-clonal malaria infections. BMC Genomics, 18(1), 864. 10.1186/s12864-017-4260-y
  39. Li H (2013). Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997 [Preprint]. http://arxiv.org/abs/1303.3997
  40. Liu Y, Tessema SK, Murphy M, Xu S, Schwartz A, Wang W, Cao Y, Lu F, Tang J, Gu Y, Zhu G, Zhou H, Gao Q, Huang R, Cao J, & Greenhouse B (2020). Confirmation of the absence of local transmission and geographic assignment of imported falciparum malaria cases to China using microsatellite panel. Malaria Journal, 19(1), 244. 10.1186/s12936-020-03316-3
  41. MalariaGEN Plasmodium falciparum Community Project. (2016). Genomic epidemiology of artemisinin resistant malaria. eLife, 5, e08714. 10.7554/eLife.08714
  42. Mathieu LC, Cox H, Early AM, Mok S, Lazrek Y, Paquet J-C, Ade M-P, Lucchi NW, Grant Q, Udhayakumar V, Alexandre JS, Demar M, Ringwald P, Neafsey DE, Fidock DA, & Musset L (2020). Local emergence in Amazonia of Plasmodium falciparum k13 C580Y mutants associated with in vitro artemisinin resistance. eLife, 9, e51015. 10.7554/eLife.51015
  43. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, & DePristo MA (2010). The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research, 20(9), 1297–1303. 10.1101/gr.107524.110
  44. Miles A, Iqbal Z, Vauterin P, Pearson R, Campino S, Theron M, Gould K, Mead D, Drury E, O’Brien J, Rubio VR, MacInnis B, Mwangi J, Samarakoon U, Ranford-Cartwright L, Ferdig M, Hayton K, Su X, Wellems T, … Kwiatkowski D (2016). Indels, structural variation, and recombination drive genomic diversity in Plasmodium falciparum. Genome Research, 26(9), 1288–1299. 10.1101/gr.203711.115
  45. Miles A, pyup.io bot, Murillo R, Ralph P, Harding NJ, Pisupati R, Rae S, & Millar T (2020). cggh/scikit-allel: v1.3.2. Zenodo. 10.5281/zenodo.3976233
  46. Miller RH, Hathaway NJ, Kharabora O, Mwandagalirwa K, Tshefu A, Meshnick SR, Taylor SM, Juliano JJ, Stewart VA, & Bailey JA (2017). A deep sequencing approach to estimate Plasmodium falciparum complexity of infection (COI) and explore apical membrane antigen 1 diversity. Malaria Journal, 16(1), 490. 10.1186/s12936-017-2137-9
  47. Miotto O, Amato R, Ashley EA, MacInnis B, Almagro-Garcia J, Amaratunga C, Lim P, Mead D, Oyola SO, Dhorda M, Imwong M, Woodrow C, Manske M, Stalker J, Drury E, Campino S, Amenga-Etego L, Thanh T-NN, Tran HT, … Kwiatkowski DP (2015). Genetic architecture of artemisinin-resistant Plasmodium falciparum. Nature Genetics, 47(3), 226–234. 10.1038/ng.3189
  48. Miotto O, Sekihara M, Tachibana S-I, Yamauchi M, Pearson RD, Amato R, Gonçalves S, Mehra S, Noviyanti R, Marfurt J, Auburn S, Price RN, Mueller I, Ikeda M, Mori T, Hirai M, Tavul L, Hetzel MW, Laman M, … Mita T (2020). Emergence of artemisinin-resistant Plasmodium falciparum with kelch13 C580Y mutations on the island of New Guinea. PLoS Pathogens, 16(12), e1009133. 10.1371/journal.ppat.1009133
  49. Mita T, Tanabe K, Takahashi N, Tsukahara T, Eto H, Dysoley L, Ohmae H, Kita K, Krudsood S, Looareesuwan S, Kaneko A, Björkman A, & Kobayakawa T (2007). Independent Evolution of Pyrimethamine Resistance in Plasmodium falciparum Isolates in Melanesia. Antimicrobial Agents and Chemotherapy, 51(3), 1071–1077. 10.1128/AAC.01186-06
  50. Mitchell RM, Zhou Z, Sheth M, Sergent S, Frace M, Nayak V, Hu B, Gimnig J, ter Kuile F, Lindblade K, Slutsker L, Hamel MJ, Desai M, Otieno K, Kariuki S, Vigfusson Y, & Shi YP (2021). Development of a new barcode-based, multiplex-PCR, next-generation-sequencing assay and data processing and analytical pipeline for multiplicity of infection detection of Plasmodium falciparum. Malaria Journal, 20(1), 92. 10.1186/s12936-021-03624-2
  51. Moser KA, Madebe RA, Aydemir O, Chiduo MG, Mandara CI, Rumisha SF, Chaky F, Denton M, Marsh PW, Verity R, Watson OJ, Ngasala B, Mkude S, Molteni F, Njau R, Warsame M, Mandike R, Kabanywanyi AM, Mahende MK, … Bailey JA (2021). Describing the current status of Plasmodium falciparum population structure and drug resistance within mainland Tanzania using molecular inversion probes. Molecular Ecology, 30(1), 100–113. 10.1111/mec.15706
  52. Nader JL, Mathers TC, Swain MT, Robinson G, Chalmers RM, Hunter PR, van Oosterhout C, & Tyler KM (2019). Evolutionary genomics of anthroponosis in Cryptosporidium. Nature Microbiology, 4, 14. 10.1038/s41564-019-0377-x
  53. Natesh M, Taylor RW, Truelove NK, Hadly EA, Palumbi SR, Petrov DA, & Ramakrishnan U (2019). Empowering conservation practice with efficient and economical genotyping from poor quality samples. Methods in Ecology and Evolution, 10(6), 853–859. 10.1111/2041-210X.13173
  54. Neafsey DE, Juraska M, Bedford T, Benkeser D, Valim C, Griggs A, Lievens M, Abdulla S, Adjei S, Agbenyega T, Agnandji ST, Aide P, Anderson S, Ansong D, Aponte JJ, Asante KP, Bejon P, Birkett AJ, Bruls M, … Wirth DF (2015). Genetic Diversity and Protective Efficacy of the RTS,S/AS01 Malaria Vaccine. New England Journal of Medicine, 373(21), 2025–2037. 10.1056/NEJMoa1505819
  55. Neafsey DE, Taylor AR, & MacInnis BL (2021). Advances and opportunities in malaria population genomics. Nature Reviews. Genetics. 10.1038/s41576-021-00349-5
  56. Nelson CS, Sumner KM, Freedman E, Saelens JW, Obala AA, Mangeni JN, Taylor SM, & O’Meara WP (2019). High-resolution micro-epidemiology of parasite spatial and temporal dynamics in a high malaria transmission setting in Kenya. Nature Communications, 10, 5615. 10.1038/s41467-019-13578-4
  57. O’Neill EM, Schwartz R, Bullock CT, Williams JS, Shaffer HB, Aguilar-Miguel X, Parra-Olea G, & Weisrock DW (2013). Parallel tagged amplicon sequencing reveals major lineages and phylogenetic structure in the North American tiger salamander (Ambystoma tigrinum) species complex. Molecular Ecology, 22(1), 111–129. 10.1111/mec.12049
  58. Oude Munnink BB, Worp N, Nieuwenhuijse DF, Sikkema RS, Haagmans B, Fouchier RAM, & Koopmans M (2021). The next phase of SARS-CoV-2 surveillance: real-time molecular epidemiology. Nature Medicine. 10.1038/s41591-021-01472-w
  59. Oyola SO, Ariani CV, Hamilton WL, Kekre M, Amenga-Etego LN, Ghansah A, Rutledge GG, Redmond S, Manske M, Jyothi D, Jacob CG, Otto TD, Rockett K, Newbold CI, Berriman M, & Kwiatkowski DP (2016). Whole genome sequencing of Plasmodium falciparum from dried blood spots using selective whole genome amplification. Malaria Journal, 15(1). 10.1186/s12936-016-1641-7
  60. Pelleau S, Moss EL, Dhingra SK, Volney B, Casteras J, Gabryszewski SJ, Volkman SK, Wirth DF, Legrand E, Fidock DA, Neafsey DE, & Musset L (2015). Adaptive evolution of malaria parasites in French Guiana: Reversal of chloroquine resistance by acquisition of a mutation in pfcrt. Proceedings of the National Academy of Sciences, 112(37), 11672–11677. 10.1073/pnas.1507142112
  61. R Core Team. (2021). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. https://www.R-project.org/
  62. Reeder JC, & Marshall VM (1994). A simple method for typing Plasmodium falciparum merozoite surface antigens 1 and 2 (MSA-1 and MSA-2) using a dimorphic-form specific polymerase chain reaction. Molecular and Biochemical Parasitology, 68(2), 329–332. 10.1016/0166-6851(94)90179-1
  63. Ruybal-Pesántez S, Sáenz FE, Deed S, Johnson EK, Larremore DB, Vera-Arias CA, Tiedje KE, & Day KP (2021). Clinical malaria incidence following an outbreak in Ecuador was predominantly associated with Plasmodium falciparum with recombinant variant antigen gene repertoires. medRxiv [Preprint], 2021.04.12.21255093. 10.1101/2021.04.12.21255093
  64. Schaffner SF, Taylor AR, Wong W, Wirth DF, & Neafsey DE (2018). hmmIBD: software to infer pairwise identity by descent between haploid genotypes. Malaria Journal, 17(1), 196. 10.1186/s12936-018-2349-7
  65. Schmidt DA, Campbell NR, Govindarajulu P, Larsen KW, & Russello MA (2020). Genotyping-in-Thousands by sequencing (GT-seq) panel development and application to minimally invasive DNA samples to support studies in molecular ecology. Molecular Ecology Resources, 20(1), 114–124. 10.1111/1755-0998.13090
  66. Schwabl P, Maiguashca Sánchez J, Costales JA, Ocaña-Mayorga S, Segovia M, Carrasco HJ, Hernández C, Ramírez JD, Lewis MD, Grijalva MJ, & Llewellyn MS (2020). Culture-free genome-wide locus sequence typing (GLST) provides new perspectives on Trypanosoma cruzi dispersal and infection complexity. PLoS Genetics, 16(12), e1009170. 10.1371/journal.pgen.1009170
  67. Snounou G (2002). Genotyping of Plasmodium spp. Nested PCR. Methods in Molecular Medicine, 72, 103–116. 10.1385/1-59259-271-6:103
  68. Takala-Harrison S, Jacob CG, Arze C, Cummings MP, Silva JC, Dondorp AM, Fukuda MM, Hien TT, Mayxay M, Noedl H, Nosten F, Kyaw MP, Nhien NTT, Imwong M, Bethell D, Se Y, Lon C, Tyner SD, Saunders DL, … Plowe CV (2015). Independent emergence of artemisinin resistance mutations among Plasmodium falciparum in Southeast Asia. The Journal of Infectious Diseases, 211(5), 670–679. 10.1093/infdis/jiu491
  69. Taylor AR, Jacob PE, Neafsey DE, & Buckee CO (2019). Estimating Relatedness Between Malaria Parasites. Genetics, 212(4), 1337–1351. 10.1534/genetics.119.302120
  70. Taylor AR, Schaffner SF, Cerqueira GC, Nkhoma SC, Anderson TJC, Sriprawat K, Pyae Phyo A, Nosten F, Neafsey DE, & Buckee CO (2017). Quantifying connectivity between local Plasmodium falciparum malaria parasite populations using identity by descent. PLoS Genetics, 13(10), e1007065. 10.1371/journal.pgen.1007065
  71. Taylor SM, Parobek CM, DeConti DK, Kayentao K, Coulibaly SO, Greenwood BM, Tagbor H, Williams J, Bojang K, Njie F, Desai M, Kariuki S, Gutman J, Mathanga DP, Mårtensson A, Ngasala B, Conrad MD, Rosenthal PJ, Tshefu AK, … Juliano JJ (2015). Absence of Putative Artemisinin Resistance Mutations Among Plasmodium falciparum in Sub-Saharan Africa: A Molecular Epidemiologic Study. The Journal of Infectious Diseases, 211(5), 680–688. 10.1093/infdis/jiu467
  72. Tessema SK, Hathaway NJ, Teyssier NB, Murphy M, Chen A, Aydemir O, Duarte EM, Simone W, Colborn J, Saute F, Crawford E, Aide P, Bailey JA, & Greenhouse B (2020). Sensitive, highly multiplexed sequencing of microhaplotypes from the Plasmodium falciparum heterozygome. The Journal of Infectious Diseases. 10.1093/infdis/jiaa527
  73. Tessema S, Wesolowski A, Chen A, Murphy M, Wilheim J, Mupiri A-R, Ruktanonchai NW, Alegana VA, Tatem AJ, Tambo M, Didier B, Cohen JM, Bennett A, Sturrock HJ, Gosling R, Hsiang MS, Smith DL, Mumbengegwi DR, Smith JL, & Greenhouse B (2019). Using parasite genetic and human mobility data to infer local and cross-border malaria connectivity in Southern Africa. eLife, 8, e43510. 10.7554/eLife.43510
  74. Trager W, & Jensen JB (1976). Human malaria parasites in continuous culture. Science (New York, N.Y.), 193(4254), 673–675. 10.1126/science.781840
  75. Tran TM, Li S, Doumbo S, Doumtabe D, Huang C-Y, Dia S, Bathily A, Sangala J, Kone Y, Traore A, Niangaly M, Dara C, Kayentao K, Ongoiba A, Doumbo OK, Traore B, & Crompton PD (2013). An Intensive Longitudinal Cohort Study of Malian Children and Adults Reveals No Evidence of Acquired Immunity to Plasmodium falciparum Infection. Clinical Infectious Diseases: An Official Publication of the Infectious Diseases Society of America, 57(1), 40–47. 10.1093/cid/cit174
  76. Van der Auwera GA, Carneiro MO, Hartl C, Poplin R, Del Angel G, Levy-Moonshine A, Jordan T, Shakir K, Roazen D, Thibault J, Banks E, Garimella KV, Altshuler D, Gabriel S, & DePristo MA (2013). From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Current Protocols in Bioinformatics, 43, 11.10.1–11.10.33. 10.1002/0471250953.bi1110s43
  77. Veiga MI, Dhingra SK, Henrich PP, Straimer J, Gnädig N, Uhlemann A-C, Martin RE, Lehane AM, & Fidock DA (2016). Globally prevalent PfMDR1 mutations modulate Plasmodium falciparum susceptibility to artemisinin-based combination therapies. Nature Communications, 7(1), 11553. 10.1038/ncomms11553
  78. WHO. (2019). World Malaria Report. https://www.who.int/publications-detail/world-malaria-report-2019
  79. Yalcindag E, Elguero E, Arnathau C, Durand P, Akiana J, Anderson TJ, Aubouy A, Balloux F, Besnard P, Bogreau H, Carnevale P, D’Alessandro U, Fontenille D, Gamboa D, Jombart T, Le Mire J, Leroy E, Maestre A, Mayxay M, … Prugnolle F (2012). Multiple independent introductions of Plasmodium falciparum in South America. Proceedings of the National Academy of Sciences of the United States of America, 109(2), 511–516. 10.1073/pnas.1119058109
  80. Zhu SJ, Hendry JA, Almagro-Garcia J, Pearson RD, Amato R, Miles A, Weiss DJ, Lucas TC, Nguyen M, Gething PW, Kwiatkowski D, McVean G, & for the Pf3k Project. (2019). The origins and relatedness structure of mixed infections vary with local prevalence of P. falciparum malaria. eLife, 8. 10.7554/eLife.40845
