Abstract
Longitudinal tracking of individual Plasmodium falciparum strains in multi-clonal infections is essential for investigating infection dynamics of malaria. Traditional genotyping techniques did not permit tracking changes in individual clone density during persistent natural infections. Amplicon deep sequencing (Amp-Seq) offers a tool to address this knowledge gap. The sensitivity of Amp-Seq for relative quantification of clones was investigated using three molecular markers, ama1-D2, ama1-D3, and cpmp. Amp-Seq and length-polymorphism based genotyping were compared for their performance in following minority clones in longitudinal samples from Papua New Guinea. Amp-Seq markers were superior to length-polymorphic marker msp2 in detecting minority clones (sensitivity Amp-Seq: 95%, msp2: 85%). Multiplicity of infection (MOI) by Amp-Seq was 2.32 versus 1.73 for msp2. The higher sensitivity had no effect on estimates of force of infection because missed minority clones were detected in preceding or succeeding bleeds. Individual clone densities were tracked longitudinally by Amp-Seq despite MOI > 1, thus providing an additional parameter for investigating malaria infection dynamics. Amp-Seq based genotyping of longitudinal samples improves detection of minority clones and estimates of MOI. Amp-Seq permits tracking of clone density over time to study clone competition or the dynamics of specific, e.g. resistance-associated, genotypes.
Introduction
Molecular-epidemiological parameters used to describe the infection dynamics of Plasmodium falciparum include the number of co-infecting parasite clones (multiplicity of infection, MOI), the rate at which different genotypes are acquired over time (molecular force of infection, molFOI) and duration of infection1. These measures are based on monitoring the presence or absence of clones in cross-sectional or longitudinal samples collected at regular intervals. In earlier studies, individual parasite clones in multi-clonal field samples were distinguished and tracked over time by genotyping the length-polymorphic marker merozoite surface protein 2 (msp2) by capillary electrophoresis-based fragment sizing (CE)2–4. Yet, msp2-CE genotyping has limited sensitivity for minority clone detection3,5,6. Alternative typing methods could perform better in detecting minority clones, but might affect measures of MOI and molFOI7,8. So far, quantification of individual clones within multi-clonal infections has not been feasible, as this would have required highly complex allele-specific quantitative PCR (qPCR).
SNP-based genotyping by deep amplicon sequencing (Amp-Seq) can detect low-abundant P. falciparum clones at ratios of 1:1000 in mixed infections7–9. Most importantly, genotyping by Amp-Seq also quantifies precisely the relative abundance of clones, as shown with artificial mixtures of clones9–11. From these ratios the absolute density of each clone (i.e. a certain haplotype) within a multi-clone infection can be deduced if the total parasitaemia of the sample was established by qPCR11. When analysing consecutive samples from a given study participant, presence and fluctuations in density of clones can be tracked. We explore how longitudinal information can be used to improve identification of minority clones with low densities around the detection limit.
A previous study has estimated clonal density with Amp-Seq in multi-clone infections to estimate clearance rates after antimalarial treatment11. We apply the same approach to track parasite clones longitudinally in untreated natural infections. In addition, we increase the resolution of genotyping by combining sequence information from several markers into multi-locus haplotypes.
Methods
Study design
A subset of 153 archived P. falciparum genomic DNA samples from 33 children (mean 4.3 samples [min: 2, max: 11]) aged 1–5 years were available from a cohort study described earlier12 with blood sampling over 40 weeks (first 12 weeks every fortnight, then monthly) in Papua New Guinea (PNG). The two conditions for selection of children were: ≥2/14 bleeds PCR positive, and MOI > 1 in at least one of the samples of each child. Ethical clearance was obtained from PNG Institute of Medical Research Institutional Review Board (IRB 07.20) and PNG Medical Advisory Committee (07.34). Informed written consent was obtained from all parents or guardians prior to recruitment of each child. All experiments were performed in accordance with relevant guidelines and regulations.
Genotyping using length polymorphic marker msp2
Samples were genotyped using the classical P. falciparum marker msp2 according to published protocols13. Fluorescently labelled nested PCR products were sized by CE on an automated sequencer and analysed using GeneMarker software. Fragments were accepted if the following cut-off criteria were met: peak height >500 intensity units and >10% of the height of the majority peak. Electropherograms were inspected visually to exclude obvious stutter peaks. All DNA samples were genotyped in 2 independent laboratories to assess reproducibility of clone detection and measures of MOI.
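The two peak-calling cut-offs can be written down compactly. The following is a minimal sketch, not the GeneMarker workflow used in the study: the function name `call_peaks` and the input layout (fragment size mapped to peak height) are assumptions made for illustration, the example values are hypothetical, and the visual exclusion of stutter peaks is not modelled.

```python
def call_peaks(peaks, min_height=500, min_fraction=0.10):
    """Return fragment sizes whose peaks pass both cut-offs.

    peaks: dict mapping fragment size (bp) to peak height (intensity units).
    A peak is kept if its height is > min_height and > min_fraction of the
    tallest (majority) peak in the same sample.
    """
    if not peaks:
        return []
    majority = max(peaks.values())
    return sorted(
        size
        for size, height in peaks.items()
        if height > min_height and height > min_fraction * majority
    )


# Hypothetical electropherogram: one dominant peak and three smaller ones.
example = {312: 12000, 345: 1500, 360: 900, 402: 450}
print(call_peaks(example))
```

In this hypothetical sample the 360 bp peak exceeds 500 intensity units but not 10% of the majority peak, which illustrates how the relative cut-off limits minority clone detection.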
Marker selection for Amplicon deep sequencing
Amp-Seq was performed on three amplicons located in two different P. falciparum marker genes, namely PF3D7_0104100, “conserved Plasmodium membrane protein” (cpmp), and PF3D7_1133400, “apical membrane antigen 1” (ama1) whose genetic diversity has been studied in great detail7,14–16. Previously published primers were used for marker cpmp9. For ama1 two amplicons of 479 and 516 bp were selected that span regions of maximum diversity, i.e. subdomains 2 and 3 of the ectodomain17. Primer sequences and exact amplicon positions are listed in Supplementary Tables S1 and S2.
Sequencing library preparation
Sequencing libraries were generated by three rounds of PCR, according to previously published protocols9. After primary PCR, a 5′ linker sequence was added during nested PCR. Nested PCR products were subject to another PCR round with primers binding to the linker sequences and carrying Illumina sequence adapters plus an eight nucleotide long sample-specific molecular index to permit pooling of amplicons for sequencing and later de-multiplexing. The final sequence library was purified with NucleoMag beads. Sequencing was performed on an Illumina MiSeq platform in paired-end mode (2 × 250 bp) using Illumina MiSeq reagent kit v2 (500-cycles) together with Enterobacteria phage PhiX control (Illumina, PhiXControl v3).
Sequence read analysis and haplotype calling
Samples yielding a sequence coverage of <25 reads were excluded from the analysis. An overview of sequence read coverage for all Amp-Seq markers is given in Supplementary Table S3. Several pipelines to process Amp-Seq data have recently been published, including HaplotypR (https://github.com/lerch-a/HaplotypR.git) that was used for this study9,18–21. Haplotype calling is explained in full detail in an earlier publication9. In short: Low quality sequences were removed by trimming reads to a final size of 240 bp forward and 170 bp reverse for all amplicons. Index, linker and primer sequence (corresponding to ~50 bp) were trimmed off from forward and reverse reads. As the reference sequence, P. falciparum strain 3D7 was used (PlasmoDB release 3422). The term genotype refers to a single nucleotide polymorphism (SNP), whereas a haplotype was defined as a sequence variant of an entire amplicon. Calling a SNP required a >50% frequency of the total reads in at least two independent samples. Haplotypes containing insertions or deletions (indels) were filtered out, as well as haplotypes resulting from chimeric reads or singleton reads. The number of reads of a given haplotype over all remaining reads of the same marker within a sample is denoted by the term “within-host haplotype frequency”. Cut-off criteria for haplotype calling were as follows: a minimum of 3 reads coverage per haplotype, a within-host haplotype frequency ≥0.1% and an occurrence of this haplotype in ≥2 samples over the entire data set including technical replicates. The chosen cut-off criteria were studied in great detail and discussed in an earlier publication9.
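The three haplotype-calling cut-offs can be illustrated with a short sketch. This is not the HaplotypR implementation: the function name `call_haplotypes`, the nested-dictionary input and the example read counts are hypothetical, and the sketch assumes that a haplotype only counts towards the "≥2 samples" criterion in samples where it also passes the read and frequency cut-offs. Upstream steps (trimming, indel, chimera and singleton filtering) are assumed to have been done already.

```python
from collections import defaultdict


def call_haplotypes(read_counts, min_reads=3, min_freq=0.001, min_samples=2):
    """Apply the three haplotype-calling cut-offs for one marker.

    read_counts: dict mapping sample ID -> {haplotype ID: read count}.
    Returns dict mapping sample ID -> {haplotype ID: within-host frequency}
    for the haplotypes that pass all cut-offs.
    """
    # First pass: per-sample cut-offs (read coverage, within-host frequency).
    candidates = {}
    samples_per_haplotype = defaultdict(int)
    for sample, counts in read_counts.items():
        total = sum(counts.values())
        kept = {}
        for hap, n in counts.items():
            if total > 0 and n >= min_reads and n / total >= min_freq:
                kept[hap] = n / total
                samples_per_haplotype[hap] += 1
        candidates[sample] = kept

    # Second pass: keep haplotypes seen in at least `min_samples` samples
    # (technical replicates count as separate samples).
    return {
        sample: {h: f for h, f in kept.items()
                 if samples_per_haplotype[h] >= min_samples}
        for sample, kept in candidates.items()
    }


# Hypothetical read counts for one marker.
counts = {
    "s1_rep1": {"A": 900, "B": 98, "C": 2},
    "s1_rep2": {"A": 850, "B": 150},
    "s2_rep1": {"A": 40, "D": 60},
}
calls = call_haplotypes(counts)
print(calls)
```

Here haplotype C fails the 3-read minimum and haplotype D, although abundant, is dropped because it occurs in a single sample only.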
Multi-locus haplotype inference in longitudinal samples
Amp-Seq quantifies the frequency of each haplotype within a sample. This permits the inference of multi-locus haplotypes, an approach also used earlier by software DEploid23. In this study a semi-automated procedure was applied for multi-locus haplotype inference that utilized longitudinal sample information to solve complex mixtures. A multi-locus haplotype was deduced iteratively and separately for each sample. In the first round, the multi-locus haplotype of the dominant clone of a sample was inferred by selecting each marker’s dominant haplotype (>54% within-host haplotype frequency, i.e. 50% + 3.8% standard deviation in within-host haplotype frequency between replicates). After each round, the identified dominant haplotype was ignored and in the following round the dominant haplotype was identified among the remaining reads. If several haplotypes occurred in a sample at similar frequencies, it may be impossible to identify the dominant haplotype. Nevertheless, in many cases this could be resolved by analysing the change in within-host haplotype frequency between the observed and preceding or succeeding sample of the same host. An example of our approach to multi-locus haplotype inference is shown in detail in Supplementary Text.
The final step of multi-locus haplotype inference addressed the problem of clones from a multiple infection that share by chance the same allele of one of the markers. As a consequence, the within-host frequency of a shared haplotype amounts to the sum of two or more independent clones carrying the same allele. In such cases multi-locus haplotypes were inferred by assigning the shared alleles to those haplotypes that summed up to the same proportion in the other two markers. Samples for which the multi-locus haplotype could not be established by this approach were considered unresolvable (Supplementary Table S4).
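The iterative core of this procedure can be sketched as follows. This is a simplified illustration under stated assumptions, not the semi-automated procedure used in the study and not DEploid: the function name `infer_multilocus` and the input layout are hypothetical, the 54% threshold is assumed to apply to the frequency among the haplotypes remaining in each round, and neither the longitudinal resolution of ties nor the handling of shared alleles is modelled.

```python
def infer_multilocus(freqs, threshold=0.54):
    """Iteratively peel off the dominant haplotype at every marker.

    freqs: dict mapping marker -> {haplotype ID: within-host frequency}.
    Returns (clones, resolved). `clones` is a list of dicts mapping
    marker -> haplotype ID, ordered from dominant to minor clone.
    `resolved` is False if a round had no clearly dominant haplotype at
    some marker, or if the markers did not run out of haplotypes together.
    """
    remaining = {marker: dict(haps) for marker, haps in freqs.items()}
    clones = []
    while remaining and all(remaining.values()):
        clone = {}
        for marker, haps in remaining.items():
            total = sum(haps.values())
            top = max(haps, key=haps.get)
            if haps[top] / total <= threshold:
                # Ambiguous round: would need longitudinal information.
                return clones, False
            clone[marker] = top
        # Ignore the identified haplotypes in the next round.
        for marker, hap in clone.items():
            del remaining[marker][hap]
        clones.append(clone)
    resolved = not any(remaining.values())
    return clones, resolved


# Hypothetical sample with three clones at clearly distinct frequencies.
sample = {
    "cpmp": {"c1": 0.70, "c2": 0.22, "c3": 0.08},
    "ama1-D2": {"a1": 0.68, "a2": 0.25, "a3": 0.07},
    "ama1-D3": {"d1": 0.72, "d2": 0.21, "d3": 0.07},
}
clones, resolved = infer_multilocus(sample)
print(clones, resolved)
```

A sample in which two haplotypes each make up 50% of the reads returns an empty list with `resolved` set to False, which corresponds to the cases that required neighbouring bleeds or remained unresolvable.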
Reproducibility, sensitivity and false discovery rate
Samples were analysed in duplicate with Amp-Seq markers and msp2-CE. Analysing duplicates made it possible to identify and exclude false-positive haplotypes and thus prevented erroneous over-estimation of MOI. Each haplotype was classified into one of four groups (for an example see Supplementary Fig. S1): (1) True-positive (TP) haplotype, i.e. it passed the haplotype calling cut-off in both replicates or in one replicate plus in the preceding or succeeding bleed; (2) False-positive (FP) haplotype, i.e. it passed the haplotype calling cut-off in only one replicate and was not detected in any of the preceding or succeeding samples of that individual; (3) False-negative (FNi) haplotype, i.e. it was detected in one or both replicates but did not pass the cut-off criteria at that occasion, whereas it was detected in the preceding or succeeding bleed as TP (at least once) or FN haplotype; (4) Background noise (all other cases).
Additionally, false-negative (FNii) haplotypes were imputed for samples in which no sequence read was detected. These false-negative haplotypes were imputed only when (a) the haplotype was detected in the preceding as well as the succeeding bleed as a true-positive. Presence in only one of preceding or succeeding sample was not considered sufficient evidence for assuming a case of missed detection. For the Amp-Seq markers but not msp2-CE, false-negative haplotypes were also imputed when (b) data for the other two markers was present and the corresponding multi-locus haplotype was established in the preceding or succeeding sample.
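The classification rules can be summarised as a small decision function. This is a minimal sketch with hypothetical names; it collapses the evidence from preceding and succeeding samples into a single boolean (`neighbour_support`), which is a simplification, and it covers only imputation rule (a) for FNii haplotypes, not the multi-locus rule (b).

```python
def classify_haplotype(passed_rep1, passed_rep2, reads_seen, neighbour_support):
    """Classify one haplotype at one bleed as 'TP', 'FP', 'FNi' or 'noise'.

    passed_rep1, passed_rep2: haplotype passed the calling cut-offs in
        replicate 1 / replicate 2.
    reads_seen: at least one read was detected in either replicate.
    neighbour_support: haplotype was detected in a preceding or
        succeeding bleed of the same individual.
    """
    n_passed = int(passed_rep1) + int(passed_rep2)
    if n_passed == 2 or (n_passed == 1 and neighbour_support):
        return "TP"
    if n_passed == 1:
        return "FP"
    if reads_seen and neighbour_support:
        return "FNi"
    return "noise"


def is_imputed_fnii(reads_seen, tp_in_preceding, tp_in_succeeding):
    """Rule (a): impute a missed haplotype only if no read was detected
    and it was a true positive in both the preceding and succeeding bleed."""
    return (not reads_seen) and tp_in_preceding and tp_in_succeeding


print(classify_haplotype(True, False, True, True))
print(is_imputed_fnii(False, True, True))
```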
The sensitivity to detect parasite clones was estimated based on selected individuals who had not received antimalarial treatment during the timespan analysed and harboured at least one haplotype that was detected at 3 consecutive bleeds. Sensitivity was defined as the true positive rate of a genotyping method and was calculated as TP/(TP + FN). The risk to falsely assign a haplotype not present in the sample was measured as the “false discovery rate” (FDR), calculated as FP/(TP + FP). This rate represents the extent of false haplotype calls of a genotyping method.
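As a worked illustration of the two definitions, the sketch below computes both rates from haplotype counts. The function names are hypothetical; the example counts are those reported for marker cpmp in Table 3 (TP = 115, FN = 4 + 2 + 1, FP = 5). The confidence intervals given in Table 3 are not reproduced here.

```python
def sensitivity(tp, fn):
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)


def false_discovery_rate(tp, fp):
    """Share of called haplotypes that are false: FP / (TP + FP)."""
    return fp / (tp + fp)


# Counts for marker cpmp as listed in Table 3.
tp, fn, fp = 115, 4 + 2 + 1, 5
print(round(sensitivity(tp, fn), 3))           # 0.943
print(round(false_discovery_rate(tp, fp), 3))  # 0.042
```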
The reproducibility of clone detection in technical replicates (comprising all experimental procedures from PCR to sequence run) was calculated as $2n_2/(n_1 + 2n_2)$, where $n_1$ is the number of haplotypes detected in a single replicate and $n_2$ the number of haplotypes detected in both replicates24. Only TP haplotypes were used to estimate reproducibility.
Epidemiological parameters: clone density, diversity, MOI and FOI
The density of a parasite clone was calculated by multiplying within-host haplotype frequency by parasitaemia (measured by qPCR). As late P. falciparum stages are absent from peripheral blood owing to sequestration, it was assumed that all detected clones were ring or early trophozoite stages, which each possess a single haploid genome. Thus, genome density correlates with clone density. Clone density is expressed as copies of target gene per microliter, quantified by qPCR targeting the 18S rRNA gene of P. falciparum25. The technical detection limit of qPCR was 0.4 copies/μl whole blood.
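The density calculation is a simple product, sketched below. The function name and the example values are hypothetical; parasitaemia is assumed to be given in 18S rRNA gene copies per microliter as measured by qPCR.

```python
def clone_densities(parasitaemia, haplotype_freqs):
    """Density of each clone = within-host haplotype frequency x parasitaemia.

    parasitaemia: total parasite density of the sample (copies/ul, by qPCR).
    haplotype_freqs: dict mapping haplotype ID -> within-host frequency (0-1).
    """
    return {hap: freq * parasitaemia for hap, freq in haplotype_freqs.items()}


# Hypothetical sample: 2000 copies/ul shared by three clones.
densities = clone_densities(2000, {"h1": 0.9, "h2": 0.099, "h3": 0.001})
for hap, density in densities.items():
    print(hap, round(density, 1))
```

In this hypothetical example the minority clone h3 has a density of about 2 copies/μl, still above the qPCR detection limit of 0.4 copies/μl stated above.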
Based on true positive haplotypes, the expected heterozygosity (He) and mean MOI were determined from baseline (or first bleed available) samples for each marker as described earlier9. He was also estimated for combined markers in samples that had a resolvable multi-locus haplotype and that were separated by a treatment plus ≥2 consecutive P. falciparum negative samples from the same child.
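The two diversity measures can be sketched as follows. The exact estimator is given in the earlier publication cited above; the sketch assumes the commonly used sample-size-corrected form He = n/(n − 1) × (1 − Σ p_i²), where n is the number of clones and p_i the population frequency of haplotype i. Function names and example data are hypothetical.

```python
from collections import Counter


def expected_heterozygosity(haplotypes):
    """He = n/(n-1) * (1 - sum(p_i^2)) over all observed clones.

    haplotypes: list with one haplotype ID per detected clone
    (i.e. the sum of all haplotypes in all samples).
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two clones are required")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


def mean_moi(samples):
    """Mean number of distinct haplotypes per positive sample."""
    return sum(len(set(s)) for s in samples) / len(samples)


# Hypothetical baseline samples, each listing the haplotypes detected.
samples = [["A", "B"], ["A"], ["C", "A", "B"]]
all_clones = [hap for sample in samples for hap in sample]
print(round(expected_heterozygosity(all_clones), 3), mean_moi(samples))
```

With this definition, mean MOI equals the number of clones divided by the number of samples, which matches Table 1 (for example 81 clones in 33 baseline samples for cpmp, giving 2.45).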
molFOI was estimated on longitudinal sets of samples that had a complete set of replicates for all markers. Haplotypes were counted as new infection if a haplotype was (i) not present in the baseline sample but in a subsequent sample, (ii) not detected at ≥2 consecutive preceding bleeds or (iii) not detected after antimalarial treatment plus after at least one negative sample. Time at risk was calculated as the timespan between baseline and last sampling, minus 14 days for each antimalarial treatment (to account for the prophylactic effect of treatment).
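The counting rules and the time-at-risk correction can be sketched as below. This is one possible reading of rules (i) and (ii), not the code used in the study: rule (iii) on treatment is not modelled, the function names are hypothetical, and the conversion factor of 365.25 days per year is an assumption.

```python
def count_new_infections(presence, min_gap=2):
    """Count new infections of one haplotype in one child.

    presence: list of booleans, one per consecutive bleed (baseline first),
    after imputation of missed observations.
    An appearance counts as new if the haplotype was absent at baseline and
    is seen for the first time (rule i), or if it reappears after at least
    `min_gap` consecutive negative bleeds (rule ii).
    """
    new = 0
    seen_before = False
    gap = 0
    for i, present in enumerate(presence):
        if present:
            if i > 0 and (not seen_before or gap >= min_gap):
                new += 1
            seen_before = True
            gap = 0
        else:
            gap += 1
    return new


def mol_foi(new_infections, followup_days, n_treatments, days_per_year=365.25):
    """New infections per year at risk; 14 days are deducted per treatment."""
    days_at_risk = followup_days - 14 * n_treatments
    if days_at_risk <= 0:
        raise ValueError("no time at risk")
    return new_infections / (days_at_risk / days_per_year)


# Hypothetical child: 40 weeks of follow-up, one treatment.
series = [False, True, True, False, True, False, False, True]
n_new = count_new_infections(series)
print(n_new, round(mol_foi(n_new, 280, 1), 2))
```

In the hypothetical series the single negative bleed between two positive ones does not trigger a new infection, whereas the reappearance after two negative bleeds does.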
An overview of sample selection criteria applied for different types of analyses is listed in Supplementary Table S5.
Results
Genetic diversity of markers
The discriminatory power of Amp-Seq markers cpmp, ama1-D2 and ama1-D3, as well as length-polymorphic marker msp2-CE was estimated in 33 baseline samples (Supplementary Table S5). The resolution was highest for amplicon marker cpmp (He = 0.961) that distinguished 30 haplotypes and yielded a mean MOI = 2.45 (Table 1, MOI distribution by marker in Supplementary Fig. S2). The second-best resolution was obtained by marker msp2-CE (He = 0.940) that distinguished 20 haplotypes and measured a mean MOI = 1.73. Haplotype and SNP frequencies of Amp-Seq markers are shown in Fig. 1 and Supplementary Fig. S2.
Table 1.
Marker | He | Mean MOI | Number of clones^a | Number of haplotypes | Number of SNPs^b |
---|---|---|---|---|---|
msp2-CE | 0.940 | 1.73^c | 57 | 20 | n/a |
cpmp | 0.961 | 2.45^c | 81 | 30 | 48 |
ama1-D2 | 0.928 | 2.27^c | 75 | 15 | 17 |
ama1-D3 | 0.939 | 2.24^c | 74 | 22 | 11 |
^a Sum of all haplotypes in all samples.
^b With respect to the reference sequence of P. falciparum strain 3D7.
^c Pairwise comparison using two-sided paired t-test with adjusted p-value by Holm: p-value = 0.008 for ama1-D2 vs msp2-CE, p-value = 0.036 for ama1-D3 vs msp2-CE, and p-value = 0.005 for cpmp vs msp2-CE.
Discriminatory power can be increased by combining multiple markers. Inference of multi-locus haplotypes was not possible for all baseline samples. Instead, 47 independent samples were analysed that had fully established multi-locus haplotypes (Supplementary Table S5). These 47 samples comprised 67 fully established multi-locus haplotypes. Combining marker cpmp with either of the two ama1 fragments yielded very high diversity (54 and 56 haplotypes, He = 0.992 and 0.994 for cpmp/ama1-D2 and cpmp/ama1-D3) (Table 2 and Supplementary Fig. S3). Combining all 3 markers did not increase discriminatory power any further.
Table 2.
Marker | He | Number of Haplotypes | Mean MOI |
---|---|---|---|
cpmp | 0.948 | 25 | 1.43 |
ama1-D2 | 0.925 | 16 | 1.30 |
ama1-D3 | 0.936 | 21 | 1.30 |
cpmp + ama1-D2 | 0.992 | 54 | 1.43 |
cpmp + ama1-D3 | 0.994 | 56 | 1.43 |
cpmp + ama1-D2 + ama1-D3 | 0.994 | 56 | 1.43 |
Using longitudinal genotyping data to increase detectability of clones
Imperfect detectability of parasite clones has been described previously in longitudinal genotyping studies1,26–28. Data from replicates and longitudinal samples can be used to make assumptions on missed clones. This permits imputing of missed haplotypes and thus improves the tracking of clonal infections within an individual over time. Two types of false-negative haplotypes were distinguished: (FNi) haplotypes that were detected below the cut-off and (FNii) haplotypes that were not detected but imputed (Supplementary Table S6). Supplementary Fig. S4 shows an example of these different types of missed haplotypes for all Amp-Seq markers.
The sensitivity to detect parasite clones was estimated for each genotyping marker by enumerating false-negative haplotypes. Sensitivity was higher for the Amp-Seq markers than for msp2-CE (in decreasing order 96.5%, 94.3%, 93.9% and 85.1% for ama1-D2, cpmp, ama1-D3 and msp2-CE) (Table 3). For ≥57% of the identified false-negative haplotypes, reads were detected but fell below cut-off criteria (category FNi). If such haplotypes were counted as positives by relaxing the cut-off criteria, sensitivity would increase to 99.1%, 97.5% and 97.4% for Amp-Seq markers ama1-D2, cpmp and ama1-D3 (Table 3). Using the standard cut-off criteria, the false discovery rate of haplotypes for Amp-Seq markers was in the range of 0.9–4.2% (Table 3).
Table 3.
Marker | TP (n) | FNi (n) | FNiia (n) | FNiib (n) | FP (n) | Sensitivity: TP/(TP + FNi+iiab) | FDR: FP/(TP + FP) | Detected haplotypes^a: (TP + FNi)/(TP + FNi+iiab) |
---|---|---|---|---|---|---|---|---|
msp2-CE | 86 | 10 | 5 | n/a^b | n/a^c | 0.851 ± 0.101^d | n/a^c | 0.950 ± 0.061^d |
cpmp | 115 | 4 | 2 | 1 | 5 | 0.943 ± 0.066 | 0.042 ± 0.057 | 0.975 ± 0.044 |
ama1-D2 | 109 | 3 | 0 | 1 | 1 | 0.965 ± 0.052 | 0.009 ± 0.027 | 0.991 ± 0.026 |
ama1-D3 | 108 | 4 | 2 | 1 | 3 | 0.939 ± 0.068 | 0.027 ± 0.046 | 0.974 ± 0.045 |
Sensitivity and FDR, including 95% confidence intervals, were estimated based on persistent clones in 48 longitudinal samples from 12 individuals. Detectability of minority clones can be increased by including missed persistent haplotypes detected below the cut-off criteria. TP, true-positive haplotypes. FNi, false-negative haplotypes detected, but below cut-off criteria. FNiia and FNiib, false-negative haplotypes with no read detected, imputed according to rule (a) or rule (b), respectively.
^a Detected true-positive and false-negative haplotypes.
^b Not imputed for msp2-CE as multi-locus haplotypes cannot be established.
^c Length-polymorphic data generated in different laboratories do not provide replicates suited for determination of false-positive haplotype calls and estimation of FDR.
^d Without haplotypes that were imputed based on multi-locus haplotypes at the beginning or end of an infection.
Reproducibility to detect parasite clones in technical replicates was greater for Amp-Seq markers than for marker msp2-CE (0.94, 0.93, 0.92 and 0.89 for ama1-D3, ama1-D2, cpmp and msp2-CE) (Supplementary Table S7). Reproducibility decreased either with decreasing clone density, decreasing within-host haplotype frequency, or decreasing sequence coverage (Supplementary Table S8 and Fig. S5)9. Differences in estimates of within-host haplotype frequency between replicates were very small: The median difference was 0.70%, 0.54% and 0.38% for cpmp, ama1-D3 and ama1-D2 (Supplementary Fig. S6).
Determination of molFOI by different molecular markers and methods
A higher sensitivity of the genotyping method does not necessarily impact molFOI, i.e. new clones/year, because a missed minority clone could be detected at one of the successive bleeds. We investigated the number of new infections acquired during 40 weeks follow-up in 27 children from whom a complete data set was available (on average 4.3 samples per child [min: 2, max: 7]) (Supplementary Figs S7–S39). Mean molFOI was 2.7, 2.7, 2.3 and 2.2 new infections per year for markers ama1-D3, cpmp, msp2-CE and ama1-D2 (negative binomial regression p-value for comparison of msp2-CE to ama1-D3, cpmp and ama1-D2: 0.596, 0.649 and 0.877) (Supplementary Fig. S40). Thus, no substantial difference in mean molFOI was found for the different molecular markers and different genotyping methods. Mean molFOI of multi-locus haplotypes could not be calculated because multi-locus haplotype inference was not possible for all consecutive samples of each child (Supplementary Table S4).
Quantitative dynamics of multiple infecting P. falciparum clones
Densities of individual clones were calculated from the total parasitaemia by qPCR and the within-host haplotype frequency. Examples of individual clone density dynamics in children with multi-clone infections are shown for three Amp-Seq markers (Fig. 2). The density of some clones remained constant over time, whereas other clones showed fluctuations in density over 3 orders of magnitude (Fig. 2A,B). In some children the dominant clone remained dominant over the observation period (Fig. 2A), whereas in others a switch-over between minority clone and dominant clone was observed (Fig. 2B). In highly complex field samples some clones might share the same haplotype of a given marker (Fig. 2C). Such clones can only be differentiated and quantified if multiple markers are typed and at least one of the markers is not shared between concurrent clones.
After artemisinin combination therapy, some of the parasite clones from multi-clone infections were cleared 14 days after antimalarial treatment, whereas others were still detectable (Fig. 2A–C). These persisting clones had decreased clone densities (<21 copies/μl) and likely represent remaining late gametocyte stages of cleared asexual infections29. Some new infections following antimalarial treatment (artesunate-primaquine) showed a rapid increase in clone density within the first 14 days after re-infection of a host, followed by a slow decrease in clone density until clearance (Fig. 2D), whereas in other infections clone density remained constant (Fig. 2C).
Discussion
While MOI and molFOI have been extensively described as epidemiological parameters, the ratio and density of individual clones within complex infections has not yet been investigated in detail. This gap in knowledge was due to shortfalls of traditional length-polymorphic markers, where the length of a fragment greatly influences the amplification efficiency in multi-clone infections with fragments competing in PCR and a strong bias favouring smaller fragments5. As a result, multi-locus haplotypes could not be inferred from traditional genotyping data in a reliable way. Such inference is required, for example, for phylogenetic or population genetic studies. In these studies, multiple-clone infections were usually excluded or only the predominant haplotype included30,31. With the possibility to establish multi-locus haplotypes from complex infections the discriminatory power will be greatly improved in future. This study explored the feasibility of multi-locus haplotype calling in complex infections and the usefulness of the Amp-Seq genotyping technique in longitudinal data.
The single Amp-Seq markers cpmp, ama1-D2 and ama1-D3 and the length-polymorphic marker msp2-CE yielded similar resolution. Combining cpmp with either of the ama1 fragments further increased discriminatory power. The excellent performance of Amp-Seq marker cpmp had been demonstrated earlier9. Such increased resolution is of great practical value for PCR-correction in clinical drug efficacy trials, where new infections need to be reliably distinguished from those present in an individual earlier6,32,33. Discriminatory power may be increased even further by replacing one of the two ama1 fragments with another highly discriminatory marker that has no linkage to either Amp-Seq marker cpmp or ama1.
Reproducibility of true-positive haplotype calls was measured based on two technical replicates. By definition, a true haplotype must occur in all replicates except for three cases: (1) imperfect detectability of low-density clones, where scarce template may, by chance, lead to occasional absence of the PCR template in one of the replicates, (2) template competition impeding minority clones, whereby templates of a minority clone, present at very low abundance, are outcompeted by dominant clones, and (3) insufficient sequence depth to detect the minority clone in one replicate. It is essential to differentiate between false-positive haplotype calls (caused by cross-contamination, or amplification and sequencing errors9,11) and imperfect detection. This was achieved by considering preceding or succeeding bleeds of an individual. This approach was applied only in those cases where a haplotype was missed in one of the replicates. In our data set, all missing haplotype calls of replicates could be assigned to one of the three causes: imperfect detection, template competition or insufficient sequence depth.
Genotyping longitudinal samples in duplicates also enabled an evidence-based approach to identify false-negative haplotypes. This permitted the estimation of each marker’s sensitivity to detect minority clones. The estimated sensitivities of minority clone detection should serve primarily for a comparison of different genotyping methods, as the sample’s true haplotype composition remains uncertain. Amp-Seq genotyping with markers ama1-D2, ama1-D3 and cpmp missed fewer clones compared to msp2-CE genotyping (Amp-Seq on average 5.4% versus 14.9% msp2-CE). This difference is likely due to less stringent cut-off criteria for Amp-Seq compared to msp2-CE genotyping. Minority clone detection by msp2-CE is limited by peak calling cut-off criteria, which are usually a fixed minimal signal intensity plus a minimum peak height of 10% (used in our study) or more of the dominant peak. Minority clones with an abundance of <10% of all amplified fragments will not pass these criteria. An increase of msp2-CE sensitivity would require a lower cut-off, which would lead to more false positive signals from either stutter peaks or background noise. In contrast, Amp-Seq allows the removal of PCR artefacts before haplotype calling and thus can support a much lower cut-off of <1%9.
In cohort studies where Amp-Seq genotyping is performed in successive follow-up samples of the same patient, an even more relaxed definition of Amp-Seq cut-off criteria would be justifiable. In this scenario, the same evidence-based strategy of using successive samples can be used to recover minority haplotypes that were detected with read counts below the haplotype calling cut-offs. If such recovery were performed in this study, ≥57% of all false-negative haplotypes would be identified. Such recovery would increase detectability of parasite clones by Amp-Seq to >97%. In addition, multi-locus haplotypes could provide additional evidence for accurate recovery.
The higher sensitivity of Amp-Seq to detect minority clones compared to msp2-CE substantially increased MOI, but did not affect mean molFOI. Any estimation of molFOI needs to account for temporary absence of clones from the peripheral blood caused by sequestration1,26–28. A clone that is temporarily undetectable owing to density fluctuations is likely observed at either the preceding or succeeding bleed. Therefore, a clone is usually only counted as new infection if it was not detected in ≥2 consecutive blood samples. As a consequence, a clone missed at a single bleed will not necessarily lead to a decrease of molFOI.
A clone that was intermittently missed at one bleed by msp2-CE was always detected by Amp-Seq. This observation supports the practice in earlier publications where intermittently missed clones were imputed27,28. Counting a recurrent haplotype as new infection after a single negative bleed would lead to an overestimation of molFOI1,3,26–28. The statistical power of the current study was limited and a larger sample size is needed to fully explore the effect of the typing method used on estimates of MOI or molFOI.
A major advantage of Amp-Seq over msp2-CE is that the density of an individual clone in multi-clone infections can be calculated. Quantifying the density of individual parasite clones over time permits the studying of dynamics, and thus fitness, of parasite clones exposed to within-host competition34. For example, the relative densities of new infections can be compared to clones already persisting in a host, and their densities in respect to extrinsic factors or clinical symptoms can be investigated.
For infections with high multiplicity (MOI ≥ 3), inference of multi-locus haplotypes remains challenging (example in Supplementary Fig. S41). Inference is straightforward if a haplotype occurs at a distinctive abundance in any of the longitudinal samples (Supplementary Table S4). In contrast, if haplotypes are equally abundant in one sample and also remain so over several time points, the multi-locus haplotype cannot be inferred. Inference also is impossible for complex patterns with shared haplotypes, i.e. if a haplotype has a high population frequency and therefore is present in 2 or more clones of a blood sample. Shared haplotypes may even lead to inference of wrong multi-locus haplotypes, e.g. when three clones were present at an equal within-host frequency, though only two haplotypes were measured at each locus. However, the risk of erroneous multi-locus haplotype inference decreases if more than 2 unlinked markers are used, as the likelihood of shared multi-locus haplotypes decreases with increasing number of loci. In the present study, multi-locus haplotypes up to MOI = 3 were inferred. For multiplicity >3 and for resolving complex patterns of shared haplotypes, additional longitudinal samples could be analysed simultaneously, for example by incorporating the within-host haplotype frequencies of all consecutive samples of an individual into DEploid software23.
Conclusion
Amplicon sequencing improves clone detectability compared to msp2-CE owing to its greater sensitivity for detection of minority clones. Our results confirm earlier assumptions on clone persistence with intermittent missed observation. This validates the imputation of false negatives to correct for imperfect detection of clones, a strategy also used in previous studies on clone dynamics. Using multi-locus haplotypes for genotyping permitted robust identification of individual clones and improved differentiation between new and recurring clones. Construction of multi-locus haplotypes is of great value to compensate for the effects of highly abundant haplotypes in the population. The option to quantify individual clones enables new approaches to investigate effects of parasite fitness or superinfection in multi-clone infections.
Acknowledgements
We are grateful to the study participants and their guardians and to the field and laboratory team of the PNG Institute of Medical Research, in particular Alice Ura. We would also like to thank Stephen Wilcox for supervising sequence library preparation and sequencing. This work was supported by the Swiss National Science Foundation [grant number 310030_159580] and the International Centers of Excellence for Malaria Research [grant number 1U19AI129392]. A.L. was partly funded by Novartis Foundation for Medical-Biological Research. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Author Contributions
Conceived and designed the experiments: I.F., I.M., A.L., C.K. Performed the experiments: A.L., C.K., J.H.K., N.H., A.R.U. Supervised field work: I.B., A.R.U. Analysed the data: A.L. Supervision: I.F. Writing - draft: A.L., I.F. All co-authors have read the manuscript and agreed with the final version.
Data Availability
The datasets generated and analysed during the current study are available in NCBI Sequence Read Archive repository under accession number SRX2704363 (https://www.ncbi.nlm.nih.gov/sra/SRX2704363). The source code for software HaplotypR is available at https://github.com/lerch-a/HaplotypR.
Competing Interests
The authors declare no competing interests.
Footnotes
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
Supplementary information accompanies this paper at 10.1038/s41598-019-39656-7.
References
- 1. Felger I, et al. The dynamics of natural Plasmodium falciparum infections. PLoS One. 2012;7:e45542. doi: 10.1371/journal.pone.0045542.
- 2. Hofmann NE, et al. The complex relationship of exposure to new Plasmodium infections and incidence of clinical malaria in Papua New Guinea. Elife. 2017;6:1–23. doi: 10.7554/eLife.23708.
- 3. Koepfli C, et al. How much remains undetected? Probability of molecular detection of human Plasmodia in the field. PLoS One. 2011;6:e19010. doi: 10.1371/journal.pone.0019010.
- 4. Sondén K, et al. Asymptomatic Multiclonal Plasmodium falciparum Infections Carried Through the Dry Season Predict Protection Against Subsequent Clinical Malaria. J. Infect. Dis. 2015;212:608–16. doi: 10.1093/infdis/jiv088.
- 5. Messerli C, Hofmann NE, Beck H-P, Felger I. Critical Evaluation of Molecular Monitoring in Malaria Drug Efficacy Trials and Pitfalls of Length-Polymorphic Markers. Antimicrob. Agents Chemother. 2017;61:AAC.01500–16. doi: 10.1128/AAC.01500-16.
- 6. Juliano JJ, Gadalla N, Sutherland CJ, Meshnick SR. The perils of PCR: can we accurately ‘correct’ antimalarial trials? Trends Parasitol. 2010;26:119–24. doi: 10.1016/j.pt.2009.12.007.
- 7. Miller RH, et al. A deep sequencing approach to estimate Plasmodium falciparum complexity of infection (COI) and explore apical membrane antigen 1 diversity. Malar. J. 2017;16:490. doi: 10.1186/s12936-017-2137-9.
- 8. Juliano JJ, et al. Exposing malaria in-host diversity and estimating population diversity by capture-recapture using massively parallel pyrosequencing. Proc. Natl. Acad. Sci. USA. 2010;107:20138–43. doi: 10.1073/pnas.1007068107.
- 9. Lerch A, et al. Development of amplicon deep sequencing markers and data analysis pipeline for genotyping multi-clonal malaria infections. BMC Genomics. 2017;18:864. doi: 10.1186/s12864-017-4260-y.
- 10. Levitt B, et al. Overlap Extension Barcoding for the Next Generation Sequencing and Genotyping of Plasmodium falciparum in Individual Patients in Western Kenya. Sci. Rep. 2017;7:41108. doi: 10.1038/srep41108.
- 11. Mideo N, et al. A deep sequencing tool for partitioning clearance rates following antimalarial treatment in polyclonal infections. Evol. Med. public Heal. 2016;2016:21–36. doi: 10.1093/emph/eov036.
- 12. Betuela I, et al. Relapses contribute significantly to the risk of Plasmodium vivax infection and disease in Papua New Guinean children 1-5 years of age. J. Infect. Dis. 2012;206:1771–80. doi: 10.1093/infdis/jis580.
- 13. Falk N, et al. Comparison of PCR-RFLP and Genescan-based genotyping for analyzing infection dynamics of Plasmodium falciparum. Am. J. Trop. Med. Hyg. 2006;74:944–50. doi: 10.4269/ajtmh.2006.74.944.
- 14. Arnott A, et al. Distinct patterns of diversity, population structure and evolution in the AMA1 genes of sympatric Plasmodium falciparum and Plasmodium vivax populations of Papua New Guinea from an area of similarly high transmission. Malar. J. 2014;13:233. doi: 10.1186/1475-2875-13-233.
- 15. Cortés A, et al. Allele specificity of naturally acquired antibody responses against Plasmodium falciparum apical membrane antigen 1. Infect. Immun. 2005;73:422–30. doi: 10.1128/IAI.73.1.422-430.2005.
- 16. Cortés A, et al. Geographical structure of diversity and differences between symptomatic and asymptomatic infections for Plasmodium falciparum vaccine candidate AMA1. Infect. Immun. 2003;71:1416–26. doi: 10.1128/IAI.71.3.1416-1426.2003.
- 17. Hodder AN, et al. The disulfide bond structure of Plasmodium apical membrane antigen-1. J. Biol. Chem. 1996;271:29446–52. doi: 10.1074/jbc.271.46.29446.
- 18. Hathaway NJ, Parobek CM, Juliano JJ, Bailey JA. SeekDeep: single-base resolution de novo clustering for amplicon deep sequencing. Nucleic Acids Res. 2017:1–13. doi: 10.1093/nar/gkx1201.
- 19. Neafsey DE, et al. Genetic Diversity and Protective Efficacy of the RTS,S/AS01 Malaria Vaccine. N. Engl. J. Med. 2015;373:2025–37. doi: 10.1056/NEJMoa1505819.
- 20. Callahan BJ, et al. DADA2: High-resolution sample inference from Illumina amplicon data. Nat. Methods. 2016;13:581–583. doi: 10.1038/nmeth.3869.
- 21. Early AM, et al. Amplicon deep sequencing of low-density. bioRxiv. 2018. doi: 10.1101/453472.
- 22. Bahl A, et al. PlasmoDB: the Plasmodium genome resource. A database integrating experimental and computational data. Nucleic Acids Res. 2003;31:212–5. doi: 10.1093/nar/gkg081.
- 23. Zhu SJ, Almagro-Garcia J, McVean G. Deconvolution of multiple infections in Plasmodium falciparum from high throughput sequencing data. Bioinformatics. 2018;34:9–15. doi: 10.1093/bioinformatics/btx530.
- 24. Bretscher MT, et al. Detectability of Plasmodium falciparum clones. Malar. J. 2010;9:234. doi: 10.1186/1475-2875-9-234.
- 25. Rosanas-Urgell A, et al. Comparison of diagnostic methods for the detection and quantification of the four sympatric Plasmodium species in field samples from Papua New Guinea. Malar. J. 2010;9:361. doi: 10.1186/1475-2875-9-361.
- 26. Sama W, Owusu-Agyei S, Felger I, Dietz K, Smith T. Age and seasonal variation in the transition rates and detectability of Plasmodium falciparum malaria. Parasitology. 2006;132:13–21. doi: 10.1017/S0031182005008607.
- 27. Sama W, Owusu-Agyei S, Felger I, Vounatsou P, Smith T. An immigration-death model to estimate the duration of malaria infection when detectability of the parasite is imperfect. Stat. Med. 2005;24:3269–88. doi: 10.1002/sim.2189.
- 28. Smith T, Felger I, Fraser-Hurt N, Beck HP. Effect of insecticide-treated bed nets on the dynamics of multiple Plasmodium falciparum infections. Trans. R. Soc. Trop. Med. Hyg. 1999;93(Suppl 1):53–7. doi: 10.1016/S0035-9203(99)90328-0.
- 29. Bousema T, et al. Revisiting the circulation time of Plasmodium falciparum gametocytes: molecular detection methods to estimate the duration of gametocyte carriage and the effect of gametocytocidal drugs. Malar. J. 2010;9:136. doi: 10.1186/1475-2875-9-136.
- 30. MalariaGEN Plasmodium falciparum Community Project. Genomic epidemiology of artemisinin resistant malaria. Elife. 2016;5:1–29.
- 31. Barry AE, Schultz L, Buckee CO, Reeder JC. Contrasting population structures of the genes encoding ten leading vaccine-candidate antigens of the human malaria parasite, Plasmodium falciparum. PLoS One. 2009;4:e8497. doi: 10.1371/journal.pone.0008497.
- 32. Porter KA, et al. Uncertain outcomes: adjusting for misclassification in antimalarial efficacy studies. Epidemiol. Infect. 2011;139:544–51. doi: 10.1017/S0950268810001652.
- 33. Plucinski MM, Morton L, Bushman M, Dimbu PR, Udhayakumar V. Robust Algorithm for Systematic Classification of Malaria Late Treatment Failures as Recrudescence or Reinfection Using Microsatellite Genotyping. Antimicrob. Agents Chemother. 2015;59:6096–100. doi: 10.1128/AAC.00072-15.
- 34. de Roode JC, Culleton R, Cheesman SJ, Carter R, Read AF. Host heterogeneity is a determinant of competitive exclusion or coexistence in genetically diverse malaria infections. Proceedings. Biol. Sci. 2004;271:1073–80. doi: 10.1098/rspb.2004.2695.