Abstract
Changes in the copy number of large genomic regions, termed copy number variations (CNVs), contribute to important phenotypes. CNVs are readily identified using conventional approaches when present in a large fraction of the cell population. However, CNVs in only a few genomes are often overlooked but important; if beneficial, a de novo CNV that arises in a single genome can expand during selection to create a population of cells with novel characteristics. While single cell methods for studying de novo CNVs are increasing, we continue to lack information about CNV dynamics in rapidly evolving microbial populations. Here, we investigated de novo CNVs in the genome of the Plasmodium parasite that causes human malaria. The highly AT-rich Plasmodium falciparum genome readily accumulates CNVs that facilitate rapid adaptation. We employed low-input genomics and specialized computational tools to evaluate the impact of sub-lethal stress on the de novo CNV rate. We observed a significant increase in genome-wide de novo CNVs following treatment with an antimalarial compound that inhibits replication. De novo CNVs encompassed genes from various cellular pathways participating in human infection. This snapshot of CNV dynamics emphasizes the connection between replication stress, DNA repair, and CNV generation in this important microbial pathogen.
Graphical Abstract
Introduction
Changes in the copy number of large genomic regions, termed copy number variations (CNVs), are a source of phenotypic diversity for many organisms (as reviewed in [1–3]), including rapidly evolving microbes such as bacteria, yeast, and viruses [4–6]. CNVs also contribute to many diseases from cancer to blood, metabolic, neurological, and infectious diseases [7–19]. Increased access to genome sequencing has facilitated the identification of these important genomic rearrangements, especially following selection. CNVs that are identified using standard “bulk” analysis approaches (e.g. read coverage methods) are present in a large fraction of the cell population (>50% [2]). However, those in a minority of genomes, or even a few genomes, are “averaged” away during analysis steps. This artifact limits our ability to assess how CNVs arise and contribute to the genomic diversity of individual cells within a population.
In order to observe a genome’s evolutionary potential in the absence of selection, we require approaches specifically designed to detect CNVs that are not present in a predecessor “parental” genome. Due to their rare and novel nature, these events are termed “de novo” CNVs [3, 20–24]. Early experimental progress detecting de novo CNVs involved cloning individual cells, which takes time, is prone to contamination, and prevents detection of detrimental CNVs [20–22]. Misalignment of short-reads to reference genomes (i.e. discordant or split reads) can also indicate the presence of de novo CNVs, but false positives are common if matched normal samples are not available (reviewed in [25]). Recent single cell-based methods have successfully assessed de novo CNVs during experimental evolution, disease progression, and tissue development (as reviewed in [25–29]). While the reach of single cell analysis is expanding, we continue to lack information about de novo CNVs in microbes and their dynamics in evolving pathogen populations.
The Plasmodium parasite that causes malaria readily accumulates kb-sized CNVs in its genome [30–33]. This protozoan propagates within the mosquito vector and then the human host, where it infects the liver and cyclically invades red blood cells (erythrocytic cycle, Fig. 1A). CNVs impact parasite survival, allowing evasion of clinical detection [34], expansion of beneficial gene families [35], invasion of new host cells [36], and development of antimalarial resistance [37–39]. We are specifically interested in the genetic diversity of one species of malaria parasite, P. falciparum, since this may explain its rapid adaptation to new drugs and host environments [40]. Due to its relatively small, AT-rich genome (23 Mb, 19.4% GC [41]), low-input genomics is challenging in this single cell protozoan. However, we previously optimized a single cell genomics approach for erythrocytic stage P. falciparum parasites and recently developed novel computational tools to evaluate de novo CNVs in challenging genomes [42–44].
Figure 1.
Low-input genomics approach for analysis of malaria parasite genomes under stress. (A) P. falciparum erythrocytic cycle including approximate timing (h) and genome number (n). Cycle length is 44 h total for the Dd2 parasite line [78, 79]. Ring stage parasites have a single genome, replication begins at ∼30 h [80], and schizonts develop with 10–20 individual parasites within a single erythrocyte. (B) Overview of low-input genomics procedures. Early erythrocytic stage (ring, see panel A) P. falciparum parasites grown in vitro were treated with an antimalarial compound (DSM1) or the solvent control (DMSO) followed by recovery and reinvasion to produce a new round of ring stage parasites (see details in panel C). Parasite-infected erythrocytes were isolated using flow sorting (10-cell control or 2-cell samples) and genomes were amplified using a modified MALBAC-based whole genome amplification approach. Quality control steps involved DNA quantification to assess amplification success, droplet digital PCR to assess parasite genome amplification, and PCR-high resolution melting to assess sample cross-contamination. After Illumina short-read sequencing, reads were filtered, trimmed and used as input for CNV analysis using two approaches, HapCNV and LUMPY. (C) Timeline of parasite treatment and recovery. For all experiments (pilot and low-input genomics), parasites were synchronized to enrich for rings before short-term treatment was applied. Parasite growth and viability were tracked from start, to post-treatment, to post-recovery, and harvest. The recovery period allowed for completion of the erythrocytic cycle and reinvasion to produce rings before harvest for low-input genomics (see more details on pilot treatments in Supplementary Fig. S1A and low-input genomics in Supplementary Fig. S2A). *, cumulative time from start (0 h). **treatment led to a decrease in parasitemia and delay in ring progression and therefore staging is approximate. Aphi, aphidicolin. 
(D) Mean growth rate of parasite lines and treatments for low-input genomics experiment: FCR3 (yellow circles), untreated Dd2 (black circles), and treated Dd2 parasites (red circles). Fold change in % parasitemia was calculated by dividing the ending parasitemia by the starting parasitemia for the period being measured. Cumulative timing is as follows: start (0 h), post treat (12 h), pre-harvest (38 h), and harvest (+40–43 h); N = 2. Cumulative parasitemia and ring percentage are shown in Supplementary Fig. S2B; raw flow cytometry plots tracking viability and parasitemia are shown in Supplementary Fig. S2C–G. (E) Recovery time for parasite lines and treatments: FCR3 (yellow), untreated Dd2 (black), and DSM1-treated Dd2 parasites (red). Recovery is defined as the period just after treatment until harvest for downstream processing. (F) FACS plots at parasite isolation step for DSM1-treated Dd2 samples. MitoProbe DiIC1(5) (stains viable parasites) versus SYBR Green (stains parasite genome) indicates the proportion of erythrocytes (heat map of density) that contain viable ring stage parasites (blue circle: 0.38% ring stage parasitemia). Plots for other parasite lines are depicted in Supplementary Fig. S4.
Here, we combined these advancements and made additional improvements to directly investigate de novo CNV formation in the P. falciparum genome. Since various types of cellular stress can induce genetic change (reviewed in [2, 45–47]), we also evaluated the impacts of sub-lethal stress on de novo CNV development. Using a low-input genomics pipeline, we observed that replication stress increased the number of de novo CNVs across the parasite genome. This finding provides key evidence for adaptive amplification in an important pathogen that has evolved strategies to encourage CNV formation.
Materials and methods
Parasite lines, compounds, and treatments
We acquired Dd2 (MRA-156) and FCR3 (MRA-731) parasite lines from BEI Resources (ATCC, Manassas, VA). In this study, we were interested in detecting sub-clonal levels of genomic diversity that occur naturally in cell culture (i.e. untreated conditions); therefore, we did not re-clone parasite lines prior to treatment. For low-input genomics, we grew parasites in complete RPMI 1640 with HEPES (Thermo Fisher Scientific, Waltham, MA) supplemented with 0.5% Albumax II Lipid-Rich BSA (MilliporeSigma, Burlington, MA) and 50 mg/l hypoxanthine (Thermo Fisher Scientific) and donor A+ human erythrocytes (BioIVT, Hicksville, NY). We grew all cultures at 3% hematocrit at 37°C and individually flushed flasks with 5% oxygen, 5% carbon dioxide, and 90% nitrogen gas. We diluted cultures with uninfected erythrocytes and changed the culture medium every other day to keep parasitemia below 2% during maintenance. We confirmed that all cultures were negative for mycoplasma contamination approximately monthly using a LookOut Mycoplasma PCR detection kit (MilliporeSigma). We synthesized DSM1 as in previous studies [48, 49] and purchased aphidicolin (MilliporeSigma).
Pilot assessments of DSM1 treatment
We conducted multiple independent pilot assessments comparing DSM1 treatment with a replication inhibitor, aphidicolin (see general scheme in Supplementary Fig. S1A). We first acquired high ring-stage cultures (>60%) by synchronizing mixed cultures of parasites twice with 5% sorbitol, ∼48 h apart. For these pilot assessments, we used sorbitol synchronization to increase the level of ring stage in the cultures; we chose this approach instead of magnetic purification or Percoll gradients in order to limit the time parasites were outside of culture conditions. We then applied 1 μM DSM1, solvent control (dimethylsulfoxide, DMSO), or 4.4 μM aphidicolin to ring-dominant parasite populations (as judged by microscopy and flow cytometry) for 12 h. Following treatment, we washed parasites with sterile 1× phosphate-buffered saline (PBS, Thermo Fisher Scientific), returned them to complete RPMI, and allowed parasites to recover for >36 h, depending on the experiment (detailed in Supplementary Fig. S1B–E). We tracked parasitemia and parasite viability on an Accuri C6 flow cytometer (BD Biosciences, Franklin Lakes, NJ) as previously performed [42, 48, 50]. At various time points (start at 0 h, post-treatment at 12 h, or during recovery, Supplementary Fig. S1), we stained parasites with 1× SYBR Green (Thermo Fisher Scientific, stains the parasite nucleus) to assess the proportion of infected erythrocytes (parasitemia) and stage of the parasite development cycle; we also stained parasites with 10 nM MitoProbe DiIC1(5) (Thermo Fisher Scientific, stains active parasite mitochondria) to indicate the proportion of the parasites that were viable after recovery. All pilot experiments were assessed from multiple flasks (>2, Supplementary Fig. S1).
DSM1 treatment for low-input genomics
For parasite treatment for low-input genomics, we synchronized parasites (as above), applied 1 μM DSM1 or the DMSO control for 12 h in duplicate flasks, and allowed recovery for 28–31 h (Supplementary Fig. S2A). We staggered sample sorting in order to limit the impact on parasite health by removal from culture conditions and allowed parasites to reinvade to produce rings. We removed treatments and tracked parasite number and health as described above (Supplementary Fig. S2B–G). Following reinvasion, we harvested viable 1n ring stage parasites for low-input genomics using flow sorting (details below).
Parasite flow sorting for low-input genomics
Cell sorter calibration and accuracy assessments
We calibrated the flow sorter (SH800, Sony Biotechnology, San Jose, CA) using the manufacturer’s calibration beads. We accounted for overlaps in the excitation/emission wavelengths using the integrated compensation panel matrix calculation in the SH800 software according to the manufacturer’s procedure. We also manually calibrated the droplet sorting to the nearest 0.2 mm, as recommended by the manufacturer, using the 96-well plate setting (Armadillo high-performance 96-well plate, Thermo Fisher Scientific). We evaluated SH800 sorting accuracy prior to low-input harvest using a colorimetric assay as previously described [51]. Briefly, we mixed SYBR Green+/MitoProbe+ parasites (see staining details in Parasite lines, compounds, and treatments) with horseradish peroxidase enzyme (Thermo Fisher Scientific) at a final concentration of 2.5 mg/ml. We then sorted parasites into a 96-well plate filled with TMB-ELISA substrate (Thermo Fisher Scientific) using the single cell (three drops) instrument setting, in triplicate plates (Supplementary Fig. S3A and B). Formation of a color in the well (blue, green, or yellow) indicates the successful sorting of the enzyme, and therefore parasites, into the well with the substrate. This assessment allowed us to evaluate the accuracy of SH800 sorting (through the evaluation of success for 1- versus 2-cell wells, Supplementary Fig. S3C), the consistency of sorting (through the evaluation of replicates), and the best plate positions for sorting (through the evaluation of performance in different plate rows/columns). Based on these evaluations, we proceeded with isolating 2 and 10 cells per well and avoided sorting into the top 2 rows and the first and last column of the 96-well plate (Supplementary Fig. S3D).
Parasite isolation and storage
We stained parasites with SYBR Green and MitoProbe DiIC1(5) in complete RPMI as above (see staining details in Parasite lines, compounds, and treatments), gassed the tubes with 5% CO2, 5% O2, 90% N2, and placed samples on ice to ensure viability prior to flow sorting of viable, ring-stage parasites (SH800, Sony Biotechnology Inc., San Jose, CA). We used a final concentration of 1 × 10^7 parasites/ml diluted in sterile 1× PBS (Thermo Fisher Scientific) as input for sorting at the “single-cell setting” (3 drop) into a 96-well plate (Armadillo high-performance 96-well plate, Thermo Fisher Scientific) with each well containing 2.375 μl of cell lysis buffer [0.025 M Tris pH 8.8 (Roche Diagnostics, Indianapolis, IN), 0.01 M NaCl (MilliporeSigma), 0.01 M KCl (MilliporeSigma), 0.01 M (NH4)2SO4 (Thermo Fisher Scientific), 0.001 M EDTA (Promega, Madison, WI), and 10% Triton X-100 (MilliporeSigma)]. We gated viable 1n ring-stage parasites (Supplementary Fig. S4) and sorted into wells containing cell lysis buffer with an approximate sorting time of 15 min. After sorting, we centrifuged for 30 s in a plate centrifuge (MPS1000, Labnet International, Madison, NJ). We immediately overlaid samples with one drop (~25 μl) of light mineral oil (BioReagent grade for molecular biology, MilliporeSigma) and sealed the plates with Microamp® Clear Adhesive Film (Applied Biosystems, Waltham, MA) before storage at −80°C until whole genome amplification. All samples were isolated on the same day and stored frozen for ∼1 month before amplification (FCR3: 33 days, treated Dd2: 35 days, and untreated Dd2: 40 days).
MALBAC whole genome amplification for low-input genomics
We handled each plate separately in order to limit cross-contamination during amplification steps. First, we thawed each plate containing sorted parasites (see Parasite isolation and storage) and added 1 mg/ml Proteinase K in sterile 1× PBS (Thermo Fisher Scientific) to a final volume of 2.5 μl per well. We heated the plate in a PCR cycler (C1000, Bio-Rad Laboratories, Hercules, CA) at 50°C for 3 h, followed by 75°C for 20 min and 80°C for 5 min for proteinase K digestion. We amplified parasite genomes using the Multiple Annealing and Looping Based Amplification Cycles (MALBAC) method essentially as previously described ([42], Version 1) with some modifications (Version 2, Supplementary Fig. S5). In summary, (i) we modified the pre-amplification random primer by adding five additional degenerate bases with 20% GC-content to increase annealing to AT-rich genome sequences (5′GTGAGTGATGGTTGAGGTAGTGTGGAGNNNNNNNNNNTTT 3′); (ii) we performed 19 of the 21 total linear cycles with the Bsu DNA Polymerase (Large Fragment, New England Biolabs), which has a lower optimal reaction temperature (37°C) to improve the amplification on AT-rich sequences [52]; (iii) we lowered the extension temperature from 40/50°C to 37°C during the linear amplification cycles that used the Bsu enzyme (see full cycling parameters in Supplementary Fig. S5 and impacts of these changes in Supplementary Table S1); and (iv) we integrated robotic pipetting (Mosquito LV, SPT Labtech, Melbourn, UK) to increase the throughput of our assays (from 23 samples in Version 1 to 90 samples in the current Version 2) and limit contamination potential.
Overall, we performed 21 total linear cycles (19 cycles with Bsu polymerase and 2 cycles with Bst polymerase, New England Biolabs, Ipswich, MA) and 17 total exponential amplification cycles using Herculase II Fusion DNA polymerase (Agilent Technologies, Santa Clara, CA). During amplification steps, we employed standard steps to limit contamination [42]. For automated pipetting of the enzyme solution during linear cycles, tips were changed after each round of pipetting. Post-amplification, we purified amplified DNA with Zymo DNA Clean & Concentrator-5 columns (Zymo Research, Irvine, CA) according to the manufacturer’s protocol and ran 2 µl of all samples on 1% agarose gels to check for the presence of DNA (generally, if >30 ng/µl, samples could be visualized with a size range of 100 to >1500 bp).
Assessments of amplification success for low-input genomics
DNA quantification
We quantified the MALBAC-amplified DNA using a Qubit fluorimeter (Qubit 1X dsDNA High Sensitivity Assay Kit, Thermo Fisher Scientific) (Supplementary Fig. S6).
Droplet digital PCR
To confirm the presence of parasite DNA in MALBAC-amplified samples, we performed droplet digital PCR (ddPCR) as described previously using the QX2000 droplet generator, C1000 thermocycler, and QX2000 droplet reader (Bio-Rad Laboratories) [48, 53] (Supplementary Fig. S7). We used duplex assays to evaluate two parasite genomic loci concurrently [pfmdr1: Forward: TGCCCACAGAATTGCATCTA; Reverse: ACCCTGATCGAAATGGAACCT; Probe: TCGTGTGTTCCATGTGACTG; pfhsp70: Forward: TGCTGTCATTACCGTTCCAG; Reverse: AGATGCTGGTACAATTGCAGGA; Probe: AGCAGCTGCAGTAGGTTCATT (Integrated DNA Technologies, Newark, NJ)]. The reaction master mix contained 600 nM of forward and reverse primers, 50 nM probes, 10 µl of ddPCR Supermix for Probes (2×, Bio-Rad Laboratories), 3 µl of nuclease-free water (QIAGEN), and 1.5 ng (5 µl) of template DNA per assay (total of 20 µl). We used the following cycling conditions for PCR amplification: 10 min at 95°C initial denaturation step, 1 min at 95°C second denaturation step, and 2 min at 58°C annealing and extension step (ramp rate of 1°C per second), with the second denaturation step and the annealing/extension step repeated 60 times, and then 10 min at 98°C to halt the reaction [53]. In addition to running amplified samples to assess amplification success, we ran ddPCR with bulk genomic DNA as a positive control, no template controls (water replaced DNA), and material from “no cell” wells to assess cross-well contamination. We considered the samples positive for parasite DNA if there were >50 total positive droplets in target-positive clusters. For comparisons to DNA amounts (Supplementary Fig. S6), we plotted total nanograms against total positive droplets (from both pfmdr1 and pfhsp70 loci) and performed a simple linear regression in PRISM (GraphPad, La Jolla, CA).
High resolution melting assay
To assess potential contamination between MALBAC-amplified samples, we performed asymmetric PCR amplification of the pfdhps locus followed by high-resolution melting (HRM) as described previously [54, 55] (Supplementary Fig. S8). The pfdhps locus at codon 613 is distinct in Dd2 and FCR3 parasite lines (Dd2: Ser-613 and FCR3: Ala-613, [56]). Each 20 µl reaction contained 8 µl of the 2.5× LightScanner Master mix (BioFire™ Defense, Salt Lake City, Utah, USA), 1/10 µM of forward/reverse primers and 8 µM probes targeting the pfdhps gene position 613: Forward: CTCTTACAAAATATACATGTATATGATGAGTATCCACTT; Reverse: CATGTAATTTTTGTTGTGTATTTATTACAACATTTTGA; Probe: AAGATTTATTGCCCATTGCATGA/3SpC3, (Integrated DNA Technologies), 7 µl of nuclease-free water, and 3 µl of DNA (∼0.05 ng total). We used the following cycling conditions for PCR amplification with the Rotor-Gene Q instrument with a 72-well rotor (QIAGEN): 95°C for 5 min, 45 cycles of 95°C for 10 s, 55°C for 30 s, and 72°C for 10 s, followed by a pre-melt at 55°C for 90 s, and an HRM ramp from 65°C to 90°C, with an increase of 0.1°C every 2 s. We plotted the change in fluorescence versus temperature (dF/dT) using Rotor-Gene Q software (version 2.3.5, build 1; QIAGEN) and compared HRM peaks of amplified samples to bulk genomic DNA and plasmid controls.
Bulk DNA extraction for short-read sequencing
We extracted bulk DNA for short-read sequencing as previously performed [42]. Briefly, we lysed erythrocytes with 1.5% saponin and washed the parasite pellet three times with 1× PBS (Thermo Fisher Scientific), before resuspension in a buffered solution [150 mM NaCl (MilliporeSigma), 10 mM EDTA (Promega Corporation, Madison, WI), and 50 mM Tris pH 7.5 (Roche Diagnostics)] to a total volume of 500 µl. We then lysed the parasites with 10% sarkosyl (MilliporeSigma) and 20 mg/ml proteinase K (Thermo Fisher Scientific) at 37°C overnight before DNA purification using standard phenol/chloroform/isoamyl alcohol extraction and chloroform washing steps (two times each, [42]). Finally, we precipitated DNA using 100% ethanol with 100 mM of sodium acetate overnight in DNA-lo bind tubes (Eppendorf, Enfield, CT) and then washed twice with 70% ethanol before resuspension in 50 µl of nuclease-free water (QIAGEN). We stored bulk genomic DNA at −20°C until sequencing library preparation.
Low-input genomics sample selection and short-read sequencing
Low-input sample selection
We selected 16 low-input samples from the FCR3 plate, 36 samples from the untreated Dd2 plate, and 38 samples from the treated Dd2 plate for short-read Illumina sequencing. We based our selection on the quantity of the MALBAC amplified DNA and presence of parasite DNA using ddPCR (Supplementary Figs S6 and S7). In summary, the majority of FCR3 and untreated Dd2 samples yielded quantifiable parasite DNA following MALBAC amplification (53/60 FCR3 samples and 60/60 untreated Dd2 samples); in these conditions, we chose samples randomly to proceed with sequencing (indicated in Supplementary Fig. S6). For treated Dd2 samples, we chose samples for sequencing if they had adequate DNA quantity (>10 ng total, 30 samples, Supplementary Fig. S6) or had ddPCR results showing the presence of parasite DNA (an additional 8 samples).
Short-read sequencing
Before short-read sequencing, we sheared bulk samples and low-input samples using a Covaris M220 Focused Ultrasonicator for 150 and 130 s, respectively, to generate fragment sizes of ∼350 bp as evaluated by an Agilent 2100 Bioanalyzer using the High Sensitivity DNA kit (Agilent, Santa Clara, CA). We adjusted the volume of sheared samples with nuclease-free water up to 50 µl. We diluted samples with >100 ng (Supplementary Table S2, including bulk and MALBAC amplified samples) to 1.2–2 ng/µl; we proceeded with no dilution for samples with <100 ng. We used the NEBNext Ultra II kit (Illumina Inc., San Diego, CA) to prepare libraries for sequencing with three cycles of PCR amplification, as performed previously [42]. We quantified the resulting libraries using the NEBNext Library Quant Kit (Illumina Inc.) before sequencing on the Illumina NextSeq 550 using 150 bp paired-end cycles.
Short-read sequence processing and analysis
Read processing and alignment
We performed short-read quality control steps as described previously [40, 42]. Briefly, we reordered and removed singletons and subsequently interleaved paired reads using BBMap, filtered for high-quality reads (>Q30), trimmed the MALBAC common sequence, PhiX, and Illumina adapters from the remaining reads with the BBDuk tool within BBMap, and aligned reads to the pf3D7-62_v3 reference genome using the Speedseq genome aligner [57]. We removed reads that mapped to VAR regions from bam files according to previously defined genomic coordinates [58]. We filtered out reads with low mapping quality (mapq < 30) and duplicated reads using SAMtools [59]. We calculated the percentage of reads that map to the P. falciparum genome by dividing the number of mapped reads with mapq > 30 by the total number of reads before alignment (mapped reads/total reads). We employed Qualimap to report mean coverage and standard deviation across the genome [60]. Using nonoverlapping 20 kb bins, we calculated the coefficient of variation of read coverage by dividing the standard deviation of coverage within a bin by the mean across a sample and multiplying by 100 [61, 62] (R version 4.4.2). Short-read data are available at the NCBI Sequence Read Archive under project PRJNA1201106.
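The coefficient of variation calculation can be illustrated with a minimal sketch. It follows one literal reading of the description above (standard deviation of per-base depth within each 20 kb bin, divided by the sample-wide mean depth, times 100); the function name `binned_cv`, the use of the population standard deviation, and the choice to drop a trailing partial bin are assumptions made for illustration and are not taken from the analysis code used in the study.

```python
from statistics import mean, pstdev


def binned_cv(depth, bin_size=20_000):
    """Return one coefficient of variation (%) per non-overlapping bin.

    depth: per-base read depth along one chromosome (list of numbers).
    Assumption: population SD within the bin, sample-wide mean as denominator.
    """
    sample_mean = mean(depth)
    if sample_mean == 0:
        raise ValueError("sample has no coverage")
    cvs = []
    # Step through complete bins only; a trailing partial bin is ignored.
    for start in range(0, len(depth) - bin_size + 1, bin_size):
        window = depth[start:start + bin_size]
        cvs.append(100.0 * pstdev(window) / sample_mean)
    return cvs
```

A bin with perfectly even coverage returns 0, and larger values flag bins where amplification was uneven relative to the average depth of the sample.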
BLAST-based evaluation of read composition
We blasted reads from low-input genomics samples essentially as previously described [42]. We also performed an additional BLAST analysis to understand read composition. We randomly downsampled and extracted 5000 reads from either (i) all aligned reads (mapq > 1) or (ii) unmapped reads (mapq > 30, Supplementary Fig. S9). To ensure that BLAST alignments consisted of high-quality reads, we manually filtered out low complexity (%N > 50) and short sequences (length < 100 bp). We used these reads as input for a custom pipeline, where we blasted reads against a database consisting of common genomes downloaded from NCBI (i.e. human, mouse, mycoplasma, bacterial genera, etc.). For the BLAST step, we employed a Conda-installable BLASTN v2.16.0+ for analyzing the reads [63]. In order to count as a hit, the sequence must have 90% of its bases aligned and have an e-value < 1e-20. The pipeline is designed to return the top 10 BLAST hits that meet these criteria and either call the most significant hit (lowest e-value) or, if alignments from the Plasmodium genus were present in the result (but not the most significant hit), call the most significant Plasmodium hit. We included this step to account for possible mismatching due to the abundance of highly repetitive sequences in the Plasmodium genome. We also used a custom script to quantify the abundance of hits based on the genus related to each hit’s accession number. If the pipeline identified a hit, but the genus was not represented in the custom database, the hit was called “unknown.” The code for the pipeline, including construction of the database, is available at https://doi.org/10.6084/m9.figshare.29885879.
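The hit-calling rule described above can be sketched as follows. This is an illustration only and is not the deposited pipeline: the function name `call_hit`, the dictionary keys, and the use of `None` for a genus missing from the database are hypothetical choices, while the thresholds (90% of bases aligned, e-value < 1e-20, top 10 hits) come from the text.

```python
def call_hit(hits, min_aligned_fraction=0.90, max_evalue=1e-20, top_n=10):
    """Pick the genus call for one read from its BLAST hits.

    hits: list of dicts with hypothetical keys 'genus' (str or None),
    'evalue' (float), and 'aligned_fraction' (0-1).
    Returns the called genus, "unknown", or None when no hit passes.
    """
    passing = [
        h for h in hits
        if h["aligned_fraction"] >= min_aligned_fraction
        and h["evalue"] < max_evalue
    ]
    # Keep the most significant hits (lowest e-value first).
    top = sorted(passing, key=lambda h: h["evalue"])[:top_n]
    if not top:
        return None
    # Prefer the best Plasmodium hit if any is present among the top hits.
    plasmodium = [h for h in top if h["genus"] == "Plasmodium"]
    best = plasmodium[0] if plasmodium else top[0]
    # A hit whose genus is absent from the database is reported as unknown.
    return best["genus"] if best["genus"] else "unknown"
```

Preferring a Plasmodium hit over a more significant hit from another genus mirrors the rationale given above: repetitive parasite sequence can otherwise be assigned to an unrelated genome.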
Single-nucleotide polymorphism analysis
We performed single nucleotide polymorphism (SNP) genotyping and analysis as previously described [42], based on the MalariaGen P. falciparum Community Project V6.0 pipeline [64–67] using the pf3D7-62_v3 reference genome (Supplementary Fig. S10). Briefly, we applied GATK’s Base Quality Score Recalibration using default settings. We detected potential SNPs using GATK’s HaplotypeCaller and subsequently genotyped the SNPs using CombineGVCFs and GenotypeGVCFs. We employed GATK’s VariantRecalibrator using previously validated SNP datasets [68] and applied GATK’s ApplyRecalibration to assign VQSLOD scores [67]. We filtered the resulting SNPs for those with VQSLOD > 6 and a GT quality metric > 20 to ensure high-quality variant calling. We only selected variants flagged as bi-allelic to simplify the analysis. For SNP principal component analysis (PCA), we merged experiment-wide SNP data (described above) into a single file. Then we merged the VCF into a large matrix and converted the genomic data into numeric information using the “vcfR” package in R (Version 4.2.3) (https://CRAN.R-project.org/package=vcfR). We excluded individual SNPs if >25% of the samples lacked a call in this position or if all calls were the same for each sample in that position. We scaled the remaining SNPs around the origin using the “scale” R function. We then calculated the principal components using the “prcomp” R function and scored the dataset using the “scores” function from the “vegan” R package (https://CRAN.R-project.org/package=vegan).
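The two SNP exclusion rules applied before PCA can be sketched in a few lines. The analysis itself was run in R; the Python below is only an illustration of the filter logic, and the function names and the use of `None` to mark a missing call are assumptions.

```python
def keep_snp(calls, max_missing=0.25):
    """Decide whether one SNP position is retained for PCA.

    calls: numeric genotype per sample at this position; None = no call.
    """
    missing = sum(c is None for c in calls)
    # Rule 1: drop positions where more than 25% of samples lack a call.
    if missing / len(calls) > max_missing:
        return False
    # Rule 2: drop positions where every called sample has the same genotype.
    observed = {c for c in calls if c is not None}
    return len(observed) > 1


def filter_matrix(matrix):
    """Keep only the SNP rows (positions x samples) that pass both rules."""
    return [row for row in matrix if keep_snp(row)]
```

Removing invariant positions matters for the later scaling step, since a column with zero variance cannot be scaled to unit variance.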
Copy number variation analysis
CNV calling in bulk samples
We performed CNV detection for bulk samples essentially as previously described [40, 42]. Briefly, we called CNVs independently using two methods: CNVnator (read depth-based calling [69]) and LUMPY (split and discordant read-based calling [70]). To identify CNVs called by both methods, we used SVCROWS to define overlapping CNV regions relative to their size [43]. SVCROWS uses a reciprocal-overlap-based approach (i.e. two CNVs must overlap each other at, or greater than, a defined threshold) to determine if two CNVs are close enough in their genomic position to be called the same. The program utilizes adaptive thresholds for overlap based on the sizes of the CNVs being compared, ensuring that we account for shifts in CNV position, particularly in regions of high variance. For known CNV calls, we used the following SVCROWS input parameters: ExpandRORegion = FALSE, BPfactor = TRUE, DefaultSizes = FALSE, xs = 3000, xl = 10000, y1s = 300, y1l = 1000, y2s = 50%, and y2l = 80%; these were based on the average size of a P. falciparum gene (∼2.3 kb) and intergenic region (∼2 kb). Similar to our previous study [42], we identified three known CNVs that were called by both LUMPY and CNVnator methods in the core genome of bulk samples (untreated and treated). We determined known CNV boundaries using SVCROWS: pfmdr1 (Pf3D7_05_v3, 888001–970000, 82 kb), pf11-1 (Pf3D7_10_v3, 1521345–1541576, 20 kb), and pf332 (Pf3D7_11_v3, 1950201–1962400, 12 kb).
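The idea of reciprocal overlap with a size-dependent threshold can be shown with a simplified sketch. This is not the SVCROWS implementation: it ignores the breakpoint parameters (y1s, y1l), and both the linear interpolation of the threshold between the small (xs) and large (xl) size boundaries and the use of the larger of the two CNVs to set the threshold are assumptions made for illustration. All function names are hypothetical.

```python
def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions for intervals a and b.

    a, b: (start, end) on the same chromosome, with end > start.
    """
    inter = min(a[1], b[1]) - max(a[0], b[0])
    if inter <= 0:
        return 0.0
    return min(inter / (a[1] - a[0]), inter / (b[1] - b[0]))


def overlap_threshold(size, xs=3000, xl=10000, y2s=0.50, y2l=0.80):
    """Required reciprocal overlap for a CNV of the given size.

    Assumption: y2s applies at or below xs, y2l at or above xl,
    and the threshold scales linearly in between.
    """
    if size <= xs:
        return y2s
    if size >= xl:
        return y2l
    return y2s + (y2l - y2s) * (size - xs) / (xl - xs)


def same_region(a, b, **params):
    """True if a and b overlap enough to be treated as one CNV region."""
    size = max(a[1] - a[0], b[1] - b[0])  # assumption: larger CNV sets the bar
    return reciprocal_overlap(a, b) >= overlap_threshold(size, **params)
```

With the parameters listed above, two 2 kb calls that share half their length would be merged, whereas two 20 kb calls sharing half their length would remain separate regions.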
Downsampling
For CNV analysis that assessed downsampled data, we first converted the processed bam files back into FASTQ files using SAMtools and then used the reformat.sh option of BBtools to select 1.3M reads from each FASTQ (represented the fewest number of reads from a sample that passed quality filtering from the final dataset). We then realigned files to the reference genome (pf3D7-62_v3).
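Downsampling every sample to the same read count can be pictured as drawing a uniform random subset of reads. The sketch below uses reservoir sampling to do this in a single pass; it is a generic illustration and makes no claim about how reformat.sh selects reads internally. The function name `downsample_reads` and the fixed seed are assumptions.

```python
import random


def downsample_reads(reads, n, seed=0):
    """Return a uniform random sample of n records from an iterable.

    reads: any iterable of read records (e.g. FASTQ entries or read pairs).
    If fewer than n records exist, all of them are returned.
    """
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    reservoir = []
    for i, record in enumerate(reads):
        if i < n:
            reservoir.append(record)
        else:
            # Replace an existing entry with probability n / (i + 1).
            j = rng.randint(0, i)
            if j < n:
                reservoir[j] = record
    return reservoir
```

Fixing the target at the smallest post-filtering read count in the dataset (1.3M reads here) means no sample needs to be dropped or padded to reach the common depth.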
CNV calling in two-cell samples
We employed two methods for CNV detection in the core genome of low-input samples: LUMPY is a split/discordant read strategy with high sensitivity [70], and HapCNV is a read coverage-based strategy designed for haploid genomes [44]. We ran LUMPY as part of Speedseq with default parameters as previously described [40]. We filtered resulting structural variants to include only duplications (DUP, >1 copy of a region) and deletions (DEL, one fewer copy of a region than the reference). We then retained only calls with GQ > 20 to ensure high-fidelity calls. In HapCNV, we used a quality control and bias correction procedure to exclude bins of poor quality and remove bias introduced by GC content and mappability variation. We then constructed a pseudo-reference for each Dd2 low-input sample using within-Dd2 information, which enabled control of background noise while preserving CNV signals after normalization. Finally, we used a circular binary segmentation algorithm (CBS [71]) to detect copy number change points followed by a Gaussian Mixture Model (GMM [72]) for CNV identification.
For statistics, we used PRISM (GraphPad, La Jolla, CA), using unpaired parametric t-tests with Welch’s correction. We calculated standard error in Microsoft Excel.
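For readers who want to see what the Welch correction changes, the test statistic and the Welch-Satterthwaite degrees of freedom can be computed directly. This sketch is independent of PRISM and stops short of the P-value, which requires the t distribution; the function name `welch_t` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for an unpaired t-test with Welch's correction.

    a, b: lists of observations (e.g. CNV counts per sample in each group).
    Each group needs at least two values and non-zero total variance.
    """
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation; no equal-variance assumption.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

Because the variances are not pooled, the degrees of freedom shrink when the groups differ in spread or size, which makes the test more conservative than the standard unpaired t-test in that situation.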
Defining CNV regions/determination of “rarity” in CNV calling
Small differences in sequence quality surrounding a read can lead to shifted breakpoint determination for biologically identical CNVs, which is especially true for low-input genomics datasets [28]. To account for this, we used SVCROWS to determine whether two CNV signals were the same within and between samples (see CNV calling in bulk samples for input parameters [43]). We assigned the categorizations of “rare” and “common” by assessing the CNV region frequency within datasets as performed in other studies [44, 73]. “Rare” CNVs were defined as occurring in <10% of the samples within a treatment group; “common” CNVs were defined as occurring in ≥10% of samples within a treatment group; “known” CNVs were those called in bulk samples (see above, CNV calling in bulk samples).
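The frequency-based categories can be expressed as a short Python sketch. It assumes that CNV calls have already been merged into shared region identifiers (for example by SVCROWS); the function name, the region identifiers, and the input layout are hypothetical.

```python
def classify_cnv_regions(region_samples, n_samples, known_regions=(), cutoff=0.10):
    """Label each merged CNV region as 'known', 'common', or 'rare'.

    region_samples: dict mapping a region ID to the samples (within one
        treatment group) in which that region was called.
    n_samples: total number of samples in that treatment group.
    known_regions: region IDs called in the bulk sample.
    """
    labels = {}
    for region, samples in region_samples.items():
        if region in known_regions:
            labels[region] = "known"
            continue
        frequency = len(set(samples)) / n_samples
        labels[region] = "common" if frequency >= cutoff else "rare"
    return labels
```

With 20 samples in a group, a region seen in one sample (5%) is labelled rare, while a region seen in two samples (10%) already counts as common because the cutoff is inclusive.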
High-confidence CNV region identification
To identify “high-confidence” CNV regions called by both HapCNV and LUMPY methods, we compared the “consensus list” generated by SVCROWS [43] for each detection method by combining them into a single input file. Because there is a large disparity in average CNV region length between the two methods (HapCNV = ∼40 kb, LUMPY = ∼4.3 kb), we relaxed the stringency of the SVCROWS parameters for this analysis. Our input parameters to generate the “high-confidence” list were as follows: xs = 3000, xl = 6000, y1s = 500, y1l = 1500, y2s = 30%, and y2l = 60%. We defined high-confidence CNV regions as those that had >1 match from both HapCNV and LUMPY of the same type (i.e. duplication, deletion, or mixed). For Venn diagram generation, we calculated overlaps using SVCROWS “Scavenge” mode ([43], input parameters: ExpandRORegion = FALSE, BPfactor = TRUE, DefaultSizes = FALSE, xs = 3000, xl = 6000, y1s = 500, y1l = 1500, y2s = 30%, and y2l = 60%). We systematically compared lists for each overlap comparison, and if a region had at least one match in an opposing dataset, we considered it a match. We used the draw.quad.venn function in the “VennDiagram” R package (R 4.2.1) to generate the diagram.
Breakpoint analysis of high-confidence CNV regions
We assessed features of breakpoints flanking high-confidence CNVs as performed previously [40]. Briefly, we used the CNV breakpoint coordinates set by either LUMPY or HapCNV to extract a 2000 bp window around the 5′ or 3′ CNV end (±1000 bp from the breakpoint). Although Huckaby et al. solely used LUMPY to determine breakpoints [40], we employed CNV boundary information from both CNV callers (LUMPY and HapCNV) merged using the Expand-RO function in SVCROWS to construct the conserved CNV region ([43], see parameters in High-confidence CNV region identification). In this analysis, breakpoints at either end are called using HapCNV boundaries, LUMPY boundaries, or both, which captures variance but can limit accuracy. For this reason, we only identified breakpoints of high-confidence CNVs, which are conserved between CNV callers. With these defined breakpoints, we then used Vienna 2.1.9 folding prediction software with Mathews 2004 DNA folding parameters, with G-quadruplexes, GU pairing, and lonely base pairs disallowed, to predict stable structures in that region based on Gibbs free energy estimates in sliding 50 bp windows. We called stable hairpins by determining local minima of Gibbs free energy across the 2000 bp region using the same 50 bp windows. We reported the position of the stable structure closest to the breakpoint using the threshold of −5.8 kcal/mol, which we previously reported as the cutoff for stable secondary structure formation in this genome. If no stable hairpins could be determined within 2 kb around the breakpoint, we recorded that breakpoint as “unresolved.” We also calculated the GC content of the breakpoints from the 100 bp window surrounding the stable hairpin, whether this position was genic or intergenic, and the size of the resulting CNV as the distance between the two breakpoints.
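The window extraction, GC calculation, and selection of the stable structure closest to a breakpoint can be sketched in Python as below. The folding step itself is not reproduced: the sketch assumes that per-window free energies (kcal/mol) have already been computed by the folding software, and all function names are hypothetical. Treating only interior windows as candidate minima, and counting energies at or below the cutoff as stable, are simplifying assumptions.

```python
def extract_window(sequence, breakpoint, flank=1000):
    """Return up to `flank` bases on each side of a 0-based breakpoint."""
    start = max(0, breakpoint - flank)
    return sequence[start:breakpoint + flank]


def gc_content(sequence):
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def closest_stable_minimum(positions, energies, breakpoint, cutoff=-5.8):
    """Return the window position of the stable local energy minimum nearest
    to the breakpoint, or None when the breakpoint is 'unresolved'."""
    candidates = []
    for i in range(1, len(energies) - 1):
        is_local_min = energies[i] <= energies[i - 1] and energies[i] <= energies[i + 1]
        if is_local_min and energies[i] <= cutoff:
            candidates.append(positions[i])
    if not candidates:
        return None
    return min(candidates, key=lambda pos: abs(pos - breakpoint))
```

Returning None rather than the weakest available structure keeps "unresolved" breakpoints explicit, so they can be counted separately in downstream summaries.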
Gene Ontology enrichment and protein class identification from high-confidence CNV regions
We used the online Gene Ontology Resource (geneontology.org) to perform GO enrichment analysis using the PANTHER Classification System [74, 75]. Since a large portion of the P. falciparum genome remains unannotated (PlasmoDB, ∼30%) and the majority of molecular functions remain unclassified (92.8%), we used the PANTHER Protein Class assessment (version 19.0, only 55.9% remained unclassified) with default statistics (Fisher’s exact test with an FDR-adjusted P-value of <0.05 for significance, which is recommended for small counts and overlaps between classes). We used the web tool to represent protein classes on pie charts.
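As an illustration of what an FDR adjustment does to a list of per-class P values, the sketch below implements the Benjamini-Hochberg procedure in Python. That the web tool's FDR correction corresponds to this procedure is an assumption, and the function name is hypothetical; the enrichment test that produces the raw P values is not reproduced here.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices, smallest P first
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down so each adjusted value is the
    # minimum of p * n / rank over all equal or larger ranks.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

A class would then be reported as enriched only if its adjusted value falls below 0.05, which is stricter than applying the same cutoff to the raw P values.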
Clinical importance of high-confidence CNV regions
We assessed the clinical importance of genes within high-confidence CNVs by first collecting gene IDs from high-confidence duplications and deletions. To generate this gene list, we used SVCROWS “Hunt” mode ([43], input parameters: BPfactor = TRUE, DefaultSizes = FALSE, xs = 3000, xl = 6000, y1s = 300, y1l = 600, y2s = 30, and y2l = 60), which takes a secondary input list of known genes (Pf3D7_62_v3, Plasmodb.org) and overlaps CNVs to the gene list using a size-weighted scale. We then performed a literature search to identify genes within the list that are essential for erythrocytic stage, mosquito stage (including gametocyte, gamete, ookinete, and sporozoite formation), or liver stage P. falciparum or P. berghei development. We also included genes that produced known parasite antigens important for immune recognition/vaccine escape or contributed to clinical drug resistance. We noted the Mutagenesis Index Score (MIS) for each gene, which estimates essentiality in erythrocytic stage parasites [76], from the PlasmoDB database [77]. To assess overlap of high-confidence CNVs with a large catalog of P. falciparum CNVs from clinical isolates, we collected the gene list from high-frequency CNVs (>1%, >300 bp) identified in clinical isolates (Supplementary Table S3 from [38]) and manually reformatted the list to match SVCROWS input style guidelines [43]. We then assessed how often these genes overlapped with our high-confidence gene list using the SVCROWS “Hunt” mode described above.
Results
Refined low-input genomics pipeline increased efficiency and accuracy
We developed a robust pipeline for assessing the frequency of de novo CNVs in the P. falciparum genome. We adapted our single cell genomics method to improve parasite isolation and whole genome amplification steps over our previous study [42]. In this modified protocol, we used fluorescence-activated cell sorting (FACS) to isolate viable, 1n parasites to improve efficiency and modified aspects of our whole genome amplification method to improve coverage (Supplementary Table S1 and Supplementary Fig. S5). We also sorted low-cell populations to increase accuracy (e.g. two cells per well, Supplementary Fig. S3), and added quality control PCR-based steps that confirmed our samples were of high quality prior to short-read sequencing (Supplementary Figs S7 and S8). The resulting low-input genomics pipeline consisted of basic steps including parasite isolation using flow sorting, whole genome amplification using a modified MALBAC-based approach, quality control confirmation, short-read sequencing, and CNV analysis (Fig. 1B).
Replication stress followed by a recovery period led to isolation of viable parasites
To explore the impact of antimalarial stress on de novo CNV generation, we treated the parasites with a compound that targets P. falciparum dihydroorotate dehydrogenase, pfdhodh (DSM1, [81]). The application of DSM1 for an extended period kills parasites (>48 h at 10× the EC50, [82]). However, we found that when applied to ring-stage parasites for a short period of time prior to replication start (see general timeline in Fig. 1C), DSM1 stalls parasite progression through the life cycle in a reproducible and reversible manner (Supplementary Fig. S1B, C, and E); this effect was similar to parasite treatment with the replication inhibitor, aphidicolin ([83, 84], Supplementary Fig. S1D and E). While aphidicolin stalls replication through inhibition of B-family DNA polymerases [84, 85], DSM1 likely impacts replication through depletion of pyrimidine pools during the pre-replication phase (Fig. 1A) as is observed for other forms of dNTP depletion [86, 87]. For this reason, we now refer to DSM1 treatment as “replication stress”.
For the low-input genomics harvest, we applied DSM1 to ring-stage parasites for 12 h (Fig. 1C and Supplementary Fig. S2A). Similar to our pilot experiments (Supplementary Fig. S1B, C, and E), we observed that DSM1-treated parasites exhibited a slightly decreased growth rate and altered stage progression compared to untreated parasites (Fig. 1D and Supplementary Fig. S2B–G). We harvested viable parasites after a recovery period (an additional ∼30 h, Fig. 1E, or ∼42 h from the treatment start, Fig. 1C), where we allowed parasites to complete an additional round of replication and erythrocyte invasion to produce those that have a single, haploid genome (ring stage, Supplementary Fig. S2E and G).
As a control for cross-sample contamination between isolation wells, we isolated untreated parasites with different genetic backgrounds (FCR3 versus Dd2, Table 1, and Fig. 1D and E). Prior to isolation of low-cell populations, we confirmed that FCR3, untreated Dd2, and DSM1-treated Dd2 parasites were at a similar life cycle stage and viability (Table 1); we saved a portion of these samples for parasite population sequencing (i.e. bulk samples). We then proceeded to isolate small populations of viable, ring stage parasites using FACS (Supplementary Fig. S4) for whole genome amplification and sequencing.
Table 1.
Parasite density, staging, and health at isolation for low-input genomics
| Line/Treatment^a | Mean % Parasitemia^b | Mean % Rings^c | Mean % Viability^d |
|---|---|---|---|
| Untreated FCR3^e | 0.6% | 90% | 91% |
| Untreated Dd2^e | 0.8% | 87%^f | 94% |
| DSM1-treated Dd2 | 0.6% | 77%^f | 95% |
^a Treatment conditions: 1 µM DSM1 (∼10× EC50) for 12 h prior to 31 h of recovery. Untreated samples were incubated with DMSO as a solvent control for 12 h and allowed to recover for 28–30 h (Fig. 1E).
^b Parasitemia was determined by calculating the number of infected erythrocytes (SYBR Green+) compared to uninfected erythrocytes (SYBR Green-) (Supplementary Fig. S2C–G).
^c Mean % Rings value was collected from “post-recovery” plots in Supplementary Fig. S2E/G (R1 SYBR green plots).
^d Viability of parasites was determined by measuring the mitochondrial membrane potential (Mitoprobe+) and calculating % of total SYBR Green-based parasitemia (plots presented in Supplementary Fig. S2E and G, and Supplementary Fig. S4).
^e FCR3 (Africa) and Dd2 (Southeast Asia) parasite lines are from distinct geographic origins [88, 89].
^f The initial % rings were 91.8% for both untreated and treated conditions (Supplementary Fig. S2C).
Quality assessments showed effective isolation and amplification of low-input samples
We sorted ring-stage low-cell populations from each parasite group into 60 wells of a 96-well plate (FCR3, untreated Dd2, and treated Dd2), including 6 wells with 10 cells and 56 wells with 2 cells. Ten-cell wells served as positive controls for the whole genome amplification step and two-cell wells provided the optimal balance between sorting accuracy (Supplementary Fig. S3) and de novo CNV detection. Zero cells were sorted into the top 2 rows of the plate (“no-cell” wells). After parasite lysis and applying PfMALBAC version-2 whole genome amplification (Supplementary Fig. S5), we assessed amplification success using three approaches. First, we measured the resulting DNA quantity across 80% of the amplified wells (Supplementary Fig. S6). On average, MALBAC amplification in wells that contained sorted parasites yielded ∼120 ng of total DNA per reaction, with a ∼10% increase in DNA for 10- versus 2-cell samples (mean of 127 ng versus 116 ng total, respectively). We did not detect position-based bias across plates or appreciable amplification from no-cell wells, but we observed that treated Dd2 wells had ∼3-fold lower levels of amplification than other samples (mean of 51 ng versus 151 ng total, respectively). Of note, there was little difference in mean amplified DNA amounts between the two untreated sample groups (FCR3 at 146 ng and untreated Dd2 at 152 ng).
Second, we performed droplet digital PCR (ddPCR) for parasite-specific genes on approximately one-third of wells to confirm the amplification of the parasite genome. The ddPCR assays for pfmdr1 and pfhsp70 showed that wells with measurable DNA contained amplified parasite DNA (Supplementary Figs S6 and S7). Additionally, we confirmed that two-cell wells with very low total DNA amounts (Supplementary Fig. S6) were positive for parasite genomes while “no-cell” wells did not show evidence of parasite material (Supplementary Fig. S7A–C). We observed a significant correlation between total DNA and positive ddPCR droplet counts (black dotted line, P-value of 0.0004) but also identified some samples where DNA quantity was low (<50 ng) when measured using Qubit but parasite-specific signal was high (red points, all treated Dd2, Supplementary Fig. S7D). This observation indicated that total DNA quantification alone is not the best measure of amplification success; target-based methods like ddPCR better estimate the quantity of amplified parasite material.
Finally, we employed high-resolution melting analysis to profile a drug resistance marker that differs between FCR3 and Dd2 parasites [56]. By assessing the pfdhps SNP profile of amplified genomes and comparing it to the parental profile in ∼10% of samples, we confirmed that there was no evidence of cross-sample contamination during the preparation and amplification steps (Supplementary Fig. S8). Therefore, we proceeded to sequence the amplified bulk and low-input samples (Supplementary Table S2).
Coverage deviation and SNP profiles exhibited expected trends in low-input samples
We sequenced 3 bulk samples and 90 low-input samples using Illumina short-read sequencing (Table 2 and Supplementary Table S3). Overall, sequencing proceeded well as indicated by coverage and coverage deviation of the bulk samples, as well as an equivalent mean mapping quality across all samples (Table 2). Because we noticed that treated Dd2 wells had lower levels of DNA following amplification (Supplementary Fig. S6), we sequenced higher amounts of material for this condition; this choice impacted mean coverage levels where treated Dd2 samples had ∼4 times higher coverage than untreated Dd2 samples (Table 2). As expected, based on previous studies [42], coverage deviation was ∼3-fold higher in low-input samples when compared to bulk samples, reflecting the bias of the whole genome amplification step to over- or under-amplify specific genomic regions.
Table 2.
Sequencing Summary for low-input samples and paired bulk samples
| Sample type | Line/Treatment | No. of samples^b | Mean no. mapped reads per sample | Mean coverage per sample | Mean coefficient of variation (CV^c) | Mean mapping quality |
|---|---|---|---|---|---|---|
| Bulk | Untreated FCR3 | 1 | 9 966 137 | 54.9 | 59.5 | 58.2 |
| Low-input | Untreated FCR3 | 16 | 2 694 209 | 13.7 | 105.1 | 58.2 |
| Bulk | Untreated Dd2 | 1 | 5 735 107 | 33.6 | 33.3 | 58.5 |
| Low-input | Untreated Dd2 | 33 | 2 579 674 | 13.2 | 82.6 | 58.3 |
| Bulk | Treated Dd2 | 1 | 72 380 215 | 416.3 | 35.2 | 58.5 |
| Low-input | Treated Dd2^a | 36 | 9 936 903 | 53.7 | 86.5 | 58.4 |
^a Four times more material was loaded on the flow cell for the treated samples than the untreated samples due to lower initial amplification yields (Supplementary Fig. S6).
^b Excludes samples that were removed due to low coverage. For low-input samples, analysis includes both 10- and 2-cell samples.
^c CV is the coefficient of variation of normalized read abundance as in [42].
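A coefficient of variation of read abundance can be computed as in the minimal Python sketch below. The exact binning and normalization used in [42] are not described here, so the input (one normalized read count per genomic bin), the use of the sample standard deviation, and the expression of the result as a percentage are all assumptions; the function name is hypothetical.

```python
import statistics


def coefficient_of_variation(bin_counts):
    """Return the CV (%) of per-bin normalized read counts: 100 * SD / mean."""
    mean = statistics.mean(bin_counts)
    if mean == 0:
        raise ValueError("mean read abundance is zero; CV is undefined")
    return 100 * statistics.stdev(bin_counts) / mean
```

Perfectly even coverage gives a CV of 0, and the value rises as whole genome amplification over- or under-represents particular bins.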
The percentage of high-quality reads that mapped to the P. falciparum genome was high across all samples (mean of 67% of all reads). To identify any contaminating reads, we also blasted a subset of high-quality reads (%N < 50, length > 100 bp, and read q score > 30) from each sample to a database of genomes from other organisms. While the majority yielded matches to the P. falciparum genome (97.5%), a very small number of high-quality reads matched sequences from different organisms (Supplementary Fig. S9). Together, this information shows efficient amplification of the parasite genome and little contribution of environmental contamination during sample preparation. We removed five low-input samples from further analysis based on low coverage levels; on average, excluded samples had ∼7 times lower coverage than other low-input samples (Supplementary Table S3). Of the remaining samples, 18 were 10-cell samples and 57 were 2-cell samples. Although mean coverage was ∼2-fold higher for 2-cell samples due to the higher coverage of treated low-input samples, the mean normalized deviation was similar between 10- and 2-cell samples (3.5 and 3.1, respectively).
To once again check for cross-sample contamination, we tracked SNPs in the low-input samples compared to bulk samples. Despite some variation due to the non-clonal nature of parasite lines (see Materials and methods), low-input SNP profiles were similar to their corresponding bulk sample as compared using PCA (Supplementary Fig. S10A). After normalizing total SNPs to mapped reads, we detected a lower rate of SNPs in treated samples compared to untreated counterparts (P-value of 0.0001, Supplementary Fig. S10B). While we did not detect a correlation between normalized total SNPs and amplification quality (Supplementary Fig. S10C, R²: untreated Dd2, 0.01; treated Dd2, 0.01), we did observe a positive correlation between SNP number and coverage depth in treated versus untreated Dd2 samples (Supplementary Fig. S10D, R²: untreated Dd2, 0.66; treated Dd2, 0.27). This latter finding indicates that the difference in SNP numbers between sample groups is likely due to varying sensitivity at different levels of read coverage [90, 91].
Experimental and computational advances improved known CNV calls across low-input samples
For the current study, we employed two different CNV calling methods in low-input samples. HapCNV is a novel read coverage-based CNV calling method specifically designed for low-cell data from haploid genomes [44]. In contrast to traditional methods that arbitrarily select reference samples for CNV data normalization, HapCNV constructs a genomic location (or bin)-based pseudo-reference as a comparison baseline. This step systematically alleviates amplification bias for the identification of de novo CNVs. On the other hand, LUMPY is an established CNV calling method that exhibits high sensitivity due to the incorporation of multiple CNV signals (i.e. split and discordant reads) generated from short-read sequencing. It is particularly well-suited for detecting low-frequency variants; however, as for many CNV callers, high sensitivity leads to higher false positives [70, 92, 93].
Using these two distinct CNV calling methods, combined with a recently developed CNV counting approach (SVCROWS [43]), we evaluated the presence of known CNVs in our two-cell samples (Fig. 2A and B). The identification of known CNVs (i.e. those identified in the bulk sample, see Materials and methods) within low-input samples demonstrates the utility of the specific CNV calling methods for CNVs of different sizes in specific genome locations. In our previous study, we identified 2 of the 3 known CNVs in ∼10% of single cell genomes [42]. In the current study, we identified the pfmdr1 amplicon in 100% of two-cell samples using HapCNV (57/57) and 79% of samples using LUMPY (45/57). We did not identify the pf11-1 amplicon in any two-cell samples using HapCNV (0/57) but detected this locus in 75% of samples using LUMPY (43/57). Finally, we identified pf332 CNVs in 46% of two-cell samples using HapCNV (26/57) and 100% of samples using LUMPY (57/57). Although our two studies used different CNV calling methods and are not directly comparable, the overall improvement in the detection of known CNVs in the current study is likely due to advances in both the whole genome amplification method to limit amplification bias (Supplementary Table S1) and recently developed analysis approaches. When we evaluated the detection of three known CNVs specifically in Dd2 low-input samples, we observed a somewhat higher rate of known CNV detection by either method in treated samples (Fig. 2B and Supplementary Table S4; HapCNV: mean of 1.2 out of 3 total CNVs for untreated and 1.7 for treated samples, an increase of 42%; LUMPY: mean of 2.1 out of 3 total CNVs for untreated and 2.9 for treated samples, an increase of 38%).
Figure 2.
Low cell genomics displays an increase in de novo CNVs following replication stress. Number of CNVs from untreated (U-Dd2) and treated (T-Dd2) two-cell samples from two CNV analysis methods: HapCNV and LUMPY. Statistics for all plots use an unpaired two-tailed t-test with Welch’s correction (****: P value < 0.0001; ***: <0.001; **: <0.01; *: <0.05; no stars: not significant). Analyses include all reads. Line at mean value for each dataset. (A) Depiction of CNV categories used in the analysis. Known CNVs (orange) are detected in bulk samples and present in all low cell samples. De novo CNVs are not present in bulk samples and are considered common (green, ≥10%) or rare (teal, <10%) depending on their frequency across the 2-cell samples. (B) Detection of three known CNVs in two-cell samples. Known CNVs were identified in Dd2 bulk sequence (either pfmdr1, pf11-1 or pf332). 0: no known CNVs were detected in two-cell sample; 1/2/3: one/two/or three known CNVs were detected in two-cell sample (see Supplementary Table S4 for sample counts). (C) Proportion of total CNVs detected as duplications (Dup) or deletions (Del) in untreated (U) or treated (T) 2-cell Dd2 samples (P values: 0.02 for Dup and 0.03 for Del from LUMPY). (D) Detection of de novo CNVs (common and rare combined) from all reads (P values: 0.0002 for HapCNV, <0.0001 for LUMPY). (E) Detection of common CNVs from all reads (P values: 0.0004 for HapCNV, <0.0001 for LUMPY). (F) Detection of rare CNVs from all reads (P values: 0.008 for HapCNV, <0.0001 for LUMPY). (G) Proportion of total CNVs detected as rare and common from all reads; pie charts show the mean but statistics are calculated using all data points from the rare CNV category (P values: 0.003 for HapCNV, 0.009 for LUMPY). Pie chart size does not represent total de novo CNV numbers (∼12× higher for LUMPY, Supplementary Table S4). (H) Mean fold change between rare and common CNVs detected by HapCNV and LUMPY in untreated and treated Dd2 2-cell samples (see Table 3).
De novo CNVs in low-input samples consisted of rare and common CNVs
We next sought to quantify de novo CNVs in low-input samples. We defined de novo CNVs as those that are not present in the bulk sample and categorized them based on their frequency in low cell samples. As defined in other studies [44, 73], “common” CNVs were present in a larger number of genomes (≥10% of the same sample type, i.e. untreated or treated), and “rare” CNVs were those that occurred in a small proportion of samples (<10% of the sample type) (Fig. 2A). Overall, LUMPY detected more total de novo CNVs than HapCNV across all samples (∼12-fold), and the proportion of rare versus common CNVs varied depending on the method (18% versus 82% for HapCNV, 61% versus 39% for LUMPY, respectively, Supplementary Table S4). Additionally, de novo CNVs were more often identified as duplications than deletions for both CNV calling methods (Fig. 2C).
To understand the nature of common and rare CNVs, we also assessed how often their locations overlapped across the two sample types (untreated and treated, Supplementary Fig. S11). This analysis is useful for tracking common/rare category utility and relevance. For utility, this step acts as a sanity check since, by definition, we do not expect rare CNV locations to overlap as often as common CNVs. For relevance, de novo CNVs with conserved locations across sample types are less likely to represent true CNVs newly arising in a genome. As expected, we identified common CNVs with conserved locations between untreated and treated samples (22% for HapCNV and 52% for LUMPY, Supplementary Fig. S11A). The lower rate of overlapping calls across samples for HapCNV is likely due to the bin-based normalization strategy to remove amplification artifacts [44]. Conversely, we found that rare CNVs were predominantly called in unique genome locations (94% for HapCNV and 97% for LUMPY, Supplementary Fig. S11A), supporting their novel nature. This pattern was consistent when we randomly downsampled all sequencing data to the lowest read coverage prior to CNV calling (1.3 million reads, Supplementary Fig. S11B). This comparison not only highlights the suitability of the common and rare categories but also the difference between the CNV calling methods. Based on these observations, we assessed common and rare CNVs using both methods to capture the broadest view of stress effects on CNV generation.
Genome-wide de novo CNVs increased following replication stress
When we compared de novo CNVs in genomes with and without replication stress, we found that results were consistent regardless of the CNV calling method (Fig. 2). While the proportion of duplications and deletions changed somewhat with treatment (P value of 0.02 for decreased duplications in treated samples for LUMPY, Fig. 2C), we identified large differences in the number of de novo CNVs between treated and untreated two-cell samples (P value of 0.0002 for HapCNV and < 0.0001 for LUMPY, Fig. 2D). This pattern was consistent when we downsampled all sequencing data (P value of 0.001 for HapCNV and < 0.0001 for LUMPY, Supplementary Fig. S12A and B), indicating that the difference in de novo CNV counts between treatments was not due to read coverage. When we assessed common and rare CNV categories separately, we once again observed highly significant differences between treated and untreated two-cell samples in both categories using both methods (common: P value of 0.0004 for HapCNV and < 0.0001 for LUMPY, Fig. 2E; rare: P value of 0.008 for HapCNV and < 0.0001 for LUMPY, Fig. 2F).
When we compared the proportion of de novo CNVs relative to total CNVs per sample, rare CNVs were significantly increased over common CNVs (P value of 0.003 for HapCNV and 0.009 for LUMPY, Fig. 2G). This difference persisted regardless of downsampling (P value of 0.02 for HapCNV and 0.004 for LUMPY, Supplementary Fig. S12B and C) and is in line with our assessment above that rare CNVs sit in unique genome locations and are more likely to be novel in nature (Supplementary Fig. S11). Overall, we detected a ∼2–3-fold increase of de novo CNVs in treated samples, regardless of the CNV calling method (Table 3). Once again, rare CNVs displayed the largest increase following treatment (∼3–4-fold, Table 3 and Fig. 2H).
Table 3.
Mean CNV counts per two-cell sample using two analysis methods
| CNV detection method^c | Condition (two-cell only) | Rare CNVs per sample (SE)^a | Common CNVs per sample (SE)^a | Combined de novo CNVs per sample (SE)^b |
|---|---|---|---|---|
| HapCNV | Treated Dd2 | 7 (1.9) | 23 (2.1) | 15 (3.3) |
| HapCNV | Untreated Dd2 | 1.7 (0.4) | 15 (0.8) | 8 (1.0) |
| HapCNV | Fold change | 4.4 | 1.6 | 1.9 |
| LUMPY | Treated Dd2 | 297 (25) | 163 (11) | 230 (35) |
| LUMPY | Untreated Dd2 | 99 (6.5) | 68 (3.1) | 84 (9.3) |
| LUMPY | Fold change | 3.0 | 2.4 | 2.7 |
^a Includes both duplications and deletions. Values are calculated by taking the mean of CNV counts per sample within each category.
^b De novo CNV counts combined the subcategories of rare (<10% of samples) and common CNVs (≥10% of samples, absent in bulk).
^c CNV analysis performed using all reads (downsampled data presented in Supplementary Fig. S12).
High-confidence de novo CNVs across the genome share consistent breakpoint characteristics
When we compared overlaps between the HapCNV and LUMPY calls (Fig. 3A), we detected a set of CNV regions that was consistent within sample groups (5 for untreated and 38 for treated, Supplementary Table S5). The frequency of these “high-confidence” CNV regions also displayed an increase following replication stress (with knowns excluded, ∼12-fold increase in treated samples, Fig. 3B). Overall, high-confidence CNV regions represented both duplications and deletions (Fig. 3B) and were located on the majority of chromosomes (12 of 14, Fig. 3C). Of note, approximately half of these regions were identified as “rare” by both CNV calling methods across treated samples (17/36, 47%; Supplementary Table S5), indicating that novel CNVs were stimulated in parasite genomes under stress.
Figure 3.
High-confidence CNVs are located across the genome and represent diverse protein classes. (A) Comparison of CNV calls showing the number of CNV regions consistent across the two CNV calling methods. High-confidence CNV regions in untreated samples (U-Dd2, in yellow text); high-confidence CNV regions in treated samples (T-Dd2, in white text). Central number (gray): CNV regions consistent across all samples and calling methods (includes two known CNVs and two de novo CNVs, Supplementary Table S5). (B) Summary of number of high-confidence (HC) CNV regions in untreated (U-Dd2) and treated (T-Dd2) samples including duplication (Dup), deletion (Del), and mixed calls (sub-regions were called as duplications and deletions across a single CNV region by the same CNV calling method, i.e. HapCNV or LUMPY). (C) Chromosomal location of high-confidence CNV regions identified by both HapCNV and LUMPY methods (green) from untreated (black) and treated (red) parasites. Only core regions of the genome are included in the representation; subtelomeric regions as defined by Otto et al. were omitted. Each CNV region was increased in length by a factor of 2 to facilitate visualization relative to the rest of the genome. *, de novo high-confidence CNV regions identified in both untreated and treated samples. (D) Panther classification system v19 protein class comparison. Top chart: protein classes from all annotated P. falciparum genes. Bottom charts: protein classes represented by high-confidence CNV regions in untreated (U-Dd2) and treated (T-Dd2) samples. UC: unclassified proteins (green). Other colors are randomly assigned by the program to represent diverse protein classes.
We also identified proposed boundaries for the start and end of high-confidence CNV regions (termed “breakpoints”) and assessed their properties. Although CNV calling methods vary in their ability to precisely call CNV breakpoints [94], we were interested in whether these sequences shared characteristics with P. falciparum CNVs identified using conventional methods (i.e. high AT content, intergenic location, and nearby stable DNA secondary structures [40]). Overall, we found that features of breakpoints from high-confidence CNVs (12 for untreated and 66 for treated conditions, Supplementary Table S6) were consistent with what was previously reported. The majority of breakpoints (>75%) had nearby stable secondary structures predicted to form hairpins with an average free energy of folding of −7 to −8 kcal/mol (depending on the treatment, Supplementary Table S6). Most of these regions were situated in intergenic areas of the genome (56% for untreated and 75% for treated). The breakpoint regions also exhibited a higher AT-content compared to the genomic average (85%–86%, Supplementary Table S6, versus 80.6% genome-wide [41]). Based on breakpoint positions, we estimated the size of the high-confidence CNV regions; in general, this list represented relatively large regions (>4 kb) that had the potential to cover multiple genes (maximum of 115 kb, average of 23 kb, Supplementary Table S6).
High-confidence de novo CNVs represented diverse cellular pathways with potential clinical benefits
When we searched for genes that are covered by these regions, we identified 26 genes (across 3 de novo CNV regions) and 198 genes (across 37 de novo CNV regions) in untreated and treated Dd2 samples, respectively (Supplementary Table S5). Genes encompassed by the CNV regions represented diverse protein classes (Fig. 3D) and no gene ontology (GO) categories were significantly enriched in this list of genes (using an FDR adjusted P value of 0.05, Supplementary Table S7). Along with the distribution across the genome (Fig. 3C), the lack of GO term enrichment emphasizes the random nature of the de novo CNVs and the absence of selection during sample preparation.
High-confidence CNV regions encompassed genes that were previously reported as important for clinical malaria infections (Table 4). Several genes play essential roles in transmission of the parasite from the human blood to the mosquito and back. We identified CNV regions that included genes for two TRAP-related proteins, CTRP and S6, which are important for infection of the mosquito midgut and salivary glands, respectively [95, 96]. Additionally, CNV regions encompassed two PHIST genes (Pfg14-744 and Pfg14-748) that are important for early gametocyte development [97, 98]. We detected CNV regions covering three genes important for liver stage development including two genetically attenuated vaccine candidates, UIS3 and SLARP [99–102]; the latter of which achieved equivalent protection levels to whole sporozoite vaccines (PfSPZ-GA1, [103, 104]). Two CNV regions carried genes important for replication of the erythrocytic stage (SEA1 and eif4A) and another two for virulence and antigen expression (PTP6 and SURFIN 14.1). One CNV region encompassed an aminophospholipid transporter previously identified as important for artemisinin resistance in a GWAS [105]. Finally, some of the genes from high-confidence CNV regions were shown to be essential in the erythrocytic stage transposon mutagenesis screen (Pfg14-744, Pfg14-748, eif4A, and PTP6), indicating their multiple important roles across the parasite developmental cycle.
Table 4.
High-confidence CNV regions with clinical importance
| Gene ID | DUP/DEL | Chrom. | Product description | Clinical importance | Citation |
|---|---|---|---|---|---|
| PF3D7_0315200 | DUP | 3 | Circumsporozoite- and TRAP-related protein (CTRP) | Transmission | [95] |
| PF3D7_0315600 | DUP | 3 | Male development protein (MD3) | Transmission | [106] |
| PF3D7_1403800 | DUP | 14 | Nuclear formin-like protein MISFIT | Transmission | [107] |
| PF3D7_1442600 | DUP | 14 | TRAP-like protein (S6) | Transmission | [96] |
| PF3D7_1477700<sup>a</sup> | DEL | 14 | Plasmodium exported protein, PHISTa (Pfg14-748) | Transmission<sup>b</sup> | [97, 98] |
| PF3D7_1477300<sup>a</sup> | DEL | 14 | Plasmodium exported protein, PHIST (Pfg14-744) | Transmission<sup>b</sup> | [97, 98] |
| PF3D7_1474900 | DEL | 14 | Trailer hitch homolog (CITH) | Transmission<sup>d</sup> | [108] |
| PF3D7_1223700 | DEL | 12 | Vacuolar iron transporter (VIT) | Liver stage<sup>d</sup> | [109] |
| PF3D7_1147000 | DEL | 11 | Sporozoite and liver stage asparagine-rich protein (SLARP) | Liver stage | [100, 101] |
| PF3D7_1302200 | DEL | 13 | Upregulated in infectious sporozoites 3 (UIS3) | Liver stage | [99, 102] |
| PF3D7_1021800 | DUP | 10 | Schizont egress antigen-1 (SEA1) | Erythrocytic stage replication<sup>c</sup> | [110] |
| PF3D7_1468700<sup>a</sup> | DEL | 14 | Eukaryotic initiation factor 4A (eif4A, PfH45) | Erythrocytic stage replication | [111, 112] |
| PF3D7_1302000<sup>a</sup> | DEL | 13 | EMP1-trafficking protein (PTP6) | Virulence | [113] |
| PF3D7_1477600 | DEL | 14 | Surface-associated interspersed protein 14.1 (SURFIN 14.1) | Antigen<sup>e</sup> | [114] |
| PF3D7_1468600 | DEL | 14 | Aminophospholipid transporter/flippase | Drug resistance<sup>f</sup> | [105] |

<sup>a</sup>Essential in erythrocytic stage parasites based on MIS score < 0.3 ([76], as reported in PlasmoDB [77]).
<sup>b</sup>While essential in erythrocytic parasites, these genes are expressed during early gametocytogenesis and categorized as impactful for transmission to mosquitos.
<sup>c</sup>Not essential in erythrocytic stage but categorized as impactful for replication due to role in schizont egress. Antibodies to SEA1 have been detected in malaria-immune blood but the implication of this observation remains unknown [115].
<sup>d</sup>Studies performed with P. berghei orthologues of P. falciparum genes.
<sup>e</sup>The SURFIN gene family is located in subtelomeric regions of the genome and expressed on the surface of erythrocytes [116]; combined with indications of selection in an in-silico study, this gene is a likely antigen target for protective immunity.
<sup>f</sup>Although this transporter class is likely important for parasite biology (reviewed in [117]), the role of this locus in artemisinin resistance has not been validated beyond the initial GWAS.
As a final analysis, we evaluated whether genes from high-confidence CNV regions specifically from our treated samples have been detected in parasites from clinical infections. We performed this analysis by searching the largest catalogue of clinically relevant CNVs to date; this list of high-frequency CNVs was previously called using genomes from 2855 parasite isolates from 21 malaria-endemic countries and represented genes from larger (>300 bp), high-quality, core genome variants [38]. Assuming ∼5000 genes in the core P. falciparum genome [58], we found that genes from the two lists were approximately twice as likely to overlap as expected by chance (chi-square odds ratio of 2.5, Fisher’s exact P value < 0.0001 for the HapCNV-treated CNV list; odds ratio of 1.6, P value of 0.03 for the LUMPY-treated CNV list). Therefore, stress-induced de novo CNVs have the potential to be beneficial in the clinical environment.
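The overlap test above can be illustrated with a small, self-contained sketch of an odds ratio and a two-sided Fisher's exact test on a 2×2 gene table. The function names and all counts in the example are hypothetical (the underlying contingency tables are not given in this section), and the code is a generic implementation rather than the analysis code used in the study.

```python
from math import comb


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Sample odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P value.

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lowest, highest + 1))
               if p <= p_obs * (1 + 1e-9))


# Hypothetical counts out of ~5000 core genes (illustration only):
# a = genes in both lists, b = de novo CNV list only,
# c = clinical catalogue only, d = genes in neither list.
a, b, c, d = 30, 170, 370, 4430
print(f"odds ratio = {odds_ratio(a, b, c, d):.2f}")
print(f"Fisher's exact P = {fisher_exact_two_sided(a, b, c, d):.3g}")
```

An odds ratio above 1 indicates that genes in the de novo CNV list appear in the clinical catalogue more often than genes outside it; the P value indicates how surprising that overlap would be if the two lists were independent.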
Discussion
Low-input genomics for studying de novo CNVs
Non-parental or “de novo” CNVs are not detected when analyzing a population of parasites predominantly because their signal (e.g. extra reads that align to that region or reads that span breakpoints) is negated by the overwhelming signal from normal copy number at that genome location. For this reason, assessments of fewer cells are necessary to investigate de novo CNV generation. While de novo CNVs have been previously tracked using flow cytometry of single yeast genomes [23], this sensitive approach provides a limited view of CNV dynamics by focusing on few specific loci that express fluorescent reporters. Inferring gene copy number through single cell transcriptomics can identify Mb-sized structural changes across genomes from heterogeneous tumor samples [118–121] but this approach is not applicable for smaller sized de novo CNVs. Single-cell genomics, where individual genomes are isolated and amplified to a level that can be sequenced, has been used to directly quantify de novo CNV rates in brain tissue, cancer cells, and Leishmania parasites [122–129]. Early single-cell genomics studies of P. falciparum have been promising but so far have had limited success with CNV identification [42, 130, 131].
Here, we optimized a low-input analysis pipeline and successfully isolated, amplified, and sequenced P. falciparum samples for CNV analysis. With experimental and computational improvements, we were able to increase our rate of parental, or “known,” CNV calling over our prior study [42]. Importantly, we detected de novo CNVs across the parasite genome and replication stress significantly increased their rate of formation. By analyzing ∼45 low-input samples per condition, we are limited to observing a subset of the population. However, this study is one of few that have directly assessed de novo CNVs in microbes and our findings demonstrate that replication stress readily drives the rapid generation of de novo CNVs in an important pathogen. Below, we cover the rationale behind our experimental/analysis choices and integrate our findings into an overarching model of P. falciparum adaptation.
Application of stress without evidence of selection to explore CNV dynamics
To evaluate the impact of replication stress on de novo CNV generation, we applied treatment to parasites just prior to replication. We observed replication stall and then resume post-treatment, which provided evidence that we successfully applied sub-lethal stress (Fig. 1D, Supplementary Figs. S1 and S2). Following this step, we allowed the parasites to complete replication and reinvade new erythrocytes; we reasoned that this “recovery phase” enabled the repair of the resulting DNA damage, which is likely to be replication-dependent (reviewed in [1, 132]). Additionally, reinvasion facilitated the isolation of haploid parasite genomes (1n, Fig. 1A), encouraging the detection of de novo CNVs due to limited contrasting signal [26].
Because of the reinvasion step, which involves an expansion in parasite number (∼3-fold, Fig. 1D), there was potential to select for beneficial DNA changes across the population of parasites. However, we did not detect evidence of strong selection from SNP profiles (Supplementary Fig. S10A) or high-confidence CNV regions (Fig. 3 and Supplementary Table S5). Specifically, we did not observe an enrichment of CNVs that encompassed the pfdhodh gene, which contributes directly to DSM1 resistance [133], or genes from DNA-related pathways (Supplementary Table S7). While studies of numerous microbes observe CNVs arise under strong selective conditions like nutrient starvation or drug treatment (as for Saccharomyces yeast: [5, 134, 135], Salmonella bacteria: reviewed in [136], as well as Plasmodium: [133, 137–143]), we now show that mild, transient conditions can stimulate CNVs that have the potential to increase parasite survival.
Relative comparison using multiple CNV calling methods to appreciate the impact of stress
De novo CNV estimates using single-cell methods from yeast, neurons, and human cancer cells vary greatly and are difficult to standardize due to the use of different experimental techniques and CNV calling methods [23, 24, 144]. For these reasons, we are not attempting to compare the rate of P. falciparum CNV formation from this study to those from other organisms. Additionally, this lack of standardization in the field led us to use two distinct CNV calling methods in our analysis. Due to the strengths and weaknesses of HapCNV and LUMPY, we observed differences in both known (Fig. 2B) and de novo CNV calling (Supplementary Table S4). LUMPY identifies reads that cover breakpoint regions to sensitively detect CNVs [70]; because we are counting regions with few reads as support, both sensitivity and the number of false positives are high in this analysis. High known calling rates along with high numbers of de novo CNVs in our studies exemplified this feature of LUMPY. On the other hand, HapCNV uses a genome-specific pseudo-reference for normalization, which removes repeated patterns of over- and under-amplification ([44] and Materials and methods); because we require read coverage to span 3 consecutive 1 kb bins, small CNVs in lower coverage genomic neighborhoods are excluded in HapCNV analysis. This limited the detection of smaller known amplicons (pf11-1 and pf332), led to fewer de novo CNV calls using this method, and likely contributed to the larger size of high-confidence CNVs (Supplementary Table S6). Given the high abundance of small CNVs (<300 bp) in the parasite genome [38, 145], HapCNV is likely underestimating the impact of small CNVs in our study.
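To illustrate why a requirement of 3 consecutive 1 kb bins excludes small CNVs, the sketch below filters a per-bin copy number state vector for runs of sufficient length. This is a simplified illustration and not HapCNV's actual segmentation; the function name `runs_of_bins`, the state encoding (+1 gain, −1 loss, 0 neutral), and the example vector are assumptions.

```python
from itertools import groupby


def runs_of_bins(states, min_bins=3, bin_size=1000):
    """Return (start_bp, end_bp, state) for each run of at least `min_bins`
    consecutive bins sharing the same non-neutral state."""
    calls = []
    pos = 0  # index of the first bin in the current run
    for state, group in groupby(states):
        length = len(list(group))
        if state != 0 and length >= min_bins:
            calls.append((pos * bin_size, (pos + length) * bin_size, state))
        pos += length
    return calls


# Hypothetical per-bin states: a 2-bin gain, a 4-bin gain, and a 3-bin loss.
states = [0, 1, 1, 0, 1, 1, 1, 1, 0, -1, -1, -1, 0]
print(runs_of_bins(states))
```

In this toy vector the 2-bin gain is dropped while the 4-bin gain and 3-bin loss are retained, mirroring how events shorter than ~3 kb would go unreported under such a rule.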
Ultimately, the value of our study is in the relative comparison of treated and untreated samples. During sample preparation, we acknowledge that the apparent 3-fold difference in amplification efficiencies between untreated and treated sample groups (Supplementary Fig. S6) could compromise this comparison. In order to prevent cross-contamination, two-cell samples were sorted into different plates and consequently, they were amplified separately. Although staggered by just a few days (see Materials and methods), alterations in reagent performance from batch to batch could contribute to differences in amplification efficiency. Additionally, we speculate that there may have been a general under-quantification of the treated Dd2 plate; we increased the amount of each sample sequenced by 4-fold to compensate for low DNA amounts and ended up achieving ∼4-fold higher coverage levels for treated samples (Table 2). This under-quantification is supported by our ddPCR assays, where samples with small amounts of quantified DNA still yielded abundant PCR positive droplets (Supplementary Fig. S7D), indicating adequate P. falciparum genome amplification for this round. Because sequencing quality was similar between sample groups (Table 2, CV and mapping quality), downsampled data showed a consistent result with analyses using all data (Fig. 2, and Supplementary Figs S11 and S12), and treatment had a consistent impact on CNV categories using two distinct CNV calling programs (Table 3 and Supplementary Fig. S12B), we are confident that coverage differences are not dramatically impacting our overall conclusions.
During our analysis, we assumed that false positive CNVs occur at a similar rate across both treated and untreated groups (within each CNV caller and coverage level), which allowed us to confidently assess the impact of stress despite some experimental variation. Variations in known and de novo CNV calling described above emphasize that no CNV calling method is perfect and combining them can improve confidence in results [42, 146–148]. Therefore, we investigated de novo CNV patterns using the two individual methods (Fig. 2, and Supplementary Figs S11 and S12) as well as those that overlapped between HapCNV and LUMPY (Fig. 3 and Supplementary Table S5). Importantly, these “high-confidence” CNVs reflected increases after stress previously detected using the individual tools, albeit at a greater level (∼3- versus 12-fold increase). Additionally, we speculate that newly arising CNVs would have distinct locations across samples; thus, the rare nature and unique locations of high-confidence CNVs emphasized their potential to be novel (Supplementary Table S5).
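The idea of retaining only calls supported by both methods can be sketched with a simple reciprocal-overlap rule. This is a generic illustration and not the procedure used in this study; the `Cnv` record, the 50% threshold, the requirement that call types match, and the example coordinates are all assumptions.

```python
from typing import NamedTuple


class Cnv(NamedTuple):
    chrom: str
    start: int
    end: int
    kind: str  # "DUP" or "DEL"


def reciprocal_overlap(a: Cnv, b: Cnv) -> float:
    """Smaller of the two overlap fractions (0.0 if the calls do not intersect)."""
    if a.chrom != b.chrom:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def consensus_calls(calls_a, calls_b, min_overlap=0.5):
    """Calls from caller A that are supported by a same-type call from caller B."""
    return [a for a in calls_a
            if any(a.kind == b.kind and reciprocal_overlap(a, b) >= min_overlap
                   for b in calls_b)]


# Hypothetical calls from two callers (coordinates invented for illustration).
caller_a = [Cnv("chr3", 10000, 30000, "DUP"), Cnv("chr14", 5000, 9000, "DEL")]
caller_b = [Cnv("chr3", 12000, 32000, "DUP"), Cnv("chr14", 50000, 60000, "DEL")]
print(consensus_calls(caller_a, caller_b))
```

Requiring agreement between a breakpoint-based and a coverage-based caller trades sensitivity for specificity, which is consistent with the smaller number of high-confidence regions relative to either method alone.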
De novo CNV categories highlighting existing and novel genome variation
During our investigations, we identified two types of de novo CNVs: those detected in one or a few genomes (rare, <10%) and those detected in more than a few (common, >10%). While others have used these CNV categories [73], there is no precedent for them in the context of Plasmodium biology (i.e. a haploid parasite with asynchronous replication and schizogony [149]). However, we propose that tracking these categories helps us to understand the biological relevance of de novo CNVs in our analysis.
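The frequency-based categories can be expressed as a short sketch. The function name is hypothetical, and treating a frequency of exactly 10% as "common" is an assumption because the text defines only the <10% and >10% cases; the sample counts in the example are invented, with 45 chosen to match the approximate number of low-input samples per condition.

```python
def classify_by_frequency(samples_with_cnv: int, total_samples: int,
                          threshold: float = 0.10) -> str:
    """Label a CNV region 'rare' or 'common' by the fraction of samples carrying it."""
    if total_samples <= 0:
        raise ValueError("total_samples must be positive")
    if not 0 <= samples_with_cnv <= total_samples:
        raise ValueError("samples_with_cnv must be between 0 and total_samples")
    frequency = samples_with_cnv / total_samples
    # Assumption: a frequency exactly at the threshold is treated as common.
    return "rare" if frequency < threshold else "common"


# Hypothetical example with 45 low-input samples per condition.
for carriers in (2, 9):
    print(carriers, classify_by_frequency(carriers, 45))
```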
Based on their frequency, common CNVs are either artifacts of low-input procedures/CNV analysis or represent minor variants that preexisted in the population or arose early in the replication cycle. For the former, bias during the whole genome amplification step (i.e. the repeated pattern of over/under-amplification that occurs in a reproducible pattern across the parasite genome [42]) and PCR during library construction have the potential to skew gene copy number and increase the false positive rate [150, 151]. However, we chose experimental and computational methods designed to limit the contribution of amplification bias. First, MALBAC amplification itself limits the over-amplification of certain genomic regions by avoiding exponential amplification at the earliest steps [150] and we used limited PCR cycles during library preparation (3 cycles, [42]). These efforts are most clearly shown through the reduction in CV following MALBAC optimization in both of our studies (by ∼39% after modifying the amplification primer [42] and by 43% after switching to the Bsu polymerase, Supplementary Table S1). Second, LUMPY is not dependent on read coverage and HapCNV specifically addresses amplification artifacts by removing repeated signal present in all samples [44, 70]. Overall, we detected few CNVs with conserved genomic locations across low-input samples, which provides evidence that our methods limit the effect of amplification bias on the results; we only identified two high-confidence CNV regions that had conserved locations across multiple Dd2 and FCR3 2-cell samples (Fig. 3C and Supplementary Table S5). In the future, amplification-free methods [152], or visualization of single long-reads [49, 153], may offer advantages in distinguishing amplification bias from minor variants and de novo CNVs.
Rare CNVs, on the other hand, represent either random noise or true signal from novel CNVs arising in the genome. We assert that most noise is removed through normalization procedures, especially with HapCNV, and the impact of remaining false positives is minimized by the relative comparison of our studies (see above). We identified the majority of rare CNVs in unique genome locations across sample types, providing evidence that they are not a result of amplification bias where the same CNVs are repeatedly detected in each sample. Additionally, the greater impact of stress on rare CNVs than common CNVs (Table 3 and Fig. 2F and 2G) supports their replication dependence. The random nature of de novo CNVs, as well as the capacity to encompass any gene across the genome, ensures that CNVs can alter all aspects of parasite biology in response to the host environment. Further, our finding that stress-induced de novo CNVs tended to exhibit altered copy number in clinical isolates, combined with the high frequency of unique CNVs in previous genome-wide CNV studies [32, 38], directly illustrates the expansive evolutionary potential of this pathogen.
Adaptations that encourage de novo CNV formation
The current model of CNV formation in asexual erythrocytic P. falciparum parasites is that AT-rich sequences form hairpins, disrupt replication, and eventually lead to double-strand breaks that are repaired by error-prone pathways [40]. The evolution of CNVs in this organism is especially interesting because of its unique genome architecture and alternative repertoire of CNV-generating repair pathways [41, 154]. Although they arise at many locations across the genome ([30, 31, 38] and Fig. 3C), P. falciparum CNVs that contribute to adaptation are commonly gene duplications with a relatively simple structure. Many impactful duplications form in tandem head-to-tail orientation ([40, 133, 143, 155], Fig. 4A), which is likely due to a limited repertoire of DNA repair pathways; P. falciparum lacks the canonical nonhomologous end-joining (NHEJ) pathway that is a major contributor to CNV formation in other organisms [1, 154]. Instead, parasites use pathways that employ varying lengths of sequence homology (i.e. homologous recombination, or HR, and microhomology-mediated repair, Fig. 4B). This repair repertoire, along with an especially high AT-content genome that facilitates CNV formation [40, 133], and a lack of cell cycle checkpoints that control replication forks during times of stress (reviewed in [156]), likely represent adaptations that assist haploid P. falciparum parasites in accumulating CNVs across their genome (Fig. 4A). One major question that remains is whether these same processes are active in other rapidly replicating parasite stages such as oocysts in the mosquito midgut or schizonts in the human liver.
Figure 4.
Potential connection between replication stress, DNA repair, and CNV generation in the malaria genome. (A) Adaptations that encourage CNV formation in the P. falciparum genome (underlined, [40]). (B) Proposed model of how replication stress can impact DNA repair pathways based on prior studies (HR, homologous recombination; NHEJ, nonhomologous end-joining; MMEJ, microhomology-mediated end joining; MMBIR, microhomology-mediated break-induced repair; DSB, double-strand breaks). Model is suggested based on studies showing (i) stress can lead to a reduction of RAD51, reduced HR activity, and increased DSBs [157, 158], (ii) stress can trigger pathways that use microhomology sequences for DNA repair [159, 160], and (iii) P. falciparum uses microhomologous sequences to generate CNVs [40]. (C) Potential benefits of a diverse parasite population for evolutionary potential. Stress elevates the frequency of de novo CNVs across the population, which leads to more rapid evolution of beneficial CNVs (blue cells).
Updating the model of P. falciparum genome adaptation
In conjunction with previous studies in other organisms, our findings in P. falciparum support the connection between replication stress, DNA repair, and CNV generation (Fig. 4B). Studies from bacteria to cancer cells have shown that stress can either alter levels of proteins essential for HR-based repair or increase the frequency of DNA breaks [160]. In cancer cells, hypoxic stress causes a decrease in HR activity [157] and this may lead to an increased reliance on alternative error-prone pathways to repair DNA damage [161]. In bacteria, starvation drives initial amplification steps at microhomologous sequences [159, 162]. The nature, degree, and timing of applied stress likely matter because studies also show the dose-dependent induction of HR proteins under stress conditions [163]. Studying these processes in P. falciparum under various conditions will be particularly important due to the lack of NHEJ repair (see Adaptations that encourage de novo CNV formation); when NHEJ is deficient in cancer cells, DNA repair becomes more error-prone and contributes to CNV formation [164, 165].
Prior to the current study, the predominant evidence connecting DNA repair and CNV generation in P. falciparum was the detection of microhomology-mediated pathway signatures in CNV breakpoints [40]. Microhomology-mediated pathways require less homology and therefore, are more likely to interact with diverse sequences up- and downstream of a DNA break to generate various length CNVs. Our current observation of stress-induced de novo CNV formation (Table 3 and Fig. 2H) further supports this model. We originally set out to understand how sub-lethal antimalarial treatment impacts CNV generation because this is a condition that the parasite encounters during clinical infection [166]. However, because this compound targets pyrimidine biosynthesis and its impacts mirror aphidicolin treatment (Supplementary Fig. S1), we speculate that the effects we observed are likely due to replication inhibition. Our result is consistent with studies on diverse organisms, like humans, mice, and flies, where replication stress leads to CNV formation [20–22, 167, 168]. Interestingly, the degree of de novo CNV stimulation is also consistent across these studies; we detected a ∼3-fold increase in P. falciparum de novo CNVs (Table 3) compared to a ∼3–5-fold increase in mammalian de novo CNVs [20–22]. In light of this prior work, our study highlights that, despite the smaller genome and divergent biology of Plasmodium, as well as the use of different experimental approaches, there are likely conserved processes that increase genetic diversity under stress. Future studies will address gaps in knowledge including assessment of conditions that stimulate the P. falciparum adaptive amplification response, the timing of this response, and the requirement and regulation of specific mechanistic players.
Support for adaptive amplification in microbial populations
The experimental evolution approach to evaluate the impact of selective conditions on microbial populations across relatively long time scales has provided many insights that include the important role of gene amplification in genomic diversity (e.g. [5, 135]). In bacteria, the term “adaptive amplification” was established to describe how environmental conditions can induce genetic change to help cells adapt to stress [162]. De novo CNVs have been directly investigated in bacteria and Leishmania parasites using single read assessments and single-cell isolation, respectively [129, 153]; to our knowledge, prior studies have not assessed the effect of nonselective short-term conditions on de novo CNV rates in microbes. By continuing to improve genome amplification and CNV detection methods, we anticipate that future studies of genome dynamics will lead to new insight on how stress stimulates microbial evolution.
Already, studies from a variety of organisms are showing that an increased rate of CNV formation can have an impact on a population of organisms; if beneficial under specific conditions, a CNV that arises in a single genome can expand during selection into a larger population of cells with novel characteristics [136, 169]. This rapid expansion is exemplified when minor bacterial populations with higher gene copy numbers confer “heteroresistance” during clinical antibiotic selection [153, 170–172]. In another example, higher levels of intra-tumor heterogeneity in gene copy number predict a poorer cancer prognosis [128, 173, 174].
For Plasmodium, even with a change in the copy number of a single region per parasite, the genomic diversity within a single infected human is expansive due to the sheer numbers of parasites (estimated to reach 10⁸ parasites when symptomatic and >10¹¹ in severe P. falciparum infection [175]). This diversity can provide an obvious advantage as a heterogeneous population prepares asexual parasites to respond to diverse stressors (Fig. 4C). While our current studies point to a role of adaptive amplification in P. falciparum, we do note a low level of de novo CNVs across the parasite genome under normal conditions (Fig. 2D and Supplementary Fig. S12A). Since some antimalarials act rapidly [82], some level of de novo CNVs already present in a few parasites across the population would increase the chances of survival during drug treatment. It is also important to understand how parasites respond to stressful environments during infection, including changes in nutrient composition in different hosts, drug treatment during symptomatic infection, or attack from the human immune system. While the current study focused on one antimalarial compound, it will be important for future studies to evaluate the impact of other sources of stress on P. falciparum CNV formation. As mentioned above, hypoxia stimulates CNV formation in cancer cells [157] and a proteotoxic drug stimulates genetic change in yeast [176]. As a eukaryotic microbe, P. falciparum occupies a special niche that can be exploited as a model to study eukaryotic mechanisms of stress-induced genome dynamics.
Clinical implications and future questions
P. falciparum causes the majority of malaria deaths worldwide and readily acquires antimalarial resistance [177, 178]. Resistance-conferring CNVs that encompass multiple genes have been identified in clinical infections [30, 38, 179–183]. While a gene from one of our high-confidence CNVs may contribute to artemisinin resistance, we also identified genes that participate in many processes important for malaria transmission and infection (Table 4). This observation combined with the diverse genome location and protein classes represented (Fig. 3) illustrates the enormous adaptive potential of the P. falciparum genome.
Despite their direct contributions to various phenotypes, CNVs may also facilitate the acquisition of point mutations in haploid P. falciparum; strong evidence for the close relationship comes from the observation of point mutations within amplifications selected in vitro [138, 140, 143, 184, 185]. Once de novo CNVs form during replication of the asexual erythrocytic stage (Fig. 4), meiotic recombination during the sexual phase in the mosquito can streamline beneficial CNVs to balance fitness costs [68]. Given the importance of CNVs in P. falciparum adaptation, it is not surprising that this organism has evolved strategies to encourage CNV formation (as described in Adaptations that encourage de novo CNV formation). Additionally, parasites from specific regions of the world may have an increased propensity to develop drug resistance [186]. Evaluating whether the CNV rate correlates with parasite background will help to define the evolutionary potential of this successful pathogen.
Antimalarial therapies and vaccines targeting the Plasmodium parasite are in danger due to drug resistance and access challenges [187, 188]. A strategy to impede genome evolution may be required to control malaria infections. “Evolution-proof” therapies have been explored in response to antibiotic and anticancer resistance (reviewed in [189, 190]). Of particular note is a recent study targeting NHEJ repair to prevent resistance in melanoma cell lines [191]. While antimalarial combination therapies were originally suggested to limit recrudescence following artemisinin-based drug treatment [192], the parasite has gained CNV-based strategies to overcome partner drugs ([182, 183] and reviewed in [193]). Therefore, identifying and targeting unique aspects of Plasmodium biology that facilitate CNV formation (e.g. DNA repair, AT-dependent mechanisms, or replication control) may limit parasite evolution and increase treatment efficacy.
Acknowledgements
We would like to thank Dr. John Campbell for the use of the Mosquito LV instrument and assay reagents for the colorimetric assessment of the SH800. We would also like to thank Dr. Ali Guler for statistical consultation and Dr. Heidi Seears from the University of Virginia Department of Biology Genomics Core Facility for sequencing assistance. We also thank the Vector and Eukaryotic Pathogen Genomics Database Resource (VEuPathDB) and specifically PlasmoDB (https://plasmodb.org) for hosting data used in some aspects of this study.
Author contributions: Noah Brown (Formal analysis [lead], Methodology [lead], Validation [lead], Visualization [supporting], Writing—original draft [supporting], Writing—review & editing [supporting]), Aleksander Luniewski (Data curation [lead], Investigation [lead], Visualization [supporting], Writing—review & editing [supporting]), Xuanxuan Yu (Formal analysis [supporting], Investigation [supporting], Methodology [supporting], Writing—review & editing [supporting]), Michelle Warthan (Investigation [lead], Writing—review & editing [supporting]), Shiwei Liu (Conceptualization [equal], Resources [equal], Writing—review & editing [supporting]), Julia Zulawinska (Formal analysis [supporting], Investigation [supporting], Methodology [supporting], Writing—review & editing [supporting]), Syed Basil Ahmad (Methodology [supporting], Visualization [supporting]), Nadia Prasad (Data curation [supporting], Visualization [supporting], Writing—review & editing [supporting]), Molly Congdon (Resources [supporting], Writing—review & editing [supporting]), Webster Santos (Resources [lead], Writing—review & editing [supporting]), Feifei Xiao (Funding acquisition [supporting], Methodology [supporting], Writing—review & editing [supporting]), and Jennifer Guler (Conceptualization [lead], Formal analysis [supporting], Funding acquisition [lead], Methodology [supporting], Project administration [lead], Resources [lead], Supervision [lead], Visualization [supporting], Writing—original draft [lead], Writing—review & editing [lead])
Notes
Present address: Indiana University School of Medicine, Indianapolis, IN, 46202, United States
Contributor Information
Noah Brown, Department of Biology, University of Virginia, Charlottesville, VA 22903, United States.
Aleksander Luniewski, Department of Biology, University of Virginia, Charlottesville, VA 22903, United States.
Xuanxuan Yu, Department of Biostatistics, University of Florida, College of Public Health and Health Professions, Gainesville, FL 32603, United States; Department of Surgery, College of Medicine, University of Florida, Gainesville, FL 32610, United States.
Michelle Warthan, Department of Biology, University of Virginia, Charlottesville, VA 22903, United States.
Shiwei Liu, Department of Biology, University of Virginia, Charlottesville, VA 22903, United States.
Julia Zulawinska, Department of Biology, University of Virginia, Charlottesville, VA 22903, United States.
Syed Ahmad, Department of Biology, University of Virginia, Charlottesville, VA 22903, United States.
Nadia Prasad, Department of Biology, University of Virginia, Charlottesville, VA 22903, United States.
Molly Congdon, Department of Chemistry, Virginia Tech, Blacksburg, VA 24061, United States.
Webster Santos, Department of Chemistry, Virginia Tech, Blacksburg, VA 24061, United States.
Feifei Xiao, Department of Biostatistics, University of Florida, College of Public Health and Health Professions, Gainesville, FL 32603, United States.
Jennifer L Guler, Department of Biology, University of Virginia, Charlottesville, VA 22903, United States.
Supplementary data
Supplementary data is available at NAR online.
Conflict of interest
None declared.
Funding
This work was supported by the National Institutes of Health (1R01AI150856 to J.L.G.) and the National Science Foundation (NRT-ROL 2021791 to N.J.B.). Funding to pay the Open Access publication charges for this article was provided by the NIH grant.
Data availability
The source code for the BLAST pipeline and SVCROWS can be accessed at https://doi.org/10.6084/m9.figshare.29885879 and https://doi.org/10.6084/m9.figshare.29099801.v1. Short-read data are available at the NCBI Sequence Read Archive under project PRJNA1201106.
This paper is linked to: https://doi.org/10.1093/nar/gkaf1440.
References
- 1. Hastings PJ, Lupski JR, Rosenberg SM et al. Mechanisms of change in gene copy number. Nat Rev Genet. 2009;10:551–64. 10.1038/nrg2593. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2. Lauer S, Gresham D. An evolving view of copy number variants. Curr Genet. 2019;65:1287–95. 10.1007/s00294-019-00980-0. [DOI] [PubMed] [Google Scholar]
- 3. Pos O, Radvanszky J, Buglyo G et al. DNA copy number variation: main characteristics, evolutionary significance, and pathological aspects. Biomedical J. 2021;44:548–59. 10.1016/j.bj.2021.02.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4. Roth JR. The joys and terrors of fast adaptation: new findings elucidate antibiotic resistance and natural selection. Mol Microbiol. 2011;79:279–82. 10.1111/j.1365-2958.2010.07459.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5. Payen C, Di Rienzi SC, Ong GT et al. The dynamics of diverse segmental amplifications in populations of Saccharomyces cerevisiae adapting to strong selection. G3. 2014;4:399–409. 10.1534/g3.113.009365. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6. Elde NC, Child SJ, Eickbush MT et al. Poxviruses deploy genomic accordions to adapt rapidly against host antiviral defenses. Cell. 2012;150:831–41. 10.1016/j.cell.2012.05.049. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7. Leary RJ, Lin JC, Cummins J et al. Integrated analysis of homozygous deletions, focal amplifications, and sequence alterations in breast and colorectal cancers. Proc Natl Acad Sci USA. 2008;105:16224–9. 10.1073/pnas.0808041105. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8. Beroukhim R, Mermel CH, Porter D et al. The landscape of somatic copy-number alteration across human cancers. Nature. 2010;463:899–905. 10.1038/nature08822. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9. Zhang L, Yuan Y, Lu KH et al. Identification of recurrent focal copy number variations and their putative targeted driver genes in ovarian cancer. BMC Bioinf. 2016;17:222. 10.1186/s12859-016-1085-7. [DOI] [Google Scholar]
- 10. Peng H, Lu L, Zhou Z et al. CNV detection from circulating tumor DNA in late stage non-small cell lung cancer patients. Genes. 2019;10:926. 10.3390/genes10110926. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11. Gonzalez E, Kulkarni H, Bolivar H et al. The influence of CCL3L1 gene-containing segmental duplications on HIV-1/AIDS susceptibility. Science. 2005;307:1434–40. 10.1126/science.1101160. [DOI] [PubMed] [Google Scholar]
- 12. Sebat J, Lakshmi B, Malhotra D et al. Strong association of de novo copy number mutations with autism. Science. 2007;316:445–9. 10.1126/science.1138659. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13. Glessner JT, Bradfield JP, Wang K et al. A genome-wide study reveals copy number variants exclusive to childhood obesity cases. Am Hum Genet. 2010;87:661–6. 10.1016/j.ajhg.2010.09.014. [DOI] [Google Scholar]
- 14. Olsson LM, Holmdahl R. Copy number variation in autoimmunity–importance hidden in complexity?. Eur J Immunol. 2012;42:1969–76. 10.1002/eji.201242601. [DOI] [PubMed] [Google Scholar]
- 15. Butchbach ME. Copy number variations in the survival motor neuron genes: implications for spinal muscular atrophy and other neurodegenerative diseases. Front Mol Biosci. 2016;3:7. 10.3389/fmolb.2016.00007. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16. Sekar A, Bialas AR, de Rivera H et al. Schizophrenia risk from complex variation of complement component 4. Nature. 2016;530:177–83. 10.1038/nature16549. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17. Zekavat SM, Ruotsalainen S, Handsaker RE et al. Deep coverage whole genome sequences and plasma lipoprotein(a) in individuals of European and African ancestries. Nat Commun. 2018;9:2606. 10.1038/s41467-018-04668-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18. Farashi S, Harteveld CL. Molecular basis of alpha-thalassemia. Blood Cells Mol Dis. 2018;70:43–53. 10.1016/j.bcmd.2017.09.004. [DOI] [PubMed] [Google Scholar]
- 19. Wisniowiecka-Kowalnik B, Nowakowska BA. Genetics and epigenetics of autism spectrum disorder-current evidence in the field. J Appl Genetics. 2019;60:37–47. 10.1007/s13353-018-00480-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20. Arlt MF, Mulle JG, Schaibley VM et al. Replication stress induces genome-wide copy number changes in human cells that resemble polymorphic and pathogenic variants. Am Hum Genet. 2009;84:339–50. 10.1016/j.ajhg.2009.01.024. [DOI] [Google Scholar]
- 21. Arlt MF, Ozdemir AC, Birkeland SR et al. Hydroxyurea induces de novo copy number variants in human cells. Proc Natl Acad Sci USA. 2011;108:17360–5. 10.1073/pnas.1109272108. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22. Arlt MF, Rajendran S, Birkeland SR et al. De novo CNV formation in mouse embryonic stem cells occurs in the absence of Xrcc4-dependent nonhomologous end joining. PLoS Genet. 2012;8:e1002981. 10.1371/journal.pgen.1002981. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23. Lauer S, Avecilla G, Spealman P et al. Single-cell copy number variant detection reveals the dynamics and diversity of adaptation. PLoS Biol. 2018;16:e3000069. 10.1371/journal.pbio.3000069. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24. Oketch DJA, Giulietti M, Piva F. Copy number variations in pancreatic cancer: from biological significance to clinical utility. Int J Mol Sci. 2023;25:391. 10.3390/ijms25010391. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25. Yi D, Nam JW, Jeong H. Toward the functional interpretation of somatic structural variations: bulk- and single-cell approaches. Brief Bioinform. 2023;24:297. 10.1093/bib/bbad297. [DOI] [Google Scholar]
- 26. Macaulay IC, Voet T. Single cell genomics: advances and future perspectives. PLoS Genet. 2014;10:e1004126. 10.1371/journal.pgen.1004126. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27. Wang Y, Navin NE. Advances and applications of single-cell sequencing technologies. Mol Cell. 2015;58:598–609. 10.1016/j.molcel.2015.05.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28. Gawad C, Koh W, Quake SR. Single-cell genome sequencing: current state of the science. Nat Rev Genet. 2016;17:175–88. 10.1038/nrg.2015.16. [DOI] [PubMed] [Google Scholar]
- 29. Wang X, Chen H, Zhang NR. DNA copy number profiling using single-cell sequencing. Brief Bioinform. 2018;19:731–6. 10.1093/bib/bbx004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30. Kidgell C, Volkman SK, Daily J et al. A systematic map of genetic variation in Plasmodium falciparum. PLoS Pathog. 2006;2:e57. 10.1371/journal.ppat.0020057. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31. Ribacke U, Mok BW, Wirta V et al. Genome wide gene amplifications and deletions in Plasmodium falciparum. Mol Biochem Parasitol. 2007;155:33–44. 10.1016/j.molbiopara.2007.05.005. [DOI] [PubMed] [Google Scholar]
- 32. Cheeseman I, Gomez-Escobar N, Carret C et al. Gene copy number variation throughout the Plasmodium falciparum genome. Bmc Genomics [Electronic Resource]. 2009;10:353. 10.1186/1471-2164-10-353. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33. Bopp SE, Manary MJ, Bright AT et al. Mitotic evolution of Plasmodium falciparum shows a stable core genome but recombination in antigen families. PLoS Genet. 2013;9:e1003293. 10.1371/journal.pgen.1003293. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34. Gendrot M, Fawaz R, Dormoi J et al. Genetic diversity and deletion of Plasmodium falciparum histidine-rich protein 2 and 3: a threat to diagnosis of P. falciparum malaria. Clin Microbiol Infect. 2019;25:580–5. 10.1016/j.cmi.2018.09.009. [DOI] [PubMed] [Google Scholar]
- 35. Matesanz F, Tellez M, Alcina A. The Plasmodium falciparum fatty acyl-CoA synthetase family (PfACS) and differential stage-specific expression in infected erythrocytes. Mol Biochem Parasitol. 2003;126:109–12. 10.1016/S0166-6851(02)00242-6. [DOI] [PubMed] [Google Scholar]
- 36. Dankwa S, Lim C, Bei AK et al. Ancient human sialic acid variant restricts an emerging zoonotic malaria parasite. Nat Commun. 2016;7:11187. 10.1038/ncomms11187. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37. Price RN, Uhlemann A-C, Brockman A et al. Mefloquine resistance in Plasmodium falciparum and increased pfmdr1 gene copy number. The Lancet. 2004;364:438–47. 10.1016/S0140-6736(04)16767-6. [DOI] [Google Scholar]
- 38. Ravenhall M, Benavente ED, Sutherland CJ et al. An analysis of large structural variation in global Plasmodium falciparum isolates identifies a novel duplication of the chloroquine resistance associated gene. Sci Rep. 2019;9:8287. 10.1038/s41598-019-44599-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39. Heinberg A, Siu E, Stern C et al. Direct evidence for the adaptive role of copy number variation on antifolate susceptibility in Plasmodium falciparum. Mol Microbiol. 2013;88:702–12. 10.1111/mmi.12162. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40. Huckaby AC, Granum CS, Carey MA et al. Complex DNA structures trigger copy number variation across the Plasmodium falciparum genome. Nucleic Acids Res. 2019;47:1615–27. 10.1093/nar/gky1268. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41. Gardner MJ, Hall N, Fung E et al. Genome sequence of the human malaria parasite Plasmodium falciparum. Nature. 2002;419:498–511. 10.1038/nature01097. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42. Liu S, Huckaby AC, Brown AC et al. Single-cell sequencing of the small and AT-skewed genome of malaria parasites. Genome Med. 2021;13:75. 10.1186/s13073-021-00889-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43. Brown N, Danis C, Guler JL et al. SVCROWS: a user-defined tool for interpreting significant structural variants in heterogeneous datasets. Nucleic Acids Res. 2026; 10.1093/nar/gkaf1440. [DOI] [Google Scholar]
- 44. Yu X, Qin F, Liu S et al. HapCNV: a comprehensive framework for CNV detection in Low-input DNA sequencing data. bioRxiv, 10.1101/2024.12.19.629494, 7 January 2025, preprint: not peer reviewed.. [DOI] [Google Scholar]
- 45. Galhardo RS, Hastings PJ, Rosenberg SM. Mutation as a stress response and the regulation of evolvability. Crit Rev Biochem Mol Biol. 2007;42:399–435. 10.1080/10409230701648502. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46. Arlt MF, Wilson TE, Glover TW. Replication stress and mechanisms of CNV formation. Curr Opin Genet Dev. 2012;22:204–10. 10.1016/j.gde.2012.01.009. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47. Glover TW, Wilson TE, Arlt MF. Fragile sites in cancer: more than meets the eye. Nat Rev Cancer. 2017;17:489–501. 10.1038/nrc.2017.52. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48. McDaniels JM, Huckaby AC, Carter SA et al. Extrachromosomal DNA amplicons in antimalarial-resistant Plasmodium falciparum. Mol Microbiol. 2021;115:574–90. 10.1111/mmi.14624. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49. Liu S, Zulawinska J, Ebel ER et al. Direct long-read visualization reveals hidden variation in GCH1 gene copy number and precise expansion steps. Bmc Genomics. 2025;26:671. 10.1186/s12864-025-11859-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50. Brown AC, Moore CC, Guler JL. Cholesterol-dependent enrichment of understudied erythrocytic stages of human Plasmodium parasites. Sci Rep. 2020;10:4591. 10.1038/s41598-020-61392-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51. Rodrigues OR, Monard S. A rapid method to verify single-cell deposition setup for cell sorters. Cytometry Pt A. 2016;89:594–600. 10.1002/cyto.a.22865. [DOI] [Google Scholar]
- 52. Nam D, Kim S, Kim JH et al. Low-temperature loop-mediated isothermal amplification operating at physiological temperature. Biosensors (Basel). 2023;13:367. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53. Brown N, da Silva C, Webb C et al. Antimalarial resistance risk in Mozambique detected by a novel quadruplex droplet digital PCR assay. Antimicrob Agents Chemother. 2024;68:e0034624. 10.1128/aac.00346-24. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54. Daniels R, Ndiaye D, Wall M et al. Rapid, field-deployable method for genotyping and discovery of single-nucleotide polymorphisms associated with drug resistance in Plasmodium falciparum. Antimicrob Agents Chemother. 2012;56:2976–86. 10.1128/AAC.05737-11. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 55. Kassaza K, Long AC, McDaniels JM et al. Surveillance of Plasmodium falciparum pfcrt haplotypes in southwestern uganda by high-resolution melt analysis. Malar J. 2021;20:114. 10.1186/s12936-021-03657-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56. Alker AP, Mwapasa V, Meshnick SR. Rapid real-time PCR genotyping of mutations associated with sulfadoxine-pyrimethamine resistance in Plasmodium falciparum. Antimicrob Agents Chemother. 2004;48:2924–9. 10.1128/AAC.48.8.2924-2929.2004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 57. Chiang C, Layer RM, Faust GG et al. SpeedSeq: ultra-fast personal genome analysis and interpretation. Nat Methods. 2015;12:966–8. 10.1038/nmeth.3505. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58. Otto TD, Bohme U, Sanders M et al. Long read assemblies of geographically dispersed Plasmodium falciparum isolates reveal highly structured subtelomeres. Wellcome Open Res. 2018;3:52. 10.12688/wellcomeopenres.14571.1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59. Li H, Handsaker B, Wysoker A et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–9. 10.1093/bioinformatics/btp352. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60. Garcia-Alcalde F, Okonechnikov K, Carbonell J et al. Qualimap: evaluating next-generation sequencing alignment data. Bioinformatics. 2012;28:2678–9. 10.1093/bioinformatics/bts503. [DOI] [PubMed] [Google Scholar]
- 61. Huang L, Ma F, Chapman A et al. Single-cell whole-genome amplification and sequencing: methodology and applications. Annu Rev Genom Hum Genet. 2015;16:79–102. 10.1146/annurev-genom-090413-025352. [DOI] [Google Scholar]
- 62. Chen C, Xing D, Tan L et al. Single-cell whole-genome analyses by Linear Amplification via Transposon Insertion (LIANTI). Science. 2017;356:189–94. 10.1126/science.aak9787. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 63. Altschul SF, Gish W, Miller W et al. Basic local alignment search tool. J Mol Biol. 1990;215:403–10. 10.1016/S0022-2836(05)80360-2. [DOI] [PubMed] [Google Scholar]
- 64. McKenna A, Hanna M, Banks E et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20:1297–303. 10.1101/gr.107524.110. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65. DePristo MA, Banks E, Poplin R et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011;43:491–8. 10.1038/ng.806. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66. Van der Auwera GA, Carneiro MO, Hartl C et al. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr Protoc Bioinformatics. 2013;43:1110. 10.1002/0471250953.bi1110s43. [DOI] [Google Scholar]
- 67. MalariaGen, Ahouidi A, Ali M, Almagro-Garcia J et al. An open dataset of Plasmodium falciparum genome variation in 7,000 worldwide samples. Wellcome Open Res. 2021;6:42. 10.12688/wellcomeopenres.16168.1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 68. Miles A, Iqbal Z, Vauterin P et al. Indels, structural variation, and recombination drive genomic diversity in Plasmodium falciparum. Genome Res. 2016;26:1288–99. 10.1101/gr.203711.115. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 69. Abyzov A, Urban AE, Snyder M et al. CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. Genome Res. 2011;21:974–84. 10.1101/gr.114876.110. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 70. Layer RM, Chiang C, Quinlan AR et al. LUMPY: a probabilistic framework for structural variant discovery. Genome Biol. 2014;15:R84. 10.1186/gb-2014-15-6-r84. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 71. Venkatraman ES, Olshen AB. A faster circular binary segmentation algorithm for the analysis of array CGH data. Bioinformatics. 2007;23:657–63. 10.1093/bioinformatics/btl646. [DOI] [PubMed] [Google Scholar]
- 72. Xiao F, Niu Y, Hao N et al. modSaRa: a computationally efficient R package for CNV identification. Bioinformatics. 2017;33:2384–5. 10.1093/bioinformatics/btx212. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 73. Jiang Y, Wang R, Urrutia E et al. CODEX2: full-spectrum copy number variation detection by high-throughput DNA sequencing. Genome Biol. 2018;19:202. 10.1186/s13059-018-1578-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 74. Mi H, Ebert D, Muruganujan A et al. PANTHER version 16: a revised family classification, tree-based classification tool, enhancer regions and extensive API. Nucleic Acids Res. 2021;49:D394–403. 10.1093/nar/gkaa1106. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 75. Singh G, Gupta D. In-silico functional annotation of Plasmodium falciparum hypothetical proteins to identify novel drug targets. Front Genet. 2022;13:821516. 10.3389/fgene.2022.821516. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 76. Zhang M, Wang C, Otto TD et al. Uncovering the essential genes of the human malaria parasite Plasmodium falciparum by saturation mutagenesis. Science. 2018;360:eaap7847. 10.1126/science.aap7847. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 77. Aurrecoechea C, Brestelli J, Brunk BP et al. PlasmoDB: a functional genomic database for malaria parasites. Nucleic Acids Res. 2009;37:D539–43. 10.1093/nar/gkn814. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 78. Reilly HB, Wang H, Steuter JA et al. Quantitative dissection of clone-specific growth rates in cultured malaria parasites. Int J Parasitol. 2007;37:1599–607. 10.1016/j.ijpara.2007.05.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 79. Reilly Ayala HB, Wacker MA, Siwo G et al. Quantitative trait loci mapping reveals candidate pathways regulating cell cycle duration in Plasmodium falciparum. Bmc Genomics [Electronic Resource]. 2010;11:577. 10.1186/1471-2164-11-577. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 80. Voss Y, Klaus S, Guizetti J et al. Plasmodium schizogony, a chronology of the parasite’s cell cycle in the blood stage. PLoS Pathog. 2023;19:e1011157. 10.1371/journal.ppat.1011157. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 81. Phillips MA, Gujjar R, Malmquist NA et al. Triazolopyrimidine-based dihydroorotate dehydrogenase inhibitors with potent and selective activity against the malaria parasite Plasmodium falciparum. J Med Chem. 2008;51:3649–53. 10.1021/jm8001026. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 82. Sanz LM, Crespo B, De-Cozar C et al. P. falciparum in vitro killing rates allow to discriminate between different antimalarial mode-of-action. PLoS One. 2012;7:e30949. 10.1371/journal.pone.0030949. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 83. Inselburg J, Banyal HS. Synthesis of DNA during the asexual cycle of Plasmodium falciparum in culture. Mol Biochem Parasitol. 1984;10:79–87. 10.1016/0166-6851(84)90020-3. [DOI] [PubMed] [Google Scholar]
- 84. Inselburg J, Banyal HS. Plasmodium falciparum: synchronization of asexual development with aphidicolin, a DNA synthesis inhibitor. Exp Parasitol. 1984;57:48–54. 10.1016/0014-4894(84)90061-4. [DOI] [PubMed] [Google Scholar]
- 85. Sugino A, Nakayama K. DNA polymerase alpha mutants from a Drosophila melanogaster cell line. Proc Natl Acad Sci USA. 1980;77:7049–53. 10.1073/pnas.77.12.7049. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 86. Petermann E, Orta ML, Issaeva N et al. Hydroxyurea-stalled replication forks become progressively inactivated and require two different RAD51-mediated pathways for restart and repair. Mol Cell. 2010;37:492–502. 10.1016/j.molcel.2010.01.021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 87. Mannava S, Moparthy KC, Wheeler LJ et al. Depletion of deoxyribonucleotide pools is an endogenous source of DNA damage in cells undergoing oncogene-induced senescence. Am J Pathol. 2013;182:142–51. 10.1016/j.ajpath.2012.09.011. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 88. Jensen JB, Trager W. Plasmodium falciparum in culture: establishment of additional strains. Am J Trop Med Hyg. 1978;27:743–6. 10.4269/ajtmh.1978.27.743. [DOI] [PubMed] [Google Scholar]
- 89. Dolan SA, Miller LH, Wellems TE. Evidence for a switching mechanism in the invasion of erythrocytes by Plasmodium falciparum. J Clin Invest. 1990;86:618–24. 10.1172/JCI114753. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 90. Tian S, Yan H, Neuhauser C et al. An analytical workflow for accurate variant discovery in highly divergent regions. Bmc Genomics [Electronic Resource]. 2016;17:703. 10.1186/s12864-016-3045-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 91. Sanderson ND, Swann J, Barker L et al. High precision Neisseria gonorrhoeae variant and antimicrobial resistance calling from metagenomic Nanopore sequencing. Genome Res. 2020;30:1354–63. 10.1101/gr.262865.120. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 92. Cameron DL, Di Stefano L, Papenfuss AT. Comprehensive evaluation and characterisation of short read general-purpose structural variant calling software. Nat Commun. 2019;10:3240. 10.1038/s41467-019-11146-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 93. Gabrielaite M, Torp MH, Rasmussen MS et al. A comparison of tools for copy-number variation detection in germline whole exome and whole genome sequencing data. Cancers. 2021;13:6283, 10.3390/cancers13246283 . [DOI] [PMC free article] [PubMed] [Google Scholar]
- 94. Kosugi S, Momozawa Y, Liu X et al. Comprehensive evaluation of structural variation detection algorithms for whole genome sequencing. Genome Biol. 2019;20:117. 10.1186/s13059-019-1720-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 95. Dessens JT, Beetsma AL, Dimopoulos G et al. CTRP is essential for mosquito infection by malaria ookinetes. EMBO J. 1999;18:6221–7. 10.1093/emboj/18.22.6221. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 96. Steinbuechel M, Matuschewski K. Role for the Plasmodium sporozoite-specific transmembrane protein S6 in parasite motility and efficient malaria transmission. Cell Microbiol. 2009;11:279–88. 10.1111/j.1462-5822.2008.01252.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 97. Eksi S, Haile Y, Furuya T et al. Identification of a subtelomeric gene family expressed during the asexual-sexual stage transition in Plasmodium falciparum. Mol Biochem Parasitol. 2005;143:90–9. 10.1016/j.molbiopara.2005.05.010. [DOI] [PubMed] [Google Scholar]
- 98. Silvestrini F, Lasonder E, Olivieri A et al. Protein export marks the early phase of gametocytogenesis of the human malaria parasite Plasmodium falciparum. Mol Cell Proteomics. 2010;9:1437–48. 10.1074/mcp.M900479-MCP200. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 99. Mueller AK, Labaied M, Kappe SH et al. Genetically modified Plasmodium parasites as a protective experimental malaria vaccine. Nature. 2005;433:164–7. 10.1038/nature03188. [DOI] [PubMed] [Google Scholar]
- 100. Silvie O, Goetz K, Matuschewski K. A sporozoite asparagine-rich protein controls initiation of Plasmodium liver stage development. PLoS Pathog. 2008;4:e1000086. 10.1371/journal.ppat.1000086. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 101. van Schaijk BC, Ploemen IH, Annoura T et al. A genetically attenuated malaria vaccine candidate based on P. falciparum b9/slarp gene-deficient sporozoites. eLife. 2014;3:e03582. 10.7554/eLife.03582. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 102. Real E, Rodrigues L, Cabal GG et al. Plasmodium UIS3 sequesters host LC3 to avoid elimination by autophagy in hepatocytes. Nat Microbiol. 2018;3:17–25. 10.1038/s41564-017-0054-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 103. Roestenberg M, Walk J, van der Boor SC et al. A double-blind, placebo-controlled phase 1/2a trial of the genetically attenuated malaria vaccine PfSPZ-GA1. Sci Transl Med. 2020;12:eaaz5629. 10.1126/scitranslmed.aaz5629. [DOI] [PubMed] [Google Scholar]
- 104. Franke-Fayard B, Marin-Mogollon C, Geurten FJA et al. Creation and preclinical evaluation of genetically attenuated malaria parasites arresting growth late in the liver. NPJ Vaccines. 2022;7:139. 10.1038/s41541-022-00558-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 105. Van Tyne D, Park DJ, Schaffner SF et al. Identification and functional validation of the novel antimalarial resistance locus PF10_0355 in Plasmodium falciparum. PLoS Genet. 2011;7:e1001383. 10.1371/journal.pgen.1001383. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 106. Farrukh A, Musabyimana JP, Distler U et al. The Plasmodium falciparum CCCH zinc finger protein MD3 regulates male gametocytogenesis through its interaction with RNA-binding proteins. Mol Microbiol. 2024;121:543–64. 10.1111/mmi.15215. [DOI] [PubMed] [Google Scholar]
- 107. Bushell ES, Ecker A, Schlegelmilch T et al. Paternal effect of the nuclear formin-like protein MISFIT on Plasmodium development in the mosquito vector. PLoS Pathog. 2009;5:e1000539. 10.1371/journal.ppat.1000539. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 108. Mair GR, Lasonder E, Garver LS et al. Universal features of post-transcriptional gene regulation are critical for Plasmodium zygote development. PLoS Pathog. 2010;6:e1000767. 10.1371/journal.ppat.1000767. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 109. Slavic K, Krishna S, Lahree A et al. A vacuolar iron-transporter homologue acts as a detoxifier in Plasmodium. Nat Commun. 2016;7:10403. 10.1038/ncomms10403. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 110. Perrin AJ, Bisson C, Faull PA et al. Malaria parasite schizont egress antigen-1 plays an essential role in nuclear segregation during schizogony. mBio. 2021;12:e03377–03320. 10.1128/mBio.03377-20. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 111. Pradhan A, Hussain EM, Tuteja R. Characterization of replication fork and phosphorylation stimulated Plasmodium falciparum helicase 45. Gene. 2008;420:66–75. 10.1016/j.gene.2008.05.005. [DOI] [PubMed] [Google Scholar]
- 112. Evans L, Gowers D, Firman K et al. Enhanced purification and characterization of the PfeIF4A (PfH45) helicase from Plasmodium falciparum using a codon-optimised clone. Protein Expression Purif. 2012;85:1–8. 10.1016/j.pep.2012.06.010. [DOI] [Google Scholar]
- 113. Maier AG, Rug M, O’Neill MT et al. Exported proteins required for virulence and rigidity of Plasmodium falciparum-infected human erythrocytes. Cell. 2008;134:48–61. 10.1016/j.cell.2008.04.051. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 114. Ajibola O, Diop MF, Ghansah A et al. In silico characterisation of putative Plasmodium falciparum vaccine candidates in African malaria populations. Sci Rep. 2021;11:16215. 10.1038/s41598-021-95442-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 115. Raj DK, Nixon CP, Nixon CE et al. Antibodies to PfSEA-1 block parasite egress from RBCs and protect against malaria infection. Science. 2014;344:871–7. 10.1126/science.1254417. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 116. Winter G, Kawai S, Haeggstrom M et al. SURFIN is a polymorphic antigen expressed on Plasmodium falciparum merozoites and infected erythrocytes. J Exp Med. 2005;201:1853–63. 10.1084/jem.20041392. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 117. Weiner J, Kooij TW. Phylogenetic profiles of all membrane transport proteins of the malaria parasite highlight new drug targets. Microb Cell. 2016;3:511–21. 10.15698/mic2016.10.534. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 118. Patel AP, Tirosh I, Trombetta JJ et al. Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma. Science. 2014;344:1396–401. 10.1126/science.1254257. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 119. Xu QH, Chen SH, Hu YB et al. Single-cell RNA transcriptome reveals the intra-tumoral heterogeneity and regulators underlying tumor progression in metastatic pancreatic ductal adenocarcinoma. Cell Death Discov. 2021;7:331. 10.1038/s41420-021-00663-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 120. Gao RL, Bai SS, Henderson YC et al. Delineating copy number and clonal substructure in human tumors from single-cell transcriptomes. Nat Biotechnol. 2021;39:599–608. 10.1038/s41587-020-00795-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 121. Mahdipour-Shirayeh A, Erdmann N, Leung-Hagesteijn C et al. sciCNV: high-throughput paired profiling of transcriptomes and DNA copy number variations at single-cell resolution. Briefings Bioinf. 2022;23:bbab413. 10.1093/bib/bbab413. [DOI] [Google Scholar]
- 122. Navin N, Kendall J, Troge J et al. Tumour evolution inferred by single-cell sequencing. Nature. 2011;472:90–4. 10.1038/nature09807. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 123. McConnell MJ, Lindberg MR, Brennand KJ et al. Mosaic copy number variation in human neurons. Science. 2013;342:632–7. 10.1126/science.1243472. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 124. Cai X, Evrony GD, Lehmann HS et al. Single-cell, genome-wide sequencing identifies clonal somatic copy-number variation in the human brain. Cell Rep. 2014;8:1280–9. 10.1016/j.celrep.2014.07.043. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 125. Francis JM, Zhang CZ, Maire CL et al. EGFR variant heterogeneity in glioblastoma resolved through single-nucleus sequencing. Cancer Discov. 2014;4:956–71. 10.1158/2159-8290.CD-13-0879. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 126. Eirew P, Steif A, Khattra J et al. Dynamics of genomic clones in breast cancer patient xenografts at single-cell resolution. Nature. 2015;518:422–6. 10.1038/nature13952. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 127. Rohrback S, April C, Kaper F et al. Submegabase copy number variations arise during cerebral cortical neurogenesis as revealed by single-cell whole-genome sequencing. Proc Natl Acad Sci USA. 2018;115:10804–9. 10.1073/pnas.1812702115. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 128. Baslan T, Kendall J, Volyanskyy K et al. Novel insights into breast cancer copy number genetic heterogeneity revealed by single-cell genome sequencing. eLife. 2020;9:e51480. 10.7554/eLife.51480. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 129. Negreira GH, Monsieurs P, Imamura H et al. High throughput single-cell genome sequencing gives insights into the generation and evolution of mosaic aneuploidy in Leishmania donovani. Nucleic Acids Res. 2022;50:293–305. 10.1093/nar/gkab1203. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 130. Nair S, Nkhoma SC, Serre D et al. Single-cell genomics for dissection of complex malaria infections. Genome Res. 2014;24:1028–38. 10.1101/gr.168286.113. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 131. Trevino SG, Nkhoma SC, Nair S et al. High-resolution single-cell sequencing of malaria parasites. Genome Biol Evol. 2017;9:3373–83. 10.1093/gbe/evx256. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 132. Saxena S, Zou L. Hallmarks of DNA replication stress. Mol Cell. 2022;82:2298–314. 10.1016/j.molcel.2022.05.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 133. Guler JL, Freeman DL, Ahyong V et al. Asexual populations of the human malaria parasite, Plasmodium falciparum, use a two-step genomic strategy to acquire accurate, beneficial DNA amplifications. PLoS Pathog. 2013;9:e1003375. 10.1371/journal.ppat.1003375. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 134. Dunham MJ, Badrane H, Ferea T et al. Characteristic genome rearrangements in experimental evolution of Saccharomyces cerevisiae. Proc Natl Acad Sci USA. 2002;99:16144–9. 10.1073/pnas.242624799. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 135. Gresham D, Desai MM, Tucker CM et al. The repertoire and dynamics of evolutionary adaptations to controlled nutrient-limited environments in yeast. PLoS Genet. 2008;4:e1000303. 10.1371/journal.pgen.1000303. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 136. Sandegren L, Andersson DI. Bacterial gene amplification: implications for the evolution of antibiotic resistance. Nat Rev Micro. 2009;7:578–88. 10.1038/nrmicro2174. [DOI] [Google Scholar]
- 137. Dharia N, Sidhu A, Cassera M et al. Use of high-density tiling microarrays to identify mutations globally and elucidate mechanisms of drug resistance in Plasmodium falciparum. Genome Biol. 2009;10:R21. 10.1186/gb-2009-10-2-r21. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 138. Rottmann M, McNamara C, Yeung BK et al. Spiroindolones, a potent compound class for the treatment of malaria. Science. 2010;329:1175–80. 10.1126/science.1193225. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 139. Ross LS, Gamo FJ, Lafuente-Monasterio MJ et al. In vitro resistance selections for plasmodium falciparum dihydroorotate dehydrogenase inhibitors give mutants with multiple point mutations in the drug-binding site and altered growth. J Biol Chem. 2014;289:17980–95. 10.1074/jbc.M114.558353. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 140. Phillips MA, Lotharius J, Marsh K et al. A long-duration dihydroorotate dehydrogenase inhibitor (DSM265) for prevention and treatment of malaria. Sci Transl Med. 2015;7:296ra111. 10.1126/scitranslmed.aaa6645. [DOI] [Google Scholar]
- 141. Cowell A, Winzeler E. Exploration of the Plasmodium falciparum resistome and druggable genome reveals new mechanisms of drug resistance and antimalarial targets. Microbiol Insights. 2018;11: 1178636118808529. 10.1177/1178636118808529. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 142. Rocamora F, Zhu L, Liong KY et al. Oxidative stress and protein damage responses mediate artemisinin resistance in malaria parasites. PLoS Pathog. 2018;14:e1006930. 10.1371/journal.ppat.1006930. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 143. Luth MR, Godinez-Macias KP, Chen D et al. Systematic in vitro evolution in Plasmodium falciparum reveals key determinants of drug resistance. Science. 2024;386:eadk9893. 10.1126/science.adk9893. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 144. Evrony GD, Lee E, Park PJ et al. Resolving rates of mutation in the brain using single-neuron genomics. eLife. 2016;5:e12966. 10.7554/eLife.12966. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 145. Mills RE, Walter K, Stewart C et al. Mapping copy number variation by population-scale genome sequencing. Nature. 2011;470:59–65. 10.1038/nature09708. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 146. Zhao M, Wang Q, Wang Q et al. Computational tools for copy number variation (CNV) detection using next-generation sequencing data: features and perspectives. BMC Bioinf. 2013; 14:(Suppl 11). 10.1186/1471-2105-14-S11-S1.Suppl 11, S1. [DOI] [Google Scholar]
- 147. Guan P, Sung WK. Structural variation detection using next-generation sequencing data: a comparative technical review. Methods. 2016;102:36–49. 10.1016/j.ymeth.2016.01.020. [DOI] [PubMed] [Google Scholar]
- 148. Gabrielaite M, Torp MH, Rasmussen MS et al. A comparison of tools for copy-number variation detection in germline whole exome and whole genome sequencing data. Cancers. 2021;13:6283. 10.3390/cancers13246283. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 149. Arnot DE, Ronander E, Bengtsson DC. The progression of the intra-erythrocytic cell cycle of Plasmodium falciparum and the role of the centriolar plaques in asynchronous mitotic division during schizogony. Int J Parasitol. 2011;41:71–80. 10.1016/j.ijpara.2010.07.012. [DOI] [PubMed] [Google Scholar]
- 150. Zong C, Lu S, Chapman AR et al. Genome-wide detection of single-nucleotide and copy-number variations of a single human cell. Science. 2012;338:1622–6. 10.1126/science.1229164. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 151. Lasken RS. Single-cell sequencing in its prime. Nat Biotechnol. 2013;31:211–2. 10.1038/nbt.2523. [DOI] [PubMed] [Google Scholar]
- 152. Laks E, McPherson A, Zahn H et al. Clonal decomposition and DNA replication states defined by scaled single-cell genome sequencing. Cell. 2019;179:1207–1221.e22. 10.1016/j.cell.2019.10.026. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 153. Kupke J, Brombach J, Fang Y et al. Heteroresistance in Enterobacter cloacae complex caused by variation in transient gene amplification events. NPJ Antimicrob Resist. 2025;3:13. 10.1038/s44259-025-00082-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 154. Kirkman LA, Lawrence EA, Deitsch KW. Malaria parasites utilize both homologous recombination and alternative end joining pathways to maintain genome integrity. Nucleic Acids Res. 2014;42:370–9. 10.1093/nar/gkt881. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 155. Triglia T, Foote SJ, Kemp DJ et al. Amplification of the multidrug resistance gene pfmdr1 in Plasmodium falciparum has arisen as multiple independent events. Mol Cell Biol. 1991;11:5244–50. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 156. Matthews H, Duffy CW, Merrick CJ. Checks and balances? DNA replication and the cell cycle in Plasmodium. Parasites Vectors. 2018;11:216. 10.1186/s13071-018-2800-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 157. Bindra RS, Schaffer PJ, Meng A et al. Down-regulation of Rad51 and decreased homologous recombination in hypoxic cancer cells. Mol Cell Biol. 2004;24:8504–18. 10.1128/MCB.24.19.8504-8518.2004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 158. Saleh-Gohari N, Bryant HE, Schultz N et al. Spontaneous homologous recombination is induced by collapsed replication forks that are caused by endogenous DNA single-strand breaks. Mol Cell Biol. 2005;25:7158–69. 10.1128/MCB.25.16.7158-7169.2005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 159. Slack A, Thornton PC, Magner DB et al. On the mechanism of gene amplification induced under stress in Escherichia coli. PLoS Genet. 2006;2:e48. 10.1371/journal.pgen.0020048. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 160. Hastings PJ, Ira G, Lupski JR. A microhomology-mediated break-induced replication model for the origin of human copy number variation. PLoS Genet. 2009;5:e1000327. 10.1371/journal.pgen.1000327. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 161. Valerie K, Povirk LF. Regulation and mechanisms of mammalian double-strand break repair. Oncogene. 2003;22:5792–812. 10.1038/sj.onc.1206679. [DOI] [PubMed] [Google Scholar]
- 162. Hastings PJ, Bull HJ, Klump JR et al. Adaptive amplification: an inducible chromosomal instability mechanism. Cell. 2000;103:723–31. 10.1016/S0092-8674(00)00176-8. [DOI] [PubMed] [Google Scholar]
- 163. Cole GM, Schild D, Lovett ST et al. Regulation of RAD54- and RAD52-lacZ gene fusions in Saccharomyces cerevisiae in response to DNA damage. Mol Cell Biol. 1987;7:1078–84. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 164. Raghavan D, Shipley WU, Garnick MB et al. Biology and management of bladder cancer. N Engl J Med. 1990;322:1129–38. 10.1056/NEJM199004193221607. [DOI] [PubMed] [Google Scholar]
- 165. Bentley J, Diggle CP, Harnden P et al. DNA double strand break repair in human bladder cancer is error prone and involves microhomology-associated end-joining. Nucleic Acids Res. 2004;32:5249–59. 10.1093/nar/gkh842. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 166. White NJ. Pharmacokinetic and pharmacodynamic considerations in antimalarial dose optimization. Antimicrob Agents Chemother. 2013;57:5792–807. 10.1128/AAC.00287-13. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 167. Durkin SG, Ragland RL, Arlt MF et al. Replication stress induces tumor-like microdeletions in FHIT/FRA3B. Proc Natl Acad Sci USA. 2008;105:246–51. 10.1073/pnas.0708097105. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 168. Chen L, Zhou W, Zhang C et al. CNV instability associated with DNA replication dynamics: evidence for replicative mechanisms in CNV mutagenesis. Hum Mol Genet. 2015;24:1574–83. 10.1093/hmg/ddu572. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 169. Craven SH, Neidle EL. Double trouble: medical implications of genetic duplication and amplification in bacteria. Future Microbiol. 2007;2:309–21. 10.2217/17460913.2.3.309. [DOI] [PubMed] [Google Scholar]
- 170. Hjort K, Nicoloff H, Andersson DI. Unstable tandem gene amplification generates heteroresistance (variation in resistance within a population) to colistin in Salmonella enterica. Mol Microbiol. 2016;102:274–89. 10.1111/mmi.13459. [DOI] [PubMed] [Google Scholar]
- 171. Anderson SE, Sherman EX, Weiss DS et al. Aminoglycoside heteroresistance in Acinetobacter baumannii AB5075. mSphere. 2018;3:e00271–00218. 10.1128/mSphere.00271-18. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 172. Nicoloff H, Hjort K, Levin BR et al. The high prevalence of antibiotic heteroresistance in pathogenic bacteria is mainly caused by gene amplification. Nat Microbiol. 2019;4:504–14. 10.1038/s41564-018-0342-0. [DOI] [PubMed] [Google Scholar]
- 173. van Dijk E, van den Bosch T, Lenos KJ et al. Chromosomal copy number heterogeneity predicts survival rates across cancers. Nat Commun. 2021;12:3188. 10.1038/s41467-021-23384-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 174. Sobral D, Martins M, Kaplan S et al. Genetic and microenvironmental intra-tumor heterogeneity impacts colorectal cancer evolution and metastatic development. Commun Biol. 2022;5:937. 10.1038/s42003-022-03884-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 175. White NJ. Malaria parasite clearance. Malar J. 2017;16:88. 10.1186/s12936-017-1731-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 176. Shor E, Fox CA, Broach JR. The yeast environmental stress response regulates mutagenesis induced by proteotoxic stress. PLoS Genet. 2013;9:e1003680. 10.1371/journal.pgen.1003680. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 177. WHO . 2023; World Malaria Report 2023.
- 178. Okombo J, Fidock DA. Towards next-generation treatment options to combat Plasmodium falciparum malaria. Nat Rev Micro. 2025;23:178–91. 10.1038/s41579-024-01099-x. [DOI] [Google Scholar]
- 179. Wilson C, Serrano A, Wasley A et al. Amplification of a gene related to mammalian mdr genes in drug-resistant Plasmodium falciparum. Science. 1989;244:1184–6. 10.1126/science.2658061. [DOI] [PubMed] [Google Scholar]
- 180. Nair S, Nash D, Sudimack D et al. Recurrent gene amplification and soft selective sweeps during evolution of multidrug resistance in malaria parasites. Mol Biol Evol. 2007;24:562–73. 10.1093/molbev/msl185. [DOI] [PubMed] [Google Scholar]
- 181. Ravenhall M, Benavente ED, Mipando M et al. Characterizing the impact of sustained sulfadoxine/pyrimethamine use upon the Plasmodium falciparum population in Malawi. Malar J. 2016;15:575. 10.1186/s12936-016-1634-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 182. Leroy D, Macintyre F, Adoke Y et al. African isolates show a high proportion of multiple copies of the Plasmodium falciparum plasmepsin-2 gene, a piperaquine resistance marker. Malar J. 2019;18:126. 10.1186/s12936-019-2756-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 183. Bopp S, Magistrado P, Wong W et al. Plasmepsin II-III copy number accounts for bimodal piperaquine resistance among Cambodian Plasmodium falciparum. Nat Commun. 2018;9:1769. 10.1038/s41467-018-04104-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 184. Thaithong S, Ranford-Cartwright LC, Siripoon N et al. Plasmodium falciparum: gene mutations and amplification of dihydrofolate reductase genes in parasites grown in vitro in presence of pyrimethamine. Exp Parasitol. 2001;98:59–70. 10.1006/expr.2001.4618. [DOI] [PubMed] [Google Scholar]
- 185. Istvan ES, Dharia NV, Bopp SE et al. Validation of isoleucine utilization targets in Plasmodium falciparum. Proc Natl Acad Sci USA. 2011;108:1627–32. 10.1073/pnas.1011560108. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 186. Rathod PK, McErlean T, Lee PC. Variations in frequencies of drug resistance in Plasmodium falciparum. Proc Natl Acad Sci USA. 1997;94:9389–93. 10.1073/pnas.94.17.9389. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 187. Olawade DB, Wada OZ, Ezeagu CN et al. Malaria vaccination in Africa: a mini-review of challenges and opportunities. Medicine (Baltimore). 2024;103:e38565. 10.1097/MD.0000000000038565. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 188. Theodoridis L, Carvalho TG. Antimalarial drug resistance and drug discovery: learning from the past to innovate the future. Int J Parasitol. 2025;28:100602. 10.1016/j.ijpddr.2025.100602. [DOI] [Google Scholar]
- 189. Bell G, MacLean C. The Search for ‘Evolution-Proof’ Antibiotics. Trends Microbiol. 2018;26:471–83. 10.1016/j.tim.2017.11.005. [DOI] [PubMed] [Google Scholar]
- 190. Zhang L, Ma J, Liu L et al. Adaptive therapy: a tumor therapy strategy based on Darwinian evolution theory. Crit Rev Oncol Hematol. 2023;192:104192. 10.1016/j.critrevonc.2023.104192. [DOI] [PubMed] [Google Scholar]
- 191. Dharanipragada P, Zhang X, Liu S et al. Blocking genomic instability prevents acquired resistance to MAPK inhibitor therapy in melanoma. Cancer Discov. 2023;13:880–909. 10.1158/2159-8290.CD-22-0787. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 192. Li GQ, Arnold K, Guo XB et al. Randomised comparative study of mefloquine, qinghaosu, and pyrimethamine-sulfadoxine in patients with falciparum malaria. The Lancet. 1984;324:1360–1. 10.1016/S0140-6736(84)92057-9. [DOI] [Google Scholar]
- 193. Siddiqui FA, Liang X, Cui L. Plasmodium falciparum resistance to ACTs: emergence, mechanisms, and outlook. Int J Parasitol. 2021;16:102–18. 10.1016/j.ijpddr.2021.05.007. [DOI] [Google Scholar]
Data Availability Statement
The source code for the BLAST pipeline and SVCROWS can be accessed at https://doi.org/10.6084/m9.figshare.29885879 and https://doi.org/10.6084/m9.figshare.29099801.v1. Short-read data are available at the NCBI Sequence Read Archive under BioProject PRJNA1201106.
This paper is linked to: https://doi.org/10.1093/nar/gkaf1440.





