Long read single cell RNA sequencing reveals the isoform diversity of Plasmodium vivax transcripts

Brittany Hazzard; Juliana M Sá; Angela C Ellis; Tales V Pascini; Shuchi Amin; Thomas E Wellems; David Serre

doi:10.1371/journal.pntd.0010991

2022 Dec 16;16(12):e0010991. doi: 10.1371/journal.pntd.0010991

Brittany Hazzard¹, Juliana M Sá², Angela C Ellis², Tales V Pascini², Shuchi Amin², Thomas E Wellems², David Serre¹,³,*

Editor: Ananias A Escalante⁴

PMCID: PMC9803293 PMID: 36525464

Abstract

Plasmodium vivax infections often consist of heterogeneous populations of parasites at different developmental stages and with distinct transcriptional profiles, which complicates gene expression analyses. The advent of single cell RNA sequencing (scRNA-seq) has made it possible to disentangle this complexity and has provided robust, stage-specific characterization of Plasmodium gene expression. However, scRNA-seq information is typically derived from one end of each mRNA molecule (usually the 3’-end) and therefore fails to capture the diversity of transcript isoforms documented in bulk RNA-seq data. Here, we describe the sequencing of scRNA-seq libraries using Pacific Biosciences (PacBio) chemistry to characterize full-length Plasmodium vivax transcripts from individual parasites. Our results show that many P. vivax genes are transcribed into multiple isoforms, primarily through variations in untranslated region (UTR) length or splicing, and that the expression of many isoforms is developmentally regulated. Our findings demonstrate that long read sequencing can be used to characterize mRNA molecules at the single cell level, and the resulting data provide an additional resource to better understand the regulation of gene expression throughout the Plasmodium life cycle.

Author summary

Single-cell RNA-sequencing is a valuable tool for identifying gene expression differences among cells present in one sample. However, scRNA-seq data are usually generated by sequencing the 3’ end of mRNA molecules after poly-A capture, which complicates assigning reads to specific genes in organisms with poorly annotated UTRs and prevents the identification of differences in isoform expression. Here, we use a modified version of the 10X scRNA-seq technology to characterize full-length transcripts using PacBio sequencing from both sporozoite and blood stages of Plasmodium vivax. These data allow us to predict full-length, stage-specific transcripts for P. vivax and to identify variations in UTR usage throughout the P. vivax life cycle.

Introduction

Plasmodium vivax is the second most common cause of human malaria worldwide and was responsible for 4.5 million clinical cases of malaria in 2020 [1]. Despite these numbers, P. vivax research lags behind that of P. falciparum, in part due to the difficulty of propagating the parasites in vitro [2,3]. Genomic techniques, such as genome [4–9] and transcriptome [10–12] sequencing, have improved our knowledge of P. vivax biology but are hindered by the complexity of most blood-stage P. vivax infections: multiple genetically distinct parasites are often simultaneously present in one infection and, due to the lack of, or incomplete, sequestration of P. vivax stages, all intraerythrocytic developmental stages circulate concurrently in the blood. Experimental infections of non-human primates using monkey-adapted strains of P. vivax [13,14] provide a robust system to study the regulation of all blood stages in vivo using monoclonal and well-characterized parasites. However, the simultaneous presence of multiple developmental stages in the blood, each with its own regulatory profile, remains a major challenge. Short-term ex vivo cultures [10,11] and statistical inference of stage composition [10–12,15,16] have been used to circumvent this issue but suffer from limited resolution and possible artefacts.

Characterization of the gene expression of single cells (scRNA-seq) provides an elegant alternative and has been successfully applied to different Plasmodium species [17–23], including P. vivax [24,25]. However, many scRNA-seq assays rely on the capture and sequencing of the 3’ ends of polyadenylated transcripts [26,27]; consequently, only a short portion of each transcript is sequenced, and the data generated provide little information about transcript isoforms and alternative splicing [11,12,15,28]. Additionally, it can be difficult to assign a given signal to a specific gene, since scRNA-seq reads typically derive from the 3’ untranslated regions (3’-UTRs), which are incompletely annotated in P. vivax [11,12,15,24,28].

Full-length isoform sequencing (Iso-Seq) using long-read technologies, such as PacBio, enables reliable characterization of gene isoforms and their UTRs [29–35]. This method has also been applied to single cells to identify cell-type-specific isoforms in a high-throughput manner [30,34,35] and to improve the analysis of subsequent scRNA-seq data [34].

Here we combine long- and short-read sequencing of scRNA-seq libraries to characterize P. vivax isoforms throughout the intraerythrocytic life cycle as well as in sporozoites. The standard short-read Illumina sequencing of scRNA-seq libraries provides a robust description of the developmental stage of each P. vivax parasite captured, while sequencing the same mRNA molecules using PacBio long reads enables the characterization of the full-length transcripts present in these cells. The data generated allow us to better annotate the 5’- and 3’-UTRs of P. vivax genes, to comprehensively characterize the transcripts expressed, and to identify stage-specific isoforms. Overall, our results demonstrate that single cell long read sequencing, in conjunction with short read sequencing, provides a robust method for comprehensively characterizing mRNA sequences at the single cell level and for identifying isoforms involved in the regulation of specific Plasmodium stages.

Results and discussion

Characterization of single cell P. vivax blood-stage and sporozoite transcriptomes

We obtained blood-stage parasites from two Saimiri boliviensis monkeys infected with the Chesson strain of P. vivax and prepared 10X Genomics 3’-end single cell RNA sequencing (scRNA-seq) libraries after enrichment of infected red blood cells (see Materials and Methods). After generating 57,550,235–63,399,045 short reads per sample, we successfully mapped 72–74% of all reads to the P. vivax P01 genome sequence [36]. While full-length mRNAs are captured and converted into cDNA in the 10X droplets, only the 3’-end of each transcript is sequenced because of the cDNA fragmentation that occurs during library preparation (Fig 1). All 3’-end scRNA-seq reads should therefore derive from the last ~300 bp of the transcripts (see e.g., S1 Fig). However, only 27–33% of the reads mapped within annotated P. vivax genes, and 47–52% mapped more than 500 bp away from any annotated gene (S1 Table). These results are consistent with previous analyses [24] and highlight that many genes and/or their 3’-UTRs remain incompletely annotated in the P. vivax genome. After removing PCR duplicates and applying stringent quality control filters, we obtained 949 and 1,807 single cell blood-stage transcriptomes, each characterized by more than 5,000 unique reads (Table 1). To determine which blood stages were present in each infection, we analyzed these transcriptomes using principal component analysis and found that several asexual and sexual developmental stages were present in both samples, although in somewhat different relative proportions (S2 Fig).
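The read categories reported in S1 Table (within an annotated gene, near a gene, or more than 500 bp away from any gene) reduce to an interval comparison. The sketch below is illustrative only and is not the authors' pipeline: it assumes gene annotations are held in a hypothetical dictionary of (start, end) intervals with 1-based inclusive coordinates, summarizes each read by a single mapped position, ignores strand, and uses a simple linear scan rather than an interval index.

```python
from collections import Counter


def classify_read(chrom, pos, genes_by_chrom, flank=500):
    """Classify one read position relative to annotated genes.

    genes_by_chrom: {chrom: [(start, end), ...]}, 1-based inclusive coordinates
    (a hypothetical layout). Returns 'within_gene', 'near_gene' (<= flank bp from
    the nearest gene) or 'intergenic' (> flank bp from every gene).
    """
    nearest = None  # smallest distance to any gene on this chromosome
    for start, end in genes_by_chrom.get(chrom, []):
        if start <= pos <= end:
            return "within_gene"
        dist = start - pos if pos < start else pos - end
        nearest = dist if nearest is None else min(nearest, dist)
    if nearest is not None and nearest <= flank:
        return "near_gene"
    return "intergenic"


def summarize(reads, genes_by_chrom, flank=500):
    """Fraction of (chrom, pos) reads falling into each category."""
    counts = Counter(classify_read(c, p, genes_by_chrom, flank) for c, p in reads)
    total = sum(counts.values())
    return {k: counts[k] / total for k in ("within_gene", "near_gene", "intergenic")}


# Toy example with a hypothetical chromosome name and gene coordinates.
genes = {"chr13": [(1000, 2000), (5000, 6000)]}
reads = [("chr13", 1500), ("chr13", 2300), ("chr13", 3500), ("chr13", 4800)]
print(summarize(reads, genes))  # {'within_gene': 0.25, 'near_gene': 0.5, 'intergenic': 0.25}
```

Replacing the gene intervals with annotations supplemented by the PacBio-derived isoforms is, in this framing, all that changes when re-estimating the fraction of reads within annotated genes.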

Fig 1 — The figure shows, on the left, the standard 10X scRNA-seq protocol based on Illumina sequencing and, on the right, the protocol used for sequencing full-length transcripts using Pacific Biosciences chemistry. Abbreviations: GEMs, Gel Beads in Emulsion; UMI, Unique Molecular Identifier; TSO, template switch oligo.

Table 1. Illumina and PacBio sequencing results and transcript predictions.

| Sample ID | Illumina: No. of reads | Illumina: %Pv reads | Illumina: No. of unique reads^ | Illumina: No. of cells* | PacBio: No. of reads (CCS) | PacBio: % in Illumina cells# | PacBio: No. of unique reads^ | PacBio: No. of transcripts |
|---|---|---|---|---|---|---|---|---|
| Blood1 | 57,550,235 | 74% | 31,177,026 | 949 | 3,390,270 | 64% | 1,688,745 | 9,982 |
| Blood2 | 63,399,045 | 72% | 35,706,126 | 1,807 | 3,570,838 | 50% | 1,523,456 | 9,982 |
| Spz1 | 63,039,836 | 7% | 3,654,760 | 2,609 | 2,578,778 | 8% | 106,531 | 784 |
| Spz2 | 74,606,583 | 3% | 2,131,303 | 2,363 | 1,935,640 | 5% | 40,977 | 784 |


^No. of mapped reads with unique molecular identifiers (UMIs).

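#Percentage of PacBio reads with a GEM barcode corresponding to one of the Illumina-defined single cell transcriptomes.

*Number of single cell transcriptomes retained after quality control filtering (more than 5,000 unique reads for blood-stage samples and more than 250 unique reads for sporozoite samples).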

We also prepared two scRNA-seq libraries from salivary gland sporozoites dissected from Anopheles stephensi and Anopheles freeborni mosquitoes, respectively, that had fed on Saimiri monkeys infected with P. vivax parasites. Of the 63,039,836 and 79,688,059 reads generated from these libraries, 3–7% mapped to the P. vivax genome sequence, similar to the proportions previously obtained for P. berghei sporozoites [21]. Of those reads, 25–26% mapped within annotated P. vivax genes and 58–59% mapped more than 500 bp away from any gene (S1 Table). After removing PCR duplicates and applying stringent quality control filters, we obtained 2,609 and 2,363 single cell transcriptomes, each characterized by more than 250 unique reads (Tables 1 and S1). We used a lower threshold for the sporozoite analysis because of the lower number of P. vivax reads recovered, a consequence of the overwhelming presence of mosquito RNA (this cutoff is nonetheless comparable to those previously used for analyzing sporozoite scRNA-seq data [21]). In contrast to the heterogeneous blood-stage parasites, these salivary gland sporozoites formed a relatively homogeneous population (S2 Fig).

Long read sequencing of scRNA-seq libraries provides full-length transcript information

Since only the 3’-end of each transcript is usually sequenced in scRNA-seq experiments, it is sometimes difficult to assign scRNA-seq reads to a specific gene or to distinguish signals derived from different isoforms of the same gene [24]. We therefore used the unfragmented cDNA from the same four 10X scRNA-seq libraries to generate full-length isoform sequences with PacBio chemistry (see Fig 1 and Materials and Methods for details).

From the two blood samples, we generated 3,390,270 and 3,570,838 circular consensus sequences (CCS), respectively, each derived from at least 10 sequencing passes of an individual cDNA molecule. Of these sequences, 50–64% carried a 10X barcode matching one of the cells characterized by Illumina sequencing, and >99% of those mapped unambiguously to the P. vivax genome (Tables 1 and S1). After removal of PCR duplicates, we retained 1,688,745 and 1,523,456 unique reads (each derived from a unique mRNA molecule) for further analysis (Table 1). Although the Illumina and PacBio library preparations diverged after cDNA amplification, the same barcoded cDNA molecules were used for both experiments (Fig 1), and the numbers of reads obtained with each technology for each individual cell were highly correlated (S3 Fig).

For the sporozoite samples, we generated 2,578,778 and 1,935,640 circular consensus sequences. Of these, 5–8% carried a 10X barcode matching one of the cells characterized by Illumina sequencing, and 50–65% of those mapped unambiguously to the P. vivax genome. After removal of PCR duplicates and additional quality control steps, we retained 106,531 and 40,977 unique reads, respectively, for further analysis (Tables 1 and S1). As for the blood-stage samples, the numbers of reads obtained by Illumina and PacBio sequencing for each sporozoite were highly correlated (S3 Fig), although, due to the lower sequencing depth, the number of cells with PacBio information was limited to 2,449 and 1,670.

We then summarized these mapped reads into P. vivax transcripts and, to avoid including sequences that may represent partially degraded molecules or technical artefacts, we only considered predicted transcripts represented by more than 10 unique PacBio reads in both samples of the same type (e.g., in both blood-stage samples). Overall, we identified a total of 9,982 transcripts from blood-stage parasites and 784 from sporozoites (Tables 1 and S2), with an average length of 2,432 bp (range 100–20,506 bp) for blood-stage transcripts and 2,054 bp (range 197–6,809 bp) for sporozoite transcripts. (Note that the transcript prediction algorithm occasionally produces artefacts when transcripts in the same orientation overlap, which appears to be the case for the 20 kb transcript.)
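The support filter described above (more than 10 unique PacBio reads in both samples of the same type) amounts to intersecting per-sample counts. A minimal sketch, assuming per-sample unique-read counts per predicted transcript are already available as dictionaries, and that transcripts have been matched across samples so that one (hypothetical) identifier denotes the same structure in each sample:

```python
def supported_transcripts(counts_by_sample, min_reads=10):
    """Return transcript IDs with more than `min_reads` unique reads in every sample.

    counts_by_sample: {sample_name: {transcript_id: n_unique_reads}}.
    Assumes transcript IDs are already consistent across samples.
    """
    samples = list(counts_by_sample.values())
    if not samples:
        return set()
    # Transcripts observed in all samples (dict iteration yields keys).
    shared = set(samples[0]).intersection(*samples[1:])
    return {tx for tx in shared if all(s[tx] > min_reads for s in samples)}


# Hypothetical counts for two blood-stage samples.
counts = {
    "Blood1": {"tx1": 25, "tx2": 11, "tx3": 40},
    "Blood2": {"tx1": 12, "tx2": 10, "tx4": 99},
}
print(supported_transcripts(counts))  # {'tx1'}
```

Here tx2 fails because it has exactly 10 reads in one sample, and tx3 and tx4 fail because each is absent from one of the two samples.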

Most transcripts encode proteins corresponding to the genome annotation

Of the 9,982 blood-stage transcripts, 8,550 were predicted to encode more than 100 amino acids, and 5,368 of those (63%) were predicted to encode a full-length protein (including a start and a stop codon) (S1 Table). The median length of these full-length protein sequences was 291 amino acids, significantly shorter than the protein sequences annotated in the latest version of the P01 P. vivax genome [36] (median of 401 amino acids) (S4 Fig). This discrepancy may be due to the limited processivity of the reverse transcriptase used in the 10X protocol, which could hamper synthesis of long cDNA molecules and prevent recovery of transcripts from long genes (note that most full-length transcripts obtained from PacBio sequencing match the annotated reference, see below). Of the 784 sporozoite transcripts, 563 were predicted to encode more than 100 amino acids, and 344 of those (60%) were predicted to encode a full-length protein (including a start and a stop codon), with a median length of 267 amino acids (S4 Fig and S1 Table).
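TransDecoder, used here for coding-sequence prediction, also handles partial ORFs and scores coding potential. The sketch below is a much simpler stand-in, not a reimplementation of TransDecoder: it only illustrates the "full-length" criterion used above (an ATG start codon, an in-frame stop codon and more than 100 encoded amino acids) by scanning the three frames of the transcript's sense strand, which assumes the transcript orientation is already known.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_complete_orf(seq, min_aa=100):
    """Return (start, end, n_aa) of the longest ATG..stop ORF on the sense strand.

    start/end are 0-based, end-exclusive (end includes the stop codon);
    n_aa counts encoded amino acids, excluding the stop. Returns None if no
    complete ORF encodes more than `min_aa` amino acids.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None  # first ATG in the current stop-free stretch of this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3
                if n_aa > min_aa and (best is None or n_aa > best[2]):
                    best = (start, i + 3, n_aa)
                start = None  # look for the next ATG after this stop
    return best


# Toy transcript: 2 nt 5'-UTR, ATG + 101 codons, TAA stop, 2 nt 3'-UTR.
toy = "CC" + "ATG" + "GCT" * 101 + "TAA" + "GG"
print(longest_complete_orf(toy))  # (2, 311, 102)
```

Bases before `start` and after `end` correspond to the 5’- and 3’-UTRs of the transcript under this simplified model.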

We then compared the protein sequences predicted from these full-length transcripts to all P. vivax protein sequences annotated in the P01 genome (see S5 Fig and Materials and Methods). Most transcripts were predicted to encode protein-coding sequences highly similar to those annotated in the genome: 4,553 of the blood-stage transcripts (84.8%) and 254 of the sporozoite transcripts (73.8%) were identical to an annotated P. vivax protein-coding sequence over >90% of their length (S6 Fig; see e.g., Fig 2A). Additionally, 378 blood-stage (7.1%) and 23 sporozoite transcripts (6.7%) partially matched known protein-coding sequences and were identical over 50–90% of their sequence. While some of these transcripts may reflect incorrect gene annotations (see e.g., Fig 2B), in 253 of these cases (63%) another isoform in our dataset matched the entire annotated protein-coding sequence, suggesting that these differences represent incompletely annotated transcript isoforms rather than incorrect annotations (see also below). One example is the cytochrome b5-like heme/steroid binding protein (PVP01_0716500), which is transcribed as annotated in the genome by blood-stage parasites but, due to an alternative start site, is transcribed into a shorter mRNA by sporozoites, resulting in a shorter predicted protein (Figs 2C and S7). Finally, 436 blood-stage (8.1%) and 67 sporozoite transcripts (19.5%) aligned over less than 50% of their length to annotated protein sequences, but these transcripts were very short (less than 200 amino acids) and/or had low support from the PacBio reads and likely represent artefacts or fragmented transcripts. The complete list of predicted transcripts, their sequences and their identity to reference transcripts is presented in S2 Table.
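The three similarity classes above (identical over >90%, 50–90%, or <50% of the predicted protein) can be assigned from BLASTP tabular output. The original custom scripts are not described, so this sketch rests on stated assumptions: BLASTP was run with `-outfmt "6 qseqid sseqid nident qlen"` (standard BLAST+ format specifiers), the "fraction identical" is the number of identical residues in the best HSP divided by the query length, and exactly 50% or 90% falls in the partial class.

```python
from collections import defaultdict


def classify_hits(blast_lines):
    """Map each query to 'match' (>90%), 'partial' (50-90%) or 'divergent' (<50%).

    blast_lines: iterable of tab-separated lines 'qseqid sseqid nident qlen'
    (an assumed custom -outfmt 6 layout). Only the best HSP (highest identical
    fraction) per query is used. Queries with no hit at all do not appear and
    would need to be added as 'divergent' separately.
    """
    best = defaultdict(float)  # query -> best fraction of identical residues
    for line in blast_lines:
        if not line.strip():
            continue
        qseqid, _sseqid, nident, qlen = line.rstrip("\n").split("\t")
        best[qseqid] = max(best[qseqid], int(nident) / int(qlen))
    classes = {}
    for query, frac in best.items():
        if frac > 0.9:
            classes[query] = "match"
        elif frac >= 0.5:
            classes[query] = "partial"
        else:
            classes[query] = "divergent"
    return classes


# Hypothetical BLASTP rows (query transcript, subject protein, nident, qlen).
hits = [
    "tx1\tgeneA\t290\t300",
    "tx2\tgeneB\t200\t300",
    "tx2\tgeneC\t100\t300",
    "tx3\tgeneD\t90\t300",
]
print(classify_hits(hits))  # {'tx1': 'match', 'tx2': 'partial', 'tx3': 'divergent'}
```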

Fig 2 — **(A)** Example of protein-coding transcripts matching the current gene annotations. The figure shows 5.7 kb of chromosome 13 containing two annotated P. *vivax* genes, the ubiquitin fusion degradation protein 1 (PVP01_1330900) and an ATP synthase-associated protein (PVP01_1331000). The blue horizontal bars at the bottom show the annotations for these genes in PlasmoDB v54, while the top panel shows the PacBio reads mapping to this locus (each red horizontal line is a unique read mapped to the positive strand, with grey lines indicating spliced introns). Note that, while the PacBio reads support a shorter 3’-UTR than annotated for PVP01_1331000, the predicted protein-coding sequences are identical to the annotated ones. **(B)** Example of a protein-coding transcript differing from the current gene annotation. The figure shows 5 kb of chromosome 13 surrounding the rhoptry-associated protein 1 gene (RAP1, PVP01_1338500). The PacBio reads (mapped to the negative strand and displayed in blue) support the presence of an unannotated intron (red box), leading to additional predicted coding sequence upstream of this intron and a different protein than annotated in the genome (thick blue bars at the bottom). **(C)** Example of two isoforms with different predicted protein-coding sequences. The top panel shows that blood-stage parasites express a transcript for the cytochrome b5-like heme/steroid binding protein (PVP01_0716500) identical to the annotated protein-coding sequence (although with a shorter 5’-UTR). The middle panel shows that P. *vivax* sporozoites express this gene from a different start site (red box), resulting in a shorter transcript and a different predicted protein. (Note also the presence of an unannotated, alternatively spliced intron in the 3’-UTR.)

Most P. vivax transcripts have extended UTRs that often contain introns

Even for the transcripts highly similar to annotated protein-coding genes, the PacBio sequences add novel information by providing a detailed description of the 5’- and 3’-UTRs, which, despite recent efforts [28], remain incompletely annotated in P. vivax. Consistent with previous reports [11,15], our data showed extensive UTRs in many genes (Fig 3). 5’- and 3’-UTRs were roughly similar in length in transcripts expressed by blood-stage parasites, while 5’-UTRs were, on average, slightly longer than 3’-UTRs in sporozoite transcripts (745 vs 731 bp in blood stages [p-value = 0.3149] and 762 vs 650 bp in sporozoites [p-value = 0.0076]). To evaluate how this improved characterization of UTRs would affect future analyses of scRNA-seq data, we compared the percentage of scRNA-seq reads mapped to the currently annotated P. vivax genes with the percentage mapped to annotations supplemented by our predicted isoforms (available in S1 Data). While only 27–33% of the scRNA-seq reads mapped within previously annotated genes for the blood-stage parasites and 25–26% for the sporozoites, including the PacBio predictions raised these figures to 69–77% and 65–70%, respectively (S1 Table). Reanalysis of an independent scRNA-seq dataset generated from another P. vivax blood-stage infection (AMRU-I from [24]) confirmed these results, increasing the proportion of reads mapping to annotated transcripts from 30% to 88%.
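Given a transcript's exon structure and the genomic coordinates of its predicted coding sequence, 5’- and 3’-UTR lengths follow from counting exonic bases on either side of the CDS, with the strand deciding which side is 5’. A minimal sketch, assuming 1-based inclusive coordinates, non-overlapping exons, and a hypothetical record layout in which the CDS is given by its leftmost and rightmost genomic positions:

```python
def utr_lengths(exons, cds_start, cds_end, strand):
    """Return (utr5_len, utr3_len) in transcript bases.

    exons: list of (start, end) genomic intervals, 1-based inclusive (assumed).
    cds_start/cds_end: leftmost and rightmost genomic CDS positions.
    strand: '+' or '-'.
    """
    # Exonic bases to the left of the CDS (genomic orientation).
    left = sum(max(0, min(end, cds_start - 1) - start + 1) for start, end in exons)
    # Exonic bases to the right of the CDS.
    right = sum(max(0, end - max(start, cds_end + 1) + 1) for start, end in exons)
    # On the minus strand, the genomic right side is the 5' end of the transcript.
    return (left, right) if strand == "+" else (right, left)


exons = [(100, 200), (301, 400)]
print(utr_lengths(exons, 120, 350, "+"))  # (20, 50)
print(utr_lengths(exons, 120, 350, "-"))  # (50, 20)
```

Because only exonic bases are counted, an intron lying inside a UTR does not contribute to the reported UTR length.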

Fig 3 — Distribution of UTR lengths (x-axis, in bp) for transcripts expressed by blood-stage parasites (top) and sporozoites (bottom).

Although UTRs have not been extensively studied in Plasmodium, in many eukaryotes they contain important regulatory elements that often affect mRNA stability [37–41]. The length of the 5’-UTR did not appear to be associated with the level of transcript expression determined using the short reads (p = 0.315), whereas 3’-UTR length was negatively correlated with expression level (p = 2.8 × 10⁻⁶), albeit with a very low correlation coefficient (Spearman’s rho = -0.0798) (S8 Fig). In an attempt to identify regulatory elements in the UTRs, we compared the abundance of all possible 5-mers in the 5’-UTRs, 3’-UTRs and promoter regions but failed to detect significant enrichment (S9 Fig). More in-depth analyses will be required to understand the function of these UTRs, but the data presented here provide a solid foundation for such studies.
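Spearman's rho, used above to relate 3’-UTR length to expression, is the Pearson correlation of the ranks, with tied values given their average rank. The dependency-free sketch below shows the computation only; in practice a library routine such as `scipy.stats.spearmanr`, which also returns a p-value, would be the natural choice. The input values are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical 3'-UTR lengths (bp) and expression values for five transcripts.
utr3 = [150, 400, 400, 900, 1200]
expr = [80.0, 60.0, 75.0, 20.0, 10.0]
print(round(spearman_rho(utr3, expr), 3))  # -0.975
```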

Interestingly, of the 5,368 full-length blood-stage protein-coding transcripts, 1,072 (20%) had at least one intron in the UTRs: 419 transcripts had at least one intron in the 5’-UTR, 568 in the 3’-UTR and 89 had introns in both UTRs (see e.g., S10 Fig). Of these, 322 transcripts had more than one intron in their UTRs. Of the 344 full-length sporozoite transcripts, 26 had introns in the UTRs: 12 had at least one intron in the 5’-UTR, 11 in the 3’-UTR and 3 had introns in both UTRs. Five of these transcripts had more than one intron in the 5’- or 3’-UTR. Remarkably, in blood-stage parasites the expression levels of transcripts with an intron in the UTR were, on average, 3-fold higher than those of genes with no UTR intron (Wilcoxon rank test, p-value < 2.2 × 10⁻¹⁶), suggesting that the presence of introns in the UTR may be associated with increased mRNA stability in P. vivax. Furthermore, genes with introns in their coding sequences were also three times more likely to have introns in the UTR (χ² = 206.78, p-value < 2.2 × 10⁻¹⁶), which may suggest that UTR splicing is mechanistically associated with splicing of coding sequences, possibly due to more effective recruitment of the splicing machinery to those transcripts. To evaluate whether UTR introns were more frequent at a specific developmental stage, we assigned each gene to the blood stage where it was most abundantly expressed. The proportion of genes with introns in their UTR differed significantly among stages (χ² = 26.141, p-value = 8.9 × 10⁻⁶): 11% and 18% of the genes most expressed in early and late trophozoites, respectively, had a UTR intron, compared with only 3% and 8% of the genes most expressed in female gametocytes and schizonts, respectively (note that genes with multiple isoforms were excluded from this analysis, as it is difficult to robustly quantify the relative expression of isoforms). Finally, we tested whether genes containing introns in their UTRs disproportionately belonged to a specific pathway but failed to detect any gene ontology enrichment (FDR ≤ 0.1, S3 Table).
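Two computations underlie this paragraph: locating introns relative to the coding sequence, and testing contingency tables with a χ² test. The sketch below covers both under stated assumptions: introns are taken as the gaps between consecutive exons (1-based inclusive coordinates), an intron counts as a UTR intron only if it lies entirely outside the CDS, and the χ² statistic is Pearson's without a continuity correction (R's `chisq.test`, for instance, applies Yates' correction to 2×2 tables by default, so values may differ slightly). The p-value uses the closed-form recurrence for the χ² survival function with integer degrees of freedom; for a 4-stage × 2-category table (df = 3), the χ² value reported above gives p ≈ 8.9 × 10⁻⁶.

```python
import math


def utr_introns(exons, cds_start, cds_end, strand):
    """Count introns lying entirely in the 5'-UTR and in the 3'-UTR.

    exons: (start, end) genomic intervals, 1-based inclusive (assumed layout).
    Returns (n_5utr_introns, n_3utr_introns).
    """
    exons = sorted(exons)
    left = right = 0
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        intron_start, intron_end = prev_end + 1, next_start - 1
        if intron_end < cds_start:
            left += 1  # upstream of the CDS in genomic orientation
        elif intron_start > cds_end:
            right += 1  # downstream of the CDS in genomic orientation
    return (left, right) if strand == "+" else (right, left)


def chi2_sf(x, df):
    """P(X >= x) for a chi-square variable with integer df >= 1."""
    half = x / 2.0
    if df % 2:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = math.exp(-half), 2
    while k < df:
        # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
        q += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
        k += 2
    return q


def chi2_contingency(table):
    """Pearson chi-square (no continuity correction) for an r x c table."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


stat, df, p = chi2_contingency([[10, 20], [30, 40]])
print(round(stat, 4), df)  # 0.7937 1
```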

Transcript isoforms are common in P. vivax and can be expressed in a stage-specific manner

The 5,368 full-length protein-coding transcripts derived from blood-stage parasites were transcribed from 2,869 genes: 1,687 genes (59%) were transcribed into a single isoform, while 1,182 (41%) showed evidence of multiple isoforms (of those, 719 genes were expressed as two isoforms, 247 as three and 216 as four or more isoforms). For most genes with multiple isoforms (n = 819, 70%), the isoforms encoded the same protein sequence and differed only in their UTRs: 1,024 (87%) genes with isoforms differed in their 5’-UTR length and 1,008 (85%) in their 3’-UTR length, and 450 genes had isoforms differing in their UTR introns (Table 2; see also S10 Fig for an example). Isoforms of 363 genes were predicted to encode different proteins, due to an alternative coding start (299), an alternative coding end (311), and/or exon skipping (67) (Table 2).
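The isoform categories summarized in Table 2 can be derived by comparing, within each gene, the UTR lengths and coding-sequence features of its isoforms. The sketch below assumes a hypothetical per-isoform record (UTR lengths, genomic positions of the first and last coding bases, and the predicted protein) and flags a gene for a category when any two of its isoforms differ in that feature; it does not attempt to detect exon skipping explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Isoform:
    """Hypothetical per-isoform summary used for this illustration."""
    utr5_len: int
    utr3_len: int
    cds_first_base: int  # genomic position of the start codon's first base
    cds_last_base: int   # genomic position of the stop codon's last base
    protein: str


def isoform_categories(isoforms):
    """Flag which features vary among the isoforms of one gene."""
    def varies(attr):
        return len({getattr(iso, attr) for iso in isoforms}) > 1

    return {
        "multiple_isoforms": len(isoforms) > 1,
        "alt_5utr": varies("utr5_len"),
        "alt_3utr": varies("utr3_len"),
        "alt_coding_seq": varies("protein"),
        "alt_coding_start": varies("cds_first_base"),
        "alt_coding_end": varies("cds_last_base"),
    }


a = Isoform(120, 300, 1000, 1900, "MKV")
b = Isoform(450, 300, 1000, 1900, "MKV")  # same protein, longer 5'-UTR
print(isoform_categories([a, b]))
```

Counting, across genes, how often each flag is set yields per-category totals of the kind shown in Table 2, under the assumptions stated above.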

Table 2. Summary of isoform types from PacBio predictions.

| Sample | No. of genes | Genes with >1 isoform | Alt 5’-UTR | Alt 3’-UTR | Alt coding seq | Alt coding start | Alt coding end |
|---|---|---|---|---|---|---|---|
| Blood | 2,870 | 1,182 | 1,024 | 1,008 | 363 | 299 | 311 |
| Sporozoites | 238 | 27 | 8 | 9 | 8 | 5 | 6 |


In sporozoites, the 344 full-length protein-coding transcripts were transcribed from 238 genes: 211 genes (89%) were transcribed into a single isoform, while 27 (11%) showed evidence of multiple isoforms (20 genes were expressed as two isoforms, three as three isoforms and four as four or more). For all but eight of these genes, the isoforms were predicted to encode the same protein. Of these eight genes, five had an alternative coding start and six an alternative coding stop, and all had an exon skip or truncation (Table 2).

Because the full-length transcript data were derived from molecules characterized by scRNA-seq, they provide a unique opportunity to examine, in a preliminary way, whether different isoforms are expressed at different stages of parasite development. Using the 10X cell barcodes, we determined the developmental age of each cell by pseudotime (inferred from the Illumina scRNA-seq data). We only considered in this analysis the 636 genes that had two (or more) isoforms, each expressed in at least 50 individual cells. Of these genes, 123 (19%) showed evidence of expressing different isoforms as the parasites developed (S4 Table). These stage-specific isoforms included 62 and 35 genes with differences in 5’- or 3’-UTR length, respectively (see e.g., Figs 4 and S11A), and/or changes in coding sequence (n = 40) (S11B Fig). These findings suggest that P. vivax uses alternative transcription start or termination sites as a means of transcriptional or translational regulation between stages.

Fig 4 — **(A)** Each panel shows the PacBio reads mapped to glutaredoxin 1 (PVP01_0833900), split into four groups according to the stage of the parasites they derived from: early trophozoites, late trophozoites, schizonts and female gametocytes (the terminology used for each group reflects developmental categories based on pseudotime analysis and might not correspond exactly to stages determined by microscopy). Female gametocytes express glutaredoxin 1 from a transcription start site (TSS) further upstream than that used by asexual parasites, and the resulting transcripts have an additional intron in the 5’-UTR (red box). (Note also the presence of an alternatively spliced intron in the 3’-UTR of some transcripts.) **(B)** PCAs showing that the short (blue) and long (red) isoforms of glutaredoxin 1 are expressed at different stages of parasite development, with the long isoform almost exclusively expressed in female gametocytes.

Conclusion

Long read PacBio sequencing coupled with short read Illumina sequencing of single cell RNA-seq libraries provides a robust method to detect and characterize full-length transcripts and to identify mRNA isoforms expressed at different developmental stages. These simple modifications of existing laboratory protocols for generating 10X Genomics scRNA-seq libraries can easily be applied to other eukaryotic cells and could be invaluable for examining variations in gene expression in organisms that are difficult to study in the laboratory and display a complex life cycle. This study also yielded novel insights into the variations and regulation of gene expression in P. vivax, providing a solid framework to improve our understanding of P. vivax biology. In particular, it highlighted the presence of extensive and incompletely annotated UTRs, and of frequent UTR introns, that were sometimes expressed in a stage-specific manner. The observation that different stages use different start sites to transcribe the same gene provides new insights into how gene expression may be regulated throughout the Plasmodium life cycle. In addition, the extensive variations observed in 3’-UTR lengths and the presence of introns in these regions may indicate important roles in mRNA stability or translation efficiency and, consequently, in the regulation of Plasmodium parasite biology.

Materials and methods

Ethics statement

All animal procedures were conducted in accordance with the National Institutes of Health (NIH) guidelines and regulations [42], under protocols approved by the National Institute of Allergy and Infectious Diseases (NIAID) Animal Care and Use Committee (ACUC) (Animal study NIAID LMVR15). Animals were purchased from NIH-approved sources and transported and housed according to the Guide for the Care and Use of Laboratory Animals [42].

Animal studies and sample collection

We infected two splenectomized Saimiri boliviensis monkeys with the Chesson strain of P. vivax using parasitized erythrocytes from cryopreserved stocks. Once they developed a parasitemia >0.1%, we collected 1 mL of blood from the femoral vein of each monkey after anesthesia with 10 mg/kg of ketamine and processed the blood samples on MACS LS columns as previously described [24]. Note that this enrichment procedure, which relies on the paramagnetic properties of the hemozoin generated by maturing blood-stage parasites, fails to capture ring-stage parasites.

Two blood samples from two additional Saimiri boliviensis monkeys coinfected with an NIH-1993 clone [24] and the Chesson P. vivax strain were used for membrane feeding of Anopheles stephensi and Anopheles freeborni. Salivary gland sporozoites were collected from each feeding at 21 days post-feed: 50 female mosquitoes were anesthetized on ice, and their salivary glands were dissected in PBS under a stereomicroscope. The salivary glands were transferred to a low-retention tube (Protein LoBind Tube; Eppendorf) containing PBS, homogenized with a disposable pestle, spun down, washed, resuspended, and quantified.

10X single cell RNA-sequencing library preparation and sequencing

An estimated 3,000 infected red blood cells or sporozoites from each sample were loaded onto a 10X Chromium Controller to prepare scRNA-seq libraries according to the manufacturer’s instructions. We then generated 57–75 million paired-end reads from each library using an Illumina NovaSeq.

In addition, an aliquot of the cDNA prior to fragmentation was amplified by eight additional PCR cycles before preparation of a PacBio library using the SMRTbell Express Template Prep Kit 2.0. We then generated 196–328 million reads from each library using a PacBio Sequel II.

Short read analysis and single cell characterization

Following Illumina sequencing, the short reads were processed as described previously [24]. Briefly, we mapped all reads to the P01 P. vivax genome [36] using HISAT2 [43] with default parameters, except for a maximum intron length of 5,000 bp. We then removed PCR duplicates by identifying reads with identical barcode, unique molecular identifier (UMI) and mapping coordinates. We assigned each unique read to i) a cell based on its barcode and ii) a 500 bp window based on its genomic position. Only cells defined by more than 5,000 unique reads (blood stages) or more than 250 unique reads (sporozoites) were analyzed further. Count matrices were generated, and principal component analysis was performed, using in-house scripts.
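The duplicate removal, 500 bp binning and cell filtering steps described above can be expressed compactly. The in-house scripts are not described, so this is a minimal sketch assuming each mapped read has already been reduced to a hypothetical (barcode, UMI, chromosome, strand, position) tuple; unlike many production pipelines, it does not tolerate sequencing errors within UMIs, and it bins positions without regard to strand.

```python
from collections import Counter, defaultdict


def count_matrix(reads, min_unique_reads=5000, window=500):
    """Deduplicate reads and count unique reads per cell and genomic window.

    reads: iterable of (barcode, umi, chrom, strand, pos) tuples (assumed layout).
    Returns {barcode: Counter({(chrom, window_index): n_unique_reads})}
    restricted to cells with more than `min_unique_reads` unique reads.
    """
    # Identical barcode, UMI and mapping coordinates = one molecule (PCR duplicate).
    unique = set(reads)
    per_cell = defaultdict(Counter)
    for barcode, _umi, chrom, _strand, pos in unique:
        per_cell[barcode][(chrom, pos // window)] += 1
    return {
        bc: counts for bc, counts in per_cell.items()
        if sum(counts.values()) > min_unique_reads
    }


# Toy reads: the first two are PCR duplicates of the same molecule.
toy_reads = [
    ("AAA", "u1", "chr1", "+", 100),
    ("AAA", "u1", "chr1", "+", 100),
    ("AAA", "u2", "chr1", "+", 450),
    ("AAA", "u3", "chr1", "+", 600),
    ("CCC", "u1", "chr1", "+", 100),
]
matrix = count_matrix(toy_reads, min_unique_reads=2)
print({bc: dict(c) for bc, c in matrix.items()})
```

In this toy run, cell AAA keeps three unique reads (two in window 0, one in window 1), while cell CCC, with a single unique read, is filtered out; the sporozoite analysis would simply use `min_unique_reads=250`.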

Long read analysis

The entire analytical pipeline for processing and analyzing the long-read data is described in S5 Fig. Briefly, the raw PacBio reads were first collapsed into circular consensus sequences (CCS) using smrtanalysis [44], and only CCS supported by more than 10 passes were considered for further analysis. We then compared the 10X barcode of each CCS read with those obtained after Illumina sequencing and kept all reads matching the barcode of one of the cells characterized by more than 5,000 unique Illumina reads. To account for the higher error rate of PacBio sequencing, we allowed up to one nucleotide mismatch in the barcode sequence. After trimming the 10X adapters, 10X barcodes and polyA tails using custom scripts, we mapped all CCS reads to the P. vivax P01 genome using minimap2 [45] with the cDNA parameters and a k-mer size (k) of 14. PCR duplicates were removed as described above. Only reads mapped to the P01 genome over more than 50% of their length were kept for further analysis.
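Matching PacBio barcodes to the Illumina-defined cells while allowing one mismatch can be done by enumerating all single-substitution neighbors of each observed barcode. This sketch is an illustration under assumptions not stated in the text: it considers substitutions only (no insertions or deletions, which are also common in PacBio reads), and it rejects a barcode whose one-mismatch neighborhood contains more than one valid cell barcode. The toy barcodes are shorter than real 10X barcodes.

```python
def match_barcode(observed, valid_barcodes, alphabet="ACGT"):
    """Return the unique valid barcode within Hamming distance 1, else None.

    Exact matches are returned directly; ambiguous one-mismatch matches
    (more than one valid neighbor) are rejected, which is an assumption.
    """
    if observed in valid_barcodes:
        return observed
    hits = set()
    for i, original in enumerate(observed):
        for base in alphabet:
            if base == original:
                continue
            candidate = observed[:i] + base + observed[i + 1:]
            if candidate in valid_barcodes:
                hits.add(candidate)
    return hits.pop() if len(hits) == 1 else None


# Hypothetical 8-nt barcodes standing in for the Illumina-defined cells.
cells = {"ACGTACGT", "TTTTCCCC"}
print(match_barcode("ACGTACGA", cells))  # ACGTACGT
```

For whitelists of thousands of cells, precomputing a neighbor-to-barcode dictionary once would avoid regenerating candidates for every read; the per-read enumeration is kept here for clarity.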

Transcript and protein identification

Transcript prediction was performed separately for each sample using StringTie2 [46,47]. Mapping files were split by strand (forward and reverse reads) and by exon structure (single- and multi-exon reads). Each of the four resulting files was processed separately with StringTie2 using the default long-read parameters. Only transcripts supported by more than 10 reads in each sample were considered for further processing (separately for the blood-stage and sporozoite samples). Transcript predictions were then compared and collapsed across samples into a single GTF file for the blood-stage and for the sporozoite samples using gffcompare and custom scripts. We then used TransDecoder [48] to identify putative protein-coding sequences from the predicted transcript sequences. Finally, we compared the predicted protein sequences to those annotated in the most current P. vivax P01 genome (v54) using BLASTP and custom scripts.
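The four-way split of the alignment files can be decided per alignment from the SAM record: spliced alignments contain `N` (skipped region) operations in their CIGAR string. A sketch under stated assumptions: strand is taken from SAM flag 0x10 (for spliced cDNA alignments, minimap2's `ts` tag may be a better indicator of transcript strand, so this is a simplification), unmapped, secondary and supplementary records are skipped, and header lines are dropped, although a real output SAM file would need them.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def alignment_group(flag, cigar):
    """Return e.g. ('reverse', 'multi_exon') for one SAM alignment."""
    strand = "reverse" if flag & 0x10 else "forward"
    n_introns = sum(1 for _length, op in CIGAR_OP.findall(cigar) if op == "N")
    exons = "multi_exon" if n_introns else "single_exon"
    return strand, exons


def split_sam_lines(sam_lines):
    """Group SAM record lines into the four (strand, exon-count) bins."""
    groups = {(s, e): [] for s in ("forward", "reverse")
              for e in ("single_exon", "multi_exon")}
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header lines would be copied to every output file
        fields = line.rstrip("\n").split("\t")
        flag, cigar = int(fields[1]), fields[5]
        if flag & (0x4 | 0x100 | 0x800) or cigar == "*":
            continue  # skip unmapped, secondary and supplementary records
        groups[alignment_group(flag, cigar)].append(line)
    return groups


print(alignment_group(16, "50M200N50M"))  # ('reverse', 'multi_exon')
```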

Stage-specific transcript analysis

To identify isoforms expressed in a stage-specific manner, we first assigned each PacBio read to a specific isoform with BLAT, using all predicted StringTie2 transcripts (including non-coding and incomplete transcripts) and retaining the match with the greatest overall identity; only reads aligned with >90% identity were considered assigned. We then identified the pseudotime of the cell expressing each transcript by matching its 10X cell (GEM) barcode to the Illumina data. For each gene with two (or more) isoforms, each expressed in more than 50 cells, we tested for qualitative differences in isoform expression associated with parasite development by comparing, with a Kolmogorov-Smirnov test, the pseudotime ranks of the cells expressing the first isoform with those of the cells expressing the second isoform.
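The isoform assignment and the Kolmogorov-Smirnov comparison can be sketched as follows. This is an illustrative sketch, not the authors' implementation: BLAT hits are assumed to be summarized as hypothetical `(transcript, percent_identity)` pairs per read, and pseudotime ranks are assumed to be already available for each cell (the pseudotime method itself is not described here). In practice `scipy.stats.ks_2samp` would be the usual choice; this pure-Python version uses the standard asymptotic (Numerical Recipes-style) approximation for the p-value, which is less accurate for small samples.

```python
import math
from typing import List, Optional, Sequence, Tuple

MIN_IDENTITY = 90.0  # reads must align with >90% identity to be assigned
MIN_CELLS = 50       # each isoform must be expressed in more than 50 cells


def best_isoform(hits: List[Tuple[str, float]],
                 min_identity: float = MIN_IDENTITY) -> Optional[str]:
    """Return the transcript with the highest identity, or None if no hit passes."""
    if not hits:
        return None
    transcript, identity = max(hits, key=lambda hit: hit[1])
    return transcript if identity > min_identity else None


def kolmogorov_pvalue(lam: float) -> float:
    """Asymptotic Kolmogorov tail probability Q_KS(lam)."""
    total, sign = 0.0, 1.0
    for k in range(1, 101):
        term = sign * 2.0 * math.exp(-2.0 * (k * lam) ** 2)
        total += term
        if abs(term) < 1e-10:
            return min(max(total, 0.0), 1.0)
        sign = -sign
    return 1.0  # series did not converge (lam near 0): no evidence of a difference


def ks_two_sample(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Two-sample KS statistic D and its asymptotic p-value."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Advance both empirical CDFs past every value equal to x (handles ties).
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    ne = math.sqrt(n * m / (n + m))
    return d, kolmogorov_pvalue((ne + 0.12 + 0.11 / ne) * d)


def compare_isoforms(ranks_1: Sequence[float], ranks_2: Sequence[float],
                     min_cells: int = MIN_CELLS) -> Optional[Tuple[float, float]]:
    """KS test on pseudotime ranks of cells expressing two isoforms of one gene."""
    if len(ranks_1) <= min_cells or len(ranks_2) <= min_cells:
        return None  # too few cells to test this isoform pair
    return ks_two_sample(ranks_1, ranks_2)
```

Comparing ranks rather than raw pseudotime values makes the test insensitive to the scale of the pseudotime axis.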

Supporting information

S1 Fig. Example of Illumina and PacBio data generated from the same scRNA-seq library.

The top panel shows the data generated by Illumina sequencing and displays typical peaks corresponding to the 3’-end of each expressed transcript. The middle panel shows data generated from the same mRNAs using PacBio sequencing and illustrates how generating full-length transcripts improves the interpretation of scRNA-seq data for organisms with incomplete gene annotations.

(TIF)


S2 Fig. Principal component analysis showing the relationships among individual parasite cells characterized by scRNA-seq (using the Illumina data).

Top row: each dot is a single-cell blood-stage transcriptome, positioned according to its gene expression profile and colored according to the expression of stage markers (each cell being assigned to the marker with the highest expression): red–early trophozoites, green–late trophozoites, purple–schizonts, turquoise–female gametocytes. Note that few, if any, ring-stage parasites are included because of the enrichment method used (see Materials and Methods). Bottom row: single-cell sporozoite data.

(TIF)


S3 Fig. Correlation between the number of reads obtained by Illumina and PacBio.

The scatterplot shows the correlation between the number of Illumina reads (x-axis) and PacBio reads (y-axis) obtained from each cell (individual black dots). Each panel represents the data for a different sample.

(TIF)


S4 Fig. Predicted protein length distributions.

Distribution of the length (in amino acids) of the protein-coding sequences predicted from the PacBio transcripts (in red) and of the protein-coding sequences annotated in the P01 P. vivax genome (in blue).

(TIF)


S5 Fig. Summary of the bioinformatic pipeline used for processing the PacBio reads and for predicting transcripts.

(TIF)


S6 Fig. Comparison of the predicted proteins with annotated proteins.

Distribution of the percentage alignment (x-axis) of each predicted protein-coding sequence with the most similar protein sequence annotated in the P01 genome. Left: blood-stage transcripts. Right: sporozoite transcripts. Note that the y-axis is truncated: the right-most bars (perfect matches) reach 400 and 250 in the left and right panels, respectively.

(TIF)


S7 Fig. Stage-specific expression of the isoforms of the cytochrome b5-like heme/steroid binding protein (PVP01_0716500).

The left PCA shows blood-stage and sporozoite parasites jointly displayed according to their gene expression profiles. The right figure shows the same PCA, with each parasite colored according to which cytochrome b5-like heme/steroid binding protein isoform it expresses: red–“sporozoite” isoform, blue–“blood-stage” isoform.

(TIF)


S8 Fig. UTR length and expression.

Correlation between the length of a transcript’s UTR (x-axis, in bp) and its level of expression determined by Illumina data (y-axis). Note that only genes expressing a single isoform are included in this analysis.

(TIF)


S9 Fig. UTR kmer analysis.

Comparison of the abundance of all 5-mers in gene promoters, 5’-UTRs and 3’-UTRs. Note that most motifs with different abundances (i.e., deviating from the diagonals) are either repeated sequences or contain a start codon (ATG).

(TIF)

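The 5-mer comparison summarized in S9 Fig can be sketched as follows. This is a hypothetical illustration, not the authors' code: it assumes promoter, 5’-UTR and 3’-UTR sequences have already been extracted (the promoter definition is not given here), normalizes counts to relative abundances over all 4^5 = 1,024 possible 5-mers, and ranks motifs by absolute log2 ratio with a small pseudocount, which is one reasonable way to flag points far from the diagonal.

```python
import math
from collections import Counter
from itertools import product


def kmer_frequencies(sequences, k=5):
    """Relative abundance of every DNA k-mer (all 4**k, including absent ones)."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
                counts[kmer] += 1
    total = sum(counts.values())
    return {
        "".join(p): (counts["".join(p)] / total if total else 0.0)
        for p in product("ACGT", repeat=k)
    }


def most_deviating(freq_a, freq_b, top=10, pseudo=1e-6):
    """k-mers whose abundance differs most between two sets (largest |log2 ratio|)."""
    ratio = {km: math.log2((freq_a[km] + pseudo) / (freq_b[km] + pseudo)) for km in freq_a}
    return sorted(ratio, key=lambda km: abs(ratio[km]), reverse=True)[:top]
```

Usage: `most_deviating(kmer_frequencies(utr5_seqs), kmer_frequencies(utr3_seqs))`, where `utr5_seqs` and `utr3_seqs` are hypothetical lists of sequences.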

S10 Fig. Example of a transcript with unannotated UTR introns.

The figure shows PacBio reads (in blue) corresponding to the annotated mRNA for coenzyme Q-binding protein COQ10 homolog (PVP01_0113000) but with an unannotated intron in the 3’-UTR (red box) as well as, for a subset of the mRNAs, a second intron in the 5’-UTR (blue box).

(TIF)


S11 Fig. Examples of isoforms expressed in a stage-specific manner.

Each panel shows the PacBio reads mapped to a selected locus, split into four groups according to the stage of the parasites from which they derived: early trophozoites, late trophozoites, schizonts and female gametocytes (from top to bottom). (A) Early trophozoites express the Ham 1-like protein (PVP01_0316500) from a more upstream TSS than the other stages, and the resulting transcripts have a longer 5’-UTR containing five introns (red box). (B) The isoforms expressed from suppressor of kinetochore protein 1 (PVP01_1105000) result in different predicted protein-coding sequences: some transcripts expressed exclusively in early trophozoites retain the third intron (red box), leading to a different open reading frame (blue bars at the bottom).

(TIF)


S1 Table. Sequencing and transcript predictions results.

Table listing numbers from each step of sequencing analysis and transcript prediction.

(XLSX)


S2 Table. List of all predicted transcripts.

Table listing all predicted transcripts, their nucleotide sequence, amino acid sequence (if applicable), most similar PVP01 gene name and description (if applicable), their UTR lengths, and the number of introns.

(XLSX)


S3 Table. Results of Gene Ontology analysis.

Top hits from GO analysis.

(XLSX)


S4 Table. List of predicted genes with differentially expressed isoforms.

(XLSX)


S1 Data. Gene predictions.

GTF file of all predicted protein-coding transcripts combined with the current reference.

(GTF)


Acknowledgments

We thank the technicians, caretakers, and veterinarians of the Division of Veterinary Resources and of the insectary of the Laboratory of Malaria and Vector Research, National Institute of Allergy and Infectious Diseases, for animal care, technical assistance, and mosquito rearing; and S. Ott, H. Bowen, L. Sadzewicz and L. Tallon in the Genomic Resource Center at the University of Maryland School of Medicine for their support with Illumina and PacBio sequencing.

Data Availability

All sequence data generated in this study are deposited at the Sequence Read Archive under the BioProject PRJNA863611. https://www.ncbi.nlm.nih.gov/bioproject/PRJNA863611/. Custom scripts are available through github at https://github.com/bhazzard11/Single-Cell-PacBio.

Funding Statement

This work was supported by an award from the National Institutes of Health to the University of Maryland School of Medicine (U19 AI110820 to DS) and by the Intramural Research Program of the National Institute of Allergy and Infectious Diseases, National Institutes of Health. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. World Malaria Report 2021. Geneva: World Health Organization; 2021. Licence: CC BY-NC-SA 3.0 IGO.
2. Anstey NM, Guerra CA, Yeung S, Price RN, Tjitra E, White NJ. Vivax Malaria: Neglected and Not Benign. Am J Trop Med Hyg. 2007 Dec 1;77(6 Suppl):79–87.
3. Bermúdez M, Moreno-Pérez DA, Arévalo-Pinzón G, Curtidor H, Patarroyo MA. Plasmodium vivax in vitro continuous culture: the spoke in the wheel. Malar J. 2018 Dec;17(1):301. doi: 10.1186/s12936-018-2456-5
4. Carlton JM, Adams JH, Silva JC, Bidwell SL, Lorenzi H, Caler E, et al. Comparative genomics of the neglected human malaria parasite Plasmodium vivax. Nature. 2008 Oct;455(7214):757–63. doi: 10.1038/nature07327
5. Neafsey DE, Galinsky K, Jiang RHY, Young L, Sykes SM, Saif S, et al. The malaria parasite Plasmodium vivax exhibits greater genetic diversity than Plasmodium falciparum. Nat Genet. 2012 Sep;44(9):1046–50. doi: 10.1038/ng.2373
6. Chan ER, Barnwell JW, Zimmerman PA, Serre D. Comparative Analysis of Field-Isolate and Monkey-Adapted Plasmodium vivax Genomes. Escalante AA, editor. PLoS Negl Trop Dis. 2015 Mar 13;9(3):e0003566. doi: 10.1371/journal.pntd.0003566
7. Pearson RD, Amato R, Auburn S, Miotto O, Almagro-Garcia J, Amaratunga C, et al. Genomic analysis of local variation and recent evolution in Plasmodium vivax. Nat Genet. 2016 Aug;48(8):959–64. doi: 10.1038/ng.3599
8. Auburn S, Benavente ED, Miotto O, Pearson RD, Amato R, Grigg MJ, et al. Genomic analysis of a pre-elimination Malaysian Plasmodium vivax population reveals selective pressures and changing transmission dynamics. Nat Commun. 2018 Dec;9(1):2585. doi: 10.1038/s41467-018-04965-4
9. Popovici J, Friedrich LR, Kim S, Bin S, Run V, Lek D, et al. Genomic Analyses Reveal the Common Occurrence and Complexity of Plasmodium vivax Relapses in Cambodia. Miller LH, editor. mBio. 2018 Mar 7;9(1):e01888-17.
10. Bozdech Z, Mok S, Hu G, Imwong M, Jaidee A, Russell B, et al. The transcriptome of Plasmodium vivax reveals divergence and diversity of transcriptional regulation in malaria parasites. Proc Natl Acad Sci. 2008 Oct 21;105(42):16290–5.
11. Zhu L, Mok S, Imwong M, Jaidee A, Russell B, Nosten F, et al. New insights into the Plasmodium vivax transcriptome using RNA-Seq. Sci Rep. 2016 Apr;6(1):20498. doi: 10.1038/srep20498
12. Kim A, Popovici J, Menard D, Serre D. Plasmodium vivax transcriptomes reveal stage-specific chloroquine response and differential regulation of male and female gametocytes. Nat Commun. 2019 Dec;10(1):371. doi: 10.1038/s41467-019-08312-z
13. Collins WE. Nonhuman primate models. II. Infection of Saimiri and Aotus monkeys with Plasmodium vivax. Methods Mol Med. 2002;72:85–92. doi: 10.1385/1-59259-271-6:85
14. Young MD, Porter JA, Johnson CM. Plasmodium vivax Transmitted from Man to Monkey to Man. Science. 1966 Aug 26;153(3739):1006–7. doi: 10.1126/science.153.3739.1006
15. Kim A, Popovici J, Vantaux A, Samreth R, Bin S, Kim S, et al. Characterization of P. vivax blood stage transcriptomes from field isolates reveals similarities among infections and complex gene isoforms. Sci Rep. 2017 Dec;7(1):7761. doi: 10.1038/s41598-017-07275-9
16. Tebben K, Dia A, Serre D. Determination of the Stage Composition of Plasmodium Infections from Bulk Gene Expression Data. Langelier CR, editor. mSystems. 2022 Aug 30;7(4):e00258-22.
17. Reid AJ, Talman AM, Bennett HM, Gomes AR, Sanders MJ, Illingworth CJR, et al. Single-cell RNA-seq reveals hidden transcriptional variation in malaria parasites. eLife. 2018 Mar 27;7:e33105. doi: 10.7554/eLife.33105
18. Poran A, Nötzel C, Aly O, Mencia-Trinchant N, Harris CT, Guzman ML, et al. Single-cell RNA sequencing reveals a signature of sexual commitment in malaria parasites. Nature. 2017 Nov;551(7678):95–9. doi: 10.1038/nature24280
19. Howick VM, Russell AJC, Andrews T, Heaton H, Reid AJ, Natarajan K, et al. The Malaria Cell Atlas: Single parasite transcriptomes across the complete Plasmodium life cycle. Science. 2019 Aug 23;365(6455):eaaw2619.
20. Brancucci NMB, De Niz M, Straub TJ, Ravel D, Sollelis L, Birren BW, et al. Probing Plasmodium falciparum sexual commitment at the single-cell level. Wellcome Open Res. 2018 Oct 17;3:70. doi: 10.12688/wellcomeopenres.14645.4
21. Bogale HN, Pascini TV, Kanatani S, Sá JM, Wellems TE, Sinnis P, et al. Transcriptional heterogeneity and tightly regulated changes in gene expression during Plasmodium berghei sporozoite development. Proc Natl Acad Sci. 2021 Mar 9;118(10):e2023438118.
22. Real E, Howick VM, Dahalan FA, Witmer K, Cudini J, Andradi-Brown C, et al. A single-cell atlas of Plasmodium falciparum transmission through the mosquito. Nat Commun. 2021 Dec;12(1):3196. doi: 10.1038/s41467-021-23434-z
23. Ruberto AA, Bourke C, Merienne N, Obadia T, Amino R, Mueller I. Single-cell RNA sequencing reveals developmental heterogeneity among Plasmodium berghei sporozoites. Sci Rep. 2021 Dec;11(1):4127. doi: 10.1038/s41598-021-82914-w
24. Sá JM, Cannon MV, Caleon RL, Wellems TE, Serre D. Single-cell transcription analysis of Plasmodium vivax blood-stage parasites identifies stage- and species-specific profiles of expression. Duffy M, editor. PLOS Biol. 2020 May 4;18(5):e3000711. doi: 10.1371/journal.pbio.3000711
25. Ruberto AA, Bourke C, Vantaux A, Maher SP, Jex A, Witkowski B, et al. Single-cell RNA sequencing of Plasmodium vivax sporozoites reveals stage- and species-specific transcriptomic signatures. Mireji PO, editor. PLoS Negl Trop Dis. 2022 Aug 4;16(8):e0010633. doi: 10.1371/journal.pntd.0010633
26. Baran-Gale J, Chandra T, Kirschner K. Experimental design for single-cell RNA sequencing. Brief Funct Genomics. 2018;17(4). doi: 10.1093/bfgp/elx035
27. Zheng GXY, Terry JM, Belgrader P, Ryvkin P, Bent ZW, Wilson R, et al. Massively parallel digital transcriptional profiling of single cells. Nat Commun. 2017 Apr;8(1):14049. doi: 10.1038/ncomms14049
28. Siegel SV, Chappell L, Hostetler JB, Amaratunga C, Suon S, Böhme U, et al. Analysis of Plasmodium vivax schizont transcriptomes from field isolates reveals heterogeneity of expression of genes involved in host-parasite interactions. Sci Rep. 2020 Dec;10(1):16667. doi: 10.1038/s41598-020-73562-7
29. Sharon D, Tilgner H, Grubert F, Snyder M. A single-molecule long-read survey of the human transcriptome. Nat Biotechnol. 2013 Nov;31(11):1009–14. doi: 10.1038/nbt.2705
30. Tian L, et al. Comprehensive characterization of single-cell full-length isoforms in human and mouse with long-read sequencing. Genome Biol. 2021;22. doi: 10.1186/s13059-021-02525-6
31. Shields EJ, Sorida M, Sheng L, Sieriebriennikov B, Ding L, Bonasio R. Genome annotation with long RNA reads reveals new patterns of gene expression and improves single-cell analyses in an ant brain. BMC Biol. 2021 Dec;19(1):254. doi: 10.1186/s12915-021-01188-w
32. Westoby J, Artemov P, Hemberg M, Ferguson-Smith A. Obstacles to detecting isoforms using full-length scRNA-seq data. Genome Biol. 2020 Dec;21(1):74. doi: 10.1186/s13059-020-01981-w
33. Rhoads A, Au KF. PacBio Sequencing and Its Applications. Genomics Proteomics Bioinformatics. 2015 Oct;13(5):278–89. doi: 10.1016/j.gpb.2015.08.002
34. Healey HM, Bassham S, Cresko WA. Single-cell Iso-Sequencing enables rapid genome annotation for scRNAseq analysis. Sanchez Alvarado A, editor. Genetics. 2022 Mar 3;220(3):iyac017. doi: 10.1093/genetics/iyac017
35. Gupta I, Collier PG, Haase B, Mahfouz A, Joglekar A, Floyd T, et al. Single-cell isoform RNA sequencing characterizes isoforms in thousands of cerebellar cells. Nat Biotechnol. 2018 Dec;36(12):1197–202. doi: 10.1038/nbt.4259
36. Auburn S, Böhme U, Steinbiss S, Trimarsanto H, Hostetler J, Sanders M, et al. A new Plasmodium vivax reference sequence with improved assembly of the subtelomeres reveals an abundance of pir genes. Wellcome Open Res. 2016 Nov 15;1:4. doi: 10.12688/wellcomeopenres.9876.1
37. Adjalley SH, Chabbert CD, Klaus B, Pelechano V, Steinmetz LM. Landscape and Dynamics of Transcription Initiation in the Malaria Parasite Plasmodium falciparum. Cell Rep. 2016 Mar 15;14. doi: 10.1016/j.celrep.2016.02.025
38. Breitbart RE, Andreadis A, Nadal-Ginard B. Alternative splicing: a ubiquitous mechanism for the generation of multiple protein isoforms from single genes. Annu Rev Biochem. 1987 Jun 1;56(1):467–95. doi: 10.1146/annurev.bi.56.070187.002343
39. Pavlovic Djuranovic S, Erath J, Andrews RJ, Bayguinov PO, Chung JJ, Chalker DL, et al. Plasmodium falciparum translational machinery condones polyadenosine repeats. eLife. 2020 May 29;9:e57799. doi: 10.7554/eLife.57799
40. Li J, Cai B, Qi Y, Zhao W, Liu J, Xu R, et al. UTR introns, antisense RNA and differentially spliced transcripts between Plasmodium yoelii subspecies. Malar J. 2016 Dec;15(1):30. doi: 10.1186/s12936-015-1081-9
41. Yeoh LM, Lee VV, McFadden GI, Ralph SA. Alternative Splicing in Apicomplexan Parasites. Sibley D, Garsin DA, editors. mBio. 2019 Feb 26;10(1):e02866-18. doi: 10.1128/mBio.02866-18
42. Hawkins P, Morton DB, Burman O, Dennison N, Honess P, Jennings M, et al. A guide to defining and implementing protocols for the welfare assessment of laboratory animals: eleventh report of the BVAAWF/FRAME/RSPCA/UFAW Joint Working Group on Refinement. Lab Anim. 2011 Jan 1;45(1):1–13. doi: 10.1258/la.2010.010031
43. Kim D, Langmead B, Salzberg SL. HISAT: a fast spliced aligner with low memory requirements. Nat Methods. 2015 Apr;12(4):357–60. doi: 10.1038/nmeth.3317
44. Bayega A, Fahiminiya S, Oikonomopoulos S, Ragoussis J. Current and Future Methods for mRNA Analysis: A Drive Toward Single Molecule Sequencing. Methods Mol Biol. 2018;1783:209–41. doi: 10.1007/978-1-4939-7834-2_11
45. Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018 May 10;34(18):3094–100. doi: 10.1093/bioinformatics/bty191
46. Kovaka S, Zimin AV, Pertea GM, Razaghi R, Salzberg SL, Pertea M. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 2019 Dec;20(1):278. doi: 10.1186/s13059-019-1910-1
47. Pertea M, Kim D, Pertea GM, Leek JT, Salzberg SL. Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nat Protoc. 2016 Sep;11(9):1650–67. doi: 10.1038/nprot.2016.095
48. Haas BJ, Papanicolaou A, Yassour M, Grabherr M, Blood PD, Bowden J, et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat Protoc. 2013 Aug;8(8):1494–512. doi: 10.1038/nprot.2013.084

PLoS Negl Trop Dis. doi: 10.1371/journal.pntd.0010991.r001

Decision Letter 0

Ana Rodriguez, Ananias A Escalante

23 Aug 2022

Dear Ms Hazzard,

Thank you very much for submitting your manuscript "Long read single cell RNA sequencing reveals the isoform diversity of Plasmodium vivax transcripts" for consideration at PLOS Neglected Tropical Diseases. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Ananias A. Escalante, PhD

Academic Editor

PLOS Neglected Tropical Diseases

Ana Rodriguez

Section Editor

PLOS Neglected Tropical Diseases

***********************

Reviewer's Responses to Questions

Key Review Criteria Required for Acceptance?

As you describe the new analyses required for acceptance, please consider the following:

Methods

-Are the objectives of the study clearly articulated with a clear testable hypothesis stated?

-Is the study design appropriate to address the stated objectives?

-Is the population clearly described and appropriate for the hypothesis being tested?

-Is the sample size sufficient to ensure adequate power to address the hypothesis being tested?

-Were correct statistical analysis used to support conclusions?

-Are there concerns about ethical or regulatory requirements being met?

Reviewer #1: The data collection methods are clear, although more background could be given to put the methods in context. The code used for the analysis is not available on the github link provided.

Reviewer #2: The methods are adequately described.

Reviewer #3: Minor revisions only (see below)

--------------------

Results

-Does the analysis presented match the analysis plan?

-Are the results clearly and completely presented?

-Are the figures (Tables, Images) of sufficient quality for clarity?

Reviewer #1: The results are clearly presented.

Reviewer #2: The results are clearly and concisely presented.

Reviewer #3: Minor revisions only (see below)

--------------------

Conclusions

-Are the conclusions supported by the data presented?

-Are the limitations of analysis clearly described?

-Do the authors discuss how these data can be helpful to advance our understanding of the topic under study?

-Is public health relevance addressed?

Reviewer #1: Conclusions could be strengthened if they can show that their data can be used to improve the reference and increase mapping rate.

Reviewer #2: The conclusion section is remarkably brief and could be extended. The authors could mention the applicability of this methodology to other parasites, the possible significance of alternative mRNA isoforms and how this could be followed up and how the findings of this study improve our understanding of P. vivax biology.

Reviewer #3: Minor revisions only (see below)

--------------------

Editorial and Data Presentation Modifications?

Use this section for editorial suggestions as well as relatively minor modifications of existing data that would enhance clarity. If the only modifications needed are minor and/or editorial, you may wish to recommend “Minor Revision” or “Accept”.

Reviewer #1: Minor Comments

-The authors should cite papers outside of malaria that have used this technology before; there is no context given for the development of the technology and what it has been used for in other systems.

- The authors justify using pacbio to look at isoforms because the 10x libraries are 3' biased. However, they don't show that this is actually the case for P. vivax. Could the authors show this coverage bias in their data?

-Fig S1 Blood stages: show expression of markers so it is clear how stage was assigned.

Sporozoites: Colour the plot by a marker gene or UMI. Is there a subset of higher quality cells that could be used?

-How have you controlled for different levels of expression in the stage-specific transcript analysis?

Reviewer #2: I have only minor comments for the authors to consider.

1. On line 147, the authors state that the length of the transcripts from asexual stages ranged from 100 to 20,506 bp. A 20 kb transcript is remarkably long, much longer than any annotated gene. What is this transcript and do the authors think this is a genuine mRNA?

2. In Figures 2C, 5A and 5C there appears to be evidence for alternative splicing that is not mentioned in the Figure legend. Is this correct? If so, it might be worth mentioning so that readers are not confused.

3. In Figure 5, the authors mark the figures with the terms “Group A”, “Group B” and “Group C”, while the figure legend explains that these terms refer to early trophozoites, late trophozoites and schizonts respectively. Why not simply mark the figure itself with the stage of the parasites? I don’t see the utility of using the terms “Group A, B, C”.

Reviewer #3: (No Response)

--------------------

Summary and General Comments

Use this section to provide overall comments, discuss strengths/weaknesses of the study, novelty, significance, general execution and scholarship. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. If requesting major revision, please articulate the new experiments that are needed.

Reviewer #1: Hazzard et al have generated short- and long-read scRNAseq data using the standard 10x genomics pipeline in combination with pacbio sequencing in P. vivax blood stages and sporozoites. Their analysis identifies a subset of transcripts that do not match the current reference annotations as well as isoforms that show a stage specific pattern of expression. They identify different UTR lengths between blood-stage parasites and sporozoites, and UTRs that contain introns, which they hypothesize might play a role in gene regulation.

Overall, I found the analysis superficial and lacking in interesting biology. Although this is an interesting dataset, the authors have not invested enough in the analysis to either reveal noteworthy biology or provide the community with a valuable resource dataset. I suggest strengthening the paper as a reference dataset prior to publication.

Major Comments

The authors show that only half (or less) of the reads map within 500 bp of an annotated P. vivax gene and suggest that this is because of incomplete annotation of the reference genome. Can the authors use their data to improve the reference and demonstrate that more of the reads would map if they had better annotations?

line 196: "More in-depth analyses will be required to understand the function of these UTRs but the data presented here will provide a solid foundation to implement these studies." If the authors want this to be a useful dataset they need to provide the actual data. Could the authors provide a gtf with the newly annotated isoforms and UTRs? Can the dataset be integrated into PlasmoDB?

-Three of the five main figures are screenshots from a genome browser tool. I suggest that the authors diversify and include more information in the main figures. For example, where they demonstrate different isoforms between blood stage and sporozoite parasites, what does expression of the different isoforms look like on the PCA and does that help them interpret the biology of the cell clusters within the blood stages?

Reviewer #2: The submission by Hazzard and colleagues describes the application of combined Illumina/PacBio sequencing to 10X single cell transcriptomic analysis of Plasmodium vivax infected red blood cells. P. vivax remains a major cause of morbidity in large regions of the developing world, however analysis of these parasites has been hampered by the inability to raise this species in culture, thus limiting experimental approaches. The application of single cell transcriptomics to parasites isolated from patients or from experimentally infected primates has enabled a much better analysis of gene expression patterns, however technical limitations have resulted in most information being derived from sequences near the 3’ ends of the transcripts. Here the authors attempt to address this issue by using long-read sequencing technology, thereby deriving sequence information from much longer stretches of the transcripts, in many cases obtaining full-length transcript information. This enabled them to identify products of alternative splicing and variations in the 5’ and 3’ UTRs. In addition, by identifying the developmental stage of each individual cell analyzed, they were able to associate stage-specificity with alternative isoforms, something that was difficult previously.

The manuscript reports an important and useful technical advance in the study of Plasmodium vivax. The methodology described here is likely to be applied by many research groups in the future, and the advance will likely also be applicable to other Plasmodium species, including P. falciparum. The manuscript also includes novel discoveries regarding alternative mRNA isoforms, which will undoubtedly be followed up to determine their biological significance. Overall, the manuscript is well written and easily understood.

Reviewer #3: The manuscript by Hazzard and colleagues presents a single-cell sequencing dataset consisting of Plasmodium vivax blood stages and sporozoites, sequenced using two different platforms – traditional Illumina sequencing and PacBio long-read sequencing.

Plasmodium vivax remains a relevant human pathogen and given how few good quality transcriptomics datasets are available, the presented work is certainly of significant value to the parasitology community. Additionally, the combination of the two techniques provides a wealth of information regarding the parasite transcript structure, gene expression changes etc. The rationale of the work is clearly stated and backed up by relevant literature. The manuscript is generally clearly written.

There are, however, a number of details (mostly related to data analysis) that should be clarified in order to justify some of the conclusions of the manuscript. The major ones are:

1) Cell identity:

It is not clear how the different stages of the parasite life cycle were identified in the blood sample. The single-cell data appear to be plotted as PCA components (rather than UMAP or t-SNE graphs, which are the gold standard of dimensionality reduction for this data type) and do not form clear clusters. There is no mention of any attempt at clustering, identification of marker genes, developmental trajectory mapping, etc. Pseudotime analysis is mentioned, but no methods are attached to it, so it is unknown exactly what was done. The sporozoite samples are processed separately, so it is not clear how they relate to other stages. Given the importance of this step for subsequent transcript assignment, the details of this analysis (if done) should be included, ideally pooling the sporozoite samples together with the other stages.

2) Full transcript annotation:

The authors themselves mention that numerous transcripts/peptides predicted from their annotation appear to be truncated, attributing this to “the 10X reverse transcriptase processivity that will hamper synthesis of long cDNA molecules and prevent recovery of full-length transcripts”. In this light, the fact that the vast majority of the new isoforms they identified differ only in the length of the 3’UTR or 5’UTR may suggest that they represent truncated RNA molecules rather than full functional transcript isoforms. The fact that UTRs in general have different lengths between the stages (something not observed in the other Plasmodium datasets, as far as I know) may support this.

In this case, more stringent filtering conditions (or an attempt to validate at least one of the "new" transcripts by, e.g., Northern blot, although I realise that the authors may not have sufficient material or the tools to attempt it) need to be used to identify the novel Plasmodium isoforms.

This should be acknowledged in the manuscript and the general conclusions regarding the transcript UTRs length should be toned down.

3) Differential isoform usage between the life stages.

Again, it is not clear how robust this analysis is without further details. Rather than using tools for isoform usage quantification, the authors chose a custom approach based on the stage distribution of cells containing a given isoform across the life cycle (if I understand correctly). No details of this method and no processed output are given. The manuscript is badly missing a table listing all stage-specific isoforms, the numbers of molecules on which the analysis is based, and the results of the statistical test applied to each of them. More details (or validation of one of the genes with >1 isoform) are required to substantiate the authors' claim regarding stage-specificity.

Additionally:

1) There are no attempts to compare the authors’ dataset to similar ones generated for P. falciparum, a model human malaria species. As the two parasites share very similar syntenic genomes (with 1:1 orthologs for almost all genes) and similar exon/intron structures, such a comparison could be very useful, given that P. falciparum has much better UTR annotation and a number of stage-specific gene expression studies.

2) The authors don’t provide the link to the raw datasets (e.g., SRA, GEO). Given the declaration in the data availability statement, I assume that this is a last-minute omission and that the data will be made accessible to the community.

3) The GitHub repository with the scripts for the data analysis is empty.

4) It is not clear what Fig S7 should represent, as the labelling of the axes is not very helpful. No statistical test is applied and no raw numbers are given, so it is difficult to tell how significant the difference in kmers is. For motif enrichment in specific sequences, a number of standard tools are available (Meme, Ace, etc.).

4. In the asexual stage annotation, the authors identify trophozoites, schizonts and females but not rings and males, which should also be present in the mixed blood stages. Was that an artefact of the experimental design? Can the authors provide reasons for that?

5. Based on the few examples shown in Fig 5, transcripts with multiple isoforms tend to be more highly expressed at the stage in which a new isoform is identified. Given how many genes are differentially expressed between the life stages, it would be useful to include the results of a differential expression analysis of the genes that are differentially spliced, to see whether one phenomenon is connected with the other (e.g., is the increase in expression correlated with a different UTR?). That could also indicate whether some of the isoforms are simply the result of faulty transcript processing that appears whenever a gene is highly expressed.

Regardless of these omissions, the study is well designed and its publication will be of interest to the malaria community. I commend the authors for their effort and look forward to seeing this work in print.

--------------------

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

PLoS Negl Trop Dis. 2022 Dec 16;16(12):e0010991. doi: 10.1371/journal.pntd.0010991.r002

Author response to Decision Letter 0

27 Oct 2022

Attachment

Submitted filename: Response_PlosNTD_final.docx (DOCX, 498.7 KB)

PLoS Negl Trop Dis. doi: 10.1371/journal.pntd.0010991.r003

Decision Letter 1

Ana Rodriguez, Ananias A Escalante

28 Nov 2022

Dear Ms Hazzard,

We are pleased to inform you that your manuscript 'Long read single cell RNA sequencing reveals the isoform diversity of Plasmodium vivax transcripts' has been provisionally accepted for publication in PLOS Neglected Tropical Diseases.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Neglected Tropical Diseases.

Best regards,

Ananias A. Escalante, PhD

Academic Editor

PLOS Neglected Tropical Diseases

Ana Rodriguez

Section Editor

PLOS Neglected Tropical Diseases

***********************************************************

Reviewer's Responses to Questions

Key Review Criteria Required for Acceptance?

As you describe the new analyses required for acceptance, please consider the following:

Methods

-Are the objectives of the study clearly articulated with a clear testable hypothesis stated?

-Is the study design appropriate to address the stated objectives?

-Is the population clearly described and appropriate for the hypothesis being tested?

-Is the sample size sufficient to ensure adequate power to address the hypothesis being tested?

-Were correct statistical analysis used to support conclusions?

-Are there concerns about ethical or regulatory requirements being met?

Reviewer #1: (No Response)

Reviewer #3: No further comments; all my previous ones were addressed.

**********

Results

-Does the analysis presented match the analysis plan?

-Are the results clearly and completely presented?

-Are the figures (Tables, Images) of sufficient quality for clarity?

Reviewer #1: (No Response)

Reviewer #3: Yes, perhaps more details could be given in the legend of Table S4 (what is the D-crit test?).

**********

Conclusions

-Are the conclusions supported by the data presented?

-Are the limitations of analysis clearly described?

-Do the authors discuss how these data can be helpful to advance our understanding of the topic under study?

-Is public health relevance addressed?

Reviewer #1: (No Response)

Reviewer #3: Yes

**********

Editorial and Data Presentation Modifications?

Reviewer #1: (No Response)

Reviewer #3: No further problems

**********

Summary and General Comments

Reviewer #1: The authors have adequately addressed my previous comments. This will be a useful resource for the malaria community.

Reviewer #3: No additional experiments or analysis needed

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #3: No

PLoS Negl Trop Dis. doi: 10.1371/journal.pntd.0010991.r004

Acceptance letter

Ana Rodriguez, Ananias A Escalante

12 Dec 2022

Dear Ms Hazzard,

We are delighted to inform you that your manuscript, "Long read single cell RNA sequencing reveals the isoform diversity of Plasmodium vivax transcripts," has been formally accepted for publication in PLOS Neglected Tropical Diseases.

We have now passed your article onto the PLOS Production Department who will complete the rest of the publication process. All authors will receive a confirmation email upon publication.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any scientific or type-setting errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript. Note: Proofs for Front Matter articles (Editorial, Viewpoint, Symposium, Review, etc.) are generated on a different schedule and may not be made available as quickly.

Soon after your final files are uploaded, the early version of your manuscript will be published online unless you opted out of this process. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting open-access publishing; we are looking forward to publishing your work in PLOS Neglected Tropical Diseases.

Best regards,

Shaden Kamhawi

co-Editor-in-Chief

PLOS Neglected Tropical Diseases

Paul Brindley

co-Editor-in-Chief

PLOS Neglected Tropical Diseases

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

S1 Fig. Example of Illumina and PacBio data generated from the same scRNA-seq library.

(TIF, 663.8 KB)

S2 Fig. Principal component analysis showing the relationships among individual parasite cells characterized by scRNA-seq (using the Illumina data).

(TIF, 524 KB)

S3 Fig. Correlation between the number of reads obtained by Illumina and PacBio.

(TIF, 255.3 KB)

S4 Fig. Predicted protein length distributions.

(TIF, 304.1 KB)

S5 Fig. Summary of the bioinformatic pipeline used for processing the PacBio reads and for predicting transcripts.

(TIF, 506.2 KB)

S6 Fig. Comparison of the predicted proteins with annotated proteins.

(TIF, 614.7 KB)

S7 Fig. Stage-specific expression of the isoforms of the cytochrome b5-like heme/steroid binding protein (PVP01_0716500).

(TIF, 705.9 KB)

S8 Fig. UTR length and expression.

(TIF, 128.6 KB)

S9 Fig. UTR kmer analysis.

(TIF, 395.4 KB)

S10 Fig. Example of a transcript with unannotated UTR introns.

(TIF, 650 KB)

S11 Fig. Examples of isoforms expressed in a stage-specific manner.

(TIF, 939 KB)

S1 Table. Sequencing and transcript predictions results.

Table listing the numbers obtained at each step of the sequencing analysis and transcript prediction.

(XLSX, 10.8 KB)

S2 Table. List of all predicted transcripts.

(XLSX, 9.4 MB)

S3 Table. Results of Gene Ontology analysis.

Top hits from GO analysis.

(XLSX, 41.7 KB)

S4 Table. List of predicted genes with differentially expressed isoforms.

(XLSX, 24.7 KB)

S1 Data. Gene predictions.

GTF file of all predicted protein-coding transcripts combined with the current reference.

(GTF, 11.2 MB)

Attachment

Submitted filename: Response_PlosNTD_final.docx (DOCX, 498.7 KB)

Data Availability Statement

References

1. World Malaria Report 2021. Geneva: World Health Organization; 2021. Licence: CC BY-NC-SA 3.0 IGO.

2. Anstey NM, Guerra CA, Yeung S, Price RN, Tjitra E, White NJ. Vivax Malaria: Neglected and Not Benign. Am J Trop Med Hyg. 2007 Dec 1;77(6_Suppl):79–87.

3. Bermúdez M, Moreno-Pérez DA, Arévalo-Pinzón G, Curtidor H, Patarroyo MA. Plasmodium vivax in vitro continuous culture: the spoke in the wheel. Malar J. 2018 Dec;17(1):301. doi: 10.1186/s12936-018-2456-5

4. Carlton JM, Adams JH, Silva JC, Bidwell SL, Lorenzi H, Caler E, et al. Comparative genomics of the neglected human malaria parasite Plasmodium vivax. Nature. 2008 Oct;455(7214):757–63. doi: 10.1038/nature07327

5. Neafsey DE, Galinsky K, Jiang RHY, Young L, Sykes SM, Saif S, et al. The malaria parasite Plasmodium vivax exhibits greater genetic diversity than Plasmodium falciparum. Nat Genet. 2012 Sep;44(9):1046–50. doi: 10.1038/ng.2373

6. Chan ER, Barnwell JW, Zimmerman PA, Serre D. Comparative Analysis of Field-Isolate and Monkey-Adapted Plasmodium vivax Genomes. Escalante AA, editor. PLoS Negl Trop Dis. 2015 Mar 13;9(3):e0003566. doi: 10.1371/journal.pntd.0003566

7. Pearson RD, Amato R, Auburn S, Miotto O, Almagro-Garcia J, Amaratunga C, et al. Genomic analysis of local variation and recent evolution in Plasmodium vivax. Nat Genet. 2016 Aug;48(8):959–64. doi: 10.1038/ng.3599

8. Auburn S, Benavente ED, Miotto O, Pearson RD, Amato R, Grigg MJ, et al. Genomic analysis of a pre-elimination Malaysian Plasmodium vivax population reveals selective pressures and changing transmission dynamics. Nat Commun. 2018 Dec;9(1):2585. doi: 10.1038/s41467-018-04965-4

9. Popovici J, Friedrich LR, Kim S, Bin S, Run V, Lek D, et al. Genomic Analyses Reveal the Common Occurrence and Complexity of Plasmodium vivax Relapses in Cambodia. Miller LH, editor. mBio. 2018 Mar 7;9(1):e01888–17.

10. Bozdech Z, Mok S, Hu G, Imwong M, Jaidee A, Russell B, et al. The transcriptome of Plasmodium vivax reveals divergence and diversity of transcriptional regulation in malaria parasites. Proc Natl Acad Sci. 2008 Oct 21;105(42):16290–5.

11. Zhu L, Mok S, Imwong M, Jaidee A, Russell B, Nosten F, et al. New insights into the Plasmodium vivax transcriptome using RNA-Seq. Sci Rep. 2016 Apr;6(1):20498. doi: 10.1038/srep20498

12. Kim A, Popovici J, Menard D, Serre D. Plasmodium vivax transcriptomes reveal stage-specific chloroquine response and differential regulation of male and female gametocytes. Nat Commun. 2019 Dec;10(1):371. doi: 10.1038/s41467-019-08312-z

13. Collins WE. Nonhuman primate models. II. Infection of Saimiri and Aotus monkeys with Plasmodium vivax. Methods Mol Med. 2002;72:85–92. doi: 10.1385/1-59259-271-6:85

14. Young MD, Porter JA, Johnson CM. Plasmodium vivax Transmitted from Man to Monkey to Man. Science. 1966 Aug 26;153(3739):1006–7. doi: 10.1126/science.153.3739.1006

15. Kim A, Popovici J, Vantaux A, Samreth R, Bin S, Kim S, et al. Characterization of P. vivax blood stage transcriptomes from field isolates reveals similarities among infections and complex gene isoforms. Sci Rep. 2017 Dec;7(1):7761. doi: 10.1038/s41598-017-07275-9

16. Tebben K, Dia A, Serre D. Determination of the Stage Composition of Plasmodium Infections from Bulk Gene Expression Data. Langelier CR, editor. mSystems. 2022 Aug 30;7(4):e00258–22.

17. Reid AJ, Talman AM, Bennett HM, Gomes AR, Sanders MJ, Illingworth CJR, et al. Single-cell RNA-seq reveals hidden transcriptional variation in malaria parasites. eLife. 2018 Mar 27;7:e33105. doi: 10.7554/eLife.33105

18. Poran A, Nötzel C, Aly O, Mencia-Trinchant N, Harris CT, Guzman ML, et al. Single-cell RNA sequencing reveals a signature of sexual commitment in malaria parasites. Nature. 2017 Nov;551(7678):95–9. doi: 10.1038/nature24280

19. Howick VM, Russell AJC, Andrews T, Heaton H, Reid AJ, Natarajan K, et al. The Malaria Cell Atlas: Single parasite transcriptomes across the complete Plasmodium life cycle. Science. 2019 Aug 23;365(6455):eaaw2619.

20. Brancucci NMB, De Niz M, Straub TJ, Ravel D, Sollelis L, Birren BW, et al. Probing Plasmodium falciparum sexual commitment at the single-cell level. Wellcome Open Res. 2018 Oct 17;3:70. doi: 10.12688/wellcomeopenres.14645.4

21. Bogale HN, Pascini TV, Kanatani S, Sá JM, Wellems TE, Sinnis P, et al. Transcriptional heterogeneity and tightly regulated changes in gene expression during Plasmodium berghei sporozoite development. Proc Natl Acad Sci. 2021 Mar 9;118(10):e2023438118.

22. Real E, Howick VM, Dahalan FA, Witmer K, Cudini J, Andradi-Brown C, et al. A single-cell atlas of Plasmodium falciparum transmission through the mosquito. Nat Commun. 2021 Dec;12(1):3196. doi: 10.1038/s41467-021-23434-z

23. Ruberto AA, Bourke C, Merienne N, Obadia T, Amino R, Mueller I. Single-cell RNA sequencing reveals developmental heterogeneity among Plasmodium berghei sporozoites. Sci Rep. 2021 Dec;11(1):4127. doi: 10.1038/s41598-021-82914-w

24. Sá JM, Cannon MV, Caleon RL, Wellems TE, Serre D. Single-cell transcription analysis of Plasmodium vivax blood-stage parasites identifies stage- and species-specific profiles of expression. Duffy M, editor. PLOS Biol. 2020 May 4;18(5):e3000711. doi: 10.1371/journal.pbio.3000711

25. Ruberto AA, Bourke C, Vantaux A, Maher SP, Jex A, Witkowski B, et al. Single-cell RNA sequencing of Plasmodium vivax sporozoites reveals stage- and species-specific transcriptomic signatures. Mireji PO, editor. PLoS Negl Trop Dis. 2022 Aug 4;16(8):e0010633. doi: 10.1371/journal.pntd.0010633

26. Baran-Gale J, Chandra T, Kirschner K. Experimental design for single-cell RNA sequencing. Brief Funct Genomics. 2018;17(4):8. doi: 10.1093/bfgp/elx035

27. Zheng GXY, Terry JM, Belgrader P, Ryvkin P, Bent ZW, Wilson R, et al. Massively parallel digital transcriptional profiling of single cells. Nat Commun. 2017 Apr;8(1):14049. doi: 10.1038/ncomms14049

28. Siegel SV, Chappell L, Hostetler JB, Amaratunga C, Suon S, Böhme U, et al. Analysis of Plasmodium vivax schizont transcriptomes from field isolates reveals heterogeneity of expression of genes involved in host-parasite interactions. Sci Rep. 2020 Dec;10(1):16667. doi: 10.1038/s41598-020-73562-7

29. Sharon D, Tilgner H, Grubert F, Snyder M. A single-molecule long-read survey of the human transcriptome. Nat Biotechnol. 2013 Nov;31(11):1009–14. doi: 10.1038/nbt.2705

30. Tian L. Comprehensive characterization of single-cell full-length isoforms in human and mouse with long-read sequencing. 2021;24. doi: 10.1186/s13059-021-02525-6

31. Shields EJ, Sorida M, Sheng L, Sieriebriennikov B, Ding L, Bonasio R. Genome annotation with long RNA reads reveals new patterns of gene expression and improves single-cell analyses in an ant brain. BMC Biol. 2021 Dec;19(1):254. doi: 10.1186/s12915-021-01188-w

32. Westoby J, Artemov P, Hemberg M, Ferguson-Smith A. Obstacles to detecting isoforms using full-length scRNA-seq data. Genome Biol. 2020 Dec;21(1):74. doi: 10.1186/s13059-020-01981-w

33. Rhoads A, Au KF. PacBio Sequencing and Its Applications. Genomics Proteomics Bioinformatics. 2015 Oct;13(5):278–89. doi: 10.1016/j.gpb.2015.08.002

34. Healey HM, Bassham S, Cresko WA. Single-cell Iso-Sequencing enables rapid genome annotation for scRNAseq analysis. Sanchez Alvarado A, editor. Genetics. 2022 Mar 3;220(3):iyac017. doi: 10.1093/genetics/iyac017

35. Gupta I, Collier PG, Haase B, Mahfouz A, Joglekar A, Floyd T, et al. Single-cell isoform RNA sequencing characterizes isoforms in thousands of cerebellar cells. Nat Biotechnol. 2018 Dec;36(12):1197–202. doi: 10.1038/nbt.4259

36. Auburn S, Böhme U, Steinbiss S, Trimarsanto H, Hostetler J, Sanders M, et al. A new Plasmodium vivax reference sequence with improved assembly of the subtelomeres reveals an abundance of pir genes. Wellcome Open Res. 2016 Nov 15;1:4. doi: 10.12688/wellcomeopenres.9876.1

37. Adjalley SH, Chabbert CD, Klaus B, Pelechano V, Steinmetz LM. Landscape and Dynamics of Transcription Initiation in the Malaria Parasite Plasmodium falciparum. Cell Rep. 2016 Mar 15;(14):14. doi: 10.1016/j.celrep.2016.02.025

38. Breitbart RE, Andreadis A, Nadal-Ginard B. Alternative Splicing: A Ubiquitous Mechanism for the Generation of Multiple Protein Isoforms from Single Genes. Annu Rev Biochem. 1987 Jun 1;56(1):467–95. doi: 10.1146/annurev.bi.56.070187.002343

39. Pavlovic Djuranovic S, Erath J, Andrews RJ, Bayguinov PO, Chung JJ, Chalker DL, et al. Plasmodium falciparum translational machinery condones polyadenosine repeats. eLife. 2020 May 29;9:e57799. doi: 10.7554/eLife.57799

40. Li J, Cai B, Qi Y, Zhao W, Liu J, Xu R, et al. UTR introns, antisense RNA and differentially spliced transcripts between Plasmodium yoelii subspecies. Malar J. 2016 Dec;15(1):30. doi: 10.1186/s12936-015-1081-9

41. Yeoh LM, Lee VV, McFadden GI, Ralph SA. Alternative Splicing in Apicomplexan Parasites. Sibley D, Garsin DA, editors. mBio. 2019 Feb 26;10(1):e02866–18. doi: 10.1128/mBio.02866-18

42. Hawkins P, Morton DB, Burman O, Dennison N, Honess P, Jennings M, et al. A guide to defining and implementing protocols for the welfare assessment of laboratory animals: eleventh report of the BVAAWF/FRAME/RSPCA/UFAW Joint Working Group on Refinement. Lab Anim. 2011 Jan 1;45(1):1–13. doi: 10.1258/la.2010.010031

43. Kim D, Langmead B, Salzberg SL. HISAT: a fast spliced aligner with low memory requirements. Nat Methods. 2015 Apr;12(4):6. doi: 10.1038/nmeth.3317

44. Bayega A, Fahiminiya S, Oikonomopoulos S, Ragoussis J. Current and Future Methods for mRNA Analysis: A Drive Toward Single Molecule Sequencing. Methods Mol Biol Clifton NJ. 2018;1783:209–41. doi: 10.1007/978-1-4939-7834-2_11

45. Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018 May 10;34(18):7. doi: 10.1093/bioinformatics/bty191

46. Kovaka S, Zimin AV, Pertea GM, Razaghi R, Salzberg SL, Pertea M. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 2019 Dec;20(1):278. doi: 10.1186/s13059-019-1910-1

47. Pertea M, Kim D, Pertea GM, Leek JT, Salzberg SL. Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nat Protoc. 2016 Sep;11(9):1650–67. doi: 10.1038/nprot.2016.095

48. Haas BJ, Papanicolaou A, Yassour M, Grabherr M, Blood PD, Bowden J, et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat Protoc. 2013 Aug;8(8):1494–512. doi: 10.1038/nprot.2013.084
