Table 1. Comparison of previous var assembly approaches based on DNA- and RNA-sequencing.
| Study | Assembler | k-mer | Transcript or gene assembly | Validation on reference strain(s) (Yes/No) | Validation on field strain(s) (Yes/No) | Validation across different expression levels (Yes/No) | Read length (Short/Long) | Read correction (Yes/No) | Scaffolding (Yes/No) | Var transcript filter approach | Assumption | Other limitations |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Duffy et al., 2016 | Oases | | Transcript | No | No | No | Short | Yes | Yes | Aligned to 399 var genes with BLAST (e-value < 10⁻⁵) | | – No quantification of misassemblies – Unable to recover full-length transcript assemblies |
| Dara et al., 2017 | Sprai and Celera (no longer maintained) | 71 | Gene | Yes (strain NF54) | Yes (12 UM patient samples) | NA – only genome assemblies | Long and short | Method assumes combination of long- and short-read sequencing will identify errors | No | >500 bp and aligned to VarDom database | Assumes a whole genome assembly is available | – Requires prior filtering of human DNA – Needs a combination of short-read and long-read sequencing |
| Tonkin-Hill et al., 2018 * | SOAPdenovo-Trans | 21, 31, 41, 51, and 61 | Transcript | Yes (strain ITG) | No | No | Short | No | No | >500 bp and containing a significantly annotated var domain | | – Unable to fully resolve N-terminus – No quantification of misassemblies |
| Otto et al., 2019 | MaSuRCA + post-assembly improvements | Default | Gene | Yes (clone 3D7) | Yes (15 Pf3k reference genomes) | No | Short | Yes | Yes | Whole genome dataset | | |
| Andrade et al., 2020 | Velvet | 41 | Transcript | No | No | No | Short | No | No | Aligned to VarDom database | | – No quantification of misassemblies |
| Stucke et al., 2021 | rnaSPAdes | Default | Transcript | No | Yes (6 UM patient samples) | No – only the most expressed var gene | Short | Yes | Unclear | >500 bp and containing a significantly annotated var domain | Information about the true var annotation is available | – Performs de novo assembly on all non-human and P. falciparum mapping reads – Inconsistent results in three samples when comparing genomic and RNA-sequencing results for dominant var gene |
\* Also used in Wichers et al., 2021; Guillochon et al., 2022.
UM: uncomplicated malaria.