. 2024 May 24;10(5):001249. doi: 10.1099/mgen.0.001249

Table 1. Summary of the nine approaches included in this study. The input required by each approach is either FASTQ (sequencing reads) or VCF (allele frequency table relative to the Wuhan reference). The databases used by the approaches fall into two categories: curated mutation profiles, where the alleles present in each lineage are selected by functional or phylogenetic criteria, and selected alleles from GISAID genomes, where all or some GISAID genomes are aligned to the reference genome and alleles are selected based on allele frequency. For VLQ, allele frequency is used to select a reference set of genomes for each lineage when performing pseudoalignment. Database availability refers to how users can access the database: through the tool's GitHub repository or by executing a command within the tool.

| Tool | Package manager | Dependencies | Input | Method to estimate lineage relative abundances | Database type | Database details | Database availability | Reference |
|---|---|---|---|---|---|---|---|---|
| Alcov | Pip | Cutadapt, minimap2; Python packages: Fire, numpy, pandas, scikit-learn, matplotlib, seaborn, pysam | FASTQ | Linear model: ordinary least squares | Curated mutation profiles | Cov-Spectrum | GitHub | [31] |
| Basic | na | blast+, PRINSEQ, fastp, bwa, picard, samtools, varscan, python3, R | FASTQ | Linear model: constrained linear model | Selected alleles from GISAID genomes | Allele frequency ≥90 % | GitHub | [30] |
| Freyja | Conda | iVar, samtools, UShER; Python packages: cvxpy, numpy, pandas | FASTQ | Constrained (weighted) least absolute deviations | Curated mutation profiles | Marker mutations from UShER global phylogenetic tree | GitHub or via Freyja | [10] |
| Gromstole | na | cutadapt, minimap2, Python, R | FASTQ | Quasibinomial regression model | Curated mutation profiles | Cov-Spectrum | GitHub | [33] |
| LCS | na | Snakemake, samtools, GATK4, bwa, picard, UShER; Python packages: biopython, pysam, pyvcf, pandas, cvxpy, ray-core | FASTQ or VCF | Log-likelihood maximization model | Selected alleles from GISAID genomes | Allele frequency ≥80 % with phylogenetic verification | GitHub or via LCS | [34] |
| Lineagespot | Bioconductor | R | VCF | Average allele frequency | Curated mutation profiles | Outbreak.Info or Pangolin | na | [35] |
| VLQ | na | kallisto, samtools, minimap2, bwa, bbmap; Python packages: pyvcf, pysam | FASTQ | Pseudoalignment to reference genomes (kallisto) | Selected alleles from GISAID genomes | Reference genomes selected to capture alleles with frequency ≥50 % | Via VLQ | [37] |
| V-pipe | Conda | COJAC, ShoRAH, LolliPop; Python packages: pysam, pandas, numpy, pyyaml, strictyaml, requests, click, poetry-core | FASTQ | Linear model: constrained linear model (LolliPop) and Dirichlet process mixture model (ShoRAH) | Curated mutation profiles | Cov-Spectrum and UKHSA Genomics Public Health Analysis variant-definitions | Via COJAC | [39–42] |
| Pipes | na | bowtie2, R (ape) | FASTQ | Expectation-maximization algorithm | Selected alleles from GISAID genomes | Calculated phylogenetic internal nodes for each lineage | Via Pipes | [36] |
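
Two ideas in Table 1 can be made concrete with a small example: selecting alleles for a lineage by an allele-frequency threshold, and estimating lineage relative abundances with a constrained linear model. The sketch below is illustrative only and is not the implementation of any tool in the table. The function names `select_marker_alleles` and `estimate_abundances`, the toy mutation labels, and the simple projected-gradient solver are all assumptions made for this example; the listed tools use their own databases and solvers.

```python
from collections import Counter


def select_marker_alleles(genomes, min_freq=0.9):
    """Keep alleles present in at least `min_freq` of a lineage's genomes.

    `genomes` is a list of sets, one set of mutations per genome
    (hypothetical input format chosen for this sketch).
    """
    counts = Counter(m for genome in genomes for m in genome)
    return {m for m, c in counts.items() if c / len(genomes) >= min_freq}


def estimate_abundances(profiles, observed, iterations=500):
    """Non-negative least squares fit of observed allele frequencies.

    profiles: dict mapping lineage name -> set of marker mutations
    observed: dict mapping mutation -> allele frequency in the sample
    Solved by projected gradient descent (an assumption of this sketch).
    """
    lineages = sorted(profiles)
    mutations = sorted(set(observed).union(*profiles.values()))
    # design[i][j] is 1 if mutation i is a marker of lineage j, else 0
    design = [[1.0 if m in profiles[lin] else 0.0 for lin in lineages]
              for m in mutations]
    target = [observed.get(m, 0.0) for m in mutations]

    # Conservative step size: 1 / (max row sum * max column sum)
    # never exceeds 1 / (largest eigenvalue of design^T design).
    row_max = max(sum(row) for row in design)
    col_max = max(sum(col) for col in zip(*design))
    step = 1.0 / ((row_max * col_max) or 1.0)

    x = [1.0 / len(lineages)] * len(lineages)
    for _ in range(iterations):
        residual = [sum(a * xj for a, xj in zip(row, x)) - t
                    for row, t in zip(design, target)]
        gradient = [sum(a * r for a, r in zip(col, residual))
                    for col in zip(*design)]
        # Gradient step, then project onto the non-negative orthant
        x = [max(0.0, xj - step * g) for xj, g in zip(x, gradient)]
    return dict(zip(lineages, x))


if __name__ == "__main__":
    # Toy lineages "A" and "B" share marker m3; the sample mixes them 70:30.
    profiles = {"A": {"m1", "m2", "m3"}, "B": {"m3", "m4", "m5"}}
    observed = {"m1": 0.7, "m2": 0.7, "m3": 1.0, "m4": 0.3, "m5": 0.3}
    estimates = estimate_abundances(profiles, observed)
    print({lineage: round(value, 3) for lineage, value in estimates.items()})
```

On this noise-free toy input the fit recovers abundances of approximately 0.7 and 0.3. Real samples add sequencing noise, uneven coverage and many more lineages, which is why the tools in Table 1 differ in their choice of constraints and error models.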