alignment_length |
Calculate alignment length |
Analysis |
Multiple-sequence file in FASTA format |
NA |
AMAS (Borowiec 2016) |
alignment_recoding |
Recode alignments using reduced character states |
Processing |
Multiple-sequence file in FASTA format |
Woese et al. (1991), Embley et al. (2003), Kosiol et al. (2004), Hrdy et al. (2004), Susko and Roger (2007) |
Custom scripts (Hernandez and Ryan 2021) |
alignment_summary |
Summarize diverse properties of a multiple sequence alignment |
Analysis |
Multiple-sequence file in FASTA format |
NA |
AMAS (Borowiec 2016); custom scripts (Shen, Salichos, et al. 2016); PhyKIT (Steenwyk, Buida, et al. 2021) |
consensus_sequence |
Generate a consensus sequence |
Analysis |
Multiple-sequence file in FASTA format |
Sternke et al. (2019) |
Geneious (https://www.geneious.com) |
constant_sites |
Determine the number of constant sites in an alignment |
Analysis |
Multiple-sequence file in FASTA format |
Kumar et al. (2016) |
IQ-TREE (Minh et al. 2020) |
parsimony_informative_sites |
Determine the number of parsimony-informative sites in an alignment |
Analysis |
Multiple-sequence file in FASTA format |
Kumar et al. (2016) |
AMAS (Borowiec 2016); custom scripts (Shen, Salichos, et al. 2016) |
position_specific_score_matrix |
Generate a position-specific score matrix for an alignment |
Analysis |
Multiple-sequence file in FASTA format |
Gribskov et al. (1987) |
BLAST+ (Camacho et al. 2009) |
variable_sites |
Determine the number of variable sites in an alignment |
Analysis |
Multiple-sequence file in FASTA format |
Shen, Salichos, et al. (2016) |
AMAS (Borowiec 2016); custom scripts (Shen, Salichos, et al. 2016); PhyKIT (Steenwyk, Buida, et al. 2021) |
gc_content_first_position |
Determine the GC content of the first codon position among protein coding sequences |
Analysis |
Protein coding sequences in FASTA format |
Bentele et al. (2013) |
Custom scripts (Bentele et al. 2013) |
gc_content_second_position |
Determine the GC content of the second codon position among protein coding sequences |
Analysis |
Protein coding sequences in FASTA format |
Bentele et al. (2013) |
Custom scripts (Bentele et al. 2013) |
gc_content_third_position |
Determine the GC content of the third codon position among protein coding sequences |
Analysis |
Protein coding sequences in FASTA format |
Bentele et al. (2013) |
Custom scripts (Bentele et al. 2013) |
gene_wise_relative_synonymous_codon_usage |
Calculate gene-wise relative synonymous codon usage |
Analysis |
Protein coding sequences in FASTA format |
This study |
This study |
relative_synonymous_codon_usage |
Calculate relative synonymous codon usage |
Analysis |
Protein coding sequences in FASTA format |
Xu et al. (2008) |
MEGA (Kumar et al. 2016) |
translate_sequence |
Translate protein coding sequences to amino acid sequences |
Processing |
Protein coding sequences in FASTA format |
NA |
EMBOSS (Rice et al. 2000) |
fastq_read_lengths |
Examine the distribution of read lengths |
Analysis |
Sequence reads in FASTQ format |
NA |
FQStat (Chanumolu et al. 2019) |
subset_pe_fastq_reads |
Downsample paired-end reads |
Processing |
Sequence reads in FASTQ format |
NA |
SeqKit (Shen, Le, et al. 2016) |
subset_se_fastq_reads |
Downsample single-end reads |
Processing |
Sequence reads in FASTQ format |
NA |
SeqKit (Shen, Le, et al. 2016) |
trim_pe_fastq_reads |
Trim paired-end reads based on quality and length thresholds |
Analysis |
Sequence reads in FASTQ format |
NA |
Trimmomatic (Bolger et al. 2014) |
trim_se_fastq_reads |
Trim single-end reads based on quality and length thresholds |
Analysis |
Sequence reads in FASTQ format |
NA |
Trimmomatic (Bolger et al. 2014) |
trim_pe_adapters_fastq_reads |
Trim adapters from paired-end reads and implement length thresholds |
Analysis |
Sequence reads in FASTQ format |
NA |
Trimmomatic (Bolger et al. 2014) |
trim_se_adapters_fastq_reads |
Trim adapters from single-end reads and implement length thresholds |
Analysis |
Sequence reads in FASTQ format |
NA |
Trimmomatic (Bolger et al. 2014) |
gc_content |
Determine GC content |
Analysis |
FASTA file of nucleotide sequences |
Romiguier et al. (2010) |
Custom scripts (Shen, Salichos, et al. 2016); GC-Profile (Gao and Zhang 2006) |
genome_assembly_metrics |
Determine diverse properties of a genome assembly for quality assessment and characterization |
Analysis |
FASTA file of a genome assembly |
Gurevich et al. (2013) |
QUAST (Gurevich et al. 2013); REAPR (Hunt et al. 2013) |
l50 |
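Calculate L50: the smallest number of scaffolds whose combined length covers at least 50% of the assembly |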
Analysis |
FASTA file of a genome assembly |
Gurevich et al. (2013) |
QUAST (Gurevich et al. 2013) |
l90 |
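Calculate L90: the smallest number of scaffolds whose combined length covers at least 90% of the assembly |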
Analysis |
FASTA file of a genome assembly |
Gurevich et al. (2013) |
QUAST (Gurevich et al. 2013) |
longest_scaffold |
Determine the length of the longest entry in a FASTA file |
Analysis |
FASTA file |
Gurevich et al. (2013) |
Custom scripts (Ou et al. 2020) |
n50 |
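Calculate N50: the length of the shortest scaffold among the longest scaffolds that together cover at least 50% of the assembly |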
Analysis |
FASTA file of a genome assembly |
Gurevich et al. (2013) |
QUAST (Gurevich et al. 2013) |
n90 |
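Calculate N90: the length of the shortest scaffold among the longest scaffolds that together cover at least 90% of the assembly |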
Analysis |
FASTA file of a genome assembly |
Gurevich et al. (2013) |
QUAST (Gurevich et al. 2013) |
number_of_large_scaffolds |
Determine the number and length of scaffolds longer than 500 nucleotides. The 500-nucleotide length threshold can be modified by the user |
Analysis |
FASTA file |
NA |
QUAST (Gurevich et al. 2013) |
number_of_scaffolds |
Determine the number of FASTA entries |
Analysis |
FASTA file |
NA |
QUAST (Gurevich et al. 2013) |
sum_of_scaffold_lengths |
Determine the total length of all FASTA entries |
Analysis |
FASTA file |
NA |
QUAST (Gurevich et al. 2013) |
character_frequency |
Determine the frequency of each character. Gaps are assumed to be represented as ‘?’ and ‘-’ characters |
Analysis |
FASTA file |
NA |
Biostrings (https://rdrr.io/bioc/Biostrings/) |
faidx |
Get sequence entry from FASTA file |
Processing |
FASTA file |
NA |
SAMtools (Li et al. 2009) |
file_format_converter |
Convert multiple sequence alignments from one format to another |
Processing |
FASTA, Clustal, MAF, Mauve, Phylip, Phylip-sequential, Phylip-relaxed, and Stockholm |
NA |
ALTER (Glez-Pena et al. 2010) |
multiple_line_to_single_line_fasta |
Reformat sequences to be represented on one line |
Processing |
FASTA file |
NA |
FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) |
remove_fasta_entry |
Remove sequence based on entry identifier |
Processing |
FASTA file |
NA |
NA |
remove_short_sequences |
Remove short sequences |
Processing |
FASTA file |
NA |
NA |
rename_fasta_entries |
Rename FASTA entries |
Processing |
FASTA file |
NA |
FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) |
reorder_by_sequence_length |
Reorder FASTA entries by length |
Processing |
FASTA file |
NA |
SeqKit (Shen, Le, et al. 2016) |
sequence_complement |
Generate sequence complements in the forward or reverse direction |
Processing |
FASTA file |
Britten (1998) |
EMBOSS (Rice et al. 2000) |
sequence_length |
Calculate the length of each FASTA entry |
Analysis |
FASTA file |
NA |
bioawk (https://github.com/lh3/bioawk) |
single_line_to_multiple_line_fasta |
Reformat sequences to be represented on multiple lines |
Processing |
FASTA file |
NA |
FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) |
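
Several of the metrics listed in the table reduce to short computations. The following Python sketch illustrates two of them under stated assumptions: N50/L50 (and N90/L90) from a list of scaffold lengths, and counts of constant, variable, and parsimony-informative sites from an alignment held in memory. The function names `nx_lx` and `classify_sites` are hypothetical and are not taken from any of the tools cited above. The sketch assumes that `-` and `?` mark gaps and are ignored, and it does no special handling of ambiguity codes; the cited tools may treat these cases differently.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Assumption: these two characters represent gaps and are skipped.
GAP_CHARS = {"-", "?"}


def nx_lx(lengths: List[int], fraction: float = 0.5) -> Tuple[int, int]:
    """Return (Nx, Lx); fraction=0.5 gives (N50, L50), 0.9 gives (N90, L90)."""
    if not lengths:
        raise ValueError("no scaffold lengths given")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    threshold = sum(lengths) * fraction
    running = 0
    # Walk scaffolds from longest to shortest until the threshold is covered.
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= threshold:
            return length, count
    raise AssertionError("unreachable when 0 < fraction <= 1")


def classify_sites(alignment: Dict[str, str]) -> Dict[str, int]:
    """Count constant, variable, and parsimony-informative alignment columns."""
    seqs = list(alignment.values())
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    counts = {"constant": 0, "variable": 0, "parsimony_informative": 0}
    for column in zip(*seqs):
        # Tally the non-gap character states observed in this column.
        states = Counter(c.upper() for c in column if c not in GAP_CHARS)
        if len(states) == 1:
            counts["constant"] += 1
        elif len(states) > 1:
            counts["variable"] += 1
            # Informative: at least two states, each seen in two or more taxa.
            if sum(1 for n in states.values() if n >= 2) >= 2:
                counts["parsimony_informative"] += 1
    return counts
```

For example, `nx_lx([100, 80, 60, 40, 20])` returns `(80, 2)`: the two longest scaffolds sum to 180, which covers half of the 300-nucleotide total. Columns that contain only gap characters are counted in none of the three categories, which is a design choice of this sketch.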