2022 May 10;221(3):iyac079. doi: 10.1093/genetics/iyac079

Table 1.

Summary of 42 functions available in BioKIT at the time of publication.

Function name	Description	Type of function	Input data	Citation	Example software that performs this function
alignment_length	Calculate alignment length	Analysis	Multiple-sequence file in FASTA format	NA	AMAS (Borowiec 2016)
alignment_recoding	Recode alignments using reduced character states	Processing	Multiple-sequence file in FASTA format	Woese et al. (1991), Embley et al. (2003), Kosiol et al. (2004), Hrdy et al. (2004), Susko and Roger (2007)	Custom scripts (Hernandez and Ryan 2021)
alignment_summary	Summarize diverse properties of a multiple sequence alignment	Analysis	Multiple-sequence file in FASTA format	NA	AMAS (Borowiec 2016); custom scripts (Shen, Salichos, et al. 2016); PhyKIT (Steenwyk, Buida, et al. 2021)
consensus_sequence	Generate a consensus sequence	Analysis	Multiple-sequence file in FASTA format	Sternke et al. (2019)	Geneious (https://www.geneious.com)
constant_sites	Determine the number of constant sites in an alignment	Analysis	Multiple-sequence file in FASTA format	Kumar et al. (2016)	IQ-TREE (Minh et al. 2020)
parsimony_informative_sites	Determine the number of parsimony-informative sites in an alignment	Analysis	Multiple-sequence file in FASTA format	Kumar et al. (2016)	AMAS (Borowiec 2016); custom scripts (Shen, Salichos, et al. 2016)
position_specific_score_matrix	Generate a position-specific score matrix for an alignment	Analysis	Multiple-sequence file in FASTA format	Gribskov et al. (1987)	BLAST+ (Camacho et al. 2009)
variable_sites	Determine the number of variable sites in an alignment	Analysis	Multiple-sequence file in FASTA format	Shen, Salichos, et al. (2016)	AMAS (Borowiec 2016); custom scripts (Shen, Salichos, et al. 2016); PhyKIT (Steenwyk, Buida, et al. 2021)
gc_content_first_position	Determine the GC content of the first codon position among protein coding sequences	Analysis	Protein coding sequences in FASTA format	Bentele et al. (2013)	Custom scripts (Bentele et al. 2013)
gc_content_second_position	Determine the GC content of the second codon position among protein coding sequences	Analysis	Protein coding sequences in FASTA format	Bentele et al. (2013)	Custom scripts (Bentele et al. 2013)
gc_content_third_position	Determine the GC content of the third codon position among protein coding sequences	Analysis	Protein coding sequences in FASTA format	Bentele et al. (2013)	Custom scripts (Bentele et al. 2013)
gene_wise_relative_synonymous_codon_usage	Calculate gene-wise relative synonymous codon usage	Analysis	Protein coding sequences in FASTA format	This study	This study
relative_synonymous_codon_usage	Calculate relative synonymous codon usage	Analysis	Protein coding sequences in FASTA format	Xu et al. (2008)	MEGA (Kumar et al. 2016)
translate_sequence	Translate protein coding sequences to amino acid sequences	Processing	Protein coding sequences in FASTA format	NA	EMBOSS (Rice et al. 2000)
fastq_read_lengths	Examine the distribution of read lengths	Analysis	Sequence reads in FASTQ format	NA	FQStat (Chanumolu et al. 2019)
subset_pe_fastq_reads	Downsample paired-end reads	Processing	Sequence reads in FASTQ format	NA	SeqKit (Shen, Le, et al. 2016)
subset_se_fastq_reads	Downsample single-end reads	Processing	Sequence reads in FASTQ format	NA	SeqKit (Shen, Le, et al. 2016)
trim_pe_fastq_reads	Trim paired-end reads based on quality and length thresholds	Analysis	Sequence reads in FASTQ format	NA	Trimmomatic (Bolger et al. 2014)
trim_se_fastq_reads	Trim single-end reads based on quality and length thresholds	Analysis	Sequence reads in FASTQ format	NA	Trimmomatic (Bolger et al. 2014)
trim_pe_adapters_fastq_reads	Trim adapters from paired-end reads and implement length thresholds	Analysis	Sequence reads in FASTQ format	NA	Trimmomatic (Bolger et al. 2014)
trim_se_adapters_fastq_reads	Trim adapters from single-end reads and implement length thresholds	Analysis	Sequence reads in FASTQ format	NA	Trimmomatic (Bolger et al. 2014)
gc_content	Determine GC content	Analysis	FASTA file of nucleotide sequences	Romiguier et al. (2010)	Custom scripts (Shen, Salichos, et al. 2016); GC-Profile (Gao and Zhang 2006)
genome_assembly_metrics	Determine diverse properties of a genome assembly for quality assessment and characterization	Analysis	FASTA file of a genome assembly	Gurevich et al. (2013)	QUAST (Gurevich et al. 2013); REAPR (Hunt et al. 2013)
l50	L50	Analysis	FASTA file of a genome assembly	Gurevich et al. (2013)	QUAST (Gurevich et al. 2013)
l90	L90	Analysis	FASTA file of a genome assembly	Gurevich et al. (2013)	QUAST (Gurevich et al. 2013)
longest_scaffold	Determine the length of the longest entry in a FASTA file	Analysis	FASTA file	Gurevich et al. (2013)	Custom scripts (Ou et al. 2020)
n50	N50	Analysis	FASTA file of a genome assembly	Gurevich et al. (2013)	QUAST (Gurevich et al. 2013)
n90	N90	Analysis	FASTA file of a genome assembly	Gurevich et al. (2013)	QUAST (Gurevich et al. 2013)
number_of_large_scaffolds	Determine the number and length of scaffolds longer than 500 nucleotides. Length threshold of 500 nucleotides can be modified by the user	Analysis	FASTA file	NA	QUAST (Gurevich et al. 2013)
number_of_scaffolds	Determine the number of FASTA entries	Analysis	FASTA file	NA	QUAST (Gurevich et al. 2013)
sum_of_scaffold_lengths	Determine the total length of all FASTA entries	Analysis	FASTA file	NA	QUAST (Gurevich et al. 2013)
character_frequency	Determine the frequency of each character. Gaps are assumed to be represented as ‘?’ and ‘-’ characters	Analysis	FASTA file	NA	Biostrings (https://rdrr.io/bioc/Biostrings/)
faidx	Get sequence entry from FASTA file	Processing	FASTA file	NA	SAMtools (Li et al. 2009)
file_format_converter	Convert multiple sequence alignments from one format to another	Processing	FASTA, Clustal, MAF, Mauve, Phylip, Phylip-sequential, Phylip-relaxed, and Stockholm	NA	ALTER (Glez-Pena et al. 2010)
multiple_line_to_single_line_fasta	Reformat sequences to be represented on one line	Processing	FASTA file	NA	FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/)
remove_fasta_entry	Remove sequence based on entry identifier	Processing	FASTA file	NA	NA
remove_short_sequences	Remove short sequences	Processing	FASTA file	NA	NA
rename_fasta_entries	Rename FASTA entries	Processing	FASTA file	NA	FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/)
reorder_by_sequence_length	Reorder FASTA entries by length	Processing	FASTA file	NA	SeqKit (Shen, Le, et al. 2016)
sequence_complement	Generate sequence complements in the forward or reverse direction	Processing	FASTA file	Britten (1998)	EMBOSS (Rice et al. 2000)
sequence_length	Calculate the length of each FASTA entry	Analysis	FASTA file	NA	bioawk (https://github.com/lh3/bioawk)
single_line_to_multiple_line_fasta	Reformat sequences to be represented on multiple lines	Processing	FASTA file	NA	FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/)
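
Several rows of Table 1 (n50, l50, n90, l90, gc_content) name metrics without spelling out how they are computed. The Python sketch below illustrates the commonly used definitions; it is not BioKIT's implementation, and the helper names (parse_fasta, nx_lx, gc_content) are hypothetical. It assumes that Nx is the length of the sequence at which the running total, taken over sequences sorted longest first, first reaches x% of the summed length, that Lx is the number of sequences needed to get there, and that GC content is computed over unambiguous A, C, G, and T characters only.

```python
from typing import Dict, List, Tuple


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into {entry identifier: sequence}."""
    records: Dict[str, List[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the identifier.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def nx_lx(lengths: List[int], percent: int = 50) -> Tuple[int, int]:
    """Return (Nx, Lx) for the given sequence lengths.

    Assumed definition: sort lengths longest first and accumulate them;
    Nx is the length at which the running total first reaches `percent`
    of the summed length, and Lx is how many sequences that took.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return length, count
    raise ValueError("percent must be between 0 and 100")


def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous A, C, G, T characters (assumed convention)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if (gc + at) else 0.0


if __name__ == "__main__":
    scaffold_lengths = [80, 70, 50, 40, 30, 20, 10]
    print(nx_lx(scaffold_lengths, 50))  # N50 and L50
    print(nx_lx(scaffold_lengths, 90))  # N90 and L90
    print(gc_content("ATGC"))
```

Gaps and ambiguous characters are ignored in the GC calculation here; a tool may treat them differently, so results should be compared against its documentation before being relied on.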