Genetics. 2022 May 10;221(3):iyac079. doi: 10.1093/genetics/iyac079

Table 1.

Summary of 42 functions available in BioKIT at the time of publication.

| Function name | Description | Type of function | Input data | Citation | Example software that performs this function |
|---|---|---|---|---|---|
| alignment_length | Calculate alignment length | Analysis | Multiple-sequence file in FASTA format | NA | AMAS (Borowiec 2016) |
| alignment_recoding | Recode alignments using reduced character states | Processing | Multiple-sequence file in FASTA format | Woese et al. (1991), Embley et al. (2003), Kosiol et al. (2004), Hrdy et al. (2004), Susko and Roger (2007) | Custom scripts (Hernandez and Ryan 2021) |
| alignment_summary | Summarize diverse properties of a multiple sequence alignment | Analysis | Multiple-sequence file in FASTA format | NA | AMAS (Borowiec 2016); custom scripts (Shen, Salichos, et al. 2016); PhyKIT (Steenwyk, Buida, et al. 2021) |
| consensus_sequence | Generate a consensus sequence | Analysis | Multiple-sequence file in FASTA format | Sternke et al. (2019) | Geneious (https://www.geneious.com) |
| constant_sites | Determine the number of constant sites in an alignment | Analysis | Multiple-sequence file in FASTA format | Kumar et al. (2016) | IQ-TREE (Minh et al. 2020) |
| parsimony_informative_sites | Determine the number of parsimony-informative sites in an alignment | Analysis | Multiple-sequence file in FASTA format | Kumar et al. (2016) | AMAS (Borowiec 2016); custom scripts (Shen, Salichos, et al. 2016) |
| position_specific_score_matrix | Generate a position-specific score matrix for an alignment | Analysis | Multiple-sequence file in FASTA format | Gribskov et al. (1987) | BLAST+ (Camacho et al. 2009) |
| variable_sites | Determine the number of variable sites in an alignment | Analysis | Multiple-sequence file in FASTA format | Shen, Salichos, et al. (2016) | AMAS (Borowiec 2016); custom scripts (Shen, Salichos, et al. 2016); PhyKIT (Steenwyk, Buida, et al. 2021) |
| gc_content_first_position | Determine the GC content of the first codon position among protein-coding sequences | Analysis | Protein-coding sequences in FASTA format | Bentele et al. (2013) | Custom scripts (Bentele et al. 2013) |
| gc_content_second_position | Determine the GC content of the second codon position among protein-coding sequences | Analysis | Protein-coding sequences in FASTA format | Bentele et al. (2013) | Custom scripts (Bentele et al. 2013) |
| gc_content_third_position | Determine the GC content of the third codon position among protein-coding sequences | Analysis | Protein-coding sequences in FASTA format | Bentele et al. (2013) | Custom scripts (Bentele et al. 2013) |
| gene_wise_relative_synonymous_codon_usage | Calculate gene-wise relative synonymous codon usage | Analysis | Protein-coding sequences in FASTA format | This study | This study |
| relative_synonymous_codon_usage | Calculate relative synonymous codon usage | Analysis | Protein-coding sequences in FASTA format | Xu et al. (2008) | MEGA (Kumar et al. 2016) |
| translate_sequence | Translate protein-coding sequences to amino acid sequences | Processing | Protein-coding sequences in FASTA format | NA | EMBOSS (Rice et al. 2000) |
| fastq_read_lengths | Examine the distribution of read lengths | Analysis | Sequence reads in FASTQ format | NA | FQStat (Chanumolu et al. 2019) |
| subset_pe_fastq_reads | Downsample paired-end reads | Processing | Sequence reads in FASTQ format | NA | SeqKit (Shen, Le, et al. 2016) |
| subset_se_fastq_reads | Downsample single-end reads | Processing | Sequence reads in FASTQ format | NA | SeqKit (Shen, Le, et al. 2016) |
| trim_pe_fastq_reads | Trim paired-end reads based on quality and length thresholds | Analysis | Sequence reads in FASTQ format | NA | Trimmomatic (Bolger et al. 2014) |
| trim_se_fastq_reads | Trim single-end reads based on quality and length thresholds | Analysis | Sequence reads in FASTQ format | NA | Trimmomatic (Bolger et al. 2014) |
| trim_pe_adapters_fastq_reads | Trim adapters from paired-end reads and apply length thresholds | Analysis | Sequence reads in FASTQ format | NA | Trimmomatic (Bolger et al. 2014) |
| trim_se_adapters_fastq_reads | Trim adapters from single-end reads and apply length thresholds | Analysis | Sequence reads in FASTQ format | NA | Trimmomatic (Bolger et al. 2014) |
| gc_content | Determine GC content | Analysis | FASTA file of nucleotide sequences | Romiguier et al. (2010) | Custom scripts (Shen, Salichos, et al. 2016); GC-Profile (Gao and Zhang 2006) |
| genome_assembly_metrics | Determine diverse properties of a genome assembly for quality assessment and characterization | Analysis | FASTA file of a genome assembly | Gurevich et al. (2013) | QUAST (Gurevich et al. 2013); REAPR (Hunt et al. 2013) |
| l50 | L50 | Analysis | FASTA file of a genome assembly | Gurevich et al. (2013) | QUAST (Gurevich et al. 2013) |
| l90 | L90 | Analysis | FASTA file of a genome assembly | Gurevich et al. (2013) | QUAST (Gurevich et al. 2013) |
| longest_scaffold | Determine the length of the longest entry in a FASTA file | Analysis | FASTA file | Gurevich et al. (2013) | Custom scripts (Ou et al. 2020) |
| n50 | N50 | Analysis | FASTA file of a genome assembly | Gurevich et al. (2013) | QUAST (Gurevich et al. 2013) |
| n90 | N90 | Analysis | FASTA file of a genome assembly | Gurevich et al. (2013) | QUAST (Gurevich et al. 2013) |
| number_of_large_scaffolds | Determine the number and length of scaffolds longer than 500 nucleotides; the user can modify the 500-nucleotide threshold | Analysis | FASTA file | NA | QUAST (Gurevich et al. 2013) |
| number_of_scaffolds | Determine the number of FASTA entries | Analysis | FASTA file | NA | QUAST (Gurevich et al. 2013) |
| sum_of_scaffold_lengths | Determine the total length of all FASTA entries | Analysis | FASTA file | NA | QUAST (Gurevich et al. 2013) |
| character_frequency | Determine the frequency of each character; gaps are assumed to be represented by ‘?’ and ‘-’ characters | Analysis | FASTA file | NA | Biostrings (https://rdrr.io/bioc/Biostrings/) |
| faidx | Get a sequence entry from a FASTA file | Processing | FASTA file | NA | SAMtools (Li et al. 2009) |
| file_format_converter | Convert multiple sequence alignments from one format to another | Processing | FASTA, Clustal, MAF, Mauve, Phylip, Phylip-sequential, Phylip-relaxed, and Stockholm | NA | ALTER (Glez-Pena et al. 2010) |
| multiple_line_to_single_line_fasta | Reformat sequences to be represented on one line | Processing | FASTA file | NA | FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) |
| remove_fasta_entry | Remove a sequence based on its entry identifier | Processing | FASTA file | NA | NA |
| remove_short_sequences | Remove short sequences | Processing | FASTA file | NA | NA |
| rename_fasta_entries | Rename FASTA entries | Processing | FASTA file | NA | FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) |
| reorder_by_sequence_length | Reorder FASTA entries by length | Processing | FASTA file | NA | SeqKit (Shen, Le, et al. 2016) |
| sequence_complement | Generate sequence complements in the forward or reverse direction | Processing | FASTA file | Britten (1998) | EMBOSS (Rice et al. 2000) |
| sequence_length | Calculate the length of each FASTA entry | Analysis | FASTA file | NA | bioawk (https://github.com/lh3/bioawk) |
| single_line_to_multiple_line_fasta | Reformat sequences to be represented on multiple lines | Processing | FASTA file | NA | FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) |
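The alignment functions constant_sites, variable_sites, and parsimony_informative_sites all classify alignment columns. Below is a minimal Python sketch using the textbook definitions. It is not BioKIT's implementation. It assumes that '-' and '?' mark gaps and that gaps are ignored within each column. A column counts as constant when it has one character state. It counts as variable when it has two or more states. It is parsimony-informative when at least two states each occur in at least two sequences. The helper name `classify_sites` is hypothetical.

```python
from collections import Counter

# Assumption: '-' and '?' mark gaps/missing data; BioKIT's own gap handling may differ.
GAP_CHARS = {"-", "?"}


def classify_sites(alignment: list[str]) -> dict[str, int]:
    """Count constant, variable, and parsimony-informative alignment columns.

    `alignment` is a list of equal-length aligned sequences (e.g., parsed from
    a multiple-sequence FASTA file). Gaps are ignored per column; all-gap
    columns fall into none of the three categories.
    """
    counts = {"constant": 0, "variable": 0, "parsimony_informative": 0}
    if not alignment:
        return counts
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    for i in range(length):
        # Tally the character states in column i, ignoring gaps and case.
        states = Counter(seq[i].upper() for seq in alignment if seq[i] not in GAP_CHARS)
        if len(states) == 1:
            counts["constant"] += 1
        elif len(states) >= 2:
            counts["variable"] += 1
            # Parsimony-informative: >= 2 states, each present in >= 2 sequences.
            if sum(1 for n in states.values() if n >= 2) >= 2:
                counts["parsimony_informative"] += 1
    return counts


print(classify_sites(["ATGCA", "ATGTA", "ACGTA", "ACG-A"]))
# {'constant': 3, 'variable': 2, 'parsimony_informative': 1}
```

In this example, column 2 (T/T/C/C) is parsimony-informative. Column 4 (C/T/T/-) is variable but not informative, because only one state (T) appears more than once.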
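The three gc_content_*_position functions compute GC content at a single codon position, pooled over a set of protein-coding sequences. Below is a minimal Python sketch of that calculation. It assumes each sequence is in frame from its first base, ignores any trailing partial codon, and skips non-ACGT characters. These are illustrative choices and may not match BioKIT's behavior. The function name is hypothetical.

```python
def gc_content_at_codon_position(cds_list: list[str], position: int) -> float:
    """Fraction of G/C bases at codon `position` (1, 2, or 3) across coding sequences.

    Assumptions: sequences are in frame starting at their first base, trailing
    partial codons are ignored, and characters other than A/C/G/T are skipped.
    """
    if position not in (1, 2, 3):
        raise ValueError("position must be 1, 2, or 3")
    gc = total = 0
    for cds in cds_list:
        seq = cds.upper()
        complete = len(seq) - len(seq) % 3  # length covered by whole codons
        for start in range(0, complete, 3):
            base = seq[start + position - 1]
            if base in ("A", "C", "G", "T"):
                total += 1
                if base in ("G", "C"):
                    gc += 1
    return gc / total if total else 0.0


cds = ["ATGGCC", "GCGTAA"]  # codons: ATG GCC | GCG TAA
for pos in (1, 2, 3):
    print(pos, gc_content_at_codon_position(cds, pos))
# 1 0.5
# 2 0.5
# 3 0.75
```

Pooling counts across all sequences gives longer genes more weight. A per-gene average would be another valid design choice.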
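Relative synonymous codon usage (RSCU) divides a codon's observed count by the count expected if every synonymous codon for that amino acid were used equally. Written out, RSCU = observed count / (total count for the amino acid / number of synonymous codons). Below is a minimal Python sketch pooled over all input sequences. It assumes the standard genetic code (NCBI translation table 1), treats stop codons as one synonymous group, and reports no value for amino acids absent from the data. This is not BioKIT's implementation, and `rscu` is a hypothetical name.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def rscu(cds_list: list[str]) -> dict[str, float]:
    """RSCU per codon, pooled across in-frame coding sequences."""
    codon_counts = Counter()
    for cds in cds_list:
        seq = cds.upper()
        # Step through complete codons; skip codons with non-ACGT characters.
        for start in range(0, len(seq) - 2, 3):
            codon = seq[start:start + 3]
            if codon in CODON_TABLE:
                codon_counts[codon] += 1

    # Group codons by the amino acid (or stop '*') they encode.
    synonyms: dict[str, list[str]] = {}
    for codon, aa in CODON_TABLE.items():
        synonyms.setdefault(aa, []).append(codon)

    result = {}
    for codons in synonyms.values():
        total = sum(codon_counts[c] for c in codons)
        if total == 0:
            continue  # amino acid not observed: RSCU undefined
        expected = total / len(codons)  # count under equal synonymous usage
        for c in codons:
            result[c] = codon_counts[c] / expected
    return result


values = rscu(["CTGCTGTTA"])  # two CTG and one TTA, all leucine
print(values["CTG"], values["TTA"], values["CTT"])  # 4.0 2.0 0.0
```

Leucine has six codons, so three observed leucine codons give an expected count of 0.5 per codon. RSCU values above 1 indicate preferred codons.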
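The assembly metrics (n50, l50, n90, l90, longest_scaffold, number_of_large_scaffolds, number_of_scaffolds, and sum_of_scaffold_lengths) can all be derived from a list of sequence lengths. Below is a minimal Python sketch using the QUAST-style definition. Nx is the length of the shortest sequence among the longest sequences that together reach x% of the total assembly length. Lx is how many such sequences are needed. The function names and the dictionary keys are hypothetical. The 500-nucleotide default mirrors the threshold described in the table.

```python
def nx_lx(lengths: list[int], percent: int = 50) -> tuple[int, int]:
    """Return (Nx, Lx) for sequence lengths; percent=50 gives (N50, L50)."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("assembly has no sequence")
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer comparison avoids floating-point error at the threshold.
        if running * 100 >= total * percent:
            return length, count
    raise AssertionError("unreachable: the running total reaches the threshold")


def assembly_summary(lengths: list[int], large_threshold: int = 500) -> dict[str, int]:
    """Hypothetical summary combining several of the metrics in the table."""
    n50, l50 = nx_lx(lengths, 50)
    n90, l90 = nx_lx(lengths, 90)
    return {
        "number_of_scaffolds": len(lengths),
        "sum_of_scaffold_lengths": sum(lengths),
        "longest_scaffold": max(lengths),
        "number_of_large_scaffolds": sum(1 for x in lengths if x > large_threshold),
        "N50": n50, "L50": l50, "N90": n90, "L90": l90,
    }


print(nx_lx([100, 80, 50, 40, 30]))      # (80, 2)
print(nx_lx([100, 80, 50, 40, 30], 90))  # (40, 4)
```

The total here is 300. The two longest sequences (100 + 80 = 180) pass 150, so N50 = 80 and L50 = 2.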
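The sequence_complement function returns either the complement or the reverse complement of a sequence. Below is a minimal Python sketch built on `str.maketrans` and `str.translate`. It assumes the standard IUPAC nucleotide codes and leaves gap characters such as '-' unchanged. The `reverse` flag is a hypothetical parameter, not BioKIT's command-line option.

```python
# IUPAC nucleotide codes and their complements (upper- and lowercase).
_COMPLEMENT = str.maketrans(
    "ACGTRYKMSWBVDHNacgtrykmswbvdhn",
    "TGCAYRMKSWVBHDNtgcayrmkswvbhdn",
)


def complement(seq: str, reverse: bool = False) -> str:
    """Return the complement of `seq`; with reverse=True, the reverse complement.

    Characters outside the translation table (e.g., '-' gaps) are left unchanged.
    """
    comp = seq.translate(_COMPLEMENT)
    return comp[::-1] if reverse else comp


print(complement("ATGCN"))                # TACGN
print(complement("ATGCN", reverse=True))  # NGCAT
```

Ambiguity codes map to their complementary sets: R (A/G) pairs with Y (C/T), and B (not A) pairs with V (not T).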
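The multiple_line_to_single_line_fasta and single_line_to_multiple_line_fasta functions change only how sequence lines are wrapped. Below is a minimal Python sketch that parses FASTA text into (header, sequence) pairs and writes it back out, either unwrapped or wrapped. The 60-character default width is an assumption and may not match BioKIT's default. All function names are hypothetical.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs, joining wrapped lines."""
    records: list[tuple[str, str]] = []
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            if header is None:
                raise ValueError("sequence data found before the first header")
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def to_single_line(text: str) -> str:
    """Write each sequence on one line."""
    return "".join(f">{h}\n{s}\n" for h, s in parse_fasta(text))


def to_multi_line(text: str, width: int = 60) -> str:
    """Wrap each sequence to at most `width` characters per line (60 is an assumed default)."""
    out: list[str] = []
    for h, s in parse_fasta(text):
        out.append(f">{h}")
        out.extend(s[i:i + width] for i in range(0, len(s), width))
    return "\n".join(out) + "\n" if out else ""


print(to_single_line(">seq1\nATGC\nGGTA\n>seq2\nTT\n"), end="")
print(to_multi_line(">s\nABCDEFG\n", width=3), end="")
```

Both directions share the same parser, so converting one way and then back again preserves headers and sequences. Only the line breaks within sequences change.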