Table 2:
List of commands included in both BigSeqKit and seqkit. Commands marked with an asterisk support new functionality not included in seqkit.
Basic commands | |
seq | Transform sequences (extract ID, filter by length, remove gaps, reverse complement, etc.) |
subseq | Get subsequences by region/gtf/bed, including flanking sequences |
stats | Simple statistics of FASTA/Q files: #seqs, min/max length, N50, Q20%, Q30%, etc. |
faidx* | Create FASTA or FASTQ index file and extract subsequences |
Format conversion | |
fa2fq | Retrieve the FASTQ records corresponding to a FASTA file |
fq2fa | Convert FASTQ file to FASTA format |
translate | Translate DNA/RNA to protein sequence |
Searching | |
grep | Search sequences by ID/name/sequence/sequence motifs |
locate | Locate subsequences/motifs |
Set operations | |
sample | Sample sequences by number or proportion |
rmdup | Remove duplicated sequences by ID/name/sequence |
common | Find common sequences of multiple files by ID/name/sequence |
duplicate | Duplicate sequences N times |
head | Print first N FASTA/Q records |
head-genome | Print sequences of the first genome with common prefixes in name |
pair | Match up paired-end reads from 2 FASTQ files |
range | Print FASTA/Q records in a range (start:end) |
Edit | |
concat | Concatenate sequences with the same ID from multiple files |
replace | Replace name/sequence using a regular expression |
rename | Rename duplicated IDs |
Ordering | |
sort | Sort sequences by ID/name/sequence/length |
shuffle | Shuffle sequences |
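
To make two of the table entries concrete, the following is a minimal sketch in plain Python of the reverse-complement transformation listed under seq and the N50 statistic listed under stats. It is not the BigSeqKit or seqkit implementation. The function names `reverse_complement` and `n50` are hypothetical and chosen only for illustration, and the sketch assumes unambiguous DNA bases (A, C, G, T) and a small list of sequence lengths held in memory.

```python
from typing import Iterable


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence.

    Illustrative helper (hypothetical name); only A/C/G/T in either case
    are complemented, any other character is left unchanged.
    """
    complement = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(complement)[::-1]


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total number of bases.
    Returns 0 for an empty collection.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first length where the cumulative sum reaches half the total.
        if 2 * running >= total:
            return length
    return 0


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
    print(n50([2, 3, 4, 5, 6]))
```

Sorting the lengths in descending order and accumulating until half of the total is reached follows the usual definition of N50. A distributed tool would compute the same quantity without collecting all lengths on a single node.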