2021 Sep 30;22(3):847–861. doi: 10.1111/1755-0998.13502

TABLE 1.

Table of all bioinformatic tasks performed across the core papers set

| Task group | Task | Description | Number of papers reporting task | Number of papers not reporting software | Total number of software tools | Total number of software functions | Number of papers performing manually |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Read preparation | Quality control | Generating a report of sequence quality information from a sample or set of samples ‐ no modification is done to data | 19 | 0 | 4 | 4 | 0 |
| | Adapter trimming | Trimming of sequencing adapters | 9 | 1 | 6 | 6 | 0 |
| | Demultiplexing | Separation of sequences from a mixed pool into separate pools based on the occurrence of a unique set of bases (index or tag) | 55 | 17 | 16 | 19 | 0 |
| | Pair merging | The assembly of mate pair reads into a single contig | 63 | 1 | 10 | 18 | 0 |
| | Quality trimming | The removal of bases from either or both ends of sequences in a pool based on quality scores | 20 | 1 | 8 | 10 | 0 |
| | Mate pairing | The identification and synchronisation of mate pair reads between two samples, often involving arranging reads in identical orders and/or removal of reads without a mate pair | 3 | 0 | 3 | 3 | 0 |
| | Primer trimming | Trimming of PCR primers | 66 | 8 | 15 | 17 | 0 |
| | Reverse complementation | Reverse complementing the sequences in a pool | 7 | 3 | 2 | 2 | 0 |
| | Sequence conversion | Converting sequences from fastq to fasta | 3 | 0 | 2 | 3 | 0 |
| | Length trimming | The removal of bases from either or both ends of sequences in a pool, either the removal of a fixed number of bases or the removal of a variable number of bases to reduce sequences to a standard length | 10 | 3 | 6 | 7 | 0 |
| | Pair concatenation | Concatenating mate pair reads into a single contig (where reads don't overlap) | 8 | 4 | 4 | 4 | 0 |
| | Assembly | The assembly of reads into contigs, applied when more than one pair of overlapping fragments have been metabarcoded | 6 | 0 | 4 | 4 | 0 |
| | Degapping | Removal of gaps from sequences | 1 | 0 | 1 | 1 | 0 |
| Sequence processing | Dereplication | The removal of duplicate reads to retain only unique sequences in a pool; often the total number of copies of a sequence is recorded in the header of the retained sequence | 58 | 10 | 11 | 19 | 0 |
| | Size sorting | The sorting of a fasta file according to a size annotation in the header | 10 | 2 | 3 | 4 | 0 |
| Filtering | Quality filtering | Removal and/or trimming of sequences from a pool based on quality information. Also often converts from fastq to fasta. | 81 | 11 | 20 | 27 | 0 |
| | Similarity filtering | Removal of sequences based on similarity to an alignment, either based on sequence identity or alignment position | 9 | 1 | 4 | 4 | 0 |
| | Length filtering | The removal of sequences from a pool that are less than, more than, or fall within or outside of a specified length threshold or thresholds | 54 | 21 | 17 | 23 | 0 |
| | Preclustering | Reduction of sequence variation in a dataset prior to further processing ‐ a form of denoising | 12 | 1 | 3 | 6 | 0 |
| | Denoising | The removal of reads containing putative PCR or sequencing errors based on statistical assessment | 18 | 1 | 8 | 8 | 0 |
| | Normalisation | A process by which the number of sequences for each of a set of samples is reduced where necessary such that the output set of samples all have the same number of sequences while maintaining the relative frequencies of OTUs | 2 | 0 | 1 | 1 | 1 |
| | Chimera filtering | The filtering of putative chimeric assemblies from a pool of mate paired reads | 63 | 4 | 6 | 16 | 1 |
| | Translation filtering | Removal of sequences from a set of sequences based on their translation, usually removing sequences with in-frame stop codons or frameshifts due to erroneous indels or substitutions caused by sequencing errors | 22 | 3 | 11 | 12 | 0 |
| | Frequency filtering | Removal of sequences based on their frequency in a pool | 51 | 37 | 11 | 15 | 1 |
| | Taxonomy filtering | Removal of sequences based on an assigned taxonomy or a taxonomic classification | 9 | 5 | 1 | 1 | 1 |
| | Mistag filtering | Removal of sequences based on putative tagging errors | 3 | 1 | 1 | 1 | 0 |
| Data generation | OTU delimitation | The grouping of a set of sequences into OTUs by some method | 84 | 5 | 12 | 22 | 0 |
| | OTU mapping | The mapping of sequences to OTUs to provide read counts for each OTU | 30 | 3 | 7 | 11 | 0 |
| | Uncurated taxonomic assignment | The assignment (identification or classification) of taxonomy to OTUs using a global uncurated reference database (e.g., GenBank, BOLD) | 55 | 2 | 11 | 13 | 0 |
| | Reference taxonomic assignment | The assignment (identification or classification) of taxonomy to OTUs using a purpose‐built and/or specially curated reference set of sequences | 60 | 9 | 18 | 23 | 1 |

Tasks are grouped into four groups by broad purpose, and a detailed definition of each task is given along with summary statistics of the implementation of each task across the 111 papers. For a list of the software used for each task, Table S1 is an expanded version of this table.
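
As an illustration of how three of the task definitions above fit together (dereplication, size sorting and frequency filtering), the following is a minimal Python sketch. It is not taken from any of the surveyed papers or tools. The function names, the `uniqN` identifiers, the `;size=N` header annotation and the `min_size` threshold are assumptions chosen only for the example.

```python
from collections import Counter


def dereplicate(reads):
    """Dereplication plus size sorting: collapse identical reads and
    return (sequence, copy_count) pairs, most abundant first."""
    counts = Counter(reads)
    # Sort by decreasing copy count; ties are broken alphabetically
    # so that the output order is deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def frequency_filter(uniques, min_size=2):
    """Frequency filtering: keep unique sequences seen at least
    min_size times (the threshold is an illustrative assumption)."""
    return [(seq, n) for seq, n in uniques if n >= min_size]


def to_fasta(uniques):
    """Write unique sequences as fasta text, recording the copy count
    in the header. The ';size=N' annotation is an assumed format."""
    lines = []
    for i, (seq, n) in enumerate(uniques, start=1):
        lines.append(f">uniq{i};size={n}")
        lines.append(seq)
    return "\n".join(lines)


# Toy pool of six reads containing three distinct sequences.
reads = ["ACGT", "ACGT", "TTGA", "ACGT", "TTGA", "GGCC"]
uniques = dereplicate(reads)
kept = frequency_filter(uniques, min_size=2)
print(to_fasta(kept))
```

In this toy pool the singleton `GGCC` is removed by the frequency filter, while the two retained sequences carry their copy counts in the header, as the dereplication definition describes.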