2021 Sep 30;22(3):847–861. doi: 10.1111/1755-0998.13502

TABLE 1.

Table of all bioinformatic tasks performed across the core papers set

Task group	Task	Description	Number of papers reporting task	Number of papers not reporting software	Total number of software tools	Total number of software functions	Number of papers performing manually
Read preparation	Quality control	Generating a report of sequence quality information from a sample or set of samples; the data are not modified	19	0	4	4	0
	Adapter trimming	Trimming of sequencing adapters	9	1	6	6	0
	Demultiplexing	Separation of sequences from a mixed pool into separate pools based on the occurrence of a unique set of bases (index or tag)	55	17	16	19	0
	Pair merging	The assembly of mate pair reads into a single contig	63	1	10	18	0
	Quality trimming	The removal of bases from either or both ends of sequences in a pool based on quality scores	20	1	8	10	0
	Mate pairing	The identification and synchronisation of mate pair reads between two samples, often involving arranging reads in identical orders and/or removal of reads without a mate pair	3	0	3	3	0
	Primer trimming	Trimming of PCR primers	66	8	15	17	0
	Reverse complementation	Reverse complementing the sequences in a pool	7	3	2	2	0
	Sequence conversion	Converting sequences from fastq to fasta	3	0	2	3	0
	Length trimming	The removal of bases from either or both ends of sequences in a pool, either the removal of a fixed number of bases or the removal of a variable number of bases to reduce sequences to a standard length	10	3	6	7	0
	Pair concatenation	Concatenating mate pair reads into a single contig (where reads don't overlap)	8	4	4	4	0
	Assembly	The assembly of reads into contigs, applied when more than one pair of overlapping fragments have been metabarcoded	6	0	4	4	0
	Degapping	Removal of gaps from sequences	1	0	1	1	0
Sequence processing	Dereplication	The removal of duplicate reads to retain only unique sequences in a pool; often the total number of copies of a sequence is recorded in the header of the retained sequence	58	10	11	19	0
	Size sorting	The sorting of a fasta file according to a size annotation in the header	10	2	3	4	0
Filtering	Quality filtering	Removal and/or trimming of sequences from a pool based on quality information. Also often converts from fastq to fasta.	81	11	20	27	0
	Similarity filtering	Removal of sequences based on similarity to an alignment, either based on sequence identity or alignment position	9	1	4	4	0
	Length filtering	The removal of sequences from a pool that are less than, more than, or fall within or outside of a specified length threshold or thresholds	54	21	17	23	0
	Preclustering	Reduction of sequence variation in a dataset prior to further processing ‐ a form of denoising	12	1	3	6	0
	Denoising	The removal of reads containing putative PCR or sequencing errors based on statistical assessment	18	1	8	8	0
	Normalisation	A process by which the number of sequences for each of a set of samples is reduced where necessary such that the output set of samples all have the same number of sequences while maintaining the relative frequencies of OTUs	2	0	1	1	1
	Chimera filtering	The filtering of putative chimeric assemblies from a pool of mate paired reads	63	4	6	16	1
	Translation filtering	Removal of sequences from a set of sequences based on their translation, usually removing sequences with in‐frame stop codons or frameshifts due to erroneous indels or substitutions caused by sequencing errors	22	3	11	12	0
	Frequency filtering	Removal of sequences based on their frequency in a pool	51	37	11	15	1
	Taxonomy filtering	Removal of sequences based on an assigned taxonomy or a taxonomic classification	9	5	1	1	1
	Mistag filtering	Removal of sequences based on putative tagging errors	3	1	1	1	0
Data generation	OTU delimitation	The grouping of a set of sequences into OTUs by some method	84	5	12	22	0
	OTU mapping	The mapping of sequences to OTUs to provide read counts for each OTU	30	3	7	11	0
	Uncurated taxonomic assignment	The assignment (identification or classification) of taxonomy to OTUs using a global uncurated reference database (e.g., GenBank, BOLD)	55	2	11	13	0
	Reference taxonomic assignment	The assignment (identification or classification) of taxonomy to OTUs using a purpose‐built and/or specially curated reference set of sequences	60	9	18	23	1

Tasks are organised into four groups by broad purpose, and a detailed definition of each task is given along with summary statistics of the implementation of each task across the 111 papers. For a list of the software used for each task, see Table S1, an expanded version of this table.
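
To make a few of the task definitions above concrete, the following is a minimal Python sketch of reverse complementation, length filtering, dereplication with size sorting, and frequency filtering. It is illustrative only and does not reproduce any of the software tools counted in the table. The function names, the `uniqN` identifiers, the `;size=N` header annotation, and the thresholds are all assumptions chosen for this example.

```python
from collections import Counter


def reverse_complement(seq: str) -> str:
    """Reverse complementation: complement each base, then reverse the sequence."""
    complement = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(complement)[::-1]


def length_filter(seqs, min_len, max_len):
    """Length filtering: keep sequences whose length lies within [min_len, max_len]."""
    return [s for s in seqs if min_len <= len(s) <= max_len]


def dereplicate(seqs):
    """Dereplication plus size sorting.

    Collapses identical reads to unique sequences and records the copy number
    in a hypothetical ';size=N' header annotation. Records are returned most
    abundant first; ties are broken alphabetically so the order is deterministic.
    """
    counts = Counter(seqs)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(f"uniq{i};size={n}", seq) for i, (seq, n) in enumerate(ordered, start=1)]


def frequency_filter(records, min_size):
    """Frequency filtering: drop unique sequences seen fewer than min_size times."""
    return [(h, s) for h, s in records if int(h.split("size=")[1]) >= min_size]


# Toy input pool (assumed data, for illustration only).
reads = ["ACGTAC", "ACGTAC", "ACGTAC", "TTGCA", "ACG", "TTGCA", "GGGTTTAAACCC"]

kept = length_filter(reads, min_len=5, max_len=8)   # removes "ACG" and the 12-base read
uniques = dereplicate(kept)                         # two unique sequences remain
abundant = frequency_filter(uniques, min_size=3)    # only the sequence seen 3 times remains
```

The order in which these steps are chained here is one arbitrary choice for the sketch; the table describes the tasks individually and does not prescribe an order.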