2019 Apr 13;8(4):giz029. doi: 10.1093/gigascience/giz029

Table 1: Main steps undertaken by each module of the database curation script

Module	Steps
Module 1	Extract subset of raw Midori database for query taxon and loci
	Remove sequences with non-binomial species names, reduce subspecies to species labels
	Add local sequences (optional)
	Check for relevant new sequences for list of query species on NCBI (GenBank and RefSeq) (optional)
	Select amplicon region and remove primers
	Remove sequences with ambiguous bases
	Align
	End of module: optional check of alignments
Module 2	Compare sequence species labels with taxonomy
	Non-matching labels queried against Catalogue of Life to check for known synonyms
	Remaining mismatches kept if genus already exists in taxonomy, otherwise flagged for removal
	End of module: optional check of flagged species labels
Module 3	Discard flagged sequences
	Update taxonomy key file for sequences found to be incorrectly labelled in Module 2
	Run SATIVA
	End of module: optional check of putatively mislabelled sequences
Module 4	Discard flagged sequences
	Finalize consensus taxonomy and relabel sequences with correct species label and accession number
	Select 1 representative sequence per haplotype per species
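
Three of the filtering steps in the table (reducing labels to binomial species names, removing sequences with ambiguous bases, and keeping one representative sequence per haplotype per species) can be illustrated with a minimal Python sketch. This is not the authors' curation script. The function names, the regular expression, the list of excluded epithets, and the treatment of a haplotype as an exact, case-insensitive sequence match are all assumptions made for illustration. The sketch also skips alignment, primer removal, the taxonomy checks and SATIVA.

```python
import re

# Assumed pattern for a binomial name: capitalized genus, lowercase epithet.
BINOMIAL = re.compile(r"^[A-Z][a-z]+ [a-z-]+$")
# Assumed open-nomenclature placeholders that are not real species epithets.
NOT_EPITHETS = {"sp", "cf", "aff", "nr"}


def to_species_label(name):
    """Return a 'Genus species' label, or None if the name is not binomial.

    Trinomials (subspecies) are reduced to their first two words.
    """
    parts = name.strip().split()
    if len(parts) < 2:
        return None
    candidate = " ".join(parts[:2])
    if not BINOMIAL.match(candidate) or parts[1] in NOT_EPITHETS:
        return None
    return candidate


def has_ambiguous_bases(seq):
    """True if the sequence contains any character other than A, C, G or T."""
    return bool(set(seq.upper()) - set("ACGT"))


def curate(records):
    """Filter (accession, name, sequence) tuples.

    Keeps the first sequence seen for each (species, sequence) pair, which
    stands in here for 'one representative per haplotype per species'.
    """
    seen = set()
    kept = []
    for accession, name, seq in records:
        label = to_species_label(name)
        if label is None or has_ambiguous_bases(seq):
            continue
        key = (label, seq.upper())
        if key in seen:
            continue
        seen.add(key)
        kept.append((accession, label, seq.upper()))
    return kept


# Made-up example records (accessions and sequences are placeholders).
example = [
    ("A1", "Salmo trutta fario", "ACGT"),  # subspecies, reduced to species
    ("A2", "Salmo trutta", "acgt"),        # duplicate haplotype of A1
    ("A3", "Salmo sp.", "ACGT"),           # non-binomial label
    ("A4", "Salmo salar", "ACGN"),         # ambiguous base
    ("A5", "Salmo salar", "ACGA"),
]
for record in curate(example):
    print(record)
```

In this toy input only A1 (relabelled "Salmo trutta") and A5 are retained. Because gap characters are also counted as ambiguous, the check is meant for unaligned sequences.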