2019 Apr 13;8(4):giz029. doi: 10.1093/gigascience/giz029

Table 1: Main steps undertaken by each module of the database curation script

Module    Steps
Module 1  Extract subset of raw Midori database for query taxon and loci
          Remove sequences with non-binomial species names, reduce subspecies to species labels
          Add local sequences (optional)
          Check for relevant new sequences for list of query species on NCBI (GenBank and RefSeq) (optional)
          Select amplicon region and remove primers
          Remove sequences with ambiguous bases
          Align
          End of module: optional check of alignments
Module 2  Compare sequence species labels with taxonomy
          Non-matching labels queried against Catalogue of Life to check for known synonyms
          Remaining mismatches kept if genus already exists in taxonomy, otherwise flagged for removal
          End of module: optional check of flagged species labels
Module 3  Discard flagged sequences
          Update taxonomy key file for sequences found to be incorrectly labelled in Module 2
          Run SATIVA
          End of module: optional check of putatively mislabelled sequences
Module 4  Discard flagged sequences
          Finalize consensus taxonomy and relabel sequences with correct species label and accession number
          Select 1 representative sequence per haplotype per species
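
As an illustration only, three of the filtering steps in Table 1 (label clean-up and removal of ambiguous bases in Module 1, and selection of one representative per haplotype per species in Module 4) can be sketched in a few lines of Python. This is not the curation script itself: the function names, the record layout, the list of open-nomenclature qualifiers, and the definition of a haplotype as an identical sequence string within a species are all assumptions made for the example. The sketch also collapses steps that the table places in different modules into a single pass.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Hypothetical record layout: (accession, species label, sequence).
Record = Tuple[str, str, str]

# Assumed pattern: "Genus epithet" with an optional subspecies epithet.
_NAME = re.compile(r"^[A-Z][a-z]+ [a-z-]+( [a-z-]+)?$")
# Assumed (incomplete) list of qualifiers that mark a non-binomial label.
_QUALIFIERS = {"sp", "cf", "aff", "nr", "var"}


def normalize_species_label(label: str) -> Optional[str]:
    """Return a 'Genus epithet' label, or None if the label is not binomial.

    Trinomial (subspecies) labels are reduced to their first two words.
    """
    label = label.strip()
    if not _NAME.match(label):
        return None
    tokens = label.split(" ")
    if any(token in _QUALIFIERS for token in tokens[1:]):
        return None
    return " ".join(tokens[:2])


def has_ambiguous_bases(sequence: str) -> bool:
    """True if the sequence contains any character other than A, C, G or T."""
    return bool(set(sequence.upper()) - set("ACGT"))


def curate(records: Iterable[Record]) -> List[Record]:
    """Filter records and keep the first record seen per species and haplotype.

    Assumption: a haplotype is an identical sequence string within a species.
    """
    seen = set()
    kept: List[Record] = []
    for accession, label, sequence in records:
        species = normalize_species_label(label)
        if species is None or has_ambiguous_bases(sequence):
            continue
        key = (species, sequence.upper())
        if key in seen:
            continue
        seen.add(key)
        kept.append((accession, species, sequence.upper()))
    return kept


if __name__ == "__main__":
    example = [
        ("A1", "Salmo trutta", "ACGT"),
        ("A2", "Salmo trutta fario", "ACGT"),  # subspecies, same haplotype as A1
        ("A3", "Salmo trutta", "ACGA"),        # second haplotype
        ("A4", "Salmo sp.", "ACGT"),           # non-binomial label
        ("A5", "Salmo salar", "ACNT"),         # ambiguous base
    ]
    for record in curate(example):
        print(record)
```

In this toy input only A1 and A3 survive: A2 collapses onto A1 once its subspecies label is reduced, A4 is dropped for its non-binomial label, and A5 is dropped for containing an ambiguous base. The accession codes and sequences are invented for the example.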