2021 Mar 3;7(3):mgen000531. doi: 10.1099/mgen.0.000531

Table 1. Quality control metrics of the bioinformatics workflow

Metric	Definition	Warning threshold	Failure threshold
Contamination	Percentage of reads classified as highest occurring in species other than E. coli	1 %	5 %
Median coverage against assembly	Median coverage based on mapping of the trimmed reads against the assembled contigs	20	10
% cgMLST genes identified	Percentage of cgMLST genes identified. Only perfect hits (i.e. full length and 100 % identity) are considered [85]	95	90
Average read quality (Q-score)	Q-score of the trimmed reads averaged over all reads and positions	30	25
GC-content deviation	Deviation of the average GC content of the trimmed reads from the expected value for E. coli (50.5 % [86])	2 %	4 %
N-content	Average N-fraction per read position of the trimmed reads, expressed as a percentage	0.5 %	1 %
Per base sequence content	Difference between AT and GC frequencies averaged at every read position. The first 20 bases are excluded from this test because, when the Nextera XT protocol is used for library preparation, the non-random nature of enzymatic tagmentation can cause primer artefacts that produce fluctuations at the start of reads. The 0.5 % longest reads are likewise excluded because read trimming leaves very few long reads, and this low abundance can produce fluctuations at the end of reads	3 %	6 %
Minimum read length	Minimum read length after trimming, expressed as a percentage of the untrimmed read length, that at least half of all trimmed reads must reach (e.g. when raw input reads are 300 bases long, half of all trimmed reads must be at least 200 bases long to avoid a warning and at least 120 bases long to avoid a failure)	66.67 %	40.00 %
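
The thresholds in Table 1 can be read as a simple three-way classification per metric. The following is a minimal sketch of that reading, not the workflow's actual implementation. The metric keys, the `Threshold` class and the `classify` function are hypothetical names introduced here for illustration. Two assumptions are made: the direction of each metric is inferred from the table (a warning threshold above the failure threshold means higher values are better, and vice versa), and a value exactly equal to a threshold is treated as not triggering it, which the table does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    warning: float
    failure: float


# Threshold values copied from Table 1; the dictionary keys are
# hypothetical identifiers, not names used by the original workflow.
THRESHOLDS = {
    "contamination_pct": Threshold(1.0, 5.0),
    "median_coverage": Threshold(20.0, 10.0),
    "cgmlst_genes_pct": Threshold(95.0, 90.0),
    "avg_read_quality": Threshold(30.0, 25.0),
    "gc_deviation_pct": Threshold(2.0, 4.0),
    "n_content_pct": Threshold(0.5, 1.0),
    "per_base_seq_content_pct": Threshold(3.0, 6.0),
    "min_read_length_pct": Threshold(66.67, 40.0),
}


def classify(metric: str, value: float) -> str:
    """Return 'PASS', 'WARN' or 'FAIL' for one metric value."""
    t = THRESHOLDS[metric]
    if t.warning > t.failure:
        # Higher is better (e.g. coverage, Q-score).
        # Assumption: a value equal to a threshold does not trigger it.
        if value < t.failure:
            return "FAIL"
        if value < t.warning:
            return "WARN"
        return "PASS"
    # Lower is better (e.g. contamination, N-content).
    if value > t.failure:
        return "FAIL"
    if value > t.warning:
        return "WARN"
    return "PASS"
```

For example, under these assumptions `classify("median_coverage", 15)` yields a warning, because 15 lies between the failure threshold (10) and the warning threshold (20).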