Table 1.
| Metric | Definition | Warning threshold | Failure threshold |
|---|---|---|---|
| Contamination | Percentage of reads classified as the most abundant species other than E. coli | 1 % | 5 % |
| Median coverage against assembly | Median coverage obtained by mapping the trimmed reads against the assembled contigs | 20× | 10× |
| % cgMLST genes identified | Percentage of cgMLST genes identified. Only perfect hits (i.e. full length and 100 % identity) are counted [85] | 95 % | 90 % |
| Average read quality (Q-score) | Q-score of the trimmed reads, averaged over all reads and positions | 30 | 25 |
| GC-content deviation | Deviation of the average GC content of the trimmed reads from the expected value for E. coli (50.5 % [86]) | 2 % | 4 % |
| N-content | Average fraction of N bases per read position in the trimmed reads, expressed as a percentage | 0.5 % | 1 % |
| Per base sequence content | Difference between the AT and GC frequencies, averaged over all read positions. When the Nextera XT protocol is used for library preparation, primer artefacts can cause fluctuations at the start of reads because enzymatic tagmentation is not random, so the first 20 bases are excluded from this test. Fluctuations can also occur at the end of reads, because read trimming leaves very long reads in low abundance, so the longest 0.5 % of reads are likewise excluded | 3 % | 6 % |
| Minimum read length | Minimum trimmed read length, expressed as a percentage of the untrimmed read length, that at least half of all trimmed reads must reach (e.g. for 300-base raw reads, at least half of the trimmed reads must be 200 bases or longer to avoid a warning and 120 bases or longer to avoid failure) | 66.67 % | 40.00 % |
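The thresholds in Table 1 point in two directions: for contamination, GC-content deviation, N-content and per base sequence content, larger values are worse, whereas for coverage, cgMLST gene recovery, Q-score and minimum read length, smaller values are worse. A minimal Python sketch of how a sample's metric values could be checked against these thresholds follows. The metric keys, the `classify` and `overall_status` helpers, the rule that a value exactly on a threshold passes, and the rule that the worst metric determines the overall status are illustrative assumptions, not part of the described pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    warning: float
    failure: float
    higher_is_better: bool  # True when low values indicate poor quality


# Thresholds transcribed from Table 1 (percentages written as plain numbers).
# The dictionary keys are hypothetical names chosen for this sketch.
THRESHOLDS = {
    "contamination": Threshold(1.0, 5.0, higher_is_better=False),
    "median_coverage": Threshold(20.0, 10.0, higher_is_better=True),
    "cgmlst_genes_identified": Threshold(95.0, 90.0, higher_is_better=True),
    "average_read_quality": Threshold(30.0, 25.0, higher_is_better=True),
    "gc_content_deviation": Threshold(2.0, 4.0, higher_is_better=False),
    "n_content": Threshold(0.5, 1.0, higher_is_better=False),
    "per_base_sequence_content": Threshold(3.0, 6.0, higher_is_better=False),
    "minimum_read_length": Threshold(66.67, 40.0, higher_is_better=True),
}

SEVERITY = {"pass": 0, "warning": 1, "fail": 2}


def classify(metric, value):
    """Return 'pass', 'warning' or 'fail' for a single metric value.

    Assumption: a value exactly equal to a threshold does not trigger it.
    """
    t = THRESHOLDS[metric]

    def worse_than(limit):
        # "Worse" means below the limit or above it, depending on direction.
        return value < limit if t.higher_is_better else value > limit

    if worse_than(t.failure):
        return "fail"
    if worse_than(t.warning):
        return "warning"
    return "pass"


def overall_status(values):
    """Worst status across all supplied metrics (an assumed aggregation rule)."""
    statuses = [classify(m, v) for m, v in values.items()]
    return max(statuses, key=SEVERITY.__getitem__, default="pass")


sample = {"contamination": 0.2, "median_coverage": 45.0, "n_content": 0.7}
print(classify("median_coverage", 15.0))  # warning
print(overall_status(sample))  # warning
```

Storing the direction alongside each pair of thresholds keeps a single comparison routine valid for every row of the table.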
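Several of the read-level metrics in Table 1 can be computed directly from the trimmed reads. The Python sketch below is one possible reading of those definitions, not the original implementation. It assumes Phred+33 quality strings, pools all bases when computing GC content and ignores N bases in the GC denominator, averages the N fraction over read positions (counting only reads long enough to reach each position), and interprets "minimum read length" as the longest length reached by at least half of the trimmed reads. All function names are hypothetical.

```python
import math
from statistics import fmean

# Expected E. coli GC content (%), as given in Table 1.
EXPECTED_GC = 50.5


def average_read_quality(qualities, offset=33):
    """Mean Phred score over all reads and positions (assumes Phred+33)."""
    return fmean([ord(c) - offset for qual in qualities for c in qual])


def gc_content_deviation(sequences):
    """Absolute difference, in percentage points, from the expected GC content.

    Assumption: bases from all reads are pooled and N bases are ignored.
    """
    bases = "".join(sequences).upper()
    called = sum(bases.count(b) for b in "ACGT")
    gc = 100 * (bases.count("G") + bases.count("C")) / called
    return abs(gc - EXPECTED_GC)


def n_content(sequences):
    """Fraction of N bases at each read position, averaged and given in %."""
    max_len = max(len(s) for s in sequences)
    per_position = []
    for i in range(max_len):
        # Only reads long enough to reach position i contribute to it.
        covering = [s[i] for s in sequences if len(s) > i]
        per_position.append(sum(b in "Nn" for b in covering) / len(covering))
    return 100 * fmean(per_position)


def minimum_read_length(sequences, raw_length):
    """Longest length reached by at least half of the reads, as % of raw length."""
    lengths = sorted((len(s) for s in sequences), reverse=True)
    k = math.ceil(len(lengths) / 2)  # number of reads that make up "half"
    return 100 * lengths[k - 1] / raw_length


reads = ["ACGTN", "GGCC", "AT"]
print(gc_content_deviation(reads))  # 9.5
print(minimum_read_length(reads, raw_length=5))  # 80.0
```

With 300-base raw reads, the minimum read length value works out to 66.67 % when half of the trimmed reads are at least 200 bases long and to 40 % when they are at least 120 bases long, matching the example in the table.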