TABLE 4.

Key criteria for evaluation | Description | Target | Key factors affecting performance | Quantitative evaluation (scale of 0–5) |
--- | --- | --- | --- | --- |
Stability | Consistency of the typing result for an isolate after its primary isolation and during laboratory storage and subculture. | Typing results should be stable during laboratory storage and subculture; strain markers should not mutate so rapidly that the strain’s position in the epidemiological context changes; data on the stability of the markers should be available. | Rapid mutation and recombination of the marker(s) during storage and subculture could lead to poor reproducibility. | 0 – Extremely poor stability. 1 – No data are available on stability. 3 – Some limited data suggest that the markers are stable. 5 – Strong data are available supporting the stability of the markers (and/or data are available that can be used to correct for mutations or changes in markers during passage). |
Typeability | Ability of a method to assign a type to every isolate tested. | Typeability should be as high as possible. | Poor typeability could occur in assays using a scheme that does not fully cover genetic variation; typeability may also be reduced if some isolates show high endogenous nuclease activity. | 0 – Extremely poor typeability (<80%). 1 – Data indicate 80–90% typeability, or no evaluation of typeability performed. 2 – Data indicate 90–93% typeability. 3 – Data indicate 94–96% typeability. 4 – Data indicate 97–99% typeability. 5 – Data indicate >99% typeability. |
Discriminatory power | Ability to assign different types to two unrelated strains; discriminatory power can be expressed using Simpson’s index of diversity (SID). | Discriminatory power should be as high as possible. For highly discriminatory methods, phylogenetic clustering can be used to identify isolates that share a recent common ancestor. | Discriminatory power is highly dependent on the marker(s) selected for typing. | 0 – Extremely poor discriminatory power (<80%, SID <0.80). 1 – Data indicate 80–90% discriminatory power (SID 0.80–0.90), or no evaluation of discriminatory power performed. 2 – Data indicate 90–93% discriminatory power (SID 0.90–0.93). 3 – Data indicate 94–96% discriminatory power (SID 0.94–0.96). 4 – Data indicate 97–99% discriminatory power (SID 0.97–0.99). 5 – Data indicate >99% discriminatory power (SID >0.99). Note: we recommend that data be generated using an appropriate strain collection of >100 isolates. |
Epidemiological concordance | Ability to reflect, agree with, and possibly further illuminate the available epidemiological information about the cases under study. | Epidemiological concordance should be as high as possible; strains from the same outbreak, or strains otherwise linked by epidemiological evidence, should be classified into the same subtype (or phylogenetically characterized as sharing a recent common ancestor). | Low epidemiological concordance could occur in assays that either target low-stability markers or have limited discriminatory power and therefore group together isolates that are epidemiologically unrelated. | 0 – Extremely poor epidemiological concordance; <80% of isolates are classified correctly. 1 – Poor epidemiological concordance; data indicate 80–90% of isolates are classified correctly, or no evaluation of epidemiological concordance performed. 2 – Low epidemiological concordance; data indicate 90–93% of isolates are classified correctly. 3 – Intermediate epidemiological concordance; data indicate 94–96% of isolates are classified correctly. 4 – Good epidemiological concordance; data indicate 97–99% of isolates are classified correctly. 5 – Strong epidemiological concordance; data indicate all isolates are classified correctly. Note: we recommend that data be generated using at least 20 sets of epidemiologically related isolates. Ideally, a given subtyping method classifies all of these isolates correctly. |
Reproducibility | Ability to produce the same results in different laboratories and with different personnel. | Results should be highly reproducible (>99%). | Poor reproducibility could result from (i) a technically difficult assay (leading to technical errors by personnel, e.g., cross-contamination), (ii) insufficiently standardized reagents, (iii) equipment not performing reproducibly, (iv) a poorly optimized typing system, (v) sensitivity of the equipment or assay system to environmental factors (e.g., humidity, temperature), (vi) bias in observing, recording, analyzing, and interpreting results, or (vii) assays targeting biologically highly variable markers (e.g., some of the surface antigens targeted by classical serotyping). | 0 – Extremely poor reproducibility (<80%, i.e., results for >20% of isolates are not reproducible between laboratories). 1 – Poor reproducibility; data indicate that results for 80–90% of isolates are reproducible between laboratories. 2 – Low reproducibility; data indicate that results for 91–93% of isolates are reproducible between laboratories. 3 – Intermediate reproducibility; data indicate that results for 94–96% of isolates are reproducible between laboratories. 4 – Good reproducibility; data indicate that results for 97–99% of isolates are reproducible between laboratories. 5 – Strong reproducibility; data indicate that results for >99% of isolates are reproducible between laboratories. Note: we recommend that data be generated from an evaluation by at least four laboratories. |
Repeatability | Ability to produce the same results in the same laboratory with the same equipment and personnel. | Results should be highly repeatable (>99%). | Poor repeatability could result from (i) a technically difficult assay (leading to technical errors by personnel, e.g., cross-contamination), (ii) insufficiently standardized reagents, or (iii) equipment not performing reproducibly. | 0 – Extremely low repeatability (<90%, i.e., results for >10% of isolates are not repeatable). 1 – No evaluation of repeatability performed. 2 – Data indicate 90–93% repeatability. 3 – Data indicate 94–96% repeatability, or repeatability evaluated with a small number of isolates (<40). 4 – Data indicate 97–99% repeatability. 5 – Data indicate >99% repeatability. Note: we recommend that repeatability be evaluated with at least 40 isolates, ideally 100. |
Serovar prediction ability | Ability to accurately predict the serovar of a given strain. | Both range (the number of identifiable serovars) and accuracy (the percentage of isolates with correct serovar identification) should be maximized. Accuracy should take priority over range, because misclassification may lead to worse decisions than non-classification. | Poor serovar prediction could result from (i) limited database coverage of different serovars, (ii) low discriminatory power, (iii) low typeability, or (iv) the lack of a standard protocol for predicting serovars from the data produced. | 0 – Extremely low serovar prediction accuracy (serovar is correctly predicted for <70% of serovars). 1 – No evaluation of serovar prediction ability, or weak prediction accuracy (data indicate 70–80% serovar prediction accuracy). 2 – Data indicate 80–85% serovar prediction accuracy. 3 – Data indicate 85–90% serovar prediction accuracy, or serovar prediction ability evaluated with a small number of serovars. 4 – Data indicate 90–98% serovar prediction accuracy. 5 – Data indicate >98% serovar prediction accuracy; serovars are correctly predicted for all common isolates.² Note: we recommend that data be generated using at least 40 different serovars, ideally more than 100. |
Speed | Time to result, starting from a pure single colony. | <5 days | Speed can be influenced by the throughput, equipment, and data analysis program used for a given assay. | 0 – >1 month. 1 – 3–4 weeks. 2 – 2–3 weeks. 3 – 1–2 weeks. 4 – ≤5 days. 5 – ≤2 days. |
Ease of use | Ease of use encompasses technical simplicity, workload, suitability for high-throughput testing, ease of data analysis, and ease of result interpretation. | Ease of use is important when an assay is implemented in the in-house laboratories of food companies, and less important when services provided by a commercial laboratory are used. | Poor ease of use is usually caused by the high level of expertise and experience required by a given assay, e.g., bioinformatics expertise to analyze the data it produces. | 0 – The assay requires an extremely high level of expertise and experience in specific techniques (PhD-level scientist with >4 days of specialized training). 3 – The assay requires the average level of expertise and experience of a microbiological technician. 5 – No specific expertise or experience required; the assay can be completed by staff with a high school diploma and <1 day of training. |
Cost | Total cost encompasses the cost of equipment, reagents/consumables, the data analysis platform, and staffing. For routine use, we usually assess only the reagent cost per isolate. Staffing costs for a given turnaround time can vary considerably among regions and countries and thus need to be assessed separately based on the actual local situation. | Balancing the efficiency/effectiveness of a given assay against its cost is more important than pursuing low cost, because a low-cost assay that yields poor-quality typing results may lead to larger economic losses and additional investigation time. | High cost per isolate in routine testing is usually caused by high reagent costs and long turnaround times (leading to high staffing costs). | We recommend using the actual reagent cost per isolate plus the staffing cost estimated for the given turnaround time to compare the assay being validated with methods currently or previously used by the food industry; the values below are based on costs from commercial laboratories in North America and Europe. 0 – >$1,000 per isolate. 1 – $500–$1,000 per isolate. 2 – $200–$500 per isolate. 3 – $150–$200 per isolate. 4 – $100–$150 per isolate. 5 – ≤$100 per isolate. |

¹The parameters and information in this table are adapted from Van Belkum et al. (2007) and Wiedmann et al. (2014) to reflect industry-specific practical needs.
²The serovar typing ability of the conventional serotyping method (Kauffmann–White–Le Minor scheme) is around 90%, taking its typeability and accuracy into account (Bopp et al., 2016).
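
Two rows of Table 4, typeability and discriminatory power, reduce to simple arithmetic on a set of typing results and share the same 0–5 bands. The Python sketch below illustrates that arithmetic under stated assumptions; it is not a procedure prescribed by the sources cited for the table. It computes typeability as the fraction of isolates assigned a type, computes Simpson's index of diversity with the standard Hunter–Gaston formula over the typed isolates only (an assumption), and maps either value onto the 0–5 scale. The function names, the use of `None` to mark an untypeable isolate, and the rule that values falling in the small gaps between bands (e.g., 0.93–0.94) go to the lower band are all hypothetical choices.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def typeability(assigned: Sequence[Optional[Hashable]]) -> float:
    """Fraction of isolates that received a type (None = untypeable, an assumption)."""
    if not assigned:
        raise ValueError("no isolates supplied")
    return sum(t is not None for t in assigned) / len(assigned)


def simpsons_index_of_diversity(types: Sequence[Hashable]) -> float:
    """Simpson's index of diversity, Hunter-Gaston form:

        D = 1 - sum_j n_j * (n_j - 1) / (N * (N - 1))

    N is the number of typed isolates and n_j the number assigned to type j.
    D is the probability that two isolates drawn at random without
    replacement are assigned different types.
    """
    n_total = len(types)
    if n_total < 2:
        raise ValueError("need at least two typed isolates")
    same_type_pairs = sum(n * (n - 1) for n in Counter(types).values())
    return 1.0 - same_type_pairs / (n_total * (n_total - 1))


def score_0_to_5(value: Optional[float]) -> int:
    """Map a typeability fraction or an SID onto the 0-5 bands of Table 4.

    Hypothetical rules: None means "no evaluation performed" (score 1);
    a value in a gap between bands (e.g. 0.935) gets the lower band, and a
    value on a shared boundary (e.g. 0.90) gets the higher band.
    """
    if value is None:
        return 1
    if value > 0.99:
        return 5
    if value >= 0.97:
        return 4
    if value >= 0.94:
        return 3
    if value >= 0.90:
        return 2
    if value >= 0.80:
        return 1
    return 0


if __name__ == "__main__":
    # Hypothetical results for 12 isolates; None marks an untypeable isolate.
    results = ["T1", "T1", "T2", "T3", "T4", "T5", "T6", "T7", "T8", "T9", "T10", None]
    typed = [t for t in results if t is not None]

    t = typeability(results)
    sid = simpsons_index_of_diversity(typed)
    print(f"typeability={t:.3f} score={score_0_to_5(t)}")
    print(f"SID={sid:.3f} score={score_0_to_5(sid)}")
```

With these hypothetical data, 11 of 12 isolates are typed (typeability 11/12 ≈ 0.917, score 2), and the SID over the 11 typed isolates is 1 − 2/110 ≈ 0.982 (score 4). Other rows use slightly different bands (e.g., reproducibility score 2 starts at 91%, and epidemiological concordance score 5 requires all isolates to be classified correctly), so the band function would need adjusting before being reused for them.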