Table 2: Performance criteria to define the readiness of test methods for hazard evaluation.
Criteria | Description | Examples / Why is it important | Max. score |
---|---|---|---|
1 Test system | Note: here scoring not for ‘test method’ | | 10 |
1a What is modelled | Is there a clear rationale given for what target organ/tissue relevant for human poisoning/pathology the test systems should reflect | Here: question is not for relevance, but whether there is documentation and a rationale at all. | 1 |
1b Relevance | Is the chosen test system known to be a key component in pathogenesis, or why is it thought to reflect a key component, mechanism or tissue | Here: is the tissue/organ modelled important for regulatory toxicology or biomedical research purposes. Is evidence given for the relevance of the model by morphological comparison, gene expression or functional criteria? Are all/sufficient cell types included in the model? | 1 |
1c System uncertainties and human correlate (HC) | (i) Is there a discussion on where the test system differs from the mimicked human tissue, and which gaps of analogy need to be considered? (ii) Do toxicant-altered genes (or other biomarkers) correspond to changes in mimicked human tissue (after poisoning or in relevant pathologies) | (i) E.g. a differentiated cell or a cell line (such as HepG2) does not necessarily reflect all features of the corresponding in vivo tissue/conditions. (ii) This is an additional measure to increase confidence in the test; not mandatory, but helpful. | 1 |
1d Definition of cells | Is the test system sufficiently characterized (source; multiple positive and negative markers for cell identity, number, quality, composition, differentiation state, viability, usual morphology, basic function, basic reaction to stimuli, STR (short tandem repeat) profile, …)? | This is especially important for cells that have to be produced regularly, e.g. by differentiation or primary cell isolation. | 1 |
1e Cell composition | For multi-component systems: information on all cellular subpopulations. What is the percentage of contaminating cells or, in co-cultures, what is the percentage of all subpopulations? | This is important for the test endpoints, as it could be that only one cell type is affected by a toxicant. For primary cells: have cells from different sources (suppliers) been tested (e.g. hepatocytes from different suppliers may differ in purity and quality)? For routine use it would be beneficial to have pre-set acceptance criteria for each cell type. | 1 |
1f Cellular environment | Information on structuring components of the test system: coating, scaffolds, matrix description, medium (supplements), microfluidic effects, supportive cells, dimensions and positioning/handling of 3D constructs, … | This means a very detailed description of the culture conditions, including temporal and spatial aspects. Cell differentiation and response (quality, quantity, kinetics) may depend on multiple external factors and on the 3D arrangement. | 1 |
1g Biological consistency | (i) Has the variation of the test system been assessed and have influencing factors been identified? (ii) Have acceptance criteria and performance standards for the test system been defined (different from the test!)? | (i) E.g. do medium supplements have an influence on the cells, such as batch effects of FCS or serum replacement additives? (ii) E.g. a range of marker expression levels, of biological function (proliferation, protein production, …), of structural features (cell number, organoid size, …), … For cell lines: what is the optimum passage number of cells? For routine use it would be beneficial to have pre-set acceptance criteria for the whole model/test system. | 1 |
1h Critical components | Have critical components and handling steps been identified and described? Are examples for normal performance and morphology given; are there examples for alerts? | E.g. cell density on a specific day of differentiation could be a critical step; wrong or strange morphology of cells could be an alert. For routine use it would be beneficial to have pre-set acceptance criteria. | 1 |
1i Cell stability | Stability proven over multiple doublings; genetic stability shown; pluripotency/multipotency (for stem cells) shown, cell identity shown | For stem cells, stability needs to be shown over many passages (≥10). For primary cells: stability and identity of supply needs to be shown; stability of function (e.g. xenobiotic metabolism) shown. | 1 |
1j Transgenic cells | Transgene characterized (source, sequence, regulation); insertion characterized; stability of function shown and quantified; cell identity and function related to wild type (wt); clonality documented. | | 1 |
2 Exposure scheme | | | 3 |
2a Description | Complete, detailed, unambiguous. | Medium changes, re-additions, coating, treatment period and timing, incubation conditions (temperature, gassing, …) | 1 |
2b Unique identity | Tests with multiple variants of a test need to define very transparently, which variant the data come from | E.g. from which cell type/clone; which time; which plate format; which medium additives… | 1 |
2c Graphical scheme | Complete sequence of events, including endpoint assessment | Supports clarity and data assignment to test variants | 1 |
3 Documentation / SOP | | | 5 |
3a Availability | Method description for test system, test procedure, analytical endpoints and prediction model; public availability of SOP (data bank or test developer upon request) | Normal scientific publications are usually not sufficient, unless it is a specific methods paper. For transferability of the test method it is beneficial to have SOPs or other documents covering each component of test method and the whole testing process | 1 |
3b Stage of development | Version history; updated | | 1 |
3c For CRO tests | Are full performance standards and corresponding data delivered by the CRO along with test data (in case SOP details are not disclosed) | Non-disclosure of SOP is acceptable, if full performance/readiness criteria are given. | 1 |
3d Test components | Documented and available (receipt, storage, handling and disposal documents); quality criteria and checking procedure established | E.g. for media, plates, coating it should be defined, what is acceptable/non-acceptable and how this is controlled. Test chemical identity and purity (certificate of analysis) and safety data sheets for chemicals | 1 |
3e Stocks | Procedure for preparation, storage and quality control of stocks established | | 1 |
4 Main endpoint(s) | Mainly referring to specific/functional endpoints | | 4 |
4a Biol. relevance | Is there a rationale given why test endpoint is relevant to adverse outcomes | Helps to interpret the results obtained. | 1 |
4b Toxicological relevance | Are toxicants (≥ 3) known to affect the endpoint | Helps to interpret the results obtained. | 1 |
4c Analytical methods | Methods defined, rationale given; positive controls and acceptability criteria | Positive controls for analytical method may differ from controls for test/endpoint | 1 |
4d Multiple endpoints | Are all endpoints and their relation to one another (priority, preference) defined | E.g. neurite outgrowth / cytotoxicity | 1 |
5 Cytotoxicity | Here: if cytotoxicity is not main endpoint | | 5 |
5a Cytotoxicity within test | Cytotoxicity is preferentially determined within same test compartment as the major endpoint; second choice is under same conditions in parallel | Control of cytotoxicity in a different format (e.g. other types of plates; other time) is very problematic. Measuring cytotoxicity under the same test conditions as the main endpoint helps to interpret the mechanism related to the adverse effects for the main endpoint (specific or cytotoxicity-driven mechanism). | 1 |
5b Subpopulation effects | Are subpopulations detected by measure for cytotoxicity or proliferation; are minor changes detected? Has sensitivity been shown? | Usually at least three types of assay required (measurement of viability, measurement of cell death, single cell analysis) | 0.5 |
5c Specificity (compared to cytotox) | A measure needs to be established to distinguish a specific/functional endpoint from cytotoxicity | E.g. neurite outgrowth, migration inhibition in non-cytotoxic concentration ranges | 0.5 |
5d Timing within test | For repeated/prolonged dosing, early death and compensatory growth need to be considered | The test of cytotoxicity only at the end may give false negative data, if cells die early and this is not detectable late, because of compensatory proliferation. | 0.5 |
5e Timing after test | For very short endpoints, e.g. electrophysiology measured 30 min after toxicant exposure, delayed measure of cytotoxicity is necessary | Cells cannot die in very short time, even though compound triggers lethal changes. Data for 24h exposure should be given. | 0.5 |
5f Curve fitting | Sufficient non-toxic data points (baseline); at least 40% toxicity / change to allow fitting | | 0.5 |
5g Non-cytotoxicity | Absence of ‘cytotoxicity’ does not mean non-cytotoxicity (question of power): has data variation been considered; is a measure of uncertainty given for non-cytotoxicity (e.g. BMCL calculation)? | | 0.5 |
5h Benchmark response | Has a rationale been given for setting a threshold value for cytotoxicity (statistical or biological significance) | E.g. statistical: 3x standard deviation; biological: 90% viability; see also: http://invitrotox.uni-konstanz.de/ | 0.5 |
5i Apoptosis / Proliferation | If natural feature of the test system: measure for normal rate required | | 0.5 |
6 Test method controls | | | 4 |
6a Positive controls (PC) | ≥ 3 toxicants required for test definition; preferentially of different mechanisms; preferentially human-relevant toxicants; indicate variation of PC within and across assays | Used to define acceptability criteria, S/N ratio or z’-value of screen | 1 |
6b Negative controls (NC) | ≥ 5 negative controls are required to define specificity at ±20% level; concentration of negatives needs to be defined and rationalized | Ways to define negatives: (i) e.g. compound only acting when metabolized, (ii) acting on another organ, (iii) known to be safe for pregnant women, (iv) being selective for another assay, (v) pairs/matches of a specific positive control (e.g. inactive metabolite) | 1 |
6c Unspecific controls (UC) | A type of negative control for functional assays: not inactive, but only cytotoxic | Absolutely essential to define baseline variation and thus the relevant benchmark response for positive hits | 1 |
6d Endpoint-specific controls (EC) | To provide plausibility, and to help initial test setup: EC show that pathways considered to be relevant for test endpoint are indeed affecting the test endpoint. EC help to correlate (by concentration and time) compound effect on pathway (activity measure to be established) and on test endpoint (standard test readout). EC may be chemicals or siRNA; pathways may be defined from literature or experimentally (gene expression) | Example: actin is required for migration, thus an actin inhibitor should affect migration endpoint | 1 |
7 Data evaluation | Here: referring to main endpoint(s) | | 4 |
7a Outliers | Procedure for handling and documentation should be established | | 1 |
7b Concentration-dependence | Higher confidence in concentration-dependent data; no-effect concentrations must be included (full range curve); data need sufficiently dense spacing around benchmark concentration; preferably provide statistical significance for key data points | | 1 |
7c Benchmark response | Give rationale for definition (statistical (after FDR correction) or biological). Provide power estimate if conclusions are drawn from negatives. | | 1 |
7d Curve fitting | Indicate detailed procedure used for curve fitting; preferentially force fitted curve through 100% at negative control conditions (full function) | E.g. sigmoidal, linear or exponential curve fit | 1 |
8 Testing strategy | | | 4 |
8a Hazard prediction | Which hazard is assessed; which question does the test method answer? | | 1 |
8b Link to an AOP | Does the test give input to a mechanistic concept, e.g. an AOP? | Helps to position in battery; helps to interpret results | 1 |
8c Role in battery | Full score for stand alone tests. For tests that are not stand alone, information on their relation to other tests in a battery is required. | Information is required on how the test data would be used in a battery and under which conditions this is possible. | 1 |
8d Comparison to similar tests | Does the test fill a gap in a battery? Is it providing advantages compared to another test for the same hazard? | Avoid overlapping tests to be performed. Ensure adequate testing battery/strategy | 1 |
9 Robustness | | | 4 |
9a Reproducibility | Data available on normal variation; Information on factors affecting test variation is given | Historic control data on positive controls show normal range; known artefacts and shortcomings | 1 |
9b Intra-lab | Data available from different operators, different test runs over longer time | | 1 |
9c Inter-lab | Data available on transferability / reproducibility in another lab | | 1 |
9d Historical controls | Data for PC and NC over time | | 1 |
10 Test benchmarks | | | 4 |
10a Sensitivity (of the test) | Signal-to-noise ratio (S/N) defined. Sensitivity information available | S/N based on adequate data sets. The S/N is used to determine the limit of detection. Additional measures: true positive rate, hit rate, sensitivity to detect a panel of positive controls, etc. | 1 |
10b Specificity (of the test) | Tested with sufficient number and quality of negative controls | Additional measures: true negative rate, etc. | 1 |
10c Acceptance criteria | Clearly defined and documented. Normal range of variation known | E.g. a given positive control has to reduce the main endpoint by at least 25%, otherwise test plate is discarded. | 1 |
10d Response characteristics | Should the response be linear? What are the upper and lower limits? | Additional measures: mono-directional or bi-directional deviation defined; Info on accuracy, precision, limit of quantification, etc. | 1 |
11 Prediction model | | | 4 |
11a Definition | Information should be available and clear (including rationale for model, i.e. its particular strengths). Information and rationale should be given for use of sharp thresholds or probabilistic approach. | Information on how many classes of toxicants are predicted. Positives and non-positives; or strong, medium, weak positives. Information on uncertainty of prediction should be given, at least for positives (note that uncertainty of negatives is often not defined). E.g. you can define a sharp threshold (all above 4 is positive), or you can define above 4 has a 70% likelihood to be positive | 1 |
11b Rationale | Reason, and mathematical basis / plausibility for prediction model given | Reason for the choice and value of thresholds | 1 |
11c Confirmation | Experimental testing of prediction model; confirmation of function/predictivity | | 1 |
11d Limitations | Information on limitations of prediction model, and on how exceptions and special cases are to be handled | Strange curve shapes, solubility issues, assay interferences, … How special chemical classes are handled | 1 |
12 Applicability domains | | | 3 |
12a Chemicals | Is information on the types of chemicals that fall into the prediction model / testing range available? | | 1 |
12b Pathways | The type of pathways that are relevant for the test (to be disturbed or to be detected) | | 1 |
12c AOP | Information contributed to an AOP KE/MIE; element of a KE testing battery | | 1 |
13 Screening hits | | | 4 |
13a Hit definition | Transparent, pre-defined criteria (including curve-fitting/statistical procedure) | Usually, non-hits are discarded. If statements of non-hits are made, they need definition and power calculation. | 1 |
13b Hit confirmation (prim.) | Independent test run(s) in “same” test method; full concentration-response | Often loose (soft) criteria for hits, and no correction for false discovery rate. Confirmation assays can counteract such problems; use of new cells and new compound stocks provides additional robustness. | 1 |
13c Hit confirmation (sec.) | Additional test (different from primary test method) confirming hit on same endpoint as screen | E.g. migration may be measured by tracking cells (primary test) and then (secondary test) by a Boyden chamber method. | 1 |
13d Screen documentation | Acceptability criteria, performance of positive controls, internal robustness controls | | 1 |
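
As an illustration of how the section maxima in Table 2 could be tallied for a given test method, the following minimal Python sketch sums awarded scores per section and reports them against the overall maximum. The section maxima are taken from the table; the simple summation, the treatment of unscored sections as zero, the function name `readiness_summary` and the example scores are assumptions made for illustration only and are not prescribed by the table.

```python
# Maximum score per criteria section, copied from the "Max. score" entries of Table 2.
SECTION_MAX = {
    "1 Test system": 10,
    "2 Exposure scheme": 3,
    "3 Documentation / SOP": 5,
    "4 Main endpoint(s)": 4,
    "5 Cytotoxicity": 5,
    "6 Test method controls": 4,
    "7 Data evaluation": 4,
    "8 Testing strategy": 4,
    "9 Robustness": 4,
    "10 Test benchmarks": 4,
    "11 Prediction model": 4,
    "12 Applicability domains": 3,
    "13 Screening hits": 4,
}


def readiness_summary(awarded):
    """Return (total, maximum, fraction) for a dict of awarded section scores.

    Assumption (not from the table): section scores are simply added up,
    and sections missing from `awarded` contribute 0 to the total.
    """
    total = 0.0
    for section, score in awarded.items():
        if section not in SECTION_MAX:
            raise KeyError(f"unknown section: {section}")
        if not 0 <= score <= SECTION_MAX[section]:
            raise ValueError(f"score {score} out of range for {section}")
        total += score
    maximum = sum(SECTION_MAX.values())
    return total, maximum, total / maximum


# Hypothetical scores for an imaginary test method (illustrative, not real data).
example = {"1 Test system": 8, "5 Cytotoxicity": 3.5, "9 Robustness": 2}
total, maximum, fraction = readiness_summary(example)
print(f"{total} / {maximum} = {fraction:.0%}")
```

Keeping the maxima in a single mapping makes it easy to check that sub-item scores add up to the section maximum (for example, ten items of 1 point each in section 1) before any overall figure is reported.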