Table 2.
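| Source | Relevant factor | Acting on | Effect in study: training + LOOCV | Effect in study: train → test |
|---|---|---|---|---|
| Reference: homogeneous sample (“0”) | Separation strength, $\Delta y_0$; spread, $\sigma_{y,0}$ | – | $d_{\mathrm{ML},0} = \Delta y_0/\sigma_{y,0}$; $\mathrm{acc}_0 = \Phi(d_{\mathrm{ML},0}/2)$ | – |
| Heterogeneity of the disease^a | $f$ | Separation strength, $\Delta y \to d_{\mathrm{ML}}$ | $d_{\mathrm{ML}} = \sqrt{(1 + f)/2}\; d_{\mathrm{ML},0}$ | $d_{\mathrm{ML}} = f\, d_{\mathrm{ML},0}$ |
| Heterogeneity: biological variation | $\sigma_{\mathrm{BIOL}}$ | Broadening, $\sigma_y \to d_{\mathrm{ML}}$ | $d_{\mathrm{ML}} = (\sigma_{y,0}/\sigma_y)\, d_{\mathrm{ML},0}$ | $d_{\mathrm{ML},T} = (\sigma_y/\sigma_{y,T})\, d_{\mathrm{ML}}$ |
| Measurement noise | $\sigma_{\mathrm{EXP}}$ | Broadening, $\sigma_y \to d_{\mathrm{ML}}$ | $d_{\mathrm{ML}} = (\sigma_{y,0}/\sigma_y)\, d_{\mathrm{ML},0}$ | $d_{\mathrm{ML},T} = (\sigma_y/\sigma_{y,T})\, d_{\mathrm{ML}}$ |
| Sampling effects (finite $N$) | $N$, $\sigma_y$ | Uncertainty in accuracy | $\le \mathrm{SD}(\mathrm{acc})$ in train/test case | $\mathrm{SD}(\mathrm{acc}) = \sqrt{\mathrm{acc}_{\mathrm{true}} \times (100\% - \mathrm{acc}_{\mathrm{true}})/N}$ |
| Gold → silver standard | Intra-class kappa, $\kappa$ | Ceiling of accuracy | $\mathrm{acc} = \kappa \times \mathrm{acc}_0 + (1 - \kappa)/2$ | |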
^a i.e., related to the prediction model.
$f = \cos(\theta)$, the relative amount of heterogeneity; $\sigma_y^2 = \sigma_{\mathrm{BIOL}}^2 + \sigma_{\mathrm{EXP}}^2$; acc = accuracy; subscripts “T” refer to values in the Test sample.
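The relations in Table 2 can be evaluated numerically. The following is a minimal Python sketch, not the authors' code: all function names are hypothetical, Φ is taken to be the standard normal cumulative distribution function, and accuracies are expressed as fractions in [0, 1] rather than percentages, so the "100%" in the sampling formula becomes 1.

```python
import math


def phi(x: float) -> float:
    """Standard normal CDF (assumed meaning of Phi in Table 2)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def accuracy(d_ml: float) -> float:
    """acc = Phi(d_ML / 2), the reference-row relation."""
    return phi(d_ml / 2.0)


def d_disease_heterogeneity(d_ml0: float, f: float, train_test: bool = False) -> float:
    """Effect of disease heterogeneity f = cos(theta) on d_ML.

    Training + LOOCV: sqrt((1 + f) / 2) * d_ML0; train -> test: f * d_ML0.
    """
    if train_test:
        return f * d_ml0
    return math.sqrt((1.0 + f) / 2.0) * d_ml0


def total_spread(sigma_biol: float, sigma_exp: float) -> float:
    """sigma_y from sigma_y^2 = sigma_BIOL^2 + sigma_EXP^2."""
    return math.hypot(sigma_biol, sigma_exp)


def d_broadened(d_ml_ref: float, sigma_ref: float, sigma_new: float) -> float:
    """Rescale d_ML when the spread changes from sigma_ref to sigma_new.

    Covers both d_ML = (sigma_y0 / sigma_y) * d_ML0 and
    d_MLT = (sigma_y / sigma_yT) * d_ML.
    """
    return (sigma_ref / sigma_new) * d_ml_ref


def sd_accuracy(acc_true: float, n: int) -> float:
    """Binomial SD of the accuracy for N samples (acc_true as a fraction)."""
    return math.sqrt(acc_true * (1.0 - acc_true) / n)


def accuracy_silver(acc0: float, kappa: float) -> float:
    """Accuracy ceiling with a silver standard: kappa * acc0 + (1 - kappa) / 2."""
    return kappa * acc0 + (1.0 - kappa) / 2.0


if __name__ == "__main__":
    # Illustrative inputs only; these values are not taken from the study.
    d0 = 2.0
    d_noisy = d_broadened(d0, sigma_ref=1.0, sigma_new=total_spread(1.0, 0.5))
    print("reference accuracy:", accuracy(d0))
    print("accuracy after broadening:", accuracy(d_noisy))
    print("SD of accuracy, N = 100:", sd_accuracy(accuracy(d_noisy), 100))
    print("silver-standard ceiling, kappa = 0.8:", accuracy_silver(accuracy(d0), 0.8))
```

Each function maps to one row of the table, so the effects can be chained, for example by applying the broadening before the silver-standard correction.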