Table 3.
Development process of (sub-)classification tools for low back pain (LBP) using AI/ML compared with STarT Back and McKenzie.
| | Classification accuracy<sup>a</sup> | Internal consistency<sup>b</sup> | Test-retest reliability<sup>c</sup> | Intra- or inter-rater reliability<sup>d</sup> | Construct validity<sup>e</sup> | Discriminative validity<sup>f</sup> | Prognosis: pain<sup>g</sup> | Prognosis: disability<sup>g</sup> | Treatment: pain<sup>h</sup> | Treatment: disability<sup>h</sup> | Treatment: costs<sup>h</sup> |
---|---|---|---|---|---|---|---|---|---|---|---|
AI/ML | 20/25 (80%) | — | — | — | — | — | — | — | — | — | — |
STarT Back | NA | 6/9 (67%) | 9/9 (100%) | — | 5/11 (45%) | 8/8 (100%) | 2/6 (33%) | 6/8 (75%) | 1/4 (25%) | 3/4 (75%) | 0/2 (0%) |
McKenzie | NA | — | — | 4/10 (40%) | — | — | — | 1/2 (50%) | 5/11 (45%) | 4/11 (36%) | 0/1 (0%) |
Values reported as number and percentage.
AI/ML: artificial intelligence and machine learning; —: no studies available or unable to be measured; NA: not assessed in this systematic review.
<sup>a</sup>Number of AI/ML studies reporting ≥80% accuracy of classification into ‘low-back pain’ versus ‘healthy’.
<sup>b</sup>Internal consistency was considered acceptable if Cronbach’s α was ≥0.7<sup>146</sup>.
<sup>c</sup>Test-retest reliability was considered acceptable at an intraclass correlation coefficient (ICC) of ≥0.7<sup>146,163</sup>.
<sup>d</sup>Kappa scores for intra-rater and inter-rater reliability were considered good at ≥0.61<sup>122</sup>.
<sup>e</sup>Construct validity ≥0.6 was considered acceptable<sup>146,164</sup>.
<sup>f</sup>Discriminative validity ≥0.7 was considered acceptable discrimination<sup>13</sup>.
<sup>g</sup>Prognosis prediction was considered ‘adequate’ when the classification approach resulted in statistically significant prediction of outcome after adjusting for baseline pain or disability in multivariate models<sup>147–150</sup>.
<sup>h</sup>Treatment effect was considered ‘adequate’ when the classification approach resulted in statistically significantly improved patient outcomes for pain, disability or healthcare costs in randomised or non-randomised clinical trials.