. 2020 Jul 9;3:93. doi: 10.1038/s41746-020-0303-x

Table 3.

The development process of (sub-)classification tools for low-back pain (LBP) using AI/ML compared with the STarT Back and McKenzie approaches.

	Classification accuracy^a	Internal consistency^b	Test−retest reliability^c	Intra- or inter-rater reliability^d	Construct validity^e	Discriminative validity^f	Prognosis: pain^g	Prognosis: disability^g	Treatment: pain^h	Treatment: disability^h	Treatment: costs^h
AI/ML	20/25 (80%)	—	—	—	—	—	—	—	—	—	—
STarT Back	NA	6/9 (67%)	9/9 (100%)	—	5/11 (45%)	8/8 (100%)	2/6 (33%)	6/8 (75%)	1/4 (25%)	3/4 (75%)	0/2 (0%)
McKenzie	NA	—	—	4/10 (40%)	—	—	—	1/2 (50%)	5/11 (45%)	4/11 (36%)	0/1 (0%)

Values are reported as number of studies meeting the criterion/number of studies assessed (percentage).
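As a quick check on the cell format, the percentages can be recomputed from the counts. The Python sketch below is illustrative only: the function name `format_cell` is an assumption and not part of the original review, and rounding to the nearest whole percent is inferred from the table values.

```python
def format_cell(count: int, total: int) -> str:
    """Render a table cell as 'count/total (pct%)'.

    Hypothetical helper: rounds the percentage to the nearest integer,
    which matches the values shown in Table 3.
    """
    pct = round(100 * count / total)
    return f"{count}/{total} ({pct}%)"


# Counts copied from the STarT Back row of Table 3.
print(format_cell(6, 9))   # internal consistency
print(format_cell(5, 11))  # construct validity
print(format_cell(2, 6))   # prognosis: pain
```

For the three calls above this yields `6/9 (67%)`, `5/11 (45%)` and `2/6 (33%)`, in agreement with the table.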

AI/ML artificial intelligence and machine learning, — no studies available or unable to be measured, NA not assessed in this systematic review.

^aNumber of AI/ML studies reporting ≥80% accuracy of classification into ‘low-back pain’ versus ‘healthy’.

^bInternal consistency was considered acceptable if Cronbach’s α was ≥0.7^146.

^cTest−retest reliability was considered acceptable at an intraclass correlation coefficient (ICC) of ≥0.7^146,163.

^dKappa scores for intra-rater and inter-rater reliability were considered good if ≥0.61^122.

^eConstruct validity ≥0.6 was considered acceptable^146,164.

^fDiscriminative validity ≥0.7 was considered acceptable^13.

^gPrognosis prediction was considered ‘adequate’ when the classification approach resulted in statistically significant prediction of outcome after adjusting for baseline pain or disability in multivariate models^147–150.

^hTreatment effect was considered ‘adequate’ when the classification approach resulted in a statistically significant improvement in patient outcomes for pain or disability, or in healthcare costs, in randomised or non-randomised clinical trials.
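The numeric cut-offs in footnotes a to f can be summarised as a simple lookup. The Python sketch below is a minimal illustration under stated assumptions: the dictionary keys and the function name `meets_threshold` are hypothetical labels introduced here, and only the threshold values themselves come from the footnotes. Footnotes g and h rest on statistical significance rather than a fixed cut-off, so they are not represented.

```python
# Cut-off values copied from footnotes a to f; the key names are assumptions.
THRESHOLDS = {
    "classification_accuracy": 0.80,   # footnote a: accuracy >= 80%
    "internal_consistency": 0.70,      # footnote b: Cronbach's alpha >= 0.7
    "test_retest_reliability": 0.70,   # footnote c: ICC >= 0.7
    "rater_reliability": 0.61,         # footnote d: kappa >= 0.61
    "construct_validity": 0.60,        # footnote e: >= 0.6
    "discriminative_validity": 0.70,   # footnote f: >= 0.7
}


def meets_threshold(measure: str, value: float) -> bool:
    """Return True if a study's reported value reaches the cut-off for the measure."""
    return value >= THRESHOLDS[measure]


def count_meeting(measure: str, values: list[float]) -> tuple[int, int]:
    """Count how many reported values reach the cut-off, as (met, total)."""
    met = sum(meets_threshold(measure, v) for v in values)
    return met, len(values)
```

For example, `count_meeting("rater_reliability", [0.55, 0.61, 0.72])` returns `(2, 3)`, which would appear in the table as 2/3 (67%). These input values are made up for illustration and are not data from the review.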