Table 2. Summary table of the between-study and within-study findings on the differences in the validity of sensor-derived measurements of motor function across various groups.
| Are there differences in the validity of sensor-derived measures of motor function as captured | Between-study (ie, meta-analytic) findings | Within-study findings |
| --- | --- | --- |
| Using mass-market devices vs medical sensors? |  | Insufficient data to evaluate |
| At specific sensor locations? |  | Insufficient data to evaluate |
| At home vs in the laboratory? |  | No; one study found AUC^a values of 0.76 (when administered at home) vs 0.83 (when administered in clinic) [59]. A second study found slightly higher accuracy, sensitivity, and specificity when the task was completed at home [87]. |
| In longitudinal vs cross-sectional studies? |  | No; one study found high Pearson r validity coefficients (r>0.50) for over 40 distinct motion outcomes but very low validity coefficients for a handful, including deflection range roll (measured in degrees), mean sway velocity roll (measured in degrees per second), and up-down deviation (measured in centimeters) [69]. A second study found Pearson r validity coefficients above 0.50 for variables related to steps taken, distance, and speed, but coefficients below 0.50 for variables related to angles (eg, trunk, hips, ankle, upper limb, and full body) [78]. A third study found Pearson r validity coefficients above 0.50 for gait, arising from chair, body bradykinesia, hypokinesia, and overall posture, and validity coefficients below 0.50 for rigidity of the lower and upper extremities, axial rigidity, postural stability, leg agility, and tremors of the lower or upper extremities [98]. |
| In healthy vs motor-impaired patients? |  | Insufficient data to evaluate |
| Using different feature detection algorithms? |  | No; one study detected movement best when using random forests relative to support vector machines and naïve Bayes [55]. A second study found that both neural networks and boosting outperformed support vector machines and Fisher linear discriminant analysis [90]. A third study found that neural networks performed better than other algorithms, including random forest, multilayer perceptron, decision tree, support vector machine, and naïve Bayes [64]. A fourth study found that support vector machines performed better than logistic regression and decision trees [80]. A fifth study found that random forests based on ridge regression outperformed those based on lasso or Gini impurity and that linear support vector machines outperformed logistic regression and boosting [103]. The sole consistent pattern was that more flexible classifiers (eg, neural networks, random forests) tended to outperform simpler ones (eg, naïve Bayes); a schematic comparison is sketched in code below the table. |
| Using particular motion sensor signal types? |  | Insufficient data to evaluate |
| Using all vs a subset of features? |  | No; one study found AUC values >0.90 based on all 998 detected features, with a drop to 0.75 when based on the top 30 features [49]. A second study concluded “Accuracies obtained using the 30 most salient features were broadly comparable with the corresponding sensitivity and specificity values obtained using all 998 features” [42]. A code sketch of this all-features vs top-30 comparison follows the table. |
| With thresholds held constant across patients vs patient-specific thresholds? |  | No; although algorithm training typically occurred across a sample, several studies started the algorithm (feature detection) with data from all participants but then allowed each patient to vary at later stages, such as feature selection or threshold determination [34,54,63,68]. Validity estimates from this smaller group of studies were similar in magnitude to those from studies that applied the same features and thresholds to the classification of all participants. A minimal sketch of this contrast appears below the table. |
| Using clinically supervised vs nonsupervised assessments of patient clinical status? |  | Insufficient data to evaluate |
| With outliers trimmed vs retained in the feature detection stage? |  | Insufficient data to evaluate |
| With transformed data vs untransformed data? |  | Insufficient data to evaluate |
| With standardized data vs unstandardized data? |  | Insufficient data to evaluate |
^a AUC: area under the curve.
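The three code sketches below illustrate the methodological contrasts from the "feature detection algorithms," "all vs a subset of features," and "thresholds" rows of Table 2. First, a minimal, hypothetical comparison of the classifier families named in the algorithms row (random forest, support vector machine, naïve Bayes) under 5-fold cross-validation; the synthetic feature matrix, labels, and hyperparameters are illustrative assumptions, not any reviewed study's pipeline.

```python
# Hypothetical comparison of movement-detection classifiers on synthetic
# sensor features; mirrors the model families named in Table 2, not any
# reviewed study's actual pipeline.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))                 # 200 windows x 30 features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # synthetic movement labels

models = {
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "support vector machine": SVC(kernel="rbf"),
    "naive Bayes": GaussianNB(),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: mean cross-validated accuracy = {acc:.2f}")
```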
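Second, the "all vs a subset of features" contrast: the same classifier is scored on every feature and on the 30 ranked most salient, echoing the 998/30 split reported in [42,49]. The logistic regression model and the univariate F-score ranking are assumptions for illustration. Nesting the selector in a pipeline keeps the ranking inside each training fold, which avoids feature-selection leakage into the cross-validated AUC.

```python
# Hypothetical all-features vs top-30-features comparison of cross-validated
# AUC; the 998/30 counts echo Table 2, but the data and models are synthetic.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 998))             # 998 candidate features
y = (X[:, :5].sum(axis=1) > 0).astype(int)  # labels driven by 5 features

all_feats = LogisticRegression(max_iter=1000)
top_30 = make_pipeline(SelectKBest(f_classif, k=30),
                       LogisticRegression(max_iter=1000))

auc_all = cross_val_score(all_feats, X, y, cv=5, scoring="roc_auc").mean()
auc_top = cross_val_score(top_30, X, y, cv=5, scoring="roc_auc").mean()
print(f"AUC with all 998 features: {auc_all:.2f}")
print(f"AUC with top 30 features: {auc_top:.2f}")
```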
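Third, the "constant vs patient-specific thresholds" contrast: one pooled classification threshold is compared against a threshold calibrated from each patient's own data. The synthetic scores and the median-based calibration are assumptions; the reviewed studies' actual calibration procedures are not reproduced here.

```python
# Hypothetical contrast between one threshold for all patients and a
# threshold calibrated per patient; purely synthetic data.
import numpy as np

rng = np.random.default_rng(2)
n_patients, n_obs = 10, 50
offsets = rng.normal(0, 1.0, size=n_patients)           # patient baselines
scores = offsets[:, None] + rng.normal(size=(n_patients, n_obs))
labels = (scores - offsets[:, None] > 0).astype(int)    # true status

global_thr = np.median(scores)                          # constant threshold
patient_thr = np.median(scores, axis=1, keepdims=True)  # per-patient threshold

acc_global = ((scores > global_thr).astype(int) == labels).mean()
acc_patient = ((scores > patient_thr).astype(int) == labels).mean()
print(f"accuracy, constant threshold: {acc_global:.2f}")
print(f"accuracy, patient-specific threshold: {acc_patient:.2f}")
```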