PLOS One. 2025 Jun 17;20(6):e0325172. doi: 10.1371/journal.pone.0325172

Prediction of future aging-related slow gait and its determinants with deep learning and logistic regression

Alison Deatsch 1,*,#, Michael McKenna 2,#, Jonathan Palumbo 2, Qu Tian 3, Eleanor Simonsick 3, Luigi Ferrucci 3, Robert Jeraj 1,4, Richard G Spencer 2
Editor: Esedullah Akaras 5
PMCID: PMC12173421  PMID: 40526703

Abstract

Background

Identification of accelerated aging and its biomarkers can lead to more timely therapeutic interventions and decision-making. Therefore, we sought to predict aging-related slow gait, a known predictor of accelerated aging, and its determinants.

Methods

We applied a deep learning neural network (NN) and compared it to conventional logistic regression (LR) analysis. We incorporated 1,363 participants from the Baltimore Longitudinal Study of Aging to predict current and future slow gait at 6-year and 10-year follow-up using two clinically-relevant cut-points.

Results

Our NN achieved a maximum sensitivity (specificity) of 81.2% (87.9%) for a 10-year prediction with the 0.8 m/s cut-point. We demonstrated the necessity of class balancing and found the NN to perform comparably to or, in some cases, better than LR, which achieved a maximum sensitivity and specificity of 84.5% and 86.3%, respectively. Sobol index analysis identified the strongest determinants to be age, BMI, sleep, and grip strength.

Conclusions

The novel use of a NN for this purpose, and successful benchmarking against conventional techniques, justifies further exploration and expansion of this model.

1 Introduction

Understanding the hallmarks and predictors of healthy aging is essential to the goal of increasing population health span [1]. In particular, the development of tools to identify healthy versus accelerated aging trajectories and their early and longitudinal biomarkers can advance opportunities for more effective interventions and better-informed clinical decision-making.

One important indicator of biological aging is gait speed. Every decrement of 0.1 m/s in gait speed is associated with a 12% higher mortality in older adults [2]. Among cohorts of older adults, gait speed has also been shown to be associated with survival, to reflect health and functional status, and to be a clinical indicator of well-being [3–10]. In fact, gait speed has been recommended as a primary endpoint for clinical trials due to the strong association between physical performance measures and brain age [11–15]. Slowing gait speed has also been shown to precede cognitive decline and to be associated with Alzheimer’s pathology, making it a useful marker for neurodegenerative risk prediction [16]. Gait speed is a metric of particular relevance for those with long-term, chronic conditions such as COPD and heart failure as a significant predictor of functional status, future well-being, and development of diabetes [17–19]. Indeed, one of the chief advantages of gait speed as a health metric is the fact that it is impacted by the integrity of a wide range of organ systems, including neurological status (both sensory and motor), cardiovascular health, orthopedic status, and pulmonary function. However, despite the growing importance of gait speed as a marker for overall health, identification of risk for future decreased gait speed remains incompletely understood.

Since there is no single comprehensive predictor of biological aging, aggregate measures are needed that incorporate complementary clinical biomarkers [20]. Ideally, these would incorporate predictors encompassing functional and physiological domains essential for the study of aging and gait speed. Many studies have been performed correlating a current gait speed measurement with healthy aging metrics, [21–27] and baseline gait speed is often used as a predictor of other aging outcomes [4,28–33]. However, in spite of its importance, there are few studies attempting to predict future slow gait.

Many predictors have been considered for their predictive power to identify current and future gait speed. A number of investigators have considered similar sets of modifiable risk factors, medical conditions, and clinical data including age, sex, comorbidities, strength, sleep, and alcohol consumption [34–36]. Other studies have added more complex factors such as cognition, [37,38] inflammatory markers, [39] and brain volumes [35]. While many variables have been well-established to correlate with gait speed, the quantitative power of these to predict future gait speed decline is much less established.

Several attempts have been made using statistical models to predict gait speed changes from a narrow set of potential predictors, [34,40] with the most commonly-applied analytic techniques being linear and logistic regression. These approaches, however, are restricted to the exploration of only linear relationships, while the complexity of human biochemistry and physiology suggests that their performance may be surpassed by that of models incorporating nonlinear effects and interactions. Indeed, there is a great deal of evidence for nonlinearity in human biological systems (e.g., circadian rhythms, calcium signaling, heart rate variability, disease dynamics) [41,42]. In the case of interest, several studies have previously found nonlinear relationships between gait speed and BMI, [43] age, [44] physical activity, [45] leg strength, [46] and falls [47]. Thus, it is essential that models predicting gait speed decline can encompass potentially nonlinear relationships.

Deep learning (DL) neural networks (NNs) permit the investigation of nonlinear model-free relationships between clinical variables and outcomes and have demonstrated major successes throughout the biomedical sciences. When employed in studies of aging, NNs and other machine learning methods have demonstrated high performance in defining and predicting functional and cognitive outcomes, with multiple contributions highlighting the importance of capturing nonlinear relationships [32,33,48]. For example, Lin et al. found that a NN outperformed logistic regression for predicting mortality in elderly patients with hip fracture [49]. However, despite this demonstrated early success, application of NNs to studies of aging, and to future gait speed prediction in particular, has been underexplored [50].

Indeed, there remains a gap in the literature regarding the use of NNs for prediction of gait speed from clinical variables. This is likely due in large part to the need for a substantial training dataset incorporating the relevant predictor and outcome variables for model development. The required number of subjects depends on the complexity of the analysis, the quality of the data, and the algorithm under consideration. For example, the number of subjects in a selection of comparable studies were 108, 239, 746, and 1901 [21,32,33,48]. Clearly, dataset needs for DL tasks can vary widely, but are generally larger than the required size for LR or basic statistical analysis.

In addition to providing a natural means of developing implicit nonlinear models, NNs exhibit much greater flexibility in the selection of input variables as compared to logistic and linear regression. Convolutional neural networks (CNNs) also offer the opportunity to use raw image data without feature extraction, while separate NN structures can make full use of longitudinal data [51]. Thus, our motivation for use of a NN included establishing a platform to be further expanded in subsequent work.

The development of models which can accurately predict current and future gait speed decline would be of great clinical use in several contexts. For example, these findings may inform physicians of potentially modifiable risks that could ameliorate potential loss of normative gait function in their patients. Similarly, physiotherapists may have the opportunity to design therapeutic protocols based on patient status.

The purpose of this work, therefore, was two-fold. First, we sought to predict aging-related slow gait and its determinants across various timeframes from a basic set of health measures. We employed a NN classification model to capture nonlinear complexity and compared its performance to a conventional logistic regression to demonstrate its viability. This defines our second goal, which was to develop a flexible NN architecture suitable for modification and implementation in further related studies.

2 Methods

2.1 Data

Our studies were performed using data from the Baltimore Longitudinal Study of Aging (BLSA) [52,53]. The BLSA is America’s longest running scientific study of human aging, with data collection beginning in 1958. This provided sufficient data for training and permitted us to perform our studies on a population exhibiting, overall, normative aging. Our analysis focused on two clinically relevant gait speed cut-points (0.8 m/s and 1.0 m/s) [4,21,54–57]. For this pilot study, we chose input variables based on other studies evaluating current and future gait speed [21,23,26,27,34–38,58]. We first explored the identification of current slow gait speed and its determinants, but with this rich dataset, we were also able to investigate multiple prediction timeframes to identify trends in healthy aging and its determinants across time.

2.1.1 BLSA dataset.

The BLSA was founded in 1958 and contains data on over 3,200 participants. The approximately 1,300 participants still active in the study return every 1–4 years to receive comprehensive health, cognitive, and functional evaluations. A freeze of the BLSA database was accessed August 17, 2021, and the authors performing the analysis had no access to information that could identify individual participants at any time. At the time of this study, the BLSA database consisted of 3,821 gait speed measurements from 1,363 unique subjects. Measurements were performed at intervals depending on subject age, with intervals between visits decreasing with increasing age (<60 = 4 yr intervals, 60–79 = 2 yr intervals, 80+ = 1 yr intervals). The median number of visits per subject was two, with a range of up to 12. This dataset is uniquely suited to our purpose due to the large number of subjects, the high quantity of metrics recorded, the regularity of the testing intervals, and the longitudinal nature of the study.

Gait speed measurements were obtained from timed walks performed according to the Short Physical Performance Battery (SPPB) protocol [59]. The SPPB, including its gait measurement component, is a standardized, widely used tool for measuring physical performance [60]. Participants were timed as they walked unassisted at a normal pace on a 6-meter course. Walking times were converted to gait speed (m/s) and the average of two trials was calculated. These timed walks have high test-retest reliability with ICC values 0.87–0.97 and uncertainty of 0.06–0.11 m/s, depending on the measured population [61–64].

Gait speed outcomes were evaluated using two different cut-points that have been shown to be clinically relevant for the normatively aging population represented by the BLSA: (1) 0.8 m/s, which has been proposed as a marker for severe mobility disability with a strong correlation to mortality [55–57], and (2) 1.0 m/s, the speed below which the risk of mortality doubles and at which subjects are deemed at high risk for adverse health-related outcomes [4,21,54]. Binary classification was chosen due to its natural clinical interpretation, similar to a host of other clinical outcome measures. In addition, both LR and NN architecture lend themselves naturally to such classification analyses. Indeed, a classification study is a natural design for a NN and allows us to compare the NN to the well-defined, gold standard LR model. In addition, the use of these cut-points provides clinical relevance as they are metrics for health status categorization and treatment guidance.

The number of gait speed measurements in the BLSA database for each cut-point is shown in Table 1. Any of these classifications (columns in the tables) can be used as an output for the DL model. The “Current” timeframe denotes the classification of gait speed at the time of the measurement, where a “Slow” measurement captures any gait speed that is at or below the cut-point value at that time. In the “Future” timeframes, “slow” captures any gait measurement above the cut-point value corresponding to a subject who will develop a slow gait, that is, fall below the cut-point value, within the time frame indicated. Subjects included in the Future timeframes were restricted to only those with at least as many years of follow-up as specified by the timeframe. In this way, subjects who will develop slow gait in the future or who developed this outside of the study were not incorrectly included in the “Normative” class. Other timeframes were also explored but proved to have too few subjects in the “Slow” class to achieve reliable results.
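The labeling rules described above can be sketched in a few lines. This is an illustrative reading of the text, not the authors' code: the `Visit` record, the function names, and the choice to exclude visits that are already slow from the future datasets are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

COURSE_LENGTH_M = 6.0  # course length stated in the text


@dataclass
class Visit:
    age: float    # participant age at the visit, in years
    speed: float  # gait speed at the visit, in m/s


def gait_speed(trial_times_s: List[float]) -> float:
    """Mean gait speed (m/s) over the timed trials of one visit."""
    speeds = [COURSE_LENGTH_M / t for t in trial_times_s]
    return sum(speeds) / len(speeds)


def current_label(visit: Visit, cut_point: float) -> int:
    """1 = slow (at or below the cut-point), 0 = normative."""
    return 1 if visit.speed <= cut_point else 0


def future_label(visits: List[Visit], index: int, cut_point: float,
                 horizon_years: float) -> Optional[int]:
    """Future-timeframe label for visits[index], or None if excluded.

    Assumes `visits` holds one participant's visits sorted by age.
    """
    base = visits[index]
    later = visits[index + 1:]
    if base.speed <= cut_point:
        return None  # already slow, so not a candidate for future decline
    if not later or later[-1].age - base.age < horizon_years:
        return None  # follow-up shorter than the prediction horizon
    within = [v for v in later if v.age - base.age <= horizon_years]
    return 1 if any(v.speed <= cut_point for v in within) else 0
```

Returning `None` for visits without sufficient follow-up mirrors the restriction that keeps participants with unknown outcomes out of the "Normative" class.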

Table 1. Counts of gait speed measurements and key demographics for each dataset from the BLSA database.
Cut-point | Dataset | Number of Slow Measurements (unique participants) | Number of Normative Measurements (unique participants) | Total Number of Gait Speed Measurements | Mean Age | % Male | % Slow
0.8 m/s | Current | 276 (238) | 3545 (3066) | 3821 | 71.4 ± 13.3 | 49.4 | 7.22
0.8 m/s | Future: 6 Years | 181 (98) | 830 (504) | 1011 | 69.3 ± 11.6 | 48.6 | 17.9
0.8 m/s | Future: 10 Years | 245 (106) | 247 (221) | 492 | 71.8 ± 12.0 | 47.8 | 49.8
1.0 m/s | Current | 910 (789) | 2911 (2538) | 3821 | 71.4 ± 13.3 | 49.4 | 23.8
1.0 m/s | Future: 6 Years | 274 (182) | 662 (424) | 936 | 68.5 ± 11.2 | 49.6 | 29.3
1.0 m/s | Future: 10 Years | 392 (209) | 186 (167) | 578 | 70.9 ± 10.9 | 49.0 | 67.8

The number of unique participants from which the gait speed measurements arise is listed in parentheses.

In order to compare our results to previous models, we selected a subset of 15 markers encompassing demographic, lifestyle, and medical history variables previously identified as potentially correlated to gait speed decline either at current [21,23,26,27,34,38,58] or future [34–37] timepoints. Inclusion decisions were made based on several factors: (1) the amount of missing data for a particular variable across multiple time frames and subjects, (2) the typical availability of the variable in clinical settings, and (3) the interpretability of the variable. We preferred to incorporate fewer, well-selected variables, rather than more of the available variables, in order to increase the overall interpretability of the result and to prevent the difficulties arising from multiple dependencies. We considered age, sex, hypertension, diabetes, stroke, heart attack, osteoarthritis, cognitive impairment, depression, sleep quality, exercise, grip strength, pain, alcohol consumption, and BMI (Table 2). Categorical variables are noted in the table with (y/n) and were input as 0 or 1 to the model (or a scale of 1–5 in the case of sleep quality). Continuous variables (age, cognitive impairment, grip strength, and BMI) were input as exact values, as recorded in the BLSA data. Missing data points were replaced by the median value of that input for all participants, as is common practice in machine learning. Median was chosen in preference to the mean for this imputation to avoid undue sensitivity to outliers. The percentage of missing values was less than 2% for all variables except MMSE, which was missing for 15% of the gait speed measurements.
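As a minimal sketch of the median imputation step, assuming each measurement is held as a dictionary with `None` marking a missing value (the record layout and the column names `mmse` and `bmi` are hypothetical):

```python
from statistics import median


def impute_median(rows, columns):
    """Fill None entries in each column with the median of its observed values."""
    filled = [dict(r) for r in rows]  # copy so the input records are untouched
    for col in columns:
        observed = [r[col] for r in rows if r[col] is not None]
        col_median = median(observed)
        for r in filled:
            if r[col] is None:
                r[col] = col_median
    return filled
```

Because the median ignores the magnitude of extreme values, a single implausible entry in a column does not shift the value used for imputation, which is the motivation given above for preferring it to the mean.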

Table 2. Definitions of each input variable from the BLSA database.
Input Variable | Definition / Survey Question
Age |
Sex | Gender (binary)
Hypertension | Has a doctor ever told you that you are hypertensive? (y/n)
Diabetes | Has a doctor ever told you that you have diabetes? (y/n)
Stroke | Has a doctor ever told you that you had a stroke, mini stroke, or slight stroke? (y/n)
Heart Attack | Has a doctor ever told you that you had a heart attack? (y/n)
Osteoarthritis | Has a doctor ever told you that you have osteoarthritis? (y/n)
Cognitive Impairment | Mini Mental State Exam (MMSE) score (0–30)
Depression | Has a doctor ever told you that you have depression? (y/n)
Sleep Quality | Sleep quality rating past month (Scale: 1–5)
Exercise | Have you performed vigorous exercise in past 2 weeks? (y/n)
Grip Strength | Hand grip muscles right (kg)
Pain | Have you had any/frequent pain in past year (overall)? (y/n)
Alcohol Consumption | Have you consumed alcohol in the past 12 months? (y/n)
BMI | Body Mass Index

2.1.2 Consideration of dataset bias.

We evaluated several points of potential bias in the BLSA data used for this study. Firstly, highly correlated input variables can pose a problem for logistic regression models, making it harder to interpret coefficients and identify significant independent variables. While this multicollinearity is less of a confounding factor for NNs, it can still slow convergence and affect sensitivity analysis. We checked the correlation of our input variables by calculating the Pearson correlation coefficient between each pair. The resulting coefficients for each pair are shown in S1 Fig in the Supporting Information. Values < −0.5 and >0.5 are generally considered to indicate notable correlations. In our case, we see this only for age and grip strength, with r = 0.54. We conclude that there was minimal multicollinearity in our input data. In fact, we elected to incorporate only the right-hand grip strength in the analysis due to high collinearity with the left-hand grip strength (r = 0.74). The use of dominant hand grip strength was also considered but would have further restricted dataset size since this value was not reported in all participants.
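The pairwise correlation screen can be illustrated as follows. The helper names and the toy values are hypothetical; only the |r| > 0.5 threshold comes from the text.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def flag_correlated_pairs(columns, threshold=0.5):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    names = list(columns)
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = pearson_r(columns[names[i]], columns[names[j]])
            if abs(r) > threshold:
                flagged.append((names[i], names[j], r))
    return flagged
```

Applied to the 15 inputs, a screen of this kind would return one tuple per notable pair, which can then be reviewed before deciding whether to drop a variable.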

For the use of longitudinal data, the potential for bias due to selective dropout, or non-random loss of participants from a study, must be considered [65]. We compared the age, MMSE score, and gender proportion between the cohort of patients included in each dataset and those dropped due to lack of follow-up (see S1 Table in the Supporting Information). The average of each metric for the included cohort was compared to that of the dropped cohort. Potential bias was evaluated by two-sample t-tests as well as effect size calculations for age and MMSE score and by a two-proportion z-test for the percent males. Any significant differences from the t-test or z-test are denoted by the bold text in the table. The small effect sizes, all less than 0.22, as well as the fact that even statistically significant differences were well within one standard deviation of each other, indicate minimal bias due to selective dropout.
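For readers who want to reproduce this kind of dropout check, the sketch below computes a pooled-standard-deviation effect size (Cohen's d) and a two-sided two-proportion z-test. Whether the authors used exactly these formulas is an assumption, and the numbers in the usage note are invented for illustration rather than taken from S1 Table.

```python
import math


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Effect size between two groups using the pooled standard deviation."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                          / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd


def two_proportion_z(successes1, n1, successes2, n2):
    """Two-sided z-test for equality of two proportions; returns (z, p_value)."""
    p1, p2 = successes1 / n1, successes2 / n2
    pooled = (successes1 + successes2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value
```

For example, `cohens_d(70.0, 12.0, 500, 72.4, 12.0, 500)` gives an effect size of magnitude 0.2, the scale of difference that the text treats as small.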

We also considered the possibility of dataset drift, that is, whether the average gait speed of subjects has changed over the many years of BLSA data collection. To evaluate this, we calculated the mean gait speed of subjects grouped by the year in which the measurement was taken. S2 Fig in the Supporting Information shows the lack of a clear trend, indicating absence of substantial drift. We also considered dataset drift with respect to year of birth, as shown in the Supporting Information in S3a Fig; such drift could, for example, result from a secular improvement in overall population health. To separate this from the expected trend with respect to age, we also plotted gait speed versus participant age in S3b Fig. A linear regression analysis resulted in β coefficients of 0.0083 and 0.0082, respectively. Based on the near equality of these coefficients and the shape of the plots, we conclude that the effect of dataset drift on our study due to birth year was also not significant.

Lastly, we considered the possibility of bias from the uneven distribution of participant ages at each timepoint of measurement. We found that there were far more measurements in the middle age range of 60–80 than at the extremes within the dataset. To explore the potential of bias due to this overrepresentation, we examined the pattern of false predictions with respect to age within our current timeframe models. We found that the shape of the age histogram closely matched the shape of the false prediction histogram for both cut-points, suggesting that prediction errors were occurring in proportion to the number of subjects across the dataset, that is, accuracy was relatively independent of age. We conclude that training set size was sufficient across the participant ages and there was consequently minimal bias attributable to sample age distribution.

2.1.3 Ethics approval and consent to participate.

Written informed consent was obtained from all participants and the Institutional Review Board of the Intramural Research Program approved the study protocol.

2.2 Models

Two types of binary classification models were developed to predict slow or normative future gait speed. Separate models were created to predict a participant’s current gait speed class. In all models, stratified 10-fold cross validation was performed. Each model type was trained and tested separately with unbalanced data and balanced data for the various timepoints and the two cut-points described in Section 2.1. Throughout this paper, we refer to each separately trained model with each dataset as an individual classifier. To be clear, sensitivity in our context reflects the ability to correctly classify participants with gait speed below the specified cut-point (also referred to as “slow walkers”), while specificity reflects the ability to correctly classify those with gait speed above the cut-point (also referred to as “normative walkers”).

Our dataset is highly unbalanced, as is often the case for clinical data. In particular, slow walkers are much less well represented than normative walkers. As a result, accuracy (the proportion of correctly classified subjects) is a poor metric for performance evaluation: high accuracy may be attained even with poor sensitivity for identification of slow walkers, a well-known issue in clinical classification settings. In contrast, the Youden index (Sensitivity + Specificity − 1) provides a single summary statistic that captures both sensitivity and specificity. We therefore used this metric to optimize model hyperparameters and to compare performance across models. In addition, model performance was assessed by sensitivity and specificity separately, precision, and the area under the precision-recall curve (AUPRC).
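A short worked example shows why accuracy is misleading here. The function below is a generic sketch (its name and return layout are not from the paper); the counts in the usage note are the Current, 0.8 m/s figures from Table 1.

```python
def classification_metrics(tp, fn, tn, fp):
    """Basic binary metrics, with 'slow' as the positive class."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden": sensitivity + specificity - 1,
    }
```

A classifier that labels every one of the 3,821 measurements as normative gives `classification_metrics(tp=0, fn=276, tn=3545, fp=0)`: accuracy is about 0.93, yet sensitivity and the Youden index are both 0, so the model identifies no slow walkers at all.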

2.2.1 Neural network methodology.

We implemented feed-forward NNs with the Keras library using TensorFlow as backend, using Python version 3.6. We chose a relatively simple NN architecture in order to incorporate nonlinear relationships without overcomplicating the model. An architecture with a limited number of nodes was chosen to prevent overfitting since the input data (a single column, short length vector) was relatively non-complex compared to many DL tasks. Overfitting was mitigated by careful hyperparameter tuning and 10-fold cross validation as detailed below.

A schematic overview of our model is shown in Fig 1. Generally, the model used operational blocks as indicated by the inset of Fig 1; these consisted of a fully connected (FC) layer followed by batch normalization, an exponential linear unit (ELU) activation function, and dropout. Batch normalization was used prior to the activation function to standardize inputs and facilitate training. Dropout regularization was implemented to reduce overfitting and improve generalizability. Four such sequential FC blocks were followed by another fully connected layer and then by a final softmax binary output layer. Further details on model optimization through loss function testing and hyperparameter tuning can be found in the Supporting Information.
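To make the order of operations in one block concrete, the following is a dependency-free sketch of the inference-time forward pass. It is not the authors' Keras implementation: the parameter dictionary, the layer sizes, and the `eps` value are assumptions, and the actual hyperparameters are given only in the Supporting Information.

```python
import math


def dense(x, weights, bias):
    """Fully connected layer: one weight row and one bias per output node."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def batch_norm_inference(z, mean, var, gamma, beta, eps=1e-3):
    """Batch normalization using stored running statistics (inference mode)."""
    return [g * (zi - m) / math.sqrt(v + eps) + b
            for zi, m, v, g, b in zip(z, mean, var, gamma, beta)]


def elu(z, alpha=1.0):
    """Exponential linear unit applied element-wise."""
    return [zi if zi > 0 else alpha * (math.exp(zi) - 1.0) for zi in z]


def softmax(z):
    """Numerically stable softmax over the output nodes."""
    shift = max(z)
    exps = [math.exp(zi - shift) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]


def fc_block(x, params):
    """FC -> batch norm -> ELU; dropout is the identity at inference time."""
    z = dense(x, params["W"], params["b"])
    z = batch_norm_inference(z, params["mean"], params["var"],
                             params["gamma"], params["beta"])
    return elu(z)
```

Stacking four calls to `fc_block`, one further `dense` layer, and `softmax` over two output nodes reproduces the layer ordering described above; during training, dropout would additionally zero a random subset of each block's outputs.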

Fig 1. Schematic of the NN architecture. Models for current gait prediction and future gait prediction differ in layer size and activation functions. The final two layers, colored green, do not use the batch normalization and the dropout procedure implemented in previous blocks.

2.2.2 Logistic regression methodology.

Logistic regression was also implemented using the scikit-learn module in Python for benchmarking our DL results. We compared performance using several solvers: Newton’s method, liblinear, stochastic average gradient (SAG), and limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS). We found no significant difference between these, and so selected the default L-BFGS solver with L2-regularization. Inputs to this model were the 15 physiological markers used for the NN method, and the target variable was again a binary slow/normative label assigned based on either the 0.8 or the 1.0 m/s cut-point. Stratified 10-fold cross validation was again used, with results compiled from the average of the 10 folds. Logistic regression was performed both with and without the use of terms representing first order interactions with age.
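One plausible way to construct the age interaction terms is to append the product of each predictor with age, as sketched below. The paper does not spell out the construction, so the product form, the function name, and the `_x_age` suffix are all assumptions.

```python
def add_age_interactions(row, age_key="age"):
    """Return a copy of `row` with a product term (feature * age) per predictor."""
    age = row[age_key]
    expanded = dict(row)
    for name, value in row.items():
        if name != age_key:
            expanded[name + "_x_age"] = value * age
    return expanded
```

With 15 inputs this yields 14 additional columns, letting the otherwise linear model represent effects whose strength changes with age.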

2.3 Class balancing techniques

As in many clinical studies, the BLSA dataset exhibited substantial class imbalance, with many fewer slow than normative walkers. Models trained with unbalanced datasets often serve as poor predictors of the minority class; [66] therefore, we explored the efficacy of various class balancing techniques. We compared the effect on model performance from one undersampling technique, Random Undersampling (RUS), and two oversampling techniques, synthetic minority oversampling technique (SMOTE) and SMOTE with edited nearest neighbors undersampling (SMOTE-ENN).

These class balancing procedures were applied to the training set within each cross-validation fold. This ensured that the testing set was composed exclusively of non-synthetic samples and that there was no leakage from the training set. Augmentation of each fold also reduced bias, reducing the effect of any outlier synthetic samples. Undersampling of each fold separately ensured that majority class samples containing potentially important or unique information were not removed entirely from the experiment. Data augmentation was not performed on the 10-year classifier for the 0.8 m/s cut-point given the approximately equal number of slow and normative samples in that dataset.

2.3.1 Random undersampling (RUS).

A manual RUS procedure was used to randomly remove samples from the majority class of the training set until each classifier had an approximately even number of slow and normative samples. Applying RUS individually to each fold rather than to the full training set ensured that normative samples containing potentially important information were not removed entirely from the experiment.
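The per-fold procedure can be sketched as below: folds are built class by class so that each keeps the native class ratio, and only the training portion of each split is undersampled. This is an illustration under stated assumptions (the function names, the round-robin fold assignment, and the seed are not from the paper).

```python
import random


def stratified_folds(labels, k, rng):
    """Assign sample indices to k folds, preserving class proportions."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def random_undersample(indices, labels, rng):
    """Randomly drop samples until every class matches the smallest class."""
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    n_min = min(len(members) for members in by_class.values())
    kept = []
    for members in by_class.values():
        kept.extend(rng.sample(members, n_min))
    return kept


def cv_splits_with_rus(labels, k=10, seed=0):
    """Yield (balanced_train_indices, untouched_test_indices) for each fold."""
    rng = random.Random(seed)
    folds = stratified_folds(labels, k, rng)
    for f in range(k):
        test_idx = folds[f]
        train_idx = [i for g in range(k) if g != f for i in folds[g]]
        yield random_undersample(train_idx, labels, rng), test_idx
```

Because a different random subset of normative samples is retained in each fold, a sample discarded in one fold can still contribute to training in another, while every test fold keeps its native class ratio.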

2.3.2 Synthetic minority oversampling technique (SMOTE).

SMOTE was introduced as a method to increase a predictor’s sensitivity on imbalanced datasets without sacrificing large amounts of data through random undersampling [67]. To implement this, a sample from the minority class is drawn and its k = 3 nearest neighbors, determined by Euclidean distance, are identified. A vector is drawn to one of those neighbors from the selected sample and then multiplied by a random number between 0 and 1. The resultant vector is then added to the selected sample to create a new synthetic data point [66]. This procedure is applicable only to numeric data. SMOTE-NC is a variation of this which can be applied to nominal data as well. We implemented SMOTE-NC using the Imbalanced Learning Library in Python to achieve an equal class distribution between slow and normative walkers [68].
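The interpolation step can be written out directly for numeric features. This is a bare-bones sketch of basic SMOTE, not of SMOTE-NC and not of the library implementation the authors used; the function name and the toy points in the usage note are hypothetical.

```python
import math
import random


def smote_sample(minority, k=3, rng=None):
    """Create one synthetic point between a minority sample and a neighbor."""
    rng = rng or random.Random(0)
    base = rng.choice(minority)
    # k nearest minority-class neighbors of the chosen sample (Euclidean).
    neighbours = sorted((p for p in minority if p is not base),
                        key=lambda p: math.dist(base, p))[:k]
    target = rng.choice(neighbours)
    gap = rng.random()  # random fraction of the way toward the neighbor
    return [b + gap * (t - b) for b, t in zip(base, target)]
```

Calling `smote_sample([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])` returns a point on the line segment joining two of the four inputs. Binary inputs such as the (y/n) variables would end up with fractional values under this rule, which is why a variant that handles nominal features is needed for this dataset.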

2.3.3 Synthetic minority oversampling technique with edited nearest neighbors (SMOTE-ENN).

SMOTE-ENN is a popular class-balancing technique that was developed to combine SMOTE’s ability to generate new minority data with the ENN undersampling algorithm [69]. Following generation of new synthetic data points through SMOTE, ENN is applied. Using the same k-nearest neighbors technique, each sample and its neighborhood are identified. If a sample is found to have a different label than the majority of labels in its neighborhood, that sample is deleted. Repeating this procedure results in more balanced classes as well as less overlap between classes.
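The ENN cleaning step can be sketched as follows, reading the deletion rule as removal of the mismatched sample itself, as in the standard formulation of edited nearest neighbors. The function name is hypothetical, and library implementations expose further options (such as which classes are cleaned) that are not modeled here.

```python
import math
from collections import Counter


def edited_nearest_neighbours(points, labels, k=3):
    """Return indices of samples whose label matches their neighborhood majority."""
    keep = []
    for i, p in enumerate(points):
        # Indices of the k closest other samples by Euclidean distance.
        neighbours = sorted((j for j in range(len(points)) if j != i),
                            key=lambda j: math.dist(p, points[j]))[:k]
        majority = Counter(labels[j] for j in neighbours).most_common(1)[0][0]
        if labels[i] == majority:
            keep.append(i)
    return keep
```

Samples that sit inside the other class's region are dropped, which is how the combined method reduces overlap between the slow and normative classes after oversampling.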

2.4 Sobol sensitivity analysis

We implemented a Sobol index sensitivity analysis to identify the relative importance of clinical variables for determination of current slow gait as well as development of slow gait over a defined timeframe. Sobol sensitivity analysis is a form of variance-based global sensitivity analysis that assesses the degree to which a model’s output variance can be attributed to each input variable. The importance of individual variables is assessed by first-order indices, while higher order indices indicate the importance of interactions [70]. We implemented Sobol index analysis using the Sensitivity Analysis Library (SALib) in Python [71,72]. Parameters with resulting Sobol indices >0.05 are considered significant.
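To show what a total-order Sobol index measures, the sketch below estimates it by Monte Carlo with a pick-freeze scheme (the Jansen estimator). It is a stand-in for illustration only and does not reproduce the SALib workflow used in the study; the toy linear model, the uniform input ranges, and the sample size are assumptions.

```python
import random


def sobol_total_indices(model, n_inputs, n_samples=20000, seed=0):
    """Jansen pick-freeze estimate of total-order Sobol indices.

    Inputs are assumed independent and uniform on [0, 1].
    """
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    b = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    y_a = [model(row) for row in a]
    y_all = y_a + [model(row) for row in b]
    mean = sum(y_all) / len(y_all)
    variance = sum((y - mean) ** 2 for y in y_all) / len(y_all)

    totals = []
    for i in range(n_inputs):
        sq_diff = 0.0
        for row_a, row_b, ya in zip(a, b, y_a):
            mixed = list(row_a)
            mixed[i] = row_b[i]  # resample input i only, freeze the rest
            sq_diff += (ya - model(mixed)) ** 2
        totals.append(sq_diff / (2.0 * n_samples * variance))
    return totals


def toy_model(x):
    """Hypothetical stand-in for a trained classifier's output."""
    return 4.0 * x[0] + 2.0 * x[1] + 0.0 * x[2]
```

For this toy model the analytic total indices are 0.8, 0.2, and 0, so applying the 0.05 threshold from the text would mark the first two inputs as significant and the third as not.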

3 Results

3.1 Model performance

3.1.1 Optimized models.

The best performing models of both the NN and the LR were class-balanced with the RUS method. The final dataset sizes after RUS class balancing are listed in Table 3.

Table 3. Final dataset sizes for each classifier after RUS class balancing.
Cut-point | Dataset | Normative | Slow | Total
0.8 m/s | Current | 276 | 276 | 552
0.8 m/s | Future: 6 Years | 181 | 181 | 362
0.8 m/s | Future: 10 Years | 247 | 245 | 492
1.0 m/s | Current | 910 | 910 | 1820
1.0 m/s | Future: 6 Years | 274 | 274 | 548
1.0 m/s | Future: 10 Years | 186 | 186 | 372

The Youden indices of RUS class-balanced models for the NN and LR analyses are shown in Fig 2. These results, along with sensitivity, specificity, precision, and AUPRC values are listed in Table 4. The best performing classifier was for the 10-year prediction using a 0.8 m/s cut-point, where the NN achieved a sensitivity and specificity of 81.2% and 87.9%, respectively. This performance is similar to that of the LR which achieved sensitivity and specificity in this case of 84.5% and 86.3%, respectively.

Fig 2. Youden index for NN (blue) and LR (red) models after class balancing using RUS. RUS class balancing was not implemented for the 10-year prediction with the 0.8 m/s cut-point due to the balance in the native dataset. Error bars were determined by the standard deviation across all cross-validation folds.

Table 4. Performance metrics for the NN and LR models using RUS class balancing.
Cut-point 0.8 m/s
Metric | Current LR | Current NN | 6-Year LR | 6-Year NN | 10-Year LR | 10-Year NN
Youden index | 0.49 | 0.45 | 0.65 | 0.67 | 0.71 | 0.69
Sensitivity | 0.78 | 0.73 | 0.84 | 0.85 | 0.85 | 0.81
Specificity | 0.71 | 0.73 | 0.81 | 0.82 | 0.86 | 0.88
Precision | 0.17 | 0.17 | 0.50 | 0.51 | 0.86 | 0.88
AUPRC | 0.22 | 0.19 | 0.52 | 0.51 | 0.43 | 0.41
Cut-point 1.0 m/s
Metric | Current LR | Current NN | 6-Year LR | 6-Year NN | 10-Year LR | 10-Year NN
Youden index | 0.43 | 0.43 | 0.46 | 0.48 | 0.51 | 0.53
Sensitivity | 0.75 | 0.70 | 0.43 | 0.68 | 0.75 | 0.72
Specificity | 0.68 | 0.73 | 0.72 | 0.80 | 0.76 | 0.81
Precision | 0.42 | 0.45 | 0.53 | 0.59 | 0.87 | 0.89
AUPRC | 0.32 | 0.31 | 0.36 | 0.35 | 0.25 | 0.25

RUS class balancing was not implemented for the 10-year prediction with the 0.8 m/s cut-point due to the balance in the native dataset. Results are shown for all six classifiers. The AUPRC value is the difference between the AUC and the no-skill AUC.
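One way to read the low precision values alongside reasonable sensitivity and specificity is through class prevalence in the test folds. The sketch below applies the standard relationship between these quantities; the assumption that test folds retain the native prevalence follows from the balancing being applied to training data only, and the function names are hypothetical.

```python
def precision_from_prevalence(sensitivity, specificity, prevalence):
    """Expected precision given sensitivity, specificity, and positive-class rate."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def auprc_above_no_skill(auprc, prevalence):
    """AUPRC minus the no-skill baseline, which equals the positive-class rate."""
    return auprc - prevalence
```

For the LR model on the Current, 0.8 m/s dataset, `precision_from_prevalence(0.78, 0.71, 0.0722)` evaluates to about 0.17, in line with the tabulated precision: with only 7.22% slow measurements, even a modest false-positive rate among the many normative measurements outweighs the true positives.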

3.1.2 Comparison of models with unbalanced classes.

To directly compare the performance of the NN and LR in the unbalanced case, the Youden indices for the NN and LR models without class balancing are presented in Fig 3; these results, along with sensitivity, specificity, precision, and AUPRC values are listed in the Supporting Information, S2 Table. The two distinct model types performed comparably across all cut-points and timeframes, with the NN slightly outperforming the LR in all but one case.

Fig 3. Youden index values for NN (red) and LR (blue) models without class balancing. As seen, the NN exhibits overall superior performance. Error bars were determined by the standard deviation across all cross-validation folds.

3.1.3 Exploration of class balancing.

In addition to RUS, we explored two other class balancing techniques, along with the unbalanced case. Results for each classifier before and after data modification with the class-balancing techniques described above are shown in Fig 4a; these results, along with sensitivity, specificity, precision, and AUPRC values are listed in the Supporting Information, S3 Table. For the 10-year prediction with 0.8 m/s cut-point, classes were approximately equal without balancing, which was therefore not performed.

Fig 4. Youden index values for all class-balancing techniques beside the unbalanced case (blue bars) for the (a) NN and (b) LR model.

Class-balancing techniques resulted in improved Youden indices. Class balancing performed by the RUS, SMOTE, and SMOTE-ENN algorithms is represented with red, green, and purple bars, respectively. Class balancing was not implemented for the 10-year prediction due to the balance in the native dataset.

Performance of the LR for each dataset after the various class-balancing techniques is shown in Fig 4b. The Youden index, sensitivity, specificity, precision, and AUPRC values are listed in the Supporting Information, S4 Table. Both the NN and LR models were substantially improved through use of class balancing in all but one case (the 10-year prediction with 1.0 m/s cut-point).

3.2 Sensitivity analysis of optimized models

The results of the Sobol index sensitivity analysis for the optimized, RUS-balanced NN and LR models for each dataset are shown in Fig 5. All variables found to be significant (with an index greater than 0.05) are also listed in order of their Sobol index in the Supporting Information, S5 Fig. Across all the classifiers tested except one, age is the strongest predictor, though it is never the only significant predictor.

Fig 5. Sobol indices ranking the relative importance of input variables for model predictions.

Results are shown for the (a) NN model with a 0.8 m/s cut-point, (b) NN model with a 1.0 m/s cut-point, (c) LR model with a 0.8 m/s cut-point, and (d) LR model with a 1.0 m/s cut-point. Total Sobol values are shown in blue for prediction of current gait speed, red for prediction of gait speed in 6 years, and green for prediction of gait speed in 10 years. As seen, age is by far the most influential variable in nearly all cases, with the importance of the other variables differing among the four panels.

4 Discussion

The purpose of this work was to predict aging-related slow gait and its determinants across various timeframes. The key contributions of this work are: (1) the development of a NN which can successfully predict slow gait with a flexible structure permitting several natural extensions of the present work, (2) the demonstration of the comparable or improved performance of the NN relative to a conventional LR, (3) the determination that the key determinants of future slow gait are age, BMI, sleep, and grip strength, and (4) the validation of the fact that class balancing substantially improves performance over models with unbalanced datasets.

4.1 Model performance

4.1.1 Optimized models.

The results for the optimized, class-balanced models (Fig 4 and Table 4) demonstrate that the NN performed as well as or better than the LR. The best performing classifier was for the 10-year prediction using a 0.8 m/s cut-point, where the NN achieved a sensitivity and specificity of 81.2% and 87.9%, respectively. This performance is similar to that of the LR which achieved a sensitivity and specificity of 84.5% and 86.3%, respectively. This is on a par with other complicated clinical analyses, but clearly encourages further work towards improvement.
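As a brief worked illustration of how the summary metric relates to the figures quoted above, the following minimal sketch computes the Youden index as sensitivity plus specificity minus one. The function names and the confusion-matrix counts are hypothetical and chosen only for illustration; they are not taken from the study's code or data.

```python
def youden_index(sensitivity, specificity):
    """Youden's J statistic: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def metrics_from_counts(tp, fp, tn, fn):
    """Derive the reported metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of slow walkers detected
    specificity = tn / (tn + fp)   # fraction of normal walkers detected
    precision = tp / (tp + fp)     # fraction of "slow" calls that are correct
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "youden": youden_index(sensitivity, specificity),
    }


# Values reported in the text for the 10-year, 0.8 m/s classifiers.
nn_youden = youden_index(0.812, 0.879)
lr_youden = youden_index(0.845, 0.863)

# Hypothetical counts, for illustration only (not taken from the study).
example = metrics_from_counts(tp=85, fp=14, tn=86, fn=15)
```

With the reported sensitivities and specificities, the two indices evaluate to approximately 0.69 (NN) and 0.71 (LR), in agreement with the 10-year, 0.8 m/s entries of Table 4.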

Of note, all models performed, at best, with only moderate success; the maximum Youden index, for example, was ~0.7 (for 10-year prediction of slow gait with respect to the 0.8 m/s cut-point). This motivates the future use of additional input data and perhaps elimination of those inputs that were found not to provide substantial predictive power. Even with comparable performance, NN methods have an important advantage over LR for future work in that they are adaptable to the incorporation of images into further analysis. In contrast, the LR approach can incorporate image-derived metrics, such as regional volumes, but does not allow for use of images themselves. Similarly, NNs can be expanded to incorporate longitudinal data in a formal way, unlike LR.

We found that all the models performed better for future than for current gait prediction, despite the latter’s greater training set size (Table 3). This may reflect the particular set of input variables we chose or indicate that the chosen markers reflect physiologic status that emerges clinically only after a period of time. This relationship may even be time dependent as we also note that the 10-year classifiers slightly outperformed the 6-year classifiers, again independent of the training set size. Finally, we note that classifiers for the 0.8 m/s cut-point outperformed those using the 1.0 m/s cut-point for the equivalent timeframes, suggesting that the more extreme cut-point was more easily identified as abnormal by this set of input variables.

While the performance difference between the NN and the LR was minimal in the present study, the additional flexibility afforded by the NN and its ability to capture nonlinear relationships may prove to be crucial for future studies of gait and related physical outcomes. In particular, the use of NN allows for the addition of more complex inputs such as images and other non-tabular data.

4.1.2 Comparison of models with unbalanced classes.

Fig 3 shows that when using unbalanced data, across all cut-points and timeframes, the NN slightly outperforms standard LR except in the case of the 10-year prediction using a 0.8 m/s cut-point. However, as in the balanced data case, the model types perform comparably. The relatively weak performance of the current and 6-year predictions for both cut-points is largely attributable to low sensitivities seen in S2 Table in the Supporting Information. Such sensitivity values are not surprising given the extreme class imbalance observed for the corresponding data (Table 1). The NN exhibited substantially greater sensitivity than LR in these cases, suggesting the comparative resiliency of the NN approach to data imbalance. Nevertheless, the sensitivity and Youden index are still poor overall, motivating the exploration of class-balancing techniques to improve sensitivity.

Issues such as class balancing have come to the fore with the recent explosion of interest in DL and are known to virtually all practitioners. However, such methods have been much less recognized in the context of more traditional analytic methods such as LR. In that sense, the conventional NN approach incorporating class balancing (Fig 2) greatly outperforms the conventional LR approach in which no class balancing is performed (Fig 3). These results therefore highlight the great power of multivariate linear analysis in conjunction with more modern considerations related to dataset structure. This leaves open the question of why the theoretically more limited LR approach exhibited performance on par with the NN after class balancing. Evidently, for the variables selected and over the range of values studied, linear effects capture the dominant biomarkers of gait without the need for nonlinear modeling.

4.1.3 Exploration of class balancing.

As indicated by the Youden index results shown in Fig 4, both the NN and LR models were substantially improved through use of class balancing in all but one case. Increases in Youden index were most evident in models using a 0.8 m/s cut-point, for which class imbalance was more severe, but those with a 1.0 m/s cut-point also saw improvements. Increases in Youden index were driven by large increases in sensitivity as seen in the Supporting Information, S3 and S4 Tables. These increases were most apparent for current timeframe predictions, particularly with LR and for the 0.8 m/s cut-point for which the model was almost entirely unable to identify slow walkers. The current timeframe datasets had the greatest imbalance in the native data (see “% Slow” column in Table 1), and it was in fact for these that class balancing led to the largest increases in performance. These results strongly support the use of class balancing in related studies, particularly since our classifiers were trained on balanced data and tested on natural ratios; this is a more difficult classification task than balancing both training and testing sets.
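The train-on-balanced, test-on-natural-ratio protocol described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the study's implementation: the function name `random_undersample`, the toy data, the 15% prevalence, and the split sizes are all hypothetical.

```python
import numpy as np


def random_undersample(X, y, rng):
    """Randomly drop rows of the larger class until all classes are equal in size."""
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    keep.sort()  # preserve the original row order
    return X[keep], y[keep]


rng = np.random.default_rng(0)

# Hypothetical toy data: 200 participants, roughly 15% slow walkers (label 1).
X = rng.normal(size=(200, 3))
y = (rng.random(200) < 0.15).astype(int)

# Hold out the test split first so that it keeps the natural class ratio.
idx = rng.permutation(200)
test_idx, train_idx = idx[:50], idx[50:]

# Balance the training split only; the test split is left untouched.
X_bal, y_bal = random_undersample(X[train_idx], y[train_idx], rng)
X_test, y_test = X[test_idx], y[test_idx]
```

Undersampling only after the split keeps discarded majority-class rows from influencing the test set, so the reported sensitivity and specificity still reflect the natural prevalence.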

The one model that did not benefit from class balancing was the NN designed to predict gait speed class in 10 years with the 1.0 m/s cut-point. We attribute this to the fact that this dataset was already well-balanced, actually containing more slow than normative walkers, in contrast to all other datasets used.

The three class balancing methods, RUS, SMOTE, and SMOTE-ENN, exhibited overall similar performance. We therefore selected RUS for class balancing, given its simplicity and the fact that it does not require synthetic data. It is interesting that the simplest approach, RUS, performed as well as the more sophisticated techniques of SMOTE and SMOTE-ENN. In this context, we note characteristics of SMOTE that may have limited its effectiveness in our study. Overlap between classes is particularly problematic for SMOTE, given that it blindly generalizes the region of a minority class without consideration of nearby samples of the majority class [66]. This strategy is particularly problematic in the case of highly skewed class distributions since the minority class tends to be sparse with respect to a majority class, thus resulting in a greater chance of class mixture [73]. SMOTE can also exhibit limited performance in the setting of multiple feature inputs; it becomes much more difficult to generate a representative sample of new data in higher dimensions. High dimensionality also gives rise to the phenomenon of hubness, in which a small number of points are overrepresented in the selection of nearest neighbors [74,75]. Given the efficacy of class balancing in this study, further investigation into the utility of other available methods may be warranted.
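The interpolation step that underlies SMOTE can be sketched as follows. This is a simplified illustration assuming Euclidean distance and uniform interpolation weights; the function name `smote_like_samples` and the toy data are hypothetical, and the sketch is not a substitute for a full library implementation.

```python
import numpy as np


def smote_like_samples(X_min, n_new, k, rng):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    # Pairwise Euclidean distances within the minority class only.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)               # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]  # indices of the k nearest points
    base = rng.integers(0, len(X_min), size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))               # interpolation weight in [0, 1)
    return X_min[base] + gap * (X_min[chosen] - X_min[base])


rng = np.random.default_rng(0)
X_minority = rng.normal(size=(20, 4))   # hypothetical minority-class features
synthetic = smote_like_samples(X_minority, n_new=30, k=5, rng=rng)
```

Note that the majority class never enters the computation: each synthetic point lies on a segment between two minority samples regardless of whether majority samples occupy the same region, which is the source of the class-overlap problem noted above.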

4.2 Sensitivity analysis

Sensitivity analysis was captured by the Sobol indices plotted in Fig 5 and listed in S5 Fig in the Supporting Information. In the best performing NN classifier (10-year prediction with 0.8 m/s cut-point; Fig 5a, green bars), we found five significant predictors with the strongest being age, BMI, and sleep quality. For the LR model, nine significant predictors were identified, with the strongest being age, grip strength, and BMI. Four of the five variables that were identified as most predictive in the NN analysis were also most predictive in the corresponding LR model. The remaining variable, sleep quality, may enter in a nonlinear fashion that required the NN to uncover. This motivates further comparative studies to characterize the nature of the relationships between clinical variables and slow gait outcomes.

Other methods for assessing sensitivity to particular variables were considered. These included logistic regression coefficients, Shapley Additive Explanations (SHAP), and Local Interpretable Model-agnostic Explanations (LIME). Sobol Index analysis was chosen due to (1) the need to have a single consistent metric across both model types, (2) its global ability to measure sensitivity across the entire input space, and (3) its applicability to nonlinear relationships.
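To illustrate why a Sobol analysis applies equally to both model types, the following minimal sketch estimates total-order Sobol indices for an arbitrary prediction function using the Jansen estimator. It assumes independent inputs scaled to the unit interval and is not necessarily the estimator or software used in this study; the function names and the toy model are hypothetical.

```python
import numpy as np


def total_sobol_indices(model, n_inputs, n_samples, rng):
    """Jansen estimator of total-order Sobol indices.

    `model` maps an (n, n_inputs) array to n outputs; inputs are assumed
    independent and uniformly distributed on [0, 1].
    """
    A = rng.random((n_samples, n_inputs))
    B = rng.random((n_samples, n_inputs))
    f_A, f_B = model(A), model(B)
    var_y = np.var(np.concatenate([f_A, f_B]))
    st = np.empty(n_inputs)
    for i in range(n_inputs):
        AB_i = A.copy()
        AB_i[:, i] = B[:, i]   # resample input i only, keep the others fixed
        st[i] = np.mean((f_A - model(AB_i)) ** 2) / (2.0 * var_y)
    return st


def toy_model(X):
    """Hypothetical stand-in for a trained classifier's output."""
    return 4.0 * X[:, 0] + 1.0 * X[:, 1] + 0.0 * X[:, 2]


st = total_sobol_indices(toy_model, n_inputs=3, n_samples=20000,
                         rng=np.random.default_rng(0))
significant = st > 0.05   # significance threshold used in this work
```

For this additive toy model the analytical total indices are 16/17, 1/17, and 0. Because the estimator only requires evaluating the prediction function, the same routine can be applied unchanged to a NN or an LR model.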

4.2.1 Trends across time, cut-point, and model type.

We found that across all the classifiers tested except one, age is the strongest predictor. For the current prediction with 1.0 m/s cut-point, grip strength was a stronger predictor than age. Of note, this was also the lowest-performing classifier. The dominance of age as a predictor was as expected, but we found that its influence was in all cases modulated by other significant predictors.

For the NN, age was less dominant as a predictor for current than for future gait. Consistent with this, there were a greater number of significant variables in these analyses. In fact, for current prediction with a 1.0 m/s cut-point, every input variable was found to be significant. In contrast, for the 10-year prediction with the same cut-point, only three variables were found to be significant: age, depression, and hypertension. This result suggests that a wider array of the chosen variables exhibit a substantial predictive power for current slow gait as compared with future slow gait, which may reflect certain effects becoming dominant over time. In particular, osteoarthritis, diabetes, pain, and sex are significant for current but not for future predictions. One hypothesis for the differences across time is that certain metrics exhibit a more delayed influence on the aging trajectory of a subject. For example, smoking or consistent poor sleep quality presumably correlate with higher morbidity over longer time scales, while poor strength influences gait contemporaneously.

For the future prediction NN classifiers, more variables were found to be significant for the 0.8 m/s cut-point than the 1.0 m/s cut-point. Grip strength appeared as a significant variable in all but one classifier (10-year prediction with 1.0 m/s cut-point). Sleep quality, BMI, stroke, and hypertension appeared in four classifiers as significant predictors. Interestingly, for the 6-year prediction, age and stroke were the strongest determinants across both cut-points, but stroke did not appear among the predictive variables for the 10-year timeframe.

In this study, we performed direct comparisons of two different model structures, NN and LR. This allowed us to capture model dynamics and consider the key determinants of healthy aging within this dataset without model constraints. We found a similar set of dominant predictors for the LR (Figs 5c, 5d, and S6) as for the NN (Figs 5a, 5b, and S5). In particular, age, grip strength, BMI, and hypertension appeared as significant most frequently across both model types. Further, all four of these were found to be significant in our best performing models (10-year prediction with 0.8 m/s cut-point for both NN and LR). On the other hand, stroke and sleep quality were found to be significant more consistently in the NN than in the LR. This may indicate a stronger non-linearity in the relationship between these variables and gait speed.

Additionally, we found that more variables appeared as significant for the LR model as compared to the NN for the 10-year prediction, but fewer were significant for the LR for current prediction. We also found that the significant predictors in the LR were more consistent across classifiers than in the NN. Age, grip strength, MMSE score, and hypertension appeared as significant in every LR classifier, with age and grip strength among the top three factors in four classifiers and among the top five factors across all LR classifiers. BMI was found to be significant in five of the six LR classifiers, while stroke was found to be significant in four. Interestingly, diabetes and depression only appeared as significant in future prediction classifiers, not those of current timeframe; this may indicate the importance of these variables over time. Pain was the only variable that never appeared as significant in any of the LR classifiers.

While our methodology does not indicate causality versus correlation, the fact that future gait speed can be predicted by models which include modifiable risk factors may prove to be of substantial clinical utility. For example, BMI and grip strength were consistently found to be significant according to Sobol index analysis and can clearly form the basis for therapeutic intervention and guidance.

4.2.2 Consideration of survey data and strongest predictors.

We conducted several additional sensitivity studies with the NN to further define the impact of certain variables. These investigations were confined to the 10-year prediction of the 0.8 m/s classification cut point using balanced data; see S6 Fig.

Our variables included both self-reported survey data and objective, quantifiable data. The limitations of self-reported data have been well-recognized [76]. We therefore investigated the specific contribution of these variables to our models by performing a NN analysis using only sex and the four non-categorical (non-survey) variables of age, BMI, grip strength, and cognitive score as input variables. This resulted in comparable performance with a Youden index of 0.68 as compared to the Youden index of 0.69 for the original dataset of all 15 variables. These results are shown in the Supporting Information, S5 Table. This indicates that the categorical variables do not provide substantial additional predictive power to the model.

Since age was the dominant predictor as indicated by its large Sobol index in all models, we also developed a NN model with age and sex as the only input variables. The resulting Youden index of 0.65 indicated a somewhat degraded performance as compared to the all-variables result of 0.69. The decrease was due to lower sensitivity, 0.764, as compared to the all-variables sensitivity of 0.812. To further investigate the role of dominant variables, we trained a NN model using all variables except age. As expected based on Sobol index analysis, we found a substantial decrease in the Youden index, from 0.69 to 0.49. While performance indeed worsened through omission of age, it is noteworthy that the other variables alone still achieved decent performance for gait classification.

Previous investigators have established, unsurprisingly, a strong correlation between height and gait speed, especially when examining current timeframes [77]. However, when we swapped height for BMI in our models, we encountered no discernible difference in model performance. Thus, we did not include height in our classifiers.

We conclude that the 15 original variables formed a reasonable input set. While performance was essentially maintained when categorical variables were removed, these results support their use in that the self-reported survey data did not negatively impact performance. The results obtained when omitting age provided evidence that multiple input variables are indeed important for gait speed prediction.

4.3 Gait speed prediction in the literature

Previous analyses of gait speed as it relates to aging trajectories differ from the current study and present several limitations. A number of studies evaluate gait speed as a model input, rather than as an outcome [4,28–33]. In fact, that line of investigation demonstrates the importance of gait speed as a measure of health status and provides the motivation for our study. Most studies that, like ours, evaluate the determinants of gait speed examine only predictors of current status, rather than predicting future aging trajectories [21–27]. Our inclusion of future predictions as a fundamental target of our work was motivated by the potential clinical utility of aging trajectory prediction and predicated upon the idea that health factors may show increasing impact over time. Indeed, we found notably better predictive ability for both the NN and the LR models for future gait than for current gait.

4.3.1 Data considerations.

Our chosen variables are based on those identified in other studies, most closely following the important work of Verghese et al. [34]. Other investigators have included similar sets of modifiable risk factors and medical conditions, [34–36] with some also incorporating or focusing on cognition [37,38]. In addition, inflammatory markers, [39] body composition, [2] and brain volumes [35] have all been incorporated into gait studies. While lower extremity strength would seem an influential factor in the analysis of gait determinants, few studies have incorporated it [2,23,25]. This may be due to its presumed high correlation with the gait outcome itself, or to the relative difficulty and lower reproducibility in measuring it as compared to hand grip strength. Of note, previous work has shown that grip strength contributed more than lower-extremity strength to variance in walking speed [78].

In spite of this literature, there remains a knowledge gap regarding the relationship between gait speed decline and a range of potentially determinative factors. Although in the present case LR performed on par with the more flexible NN model, we conjecture that these more flexible DL architectures will prove to be better able to capture the influence of complex datasets, including evaluation of longitudinal variables and raw imaging data. An important part of our effort was to develop a DL model and to establish its performance on well-studied variables, so that with further developments, we will be able to more deeply exploit the full potential of the extensive, longitudinal BLSA database.

A further advance in the present work is the evaluation of two clinically relevant gait speeds, defined as the point of severe mobility disability (0.8 m/s) and the speed below which the risk of mortality doubles (1.0 m/s). In contrast, a number of previous studies employ very low cut-points which capture only the most severely impaired subjects [27,34].

4.3.2 Model considerations.

Previous work on future gait speed prediction has implemented only linear statistical models, including mixed-effects models, [2,37,38] hierarchical regressions, [35] Poisson regressions, [34] and cross-sectional analysis [39]. Similarly, current gait speed prediction models have also mainly employed linear regressions [23–26,58] and cross-sectional analyses, [22] along with logistic regressions, [21,27] although support vector regression [79] and decision tree analysis [48] have also been explored. We sought to introduce the NN approach to this important problem, given the complexity of human biochemistry and physiology. With numerous contributing and overlapping factors to consider, AI techniques, such as NNs, offer a potentially superior tool for the study of gait speed decline and aging trajectories. Deep learning models allow for the simultaneous consideration of multiple patient factors, and thus tend to be more suitable than classical statistical methods for problems involving large numbers of predictors by accounting for their combined influence and accommodating the inclusion of complex data such as images, clinical data, and other biomarkers and health [80,81]. Thus, our work represents a promising new application of NNs and, in addition to our biomedical results, serves as a pilot study towards exploration of more complex input variables including images and longitudinal data.

4.3.3 Sensitivity analysis.

Previous studies have used a variety of performance metrics, making direct comparisons difficult. Several of these, like ours, have found BMI, [34,35] grip strength, [34,35] and cognitive impairment [34,37] to be significant predictors of gait. However, other studies found these variables not to be significant [2,38]. Verghese et al. also found exercise and pain to be significant [34]. While Pinter et al. identified age as a strong determinant [35], we note that most modeling studies use age as a covariate or for stratification, instead of investigating its influence directly as we did. Though there is not a strong, clear consensus, there is overall support in the literature for our findings that age, grip strength, and BMI are among the strongest determinants of future gait speed. Sleep quality also emerged as an important predictor in some of our models, as did cognitive impairment and exercise. As compared to previous studies, we evaluated model performance over a longer timeframe (six and ten years, rather than three-to-four years), allowing us to capture longer-term effects. This is especially important for the BLSA dataset, which is derived from a relatively healthy cohort in which slow gait may develop over longer time scales.

Additional significant predictors of future gait speed decline identified by previous investigators include vision loss, [34] balance metrics, [34,36] brain volume and white matter hyperintensity change, [35] baseline gait speed, [36] difficulty with activities of daily living, [36] frailty, [36] reaction time, [36] thigh intermuscular fat area, [2] and inflammatory markers [39]. Beavers et al. even found the longitudinal change in thigh fat and thigh muscle to be a strong predictor [2]. These findings further motivate the NN approach with application to a richer set of input variables.

4.4 Limitations

The main limitations of our study center around available data and potential biases. The BLSA cohort consists entirely of volunteers, who are overall healthier and more educated than the general population. Further, the cohort is largely limited to a catchment area of approximately 3 hours driving distance from Baltimore, MD. Other NIA-sponsored study populations, notably the HANDLS cohort which represents a population with greater diversity, [82] would therefore represent important opportunities for further study. Study investigators have targeted specific populations in their recruitment in an effort to achieve more balance in the dataset. Another limitation is that gait speed was not measured in the BLSA until 2004. This may introduce some bias in the upper age groups as, for example, the older individuals in the cohort represent exceptional agers whose initial gait speed measurement would have taken place at an advanced age. To explore the influence of these potential biases on our models, we investigated trends in several variables across the BLSA database and found the resulting dataset drift to be minimal (S2 and S3 Figs in the Supporting Information).

The BLSA dataset includes participants with a wide range of non-debilitating medical conditions. This increases the generalizability of our models and emphasizes that gait speed represents an outcome metric which integrates the function of multiple organ systems. As with predictions in any heterogeneous sample, this approach may limit model performance in terms of numerical metrics, while nevertheless appropriately addressing the overall question of the determinants of gait speed in the study population. Future work may therefore include subgroup analysis with participants stratified by specific underlying pathophysiology, resulting in a very different, but more targeted study. For example, prediction of gait speed trajectories for subjects with neurodegenerative disease may be of substantial clinical value. In any event, given the dataset requirements of NN training, more targeted studies with narrowly specified entrance criteria may require a very different recruitment strategy, given the largely normative aging population represented by the BLSA.

There were also certain limitations to our study based on available data. We were unable to include a variable indicating number of recent falls since fewer than half of recorded timepoints incorporated this measurement. In addition, ~ 15% of the datapoints for MMSE scores required imputation. The use of imputation, and the choice of the median for replacement, while a common method, may introduce a small degree of bias, and evaluation of alternative methods for imputation of missing data may be of interest in subsequent work. Here, we implemented a conventional approach and did not further investigate these issues. Another potential limitation is that gait speed was measured by trained observers rather than by truly objective operator-independent methods. However, this approach has been standardized and used extensively in the BLSA and other studies, with high test-retest reliability (ICC > 0.87). We also note that the majority of our input variables were available only as survey data, which may be less reliable than data extracted from medical records. However, BLSA survey data may be more reliable than most, given the relatively healthy status of BLSA participants and the attribution of unreliable survey data to impaired cognition and memory. Future work should consider more objective measures of the survey data included here.
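The median replacement mentioned above for missing MMSE scores can be sketched as follows. This is a minimal NumPy illustration in which the function name `impute_median` and the score values are hypothetical; it is not the study's preprocessing code.

```python
import numpy as np


def impute_median(values):
    """Replace missing entries (NaN) with the median of the observed entries."""
    values = np.asarray(values, dtype=float)
    median = np.nanmedian(values)   # median computed over non-missing values
    filled = np.where(np.isnan(values), median, values)
    return filled, median


# Hypothetical MMSE scores with two missing measurements.
mmse = [29, 30, np.nan, 27, 28, np.nan, 30]
mmse_filled, mmse_median = impute_median(mmse)
```

Because every missing entry receives the same value, this approach narrows the spread of the imputed variable, which is one way in which the small degree of bias noted above can arise.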

We chose to include grip strength from only the right-hand side. For most participants, both right and left grip strength were included in the dataset and many also had their dominant hand noted. While we would not anticipate a significant impact on the model performance given the strong correlation we found between the right and left grip strength (Pearson coefficient = 0.74), future work could explore any impact of using the mean of the two grips or the dominant hand. Note also that using a single hand makes the model more generalizable to other datasets and more accessible for clinical implementation.

Extensions of the current study also include additional variables of interest that were excluded here due to the dataset limitations described in Section 2.1.1. These include inflammatory markers, body composition, brain volume, frailty, falls (balance), and vision. A great deal of current clinical research is also focused on social determinants of health, which are emerging as key variables in well-being. These will be key predictors to explore in future work as more of these determinants (e.g., race, income, education) are prioritized and recorded on a consistent basis.

The availability and consistency of the dataset will be a key limiting factor in the proposed future studies of longitudinal data. Initial attempts to implement a recurrent neural network (RNN) study of the present data in order to establish longitudinal effects have met with limited success, likely due to the strict requirements of an RNN for time point repeatability for a sufficiently large sample size. Additional approaches will be explored, given the importance of understanding the determinants of gait speed and its trajectory in aging.

5 Conclusion

This study identified determinants of slow gait in the aging population across several timeframes, noting that age, BMI, sleep, and grip strength are key determinants. In addition, we established a NN model for the exploration of aging-related gait speed decline which performs comparably to or better than a conventional logistic regression model. This structure includes the potential for incorporating additional health measures, more complex inputs including images, and longitudinal data. Our future work will incorporate additional variables with potentially non-linear relationships to gait speed including brain myelination patterns, muscle bioenergetic indices obtained with 31P magnetic resonance spectroscopy, blood or urine biomarkers, leg dominance, and gait biomechanics metrics [22]. A more systematic and extensive exploration of factors such as the social determinants of health and patient-reported outcome measures (PROMs) may provide additional understanding of gait determinants. Some of these advances will be modelled on our previous work developing a CNN + RNN deep learning model for Alzheimer’s disease classification [83,84].

The novel use of a NN for this purpose, and demonstration that it performs as well as or better than a regression, justifies further expansion of this model to the consideration of more health measures, more complex inputs, and the incorporation of longitudinal information. While the NN and LR demonstrated similar performance in the present study, justifying the use of either approach, our future work will focus on the NN model due to its ability to capture nonlinear relationships, incorporate images, handle multiple types of variables at once, and more easily integrate longitudinal information. Development of this model and evaluation of its dominant predictors advances the understanding of aging trajectories as they relate to gait speed. Further development of these models may assist in complex decision-making in the clinical setting, aiding in the development and implementation of aging interventions which can improve patient quality-of-life and population health-span.

Supporting information

S1 Fig. Pearson correlation coefficient for each pair of input variables.

The scale bar on the right shows the color coding from red (strongly positively correlated) to blue (strongly negatively correlated).

(TIF)

pone.0325172.s001.tif (463.6KB, tif)
S1 Table. Key demographics of our dataset comparing the included and dropped data from the BLSA database.

Values are median ± standard deviation. Bold text indicates values that were found to be significantly different.

(TIF)

pone.0325172.s002.tif (35.9KB, tif)
S2 Fig. Analysis of potential dataset drift in the year of visit for all BLSA subjects used in this study.

The lack of a clear trend across the means suggests minimal dataset drift over the year of visit.

(TIF)

pone.0325172.s003.tif (69.6KB, tif)
S3 Fig

a) Analysis of potential dataset drift in the year of birth for all BLSA subjects used in this study. The dotted line indicates a linear regression fit with regression coefficient of 0.0082. b) Comparison to a regression fit of age versus gait speed (decreasing left to right) with coefficient of 0.0083. The similar coefficients and shape of the data indicate minimal dataset drift due to year of birth.

(TIF)

pone.0325172.s004.tif (251.2KB, tif)
S4 Fig. An example learning curve for NN training on the 10-year 0.8 m/s classifier.

The blue line indicates the loss of the training set, and the orange line represents the loss of the validation set. These loss curves show minimal overfitting after hyperparameter tuning.

(TIF)

pone.0325172.s005.tif (55KB, tif)
S2 Table. Performance metrics for the classifiers trained with unbalanced datasets.

The AUPRC value is the difference between the AUC and the no-skill AUC.

(TIF)

pone.0325172.s006.tif (44.5KB, tif)
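
The AUPRC convention stated in this caption (area under the precision-recall curve minus the no-skill area) can be illustrated with a minimal sketch. It rests on two assumptions that are not confirmed by the text: that the area is estimated by step-wise average precision, and that the no-skill area equals the prevalence of the positive class. The function names and the toy labels and scores are hypothetical and are not study data.

```python
def average_precision(y_true, scores):
    """Step-wise estimate of the area under the precision-recall curve.

    Assumes binary labels (1 = positive, here "slow gait") and distinct scores.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    # Rank samples from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, idx in enumerate(order, start=1):
        if y_true[idx] == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / n_pos


def auprc_above_no_skill(y_true, scores):
    """AUPRC minus the no-skill baseline (assumed to be the positive prevalence)."""
    prevalence = sum(y_true) / len(y_true)
    return average_precision(y_true, scores) - prevalence


# Toy, made-up example: 2 positives among 8 samples (prevalence 0.25).
toy_labels = [1, 0, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
toy_result = auprc_above_no_skill(toy_labels, toy_scores)
```

Under this reading, a value near zero means the classifier does no better than guessing at the class prevalence, which makes values comparable across classifiers with different degrees of class imbalance.
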
S3 Table. Performance metrics of classifiers using the NN with various balancing techniques.

The AUPRC value is the difference between the AUC and the no-skill AUC. RUS = Random Undersampling, SMOTE = Synthetic Minority Oversampling Technique, ENN = Edited Nearest Neighbors. The asterisk indicates the classifier that did not need class balancing techniques because the data was already balanced.

(TIF)

pone.0325172.s007.tif (68.8KB, tif)
S4 Table. Performance metrics of classifiers using the LR with various balancing techniques.

The AUPRC value is the difference between the AUC and the no-skill AUC. RUS = Random Undersampling, SMOTE = Synthetic Minority Oversampling Technique, ENN = Edited Nearest Neighbors. The asterisk indicates the classifier that did not need class balancing techniques because the data was already balanced.

(TIF)

pone.0325172.s008.tif (68KB, tif)
S5 Fig. Total order Sobol indices above 0.05 (significance) for each classifier using the NN.

The color scale is shown in the inset, where darker colored cells indicate variables that are found to be significant more frequently across classifiers.

(TIF)

pone.0325172.s009.tif (390.9KB, tif)
S6 Fig. Total order Sobol indices above 0.05 (significance) for each classifier using the NN.

The color scale is shown in the inset in S5 Fig, where darker colored cells indicate variables that are found to be significant more frequently across classifiers.

(TIF)

pone.0325172.s010.tif (363.7KB, tif)
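
To make the quantity in S5 and S6 Figs concrete, the following is a minimal standard-library sketch of a total-order Sobol index estimate with the 0.05 threshold applied. It uses the Jansen estimator on a toy linear function with independent uniform inputs. It is not the study's pipeline (the abbreviations list points to SALib), and the function names, toy model, and sample size are all assumptions made for illustration.

```python
import random


def total_order_sobol(model, n_inputs, n_samples=10000, seed=0):
    """Estimate total-order Sobol indices with the Jansen estimator.

    Inputs are assumed independent and uniform on [0, 1].
    """
    rng = random.Random(seed)
    # Two independent sample matrices A and B.
    a = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    b = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    y_a = [model(row) for row in a]
    mean = sum(y_a) / n_samples
    variance = sum((y - mean) ** 2 for y in y_a) / n_samples
    indices = []
    for i in range(n_inputs):
        acc = 0.0
        for a_row, b_row, y in zip(a, b, y_a):
            mixed = list(a_row)
            mixed[i] = b_row[i]  # resample only input i
            acc += (y - model(mixed)) ** 2
        indices.append(acc / (2 * n_samples * variance))
    return indices


def toy_model(x):
    # Hypothetical model: the third input has no influence on the output.
    return x[0] + 2.0 * x[1] + 0.0 * x[2]


st = total_order_sobol(toy_model, n_inputs=3)
# Keep only inputs whose total-order index exceeds the 0.05 threshold.
significant = [i for i, value in enumerate(st) if value > 0.05]
```

For this toy function the analytic total-order indices are 0.2, 0.8, and 0, so only the first two inputs pass the threshold. A trained classifier's predicted probability would take the place of `toy_model` in practice.
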
S5 Table. Results of the sensitivity analysis performed to study the influence of various input variables.

A: 15 original inputs; B: original inputs with height in place of BMI; C: age and sex alone; D: age, sex, BMI, grip strength, cognitive score (all the quantitative original inputs); E: original inputs except age.

(TIF)

pone.0325172.s011.tif (19.5KB, tif)

Acknowledgments

We would like to thank all those who contributed to and guided this research, including the Jeraj (Image Guided Therapy) Lab Group at University of Wisconsin – Madison, who offered input into the deep learning aspects of the work. We are additionally grateful to all those who have helped accumulate the vast amounts of data in the BLSA database and to Elango Palchamy at the NIA who helped us access and curate the data for our use.

Abbreviations

NN

Neural network

LR

Logistic regression

DL

Deep learning

CNN

Convolutional neural networks

BLSA

Baltimore Longitudinal Study of Aging

BMI

Body mass index

AUPRC

Area under the precision-recall curve

MMSE

Mini Mental State Exam

ELU

Exponential Linear Unit

SCCE

Sparse categorical cross entropy

SAG

Stochastic average gradient

L-BFGS

Limited-memory Broyden-Fletcher-Goldfarb-Shanno

RUS

Random Undersampling

SMOTE

Synthetic minority oversampling technique

SMOTE-ENN

SMOTE with edited nearest neighbors undersampling

SALib

Sensitivity analysis library

Data Availability

The data that support the findings of this study are available from the BLSA (NIA) at this link (https://www.blsa.nih.gov/), but restrictions apply to the availability of these data. These restrictions are due to ethical considerations for informed consent to share human data. Consent to share data publicly was not included in the BLSA consent until recently, therefore most of the measurements used in these analyses are restricted by the IRB to “permissioned” access only. Analysis plans must be approved prior to gaining access to the data and a BLSA investigator must be included in the research team. Follow the instructions here (https://blsa.nih.gov/blsa-data-use) to register an account and request access to the data.

Funding Statement

The author(s) received no specific funding for this work.

References

  • 1. Newman AB, Kritchevsky SB, Guralnik JM, Cummings SR, Salive M, Kuchel GA, et al. Accelerating the Search for Interventions Aimed at Expanding the Health Span in Humans: The Role of Epidemiology. J Gerontol A Biol Sci Med Sci. 2020;75(1):77–86. doi: 10.1093/gerona/glz230
  • 2. Beavers KM, Beavers DP, Houston DK, Harris TB, Hue TF, Koster A. Associations between body composition and gait-speed decline: results from the Health, Aging, and Body Composition study. The American Journal of Clinical Nutrition. 2013;97(3):552–60.
  • 3. Studenski S, Perera S, Patel K, Rosano C, Faulkner K, Inzitari M, et al. Gait speed and survival in older adults. JAMA. 2011;305(1):50–8.
  • 4. Cesari M, Kritchevsky SB, Penninx BWHJ, Nicklas BJ, Simonsick EM, Newman AB, et al. Prognostic value of usual gait speed in well-functioning older people--results from the Health, Aging and Body Composition Study. J Am Geriatr Soc. 2005;53(10):1675–80. doi: 10.1111/j.1532-5415.2005.53501.x
  • 5. Ostir GV, Kuo Y-F, Berges IM, Markides KS, Ottenbacher KJ. Measures of lower body function and risk of mortality over 7 years of follow-up. Am J Epidemiol. 2007;166(5):599–605. doi: 10.1093/aje/kwm121
  • 6. Rolland Y, Lauwers-Cances V, Cesari M, Vellas B, Pahor M, Grandjean H. Physical performance measures as predictors of mortality in a cohort of community-dwelling older French women. Eur J Epidemiol. 2006;21(2):113–22. doi: 10.1007/s10654-005-5458-x
  • 7. Rosano C, Newman AB, Katz R, Hirsch CH, Kuller LH. Association between lower digit symbol substitution test score and slower gait and greater risk of mortality and of developing incident disability in well-functioning older adults. J Am Geriatr Soc. 2008;56(9):1618–25. doi: 10.1111/j.1532-5415.2008.01856.x
  • 8. Woo J, Ho SC, Yu AL. Walking speed and stride length predicts 36 months dependency, mortality, and institutionalization in Chinese aged 70 and older. J Am Geriatr Soc. 1999;47(10):1257–60. doi: 10.1111/j.1532-5415.1999.tb05209.x
  • 9. Abellan van Kan G, Rolland Y, Andrieu S, Bauer J, Beauchet O, Bonnefoy M, et al. Gait speed at usual pace as a predictor of adverse outcomes in community-dwelling older people an International Academy on Nutrition and Aging (IANA) Task Force. J Nutr Health Aging. 2009;13(10):881–9. doi: 10.1007/s12603-009-0246-z
  • 10. Hall WJ. Update in geriatrics. Ann Intern Med. 2006;145(7):538–43. doi: 10.7326/0003-4819-145-7-200610030-00012
  • 11. Ayers E, Verghese J. Locomotion, cognition and influences of nutrition in ageing. Proc Nutr Soc. 2014;73(2):302–8. doi: 10.1017/S0029665113003716
  • 12. Studenski S, Perera S, Wallace D, Chandler JM, Duncan PW, Rooney E. Physical performance measures in the clinical setting. Journal of the American Geriatrics Society. 2003;51(3):314–22.
  • 13. Goldman MD, Motl RW, Rudick RA. Possible clinical outcome measures for clinical trials in patients with multiple sclerosis. Ther Adv Neurol Disord. 2010;3(4):229–39. doi: 10.1177/1756285610374117
  • 14. Tian Q, An Y, Resnick SM, Studenski S. The relative temporal sequence of decline in mobility and cognition among initially unimpaired older adults: Results from the Baltimore Longitudinal Study of Aging. Age Ageing. 2017;46(3):445–51. doi: 10.1093/ageing/afw185
  • 15. Tian Q, Resnick SM, Davatzikos C, Erus G, Simonsick EM, Studenski SA, et al. A prospective study of focal brain atrophy, mobility and fitness. J Intern Med. 2019;286(1):88–100. doi: 10.1111/joim.12894
  • 16. Skillbäck T, Blennow K, Zetterberg H, Skoog J, Rydén L, Wetterberg H, et al. Slowing gait speed precedes cognitive decline by several years. Alzheimer’s & Dementia. 2022;18(9):1667–76.
  • 17. Alenazi AM, Alqahtani BA, Vennu V, Alshehri MM, Alanazi AD, Alrawaili SM, et al. Gait Speed as a Predictor for Diabetes Incidence in People with or at Risk of Knee Osteoarthritis: A Longitudinal Analysis from the Osteoarthritis Initiative. Int J Environ Res Public Health. 2021;18(9):4414. doi: 10.3390/ijerph18094414
  • 18. Karpman C, Benzo R. Gait speed as a measure of functional status in COPD patients. Int J Chron Obstruct Pulmon Dis. 2014;9:1315–20. doi: 10.2147/COPD.S54481
  • 19. Pulignano G, Del Sindaco D, Di Lenarda A, Alunni G, Senni M, Tarantini L, et al. Incremental Value of Gait Speed in Predicting Prognosis of Older Adults With Heart Failure: Insights From the IMAGE-HF Study. JACC Heart Fail. 2016;4(4):289–98. doi: 10.1016/j.jchf.2015.12.017
  • 20. Ferrucci L, Gonzalez-Freire M, Fabbri E, Simonsick E, Tanaka T, Moore Z, et al. Measuring biological aging in humans: A quest. Aging Cell. 2020;19(2):e13080. doi: 10.1111/acel.13080
  • 21. Kyrdalen IL, Thingstad P, Sandvik L, Ormstad H. Associations between gait speed and well-known fall risk factors among community-dwelling older adults. Physiother Res Int. 2019;24(1):e1743. doi: 10.1002/pri.1743
  • 22. Choi S, Reiter DA, Shardell M, Simonsick EM, Studenski S, Spencer RG, et al. 31P Magnetic Resonance Spectroscopy Assessment of Muscle Bioenergetics as a Predictor of Gait Speed in the Baltimore Longitudinal Study of Aging. J Gerontol A Biol Sci Med Sci. 2016;71(12):1638–45. doi: 10.1093/gerona/glw059
  • 23. Mantel A, Trapuzzano A, Chizmar S, Haffke L, Dawson N. An Investigation of the Predictors of Comfortable and Fast Gait Speed in Community-Dwelling Older Adults. J Geriatr Phys Ther. 2019;42(4):E62–8. doi: 10.1519/JPT.0000000000000216
  • 24. Faulkner ME, Laporte JP, Gong Z, Akhonda MABS, Triebswetter C, Kiely M. Lower myelin content is associated with lower gait speed in cognitively unimpaired adults. The Journals of Gerontology: Series A. 2023. doi: glad080
  • 25. Zane AC, Reiter DA, Shardell M, Cameron D, Simonsick EM, Fishbein KW, et al. Muscle strength mediates the relationship between mitochondrial energetics and walking performance. Aging Cell. 2017;16(3):461–8. doi: 10.1111/acel.12568
  • 26. Caballero FF, Soulis G, Engchuan W, Sánchez-Niubó A, Arndt H, Ayuso-Mateos JL, et al. Advanced analytical methodologies for measuring healthy ageing and its determinants, using factor analysis and machine learning techniques: the ATHLOS project. Sci Rep. 2017;7:43955. doi: 10.1038/srep43955
  • 27. Busch T de A, Duarte YA, Pires Nunes D, Lebrão ML, Satya Naslavsky M, dos Santos Rodrigues A, et al. Factors associated with lower gait speed among the elderly living in a developing country: a cross-sectional population-based study. BMC Geriatr. 2015;15:35. doi: 10.1186/s12877-015-0031-2
  • 28. Hoogendijk EO, Rijnhart JJM, Skoog J, Robitaille A, van den Hout A, Ferrucci L, et al. Gait speed as predictor of transition into cognitive impairment: Findings from three longitudinal studies on aging. Exp Gerontol. 2020;129:110783. doi: 10.1016/j.exger.2019.110783
  • 29. Chou M-Y, Nishita Y, Nakagawa T, Tange C, Tomida M, Shimokata H, et al. Role of gait speed and grip strength in predicting 10-year cognitive decline among community-dwelling older people. BMC Geriatr. 2019;19(1):186. doi: 10.1186/s12877-019-1199-7
  • 30. Jayakody O, Breslin M, Srikanth VK, Callisaya ML. Gait Characteristics and Cognitive Decline: A Longitudinal Population-Based Study. J Alzheimers Dis. 2019;71(s1):S5–14. doi: 10.3233/JAD-181157
  • 31. Lee YH, Kim JS, Jung S-W, Hwang HS, Moon J-Y, Jeong K-H, et al. Gait speed and handgrip strength as predictors of all-cause mortality and cardiovascular events in hemodialysis patients. BMC Nephrol. 2020;21(1):166. doi: 10.1186/s12882-020-01831-8
  • 32. Zhou Y, Romijnders R, Hansen C, Campen J van, Maetzler W, Hortobágyi T, et al. The detection of age groups by dynamic gait outcomes using machine learning approaches. Sci Rep. 2020;10(1):4426. doi: 10.1038/s41598-020-61423-2
  • 33. Noh B, Youm C, Goh E, Lee M, Park H, Jeon H, et al. XGBoost based machine learning approach to predict the risk of fall in older adults using gait outcomes. Sci Rep. 2021;11(1):12183. doi: 10.1038/s41598-021-91797-w
  • 34. Verghese J, Wang C, Allali G, Holtzer R, Ayers E. Modifiable Risk Factors for New-Onset Slow Gait in Older Adults. J Am Med Dir Assoc. 2016;17(5):421–5. doi: 10.1016/j.jamda.2016.01.017
  • 35. Pinter D, Ritchie SJ, Gattringer T, Bastin ME, Hernández MDCV, Corley J, et al. Predictors of gait speed and its change over three years in community-dwelling older people. Aging (Albany NY). 2018;10(1):144–53. doi: 10.18632/aging.101365
  • 36. Dunsky A, Zeev A, Netz Y. Predictors of Future Walking Speed: A 12-Month Monitoring Program. Int J Aging Hum Dev. 2022;95(2):205–21. doi: 10.1177/00914150211066566
  • 37. Atkinson HH, Rosano C, Simonsick EM, Williamson JD, Davis C, Ambrosius WT, et al. Cognitive function, gait speed decline, and comorbidities: the health, aging and body composition study. J Gerontol A Biol Sci Med Sci. 2007;62(8):844–50. doi: 10.1093/gerona/62.8.844
  • 38. Mielke MM, Roberts RO, Savica R, Cha R, Drubach DI, Christianson T, et al. Assessing the temporal relationship between cognition and gait: slow gait predicts cognitive decline in the Mayo Clinic Study of Aging. J Gerontol A Biol Sci Med Sci. 2013;68(8):929–37. doi: 10.1093/gerona/gls256
  • 39. Verghese J, Holtzer R, Oh-Park M, Derby CA, Lipton RB, Wang C. Inflammatory markers and gait speed decline in older adults. J Gerontol A Biol Sci Med Sci. 2011;66(10):1083–9. doi: 10.1093/gerona/glr099
  • 40. Rosso AL, Sanders JL, Arnold AM, Boudreau RM, Hirsch CH, Carlson MC, et al. Multisystem physiologic impairments and changes in gait speed of older adults. J Gerontol A Biol Sci Med Sci. 2015;70(3):319–24. doi: 10.1093/gerona/glu176
  • 41. Higgins JP. Nonlinear systems in medicine. Yale J Biol Med. 2002;75(5–6):247–60.
  • 42. Carballido-Landeira J, Escribano B. Nonlinear dynamics in biological systems. Springer. 2016.
  • 43. Tabue-Teguo M, Perès K, Simo N, Le Goff M, Perez Zepeda MU, Féart C, et al. Gait speed and body mass index: Results from the AMI study. PLoS One. 2020;15(3):e0229979. doi: 10.1371/journal.pone.0229979
  • 44. El Haber N, Erbas B, Hill KD, Wark JD. Relationship between age and measures of balance, strength and gait: linear and non-linear analyses. Clin Sci (Lond). 2008;114(12):719–27. doi: 10.1042/CS20070301
  • 45. Richards EA, Christ SL, Rietdyk S, Teas E, Franks MM. Association of physical activity and gait speed: does context matter? American Journal of Lifestyle Medicine. 2023. doi: 15598276231157311
  • 46. Buchner DM, Larson EB, Wagner EH, Koepsell TD, de Lateur BJ. Evidence for a non-linear relationship between leg strength and gait speed. Age Ageing. 1996;25(5):386–91. doi: 10.1093/ageing/25.5.386
  • 47. Quach L, Galica AM, Jones RN, Procter-Gray E, Manor B, Hannan MT, et al. The nonlinear relationship between gait speed and falls: the Maintenance of Balance, Independent Living, Intellect, and Zest in the Elderly of Boston Study. J Am Geriatr Soc. 2011;59(6):1069–73. doi: 10.1111/j.1532-5415.2011.03408.x
  • 48. Sasani K, Catanese HN, Ghods A, Rokni SA, Ghasemzadeh H, Downey RJ, et al. Gait speed and survival of older surgical patient with cancer: Prediction after machine learning. J Geriatr Oncol. 2019;10(1):120–5. doi: 10.1016/j.jgo.2018.06.012
  • 49. Lin C-C, Ou Y-K, Chen S-H, Liu Y-C, Lin J. Comparison of artificial neural network and logistic regression models for predicting mortality in elderly patients with hip fracture. Injury. 2010;41(8):869–73. doi: 10.1016/j.injury.2010.04.023
  • 50. Moon S, Ahmadnezhad P, Song H-J, Thompson J, Kipp K, Akinwuntan AE, et al. Artificial neural networks in neurorehabilitation: A scoping review. NeuroRehabilitation. 2020;46(3):259–69. doi: 10.3233/NRE-192996
  • 51. Summers MJ, Madl T, Vercelli AE, Aumayr G, Bleier DM, Ciferri L. Deep machine learning application to the detection of preclinical neurodegenerative diseases of aging. DigitCult-Scientific Journal on Digital Cultures. 2017;2(2):9–24.
  • 52. Ferrucci L. The Baltimore Longitudinal Study of Aging (BLSA): A 50-Year-Long Journey and Plans for the Future. Oxford University Press. 2008.
  • 53. Shock NW. Normal human aging: The Baltimore longitudinal study of aging. US Department of Health and Human Services, Public Health Service, National. 1984.
  • 54. Middleton A, Fritz SL, Lusardi M. Walking speed: the functional vital sign. J Aging Phys Act. 2015;23(2):314–22. doi: 10.1123/japa.2013-0236
  • 55. Montero-Odasso M, Schapira M, Varela C, Pitteri C, Soriano ER, Kaplan R, et al. Gait velocity in senior people. An easy test for detecting mobility impairment in community elderly. J Nutr Health Aging. 2004;8(5):340–3.
  • 56. Soto R, Díaz LA, Rivas V, Fuentes-López E, Zalaquett M, Bruera MJ, et al. Frailty and reduced gait speed are independently related to mortality of cirrhotic patients in long-term follow-up. Ann Hepatol. 2021;25:100327. doi: 10.1016/j.aohep.2021.100327
  • 57. Kon S, Canavan J, Schofield S, Banya W, Jones S, Nolan C. Gait Speed as a predictor of mortality in COPD. Eur Respiratory Soc. 2015.
  • 58. Davis J, Knight SP, Rizzo R, Donoghue OA, Kenny RA, Romero-Ortuno R, editors. A linear regression-based machine learning pipeline for the discovery of clinically relevant correlates of gait speed reserve from multiple physiological systems. 2021 29th European Signal Processing Conference (EUSIPCO); 2021: IEEE.
  • 59. Guralnik JM, Simonsick EM, Ferrucci L, Glynn RJ, Berkman LF, Blazer DG, et al. A short physical performance battery assessing lower extremity function: association with self-reported disability and prediction of mortality and nursing home admission. J Gerontol. 1994;49(2):M85-94. doi: 10.1093/geronj/49.2.m85
  • 60. de Fátima Ribeiro Silva C, Ohara DG, Matos AP, Pinto ACPN, Pegorari MS. Short Physical Performance Battery as a Measure of Physical Performance and Mortality Predictor in Older Adults: A Comprehensive Literature Review. Int J Environ Res Public Health. 2021;18(20):10612. doi: 10.3390/ijerph182010612
  • 61. Blankevoort CG, van Heuvelen MJG, Scherder EJA. Reliability of six physical performance tests in older people with dementia. Phys Ther. 2013;93(1):69–78. doi: 10.2522/ptj.20110164
  • 62. Puthoff ML, Saskowski D. Reliability and responsiveness of gait speed, five times sit to stand, and hand grip strength for patients in cardiac rehabilitation. Cardiopulm Phys Ther J. 2013;24(1):31–7. doi: 10.1097/01823246-201324010-00005
  • 63. Goldberg A, Schepens S. Measurement error and minimum detectable change in 4-meter gait speed in older adults. Aging Clin Exp Res. 2011;23(5–6):406–12. doi: 10.1007/BF03325236
  • 64. Rolland YM, Cesari M, Miller ME, Penninx BW, Atkinson HH, Pahor M. Reliability of the 400‐m usual‐pace walk test as an assessment of mobility limitation in older adults. Journal of the American Geriatrics Society. 2004;52(6):972–6.
  • 65. Chatfield MD, Brayne CE, Matthews FE. A systematic literature review of attrition between waves in longitudinal studies in the elderly shows a consistent pattern of dropout between differing studies. J Clin Epidemiol. 2005;58(1):13–9. doi: 10.1016/j.jclinepi.2004.05.006
  • 66. Fernández A, Garcia S, Herrera F, Chawla NV. SMOTE for learning from imbalanced data: progress and challenges, marking the 15-year anniversary. Journal of Artificial Intelligence Research. 2018;61:863–905.
  • 67. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research. 2002;16:321–57.
  • 68. Lemaître G, Nogueira F, Aridas CK. Imbalanced-learn: A python toolbox to tackle the curse of imbalanced datasets in machine learning. The Journal of Machine Learning Research. 2017;18(1):559–63.
  • 69. Batista GEAPA, Prati RC, Monard MC. A study of the behavior of several methods for balancing machine learning training data. SIGKDD Explor Newsl. 2004;6(1):20–9. doi: 10.1145/1007730.1007735
  • 70. Tunkiel AT, Sui D, Wiktorski T. Data-driven sensitivity analysis of complex machine learning models: A case study of directional drilling. Journal of Petroleum Science and Engineering. 2020;195:107630.
  • 71. Iwanaga T, Usher W, Herman J. Toward SALib 2.0: Advancing the accessibility and interpretability of global sensitivity analyses. Socio-Environmental Systems Modelling. 2022;4:18155.
  • 72. Herman J, Usher W. SALib: An open-source Python library for sensitivity analysis. Journal of Open Source Software. 2017;2(9):97.
  • 73. Bunkhumpornpat C, Sinapiromsaran K, Lursinsap C. Safe-level-smote: Safe-level-synthetic minority over-sampling technique for handling the class imbalanced problem. In: Advances in Knowledge Discovery and Data Mining: 13th Pacific-Asia Conference, PAKDD 2009 Bangkok, Thailand, April 27-30, 2009 Proceedings, 2009.
  • 74. Radovanovic M, Nanopoulos A, Ivanovic M. Hubs in space: Popular nearest neighbors in high-dimensional data. Journal of Machine Learning Research. 2010;11(sept):2487–531.
  • 75. Blagus R, Lusa L. SMOTE for high-dimensional class-imbalanced data. BMC Bioinformatics. 2013;14:106. doi: 10.1186/1471-2105-14-106
  • 76. Smith B, Chu LK, Smith TC, Amoroso PJ, Boyko EJ, Hooper TI, et al. Challenges of self-reported medical conditions and electronic medical records among members of a large military cohort. BMC Med Res Methodol. 2008;8:37. doi: 10.1186/1471-2288-8-37
  • 77. Bohannon RW. Comfortable and maximum walking speed of adults aged 20-79 years: reference values and determinants. Age Ageing. 1997;26(1):15–9. doi: 10.1093/ageing/26.1.15
  • 78. Fragala MS, Alley DE, Shardell MD, Harris TB, McLean RR, Kiel DP, et al. Comparison of Handgrip and Leg Extension Strength in Predicting Slow Gait Speed in Older Adults. J Am Geriatr Soc. 2016;64(1):144–50. doi: 10.1111/jgs.13871
  • 79. McGinnis RS, Mahadevan N, Moon Y, Seagers K, Sheth N, Wright JA Jr, et al. A machine learning approach for gait speed estimation using skin-mounted wearable sensors: From healthy controls to individuals with multiple sclerosis. PLoS One. 2017;12(6):e0178366. doi: 10.1371/journal.pone.0178366
  • 80. Kesler SR, Rao A, Blayney DW, Oakley-Girvan IA, Karuturi M, Palesh O. Predicting Long-Term Cognitive Outcome Following Breast Cancer with Pre-Treatment Resting State fMRI and Random Forest Machine Learning. Front Hum Neurosci. 2017;11:555. doi: 10.3389/fnhum.2017.00555
  • 81. Strobl C, Boulesteix A-L, Zeileis A, Hothorn T. Bias in random forest variable importance measures: illustrations, sources and a solution. BMC Bioinformatics. 2007;8:25. doi: 10.1186/1471-2105-8-25
  • 82. Evans MK, Lepkowski JM, Powe NR, LaVeist T, Kuczmarski MF, Zonderman AB. Healthy aging in neighborhoods of diversity across the life span (HANDLS): overcoming barriers to implementing a longitudinal, epidemiologic, urban study of health, race, and socioeconomic status. Ethn Dis. 2010;20(3):267–75.
  • 83. Deatsch A, Perovnik M, Namías M, Trošt M, Jeraj R. Development of a deep learning network for Alzheimer’s disease classification with evaluation of imaging modality and longitudinal data. Phys Med Biol. 2022;67(19):10.1088/1361-6560/ac8f10. doi: 10.1088/1361-6560/ac8f10
  • 84. Talos A. Autonomio Talos. 2020. http://github.com/autonomio/talos

Decision Letter 0

Esedullah Akaras

Dear Dr. Deatsch,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Apr 25 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Esedullah Akaras

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, all author-generated code must be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

3. We note that you have indicated that there are restrictions to data sharing for this study. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For more information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.

Before we proceed with your manuscript, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially identifying or sensitive patient information, data are owned by a third-party organization, etc.) and who has imposed them (e.g., a Research Ethics Committee or Institutional Review Board, etc.). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. For a list of recommended repositories, please see https://journals.plos.org/plosone/s/recommended-repositories. You also have the option of uploading the data as Supporting Information files, but we would recommend depositing data directly to a data repository if possible.

We will update your Data Availability statement on your behalf to reflect the information you provide.

4. Your ethics statement should only appear in the Methods section of your manuscript. If your ethics statement is written in any section besides the Methods, please move it to the Methods section and delete it from any other section. Please ensure that your ethics statement is included in your manuscript, as the ethics statement entered into the online submission form will not be published alongside your manuscript.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Yes

Reviewer #2: Partly

Reviewer #3: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?


Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

Reviewer #1: I congratulate the authors for their work. Considering that the decrease in walking speed is an indicator of mortality for elderly individuals with chronic diseases, your work is very valuable.

However, I have a few suggestions and questions.

The intended use of deep learning and NN in elderly individuals with chronic diseases should be detailed in the introduction section.

The fact that the walking speed of the individuals was not measured with objective equipment can be stated as a limitation.

Only the right side was used for grip strength. Dominancy was not taken into concern?

Measurements of participants' walking speed over time may vary because of the individuals performing the measurements. It should be stated how this standardisation was achieved. Was interrater reliability performed? ICC value?

In your study, age, grip strength and BMI were reported as major predictors similar to the results in the literature. LR and NN results are also similar. In this case, the superiority of NN or the reason for its preference is insufficient. This section needs to be supported by the authors.

Reviewer #2: This study addresses a highly relevant topic in the field, contributing valuable insights that enhance our understanding of gait and its determinants on aging. The authors have tackled an important question with a well-structured approach, and their findings have the potential to inform both clinical practice and future research. The thoroughness of the methodology and the depth of analysis further strengthen the study’s significance, making it a noteworthy addition to the existing literature. Nevertheless, I believe that implementing the suggested revisions will further enhance the quality and impact of the study, strengthening its contribution to the field.

Intro

1. The authors highlight the need for larger datasets in deep learning analyses and provide sample sizes from previous studies (108, 239, 746, 1901) as examples. However, no references are provided to support these figures. Including citations for these studies would enhance transparency and allow readers to verify the source of this information. (Line 81-86)

2. The introduction would benefit from a clearer explanation of how the present study differs from previous research and which specific gap in the literature it aims to address. For instance, the statement 'Several attempts have been made using statistical models to predict gait speed changes from a narrow set of potential predictors' suggests prior work in this area. However, it would be helpful if the authors explicitly outlined how their study expands upon or improves these past efforts. Additionally, specifying the professional groups (health workers, physiotherapist, physician etc.) that could benefit from these findings would provide readers with a clearer understanding of the study’s relevance and impact.

Methods

3. The primary aim of this study appears to focus on the relationship between gait speed, aging, and mortality. However, gait speed can be influenced by a wide range of factors, including orthopedic, neurological, and other medical conditions. Wouldn't it be more informative to stratify the analysis by different subgroups to account for these variations? Discussing the potential impact of such factors and whether a subgroup analysis could enhance the findings would strengthen the study's interpretation and applicability.

4. I acknowledge that biostatistics and methodological details can sometimes be complex to fully follow, so I apologize if I have missed anything. That being said, I would like to raise a few points regarding the chosen cut-points and the study population. The authors have established clinically relevant cut-points for gait speed (0.8 m/s and 1.0 m/s) based on prior literature. However, gait speed thresholds may vary across different populations, particularly in individuals with neurological conditions such as Parkinson’s disease or stroke. Were disease-specific variations in cut-points considered when applying these thresholds to the study population? If not, this could be a limitation, as the same cut-points may not be appropriate for all individuals.

5. Additionally, the study utilized gait speed data from the BLSA cohort, where measurement intervals varied by age. Since older individuals had more frequent assessments, could this have biased the long-term predictive model? Were adjustments made to account for the potential overrepresentation of gait speed decline in older participants? A discussion on how these factors may have influenced the results would strengthen the study’s findings.

6. It would be better for the readers if the authors explained a point about the measurement and use of grip strength in their analysis. In the table describing input variables, grip strength is reported in kilograms (Table 2). However, it is specifically labeled as "hand grip muscles right (kg)." Could you clarify whether only the right hand was measured and analyzed? If so, what was the rationale for not including the left hand or using an alternative approach such as the mean of both hands or the dominant hand? Additionally, considering that grip strength was identified as a significant factor in the Sobol index analysis and highlighted as a basis for therapeutic intervention, do you believe that using only the right hand impacts the generalizability of your findings? If left-hand data were available, would incorporating it alter your results? Clarifying these points would strengthen the methodological transparency and the applicability of your findings.

7. Previous studies (e.g., Sadeghi et al., 2000) suggest that lower limb dominance plays a role in gait biomechanics, with the dominant limb often contributing more to propulsion while the non-dominant limb provides stability. Considering this, do you think assessing lower limb dominance could enhance the interpretation of gait speed determinants in your study?

Discussion

8. The discussion would benefit from a more detailed justification of why grip strength was emphasized as a key predictor of gait speed rather than more directly related lower-limb strength measures. While grip strength has been associated with overall strength and function, gait speed is likely more directly influenced by lower-limb muscle strength. Addressing this distinction and discussing potential reasons for the focus on grip strength over leg strength measures would provide greater clarity for readers.

9. The authors have considered multiple gait-related parameters in their predictive model. However, in clinical and health-related research, patient-reported outcome measures (PROMs) are frequently used to capture the patient's perspective. PROMs can provide valuable insight into how individuals perceive their mobility, fatigue, pain, or fear of falling, which are factors that could influence gait patterns over time. Did the authors consider incorporating PROMs while examining the determinants of slow gait? If not, this could be a potential limitation of the study, as relying solely on objective gait parameters may overlook important subjective experiences that contribute to mobility decline. Including PROMs in future research could enhance the model's generalizability and provide a more comprehensive understanding of aging-related gait changes.

Reviewer #3: This manuscript presents a valuable contribution to the field of aging and mobility research by exploring predictive modeling techniques for slow gait, a key biomarker of health and longevity. By leveraging data from the Baltimore Longitudinal Study of Aging (BLSA), the study compares the performance of a deep learning neural network (NN) with traditional logistic regression (LR) models in predicting current and future slow gait at different timeframes (6-year and 10-year). Additionally, the study identifies key determinants of gait decline, such as age, BMI, sleep quality, and grip strength.

The study is well-motivated and methodologically rigorous, making a compelling case for integrating machine learning into aging research. However, some methodological and analytical aspects require further clarification or justification:

Strengths

One of the key strengths of this study is its innovative application of deep learning to predict aging-related slow gait. While gait speed has been extensively studied as a predictor of health outcomes, the use of a neural network represents a novel approach that could potentially capture complex, nonlinear relationships between predictors and mobility decline. Additionally, by benchmarking the NN against logistic regression, the authors provide a robust comparative analysis, which strengthens the validity of their results.

The study also benefits from its use of a well-established longitudinal dataset (BLSA), which enhances the reliability of the findings. The long-term follow-up (6 and 10 years) is particularly valuable, as it allows for a more comprehensive understanding of mobility decline over time. Few studies have attempted to predict future slow gait over such an extended period, making this study particularly relevant for aging research.

Another commendable aspect of the study is its attention to data imbalance, a common issue in clinical datasets. By applying various class balancing techniques (RUS, SMOTE, SMOTE-ENN), the authors effectively address the skewed distribution of slow versus normative walkers. This methodological rigor enhances the study’s robustness and provides useful insights into best practices for handling imbalanced datasets in clinical prediction models.
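To make the balancing step concrete, random undersampling (RUS), the simplest of the three techniques named above, can be sketched as follows. This is a minimal illustration and not the authors' implementation; the function name `random_undersample` and the toy data are hypothetical, and SMOTE and SMOTE-ENN (which synthesize or clean samples rather than only discarding them) are not shown.

```python
import random
from collections import Counter


def random_undersample(features, labels, seed=0):
    """Randomly drop rows from larger classes until every class has equal count."""
    rng = random.Random(seed)
    # Group row indices by class label
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    # Every class is reduced to the size of the smallest class
    target = min(len(idxs) for idxs in by_class.values())
    keep = []
    for idxs in by_class.values():
        keep.extend(rng.sample(idxs, target))
    keep.sort()
    return [features[i] for i in keep], [labels[i] for i in keep]


# Toy data (hypothetical): 1 = slow gait, 0 = normative gait
X = [[age] for age in range(60, 80)]
y = [1] * 4 + [0] * 16
X_bal, y_bal = random_undersample(X, y)
print(Counter(y_bal))  # both classes now have 4 rows
```

In a sketch like this, balancing would normally be applied to the training portion only, so that the evaluation data keep their original class proportions.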

Finally, the study has clear clinical relevance, as it focuses on clinically meaningful gait speed cut-points (0.8 m/s and 1.0 m/s). These thresholds align with established research on mobility disability and mortality risk, ensuring that the findings are directly applicable to clinical decision-making. The identification of modifiable risk factors (e.g., BMI, grip strength, sleep quality) further underscores the study’s potential impact, as these variables could inform targeted interventions for preventing mobility decline.

Areas for improvement and suggested refinements (Minor Revisions)

While this study presents a strong and well-motivated analysis of aging-related slow gait prediction using deep learning and logistic regression, there are some methodological and conceptual aspects that would benefit from clarification and refinement. These do not require major changes to the core analysis but would enhance the transparency, interpretability, and generalizability of the findings.

- One of the key strengths of this study is its comparison between neural networks (NN) and logistic regression (LR). However, the results indicate that NN performs only marginally better than LR, raising the question of whether the added model complexity is necessary. While the authors state that NN allows for capturing nonlinear relationships, there is no strong evidence that such relationships exist in this dataset.

Suggested Improvement: A brief discussion on whether nonlinear interactions between predictors were observed (or theoretically expected) would strengthen the justification for using NN. If applicable, referencing studies that have successfully demonstrated nonlinear patterns in similar aging-related predictions would provide useful context.

- Deep learning models are susceptible to overfitting, especially when applied to datasets with a relatively small sample size (1,363 participants). The manuscript mentions the use of dropout layers, but there is no explicit discussion of other overfitting mitigation strategies, such as hyperparameter tuning, cross-validation, or external validation.

Suggested Improvement: A short statement on whether techniques such as k-fold cross-validation or regularization methods were used would provide confidence in the model’s generalizability. If external validation was considered but not performed, a mention of this as a future step would clarify the scope of the current analysis.
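For reference, the k-fold cross-validation mentioned in this comment can be sketched as below. This is a hedged illustration of a stratified split using only the standard library; the helper name `stratified_kfold_indices`, the choice of k = 5, and the toy labels are assumptions and do not describe what the authors did.

```python
import random


def stratified_kfold_indices(labels, k=5, seed=0):
    """Return k (train_idx, test_idx) pairs with similar class mix in each test fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    # Group row indices by class label
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    # Deal each class's shuffled indices round-robin across the folds
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for pos, idx in enumerate(idxs):
            folds[pos % k].append(idx)
    splits = []
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(j for n, fold in enumerate(folds) if n != i for j in fold)
        splits.append((train_idx, test_idx))
    return splits


# Toy labels (hypothetical): 10 slow walkers (1) and 40 normative walkers (0)
y = [1] * 10 + [0] * 40
splits = stratified_kfold_indices(y, k=5)
for train_idx, test_idx in splits:
    n_slow = sum(y[i] for i in test_idx)
    print(len(train_idx), len(test_idx), n_slow)
```

Each row appears in exactly one test fold, so every observation contributes once to the held-out performance estimate.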

- A common challenge with deep learning models is their black-box nature, which limits clinical interpretability. While the study employs Sobol sensitivity analysis to rank predictor importance, this approach does not fully address the need for clinically meaningful explanations of how individual variables contribute to predictions.

Suggested Improvement: A brief comparison between the Sobol index results and logistic regression coefficients would help readers understand whether the NN model identifies similar key predictors as traditional methods. Additionally, a sentence or two acknowledging the potential use of SHAP (Shapley Additive Explanations) or LIME (Local Interpretable Model-agnostic Explanations) in future research would be beneficial.

- The study replaces missing values with the median, but the rationale for this choice is not discussed. This is particularly relevant given that 15% of MMSE scores (cognitive impairment) were missing, which could introduce bias.

Suggested Improvement: A brief justification for using median imputation over other common methods (e.g., mean imputation, multiple imputation) would improve transparency. If a sensitivity analysis was conducted to assess whether the imputation method affected results, mentioning this would be valuable.
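The difference between median and mean imputation raised here can be illustrated with a small sketch. The scores below are invented for illustration only (they are not BLSA data), and the helper `impute` is hypothetical; the point is simply that a single low outlier pulls the mean, but not the median, away from the bulk of the observed values.

```python
from statistics import mean, median


def impute(values, strategy=median):
    """Replace None entries with a summary statistic of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = strategy(observed)
    return [fill if v is None else v for v in values]


# Hypothetical MMSE-like scores (0-30) with one low outlier and two missing entries
scores = [29, 30, 28, None, 30, 12, 29, None, 27, 30]
print(impute(scores, median))  # missing entries filled with the median of observed scores
print(impute(scores, mean))    # missing entries filled with the mean, pulled down by the outlier
```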

- The study is based on the BLSA cohort, which consists primarily of highly educated volunteers from a limited geographic region. This raises concerns about generalizability to more diverse populations, particularly individuals from different socioeconomic, racial, and educational backgrounds.

Suggested Improvement: The Discussion or Limitations section should briefly acknowledge this limitation and suggest future validation in more heterogeneous cohorts. Even a short statement recognizing the potential biases of volunteer-based longitudinal studies would enhance transparency.

- While the results clearly compare NN and LR, a concise summary statement highlighting the key takeaways—whether NN significantly outperformed LR or if the results were comparable—would help readers quickly grasp the implications.

Suggested Improvement: A small summary table comparing the strengths and weaknesses of both models in terms of accuracy, interpretability, and clinical applicability would make this section more accessible.

Specific Comments

1. Introduction

- Line 50-55: The claim that “gait speed is an essential predictor of overall health and well-being” is well-supported, but additional references on the impact of gait speed on cognitive decline would strengthen the argument.

- Line 72-75: The statement that NNs allow for capturing nonlinear relationships should be supported with examples from prior research in aging.

2. Methods

- Line 160-162: The handling of missing MMSE data (15%) is a potential limitation. Was a sensitivity analysis conducted to assess the impact of missing data on results?

- Line 231-234: The manuscript mentions using different solvers for logistic regression. Was feature selection or regularization applied to avoid overfitting?

3. Results

- Table 3: The dataset sizes after class balancing should be contextualized. How does this compare to the original dataset?

4. Discussion

- Line 357-363: The claim that NNs are advantageous for handling images and longitudinal data is valid but not demonstrated in this study. Consider clarifying that this is a potential future direction.

- Line 472-475: The discussion on modifiable risk factors (BMI, grip strength) is strong but could benefit from practical implications for clinical interventions.

5. Conclusion

- Line 607-612: The conclusion suggests that deep learning should be further explored, but it should acknowledge that logistic regression performed similarly, questioning the necessity of NN in this context.

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/ . PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org . Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: Review Form.docx

pone.0325172.s012.docx (16.9KB, docx)
PLoS One. 2025 Jun 17;20(6):e0325172. doi: 10.1371/journal.pone.0325172.r003

Author response to Decision Letter 1


24 Apr 2025

Reviewer #1:

1. I congratulate the authors for their work. Considering that the decrease in walking speed is an indicator of mortality for elderly individuals with chronic diseases, your work is very valuable.

a. We greatly appreciate Reviewer #1’s positive comments.

2. However, I have a few suggestions and questions. The intended use of deep learning and NN in elderly individuals with chronic diseases should be detailed in the introduction section.

a. We interpret this comment as suggesting that we note the eventual use of this in the clinical setting, which we agree is an excellent idea. We have provided potential applications and example use cases in a new paragraph in the introduction beginning, “The development of models…”

b. We have also further emphasized the applicability to a wide range of chronic diseases, with a new sentence and 3 new references in the introduction. The added sentence begins, “Gait speed is a metric of particular relevance for those with long-term, chronic conditions...”

3. The fact that the walking speed of the individuals was not measured with objective equipment can be stated as a limitation.

a. We apologize for the evident lack of clarity on this point. We have edited the methods to emphasize that gait speed was measured using standard assessment methodologies based on usual current practice by trained observers, although we readily acknowledge that gait lab assessment would indeed be more precise. We have added two citations and a clarification to the methods section that states, “The SPPB, including its gait measurement component, is a standardized…”

b. We have now also included this as a minor limitation. It is unlikely that over the time frame of the timed walks, an improvement in measurement accuracy of ~1 second would result in any changes in classification in general and in our results in particular. However, acknowledging the Reviewer’s point, we have added a sentence to the discussion beginning, “Another potential limitation is that gait speed was measured by trained observers rather than by truly objective operator-independent methods.”

4. Only the right side was used for grip strength. Dominancy was not taken into concern?

a. We appreciate this consideration. We did have data from both hands for most participants; however, we evaluated their correlation and found significant collinearity, so we used only the right side. Indeed, we could have instead used “dominant hand”; this however would have forced us to use a somewhat smaller dataset as not every participant has their dominant hand noted. We have added a comment on the choice of only right side to the Data section 2.1.2 beginning with, “In fact, we elected to incorporate only the right-hand grip strength in the analysis…” We do acknowledge that a different choice could equally well have been made regarding incorporating left-sided grip strength.
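A collinearity check of the kind described in this response can be sketched with a Pearson correlation between the two hands. The grip-strength values below are invented for illustration, the helper `pearson_r` is hypothetical, and the response does not state which correlation measure or threshold the authors used.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical grip strength (kg) for six participants
right_kg = [22.0, 30.5, 41.0, 35.5, 27.0, 38.0]
left_kg = [20.5, 29.0, 39.5, 33.0, 26.0, 36.5]
r = pearson_r(right_kg, left_kg)
print(round(r, 3))  # a value close to 1 indicates the two inputs carry nearly the same information
```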

5. Measurements of participants' walking speed over time may vary because of the individuals performing the measurements. It should be stated how this standardization was achieved. Was interrater reliability performed? ICC value?

a. We agree that these are important issues. First, we note that the BLSA protocols are long-established as high-quality assessments, and we made use of the generated data rather than re-collecting new data. Otherwise, interrater reliability and ICC are not included in the BLSA dataset. However, other studies which we have now cited have explored these or similar metrics. This resulted in our estimate of uncertainty of gait speed (0.06-0.11 m/s). Further exploration of the literature prompted by this comment indicated that ICC values for the timed walk tests range from 0.87 to 0.97. However, we were unable to find measurements of interrater reliability in the literature. We have now listed the ICC values in the Dataset Section 2.1.1 alongside the uncertainty values in conjunction with these additional citations. We have also, in line with this excellent point, incorporated these considerations into the limitations, beginning with, “Another potential limitation is that gait speed was measured by trained observers…”

6. In your study, age, grip strength and BMI were reported as major predictors similar to the results in the literature. LR and NN results are also similar. In this case, the superiority of NN or the reason for its preference is insufficient. This section needs to be supported by the authors.

a. We agree with this point and hope that we have emphasized that for this particular study, the performance of the NN and LR were similar. Of course, the additional flexibility afforded by the NN may prove to be crucial for other studies of gait and related physical outcomes. We have added a short paragraph in the Discussion Section 4.1.1 (discussing the optimized models’ performances) to remind readers of the advantages of the NN. In fact, we are currently completing another study in which a NN is used for gait prediction in which we evaluated use of input variables that are both tabular data and images. This is readily done with the NN formalism but cannot be accomplished with logistic regression in any conventional manner, since LR does not take images as input. The new paragraph begins, “While the performance difference between the NN and the LR was minimal in the present study…”

b. Note that to avoid further lengthening this already-lengthy manuscript, we kept this addition to the discussion rather short. More extensive arguments for the advantages of the NN are already included in both the introduction (“In addition to providing a natural means of developing implicit nonlinear models…”) and the conclusion (“The novel use of a NN for this purpose…”).

Reviewer #2:

1. This study addresses a highly relevant topic in the field, contributing valuable insights that enhance our understanding of gait and its determinants on aging. The authors have tackled an important question with a well-structured approach, and their findings have the potential to inform both clinical practice and future research. The thoroughness of the methodology and the depth of analysis further strengthen the study’s significance, making it a noteworthy addition to the existing literature. Nevertheless, I believe that implementing the suggested revisions will further enhance the quality and impact of the study, strengthening its contribution to the field.

a. We thank the reviewer for highlighting the potential significance of this work, and the detailed analysis we have provided.

2. Intro - The authors highlight the need for larger datasets in deep learning analyses and provide sample sizes from previous studies (108, 239, 746, 1901) as examples. However, no references are provided to support these figures. Including citations for these studies would enhance transparency and allow readers to verify the source of this information. (Line 81-86)

a. We apologize for this omission and thank the Reviewer for catching it. We have now provided the corresponding references.

3. Intro - The introduction would benefit from a clearer explanation of how the present study differs from previous research and which specific gap in the literature it aims to address. For instance, the statement 'Several attempts have been made using statistical models to predict gait speed changes from a narrow set of potential predictors' suggests prior work in this area. However, it would be helpful if the authors explicitly outlined how their study expands upon or improves these past efforts.

a. Thank you for this helpful comment. Although we did cite two papers with our initial comment, we have now further emphasized that as compared to prior work, the NN approach has the promise of greater flexibility and predictive power. We adjusted the sentence following the one noted by the reviewer to state: “These approaches, however, are restricted to the exploration of only linear relationships, while the complexity of human biochemistry and physiology suggests that their performance may be surpassed by that of models incorporating nonlinear effects and interactions.” We have also adjusted the paragraph flow to help connect this thought to the potential of the NN approach and indicate the possibility of expanding existing studies of nonlinear relationships to NN analysis.

b. Subsequently, we note in the MS: “Indeed, there remains a gap in the literature regarding the use of NNs for prediction of gait speed from clinical variables.” Using this as the topic sentence of a new paragraph emphasizes the Reviewer’s point and the specific gap we aim to fill.

c. We also call the reviewer’s attention to our Discussion Section 4.3.2 which fleshes out the models in the current literature and specifies in more detail the knowledge gap our work aims to fill.

4. Additionally, specifying the professional groups (health workers, physiotherapist, physician etc.) that could benefit from these findings would provide readers with a clearer understanding of the study’s relevance and impact.

a. We greatly appreciate this comment, which emphasizes the value in specifically addressing the important target audience of clinical practitioners. We have now specified the potential utility of this work to particular professional groups with a new paragraph in the introduction. The new paragraph begins, “The development of models that can accurately predict current and future gait speed decline would be of great clinical use in several contexts. For example...”

5. Methods - The primary aim of this study appears to focus on the relationship between gait speed, aging, and mortality. However, gait speed can be influenced by a wide range of factors, including orthopedic, neurological, and other medical conditions. Wouldn't it be more informative to stratify the analysis by different subgroups to account for these variations? Discussing the potential impact of such factors and whether a subgroup analysis could enhance the findings would strengthen the study's interpretation and applicability.

a. We greatly appreciate this comment. This represents a common trade-off in clinical studies between generalizability (using data from a large, heterogeneous, population) and accuracy (through limitation of confounding factors). Stratification by subgroup would, in principle, lead to greater accuracy within that subgroup, but would diminish the generalizability of the model to broad patient populations. In fact, gait speed is of great interest and importance precisely because, as the Reviewer points out, it integrates into one type of measurement a wide range of underlying pathologies. This makes it an excellent candidate measure for generalized datasets. We have now more specifically highlighted this in the introduction with the addition of the sentence, “Indeed, one of the chief advantages of gait speed as a health metric is the fact that it is impacted by the integrity of a wide range of organ systems, including neurological status (both sensory and motor), cardiovascular health, orthopedic status, and pulmonary function.” Stratification would, therefore, result in a study with a more limited and specific role for gait, in contrast to the present study in which gait is viewed as a broad-based integrated outcome. Of course, both approaches would have value, but for the present work, we elected to view gait speed as a final common indicator for multi-system and multi-organ function. In addition, especially in the context of NN analysis, further stratification would have a strongly negative impact on training capability by reducing dataset size. We have now incorporated this into the Limitations in a new paragraph that begins, “The BLSA dataset includes participants with a wide range of non-debilitating medical conditions.”

6. Methods - I acknowledge that biostatistics and methodological details can sometimes be complex to fully follow, so I apologize if I have missed anything. That being said, I would like to raise a few points regarding the chosen cut-points and the study population. The authors have established clinically relevant cut-points for gait speed (0.8 m/s and 1.0 m/s) based on prior literature. However, gait speed thresholds may vary across different populations, particularly in individuals with neurological conditions such as Parkinson’s disease or stroke. Were disease-specific variations in cut-points considered when applying these thresholds to the study population? If not, this could be a limitation, as the same cut-points may not be appropriate for all individuals.

a. We greatly appreciate this thoughtful comment. In fact, for this work, we were specifically interested in individuals without debilitating underlying conditions, in accordance with the BLSA participant population. This allowed us to gain insight into normative aging trajectories. Indeed, further work with targeted populations (e.g. prediction of gait speed trajectories for subjects with neurodegenerative disease) may be of substantial clinical value. We have added a clarifying statement to the description of the cut-points and how they are appropriate for the BLSA dataset: “…cut-points that have been shown to be clinically relevant for the normatively aging population represented by the BLSA.”

7. Methods - Additionally, the study utilized gait speed data from the BLSA cohort, where measurement intervals varied by age. Since older individuals had more frequent assessments, could this have biased the long-term predictive model? Were adjustments made to account for the potential overrepresentation of gait speed decline in older participants? A discussion on how these factors may have influenced the results would strengthen the study’s findings.

a. We indeed had a similar concern during model development. We found that there were far more measurements for the middle range of ages, with the potential therefore for greater accuracy, than the extremes, with the potential for decreased accuracy. However, with adequate training set size, accuracy even at the extremes would be expected to match that in the middle range. Therefore, we wished to determine whether the per-age training data was 1) adequate at the extremes and more than adequate in the middle range, or 2) inadequate at the extremes but (potentially) adequate in the middle range. Accordingly, we evaluated a per-age histogram of false predictions and compared it to the histogram of ages. We found, in fact, a remarkably similar structure, with the number of false predictions tracking the number of measurements. This is a strong indicator of lack of bias.

Figure 1. Histograms of the ages of participants at all gait speed measurements (blue) and the false predictions from the NN model (orange).

b. We also performed a sensitivity study in which measurements from the most over-represented age groups were excluded to an increasing extent. For the current gait speed prediction models tested, we found that the precision-recall area under the curve for the NN varied by less than 0.08 when removing up to 75% of the middle age range of values. This further supports the lack of bias due to overrepresentation of certain ages in the dataset.

c. We very much appreciate the reviewer bringing up this point and have now included a reference to these sensitivity studies in the Consideration of Bias Section 2.1.2. The new paragraph begins, “Lastly, we considered the possibility of bias from the uneven distribution of participant ages…” We have also now clarified the particular visit schedule dependence on participant age in Section 2.1.1.
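The per-age comparison described in response 7a can be illustrated with a minimal Python sketch. It bins the ages of all measurements and of the false predictions, then reports the false-prediction fraction per bin; a roughly constant fraction across bins corresponds to false predictions tracking the number of measurements. The function names, the 10-year bin width, and the toy data are assumptions made for illustration only, not the authors' code or BLSA data.

```python
from collections import Counter


def age_histogram(ages, bin_width=10):
    """Count ages per bin; each key is the lower edge of its bin."""
    return Counter((int(a) // bin_width) * bin_width for a in ages)


def false_prediction_rates(ages, y_true, y_pred, bin_width=10):
    """Return {bin: (n_measurements, n_false, false_fraction)}."""
    all_hist = age_histogram(ages, bin_width)
    # A prediction is "false" when it disagrees with the true label.
    false_ages = [a for a, t, p in zip(ages, y_true, y_pred) if t != p]
    false_hist = age_histogram(false_ages, bin_width)
    return {
        b: (n, false_hist.get(b, 0), false_hist.get(b, 0) / n)
        for b, n in sorted(all_hist.items())
    }


# Hypothetical toy data (not from the BLSA): age at measurement,
# true slow-gait label, and predicted label.
ages = [45, 52, 58, 61, 63, 66, 67, 69, 72, 74, 75, 78, 83, 88]
y_true = [0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0]

rates = false_prediction_rates(ages, y_true, y_pred)
for age_bin, (n, n_false, frac) in rates.items():
    print(f"{age_bin}-{age_bin + 9}: n={n}, false={n_false}, fraction={frac:.2f}")
```

With real data, bins containing very few measurements give noisy fractions, so the comparison is most informative where both histograms are well populated.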

8. Methods - It would be better for the readers if the authors explained a point about the measurement and use of grip strength in their analysis. In the table describing input variables, grip strength is reported in kilograms (Table 2). However, it is specifically labeled as "hand grip muscles right (kg)." Could you clarify whether only the right hand was measured and analyzed? If so, what was the rationale for not including the left hand or using an alternative approach such as the mean of both hands or the dominant hand? Additionally, considering that grip strength was identified as a significant factor in the Sobol index analysis and highlighted as a basis for therapeutic intervention, do you believe tha

Attachment

Submitted filename: Response to Reviewers.docx

pone.0325172.s014.docx (72KB, docx)

Decision Letter 1

Esedullah Akaras

Prediction of future aging-related slow gait and its determinants with deep learning and logistic regression

PONE-D-24-56052R1

Dear Dr. Deatsch,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the ‘Update My Information’ link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Esedullah Akaras

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

Reviewer #2: All comments have been addressed

Reviewer #3: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #2: No

Reviewer #3: No

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #2: Yes

Reviewer #3: Yes

**********

Reviewer #2: I would like to sincerely thank the authors for their comprehensive and thoughtful responses to the reviewer comments. It is evident that considerable effort has been made to address the concerns raised during the initial review. The revised manuscript demonstrates substantial improvement, both in terms of clarity and scientific rigor. The additional explanations, methodological clarifications, and textual revisions have significantly strengthened the work’s contribution to the field. In particular, the enhancements to the Introduction and Limitations sections, as well as the added justifications regarding model choices and variable selection, are commendable. The manuscript is now more robust, transparent, and accessible to a broader clinical and academic audience. I appreciate the authors' diligence and responsiveness throughout the revision process.

Reviewer #3: All changes made by the authors are adequate and addressed the concerns of the reviewer. Therefore, I consider that the paper, in its current version, meets the necessary requirements to be published in Plos One.

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

Reviewer #3: Yes: Eduardo Carballeira

**********

Acceptance letter

Esedullah Akaras

PONE-D-24-56052R1

PLOS ONE

Dear Dr. Deatsch,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission

* There are no issues that prevent the paper from being properly typeset

You will receive further instructions from the production team, including instructions on how to review your proof when it is ready. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few days to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Esedullah Akaras

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. Pearson correlation coefficient for each pair of input variables.

    The scale bar on the right shows the color coding from red (strongly positively correlated) to blue (strongly negatively correlated).

    (TIF)

    pone.0325172.s001.tif (463.6KB, tif)
    S1 Table. Key demographics of our dataset comparing the included and dropped data from the BLSA database.

    Values are median ± standard deviation. Bold text indicates values that were found to be significantly different.

    (TIF)

    pone.0325172.s002.tif (35.9KB, tif)
    S2 Fig. Analysis of potential dataset drift in the year of visit for all BLSA subjects used in this study.

    The lack of a clear trend across the means suggests minimal dataset drift over the year of visit.

    (TIF)

    pone.0325172.s003.tif (69.6KB, tif)
    S3 Fig

    a) Analysis of potential dataset drift in the year of birth for all BLSA subjects used in this study. The dotted line indicates a linear regression fit with regression coefficient of 0.0082. b) Comparison to a regression fit of age versus gait speed (decreasing left to right) with coefficient of 0.0083. The similar coefficients and shape of the data indicate minimal dataset drift due to year of birth.

    (TIF)

    pone.0325172.s004.tif (251.2KB, tif)
    S4 Fig. An example learning curve for the NN training on 10 Year 0.8 m/s classifier.

    The blue line indicates the loss of the training set, and the orange line represents the loss of the validation set. These loss curves show minimal overfitting after hyperparameter tuning.

    (TIF)

    pone.0325172.s005.tif (55KB, tif)
    S2 Table. Performance metrics for the classifiers trained with unbalanced datasets.

    The AUPRC value is the difference between the AUC and the no-skill AUC.

    (TIF)

    pone.0325172.s006.tif (44.5KB, tif)
    S3 Table. Performance metrics of classifiers using the NN with various balancing techniques.

    The AUPRC value is the difference between the AUC and the no-skill AUC. RUS = Random Undersampling, SMOTE = Synthetic Minority Oversampling Technique, ENN = Edited Nearest Neighbors. The asterisk indicates the classifier that did not need class balancing techniques because the data was already balanced.

    (TIF)

    pone.0325172.s007.tif (68.8KB, tif)
    S4 Table. Performance metrics of classifiers using the LR with various balancing techniques.

    The AUPRC value is the difference between the AUC and the no-skill AUC. RUS = Random Undersampling, SMOTE = Synthetic Minority Oversampling Technique, ENN = Edited Nearest Neighbors. The asterisk indicates the classifier that did not need class balancing techniques because the data was already balanced.

    (TIF)

    pone.0325172.s008.tif (68KB, tif)
    S5 Fig. Total order Sobol indices above 0.05 (significance) for each classifier using the NN.

    The color scale is shown in the inset, where darker colored cells indicate variables that are found to be significant more frequently across classifiers.

    (TIF)

    pone.0325172.s009.tif (390.9KB, tif)
    S6 Fig. Total order Sobol indices above 0.05 (significance) for each classifier using the NN.

    The color scale is shown in the inset in S5 Fig, where darker colored cells indicate variables that are found to be significant more frequently across classifiers.

    (TIF)

    pone.0325172.s010.tif (363.7KB, tif)
    S5 Table. Results of the sensitivity analysis performed to study the influence of various input variables.

    A: 15 original inputs; B: original inputs with height in place of BMI; C: age and sex alone; D: age, sex, BMI, grip strength, and cognitive score (all the quantitative original inputs); E: original inputs except age.

    (TIF)

    pone.0325172.s011.tif (19.5KB, tif)
    Attachment

    Submitted filename: Review Form.docx

    pone.0325172.s012.docx (16.9KB, docx)
    Attachment

    Submitted filename: Response to Reviewers.docx

    pone.0325172.s014.docx (72KB, docx)
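The captions of S2 to S4 Table define the reported AUPRC value as the difference between the AUC and the no-skill AUC. The following minimal Python sketch shows one way such a quantity could be computed. It assumes the area under the precision-recall curve is estimated with the step-wise average-precision sum, and that the no-skill curve is a horizontal line at the positive-class prevalence. Both are common conventions adopted here as assumptions; the captions do not specify the authors' exact computation. The function names and toy data are hypothetical.

```python
def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve.

    Sums precision at each rank where a positive is retrieved,
    weighted by the recall increment 1 / n_pos. Assumes distinct scores.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank examples from highest to lowest predicted score.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    true_positives = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            ap += (true_positives / rank) / n_pos
    return ap


def auprc_above_no_skill(y_true, scores):
    """AUPRC minus the no-skill baseline (positive-class prevalence)."""
    no_skill = sum(y_true) / len(y_true)
    return average_precision(y_true, scores) - no_skill


# Hypothetical toy labels (1 = slow gait) and predicted scores.
y_true = [1, 0, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

excess = auprc_above_no_skill(y_true, scores)
print(f"AUPRC above no-skill: {excess:.3f}")
```

Subtracting the prevalence makes values comparable across classifiers with different class ratios, since the raw area under a precision-recall curve depends on how rare the positive class is.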

    Data Availability Statement

    The data that support the findings of this study are available from the BLSA (NIA) at this link (https://www.blsa.nih.gov/), but restrictions apply to the availability of these data. These restrictions are due to ethical considerations for informed consent to share human data. Consent to share data publicly was not included in the BLSA consent until recently, therefore most of the measurements used in these analyses are restricted by the IRB to “permissioned” access only. Analysis plans must be approved prior to gaining access to the data and a BLSA investigator must be included in the research team. Follow the instructions here (https://blsa.nih.gov/blsa-data-use) to register an account and request access to the data.


    Articles from PLOS One are provided here courtesy of PLOS
