Nature Communications. 2025 Jul 4;16:6188. doi: 10.1038/s41467-025-61292-1

Utility of synthetic musculoskeletal gaits for generalizable healthcare applications

Yasunori Yamada 1,2,, Masatomo Kobayashi 1,2, Kaoru Shinkawa 1,2, Erhan Bilal 3, James Liao 4, Miyuki Nemoto 2, Miho Ota 2, Kiyotaka Nemoto 2, Tetsuaki Arai 2
PMCID: PMC12227639  PMID: 40615372

Abstract

Deep-neural-network-based artificial intelligence enables quantitative gait analysis with commodity sensors. However, current gait-analysis models are usually specialized for specific clinical populations and sensor settings due to the limited size and diversity of available datasets. We propose an approach that involves using synthetic gaits generated using a generative model learned via physics-based simulation with a broad spectrum of musculoskeletal parameters and evaluated its utility for data-efficient generalization of gait-analysis models across different clinical populations and sensor settings. The model trained solely on synthetic data estimates gait parameters with comparable or superior performance compared with real-data-trained models specialized for specific populations and sensor settings. Pre-training on synthetic data with self-supervised learning consistently enhances model performance and data efficiency in adapting to multiple gait-based downstream tasks. The results indicate that our approach offers an efficient means to augment data size and diversity for developing generalizable healthcare applications involving sensor-based gait analysis.

Subject terms: Neurological disorders, Diagnostic markers, Predictive markers, Machine learning


Automated gait assessment models are usually specialized to clinical populations and sensor settings due to the limited size and diversity of available datasets. Simulation-based synthetic data of musculoskeletal gaits boosts dataset size and diversity, enabling more accurate and versatile gait analysis models that can be applied across diverse settings.

Introduction

Gait disorders are common in individuals with neurological disorders1–11; thus, gait assessment has versatile utility for detecting diseases1–3, tracking progression3,4, planning treatments and assessing treatment response3,5,6, and predicting disease onset and progression7,8. Patients with Parkinson’s disease (PD) have altered spatiotemporal gait parameters (e.g., gait speed, step length, step time)1, some evident in the early and even prodromal disease stages3, and these alterations are associated with disease progression4. Children with cerebral palsy (CP) show complex and heterogeneous alterations in the kinematic patterns of the lower and upper body during gait, and gait assessments are used in diagnostics and treatment planning5. Beyond these neurological disorders with prominent movement disorders, altered gait parameters are also observed in prodromal stages of dementia9,10. Gait alteration can precede cognitive decline8,11, suggesting its clinical utility as an easily attainable predictor of future onset of dementia and cognitive impairment8,11. Gait assessments, however, are primarily conducted either through visual observation or with specialized equipment (e.g., optical motion capture). While the former lacks objectivity, the latter poses challenges in cost, time, and required expertise, hindering the adoption of gait assessments in large-scale studies and clinical routines.

Deep neural network (DNN)-based machine learning (ML) methods have shown tremendous potential for movement analysis in healthcare12–23. ML models make quantitative gait analysis possible with commodity sensors, such as single-camera videos and wearable accelerometers12–14, enabling easy and frequent measurements not only in routine clinical examination but also in home and community settings. Despite these methodological advancements and the potential transformative impact on both research and clinical practice1,7,13,15–17, to the best of our knowledge, none has reached widespread adoption, due at least partially to two problems. The first is that prior ML models were often trained for specific populations (e.g., children with CP13 or adults with PD12) and specific sensor settings (e.g., videos recorded with a fixed camera from the front or side13,18 and accelerometers on the waist19,20). This limitation can degrade model performance in clinical populations or sensor settings differing from those represented in the training dataset, posing a significant barrier to widespread adoption. The second is that although the generalizability and scalability of ML models depend on the size and diversity of the training data24,25, such exhaustive datasets labeled with ground-truth measures (e.g., optical-motion-capture-based gait parameters as well as established clinical measures of disease severity and their longitudinal changes) are rarely available21,23. This is partially due to the practical difficulties in curating and sharing data involving personal and health information24. Sharing sensitive data, such as videos and health-related conditions, requires careful consideration due to the regulated nature of protected health information and privacy concerns.
Even pre-processed data, such as body keypoint coordinates during gaits, as well as trained generative models, such as generative adversarial networks (GANs), involve the risk of information leakage24. Acquiring huge amounts of labeled data for each new clinical population or sensor setting may be the most valid approach but can be labor intensive and not scalable.

To address the issues of generalizability and scalability, synthetic-data generation and self-supervised learning may provide promising solutions. Synthetic data are increasingly used for developing applications in medicine and healthcare as a viable complement or alternative to real data, particularly when data collection is difficult or impractical24,26. For synthetically generating motion data, in addition to traditional data-augmentation methods such as noise injection and synthetic minority oversampling for time series (e.g., T-SMOTE27), deep generative models, such as GANs trained on real datasets28, and musculoskeletal simulation based on motion-capture data16,29,30 have been used. These methods, for example, use motion-capture data to generate new synthetic marker trajectories and to estimate unmeasured sensor data such as accelerometer signals, and the simulated data are then used for training ML models with supervised and self-supervised learning16,28–32. These methods have improved model performance, but their application is often limited to the specific population or sensor setting included in the dataset used for synthetic-data generation. Recent studies proposed methods combining such deep generative models and musculoskeletal simulation to generate diverse synthetic gaits through reinforcement learning in physics-based simulation33,34. A deep generative model with musculoskeletal simulation learns biologically plausible muscle controls for a high-dimensional continuous domain of anatomical conditions, enabling the generation of realistic gaits for specific anatomical conditions and desired gait conditions (e.g., speed and step length). Interestingly, abnormal parameters for anatomical conditions, such as those associated with muscle deficits (e.g., weakness and contracture), can produce atypical gaits, for example, those similar to the crouch gait observed in children with CP33.
We hypothesize that these diverse synthetic gaits generated in simulation can increase and complement the diversity of training data for developing generalizable gait-analysis models across various clinical populations and sensor settings. The other promising solution, self-supervised learning, trains DNN models to learn meaningful representations without labels and has been successfully used for developing data-efficient generalizable models capable of being fine-tuned for various downstream tasks35. The generalizability of pre-trained models depends on the size and diversity of training data24,25. Even for gait-based applications whose outputs are unavailable in simulation, we hypothesize that synthetic gaits can be useful for learning general-purpose feature representations through self-supervised learning.

In this work, we propose an approach using synthetic gaits generated with the deep generative model learned in the musculoskeletal simulation for developing gait-analysis models (Fig. 1). Incorporating diverse synthetic gaits with a broad spectrum of musculoskeletal parameters including abnormal muscle conditions (Fig. 1a) and diverse synthetic sensor data emulating various settings (Fig. 1b) enhances the generalizability of gait-analysis models across different clinical populations and sensor settings. We evaluated the utility of our approach for data-efficient generalization of gait-analysis models using single-camera videos or wearable accelerometers across different clinical populations (CP, PD, and dementia) and different sensor settings (camera viewpoint/movement or accelerometer placement). We primarily focus on single-camera videos as a promising source, given their ability to capture whole-body gait patterns and their easy-to-use, ubiquitous nature, with the dementia and CP datasets comprising 9247 videos collected from 1128 unique individuals (Supplementary Tables 1–2). We also present the results of using wearable accelerometers to demonstrate the applicability of our approach to different sensors, with the dementia and PD datasets comprising 2871 wearable-accelerometer data samples collected from 196 unique individuals (Supplementary Tables 3–5). Specifically, we evaluated the utility of our approach for two scenarios in which the outputs of gait-analysis models can or cannot be simulated. For the first scenario, we show that the model trained on synthetic data with supervised learning enables the estimation of gait parameters across different clinical populations and sensor settings, even without using any real data for model training (Fig. 1c, d).
For the second scenario, we demonstrate the data-efficient generalizability of a pre-trained model on synthetic data with self-supervised learning across different downstream tasks involving different clinical populations and different sensor settings (Fig. 1e, f). We have made the following key contributions in this work leveraging current generative models and deep learning architectures: (i) in contrast to other data-augmentation methods including generative models and musculoskeletal simulations that exploit real data samples16,28–32, we propose a complementary approach that uses diverse synthetic gaits generated using a simulation-learned generative model, eliminating the need for real data access; and (ii) through extensive experimental evaluations on multiple datasets and sensor modalities, we demonstrate the utility of our approach in improving model performance and data efficiency, highlighting its potential to facilitate the development of generalizable gait-analysis models across different clinical populations and sensor settings.

Fig. 1. Schematic illustrations for our approach using synthetic musculoskeletal gaits for developing generalizable gait-analysis models.


a Synthetic gaits are generated with varying parameters related to gait and anatomical conditions as inputs for deep generative model that learns muscle control of human gaits through deep reinforcement learning and full-body musculoskeletal model in physics-based simulation. Gaits are simulated to be as diverse as possible by using parameters that include abnormal parameters. b Synthetic sensor data during simulated gaits are generated in multiple sensor settings to emulate real data collected in various sensor settings. c, d Example using synthetic video data for developing gait-parameter-estimation or muscle-activity-estimation model with supervised learning. Model is applied to real data from different clinical populations and sensor settings for inference. e, f Example using synthetic video data for model pre-training with self-supervised learning. Pre-trained model is then fine-tuned with real data to each downstream task and applied to separate real data. See Methods for more detailed descriptions. PD control, proportional-derivative control; RL algorithm, reinforcement learning algorithm.

Results

Model trained on synthetic data with supervised learning

We compared the performance of models trained solely on synthetic data with models trained on real data of specific clinical populations and sensor settings, with the model architectures remaining identical. Specifically, we used the following commonly used DNN model architectures: fully convolutional network36,37, residual network (ResNet)36,37, and transformer38, all of which have demonstrated their utility for time-series analysis36–38. Under both conditions, the models were trained to estimate gait parameters from the time series of 2D body poses extracted from each single video. Specifically, each synthetic-data-trained model was trained solely on synthetic data and then tested on each of the three subsets of real data representing different clinical populations and camera settings: (i) front-view videos captured with a fixed camera in the dementia dataset, (ii) back-view videos with a fixed camera in the dementia dataset, and (iii) side-view videos with a rotating camera in the CP dataset (Fig. 2a). In contrast, the real-data-trained models were separately trained on each subset of real data and evaluated using a subject-wise cross-validation procedure by splitting the subset into training, validation, and test sets (Fig. 2b). By changing the amount of training data for the real-data-trained models, we investigated data efficiency, i.e., how many real-data samples were needed to achieve the performance of the model trained solely on synthetic data — in other words, how many real-data samples could be saved by using synthetic data.
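This first scenario (train on synthetic labels only, infer zero-shot on real data with the same input layout) can be sketched as follows. A ridge regressor on flattened (T × J × 2) pose clips stands in for the paper's ResNet, and random clips with a linear label rule stand in for the simulator output; all shapes and the label rule are illustrative assumptions, not the paper's pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
T, J = 50, 13                        # frames per clip, joints per frame (assumed)
D = T * J * 2                        # flattened (T x J x 2) pose features
w_true = rng.normal(size=D) / np.sqrt(D)

def simulate_batch(n):
    """Stand-in for the musculoskeletal simulator: pose clips with known
    scalar labels (e.g., gait speed in m/s)."""
    X = rng.normal(size=(n, D))
    return X, X @ w_true

# Train on synthetic data only...
X_train, y_train = simulate_batch(4000)
lam = 1e-2                           # ridge penalty
w_hat = np.linalg.solve(X_train.T @ X_train + lam * np.eye(D),
                        X_train.T @ y_train)

# ...then apply zero-shot to unseen clips with the same layout.
X_test, y_test = simulate_batch(128)
pred = X_test @ w_hat
```

The point is the pipeline shape: no real data enter training, and the deployed model sees only pose sequences in the agreed layout.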

Fig. 2. Experiments on models for estimating gait parameters from single videos.


a, b Schematic illustrations of development and evaluation of synthetic-data-trained model and real-data-trained model, respectively. Synthetic-data-trained model was applied to all three test sets of real data for different clinical populations and sensor settings. Real-data-trained model was developed and internally evaluated separately for each of three sets. c Estimation performance of gait parameters. Best synthetic-data-trained model based on ResNet architecture was applied to real data without any additional training on real data. Best performance among real-data-trained models with different architectures is shown. Error bars show the mean ± 95% CI calculated over independent models trained on different synthetic data (n = 5) and cross-validation iterations for models trained on real data (n = 10). d Data efficiency of synthetic-data-trained model. Data efficiency was measured by estimating amount of training data that real-data-trained models require to achieve the same performance as the synthetic-data-trained model. Data are presented as means, with shaded areas representing 95% CI calculated over cross-validation iterations (n = 10). Source data are provided as a Source Data file. CP cerebral palsy, CI confidence interval.

The best model trained solely on synthetic data using the ResNet architecture consistently showed reasonable performance across all three subsets of real data for all three gait parameters, with performance comparable to that of models trained on a certain amount of real data or superior to that of all real-data-trained models built on the range of real data available for this study (Fig. 2c, d). For instance, when the synthetic-data-trained model was applied to the front-view videos in the dementia dataset, the model could estimate gait speed, step length, and step time with a Pearson’s correlation coefficient of 0.88, 0.80, and 0.97, respectively (see Supplementary Result 1 and Supplementary Tables 6–8 for the full results including error metrics). In terms of data efficiency, the performances for gait speed and step length were comparable to the estimated performances of real-data-trained models trained on 42% (299 videos) and 33% (233 videos) of data, respectively, as illustrated in Fig. 2d. The performance for step time was superior to those of all real-data-trained models trained on 20 to 95% of the total 714 videos. The same synthetic-data-trained model similarly demonstrated reasonable performance for estimating gait parameters from both the back-view videos in the dementia dataset and the side-view videos with a rotating camera in the CP dataset (Fig. 2c). Overall, the synthetic-data-trained model could estimate gait parameters with comparable or superior performance to the specialized real-data-trained models trained on 33 to 95% of data (233 to 1287 videos; Fig. 2d). In contrast, when the real-data-trained models were applied to other clinical populations or other sensor settings different from those represented in the training dataset, model performance significantly degraded (Supplementary Result 2).
These results indicate the potential of our approach using synthetic musculoskeletal gaits for improving both data efficiency and model generalizability across different clinical populations and sensor settings. We also observed consistent results when using 3D body poses instead of 2D body poses (Supplementary Result 3 and Supplementary Table 9). An additional exploratory result on the potential applicability of the synthetic-data-trained model to the estimation of other clinically relevant gait parameters (i.e., variability measures, step width, knee joint angle) is also available in Supplementary Result 4 and Supplementary Table 10.
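For reference, the accuracy metric reported throughout (Pearson's correlation coefficient between estimated and ground-truth gait parameters) is:

```python
import numpy as np

def pearson_r(y_true, y_pred):
    """Pearson's correlation coefficient between two 1D arrays."""
    yt = np.asarray(y_true, dtype=float) - np.mean(y_true)
    yp = np.asarray(y_pred, dtype=float) - np.mean(y_pred)
    return float(yt @ yp / (np.linalg.norm(yt) * np.linalg.norm(yp)))
```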

We also analyzed accelerometer data to demonstrate the applicability of our approach using synthetic gaits to other sensor modalities. The experimental setup was the same as that for video-based gait analysis except that models were trained to estimate gait speed from the time-series data of a single wearable accelerometer (Fig. 3a, b). The accelerometer data from the two datasets represent the following four combinations of clinical population and sensor setting: (i) waist-worn or (ii) ankle-worn accelerometer in the dementia accelerometer dataset and (iii) chest-worn or (iv) thigh-worn accelerometer in the PD accelerometer dataset. Similar to the results on video data, the synthetic-data-trained model for accelerometer data consistently demonstrated reasonable performance across all four sensor placements involving different clinical populations (Fig. 3c; see Supplementary Result 5, Supplementary Fig. 1, and Supplementary Tables 11–12 for the full results). An additional subgroup analysis revealed that the model performance on data collected in clinical practice was comparable to that on data collected in laboratory settings (Supplementary Method 1 and Supplementary Table 13). These results also underscore the data-efficient generalizability of using synthetic gaits for estimating gait parameters.
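One common way to emulate a body-worn accelerometer from simulated kinematics (an illustrative sketch, not necessarily the authors' implementation) is to differentiate the sensor-site trajectory twice and add gravity, since accelerometers measure specific force. The sampling rate and the toy sinusoidal trajectory below are assumptions:

```python
import numpy as np

fs = 100.0                            # assumed sampling rate, Hz
t = np.arange(0, 5, 1 / fs)           # 5 s of simulated walking

# Toy sensor-site trajectory (m): lateral sway at step frequency,
# vertical bounce at twice the step frequency, as in typical gait.
pos = np.stack([0.04 * np.sin(2 * np.pi * 2 * t),
                np.zeros_like(t),
                1.0 + 0.02 * np.sin(2 * np.pi * 4 * t)], axis=1)

# Second finite difference gives linear acceleration per axis...
acc = np.gradient(np.gradient(pos, 1 / fs, axis=0), 1 / fs, axis=0)
# ...and accelerometers additionally sense gravity (specific force).
acc[:, 2] += 9.81
```

A real emulation would also rotate the signal into the (moving) sensor frame and add sensor noise; both are omitted here for brevity.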

Fig. 3. Experiments on models for estimating gait speed from single-accelerometer data.


a, b Schematic illustrations of development and evaluation of synthetic-data-trained model and real-data-trained model, respectively. Synthetic-data-trained model was applied to all four test sets of real data for different clinical populations and sensor settings. Real-data-trained model was developed and internally evaluated separately for each of the four sets. c Data efficiency of the synthetic-data-trained model. Data efficiency was measured by estimating amount of training data that real-data-trained models require to achieve the same performance as the synthetic-data-trained model. Data are presented as means, with shaded areas representing 95% CI calculated over cross-validation iterations (n = 10). Source data are provided as a Source Data file. PD Parkinson’s disease, CI confidence interval.

To further demonstrate the versatility of our approach in using multiple sensor modalities, we examined the feasibility of cross-modal estimation. We trained a model on synthetic data to estimate muscle-activity sequences from 2D-body-pose sequences and evaluated the performance of zero-shot inference of electromyography (EMG) activity from real video data in the dementia dataset (Fig. 4a; see Supplementary Method 2 for details). The results indicate that the model, trained solely on synthetic data, achieved noteworthy performance in estimating EMG activity across 14 lower-limb muscles for both front-view and back-view videos, with Pearson’s correlation coefficients ranging from 0.61 to 0.80 (average: 0.72; Fig. 4b, c; see Supplementary Table 14 for the full results). The finding suggests that leveraging synthetic musculoskeletal gaits could provide a viable method for in-depth motion analysis by using readily available modality data.

Fig. 4. Experiments on synthetic-data-trained model for cross-modal estimation from single videos to muscle activities.


a Schematic illustrations of development and evaluation of synthetic-data-trained model. b Example of muscle-activity estimations from a single-camera video. c Estimation performance of EMG activities on synthetic-data-trained model based on Transformer architecture. The model was applied to real data without any additional training on real data. Error bars show the mean ± 95% CI calculated over independent models trained on different synthetic data (n = 5). Source data are provided as a Source Data file. EMG electromyography, CI confidence interval, BF biceps femoris, VM vastus medialis, RF rectus femoris, PL peroneus longus, TA tibialis anterior, SOL soleus, GAS medial gastrocnemius, L left, R right.

Model pre-trained on synthetic data with self-supervision

We next investigated the utility of our approach in the second scenario, in which the model pre-trained on synthetic data is adapted to each downstream task. We investigated whether and to what extent pre-training with synthetic data can improve model performance by comparing the pre-trained model with models trained from scratch. Specifically, we studied the following downstream tasks: cross-sectional tasks for classifying dementia-diagnostic status and classifying clinical status relevant to the severity of CP, as well as a longitudinal task for predicting cognitive decline over a three-year period. The dementia-diagnostic status consists of cognitively unimpaired (CU) older adults and patients with mild cognitive impairment (MCI) and dementia. The severity of CP is measured with the gross motor function classification system (GMFCS), which categorizes patients with CP into five levels of gross-motor skill, with level V the most severe; our analyzed dataset contained three levels (I, II, and III). Regarding the DNN architectures, we used state-of-the-art models for time-series classification in addition to the above three basic model architectures: TimesNet39, Non-stationary Transformer40, and Informer41, as well as representative general sequence models: LSTM42 and Mamba43. We also incorporated TS2Vec44 as a pre-training model and subsequently examined the impact of pre-training on performance.
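TS2Vec's full objective contrasts representations hierarchically over time scales and instances; as a minimal sketch of the contrastive core common to such pre-training, an instance-wise InfoNCE loss can be written as follows. The embedding sizes and the toy "views" are assumptions; in practice the embeddings would come from an encoder applied to two augmentations of the same gait sequence.

```python
import numpy as np

def info_nce(z1, z2, tau=0.1):
    """Instance-wise InfoNCE: row i of z1 (one view of sample i) should
    match row i of z2 (the other view) rather than any other sample."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = (z1 @ z2.T) / tau                     # pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    # positives sit on the diagonal of the similarity matrix
    return float(-np.log(np.diag(p) + 1e-12).mean())

rng = np.random.default_rng(0)
z = rng.normal(size=(64, 32))                               # toy embeddings
loss_matched = info_nce(z, z + 0.01 * rng.normal(size=z.shape))  # aligned views
loss_random = info_nce(z, rng.normal(size=z.shape))              # unrelated pairs
```

Minimizing this loss pulls embeddings of augmented views of the same synthetic walk together, which is what lets the pre-trained encoder transfer to label-scarce downstream tasks.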

The pre-training with synthetic data enhanced model performance across all tasks, and the pre-trained models achieved the highest performance. For the cross-sectional, binary-classification task, the best pre-trained model achieved an area under the receiver operating characteristic curve (AUC) of 0.866 (95% confidence interval [CI]: 0.846 to 0.885) for CU versus dementia, with +0.044 improvement over the best model without pre-training (P = 1.1 × 10⁻²; Fig. 5a). For the cross-sectional, multi-class classification tasks, the best pre-trained models achieved a balanced accuracy of 54.4% (95% CI: 52.3 to 56.5%) for CU versus MCI versus dementia and 73.8% (95% CI: 73.4 to 74.2%) for the GMFCS level on CP, with improvements of +4.4 and +4.0 percentage points over the best models without pre-training (P = 2.7 × 10⁻³ and P = 2.8 × 10⁻¹⁰; Fig. 5b, c). Regarding the longitudinal prediction task, the best pre-trained model could detect individuals with cognitive decline over the three-year period with an AUC of 0.732 (95% CI: 0.707 to 0.757), with +0.136 improvement over the best comparison model without pre-training (P = 4.1 × 10⁻⁷; Fig. 5d). Pairwise comparisons for each model architecture revealed consistent improvements by pre-training across all eight model architectures in all four downstream tasks (Fig. 5e–h), supporting the generality of the benefits of pre-training on synthetic gaits. See Supplementary Result 3 and Supplementary Fig. 2 for the result using 3D body poses.

Fig. 5. Experiment results on video-based models for downstream tasks.


a–d Performances on four downstream tasks using single videos. Error bars show the mean ± 95% CI calculated over cross-validation iterations (n = 5, 10, or 20). e–h Pairwise comparisons for each model architecture without and with pre-training using synthetic data (n = 8 model architectures). i–l Data efficiency of model with pre-training using synthetic data. Data efficiency was measured by estimating amount of real data (subset of training set) required for fine-tuning the best pre-trained model to achieve best performance of models without pre-training. Data are presented as means, with shaded areas representing 95% CI calculated over cross-validation iterations (n = 10). Source data are provided as a Source Data file. AUC area under receiver operating characteristic curve, CU cognitively unimpaired, MCI mild cognitive impairment, GMFCS gross motor function classification system, CP cerebral palsy, FCN fully convolutional network, NST non-stationary transformer, CI confidence interval.

We then investigated the data efficiency of the model pre-trained on synthetic data by examining to what extent pre-training reduces the amount of real training data required to achieve the same performance as the best model without pre-training. The pre-trained model consistently showed notable data efficiency across all downstream tasks. For the cross-sectional classification task for CU versus dementia, for example, the pre-trained model needed only 43% of the training data (Fig. 5i), a reduction of 1149 videos, to achieve the same performance as the best model without pre-training. Overall, the pre-trained model matched the performance of the best models without pre-training with 14 to 61% of real data (a reduction of 290 to 1873 videos), averaging 38% (a reduction of 1179 videos; Fig. 5i–l). These results underscore the utility of pre-training with synthetic gaits in reducing the need for training data for multiple downstream tasks involving different clinical populations and sensor settings.
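Data efficiency as reported here amounts to reading a learning curve backwards: find the smallest training fraction at which the scratch-trained model's curve reaches the pre-trained model's score. A minimal sketch with hypothetical curve values (the fractions and AUCs below are made up for illustration):

```python
def fraction_needed(fractions, scores, target):
    """Smallest training fraction at which a monotone learning curve
    reaches `target`, linearly interpolating between measured points.
    Returns None if the curve never reaches the target."""
    for i in range(1, len(fractions)):
        s0, s1 = scores[i - 1], scores[i]
        if s0 < target <= s1:
            f0, f1 = fractions[i - 1], fractions[i]
            return f0 + (f1 - f0) * (target - s0) / (s1 - s0)
    return None

# Hypothetical scratch-model learning curve (fraction of data -> AUC)
fracs = [0.1, 0.2, 0.4, 0.6, 0.8, 1.0]
aucs = [0.70, 0.76, 0.80, 0.82, 0.825, 0.83]
```

For example, if a pre-trained model already scores 0.78 AUC, `fraction_needed(fracs, aucs, 0.78)` reports how much real data the scratch model would need to catch up.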

We also analyzed accelerometer data to further corroborate the results on the pre-trained model with other sensor modalities. We studied the following two downstream tasks using chest-worn accelerometer data: cross-sectional tasks for classifying PD and controls and for classifying medication states in patients with PD (ON and OFF medication states). We found results for model performance and data efficiency consistent with those on video data. In both tasks, pre-training on synthetic data with self-supervised learning improved the performance for each model architecture, and the pre-trained models achieved the highest performance, surpassing the best models without pre-training by +0.048 and +0.084 in AUC (Fig. 6a–d). The pre-trained models matched the best performance of the models without pre-training using only 21 and 28% of the total training data (Fig. 6e, f).

Fig. 6. Experiment results on accelerometer-based models for downstream tasks.


a, b Performances on two downstream tasks using chest-worn accelerometer data. Error bars show the mean ± 95% CI calculated over cross-validation iterations (n = 10). c, d Pairwise comparisons for each model architecture without and with pre-training using synthetic data (n = 8 model architectures). e, f Data efficiency of model with pre-training using synthetic data. Data efficiency was measured by estimating amount of real data (subset of training set) required for fine-tuning the best pre-trained model to achieve best performance of models without pre-training. Data are presented as means, with shaded areas representing 95% CI calculated over cross-validation iterations (n = 10). Source data are provided as a Source Data file. AUC area under receiver operating characteristic curve, PD Parkinson’s disease, FCN fully convolutional network, NST non-stationary transformer, CI confidence interval.

Lastly, we investigated the efficacy of using multimodal synthetic data for downstream tasks, where models separately pre-trained on synthetic video and accelerometer data were fused and fine-tuned for the downstream task. The results indicate that the multimodal pre-trained model surpassed the performance of both single-modal pre-trained models and the multimodal model without pre-training, indicating the benefits of using multimodal synthetic data. For the full results, see Supplementary Result 6 and Supplementary Fig. 3.

Impact of synthetic-data diversity on model generalizability

Across all experiments, we consistently observed the utility of synthetic gaits for achieving data-efficient generalization of gait-analysis models. With our approach, we generate diverse synthetic data by including (i) atypical gaits simulated with abnormal muscle conditions and (ii) multiple sensor settings. We investigated the contributions of these two factors to the resultant model performance. We hypothesize that atypical synthetic gaits help capture the characteristics of diverse human gaits, exemplified by those of patients with neurological disorders, thereby improving model generalizability across clinical populations. The inclusion of synthetic sensor data representing multiple sensor settings could likewise enable the model to learn associations between input sensor data and clinically relevant outputs that are robust to differences in sensor settings, leading to improved model performance across sensor settings. As gait parameters can have different distributions across clinical populations and disease severities2,4,9,10,45, the range of gait parameters represented in the training dataset could also be a key factor in developing a generalizable model for estimating gait parameters across clinical populations. Thus, we also examined this hypothesis by investigating the association between the range of gait parameters in synthetic gaits and the performance of zero-shot inference of gait parameters on real data.

We first investigated the association of atypical synthetic gaits with actual human gaits in our datasets. We estimated the probability-density function for vector representations of the synthetic gaits derived from the pre-trained model using a Gaussian mixture model46 after applying dimension reduction with uniform manifold approximation and projection (UMAP)47 (see Methods). We then investigated whether the real data in our datasets fall within 95% of the distribution of the synthetic gaits. Compared with the absence of the atypical synthetic gaits, their inclusion increased the proportion of real data falling within the distribution from 77.0% (95% CI: 75.5 to 78.4%) to 98.0% (95% CI: 97.7 to 98.4%) (Fig. 7). When we calculated the Fréchet inception distance48 for measuring the difference between distributions for synthetic and real data, the incorporation of the atypical synthetic gaits reduced the distance from 19.7 (95% CI: 18.6 to 20.8) to 11.2 (95% CI: 10.1 to 12.2). Data from patients with greater disease severity tended to fall within the distribution of the atypical synthetic gaits more often than those from healthy controls or patients with lower severity. The proportion included in the distribution of the atypical synthetic gaits was 23.7% (95% CI: 22.6 to 24.8%) for the patients with dementia, which was higher than 20.4% (95% CI: 19.1 to 21.8%) for the patients with MCI and 19.2% (95% CI: 17.8 to 20.6%) for the older CU adults (P = 1.3 × 10⁻³ and P = 4.6 × 10⁻⁶). Regarding patients with CP, the proportion was 33.7% (95% CI: 30.8 to 36.5%) for the patients with GMFCS level III, which was higher than 26.2% (95% CI: 24.3 to 28.1%) and 25.6% (95% CI: 23.6 to 27.5%) for those with GMFCS levels II and I (P = 1.9 × 10⁻⁵ and P = 3.2 × 10⁻⁶). These results suggest that atypical synthetic gaits could help capture the characteristics of diverse human gaits, particularly those of patients with neurological disorders.
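The 95%-containment analysis can be sketched as follows. For brevity this fits a single Gaussian to 2D embeddings rather than the paper's Gaussian mixture in UMAP space, and the embeddings are random stand-ins; with one Gaussian, "within 95% of the distribution" reduces to a Mahalanobis-distance threshold at the chi-squared 95% quantile.

```python
import numpy as np

def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance of each row of x to (mean, cov)."""
    d = x - mean
    return np.einsum('ij,jk,ik->i', d, np.linalg.inv(cov), d)

def within_95(real_emb, synth_emb):
    """Fraction of real 2D embeddings inside the 95% density contour of a
    single Gaussian fit to the synthetic-gait embeddings."""
    mu = synth_emb.mean(axis=0)
    cov = np.cov(synth_emb, rowvar=False)
    # chi-squared 95% quantile with 2 degrees of freedom is about 5.991
    return float((mahalanobis_sq(real_emb, mu, cov) <= 5.991).mean())

rng = np.random.default_rng(0)
synth = rng.normal(size=(2000, 2))   # stand-in synthetic-gait embeddings
real = rng.normal(size=(1000, 2))    # stand-in real-gait embeddings
```

Here `within_95(real, synth)` is close to 0.95 because the stand-in distributions coincide; shifting the real embeddings away drives it toward zero, mirroring how the reported proportion drops when atypical gaits are excluded.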

Fig. 7. Vector representation of pose sequences extracted from videos in UMAP space.


a, b Clustering results without and with atypical synthetic gaits, respectively. Contours represent density estimation of clusters resulting from using Gaussian mixture model for synthetic gaits: fixed-camera/front-view (F), fixed-camera/back-view (B), and rotating-camera/side-view (R) with normal (N) and abnormal (A) parameters (FN, FA, BN, BA, RN, and RA, respectively). Each dot represents single video of real data, and color represents cluster group: black and red indicate data that fall within 95% of Gaussian distribution of synthetic gaits with normal and abnormal parameters, respectively, and yellow indicates data that do not fall into any distributions of synthetic gaits. UMAP uniform manifold approximation and projection.

We then conducted ablation experiments on the atypical synthetic gaits and the synthetic sensor data for multiple sensor settings to examine the impact of including both types of data on model performance. After removing these types of synthetic data from the training set, we conducted data augmentation with the remaining synthetic data by using standard methods (i.e., jittering, rotation, and translation of cameras) to mitigate the effect of differences in the total amount of training data. We observed performance degradation in the synthetic-data-trained model with ablation (Fig. 8a; see Supplementary Table 15 for the full result). Regarding the gait-parameter estimation on front-view videos in the dementia dataset, the ablation of atypical synthetic gaits degraded performance by up to −3.7%; that of sensor settings other than those represented in the test set (i.e., ablation of back-view and side-view videos) degraded performance by up to −7.2%; and that of both degraded performance by up to −22.1%. We observed similar performance degradation when adapting the synthetic-data-pre-trained model to downstream tasks. For instance, the ablation of atypical synthetic gaits degraded multi-class classification performance for the dementia diagnostic status by −5.2%; that of irrelevant sensor settings degraded performance by −3.7%; and that of both degraded performance by −6.3%. These results indicate that the incorporation of each type of synthetic data improves model performance, providing empirical support for the above-mentioned hypotheses.
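The standard augmentation methods named above (jittering, rotation, and translation of cameras) can be sketched for a 2D keypoint sequence as follows. The parameter values (jitter scale, rotation range, shift range) are illustrative assumptions, not those used in the study.

```python
import numpy as np

def augment_keypoints(seq, rng, jitter_std=0.01, max_rot_deg=10.0, max_shift=0.05):
    """Augment a (T, K, 2) keypoint sequence with an in-plane rotation
    about the sequence centroid, a global translation (emulating a small
    camera shift), and Gaussian jitter on every keypoint."""
    theta = np.deg2rad(rng.uniform(-max_rot_deg, max_rot_deg))
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    center = seq.reshape(-1, 2).mean(axis=0)
    out = (seq - center) @ rot.T + center          # rotation about centroid
    out = out + rng.uniform(-max_shift, max_shift, size=2)   # camera translation
    out = out + rng.normal(0.0, jitter_std, size=seq.shape)  # per-keypoint jitter
    return out
```

Because the rotation and translation are rigid transforms, the gait geometry (e.g., inter-keypoint distances) is preserved up to the jitter noise, which is what makes these augmentations label-safe.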

Fig. 8. Effects of diversity in synthetic data on model performance.


a Ablation results on videos in dementia dataset. Normalized performance indicates relative performance divided by that of full model. Error bars show the mean ± 95% CI calculated over independent models trained on different synthetic data (n = 10) and cross-validation iterations for models on downstream tasks (n = 10). b Gait-speed-estimation performance of synthetic-data-trained models on front-view videos in dementia dataset. Models were trained on synthetic data with different ranges of gait speed and with different amounts of training data. Data are presented as means, with shaded areas representing 95% CI calculated over cross-validation iterations (n = 15). Source data are provided as a Source Data file. CU cognitively unimpaired, MCI mild cognitive impairment, CI confidence interval.

Lastly, we investigated the contribution of the range of gait parameters represented in the synthetic gaits. Consistent with previous studies, we observed substantial differences in gait parameters across clinical populations and disease severities (Supplementary Fig. 4 and Supplementary Tables 1–5). The model trained on a specific clinical population also performed worse when applied to other clinical populations, and the degree of deterioration was greater for the groups with larger differences in gait characteristics (Supplementary Result 7), indicating that the differences across clinical populations can negatively impact model generalizability when training data are restricted to a specific population. On the basis of these results, we subsequently investigated the contribution of the range of gait parameters represented in the synthetic gaits to generalizability across different clinical populations as an additional factor beyond the inclusion of atypical gaits, as indicated in the ablation experiment. Specifically, we limited the range of gait speed in the synthetic gaits and investigated its impact on the performance of synthetic-data-trained models on the videos in the dementia dataset. From the original synthetic dataset covering the entire range of gait speeds in the real data, we extracted two subsets by excluding the data with gait speeds below 1.0 and 1.2 m/s. Of note, the dementia dataset contained 15 and 40% of data with gait speeds below 1.0 and 1.2 m/s, respectively. Model performance improved as the amount of training data increased, while a wider range of gait speeds in the synthetic gaits generally yielded better performance (Fig. 8b). The impact of the range represented in the training data on performance tended to be large, particularly when the amount of training data was small.
This corroborated that the differences across clinical populations compromise model generalizability, particularly when real data for training are limited and restricted to a specific population. Overall, the results indicate that the range of gait parameters represented in the synthetic gaits, as well as the amount of synthetic data for training, could contribute to the generalizability of the model across different clinical populations, particularly those with different distributions of gait characteristics.

Comparisons with GAN-based synthetic gaits

To investigate how our simulation-based approach differs from other representative data-augmentation methods that exploit real data for data generation, we evaluated a GAN-based approach. On the downstream task of classifying CU and dementia, the use of GAN-based synthetic gaits for model training slightly improved the classification performance, with a +0.008 improvement in the AUC over the best model trained solely on real data, but the degree of improvement was smaller than that achieved using our synthetic gaits (i.e., +0.044).

To deepen our understanding of this difference, we investigated the associations of both types of synthetic gaits with unseen real data in the test dataset by calculating the distance to the nearest synthetic-data sample for each unseen real-data sample and compared these distances between the two synthetic-data-generation methods. The distance for synthetic gaits generated with our approach was smaller than that for the GAN-based approach (0.12 versus 0.19; P = 7.9 × 10⁻⁵). Although the difference may become smaller as the training data for the GAN increase, the results indicate that under data-limited conditions, our approach offers a viable solution for generating gait data as diverse as those observed in real gaits.
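The nearest-synthetic-sample metric used in this comparison can be sketched in a few lines; the averaging over real samples is how we read the described procedure, stated here as an assumption.

```python
import numpy as np
from scipy.spatial.distance import cdist

def mean_nn_distance(real, synth):
    """For each unseen real sample (rows of `real`), the Euclidean distance
    to its nearest synthetic sample (rows of `synth`), averaged over the
    real samples. Lower values mean the synthetic data lie closer to the
    real distribution."""
    return float(cdist(real, synth).min(axis=1).mean())
```

If every real sample coincides with some synthetic sample, the metric is exactly zero; it grows as synthetic coverage of the real data thins out.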

Discussion

The generalizability and scalability of ML models depend on the size and diversity of the training data24,25, but current gait-analysis models with commodity sensors remain constrained by the limited size and diversity of training data due to the practical difficulties of exhaustive data curation with ground-truth labels for various clinical populations and sensor settings. This limitation can degrade model performance in populations or settings differing from those represented in the training dataset, as demonstrated in this study. To address this challenge, we propose using synthetic gaits generated with a deep generative model learned via physics-based simulation with a broad spectrum of musculoskeletal and gait parameters for building gait-analysis models that generalize across different clinical populations and sensor settings. This study demonstrated the utility of our approach in two scenarios. In the first scenario, using synthetic data for supervised learning, the model trained solely on synthetic data exhibited data-efficient generalizability, enabling gait-parameter estimation for multiple clinical populations and sensor settings with no access to real data for training. To achieve the same performance as the synthetic-data-trained model, the real-data-trained models specialized for specific clinical populations and sensor settings required 233 to 1287 videos (or 122 to 581 accelerometer data samples), indicating that synthetic gaits can substantially reduce the amount of real data required for training. In the second scenario, using synthetic data for pre-training with self-supervised learning, pre-training on synthetic gaits consistently enhanced model performance in multiple downstream tasks, including cross-sectional classification for dementia diagnosis and severity grading of CP, as well as longitudinal prediction of cognitive decline.
Pre-training on synthetic gaits provided a basis for data-efficient model adaptation across all downstream tasks, achieving performance comparable to that of the model without pre-training while using, on average, 38 and 25% of the real video and accelerometer data samples, respectively. Taken together, we believe that the proposed approach, exploiting synthetic and diverse musculoskeletal gaits, holds considerable promise for accelerating the development and widespread adoption of generalizable gait-analysis tools in a broad range of healthcare applications.

The experiments on the effects of diversity in synthetic gaits revealed the key implications of our approach, particularly by demonstrating that the inclusion of a wide range of muscle and gait parameters in synthetic gaits helps capture the characteristics of diverse human gaits, including those of patients with neurological disorders, and improves model performance across different clinical populations. It should be noted that this does not suggest that the simulation can reproduce patients’ gait, nor that the musculoskeletal parameters modified in this study can be directly associated with their gait impairments. What this study demonstrates is that the use of synthetic gaits generated with a musculoskeletal model can be one solution for augmenting the diversity of training data for developing generalizable gait-analysis models across various clinical populations, in a manner different from typical data-augmentation techniques, including generative models trained on real data24. Although the mechanisms underlying gait impairments in neurological disorders have not been fully elucidated45,49, synthetic gait generation emulating diverse abnormal conditions in a biologically plausible manner can efficiently augment the diversity of training data to represent diverse human gaits.

Our results on the utility of synthetic musculoskeletal gaits for the two scenarios each have clinical implications. First, we demonstrated that the model trained solely on synthetic musculoskeletal gaits can estimate gait parameters with no access to real data for training, but this does not suggest that the utilization of synthetic data can replace real-data collection for developing gait-analysis models. Instead, we posit that synthetic gaits can be useful for complementing real data to augment the size and diversity of the training data, which is critical for improving the performance and generalizability of ML models, as demonstrated in this study. In the absence of additional available resources, synthetic gaits may provide practitioners with a means to augment their limited amount of real data, resulting in a gait-analysis model that performs reliably in the face of new clinical populations or modifications to sensor settings. We also argue that the proposed approach could enable quantitative gait assessments from retrospective sensor data stored in clinics without any explicit information on gait parameters. Such data may include videos recorded during gait tests for visual evaluation by clinicians and accelerometer data for daily-activity monitoring. For instance, observational gait assessment is part of the Movement Disorder Society-Unified Parkinson’s Disease Rating Scale (MDS-UPDRS), the most widely used measure in both clinical and research practices to assess the severity of motor symptoms in PD50. In patients with idiopathic normal pressure hydrocephalus, gait assessment is also commonly conducted to determine surgical candidacy6. Longitudinal studies have shown that accelerometer-derived physical-activity characteristics can predict future changes in cognitive function51 as well as future incidence of MCI or dementia22, although gait parameters have not been examined in these studies.
Even when data were collected with unique sensor settings differing from those used in available datasets, our approach could enable exploratory analysis of the associations of gait parameters with clinical outcomes by using synthetic sensor data to develop gait-parameter-estimation models, although the resultant model for gait-parameter estimation should be validated with real data before drawing conclusions. The synthetic-data-trained model could also be used for in-silico comparison and exploration of optimal sensor settings and sensing parameters (e.g., sampling rate) for prospective data collection.

As the second clinical implication, we also showed that models pre-trained on synthetic gaits with self-supervised learning consistently achieve superior performance and training efficiency in adapting to multiple downstream tasks representing different clinical populations and sensor settings. Although the performance on some downstream tasks should be further improved in practical terms, the models for identifying dementia and predicting cognitive decline from only gait videos show promising performance. Given that clinical-data acquisition along with ground-truth labels is often limited in scalability, this result holds promise for accelerating the development and adoption of gait analysis in a broad range of gait-related healthcare applications, particularly those involving longitudinal predictions or rare diseases. For instance, gait impairments have been shown to be a good predictor of cognitive decline and dementia progression8,11, but previous studies were often limited either in sample size or in the gait parameters investigated (e.g., mostly gait speed only). The proposed approach may help enhance prediction capability by enabling multifaceted quantification of gait impairments with commodity sensors such as a smartphone camera, as well as by complementing real data for model training. The complementary data can be particularly impactful for applications involving rare diseases with gait impairment7, in which the synthetic and diverse musculoskeletal gaits may help mitigate the challenges associated with data scarcity.
For example, a recent study on nine patients with Friedreich’s ataxia demonstrated that full-body gait features derived from a motion-capture system enable more precise longitudinal prediction of disease progression than traditional clinical assessments, suggesting the usefulness of sensor-based gait analysis to track personal disease trajectories as well as reduce the duration or size of clinical trials for disease-modifying therapies7. The utility of our approach using synthetic gaits in the context of rare diseases will be a promising focus of future work.

Our technical approach can be extended in several future directions. One promising direction may be the application of this approach to more general motion analysis beyond gait. For example, in-depth quantification of clinical physical examinations, such as sit-to-stand and peg tests, as well as analysis of activities of daily living, have demonstrated their potential to offer additional insights that aid in assessing and predicting disease conditions7,21,52. Combining our approach with generative models for muscle-based motion control to synthesize human behaviors other than gaits, e.g., reaching movements, in-hand manipulations, and dexterous hand-object manipulations53, would extend its utility. Likewise, application to the development of models for continuous monitoring in daily-living environments would be valuable. In this context, data augmentation and self-supervised learning have also been used as a means to address the challenge of limited available data54,55. Our approach can be combined with those methods to augment data diversity for developing generalizable models across different populations.

Validation of our approach in real-world environments is another notable direction for future research. Although we investigated the generalizability of our approach across four datasets involving different clinical populations and sensor settings, derived from different research studies and partially including real-world data, further validation is still needed, particularly to address the diversity of real-world data. Unlike research datasets, real-world clinical practice may involve patients with heterogeneous demographics and potential comorbidities, miscellaneous sensor products and restrictive sensor settings, as well as operational deviations. Such diversity can be even more pronounced in self-administered applications such as at-home care. Additionally, gait assessments in clinical practice often involve gaits with walking aids in patients with severe gait impairments, as well as clinical gait tasks involving turns50, which are not considered in the current implementation of our approach. Future research should investigate the efficacy of synthetic gaits in developing generalizable models to address the aforementioned real-world complexities. See Supplementary Discussions 1 and 2 for detailed discussions on limitations and future directions.

Methods

Overview of datasets

We used four datasets. Two datasets are video datasets involving different clinical populations and sensor settings: the dementia video dataset comprises videos recorded with a fixed camera from the front or back collected from patients with dementia and MCI as well as older CU adults, and the CP video dataset comprises videos recorded from the side with a horizontally rotating camera collected from children and younger adults with CP. For the ground-truth labels, both datasets include gait parameters derived from optical motion-capture data as well as clinical measures that can be used for downstream tasks. Since videos and motion-capture data in the CP dataset were not collected simultaneously, model performance in terms of estimating gait parameters is assumed to be lower than that for the dementia dataset, in which videos and motion-capture data were collected simultaneously, due to intra-individual, trial-to-trial variability13. The other two datasets are accelerometer datasets involving different clinical populations and sensor settings: the dementia accelerometer dataset comprises accelerometer data from the waists or ankles collected from patients with dementia and MCI as well as older CU adults, and the PD accelerometer dataset comprises accelerometer data from the chest or thighs collected from patients with PD and healthy controls of younger and older adults. Both accelerometer datasets have stopwatch-based gait-speed data as a ground-truth label for the gait-parameter-estimation task. A subset of the dementia accelerometer dataset can be supplementarily analyzed with the ground-truth gait parameters derived from optical motion-capture data acquired on different days. The dementia video and accelerometer datasets are our in-house datasets curated in Japan for this study, while the CP and PD datasets are third-party public datasets curated in the USA13,56.
The study was conducted with the approval of the Ethics Committee, University of Tsukuba Hospital (H29-065, R1-137, and R1-168), and it followed the ethical code for research with humans as stated in the Declaration of Helsinki.

Dementia video dataset

We analyzed a total of 7611 gait videos from 160 unique older adults with dementia (n = 35) and MCI (n = 63) as well as CU individuals (n = 52) and those without formal diagnosis (n = 10; see Supplementary Method 3 for the diagnostic details). The average participant age was 73.5 years (standard deviation (SD): 5.7), with 13.1 years of education (SD: 2.5). Their average height was 1.58 m (SD: 0.09). The participant demographics (self-reported sex, age, years of education, and height) as well as their cognitive, clinical, and gait profiles for each diagnostic group are presented in Supplementary Table 1. All data were collected at University of Tsukuba Hospital, Japan, between 2018 and 2019 (baseline) and between 2021 and 2022 (3-year follow-up). All participants provided written informed consent.

During a lab visit at the baseline, gait trials were video-recorded using fixed cameras from the front and back. The participants completed an average of 12.2 gait trials (SD: 2.1). In each trial, they were asked to walk 9.5 m at their usual pace. The walkway was a flat space 12 m long × 3 m wide without obstacles. The 9.5-m path started at 2.5 m from a short edge of the walkway and ended when crossing the opposite edge. Four commodity webcams were fixed approximately 1.25 m high at each of the four corners of the walkway, such that their angles of view looked approximately towards the center of the walkway. Each gait was recorded using the four cameras, which resulted in four videos per trial. Videos were stored in an MP4 format at a resolution of 720 × 1280 or 1080 × 1920 pixels for width and height and a rate of 30.0 frames per second.

For each video frame, we extracted 2D image-plane coordinates of body keypoints by using the 26-keypoint RTMPose model57, a state-of-the-art keypoint-extraction model at the time of analysis, which resulted in a time series of (x,y)-coordinates of keypoints along with a confidence score of 0 to 1 for each keypoint in each frame. After removing keypoints with a confidence score of less than 0.3, videos were automatically excluded from the analysis if (i) they were shorter than 60 frames (2 s) or (ii) they did not contain a gait of at least four steps, estimated by the number of peaks in the y-coordinate of the ankle keypoints. Each time series was then truncated to exclude the first and last two steps or more to discard those affected by speed increase and decrease. Errors in keypoint detection, e.g., confusion of different keypoints, were corrected manually for each frame by re-labeling mis-identified keypoints or by excluding erroneous keypoints from the analysis. This procedure yielded a total of 7611 time-series data samples of an average length of 157 frames (SD: 42 frames) or of 5.2 s (SD: 1.4 s; Supplementary Fig. 5a for the full distribution).
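The length and step-count exclusion rules can be sketched as follows. Confidence filtering is assumed to have been applied beforehand, and the minimum spacing between detected peaks is an assumed parameter, not one stated in the text.

```python
import numpy as np
from scipy.signal import find_peaks

def passes_screening(ankle_y, fps=30.0, min_frames=60, min_steps=4,
                     min_step_interval_s=0.3):
    """Exclusion sketch: reject a keypoint time series that is shorter
    than `min_frames` or that shows fewer than `min_steps` peaks in the
    ankle y-coordinate (a proxy for step count)."""
    if len(ankle_y) < min_frames:
        return False
    # require peaks to be at least `min_step_interval_s` apart (assumption)
    peaks, _ = find_peaks(ankle_y, distance=int(min_step_interval_s * fps))
    return len(peaks) >= min_steps
```

A sequence with a clear periodic ankle oscillation passes, while a too-short or flat sequence is rejected.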

Ground-truth gait parameters were computed using optical 3D motion-capture data concurrently recorded in the same gait trials. We used the eight-camera OptiTrack Flex 13 motion-capture system (NaturalPoint, Inc., Corvallis, OR, USA), sampled at 120 Hz with 50 reflective markers placed on pre-defined anatomical landmarks according to the manufacturer’s instructions. The data were semi-manually post-processed to correct marker tracking errors. To discard the effects of speed increase and decrease, the first two steps and the last 2.5 m were excluded from the analyses. The gait parameters were computed on the basis of the 3D body kinematics measured using the motion-capture system, where the gait speed was calculated using the trajectory of the pelvis marker on the back of the waist, and the step measures were calculated using the trajectories of the heel markers. Specifically, the step measures were calculated by identifying stationary time-segments on the ground in the following steps. First, the position trajectories of the heel markers were interpolated using a piecewise cubic polynomial interpolation algorithm and low-pass filtered with a cut-off frequency of 10 Hz, which was used for calculating the velocity. We then identified stationary time-segments on the basis of the velocity of the heel markers and obtained the timings of heel strikes and the positions of the heels during stationary phases. All results were plotted and visually confirmed. Finally, step time was calculated as the time interval between heel strikes, and step length and step width were calculated using the distance relative to the anteroposterior and mediolateral axes, respectively, between the heel positions at successive stationary phases.
Due to a number of missing markers in the motion-capture data, which precluded post-processing and fair calculation of gait parameters, we did not calculate ground-truth gait parameters for motion-capture data with initial marker-missing rates of approximately 20% or more and calculated them for three gait trials per participant if available. Consequently, ground-truth gait parameters were available for 1423 out of the 7611 videos. Regarding the downstream tasks, a total of 7122 videos from 150 participants with the baseline diagnosis were included in the analysis. Specifically, 7122, 4006, and 4373 videos from 150, 87, and 89 participants were used for the cross-sectional multi-class and binary-classification tasks on the dementia diagnostic status as well as the longitudinal-prediction task on cognitive decline, respectively. The cognitive decliners were defined as those with a 3-year follow-up Mini-Mental State Examination58 score of two or more points below the baseline score, resulting in 27 of the 89 participants (1292 of the 4373 videos) identified as cognitive decliners. We also considered the following auxiliary variables for model analyses: the height for the gait-parameter estimation, assuming that it is essential as the basis for estimating spatial parameters, and the age, years of education, and height for the downstream tasks.
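The heel-strike-based step measures described above can be sketched as follows. The 10 Hz cut-off and 120 Hz sampling rate come from the text; the Butterworth filter type and order are assumptions, and the heel-strike/stationary-phase detection itself is taken as given here.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def lowpass(x, fs=120.0, fc=10.0, order=4):
    """Zero-phase low-pass filtering of a marker trajectory (10 Hz cut-off
    at 120 Hz sampling, per the text; filter order is an assumption)."""
    b, a = butter(order, fc / (fs / 2.0), btype='low')
    return filtfilt(b, a, x, axis=0)

def step_measures(strike_times, heel_ap, heel_ml):
    """Step time, length, and width from successive heel strikes: time
    intervals between strikes, and displacements along the anteroposterior
    and mediolateral axes between stationary heel positions."""
    return (np.diff(strike_times),
            np.abs(np.diff(heel_ap)),
            np.abs(np.diff(heel_ml)))
```

Given heel strikes at 0, 0.55, and 1.1 s with 0.6 m forward progression per step, the sketch returns step times of 0.55 s and step lengths of 0.6 m.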

In this dataset, the participants underwent muscle-activity measurement simultaneously with video recording. To record muscle activity during gait, we used 16 wireless surface EMG sensors sampled at 2000 Hz (Trigno Wireless EMG System, Delsys, Inc., Natick, MA, USA). These sensors were directly attached to the skin using double-sided adhesive tape, following skin preparation with alcohol wipes. Sensors were placed following SENIAM guidelines (http://seniam.org/), with bilateral placement over the following lower-limb muscles: rectus femoris, biceps femoris, tibialis anterior, medial gastrocnemius, vastus medialis, peroneus longus, soleus, and gluteus medius. As EMG data for the gluteus medius could be collected only from a subset of the participants, these data were excluded from the present analysis. We further excluded data from the analysis if they met either of the following criteria: (i) missing values due to technical issues such as battery failure or transmission errors, or (ii) lack of substantial change in EMG values between stationary standing and walking, defined as the average EMG value during walking being smaller than the mean + 5 SD during stationary standing, which could be attributed to inadequate sensor attachment or skin sweat. This procedure yielded a total of 3566 video data samples paired with EMG data (average length: 5.2 s, SD: 1.4 s) from 150 unique participants consisting of patients with dementia (n = 32) and MCI (n = 58) as well as CU individuals (n = 50) and those without formal diagnosis (n = 10).
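The second exclusion criterion (mean walking EMG must exceed the standing mean + 5 SD) can be sketched directly. Rectifying the signal with an absolute value before averaging is an assumption about the preprocessing, not something the text specifies.

```python
import numpy as np

def emg_channel_valid(standing, walking, k=5.0):
    """Exclusion-rule sketch: keep a channel only if its average rectified
    EMG during walking exceeds mean + k*SD of the standing baseline,
    flagging channels with inadequate sensor attachment."""
    baseline = np.abs(standing)
    threshold = baseline.mean() + k * baseline.std()
    return bool(np.abs(walking).mean() > threshold)
```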

Cerebral-palsy video dataset

We analyzed a total of 1636 gait videos from 968 unique patients, mainly children, with a CP diagnosis. The average age of the patients included in the analysis was 12.6 years (SD: 6.1). Their average height, mass, and gait deviation indices (GDIs) were 1.41 m (SD: 0.19), 40.2 kg (SD: 17.1), and 76.5 (SD: 10.9), respectively. The patient demographics (age, height, and mass) as well as clinical and gait profiles are detailed in Supplementary Table 2. The participants underwent an average of 1.2 gait visits (SD: 0.5), where each visit comprised one gait trial for video recording and another for ground-truth recording. All data were collected during their clinical gait visits conducted in the USA. See the original article13 for details of the dataset.

Videos were recorded during gait trials using an unfixed, rotating camera from the sagittal view. In each trial, the patients were asked to walk back and forth along a 10-m path three to five times. Each gait was video-recorded using a single commodity camera placed approximately 3 to 4 m from the line of walking, where the camera was manually rotated around the vertical axis to follow the patient. Videos were stored in an MP4 format at a resolution of 640×480 pixels for width and height and a rate of 29.97 frames per second.

For each video frame, the time series of (x,y)-coordinates of body keypoints were provided along with confidence scores, where keypoints were detected using the 25-keypoint OpenPose model59. As this is a third-party dataset provided solely as pre-extracted keypoint data, the model used for keypoint extraction was different from that used for the dementia video dataset. However, this methodological difference does not impact the interpretation of our investigation, as our objective was to demonstrate the utility of the proposed approach across datasets, rather than to conduct a comparative performance analysis between the datasets. Specifically, we first split each time-series data sample into multiple segments of a straight walk by dividing the data at the turn timings. After removing keypoints with a confidence score of less than 0.3, segments were automatically excluded from the analysis if (i) they were shorter than 60 frames (2 s), (ii) any keypoint necessary for the analysis was missing for two or more consecutive frames, or (iii) the patient stopped in the middle of walking. A stop in the walk was detected when the keypoint movement of the ankles was smaller than 0.05 in image-plane coordinates normalized by the length between the neck and middle hip. This procedure yielded a total of 1636 time-series data samples of an average length of 188 frames (SD: 108 frames) or of 6.3 s (SD: 3.6 s; Supplementary Fig. 5b for the full distribution).
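The stop-detection rule can be sketched as follows. The 0.05 normalized threshold comes from the text; evaluating ankle displacement over a sliding one-second window is our assumption about how the movement is accumulated.

```python
import numpy as np

def stopped(ankle_xy, neck_xy, midhip_xy, window=30, thresh=0.05):
    """Stop-detection sketch for a (T, 2) ankle keypoint track: the walk
    is treated as stopped if the ankle displacement accumulated over a
    window, normalized by the mean neck-to-mid-hip length, drops below
    `thresh`. The windowed formulation is an assumption."""
    scale = np.linalg.norm(neck_xy - midhip_xy, axis=-1).mean()
    disp = np.linalg.norm(np.diff(ankle_xy, axis=0), axis=-1)
    # moving sum of frame-to-frame displacement over the window
    motion = np.convolve(disp, np.ones(window), mode='valid') / scale
    return bool((motion < thresh).any())
```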

Ground-truth gait parameters were provided for all body keypoint data on the basis of optical 3D motion-capture data separately recorded in the same lab visit. Specifically, the 12-camera Vicon MX motion-capture system (Vicon Motion Systems Ltd, Oxford, UK), sampled at 120 Hz, was used. The data were semi-manually post-processed to fill missing marker measurements and then used to compute gait parameters of interest from 3D joint kinematics. Although the gait parameters were provided for each side in the original dataset, we used the averages of left and right values in our analysis for consistency across datasets. For the downstream task, the GMFCS level, a widely used five-level clinical grading system to assess the self-initiated movement abilities of patients with CP60, was rated by a physical therapist. The patients were rated as level I (lowest severity; n = 277), level II (n = 245), level III (n = 128), and level IV (n = 3), with no patients rated as level V (highest severity), yielding a total of 911 videos included in the analysis. We excluded level IV from our analysis due to the limited number of data samples, which would make stratified cross-validation impractical, as described in the subsequent section. For model analyses, we also considered the following auxiliary variables: the height for the gait-parameter estimation, and the height, mass, age, and GDI for the downstream task.

Dementia accelerometer dataset

We analyzed a total of 2489 wearable accelerometer data samples captured during gait trials from 164 unique older adults. The average participant age was 73.1 years (SD: 7.0) and the group included patients with baseline clinical diagnosis of dementia (n = 19) or MCI (n = 57) as well as CU individuals (n = 60). The participants included those diagnosed in routine clinical practice, for whom the diagnostic criteria may differ from those for other participants. The participant demographics (self-reported sex, age, and years of education) as well as their clinical and gait profiles for each diagnostic group are presented in Supplementary Table 3. All data were collected during lab visits or through a paid cognitive health checkup service in clinical practice (see Supplementary Method 1 for details) at the University of Tsukuba Hospital, Japan, between 2018 and 2023. All participants provided written informed consent.

Accelerometer data were recorded during gait trials using tri-axial accelerometers mounted on participants’ waists and both ankles. We used GENEActiv sensors (Activinsights Ltd, Kimbolton, UK), sampled at 100 Hz. Each ankle sensor was firmly attached to the outside of each ankle, just above the lateral malleolus, using Velcro bands with the sensor’s y-axis pointing downwards. The waist sensor was firmly attached to the lower back approximately on the L5 position, a commonly used sensor placement for approximating the center of mass, using surgical tape with the sensor’s y-axis pointing downwards.

In each gait trial, the participants were asked to walk along flat corridors without any obstacles. We used three measurement distances of 4, 5, and 7 m, excluding acceleration and deceleration paths of 2 m or more; the distance varied depending on the availability of the facility. The time needed to walk the path was measured manually with a stopwatch by recording timestamps when the participant crossed the start and end lines. The accelerometer data between the start and end timestamps were cropped for each trial and used for the analysis. Each participant completed an average of 4.0 gait trials (SD: 0.5). We excluded data from the analysis for (i) inconsistency in sensor orientation and (ii) errors in timestamp recording. This procedure resulted in a total of 2489 time-series data samples (803 for the waist and 1686 for the ankles) with an average length of 486 frames (SD: 123 frames), or 4.9 s (SD: 1.2 s; see Supplementary Fig. 5c for the full distribution), along with the stopwatch-based ground-truth gait-speed data.

For this dataset, 89 of the 164 participants overlapped with those included in the dementia video dataset, although all accelerometer data were captured during different visits. The accelerometer data for 68 of the 89 participants were collected at least one year after the video- and motion-capture data collection. For the other 21 participants, some accelerometer data were collected within a short interval of the video and motion-capture data collection (mean interval: 7.2 days, SD: 7.6 days; Supplementary Table 4), yielding 401 accelerometer data samples labeled with optical-motion-capture-based ground-truth gait parameters.

Parkinson’s disease accelerometer dataset

We analyzed a total of 382 wearable accelerometer data samples captured during gait trials from 32 unique individuals: patients with PD (n = 16) and controls (n = 16). The average age of the participants included in the analysis was 64.5 years (SD: 10.5). The participant demographics (sex and age) as well as the clinical and gait profiles for each diagnostic group are presented in Supplementary Table 5. The gait data were collected during in-clinic assessments conducted in the USA. See the original article56 for details on this dataset.

The accelerometer data included in our analysis were recorded during gait trials using tri-axial accelerometers mounted on the participants’ chests and anterior thighs. BioStamp RC sensors (MC 10 Inc., Lexington, MA) were used at a sampling frequency of 31.25 Hz. The chest sensor was attached with the sensor’s y-axis pointing downwards, and each thigh sensor was attached with the sensor’s x-axis pointing upwards.

For the gait trials, the participants underwent 10-m walk tests as part of standard in-clinic assessments for PD. The time needed to walk the path was measured using manually recorded timestamps at the start and end of the test, equivalent to measurement with a stopwatch. The accelerometer data between the start and end timestamps were cropped for each trial and used for the analysis. Each participant with PD completed an average of 2.8 walk-test trials (SD: 0.4) in the ON state and 2.9 trials (SD: 0.5) in the OFF state, and the control participants completed 3.1 trials (SD: 0.2). We excluded data from the analysis for (i) inconsistency in sensor orientation and (ii) errors in timestamp recording. This procedure resulted in a total of 382 time-series data samples (131 for the chest and 251 for the thighs) with an average length of 148 frames (SD: 33 frames), or 4.8 s (SD: 1.1 s; see Supplementary Fig. 5d for the full distribution), along with the stopwatch-based ground-truth gait-speed data.

Overview of proposed approach

Our proposed approach, illustrated in Fig. 1, consists of four main steps: (i) in-silico generation of synthetic gaits with a deep generative model learned in the musculoskeletal simulation (Fig. 1a); (ii) in-silico generation of synthetic sensor data derived from the synthetic gaits (Fig. 1b); and (iii) model training on synthetic data with supervised learning (Fig. 1c) followed by (iv) inference with the model on real data (Fig. 1d), or (iii’) model pre-training on synthetic data with self-supervised learning (Fig. 1e) followed by (iv’) fine-tuning of and inference with the model on real data for each downstream task (Fig. 1f). We use a single-camera video as an example to describe the approach.

Regarding the in-silico generation of synthetic gaits (i.e., the first step of our approach), we use Generative GaitNet33, which learns muscle control of human gaits in accordance with desired anatomical and gait conditions through deep reinforcement learning and a full-body musculoskeletal model in physics-based simulation. We generate diverse synthetic gaits that represent a broad spectrum of gait and anatomical conditions by continuously modifying simulation parameters, including target gait parameters (e.g., stride length and cadence), body properties (e.g., body size and proportions of body parts), and muscle-deficit characteristics (e.g., weakness and contracture). Atypical synthetic gaits with abnormal parameters are included to increase the diversity of the synthetic data. In the second step, we simulate sensor data during these simulated gaits. For video data, we project three-dimensional (3D) body poses onto a 2D plane to obtain the time series of the 2D coordinates of body poses, equivalent to the data produced by computer-vision-based pose-estimation algorithms applied to real gait videos. To improve the model's generalizability across different sensor settings, we use the domain-randomization technique to generate synthetic sensor data, i.e., 2D body poses, for different camera positions, postures, and movements (i.e., fixed or rotating to follow the person). The synthetic data are used in the following two ways. The first involves using the time series of 2D body poses and the gait parameters derived from synthetic gaits (i.e., gait speed, step time, and step length) as input and output variables, respectively, for model training with supervised learning. The models trained solely on synthetic data are then applied to real data for gait-parameter estimation from 2D, single-camera gait videos. The second involves using the time series of 2D body poses as pre-training data for learning general-purpose representations through self-supervised learning. The pre-trained model is then fine-tuned with real data for each downstream task and tested on separate real data.

In-silico synthetic data generation

We used the pre-trained Generative GaitNet with a full-body musculoskeletal model33 and generated synthetic gaits with different parameter sets related to gait and anatomical conditions to emulate the human gaits of adults and children, including pathological gaits, included in the datasets used in the analysis. The musculoskeletal body model has 50 degrees of freedom and 304 Hill-type muscle models, and its dynamics were simulated at 480 Hz on the DART physics engine61. The reference skeletal body model represents an individual 168.7 cm in height and 72.9 kg in weight, and the body size and proportions of each body part of the model, as well as the gait and muscle conditions, can be arbitrarily modified in the simulation by specifying the parameters. We modified a total of 132 continuous parameters consisting of 6 parameters for body properties, 2 parameters for gait conditions, 62 parameters for muscle contracture of the lower limbs, and 62 parameters for muscle weakness of the lower limbs. All parameters are scale factors with respect to the reference model. The body parameters are the scale factors of the whole-body size as well as the proportions of each body part for the head and four lower limbs. The parameters for gait conditions specify the target stride length and cadence. The parameters for muscle weakness and contracture are the scaling factors of the maximal isometric force and optimal length for each muscle, respectively. We separately prepared parameter sets for adult- and child-like bodies as well as for normal and abnormal muscle conditions. While the scale factors of body size for adults were randomly generated from a uniform distribution ranging from 0.85 to 1.1, those for children were randomly generated from a normal distribution, N(0.75, 0.1), and clipped to fall within [0.5, 1]. While the parameters for normal muscle conditions were set to 1, those for abnormal muscle conditions were each randomly generated from a normal distribution, N(1, 0.05), and clipped to fall within [0.7, 1]. The other parameters were generated as follows. The scaling factors for the proportions of each body part were randomly generated from a uniform distribution ranging from 0.95 to 1.05, and the scaling factors for stride length and cadence were randomly generated from uniform distributions ranging from 0.6 to 1.4 and from 0.8 to 1.4, respectively. We randomly generated 3000 parameter sets each for synthetic gaits of adults and children with normal and abnormal muscle parameters, yielding a total of 12,000 parameter sets for the synthetic gaits used in this study. We ran the simulation at 480 Hz with each parameter set and used data segments of a 10-m walk, starting from the 5-m point, from simulation runs in which the musculoskeletal model was able to walk 20 m without falling. The parameters for body size and gait conditions were chosen to cover a wide range, including the values in the datasets used in this study. The parameters for abnormal muscle conditions were determined on the basis of preliminary experiments to avoid too many failed runs in which the musculoskeletal model was unable to walk 20 m due to falling. We used a laptop, an HP OMEN 15-dh1004TX with an Intel Core i9-10885H CPU (2.40 GHz) and an NVIDIA GeForce RTX 2080 SUPER with Max-Q design, to run the gait simulations; the simulation ran in near real time. For more details, including the learning algorithms of Generative GaitNet, see the original article33.
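The parameter-sampling scheme above can be sketched as follows; the function name, flat-vector parameter ordering, and grouping are our illustrative assumptions, not the exact Generative GaitNet interface.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_parameter_set(population="adult", muscles="normal"):
    """Sample one 132-dimensional set of scale factors relative to the
    reference model, following the distributions described in the text.
    The ordering [body size, 5 proportions, stride, cadence, 62 contracture,
    62 weakness] is illustrative."""
    # Whole-body size: uniform for adults, clipped normal for children.
    if population == "adult":
        body_size = rng.uniform(0.85, 1.1)
    else:
        body_size = np.clip(rng.normal(0.75, 0.1), 0.5, 1.0)
    # Proportions of 5 body parts (head and four lower limbs).
    proportions = rng.uniform(0.95, 1.05, size=5)
    # Target stride length and cadence.
    stride = rng.uniform(0.6, 1.4)
    cadence = rng.uniform(0.8, 1.4)
    # 62 contracture + 62 weakness scale factors.
    if muscles == "normal":
        contracture = np.ones(62)
        weakness = np.ones(62)
    else:
        contracture = np.clip(rng.normal(1.0, 0.05, size=62), 0.7, 1.0)
        weakness = np.clip(rng.normal(1.0, 0.05, size=62), 0.7, 1.0)
    return np.concatenate([[body_size], proportions, [stride, cadence],
                           contracture, weakness])

params = sample_parameter_set("child", "abnormal")
```

Sampling 3000 such sets per population-muscle combination would reproduce the 12,000 parameter sets described above.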

We then generated synthetic sensor data for videos and accelerometers during the simulated gaits. For the videos, we projected 3D body poses onto a 2D image plane assuming no lens distortion, emulating the data format obtained after applying computer-vision-based pose-estimation algorithms to videos. Before the projection, we added random noise following a normal distribution N(0, 0.02) [m] to the 3D body keypoints, assuming errors of pose estimation in real data. This value was determined on the basis of the mean per-joint position errors of state-of-the-art 3D human-pose-estimation models on walking videos (e.g., around 11 to 25 mm62). The cameras were placed to emulate the camera settings in the datasets used in this study, then translated and rotated randomly. To emulate the fixed front and back cameras in the dementia dataset, one camera was placed 1.5 m along the mediolateral axis and 2.0 m behind the start position of the 10-m walk, while another camera was placed 1.5 m along the mediolateral axis and 2.0 m beyond the end position of the 10-m walk. Both cameras were placed 1.25 m high and rotated horizontally so that each camera faced the center of the 10-m walkway. Each camera was then translated from its reference position with random offsets in the mediolateral, anteroposterior, and vertical axes, drawn from uniform distributions over ±0.025, ±0.15, and ±0.025 m, respectively. Finally, each camera was rotated in yaw, roll, and pitch by random angles drawn from uniform distributions over ±12.5°, ±5.0°, and ±5.0°, respectively. To emulate the horizontally rotating side camera in the CP dataset, the camera was placed 1.0 m high and 3.5 m along the mediolateral axis from the center of the 10-m walkway. The camera was then translated from its reference position with random offsets in the mediolateral, anteroposterior, and vertical axes, drawn from uniform distributions over ±1.5, ±0.1, and ±0.3 m, respectively. The camera’s initial posture was also rotated in pitch and roll by random angles drawn from uniform distributions over ±12.5° and ±5.0°, respectively. During the gait sequence, the camera was horizontally rotated to follow the center of mass of the body model with random noise following a normal distribution N(0, 0.5) [degrees]. To emulate the accelerometer data in the datasets used in this study, that is, accelerometers attached to the waist, ankle, chest, and thigh, we used the accelerations of the following four body parts: pelvis, tibia, torso, and femur. The body and gravitational accelerations were expressed in the coordinate system of each body part and used as synthetic accelerometer data. As with the cameras, each coordinate system of the synthetic accelerometers was rotated in yaw, roll, and pitch by random angles drawn from uniform distributions over ±30.0°, ±15.0°, and ±15.0°, respectively. These random values for both synthetic videos and accelerometers were calculated independently for each synthetic sensor data sample.
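As a minimal sketch of the camera-domain randomization for the fixed front camera, the following assumes a unit focal length, a simple pinhole model, and yaw-only orientation jitter; the exact camera model and coordinate conventions in the simulation may differ.

```python
import numpy as np

rng = np.random.default_rng(1)

def rot_y(deg):
    """Rotation matrix about the vertical (yaw) axis."""
    r = np.deg2rad(deg)
    c, s = np.cos(r), np.sin(r)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def randomized_front_camera():
    """One randomized fixed front camera, using the offset ranges in the
    text; axes ordered [mediolateral, vertical, anteroposterior]."""
    base = np.array([1.5, 1.25, -2.0])  # 1.5 m ML offset, 1.25 m high, 2 m behind start
    jitter = np.array([rng.uniform(-0.025, 0.025),   # mediolateral
                       rng.uniform(-0.025, 0.025),   # vertical
                       rng.uniform(-0.15, 0.15)])    # anteroposterior
    yaw = rng.uniform(-12.5, 12.5)
    return base + jitter, rot_y(yaw)

def project(points_3d, cam_pos, cam_rot, focal=1.0):
    """Pinhole projection (no lens distortion) of Nx3 world points to 2D."""
    p = (points_3d - cam_pos) @ cam_rot          # world -> camera frame
    return focal * p[:, :2] / p[:, 2:3]          # perspective divide

pos, rot = randomized_front_camera()
kps = rng.normal([0.0, 1.0, 5.0], 0.3, size=(15, 3))   # toy 3D keypoints
uv = project(kps, pos, rot)
```

The same projection, with per-frame yaw updated to track the body's center of mass, would emulate the rotating side camera.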

Preprocessing of time-series sensor data

Both synthetic and real sensor data were converted into time-series data of F features with the same preprocessing procedures unless otherwise specified. After the preprocessing, we randomly extracted 30 segments, each T seconds long (where T is the segment length), allowing for overlaps, from each time-series data sample. These segments served as fixed-length inputs for the DNN models. T was set to 2 and 1 s for gait-parameter estimation and the downstream tasks, respectively, for both video and accelerometer data. The left-right axis was inverted both for videos recorded from the right side and for accelerometers acquired from body parts on the right side to simplify the subsequent training of the DNN models.
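The segment extraction can be illustrated as follows (function and variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(2)

def extract_segments(x, fps, t_seconds, n_segments=30):
    """Randomly extract fixed-length segments (overlaps allowed) from a
    (frames, features) time series, as done before feeding the DNN models."""
    win = int(round(fps * t_seconds))
    if len(x) < win:
        raise ValueError("time series shorter than one segment")
    starts = rng.integers(0, len(x) - win + 1, size=n_segments)
    return np.stack([x[s:s + win] for s in starts])

series = rng.normal(size=(300, 15))                  # e.g. 10 s of 30-Hz keypoint features
segs = extract_segments(series, fps=30, t_seconds=2)  # (30, 60, 15) for T = 2 s video
```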

The body keypoint data derived from each single video were processed in the following five steps: (i) resampling, (ii) interpolation of missing values, (iii) smoothing, (iv) coordinate transformation (scaling and translation), and (v) feature extraction. The body keypoint data were first resampled to 30 Hz, a standard sampling rate for video data, yielding 60- and 30-frame segments as DNN-model inputs for gait-parameter estimation and the downstream tasks, respectively. Because the real data used in this study were recorded at about 30 Hz, only synthetic data were down-sampled from 480 to 30 Hz. After imputing missing data using linear interpolation, we applied smoothing to each time-series data sample using a 1D unit-variance Gaussian filter. In the fourth step, the positions of body keypoints in each frame were divided by the length l and translated so that the center of the hip was at the origin of the 2D coordinates, where l was calculated in each frame as the average of the Euclidean distances between the right hip and right shoulder and between the left hip and left shoulder in the 2D image plane. Finally, we extracted F features in each frame to obtain time-series data of F features. For the model estimating gait parameters, we used the following 15 features: the 2D positions of lower-limb keypoints on both sides (hip, knee, and ankle), the knee angles of both sides formed by the ankle, knee, and hip keypoints on each side, and the Euclidean distance between the left and right ankle keypoints in the 2D coordinates. For the model for the downstream tasks, to capture the whole-body dynamics of human gaits, we added 18 other features consisting of the 2D positions of other body-part keypoints (shoulder, elbow, wrist, and toe on both sides, as well as the neck), yielding a total of 33 features.
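A sketch of the fourth step (scaling and translation) for one frame, with an illustrative keypoint indexing:

```python
import numpy as np

# Illustrative keypoint indexing; real pose-estimation outputs may differ.
KP = {"l_hip": 0, "r_hip": 1, "l_shoulder": 2, "r_shoulder": 3}

def normalize_frame(kp2d):
    """Divide one frame of 2D keypoints by the mean hip-shoulder distance l,
    then translate so the hip center sits at the origin."""
    l = 0.5 * (np.linalg.norm(kp2d[KP["r_shoulder"]] - kp2d[KP["r_hip"]]) +
               np.linalg.norm(kp2d[KP["l_shoulder"]] - kp2d[KP["l_hip"]]))
    scaled = kp2d / l
    hip_center = 0.5 * (scaled[KP["l_hip"]] + scaled[KP["r_hip"]])
    return scaled - hip_center
```

Applying this per frame makes the features invariant to the subject's apparent size and position in the image.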

Accelerometer data were processed in the following three steps: (i) low-pass filtering, (ii) resampling, and (iii) feature extraction. In the first and second steps, we applied a low-pass filter with a cut-off frequency of 30 Hz then resampled to 50 Hz, yielding 100- and 50-frame segments as DNN-model inputs for gait-parameter estimation and the downstream tasks, respectively. Thus, synthetic and real data in the dementia dataset were down-sampled from 480 and 100 Hz, respectively, while real data in the PD dataset were up-sampled from 31.25 Hz. We then added the Euclidean norm of the acceleration to the raw tri-axial acceleration data for each of the four body placements we studied (waist, ankle, chest, and thigh), finally obtaining time-series data of a total of 16 features (4 features × 4 placements). Because we focused on analyzing single-accelerometer data in this study, accelerometer data acquired from different body placements formed separate inputs, with zero-padding applied to the other feature dimensions (i.e., 12 dimensions). In the supplementary analysis on the subset of the dementia accelerometer dataset with optical motion-capture data, we analyzed waist-worn and ankle-worn accelerometers, yielding eight features. In the model with self-supervised training, we analyzed chest-worn accelerometer data, yielding four features.
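These three steps can be sketched as below; the fourth-order Butterworth filter and linear-interpolation resampling are our illustrative choices, not necessarily the implementation used in the study. The filter is skipped when the cut-off exceeds the Nyquist frequency, as for the 31.25-Hz PD recordings.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def preprocess_accel(acc, fs_in, fs_out=50.0, cutoff=30.0):
    """Low-pass filter a (frames, 3) tri-axial signal, resample to fs_out,
    and append the Euclidean norm as a 4th feature."""
    if cutoff < fs_in / 2:                        # only when below Nyquist
        b, a = butter(4, cutoff, fs=fs_in)        # 4th-order low-pass (illustrative)
        acc = filtfilt(b, a, acc, axis=0)
    t_in = np.arange(len(acc)) / fs_in
    t_out = np.arange(0.0, t_in[-1], 1.0 / fs_out)
    resampled = np.column_stack([np.interp(t_out, t_in, acc[:, i])
                                 for i in range(acc.shape[1])])
    norm = np.linalg.norm(resampled, axis=1, keepdims=True)
    return np.hstack([resampled, norm])           # (frames_out, 4)

raw = np.random.default_rng(3).normal(size=(480 * 5, 3))   # 5 s of 480-Hz synthetic data
feats = preprocess_accel(raw, fs_in=480.0)
```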

DNN-model development and evaluation

For supervised learning, we used the following three models with representative network architectures for analyzing time-series data: a fully convolutional network36,37, ResNet36,37, and a transformer38. We used each model without modifying the hyperparameters of the network architecture and model training, as implemented in each of the previous studies, unless otherwise specified (see Supplementary Result 8 and Supplementary Tables 16–19 for model performance with different hyperparameters). Due to the limited size of the available real data, we also considered a simpler model architecture with fewer parameters. However, our preliminary analyses of simpler models based on the multi-layer-perceptron architecture showed suboptimal performance in estimating gait parameters (see Supplementary Result 9 for details). Consequently, we opted to employ the more complex models listed above in the primary analysis. In brief, the fully convolutional network consists of three convolutional blocks and one global average pooling layer fully connected to an output layer. Each convolutional block contains three operations: a 1D convolution followed by batch normalization then rectified linear unit (ReLU) activation. The ResNet consists of three residual blocks and one global average pooling layer fully connected to an output layer. Each residual block contains three convolutional blocks with a linear shortcut linking the output of the residual block to its input. The transformer model consists of four transformer-encoder blocks, one global average pooling layer, and one fully connected layer connected to an output layer. Each transformer-encoder block contains a multi-head attention layer with dropout, layer normalization, and a residual connection, followed by two 1D convolutions with dropout and layer normalization.
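To make the fully-convolutional-network block concrete, the following is a didactic NumPy forward pass of one block (1D 'same' convolution → batch normalization → ReLU); the filter count, kernel size, and inference-style normalization without learned scale/shift are illustrative, and this is not the Keras implementation cited above.

```python
import numpy as np

def conv_block(x, weights, bias):
    """One FCN block: 1D 'same' convolution over a (frames, channels)
    segment, batch-statistics normalization, then ReLU."""
    n, c_in = x.shape
    c_out, _, k = weights.shape
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros((n, c_out))
    for t in range(n):
        window = xp[t:t + k]                             # (k, c_in)
        out[t] = np.tensordot(weights, window.T,
                              axes=([1, 2], [0, 1])) + bias
    out = (out - out.mean(0)) / (out.std(0) + 1e-5)      # batch normalization
    return np.maximum(out, 0.0)                          # ReLU

rng = np.random.default_rng(5)
x = rng.normal(size=(60, 15))                # one 2-s, 30-Hz video segment, 15 features
w = rng.normal(size=(64, 15, 8), scale=0.1)  # 64 filters, kernel size 8 (illustrative)
h = conv_block(x, w, np.zeros(64))
gap = h.mean(axis=0)                         # global average pooling -> (64,)
```

Stacking three such blocks and feeding the pooled vector to an output layer reproduces the structure described above.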

For the downstream tasks, we also included the following five models in the comparison: TimesNet39, Non-stationary Transformer40, Informer41, LSTM42, and Mamba43. The first three models show state-of-the-art performance on time-series classification tasks in the public benchmark63, and the last two were added as representative general sequence models. TimesNet is a convolutional-neural-network-based architecture designed to capture the temporal patterns derived from different periods with the Fast Fourier Transform39. Non-stationary Transformer and Informer are transformer-based architectures40,41. Non-stationary Transformer was proposed as an effective series-stationarization architecture that improves the predictive capability for non-stationary series40. Informer is designed to capture long-range dependencies between time-series inputs and outputs for time-series forecasting with a self-attention mechanism based on a query sparsity measurement41. Mamba is a state-space-model architecture showing promising performance in various domains, including time-series data analysis43. We used these model architectures as implemented in the Time Series Library64, where the Mamba model, which is originally designed for forecasting tasks, was connected to the classification layers implemented in the TimesNet model with the same parameters. We trained all models for a maximum of 100 epochs with a batch size of 32 or 256, early stopping, the Adam optimizer, and learning rates of 5 × 10−6 and 5 × 10−4 for video and accelerometer data, respectively.

We carried out self-supervised learning using the TS2Vec method44 in addition to the eight models above. The TS2Vec method uses a hierarchical contrastive-learning framework, in which contrasting is executed from lower-level (frame-level) to higher-level (whole-time-series-level) representations, to learn contextual representations for arbitrary sub-series at multiple levels of time granularity. With this method, positive pairs are defined as representations of the same timestamp from randomly masked views, without any temporal or spatial transformations, while all other pairs are considered negative, enabling the model, we assume, to effectively learn temporal and spatial gait dynamics. We used the same parameters as the original study44, i.e., a batch size of 8, a learning rate of 0.001, and a representation dimension of 320. For fine-tuning on the downstream tasks, we added a new fully connected layer with an L1 regularizer (0.1, 0.01, or 0.001) on top of the frozen pre-trained encoder. This layer ingested the 320-dimensional representation produced by the encoder and was fully connected to an output for each downstream task. Because our downstream tasks include classification on an imbalanced dataset, we experimented with and without class weights in the loss function, where the class weights were calculated as the inverse class frequency. Regarding the other eight models, we first trained each model on synthetic data to forecast the next one second of time-series data (i.e., 30 and 50 frames for video and accelerometer data, respectively), then conducted fine-tuning on each downstream task. For this fine-tuning, we replaced the last layer connected to the output layer with one fully connected layer connected to a new output layer for each downstream task and trained this layer while freezing the remaining connections. A validation set was used for selecting the above-mentioned parameters for L1 regularization and class weighting, as well as for early stopping.
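The inverse-class-frequency weighting can be written, for instance, as follows; leaving the weights unnormalized is one common convention, and the exact formula in a given training framework may differ.

```python
import numpy as np

def inverse_frequency_weights(labels):
    """Class weights proportional to inverse class frequency, used to
    counter class imbalance in the loss function."""
    classes, counts = np.unique(labels, return_counts=True)
    weights = len(labels) / counts          # rarer classes get larger weights
    return dict(zip(classes.tolist(), weights.tolist()))
```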

The time-series data derived from videos for these DNN models were standardized in each feature dimension using the mean and SD of the training set, while the data derived from accelerometers were fed directly into the DNN models. For the regression tasks, the output layer received inputs from the previous layers as well as the auxiliary variables through a linear activation function, and the model was optimized using the mean squared error as the loss function. For the classification tasks, the output layer received the same inputs through a softmax activation function, and the model was optimized using cross entropy as the loss function.

For internal evaluation, we used a subject-wise cross-validation procedure, in which the dataset was split into training, validation, and test sets in the ratio k:1:1, ensuring that each individual’s data were included in only one of the training, validation, and test sets. For gait-parameter estimation, we used k = 1, 2, 3, 8, 18, and 38, corresponding to training with 33, 50, 60, 80, 90, and 95% of the dataset, respectively. To evaluate conditions under which the proportion of the training set was even smaller, subsets of the training set were used for the actual model training; these were extracted in a subject-wise manner after splitting the dataset into the training, validation, and test sets. For the downstream classification tasks, we used k = 2 and 8 for video and accelerometer data, respectively, with stratified sampling to handle the imbalanced datasets. To mitigate the influence of selection bias and fairly assess data efficiency, data splitting and subset selection were conducted randomly, and the cross-validation procedure was repeated a minimum of five times for each condition using distinct random seeds. Model outputs were averaged over segments extracted from the same single-sensor data sample or from the same individual. Model performance was then evaluated using Pearson’s correlation coefficient and the mean absolute error for the regression tasks, whereas the AUC and balanced accuracy were used for the binary and multi-class classification tasks, respectively. The statistical significance of performance improvements was examined using two-tailed t-tests.
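A minimal sketch of the subject-wise k:1:1 split (ignoring stratification, which the downstream tasks additionally require):

```python
import numpy as np

rng = np.random.default_rng(6)

def subject_wise_split(subject_ids, k):
    """Assign samples to train/val/test in the ratio k:1:1 at the subject
    level, so no individual appears in more than one partition."""
    subjects = np.unique(subject_ids)
    rng.shuffle(subjects)
    n_val = max(1, len(subjects) // (k + 2))
    val_s = set(subjects[:n_val].tolist())
    test_s = set(subjects[n_val:2 * n_val].tolist())
    parts = np.array(["train"] * len(subject_ids), dtype=object)
    for i, s in enumerate(subject_ids):
        if s in val_s:
            parts[i] = "val"
        elif s in test_s:
            parts[i] = "test"
    return parts

ids = np.repeat(np.arange(40), 5)      # 40 subjects, 5 samples each
parts = subject_wise_split(ids, k=2)   # 50% train, 25% val, 25% test
```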

Data efficiency was assessed by determining the amount of real data required to achieve a reference performance. For gait-parameter estimation, this corresponded to the amount of real data needed to train models from scratch to match the synthetic-data-trained model’s performance. For the downstream tasks, it represented the amount of real data required to fine-tune the synthetic-data-pre-trained model to outperform the best model without pre-training. The required data amount was estimated by linear interpolation between data points, such as estimating the performance at 55% of the data from models trained on 50 and 60% of the data. For gait-parameter estimation, each data point represents one dataset split with varying k, as described above. The video-based downstream tasks were evaluated from 10 to 100% in 10% increments, while the accelerometer-based downstream tasks were evaluated from 20 to 100% in 20% increments, given the smaller datasets.
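The interpolation step can be expressed as follows (assuming fractions sorted in ascending order and roughly monotonic performance):

```python
import numpy as np

def required_fraction(fractions, scores, target):
    """Smallest training-data fraction at which linearly interpolated
    performance reaches `target`; None if never reached."""
    for (f0, s0), (f1, s1) in zip(zip(fractions, scores),
                                  zip(fractions[1:], scores[1:])):
        if s0 < target <= s1:
            return f0 + (f1 - f0) * (target - s0) / (s1 - s0)
    return None

# e.g. models trained on 50% and 60% of real data score 0.80 and 0.84;
# a reference performance of 0.82 then requires about 55% of the real data
frac = required_fraction([0.5, 0.6], [0.80, 0.84], 0.82)
```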

For comparison, synthetic gait data were also generated using TimeGAN65, a representative GAN-based time-series augmentation technique. Specifically, GANs were trained for 100 epochs using the training dataset for each class in each cross-validation fold during the evaluation of real-data-trained models, and each trained GAN was used for generating synthetic time-series segments to double the size of the training set. Hyperparameters were left as the default values implemented in the Time Series Generative Modeling library66.

Dimension reduction and clustering

We used UMAP47 and a Gaussian mixture model46 to probe the associations between the synthetic and real data used in this study. UMAP was used for dimension reduction to find the low-dimensional subspace in which synthetic gaits with normal and abnormal parameters were separated. The input data for UMAP were the vector representations (i.e., embedding vectors) obtained from the TS2Vec model pre-trained on synthetic data (i.e., 320-dimensional vectors), which achieved the highest average ranking across the downstream tasks (Supplementary Table 20). We used the UMAP algorithm with the following parameters: number of components = 2, metric = “cosine”, minimum distance = 0.99, and number of neighbors = 15. The Gaussian mixture model was used to estimate the probability-density function of the representation vectors of the synthetic gaits in the 2D UMAP subspace. When UMAP was repeatedly applied with different parameters, we could robustly identify six clusters corresponding to the combinations of camera settings (fixed-camera/front-view, fixed-camera/back-view, and rotating-camera/side-view) and parameter types (normal and abnormal) for the synthetic gaits. Quantitatively, when the Gaussian mixture model was applied with six mixture components, about 90% of the synthetic data in each above-mentioned group belonged to the same cluster, distinct from the clusters dominated by the other groups. Thus, we used six components for the Gaussian mixture model. Using the estimated probability-density function, we investigated whether the real data in our datasets fell within 95% of the Gaussian distributions for the synthetic data. We repeated this procedure 100 times for statistical analysis, and the statistical significance was examined using one-way analysis of variance with Tukey’s pairwise comparisons.
The Fréchet inception distance48 was calculated for each cluster resulting from the Gaussian mixture model, and the mean value was used for the analysis.
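For a single mixture component, the 95% containment check reduces to comparing the squared Mahalanobis distance against the chi-square threshold with 2 degrees of freedom; a sketch under that assumption:

```python
import numpy as np
from scipy.stats import chi2

def within_95(point, mean, cov):
    """Whether a 2D point falls inside the 95% region of one Gaussian
    component, via its squared Mahalanobis distance."""
    d = point - mean
    m2 = d @ np.linalg.solve(cov, d)
    return bool(m2 <= chi2.ppf(0.95, df=2))   # threshold ~5.99 for df = 2
```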

Statistics & reproducibility

The statistical analyses of the data and the reproducibility of experiments are detailed in the Nature Portfolio Reporting Summary linked to this article.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Supplementary information

Reporting Summary (1.6MB, pdf)

Source data

Source Data (467.7KB, zip)

Acknowledgements

This work was supported in part by the Japan Society for the Promotion of Science, KAKENHI [grant numbers 19H01084 and 21K12153].

Author contributions

Y.Y. conceived the study and designed the experiments. Y.Y., K.S. and M.N. conducted data collection. M.K. and K.S. conducted data selection and cleaning. Y.Y. conducted data analysis and wrote the manuscript. M.K. helped with data analysis and manuscript writing. M.O., K.N. and T.A. conducted recruitment and diagnoses of the participants. E.B. and J.L. provided feedback on the study. All authors reviewed and approved the manuscript.

Peer review

Peer review information

Nature Communications thanks Haocong Rao, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Data availability

The dementia datasets are not publicly available as they contain information that could compromise research participant privacy/consent; however, de-identified preprocessed data will be available subject to the approval of the Ethics Committee, University of Tsukuba Hospital. Researchers wishing to access the data should contact the authors, Yasunori Yamada (ysnr@jp.ibm.com) and Miho Ota (ota@md.tsukuba.ac.jp). The expected timeframe for processing access requests is typically three months, but depends on the complexity of the request and the schedule of the Ethics Committee. The requirements on data usage will be determined by the Ethics Committee. The CP dataset is publicly available at SimTK with the identifier 10.18735/j0rz-0k12. The PD dataset is publicly available at IEEE DataPort with the identifier 10.21227/g2g8-1503. Source data are provided with this paper.

Code availability

The codes central to the main claims are available in open-source repositories. For generating synthetic data, we used Generative GaitNet (https://github.com/namjohn10/GenerativeGaitNet)33 and TimeGAN (https://github.com/AlexanderVNikitin/tsgm/blob/main/tsgm/models/timeGAN.py)65. For supervised DNN model development, we used the following codes with no modification of model architectures and parameters except as specified in the Methods section: fully convolutional network (https://github.com/hfawaz/dl-4-tsc/blob/master/classifiers/fcn.py) and ResNet (https://github.com/hfawaz/dl-4-tsc/blob/master/classifiers/resnet.py)37; transformer (https://github.com/keras-team/keras-io/blob/master/examples/timeseries/timeseries_classification_transformer.py)38; and Mamba (https://github.com/thuml/Time-Series-Library/blob/main/models/Mamba.py), TimesNet (https://github.com/thuml/Time-Series-Library/blob/main/models/TimesNet.py), Non-stationary Transformer (https://github.com/thuml/Time-Series-Library/blob/main/models/Nonstationary_Transformer.py), and Informer (https://github.com/thuml/Time-Series-Library/blob/main/models/Informer.py)64. For the self-supervised DNN model, we used TS2Vec (https://github.com/zhihanyue/ts2vec)44 with default parameters except as specified in the Methods section. All experiments and implementation details are described in sufficient detail in the Methods section to support replication of this study.

Competing interests

Y.Y., M.K., K.S. and E.B. are employed by IBM Corporation. J.L. is employed by Cleveland Clinic. M.N., K.N. and T.A. received funding from the Japan Society for the Promotion of Science, KAKENHI [grant numbers 19H01084 (M.N., K.N., and T.A.) and 21K12153 (K.N.)]. The funder did not play any active role in the scientific investigation or reporting of the study. The other authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

The online version contains supplementary material available at 10.1038/s41467-025-61292-1.

References

  • 1. Mirelman, A. et al. Gait impairments in Parkinson's disease. Lancet Neurol. 18, 697–708 (2019).
  • 2. Pieruccini-Faria, F. et al. Gait variability across neurodegenerative and cognitive disorders: results from the Canadian Consortium of Neurodegeneration in Aging (CCNA) and the Gait and Brain Study. Alzheimers Dement. 17, 1317–1328 (2021).
  • 3. Del Din, S. et al. Gait analysis with wearables predicts conversion to Parkinson disease. Ann. Neurol. 86, 357–367 (2019).
  • 4. Wilson, J. et al. Gait progression over 6 years in Parkinson's disease: effects of age, medication, and pathology. Front. Aging Neurosci. 12, 577435 (2020).
  • 5. Gage, J. R., Schwartz, M. H., Koop, S. E. & Novacheck, T. F. The Identification and Treatment of Gait Problems in Cerebral Palsy (John Wiley & Sons, 2009).
  • 6. Kundrick, A. et al. Idiopathic NPH patients with worse pre-surgical walk test performances demonstrate the greatest improvement in performance post-VPS. Parkinsonism Relat. Disord. 113 (2023).
  • 7. Kadirvelu, B. et al. A wearable motion capture suit and machine learning predict disease progression in Friedreich's ataxia. Nat. Med. 29, 86–94 (2023).
  • 8. Skillbäck, T. et al. Slowing gait speed precedes cognitive decline by several years. Alzheimers Dement. 18, 1667–1676 (2022).
  • 9. Bahureksa, L. et al. The impact of mild cognitive impairment on gait and balance: a systematic review and meta-analysis of studies using instrumented assessment. Gerontology 63, 67–83 (2017).
  • 10. Yamada, Y. et al. Combining multimodal behavioral data of gait, speech, and drawing for classification of Alzheimer's disease and mild cognitive impairment. J. Alzheimers Dis. 84, 315–327 (2021).
  • 11. Mielke, M. M. et al. Assessing the temporal relationship between cognition and gait: slow gait predicts cognitive decline in the Mayo Clinic Study of Aging. J. Gerontol. A Biol. Sci. Med. Sci. 68, 929–937 (2013).
  • 12. Yu, H., Kang, K., Jeong, S. & Park, J. Deep vision system for clinical gait analysis in and out of hospital. In Neural Information Processing 633–642. 10.1007/978-3-030-36808-1_69 (2019).
  • 13. Kidziński, Ł. et al. Deep neural networks enable quantitative movement analysis using single-camera videos. Nat. Commun. 11, 4054 (2020).
  • 14. Hulleck, A. A., Menoth Mohan, D., Abdallah, N., El Rich, M. & Khalaf, K. Present and future of gait assessment in clinical practice: towards the application of novel trends and technologies. Front. Med. Technol. 4, 901331 (2022).
  • 15. Masanneck, L., Gieseler, P., Gordon, W. J., Meuth, S. G. & Stern, A. D. Evidence from ClinicalTrials.gov on the growth of digital health technologies in neurology trials. npj Digit. Med. 6, 1–5 (2023).
  • 16. Uhlrich, S. D., Uchida, T. K., Lee, M. R. & Delp, S. L. Ten steps to becoming a musculoskeletal simulation expert: a half-century of progress and outlook for the future. J. Biomech. 154, 111623 (2023).
  • 17. Zadka, A. et al. A wearable sensor and machine learning estimate step length in older adults and patients with neurological disorders. npj Digit. Med. 7, 142 (2024).
  • 18. Stenum, J., Hsu, M. M., Pantelyat, A. Y. & Roemmich, R. T. Clinical gait analysis using video-based pose estimation: multiple perspectives, clinical populations, and measuring change. PLOS Digit. Health 3, e0000467 (2024).
  • 19. Celik, Y., Stuart, S., Woo, W. L. & Godfrey, A. Gait analysis in neurological populations: progression in the use of wearables. Med. Eng. Phys. 87, 9–29 (2021).
  • 20. Ramesh, V. & Bilal, E. Detecting motor symptom fluctuations in Parkinson's disease with generative adversarial networks. npj Digit. Med. 5, 138 (2022).
  • 21. Uhlrich, S. D. et al. OpenCap: human movement dynamics from smartphone videos. PLoS Comput. Biol. 19, e1011462 (2023).
  • 22. Nguyen, S. et al. Accelerometer-measured physical activity and sitting with incident mild cognitive impairment or probable dementia among older women. Alzheimers Dement. 19, 3041–3054 (2023).
  • 23. Khan, A., Galarraga, O., Garcia-Salicetti, S. & Vigneron, V. Deep learning for quantified gait analysis: a systematic literature review. IEEE Access 138932–138957. 10.1109/ACCESS.2024.3434513 (2024).
  • 24. Chen, R. J., Lu, M. Y., Chen, T. Y., Williamson, D. F. K. & Mahmood, F. Synthetic data in machine learning for medicine and healthcare. Nat. Biomed. Eng. 5, 493–497 (2021).
  • 25. Chen, R. J. et al. Towards a general-purpose foundation model for computational pathology. Nat. Med. 30, 850–862 (2024).
  • 26. Gao, C. et al. Synthetic data accelerates the development of generalizable learning-based algorithms for X-ray image analysis. Nat. Mach. Intell. 5, 294–308 (2023).
  • 27. Zhao, P. et al. T-SMOTE: temporal-oriented synthetic minority oversampling technique for imbalanced time series classification. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence 2406–2412. 10.24963/ijcai.2022/334 (2022).
  • 28. Bicer, M., Phillips, A. T. M., Melis, A., McGregor, A. H. & Modenese, L. Generative deep learning applied to biomechanics: a new augmentation technique for motion capture datasets. J. Biomech. 144, 111301 (2022).
  • 29. Dorschky, E. et al. CNN-based estimation of sagittal plane walking and running biomechanics from measured and simulated inertial sensor data. Front. Bioeng. Biotechnol. 8 (2020).
  • 30. Filtjens, B. et al. Automated freezing of gait assessment with deep learning and data augmentation from simulated inertial measurement unit data. In 2023 IEEE 19th International Conference on Body Sensor Networks (BSN) 1–4. 10.1109/BSN58485.2023.10330987 (2023).
  • 31. Tung, H.-Y., Tung, H.-W., Yumer, E. & Fragkiadaki, K. Self-supervised learning of motion capture. Adv. Neural Inf. Process. Syst. 30 (2017).
  • 32. Tan, T., Shull, P. B., Hicks, J. L., Uhlrich, S. D. & Chaudhari, A. S. Self-supervised learning improves accuracy and data efficiency for IMU-based ground reaction force estimation. IEEE Trans. Biomed. Eng. 71, 2095–2104 (2024).
  • 33. Park, J. et al. Generative GaitNet. In ACM SIGGRAPH 2022 Conference Proceedings 1–9. 10.1145/3528233.3530717 (2022).
  • 34. Park, J., Park, M. S., Lee, J. & Won, J. Bidirectional GaitNet: a bidirectional prediction model of human gait and anatomical conditions. In ACM SIGGRAPH 2023 Conference Proceedings 1–9. 10.1145/3588432.3591492 (2023).
  • 35. Krishnan, R., Rajpurkar, P. & Topol, E. J. Self-supervised learning in medicine and healthcare. Nat. Biomed. Eng. 6, 1346–1352 (2022).
  • 36. Wang, Z., Yan, W. & Oates, T. Time series classification from scratch with deep neural networks: a strong baseline. In 2017 International Joint Conference on Neural Networks (IJCNN) 1578–1585. 10.1109/IJCNN.2017.7966039 (2017).
  • 37. Ismail Fawaz, H., Forestier, G., Weber, J., Idoumghar, L. & Muller, P.-A. Deep learning for time series classification: a review. Data Min. Knowl. Disc. 33, 917–963 (2019).
  • 38. Katrompas, A., Ntakouris, T. & Metsis, V. Recurrence and self-attention vs the transformer for time-series classification: a comparative study. In Artificial Intelligence in Medicine 99–109. 10.1007/978-3-031-09342-5_10 (2022).
  • 39. Wu, H. et al. TimesNet: temporal 2D-variation modeling for general time series analysis. In The Eleventh International Conference on Learning Representations (ICLR 2023) (2023).
  • 40. Liu, Y., Wu, H., Wang, J. & Long, M. Non-stationary transformers: exploring the stationarity in time series forecasting. Adv. Neural Inf. Process. Syst. 35, 9881–9893 (2022).
  • 41. Zhou, H. et al. Informer: beyond efficient transformer for long sequence time-series forecasting. Proc. AAAI Conf. Artif. Intell. 35, 11106–11115 (2021).
  • 42. Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).
  • 43. Gu, A. & Dao, T. Mamba: linear-time sequence modeling with selective state spaces. Preprint at 10.48550/arXiv.2312.00752 (2024).
  • 44. Yue, Z. et al. TS2Vec: towards universal representation of time series. Proc. AAAI Conf. Artif. Intell. 36, 8980–8987 (2022).
  • 45. Morris, R., Lord, S., Bunce, J., Burn, D. & Rochester, L. Gait and cognition: mapping the global and discrete relationships in ageing and neurodegenerative disease. Neurosci. Biobehav. Rev. 64, 326–345 (2016).
  • 46. Reynolds, D. Gaussian mixture models. Encycl. Biometrics 741, 659–663 (2009).
  • 47. McInnes, L., Healy, J., Saul, N. & Großberger, L. UMAP: uniform manifold approximation and projection. J. Open Source Softw. 3, 861 (2018).
  • 48. Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B. & Hochreiter, S. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Adv. Neural Inf. Process. Syst. 30 (2017).
  • 49. Wilson, J., Allcock, L., Mc Ardle, R., Taylor, J.-P. & Rochester, L. The neural correlates of discrete gait characteristics in ageing: a structured review. Neurosci. Biobehav. Rev. 100, 344–369 (2019).
  • 50. Goetz, C. G. et al. Movement Disorder Society-sponsored revision of the Unified Parkinson's Disease Rating Scale (MDS-UPDRS): scale presentation and clinimetric testing results. Mov. Disord. 23, 2129–2170 (2008).
  • 51. Sewell, K. R. et al. The relationship between objective physical activity and change in cognitive function. Alzheimers Dement. 19, 2984–2993 (2023).
  • 52. Ricotti, V. et al. Wearable full-body motion tracking of activities of daily living predicts disease trajectory in Duchenne muscular dystrophy. Nat. Med. 29, 95–103 (2023).
  • 53. Caggiano, V., Dasari, S. & Kumar, V. MyoDex: a generalizable prior for dexterous manipulation. Proc. 40th Int. Conf. Mach. Learn. 202, 3327–3346 (2023).
  • 54. Yuan, H. et al. Self-supervised learning for human activity recognition using 700,000 person-days of wearable data. npj Digit. Med. 7, 1–10 (2024).
  • 55. Brand, Y. E. et al. Self-supervised learning of wrist-worn daily living accelerometer data improves the automated detection of gait in older adults. Sci. Rep. 14, 20854 (2024).
  • 56. Adams, J. L. et al. A real-world study of wearable sensors in Parkinson's disease. npj Parkinsons Dis. 7, 1–8 (2021).
  • 57. Jiang, T. et al. RTMPose: real-time multi-person pose estimation based on MMPose. Preprint at 10.48550/arXiv.2303.07399 (2023).
  • 58. Folstein, M. F., Folstein, S. E. & McHugh, P. R. 'Mini-mental state'. A practical method for grading the cognitive state of patients for the clinician. J. Psychiatr. Res. 12, 189–198 (1975).
  • 59. Cao, Z., Simon, T., Wei, S.-E. & Sheikh, Y. Realtime multi-person 2D pose estimation using part affinity fields. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017) 7291–7299 (2017).
  • 60. Palisano, R. et al. Development and reliability of a system to classify gross motor function in children with cerebral palsy. Dev. Med. Child Neurol. 39, 214–223 (1997).
  • 61. Lee, J. et al. DART: dynamic animation and robotics toolkit. J. Open Source Softw. 3, 500 (2018).
  • 62. Zhu, W. et al. MotionBERT: a unified perspective on learning human motion representations. In 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 15039–15053. 10.1109/ICCV51070.2023.01385 (2023).
  • 63. Bagnall, A. et al. The UEA multivariate time series classification archive, 2018. Preprint at 10.48550/arXiv.1811.00075 (2018).
  • 64. Wang, Y. et al. Deep time series models: a comprehensive survey and benchmark. Preprint at 10.48550/arXiv.2407.13278 (2024).
  • 65. Yoon, J., Jarrett, D. & van der Schaar, M. Time-series generative adversarial networks. Adv. Neural Inf. Process. Syst. 32 (2019).
  • 66. Nikitin, A., Iannucci, L. & Kaski, S. TSGM: a flexible framework for generative modeling of synthetic time series. Adv. Neural Inf. Process. Syst. 37, 12042–129061 (2024).


Supplementary Materials

Reporting Summary (1.6MB, pdf)
Source Data (467.7KB, zip)


Articles from Nature Communications are provided here courtesy of Nature Publishing Group
