Abstract
Recovery after stroke is often incomplete, but rehabilitation training may potentiate recovery by engaging endogenous neuroplasticity. In preclinical models of stroke, high doses of rehabilitation training are required to restore functional movement to the affected limbs of animals. In humans, however, the dose of training necessary to potentiate recovery is not known. This ignorance stems from the lack of objective, pragmatic approaches for measuring training doses in rehabilitation activities. Here, to develop a measurement approach, we took the critical first step of automatically identifying functional primitives, the basic building blocks of activities. Forty-eight individuals with chronic stroke performed a variety of rehabilitation activities while wearing inertial measurement units (IMUs) to capture upper-body motion. Primitives were identified by human labelers, who labeled and segmented the associated IMU data. We performed automatic classification of these primitives using machine learning. We designed a convolutional neural network model that outperformed existing methods. The model includes an initial module to compute separate embeddings of the different physical quantities in the sensor data. In addition, it replaces batch normalization (which performs normalization based on statistics computed from the training data) with instance normalization (which uses statistics computed from the test data). This increases robustness to possible distributional shifts when applying the method to new patients. With this approach, we attained an average classification accuracy of 70%. Thus, using a combination of IMU-based motion capture and deep learning, we were able to identify primitives automatically. This approach builds towards objectively measured rehabilitation training, enabling the identification and counting of the functional primitives that accrue to a training dose.
1. Introduction
Stroke is the leading cause of disability in the United States, affecting nearly 1 million individuals annually and costing the US an estimated $240 billion (Go et al., 2014; Ovbiagele et al., 2013). Almost two-thirds of stroke patients have significant motor impairment in their upper extremities (UE), which limits their performance of activities of daily living (ADLs) like feeding, bathing, grooming, and dressing. Rehabilitation training, incorporating the repeated practice of ADLs, is the primary clinical intervention to reduce UE impairment. However, rehabilitation is increasingly believed to have a marginal impact on recovery because of its low numbers of functional repetitions, or training dose (Krakauer et al., 2012). In animal models, UE recovery is substantially improved by high-dose functional training delivered early after stroke (Murata et al., 2008; Jeffers et al., 2018). In humans, the optimal training dose to improve recovery is unknown, because no quantitative dose-response studies have been undertaken in the early weeks after stroke. The resulting vacuum of clinical guidelines has perpetuated the delivery of low and variable training doses (Lang et al., 2009).
A major reason for this failure is the absence of precise and pragmatic tools to measure training dose. Most rehabilitation studies use time-in-therapy to approximate dose (Lohse et al., 2018). Although one may intuit that more scheduled time equals more training repetitions, a linear relationship does not hold. In a seminal study observing standard rehabilitation practice, investigators found that the number of trained movements varied widely across clinicians and sessions (Lang et al., 2009), underscoring the imprecision of using time-in-therapy as a proxy for dose. Another approach for measuring dose is manual tallying, where a human observer identifies and counts motions of interest. Because functional motions are fluid and fast, they are difficult to disambiguate in real time. Video recordings aid scrutiny, but analysis is prohibitively time-intensive: in our experience, one minute of videotaped motion requires one hour of analysis by trained coders. This laboriousness makes manual tallying impractical for clinical or research deployment.
A third approach for measuring training dose is pairing motion capture technology with machine learning. Wearable devices such as inertial measurement units (IMUs) generate kinematic data about UE motions. Investigators decide on motions of interest (classes) that they wish to detect. Using a supervised approach, machine learning models can be trained to recognize classes of motions from their kinematic signatures (Parnandi et al., 2019). Once these motions are detected, they can be tallied to a dose.
Recent studies using this approach have sought to classify functional motion (e.g. tying shoelaces) and nonfunctional motion (e.g. arm swinging during walking) (McLeod et al., 2016; Bochniewicz et al., 2017; Leuenberger et al., 2017). In one, chronic stroke patients performed loosely-structured activities while wearing an IMU on their paretic wrist (Bochniewicz et al., 2017). From the IMU recordings, a random-forest model distinguished functional from nonfunctional motion with 70% accuracy. The resulting unit of measure was time spent in functional motion. While the classification performance of this approach is good, the resulting metric is nearly as problematic as measuring time-in-therapy: for example, did more time in functional motion correspond to the performance of more motions, or did it simply take longer to perform the same motions? What kinds of functional motions were made? Without knowing motion content, it is challenging to identify the relationship between repetitions and recovery, or to replicate a successful rehabilitation intervention.
In this work, we sought to address these limitations by taking the first step towards measuring rehabilitation dose. To unpack the motion content of rehabilitation, we focus on functional primitives, single motions or minimal-motions that serve a single purpose (Schambra et al., 2019). There are five classes of functional primitives: reach (motion to contact an object), transport (motion to convey an object), reposition (motion into proximity of an object), stabilize (minimal-motion to keep an object still), and idle (minimal-motion to stand at the ready). Rehabilitation activities can be successfully broken down into these constituent primitives, indicating that primitives are a useful unit of measure (Schambra et al., 2019). As a unit of measure, primitives thus provide motion content information that would inform a dose-response inquiry and the replication of an intervention. We further focus on primitives for three reasons. First, because primitives are a single motion event with a surprisingly consistent phenotype, even in stroke patients (Schambra et al., 2019), automated identification is facilitated. Second, because some stroke patients are unable to fully complete activities, primitives can provide a more nuanced picture of performance. Third, because primitives may be neurally hard-wired (Graziano, 2016; Ramanathan et al., 2006), measuring their execution may enable us to more precisely track central nervous system reorganization after stroke.
To develop an approach that identifies and counts functional primitives in a practical, automated manner, we paired sensor-based motion capture with supervised machine learning. We used an array of IMUs on the upper body to generate richly characterized motion data. We had stroke patients perform a battery of rehabilitation activities, which generated a large sample of primitives with varying characteristics (e.g. speed, duration, extent, location in space). Once the motion data were labeled, we trained various machine learning models to classify primitives. We report our steps for identifying the best-performing algorithm and for optimizing its classification performance. Our approach is illustrated in Figure 1.
Figure 1:
Diagram of the proposed approach for identification of functional primitives. Stroke patients perform a battery of rehabilitation activities while wearing IMU sensors. Machine-learning models are trained to classify the functional primitives from the sensor data.
Generalizable Insights about Machine Learning in the Context of Healthcare
In this work, we performed a systematic comparison of machine learning methods for the task of functional-primitive identification, and we propose a model that outperforms existing methodology. Our results suggest several insights that have the potential to generalize to other healthcare applications, especially those involving wearable sensors. First, deep learning methods that directly process the multivariate time series of sensor data appear to be significantly more effective than techniques based on handcrafted statistical features. Second, when combining data that represent different physical quantities, it may be helpful to map them to a common representation space by incorporating an initial module that produces a separate embedding for each quantity. Third, adaptive feature-normalization techniques, such as instance normalization, may increase the robustness of convolutional neural networks to shifts in the distribution of the data, which can occur when the models are applied to new patients. In contrast to batch normalization, which uses statistics computed on the training data, adaptive normalization uses statistics computed on the test data.
2. Related Work
To the best of our knowledge, only one previous study has used machine learning to identify functional primitives from IMU sensor data (Guerra et al., 2017). The authors used hidden Markov models to learn a latent representation of the sensor data, which was then used to perform classification via logistic regression. Their data were acquired from a few highly structured tasks, consisting primarily of moving objects to and from horizontal and vertical targets. Although this approach is useful for developing proof-of-concept methods, it does not reflect many of the challenges of real-world scenarios, where unstructured tasks generate more varied and complex motions. In the present work, we gather data from real-world rehabilitation activities. Modeling functional primitives in this setting requires more complex models such as deep neural networks. In addition, the previous work was based on a small number of mostly mildly impaired patients; the present work increases the sample size 8-fold and captures a wider range of impairment.
Activity recognition using data gathered with wearable sensors is an active area of research in machine learning. However, it is important to emphasize that recognizing activities does not address the problem of measuring rehabilitation dose. Activities are prolonged sequences of motions that achieve several goals (Schambra et al., 2019). Problematically, activities are not standardized: their motion content varies by individual, culture, and environment (Fisher et al., 1992; Teresi et al., 1989). For example, the motions undertaken to perform a cooking activity differ if the meal is breakfast or dinner, or Japanese or German. This variable motion content not only challenges the automated recognition of activities, it also limits the identification of a dose-response relationship and the reproducibility of interventions.
Although activity recognition does not serve dose quantitation, prior studies in this area offer computational directions for classifying patterns of motion. Initially, methodology was mostly based on statistical features processed with techniques such as random forests or fully-connected neural networks (e.g. Elvira et al. (2014); Kwapisz et al. (2011a)). More recently, deep learning methods have been applied to perform activity recognition without precomputing statistical features. Specifically, Wang et al. (2017) showed that a ResNet-style convolutional architecture outperformed traditional non-deep learning methods as well as fully convolutional networks on several activity-recognition datasets (Kwapisz et al., 2011b; Thammasat, 2013; Joshua and Varghese, 2014). Cui et al. (2016) demonstrated that a simple convolutional model performed well when trained on data sampled at multiple scales. Karungaru (2015), Oukrich et al. (2018) and Murad and Pyun (2017) successfully used recurrent networks like Long Short Term Memory (LSTM) and Bi-LSTM for activity recognition. However, Ha et al. (2015) found that convolutional neural networks may outperform recurrent networks for some tasks. Given these conflicting results, in this work we sought to determine the necessity of using statistical features and the performance of recurrent versus convolutional networks for classification of functional primitives.
3. Cohort
3.1. Cohort Selection
We collected motion data from 48 stroke patients in an inpatient rehabilitation setting. Individuals were included if they were ≥ 18 years old, had premorbid right-handed dominance, and had unilateral weakness from either ischemic or hemorrhagic stroke. Individuals were excluded if they had traumatic brain injury; any musculoskeletal or non-stroke neurological condition that interferes with the assessment of motor function; contracture at the shoulder, elbow, or wrist; moderate upper extremity dysmetria or truncal ataxia; visuospatial neglect; apraxia; global inattention; or legal blindness. Table 1 describes the demographic and clinical characteristics of the patients.
Table 1:
Demographic and clinical characteristics of the patients in the cohort.
| | Training set | Test set 1 | Test set 2 |
| --- | --- | --- | --- |
| n | 33 | 8 | 7 |
| Age (in years) | 56.3 (21.3–84.3) | 60.9 (42.6–84.3) | 58.3 (41.1–74.4) |
| Gender | 18 F : 15 M | 4 F : 4 M | 4 F : 3 M |
| Time since stroke (in years) | 6.5 (0.3–38.4) | 3.1 (0.4–5.7) | 3.16 (1.1–6.4) |
| Paretic side (Left : Right) | 18 L : 15 R | 4 L : 4 R | 3 L : 4 R |
| Stroke type (Ischemic : Hemorrhagic) | 30 I : 3 H | 8 I : 0 H | 2 I : 5 H |
| Fugl-Meyer Assessment score | 48.1 (26–65) | 49.4 (27–63) | 15.3 (8–23) |
Means are shown, with ranges in parentheses. The cohort is divided into a training set and a test set (Test set 1) of mildly and moderately impaired patients, and a test set of severely impaired patients (Test set 2). There is no overlap of patients between the training and test sets.
3.2. Data Acquisition and Labeling
The data were gathered while the patients performed activities of daily living that are commonly trained during stroke rehabilitation. The activities included: washing the face, applying deodorant, combing the hair, donning and doffing glasses, preparing and eating a slice of bread, pouring and drinking a cup of water, brushing teeth, and moving an object on horizontal and vertical target arrays. See Section A for a detailed description. The patients performed five repetitions of each activity.
Upper extremity motion was recorded using nine IMUs (Noraxon) attached to the upper body, specifically to the cervical vertebra C7, the thoracic vertebra T12, the pelvis, and both arms, forearms, and hands. Each IMU samples linear acceleration, angular velocity, and magnetic heading at 100 Hz. These data are then converted to 9 sensor-centric unit quaternions, representing the rotation of each sensor on its own axes, using coordinate transformation matrices. In addition, proprietary software (Myomotion, Noraxon) generates 22 anatomical angle values using a rigid-body skeletal model scaled to the patient’s height and UE segment lengths. See Section B for a detailed description of these angles. This results in a 76-dimensional vector containing the linear acceleration, quaternion, and joint-angle information. As additional features, we included the time elapsed from the start of the activity in seconds and the paretic side of the patient (left or right) encoded in a one-hot vector. This increases the dimension of the feature vector to 78. Each entry (except the one indicating the paretic side) was mean-centered and normalized separately for each task repetition in order to remove spurious offsets introduced during sensor calibration.
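As a concrete sketch of this preprocessing step, the per-repetition centering and scaling could be implemented as follows (a minimal sketch, assuming a (time × feature) array layout and that the paretic-side indicator occupies the last column; both are assumptions for illustration):

```python
import numpy as np

def normalize_repetition(x, paretic_col=77):
    """Mean-center and scale each feature of one task repetition.

    x: (T, 78) feature time series for a single repetition. Every
    column except the paretic-side indicator (assumed here to be the
    last one) is centered and divided by its standard deviation,
    removing spurious offsets from sensor calibration.
    """
    x = x.astype(np.float64).copy()
    cols = np.arange(x.shape[1]) != paretic_col
    mu = x[:, cols].mean(axis=0)
    sigma = x[:, cols].std(axis=0) + 1e-8  # guard against constant channels
    x[:, cols] = (x[:, cols] - mu) / sigma
    return x
```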
In order to label the data, motion was synchronously captured using two cameras (1088 × 704 pixels, 60 frames per second; Ninox, Noraxon) placed orthogonally < 2 m from the patient. Trained labelers watched the videos to identify and label functional primitives, which simultaneously labeled the primitives in the IMU data. The labelers were trained on the functional motion taxonomy (Schambra et al., 2019). Two labelers were assigned to each video, and a third labeler resolved any disagreements. Inter-rater reliability between labelers was high (Cohen's kappa 0.93–0.97).
3.3. Evaluation Protocol
An important consideration when evaluating methodology for classification of functional primitives is the level of impairment of the patients. Impairment level was assessed using the upper extremity Fugl-Meyer Assessment (FMA), where a higher score indicates less impairment (the maximum score is 66) (Fugl-Meyer et al., 1975). We separated the patients into three levels of impairment according to their FMA score: mild (FMA 53–65), moderate (FMA 26–52), and severe (FMA 0–25) (Woodbury et al., 2013). In order to evaluate our methodology, we assigned the patients to a training set containing 33 mildly and moderately impaired patients, a test set containing 8 mildly and moderately impaired patients (Test set 1)1, and an additional test set containing 7 severely impaired patients (Test set 2). Table 1 describes the characteristics of these datasets. Our first goal was to test the methods on patients with a similar impairment level as those used for training. Our second goal was to evaluate the generalizability of the trained model to patients with worse impairment. To avoid any selection bias or data leakage, the training and test sets were constructed before training any models, and model selection was carried out via cross-validation based exclusively on the training set (see Section 5 for more details).
4. Methodology
Our goal in this work was to design a machine learning model for the identification of functional primitives from sensor data. We framed this as a classification problem, where the input to the model was a window of the multidimensional time series obtained from the IMU sensors (see Section 3.2 for a detailed description), and the output was an estimate of the primitive corresponding to the center of the window. Note, however, that a significant portion of the window could contain motion corresponding to other functional primitives. The duration of the window was set to 2 seconds in order to provide sufficient context to the model from the time steps flanking the center of the window (shorter windows yielded inferior results in preliminary experiments). Window extraction yielded 177,917, 64,284, and 174,025 two-second windows in the training set, Test set 1, and Test set 2, respectively. In this section, we describe two key modifications to standard convolutional neural network architectures: learned embeddings that map each sensor input to a common representation, and adaptive normalization of the network features. These modifications yield a model that outperformed existing techniques for primitive identification, as demonstrated by the results reported in Section 6.
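A minimal sketch of the window extraction is shown below; the unit stride of one 10 ms time step is an assumption (Section 7 mentions predictions at 10 ms granularity), as is the exact boundary handling:

```python
import numpy as np

def extract_windows(features, labels, fs=100, win_sec=2.0):
    """Pair each 2-second window with the label at its center.

    features: (T, 78) time series for one repetition; labels: (T,)
    primitive IDs. A stride of one time step (10 ms) is assumed.
    """
    w = int(fs * win_sec)                   # 200 samples per window
    half = w // 2
    windows, targets = [], []
    for center in range(half, len(features) - half):
        windows.append(features[center - half:center + half])
        targets.append(labels[center])      # primitive at the window center
    return np.stack(windows), np.array(targets)
```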
4.1. Learning Embeddings for Diverse Inputs
Each layer in a convolutional neural network (CNN) computes local linear combinations of the outputs of the previous layer, weighted by the coefficients of several convolutional filters. As a result, when we apply a CNN to the multivariate time series representing the IMU data, the different entries in the time series are combined at the second layer. This may be problematic because the entries represent very different kinematic information, such as accelerations, quaternions, and joint angles. To address this issue, we mapped each entry separately to a common representation space. The mapping was implemented using multiple embedding modules, each consisting of several convolutional layers and processing one of the entries in the time series. The embeddings were then concatenated and fed to a CNN, and the embedding modules were optimized jointly with the CNN. Figure 2 shows a diagram of our proposed approach. A related previous work by Yao et al. (2017) proposed computing embeddings in the frequency domain.
Figure 2:
Diagram of the proposed approach to process multivariate time series where each entry may represent a different physical quantity. The ith entry is fed to an embedding module (denoted by Ei) consisting of several blocks of convolutional layers with DenseNet-like connections (Huang et al., 2017); "×5" refers to the number of convolutional blocks, and the red, blue, and purple arrows represent skip connections that are combined using addition. The weights of each module are not shared, so that they can be calibrated to the corresponding physical quantity. We use 78 modules of five convolutional blocks with 32 kernels to map each of the 78 inputs, which results in a total dimension of 200 × (32 · 78). The embedded inputs are then fed to a CNN to perform the predictions.
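To make the design concrete, here is a minimal PyTorch sketch of per-input embedding modules. It is an illustration, not the exact architecture: plain convolutional blocks stand in for the DenseNet-style blocks of Figure 2, and the kernel size is an assumption.

```python
import torch
import torch.nn as nn

class InputEmbedding(nn.Module):
    """Separate convolutional embedding for each of the 78 inputs.

    Simplified sketch: the DenseNet-like skip connections within each
    module are omitted, and weights are not shared across inputs.
    """
    def __init__(self, n_inputs=78, n_kernels=32, n_blocks=5):
        super().__init__()
        def block(c_in):
            return nn.Sequential(
                nn.Conv1d(c_in, n_kernels, kernel_size=3, padding=1),
                nn.ReLU(),
            )
        self.embeddings = nn.ModuleList([
            nn.Sequential(block(1),
                          *[block(n_kernels) for _ in range(n_blocks - 1)])
            for _ in range(n_inputs)
        ])

    def forward(self, x):
        # x: (batch, 78, 200). Each channel is embedded separately and the
        # results are concatenated along channels -> (batch, 32 * 78, 200).
        return torch.cat(
            [emb(x[:, i:i + 1]) for i, emb in enumerate(self.embeddings)],
            dim=1)
```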
4.2. Robust Generalization via Adaptive Feature Normalization
In order to develop models that can be deployed in realistic rehabilitation settings, it is critical to ensure that they generalize accurately to new patients not present in the training set. This is challenging due to varying impairment levels and movement idiosyncrasies, which may produce systematic differences between the training data and the data from new patients. Achieving robustness to systematic shifts between the training and test data is a fundamental challenge in modern machine learning, particularly in healthcare applications. In the case of CNNs, recent work by Kaku et al. (2020) suggests that batch normalization may be particularly sensitive to such shifts.
Batch normalization has become a standard element in CNNs because it provides stability to different initializations and learning rates (Ioffe and Szegedy, 2015). It consists of two operations applied at the end of each layer. First, the features corresponding to each convolutional filter in the layer are centered and normalized using an approximation to their mean and standard deviation. Second, the resulting normalized features are scaled and shifted using two learned parameters per filter (a scaling factor and a shift). When the CNN is being trained, the estimates of the mean and standard deviation are obtained by averaging over each batch of examples. Simultaneously, estimates of the population mean and standard deviation of each filter are computed via running averages. The population statistics are used to perform normalization at test time. However, if the distributions of the training and test data differ, then these statistics may not center and normalize the data adequately, as demonstrated by Kaku et al. (2020).
Following Kaku et al. (2020), we applied CNN models to perform identification of functional primitives by replacing batch normalization with instance normalization, a normalization technique originally proposed to promote style invariance in image-style transfer by Ulyanov et al. (2016). In instance normalization, the features for each convolutional filter are centered and normalized using means and standard deviations that are computed over each individual example both at training and test time. This avoids the mismatch of training and test statistics that may occur with batch normalization.
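The substitution can be sketched as follows (a minimal sketch; the kernel size and block structure are assumptions, not the exact architecture):

```python
import torch.nn as nn

def conv_block(c_in, c_out, adaptive=True):
    """Convolutional block with the normalization layer swapped out.

    With adaptive=True, instance normalization computes the mean and
    standard deviation of each feature map over the individual example,
    at both training and test time. With adaptive=False, batch
    normalization applies running statistics estimated on the training
    data at test time, which may mismatch a shifted test distribution.
    """
    norm = (nn.InstanceNorm1d(c_out, affine=True) if adaptive
            else nn.BatchNorm1d(c_out))
    return nn.Sequential(
        nn.Conv1d(c_in, c_out, kernel_size=3, padding=1),
        norm,
        nn.ReLU(),
    )
```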
5. Computational Experiments
The goal of our computational experiments was to compare the performance of different machine learning methods for identification of functional primitives, and to test our proposed approach. Inspired by the existing literature on movement identification from sensor data, we applied techniques based on statistical features (random forests and fully-connected neural networks), convolutional neural networks, and recurrent neural networks. As explained in Section 4, we framed primitive identification as a classification problem, where time-series windows were assigned to five different classes. We carried out model selection and hyperparameter optimization via cross-validation exclusively on the training set of 33 patients (see Section 3.3). To this end, we performed four different random splits. Each split contained 24 or 25 patients for training, and 9 or 8 patients for validation. Each patient appears in exactly one validation set. During validation, the models were compared using average classification accuracy2 across the four splits. Section C reports the validation results. For each of the methods described below, we selected the hyperparameters achieving the highest cross-validation accuracy. Then, fixing those hyperparameters, we evaluated an ensemble of the models corresponding to the different splits on the two test sets. The ensemble was computed by averaging the estimated probabilities produced by each model (this resulted in a small improvement in accuracy with respect to the validation results for all methods).
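As an illustration, the ensembling step can be sketched as follows (a hypothetical helper; `models` holds the four networks trained on the different cross-validation splits):

```python
import torch

def ensemble_probabilities(models, windows):
    """Average the class-probability estimates across the models
    trained on the four cross-validation splits."""
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(windows), dim=-1) for model in models])
    return probs.mean(dim=0)  # (batch, 5): one probability per primitive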
In the remainder of this section we describe the hyperparameters of the different machine-learning methods in more detail. All neural-network models were trained using the Adam optimizer (Kingma and Ba, 2014) with a starting learning rate of 1.25 × 10−4, which was divided by two every 20 epochs for the fully-connected and convolutional networks, and every 10 epochs for the recurrent networks. Training was terminated via early stopping based on the validation accuracy.
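This optimization setup corresponds to the following sketch (the placeholder model is for illustration only):

```python
import torch

model = torch.nn.Linear(78, 5)  # placeholder for any of the networks below
optimizer = torch.optim.Adam(model.parameters(), lr=1.25e-4)
# Halve the learning rate every 20 epochs (every 10 for the recurrent nets).
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)
```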
Random forest:
The input to the random forest models was a set of five handcrafted statistics computed over each dimension of the 78-dimensional windows: mean, maximum, minimum, standard deviation, and root mean square. These features capture useful information for movement identification, such as the energy of the motion and the variations within the window (Kwapisz et al., 2011a; Guerra et al., 2017). We used the scikit-learn random forest implementation (Pedregosa et al., 2011).
Hyperparameters:
Minimum number of examples required to split each internal node, and minimum number of samples required to be at a leaf. The selected values were 2 and 1 respectively.
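A minimal sketch of the feature extraction described above and the selected random-forest configuration (the array shapes are assumptions):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def handcrafted_features(windows):
    """Five statistics per dimension: mean, maximum, minimum,
    standard deviation, and root mean square.

    windows: (N, 200, 78) -> features: (N, 78 * 5).
    """
    return np.concatenate([
        windows.mean(axis=1),
        windows.max(axis=1),
        windows.min(axis=1),
        windows.std(axis=1),
        np.sqrt((windows ** 2).mean(axis=1)),  # root mean square
    ], axis=1)

# Selected hyperparameter values from cross-validation.
clf = RandomForestClassifier(min_samples_split=2, min_samples_leaf=1)
```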
Fully-connected neural network:
The input to the fully-connected neural network was the same set of five handcrafted statistics as for the random forest.
Hyperparameters:
Number of layers, number of neurons per layer, and dropout rate. The selected values were 8, 900, and 0.5 respectively.
Recurrent neural network:
We used one of the most popular recurrent architectures, Long Short Term Memory (LSTM). Preliminary experiments with a Bi-LSTM architecture yielded inferior performance. The LSTM received the windows of the multivariate time series directly as an input.
Hyperparameters:
Dimensionality of hidden units. The selected value was 4000.
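A minimal sketch of such an LSTM classifier follows; the linear classification head and the use of the final hidden state are assumptions, since the paper does not specify them.

```python
import torch.nn as nn

class PrimitiveLSTM(nn.Module):
    """LSTM that processes the sensor windows directly; the hidden
    dimension follows the selected hyperparameter value."""
    def __init__(self, n_features=78, hidden_dim=4000, n_classes=5):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_classes)

    def forward(self, x):             # x: (batch, 200, 78)
        _, (h_n, _) = self.lstm(x)    # h_n: (1, batch, hidden_dim)
        return self.head(h_n[-1])     # logits for the five primitives
```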
Convolutional neural network (CNN):
As in the case of the LSTM, we used CNNs to process the time-series window directly. We chose two architectures with skip connections similar to the ResNet (Wang et al., 2017) and the DenseNet (Huang et al., 2017). Preliminary experiments with an AlexNet-style architecture (Le Guennec et al., 2016) yielded worse performance. In order to evaluate the effect of input embeddings and adaptive feature normalization (see Section 4), we performed an ablation analysis where we trained the four possible combinations of these design choices for each model (with/without input embedding, with batch normalization/instance normalization). The depth of all networks was set to 44 layers. The architectures are described in detail in Section D.
6. Results
Table 2 shows the results of the different machine learning methods on the two test sets. The results correspond to the representative of each method that achieved the best cross-validation accuracy, as described in Section 5. To account for the different frequencies of each primitive in the data, we report balanced accuracy, defined as the average of the classification accuracies for each primitive3. The results without taking primitive frequency into account are very similar (see Table 9). Our main conclusion is that identification of functional primitives from IMU-sensor data via machine learning is possible: the deep learning methods achieved between 64% and 70% balanced accuracy on mildly and moderately impaired patients who do not appear in the training or validation data (Test set 1). On severely impaired patients (Test set 2), the average balanced accuracy was lower, between 38% and 44%. Decreased performance is expected because the training data only include mildly and moderately impaired patients. Severely impaired patients not only had altered motion characteristics relative to less impaired patients, but their motions were also ascertained using modified activities adapted to their impairment level, as described in Section A. Figure 3 shows the results of the ensemble model on the individual patients in the two test sets. The balanced accuracy was above 60% for all mildly and moderately impaired patients, and above 50% for three out of the seven severely impaired patients (for comparison, random assignment yields 20% accuracy).
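For reference, the balanced accuracy used throughout this section coincides with scikit-learn's macro-averaged recall (the toy labels below are for illustration only):

```python
from sklearn.metrics import balanced_accuracy_score

y_true = [0, 0, 1, 2, 3, 4]  # toy ground-truth primitive IDs
y_pred = [0, 1, 1, 2, 3, 4]  # toy predictions
# Mean of the per-primitive accuracies c_i / n_i (see footnote 3).
print(balanced_accuracy_score(y_true, y_pred))
```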
Table 2:
Balanced accuracy on Test set 1 and Test set 2 of the machine-learning models described in Section 5.
Mildly and moderately impaired patients (Test set 1)

| Method | Random forest | FCNN | CNN | LSTM | Proposed | Ensemble |
| --- | --- | --- | --- | --- | --- | --- |
| Balanced accuracy | 52.98 | 58.04 | 64.01 | 66.58 | 69.21 | 70.11 |

Severely impaired patients (Test set 2)

| Method | Random forest | FCNN | CNN | LSTM | Proposed | Ensemble |
| --- | --- | --- | --- | --- | --- | --- |
| Balanced accuracy | 32.95 | 36.60 | 38.22 | 41.76 | 43.50 | 44.39 |
FCNN denotes fully connected neural network. Here, CNN denotes a DenseNet-style convolutional model without input embeddings and with batch normalization, as illustrated in Figure 9. The proposed model is a ResNet-style convolutional model with input embeddings and instance normalization, as illustrated in Figure 8. The ensemble is a combination of the proposed model and the LSTM, where the output probabilities were averaged.
Figure 3:
Balanced accuracy of the ensemble model applied to the patients in Test set 1 and Test set 2, plotted as a function of their impairment level (quantified by FMA score). The range of impairment levels of the patients in the training set is indicated by the dark-colored background. The balanced accuracy was above 60% for all mildly/moderately impaired patients, and above 50% for three out of the seven severely impaired patients (for comparison, random assignment yields 20% accuracy).
Deep learning methods, which process the multivariate time series directly, systematically outperformed the techniques based on handcrafted statistical features (the random forest and the FCNN) by 12–15% (see Table 2). Among the baseline deep learning methods, the recurrent network (LSTM) produced better results than the convolutional network. The best results overall were achieved by our proposed model, a convolutional network that incorporates input embeddings and instance normalization, as described in Section 4. An ensemble of this network and the LSTM, computed by averaging their outputs, produced a slight further improvement. In Table 3 we show the results of an ablation analysis evaluating the individual contributions of input embeddings and instance normalization for two different convolutional architectures. For both architectures, the input embeddings and the adaptive normalization independently increased accuracy by 2–3%. When combined, the increase was 4–5%. The same trend was observed during validation on the different cross-validation folds, as shown in Table 15.
Table 3:
Ablation analysis evaluating the individual contributions of input embeddings and instance normalization for two different convolutional architectures, described in more detail in Section D.
| Architecture | ResNet (BN) | ResNet (IN) | DenseNet (BN) | DenseNet (IN) |
| --- | --- | --- | --- | --- |
| Input embedding | 66.57 | 69.21 | 65.78 | 68.11 |
| No input embedding | 63.50 | 66.12 | 64.01 | 66.66 |
The entries indicate the balanced accuracy of the different models on Test set 1 (mildly and moderately-impaired patients). BN denotes batch normalization, and IN denotes instance normalization. For both architectures the input embeddings and the adaptive normalization independently increased accuracy by 2–3%. When combined, the increase was 4–5%.
Figure 4 shows the confusion matrices of several of the methods on Test set 1. The different models had similar error patterns, indicating that some primitive pairs are inherently more difficult to distinguish. Some of these errors, e.g. between reach and transport or idle and stabilize, may result from the lack of grasp information in the data. The proposed model had the highest accuracy for all primitives except idle. The ensemble model, combining the proposed model with the LSTM, improved accuracy on the idle and transport primitives, but also decreased it slightly for reach and reposition. Figure 5 displays the probability estimates generated by the ensemble model applied to Test set 1 in the form of letter-value plots, or Boxen plots (Hofmann et al., 2017). These plots show the quantiles of the probabilities; the middle line corresponds to the median. At least half of the probabilities assigned to the correct primitive (green Boxen plots) are above 0.6. In contrast, the vast majority of the probabilities corresponding to other primitives (red Boxen plots) are below 0.6. This suggests that the probability estimate produced by the model is informative about its accuracy.
Figure 4:
Confusion matrices of the fully connected neural network (FCNN), the LSTM, the proposed model, and an ensemble of the proposed model and the LSTM on Test set 1 (mildly and moderately impaired patients). Each entry indicates what fraction of windows labeled with True label were assigned to Predicted label by each model. The models had similar error patterns, indicating that some primitive pairs are inherently more difficult to distinguish (e.g. reach and transport, idle and stabilize, stabilize and transport).
Figure 5:
Letter-value plots or Boxen plots (Hofmann et al., 2017) of the probability estimates generated by the ensemble model for the different primitives. The Boxen plots corresponding to the ground-truth primitive are colored in green, the rest are colored in red. At least half of the probabilities assigned to the ground-truth primitive are above 0.6. In contrast, the vast majority of the probabilities corresponding to other primitives are less than 0.6. This suggests that the probability estimate produced by the model is informative about its accuracy.
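Plots of this kind can be produced with seaborn's boxenplot; the sketch below uses toy values, and the data-frame layout is an assumption:

```python
import pandas as pd
import seaborn as sns

# One row per (window, primitive) pair: the model's probability estimate
# and whether that primitive is the ground truth. Toy values only.
df = pd.DataFrame({
    "primitive": ["reach", "reach", "idle", "idle"],
    "probability": [0.82, 0.10, 0.65, 0.21],
    "ground_truth": [True, False, True, False],
})
sns.boxenplot(data=df, x="primitive", y="probability", hue="ground_truth")
```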
Recall that we frame primitive identification as a classification problem, where the input is a 2-second window of sensor data and the label is the primitive associated with the center of the window, which we dub the ground-truth primitive. A significant fraction of the window may contain different primitives, which is a potential source of errors. The machine learning models may be fooled by the other primitives and fail to detect the ground-truth primitive. Figure 6 shows the composition of windows in Test set 1, separated depending on whether they were classified correctly or incorrectly. Incorrectly classified windows tend to contain a smaller fraction of the ground-truth primitive, but the difference between the histograms (compare a and b) is not very pronounced. This suggests that the ensemble model was relatively robust to the presence of additional primitives. In fact, more than two thirds of the windows that were correctly classified contained additional primitives.
Figure 6:
Histograms of the composition of the windows in Test set 1, examining the relationship between the classification results and the labels of the surrounding time steps. The ground-truth primitive of a window is defined as the label associated with the center of the window. The predicted primitive is the estimate produced by the ensemble model. (a) Percentage of time steps in each window associated with the ground-truth primitive for correctly-classified instances. (b) Percentage of time steps in the window associated with the ground-truth primitive for incorrectly-classified instances. (c) Percentage of time steps in each window associated with the predicted primitive for incorrectly-classified instances. Incorrectly-classified windows tend to contain a smaller fraction of the ground-truth primitive, but the difference between the histograms (compare a and b) is not very pronounced. This suggests that the ensemble model was relatively robust to the presence of additional primitives. In fact, more than two thirds of the windows that were correctly classified contained additional primitives. Among windows that were classified incorrectly, 45% of them did not contain the predicted primitive (c).
7. Discussion
This study demonstrates that deep learning can be used to identify functional primitives from IMU sensor data, which is an important step towards developing quantitative approaches for measuring stroke rehabilitation. It also suggests that input embeddings and adaptive feature normalization may contribute to address two challenges arising in many healthcare applications of machine learning: processing data containing different physical quantities, and ensuring robustness to distributional shifts during inference.
The classification performance of our approach exceeds random chance (20%) and is comparable to the accuracy (70%) of an approach that dichotomizes motion into time spent in functional versus nonfunctional motion (Bochniewicz et al., 2017). Importantly, our approach identifies the content of functional motion, i.e. functional primitives, that will serve as the basis for detailed rehabilitation measurement. Still, we anticipate that additional gains in classification performance can be made. We observed that the models had difficulty distinguishing reaches from transports, and idles from stabilizations. These primitives differ by the presence and timing of grasp (Schambra et al., 2019), indicating that our current IMU array does not communicate this level of detail. However, affixing additional IMUs to paretic fingers would hinder hand function and further limit the practical utility of the approach. A recently developed computer vision model may offer a solution: it can extract finger position from video recordings (Cao et al., 2018). This new capability could enable us to use our existing video dataset to retrieve information about grasp. Future work will test whether combining kinematic information from IMUs and cameras can effectively boost classification accuracy.
Our study has some limitations to be considered. We studied only right-dominant patients balanced for right and left paresis. This step was necessary to simplify classification. Hand dominance may have a differential influence on the preferential roles of the UEs and their kinematic signatures (Przybyla et al., 2012). As the majority of humans are right-dominant, the proposed approach would be applicable to most patients. In the future, the inclusion of left-dominant patients for training and testing would enable us to build a more universal tool.
Another limitation of our approach is that classification performance deteriorates significantly for severely impaired patients, which means that it cannot yet be safely generalized to this cohort. This observation opens up two avenues for future work. First, by gathering larger datasets, we will be able to include more severely impaired patients when training our models. Second, we will aim to develop machine-learning methodology capable of generalizing more robustly across levels of impairment.
Finally, we performed primitive identification at a high temporal granularity (time steps of 10 ms). Future work will focus on converting these predictions into a sequence of estimated primitives. This would enable the next step in our approach: automatically counting primitives once they have been recognized.
In summary, we present an approach that combines the kinematic data from IMUs with optimized deep learning models to identify the functional primitives that constitute rehabilitation activities. We envision that once classification performance is maximized across the range of impairment levels, the trained model can be deployed in a clinical setting. There, patients instrumented with IMUs will undergo rehabilitation, generating unlabeled kinematic data. Using these data, the trained model will extract primitive content and count. This approach is expected to provide an objective means of quantitating the training dose of stroke rehabilitation. This measurement ability opens up a path for critical dose-response research and informed delivery of dosed rehabilitation, vital for improving recovery outcomes in stroke patients.
Acknowledgments
We would like to thank the volunteers who contributed to labeling the dataset: Ronak Trivedi, Adisa Velovic, Sanya Rastogi, Candace Cameron, Sirajul Islam, Bria Bartsch, Courtney Nilson, Vivian Zhang, Nicole Rezak, Christopher Yoon, Sindhu Avuthu, and Tiffany Rivera. We also thank Dawn Nilsen, OT EdD, for expert advice on the testing battery, and Audre Wirtanen for early assistance with the testing setup and data collection. This work was supported by an AHA postdoctoral fellowship 19AMTG35210398 (AP), NIH grants R01 LM013316 (AK, CFG, HMS) and K02 NS104207 (HMS), NSF NRT-HDR Award 1922658 (AK, CFG), and the Moore-Sloan Data Science Environment at NYU (AK).
Appendix A. Description of the Rehabilitation Activities
Tables 4 and 5 describe the activities performed by the mildly and moderately impaired stroke patients in the cohort. Tables 6 and 7 describe the activities performed by the severely impaired patients assigned to Test set 2.
Appendix B. Description of the Joint Angles
As described in Section 3.2, the sensor measurements are used to compute 22 anatomical angle values using a rigid-body skeletal model scaled to the patient’s height and segment lengths. Table 8 describes these joint angles in detail.
Appendix C. Additional Results
Table 9 shows the performance of the different machine-learning models on Test sets 1 and 2 measured using accuracy instead of balanced accuracy. We also provide the cross-validation results on the individual splits, together with the average validation accuracy, in Table 10 for the random-forest models, in Tables 11, 12 and 13 for the fully-connected neural network models, in Table 14 for the LSTM, and in Table 15 for the convolutional models.
Table 4:
Description of the activities performed by the mildly and moderately impaired patients in the cohort (1/2).
Activity | Workspace | Target object(s) | Instructions |
---|---|---|---|
Washing face | Sink with a small tub in it and two folded washcloths on either side of the countertop, 30 cm from edge closest to patient | Washcloths, faucet handle | Fill tub with water, dip washcloth on the right side into water, wring it, wiping each side of their face with wet washcloth, place it back on countertop. Use washcloth on the left side to dry face, place it back on countertop |
Applying deodorant | Tabletop with deodorant placed at midline, 25 cm from edge closest to patient | Deodorant | Remove cap, twist base a few times, apply deodorant, replace cap, untwist the base, put deodorant on table |
Hair combing | Tabletop with comb placed at midline, 25 cm from edge closest to patient | Comb | Pick up comb and comb both sides of head |
Don/doffing glasses | Tabletop with glasses placed at midline, 25 cm from edge closest to patient | Glasses | Wear glasses, return hands to table, remove glasses and place on table |
Eating | Table top with a standard-size paper plate (at midline, 2 cm from edge), utensils (3 cm from edge, 5 cm from either side of plate), a baggie with a slice of bread (25 cm from edge, 23 cm left of midline), and a margarine packet (32 cm from edge, 17 cm right of midline) | Fork, knife, re-sealable sandwich baggie, slice of bread, single-serve margarine container | Remove bread from plastic bag and put it on plate, open margarine pack and spread it on bread, cut bread into four pieces, cut off and eat a small bite-sized piece |
Appendix D. Convolutional Architectures
Figures 7 and 9 provide a detailed description of the baseline convolutional neural networks used in our experiments. Figures 8 and 10 show the modified architectures, which incorporate the input-embedding module described in Section 4.1.
Table 5:
Description of the activities performed by the mildly and moderately impaired patients in the cohort (2/2).
Activity | Workspace | Target object(s) | Instructions |
---|---|---|---|
Drinking | Tabletop with water bottle and paper cup 18 cm to the left and right of midline, 25 cm from edge closest to patient | Water bottle (12 oz), paper cup | Open water bottle, pour water into cup, take a sip of water, place cup on table, and replace cap on bottle |
Tooth brushing | Sink with toothpaste and toothbrush on either side of the countertop, 30 cm from edge closest to patient | Travel-sized toothpaste, toothbrush with built-up foam grip, faucet handle | Wet toothbrush, apply toothpaste to toothbrush, replace cap on toothpaste tube, brush teeth, rinse toothbrush and mouth, place toothbrush back on countertop |
Moving object on a horizontal surface | Horizontal circular array (48.5 cm diameter) of 8 targets (5 cm diameter) | Toilet paper roll | Move the roll between the center and each outer target, resting between each motion and at the end |
Moving object on/off a Shelf | Shelf with two levels (33 cm and 53 cm) with 3 targets on both levels (22.5 cm, 45 cm, and 67.5 cm away from the left-most edge) | Toilet paper roll | Move the roll between the center target and each target on the shelf, resting between each motion and at the end |
Figure 7:
Diagram of the ResNet-style convolutional network model.
Table 6:
Description of the activities performed by the severely impaired patients in the cohort (1/2).
| Activity | Workspace | Target object(s) | Instructions*: Proximal > Distal | Instructions*: Distal > Proximal |
| --- | --- | --- | --- | --- |
Washing face | Sink with a small tub in it and two folded washcloths on either side of the countertop, 30 cm from edge closest to patient | Washcloths, faucet handle | Reach to touch faucet knob. Place washcloth in paretic hand and bring to both sides of face. | Open and close faucet. Lift washcloth from basin and wring it out. |
Applying deodorant | Tabletop with deodorant placed at midline, 25 cm from edge closest to patient | Deodorant | Reach to touch deodorant. Place deodorant in paretic hand and bring to opposite armpit. | Lift deodorant for 3 seconds. From the horizontal position, rotate deodorant upright and return to original position. |
Hair combing | Tabletop with comb placed at midline, 25 cm from edge closest to patient | Comb | Reach to touch comb. Place comb in paretic hand and bring to both sides of head. | Lift comb for 3 seconds. |
Don/doffing glasses | Tabletop with glasses placed at midline, 25 cm from edge closest to patient | Glasses | Reach to touch glasses. | Lift glasses for 3 seconds. |
Eating | Table top with a standard-size paper plate (at midline, 2 cm from edge), utensils (3 cm from edge, 5 cm from either side of plate), a baggie with a slice of bread (25 cm from edge, 23 cm left of midline), and a margarine packet (32 cm from edge, 17 cm right of midline) | Fork, knife, re-sealable sandwich baggie, slice of bread, single-serve margarine container | Reach to touch each item separately on paretic side. Place fork in paretic hand and bring fork to mouth. | Lift each object on paretic side for 3 seconds. |
Instructions for the severely impaired patients were given based on the UE segment with greater preserved function. "Proximal > distal" indicates better strength in the proximal (i.e. deltoid, biceps, triceps) than the distal (i.e. hand) UE, which was typically paralyzed in these patients; the initial UE position was generally at the edge of the table/counter closest to the patient. "Distal > proximal" indicates the opposite distribution of strength; the initial UE position was adjacent to the target object. All testing was done on the paretic UE.
Table 7:
Description of the activities performed by the severely impaired patients in the cohort (2/2).
| Activity | Workspace | Target object(s) | Instructions*: Proximal > Distal | Instructions*: Distal > Proximal |
| --- | --- | --- | --- | --- |
Drinking | Tabletop with water bottle and paper cup 18 cm to the left and right of midline, 25 cm from edge closest to patient | Water bottle (12 oz), paper cup | Reach to touch object on paretic side. Reach across to touch object on non-paretic side. | Starting from upright position, lay object on paretic side horizontally, release, and return to upright. Perform same series of actions on the object on the non-paretic side. |
Tooth brushing | Sink with toothpaste and toothbrush on either side of the countertop, 30 cm from edge closest to patient | Travel-sized toothpaste on left, toothbrush with built-up foam grip on right, faucet handle | Reach to touch object on paretic side. Place toothbrush in paretic hand and bring it to mouth. | Lift object on paretic side for 3 seconds. |
Moving object on a horizontal surface | Horizontal circular array (48.5 cm diameter) of 8 targets (5 cm diameter) | Toilet paper roll (200 g) or can (200 g) | Investigator will assess if toilet paper roll can be grasped, or aluminum can if not. If grasp is possible, move roll/can between the center and each outer target, resting before and after each motion. If grasp is not possible, the toilet paper roll will be moved around the target array by the investigator and the patient will reach to touch it at each location. | Investigator will assess if toilet paper roll can be grasped, or aluminum can if not. If grasp is not possible, the toilet paper roll will be moved around the target array by the investigator and the patient will reach to touch it at each location. |
Moving object on/off a Shelf | Shelf with two levels (33 cm and 53 cm) with 3 targets on both levels (22.5 cm, 45 cm, and 67.5 cm away from the left-most edge) | Toilet paper roll (200 g) or can (200 g) | Investigator will assess if toilet paper roll can be grasped, or aluminum can if not. If grasp is possible, move roll/can between the center and each outer target, resting before and after each motion. If grasp is not possible, the toilet paper roll will be moved around the target array by the investigator and the patient will reach to touch it at each location. | Investigator will assess if toilet paper roll can be grasped, or aluminum can if not. If grasp is not possible, the toilet paper roll will be moved around the target array by the investigator and the patient will reach to touch it at each location. |
Table 8:
List of anatomical angles.
Joint/segment | Anatomical angle |
---|---|
| Shoulder | Shoulder flexion/extension; shoulder internal/external rotation; shoulder ad-/abduction; shoulder total flexion‡ |
| Elbow | Elbow flexion/extension |
| Wrist | Wrist flexion/extension; forearm pronation/supination; wrist radial/ulnar deviation |
| Thorax | Thoracic* flexion/extension; thoracic* axial rotation; thoracic* lateral flexion/extension |
| Lumbar | Lumbar† flexion/extension; lumbar† axial rotation; lumbar† lateral flexion/extension |
The system uses a rigid-body skeletal model to convert the IMU measurements into joint and segment angles.
‡ Shoulder total flexion is a combination of shoulder flexion/extension and shoulder ad-/abduction.
* Thoracic angles are computed between the cervical vertebra and the thoracic vertebra.
† Lumbar angles are computed between the thoracic vertebra and the pelvis.
Table 9:
Accuracy on Test set 1 and Test set 2 of the machine-learning models described in Section 5.
| | Random forest | FCNN | CNN | LSTM | Proposed | Ensemble |
| --- | --- | --- | --- | --- | --- | --- |
| Test set 1 | 59.66 | 62.43 | 64.98 | 68.21 | 70.67 | 71.87 |
| Test set 2 | 33.79 | 39.62 | 43.11 | 48.78 | 44.44 | 48.36 |
FCNN denotes fully connected neural network. The ensemble is a combination of the proposed model and the LSTM, where the output probabilities were averaged.
Table 10:
Validation accuracies for the random-forest models.
L | S | Fold 1 | Fold 2 | Fold 3 | Fold 4 | Average |
---|---|---|---|---|---|---|
1 | 2 | 56.23 | 54.77 | 57.97 | 57.21 | 56.55 |
1 | 3000 | 56.31 | 54.73 | 58.05 | 57.06 | 56.54 |
1 | 1500 | 56.27 | 54.80 | 58.08 | 56.99 | 56.53 |
1 | 120 | 56.22 | 54.58 | 58.07 | 57.06 | 56.48 |
1 | 700 | 56.29 | 54.72 | 57.92 | 56.95 | 56.47 |
1 | 300 | 56.36 | 54.58 | 57.86 | 56.92 | 56.43 |
1 | 50 | 56.19 | 54.56 | 57.92 | 56.82 | 56.37 |
5 | 120 | 56.27 | 54.57 | 57.80 | 56.82 | 56.37 |
5 | 300 | 56.23 | 54.62 | 57.73 | 56.80 | 56.34 |
1 | 20 | 56.06 | 54.48 | 57.93 | 56.88 | 56.34 |
We report the top 10 performing models. L denotes the minimum number of samples required to be at a leaf and S denotes the minimum number of examples required to split each internal node. We experimented with the following values: L = {1,5,20,50,150,400,800,1500}, S = {2,20,50,120,300,700,1500,3000}.
Table 11:
Validation accuracy for fully-connected neural network models with different number of layers, different dimensions of hidden units, and no dropout.
# of layers | Dim. of hidden units | Fold 1 | Fold 2 | Fold 3 | Fold 4 | Average |
---|---|---|---|---|---|---|
4 | 300 | 56.86 | 55.47 | 56.85 | 55.49 | 56.17 |
4 | 600 | 55.44 | 56.73 | 58.23 | 55.79 | 56.55 |
4 | 900 | 56.02 | 56.89 | 58.13 | 54.83 | 56.47 |
8 | 300 | 56.52 | 56.22 | 58.28 | 54.4 | 56.36 |
8 | 600 | 56.54 | 56.43 | 58 | 56.01 | 56.75 |
8 | 900 | 57.05 | 56.27 | 57.31 | 55.68 | 56.58 |
12 | 300 | 55.27 | 56.16 | 55.83 | 55.45 | 55.68 |
12 | 600 | 56.23 | 56.51 | 57.68 | 54.47 | 56.22 |
12 | 900 | 55.9 | 56.68 | 58 | 55.57 | 56.54 |
Table 12:
Validation accuracy for fully-connected neural network models with different number of layers, and different dimensions of hidden units. The dropout rate was set to 0.2 for all the models.
# of layers | Dim. of hidden units | Fold 1 | Fold 2 | Fold 3 | Fold 4 | Average |
---|---|---|---|---|---|---|
4 | 300 | 58.01 | 58.12 | 58.83 | 56.57 | 57.88 |
4 | 600 | 56.78 | 58.06 | 59.07 | 55.80 | 57.43 |
4 | 900 | 57.70 | 58.21 | 58.37 | 56.64 | 57.73 |
8 | 300 | 57.70 | 57.64 | 58.47 | 56.32 | 57.53 |
8 | 600 | 57.41 | 58.62 | 59.22 | 56.67 | 57.98 |
8 | 900 | 57.06 | 58.22 | 59.14 | 56.34 | 57.69 |
12 | 300 | 57.43 | 57.49 | 58.78 | 55.70 | 57.35 |
12 | 600 | 57.29 | 57.01 | 59.67 | 56.60 | 57.64 |
12 | 900 | 57.40 | 57.97 | 58.90 | 55.68 | 57.49 |
Table 13:
Validation accuracy for fully-connected neural network models with different number of layers, and different dimensions of hidden units. The dropout rate was set to 0.5 for all the models.
# of layers | Dim. of hidden units | Fold 1 | Fold 2 | Fold 3 | Fold 4 | Average |
---|---|---|---|---|---|---|
4 | 300 | 58.20 | 58.19 | 59.62 | 57.16 | 58.29 |
4 | 600 | 59.39 | 58.86 | 55.96 | 56.34 | 57.64 |
4 | 900 | 58.43 | 58.57 | 60.34 | 57.25 | 58.65 |
8 | 300 | 57.08 | 56.60 | 58.31 | 56.28 | 57.07 |
8 | 600 | 58.14 | 58.59 | 60.08 | 57.22 | 58.51 |
8 | 900 | 58.08 | 58.86 | 60.33 | 57.38 | 58.66 |
12 | 300 | 57.11 | 57.10 | 57.95 | 55.63 | 56.95 |
12 | 600 | 57.81 | 58.14 | 58.91 | 56.24 | 57.78 |
12 | 900 | 58.09 | 57.92 | 59.85 | 57.06 | 58.23 |
Table 14:
Validation accuracies for the LSTM models.
Dim. of hidden units | Fold 1 | Fold 2 | Fold 3 | Fold 4 | Average |
---|---|---|---|---|---|
400 | 57.4 | 61.78 | 61.68 | 59.43 | 60.07 |
1200 | 59.05 | 61.77 | 62.32 | 61.64 | 61.20 |
2000 | 61.57 | 62.85 | 64.38 | 60.95 | 62.44 |
2800 | 61.27 | 63.21 | 64.11 | 61.74 | 62.58 |
3600 | 59.79 | 62.07 | 64.18 | 63.46 | 62.38 |
4000 | 60.68 | 63.28 | 65.71 | 64.1 | 63.44 |
4500 | 60.81 | 63.55 | 65.19 | 62.76 | 63.08 |
Table 15:
Validation accuracies for the DenseNet-style and ResNet-style convolutional models (IN = instance normalization, BN = batch normalization).
DenseNet-style convolutional model | ||||||
Normalization | Fold 1 | Fold 2 | Fold 3 | Fold 4 | Average | |
Input embedding | IN | 64.82 | 65.91 | 69.14 | 66.65 | 66.63 |
Input embedding | BN | 62.34 | 65.72 | 66.07 | 63.98 | 64.53 |
No input embedding | IN | 62.71 | 63.64 | 65.69 | 61.19 | 63.30 |
No input embedding | BN | 57.95 | 60.97 | 63.47 | 58.39 | 60.19 |
ResNet-style convolutional model | ||||||
Normalization | Fold 1 | Fold 2 | Fold 3 | Fold 4 | Average | |
Input embedding | IN | 65.59 | 68.94 | 69.45 | 67.09 | 67.76 |
Input embedding | BN | 62.49 | 65.57 | 65.65 | 64.57 | 64.57 |
No input embedding | IN | 61.57 | 59.15 | 62.21 | 63.12 | 61.51 |
No input embedding | BN | 58.75 | 61.22 | 61.90 | 58.47 | 60.09 |
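Table 15 isolates the two design choices described in Section 4.1: instance versus batch normalization, and the presence of the input-embedding module that processes each physical quantity separately. A simplified sketch of how these choices can be expressed (a stand-in for illustration, not the paper's exact architecture; the channel grouping is an assumption):

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out, instance_norm=True):
    """1-D convolution followed by the normalization layer under comparison.
    InstanceNorm1d normalizes each input with its own statistics, so no
    training-set statistics are carried over to new patients."""
    norm = nn.InstanceNorm1d(c_out) if instance_norm else nn.BatchNorm1d(c_out)
    return nn.Sequential(nn.Conv1d(c_in, c_out, kernel_size=3, padding=1),
                         norm, nn.ReLU())

class InputEmbedding(nn.Module):
    """Embed each physical quantity in the IMU data (e.g. acceleration or
    orientation channels) separately before the shared convolutional trunk."""

    def __init__(self, channel_groups, emb_channels, instance_norm=True):
        super().__init__()
        self.groups = channel_groups  # one list of channel indices per quantity
        self.embeds = nn.ModuleList(
            conv_block(len(g), emb_channels, instance_norm) for g in channel_groups)

    def forward(self, x):
        # x: (batch, channels, time); concatenate the per-quantity embeddings
        return torch.cat([emb(x[:, g]) for g, emb in zip(self.groups, self.embeds)],
                         dim=1)
```

In both halves of the table, instance normalization and the input-embedding module each improve accuracy on their own, and their combination gives the best result.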
Figure 8:
Diagram of the ResNet-style convolutional network model incorporating the input-embedding module described in Section 4.1.
Figure 9:
Diagram of the DenseNet-style convolutional network model.
Figure 10:
Diagram of the DenseNet-style convolutional network model incorporating the input-embedding module described in Section 4.1.
Footnotes
The 41 mildly and moderately impaired patients were separated into eight subgroups, balanced for impairment level and paretic side (left or right). One patient in each subgroup was randomly assigned to the test set; the remaining patients were assigned to the training set.
To be clear, if c out of n windows are classified correctly, the classification accuracy equals c/n.
The accuracy for each primitive is defined as c_i/n_i, for i = 1, 2, …, 5, where n_i is the number of windows associated with the i-th primitive and c_i denotes how many of them were classified correctly. The balanced accuracy is the average of the accuracies of the five primitives.
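A short sketch of these two metrics (NumPy-based; the helper name is ours):

```python
import numpy as np

def window_accuracies(y_true, y_pred, n_primitives=5):
    """Return (overall accuracy c/n, balanced accuracy: mean of c_i/n_i)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    overall = np.mean(y_true == y_pred)
    per_class = [np.mean(y_pred[y_true == i] == i) for i in range(n_primitives)]
    return overall, np.mean(per_class)
```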
Contributor Information
Aakash Kaku, Center for Data Science, New York University.
Avinash Parnandi, Department of Neurology, New York University School of Medicine.
Anita Venkatesan, Department of Neurology, New York University School of Medicine.
Natasha Pandit, Department of Neurology, New York University School of Medicine.
Heidi Schambra, Department of Neurology, New York University School of Medicine.
Carlos Fernandez-Granda, Center for Data Science, Courant Institute of Mathematical Sciences, New York University.
References
- Bochniewicz Elaine M, Emmer Geoff, McLeod Adam, Barth Jessica, Dromerick Alexander W, and Lum Peter. Measuring functional arm movement after stroke using a single wrist-worn sensor and machine learning. Journal of Stroke and Cerebrovascular Diseases, 26(12):2880–2887, 2017.
- Cao Zhe, Hidalgo Gines, Simon Tomas, Wei Shih-En, and Sheikh Yaser. Openpose: realtime multi-person 2d pose estimation using part affinity fields. arXiv preprint arXiv:1812.08008, 2018.
- Cui Zhicheng, Chen Wenlin, and Chen Yixin. Multi-scale convolutional neural networks for time series classification. arXiv preprint arXiv:1603.06995, 2016.
- Elvira Victor, Nazabal-Renteria Alfredo, and Artes-Rodriguez Antonio. A novel feature extraction technique for human activity recognition. 2014 IEEE Workshop on Statistical Signal Processing (SSP), 2014. doi: 10.1109/ssp.2014.6884604.
- Fisher Anne G, Liu Yihfen, Velozo Craig A, and Pan Ay Woan. Cross-cultural assessment of process skills. American Journal of Occupational Therapy, 46(10):876–885, 1992.
- Fugl-Meyer Axel R, Jääskö L, Leyman Ingegerd, Olsson Sigyn, and Steglind Solveig. The post-stroke hemiplegic patient. 1. a method for evaluation of physical performance. Scandinavian journal of rehabilitation medicine, 7(1):13–31, 1975.
- Go Alan S, Mozaffarian Dariush, Roger Véronique L, Benjamin Emelia J, Berry Jarett D, Blaha Michael J, Dai Shifan, Ford Earl S, Fox Caroline S, Franco Sheila, et al. Executive summary: heart disease and stroke statistics—2014 update: a report from the american heart association. Circulation, 129(3):399–410, 2014.
- Graziano Michael S A. Ethological action maps: a paradigm shift for the motor cortex. Trends in cognitive sciences, 20(2):121–132, 2016.
- Guerra Jorge, Uddin Jasim, Nilsen Dawn, McInerney James, Fadoo Ammarah, Omofuma Isirame B, Hughes Shatif, Agrawal Sunil, Allen Peter, and Schambra Heidi M. Capture, learning, and classification of upper extremity movement primitives in healthy controls and stroke patients. In 2017 International Conference on Rehabilitation Robotics (ICORR), pages 547–554. IEEE, 2017.
- Ha Sojeong, Yun Jeong-Min, and Choi Seungjin. Multi-modal convolutional neural networks for activity recognition. In 2015 IEEE International conference on systems, man, and cybernetics, pages 3017–3022. IEEE, 2015.
- Hofmann Heike, Wickham Hadley, and Kafadar Karen. Value plots: Boxplots for large data. Journal of Computational and Graphical Statistics, 26(3):469–477, 2017.
- Huang Gao, Liu Zhuang, Van Der Maaten Laurens, and Weinberger Kilian Q. Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4700–4708, 2017.
- Ioffe Sergey and Szegedy Christian. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
- Jeffers Matthew Strider, Karthikeyan Sudhir, Gomez-Smith Mariana, Gasinzigwa Sarah, Achenbach Jannis, Feiten Astrid, and Corbett Dale. Does stroke rehabilitation really matter? part b: an algorithm for prescribing an effective intensity of rehabilitation. Neurorehabilitation and neural repair, 32(1):73–83, 2018.
- Joshua Liju and Varghese Koshy. Automated recognition of construction labour activity using accelerometers in field situations. International Journal of Productivity and Performance Management, 63(7):841–862, Feb 2014. doi: 10.1108/ijppm-05-2013-0099.
- Kaku Aakash, Mohan Sreyas, Parnandi Avinash, Schambra Heidi, and Fernandez-Granda Carlos. Be like water: Robustness to extraneous variables via adaptive feature normalization. arXiv preprint arXiv:2002.04019, 2020.
- Karungaru Stephen. Human action recognition using wearable sensors and neural networks. 2015 10th Asian Control Conference (ASCC), 2015. doi: 10.1109/ascc.2015.7244580.
- Kingma Diederik P and Ba Jimmy. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- Krakauer John W, Carmichael S Thomas, Corbett Dale, and Wittenberg George F. Getting neurorehabilitation right: what can be learned from animal models? Neurorehabilitation and neural repair, 26(8):923–931, 2012.
- Kwapisz Jennifer R, Weiss Gary M, and Moore Samuel A. Activity recognition using cell phone accelerometers. ACM SIGKDD Explorations Newsletter, 12(2):74–82, 2011a.
- Kwapisz Jennifer R, Weiss Gary M, and Moore Samuel A. Activity recognition using cell phone accelerometers. ACM SIGKDD Explorations Newsletter, 12(2):74, 2011b. doi: 10.1145/1964897.1964918.
- Lang Catherine E, MacDonald Jillian R, Reisman Darcy S, Boyd Lara, Kimberley Teresa Jacobson, Schindler-Ivens Sheila M, Hornby T George, Ross Sandy A, and Scheets Patricia L. Observation of amounts of movement practice provided during stroke rehabilitation. Archives of physical medicine and rehabilitation, 90(10):1692–1698, 2009.
- Le Guennec Arthur, Malinowski Simon, and Tavenard Romain. Data augmentation for time series classification using convolutional neural networks. 2016.
- Leuenberger Kaspar, Gonzenbach Roman, Wachter Susanne, Luft Andreas, and Gassert Roger. A method to qualitatively assess arm use in stroke survivors in the home environment. Medical & biological engineering & computing, 55(1):141–150, 2017.
- Lohse Keith R, Pathania Anupriya, Wegman Rebecca, Boyd Lara A, and Lang Catherine E. On the reporting of experimental and control therapies in stroke rehabilitation trials: a systematic review. Archives of physical medicine and rehabilitation, 99(7):1424–1432, 2018.
- McLeod Adam, Bochniewicz Elaine M, Lum Peter S, Holley Rahsaan J, Emmer Geoff, and Dromerick Alexander W. Using wearable sensors and machine learning models to separate functional upper extremity use from walking-associated arm movements. Archives of physical medicine and rehabilitation, 97(2):224–231, 2016.
- Murad Abdulmajid and Pyun Jae-Young. Deep recurrent neural networks for human activity recognition. Sensors, 17(11):2556, 2017.
- Murata Yumi, Higo Noriyuki, Oishi Takao, Yamashita Akiko, Matsuda Keiji, Hayashi Motoharu, and Yamane Shigeru. Effects of motor training on the recovery of manual dexterity after primary motor cortex lesion in macaque monkeys. Journal of neurophysiology, 99(2):773–786, 2008.
- Oukrich Nadia, Cherraqi El Bouazzaoui, and Maach Abdelilah. Human daily activity recognition using neural networks and ontology-based activity representation. Innovations in Smart Cities and Applications, Lecture Notes in Networks and Systems, pages 622–633, 2018. doi: 10.1007/978-3-319-74500-8_57.
- Ovbiagele Bruce, Goldstein Larry B, Higashida Randall T, Howard Virginia J, Johnston S Claiborne, Khavjou Olga A, Lackland Daniel T, Lichtman Judith H, Mohl Stephanie, Sacco Ralph L, et al. Forecasting the future of stroke in the united states: a policy statement from the american heart association and american stroke association. Stroke, 44(8):2361–2375, 2013.
- Parnandi Avinash, Uddin Jasim, Nilsen Dawn M, and Schambra Heidi M. The pragmatic classification of upper extremity motion in neurological patients: a primer. Frontiers in Neurology, 10:996, 2019.
- Pedregosa Fabian, Varoquaux Gaël, Gramfort Alexandre, Michel Vincent, Thirion Bertrand, Grisel Olivier, Blondel Mathieu, Prettenhofer Peter, Weiss Ron, Dubourg Vincent, et al. Scikit-learn: Machine learning in python. Journal of machine learning research, 12(Oct):2825–2830, 2011.
- Przybyla Andrzej, Good David C, and Sainburg Robert L. Dynamic dominance varies with handedness: reduced interlimb asymmetries in left-handers. Experimental brain research, 216(3):419–431, 2012.
- Ramanathan Dhakshin, Conner James M, and Tuszynski Mark H. A form of motor cortical plasticity that correlates with recovery of function after brain injury. Proceedings of the National Academy of Sciences, 103(30):11370–11375, 2006.
- Schambra Heidi M, Parnandi Avinash R, Pandit Natasha G, Uddin Jasim, Wirtanen Audre, and Nilsen Dawn M. A taxonomy of functional upper extremity motion. Frontiers in neurology, 10:857, 2019.
- Teresi Jeanne A, Cross Peter S, and Golden Robert R. Some applications of latent trait analysis to the measurement of adl. Journal of gerontology, 44(5):S196–S204, 1989.
- Thammasat Ekachai. The statistical recognition of walking, jogging, and running using smartphone accelerometers. The 6th 2013 Biomedical Engineering International Conference, 2013. doi: 10.1109/bmeicon.2013.6687689.
- Ulyanov Dmitry, Vedaldi Andrea, and Lempitsky Victor. Instance normalization: The missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022, 2016.
- Wang Zhiguang, Yan Weizhong, and Oates Tim. Time series classification from scratch with deep neural networks: A strong baseline. In 2017 International joint conference on neural networks (IJCNN), pages 1578–1585. IEEE, 2017.
- Woodbury Michelle L, Velozo Craig A, Richards Lorie G, and Duncan Pamela W. Rasch analysis staging methodology to classify upper extremity movement impairment after stroke. Archives of physical medicine and rehabilitation, 94(8):1527–1533, 2013.
- Yao Shuochao, Hu Shaohan, Zhao Yiran, Zhang Aston, and Abdelzaher Tarek. Deepsense: A unified deep learning framework for time-series mobile sensing data processing. In Proceedings of the 26th International Conference on World Wide Web, pages 351–360, 2017.