IEEE Open Journal of Engineering in Medicine and Biology
. 2022 Nov 10;3:202–210. doi: 10.1109/OJEMB.2022.3221306

Longitudinal Trend Monitoring of Multiple Sclerosis Ambulation Using Smartphones

Andrew P Creagh 1,, Frank Dondelinger 2, Florian Lipsmeier 2, Michael Lindemann 2, Maarten De Vos 3,4
PMCID: PMC9788677  PMID: 36578776

Abstract

Goal: Smartphone and wearable devices may act as powerful tools to remotely monitor physical function in people with neurodegenerative and autoimmune diseases from out-of-clinic environments. Detection of progression onset or worsening of symptoms is especially important in people living with multiple sclerosis (PwMS) in order to enable optimally adapted therapeutic strategies. MS symptoms typically follow subtle and fluctuating disease courses, patient-to-patient, and over time. Current in-clinic assessments are often too infrequently administered to reflect longitudinal changes in MS impairment that impact daily life. This work, therefore, explores how smartphones can administer daily two-minute walking assessments to monitor PwMS physical function at home. Methods: Remotely collected smartphone inertial sensor data were transformed through state-of-the-art Deep Convolutional Neural Networks, to estimate a participant's daily ambulatory-related disease severity, longitudinally over a 24-week study. Results: This study demonstrated that smartphone-based ambulatory severity outcomes could accurately estimate MS level of disability, as measured by the EDSS score (ρ: 0.56, p ≤ 0.001). Furthermore, longitudinal severity outcomes were shown to accurately reflect individual participants' level of disability over the study duration. Conclusion: Smartphone-based assessments, which can be performed by patients from their home environments, could greatly augment standard in-clinic outcomes for neurodegenerative diseases. The ability to understand the impact of disease on daily life between clinical visits, through objective digital outcomes, paves the way forward to better measure and identify signs of disease progression that may be occurring out-of-clinic, to monitor how different patients respond to various treatments, and to ultimately enable the development of better, and more personalised, care.

Keywords: Deep learning, digital biomarkers, gait, multiple sclerosis, smartphones

I. Introduction

Neurodegenerative diseases, such as multiple sclerosis (MS), frequently fluctuate over time, and patient-to-patient, making it notoriously difficult to quantify effective therapeutic interventions and disease management techniques. Current in-clinic assessments are often too infrequent to track changes in MS impairment over time. Importantly, it has been shown that earlier identification of changes in MS impairment is important for providing better therapeutic strategies [1]. As a result, there exists a great opportunity to augment current clinical examination strategies by integrating methods that accurately and remotely monitor disease-related changes and deterioration, which may occur at home and between clinician visits.

Although MS follows a highly heterogeneous and subject-specific disease course, the disease profiles can be grouped into four clinical phenotypes which are based on disease progression [2], [3]: the majority of PwMS will initially experience Relapsing-remitting MS (RRMS), a state dominated by sudden acute symptoms developing (a “relapse”) over days before generally plateauing over weeks or months [4], termed “remission”. RRMS affects around 85% of PwMS and disease activity typically occurs acutely at a sub-clinical level. Secondary-progressive MS (SPMS) can occur in some RRMS patients, where the disease course continues to worsen with or without periods of remission. Half of RRMS patients will go on to develop SPMS [5], [6], [7]. Those experiencing consistent but worsening symptoms can be thought of as having Primary-progressive MS (PPMS) [4], [5], [7] (roughly 10% of PwMS [6]). Progressive-relapsing MS is rarer (affecting fewer than 5% of PwMS); from diagnosis it follows a progressive disease course, with periods of relapse, but without any remission periods.

Digital smartphone-based assessments offer the ability to objectively monitor disability levels in people with multiple sclerosis (PwMS) from out-of-clinic, at-home environments [8], [9], [10], [11], [12]. For instance, smartphone-based monitoring was exemplified in a recent investigation by Bove et al. (2015) [13], which demonstrated the feasibility of administering daily smartphone-based tasks to PwMS over a one-year period. These technologies can provide new data-driven metrics for clinical decision-making during in-clinic visits [14] and may be more accurate than conventional clinical outcomes, recorded at infrequent visits, at detecting subtle, progressive, sub-clinical changes or trends in long-term PwMS disability [13].

Alterations during ambulation (gait) due to MS are amongst the most common indications of MS impairment [17], [18], [19], [20], [21], [22]. It has been shown that gait impairment affects quality of life, health status and productivity [23] in PwMS, with a reported prevalence of these impairments between 75% and 90% [24]. PwMS can display postural instability [18], gait variability [19], [20], [21] and fatigue [22] during various stages of disease progression. The gold-standard assessment of overall disability in MS is the Expanded Disability Status Scale (EDSS) [25], alongside specific functional domain assessments such as the Timed 25-Foot Walk (T25FW), which is part of the Multiple Sclerosis Functional Composite score [26], [27], and the Two-Minute Walk Test (2MWT), which also assesses physical gait function and fatigue in PwMS [28]. In recent years, however, there has been a shift towards the adoption of body-worn sensors to objectively evaluate ambulatory performance in PwMS, circumventing the need for resource-intensive and expensive gait laboratory equipment, but also opening up the possibility to measure physical function outside of standard clinical settings [14], [20], [21], [29], [30], [31], [32], [33], [34].

This study builds upon our previous investigations [12], [15], [35], where we have shown how inertial sensors contained within consumer-based smartphones can be used to characterise gait impairments in PwMS from a remotely administered Two-Minute Walk Test (2MWT). The latter study first introduced how state-of-the-art Deep Convolutional Neural Networks (DCNN) can be applied to remote 2MWT smartphone sensor data to determine a study participant's status: such as healthy, PwMS with mild, or PwMS with moderate disability. The work presented here aims to evaluate how these DCNN severity predictions from daily 2MWTs can characterise the status of healthy participants versus PwMS with mild, or PwMS with moderate, disability over a 24-week period.

II. Methods

A. Data

The FLOODLIGHT (FL) proof-of-concept (PoC) app was trialled in a 24-week, prospective study in PwMS and HCs (NCT02952911) to assess the feasibility of remote patient monitoring using smartphone (and smartwatch) devices [11], [16]. Participants were provided with a preconfigured smartphone (Samsung Galaxy S7) and smartwatch (Motorola 360 Sport) with the Floodlight PoC app installed. A total of 97 participants (24 HC subjects; 52 mildly disabled PwMSmild, EDSS [0-3]; 21 moderately disabled PwMSmod, EDSS [3.5-5.5]) contributed data recorded from 2MWTs performed out-of-clinic [15]. Subjects were requested to perform a 2MWT daily over the 24-week period, and were clinically assessed at baseline, week 12 and week 24. For further information on the FL app, dataset, and population demographics we direct the reader to [16] and specifically to our previous work [12], [15], which this study expands upon. Table 1 depicts the population demographics for this study. All participants provided informed consent, and ethical approval was obtained from the ethics committee of the Hospital Universitari Vall d'Hebron, Barcelona, Spain, and the institutional review board of the University of California San Francisco, San Francisco, CA, USA, prior to study initiation.

TABLE 1. Population Demographics. Clinical scores are taken as the average per subject over the entire study, with the mean ± standard deviation across the population reported. RRMS, relapsing-remitting MS; PPMS, primary-progressive MS; SPMS, secondary-progressive MS; EDSS, Expanded Disability Status Scale; T25FW, the Timed 25-Foot Walk; EDSS (amb.) refers to the ambulation sub-score of the EDSS; [s] indicates measurement in seconds.

                    HC (n = 24)     PwMSmild (n = 52)   PwMSmod (n = 21)
Age                 35.6 ± 8.9      39.3 ± 8.3          40.5 ± 6.9
Sex (M/F)           18/6            16/36               7/14
RRMS/PPMS/SPMS      –               52/0/0              14/3/4
EDSS                –               1.7 ± 0.8           4.2 ± 0.7
EDSS (amb.)         –               0.1 ± 0.3           1.9 ± 1.5
T25FW [s]           5.0 ± 0.9      5.3 ± 0.9           7.9 ± 2.2

For more information on the study population we refer the reader to [15] and [16]. PwMSmild denotes PwMS with an average EDSS of 3.0 or below; PwMSmod denotes PwMS with an average EDSS of 3.5 or above.

B. Estimating Ambulatory-Related Disease Severity From Smartphone Sensor Data

Smartphone inertial sensor data were recorded while participants performed a daily, at-home, two-minute walk test (2MWT). The raw accelerometer sensor data from each 2MWT were then partitioned into multiple vector sequences (epochs) of 2.56 s (128 samples/epoch), with 50% overlap between adjacent windows. A Deep Convolutional Neural Network (DCNN) was then trained to classify a given epoch as having been performed by a HC, PwMSmild or PwMSmod participant. The DCNN model implemented has previously been introduced in [12], where the network was first pre-trained on the UCI smartphone-based Human Activity Recognition (HAR) dataset, and thereafter fine-tuned on the FL data for MS severity classification. Briefly, the DCNN applied a series of one-dimensional kernels to each raw sensor epoch x, with input channels 1-4: {a_x, a_y, a_z, ‖a‖}, where a_x, a_y and a_z are the acceleration vectors for the x-, y- and z-components and ‖a‖ denotes the orientation-invariant signal magnitude. The DCNN consisted of four causal convolutional blocks with batch normalisation (BN) layers: the 1st block extracted 32 fixed filters with a width of 9 samples and a stride length of 1, with ℓ2-norm regularisation; the 2nd and 3rd blocks learned 64 filters; the 4th block learned 128 filters with a width of 6, followed by a final 3-class dense fully connected softmax layer. Max pooling operations were also applied in the 2nd and 4th layers with pool size p = 2 and down-scaled by stride factor s = 2. Smartphone orientation augmentation was performed by randomly rotating the sensor-channel axes during training [36].
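The epoching step described above can be sketched as follows. This is an illustrative re-implementation, not the study's code: the function name `make_epochs` is hypothetical, and the 50 Hz sampling rate is inferred from 128 samples spanning 2.56 s.

```python
import numpy as np

def make_epochs(acc_xyz, win=128, overlap=0.5):
    """Partition a (n_samples, 3) accelerometer recording into overlapping
    epochs and append the orientation-invariant magnitude as a 4th channel.

    Hypothetical helper; the paper's exact preprocessing may differ.
    Returns an array of shape (n_epochs, win, 4)."""
    step = int(win * (1 - overlap))                       # 50% overlap -> hop of 64 samples
    mag = np.linalg.norm(acc_xyz, axis=1, keepdims=True)  # |a| channel
    sig = np.hstack([acc_xyz, mag])                       # channels: a_x, a_y, a_z, |a|
    n = (len(sig) - win) // step + 1
    return np.stack([sig[i * step:i * step + win] for i in range(n)])

# Two minutes of data at the assumed 50 Hz (128 samples / 2.56 s)
acc = np.random.randn(6000, 3)
epochs = make_epochs(acc)
print(epochs.shape)  # (92, 128, 4)
```

Each row of the output is one network input epoch; adjacent epochs share their second/first halves, which is what the 50% overlap buys.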
The DCNN was trained to minimise a multi-class categorical cross-entropy loss function to learn the optimal network weights θ, using the Adam optimisation algorithm with a learning rate α, as well as exponential decay rates β1 and β2 for the moment estimates of the gradient [37], [38]. The network outputs are interpreted as posterior probabilities P(y = c | x); as such, the output can be thought of as the probability that a given epoch x belonged to class c. A continuous estimate of severity, the predicted level of MS disability, can then be captured by taking an average of all epoch predictions over a test for a given assessment day t, such that:

\[
\hat{s}_t = \frac{1}{N_t} \sum_{n=1}^{N_t} \sum_{c=0}^{2} c \, P(y_n = c \mid \mathbf{x}_n)
\]

where N_t is the number of windowed epochs for a given test date t, and the class label c lies in an ordinal range of {0, 1, 2}, denoting HC, PwMSmild and PwMSmod respectively. Therefore ŝ_t will be continuous, such that ŝ_t ∈ [0, 2], and can be conceptualised as a naïve estimate of MS disease severity, mapping a predicted level of disability ranging from healthy to mild to moderate.
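Collapsing the per-epoch class probabilities into a single continuous severity is a short expectation over the class indices. A minimal sketch, assuming classes are coded 0 (HC), 1 (mild) and 2 (moderate) as in the text:

```python
import numpy as np

def severity_estimate(probs):
    """Collapse per-epoch softmax outputs into one continuous severity value.

    probs: (n_epochs, 3) array of class probabilities for
    c = 0 (HC), 1 (PwMS-mild), 2 (PwMS-moderate).
    Returns the mean expected class index, a value in [0, 2].
    Sketch of the averaging described in the text; notation assumed."""
    classes = np.arange(probs.shape[1])   # c = 0, 1, 2
    expected = probs @ classes            # E[c] for each epoch
    return float(expected.mean())         # average over the test's epochs

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.2, 0.5, 0.3]])
print(round(severity_estimate(probs), 3))  # 0.833
```

A value near 0 reads as healthy, near 1 as mild, and near 2 as moderate disability.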

Models were trained using a stratified, subject-wise, 5-fold cross-validation (CV), with subjects randomly partitioned into one of k = 5 folds, as described previously in [15]. One set was denoted the training set (in-sample), which was further split to hold out a smaller validation set comprising roughly 10% of the training subjects. Predictions were evaluated on all available 2MWTs per subject in each of the (out-of-sample) test sets.
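The subject-wise split can be illustrated with a small helper that assigns whole subjects (never individual tests) to folds while keeping the class balance roughly even. This is a hypothetical re-implementation, not the study's code, using the cohort sizes from Table 1:

```python
import random
from collections import defaultdict

def subject_wise_folds(subject_labels, k=5, seed=0):
    """Assign each subject to one of k folds, stratified by class.

    subject_labels: dict mapping subject id -> class label.
    All of a subject's 2MWTs then follow that subject's fold, so no
    subject appears in both training and test data. Hypothetical helper."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for subj, label in subject_labels.items():
        by_class[label].append(subj)
    folds = defaultdict(list)
    for label, subjects in by_class.items():
        rng.shuffle(subjects)
        for i, subj in enumerate(subjects):  # deal round-robin across folds
            folds[i % k].append(subj)
    return dict(folds)

# 24 HC / 52 mild / 21 mod, as in Table 1
labels = {f"S{i:02d}": ("HC" if i < 24 else "mild" if i < 76 else "mod")
          for i in range(97)}
folds = subject_wise_folds(labels)
print(sorted(len(v) for v in folds.values()))
```

Each fold in turn serves as the out-of-sample test set, with the remaining four folds forming the in-sample training data.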

C. Longitudinal Trend Monitoring of Remote Smartphone-Based Outcomes

Longitudinal trends for specific participants were examined as a time-series by considering the severity estimates ŝ_t of repeated 2MWTs over all their available data for the duration of the FL study. While participants were requested to perform a daily Two-Minute Walk Test (2MWT), some test-dates may be missing; it was also observed that various participants had differing adherence rates during the study. The number of valid 2MWT recordings contributed by each subject group over the study duration is presented in appendix Fig. 5. Further information related to participant adherence in the study is reported previously in [11], [16]. As the goal of this work was to perform longitudinal analysis of participants' severity, namely to visualise the average severity trends over time, missing 2MWT outcomes were first imputed using piecewise linear interpolation (PLI) [39], by considering ŝ_t as a time-series to impute missing test severity observations on a given date. Note: imputed 2MWTs were only included in the calculation of average trend estimates for individual participants and not for model evaluation. Next, a simple trend estimation was applied to the time sequence of severity estimates ŝ_t across days t using a (2M+1)-point centred linear moving average filter (MAF):

\[
\hat{y}[t] = \frac{1}{2M+1} \sum_{m=-M}^{M} x[t+m]
\]

where x is the input sequence (severity estimates) and ŷ is the output (filtered) sequence (moving severity estimate) for each day t; M defines the order of the filter, in this case determining the number of days (2M + 1) used in the moving average. A 7-day window (M = 3) was implemented in order to capture the trends in ŝ_t over the study duration.
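The two steps above, linear imputation of missing test days followed by a centred 7-day moving average, can be sketched as below. A minimal sketch assuming a numpy series indexed by study day; edge handling and exact interpolation details in the study may differ, and here the first/last M smoothed values are attenuated by zero-padding:

```python
import numpy as np

def impute_and_smooth(days, severities, M=3):
    """Piecewise-linear imputation over missing test days, then a
    (2M+1)-point centred moving average (M=3 gives the 7-day window).

    Illustrative sketch; not the study's implementation."""
    full_days = np.arange(days.min(), days.max() + 1)
    imputed = np.interp(full_days, days, severities)  # PLI over missing dates
    kernel = np.ones(2 * M + 1) / (2 * M + 1)
    # mode="same" keeps the series length; the zero-padded edges mean the
    # first/last M values are pulled towards zero rather than re-weighted
    smoothed = np.convolve(imputed, kernel, mode="same")
    return full_days, imputed, smoothed

days = np.array([0, 1, 3, 4, 7])            # days 2, 5 and 6 missing
sev = np.array([0.2, 0.4, 0.8, 0.6, 0.6])   # severity estimates on observed days
full, imp, sm = impute_and_smooth(days, sev)
print(imp[2])  # interpolated severity for the missing day 2
```

As in the text, the imputed values would feed only the trend visualisation, never the model evaluation.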

Fig. 1.

Demonstration of how deep learning algorithms can transform smartphone measurements to predict the severity of MS patient symptoms between clinical visits. Illustration of the Deep Convolutional Neural Network (DCNN) applied to raw smartphone inertial sensor data collected from a remotely executed Two-Minute Walk Test (2MWT), performed daily for 24 weeks.

D. Statistical Analysis

The association between estimated continuous disease severity and EDSS was tested using (linear) Pearson's (r) and (non-linear) Spearman's (ρ) correlation coefficients. A non-parametric Kruskal-Wallis (KW) test by ranks assessed the median severity estimate between HC, PwMSmild, and PwMSmod groups. Statistical differences in smartphone severity estimates were also investigated within participants over the duration of the study. For instance, mean differences in severity estimates before and after specific events, such as the reporting of a relapse, were assessed using a t-test. In cases where severity estimates had unequal variances before/after an event, as determined by a Brown-Forsythe (BF) test by medians [40], a Welch's t-test correction was applied. Furthermore, differences in median severity estimates before/after each event were also assessed with a non-parametric Mann-Whitney U test.
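The test cascade described above (Brown-Forsythe to check equality of variances, Welch's correction when they differ, plus a Mann-Whitney U check on medians) can be sketched with scipy. The before/after severity samples here are simulated, not study data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
before = rng.normal(0.9, 0.05, 40)  # simulated severity estimates before an event
after = rng.normal(1.3, 0.20, 35)   # noisier, higher severity after the event

# Brown-Forsythe test: Levene's test centred on the median
bf_stat, bf_p = stats.levene(before, after, center="median")

# Welch's t-test (equal_var=False) when the BF test rejects equal variances
t_stat, t_p = stats.ttest_ind(before, after, equal_var=(bf_p >= 0.05))

# Non-parametric comparison of the two samples' medians
u_stat, u_p = stats.mannwhitneyu(before, after)

print(bf_p < 0.05, t_p < 0.05, u_p < 0.05)
```

With this simulated shift in both mean and variance, all three tests reject at the 5% level; on real per-participant data any of them may or may not, as the Results section shows.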

III. Results

A. Digitally Estimated Severity Outcome

A continuous disease severity outcome was created by averaging all 2MWT predictions (i.e. HC, PwMSmild, PwMSmod) for each participant, calculated from each of the out-of-sample test sets during cross-validation. A disease severity outcome therefore mapped a posterior probability ranging from healthy, to mild, and to moderate for each subject. The distribution of the average severity per subject is displayed in Fig. 2, and demonstrates the positive relationship between the average severity outcome and the average EDSS per participant (over all available EDSS scores for that participant) (Pearson's r and Spearman's ρ correlations; ρ: 0.56, p ≤ 0.001). Model classification performance can also be determined by thresholding the estimated continuous level of disability in Fig. 2, at the boundaries between the HC, PwMSmild, and PwMSmod groups, as reported in [12].
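Thresholding the continuous severity back into the three groups might look as follows. The cut-points used here are illustrative assumptions; the actual decision boundaries are those reported in [12]:

```python
import numpy as np

def classify_from_severity(sev, t1=0.5, t2=1.5):
    """Threshold the continuous [0, 2] severity estimate back into the three
    ordinal groups. The cut-points t1/t2 are illustrative, not the paper's."""
    labels = np.array(["HC", "PwMS-mild", "PwMS-mod"])
    # np.digitize returns 0 below t1, 1 between t1 and t2, 2 above t2
    return labels[np.digitize(sev, [t1, t2])]

print(classify_from_severity(np.array([0.2, 0.9, 1.8])))
```

This recovers discrete class labels from the averaged probabilistic outcome, which is how classification performance can be read off Fig. 2.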

Fig. 2.

The relationship between the continuous disease severity outcome estimate, EDSS, and subject group. The figure depicts a scatter plot demonstrating the positive correlation between the average severity outcome and the average EDSS score per subject. A DCNN model was constructed based on the average class predictions (HC, PwMSmild, PwMSmod) per subject over all 2MWTs, creating an estimated continuous prediction probability distribution ranging from healthy to moderate MS. Each point therefore represents the average estimated severity outcome (probability) for that subject. A black line represents the line of best fit between severity and EDSS.

B. Longitudinal Characterisation of Digitally Estimated Severity Outcomes

Disease severity outcomes were evaluated for each 2MWT performed per subject. As a result, longitudinal trends in ambulatory impairment can be monitored by examining daily 2MWT estimates for participants over the duration of the FL study. While the 24-week duration of the study and the relatively low level of baseline impairment of the participants meant that we did not observe meaningful progression at the study cohort level, we could still investigate the ability of our methodology to capture participant-specific longitudinal trends. For example, Fig. 3 examines the longitudinal severity estimate outcome for various representative, correctly classified HC, PwMSmild and PwMSmod participants. Individual 2MWT ambulatory severity estimates are depicted from the first week until study completion in week 24, where dashed black lines represent site-visits at which participants were assessed clinically. Blue lines depict the 7-day average trend in severity outcomes.

Fig. 3.

Panel plot illustrating the longitudinal severity estimate outcome for correctly classified HC, PwMSmild and PwMSmod subjects. Depicted is the estimated level of disability for an example (a) HC subject; (b) PwMSmild subject; and (c) PwMSmod subject during the study. Each circle represents the severity outcome estimate for a 2MWT performed on a given date. Shaded blue lines depict the 7-day trend, represented by the 7-point centred moving average across days. Missing test dates (which are not depicted) were imputed using piecewise linear interpolation. Dashed black lines represent site-visits where the participant was assessed clinically.

Fig. 3(a) first depicts a HC subject. This participant was examined at baseline (week 0; EDSS: 0; T25FW: 5 [s]), midway through the study (week 12; EDSS: 0; T25FW: 4.5 [s]) and at study completion (week 24; EDSS: 0; T25FW: N/A). This subject was consistently predicted as healthy with a low severity across the entire study. Many variations in severity outcomes were smoothed out by the 7-day moving average. Similarly, Fig. 3(b) demonstrates a correctly classified, stable, PwMSmild participant over the duration of the study. This participant was also clinically examined at week 0 (EDSS: 2.5; T25FW: 6.8 [s]), week 12 (EDSS: 2.5; T25FW: 6.5 [s]) and week 24 (EDSS: 3; T25FW: 6.6 [s]). In comparison, Fig. 3(c) demonstrates a stable PwMSmod subject. This participant was examined at baseline (week 0; EDSS: 3.5; T25FW: 5.4 [s]) and midway through the study (week 12; EDSS: 4.5; T25FW: 4.9 [s]).

During the FLOODLIGHT study, four PwMS subjects reported relapses using the FL application on their smartphone. These participants' ambulatory-based 2MWT severity estimates were investigated in Fig. 4.

Fig. 4.

Panel plot illustrating the longitudinal severity estimate outcomes for participants who self-reported a relapse using the FLOODLIGHT smartphone application during the study. Each circle represents the severity outcome estimate for a 2MWT performed on a given date. Shaded blue lines depict the 7-day trend, represented by the 7-point centred moving average across days. Missing test dates (which are not depicted) were imputed using piecewise linear interpolation. Dashed black lines represent site-visits where the participant was assessed clinically. Dates of self-reported relapse onset are represented in black. Note: the participant in Fig. 4(d) also reported (non-relapse) adverse clinical events occurring on non-specified dates between weeks 8 and 12.

Fig. 4(a) depicts the longitudinal severity outcome trend for a PwMSmild subject who reported a relapse during the FL PoC study. A black line depicts the date of relapse onset during week 3, which was recorded by the participant using the FLOODLIGHT application on their smartphone. Dashed black lines represent site-visits where the participant was assessed clinically. This subject was examined at baseline (week 0; EDSS: 1.5; T25FW: 4.9 [s]), week 12 (EDSS: 1.5; T25FW: N/A) and at study completion in week 25 (EDSS: N/A; T25FW: 5.5 [s]). In week 4, 7 days after reporting a relapse, the participant was assessed during an “unscheduled visit”, where they exhibited a worsening of MS symptoms, i.e. an increase in EDSS and in the gait-related T25FW (EDSS: 2.5; T25FW: 7.5 [s]). Their relapse was evaluated as being of spinal topography.

Fig. 4(b) assesses the severity outcomes for another PwMSmild subject, with a clinical examination at baseline (week 0; EDSS: 1.5; T25FW: 4.9 [s]) and during visit 2 (week 12; EDSS: 1.5; T25FW: 6 [s]). This participant reported a relapse during week 23, and their EDSS rose by +1 at their clinical examination at study completion (EDSS: 2.5; T25FW: 5.9 [s]).

Fig. 4(c) examines the longitudinal severity outcome trend for a PwMSmod subject who reported a relapse during the FL PoC study. A black line depicts the date of relapse onset during week 13. This subject was clinically examined at baseline (week 0; EDSS: 3.5; T25FW: 4.9 [s]), midway through the study (week 12; EDSS: 3.5; T25FW: 6.6 [s]) and at study completion (week 24; EDSS: 3.5; T25FW: 10.3 [s]).

Lastly, the ambulatory severity estimates for a PwMSmod participant who self-reported a relapse are shown in Fig. 4(d). This participant's clinical examination was reported at baseline (week 0; EDSS: 3.5; T25FW: 7.8 [s]), mid-study (week 12; EDSS: 4; T25FW: 10.5 [s]) and at study completion (week 24; EDSS: 4; T25FW: 11.5 [s]). During the clinical examination in week 12, this participant also reported non-relapse adverse clinical events, occurring on unspecified dates sometime between weeks 8 and 12. As such, the time between week 8 and week 12 is marked in Fig. 4(d) beginning with a long-dashed line. This PwMSmod subject was adherent to completing daily 2MWTs, with severity outcome estimates consistently evaluated as moderately disabled up until week 8. Thereafter, the comparative number of completed daily 2MWTs dropped dramatically until study completion. The stability of severity estimates predicted as PwMSmod diminished, with greater variability both in the severity estimates and in the participant's adherence to completing daily 2MWTs. Furthermore, a relapse was self-reported by this subject during week 22 using the FL application on their assigned smartphone, as marked by the solid black line.

IV. Discussion

The FLOODLIGHT PoC study demonstrates the capability of smartphone-based inertial sensor measurements to monitor ambulatory-related impairments during a 2MWT remotely administered to PwMS daily over a 24-week period. In this work, it was shown how a deep network classification model could (naïvely) estimate the level of participant disability from ordinal classification categories. Severity outcome estimates stratified across HC and PwMS groups and were strongly correlated to disease status, as measured by the EDSS, which is considered the ground-truth assessment in PwMS [25]. For instance, no misclassification of HC as PwMSmod was observed, or vice-versa, indicating that severity estimates were reflective of true disease status (Fig. 2). More interestingly, those subjects at classification boundaries displayed severities representative of their clinical assessments. For instance, those with EDSS just above 3.5 (i.e. PwMSmod) were misclassified as PwMSmild more often than those with EDSS much greater than 3.5, implying that a reflective estimate of disease severity could be captured by transforming a DCNN model output into a simple probabilistic outcome per subject.

A. Examining Participant-Level Longitudinal Trends

The longitudinal patterns of healthy controls versus participants with varying manifestations of MS severity could be characterised by examining severity outcomes over the duration of the FL study for individual subjects. For instance, Fig. 3 depicts examples of stable trends for correctly classified HC, PwMSmild and PwMSmod subjects respectively. While participants had some incorrect predictions, the mean severity prediction over all repeated tests reflected each participant's true class grouping.

Evaluating subjects' performance longitudinally suggested that severity estimates may be sensitive enough to capture MS-symptom worsening. An intriguing observation related to the stable PwMSmod participant depicted in Fig. 3(c), who was mainly predicted with a severity of PwMSmod, with a relatively consistent 7-day average. Some sequences of tests were predicted as milder, however, particularly before the midway clinical visit in week 13. Interestingly, after week 13, this subject's EDSS rose by +1 to 4.5. A Brown-Forsythe (BF) test demonstrated that this subject had greater variance in their severity outcome before this clinical visit compared to after (BF, p < 0.05). Median severity outcomes were not significantly different between these time-points (Mann-Whitney U test, p = 0.34); however, mean severity outcomes were found to be significantly lower before this clinical visit than after (Welch's t-test: p < 0.05). It should be noted, however, that a change in EDSS scores between clinical visits did not correspond to significant changes in ambulatory-based severity estimates for all participants.

B. Examining Participant-Level Relapse Events

During the FL study, four participants experienced relapses which they self-reported using the application on their smartphones. Longitudinal analysis of the trajectories of daily severity estimates from these subjects revealed useful insights into how relapses manifest in remote inertial sensor data. For instance, two subjects displayed an increased severity outcome up to and around the date of reporting a relapse (Fig. 4(a) and 4(c)), suggesting that sensor-based ambulatory outcomes could potentially be sensitive enough to remotely capture relapse events.

Observing the PwMSmild participant who reported a relapse (Fig. 4(a)), severity estimates increased after the relapse was reported, corroborating the worsening of clinically assessed symptoms from baseline (week 0; EDSS: 1.5; T25FW: 4.9 [s]) to the unscheduled clinical visit, which was prompted by the relapse (EDSS: 2.5; T25FW: 7.5 [s]). Examination of severity outcomes leading up to week 3 demonstrated consistent “mild” trends using 7-day moving averages. Interestingly, after the date of onset of the self-reported relapse, severity estimates rose towards “moderate,” indicating that MS symptom manifestation had worsened. Longer-term analysis demonstrated that there was a significantly higher variability in predicted severity outcomes after the relapse date than before (BF, p < 0.05). This subject was further assessed during week 12, where their EDSS returned to the value reported at baseline (EDSS: 1.5; T25FW: N/A). Severity outcomes also returned to consistently “mild” towards the end of the study from week 18 onwards, where median (U test, p > 0.05) and mean (Welch's t-test, p > 0.05) severity outcomes were not significantly different before and after relapse. This subject was predicted as PwMSmild over their entire set of 2MWT outcome measures.

In contrast, the example participant presented in Fig. 4(b) did not exhibit any significant changes in severity estimates around the date of reporting a relapse in week 23. However, it could also be noted that this subject's EDSS score rose by +1 between weeks 12 and 24, and their ambulatory estimated outcomes were more variable after week 12 (BF, p < 0.05).

Fig. 4(c) depicts a relapsing PwMSmod subject, with severity estimates that were consistently evaluated as “mild” up until week 13, when this participant reported a relapse onset using the FL application on their smartphone. Severity outcomes then increased towards “moderate” during week 13 and peaked at week 14, around the suspected relapse date reported at the end of week 13. Thereafter, severity outcomes stabilised to “mild” before becoming more variable and “moderate” until the end of the study. Considering the relapse reporting date as a threshold, it was found that severity outcomes were significantly “milder” before relapse (where severity outcomes evaluated as PwMSmild) than after relapse onset (where severity outcomes evaluated as PwMSmod), when testing both mean (Welch's t-test, p < 0.05) and median (U test, p < 0.05) severity outcomes. A BF test also indicated that severity outcome variability was higher after relapse onset than before (p < 0.05). This subject was misclassified as PwMSmild using all available 2MWTs, but interestingly was narrowly labelled a PwMSmod and not a PwMSmild subject using their available EDSS scores (EDSS: 3.5 ± 0).

Finally, Fig. 4(d) describes the longitudinal severity outcomes for a PwMSmod participant who was consistently estimated as having moderate disability for the first 9 weeks of the study period. During the mid-way assessment at week 12, this participant recalled that non-MS-related adverse clinical events had occurred at unspecified points in the previous four weeks. Interestingly, adherence to executing daily 2MWTs dropped during this period, marked by the long-dashed line beginning between weeks 8 and 12 in Fig. 4(d). It was observed that after week 12, the variability in sensor-based ambulatory severity estimates increased, with predictions fluctuating between healthy and moderate. Furthermore, this participant was less adherent at providing daily 2MWTs after week 12 compared with the first 9 weeks. Towards the end of the study, this participant then self-reported an MS-related relapse as having occurred in week 22. As such, we need to consider not only that sensor-based outcomes could remotely evaluate a patient's level of disability, but that an absence of available data might itself be indicative of changes in disability status.

C. Limitations

Despite the potential of smartphone-based outcomes to remotely monitor individual participants' ambulatory function longitudinally, there are several limitations of this study which must be considered. Importantly, the severity outcomes explored in this work were naïve estimates; although the outcomes captured a trend of increased impairment with higher severity (as modelled by EDSS, Fig. 2), they should not be considered an exact measure of MS, nor a surrogate clinical outcome that could permit any clinical diagnosis or replace in-clinic assessments.

It should also be noted that the estimated levels of participant disability were not always accurate; there were many subject misclassifications, as evident in Fig. 2. In particular, some HC were incorrectly estimated as MS, and some PwMSmod were underestimated as having a milder level of disability. In this work, we have only shown examples of correct and stable estimates (Fig. 3); however, it must be noted that some participants, both healthy and with MS, followed irregular trends or were consistently estimated at an incorrect level of disability.

Planned future work will aim to further characterise misclassifications and participant variance. Given that MS is a heterogeneous disease, where symptoms fluctuate day-to-day, it must be considered that MS symptoms can sometimes be absent for a given day, or sequence of days. For instance, this may help explain why some PwMS participants' 2MWTs were evaluated as healthy. It must also be acknowledged that severity estimates were based solely on 2MWT performance, an assessment originally intended only to investigate ambulatory function and fatigue in PwMS through the measurement of distance travelled [41], [42], [43]. Many participants in the FL PoC study may not have had ambulatory-related dysfunction, or their milder level of disease may not have impaired their gait compared to the healthy control cohort. As previously outlined, by definition PwMS with EDSS ≤ 3 may have little to no gait impairment [12], [15]. Furthermore, the blunt demarcation of mild and moderate MS based exclusively on the clinical EDSS score, which incorporates, but is not a direct measure of, ambulatory function, could lead to unreliable assignment of “mild” versus “moderate” MS ambulatory function. For example, some participants might exhibit “moderate” symptoms that are more apparent in other functional domains, or have subtle alterations in ambulatory ability that a remote 2MWT assessment will not be sensitive to.

There are also several limitations associated with remote 2MWTs, which have been discussed previously in [12], [15], and must also be considered in the context of remotely estimating MS ambulatory severity. For example, although the 2MWT was standardised and analogous to an in-clinic assessment, the FL 2MWT was executed remotely, out-of-clinic. As such, 2MWT performance can be highly influenced by the testing environment, such as the length of the hallway, the number and frequency of subject turns, or other factors which cannot be determined remotely [15].

In this work we proposed that averaging over categorical class predictions can create a simple, naïve estimate of ambulatory severity, but there could be more informative and robust methodological approaches to learning disease severity estimates [44], [45]. It should be acknowledged that our DCNN model did not truly utilise the time-series nature of repeated 2MWT measurements from the FL PoC study. Each repeated test was treated as independent, and as such, trajectories did not incorporate any temporal information across the population or within a subject (for example, whether the test performed on day t could affect the outcome on day t+1 or t+2). This missing temporal information could help build more reliable and accurate longitudinal models, and incorporating it should be considered a key next step for future work. For instance, the repeated FL assessments, and the sensor outcomes extracted from them, could be analysed with models that exploit this aggregation of temporal information directly [46], [47]. Another limitation of averaging posterior class predictions is that we also average over uncertain or marginal predictions, often introducing noise and variability into the unified estimate. Indeed, constructing more robust severity outcomes would involve not only exploring more accurate modelling techniques, but should also aim to incorporate data captured from other functional domains in FL, such as dexterity and cognition.
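The naïve severity estimate discussed above, averaging posterior class predictions over repeated tests, can be sketched as follows. The posteriors here are illustrative values, and mapping the averaged posterior to an expected class index as a continuous severity score is an assumption for illustration, not necessarily the paper's exact formulation.

```python
import numpy as np

# Illustrative softmax posteriors from a DCNN over five repeated 2MWTs;
# classes: [healthy, mild MS, moderate MS]. Values are made up.
posteriors = np.array([
    [0.70, 0.20, 0.10],
    [0.55, 0.35, 0.10],
    [0.10, 0.60, 0.30],
    [0.05, 0.50, 0.45],
    [0.20, 0.45, 0.35],
])

# Naive severity: average the per-test posteriors, then take the expected
# class index (0 = healthy ... 2 = moderate) as a continuous severity score.
mean_posterior = posteriors.mean(axis=0)
severity = float(mean_posterior @ np.arange(3))
print(round(severity, 2))  # → 0.94, i.e. closest to "mild"
```

The limitation noted in the text is visible here: marginal tests (e.g. the 0.50/0.45 row) contribute as much weight as confident ones, injecting their uncertainty into the pooled estimate.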

Nonetheless, we believe the work presented in this study to be of important value, emphasising the potential of remote sensor outcomes to augment current in-clinic acquired patient information. The long-term remote monitoring of PwMS function could open up the space for true personalisation: clustering disease trajectories or similar patients, estimating the likelihood of disease progression, quantifying response to different treatments at the population or individual level, as well as capturing the mutable patterns of MS disease that are only visible out-of-clinic and as a function of time.

V. Conclusion

This work demonstrates the capability of smartphone technologies to administer daily ambulatory assessments to patients at home, and shows how the recorded sensor data can be transformed through state-of-the-art deep networks to remotely monitor ambulatory-related level of disability over a 24-week period. The development of frequent, objective, and sensitive digital measures of MS disability that can be administered remotely could revolutionise routine in-clinic assessments for PwMS. In the years to come, smartphone-based outcomes may identify and monitor digital signs of MS-related degeneration, ultimately informing better disease management, revealing how different patients respond to various treatments, and potentially enabling the development of personalised therapeutic interventions.

Acknowledgment

The authors would like to thank all staff and participants involved in capturing test data. During the completion of this work, A. P. Creagh was a Ph.D. student at the University of Oxford and acknowledges the support of F. Hoffmann-La Roche Ltd.; F. Dondelinger and F. Lipsmeier are employees of F. Hoffmann-La Roche Ltd; M. Lindemann is a consultant for F. Hoffmann-La Roche Ltd. via Inovigate; M. De Vos has nothing to disclose.

Appendix A. Participant 2MWT Adherence

Fig. 5. Participant adherence rates. Each line depicts the number of valid Two-Minute Walk Test (2MWT) recordings contributed by each subject group over the study duration.

Funding Statement

This work was supported in part by the F. Hoffmann-La Roche Ltd., in part by the National Institute for Health Research (NIHR) Oxford Biomedical Research Centre (BRC), and in part by the Flemish Government through the Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaanderen Programme.

Footnotes

1. Note: an EDSS of zero in this case refers to a normal neurological exam; the subject is healthy and has no disability.

2. N/A denotes scores not assessed at this visit.

3. Note: clinical assessment scores were not made available for this participant at study completion.

4. See footnote 3.

5. See footnote 3.

Contributor Information

Andrew P. Creagh, Email: andrew.creagh@eng.ox.ac.uk.

Frank Dondelinger, Email: frank.dondelinger@roche.com.

Florian Lipsmeier, Email: florian.lipsmeier@roche.com.

Michael Lindemann, Email: michael.lindemann@roche.com.

Maarten De Vos, Email: maarten.devos@esat.kuleuven.be.

References

  • [1].Comi G. et al. , “Effect of early interferon treatment on conversion to definite multiple sclerosis: A randomised study,” Lancet, vol. 357, no. 9268, pp. 1576–1582, 2001. [DOI] [PubMed] [Google Scholar]
  • [2].Hauser S. L. and Goodin D. S., Multiple Sclerosis and Other Demyelinating Diseases. New York, NY, USA: McGraw-Hill, 2014. [Google Scholar]
  • [3].Lublin F. D. and Reingold S. C., “Defining the clinical course of multiple sclerosis: Results of an international survey,” Neurology, vol. 46, no. 4, pp. 907–911, 1996. [DOI] [PubMed] [Google Scholar]
  • [4].Goldenberg M. M., “Multiple sclerosis review,” Pharm. Therapeutics, vol. 37, no. 3, 2012, Art. no. 175. [PMC free article] [PubMed] [Google Scholar]
  • [5].Confavreux C., Vukusic S., Moreau T., and Adeleine P., “Relapses and progression of disability in multiple sclerosis,” New England J. Med., vol. 343, no. 20, pp. 1430–1438, 2000. [DOI] [PubMed] [Google Scholar]
  • [6].Fisher E., Lee J., Nakamura K., and Rudick R. A., “Gray matter atrophy in multiple sclerosis: A longitudinal study,” Ann. Neurol.: Official J. Amer. Neurological Assoc. Child Neurol. Soc., vol. 64, no. 3, pp. 255–265, 2008. [DOI] [PubMed] [Google Scholar]
  • [7].Steinman L., “Multiple sclerosis: A two-stage disease,” Nature Immunol., vol. 2, no. 9, pp. 762–764, 2001. [DOI] [PubMed] [Google Scholar]
  • [8].Creagh A. P. et al. , “Smartphone-based remote assessment of upper extremity function for multiple sclerosis using the draw a shape test,” Physiol. Meas., vol. 41, no. 5, 2020. [DOI] [PubMed] [Google Scholar]
  • [9].Pratap A. et al. , “Evaluating the utility of smartphone-based sensor assessments in persons with multiple sclerosis in the real-world using an app (elevatems): Observational, prospective pilot digital health study,” JMIR mHealth uHealth, vol. 8, no. 10, 2020, Art. no. e22108. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [10].Pham L., Harris T., Varosanec M., Morgan V., Kosa P., and Bielekova B., “Smartphone-based symbol-digit modalities test reliably captures brain damage in multiple sclerosis,” NPJ Digit. Med., vol. 4, no. 1, pp. 1–13, 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [11].Montalban X. et al. , “A smartphone sensor-based digital outcome assessment of multiple sclerosis,” Mult. Scler. J., vol. 28, no. 4, pp. 654–664, 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [12].Creagh A. P., Lipsmeier F., Lindemann M., and Vos M. D., “Interpretable deep learning for the remote characterisation of ambulation in multiple sclerosis using smartphones,” Sci. Rep., vol. 11, no. 1, pp. 1–14, 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [13].Bove R. et al. , “Evaluating more naturalistic outcome measures: A 1-year smartphone study in multiple sclerosis,” Neurol Neuroimmunol. Neuroinflamm., vol. 2, no. 6, 2015, Art. no. e162. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [14].Brichetto G., Pedullà L., Podda J., and Tacchino A., “Beyond center-based testing: Understanding and improving functioning with wearable technology in MS,” Mult. Scler. J., vol. 25, no. 10, pp. 1402–1411, 2019. [DOI] [PubMed] [Google Scholar]
  • [15].Creagh A. P. et al. , “Smartphone- and smartwatch-based remote characterisation of ambulation in multiple sclerosis during the two-minute walk test,” IEEE J. Biomed. Health Informat., vol. 25, no. 3, pp. 838–849, Mar. 2021. [DOI] [PubMed] [Google Scholar]
  • [16].Midaglia L. et al. , “Adherence and satisfaction of smartphone-and smartwatch-based remote active testing and passive monitoring in people with multiple sclerosis: Nonrandomized interventional feasibility study,” J. Med. Internet Res., vol. 21, no. 8, 2019, Art. no. e14863. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [17].Sosnoff J. J., Sandroff B. M., and Motl R. W., “Quantifying gait abnormalities in persons with multiple sclerosis with minimal disability,” Gait Posture, vol. 36, no. 1, pp. 154–156, 2012. [DOI] [PubMed] [Google Scholar]
  • [18].Martin C. L. et al. , “Gait and balance impairment in early multiple sclerosis in the absence of clinical disability,” Mult. Scler. J., vol. 12, no. 5, pp. 620–628, 2006. [DOI] [PubMed] [Google Scholar]
  • [19].Crenshaw S., Royer T., Richards J., and Hudson D., “Gait variability in people with multiple sclerosis,” Mult. Scler. J., vol. 12, no. 5, pp. 613–619, 2006. [DOI] [PubMed] [Google Scholar]
  • [20].Huisinga J. M., Mancini M., George R. J. S., and Horak F. B., “Accelerometry reveals differences in gait variability between patients with multiple sclerosis and healthy controls,” Ann. Biomed. Eng., vol. 41, no. 8, pp. 1670–1679, 2013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [21].Spain R. I., Mancini M., Horak F. B., and Bourdette D., “Body-worn sensors capture variability, but not decline, of gait and balance measures in multiple sclerosis over 18 months,” Gait Posture, vol. 39, no. 3, pp. 958–964, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [22].Motl R. W., Sandroff B. M., Suh Y., and Sosnoff J. J., “Energy cost of walking and its association with gait parameters, daily activity, and fatigue in persons with mild multiple sclerosis,” Neurorehabil. Neural Repair, vol. 26, no. 8, pp. 1015–1021, 2012. [DOI] [PubMed] [Google Scholar]
  • [23].Zwibel H. L., “Contribution of impaired mobility and general symptoms to the burden of multiple sclerosis,” Adv. Ther., vol. 26, no. 12, pp. 1043–1057, 2009. [DOI] [PubMed] [Google Scholar]
  • [24].Hemmett L., Holmes J., Barnes M., and Russell N., “What drives quality of life in multiple sclerosis?,” QJM, vol. 97, no. 10, pp. 671–676, 2004. [DOI] [PubMed] [Google Scholar]
  • [25].Kurtzke J. F., “Rating neurologic impairment in multiple sclerosis: An expanded disability status scale (EDSS),” Neurol., vol. 33, no. 11, p. 1444, 1983. [DOI] [PubMed] [Google Scholar]
  • [26].Rudick R., Cutter G., and Reingold S., “The multiple sclerosis functional composite: A new clinical outcome measure for multiple sclerosis trials,” Mult. Scler. J., vol. 8, no. 5, pp. 359–365, 2002. [DOI] [PubMed] [Google Scholar]
  • [27].Motl R. W. et al. , “Validity of the timed 25-foot walk as an ambulatory performance outcome measure for multiple sclerosis,” Mult. Scler., vol. 23, no. 5, pp. 704–710, 2017. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [28].Motl R. W. et al. , “Evidence for the different physiological significance of the 6-and 2-minute walk tests in multiple sclerosis,” BMC Neurol., vol. 12, no. 1, 2012, Art. no. 6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [29].Sparaco M., Lavorgna L., Conforti R., Tedeschi G., and Bonavita S., “The role of wearable devices in multiple sclerosis,” Mult. Scler. Int., vol. 2018, 2018, Art. no. 7627643. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [30].Jarchi D., Pope J., Lee T. K., Tamjidi L., Mirzaei A., and Sanei S., “A review on accelerometry based gait analysis and emerging clinical applications,” IEEE Rev. Biomed. Eng., vol. 11, pp. 177–194, 2018. [DOI] [PubMed] [Google Scholar]
  • [31].Godfrey A., Conway R., Meagher D., and Ó’Laighin G., “Direct measurement of human movement by accelerometry,” Med. Eng. Phys., vol. 30, no. 10, pp. 1364–1386, 2008. [DOI] [PubMed] [Google Scholar]
  • [32].Greene B. R. et al. , “Assessment and classification of early-stage multiple sclerosis with inertial sensors: Comparison against clinical measures of disease state,” IEEE J. Biomed. Health Informat., vol. 19, no. 4, pp. 1356–1361, Jul. 2015. [DOI] [PubMed] [Google Scholar]
  • [33].Spain R. et al. , “Body-worn motion sensors detect balance and gait deficits in people with multiple sclerosis who have normal walking speed,” Gait Posture, vol. 35, no. 4, pp. 573–578, 2012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [34].Psarakis M., Greene D. A., Cole M. H., Lord S. R., Hoang P., and Brodie M., “Wearable technology reveals gait compensations, unstable walking patterns and fatigue in people with multiple sclerosis,” Physiol. Meas., vol. 39, no. 7, 2018, Art. no. 075004. [DOI] [PubMed] [Google Scholar]
  • [35].Bourke A. K., Scotland A., Lipsmeier F., Gossens C., and Lindemann M., “Gait characteristics harvested during a smartphone-based self-administered 2-minute walk test in people with multiple sclerosis: Test-retest reliability and minimum detectable change,” Sensors, vol. 20, no. 20, 2020, Art. no. 5906. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [36].Um T. T. et al. , “Data augmentation of wearable sensor data for Parkinson's disease monitoring using convolutional neural networks,” in Proc. 19th ACM Int. Conf. Multimodal Interact., 2017, pp. 216–220. [Google Scholar]
  • [37].Kingma D. P. and Ba J., “Adam: A method for stochastic optimization,” 2014, arXiv:1412.6980.
  • [38].Goodfellow I., Bengio Y., and Courville A., Deep Learning. Cambridge, MA, USA: MIT Press, 2016. [Google Scholar]
  • [39].Little R. J. and Rubin D. B., Statistical Analysis With Missing Data, vol. 793. Hoboken, NJ, USA: Wiley, 2019. [Google Scholar]
  • [40].Brown M. B. and Forsythe A. B., “Robust tests for the equality of variances,” J. Amer. Stat. Assoc., vol. 69, no. 346, pp. 364–367, 1974. [Google Scholar]
  • [41].Scalzitti D. A., Harwood K. J., Maring J. R., Leach S. J., Ruckert E. A., and Costello E., “Validation of the 2-minute walk test with the 6-minute walk test and other functional measures in persons with multiple sclerosis,” Int. J. MS Care, vol. 20, no. 4, pp. 158–163, 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [42].Kieseier B. C. and Pozzilli C., “Assessing walking disability in multiple sclerosis,” Mult. Scler. J., vol. 18, no. 7, pp. 914–924, 2012. [DOI] [PubMed] [Google Scholar]
  • [43].Gijbels D., Eijnde B. O., and Feys P., “Comparison of the 2- and 6-minute walk test in multiple sclerosis,” Mult. Scler. J., vol. 17, no. 10, pp. 1269–1272, 2011. [DOI] [PubMed] [Google Scholar]
  • [44].Dyagilev K. and Saria S., “Learning (predictive) risk scores in the presence of censoring due to interventions,” Mach. Learn., vol. 102, no. 3, pp. 323–348, 2016. [Google Scholar]
  • [45].Zhan A. et al. , “Using smartphones and machine learning to quantify Parkinson disease severity: The mobile Parkinson disease score,” JAMA Neurol., vol. 75, no. 7, pp. 876–880, 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [46].Schwab P. and Karlen W., “Phonemd: Learning to diagnose Parkinson's disease from smartphone data,” in Proc. AAAI Conf. Artif. Intell., 2019, vol. 33, pp. 1118–1125. [Google Scholar]
  • [47].Vaswani A. et al. , “Attention is all you need,” in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 5998–6008. [Google Scholar]
