Abstract
This paper presents the framework for developing a robotic system to improve accuracy and reliability of clinical assessment. Clinical assessment of spasticity tends to have poor reliability because of the nature of the in-person assessment. To improve accuracy and reliability of spasticity assessment, a haptic device, named the HESS (Haptic Elbow Spasticity Simulator) has been designed and constructed to recreate the clinical “feel” of elbow spasticity based on quantitative measurements. A mathematical model representing the spastic elbow joint was proposed based on clinical assessment using the Modified Ashworth Scale (MAS) and quantitative data (position, velocity, and torque) collected on subjects with elbow spasticity. Four haptic models (HMs) were created to represent the haptic feel of MAS 1, 1+, 2, and 3. The four HMs were assessed by experienced clinicians; three clinicians performed both in-person and haptic assessments, and had 100% agreement in MAS scores; and eight clinicians who were experienced with MAS assessed the four HMs without receiving any training prior to the test. Inter-rater reliability among the eight clinicians had substantial agreement (κ = 0.626). The eight clinicians also rated the level of realism (7.63 ± 0.92 out of 10) as compared to their experience with real patients.
Keywords: Elbow spasticity, haptic simulation, inter-rater reliability, modified Ashworth scale, spasticity assessment
I. Introduction
Spasticity is one of the most common and potentially disabling complications that affect individuals with neurological disorders. Accurate assessment of spasticity is important for designing optimal treatment plans, properly evaluating potential effects of treatment interventions or monitoring progression of recovery.
In clinical settings, spasticity is tested manually. Typically, the clinician supports the proximal limb and then moves the patient's distal segment at one or more velocities while the patient is instructed to relax. The clinician judges the degree of spasticity based on the resistance from the patient's body. The Ashworth Scale, Modified Ashworth Scale (MAS), and Tardieu Scale are widely used clinical measures of spasticity; however, many studies report inconsistent results on the reliability of these clinical scales [1]–[5], demonstrating the need to improve reliability by providing clinicians with structured training [6]–[10].
The major reasons for the inconsistent results are clinician and subject variability involved in spasticity testing. If different raters move at different speeds, the same patient may respond differently due to the velocity-dependent characteristics of the spasticity [11]–[14]. Patient factors affecting spasticity include the time of the day that they are tested, fatigue level, body posture and position [15], and level of anxiety or attention paid during the test. Raters can be directed to perform the test at similar speeds by using visual or auditory feedback [16]; however, it would still be difficult to control all of the patient factors.
Another likely reason for the poor reliability of spasticity assessment is the qualitative wording of these clinical scales. For example, the MAS uses terms such as “considerable (MAS 3), more marked (MAS 2), or slight increase (MAS 1 or 1+) in muscle tone,” which are clearly qualitative; different scores can therefore be assigned to the same feel of muscle tone if raters perceive it differently. To improve the accuracy and reliability of spasticity assessment, it is important to let trainees feel spastic joints while they are being taught the clinical scales, which would help them clearly distinguish among these levels; however, involving many patients in training is impractical because 1) the variability due to the patient factors mentioned above will still induce inconsistency; 2) it is difficult to continually recruit patients with diverse degrees of spasticity for this purpose; and 3) patients may experience fatigue or boredom with repetitive testing.
As a solution to these difficulties, programmable robotic systems that simulate patients' responses have been developed as training tools. For elbow spasticity, an upper limb patient simulator reproduced stiffness or muscle tone using a magnetorheological (MR) brake [17], [18]. Another haptic simulator implemented velocity dependence of the reflex onset angle and variable stiffness/damping at different degrees of spasticity [19]. A leg robot was developed for demonstrating ankle clonus, a symptom related to ankle spasticity seen in neurological patients [20]. Another device simulates contracture in the hand for training hand stretching exercises [21]. If those haptic systems are to be used as training tools, their accuracy and reliability need to be evaluated with respect to the existing clinical scales; however, there is a paucity of data on clinical evaluation of existing training devices.
This paper presents a framework to standardize clinical assessment of spasticity including development of a novel haptic device capable of simulating spasticity in the elbow joint, named the HESS (Haptic Elbow Spasticity Simulator), mathematical modeling of spastic elbow joints, and a paradigm for training clinicians in the MAS. A novel mathematical model representing various levels of elbow spasticity is proposed here based on clinical data collected from subjects with elbow spasticity. By using the proposed model, four different MAS scores are implemented on the HESS, and the accuracy and reliability are evaluated with experienced clinicians. A preliminary version of this work has been reported [22], [23], but this paper presents engineering details, more clinical data and complete haptic models enhanced from the previous work.
II. Mathematical Modeling of Elbow Spasticity
The purpose of developing a mathematical model is twofold: 1) to identify realistic quantitative parameters that have potential to be used as objective measures representing severity of spasticity and 2) to program these into the haptic device so that the device can be controlled to output accurate joint torques at given input conditions (joint velocity, and MAS score).
A. Clinical Data Collection
First, we accumulated a database of quantitative information (position, velocity, and resistance torque), each record with a corresponding MAS score. To build standardized mathematical models representing the MAS instrument, it is necessary to collect quantitative data corresponding to some gold standard for MAS scoring; however, the MAS is known for poor inter-rater reliability [4], [24]–[28], and there is minimal information in the literature on quantitative data that are correlated to MAS scores. This led us to treat as standard data sets those quantitative data sets for which a majority of experienced clinicians assigned the same MAS score. Therefore, our strategy for building standard mathematical models is 1) to build a database including quantitative data and corresponding MAS scores; 2) to extract data sets on which experienced clinicians have strong agreement; and 3) to use the selected data sets to build mathematical models.
Three clinicians examined nine children with cerebral palsy (CP). All guardians of the children (mean age: 13.3 ± 3.5) gave written informed consent approved by the National Institutes of Health IRB. To obtain quantitative data, we developed a manual spasticity evaluator (MSE), which measures elbow joint angles with a digital encoder (0.09 deg resolution), angular velocities (by numerical differentiation of position), and the force (or torque) exerted (or felt) by clinicians while they manipulate the subject's elbow joint [Fig. 1(a)]. Force was measured by a force transducer (MLP-100, Transducer Techniques Inc., Temecula, CA) placed between the clinician's hand and the subject's arm. Surface EMG sensors were attached to the skin over the biceps and triceps muscle bellies to record muscle activation during the test. Position, velocity, force, and EMG data were sampled at 1 kHz using an NI-PCIe-6321 board with a custom LabVIEW program (National Instruments Inc., Austin, TX). The subjects were seated upright with their shoulder positioned at 40° abduction and 40° flexion. After aligning the MSE device with the patient's elbow joint [Fig. 1(b)], the clinicians were asked to assess spasticity in the same manner as they would in clinic, except that they were asked to move the patient's forearm in a horizontal plane. Each clinician performed both slow and fast extension of the elbow and determined the MAS score based on the written instructions (Appendix).
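The signal processing described for the MSE can be sketched as follows. This is a minimal illustration and not the authors' LabVIEW program: the central-difference scheme, the function names, and the moment-arm input are assumptions; only the 1 kHz sampling rate comes from the text.

```python
SAMPLE_RATE_HZ = 1000.0  # sampling rate stated in the text


def angular_velocity(angles_deg, fs=SAMPLE_RATE_HZ):
    """Estimate angular velocity (deg/s) from sampled joint angles (deg).

    Central differences are used in the interior and one-sided
    differences at the two end points (an assumed scheme).
    """
    n = len(angles_deg)
    if n < 2:
        raise ValueError("need at least two samples")
    dt = 1.0 / fs
    vel = [0.0] * n
    vel[0] = (angles_deg[1] - angles_deg[0]) / dt
    vel[-1] = (angles_deg[-1] - angles_deg[-2]) / dt
    for i in range(1, n - 1):
        vel[i] = (angles_deg[i + 1] - angles_deg[i - 1]) / (2.0 * dt)
    return vel


def joint_torque(force_n, moment_arm_m):
    """Convert transducer force (N) to elbow torque (Nm).

    The moment arm (elbow axis to transducer) is a hypothetical input;
    the text does not report its value.
    """
    return [f * moment_arm_m for f in force_n]
```

Numerical differentiation amplifies encoder quantization noise, so a practical implementation would likely low-pass filter the result; no filter is described in the text, and none is assumed here.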
Fig. 1.

Prototype MSE. (a) Design of MSE. (b) MSE attached at elbow.
Among the nine subjects, data from four subjects were selected, each representing one of the MAS scores 1, 1+, 2, and 3. The raters had at least 67% agreement on those four subjects (Table I). MAS scores 0 and 4 were excluded because they are obvious from the instructions (Appendix): either no resistance is felt (score 0) or no movement is possible (score 4). Quantitative data, including synchronized time courses of position, velocity, and resistance torque collected at each MAS score, were used to build the mathematical models.
TABLE I.
MAS Scores of Four Selected Subjects.
| | Subject | #1 | #2 | #3 | #4 |
|---|---|---|---|---|---|
| MAS scores | Rater 1 | 1 | 1+ | 2 | 3 |
| | Rater 2 | 1 | 1+ | 2 | 2 |
| | Rater 3 | Absent* | 1+ | 2 | 3 |
| MAS scores used for the modeling | | 1 | 1+ | 2 | 3 |
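The agreement criterion used to select the four subjects can be checked with a short sketch over the scores in Table I. Excluding the absent rater from the denominator is an assumption, as are the function and variable names.

```python
from collections import Counter

# MAS scores from Table I; None marks the absent rater.
TABLE_I = {
    "subject 1": ["1", "1", None],
    "subject 2": ["1+", "1+", "1+"],
    "subject 3": ["2", "2", "2"],
    "subject 4": ["3", "2", "3"],
}


def majority_agreement(scores):
    """Return (modal score, fraction of present raters who gave it)."""
    present = [s for s in scores if s is not None]
    score, count = Counter(present).most_common(1)[0]
    return score, count / len(present)


for subject, scores in TABLE_I.items():
    score, frac = majority_agreement(scores)
    print(f"{subject}: MAS {score}, agreement {frac:.0%}")
```

Subject 4 is the only case below full agreement: two of three raters gave MAS 3, which is the 67% floor quoted in the text.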
B. Modeling of Spastic Elbow Joint
There have been several previous attempts to build a model of spasticity, but the models reported are too simple to describe sophisticated responses [19]–[21], [29], purely descriptive [30], [31] or focused only on modeling the spastic catch [32], which is defined as a sudden appearance of increased muscle tone during the fast passive movement [33]. The modeling provided in this paper attempts to quantitatively describe the known characteristics of spasticity in the MAS instrument (Appendix) and the literature [29], [32], [35]. For an accurate and realistic spasticity model, we divided the spastic response during passive movement into three phases: pre-catch, catch, and post-catch (Fig. 2). Equations describing each phase were proposed based on the clinical data and the descriptions in the MAS instructions.
Fig. 2.
Typical experimental data showing three phases of elbow spasticity: (i) Pre-catch, (ii) Catch, and (iii) Post-catch.
In the pre-catch phase, it is assumed that the resistance torque from the elbow joint can be regarded as a linear mass-spring-damper system
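$$\tau_{pre}(t) = m\ddot{\theta}(t) + b\dot{\theta}(t) + k\theta(t) \tag{1}$$
where θ denotes the elbow joint angle; m the mass (inertia) of the forearm and hand; b the damping coefficient; and k the stiffness.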
Previous studies applied (1) to represent nonspastic [34] or spastic elbow joints [29]. Moreover, the clinical data collected under slow stretch also matched this equation. Since the catch may not occur during slow stretch, the whole data set collected under slow stretch can be considered a pre-catch phase. For instance, in Fig. 3, the torque measured during the slow stretching assessment (Subject #3) and the torque reconstructed from (1) using the position, velocity, and acceleration data match with a small error (11.4% on average).
Fig. 3.

Comparison between measured force during slow movement and estimated force using (1).
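The comparison in Fig. 3 can be sketched in code under two stated assumptions: that (1) has the standard mass-spring-damper form (inertia times acceleration plus damping times velocity plus stiffness times angle), and that the average error is the mean absolute error normalized by the mean absolute measured torque. The text does not define the error metric, so the second function is only one plausible reading.

```python
def pre_catch_torque(theta, omega, accel, m, b, k):
    """Torque of a linear mass-spring-damper (assumed form of (1)).

    theta in rad, omega in rad/s, accel in rad/s^2; returns Nm when
    b is in Nms/rad and k is in Nm/rad.
    """
    return m * accel + b * omega + k * theta


def mean_percent_error(measured, estimated):
    """Mean absolute error as a percentage of mean absolute measured torque.

    This normalization is an assumption, not the paper's stated metric.
    """
    if len(measured) != len(estimated) or not measured:
        raise ValueError("inputs must be non-empty and equally long")
    abs_err = sum(abs(a - e) for a, e in zip(measured, estimated))
    abs_ref = sum(abs(a) for a in measured)
    return 100.0 * abs_err / abs_ref


# Constants reported in Table III (m = 0, b = 1.26 Nms/rad, k = 3.15 Nm/rad)
tau = pre_catch_torque(theta=1.0, omega=0.5, accel=0.0, m=0.0, b=1.26, k=3.15)
```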
The pre-catch phase ends with the occurrence of the catch at a specific joint angle, which is called the catch angle or stretch reflex threshold [35]. Hence, determining the catch angle is essential in defining the end of the pre-catch phase. According to the literature, stretching at a higher velocity typically evokes the catch sooner, and thus the catch angle decreases linearly with the stretch velocity [35]. Moreover, the catch angle depends on the initial posture (or length of the muscle) [36]. Based on this information, the catch angle is determined as follows:
$$\theta_{catch} = L - \alpha\,\bar{v}_{pre} \tag{2}$$
where L denotes the catch angle constant; α (> 0) the slope of the stretching velocity versus catch angle curve; and $\bar{v}_{pre}$ the average speed of stretch during the pre-catch phase.
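As a worked example of the catch angle relation, assume the linear form implied by the definitions above (catch angle equals L minus α times the average pre-catch speed), with L in degrees and speed in deg/s; the units of L are an assumption. Using the HM#1 constants from Table III (L = 130, α = 0.35 s) and an illustrative average stretch speed of 100 deg/s:

```latex
\theta_{catch} = L - \alpha\,\bar{v}_{pre}
             = 130^\circ - (0.35\,\mathrm{s})(100^\circ/\mathrm{s})
             = 95^\circ
```

At 150 deg/s the same constants give 77.5°, so a faster stretch evokes the catch at a smaller angle, consistent with the velocity dependence described in the text.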
In the catch phase, it was reported that an impulse-like function was a suitable form for modeling the sudden increase of muscle tone [32]. The magnitude of the impulse is greater for higher MAS scores, and is velocity-dependent. In addition, our data show that the ratio of residual torque to the peak torque is different for each individual and the ratio is greater for higher MAS scores. Considering all the above factors, the following equation describes the torque during the catch phase:
| (3) |
where H denotes the catch torque constant that relates the stretching speed to the peak torque at catch; the speed term is the stretching speed at the beginning of the catch phase; τpre_end the torque at the end of the pre-catch phase; Q (< 1) a constant scale factor that determines the amount of residual torque after the peak torque (named the residual torque constant); tcatch_init the time when the catch phase begins; and ΔTpeak the duration for which the impulsive peak torque at catch is maintained (Fig. 4).
Fig. 4.

Schematic model of catch phase.
In addition to H and Q, which determine the amounts of peak and residual torque, the duration of the catch phase needs to be determined; it defines the end of the catch phase and the beginning of the post-catch phase. In the clinical data, the end of the catch phase is delineated by the rapid decrease in torque from its peak to the residual value, which occurs either before the end of the movement or, if the movement has not yet been completed, before a second, smaller rise in resistance torque. We observed that the duration of the catch phase is inversely proportional to the stretch velocity, which is modeled as follows:
$$\Delta T_c(t) = \frac{D}{\bar{v}_{catch}(t)} \tag{4}$$
where D is the catch duration constant that determines the duration of the catch phase, and $\bar{v}_{catch}(t)$ the average stretching speed during the catch phase. In practice, (4) is implemented by recalculating ΔTc(t) at every sampling time, and the catch phase ends when the time spent in the catch phase (t − tcatch_init) becomes greater than ΔTc(t). In the 12 trials performed at different stretching speeds for subject #3, the product of catch duration and average stretching speed remained at a roughly constant level, which supports (4) (Fig. 5).
Fig. 5.

Clinical data supporting (4): from 12 trials at different stretching speeds (all data collected from subject #3).
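The per-sample termination test for the catch phase can be sketched as below. It assumes the inverse-proportional form described in the text (duration equals D divided by the running average speed) with D in degrees and speed in deg/s; the class name and the running-average bookkeeping are hypothetical.

```python
class CatchTimer:
    """Decide when the catch phase ends, one sample at a time.

    Assumes catch duration = D / (average stretch speed since catch onset).
    """

    def __init__(self, D, t_catch_init):
        self.D = D                # catch duration constant (assumed deg)
        self.t0 = t_catch_init    # time at which the catch began (s)
        self._speed_sum = 0.0
        self._n = 0

    def update(self, t, speed):
        """Feed one sample; return True once the catch phase should end."""
        self._speed_sum += abs(speed)
        self._n += 1
        v_avg = self._speed_sum / self._n
        # elapsed > D / v_avg, rearranged so that zero speed never divides
        return (t - self.t0) * v_avg > self.D


# Example: HM#1 (D = 60) at a constant 100 deg/s, sampled at 1 kHz
timer = CatchTimer(D=60.0, t_catch_init=0.0)
t_end = None
for i in range(1, 2000):
    t = i * 0.001
    if timer.update(t, 100.0):
        t_end = t
        break
```

With these inputs the catch phase lasts about 0.6 s; doubling the speed would halve it.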
After the catch phase, assuming the limb movement has not yet been completed, there is a secondary increase in resistance torque which starts from the residual torque determined by Q in (3) from the catch phase. The rate of increase in the post-catch phase is slower than that in the catch phase. To our knowledge, no research has modeled this phenomenon; however, the secondary increase also is described in the MAS instruction. For example, the absence or presence of increased resistance after catch distinguishes MAS 1 and 1+, respectively (Appendix). From the data collected, we found that the secondary increase in resistance torque is position-dependent. Hence, the torque in the post-catch phase is described as follows:
| (5) |
where kpost denotes the elbow stiffness in the post-catch phase and θpost_i the elbow joint angle at the beginning of the post-catch phase. Note that the duration of the post-catch phase does not need to be determined, because the post-catch phase ends when the speed of stretch falls below a speed threshold, which occurs when the clinician stops the movement.
III. Design and Control of HESS
A. Design of HESS
HESS (Haptic Elbow Spasticity Simulator) has been designed to recreate the resistance that the clinicians would have felt during an in-person assessment [23]. The device consists of a mannequin forearm, brushless DC motor and controller (Barrett Technology Inc., Cambridge, MA), and a cable-driven speed-reducing mechanism (Fig. 6). The cable-driven mechanism consists of two stages of speed reduction, and the ratio at each stage is calculated by the ratio of diameters of input and output shafts.
Fig. 6.

Haptic Elbow Spasticity Simulator (HESS).
Since the brushless DC motor has about ten times greater bandwidth than the MR brake used in the previously referenced patient simulator [18], [20], the device is able to render fast responses such as the catch with greater accuracy. The speed-reducing transmission was implemented using a cable-driven mechanism [37]. It relies on nearly frictionless rolling contact between two adjacent pulleys driven by two steel cables that are pre-tensioned and do not stretch. Therefore, the mechanism provides low friction and near-zero backlash, which are desirable mechanical properties for haptic devices. The detailed mechanical specifications of the HESS are provided in Table II.
TABLE II.
Mechanical Specification of HESS
| Specifications | Value |
|---|---|
| Max. continuous torque | 28 Nm |
| Max. speed (at motor shaft) | 300 deg/s |
| Position resolution at the elbow joint | 0.005 deg |
| Gear ratio | 18.8:1 |
| Weight of forearm and hand | 0.8 kg (forearm), 0.3 kg (hand) |
B. Control of HESS
1) Implementation of the Spasticity Model
The mathematical model described by (1)–(5) is implemented in the control scheme illustrated in Fig. 7. In the pre-catch phase, the haptic device is controlled by (1) until the joint angle reaches the catch angle determined by (2). Equation (2) yields unrealistic catch angles when the average stretch speed is too small or too large. For practical implementation, we modified the equation as follows:
| (6) |
θL represents the smallest feasible catch angle from the in-person assessment, whereas νL and νH denote the lower and upper bounds of the average stretch speed over which the linear relationship in (2) holds. νL and θL are determined from the data collected during the in-person assessment, and νH = (L − θL)/α follows from (2). In the in-person data collected from nine patients and three clinicians, all measured average stretch speeds were above 60°/s and the smallest catch angle was 60°; therefore, νL and θL were set to 60°/s and 60°, respectively. These numbers, however, can vary depending on the data collected from the in-person assessment and should be adjusted accordingly. For example, if a clinician moved a patient's arm at a speed slower than 60°/s and the catch was observed, νL could be set to a smaller value.
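A minimal sketch of the bounded catch angle follows. It uses the relation νH = (L − θL)/α and the values νL = 60°/s and θL = 60° from the text. Two points are assumptions and are not stated in the source: that no catch is triggered below νL (returned here as None), and that the catch angle saturates at θL above νH. The function names are hypothetical.

```python
THETA_L = 60.0  # smallest feasible catch angle (deg), from the text
NU_L = 60.0     # lower speed bound (deg/s), from the text
ALPHA = 0.35    # slope in (2) (s), from Table III


def upper_speed_bound(L, theta_L=THETA_L, alpha=ALPHA):
    """Speed at which the linear relation reaches theta_L: (L - theta_L) / alpha."""
    return (L - theta_L) / alpha


def bounded_catch_angle(L, v_avg, theta_L=THETA_L, nu_L=NU_L, alpha=ALPHA):
    """Catch angle (deg) for an average pre-catch speed v_avg (deg/s).

    Returns None below nu_L (ASSUMPTION: no catch at slow stretch) and
    theta_L above nu_H (ASSUMPTION: saturation at the smallest angle).
    """
    if v_avg < nu_L:
        return None
    if v_avg > upper_speed_bound(L, theta_L, alpha):
        return theta_L
    return L - alpha * v_avg


# Upper speed bounds for the four haptic models (L values from Table III)
for name, L in [("HM#1", 130), ("HM#2", 120), ("HM#3", 115), ("HM#4", 110)]:
    print(name, round(upper_speed_bound(L), 1), "deg/s")
```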
Fig. 7.

Control flowchart implemented in HESS.
If the joint angle becomes greater than θcatch at a fast enough speed, the catch phase starts and the device is controlled by (3). The catch phase ends after ΔTc, which is calculated by (4), and the post-catch phase starts, in which the rater feels the secondary increase in torque determined by (5) throughout the remainder of the ROM. All parameters in (1)–(5), listed in Table III, are estimated from the position and torque data collected during the in-person assessment. Among the parameters, L, H, Q, and D had monotonic relationships with MAS scores, which implies that they can potentially be used for quantifying the degree of spasticity.
TABLE III.
Parameters of the Four HMs
| | L | H | Q | D | Other constants (common to all HMs) |
|---|---|---|---|---|---|
| HM#1 | 130 | 1.4 | 0.15 | 60 | m = 0, b = 1.26 (Nms/rad), and k = 3.15 (Nm/rad) in (1) and (5); α = 0.35 (sec) in (2); ΔTpeak = 0.1 (sec) in (3); kpost = 5.8 (Nm/rad) in (5); speed threshold = 20 (deg/s) in Fig. 6 |
| HM#2 | 120 | 2.0 | 0.3 | 50 | |
| HM#3 | 115 | 2.8 | 0.6 | 30 | |
| HM#4 | 110 | 3.8 | 0.8 | 10 | |
Among all parameters, ΔTpeak and the speed threshold are constants that can be freely chosen within a reasonable range; b and k were estimated from the trials at slow velocity by using least-squares parameter estimation. The stiffness in the post-catch phase, kpost, is estimated from the data around the joint limit. Q, H, and D can be measured directly from the torque data collected in the in-person assessment. L and α are the parameters of the linear relationship between the stretching velocity and the catch angle, obtained from multiple trials at different velocities.
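The least-squares estimation of b and k from slow-stretch trials can be illustrated with a small sketch. It assumes the pre-catch torque is modeled as damping times velocity plus stiffness times angle (the inertial term is set to zero, as in Table III) and solves the two-parameter normal equations directly. The function name and the synthetic data are hypothetical; the paper does not describe its solver.

```python
def estimate_b_k(theta, omega, torque):
    """Least-squares fit of torque = b * omega + k * theta.

    theta in rad, omega in rad/s, torque in Nm; returns (b, k).
    """
    s_ww = sum(w * w for w in omega)
    s_tt = sum(t * t for t in theta)
    s_wt = sum(w * t for w, t in zip(omega, theta))
    s_wy = sum(w * y for w, y in zip(omega, torque))
    s_ty = sum(t * y for t, y in zip(theta, torque))
    det = s_ww * s_tt - s_wt * s_wt
    if abs(det) < 1e-12:
        raise ValueError("angle and velocity are collinear; fit is ill-posed")
    b = (s_wy * s_tt - s_ty * s_wt) / det
    k = (s_ty * s_ww - s_wy * s_wt) / det
    return b, k


# Synthetic check: generate torque from known constants, then recover them
theta = [0.1 * i for i in range(1, 11)]
omega = [0.2 + 0.03 * i * i for i in range(1, 11)]
torque = [1.26 * w + 3.15 * t for w, t in zip(omega, theta)]
b_hat, k_hat = estimate_b_k(theta, omega, torque)
```

With real data the residual would not vanish, and the fit quality depends on how much the velocity varies within the slow trial.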
The control scheme was implemented under a real-time operating system, Xenomai, with a 1 kHz sampling rate. The torque commands calculated from (1), (3), and (5) were sent to the motor amplifier (PUCK, Barrett Technology Inc., Cambridge, MA) in torque control mode (25 kHz sampling rate). Since the mannequin forearm has the average inertia and size of the subject's forearm, the inertial effect [the inertial term in (1) and (5)] was not programmed into the control commands. This avoids the high-level noise caused by the numerical differentiation involved in calculating acceleration from the position measurement.
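The phase switching described in this subsection can be summarized as a small state machine. This sketch covers only the transitions; the torque laws of each phase are deliberately left out. Several details are assumptions: "fast enough" is interpreted as a speed at or above the lower bound νL, and the model returns to the pre-catch phase once the post-catch phase ends. All names are hypothetical.

```python
from enum import Enum


class Phase(Enum):
    PRE_CATCH = 1
    CATCH = 2
    POST_CATCH = 3


def next_phase(phase, theta, speed, theta_catch, elapsed_in_catch,
               catch_duration, speed_threshold=20.0, nu_L=60.0):
    """Return the phase for the next control cycle.

    theta, theta_catch in deg; speed, speed_threshold, nu_L in deg/s;
    elapsed_in_catch and catch_duration in s.
    """
    if phase is Phase.PRE_CATCH:
        # Catch starts once the joint passes the catch angle at sufficient
        # speed (ASSUMPTION: sufficient means speed >= nu_L).
        if theta_catch is not None and theta > theta_catch and speed >= nu_L:
            return Phase.CATCH
        return Phase.PRE_CATCH
    if phase is Phase.CATCH:
        # Catch ends when the time spent in it exceeds the catch duration.
        if elapsed_in_catch > catch_duration:
            return Phase.POST_CATCH
        return Phase.CATCH
    # Post-catch ends when the rater stops moving the arm
    # (ASSUMPTION: the controller then resets to pre-catch).
    if speed < speed_threshold:
        return Phase.PRE_CATCH
    return Phase.POST_CATCH
```

A 1 kHz control loop would call next_phase once per cycle and then evaluate the torque law of the returned phase.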
2) Compensation of Friction
Friction often deteriorates the haptic feel, and backdrivability is considered an important mechanical property of a haptic device. To quantify the friction torque that would be felt by raters, we tested the backdrivability of HESS. The resistance torque felt by a rater was measured by a torque sensor (TRT-200, sensor stiffness 2300 Nm/rad, Transducer Techniques Inc., Temecula, CA) while the rater moved the mannequin arm with the motor command set to zero. The maximum torque felt by the rater was greater than 0.5 Nm, mainly due to viscous and static friction [Fig. 8(a)]. To compensate for the friction torque, the viscous and Coulomb friction coefficients of the following model were identified. Even though the motor controller regulated the motor current, there might be an effect of uncompensated back EMF; the friction model therefore also absorbs any such effect.
Fig. 8.

Backdrivability test. (a) Without friction compensation. (b) With friction compensation.
$$\tau_{ext} = I_{motor}\ddot{\theta} + B\dot{\theta} + \beta\,\mathrm{sgn}(\dot{\theta}) \tag{7}$$
where τext represents the torque measured during the backdrivability test, Imotor the inertia of the motor and the speed reduction mechanism, B the viscous damping coefficient, and β the Coulomb friction coefficient. Least-squares estimation [38] gave Imotor = 0.0097 (kg m2), B = 0.168 (Nm sec), and β = 0.189 (Nm). With viscous and static friction compensated, τext measured during the backdrivability test was reduced below 0.2 Nm [Fig. 8(b)], which is small enough not to deteriorate the feel of the catch and residual resistance. In the real implementation, the output torque commands calculated from (1), (3), and (5) were updated after compensating the estimated friction torque in (7).
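The friction compensation step can be sketched using the identified coefficients B = 0.168 and β = 0.189. The sketch assumes a viscous-plus-Coulomb friction torque (B times velocity plus β times the sign of velocity), velocity in rad/s, and a sign convention in which both the desired resistance and friction oppose the rater's motion, so the friction estimate is subtracted from the command. The sign convention and function names are assumptions.

```python
B_VISCOUS = 0.168   # viscous coefficient reported in the text
BETA_COULOMB = 0.189  # Coulomb friction coefficient (Nm) reported in the text


def sign(x):
    """Return -1.0, 0.0, or 1.0 according to the sign of x."""
    return (x > 0) - (x < 0) + 0.0


def friction_torque(omega, B=B_VISCOUS, beta=BETA_COULOMB):
    """Estimated friction torque (Nm) at joint velocity omega (rad/s)."""
    return B * omega + beta * sign(omega)


def compensated_command(desired_resistance, omega):
    """Resistance the motor should add so the rater feels desired_resistance.

    ASSUMPTION: friction already resists the motion, so its estimate is
    subtracted from the model torque before commanding the motor.
    """
    return desired_resistance - friction_torque(omega)
```

At 1 rad/s the estimated friction is about 0.36 Nm, which is of the same order as the uncompensated torque reported for the backdrivability test.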
3) Safety Consideration
The flowchart in Fig. 7 assumes that raters will hold the device until the completion of the post-catch phase; however, the mannequin arm can slip out of their hands in the middle of the catch or post-catch phase if the resistance torque is too strong (e.g., MAS 3). When the device slips out of the operator's hand, it quickly accelerates opposite to the direction it has been moving. The loss of grasp was detected here by monitoring the velocity of the device. If a sudden change in velocity is detected, it is considered as loss of grip and the control mode is switched to the pre-catch phase to reduce resistance torque and thereby decrease the velocity.
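The grip-loss check can be illustrated as a threshold on the sample-to-sample change in velocity. The text states only that a sudden change in velocity is monitored; the acceleration threshold, the sign convention (positive speed along the stretch direction), and the function name below are hypothetical.

```python
def grip_lost(prev_speed, speed, dt=0.001, accel_limit=5000.0):
    """Return True if the arm decelerates faster than accel_limit.

    prev_speed, speed: signed joint speed (deg/s), positive along the
    stretch direction. dt: sampling period (s). accel_limit (deg/s^2)
    is a HYPOTHETICAL threshold; the paper does not report its value.
    """
    accel = (speed - prev_speed) / dt
    return accel < -accel_limit


def safe_phase(current_phase, prev_speed, speed):
    """Fall back to the low-resistance pre-catch mode on grip loss."""
    return "pre_catch" if grip_lost(prev_speed, speed) else current_phase
```

In practice the threshold would need tuning against velocity noise so that normal slowing after the catch is not mistaken for a release.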
C. Mechanical Behaviors of HESS
Mechanical behaviors of the four haptic models representing MAS 1, 1+, 2, and 3 are compared to those measured during in-person assessment (Fig. 9). Position was measured by the digital encoder installed on the motor shaft (Table II), and the force transducer used in the MSE device (MLP-100, Transducer Techniques Inc., Temecula, CA) was installed between the rater's hand and the mannequin forearm to measure the torque felt by the raters. Since raters cannot reproduce the same movement across trials, some mismatch between the position and torque responses of the in-person and haptic assessments is to be expected. Note that the goal was to present mechanical behaviors that distinguish one MAS score from the others.
Fig. 9.
Mechanical behaviors during in-person and haptic assessments: (i) Pre-catch phase (ii) Catch phase, and (iii) Post-catch phase.
In the experimental results for MAS 1 [Fig. 9(a)], the peak torque is smaller (1 ~ 1.5 Nm) than for the other MAS scores, and there is no post-catch phase, so the raters reach the joint limit within the catch phase. Therefore, the raters did not experience the secondary increase in resistance torque. In the other MAS scores [Fig. 9(b)–(d)], however, the catch phases are brief, and the resistance torque drops immediately after the catch, followed by a secondary increase of resistance torque, which is the post-catch phase. Higher MAS scores showed greater peak torques (related to H), smaller catch angles (related to L), shorter catch phases (related to D), and greater resistance torque at the end of the post-catch phase. Note that the effect of parameter Q can be explained by (3). Parameter Q is described as the ratio of the decrease from peak torque to residual torque (local minimum after the catch) to the increase from the torque at the end of the pre-catch phase to the peak torque. Q cannot be read directly from the graphs, but all four graphs show greater Q for higher MAS scores.
IV. Clinical Validation
A. Methods
The accuracy and reliability of the proposed HMs were validated by experienced clinicians. Eight experienced clinicians (six physical therapists and two physicians) including the three clinicians who collected clinical data shown in Table I participated in the validation process. All raters were experienced with using the MAS instrument from their previous institutions. They were provided with a written description of the scores but no additional training was provided during this study. In addition to the four models in Table III, 12 dummy models were added by using intermediate parameters of L, H, Q, and D to hide the identity of the four supposedly “true” models from the raters, similar to randomized “controls”.
Each rater came to the laboratory individually and was asked to manipulate HESS and give MAS scores to the HMs while the 16 models (the four models in Table III and 12 dummy models) were presented in a randomized order. After reviewing the MAS scoring instructions (Appendix) prior to the test, they moved the mannequin forearm as many times as they wanted; however, no training or instruction was provided regarding how much resistance corresponds to which MAS score. They rated MAS scores based on their prior experience and the written MAS scoring instructions. After all trials, they scored the level of realism [from 1 (worst) to 10 (best)] by comparing the haptic assessment with their prior experience with clinical assessments. For test–retest reliability, three of the eight clinicians tested the 16 HMs again in a randomized order after six months.
From the data collected, the accuracy of the four HMs was evaluated using the MAS scores given by the three physical therapists who collected the clinical data shown in Table I. The percent agreement between the MAS scores they gave in the in-person and haptic assessments was used to determine how closely the haptic models recreated the muscle tone of the four patients. The inter-rater reliability was evaluated from the data collected from all eight clinicians by using Fleiss's kappa statistic for multiple raters [39]. The test–retest reliability was evaluated from the data collected from three clinicians.
B. Results
First, the MAS scores that the three clinicians rated during the in-person and the haptic assessments had 100% agreement (Table IV). Table V shows the MAS scores rated on each haptic model (HM). Seven out of eight clinicians gave correct (or intended) scores to HM #1 (87.5% agreement), while six assigned correct (or intended) scores to HM #2 and #4 (75.0%). For HM #3, all eight gave correct scores (100%). Overall, mean agreement for the four HMs was 84.4±12.0%. The average stretching speed during pre-catch phase was different for raters and trials (Table V).
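The percent agreement figures can be reproduced from the rated scores reported in Table V with a short sketch. The data layout and function name are illustrative; the use of the sample standard deviation is an assumption that happens to match the reported spread.

```python
from statistics import mean, stdev

INTENDED = ["1", "1+", "2", "3"]  # intended MAS score of HM#1..HM#4

# Rated scores from Table V: one row per rater, one column per HM
RATED = [
    ["1", "1+", "2", "3"],   # Rater 1
    ["1", "1+", "2", "2"],   # Rater 2
    ["1", "1+", "2", "3"],   # Rater 3
    ["1+", "1+", "2", "3"],  # Rater 4
    ["1", "1+", "2", "3"],   # Rater 5
    ["1", "1", "2", "2"],    # Rater 6
    ["1", "1+", "2", "3"],   # Rater 7
    ["1", "1", "2", "3"],    # Rater 8
]


def percent_agreement(rated, intended):
    """Percentage of raters who gave the intended score, per haptic model."""
    n = len(rated)
    return [100.0 * sum(row[j] == target for row in rated) / n
            for j, target in enumerate(intended)]


agreement = percent_agreement(RATED, INTENDED)
print(agreement)                                             # [87.5, 75.0, 100.0, 75.0]
print(round(mean(agreement), 1), round(stdev(agreement), 1))  # 84.4 12.0
```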
TABLE IV.
Comparison of MAS Scores From In-Person and Haptic Assessment
| MAS | Assessment | Rater 1* | Rater 2* | Rater 3* |
|---|---|---|---|---|
| 1 | In-person (subject 1) | 1 | 1 | 1 |
| | Haptic (HM #1) | 1 | 1 | 1 |
| 1+ | In-person (subject 2) | 1+ | 1+ | Absent** |
| | Haptic (HM #2) | 1+ | 1+ | 1+ |
| 2 | In-person (subject 3) | 2 | 2 | 2 |
| | Haptic (HM #3) | 2 | 2 | 2 |
| 3 | In-person (subject 4) | 3 | 2 | 3 |
| | Haptic (HM #4) | 3 | 2 | 3 |
*Raters 1, 2, and 3 are the same as raters 1, 2, and 3 in Table I, respectively.
**Rater 3 did not test with subject 2.
TABLE V.
Rated MAS Scores and Stretching Speeds for Haptic Assessment
| Rater | HM#1 (MAS 1) rated score | HM#1 stretching speed (deg/s) | HM#2 (MAS 1+) rated score | HM#2 stretching speed (deg/s) | HM#3 (MAS 2) rated score | HM#3 stretching speed (deg/s) | HM#4 (MAS 3) rated score | HM#4 stretching speed (deg/s) |
|---|---|---|---|---|---|---|---|---|
| Rater 1* | 1 | 122 | 1+ | 124 | 2 | 108 | 3 | 68 |
| Rater 2* | 1 | 100 | 1+ | 83 | 2 | 92 | 2 | 65 |
| Rater 3* | 1 | 86 | 1+ | 70 | 2 | 88 | 3 | 64 |
| Rater 4 | 1+ | 123 | 1+ | 131 | 2 | 110 | 3 | 72 |
| Rater 5 | 1 | 80 | 1+ | 70 | 2 | 77 | 3 | 60 |
| Rater 6 | 1 | 92 | 1 | 80 | 2 | 78 | 2 | 65 |
| Rater 7 | 1 | 82 | 1+ | 86 | 2 | 73 | 3 | 61 |
| Rater 8 | 1 | 89 | 1 | 93 | 2 | 80 | 3 | 60 |
Fleiss's kappa was obtained from Table V to test inter-rater reliability. The kappa value (κ = 0.626) indicates substantial agreement [40] in MAS scores among the eight raters. The inter-rater reliability of the other 12 dummy models was poor (κ = 0.141). For test–retest reliability, each of the three raters had substantial agreement (κ = 0.771) on the four standard HMs, while each of them had fair agreement (κ = 0.232) on the 12 dummy HMs. Lastly, from the questionnaire, the mean score for the level of realism of the device in comparison to a patient was 7.63 ± 0.92 out of a possible 10.
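The inter-rater kappa for the four standard HMs can be recomputed from the eight raters' scores with the standard Fleiss formula. The sketch below is a plain implementation written for illustration, not the authors' analysis code; the data layout is an assumption, and the scores are the rated scores of the eight raters for each haptic model.

```python
def fleiss_kappa(ratings, categories):
    """Fleiss's kappa for a list of subjects, each rated by the same
    number of raters. ratings[i] is the list of labels given to subject i.
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if any(len(r) != n_raters for r in ratings):
        raise ValueError("every subject needs the same number of ratings")
    counts = [[row.count(c) for c in categories] for row in ratings]
    total = n_subjects * n_raters
    # Overall proportion of assignments to each category
    p_j = [sum(c[j] for c in counts) / total for j in range(len(categories))]
    # Observed agreement per subject
    P_i = [(sum(x * x for x in c) - n_raters) / (n_raters * (n_raters - 1))
           for c in counts]
    P_bar = sum(P_i) / n_subjects
    P_e = sum(p * p for p in p_j)  # agreement expected by chance
    return (P_bar - P_e) / (1.0 - P_e)


# Scores of Raters 1..8 for each haptic model
BY_MODEL = [
    ["1", "1", "1", "1+", "1", "1", "1", "1"],       # HM#1
    ["1+", "1+", "1+", "1+", "1+", "1", "1+", "1"],  # HM#2
    ["2"] * 8,                                       # HM#3
    ["3", "2", "3", "3", "3", "2", "3", "3"],        # HM#4
]

kappa = fleiss_kappa(BY_MODEL, ["1", "1+", "2", "3"])
print(round(kappa, 3))  # 0.626
```

The result matches the reported κ = 0.626 for the eight raters.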
V. Discussion and Conclusion
This paper presents a framework for developing a robotic system which will help in standardizing the clinical assessment of spasticity. It involves 1) quantitative data collection from clinical assessment, 2) mathematical modeling of the clinical assessment, and 3) implementation and clinical evaluation of the model.
To improve the poor reliability of clinical spasticity assessment, previous studies have attempted to quantify spasticity by correlating quantitative parameters with the clinical instruments [19], [20], [29]–[32], [35], [41]. It was reported that the rate of change in resistance and the onset angle of stretch are closely related to the Ashworth Scale [41]. Most of the previous studies, however, modified the existing assessment protocol to obtain more consistent data [2], [29], [31], [35], [41] and/or tried to fit a sophisticated phenomenon into a single event [18]–[20].
The proposed mathematical model has advanced features that may ultimately enhance the reliability of existing clinical assessment tools. First, it is a more precise and accurate model, such that the raters judged it to be realistic. By dividing a spastic response into three phases, the model could reproduce sophisticated responses precisely and accurately. The realism score was notably high given that the “dummy” simulations were also included. Second, the proposed model uses existing clinical assessment tools without any modification, while potentially improving their reliability. Therefore, clinicians can still use the same clinical instruments that they have been using. Lastly, it accounts for human variability, which is unavoidable in clinical practice. In a clinical assessment, clinicians control the stretching speed, which varies throughout the ROM as well as across raters. For example, clinicians tend to slow the stretch immediately after they feel the sudden increase of muscle tone, and stretching speeds differ across raters, as shown in Table V. The proposed model accounts for this variability, and Table V shows the promising result that the same model received similar scores even though it was operated at different speeds.
Among the parameters used in the mathematical model, four parameters [L in (2), H and Q in (3), and D in (4)] seemed most closely related to the MAS scores. From Table III, one can see that higher MAS scores have smaller L and D, and larger H and Q. We were able to create 12 dummy trials in between the four gold standard models. None of the seven raters reported any unrealistic feel (different from realistic muscle tone) during the experiment; however, the inter-rater reliability on those twelve trials was not as good as on the four standard HMs, which is understandable because trials that sit between two MAS scores are harder to distinguish. This paper did not employ statistical methods for correlating the four parameters with the degree of spasticity. A future study will examine whether the parameters can be used as quantitative measures of the degree of spasticity when more rigor or quantification is desired.
After the four haptic models (HMs) were implemented, our primary concern was how to evaluate their accuracy and reliability objectively. Accuracy is generally defined as the “degree of closeness to the absolute value”; however, it was hard to find gold standard data for MAS scores beyond the written instructions. Even expert clinicians might interpret the same instructions differently. Therefore, the standard models were taken to be the data from four representative subjects whom experienced clinicians rated the same (or similarly). More data should be collected in the future to propose more generalized standard models. For any additional data, the modeling, implementation, and evaluation methods will likely remain the same as in the framework presented in this paper.
After refining the gold standard, we could compare position/force data collected from in-person assessment and haptic assessment to evaluate accuracy; however, we considered that clinical outcome measures should be used to evaluate accuracy if the goal is to improve the accuracy of existing clinical assessment. This led us to compare MAS scores, the most widely used clinical measure.
The accuracy of the HMs was evaluated in two ways: 1) by comparing MAS scores from the in-person assessment with those from the corresponding haptic assessment, and 2) by percent agreement with the intended MAS scores. The first method indicates how closely the HMs reconstructed the responses of the corresponding patients; the haptic assessment matched the in-person MAS scores in 100% of cases (Table IV). The second method shows how close the intended MAS scores of the HMs are to the MAS scores perceived by experienced clinicians. The result (84.4 ± 12% agreement) and the survey results imply that the HMs reproduced the haptic feeling that clinicians generally expect.
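As an illustration of the second accuracy measure, the following Python sketch computes each rater's percent agreement with the intended MAS scores and then summarizes it as mean ± standard deviation across raters. The rater names and scores are hypothetical placeholders rather than study data, and summarizing per rater is an assumption about how such a figure could be obtained.

```python
from statistics import mean, stdev


def percent_agreement(intended, rated):
    """Percentage of trials in which the rated score equals the intended score."""
    if len(intended) != len(rated):
        raise ValueError("intended and rated must have the same length")
    matches = sum(i == r for i, r in zip(intended, rated))
    return 100.0 * matches / len(intended)


# Intended MAS score of each haptic model (MAS 1, 1+, 2, 3).
intended = ["1", "1+", "2", "3"]

# Hypothetical ratings, for illustration only (not data from the study).
ratings_by_rater = {
    "rater_A": ["1", "1+", "2", "3"],
    "rater_B": ["1", "2", "2", "3"],
    "rater_C": ["1+", "1+", "2", "3"],
}

per_rater = {
    name: percent_agreement(intended, scores)
    for name, scores in ratings_by_rater.items()
}
summary_mean = mean(per_rater.values())
summary_sd = stdev(per_rater.values())  # sample standard deviation across raters
print(f"{summary_mean:.1f} ± {summary_sd:.1f}% agreement")
```

Comparing scores as labels rather than numbers keeps "1+" distinct from "1" without imposing a numeric spacing on the ordinal MAS categories.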
Previous studies using kappa statistics have reported inconsistent inter-rater reliability of the MAS for elbow spasticity [2]–[4]. Agreement has been reported as poor to moderate (κ = 0.16–0.42) with four clinicians [4] and as moderate (κ = 0.52) with three clinicians [3]. In contrast, one study reported very good agreement (κ = 0.868–0.892) with two clinicians [2]. Note that all clinicians who participated in those studies had a training session immediately before testing, which would be expected to enhance reliability significantly. The existing results on reliability are inconsistent; they do, however, show that better agreement (higher κ) was obtained with fewer raters. In our reliability test, substantial agreement (κ = 0.626) was obtained from eight participating raters even without training prior to the test. All raters were experienced with the MAS instrument. They were not only completely blinded to the identity of the four standard HMs but also distracted by the 12 dummy HMs randomly placed in between the four HMs. It is noteworthy that, without training and even with this impediment, the four HMs still yielded substantial agreement. This suggests that most raters intuitively perceived the haptic feel of the four standard HMs as the intended feel described in the MAS instructions, which implies that the HESS is promising as a training tool.
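For readers who wish to reproduce this kind of multi-rater analysis, a minimal Python sketch of Fleiss' kappa [39] with the Landis and Koch agreement labels [40] is given below. It assumes that every trial is scored by the same number of raters and that scores are treated as nominal labels; the function names are hypothetical, and the sketch is not the analysis code used in this study.

```python
from collections import Counter

MAS_CATEGORIES = ["0", "1", "1+", "2", "3", "4"]


def fleiss_kappa(ratings, categories=MAS_CATEGORIES):
    """Fleiss' kappa for a list of trials, each a list of one label per rater."""
    n_subjects = len(ratings)
    if n_subjects == 0:
        raise ValueError("at least one trial is required")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("each trial needs the same number (>= 2) of ratings")
    if any(score not in categories for row in ratings for score in row):
        raise ValueError("rating outside the listed categories")

    counts = [Counter(row) for row in ratings]
    # Proportion of all assignments that fall in each category.
    p_j = [sum(c[cat] for c in counts) / (n_subjects * n_raters)
           for cat in categories]
    # Observed agreement per trial: fraction of agreeing rater pairs.
    p_i = [(sum(c[cat] ** 2 for cat in categories) - n_raters)
           / (n_raters * (n_raters - 1)) for c in counts]
    p_bar = sum(p_i) / n_subjects      # mean observed agreement
    p_e = sum(p * p for p in p_j)      # agreement expected by chance
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_e) / (1.0 - p_e)


def landis_koch_label(kappa):
    """Verbal agreement label following Landis and Koch."""
    if kappa < 0.0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"
```

With these thresholds, `landis_koch_label(0.626)` returns "substantial", which matches the label used for the eight-rater result above.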
The HESS and the haptic models can provide a novel clinical analysis and training strategy based on consistent stimuli, which should yield consistent responses. Many studies have tested the reliability of the MAS instrument; however, the existing results are affected by three sources of variability: 1) patient variability, 2) a qualitatively described instrument, and 3) rater variability. To analyze rater variability, the third source must be isolated from the others. The haptic device removes patient variability, and the mathematical modeling quantifies the qualitatively described instrument, which lets us study rater variability alone. In our pilot results, the raters had substantial test-retest agreement for the four standard HMs (κ = 0.771) but only fair agreement on the 12 dummy HMs (κ = 0.232). This implies that there are cases where raters have greater variability and that those cases might not be described clearly by a single MAS score. Designing training programs that target these cases will be more efficient. For example, we plan to collect enough data to find the cases that cause greater intra- and inter-rater variability, to determine which MAS scores to assign to them, and to train raters on those cases using the HESS.
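One possible way to locate the cases with greater rater variability is to compute, for each trial, the fraction of rater pairs that assigned the same score and to flag trials that fall below a chosen cutoff. The Python sketch below illustrates this idea only; the trial names, the ratings, and the 0.5 threshold are hypothetical and are not values from this study.

```python
from collections import Counter


def item_agreement(scores):
    """Fraction of rater pairs that assigned the same score to one trial."""
    n = len(scores)
    if n < 2:
        raise ValueError("at least two ratings are required")
    counts = Counter(scores)
    agreeing_pairs = sum(c * (c - 1) for c in counts.values())
    return agreeing_pairs / (n * (n - 1))


def flag_ambiguous_trials(ratings_by_trial, threshold=0.5):
    """Names of trials whose pairwise agreement is below the threshold."""
    return sorted(
        trial for trial, scores in ratings_by_trial.items()
        if item_agreement(scores) < threshold
    )


# Hypothetical ratings from four raters, for illustration only.
ratings_by_trial = {
    "HM_MAS2": ["2", "2", "2", "2"],
    "dummy_between_1+_and_2": ["1+", "2", "1+", "2"],
    "dummy_between_2_and_3": ["2", "3", "2", "2"],
}

flagged = flag_ambiguous_trials(ratings_by_trial)
print(flagged)
```

Trials flagged in this way would be candidates for the targeted training described above, once enough data are available to choose the threshold on empirical grounds.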
The framework presented in this paper can also be applied to simulate other types of mechanical behavior that involve haptic interaction. Rigidity in Parkinsonism is one example. First, a haptic device resembling the patient's body part needs to be designed. It should satisfy mechanical specifications such as adequate joint ROM, minimal friction, torque/force levels high enough to mimic the targeted mechanical behaviors, and realistic inertial and size properties based on anthropometric data. Second, quantitative data (position, force/torque, EMG, etc.) need to be collected from patients along with the clinical judgments. Third, the mechanical behaviors need to be expressed in a programmable form (e.g., mathematical equations) based on the data and the instructions of the clinical assessment. Lastly, the target behaviors should be implemented in the device and validated in a clinical setup.
The level of realism rated by the clinicians was high enough to indicate that the HMs mimicked the spastic elbow joint. It is understandable that the clinicians did not give a perfect rating (score 10), because the HMs were not as variable as real patients. The consistency of the HMs and the precise knowledge of what the clinicians feel from them are important reasons why the HESS can be used as a training tool. Hence, it provides a feasible opportunity for clinicians to train toward more accurate and reliable clinical assessments.
Acknowledgment
The authors would like to thank all subjects and clinicians who volunteered for the study. Specifically, the authors would like to thank Dr. L. Prosser and Dr. C. Zampierri-Gallagher for their valuable comments.
This work was supported in part by the intramural research program of the National Institutes of Health (protocol number 90-CC-0168) and in part by the Center for Neuroscience and Regenerative Medicine (G192HF).
Biographies
Hyung-Soon Park (M'05) received the Ph.D. degree in mechanical engineering from the Korea Advanced Institute of Science and Technology, Daejeon, Korea, in 2004.
He is a staff scientist in the Rehabilitation Medicine Department, Clinical Center at the National Institutes of Health, Bethesda, MD. His current research interest focuses mainly on application of robotics and control technology on rehabilitation medicine, and biomechanics of human movement.
Jonghyun Kim (S'05) received the Ph.D. degree in mechanical engineering from the Korea Advanced Institute of Science and Technology, Daejeon, Korea, in 2010.
Since 2010, he has been with the Rehabilitation Medicine Department, Clinical Center at the National Institutes of Health, Bethesda, MD, where he is currently a Visiting Fellow. His active research interests include rehabilitation robotic systems, haptic display, and the design and control of bilateral teleoperation.
Diane L. Damiano received the Ph.D. degree in research methods/biomechanics from the University of Virginia, Charlottesville, in 1993.
She is Chief of the Functional and Applied Biomechanics Section in the Rehabilitation Medicine Department, Clinical Center at the National Institutes of Health, Bethesda, MD. Her area of expertise is the investigation of both existing and novel activity-based rehabilitation approaches in children with cerebral palsy, which has helped to revolutionize the treatment of these patients.
Appendix
Modified Ashworth Scale (MAS) [42]
0 No increase in tone.
1 Slight increase in muscle tone, manifested by a catch and release or minimal resistance at the end of the ROM when the affected part is moved in flexion or extension.
1+ Slight increase in muscle tone, manifested by a catch, followed by minimal resistance throughout the remainder (less than half) of the ROM.
2 More marked increase in muscle tone through most of the ROM, but affected part easily moved.
3 Considerable increase in muscle tone, passive movement difficult.
4 Affected part rigid in flexion or extension.
Footnotes
H.-S. Park and J. Kim contributed equally to this work.
References
[1] Fleuren JF, et al. Stop using the Ashworth Scale for the assessment of spasticity. J. Neurol. Neurosurg. Psychiatry. 2010 Jan;81(no. 1):46–52. doi: 10.1136/jnnp.2009.177071.
[2] Kaya T, et al. Inter-rater reliability of the Modified Ashworth Scale and modified Modified Ashworth Scale in assessing poststroke elbow flexor spasticity. Int. J. Rehabil. Res. 2011 Mar;34(no. 1):59–64. doi: 10.1097/MRR.0b013e32833d6cdf.
[3] Mehrholz J, et al. The influence of contractures and variation in measurement stretching velocity on the reliability of the modified Ashworth Scale in patients with severe brain injury. Clin. Rehabil. 2005 Jan;19(no. 1):63–72. doi: 10.1191/0269215505cr824oa.
[4] Mehrholz J, et al. Reliability of the modified Tardieu Scale and the modified Ashworth Scale in adult patients with severe brain injury: A comparison study. Clin. Rehabil. 2005 Oct;19(no. 7):751–9. doi: 10.1191/0269215505cr889oa.
[5] Yam WK, Leung MS. Interrater reliability of modified Ashworth Scale and modified Tardieu Scale in children with spastic cerebral palsy. J. Child Neurol. 2006 Dec;21(no. 12):1031–5. doi: 10.1177/7010.2006.00222.
[6] Gracies JM, et al. Reliability of the Tardieu Scale for assessing spasticity in children with cerebral palsy. Arch. Phys. Med. Rehabil. 2010 Mar;91(no. 3):421–8. doi: 10.1016/j.apmr.2009.11.017.
[7] Haas BM, et al. The inter rater reliability of the original and of the modified Ashworth Scale for the assessment of spasticity in patients with spinal cord injury. Spinal Cord. 1996 Sep;34(no. 9):560–4. doi: 10.1038/sc.1996.100.
[8] Klingels K, et al. Upper limb motor and sensory impairments in children with hemiplegic cerebral palsy. Can they be measured reliably? Disabil. Rehabil. 2010;32(no. 5):409–16. doi: 10.3109/09638280903171469.
[9] Pandyan AD, et al. A review of the properties and limitations of the Ashworth and modified Ashworth Scales as measures of spasticity. Clin. Rehabil. 1999 Oct;13(no. 5):373–83. doi: 10.1191/026921599677595404.
[10] Scholtes VA, et al. Clinical assessment of spasticity in children with cerebral palsy: a critical review of available instruments. Dev. Med. Child Neurol. 2006 Jan;48(no. 1):64–73. doi: 10.1017/S0012162206000132.
[11] Haugh AB, Pandyan AD, Johnson GR. A systematic review of the Tardieu Scale for the measurement of spasticity. Disabil. Rehabil. 2006 Aug;28(no. 15):899–907. doi: 10.1080/09638280500404305.
[12] Lee HM, et al. Quantitative analysis of the velocity related pathophysiology of spasticity and rigidity in the elbow flexors. J. Neurol. Neurosurg. Psychiatry. 2002 May;72(no. 5):621–9. doi: 10.1136/jnnp.72.5.621.
[13] Nielsen JB, Crone C, Hultborn H. The spinal pathophysiology of spasticity-from a basic science point of view. Acta Physiologica. 2007 Feb;189(no. 2):171–180. doi: 10.1111/j.1748-1716.2006.01652.x.
[14] Schmit BD, Rymer WZ. Identification of static and dynamic components of reflex sensitivity in spastic elbow flexors using a muscle activation model. Ann. Biomed. Eng. 2001 Apr;29(no. 4):330–9. doi: 10.1114/1.1359496.
[15] Shumway-Cook A, Woollacott MH. Motor Control: Translating Research into Clinical Practice. Lippincott Williams & Wilkins; Philadelphia, PA: 2007. p. 115.
[16] Wu YN, et al. Characterization of spasticity in cerebral palsy: Dependence of catch angle on velocity. Dev. Med. Child Neurol. 2010 Jun;52(no. 6):563–9. doi: 10.1111/j.1469-8749.2009.03602.x.
[17] Fujisawa T, et al. IEEE Int. Conf. Rehabil. Robot. Noordwijk; The Netherlands: 2007. Basic research on the upper limb patient simulator; pp. 48–51.
[18] Takhashi Y, et al. IEEE Int. Conf. Rehabil. Robot. Zurich, Switzerland: 2011. Development of an upper limb patient simulator for physical therapy exercise; pp. 1117–1120.
[19] Grow DI, et al. IEEE Symp. Haptic Interfaces Virtual Environ. Teleoperator Syst. Reno, NV: Haptic simulation of elbow joint spasticity; pp. 475–476.
[20] Kikuchi T, Oda K, Furusho J. Leg-robot for demonstration of spastic movements of brain-injured patients with compact magnetorheological fluid clutch. Adv. Robot. 2010;24(no. 5–6):671–686.
[21] Mouri T, et al. IEEE/RSJ Int. Conf. Intell. Robots Syst. San Diego, CA: Development of robot hand for therapist education/training on rehabilitation; pp. 2295–2300.
[22] Kim J, Park HS, Damiano DL. IEEE Int. Conf. on Eng. Med. Biol. Soc. Boston, MA: Accuracy and reliability of haptic spasticity assessment using HESS (Haptic Elbow Spasticity Simulator); pp. 8527–8530.
[23] Park H-S, Kim J, Damiano DL. Proc. IEEE Int. Conf. Rehabil. Zurich, Switzerland; 2011. Haptic recreation of elbow spasticity; pp. 1–6.
[24] Blackburn M, Van Vliet P, Mockett SP. Reliability of measurements obtained with the modified Ashworth Scale in the lower extremities of people with stroke. Phys. Ther. 2002 Jan;82(no. 1):25–34. doi: 10.1093/ptj/82.1.25.
[25] Clopton N, et al. Interrater and intrarater reliability of the modified Ashworth Scale in children with hypertonia. Pediatr. Phys. Ther. 2005;17(no. 4):268–74. doi: 10.1097/01.pep.0000186509.41238.1a.
[26] Craven BC, Morris AR. Modified Ashworth Scale reliability for measurement of lower extremity spasticity among patients with SCI. Spinal Cord. 2010 Mar;48(no. 3):207–13. doi: 10.1038/sc.2009.107.
[27] Mutlu A, Livanelioglu A, Gunel MK. Reliability of Ashworth and Modified Ashworth Scales in children with spastic cerebral palsy. BMC Musculoskelet Disord. 2008;9:44–44. doi: 10.1186/1471-2474-9-44.
[28] Tederko P, et al. Reliability of clinical spasticity measurements in patients with cervical spinal cord injury. Ortop. Traumatol. Rehabil. 2007 Sep-Oct;9(no. 5):467–83.
[29] McCrea PH, Eng JJ, Hodgson AJ. Linear spring-damper model of the hypertonic elbow: Reliability and validity. J. Neurosci. Methods. 2003 Sep;128(no. 1–2):121–8. doi: 10.1016/s0165-0270(03)00169-9.
[30] Fee JW, Jr, Foulds RA. Neuromuscular modeling of spasticity in cerebral palsy. IEEE Trans. Neural Syst. Rehabil. Eng. 2004 Mar;12(no. 1):55–64. doi: 10.1109/TNSRE.2003.819926.
[31] Koo TK, Mak AF. A neuromusculoskeletal model to simulate the constant angular velocity elbow extension test of spasticity. Med. Eng. Phys. 2006 Jan;28(no. 1):60–9. doi: 10.1016/j.medengphy.2005.03.012.
[32] Zhang LQ, et al. System identification of tendon reflex dynamics. IEEE Trans. Rehabil. Eng. 1999 Jun;7(no. 2):193–203. doi: 10.1109/86.769410.
[33] Mayer NH. Clinicophysiologic concepts of spasticity and motor dysfunction in adults with an upper motoneuron lesion. Muscle Nerve. 1997;6:S1–13.
[34] Bennett DJ, et al. Time-varying stiffness of human elbow joint during cyclic voluntary movement. Exp. Brain Res. 1992;88(no. 2):433–42. doi: 10.1007/BF02259118.
[35] Calota A, Feldman AG, Levin MF. Spasticity measurement based on tonic stretch reflex threshold in stroke using a portable device. Clin. Neurophysiol. 2008 Oct;119(no. 10):2329–37. doi: 10.1016/j.clinph.2008.07.215.
[36] Sheean G. The pathophysiology of spasticity. Eur. J. Neurol. 2002 May;9(no. Suppl. 1):3–9. doi: 10.1046/j.1468-1331.2002.0090s1003.x.
[37] Townsend WT. Ph.D. dissertation, Mechan. Eng. Dept. MIT; Cambridge: 1988. The effect of transmission design on force-controlled manipulator performance.
[38] Ljung L. System Identification-Theory For the User. 2nd ed. Upper Saddle River: PTR Prentice Hall; 1999.
[39] Fleiss JL. Measuring nominal scale agreement among many raters. Psychol. Bull. 1971;76(no. 5):378–382.
[40] Landis JR, Koch GG. Measurement of observer agreement for categorical data. Biometrics. 1977;33(no. 1):159–174.
[41] Damiano DL, et al. What does the Ashworth Scale really measure and are instrumented measures more valid and precise? Dev. Med. Child Neurol. 2002 Feb;44(no. 2):112–8. doi: 10.1017/s0012162201001761.
[42] Bohannon RW, Smith MB. Interrater reliability of a modified Ashworth Scale of muscle spasticity. Phys. Ther. 1987 Feb;67(no. 2):206–7. doi: 10.1093/ptj/67.2.206.

