PLOS ONE. 2020 Dec 10;15(12):e0243646. doi: 10.1371/journal.pone.0243646

Intra- and inter-rater reliability of joint range of motion tests using tape measure, digital inclinometer and inertial motion capturing

Laura Fraeulin 1,*, Fabian Holzgreve 1, Mark Brinkbäumer 2, Anna Dziuba 2, David Friebe 2, Stefanie Klemz 2, Marco Schmitt 2, Anna-Lena Theis A 2, Sarah Tenberg 2, Anke van Mark 1, Christian Maurer-Grubinger 1, Daniela Ohlendorf 1
Editor: Juliane Müller
PMCID: PMC7728246  PMID: 33301541

Abstract

Background

In clinical practice, range of motion (RoM) is usually assessed with low-cost devices such as a tape measure (TM) or a digital inclinometer (DI). However, the intra- and inter-rater reliability of typical RoM tests differs, which impairs the evaluation of therapy progress. More objective and reliable kinematic data can be obtained with the inertial motion capture (IMC) system by Xsens. The aim of this study was to determine the intra- and inter-rater reliability of the TM, DI and IMC methods in five RoM tests: modified Thomas test (DI), shoulder test modified after Janda (DI), retroflexion of the trunk modified after Janda (DI), lateral inclination (TM) and fingertip-to-floor test (TM).

Methods

Two raters executed the RoM tests (TM or DI) in a randomized order on 22 healthy individuals while, simultaneously, the IMC data (Xsens MVN) was collected. After 15 warm-up repetitions, each rater recorded five measurements.

Findings

Intra-rater reliability was (almost) perfect for all tests with all three devices (ICCs 0.886–0.996). Inter-rater reliability was substantial to (almost) perfect for the DI (ICCs 0.71–0.87) and IMC methods (ICCs 0.61–0.993) and (almost) perfect for the TM methods (ICCs 0.923–0.961). The measurement error (ME) for the tests measured in degrees was 0.9–3.3° for the DI methods and 0.5–1.2° for the IMC approaches. For the tests measured in centimeters, the ME was 0.5–1.3 cm for the TM methods and 0.6–2.7 cm for the IMC methods. Pearson correlations between the DI or TM results, respectively, and the IMC results were significant in all tests except the shoulder test on the right body side (r = 0.41–0.81).

Interpretation

Measurement repetitions by either one or multiple trained raters can be considered reliable with all three devices.

Introduction

Range of motion (RoM) measurements are often used to assess functional mobility [1, 2]. However, unassisted assessments of RoM, which are still performed in medical assessments, are subjective and, therefore, lack reliability [3, 4]. Nevertheless, standardized and objective RoM tests can be useful tools in choosing and evaluating therapeutic treatments in patients with issues of the musculoskeletal system [5, 6]. However, good intra- and inter-rater reliabilities should also be ensured for the respective test procedures using measurement devices in order to properly evaluate the therapy progress [2]. Both the intra- and inter-rater reliability depend on the experience of the rater, the health status of the participants, the accuracy of the instrument, its exact use and the respective test protocol [2, 7].

In clinical practice, goniometers, digital inclinometers (DI) and tape measures (TM) are in frequent use [1, 2, 5, 6]. These devices are easy to handle as well as cost and time effective. Recently, the smartphone has also received considerable attention as a measuring device and several applications have been compared to goniometer data (reviewed by Keogh et al. [8]). The authors showed mostly adequate intra- and inter-rater reliability as well as validity for smartphone devices on all joints assessed. The intra-rater reliability of the DI has been shown to be equivalent to that of the goniometer [9], while its inter-rater reliability was found to be superior [10, 11], since the exact placement of a goniometer can be difficult [7, 12, 13]. The reliabilities of DI and TM procedures, however, do differ [2, 14]; for example, in evaluating the flexion of the lumbar spine, reliability was found to be inconsistent for several (slightly) modified versions of the Schober method using a TM [15–17] and for an inclinometer placed on the spine [18, 19].

A relatively new and frequently applied method is inertial motion capturing (IMC); this allows for precise and objective kinematic measurement in real-world environments [3, 20, 21]. So far, IMC has been used by researchers to assess functional RoM [22, 23] but, as yet, not to assess the maximal range of motion held in static positions. Inertial sensors, such as those used in the MVN system by Xsens, combine signals of accelerometers, gyroscopes and magnetometers to determine the position, acceleration and orientation of body segments in space [24]. When multiple sensors are applied, joint angles can be calculated in all three body dimensions by means of biomechanical models [25].

In motion capturing, the optical approach (OMC) is considered the gold standard of kinematic measurement [26, 27]. However, as OMC is executed in the laboratory with several high-precision cameras, it is scarcely available and also expensive. Although research on the validity of IMC systems relies on small sample sizes [28], concurrent findings suggest that IMC systems deliver good to excellent data, especially in the frontal and sagittal planes [24, 27, 29, 30] and in slower and less complex movements [20, 31]. For example, comparing the Xsens system to the Optotrak system, Robert-Lachaine et al. [31] have shown that for the Xsens system the mean root mean square error (RMSE) across all joints was 1.2° in short functional movements compared to 2.8° in faster and more complex tasks (p≤0.001). They also showed that the differences between the Xsens and Optotrak systems were attributable significantly more to discrepancies in the biomechanical model than to technological issues in estimating the orientation of segments (RMSE <5° in manual handling tasks); this has been supported by the findings of Zhang et al. [28]. Validation studies for the Xsens system were already published in 2016 [32] and 2013 [28], and since then the calibration procedure and data processing protocols have been further developed.

In the present study, we chose to evaluate five traditional joint RoM tests. In these tests, RoM data are captured in static poses, mostly in the sagittal plane (fingertip-to-floor test, Thomas test, retroflexion of the trunk) and in the frontal plane (lateral inclination) while only for the shoulder test is a combination of planes used (Fig 1).

Fig 1. The range of motion measurements.


a) Shoulder test modified after Janda on the left-hand body side, b) modified Thomas test on the left-hand body side, c) retroflexion of the trunk after Janda in the modified version, d) fingertip-to-floor test and e) lateral inclination on the right-hand body side. The measurements were recorded simultaneously with the Xsens system (the subject wears the measurement suit) and the low-cost devices.

The aim of this study was to collect joint range of motion data using low-cost device methods (DI and TM) and more objective kinematic data using the Xsens system. For both approaches, the intra- and inter-rater reliability were calculated in order to determine their practicability for medical assessment.

Methods

Study protocol

The aim of this study was to compare data obtained with either the DI [5, 7, 13] or TM [33, 34] with the data derived from the IMC system in order to assess the reliability of five RoM tests. Thus, we chose the following five tests (Fig 1) which depict the flexibility of the trunk in all three dimensions as well as the shoulder and the hip: the shoulder test modified after Janda [35], the modified Thomas test [5], retroflexion of the trunk after Janda in a modified version [35], the fingertip-to-floor test [33] and lateral inclination [12].

The study was approved by the ethics committee of Goethe University (2018–46) Frankfurt am Main, Germany.

Subjects

Twenty-two healthy subjects (12 female/10 male) with an average age of 25 ± 2 years volunteered to participate in this prospective study. The subjects were on average 174.1 ± 9.8 cm tall and weighed 66.6 ± 11.3 kg. The average body mass index (BMI) was 21.9 ± 2.0 and all subjects were right-handed. All participants provided written informed consent. Two raters (sports scientists, B.A. and B.Sc., respectively) were intensively trained in the use of the measuring systems. They were responsible for performing the range of motion tests and positioning the DI and the TM.

To take part in the study, subjects had to be aged between 18 and 30 years. Exclusion criteria were relevant operations on, or surgical stiffening of, the musculoskeletal system, relevant artificial joint replacement, severe diseases such as ankylosing spondylitis, chronic destructive joint diseases, multiple sclerosis, myodystrophic or neurodegenerative diseases, congenital malpositions of the musculoskeletal system and an acute herniated disc. In addition, the intake of muscle relaxants or other drugs that influence the elasticity of the musculature, as well as pregnancy, were considered contraindications.

Materials

Inertial Motion Capture (IMC)

The MVN BIOMECH Link system from Xsens (Enschede, Netherlands) was used for kinematic data collection. This system provides position, orientation and angle information, in addition to acceleration and velocity parameters, of the entire human body by means of 17 motion trackers, each combining accelerometers, gyroscopes and magnetometers. The sampling rate of the system is 240 Hz and the measurement error is specified by the manufacturer as ± 1%. The corresponding software displays the collected data in real time using kinematic motion reconstruction based on a biomechanical model. Due to the arrangement of the sensors within a full-body suit (available in various sizes), the system is comfortable to wear without restricting the freedom of movement. In the study, when the position to be measured was reached, a marker was set in the recording session. The recording was executed in the multi-level scenario, which is recommended by Xsens when measurements are not exclusively performed on a flat floor (for our measurements, subjects partly lay on a bench or stood on a step, so the multi-level scenario was the most appropriate setting for these tests). Where subjects stood on even ground, the single-level scenario was chosen. After the measurements were taken, the "HD reprocessing" filter was applied to all recordings; this filter is provided in the MVN Analyze software and offers the best possible data quality according to the manufacturer.

Digital Inclinometer (DI)

A digital inclinometer (Acumar™ Digital Inclinometer, Model ACU002; Lafayette Instrument Company, Lafayette, USA) was used for the angle measurement in the modified Thomas test, in the retroflexion of the trunk modified after Janda and in the shoulder test modified after Janda. As the DI displays only integer values, the absolute measurement error is 0.3° [36].

Tape Measure (TM)

A commercially available tape measure was used to measure the distance between the fingertips and the ground in the fingertip-to-floor test and the lateral inclination. The measuring tape had a double scale (inches and cm), with the smallest increments being 0.1 cm. Its hard surface and the associated reduced buckling enabled exact measurements to be made. The distances were recorded to the nearest 0.1 cm.

Measurement protocol

Prior to the measurements, subjects put on the measurement suit of the IMC system, which was then calibrated. The five RoM tests described were carried out by the test persons in a randomized order. Since the modified Thomas test, the lateral inclination and the shoulder test modified after Janda were tested on both body sides, this study included a total of eight tests. For each test, 25 repetitions were performed, which were recorded simultaneously by the IMC system and the DI or TM. The first 20 repetitions were recorded by the first rater but not included in any calculations, as they served as warm-up in order to control for acute effects [37, 38]. For rater one, measurements 21–25 were included in the analysis [39]. Subsequently, the second rater measured another five repetitions. The order of the raters was chosen at random.

The raters were responsible for the correct positioning of the subjects at the beginning of each measurement and for the application of either the DI or TM. When the subjects had reached the position to be measured, an additional investigator gave a verbal signal for the simultaneous measurement to be taken with the IMC and the DI or TM. In the IMC, markers were set in the software and checked after the recordings. The raters read out loud the angles or distances of the DI or the TM, respectively. The correct execution of the exercises was monitored by an additional investigator who underwent training at the same time as the raters.

Range of motion measurements

In Fig 1 the performed RoM tests are presented. Further detailed information can be found in the methodology paper of Holzgreve et al. [13].

Shoulder test modified after Janda

In contrast to the test by Janda, in this study the elbow was extended (Fig 1A) and the raters placed the inclinometer on the radius, proximal to the processus styloideus radii. While the test person was lying on a treatment couch with the shoulder joint free, the arm was lowered in a controlled manner until the tension of the musculature terminated the movement. For each measurement, the rater moved the subject's arm and decided when the defined position had been reached. The test was performed separately for each arm. For the IMC data, the joint centers of the wrist and the humerus head were extracted. The arm length (distance between the humerus head and the wrist) and the height difference between the humerus head and the wrist joint were calculated, and the angle was obtained as sin⁻¹(height difference / arm length). The recording was executed in the multi-level scenario and subsequently HD reprocessed.
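To make this calculation explicit, a minimal Python/NumPy sketch is shown below. It is an illustration only, not the authors' Matlab code; the function name, the coordinate convention (z as the vertical axis) and the example values are assumptions.

```python
import numpy as np

def shoulder_lowering_angle(humerus_head, wrist):
    """Lowering angle of the arm in degrees, derived from two joint centres.

    Both arguments are 3D positions (x, y, z) with z as the vertical axis,
    e.g. extracted from the IMC recording at the marked time point.
    """
    humerus_head = np.asarray(humerus_head, dtype=float)
    wrist = np.asarray(wrist, dtype=float)
    arm_length = np.linalg.norm(wrist - humerus_head)  # distance humerus head -> wrist
    height_diff = humerus_head[2] - wrist[2]            # vertical drop of the wrist
    return float(np.degrees(np.arcsin(height_diff / arm_length)))

# Illustrative call with made-up coordinates (metres):
# shoulder_lowering_angle([0.0, 0.0, 1.20], [0.55, 0.0, 1.05])  # about 15 degrees
```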

Modified Thomas test

During the measurement session of the modified Thomas test (Fig 1B), the 0° alignment of the pelvis was checked by the raters after every fifth measurement. The inclinometer was then placed on the thigh, proximal to the patella, to determine the joint angle [40, 41]. The leg remained in the same position during all measurements and while changing the raters; the test was also performed separately for each leg. For the analysis of the IMC data, the flexion/extension angle of the hip joint was used. The recording was executed using the multi-level scenario and subsequently reprocessed (not HD).

Retroflexion of the trunk modified after Janda

For the retroflexion of the trunk modified after Janda (Fig 1C), the inclinometer was placed on the proximal part of the sternum. At the instructor's command, the test person adopted the position to be measured and maintained it for a few seconds while the angle was measured by the rater. For comparison with the IMC system, the orientation angle (y-axis) of the sternum was analyzed. The recording was executed using the multi-level scenario and subsequently HD reprocessed.

Fingertip-to-floor test

For the fingertip-to-floor test (Fig 1D), subjects adopted the standardized position on a 15 cm high stool; this ensured that flexible persons could also execute the test with their full RoM. During the execution, it was checked that the knees were always extended and that the index fingers of both hands were brought together. The distance between the floor and the fingertips was measured using a conventional measuring tape. For the corresponding IMC data, the distance between the hand segments and the foot segments was calculated; to do so, the data of the left and right side were averaged for the hand and foot segments.
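A short Python/NumPy sketch of this distance calculation is given below as an illustration; the function name and the assumption that segment positions are available as 3D vectors are ours and not part of the Xsens export format.

```python
import numpy as np

def hand_foot_distance(hand_left, hand_right, foot_left, foot_right):
    """Hand-foot distance from IMC segment positions (3D vectors).

    Left and right segment positions are averaged for hands and feet before
    the Euclidean distance between the two averaged points is taken.
    """
    hand = (np.asarray(hand_left, dtype=float) + np.asarray(hand_right, dtype=float)) / 2.0
    foot = (np.asarray(foot_left, dtype=float) + np.asarray(foot_right, dtype=float)) / 2.0
    return float(np.linalg.norm(hand - foot))
```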

Lateral inclination

The lateral inclination was executed in a standardized standing position. Sagittal fluctuations during the lateral inclination were eliminated by leaning the back against a wall. The ipsilateral hand of the body side to be measured had to be guided distally, directly along the body. An investigator confirmed that the test persons kept their knees straight, that their heels were not lifted off the ground and that their backs always remained against the wall. When the test person had actively reached the position to be measured, the rater measured the lateral fingertip-to-floor distance using a measuring tape. Each body side was measured separately. For comparison with the IMC system, the distance between the hand segment and the floor was calculated. The recording was executed in the single-level scenario and afterwards HD reprocessed.

Statistical analysis

Statistical analysis was performed using Matlab R2020a, Microsoft Excel 2016, IBM SPSS Statistics 25 and BIAS (version 11). The relevant joint angles were provided by the Xsens software, except for the specific angle used in the shoulder test, for which Matlab was used to calculate the distances between the segments and the angle. In addition, all necessary IMC data were analyzed in Matlab and exported as Excel files, analogous to the DI and TM data.

For the intra-rater reliability of rater 1 and rater 2, the last five measurements of each test were taken into account. Inter-rater reliability was analyzed by comparing the mean values of the last five measurements of rater 1 and rater 2. Reliabilities were calculated by means of intraclass correlation coefficients (ICC) using the BIAS software.

Intra-rater reliability (ICC) was assessed according to Bland and Altman [42]:

ICC = \frac{m \cdot SS_B - SS_T}{(m - 1) \cdot SS_T},

where m is the number of observations per subject, SST the total sum of squares, and SSB the sum of squares between persons. Measurements are considered reliable if differences between the two measurements of the same person are small compared to the differences between the individuals.

The measurement error (ME) is the square root of the average of the subject-specific variances:

ME = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \frac{\sum_{j=1}^{m} \left(Y_{ij} - \bar{Y}_i\right)^2}{m - 1}},

where n is the number of subjects and \bar{Y}_i the mean of the m measurements of subject i.

Repeatability was calculated as the mean difference between two measurements of the same subject, which can be estimated as √2 · ME [43].

The coefficient of repeatability (CoR) was calculated as an estimate of the minimum detectable change: CoR = 1.96 · √2 · ME.
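As an illustration of these definitions, the following Python/NumPy sketch computes the intra-rater ICC (as reconstructed above), ME, repeatability and CoR from a subjects-by-repetitions matrix. It is a minimal sketch with hypothetical function and variable names; the actual analysis in this study was performed in Matlab and BIAS.

```python
import numpy as np

def intra_rater_stats(Y):
    """Intra-rater reliability statistics for one rater.

    Y is an (n_subjects x m_repetitions) array, e.g. 22 x 5 in this study.
    Returns the one-way ICC (as defined above), the measurement error ME,
    the repeatability and the coefficient of repeatability (CoR).
    """
    Y = np.asarray(Y, dtype=float)
    n, m = Y.shape
    grand_mean = Y.mean()
    subject_means = Y.mean(axis=1)

    ss_total = ((Y - grand_mean) ** 2).sum()                    # SST
    ss_between = m * ((subject_means - grand_mean) ** 2).sum()  # SSB, between persons

    icc = (m * ss_between - ss_total) / ((m - 1) * ss_total)
    subject_var = ((Y - subject_means[:, None]) ** 2).sum(axis=1) / (m - 1)
    me = np.sqrt(subject_var.mean())            # sqrt of the mean subject-specific variance
    repeatability = np.sqrt(2) * me
    cor = 1.96 * np.sqrt(2) * me
    return icc, me, repeatability, cor
```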

For the inter-rater reliability of the low-cost device methods, the "two-way mixed, single-measure" ICC(3,1) according to Shrout and Fleiss [44] was used, where ICC(3,1) values were estimated via the following mixed effects model:

Y_{ij} = \mu + s_i + \alpha_j + e_{ij}, \qquad i = 1, 2, \ldots, 22, \quad j = 1, 2,

with Y_{ij} the j-th observation of the i-th person, μ the overall fixed effect, s_i the random effect of the i-th person (iid, N(0, σ²s)), α_j the fixed effect of the j-th rater, and e_{ij} the measurement error (iid, N(0, σ²e)).

Measurements are considered reliable if the differences between the observers are small compared to the differences between the individuals. The intraclass correlation coefficient ICC(3,1) measures the proportion of the variance that is attributed to the random factor as

ICC(3,1) = (σ²s − σ²r) / (σ²s + σ²r + σ²e), with σ²r being the variance of the raters.
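For reference, a minimal sketch of the standard Shrout and Fleiss ICC(3,1), computed from two-way ANOVA mean squares, is given below. This is the textbook formulation; the BIAS software used here may parameterize the estimate differently (cf. the variance-component notation above), and the function name is hypothetical.

```python
import numpy as np

def icc_3_1(Y):
    """Shrout-Fleiss ICC(3,1) for an (n_subjects x k_raters) matrix.

    Here Y would hold, per subject, the mean of the last five measurements
    of rater 1 and rater 2 (k = 2). Computed from two-way ANOVA mean squares:
    ICC(3,1) = (BMS - EMS) / (BMS + (k - 1) * EMS).
    """
    Y = np.asarray(Y, dtype=float)
    n, k = Y.shape
    grand_mean = Y.mean()
    bms = k * ((Y.mean(axis=1) - grand_mean) ** 2).sum() / (n - 1)  # between-subjects MS
    residual = Y - Y.mean(axis=1, keepdims=True) - Y.mean(axis=0, keepdims=True) + grand_mean
    ems = (residual ** 2).sum() / ((n - 1) * (k - 1))               # error MS
    return (bms - ems) / (bms + (k - 1) * ems)
```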

Both intra- and inter-rater reliability coefficients (ICCs) were classified by means of the method suggested by Landis and Koch [45]: ICCs 0–0.20 = “slight”, 0.21–0.40 = “fair”, 0.41–0.60 = “moderate”, 0.61–0.80 = “substantial”, 0.81–1.00 = “(almost) perfect”.

As the vast majority of the data was normally distributed, the Pearson correlation coefficient was used to quantify the relationship between the measurement systems (calculated in BIAS). Only data for which synchronicity could be ensured were included. The correlation coefficients were classified according to Evans [46]: r < 0.2 = "poor", 0.2–0.4 = "weak", 0.4–0.6 = "moderate", 0.6–0.8 = "strong" and > 0.8 = "optimal".

Results

Intra-rater reliability for all tests using the DI or the TM, respectively, showed (almost) perfect results (Table 1). The same was demonstrated for the IMC measurement system (Table 2). For the RoM tests measured in centimeters, the ME ranged from 0.5–1.3 cm in the TM method and 0.6–2.7 cm in the IMC method. The repeatability ranged from 0.7–1.9 cm in the TM method and from 0.9–3.9 cm in the IMC method. In the tests measured in degrees (°), the ME was 0.9–3.3° in the DI method and 0.5–1.2° in the IMC method. The repeatability was 1.2–4.7° in the DI method and 0.5–2.8° in the Xsens system.

Table 1. Intra-rater reliability for both raters using the DI and TM.

DI/TM Thomas test Shoulder test Retroflexion Lateral Inclination Fingertip-to-floor
Measuring system DI DI DI TM TM
Body side left right left right ------ left right -------
Rater
ICC Rater 1 0.976 0.987 0.942 0.945 0.914 0.887 0.975 0.965
Rater 2 0.962 0.955 0.925 0.947 0.886 0.974 0.962 0.951
95% Confidence Interval Rater 1 [0.95; 0.99] [0.97; 0.99] [0.90; 0.98] [0.91; 0.98] [0.86; 0.97] [0.82; 0.96] [0.96; 0.99] [0.94; 0.98]
Rater 2 [0.93; 0.98] [0.92; 0.98] [0.88; 0.97] [0.91; 0.98] [0.82; 0.95] [0.96; 0.99] [0.94; 0.99] [0.92; 0.98]
P-value Rater 1 0.001 0.001 0.001 0.001 >0.001 >0.001 >0.001 >0.001
Rater 2 0.001 0.001 0.001 0.001 >0.001 >0.001 >0.001 >0.001
ME Rater 1 0.9° 0.9° 1.8° 2.2° 2.7° 1.1 cm 0.6 cm 1.1 cm
Rater 2 1.1° 1.4° 2.2° 3.3° 0.5 cm 0.8 cm 1.3 cm
Repeatability Rater 1 1.6° 2.6° 3.1° 3.8° 1.6 cm 0.9 cm 1.5 cm
Rater 2 1.3° 1.2° 2.9° 3.1° 4.7° 0.7 cm 1.1 cm 1.9 cm
CoR Rater 1 3.1° 3.9° 5.1° 6.1° 7.5° 3.1 cm 1.8 cm 2.9 cm
Rater 2 2.4° 2.4° 5.7° 6.1° 9.2° 1.4 cm 2.2 cm 3.7 cm

DI = digital inclinometer; TM = tape measure; IMC = inertial motion capture; ME = measurement error; CoR = coefficient of repeatability.

ICCs between 0.899 and 1.00 were reported to three decimal places to show that they were still below 1. The units for the measurement error, repeatability and CoR are centimeters where a TM was used and degrees (°) where a DI was used.

Table 2. Intra-rater reliability for both raters using the IMC.

IMC Thomas test Shoulder test Retroflexion Lateral Inclination Fingertip-to-floor
Corresponding low-cost device DI DI DI TM TM
Body side left right left right ------ left right -------
Rater
ICC Rater 1 0.996 0.990 0.921 0.970 0.972 0.985 0.924 0.958
Rater 2 0.994 0.982 0.899 0.938 0.974 0.984 0.942 0.921
95% Confidence Interval Rater 1 [0.99; 1.00] [0.98; 1.00] [0.87; 0.97] [0.95; 0.99] [0.95; 0.99] [0.98; 0.99] [0.88; 0.97] [0.93; 0.98]
Rater 2 [0.99; 1.00] [0.97; 0.99] [0.83; 0.97] [0.90; 0.98] [0.96; 0.99] [0.97; 0.99] [0.91; 0.98] [0.87; 0.97]
P-value Rater 1 >0.001 >0.001 >0.001 >0.001 >0.001 >0.001 >0.001 >0.001
Rater 2 >0.001 >0.001 >0.001 >0.001 >0.001 >0.001 >0.001 >0.001
ME Rater 1 0.5° 0.7° 1.5° 1.3° 1.1° 0.6 cm 1.6 cm 2.2 cm
Rater 2 0.7° 1.1° 1.6° 1.2° 0.7 cm 1.5 cm 2.7 cm
Repeatability Rater 1 0.7° 1.9° 1.5° 0.9 cm 2.2 cm 3.1 cm
Rater 2 1.6° 2.3° 2.8° 1.6° 1 cm 2.2 cm 3.9 cm
CoR Rater 1 1.4° 3.9° 3.7° 2.9° 1.8 cm 4.3 cm 6.1 cm
Rater 2 3.1° 4.5° 5.5° 3.1° 2 cm 4.3 cm 7.6 cm

DI = digital inclinometer; TM = tape measure; IMC = inertial motion capture; ME = measurement error; CoR = coefficient of repeatability.

ICCs between 0.899 and 1.00 were reported to three decimal places to show that they were still below 1. The units for the measurement error, repeatability and CoR are centimeters where a TM was used and degrees (°) where a DI was used.

Inter-rater reliability (Table 3) for the DI and the TM protocols revealed substantial to (almost) perfect agreement. The same applies to the IMC measurements.

Table 3. Inter-rater reliability using the DI, TM and IMC.

Inter-rater Reliability Thomas test Shoulder test Retroflexion Lateral Inclination Fingertip-to-floor
Measuring system DI DI DI TM TM
Body side left right left right ------ left right -------
Rater
ICC DI/TM 0.87 0.72 0.80 0.71 0.84 0.961 0.938 0.923
IMC 0.86 0.78 0.61 0.81 0.910 0.993 0.975 0.965
95% Confidence Interval DI/TM [0.77; 0.96] [0.55; 0.85] [0.57; 0.91] [0.43; 0.88] [0.66; 0.93] [0.91; 0.98] [0.86; 0.97] [0.82; 0.97]
IMC [0.69; 0.94] [0.55; 0.90] [0.21; 0.83] [0.58; 0.92] [0.79; 0.96] [0.98; 1.00] [0.94; 0.99] [0.92; 0.99]
P-value DI/TM 0.001 0.001 0.001 0.001 >0.001 >0.001 >0.001 >0.001
IMC >0.001 >0.001 0.003 >0.001 >0.001 >0.001 >0.001 >0.001

DI = digital inclinometer; TM = tape measure; IMC = inertial motion capture; ME = measurement error.

ICCs between 0.899 and 1.00 were reported to three decimal places to show that they were still below 1.

The relationship between the two methods is illustrated in Fig 2 for each test. The correlations between the low-cost device methods and the IMC method were moderate or better and, except for the shoulder test on the right-hand body side, statistically significant (Table 4). While the fingertip-to-floor test exhibited an optimal correlation, the lateral inclination revealed strong correlations on both body sides. The retroflexion of the trunk, the modified shoulder test and the Thomas test correlated moderately.

Fig 2. The relationship between the low cost device methods and the IMC method.


Here the means of the last five measurements of rater 1 were used for the plot.

Table 4. Pearson correlations between the DI–IMC or TM–IMC measurement systems.

Correlations Thomas test Shoulder test Retroflexion Lateral Inclination Fingertip-to-floor
Body side left right left right ------ left right -------
Measurement Systems DI—IMC DI—IMC DI—IMC DI—IMC DI—IMC TM—IMC TM—IMC TM—IMC
Correlation Coefficient (r) 0.49* 0.53* 0.54* 0.41 0.52* 0.79* 0.73* 0.81*
n 21 20 19 20 21 20 22 22

DI = digital inclinometer; TM = tape measure; IMC = inertial motion capture.

The 21st repetition of rater 1 was used as an example; only measurements for which synchronicity could be ensured were included. Statistical significance is indicated by *.

The main findings of Tables 1–4, concerning the intra- and inter-rater reliabilities and the correlations, are summarized in Table 5.

Table 5. Summary of the correlations, intra- and inter-rater reliabilities.

Summary Thomas test Shoulder test Retroflexion Lateral Inclination Fingertip-to-floor
Meas. system DI IMC DI IMC DI IMC TM IMC TM IMC
Body side
Correlation left moderate moderate moderate strong optimal
right
Intra-rater Reliability left (almost) perfect
right
Inter-rater Reliability left (almost) perfect substantial substantial (almost) perfect (almost) perfect (almost) perfect
right substantial (almost) perfect

DI = digital inclinometer; TM = tape measure; IMC = inertial motion capture.

Discussion

The aim of this study was to provide both intra- and inter-rater reliabilities for five joint range of motion tests measured with low-cost devices (TM or DI) and an IMC system. The results show that intra-rater reliability was (almost) perfect in all tests for all three devices, whilst inter-rater reliability was substantial to (almost) perfect in the DI and IMC methods and (almost) perfect in the TM methods (Table 5). Consequently, measurement repetitions with all three devices, by either one or multiple trained raters, can be considered reliable.

While the TM and the DI are in frequent use, the relatively new IMC systems have not, as yet, been used for static RoM assessment. This study complements the field of IMC science, being the first to provide intra- and inter-rater reliabilities for static RoM measurements. Although the IMC approach does not capture exactly the same angles or distances as a TM or a DI (e.g. the distance calculated between the hand and foot segments versus the fingertip-to-floor distance measured via TM), the correlations between the methods were moderate to strong (Table 4, Fig 2). It can therefore be assumed that the same, or at least a very similar, mobility construct was measured. Typically, IMC systems are used for the kinematic measurement of motions, for which reliabilities have been provided. van der Straaten et al. [24] used an Xsens device (Awinda, 60 Hz) to examine joint angles of the trunk, pelvis, hip, knee and ankle in all three degrees of freedom during a single-leg squat. For the sagittal hip RoM, the results showed within-session, between-session and between-operator reliability ICCs of 0.9, 0.86 and 0.86, respectively. Relative to the total RoM executed in the single-leg squat, the proportional standard error of measurement (%SEM) in the hip joint was 18–20% and the minimal detectable change (MDC) was 8–11°. In the Thomas test of the current study, we also measured the hip RoM in the sagittal plane and demonstrated a considerably lower ME (0.5–1.1°) and CoR (1.4–3.1°). This is possibly due to the more precise Xsens MVN Link system, which has a sampling rate of 240 Hz, and the static nature of the Thomas test, which improves measurement accuracy [20].

The results of the TM and DI methods can be evaluated with regard to the concurrent evidence. Regarding the measurement of shoulder mobility, the literature provides a large range of different RoM measurements [14, 47–50]; given that the shoulder is a highly mobile joint, different degrees of freedom are of interest from a functional point of view. In these studies, reproducibility has been shown to differ [14, 50, 51]; for example, de Winter et al. [48] report ICCs ranging from 0.28–0.90 for the inter-rater reliability and an MDC of 20–25° for glenohumeral abduction and external rotation measured via DI. However, to our knowledge, no study has so far investigated the precise RoM test used in the current approach, which can be described as a combination of external rotation and abduction of the shoulder. Based on the presented results, this approach produces reliable results and an acceptable CoR ranging from 5.1–6.1° for the DI method and from 3.9–5.5° for the IMC method.

In the Thomas test, the present findings of substantial to (almost) perfect reproducibility are supported by concurrent evidence [5, 11]. In addition, the CoRs of 2.4–3.9° in the DI method and 1.4–3.1° in the IMC method support a precise application of the test protocol in medical assessment. However, the Thomas test must, in general, be applied cautiously in medical assessments of obese people [52]; controlling for hip flexion [53] and testing proximal to the patella may be influenced by the curvature of the thigh and the subcutaneous fat tissue. Nevertheless, in the present study this was not an issue, as the subjects had a rather low BMI (21.9 ± 2.0).

For the retroflexion of the trunk, the available results on intra- and inter-rater reliabilities for measuring the retroflexion at different bony landmarks on the back are rather mixed [2, 54–57]. Mellin et al. [52] have shown that the lying position is the most reliable method available (correlations between repetitions: r = 0.72–0.92). However, they applied the inclinometer on the spine, whereas in this study the DI was placed on the sternum, as it is easy to palpate and provides a solid base for the DI. To date, no other study has aimed to evaluate the extension of the spine by taking measurements from the sternum. Although reliability was (almost) perfect, the ME in the DI method ranged from 2.71–3.30° and the CoR was 7.5–9.2°; these values can both be considered relatively high, as RoM gains in the extension of the spine may only be small. Therefore, we recommend an even clearer definition of the bony landmarks when using the DI assessment. The IMC method, on the other hand, was shown to be a precise approach (ME 1.1–1.2°, CoR 2.9–3.1°).

For the lateral inclination, the present results add to those from assessments of lateral spinal flexion in a study using a bubble inclinometer [58] and one using a measurement table against which the subject had to lean [59]; both studies produced (almost) perfect ICCs for the intra-rater and inter-rater reliability, respectively. In the current study, the low ME and good repeatability values add to the practicability of the TM method in medical assessments. The ME of 0.5–1.1 cm for the TM is even smaller than that previously reported by Inger et al. [60] (2.6 cm), and their SDC for this method of 7.3 cm [60] is considerably greater than the CoR of 1.8–2.2 cm found in the present investigation. One explanation for this may be the thorough warm-up procedure used in the present investigation. In the IMC method, the CoR was 4.3 cm; this could be due to the fact that the position data calculated by signal processing in IMC systems are not entirely accurate [61].

For the fingertip-to-floor test, the current evidence shows mixed results. While three studies [1, 34, 62] confirm the current, very good intra-rater reliabilities (ICCs 0.97–0.99), two studies [4, 16] have reported a low reproducibility, possibly due to differences in the study design. Merritt et al. [4] recorded the measurements for the intra-rater reliability on three different days with only one instructional repetition; this, however, is not sufficient to control for acute effects. In addition, the subjects in that study were 18–65 years old. The lack of a warm-up might also explain the poor repeatability described by Gill et al. [16] (coefficient of variation: 14.1), since the subjects maintained the flexed position while a rater took repeated measurements. Better reproducibility is presented by Ekedahl et al. [6], who report an MDC of 4.5 cm, which is similar to the present finding of a CoR of 2.9–3.7 cm.

When selecting measuring instruments for medical assessment, the current findings, in general, support the easy-to-use low-cost devices, since the described MEs, repeatabilities and CoRs support the good reproducibility results. Only in the retroflexion of the trunk might the application of the DI on the sternum be too imprecise to capture actual changes in mobility. The IMC method also showed satisfactory results in the tests measured in degrees, especially in the spinal movements, where it provided precise and reliable data. However, when measuring distances between body segments, it should be kept in mind that position estimation with inertial sensors is not, as yet, entirely precise [61]. While the MVN Link system by Xsens is expensive and probably too time consuming for use in clinical practice, future research should aim to evaluate cheaper and more practical IMC systems.

Data Availability

The data underlying this study are available from: https://www.researchgate.net/publication/344687962_Joint_Range_of_Motion_Tests_measured_with_a_Digital_Inclinometer_a_Tape_Measure_and_Inertial_Motion_Capture?channel=doi&linkId=5f896cb792851c14bccc2fa8&showFulltext=true.

Funding Statement

The author(s) received no specific funding for this work.

References

  • 1.Gauvin MG, Riddle DL, Rothstein JM. Reliability of clinical measurements of forward bending using the modified fingertip-to-floor method. Physical therapy. 1990;70(7):443–7. 10.1093/ptj/70.7.443 [DOI] [PubMed] [Google Scholar]
  • 2.Clarkson HM. Joint Motion and Function Assessment A Research-Based Practical Guide: Lippincott Williams & Wilkins; 2005. [Google Scholar]
  • 3.Cloete T, Scheffer C. Repeatability of an off-the-shelf, full body inertial motion capture system during clinical gait analysis. Conf Proc IEEE Eng Med Biol Soc. 2010;2010:5125–8. 10.1109/IEMBS.2010.5626196 [DOI] [PubMed] [Google Scholar]
  • 4.Merritt JL, McLean TJ, Erickson RP, Offord KP. Measurement of trunk flexibility in normal subjects: reproducibility of three clinical methods. Mayo Clin Proc. 1986;61(3):192–7. 10.1016/s0025-6196(12)61848-5 [DOI] [PubMed] [Google Scholar]
  • 5.Clapis PA, Davis SM, Davis RO. Reliability of inclinometer and goniometric measurements of hip extension flexibility using the modified Thomas test. Physiother Theory Pract. 2008;24(2):135–41. 10.1080/09593980701378256 [DOI] [PubMed] [Google Scholar]
  • 6.Ekedahl H, Jönsson B, Frobell RB. Fingertip-to-Floor Test and Straight Leg Raising Test: Validity, Responsiveness, and Predictive Value in Patients With Acute/Subacute Low Back Pain. Arch Phys Med Rehabil. 2012;93(12):2210–5. 10.1016/j.apmr.2012.04.020 [DOI] [PubMed] [Google Scholar]
  • 7.Antonaci F, Ghirmai S, Bono G, Nappi G. Current methods for cervical spine movement evaluation: a review. Clin Exp Rheumatol. 2000;18(2 Suppl 19):S45–52. [PubMed] [Google Scholar]
  • 8.Keogh JWL, Cox A, Anderson S, Liew B, Olsen A, Schram B, et al. Reliability and validity of clinically accessible smartphone applications to measure joint range of motion: A systematic review. PLoS One. 2019;14(5):e0215806 10.1371/journal.pone.0215806 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Bierma-Zeinstra SM, Bohnen AM, Ramlal R, Ridderikhoff J, Verhaar JA, Prins A. Comparison between two devices for measuring hip joint motions. Clin Rehabil. 1998;12(6):497–505. 10.1191/026921598677459668 [DOI] [PubMed] [Google Scholar]
  • 10.Boone DC, Azen SP, Lin CM, Spence C, Baron C, Lee L. Reliability of goniometric measurements. Phys Ther. 1978;58(11):1355–60. 10.1093/ptj/58.11.1355 [DOI] [PubMed] [Google Scholar]
  • 11.Roach S, San Juan JG, Suprak DN, Lyda M. Concurrent validity of digital inclinometer and universal goniometer in assessing passive hip mobility in healthy subjects. Int J Sports Phys Ther. 2013;8(5):680–8. [PMC free article] [PubMed] [Google Scholar]
  • 12.Petherick M, Rheault W, Kimble S, Lechner C, Senear V. Concurrent validity and intertester reliability of universal and fluid-based goniometers for active elbow range of motion. Phys Ther. 1988;68(6):966–9. 10.1093/ptj/68.6.966 [DOI] [PubMed] [Google Scholar]
  • 13.Holzgreve F, Maltry L, Lampe J, Schmidt H, Bader A, Rey J, et al. The office work and stretch training (OST) study: an individualized and standardized approach for reducing musculoskeletal disorders in office workers. Journal of Occupational Medicine and Toxicology. 2018;13(1):37. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Hoving JL, Buchbinder R, Green S, Forbes A, Bellamy N, Brand C, et al. How reliably do rheumatologists measure shoulder movement? Ann Rheum Dis. 2002;61(7):612–6. 10.1136/ard.61.7.612 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Fitzgerald GK, Wynveen KJ, Rheault W, Rothschild B. Objective assessment with establishment of normal values for lumbar spinal range of motion. Phys Ther. 1983;63(11):1776–81. 10.1093/ptj/63.11.1776 [DOI] [PubMed] [Google Scholar]
  • 16.Gill K, Krag MH, Johnson GB, Haugh LD, Pope MH. Repeatability of four clinical methods for assessment of lumbar spinal motion. Spine. 1988;13(1):50–3. 10.1097/00007632-198801000-00012 [DOI] [PubMed] [Google Scholar]
  • 17.Hyytiainen K, Salminen JJ, Suvitie T, Wickstrom G, Pentti J. Reproducibility of nine tests to measure spinal mobility and trunk muscle strength. Scand J Rehabil Med. 1991;23(1):3–10. [PubMed] [Google Scholar]
  • 18.Portek I, Pearcy MJ, Reader GP, Mowat AG. Correlation between radiographic and clinical measurement of lumbar spine movement. Br J Rheumatol. 1983;22(4):197–205. 10.1093/rheumatology/22.4.197 [DOI] [PubMed] [Google Scholar]
  • 19.Saur PM, Ensink FB, Frese K, Seeger D, Hildebrandt J. Lumbar range of motion: reliability and validity of the inclinometer technique in the clinical measurement of trunk flexibility. Spine (Phila Pa 1976). 1996;21(11):1332–8. 10.1097/00007632-199606010-00011 [DOI] [PubMed] [Google Scholar]
  • 20.Al-Amri M, Nicholas K, Button K, Sparkes V, Sheeran L, Davies JL. Inertial Measurement Units for Clinical Movement Analysis: Reliability and Concurrent Validity. Sensors (Basel). 2018;18(3). 10.3390/s18030719 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Nuesch C, Roos E, Pagenstert G, Mundermann A. Measuring joint kinematics of treadmill walking and running: Comparison between an inertial sensor based system and a camera-based system. J Biomech. 2017;57:32–8. 10.1016/j.jbiomech.2017.03.015 [DOI] [PubMed] [Google Scholar]
  • 22.Doğan M, Koçak M, Onursal Kılınç Ö, Ayvat F, Sütçü G, Ayvat E, et al. Functional range of motion in the upper extremity and trunk joints: Nine functional everyday tasks with inertial sensors. Gait Posture. 2019;70:141–7. 10.1016/j.gaitpost.2019.02.024 [DOI] [PubMed] [Google Scholar]
  • 23.Rigoni M, Gill S, Babazadeh S, Elsewaisy O, Gillies H, Nguyen N, et al. Assessment of Shoulder Range of Motion Using a Wireless Inertial Motion Capture Device-A Validation Study. Sensors (Basel). 2019;19(8). 10.3390/s19081781 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.van der Straaten R, Bruijnes AKBD, Vanwanseele B, Jonkers I, De Baets L, Timmermans A. Reliability and Agreement of 3D Trunk and Lower Extremity Movement Analysis by Means of Inertial Sensor Technology for Unipodal and Bipodal Tasks. Sensors (Basel, Switzerland). 2019;19(1):141 10.3390/s19010141 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Mundt M, Thomsen W, David S, Dupré T, Bamer F, Potthast W, et al. Assessment of the measurement accuracy of inertial sensors during different tasks of daily living. J Biomech. 2019;84:81–6. 10.1016/j.jbiomech.2018.12.023 [DOI] [PubMed] [Google Scholar]
  • 26.Inokuchi H, Tojima M, Mano H, Ishikawa Y, Ogata N, Haga N. Neck range of motion measurements using a new three-dimensional motion analysis system: validity and repeatability. Eur Spine J. 2015;24(12):2807–15. 10.1007/s00586-015-3913-2 [DOI] [PubMed] [Google Scholar]
  • 27.Teufl W, Miezal M, Taetz B, Fröhlich M, Bleser G. Validity, Test-Retest Reliability and Long-Term Stability of Magnetometer Free Inertial Sensor Based 3D Joint Kinematics. Sensors (Basel). 2018;18(7). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Zhang JT, Novak AC, Brouwer B, Li Q. Concurrent validation of Xsens MVN measurement of lower limb joint angular kinematics. Physiol Meas. 2013;34(8):N63–9. 10.1088/0967-3334/34/8/N63 [DOI] [PubMed] [Google Scholar]
  • 29.Mavor MP, Ross GB, Clouthier AL, Karakolis T, Graham RB. Validation of an IMU Suit for Military-Based Tasks. Sensors. 2020;20(15). 10.3390/s20154280 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Karatsidis A, Jung M, Schepers HM, Bellusci G, de Zee M, Veltink PH, et al. Musculoskeletal model-based inverse dynamic analysis under ambulatory conditions using inertial motion capture. Med Eng Phys. 2019;65:68–77. 10.1016/j.medengphy.2018.12.021 [DOI] [PubMed] [Google Scholar]
  • 31.Robert-Lachaine X, Mecheri H, Larue C, Plamondon A. Validation of inertial measurement units with an optoelectronic system for whole-body motion analysis. Med Biol Eng Comput. 2017;55(4):609–19. 10.1007/s11517-016-1537-2 [DOI] [PubMed] [Google Scholar]
  • 32.Robert-Lachaine X, Mecheri H, Larue C, Plamondon A. Validation of inertial measurement units with an optoelectronic system for whole-body motion analysis. Med Biol Eng Comput. 2016;55 10.1007/s11517-016-1537-2 [DOI] [PubMed] [Google Scholar]
  • 33.Heikkila S, Viitanen JV, Kautiainen H, Kauppi M. Sensitivity to change of mobility tests; effect of short term intensive physiotherapy and exercise on spinal, hip, and shoulder measurements in spondyloarthropathy. J Rheumatol. 2000;27(5):1251–6. [PubMed] [Google Scholar]
  • 34.Perret C, Poiraudeau S, Fermanian J, Colau MM, Benhamou MA, Revel M. Validity, reliability, and responsiveness of the fingertip-to-floor test. Arch Phys Med Rehabil. 2001;82(11):1566–70. 10.1053/apmr.2001.26064 [DOI] [PubMed] [Google Scholar]
  • 35.Smolenski UC BJ, Beyer L, Harke G, Pahnke J, Seidel W. Janda. Manuelle Muskelfunktionsdiagnostik: Theorie und Praxis. 5 ed Germany: Elsevier Health Sciences; 2016. [Google Scholar]
  • 36.Grabe M. Measurement Uncertainties in Science and Technology. 2nd ed Berlin: Springer International Publishing; 2014. [Google Scholar]
  • 37.Boyce D, Brosky JA Jr. Determining the minimal number of cyclic passive stretch repetitions recommended for an acute increase in an indirect measure of hamstring length. Physiother Theory Pract. 2008;24(2):113–20. 10.1080/09593980701378298 [DOI] [PubMed] [Google Scholar]
  • 38.Hatano G, Suzuki S, Matsuo S, Kataura S, Yokoi K, Fukaya T, et al. Hamstring Stiffness Returns More Rapidly After Static Stretching Than Range of Motion, Stretch Tolerance, and Isometric Peak Torque. J Sport Rehabil. 2019;28(4):325–31. 10.1123/jsr.2017-0203 [DOI] [PubMed] [Google Scholar]
  • 39.Frost M, Stuckey S, Smalley LA, Dorman G. Reliability of measuring trunk motions in centimeters. Phys Ther. 1982;62(10):1431–7. 10.1093/ptj/62.10.1431 [DOI] [PubMed] [Google Scholar]
  • 40.Kim WD, Shin D. Correlations Between Hip Extension Range of Motion, Hip Extension Asymmetry, and Compensatory Lumbar Movement in Patients with Nonspecific Chronic Low Back Pain. Med Sci Monit. 2020;26:e925080 10.12659/MSM.925080 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Roach SM, San Juan JG, Suprak DN, Lyda M, Bies AJ, Boydston CR. Passive hip range of motion is reduced in active subjects with chronic low back pain compared to controls. Int J Sports Phys Ther. 2015;10(1):13–20. [PMC free article] [PubMed] [Google Scholar]
  • 42.Bland JM, Altman DG. Measurement error and correlation coefficients. BMJ. 1996;313(7048):41–2. 10.1136/bmj.313.7048.41 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Bland JM, Altman DG. Measurement error. Bmj. 1996;313(7059):744 10.1136/bmj.313.7059.744 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull. 1979;86(2):420–8. 10.1037//0033-2909.86.2.420 [DOI] [PubMed] [Google Scholar]
  • 45.Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33(1):159–74. [PubMed] [Google Scholar]
  • 46.Evans JD. Straightforward statistics for the behavioral sciences Pacific Grove: Brooks/Cole Pub. Co; 1996. [Google Scholar]
  • 47.Valentine RE, Lewis JS. Intraobserver reliability of 4 physiologic movements of the shoulder in subjects with and without symptoms. Archives of physical medicine and rehabilitation. 2006;87(9):1242–9. 10.1016/j.apmr.2006.05.008 [DOI] [PubMed] [Google Scholar]
  • 48.de Winter AF, Heemskerk MA, Terwee CB, Jans MP, Devillé W, van Schaardenburg DJ, et al. Inter-observer reproducibility of measurements of range of motion in patients with shoulder pain using a digital inclinometer. BMC Musculoskelet Disord. 2004;5:18 10.1186/1471-2474-5-18 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Green S, Buchbinder R, Forbes A, Bellamy N. A standardized protocol for measurement of range of movement of the shoulder using the Plurimeter-V inclinometer and assessment of its intrarater and interrater reliability. Arthritis Care Res. 1998;11(1):43–52. 10.1002/art.1790110108 [DOI] [PubMed] [Google Scholar]
  • 50.Kolber MJ, Saltzman SB, Beekhuizen KS, Cheng MS. Reliability and minimal detectable change of inclinometric shoulder mobility measurements. Physiother Theory Pract. 2009;25(8):572–81. 10.3109/09593980802667995 [DOI] [PubMed] [Google Scholar]
  • 51.https://de.statista.com/statistik/daten/studie/249080/umfrage/anteile-der-wirtschaftssektoren-am-bruttoinlandsprodukt-bip-der-eu-laender/ (11.09.2018).
  • 52.Mellin G, Kiiski R, Weckstrom A. Effects of subject position on measurements of flexion, extension, and lateral flexion of the spine. Spine (Phila Pa 1976). 1991;16(9):1108–10. [DOI] [PubMed] [Google Scholar]
  • 53.Vigotsky AD, Lehman GJ, Beardsley C, Contreras B, Chung B, Feser EH. The modified Thomas test is not a valid measure of hip extension unless pelvic tilt is controlled. PeerJ. 2016;4:e2325 10.7717/peerj.2325 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Alaranta H, Hurri H, Heliovaara M, Soukka A, Harju R. Flexibility of the spine: normative values of goniometric and tape measurements. Scand J Rehabil Med. 1994;26(3):147–54. [PubMed] [Google Scholar]
  • 55.Burdett RG, Brown KE, Fall MP. Reliability and validity of four instruments for measuring lumbar spine and pelvic positions. Phys Ther. 1986;66(5):677–84. 10.1093/ptj/66.5.677 [DOI] [PubMed] [Google Scholar]
  • 56.Chen SP, Samo DG, Chen EH, Crampton AR, Conrad KM, Egan L, et al. Reliability of three lumbar sagittal motion measurement methods: surface inclinometers. J Occup Environ Med. 1997;39(3):217–23. 10.1097/00043764-199703000-00011 [DOI] [PubMed] [Google Scholar]
  • 57.Ng JK, Kippers V, Richardson CA, Parnianpour M. Range of motion and lordosis of the lumbar spine: reliability of measurement and normative values. Spine (Phila Pa 1976). 2001;26(1):53–60. 10.1097/00007632-200101010-00011 [DOI] [PubMed] [Google Scholar]
  • 58.Uswr PT. The Reliability of Bubble Inclinometer and Tape Measure in Determining Lumbar Spine Range of Motion in Healthy Individuals and Patients 2015. 137–44 p. [Google Scholar]
  • 59.Jonsson E, Ljungkvist I, Hamberg J. Standardized measurement of lateral spinal flexion and its use in evaluation of the effect of treatment of chronic low back pain. Ups J Med Sci. 1990;95(1):75–86. 10.3109/03009739009178578 [DOI] [PubMed] [Google Scholar]
  • 60.Inger L, Anderson B, Hildegunn L, Skouen J, Ostelo R, Magnussen L. Responsiveness to Change of 10 Physical Tests Used for Patients With Back Pain. Phys Ther. 2011;91:404–15. 10.2522/ptj.20100016 [DOI] [PubMed] [Google Scholar]
  • 61.Pollock DSG. Handbook of Time Series Analysis, Signal Processing, and Dynamics London: Academic Press; 1999. [Google Scholar]
  • 62.Kippers V, Parker AW. Toe-touch test. A measure of its validity. Phys Ther. 1987;67(11):1680–4. 10.1093/ptj/67.11.1680 [DOI] [PubMed] [Google Scholar]

Decision Letter 0

Juliane Müller

3 Aug 2020

PONE-D-20-15948

Intra- and Interrater Reliability of Range of Motion Tests: Tape Measure and Digital Inclinometer compared to Inertial Motion Capture

PLOS ONE

Dear Dr. Maltry,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

==============================

ACADEMIC EDITOR: Please see the reviewer comments for detailed feedback on your manuscript. Please consider all mentioned aspects while revising your manuscript. Your study is of high interest, but major changes are required (rationale; data analysis) before publication can be considered.

==============================

Please submit your revised manuscript by Sep 17 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Juliane Müller, PhD

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2.We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.

In your revised cover letter, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially identifying or sensitive patient information) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. Please see http://www.bmj.com/content/340/bmj.c181.long for guidelines on how to de-identify and prepare clinical data for publication. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories.

We will update your Data Availability statement on your behalf to reflect the information you provide.

3. Your ethics statement must appear in the Methods section of your manuscript. If your ethics statement is written in any section besides the Methods, please move it to the Methods section and delete it from any other section. Please also ensure that your ethics statement is included in your manuscript, as the ethics section of your online submission will not be published alongside your manuscript.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: No

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: General comments

This study seeks to examine an interesting question regarding the reliability of several range of motion tests that have traditionally been used in clinical physiotherapy practice compared to more recent advances in wearable IMU technology. Such a question therefore could be of interest to developers of these technologies as well as to researchers and practitioners in the field of physiotherapy and exercise science. The paper is quite well-written and presented throughout and details some quite comprehensive results. However, I have a number of major reservations for the authors to consider.

Specific comments

Title: I’m not sure the title completely encompasses the data provided in the manuscript. In that you have compared the Xsens inertial motion capture data to that of the tape measure and digital inclinometer data, this is some type of relationship, perhaps even partially a validity study. I therefore suggest you add in the phrase “Relationship” somewhere in the title as well as in the aims of the study, especially as the first two lines of results you have reported in your abstract (line 42 and 43), describe these relationships.

Line 46 – 47: I’m not sure if this interpretation is the best for your paper as that would mean that one of these tools is considered a criterion method of the assessment of joint range of motion. As traditional 3D motion capture systems such as Vicon are typically considered the criterion method for assessing joint range of motion, I would prefer you focus here on the reliability of each system rather than the relationship between them. Specifically, if the reliability of the Xsens is greater than that of the more traditional methods, that might be one reason to support its use in clinical practice.

Line 74: you may wish to refer to this systematic review which has summarised the validity of a class of inertial device (smart devices) to criterion measures as well as the intratester and intertester reliability of traditional and inertial methods of joint range of motion assessment. https://pubmed.ncbi.nlm.nih.gov/31067247/ This paper may provide some better context to the study and provide some data with which to compare your results within your discussion section.

Line 85 – 86: I was surprised you only included the digital inclinometer and tape measure here and not the goniometer as one of the traditional measurement tools. You mentioned goniometers on line 70 and I am therefore highly surprised they were not included in this study, especially with some of the recent research that has compared goniometers to the inertial sensors found in smart phones, such as that summarised in the systematic review I highlighted in the previous comment.

Line 117 – 130: while a comparison of traditional 3D motion capture methods to the inertial motion capture system is of theoretical and practical interest, this was not performed in the study and no data was provided in the introduction about how well these measurements are related in the literature. Therefore, I am not sure of the practical utility of this comparison. As you are not comparing any of these systems to what is considered the criterion method of traditional 3D motion capture, you can’t specifically state one system is more valid or accurate than the other. Further, even if the inertial motion capture system is more reliable than the traditional methods, the cost of the system, time required to collect data and the complicated Matlab code needed to generate the data means that such an approach is not clinically feasible. Further, such an approach even seems more complicated than traditional 3D motion capture in which the software provides the joint range of motion data relatively easily after the joint markers positions have been tracked during the movement.

Line 176 – 177: I may have missed it, but what is meant by HD reprocessed?

Line 219 – 239: it was great to see some measures of absolute and not just relative reliability included in this study. However, can you provide a clearer definition of measurement error and repeatability and perhaps some references to other studies that have used these measures compared to other absolute measurement error scores such as mean difference and limits of agreement, coefficient of variation, root mean square error etc?

Line 243: what is meant by “optimal effect size” here and in Table 5? I have seen that you have defined this term earlier in the statistics section, but I can’t see anywhere in the tables any effect size data. Further, if you’re comparing the difference in the scores between the different devices in the study, I would have thought that the smaller effect size, the greater the similarity/relationship between the two devices. This would mean in such a study you would like to see very small effect size differences rather than large effect size differences, meaning that an optimal effect size would be very small rather than large.

Table 2 – 4: I would suggest that the p-value row is placed after the 95% confidence interval row in these three tables. Further, I would suggest that for any of your measurement errors or repeatability values that are measured in degrees, two decimal places is way beyond the precision of measurement. I would therefore suggest that such data have one decimal place maximum. This would also apply to any distances that are measured in millimetres.

Table 4: I was wondering why there was no measurement error/repeatability data provided in this table.

Line 278: you cannot state that any of your devices or methods are more accurate than one another as you did not compare them to any criterion method such as traditional 3D motion capture. As I’ve stated before, you can only say that such devices were more similar or had stronger relationships to each other.

Lines 334 – 336: what are the units of measurement for this measurement error?

Line 336 – 338: would this small potential sagittal plane arm swing be much of a difference in the results? If so, was there any testing method or data analysis method you could have used to minimise the potential effect of the arm sway?

Line 342 – 345: could potential difference in the participants or the level of training of assessors perhaps be influential here in these between study differences?

Overall Discussion: it might be useful to refer back to this systematic review https://pubmed.ncbi.nlm.nih.gov/31067247/ to compare your findings with the reliability and, to a slightly lesser extent, the validity research for measuring joint range of motion. When you are talking about your absolute reliability scores such as measurement error, it would also be useful to refer back to the wider literature on what is considered the minimum clinically important difference for these joint ranges of motion and whether the measurement error is smaller or larger than the MCID.

Overall references: you have included many up-to-date and relevant references for the IMU literature, but it appears many of the references for the inclinometer and tape measure methods are dated. Is this because little research has been conducted on these methods over the last two decades as the universal goniometer is the most common form of joint range of motion assessment in clinical practice?

Reviewer #2: The presented study investigates the inter- and intra-rater reliability of three different techniques to assess RoM based on five standardized movement tests. The authors highlight that validity and reliability is key for adequate practical application of RoM assessments in a clinical environment. They therefore compare the results of two low cost techniques (digital inclinometer and tape measure) against a kinematic measurement system (inertial motion capture). Based on the results (ICC and measurement errors), the authors recommend both low cost systems for the use in medical assessments.

Major issues:

In its current form it is not stated clearly enough what the precise aim of the investigation is. Is the IMC system assumed to be the “gold standard” to validate the use of the two low cost techniques? If so, the validity and reliability of the IMC system against the laboratory “gold standard” for movement assessment (until today, that is still a marker based optoelectrical 3D-camera system) needs to be stated and discussed much more thoroughly.

The experimental design is in general thoroughly described; however, some aspects would benefit from more details to fully understand all performed comparisons (see points below).

In terms of the conducted statistical tests it is recommended to also incorporate Bland-Altman analyses (with corresponding plots). This would allow systematic and random errors between the systems to be assessed in their actual measurement units. A more critical discussion with regard to the findings of this study is required.

Minor issues/comments:

- Line 75: if the IMC is your gold standard, then you will need to provide further details of its validity/reliability compared to highly standardized laboratory methods.

- Line 92: this figure only shows the graphical representation of the IMC software. It would be helpful to also provide pictures of an exemplary participant where the actual measurements (IMC vs DI or IMC vs TM) are performed

- Line 101: it’s recommended to always use the same number of decimal places (and in this case even rounding to full years would be fine for this kind of data)

- Line 117 and following: here the reader needs more information about the system. Are there other resources on measurement precision than those given by the manufacturer (independent scientific investigations)? The provided statement is too general. Measurement precision depends on various calculations and will most likely depend on the site and the movement (as discussed in part by the authors later in the discussion)

- Line 128: What is meant by HD reprocessed? This needs to be introduced/explained

- Line 137: Is there a source for that?

- Line 152: This is not clear to the reader. Does this mean 15 familiarization trials before 5 actual measurement trials? But later your Spearman calculations are based on up to 550 repetitions.

- Line 168 and following: as mentioned above, an exemplary picture with the methods applied would most likely be appreciated by the readers. The same applies to the other tests.

- Line 174: what is meant by “length in 3D”?

- Line 223: How much did the data deviate from a normal distribution? This also affects the outcome/validity of the ICC calculations.

- Line 257 and following: The tables are sometimes difficult to read. It is recommended to separate the different statistical tests by thicker lines (or in a different way), e.g. between “Measurement error” and “95% Confidence interval”;

- All tables: used abbreviations need to be explained in a legend for each table

- Table 1: How are these high “n’s” possible? That is not sufficiently explained before

- Table 2: The unit of measurement for “Measurement Error” and “repeatability” is missing, isn’t it? Both outcomes are not well described within the manuscript (and only briefly in the given reference)

- Table 4: the denotation “DI/TM” might be changed to “DI or TM”

- Line 276: language: why is a comparison justified? This is not well expressed. Low Spearman’s rho results (especially for the Thomas test) need to be further discussed.

- Line 279 and following: but there are more potential methodological reasons for that.

- Line 281 and following: Statement not clear

- Line 331: So in other words the comparison is potentially based on invalid data! That should be investigated and clearly stated

- Line 334 and following: measurement units?

- Line 356: If these methods are adequate to measure individual training progress, then it would be good to discuss how precise exactly they are. What is the minimal detectable change? Statement regarding the identification of individual training progress might however be beyond the possibilities of this study.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Justin Keogh

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Dec 10;15(12):e0243646. doi: 10.1371/journal.pone.0243646.r002

Author response to Decision Letter 0


16 Oct 2020

Response to Editor:

We have followed the formatting instructions in the revision, uploaded the data to ResearchGate, provided a DOI, and included the ethics statement only in the Methods section.

Response to Reviewer 1:

Thank you very much for taking the time to assess our work and for your suggestions. Please find the responses to your comments below. In general, we learned that there had been a misunderstanding about the aim of the study. We did not intend to conduct a validity study, which we apparently failed to describe precisely enough in the first version of this manuscript, possibly because we are non-native English speakers. In the revised version, we have focused on rephrasing the entire manuscript more clearly. We aimed to collect range of motion data with low-cost devices and IMC in order to obtain intra- and inter-rater reliabilities for both methods.

Reviewer #1: General comments

This study seeks to examine an interesting question regarding the reliability of several range of motion tests that have traditionally been used in clinical physiotherapy practice compared to more recent advances in wearable IMU technology. Such a question therefore could be of interest to developers of these technologies as well as to researchers and practitioners in the field of physiotherapy and exercise science. The paper is quite well-written and presented throughout and details some quite comprehensive results. However, I have a number of major reservations for the authors to consider.

Specific comments

Title: I’m not sure the title completely encompasses the data provided in the manuscript. In that you have compared the Xsens inertial motion capture data to that of the tape measure and digital inclinometer data, this is some type of relationship, perhaps even partially a validity study. I therefore suggest you add in the phrase “Relationship” somewhere in the title as well as in the aims of the study, especially as the first two lines of results you have reported in your abstract (line 42 and 43), describe these relationships.

Thank you for the suggestion. We rephrased the title: “Intra- and Interrater Reliability of Joint Range of Motion Tests using Tape Measure, Digital Inclinometer and Inertial Motion Capturing”

Line 46 – 47: I’m not sure if this interpretation is the best for your paper as that would mean that one of these tools is considered a criterion method of the assessment of joint range of motion. As traditional 3D motion capture systems such as Vicon are typically considered the criterion method for assessing joint range of motion, I would prefer you focus here on the reliability of each system rather than the relationship between them. Specifically, if the reliability of the Xsens is greater than that of the more traditional methods, that might be one reason to support its use in clinical practice.

Thank you for the suggestion, we considered that when we rewrote the entire abstract.

Line 74: you may wish to refer to this systematic review which has summarised the validity of a class of inertial device (smart devices) to criterion measures as well as the intratester and intertester reliability of traditional and inertial methods of joint range of motion assessment. https://pubmed.ncbi.nlm.nih.gov/31067247/ This paper may provide some better context to the study and provide some data with which to compare your results within your discussion section.

Thank you for the suggestion, we have included the study in the introduction.

Line 85 – 86: I was surprised you only included the digital inclinometer and tape measure here and not the goniometer as one of the traditional measurement tools. You mentioned goniometers on line 70 and I am therefore highly surprised they were not included in this study, especially with some of the recent research that has compared goniometers to the inertial sensors found in smart phones, such as that summarised in the systematic review I highlighted in the previous comment.

Prior to our study we had long discussions in our team on how best to evaluate the chosen range of motion tests. We are well aware that the goniometer is used a lot in RoM measurements. However, it was difficult to apply the goniometer properly in the included tests since, as you also stated in your systematic review, the typical goniometer is not long enough or the alignment to bony landmarks is difficult. For example, in the retroflexion of the trunk it would be hard to choose bony landmarks and to apply the goniometer properly, which also applies to the shoulder test. Most importantly, we searched the literature for the measurement accuracy of the goniometer and the DI. Our findings showed that the DI is at least as accurate as the goniometer, or even more accurate in some studies. While the intra-rater reliabilities seem to be similar, the inter-rater reliability of the goniometer has been shown to be lower. We have now included this in the introduction section.

Line 117 – 130: while a comparison of traditional 3D motion capture methods to the inertial motion capture system is of theoretical and practical interest, this was not performed in the study and no data was provided in the introduction about how well these measurements are related in the literature. Therefore, I am not sure of the practical utility of this comparison. As you are not comparing any of these systems to what is considered the criterion method of traditional 3D motion capture, you can’t specifically state one system is more valid or accurate than the other. Further, even if the inertial motion capture system is more reliable than the traditional methods, the cost of the system, time required to collect data and the complicated Matlab code needed to generate the data means that such an approach is not clinically feasible. Further, such an approach even seems more complicated than traditional 3D motion capture in which the software provides the joint range of motion data relatively easily after the joint markers positions have been tracked during the movement.

Thank you for referring to the optical motion capture systems; we realized we should have explained this more clearly in the first place and have tried to expand on this topic in the revised version of the manuscript. Also, the Xsens software provides real-time joint angles and position data as well, so at least the Thomas test and the retroflexion of the trunk could also be evaluated without the Matlab code. However, the specific angle of the shoulder and the distances needed to be calculated in Matlab, but this would also be the case with the Vicon system.

Line 176 – 177: I may have missed it, but what is meant by HD reprocessed?

Thanks for mentioning. We clarified this in the manuscript: “Afterwards, on all recordings the “HD reprocessing” filter was applied, which is provided in the MVN Analyze software and offers the best possible data quality according to the manufacturer.”

Line 219 – 239: it was great to see some measures of absolute and not just relative reliability included in this study. However, can you provide a clearer definition of measurement error and repeatability and perhaps some references to other studies that have used these measures compared to other absolute measurement error scores such as mean difference and limits of agreement, coefficient of variation, root mean square error etc?

Thank you for the suggestion; we have rewritten the statistical analysis section in much more detail. As we have described before, we did not calculate validity, e.g. with Bland-Altman plots and LoA. This is because the low-cost devices and the IMC system do not measure exactly the same construct. For example, in the tape measure tests we used the distances from the hand sensor, which is placed centrally on the back of the hand. In the lateral flexion we could calculate the distance to the floor because subjects were standing on even ground and we could use the single-level scenario. In the fingertip-to-floor test we had to use the multi-level scenario because subjects were standing on a small bench and not on even ground. In this scenario, a calculation to the floor is not possible; therefore, we calculated the distance between the hand sensor and the foot sensor. It is also impossible to obtain fingertip data from the IMC. Our aim was to reproduce the construct used with the TM device as closely as possible, so that the targeted range of motion is reflected in the same way. Nevertheless, in the revised version of this manuscript we have included scatterplots which show the relationship between the methods (Fig 2).
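For illustration only, the construct described above — a vertical hand-to-floor distance in the single-level scenario and a hand-to-foot-sensor distance in the multi-level scenario — amounts to a simple position calculation. The following is a minimal Python sketch under the assumption that the exported sensor positions are available as 3D coordinates in a ground-fixed frame with the floor at z = 0; the variable names and values are hypothetical placeholders and do not reproduce the authors' Matlab code.

    import numpy as np

    # Hypothetical 3D positions (in metres) exported from the IMC recording;
    # illustrative values only, not study data.
    hand_pos = np.array([0.12, 0.35, 0.48])   # sensor on the back of the hand
    foot_pos = np.array([0.10, 0.30, 0.02])   # foot sensor

    # Lateral inclination (single-level scenario): distance of the hand sensor
    # to the floor, taken here as its height above the ground plane (z = 0).
    hand_to_floor_cm = hand_pos[2] * 100

    # Fingertip-to-floor test (multi-level scenario): no common floor level is
    # available, so the Euclidean distance between hand and foot sensor is used.
    hand_to_foot_cm = np.linalg.norm(hand_pos - foot_pos) * 100

    print(f"lateral inclination distance: {hand_to_floor_cm:.1f} cm")
    print(f"fingertip-to-floor distance:  {hand_to_foot_cm:.1f} cm")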

Line 243: what is meant by “optimal effect size” here and in Table 5? I have seen that you have defined this term earlier in the statistics section, but I can’t see anywhere in the tables any effect size data. Further, if you’re comparing the difference in the scores between the different devices in the study, I would have thought that the smaller effect size, the greater the similarity/relationship between the two devices. This would mean in such a study you would like to see very small effect size differences rather than large effect size differences, meaning that an optimal effect size would be very small rather than large.

Thank you for pointing this out; the term “effect size” was used incorrectly, and we have tried to clarify this throughout the manuscript. We report Pearson correlations when we try to show that the DI or TM measures a construct similar to that of the IMC. We aim at high correlations because this means that the DI or TM data show a high similarity to the values we calculated in Matlab from the IMC data. We did not calculate any differences.
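As a purely illustrative sketch of this kind of comparison, the Pearson correlation between paired DI (or TM) and IMC values — one repetition per subject — could be computed as follows; the arrays below are hypothetical placeholders, not study data.

    import numpy as np
    from scipy.stats import pearsonr

    # Hypothetical paired results, one repetition per subject (placeholder values).
    di_values  = np.array([10.2, 12.5, 8.9, 11.1, 9.7, 13.0])   # DI or TM result
    imc_values = np.array([10.8, 12.1, 9.4, 11.6, 9.9, 12.6])   # IMC-derived result

    r, p = pearsonr(di_values, imc_values)
    print(f"Pearson r = {r:.2f} (p = {p:.3f})")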

Table 2 – 4: I would suggest that the p-value row is placed after the 95% confidence interval row in these three tables. Further, I would suggest that for any of your measurement errors or repeatability values that are measured in degrees, two decimal places is way beyond the precision of measurement. I would therefore suggest that such data have one decimal place maximum. This would also apply to any distances that are measured in millimetres.

- The p-value has been moved to the line after the CI.

- Measurement errors and repeatability values have been changed to a maximum of one decimal place.

- There are no distances in this manuscript measured in millimeters.

Table 4: I was wondering why there was no measurement error/repeatability data provided in this table.

We have discussed this in detail with our statistics department, and the colleague there assured us that no measurement error is calculated for intra-rater reliability.

Line 278: you cannot state that any of your devices or methods are more accurate than one another as you did not compare them to any criterion method such as traditional 3D motion capture. As I’ve stated before, you can only say that such devices were more similar or had stronger relationships to each other.

Thank you for pointing this out; we rewrote the discussion section, avoiding classifications such as “better” or “worse”.

Lines 334 – 336: what are the units of measurement for this measurement error?

The units are cm for the distances and ° for angular data. We have included this in the tables.

Line 336 – 338: would this small potential sagittal plane arm swing be much of a difference in the results? If so, was there any testing method or data analysis method you could have used to minimise the potential effect of the arm sway?

Line 342 – 345: could potential difference in the participants or the level of training of assessors perhaps be influential here in these between study differences?

Thank you for this question, we have now expanded on this: “While three studies [1, 44, 45] confirm the current, very good intra-rater reliabilities (ICCs 0.97 - 0.99), two studies [4, 13] have reported a low reproducibility, possibly due to differences in the study design. Merrit et al. [4] recorded the measurements for the intra-rater reliability on three different days but with only one instructional repetition, which is not enough to control for acute effects. Also, the subjects in this study were 18-65 years old. The lack of warm-up might also explain the poor repeatability Gill et al. [13] described (coefficient of variation: 14.1), since subjects maintained the flexed position while a rater took repeated measurements.”

Overall Discussion: it might be useful to refer back to this systematic review https://pubmed.ncbi.nlm.nih.gov/31067247/ to compare your findings with the reliability and, to a slightly lesser extent, the validity research for measuring joint range of motion. When you are talking about your absolute reliability scores such as measurement error, it would also be useful to refer back to the wider literature on what is considered the minimum clinically important difference for these joint ranges of motion and whether the measurement error is smaller or larger than the MCID.

As described above, we now focus more on the reliability than on the validity. Thank you for mentioning the MCID. We initially considered referring to such values, but a deeper literature search showed that there are numerous MCID values, sometimes even for the same joint RoM; Copay et al. summarized this in their two-part review (doi: 10.2106/JBJS.RVW.17.00160). However, we added comparisons to MDC values for the fingertip-to-floor test and the lateral inclination.
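For readers unfamiliar with the MDC, one common formulation derives it from the standard error of measurement (SEM), which combines the between-subject standard deviation with the reliability coefficient (ICC): SEM = SD × √(1 − ICC) and MDC95 = 1.96 × √2 × SEM. The sketch below only illustrates this general calculation with hypothetical numbers; it is not the authors' analysis and the values are placeholders.

    import math

    # Placeholder inputs, not study data.
    sd_between_subjects = 2.5   # between-subject SD of the test result (cm or degrees)
    icc = 0.95                  # corresponding reliability coefficient

    sem = sd_between_subjects * math.sqrt(1 - icc)   # standard error of measurement
    mdc95 = 1.96 * math.sqrt(2) * sem                # minimal detectable change (95%)

    print(f"SEM = {sem:.2f}, MDC95 = {mdc95:.2f}")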

Overall references: you have included many up-to-date and relevant references for the IMU literature, but it appears many of the references for the inclinometer and tape measure methods are dated. Is this because little research has been conducted on these methods over the last two decades as the universal goniometer is the most common form of joint range of motion assessment in clinical practice?

Of course, the IMU literature has been published more recently, since this technology was developed only a few years ago and its accuracy has only recently reached a scientific standard. This is also true for the smartphone applications. For the DI method, the standard reliability research we initially focused on was conducted much earlier, since this is obviously a more traditional method and the tests are basic RoM tests. In the Thomas test, for example, the DI method is still frequently used in research on clinical applications; we added two more recent references:

1. Kim WD, Shin D. Correlations Between Hip Extension Range of Motion, Hip Extension Asymmetry, and Compensatory Lumbar Movement in Patients with Nonspecific Chronic Low Back Pain. Med Sci Monit. 2020;26:e925080.

2. Roach SM, San Juan JG, Suprak DN, Lyda M, Bies AJ, Boydston CR. Passive hip range of motion is reduced in active subjects with chronic low back pain compared to controls. Int J Sports Phys Ther. 2015;10(1):13-20.

We also included one publication on the intra-rater reliability of a DI and a goniometer, showing that both are excellent:

Roach S, San Juan JG, Suprak DN, Lyda M. Concurrent validity of digital inclinometer and universal goniometer in assessing passive hip mobility in healthy subjects. Int J Sports Phys Ther. 2013;8(5):680-8.

Response to Reviewer 2:

Thank you very much for taking the time to assess our work and for your suggestions. Please find the responses to your comments below. In general, we learned that there had been a misunderstanding about the aim of the study. We did not intend to conduct a validity study, which we apparently failed to describe precisely enough in the first version of this manuscript, possibly because we are non-native English speakers. In the revised version, we have focused on rephrasing the entire manuscript more clearly. We aimed to collect range of motion data with low-cost devices and IMC in order to obtain intra- and inter-rater reliabilities for both methods and to give advice for clinical applications.

Reviewer #2: The presented study investigates the inter- and intra-rater reliability of three different techniques to assess RoM based on five standardized movement tests. The authors highlight that validity and reliability is key for adequate practical application of RoM assessments in a clinical environment. They therefore compare the results of two low cost techniques (digital inclinometer and tape measure) against a kinematic measurement system (inertial motion capture). Based on the results (ICC and measurement errors), the authors recommend both low cost systems for the use in medical assessments.

Major issues:

In its current form it is not stated clearly enough what the precise aim of the investigation is. Is the IMC system assumed to be the “gold standard” to validate the use of the two low cost techniques? If so, the validity and reliability of the IMC system against the laboratory “gold standard” for movement assessment (until today, that is still a marker based optoelectrical 3D-camera system) needs to be stated and discussed much more thoroughly.

The experimental design is in general thoroughly described; however, some aspects would benefit from more details to fully understand all performed comparisons (see points below).

In terms of the conducted statistical tests it is recommended to also incorporate Bland-Altman analyses (with corresponding plots). This would allow systematic and random errors between the systems to be assessed in their actual measurement units. A more critical discussion with regard to the findings of this study is required.

We did not include Bland-Altman plots because we are aware that the low-cost devices and the IMC system do not measure exactly the same construct. For example, in the tape measure tests we used the distances from the hand sensor, which is placed centrally on the back of the hand. In the lateral flexion we could calculate the distance to the floor because subjects were standing on even ground and we could use the single-level scenario. In the fingertip-to-floor test we had to use the multi-level scenario because subjects were standing on a small bench and not on even ground. In this scenario, a calculation to the floor is not possible; therefore, we calculated the distance between the hand sensor and the foot sensor. It is also impossible to obtain fingertip data from the IMC. Our aim was to reproduce the construct used with the TM device as closely as possible, so that the targeted range of motion is reflected in the same way. Nevertheless, in the revised version of this manuscript we have included scatterplots which show the relationship between the methods (Fig 2).
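For context only, the Bland-Altman analysis recommended by the reviewer would amount to computing the mean difference (bias) between paired measurements and the 95% limits of agreement in the original measurement units. The sketch below shows that calculation with hypothetical paired values; it is not part of the authors' analysis, who report scatterplots and Pearson correlations instead because the devices do not measure identical constructs.

    import numpy as np

    # Hypothetical paired measurements from two methods (placeholder values).
    method_a = np.array([30.1, 25.4, 28.7, 31.0, 27.2])   # e.g. TM result (cm)
    method_b = np.array([29.0, 26.1, 27.9, 30.2, 27.8])   # e.g. IMC-derived result (cm)

    diff = method_a - method_b
    bias = diff.mean()                  # systematic error between methods
    sd_diff = diff.std(ddof=1)          # spread of the differences
    loa_lower = bias - 1.96 * sd_diff   # lower 95% limit of agreement
    loa_upper = bias + 1.96 * sd_diff   # upper 95% limit of agreement

    print(f"bias = {bias:.2f}, limits of agreement = [{loa_lower:.2f}, {loa_upper:.2f}]")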

Minor issues/comments:

- Line 75: if the IMC is your gold standard, then you will need to provide further details of its validity/reliability compared to highly standardized laboratory methods.

Thank you for the suggestion. Although this is not a validity study, we have now included information on the comparison of optical motion capture (OMC) and the IMC in the introduction section.

Line 92: this figure only shows the graphical representation of the IMC software. It would be helpful to also provide pictures of an exemplary participant where the actual measurements (IMC vs DI or IMC vs TM) are performed

Thank you for mentioning this. We have now included a revised version of Fig 1, in which the exact measurements on an exemplary subject are shown: the subject wears the measurement suit while a rater measures angles and distances.

- Line 101: it’s recommended to always use the same number of decimal places (and in this case even rounding to full years would be fine for this kind of data)

Thanks for the hint. Done.

Line 117 and following: here the reader needs more information about the system. Are there other resources on measurement precision than those given by the manufacturer (independent scientific investigations)? The provided statement is too general. Measurement precision depends on various calculations and will most likely depend on the site and the movement (as discussed in part by the authors later in the discussion).

Thank you for pointing this out; we have expanded on this in the introduction section.

Line 128: What is meant by HD reprocessed? This needs to be introduced/explained

We have added explanations.

Line 137: Is there a source for that?

Included in the method section.

Line 152: This is not clear to the reader. Does this mean 15 familiarization trials before 5 actual measurement trials? But later your Spearman calculations are based on up to 550 repetitions.

We have tried to clarify that section: “In each test 25 repetitions were performed, which were recorded simultaneously by the IMC system and the DI or TM. The first 20 repetitions were recorded by the first rater but not included in any calculations, as they served as a warm-up in order to control for acute effects [32, 33]. For rater one, measurements 21–25 were included in the analysis [34]. Subsequently, the second rater measured another five repetitions. The order of the raters was randomly chosen.”

Thank you also for mentioning the correlations. There was a mistake, which we have now corrected: the new correlations were computed on only one repetition per subject. See the description of the new Table 4: “The 21st repetition of rater 1 was used exemplarily; only measurements where synchronicity could be ensured were included.”
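As a rough illustration of the data selection described above — repetitions 21–25 per subject for the reliability analysis of rater 1, and the 21st repetition alone for the correlation — the filtering could look as follows. This is only a sketch assuming a long-format table with hypothetical column names, not the authors' actual pipeline.

    import pandas as pd

    # Hypothetical long-format data: one row per subject, rater and repetition.
    df = pd.DataFrame({
        "subject":    [1, 1, 1, 1, 1, 1],
        "rater":      [1, 1, 1, 1, 1, 1],
        "repetition": [20, 21, 22, 23, 24, 25],
        "value":      [10.5, 10.2, 10.4, 10.1, 10.3, 10.2],
    })

    # Repetitions 21-25 of rater 1 enter the reliability analysis.
    reliability_data = df[(df["rater"] == 1) & (df["repetition"].between(21, 25))]

    # Only the 21st repetition is used for the correlation with the DI/TM result.
    correlation_data = df[(df["rater"] == 1) & (df["repetition"] == 21)]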

Line 168 and following: as mentioned above, an exemplary picture with the methods applied would most likely be appreciated by the readers. The same applies to the other tests.

See new Fig 1.

Line 174: what is meant by “length in 3D”?

Now rephrased: “The arm length (distance between humerus head and wrist)…

Line 223: How much did the data deviate from a normal distribution? This also affects the outcome/validity of the ICC calculations.

In the first version of the manuscript there was a mistake in the calculation of the correlations. We now use only one repetition per subject to show the correlation exemplarily; for these data, all values were normally distributed. In the ICC calculations, five measurements from each rater were included.
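A minimal sketch of these two steps — checking normality and then computing an ICC from a long-format table — is shown below using scipy and the pingouin package. The column names and data are placeholders, and the appropriate ICC model depends on the study design; the authors' actual model and software are not restated here.

    import pandas as pd
    import pingouin as pg
    from scipy.stats import shapiro

    # Hypothetical long-format data: one measurement per subject and rater.
    df = pd.DataFrame({
        "subject": [1, 1, 2, 2, 3, 3, 4, 4],
        "rater":   [1, 2, 1, 2, 1, 2, 1, 2],
        "value":   [10.2, 10.6, 12.5, 12.1, 8.9, 9.3, 11.1, 11.4],
    })

    # Normality check of the values entering the correlation/ICC analysis.
    stat, p = shapiro(df["value"])
    print(f"Shapiro-Wilk p = {p:.3f}")

    # ICC table (pingouin reports ICC1-ICC3k); which model applies is a
    # design decision and is deliberately left open in this sketch.
    icc = pg.intraclass_corr(data=df, targets="subject", raters="rater", ratings="value")
    print(icc[["Type", "ICC", "CI95%"]])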

Line 257 and following: The tables are sometimes difficult to read. It is recommended to separate the different statistical tests by thicker lines (or in a different way), e.g. between “Measurement error” and “95% Confidence interval”;

Thank you for pointing this out; we have reformatted the tables.

All tables: used abbreviations need to be explained in a legend for each table

Done.

Table 1: How are these high “n’s” possible? That is not sufficiently explained

Thank you for pointing this out. This was a mistake: we initially used all data, including the 15 initial repetitions. Now we use only one repetition per subject. See the new Table 4.

Table 2: The unit of measurement for “Measurement Error” and “repeatability” is missing, isn’t it? Both outcomes are not well described within the manuscript (and only briefly in the given reference)

We have included the units in the tables and tried to enhance the statistical analysis section.

Table 4: the denotation “DI/TM” might be changed to “DI or TM”

Done.

Line 276: language: why is a comparison justified? This is not well expressed. Low Spearman’s rho results (especially for the Thomas test) need to be further discussed.

Thank you for the hint. We have rewritten the whole section.

Line 279 and following: but there are more potential methodological reasons for that.

Thank you for pointing this out. We have rephrased the entire discussion and calculated new correlations.

Line 281 and following: Statement not clear

The entire section has been rephrased now.

Line 331: So in other words the comparison is potentially based on invalid data! That should be investigated and clearly stated

The authors are not sure what comparison you mean, since line 331 is empty; you presumably mean line 313, which is about the Thomas test. This confusion was caused by the incorrectly calculated correlations. In the revised manuscript with the newly calculated correlations this no longer occurs.

Line 334 and following: measurement units?

Included.

Line 356: If these methods are adequate to measure individual training progress, then it would be good to discuss how precise exactly they are. What is the minimal detectable change? Statement regarding the identification of individual training progress might however be beyond the possibilities of this study.

Thank you for the advice. We included data on the measurement precision in the rewritten discussion and were more cautious regarding the evaluation of individual training progress.

________________________________________

Attachment

Submitted filename: Reviewer 2.docx

Decision Letter 1

Juliane Müller

11 Nov 2020

PONE-D-20-15948R1

Intra- and Inter-Rater Reliability of Joint Range of Motion Tests using Tape Measure, Digital Inclinometer and Inertial Motion Capturing

PLOS ONE

Dear Dr. Fräulin,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

==============================

ACADEMIC EDITOR: Thank you for addressing most of the reviewer comments. The quality of your manuscript has improved, which brings out its strengths. Nevertheless, there are still some minor comments left. Please address them while revising your manuscript.

==============================

Please submit your revised manuscript by Dec 26 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Juliane Müller, PhD

Academic Editor

PLOS ONE

[Note: HTML markup is below. Please do not edit.]

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: General comments

Thank you for attending to most of my comments on the initial version of the manuscript. The remaining specific comments I’ve provided below are ways in which I still believe the manuscript can be improved, with the line numbers I have provided with respect to the track changes version of your manuscript.

Specific comments

line 121: please remove this p-value regarding the RMSE.

Line 305-306: it might be worthwhile to say in the sentence that the XSens software calculates the relevant joint angles for all parts of the body assessed in the study with the exception of the shoulder test.

Line 360 – 364: I am a little bit unsure what is meant by the comparison of the measurement systems if that is not a validity question. Does this still reflect the revised title and aim of the study and my initial concerns regarding a validity comparison? Or are you using such a comparison to make it clear that these different measuring systems are not directly comparable and shouldn’t be interchanged?

Table 1 – 2: in regards to the measurement error, repeatability and COR scores, you have listed the units as (cm/°). This suggests to the reader that you have some new unit of measurement that combines both distance and angles for all of your tests. However, I feel what you’re trying to represent here is that some of these measures are distances and others are joint angles. Therefore, can you make it clearer in these tables which unit of measurement reflects each of the different tests?

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Dec 10;15(12):e0243646. doi: 10.1371/journal.pone.0243646.r004

Author response to Decision Letter 1


11 Nov 2020

Thanks again for your time and effort in reviewing our manuscript. We have addressed the minor revisions in the latest clean version of the manuscript; the line numbering is therefore different. I have added the correct line numbers in the specific comments below.

Reviewer 1:

Specific comments

line 121: please remove this p-value regarding the RMSE.

- Done. See line 97.

Line 305-306: it might be worthwhile to say in the sentence that the XSens software calculates the relevant joint angles for all parts of the body assessed in the study with the exception of the shoulder test.

- Done. See lines 251-254.

Line 360 – 364: I am a little bit unsure what is meant by the comparison of the measurement systems if that is not a validity question. Does this still reflect the revised title and aim of the study and my initial concerns regarding a validity comparison? Or are you using such a comparison to make it clear that these different measuring systems are not directly comparable and shouldn’t be interchanged?

- Sorry, the term “comparison” must have been missed in the first round of revisions. Of course we agree that this is not a validity study; rather, we wanted to describe the relationship between the systems by reporting the Pearson correlations. In the results section we already changed the wording in the first revision, and we have now also changed it in this section, see lines 287-289.

Table 1 – 2: in regards to the measurement error, repeatability and COR scores, you have listed the units as (cm/°). This suggests to the reader that you have some new unit of measurement that combines both distance and angles for all of your tests. However, I feel what you’re trying to represent here is that some of these measures are distances and others are joint angles. Therefore, can you make it clearer in these tables which unit of measurement reflects each of the different tests?

- Thank you for pointing this out. We had already noted this in the table descriptions, but we agree it should be visible in the tables themselves; we have changed this in the tables.

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 2

Juliane Müller

25 Nov 2020

Intra- and Inter-Rater Reliability of Joint Range of Motion Tests using Tape Measure, Digital Inclinometer and Inertial Motion Capturing

PONE-D-20-15948R2

Dear Dr. Fraeulin,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

We are thankful for your patience during this long-lasting review process.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Juliane Müller, PhD

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Justin Keogh

Acceptance letter

Juliane Müller

27 Nov 2020

PONE-D-20-15948R2

Intra- and Inter-Rater Reliability of Joint Range of Motion Tests using Tape Measure, Digital Inclinometer and Inertial Motion Capturing

Dear Dr. Fraeulin:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Juliane Müller

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    Attachment

    Submitted filename: Reviewer 2.docx

    Attachment

    Submitted filename: Response to Reviewers.docx

    Data Availability Statement

    The data underlying this study is available from: https://www.researchgate.net/publication/344687962_Joint_Range_of_Motion_Tests_measured_with_a_Digital_Inclinometer_a_Tape_Measure_and_Inertial_Motion_Capture?channel=doi&linkId=5f896cb792851c14bccc2fa8&showFulltext=true.

