Validity, reliability and responsiveness of a goniometer watch to measure pure forearm rotation

Daniel Harte; Alan Nevill; Lucia Ramsey; Suzanne Martin

doi:10.1177/17589983231211813

2023 Nov 1;29(1):30–40. doi: 10.1177/17589983231211813

PMCID: PMC10901163 PMID: 38434187

Abstract

Introduction

Innovative instruments have been designed to assess forearm rotation, an anatomically challenging motion to measure. This study assessed the concurrent validity, interrater reliability and responsiveness of a novel goniometer watch (GoWatch) to measure pure forearm rotation. The modified finger goniometer (MFG) was the criterion reference.

Methods

Forty participants with restricted forearm rotation were recruited. Two raters measured supination and pronation using the GoWatch and MFG before and after a hand therapy session. Repeated-measures ANOVA assessed for systematic bias, with an a priori residual error of 5° deemed acceptable. Secondary analysis used intraclass correlation coefficients (ICCs) to categorise interrater reliability. Responsiveness of the GoWatch was calculated using Cohen’s d.

Results

The GoWatch demonstrated acceptable agreement with the MFG, with a mean difference of 1.19° for supination and 0.20° for pronation. Interrater reliability was also within acceptable limits, with a mean difference between raters of 4.43° for GoWatch supination and 2.23° for GoWatch pronation. Interrater reliability for GoWatch supination and pronation was categorized as excellent (ICC = 0.94) and good (ICC = 0.85) respectively. Systematic bias was observed in the instrument by rater interaction, with rater two consistently underestimating GoWatch measures (p<.05). GoWatch supination showed small to medium responsiveness (Rater 1: d = 0.14; Rater 2: d = 0.29) and pronation very small to medium responsiveness (Rater 1: d = 0.29; Rater 2: d = 0.05).

Conclusion

The GoWatch is a viable and user-friendly alternative to measure forearm rotation with demonstrable validity, interrater reliability and responsiveness. Further research is required to ensure systematic bias is not endemic when used across multiple raters.

Keywords: Range of motion, wrist fractures, supination, outcome measures, pronation, goniometry

Introduction

Forearm rotation requires unrestricted mobility of the proximal radioulnar, distal radioulnar and humeroradial joints making it possible to orientate the palm up or down in functional tasks.¹ When injury occurs to these structures, function can be impaired.² Supination in particular can be affected due to restrictions imposed by casts and splints, protective posturing and as a result of a natural tendency to perform most activities of daily living with the forearm in pronation.³

One example of an injury that can present with restrictions in forearm rotation is distal radius fractures. The incidence of this injury is rising globally⁴ and as such patients with distal radius fractures are commonly assessed and treated by healthcare practitioners.

Outcome measures are vital for appropriate goal setting and to ascertain patient progress and subsequently must be reliable, valid and responsive.⁵ It is recommended when using the International Classification of Functioning, Disability, and Health (ICF) as a framework that assessment should incorporate Body Functions and Structures, Participation, Activities and the Environment.⁶ In the context of hand therapy, one example of Body Functions and Structures assessment is goniometry to objectively measure range of movement (ROM). Guidelines to standardise methods of goniometer measurement⁷ recommend measuring forearm rotation using a two-arm goniometer with one arm placed on the wrist and the other aligned parallel to the humerus.⁷ This standard method measures pure forearm rotation and has excellent test-retest reliability and inter-rater reliability.⁸ However, the circumference of the distal wrist is oval-like in shape posing issues with accuracy in placing the level surface of the goniometer arm. Additionally, the assessor is required to assume that the second arm is truly vertical using vision only.

An alternative approach is referred to as functional rotation, which takes hand orientation into consideration and may therefore be more meaningful to everyday function. Two techniques to measure functional rotation have been described.^9,10 One involves the patient gripping a pencil while the therapist aligns one arm of the goniometer with the pencil.⁹ The second describes a tubular handle attached perpendicular to the horizontal arm of a standard goniometer and a plumb line attached to its axis.¹⁰ The patient holds the tubular handle to define the plane of the palm so that, when the forearm is rotated, the weighted plumb line establishes the vertical plane. These measures have reported high intra-rater and inter-rater reliability,^10–12 though several disadvantages to both approaches are noted. When measuring functional rotation, the patient must be able to grasp the handle or pencil to define the palmar plane; however, an inability to make a fist is commonly observed in patients with distal radius fractures in the early phase of rehabilitation. Further, compensatory movements at the wrist and the fourth and fifth metacarpals need to be considered along with proximal positions.

A design by Szekeres et al. demonstrated how the measurement of forearm rotation can be improved by attaching a plumb line to a finger goniometer with flat arms¹³ (Figure 1). The weight on the goniometer uses gravity to achieve a true vertical position. The authors propose that the flat arms of the finger goniometer allow the therapist to secure it more firmly to the flatter surface of the dorsal aspect of the wrist during measurement, gaining a more stable position. This design was further tested and demonstrated the same inter-rater reliability as the standard approach in measuring supination and slightly higher reliability in measuring pronation.¹⁴ However, this method requires continual manipulation by the clinician during measurement, limiting observation of the upper extremity as the therapist is unable to move back from the patient to observe for any compensatory postures.

Figure 1. — Measuring pronation (top left) and supination (top right) using the goniometer watch (GoWatch) and measuring pronation (bottom left) and supination (bottom right) using the modified finger goniometer (MFG).

In this study we tested an alternative design to measure pure forearm rotation which considers the issues raised with existing measuring approaches. Unlike the recent method proposed by Szekeres et al.,¹³ this method does not require continual manipulation by the clinician, allowing the therapist to observe for any compensatory postures. This is achieved by applying consistent pressure to the wrist using a “snap” bracelet with a mounted goniometer watch. A “snap” bracelet is a bi-stable object that can be manipulated into two different configurations¹⁵: it can transition from a straight shape with a groove along its entire width to a coiled shape where the groove disappears.

The GoWatch was designed in-house using a novelty toy watch purchased from a commercial outlet. The strap of the watch was a “snap” bracelet with a silicone cover that also had a circular mount where the small toy watch face was inserted. The GoWatch dial face was produced using a high-resolution 3D printer so that its base could fit into the existing mount.

Previous attempts to design goniometers that attach to the wrist^16,17 were impractical as they involved the assembly of a large apparatus. Further, the design in this study uses a ball bearing to act as a gravity-assisted dial, improving linear motion by reducing friction between the goniometer components. Therefore this design may theoretically offer further improvements to the recent method described and potentially be more user-friendly.

The purpose of this study was to test the concurrent validity, inter-rater reliability and responsiveness of this new goniometer design in measuring pure forearm rotation with the modified finger goniometer as the criterion instrument for reference.

Methods

Design

This was a prospective repeated measures study examining the validity, inter-rater reliability and responsiveness of a new goniometer device to measure pure forearm rotation. The Guidelines for Reporting Reliability and Agreement Studies (GRRAS)¹⁸ and the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) Risk of Bias checklist¹⁹ were used to ensure quality procedures and reporting.

Participants

Participants with upper limb trauma or disease that had impeded forearm rotation were consecutively recruited if they met the inclusion criteria. The availability of the chief investigator and both raters was necessary if participation was to occur on the day of recruitment; otherwise, the participant could be involved at a future therapy session.

Ethics

This study was approved by the Office of Research Ethics Committees, Northern Ireland (ORECNI) and the Medicines and Healthcare products Regulatory Agency (MHRA). MHRA recommended that the goniometer watch be labelled “exclusively for clinical investigations” as it was a new medical device. All participants provided written informed consent.

Recruitment

Potential participants were identified by surgeons and therapists through the Trust’s fracture clinic, physiotherapy and occupational therapy departments. Each eligible patient had a “cooling off” period to consider if they wished to participate in the study. Consent and participation in the study did not occur until participants saw the goniometer watch, gauged its acceptability and read the study information sheet. Alternatively, patients could choose to participate at the time of their next planned hospital review. Nobody who met the study criteria and was invited declined to participate in the study.

Sample size

Bland and Altman recommend that agreement is necessary to establish if the precision of a new measuring device is acceptable compared to the gold standard. They highlighted this using a sample size of only 17 participants.²⁰ A sample of 30 is commonly employed as a benchmark to rely on the central limit theorem, as statistical research has found that with a sample size of 30, the sampling distribution of the mean is approximately normal.²¹ Therefore 40 participants were proposed as an adequate and cost-effective sample size.

Inclusion criteria

The study aimed to recruit people aged 18 years and over with a history of upper limb trauma or disease affecting forearm rotation, for whom active ROM of the forearm was permitted by the treating physician.

Exclusion criteria

Patients were excluded from the study if they had an open wound around the distal portion of the wrist, hyperaesthesia or allodynia of the wrist, or a cognitive impairment or learning difficulty that prevented understanding of verbal instructions or informed consent. Patients were also excluded if ROM was contra-indicated due to the phase of healing.

Materials

The weighted Modified Finger Goniometer (MFG) (Device 1) was placed on the dorsal aspect of the wrist, the long arm crossing both Lister's tubercle and the ulnar head with the weight over the ulnar side of the wrist. It is flipped to measure either supination or pronation (Figure 1).

A 360° goniometer watch (GoWatch) (Device 2) with measurements in 2° increments was used (Figure 1). Zero degrees is marked at the bottom of the watch face, and the device relies on gravity to move the ball bearing. The GoWatch is attached to the radial border of the wrist just proximal to the ulnar styloid using the snap bracelet mechanism. The strap is attached so the 0° marking on the watch face is aligned with the centre of the anatomical snuffbox. The arm is positioned to the side of the trunk in the midline position and the elbow flexed to 90°. The forearm moves either into pronation or supination and a record is made of the degrees reached on the GoWatch face.

Procedure

A pilot study was conducted with 10 participants to familiarise therapists with the process and address any safety concerns. No safety concerns were highlighted. Some guidance was provided to both raters by the Chief Investigator on the application of both measuring devices during the pilot. Two occupational therapists each with over 10 years’ experience were designated raters. Two trials of each goniometry method were performed by each therapist before and after a session of hand therapy. The first trial data were used to compare measures between raters (inter-rater reliability) and between devices (concurrent validity). The second trial was used to examine the responsiveness of the devices.

Randomisation and blinding

A box of 90 cards, indicating which instrument, therapist, and direction of rotation would be measured first was used. The second therapist followed the same sequence of instrument and direction. An online random sequence generator (www.random.org) determined the sequence of cards. After each measure, the rater gave their result verbally to the Chief Investigator who recorded it. Each rater did not have access to the measurement sheets to assist in blinding of the first set of results. Each rater was also blinded to the other's results during the study. Participants were not blinded to the results as they would hear the results verbally after each measure.

Participants were assessed over one session with approximately 30-60 min between trials. A therapy session including various active and passive movements and/or heat therapy occurred during the interval. Participants sat on a plinth with their feet flat on the ground and were instructed to keep their elbows to the side of their trunk. Therapists checked that the participant's shoulders were level, the elbow was positioned at 90° and the forearm was in mid-position before movement in either direction occurred. Participants were then instructed to move their forearm into maximum pronation or supination.

Statistical analysis

Concurrent Validity and Inter-rater Reliability: Bland Altman plots were populated on Microsoft Excel (2016) to assess agreement between each device (concurrent validity) and each rater (interrater reliability), with a priori limits of agreement set at 5° and the requirement that the confidence interval (CI) of the mean difference include the line of equality (i.e. no difference in mean scores). A 5 to 10° margin of error has been deemed acceptable for intra and interrater reliability of wrist range of movement.²² The authors deemed a threshold of 5° difference as an acceptable benchmark for interrater reliability. Szekeres et al.¹⁴ do not provide details on the level of agreement between the two-arm goniometer and the MFG but do provide the mean difference between raters using Bland Altman, reported as 5° [95%CI(-6, 16)] for pronation and 3° [95%CI(-9, 14)] for supination. In the absence of any a priori values in the literature for concurrent validity, the authors also determined within 5° difference as an acceptable level of agreement.

Concurrent validity is the extent two instruments agree in measuring the same construct. One instrument is typically an established tool used as a reference when assessing the accuracy of a new measurement instrument.²³ Inter-rater reliability is the extent to which the measures by two or more raters agree.²⁴ Bland Altman enables visualization of the accuracy of these properties.
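
The agreement checks described above can be illustrated with a minimal Python sketch. It is not the authors' Excel workbook: the function name `bland_altman`, the example readings and the use of a normal approximation (rather than a t distribution) for the confidence interval of the mean difference are all assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def bland_altman(first, second, a_priori_limit=5.0, level=0.95):
    """Summarise agreement between two paired sets of readings (degrees)."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)  # sample SD of the paired differences
    # Normal approximation to the CI of the mean difference (assumption).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sd_diff / sqrt(n)
    ci = (mean_diff - half_width, mean_diff + half_width)
    # Data-driven limits of agreement: mean difference -/+ 1.96 SD.
    loa = (mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff)
    return {
        "mean_diff": mean_diff,
        "ci": ci,
        "loa": loa,
        "within_a_priori": abs(mean_diff) <= a_priori_limit,
        "ci_includes_zero": ci[0] <= 0.0 <= ci[1],
    }


# Hypothetical supination readings (degrees) from two raters, not study data.
rater_1 = [36, 40, 52, 18, 61, 45]
rater_2 = [33, 42, 47, 15, 58, 44]
for key, value in bland_altman(rater_1, rater_2).items():
    print(key, value)
```

The two flags correspond to the two acceptance criteria in the text: a mean difference within the a priori 5° threshold, and a confidence interval that contains the line of equality.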

Bias: Repeated-measures ANOVA using the Statistical Package for the Social Sciences (SPSS v.25) examined the similarities and differences between the values recorded by raters and instruments. The primary benefit of using repeated-measures ANOVA instead of Intraclass Correlation Coefficients (ICC) is that it can provide a more precise estimate of agreement²⁰ and more detailed information about the difference or relationship between raters and instruments, separating bias from unexplained residual error. Significant bias (systematic differences across variables) was set at p<.05. The square root of the within-subject mean square error was used to calculate unexplained error.
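
As a worked check of the last step, the short Python sketch below takes the square root of the mean square errors reported in the Results and compares each residual error with the a priori 5° threshold. The dictionary labels and the helper name `residual_error` are illustrative assumptions; only the numeric mean square errors come from the paper.

```python
from math import sqrt

A_PRIORI_LIMIT = 5.0  # degrees, the acceptable residual error set in advance

# Mean square errors as reported in the Results; labels are paraphrased.
reported_mse = {
    "supination vs pronation, trial 1": 1311.08,
    "rater-by-device interaction, trial 1": 35.37,
    "rater, device and direction, trial 1": 68.28,
    "trial 1 vs trial 2, combined measures": 58.74,
}


def residual_error(mse):
    """Unexplained error in degrees: square root of the mean square error."""
    return sqrt(mse)


for label, mse in reported_mse.items():
    error = residual_error(mse)
    verdict = "within" if error <= A_PRIORI_LIMIT else "outside"
    print(f"{label}: {error:.2f} degrees ({verdict} the a priori limit)")
```

Each square root matches the residual error quoted alongside its mean square error in the Results (36.21°, 5.95°, 8.26° and 7.66°).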

Responsiveness of each device was assessed by measuring the effect size between trials one and two using Cohen’s d to interpret whether the effect size was small, medium or large, with (d) equal to 0.2, 0.5 and 0.8 respectively.²⁵ Responsiveness is the extent to which a device can detect change over time.²⁶ No previous analysis of the responsiveness of the MFG has been conducted.
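
A rough Python sketch of this calculation follows. The paper does not state which variant of Cohen's d was used, so the pooled-SD form for two equally sized groups is an assumption, as are the function names and the wording of the labels between the 0.2, 0.5 and 0.8 benchmarks. The inputs are the Rater 1 GoWatch supination means and SDs from Table 2.

```python
from math import sqrt


def cohens_d(mean_1, sd_1, mean_2, sd_2):
    """Cohen's d using a pooled SD for two equally sized groups (assumed variant)."""
    pooled_sd = sqrt((sd_1 ** 2 + sd_2 ** 2) / 2)
    return (mean_2 - mean_1) / pooled_sd


def describe(d):
    """Label an effect size against the 0.2 / 0.5 / 0.8 benchmarks."""
    size = abs(d)
    if size < 0.2:
        return "very small"
    if size < 0.5:
        return "small to medium"
    if size < 0.8:
        return "medium to large"
    return "large"


# Rater 1, GoWatch supination: trial 1 mean (SD), trial 2 mean (SD) from Table 2.
d = cohens_d(36.13, 27.81, 40.25, 27.00)
print(f"d = {d:.2f} ({describe(d)})")
```

Under these assumptions the result is about 0.15, close to but not identical to the reported 0.14, which suggests the authors used a slightly different formula or unrounded data.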

Interrater Reliability (Secondary Analysis): ICCs were calculated as a secondary analysis of inter-rater reliability to compare with key results reported in the literature. Inter-rater reliability using ICCs (type 2.2) was calculated at 95% CI. A two-way random effects model with the mean scores (k = 2) was used to ascertain absolute agreement between raters for both techniques.²⁷ An ICC score greater than 0.9 is deemed excellent while scores less than 0.5, between 0.5 and 0.75, and between 0.75 and 0.9 are poor, moderate and good respectively.²⁸ ICC scores were reported as average measures. Standard error of measurement and 95% CI (SEM₉₅) were also calculated to determine the precision of each device and the Shapiro-Wilk test (p>.05) was used to ascertain the normality of the data sets used.
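
The ICC categories and the standard error of measurement can be sketched in Python as below. The paper does not give its SEM formula; the commonly used form SEM = SD × √(1 − ICC) is assumed here, as is the choice to pool the two raters' trial-one SDs from Table 2. The function names and the handling of values exactly on a category boundary are also illustrative choices.

```python
from math import sqrt


def icc_category(icc):
    """Map an ICC to the poor / moderate / good / excellent bands in the text."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


def standard_error_of_measurement(sd, icc):
    """Commonly used formula (assumed here): SEM = SD * sqrt(1 - ICC)."""
    return sd * sqrt(1 - icc)


# Trial-one GoWatch supination SDs for Rater 1 and Rater 2 (Table 2),
# pooled as an assumption about how a single SD might be obtained.
pooled_sd = sqrt((27.81 ** 2 + 24.67 ** 2) / 2)
sem = standard_error_of_measurement(pooled_sd, 0.94)

print(icc_category(0.94), icc_category(0.85))
print(f"SEM for GoWatch supination: {sem:.2f} degrees")
```

With these inputs the supination value comes out at roughly 6.44°, in line with the figure reported in the Results.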

Results

From November 2019 until December 2022, 40 participants were recruited (Table 1). One upper extremity was assessed for each participant, and all had restrictions in forearm rotation due to injury, with most participants being female. Descriptive statistics on both measures are provided in Table 2.

Table 1.

Demographic characteristics of participants (n=40).

Characteristics	Mean	SD	Range
Age	53.6	13.58	20-78
Time from Injury/Surgery^a (Days)	98.79	59.78	24-344

	n	(%)
Female	30	75
Affected side
Left	22	55
Right	18	45
Injury type
Humeral fracture	3	7.5
Distal radius fracture	31	77.5
Distal radius and ulna fracture	2	5
Radial head fracture	2	5
Elbow dislocation	1	2.5
Triad injury	1	2.5

n = number of participants, SD = standard deviation.

^aDays counted from surgery rather than injury in cases where surgery was performed.

Table 2.

Descriptive statistics of GoWatch and MFG measures. Trial one measures carried out prior to therapy. Trial two measures carried out after 30-60 min interval of hand therapy.

		GoWatch			MFG
		n	Mean (SD)	Range	n	Mean (SD)	Range
	Trial 1
Rater 1	Pronation	40	77.35 (18.40)	24-108	40	76.83 (17.44)	24-100
Rater 1	Supination	40	36.13 (27.81)	-26-78	40	34.45 (27.44)	0-83
Rater 2	Pronation	40	75.13 (19.61)	36-130	40	75.25 (16.83)	30-105
Rater 2	Supination	40	31.70 (24.67)	-39-70	40	35.75 (25.30)	0-90
	Trial 2
Rater 1	Pronation	40	82.25 (16.93)	50-119	40	78.08 (15.69)	37-100
Rater 1	Supination	40	40.25 (27.00)	-21-86	39	37.74 (25.66)	0-84
Rater 2	Pronation	40	76.15 (16.45)	34-110	40	79.88 (17.45)	35-120
Rater 2	Supination	40	38.15 (23.94)	-26-82	40	39.85 (26.67)	0-100

Concurrent validity

The mean difference between the MFG and GoWatch values for supination and pronation in trial one followed a normal distribution as determined by the Shapiro-Wilk test (p>.05). Bland Altman plots illustrate the mean difference between both instruments for supination was 1.19° [95%CI (-1.77, 4.15)] and for pronation was 0.20° [95%CI(-3.10, 2.70)] both of which were within the 95% CI and the 5° acceptable range indicating agreement (Figure 2).

Figure 2. — Agreement between MFG and GoWatch in measuring pronation and supination. Black central solid line = mean difference; two solid grey lines = 95% CI; black dotted lines = *apriori* Limits of Agreement (LOA); Outside grey dotted lines = LOA based on mean of the two values, minus and plus 1.96 standard deviations.

Inter-rater reliability

Agreement between raters was also assessed using Bland Altman plots. The normality of the mean difference scores between raters was assessed using the Shapiro-Wilk test; however, values for the mean difference in MFG pronation were non-normal (p<.05) and Log10 transformations could not be performed due to zero and negative values.

The mean difference between raters 1 and 2 measuring supination with the GoWatch was 4.43° [95%CI(0.52, 8.33)] and for pronation was 2.23° [95%CI(-2.22, 6.65)] whereby the line of equality fell within the 95% CI and also within the 5° acceptable range indicating agreement. The mean difference between raters 1 and 2 measuring supination with the MFG was -1.30° [95%CI(-5.01, 2.4)] whereby the line of equality fell within the 95% CI and also within the 5° acceptable range, indicating agreement (Figure 3).

Figure 3. — Agreement between Rater one and two in measuring supination and pronation using the GoWatch and supination using the MFG. Black central solid line = mean difference; two solid grey lines = 95% CI; black dotted lines = *apriori* Limits of Agreement (LOA); Outside grey dotted lines = LOA based on mean of the two values, minus and plus 1.96 standard deviations. Note: MFG supination upper *apriori* LOA black dotted line not visible as overlaps with upper 95% CI (solid grey line).

Secondary analysis using ICCs was then performed. MFG data sets did not show linearity along the QQ plot line, which was confirmed by the Shapiro-Wilk test (p<.05). The ICC for GoWatch supination in trial one was 0.94 [95%CI(0.86, 0.97)] and for pronation was 0.85 [95%CI(0.71, 0.92)], indicating excellent and good interrater reliability respectively. The SEM₉₅ for the GoWatch in trial one was 6.44° for supination and 7.33° for pronation, both outside the acceptable limits (<5°).

Systematic bias

Repeated-measures ANOVA data from trial one detected significant systematic bias between all supination and pronation measures (p<.01) with higher pronation values and a residual error of 36.21° (Mean square error = 1311.08).

Figure 4 illustrates a significant rater-by-device interaction with a systematic bias between raters 1 and 2 using the GoWatch (Device 2) when analysing average measures of combined pronation and supination in trial 1 (p = .02) with a residual error of 5.95° (Mean square error = 35.37). Rater 1 and Rater 2 closely agreed on MFG (Device 1) and GoWatch measures (interaction between error bars) however systematic bias was observed with Rater 2 consistently underestimating measures with the GoWatch compared to Rater 1. This is represented by the diagonal line between Rater 1 and Rater 2.

Repeated-measures ANOVA agreement between each rater, each device and each component of ROM (supination and pronation) in trial 1 detected no significant bias (p = .18); however, the residual error of 8.26° (Mean square error = 68.28) was outside the acceptable a priori level of 5°.

Responsiveness

Shapiro-Wilk testing indicated non-normality (p<.05) for all Rater 1 MFG measures and for Rater 2 MFG supination measures; however, both raters’ GoWatch measures followed a normal distribution and were therefore analysed for responsiveness using Cohen’s d. Using measures collated by Rater 1, a very small effect size was observed between trial one and two GoWatch supination (d = 0.14, 95% CI [-0.30, 0.58]) and a small to medium effect for pronation (d = 0.29, 95% CI [-0.17, 0.72]). For Rater 2, a small to medium effect size was observed between trial one and two GoWatch supination (d = 0.29, 95% CI [-0.16, 0.72]) and a very small effect for pronation (d = 0.05, 95% CI [-0.38, 0.49]). Repeated-measures ANOVA also confirmed that there was a positive effect between trials one and two when analysing average combined measures (i.e. pronation + supination) (p<.01) with a residual error of 7.66° (Mean square error = 58.74).

Discussion

The GoWatch demonstrated acceptable concurrent validity and interrater reliability for both supination and pronation measures (<5°); however, systematic bias was found when the rater-by-instrument interaction was analysed. The MFG demonstrated superior agreement between raters compared to the GoWatch. Results from the study by Szekeres et al.¹⁴ further demonstrate superior agreement between raters using the MFG than the results obtained on the GoWatch in this study.

Bland Altman plots (Figure 3) illustrate that the difference between raters measuring supination using the GoWatch increased as higher ROM measures were obtained, while differences outside the a priori limits of agreement for GoWatch pronation and MFG supination were evenly spread. Rater 2’s total measures (supination+pronation) produced on average significantly lower GoWatch measurements compared to their MFG measures in trial 1 (Figure 4). GoWatch responsiveness in detecting change between trials one and two was significant using both statistical methodologies, i.e. repeated-measures ANOVA and Cohen’s d. Each rater detected some change using the GoWatch, though this varied from a very small to a small/medium effect. This treatment effect between trials was hypothesized, as the therapy session was intended to elicit an improvement in ROM. However, the ability of the GoWatch to measure responsiveness should be considered with caution due to the systematic bias detected in this study and because the procedure was unable to completely “blind” both raters and participants from the results. Furthermore, the mean differences in scores between trials one and two for both raters measuring supination and pronation are low relative to the acceptable level of agreement (<5°), except for Rater 2 measuring supination, which had a mean difference of 6.45°. Armstrong et al.⁸ have recommended that a meaningful change in rotation measures would be at least 10°. Future studies on the GoWatch should consider measuring responsiveness over a longer duration to determine if it can detect meaningful clinical change.

Secondary analysis of GoWatch scores using ICCs demonstrated good interrater reliability for measuring pronation and excellent interrater reliability for measuring supination. However, this criterion can be misleading, as the use of correlation coefficients to test agreement between measurement instruments and raters is cautioned against.²⁰ The ICCs show that the GoWatch is fit for purpose in a clinical setting but fail to identify systematic bias in the GoWatch measures. GoWatch ICCs imply excellent interrater reliability for measuring supination and good interrater reliability for measuring pronation, yet Bland Altman plots indicate the mean difference was smaller for pronation. This disparity between ICCs and mean difference has previously been demonstrated.²⁹ These authors advise that ICCs “do not always reflect the clinical implications of measurement errors” and recommend the utilization of other statistical methods such as those used in this study. The results further highlight the risk of basing assumptions on the clinical acceptability of a measuring instrument solely on ICCs. Szekeres et al.¹⁴ reported that the MFG had excellent inter-rater reliability for pronation (ICC: 0.86) and supination (ICC: 0.95), similar to results for the GoWatch; however, no systematic bias was identified with the MFG using linear regression analysis. SEM₉₅ scores for GoWatch pronation were 7.33° and for supination 6.44°, both outside the acceptable a priori limits and a larger error than previously reported for the MFG (SEM₉₅ pronation = 2.1° and supination = 1.2°). SEM₉₅ will decrease as sample size increases, and Szekeres et al.¹⁴ measured ICCs and SEM₉₅ using pooled samples of 60 participants over six sessions, producing a sample size of 360, possibly explaining the smaller SEM₉₅ values. With systematic bias, SEM₉₅ values may not reduce much further regardless of a larger sample size.

Results were not pooled from both trials because the second trial measured effect size and therefore pooling to assess agreement and reliability would be a flaw in the statistical approach. Non-normality of MFG data meant comparisons with GoWatch responsiveness could not be made.

MFG supination measures had non-normal platykurtic kurtosis, meaning that the values were uniform with few outliers. Rater 1 pronation measures had non-normal leptokurtic kurtosis, meaning that there were frequent outliers. Similar trends were observed in the GoWatch measures and MFG Rater 2 pronation measures, but kurtosis fell within acceptable limits (±1.0).

Systematic bias was also observed between all measures of supination and pronation with a residual error of 36.21°. This was hypothesized as most people have limitations in supination rather than pronation after a distal radius fracture³ and this phenomenon was observed in this cohort (82.5% of study participants). This may also explain the negative skewness of the pronation measures which were observed in both raters though with Rater 2’s measures just within normal distribution.

Systematic bias between raters may be an isolated event, due to weaknesses in blinding or the potentially arbitrary reference point for placing the GoWatch. The wide area of the anatomical snuffbox may produce variability in placement and therefore in measurement readings. In larger or oedematous limbs, the anatomical snuffbox may be less defined and further affect accuracy. Participants were on average assessed in this study at around 3 months from injury or surgery though details on oedema or visibility of the anatomical snuffbox were not collected for this study.

Several suggestions on future iterations may help reduce bias. Repositioning the watch face on the dorsum of the wrist, where the flatter and wider surface area may offer a more stable base, could limit unwanted movement. Aligning the centre of the watch face with the third metacarpal base by a reference line on the snap bracelet could further improve precision, and a serrated edge on the interior of the watch face may minimise oscillating movements when end ROM is held.

Both raters in this study were experienced in hand therapy and were inducted via a pilot study on both methods which were novel to them. Both raters did not find the MFG intuitive, with the Chief Investigator interjecting to provide instruction to flip the goniometer to enable measurements in opposing directions. Formal feedback on each rater’s preference of instrument was however beyond the scope of this study.

Limitations included systematic bias of rater-instrument interactions restricting a thorough understanding of the reliability, validity and responsiveness of the GoWatch. Using only two raters limited generalisation of the results to hand therapy practitioners. Some datasets were not normally distributed and as such some important aspects of the analysis could not be completed. The study commenced in November 2019 and delays in recruitment were encountered due to COVID-19.

Systematic bias may be resolved by further research with more than two raters and the provision of more extensive training on each instrument. Future study procedures should also consider strategies to ensure both participants and raters are completely blind to measurements. Further iterations to the GoWatch may also improve precision and reduce systematic bias. The hinge at the base of the GoWatch allows it to fold 90° parallel to the forearm and it can also rotate 360° in its fulcrum. Consequently, the GoWatch may be used to measure other joint ROM (e.g. elbow flexion). The raters in this study found the GoWatch more intuitive to use than the MFG; however, this is not particularly reflected in the results. Further, the exploration of clinician perceptions would be an important area of research. This study highlights the need for caution when interpreting ICC scores of measuring instruments. Using the categories defined by Shrout and Fleiss²⁸ can be misleading when choosing an instrument for clinical use.

The GoWatch is easy to use and further iterations may help improve its precision so that it becomes an instrument of choice for many clinicians. Thereafter, further research will be necessary to evaluate its measurement properties and to ascertain if systematic bias is endemic across multiple raters.

Acknowledgements

The authors would like to thank Christine Davidson and Emma McPhillips for assisting in data collection for this study. The lead author would also like to thank Make Marks for contributing to the design of the second-generation prototype GoWatch used in this study. The GoWatch concept was devised by Daniel Harte.

Author contributions: DH researched literature and conceived the study. DH was involved in protocol development, gaining ethical approval and patient recruitment. Both AN and DH conducted data analysis. DH wrote the first draft of the manuscript. All authors reviewed and edited the manuscript and approved the final version of the manuscript.

Declaration of conflicting interests: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding: The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the British Association of Hand Therapists (BAHT) [grant award 2018].

Guarantor: DH.

Ethical statement

Ethical approval

The Ethics Committee of ORECNI approved this study (REC number: 239585).

Informed consent

Written informed consent was obtained from all patients for their anonymized information to be published in this article.

ORCID iD

Daniel Harte https://orcid.org/0000-0001-9199-823X

References

1. Soubeyrand M, Assabah B, Bégin M, et al. Pronation and supination of the hand: anatomy and biomechanics. Hand Surg Rehabil 2017; 36(1): 2–11.
2. Kleinman WB. Distal radius instability and stiffness: common complications of distal radius fractures. Hand Clin 2010; 26(2): 245–264.
3. Naughton N, Algar L. Therapy management of distal radius fractures. In: Skirven TM, Osterman AL, Fedorczyk JM, et al. (eds) Rehabilitation of the Hand and Upper Extremity. 7th ed. Philadelphia: Elsevier Health Sciences, 2021, pp. 833–850.
4. MacIntyre NJ, Dewan N. Epidemiology of distal radius fractures and factors predicting risk and prognosis. J Hand Ther 2016; 29(2): 136–145.
5. Yang W, Houtrow A, Cull DS, et al. Quality and outcome measures for medical rehabilitation. In: Cifu DX. (ed) Braddom's Physical Medicine and Rehabilitation. 6th ed. Philadelphia: Elsevier, 2021, pp. 100–114.e2.
6. World Health Organisation. International Classification of Functioning, Disability and Health. Geneva: WHO, 2018, https://www.who.int/standards/classifications/international-classification-of-functioning-disability-and-health (accessed 12 October 2023).
7. MacDermid J, Solomon G, Valdes K. Clinical Assessment Recommendations. 3rd ed. Mount Laurel, NJ: American Society of Hand Therapists, 2015.
8. Armstrong AD, MacDermid JC, Chinchalkar S, et al. Reliability of range-of-motion measurement in the elbow and forearm. J Shoulder Elbow Surg 1998; 7(6): 573–580.
9. McRae R. Clinical Orthopaedic Examination. 1st ed. Edinburgh: Churchill Livingstone, 1981.
10. Flowers KR, Stephens-Chisar J, LaStayo P, et al. Intrarater reliability of a new method and instrumentation for measuring passive supination and pronation: a preliminary study. J Hand Ther 2001; 14(1): 30–35.
11. McGarry G, Gardner E, Muirhead A. Measurement of forearm rotation: an evaluation of two techniques. J Hand Surg Br 1988; 13(3): 288–290.
12. Karagiannopoulos C, Sitler M, Michlovitz S. Reliability of 2 functional goniometric methods for measuring forearm pronation and supination active range of motion. J Orthop Sports Phys Ther 2003; 33(9): 523–531.
13. Szekeres M, MacDermid JC, Rooney J. A new method for measuring forearm rotation using a modified finger goniometer. J Hand Ther 2015; 28(4): 429–431. quiz 32.
14. Szekeres M, MacDermid JC, Birmingham T, et al. The inter-rater reliability of the modified finger goniometer for measuring forearm rotation. J Hand Ther 2016; 29(3): 292–298.
15. Guo Q, Zheng H, Chen W, et al. Modeling bistable behaviors in morphing structures through finite element simulations. Bio Med Mater Eng 2014; 24(1): 557–562.
16. Wagner C. Determination of the rotary flexibility of the elbow joint. Eur J Appl Physiol Occup Physiol 1977; 37(1): 47–59.
17. Darcus HD, Salter N. The amplitude of pronation and supination with the elbow flexed to a right angle. J Anat 1953; 87(2): 169–184.
18. Kottner J, Audigé L, Brorson S, et al. Guidelines for reporting reliability and agreement studies (GRRAS) were proposed. Int J Nurs Stud 2011; 48(1): 661–671.
19. Mokkink LB, Boers M, van der Vleuten CPM, et al. COSMIN risk of bias tool to assess the quality of studies on reliability or measurement error of outcome measurement instruments: a Delphi study. BMC Med Res Methodol 2020; 20(1): 293.
20. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986; 1(8476): 307–310.
21. Kwak SG, Kim JH. Central limit theorem: the cornerstone of modern statistics. Korean J Anesthesiol 2017; 70(2): 144–156.
22. Marx RG, Bombardier C, Wright JG. What do we know about the reliability and validity of physical examination tests used to examine the upper extremity? J Hand Surg Am 1999; 24(1): 185–193.
23. Field A. An Adventure in Statistics: The Reality Enigma. 2nd ed. London: Sage Publications Ltd, 2022, p. 61.
24. Lange RT. Inter-rater reliability. In: Kreutzer JS, DeLuca J, Caplan B. (eds) Encyclopedia of Clinical Neuropsychology. New York: Springer, 2011, p. 1348.
25. Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum Associates, 1988.
26. Terwee CB. Responsiveness to change. In: Michalos AC. (ed) Encyclopedia of Quality of Life and Well-Being Research. Netherlands: Springer, 2014, pp. 5547–5550.
27. Koo TK, Li MY. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med 2016; 15(2): 155–163.
28. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull 1979; 86(2): 420–428.
29. Lee KM, Lee J, Chung CY, et al. Pitfalls and important issues in testing reliability using intraclass correlation coefficients in orthopaedic research. Clin Orthop Surg 2012; 4(2): 149–155.
