Dear Editor-in-Chief,
We are writing with regard to “Intra- and inter-rater reliability of the modified tuck jump assessment” by Fort-Vanmeerhaeghe et al. (2017), published in the Journal of Sports Science and Medicine. The authors reported on the reliability of the modified Tuck Jump Assessment (TJA). The purpose of the article was twofold: to introduce a new scoring methodology and to report the interrater and intrarater reliability of that methodology. The authors found the modified TJA to have excellent interrater reliability (ICC = 0.94, 95% CI = 0.88-0.97) and intrarater reliability (rater 1: ICC = 0.94, 95% CI = 0.88-0.9; rater 2: ICC = 0.96, 95% CI = 0.92-0.98) with experienced raters (n = 2) in a sample of 24 elite volleyball athletes. Overall, we found the study to be well conducted and valuable to the field of injury screening; however, the study did not adequately explain how the raters were trained in the modified TJA to improve scoring consistency, nor did it explain the modifications to the individual flaw “excessive contact noise at landing.” This information is necessary to improve the clinical utility of the TJA and to direct future reliability studies.
The TJA has been changed at least three times in the literature: from its initial introduction (Myer et al., 2006), to the most referenced and detailed protocol (Myer et al., 2011), to the publication under discussion (Fort-Vanmeerhaeghe et al., 2017). The initial test protocol was based upon clinical expertise and has evolved over time as new research emerged and problems arose with the original TJA. Initially, the TJA was scored on a visual analog scale (Myer et al., 2006); it was then changed to a dichotomous scale (0 for no flaw, 1 for flaw present) (Myer et al., 2011) and most recently modified to an ordinal scale (Fort-Vanmeerhaeghe et al., 2017). A significant disparity in the reported interrater and intrarater reliability arose with the dichotomously scored TJA between researchers involved in the development of the TJA (Herrington et al., 2013) and those who were not (Dudley et al., 2013). Dudley et al. (2013) reported a lack of clarity in the protocol and rater training in the dichotomous TJA description (Myer et al., 2011), and these limitations may have contributed to the poor to moderate reliability found in their study of raters with varied educational backgrounds. Possibly in reference to the issues raised by Dudley et al. (2013), Fort-Vanmeerhaeghe et al. (2017) suggested that a lack of background information and specific training in the TJA led to reliability issues with the dichotomous scoring, which they believed necessitated changing the TJA protocol. However, the authors did not provide a detailed explanation of how the raters were trained, nor of the raters' involvement in the creation of the modified TJA. This information is important because a significant learning effect in scoring was observed with the dichotomous TJA (Dudley et al., 2013), which may inflate the reliability reported in this study (Fort-Vanmeerhaeghe et al., 2017). Further, and perhaps more importantly, the clinical applicability of the new ordinal scoring method is limited because it is not clear what is required to train raters for reliable scoring, especially with a new, more complicated scoring system. Beyond a simple statement that the raters “watched as many times as necessary and at whatever speeds they needed to score each test,” no other video-scoring methodology was reported (Fort-Vanmeerhaeghe et al., 2017). Several questions are left unanswered in the study that will significantly impact replication of the findings and use of the test in a clinical setting. Were the raters instructed on calibrating volume? Were the raters instructed in the criteria for scoring? Did the raters work together to calibrate their scoring prior to the study? If so, for how long and by what methods?
To illustrate, for “pause between jumps,” the following criteria are reported: (0) reactive and reflex jumps, (1) small pause between jumps, and (2) large pause between jumps. The authors do not explain the difference between a small and a large pause. If the frame rate is not controlled while watching the video frame by frame, a rater may incorrectly score a severe pause between jumps when there is no flaw present. To limit this error, a possible solution is for the rater to watch the video at normal speed and mark a flaw present only if a pause is noticeable. The difference between a large and a small pause could then be determined by timing the pause frame by frame. Pauses longer than half a second could constitute a large flaw (2), while shorter pauses would constitute a small flaw (1). The method of scoring for each flaw needs to be clearly described, and common errors in methodology outlined, especially with new scoring criteria.
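To make the suggested rule concrete, the following minimal sketch converts a counted number of frames into a pause duration and applies the half-second threshold proposed above. The function name, the 0.5 s cutoff, and the frame rate in the example are our illustrative assumptions, not part of the published protocol:

```python
def score_pause(pause_frames: int, fps: float, noticeable: bool) -> int:
    """Score the 'pause between jumps' flaw from frame-by-frame review.

    Illustrative rule only (not from Fort-Vanmeerhaeghe et al., 2017):
    a flaw is marked only if the pause was noticeable at normal playback
    speed, and a pause of >= 0.5 s counts as a large flaw.
    """
    if not noticeable:
        return 0  # reactive, reflex jumps: no flaw
    pause_seconds = pause_frames / fps  # duration recovered from frame count
    return 2 if pause_seconds >= 0.5 else 1  # large (2) vs. small (1) pause


# Example: a 14-frame pause in 30 fps video is ~0.47 s, a small flaw (1).
print(score_pause(14, 30.0, noticeable=True))
```

Tying the severity judgment to the video's actual frame rate, rather than to perceived duration during slowed playback, is precisely the kind of detail a replicable protocol would need to specify.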
The flaw “excessive contact noise at landing” seems to combine two separate criteria in the modified TJA compared with the dichotomously scored TJA. Fort-Vanmeerhaeghe et al. (2017) provided the following criteria: (0) subtle noise at landing (landing on the balls of their feet), (1) audible noise at landing (heels almost touch the ground at landing), and (2) loud and pronounced noise at landing (contact of the entire foot and heel on the ground between jumps). The text in parentheses was not included in other research on the TJA (Myer et al., 2011). No explanation for this addition is present in the study, and the ambiguity of these criteria will limit reproducibility. If an athlete lands softly and the entire foot and heel touch the ground between jumps, this may instead relate to the “pause between jumps” flaw. Would this still be scored as excessive contact noise, and as a severe flaw, even when the noise is not excessive? From the study, it is unclear what constitutes excessive contact noise, whether noise was considered in the scoring, whether the raters calibrated volume to a certain level during video analysis, and whether foot-landing strategy should impact scoring. This clarity is needed for reliability, clinical utility, and validity.
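To show how this bundling can break down, here is a minimal sketch of an ordinal scorer under our reading of the criteria; the names, the 0-2 encodings, and the conflict rule are our own illustrative assumptions, not the authors' method:

```python
from enum import Enum


class FootContact(Enum):
    """Foot-contact descriptors from the parenthetical text, encoded 0-2."""
    BALLS_OF_FEET = 0        # landing on the balls of the feet
    HEELS_ALMOST_TOUCH = 1   # heels almost touch the ground at landing
    FULL_FOOT_AND_HEEL = 2   # entire foot and heel contact the ground


def score_contact_noise(noise_level: int, contact: FootContact) -> int:
    """Score 'excessive contact noise at landing' on the 0-2 scale.

    Because the modified criteria pair a noise descriptor with a
    foot-contact descriptor, the two cues can disagree, and the
    protocol does not say which one governs.
    """
    if noise_level != contact.value:
        raise ValueError(
            f"criteria conflict: noise level {noise_level} vs. "
            f"foot contact {contact.name}; no defined score"
        )
    return noise_level


# The scenario raised above: a quiet landing (noise 0) with full
# foot-and-heel contact (descriptor 2) has no unambiguous score.
try:
    score_contact_noise(0, FootContact.FULL_FOOT_AND_HEEL)
except ValueError as err:
    print(err)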
In closing, our team has found the TJA to be clinically valuable in practice. We suggest that future work detail the training methodology required for adequate rater reliability with the modified TJA (Dudley et al., 2013), as well as an improved method for quantifying excessive contact noise.
References
- Dudley L.A., Smith C.A., Olson B.K., Chimera N.J., Schmitz B., Warren M. (2013) Interrater and intrarater reliability of the tuck jump assessment by health professionals of varied educational backgrounds. Journal of Sports Medicine 2013, 483503.
- Fort-Vanmeerhaeghe A., Montalvo A.M., Lloyd R.S., Read P., Myer G.D. (2017) Intra- and inter-rater reliability of the modified tuck jump assessment. Journal of Sports Science and Medicine 16, 117-124.
- Herrington L., Myer G.D., Munro A. (2013) Intra and inter-tester reliability of the tuck jump assessment. Physical Therapy in Sport 14, 152-155.
- Myer G.D., Paterno M.V., Ford K.R., Quatman C.E., Hewett T.E. (2006) Rehabilitation after anterior cruciate ligament reconstruction: criteria-based progression through the return-to-sport phase. Journal of Orthopaedic and Sports Physical Therapy 36, 385-402.
- Myer G.D., Brent J.L., Ford K.R., Hewett T.E. (2011) Real-time assessment and neuromuscular training feedback techniques to prevent ACL injury in female athletes. Strength and Conditioning Journal 33, 21-35.
