Abstract
Background:
Ultrasound imaging is important in many fields such as medicine, sports, and health sciences to assess parts of muscle structure (e.g., muscle thickness [MT]) or composition (subcutaneous tissue [SubT]).
Objective:
The aim of the present study was to investigate the intra- and inter-rater reliability of MT and SubT measurements of the hip abductor muscles gluteus medius (GM) and tensor fascia latae (TFL).
Design:
Cross-sectional study.
Method:
Twenty young adults participated in the study. Intra-rater reliability was established by having the same rater measure the same two images twice, while inter-rater reliability was assessed between two raters who each measured two images for each muscle. For both intra- and inter-rater reliability, the reliability of the TFL and GM outcomes (MT and SubT) was determined by the intraclass correlation coefficient (ICC), coefficient of variation (CV), standard error of the measurement (SEM), and Bland-Altman plots.
Results:
For intra-rater reliability, variables of both muscles showed an excellent ICC (≥0.90), low CV and SEM, and bias near zero. Inter-rater reliability also showed an excellent ICC for both variables and muscles (≥0.81), with low CV, SEM, and bias.
Conclusion:
These results provide strong evidence that MT and SubT can be measured reliably from the GM and TFL. The present study gives health care professionals and researchers increased confidence in using 2D ultrasound to assess the hip abductor muscles reliably.
Keywords: Ultrasound imaging, Hip abductor muscles, Gluteus medius, Tensor fascia latae, Muscle structure
1. Introduction
Understanding and assessing skeletal muscle structure is important in many fields such as medicine, sports, and health sciences. 2D ultrasound imaging is consistently used as a tool to assess muscle structure (e.g., muscle thickness [MT]) or composition (subcutaneous tissue [SubT]) (Whittaker and Stokes 2011; Franchi et al., 2018; Škarabot et al., 2021). Assessing muscle structure is important because of its influence on muscle performance, such as increases in muscle force (Folland and Williams 2007) and velocity (Stasinaki et al., 2019), and a reliable measure of muscle structure and composition is therefore critical. Although the reliability (intra- and inter-rater) of 2D ultrasound measurements has been reported for many muscles [e.g., biceps femoris (Franchi et al., 2018), rectus femoris and vastus lateralis (Rock et al., 2021), medial gastrocnemius (Legerlotz et al., 2010)], the reliability of 2D ultrasound measurements for the gluteus medius (GM) and tensor fascia latae (TFL) muscles is largely unknown.
Hip abductor muscle structure has been shown to be related to functional tasks such as stepping (Addison et al., 2017; Inacio et al., 2019) and balance and mobility tests (DeJong et al., 2020; Mirshams Shahshahani and Ashton-Miller 2020; Lanza et al., 2021b), and to contribute to sports performance (Van Cant et al., 2017; Miller et al., 2020). Assessment of the hip abductor muscles typically relies on advanced imaging techniques, including magnetic resonance imaging (Miller et al., 2020) or computerized tomography scans (Addison et al., 2017), which are costly and impractical for most clinical settings. The limited use of ultrasound compared with other techniques to assess the GM and TFL may be due to the difficulty of assessing these muscles (e.g., participant exposure and position, and/or quality of imaging). Placement and positioning of the ultrasound probe for the GM and TFL can also be challenging because the literature lacks consistent recommendations for probe placement. Therefore, there is a need for a reliable, standardized technique to acquire ultrasound images from the GM and TFL.
MT and SubT are two of the ultrasound measurements most often used clinically and in research. MT is regularly measured to identify differences in muscle size between individuals or changes after a training intervention (Pareja-Blanco et al., 2020; Lacerda et al., 2021), while SubT is often used to correct the signal from surface electromyography (Balshaw et al., 2018; Lanza et al., 2018; Škarabot et al., 2021). It is fundamental for researchers to understand whether a reliable measure of MT and SubT can be extracted from the GM and TFL. Therefore, the purpose of this study was to investigate the intra- and inter-rater reliability of MT and SubT measurements of the hip abductor muscles (GM and TFL) in young adults.
2. Methods
2.1. Participants
A convenience sample of 20 participants (18–35 years) with no history of neurological or muscular disorder, or lower extremity surgery or serious injury volunteered to participate in the study. All participants provided written informed consent before their participation. This study was approved by the University’s Institutional Review Board.
2.2. Measures and procedures
Ultrasound images of the GM and TFL muscles of the dominant limb (defined by the question: which foot would you use to kick a ball?) were collected using 2D B-mode ultrasonography (Whale Sigma P5, Whale Imaging Inc., Waltham, MA, USA) with a 5–12 MHz, 38-mm linear array probe by two evaluators (MBL and KR, with 8 years and 3 years of ultrasonography experience, respectively). Surface electromyography was used during ultrasound image acquisition to confirm that the muscle was in a relaxed state (iWorx, Dover, NH, USA). For the TFL, participants were positioned supine with the knee at 0° (neutral) and the hip at 0° of abduction and rotation; for the GM, participants were positioned side-lying. The probe placement for each muscle followed the SENIAM guide for surface electromyography position (Hermens et al., 2000), consistent with previous papers that measured SubT (or muscle-electrode distance) using the same placement for electromyography and ultrasound recording (Balshaw et al., 2018). The probe was placed parallel to the long axis of the muscle and perpendicular to the skin surface at 50% of the distance from the iliac crest to the trochanter for the GM, and in the proximal 1/6 of the line from the anterior superior iliac spine to the lateral femoral condyle for the TFL. During image acquisition, the skin was marked and probe pressure was minimized. The probe was fully removed from the skin between recordings. Consistent with our previous work, three images were taken for each muscle by each evaluator. Off-line analysis of MT and SubT was completed using the best two images that allowed for the identification of the superficial and deep aponeuroses (e.g., Fig. 1). For intra-rater reliability, one evaluator (MBL) took one measurement for each image for MT and SubT of the GM and TFL. For inter-rater reliability, each evaluator (MBL and KR) analyzed their own acquired images once for MT and SubT of the GM and TFL.
Because each evaluator measured two images, an average value from the two images was created for each variable and muscle for each evaluator. The average value from each evaluator was used to calculate inter-rater reliability.
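The averaging step described above can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the dictionary layout, the function name `average_per_evaluator`, and the values are all hypothetical.

```python
from statistics import mean


def average_per_evaluator(measurements):
    """Average the per-image values for each (evaluator, muscle, variable) key.

    `measurements` maps a key tuple to the list of values (in cm) measured on
    the two selected images. The layout is an assumption for illustration.
    """
    return {key: mean(values) for key, values in measurements.items()}


# Hypothetical per-image measurements in cm (not data from the study).
example_images = {
    ("rater_1", "GM", "MT"): [2.7, 2.9],
    ("rater_2", "GM", "MT"): [2.6, 3.0],
    ("rater_1", "TFL", "SubT"): [1.2, 1.4],
    ("rater_2", "TFL", "SubT"): [1.3, 1.5],
}

averaged = average_per_evaluator(example_images)
for key, value in averaged.items():
    print(key, round(value, 2))
```

The resulting one-value-per-evaluator table is the input that an inter-rater statistic would then compare across raters.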
Fig. 1.

Ultrasonography images of (A) tensor fascia latae (TFL), and (B) gluteus medius (GM) muscles. Top arrows on each image indicate the identification of subcutaneous tissue measure, while the bottom arrow indicates the identification of muscle thickness measure.
2.3. Data analysis
The software Tracker, version 5.1.5 (www.physlets.org/tracker/), was used for image analysis. MT was measured as the distance between the superficial and deep aponeuroses at 50% of the image length. SubT was measured as the distance from the surface of the skin to the muscle fascia at 50% of the image length (Fig. 1A and B). Importantly, the SubT measurement also included the skin, the superficial aponeurosis, and any other tissue between the probe and the target muscle (i.e., fascia and connective tissue), as previously performed (Balshaw et al., 2018; Lanza et al., 2018; Škarabot et al., 2021).
2.4. Statistical analysis
For the intra- and inter-rater reliability, the relative reliability of the GM and TFL variables was determined by the intraclass correlation coefficient (ICC(2,1)) and the coefficient of variation [CV%, (SD/mean) × 100%] (Weir 2005). The ICC values were interpreted as weak (<0.4), moderate (0.4–0.59), good (0.6–0.74), or excellent (0.75–1.0) (Cicchetti 1994). Additionally, the percent standard error of measurement (SEM%) was calculated using the following formula: SEM = SD × √(1 − ICC), where SD is the standard deviation and ICC is the intraclass correlation coefficient (Lanza et al., 2021a). The agreement of the variables (e.g., MT image 1 vs. MT image 2; or MT evaluator 1 vs. MT evaluator 2) was assessed through Bland-Altman plots (Bland and Altman 1986), and a linear regression analysis was performed to verify the presence of proportional bias (Ludbrook 2010). Statistical analyses were performed using SPSS version 26.0 (IBM Inc., Chicago, IL, USA). Data distribution was tested using the one-sample Kolmogorov-Smirnov test, and CV is presented as mean ± SE. The level of significance was set at α < 0.05.
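For readers who want to reproduce these statistics outside SPSS, the sketch below shows one way to compute them in plain Python. It rests on several assumptions that the text does not confirm: ICC(2,1) is taken as the two-way random-effects, absolute-agreement, single-measure form computed from ANOVA mean squares; SEM is taken as the conventional SD × √(1 − ICC) with SD pooled over all scores and then expressed as a percentage of the grand mean; and CV is computed per participant and then summarized as mean ± SE. All function names and data values are hypothetical.

```python
import math
from statistics import mean, stdev


def icc_2_1(scores):
    """ICC(2,1) from a participants-by-raters table (list of equal-length rows)."""
    n, k = len(scores), len(scores[0])
    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)               # between-participant mean square
    msc = ss_cols / (k - 1)               # between-rater mean square
    mse = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def sem_from_icc(sd, icc):
    """Standard error of measurement, SD * sqrt(1 - ICC)."""
    return sd * math.sqrt(1 - icc)


def cv_percent(scores):
    """Per-participant CV% = (SD / mean) * 100, returned as (mean, SE)."""
    cvs = [stdev(row) / mean(row) * 100 for row in scores]
    return mean(cvs), stdev(cvs) / math.sqrt(len(cvs))


def interpret_icc(icc):
    """Thresholds quoted in the text (Cicchetti 1994)."""
    if icc < 0.4:
        return "weak"
    if icc < 0.6:
        return "moderate"
    if icc < 0.75:
        return "good"
    return "excellent"


# Hypothetical MT values in cm (rows: participants, columns: two raters).
mt = [[2.8, 2.9], [3.4, 3.3], [2.1, 2.3], [2.6, 2.6], [3.0, 3.2]]
icc = icc_2_1(mt)
all_values = [v for row in mt for v in row]
sem = sem_from_icc(stdev(all_values), icc)
sem_pct = sem / mean(all_values) * 100  # assumption: SEM% is relative to the grand mean
print(interpret_icc(icc), round(sem_pct, 1), cv_percent(mt))
```

Other ICC forms (for example, consistency rather than absolute agreement) use different denominators, so results should be compared only against software configured for the same model.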
3. Results
Participant characteristics and average values for MT and SubT can be seen in Table 1.
Table 1.
Participant demographics, muscle thickness and subcutaneous tissue as mean ± standard deviation.
| Characteristic | Value |
|---|---|
| n | 20 |
| Females | 15 |
| Age (years) | 28.7 ± 4.9 |
| Height (m) | 1.7 ± 0.1 |
| Weight (kg) | 69.7 ± 16.9 |
| BMI (kg/m²) | 23.6 ± 4.0 |
| Muscle Thickness (cm): Gluteus Medius | 2.8 ± 0.6 |
| Muscle Thickness (cm): Tensor Fascia Latae | 2.4 ± 0.8 |
| Subcutaneous Tissue (cm): Gluteus Medius | 3.5 ± 1.1 |
| Subcutaneous Tissue (cm): Tensor Fascia Latae | 1.3 ± 0.5 |
3.1. Intra-rater reliability
GM and TFL MT measurements demonstrated excellent intra-rater reliability, with ICCs of 0.90 and 0.98 (respectively) and CVs of 4.5 ± 1.2% and 0.3 ± 0.07% (respectively). Similarly, GM and TFL SubT demonstrated excellent intra-rater reliability, with ICCs of 0.98 and 0.96 (respectively) and CVs of 2.0 ± 0.4% and 0.5 ± 0.2% (respectively), with SEM ranging from 4.6% to 9.3% (Table 2). Bland-Altman plots showed bias near 0 for MT (GM, d = 0.035 and TFL, d = 0.008) and SubT (GM, d = 0.022 and TFL, d = −0.009) (Fig. 2), and no proportional bias was identified (P ≥ 0.314); hence, no difference was noted when the same rater performed the analysis.
Table 2.
Inter-rater and intra-rater intraclass correlation coefficient (ICC) with 95% confidence interval (CI), coefficient of variation (CV), and SEM (%) for muscle thickness and subcutaneous tissue.
| Variable | Reliability | Muscle | ICC | 95% CI | CV (%) | SEM (%) |
|---|---|---|---|---|---|---|
| Muscle Thickness | Inter-rater | GM | 0.85 | 0.62–0.94 | 9.6 ± 8.4 | 9.2 |
| Muscle Thickness | Inter-rater | TFL | 0.96 | 0.89–0.98 | 6.1 ± 5.0 | 6.4 |
| Muscle Thickness | Intra-rater | GM | 0.90 | 0.72–0.96 | 4.5 ± 1.2 | 9.0 |
| Muscle Thickness | Intra-rater | TFL | 0.98 | 0.95–0.99 | 0.3 ± 0.07 | 4.9 |
| Subcutaneous Tissue | Inter-rater | GM | 0.82 | 0.53–0.92 | 11.7 ± 18.4 | 13.0 |
| Subcutaneous Tissue | Inter-rater | TFL | 0.81 | 0.51–0.92 | 14.2 ± 10.8 | 16.1 |
| Subcutaneous Tissue | Intra-rater | GM | 0.98 | 0.93–0.99 | 2.0 ± 0.4 | 4.6 |
| Subcutaneous Tissue | Intra-rater | TFL | 0.96 | 0.90–0.98 | 0.5 ± 0.2 | 9.3 |
Fig. 2.

Bland-Altman plots for the intra-rater assessment of gluteus medius (GM) muscle thickness (A) and subcutaneous tissue (C), and tensor fascia latae (TFL) muscle thickness (B) and subcutaneous tissue (D), with limits of agreement (dotted line), from −1.96 SD (standard deviation) to +1.96 SD.
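The quantities shown in a Bland-Altman plot can be computed with a few lines of code. The sketch below is illustrative only and assumes the standard definitions: bias is the mean of the paired differences, limits of agreement are bias ± 1.96 SD of the differences, and proportional bias is checked by regressing the differences on the pairwise means. The function name and data are hypothetical, and only the slope and its t statistic are returned because a P value needs a t-distribution routine that the Python standard library does not provide.

```python
import math
from statistics import mean, stdev


def bland_altman(first, second):
    """Bias, 95% limits of agreement, and proportional-bias regression."""
    diffs = [a - b for a, b in zip(first, second)]
    means = [(a + b) / 2 for a, b in zip(first, second)]
    bias = mean(diffs)
    sd = stdev(diffs)
    lower, upper = bias - 1.96 * sd, bias + 1.96 * sd

    # Ordinary least squares of the differences on the pairwise means.
    mx = mean(means)
    sxx = sum((m - mx) ** 2 for m in means)
    slope = sum((m - mx) * (d - bias) for m, d in zip(means, diffs)) / sxx
    intercept = bias - slope * mx
    rss = sum((d - (intercept + slope * m)) ** 2 for m, d in zip(means, diffs))
    se_slope = math.sqrt(rss / (len(diffs) - 2) / sxx)
    # A slope whose t statistic is small relative to the t distribution with
    # n - 2 degrees of freedom suggests no proportional bias.
    t_stat = slope / se_slope if se_slope > 0 else math.nan
    return {"bias": bias, "loa": (lower, upper), "slope": slope, "t": t_stat}


# Hypothetical paired measurements in cm (image 1 vs. image 2).
image_1 = [2.8, 3.4, 2.1, 2.6, 3.0, 2.4]
image_2 = [2.9, 3.3, 2.2, 2.6, 3.1, 2.3]
result = bland_altman(image_1, image_2)
print(result)
```

With SciPy available, the t statistic could be converted to a two-sided P value using the t distribution with n − 2 degrees of freedom.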
3.2. Inter-rater reliability
MT reliability between evaluators was excellent for GM (ICC = 0.850, CV = 9.6 ± 8.4%) and TFL (ICC = 0.96, CV = 6.1 ± 5.0%). GM SubT presented excellent reliability (ICC = 0.815, CV = 9.7 ± 11.8%), as did TFL SubT (ICC = 0.81, CV = 14.2 ± 10.8%), and both muscles demonstrated a SEM ranging from 3.7% to 13.0% (Table 2). Similar to intra-rater reliability, Bland-Altman plots also presented excellent agreement, with a bias near 0 for MT (GM, d = 0.089 and TFL, d = −0.111) and SubT (GM, d = 0.125 and TFL, d = −0.191) (Fig. 3), and no proportional bias was identified (P ≥ 0.745); again, no difference was noted when different raters performed the analysis.
Fig. 3.

Bland-Altman plots for the inter-rater assessment of gluteus medius (GM) muscle thickness (A) and subcutaneous tissue (C), and tensor fascia latae (TFL) muscle thickness (B) and subcutaneous tissue (D), with limits of agreement (dotted line), from −1.96 SD (standard deviation) to +1.96 SD.
4. Discussion
The present study aimed to assess the intra- and inter-rater reliability of MT and SubT measurements of the hip abductor muscles (GM and TFL). We provide novel evidence of overall excellent intra- and inter-rater reliability of GM and TFL MT and SubT assessed by 2D ultrasound in young adults. Thus, the technique used here may be used reliably in future studies assessing MT and SubT of the GM and TFL muscles.
We provide strong evidence of excellent intra-rater ICC values, and low CVs and SEM, for the variables in both muscles investigated here. Only one previous study has reported intra-rater reliability of MT for the GM (but not SubT), and it also showed an excellent ICC (0.96 to 0.98) and a small SEM (Whittaker and Emery 2014). Our results are also similar to those of other studies that investigated intra-rater reliability in other lower extremity muscles, such as the rectus femoris (Rock et al., 2021; Takahashi et al., 2021) and vastus lateralis (Rock et al., 2021) in young adults, and the medial gastrocnemius in pediatric populations (Legerlotz et al., 2010). Importantly, although our results were similar, previous publications examined different muscles and populations. The rectus femoris is a relatively easy muscle to assess with ultrasound and readily provides a high-quality image, allowing for high reliability even with little user experience. In our experience, the GM and TFL are more difficult to assess than other muscles because of participant positioning and probe placement. This holds even for experienced users, and capturing high-quality images may require additional time compared with other muscles (e.g., rectus femoris and vastus lateralis). Despite these difficulties, intra-rater evaluation presented excellent absolute agreement (Bland-Altman plots) between measurements (variables in both muscles), with a bias near 0 in all situations, reinforcing the agreement between the analyses performed by the same rater. Our findings provide confidence that intra-rater measurements of MT and SubT of the GM and TFL are reliable.
To our knowledge, this is the first time that inter-rater reliability has been reported for both MT and SubT in both the GM and TFL muscles. We demonstrated excellent inter-rater reliability for both variables in both muscles. Although the CVs and SEM were also small, they were higher than those found for intra-rater reliability. These differences might be due to differences in probe orientation. Small changes in probe orientation can cause variation in image quality and, thus, potential error in measurements (Klimstra et al., 2007; Bénard et al., 2009). It is probable that minor differences in probe orientation occurred between the two raters (MBL and KR) and increased the error between raters. Nevertheless, the Bland-Altman assessment of inter-rater measures showed a bias near 0 for both muscles and variables, reinforcing our confidence in the consistency of the measure between the two raters in the present study despite these minor differences. Furthermore, the information regarding SubT might give other researchers more confidence to use this measure in the muscles analyzed here as a way to account for the influence of subcutaneous tissue (e.g., fat) on muscle activation measured with surface electromyography (Nordander et al., 2003).
In conclusion, we demonstrated an overall excellent intra- and inter-rater reliability of the muscle thickness and subcutaneous tissue measurements from the GM and TFL assessed by 2D ultrasound. Using the techniques employed in the present study, it is possible (but not certain) that measures performed on different days or between different evaluators will be very similar. Therefore, health care professionals and researchers can more confidently measure MT and SubT from GM and TFL to compare between participants or the effects of interventions on these properties.
4.1. Limitations and relevance
Although we demonstrated excellent reliability, it is important to replicate our procedures in different populations. Our evaluation was conducted in young adults; hence, extrapolation to other groups should be made carefully. For instance, future studies should assess the same technique in other populations, such as older adults, to ensure that the same reliability is achieved. Despite these limitations, the results of the present study provide confidence for health care professionals and researchers to use 2D ultrasound to assess the hip abductor muscles.
Acknowledgments
We would like to acknowledge the participants for their valuable time.
Footnotes
The work was performed at the Department of Physical Therapy and Rehabilitation Science, University of Maryland School of Medicine, 100 Penn Street, Baltimore, MD 21201–1082, United States.
Ethical statement
All participants provided written informed consent before their participation. This study was approved by the University’s Institutional Review Board.
Declaration of competing interest
The authors declare that no conflicts of interest exist.
References
- Addison O, Inacio M, Bair W-N, et al., 2017. Role of hip abductor muscle composition and torque in protective stepping for lateral balance recovery in older adults. Arch. Phys. Med. Rehabil. 98, 1223–1228.
- Balshaw TG, Massey GJ, Maden-Wilkinson TM, et al., 2018. Neural adaptations after 4 years vs 12 weeks of resistance training vs untrained. Scand. J. Med. Sci. Sports 29 (3), 348–359.
- Bénard MR, Becher JG, Harlaar J, et al., 2009. Anatomical information is needed in ultrasound imaging of muscle to avoid potentially substantial errors in measurement of muscle geometry. Muscle Nerve 39, 652–665.
- Bland MJ, Altman DG, 1986. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1 (8476), 307–310.
- Cicchetti DV, 1994. Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychol. Assess. 6, 284–290.
- DeJong AF, Mangum LC, Hertel J, 2020. Ultrasound imaging of the gluteal muscles during the Y-balance test in individuals with or without chronic ankle instability. J. Athl. Train. 55, 49–57.
- Folland JP, Williams AG, 2007. The adaptations to strength training. Sports Med. 37, 145–168.
- Franchi MV, Longo S, Mallinson J, et al., 2018. Muscle thickness correlates to muscle cross-sectional area in the assessment of strength training-induced hypertrophy. Scand. J. Med. Sci. Sports 28, 846–853.
- Hermens HJ, Freriks B, Disselhorst-Klug C, et al., 2000. Development of recommendations for SEMG sensors and sensor placement procedures. J. Electromyogr. Kinesiol. 10, 361–374.
- Inacio M, Creath R, Rogers MW, 2019. Effects of aging on hip abductor-adductor neuromuscular and mechanical performance during the weight transfer phase of lateral protective stepping. J. Biomech. 82, 244–250.
- Klimstra M, Dowling J, Durkin JL, et al., 2007. The effect of ultrasound probe orientation on muscle architecture measurement. J. Electromyogr. Kinesiol. 17, 504–514.
- Lacerda LT, Marra-Lopes RO, Lanza MB, et al., 2021. Resistance training with different repetition duration to failure: effect on hypertrophy, strength and muscle activation. PeerJ 9, e10909.
- Lanza MB, Balshaw TG, Massey GJ, et al., 2018. Does normalization of voluntary EMG amplitude to MMAX account for the influence of electrode location and adiposity? Scand. J. Med. Sci. Sports 28, 2558–2566.
- Lanza MB, Kang JH, Karl H, et al., 2021a. Hip abductor power and velocity: reliability and association with physical function. J. Strength Condit. Res. Publish Ahead of Print.
- Lanza MB, Rock K, Marchese V, et al., 2021b. Hip abductor and adductor rate of torque development and muscle activation, but not muscle size, are associated with functional performance. Front. Physiol. 12, 744153.
- Legerlotz K, Smith HK, Hing WA, 2010. Variation and reliability of ultrasonographic quantification of the architecture of the medial gastrocnemius muscle in young children. Clin. Physiol. Funct. Imag. 30, 198–205.
- Ludbrook J, 2010. Confidence in Altman-Bland plots: a critical review of the method of differences. Clin. Exp. Pharmacol. Physiol. 37, 143–149.
- Miller R, Balshaw TG, Massey GJ, et al., 2020. The Muscle Morphology of Elite Sprint Running. Medicine & Science in Sports & Exercise. Publish Ah.
- Mirshams Shahshahani P, Ashton-Miller JA, 2020. On the importance of the hip abductors during a clinical one legged balance test: a theoretical study. In: Masani K (Ed.), PLoS One 15, e0242454.
- Nordander C, Willner J, Hansson G-Å, et al., 2003. Influence of the subcutaneous fat layer, as measured by ultrasound, skinfold calipers and BMI, on the EMG amplitude. Eur. J. Appl. Physiol. 89, 514–519.
- Pareja-Blanco F, Alcazar J, Cornejo-Daza PJ, et al., 2020. Effects of velocity loss in the bench press exercise on strength gains, neuromuscular adaptations, and muscle hypertrophy. Scand. J. Med. Sci. Sports 30, 2154–2166.
- Rock K, Nelson C, Addison O, et al., 2021. Assessing the reliability of handheld dynamometry and ultrasonography to measure quadriceps strength and muscle thickness in children, adolescents, and young adults. Phys. Occup. Ther. Pediatr. 1–14.
- Škarabot J, Balshaw TG, Maeo S, et al., 2021. Neural adaptations to long-term resistance training: evidence for the confounding effect of muscle size on the interpretation of surface electromyography. J. Appl. Physiol. japplphysiol.00094.2021.
- Stasinaki A-N, Zaras N, Methenitis S, et al., 2019. Rate of force development and muscle architecture after fast and slow velocity eccentric training. Sports 7, 41.
- Takahashi Y, Fujino Y, Miura K, et al., 2021. Intra- and inter-rater reliability of rectus femoris muscle thickness measured using ultrasonography in healthy individuals. Ultrasound J. 13, 21.
- Van Cant J, Pitance L, Feipel V, 2017. Hip abductor, trunk extensor and ankle plantar flexor endurance in females with and without patellofemoral pain. J. Back Musculoskelet. Rehabil. 30, 299–307.
- Weir JP, 2005. Quantifying test-retest reliability using the intraclass correlation coefficient and the SEM. J. Strength Condit. Res. 19, 231–240.
- Whittaker JL, Emery CA, 2014. Sonographic measures of the gluteus medius, gluteus minimus, and vastus medialis muscles. J. Orthop. Sports Phys. Ther. 44, 627–632.
- Whittaker JL, Stokes M, 2011. Ultrasound imaging and muscle function. J. Orthop. Sports Phys. Ther. 41, 572–580.
