Interrater reliability of the modified prone instability test for lumbar segmental instability in individuals with mechanical low back pain

Ellen R Larkin; Darren Q Calley; John H Hollman

doi:10.1080/10669817.2024.2352934

2024 May 16;32(5):540–547. doi: 10.1080/10669817.2024.2352934

PMCID: PMC11421138 PMID: 38753496

ABSTRACT

Objective

The purpose of this study was to establish the interrater reliability of measures obtained with a novel Modified Prone Instability Test (mPIT), which, like the original Prone Instability Test (PIT), is proposed to identify lumbar segmental instability. The mPIT has clinical feasibility advantages over the PIT, but its psychometric properties are yet to be determined.

Design

Repeated measures (test-retest) design, methods study

Methods

The mPIT was administered by two blinded testers: an orthopedic physical therapy resident with <1 year of experience and a board-certified orthopedic specialist physical therapist with >25 years’ experience. Procedures were administered at an outpatient physical therapy clinic of a tertiary medical center. Participants included 50 adults (≥18 years old) with mechanical low back pain and no radicular (below the knee) symptoms (mean age 50.7 years, 66% female, 76% reported previous episodes of low back pain). Interrater reliability was measured via Fleiss’ kappa coefficient.

Results

Assessments of the mPIT had moderate interrater agreement (κ = .579 [95% CI = .302 to .856], p < .001).

Conclusion

Measures obtained using the mPIT demonstrated moderate interrater reliability between a new graduate and an experienced clinician, which aligns with several studies examining interrater reliability of the original PIT. Further study examining comparative validation of the mPIT with other lumbar instability measures is warranted.

KEYWORDS: Low back pain, instability, reproducibility of results, reliability, prone instability test

Introduction

Lumbar segmental instability (LSI), characterized by excessive motion at a spinal segment, is a common purported cause of low back pain. LSI is influenced by multiple anatomic, movement coordination, metabolic, genetic, and degenerative factors [1,2]. Increased segmental motion has been reported in individuals with disc and facet joint degeneration, annular tears, and traction spurs [3–5]. Degenerative lumbar spine changes are thought to promote a larger neutral zone, less available spinal stiffness, and altered neuromuscular control in mid-ranges of spinal motion [6]. Multiple subjective and objective clinical examination findings are utilized to identify LSI [7,8].

The Prone Instability Test (PIT) has been described as a clinical examination tool to help identify lumbar segmental instability in patients with mechanical low back pain (LBP). Its theoretical basis originates with McGill, who described the mechanism of the test (a lumbar posterior-anterior [PA] manual pressure) as shearing of a lumbar segment anteriorly, followed by patient-generated resistance of that shear force via spinal stiffness achieved by activation of the multifidi, lumbopelvic, and hip extensor muscles [9–11]. Though research is still ongoing, studies have suggested that the PIT is one of the four most instrumental findings in determining the benefit of a lumbar stabilization program for the treatment of individuals with LBP [12,13]. While several studies have reported that assessments of segmental instability via the PIT have moderate to high interrater reliability, a notable exception in 2011 found low interrater reliability [8,14–17]. Ravenna and colleagues posited that imprecision in the starting position and execution of the PIT contributed to the lower reliability; examples include variable hip flexion and variable amounts of supported trunk in the setup (owing to patient proportions and the height of the available plinth) and variable hip extension in the execution [16]. In our view, there are also feasibility challenges related to the starting position of the PIT. Namely, transitioning to the starting position of prone with legs resting on the floor can be difficult for patients with limited mobility or elevated pain and uses valuable clinical time [18]. Limitations in feasibility may prompt the clinician either to forgo the PIT or to compromise its reliability and validity with inconsistent execution.

We have sought to solve both the feasibility and standardization challenges of the PIT with a modification of the PIT’s starting position (mPIT; see Figure 1). In the traditional PIT, the patient lies prone with trunk on the plinth and hips flexed so their feet are resting on the floor, whereas in the mPIT the patient lies prone with the entire body supported by the plinth, hips neutral. Both tests then proceed through two phases, with the multifidi and other lumbopelvic musculature first relaxed and then engaged via leg raising. If the patient reports decreased or resolved symptoms in Phase 2 as compared to Phase 1, the test is deemed positive; it is suspected that an anterior shearing force, termed anterior segmental instability, is a mechanism contributing to the patient’s pain [18]. The modified starting position provides improved positional consistency and standardization (rather than variable hip flexion dependent on patient proportions and plinth height) and decreases the variability in the leg raise that activates the multifidi, lumbopelvic, and hip extensor muscles. According to Sung et al. [11], the starting position of the PIT may also produce more stiffness in the passive structures of the spine than the prone position, which may lead to false negatives by virtue of preventing pain provocation in Phase 1 of the test. Lastly, the prone starting position entirely on the plinth is more efficient and feasible for both patient and clinician; it saves time and energy for patients with mobility challenges and elevated pain [18].

Figure 1. Phases of the modified and original prone instability tests. In the starting position, the patient lies prone in the modified test (a) and prone with feet on the floor in the original test (b). In phase 1, the examiner provides central posterior-to-anterior mobilizations to lumbar spine segments while the patient lies in the relaxed, comfortable position with lumbopelvic and deep segmental stabilizer muscles relaxed in both the modified (c) and original (d) tests. The examiner states, ‘I am going to put pressure on your low back. Let me know if this maneuver produces your familiar symptoms.’ Then, in phase 2, the examiner provides central posterior-to-anterior mobilizations to lumbar spine segments while the patient extends the hips, actively recruiting the deep spine stabilizers, erector spinae and hip extensors, in both the modified (e) and original (f) tests. Here, the examiner states, ‘Keeping both your legs straight, lift your feet off the table/floor. Now when I push on your back, are your symptoms better, worse, or the same?’ Symptom improvement in phase 2 represents a positive test result.

To our knowledge, there have been no modified versions of the PIT reported in the literature; this is the first study to examine a modified PIT (mPIT). Therefore, we began with examining its psychometric properties. The purpose of this study was to establish the interrater reliability of assessments of segmental instability obtained using the novel mPIT in individuals with mechanical low back pain.

Methods

Participants

Participants (n = 50) were adults (≥18 years old) with current LBP. They were recruited from an outpatient physical therapy clinic and the Physical Medicine and Rehabilitation (PM&R) department of a tertiary care center. Power analysis determined that 50 participants were required for 80% power to detect a kappa coefficient of 0.40 or greater at α = .05. To capture a population similar to that of studies examining assessments of segmental instability via the PIT, inclusion and exclusion criteria were modeled on previous studies of the interrater reliability of the PIT [8,14–17]. Potential participants were excluded if they had any red flags (sudden, progressive lower-extremity weakness or loss of reflexes, saddle anesthesia, bowel/bladder changes), symptoms radiating below the knee, known history of spinal surgery or fracture, spinal deformity, systemic inflammatory condition, neurologic disease, or other serious medical conditions that indicate a non-musculoskeletal cause of LBP. They were also excluded if they reported current pregnancy or breastfeeding, were non-English readers, or had LBP that was covered by workers’ compensation. The participants signed an informed consent form regarding data collection, participant rights, and reporting of research findings prior to data collection and testing. The study was approved by the appropriate scientific review committees and Institutional Review Board at Mayo Clinic (IRB #21–006546).

Examiners

The examiners were two physical therapists, one orthopedic physical therapy resident with <1 year clinical experience and one American Board of Physical Therapy Specialties-certified orthopedic specialist with 25 years’ experience. To clearly define and standardize the mPIT procedure, the examiners underwent one 90-minute training session in which the research and procedure protocol were reviewed. Examiners practiced the mPIT on a physical therapy technician who provided feedback on the comparability of the manual pressure applied and location of palpation. Initial discrepancies in procedural execution were addressed by repeat testing and feedback to each examiner independently. The examiners also utilized standardized phrasing to explain the procedure to each participant (see Figure 1 caption for phrasing). For continuity throughout the data collection phase, the examiners also verbally reviewed the mPIT procedures prior to participant arrival on each data collection day.

Procedures

Descriptive data collection

Participants completed questionnaires describing their demographics, physical activity level and health, and a general history of their LBP. They also completed a Modified Oswestry Disability Index (mODI) and an Optimal Screening for Prediction of Referral and Outcome – Yellow Flags (OSPRO-YF).

The mODI is a valid and reliable measure of LBP-related disability [19]. The degree of disability is categorized by the mODI as follows: 0–20% = minimal, 21–40% = moderate, 41–60% = severe, 61–80% = crippled, 81–100% = bed-bound or exaggerated symptoms [20].
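As an illustration only, the mODI bands listed above can be written as a simple lookup. The function name is hypothetical, and the handling of scores that fall between the listed bands (for example, 20.5%) is an assumption of this sketch rather than part of the mODI itself.

```python
def modi_category(score_percent: float) -> str:
    """Map an mODI percentage score to the disability band listed in the text.

    Assumption: a score between two listed bands (e.g. 20.5) is assigned
    to the higher band; the source only lists integer ranges.
    """
    if not 0 <= score_percent <= 100:
        raise ValueError("mODI score must be between 0 and 100 percent")
    if score_percent <= 20:
        return "minimal"
    if score_percent <= 40:
        return "moderate"
    if score_percent <= 60:
        return "severe"
    if score_percent <= 80:
        return "crippled"
    return "bed-bound or exaggerated symptoms"


if __name__ == "__main__":
    # Sample mean and sample maximum reported in Table 1
    print(modi_category(21.6))
    print(modi_category(58.0))
```

Under these assumptions, the sample mean of 21.6 falls in the moderate band and the sample maximum of 58.0 falls in the severe band.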

The OSPRO-YF was used to characterize psychological dimensions deemed relevant to LBP. The OSPRO-YF is a yellow flag assessment tool developed through the American Physical Therapy Association’s Academy of Orthopedic Physical Therapy to accurately estimate multiple individual psychological questionnaire scores (such as the Fear Avoidance Beliefs Questionnaire [FABQ] and Tampa Scale of Kinesiophobia [TSK-11]) in a concise manner. It also identifies the presence of yellow flags, defined as scores that fall in the top quartile for negative psychological questionnaires or the bottom quartile for positive psychological questionnaires [21]. Its 17-, 10-, and 7-item versions were validated in 2017; the 17-item version was used because it has the strongest psychometrics [22]. The OSPRO-YF was used in the descriptive data as a standardized, validated way to describe relevant psychological characteristics of the participant cohort.

Participants’ height and weight were also collected.

Procedure: starting position

For the starting position, each participant was instructed to lie prone on a plinth with arms down by their sides. They were allowed, according to their preference, to either turn their head to the side or put it in neutral cervical rotation using the opening in the plinth. The participant’s lumbar spine was exposed for palpation directly over skin. The examiner palpated the lower border of the ribs to locate the thoracolumbar junction and instructed the participant to report if manual pressure subsequently performed in Phase 1 provoked their familiar LBP symptoms (see Figure 1 caption for phrasing).

Procedure: phase 1

In Phase 1, the examiner applied a central PA manual pressure over each lumbar segment (starting at L1, stopping at L5), pausing at the first segment that the participant indicated provocation of their symptoms. Next, the examiner applied manual pressure over the subsequent lumbar segment and asked the participant if this segment was more provocative or less provocative than the original prior segment. If less provocative, the examiner palpated again over the original provocative segment and performed Phase 2 of the mPIT there. If the subsequent segment was more provocative than the original, Phase 2 was performed at that subsequent segment.

If the manual pressure did not provoke the participant’s familiar symptoms at any segment, the examiner repeated the process down the entire lumbar spine an additional time for confirmation. If the manual pressure again did not provoke familiar symptoms, the test was discontinued and marked negative, rather than continuing to Phase 2.

Procedure: phase 2

In Phase 2, the examiner kept their hands at the most provocative segment to ensure accuracy of the manual pressure location. The participant was instructed to keep both knees straight and lift both legs about 1 inch off the table. This activated the lumbopelvic musculature to increase spinal stiffness and resistance of the provocative PA manual pressure. The examiner then performed the same manual pressure to the provocative lumbar segment and asked the participant if their symptoms were improved, worsened, or the same. Participant report of improved symptoms was marked as a positive test. Participant report of worsened or same symptoms was marked as a negative test.
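The scoring rules described for Phases 1 and 2 can be summarized in a short sketch. This is an illustrative encoding, not software used in the study; the function names and response labels are hypothetical, and the behavior when the first provocative segment is L5 (no subsequent segment to compare) is an assumption, since the protocol text does not address that case.

```python
from typing import Optional

LUMBAR_SEGMENTS = ("L1", "L2", "L3", "L4", "L5")


def select_test_segment(first_provocative: str, next_is_more_provocative: bool) -> str:
    """Choose the segment at which Phase 2 is performed.

    Assumption: if the first provocative segment is L5, there is no
    subsequent segment, so Phase 2 is performed at L5.
    """
    index = LUMBAR_SEGMENTS.index(first_provocative)
    has_next = index + 1 < len(LUMBAR_SEGMENTS)
    if next_is_more_provocative and has_next:
        return LUMBAR_SEGMENTS[index + 1]
    return first_provocative


def mpit_result(phase1_provoked: bool, phase2_response: Optional[str] = None) -> str:
    """Score the mPIT as 'positive' or 'negative'.

    No provocation in Phase 1 (after the confirmatory second pass) is
    negative. Otherwise only 'improved' in Phase 2 is positive.
    """
    if not phase1_provoked:
        return "negative"
    if phase2_response not in ("improved", "worsened", "same"):
        raise ValueError("phase2_response must be 'improved', 'worsened', or 'same'")
    return "positive" if phase2_response == "improved" else "negative"


if __name__ == "__main__":
    segment = select_test_segment("L3", next_is_more_provocative=True)
    print(segment, mpit_result(True, "improved"))
```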

The first examiner instructed the participant to follow the instructions of the second examiner and refrain from sharing the first test results with the second examiner. Approximately 1 minute after the first examiner completed the test, the second examiner entered the exam room and performed the mPIT. The order of examiners was randomized via computer software. Each examiner stored their results separately and did not discuss participants and testing until all data collection was completed. To assist with standardization and minimize testing effects, participants who inquired regarding the interpretation of the test were provided with this information only after the entire study procedure was completed.

Data analysis

Descriptive statistics were calculated regarding individual participant characteristics. These included demographic data, pain ratings, and objective measures of LBP-related disability and of psychological factors deemed relevant to LBP. Descriptive statistics and the kappa coefficient were calculated using IBM SPSS Statistics 28 software (IBM Corp, Armonk, NY, USA).

The primary outcome of the study was a measure of inter-examiner agreement as described by Fleiss’ kappa [23]. Fleiss’ kappa is used to represent two or more raters’ level of agreement after adjusting for chance when measuring a variable on a categorical scale [23]. Secondarily, kappa coefficients were calculated for subgroups defined by (A) magnitude of disability, i.e. participants with minimal disability (Oswestry scores 0–20, n = 27) and those with moderate to severe disability (Oswestry scores 21–40 and 41–60, respectively, n = 23) and (B) symptom chronicity, i.e. participants with acute or subacute symptoms (<4 weeks and 4–12 weeks, respectively, n = 18) and those with chronic symptoms (>12 weeks, n = 21). The standards for interpreting interrater reliability proposed by Landis and Koch were used: <.01 = poor, .01–.20 = slight, .21–.40 = fair, .41–.60 = moderate, .61–.80 = substantial, .81–1 = almost perfect [23]. Prevalence and bias indices were also calculated [23].
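For readers who want to apply the same interpretive bands, a minimal sketch follows. The function name is hypothetical, and the treatment of values that fall between the listed bands (for example, .205) is an assumption; the cut points themselves are those stated above.

```python
def landis_koch_label(kappa: float) -> str:
    """Return the agreement label for a kappa value using the bands in the text.

    Assumption: a value between two listed bands (e.g. .205) is assigned
    to the higher band.
    """
    if kappa > 1:
        raise ValueError("kappa cannot exceed 1")
    if kappa < 0.01:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


if __name__ == "__main__":
    for value in (0.27, 0.475, 0.579, 0.765):
        print(value, landis_koch_label(value))
```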

Results

Descriptive data for our sample are presented in Table 1. In brief, the sample was 66% female with a mean age of 50.7 years (SD 18.7). Mean self-reported pain level was 4.1 (SD 2.1) on the 10-point Visual Analogue Scale (VAS), and the majority (76%) reported that this was a recurrent episode of back pain. The mean mODI was 21.6 (SD 14.7), or ‘moderate’ LBP-related disability. Presence of yellow flags as determined by the OSPRO-YF was highly variable (as low as 14.6% on the estimated FABQ physical activity subscale and as high as 43.8% on the estimated TSK-11). Central tendencies and variance values for each OSPRO-YF estimate are reported in Table 1.

Table 1.

Descriptive characteristics of participants.

	Frequencies	Value
Age (years)	50	50.7 ± 18.7 (23.0–92.0)
Gender
Male	17	34.0%
Female	33	66.0%
Total	50
Body Mass Index (kg/m²)	50	29.4 ± 6.4 (19.5–43.1)
Physical Activity/week (minutes)
0–30	9	18.0%
30–60	11	22.0%
60–120	14	28.0%
120–150	8	16.0%
150+	8	16.0%
Total	50
Perceived Health Rating
Poor	1	2.0%
Fair	2	4.1%
Good	19	38.8%
Very Good	20	40.8%
Excellent	7	14.3%
Total	49
Pain (VAS^a)	48	4.1 ± 2.1 (1.0–8.0)
Prior LBP^b episodes?
Yes	38	76.0%
No	12	24.0%
Total	50
Chronicity of pain
Acute (<4 weeks)	7	17.9%
Subacute (4–12 weeks)	11	28.2%
Chronic (>12 weeks)	21	53.8%
Total	39
Disability (mODI^c)		21.6 ± 14.7 (0.0–58.0)
Minimal	27	54.0%
Moderate	18	36.0%
Severe	5	10.0%
Crippled	0	0%
Bed-bound	0	0%
Total	50
		Presence of Yellow Flags % (number)	Mean Score (95% CI)	Std. Deviation
Psychological Estimates (OSPRO-YF^d)*
FABQ-W	47	25.5% (12)	9.5 (7.4, 11.6)	7.1
FABQ-PA	48	14.6% (7)	10.8 (9.4, 12.2)	4.8
TSK-11	48	43.8% (21)	21.8 (20.4, 23.3)	4.9
PCS	48	35.4% (17)	16.0 (13.7, 18.3)	7.9
STAI	48	29.2% (14)	35.5 (33.7, 37.3)	6.1
STAXI	48	18.8% (9)	14.7 (13.7, 15.6)	3.3
PHQ-9	48	29.2% (14)	5.27 (4.3, 6.3)	3.4
PASS20	48	41.7% (20)	32.5 (27.3, 37.7)	17.6
PSEQ	48	25.0% (12)	43.9 (41.3, 46.4)	8.7
SER	48	20.8% (10)	105 (102, 108)	9.9
CPAQ	48	22.9% (11)	73.5 (69.1, 77.9)	15.0

Values expressed as Mean ± SD or %, unless otherwise noted in the columns.

*See Appendix for abbreviations key for OSPRO-YF test estimates.

^aVisual Analogue Scale.

^bLow back pain.

^cModified Oswestry Disability Index.

^dOptimal Screening for Prediction of Referral and Outcome-Yellow Flags.

The examiners achieved an 82% agreement rate and κ= .579 (95% CI = .302 to .856), p < .001. Examiner 1 identified 36 positive and 14 negative test results, whereas Examiner 2 identified 33 positive and 17 negative test results (Table 2). Both prevalence and bias indices (0.38 and 0.06, respectively) were low and did not greatly impact kappa.

Table 2.

Distribution of rater outcomes for mPIT^a.

		Examiner 2 (E2)
		E2 Positive	E2 Negative	Total Outcomes
Examiner 1 (E1)	E1 Positive	30	6	36
	E1 Negative	3	11	14
	Total Outcomes	33	17	50

^aModified Prone Instability Test.
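The agreement statistics reported above can be checked against the counts in Table 2. The sketch below is not the SPSS procedure used in the study. It assumes the two-rater, two-category form of Fleiss’ kappa (chance agreement computed from category proportions pooled across both raters) and the common definitions of the prevalence index, |a − d|/n, and bias index, |b − c|/n, where a and d are the agreement cells and b and c the disagreement cells. The function and variable names are hypothetical.

```python
def agreement_statistics(a: int, b: int, c: int, d: int) -> dict:
    """Agreement statistics for a 2x2 table of two raters.

    a = both positive, b = rater 1 positive / rater 2 negative,
    c = rater 1 negative / rater 2 positive, d = both negative.
    """
    n = a + b + c + d
    observed = (a + d) / n  # proportion of participants rated identically

    # Fleiss' kappa pools the category proportions over both raters
    # (2 * n ratings in total) to estimate chance agreement.
    p_positive = ((a + b) + (a + c)) / (2 * n)
    p_negative = 1 - p_positive
    expected = p_positive ** 2 + p_negative ** 2

    return {
        "percent_agreement": observed,
        "fleiss_kappa": (observed - expected) / (1 - expected),
        "prevalence_index": abs(a - d) / n,
        "bias_index": abs(b - c) / n,
    }


if __name__ == "__main__":
    # Counts from Table 2
    stats = agreement_statistics(a=30, b=6, c=3, d=11)
    for name, value in stats.items():
        print(f"{name}: {value:.3f}")
```

With the Table 2 counts, this sketch returns 82% agreement, κ ≈ .579, a prevalence index of 0.38, and a bias index of 0.06, matching the reported values. The confidence interval is not reproduced here because it depends on the standard error formula used by the statistical software.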

Subgroup analyses yielded variable findings (Table 3). For participants with minimal disability based on Oswestry scores, there was substantial agreement between the two raters, κ = .667 (95% CI = .367 to .966), p < .001. In contrast, for participants with moderate to severe disability, agreement was moderate, κ = .475 (95% CI = .097 to .853), p = .016. Based on symptom chronicity, raters agreed moderately for participants with acute/subacute symptoms, κ = .462 (95% CI = .122 to .801), p = .011, but agreed substantially for those with chronic symptoms, κ = .765 (95% CI = .464 to .999), p < .001.

Table 3.

Kappa subgroup analyses.

		Kappa	Confidence Interval	Number of Participants	Agreement	p value
Disability Rating (mODI)	Minimal disability	0.667	0.367–0.966	27	Substantial	<.001
Disability Rating (mODI)	Moderate to severe disability	0.475	0.097–0.853	23	Moderate	0.016
Chronicity of symptoms	Acute and Subacute	0.462	0.122–0.801	18	Moderate	0.011
Chronicity of symptoms	Chronic	0.765	0.464–0.999	21	Substantial	<.001

Discussion

The purpose of this study was to establish the interrater reliability of assessments obtained using the novel Modified Prone Instability Test. We found moderate reliability in the assessments derived from this test [24].

Overall, the kappa value produced by this study aligns with several studies examining the interrater reliability of the standard PIT. With the exception of Ravenna et al., most psychometric studies of the PIT have reported moderate to high interrater reliability with kappa ranging between 0.54 and 0.87 (Ravenna et al.: κ = 0.27) [8,14–17].

Additionally, the participant characteristics in this study are similar to those of several other PIT interrater reliability studies, which strengthens the comparability between the main outcome of this study and the previously established interrater reliability of the PIT. Four of five studies reported that most participants had previous low back pain. The prevalence of previous LBP among participants in our study was 76%, whereas that among participants in other similar studies ranged from 66% to 83.7% [14,15]. Disability as reported by either the mODI or ODI was also similar, with an average of minimal to moderate disability. 21% of participants in our study reported moderate disability, whereas moderate levels of disability reported by participants in other related studies ranged from 17.7% to 34.9% [8,15]. Reported age and sex of the participants varied slightly, with this study having an average age of 50.7 years (others: 33.5 [15] to 39.2 [14]) and 66% female participants (others: 43% [16] to 60% [8]).

The use of subgroup discriminators yielded variable but potentially useful findings as well. There was substantial agreement among both the group rated to have minimal disability (via mODI) and the group who rated their symptom chronicity as greater than 12 weeks. This may suggest assessments yielded from the mPIT conducted on patients in either of these groups may be more helpful between clinicians; that is, a clinician may consider repeat findings (via historical chart review or testing) of the mPIT for such patients to be dependable information which may assist classification into a subgroup responder category for treatment of LBP [12,13]. Further testing as to the validity of the mPIT is warranted. Additionally, these subgroup responder categories would benefit from repeat testing at a larger participant number for sufficiently powered statistics. To our knowledge, such stratification for assessments yielded from either the PIT or the mPIT is unique to this study; this protocol may therefore serve as a basis for future inquiries.

The standardized prone starting position of the mPIT may have contributed favorably to its interrater reliability. Both the PIT and mPIT require participants to increase spinal stiffness by activating their lumbopelvic musculature through active hip extension [11]. However, the starting position of the mPIT necessitates participants to move from a standard prone position to hips slightly hyperextended; hip hyperextension has a significantly smaller normal active range of motion than the hip flexion of the PIT’s starting position to Phase 2 position [18]. Therefore, the prone starting position of the mPIT has less potential variability than the starting position of the PIT.

The starting/Phase 1 position of the mPIT versus that of the PIT may also promote a more consistent execution of the PA manual pressure. In their 2019 paper, Sung et al. [11] found that in both study groups (individuals with and without LBP), there was a statistically significant increase in spinal stiffness between a resting prone position (which is the starting position/Phase 1 position of the mPIT) versus the traditional starting position of the PIT. They suggested that the flexion of the hips in the starting position of the PIT ‘may place tension in the passive structures’ of the lumbar extensor mechanism, which ‘may potentially increase stiffness enough to prevent pain production, resulting in a false-negative test result.’ [11](p905) This supports the notion that resting completely prone in the starting/Phase 1 position of the mPIT may aid in limiting false negatives of the test.

With the establishment of comparable interrater reliability to the original PIT and decreased spinal stiffness in the early phases of the mPIT, advantages in feasibility support the mPIT as the pragmatic choice to help identify lumbar segmental instability. Because some clinical lumbar exams include the prone position, this key difference in starting position makes an assessment to identify lumbar segmental instability more accessible, particularly for patients with mobility challenges or who experience elevated pain during transition movements. It also saves the clinician time in repositioning the patient for a single test.

Limitations

Though we had moderate agreement, a wide confidence interval (κ = .579 [95% CI = .302 to .856], p < .001) indicates that caution must be used when interpreting the results of this study. Repeat studies using this protocol would be helpful to establish more conclusive results. To our knowledge, this is the first study examining the mPIT as a novel procedure. As is the case with the PIT, there is no gold standard for establishing validity of this pain-provocative test. Therefore, in clinically using the mPIT rather than the PIT, one is making the yet-unsupported assumption that the test can validly identify lumbar segmental instability. With interrater reliability preliminarily established, future directions may compare the mPIT and PIT for agreement or examine the validity of the mPIT to predict radiographic instability, similar to Fritz et al. in 2005 [14].

Thirdly, while feedback on the force of manual pressures was given during the training session, we did not use a device to control for a standardized level of force with manual pressures. This lack of standardization may have impacted participants’ response to a potentially provocative manual pressure, though we believe our pragmatic approach to be a truer reproduction of this test as used in the clinic. Lastly, testing effects (order effects, response bias) may have impacted participant responses. We attempted to mitigate these by randomizing examiner order and specifically instructing participants to refrain from sharing their previous response with the second examiner.

Conclusion

This study established a moderate interrater reliability of assessments yielded by performance of the mPIT. This is the first study known to examine the novel mPIT, where the patient starting position is modified to prone lying with lower extremities supported on a treatment table rather than prone with legs off the table and variably flexed hips. As we did not examine validity of the mPIT, conclusions regarding the diagnostic accuracy of the test cannot yet be drawn. With interrater reliability preliminarily established, future directions may compare the assessments of the mPIT and PIT, and examine the validity of the mPIT as a clinical test for diagnosing lumbar segmental instability [14].

Key Points

Findings: Assessments obtained by the novel mPIT demonstrated moderate interrater reliability between a new graduate and an experienced clinician (κ = .579, p < .001).
Implications: With the establishment of comparable interrater reliability to the original PIT and decreased spinal stiffness in the early phases of the mPIT, advantages in feasibility over the PIT support the mPIT as the pragmatic test of choice to help identify lumbar segmental instability.
Caution: A large confidence interval (95% CI = .302 to .856) warrants caution when interpreting the results of this study. The validity of the mPIT has not yet been examined; further study is appropriate prior to drawing conclusions regarding its diagnostic value.

Acknowledgements

The authors would like to acknowledge Dr. Megan Erlandson, PT, DPT, OCS, Dr. Blake Robinson, PT, DPT, OCS, and Dr. Leah Wurm, PT, DPT, SCS for assistance with data collection and direction of the participants. Additionally, we would like to thank Dr. Karli Kerzman, PT, DPT, Dr. Kathleen Michaels, PT, DPT, and Dr. Elora Koepcke, PT, DPT for critical reviews of the study proposal, participation in the technical editing of the manuscript, and figure production.

Biographies

Ellen R. Larkin, PT, DPT, OCS, graduated from the Mayo Clinic School of Health Sciences Program in Physical Therapy in 2020 and completed the Mayo Clinic Orthopaedic Physical Therapy Residency in 2022. She is a full-time orthopaedic physical therapist at M Health Fairview in Fridley, MN. Her research interests include clinical utility of diagnostic testing in low back pain, biopsychosocial factors related to musculoskeletal pain, and diagnosis and interventions in shoulder pain for the overhead athlete.

Darren Q. Calley, PT, DScPT, OCS, is the Program Director of the Mayo Clinic Physical Therapy Residency Programs and Assistant Professor of Physical Therapy in the Mayo Clinic College of Medicine and Science in Rochester, MN. He has 28 years of clinical experience specializing in orthopaedic physical therapy, with primary research interests in clinical & residency education, musculoskeletal examination & interventions, and biopsychosocial factors related to musculoskeletal pain.

John H. Hollman, PT, PhD, is Program Director of the Program in Physical Therapy and Associate Dean for Academic Affairs in the Mayo Clinic School of Health Sciences, and Professor of Physical Therapy in the Mayo Clinic College of Medicine and Science in Rochester, MN. He has been a physical therapist for 30 years and earned a PhD in Biomechanics, with specific expertise in kinematics and kinesiologic electromyography. His primary research interests include gait dynamics and lower extremity biomechanics.

Funding Statement

Funding provided by the Mayo Clinic Physical Medicine and Rehabilitation Department at Mayo Clinic, under Grant #21-006546.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Supplemental data

Supplemental data for this article can be accessed online at https://doi.org/10.1080/10669817.2024.2352934.

References

[1] Panjabi MM. The stabilizing system of the spine. Part I. Function, dysfunction, adaptation, and enhancement. J Spinal Disord. 1992 Dec;5(4):383–389; discussion 397. doi: 10.1097/00002517-199212000-00001. PMID: 1490034.
[2] Panjabi MM. The stabilizing system of the spine. Part II. Neutral zone and instability hypothesis. J Spinal Disord. 1992 Dec;5(4):390–396; discussion 397. doi: 10.1097/00002517-199212000-00002. PMID: 1490035.
[3] Bram J, Zanetti M, Min K, et al. MR abnormalities of the intervertebral disks and adjacent bone marrow as predictors of segmental instability of the lumbar spine. Acta Radiol. 1998;39(1):18–23. doi: 10.1080/02841859809172143.
[4] Kong MH, Hymanson HJ, Song KY, et al. Kinetic magnetic resonance imaging analysis of abnormal segmental motion of the functional spine unit. J Neurosurg Spine. 2009;10(4):357–365. doi: 10.3171/2008.12.SPINE08321.
[5] Schinnerer KA, Katz LD, Grauer JN. MR findings of exaggerated fluid in facet joints predicts instability. J Spinal Disord Tech. 2008;21(7):468–472. doi: 10.1097/BSD.0b013e3181585bab.
[6] Beazell JR, Mullins M, Grindstaff TL. Lumbar instability: an evolving and challenging concept. J Man Manip Ther. 2010 Mar;18(1):9–14. doi: 10.1179/106698110X12595770849443. PMID: 21655418; PMCID: PMC3103111.
[7] Cook C, Brismée JM, Sizer PS. Subjective and objective descriptors of clinical lumbar spine instability: a Delphi study. Man Ther. 2006;11(1):11–21. doi: 10.1016/j.math.2005.01.002.
[8] Hicks GE, Fritz JM, Delitto A, et al. Interrater reliability of clinical examination measures for identification of lumbar segmental instability. Arch Phys Med Rehabil. 2003 Dec;84(12):1858–1864. doi: 10.1016/s0003-9993(03)00365-4.
[9] McGill S. Low back disorders: evidence-based prevention and rehabilitation. Champaign, Illinois: Human Kinetics; 2002.
[10] McGill SM, Hughson RL, Parks K. Changes in lumbar lordosis modify the role of the extensor muscles. Clin Biomech (Bristol, Avon). 2000 Dec;15(10):777–780. doi: 10.1016/s0268-0033(00)00037-1.
[11] Sung W, Hicks GE, Ebaugh D, et al. Individuals with and without low back pain use different motor control strategies to achieve spinal stiffness during the prone instability test. J Orthop Sports Phys Ther. 2019 Dec;49(12):899–907. doi: 10.2519/jospt.2019.8577.
[12] Hicks GE, Fritz JM, Delitto A, et al. Preliminary development of a clinical prediction rule for determining which patients with low back pain will respond to a stabilization exercise program. Arch Phys Med Rehabil. 2005 Sep;86(9):1753–1762. doi: 10.1016/j.apmr.2005.03.033.
[13] Rabin A, Shashua A, Pizem K, et al. A clinical prediction rule to identify patients with low back pain who are likely to experience short-term success following lumbar stabilization exercises: a randomized controlled validation study. J Orthop Sports Phys Ther. 2014 Jan;44(1):6–B13. doi: 10.2519/jospt.2014.4888.
[14] Fritz JM, Piva SR, Childs JD. Accuracy of the clinical examination to predict radiographic instability of the lumbar spine. Eur Spine J. 2005 Oct;14(8):743–750. doi: 10.1007/s00586-004-0803-4.
[15] Rabin A, Shashua A, Pizem K, et al. The interrater reliability of physical examination tests that may predict the outcome or suggest the need for lumbar stabilization exercises. J Orthop Sports Phys Ther. 2013 Feb;43(2):83–90. doi: 10.2519/jospt.2013.4310.
[16] Ravenna MM, Hoffman SL, van Dillen LR. Low interrater reliability of examiners performing the prone instability test: a clinical test for lumbar shear instability. Arch Phys Med Rehabil. 2011 Jun;92(6):913–919. doi: 10.1016/j.apmr.2010.12.042.
[17] Schneider M, Erhard R, Brach J, et al. Spinal palpation for lumbar segmental mobility and pain provocation: an interexaminer reliability study. J Manipulative Physiol Ther. 2008;31(6):465–473. doi: 10.1016/j.jmpt.2008.06.004.
[18] Magee DJ. Orthopedic physical assessment. 5th ed. St. Louis, Missouri: Saunders Elsevier; 2008.
[19] Fritz JM, Irrgang JJ. A comparison of a modified Oswestry low back pain disability questionnaire and the Quebec back pain disability scale. Phys Ther. 2001 Feb;81(2):776–788. doi: 10.1093/ptj/81.2.776.
[20] Deyo RA, Andersson G, Bombardier C, et al. Outcome measures for studying patients with low back pain. Spine (Phila Pa 1976). 1994 Sep 15;19(18 Suppl):2032S–2036S. doi: 10.1097/00007632-199409151-00003.
[21] Lentz TA, Beneciuk JM, Bialosky JE, et al. Development of a yellow flag assessment tool for orthopaedic physical therapists: results from the optimal screening for prediction of referral and outcome (OSPRO) cohort. J Orthop Sports Phys Ther. 2016 May;46(5):327–343. doi: 10.2519/jospt.2016.6487.
[22] George SZ, Beneciuk JM, Lentz TA, et al. Optimal screening for prediction of referral and outcome (OSPRO) for musculoskeletal pain conditions: results from the validation cohort. J Orthop Sports Phys Ther. 2018 Jun;48(6):460–475. doi: 10.2519/jospt.2018.7811.
[23] Sim J, Wright CC. The kappa statistic in reliability studies: use, interpretation, and sample size requirements. Phys Ther. 2005 Mar;85(3):257–268. doi: 10.1093/ptj/85.3.257.
[24] Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977 Mar;33(1):159–174. doi: 10.2307/2529310.

Supplementary Materials

Supplemental Material

YJMT_A_2352934_SM0518.docx (13.2 KB, docx)
