ABSTRACT
Objectives
The physiotherapy literature lacks high-quality, registered systematic reviews (SRs) and ‘trustworthy’ randomized controlled trials (RCTs). It is unknown whether considering quality and ‘trustworthiness’ affects the publication bias, heterogeneity, and certainty of clinical recommendations observed in the literature.
Methods
We performed a methodological review of SRs investigating physiotherapy treatment of neuromusculoskeletal conditions indexed in MEDLINE between 1 January 2018 and 25 October 2023. Blinded reviewers examined the prospective intent and quality of SRs and the ‘trustworthiness’ of RCTs included therein. Blinded reviewers extracted data for the variables of interest (Numeric Pain Rating Scale and Visual Analog Scale).
Results
Of the SRs identified (N = 677), 13 were included in the final review. These included a total of 109 RCTs, including duplicates. Only eight of these trials were deemed ‘trustworthy.’ Publication bias was identified, and heterogeneity across the trials (N = 55) included in the quantitative analysis was high (I2 = 80.11%, 95% CI [75.88, 83.60]). Publication bias and heterogeneity were eliminated (I2 = 0%, 95% CI [0.00, 37.44]) upon considering those prospectively registered (N = 14). Statistical significance, assessed via the p-value at baseline (<.001), was eliminated (p = .746) once prospective, external, and internal validity were considered. Statistical inference through estimation, evaluated via effect size, confidence intervals, and minimal detectable change, was not present at baseline and reduced throughout the screening process.
Discussion
Trials of musculoskeletal interventions to manage pain in patients with neuromusculoskeletal conditions lack certainty and confidence in their treatment effects and exhibit high heterogeneity. Statistically significant effects and heterogeneity are eliminated when considering ‘trustworthy’ quality evidence.
Conclusions
Consistent with previous findings, null effects and low heterogeneity arise when considering the best available evidence. Meaningful effects are likely rare when assessed holistically using statistical inference through estimation and the confidence and certainty of the estimated effect.
Introduction
Recent systematic reviews (SRs) examining the effectiveness of physiotherapy (PT) interventions on neuromusculoskeletal impairments have reported high levels of statistical heterogeneity (also known as statistical variability) [1–9]. Although the average treatment effect informs the clinician of the treatment effect for most patients, the variability of the treatment effect (also known as heterogeneity) is essential when determining whether an intervention’s effect is precise enough to be helpful in the clinic for treating an individual patient. Statistical heterogeneity is assessed quantitatively via statistical testing [10] and measures the variation in outcome effects across the randomized controlled trials (RCTs) included in an SR [11,12] that is not due to chance [12]. Heterogeneity is commonly expressed via the I2 statistic. As a percentage, heterogeneity measured via I2 is considered low at 25%, moderate at 50%, and high at 75% [13]. Heterogeneity is expected when comparing results across RCTs [14], but strict guidelines on proceeding with meta-analysis based solely on heterogeneity using the I2 statistic still need to be improved to allow for the interpretation of to whom the research findings apply (also known as generalizability) [10].
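As a brief illustration of how the I2 statistic relates to Cochran's Q, the following Python sketch applies the standard definition, I2 = max(0, (Q − df)/Q) × 100 with df = k − 1 for k trials. The function names are hypothetical, and the handling of values that fall between the 25%, 50%, and 75% benchmarks is an assumption made for illustration rather than a rule stated in this review.

```python
def i_squared(q: float, k: int) -> float:
    """Return I2 (in percent) from Cochran's Q across k trials.

    Negative values are truncated to zero, as is conventional.
    """
    if k < 2 or q <= 0:
        return 0.0
    df = k - 1
    return max(0.0, (q - df) / q) * 100.0


def heterogeneity_label(i2: float) -> str:
    """Map I2 onto the low/moderate/high benchmarks (25%, 50%, 75%).

    Assumption: each benchmark is treated as the lower edge of its band,
    and values under 25% are labelled 'below low benchmark'.
    """
    if i2 >= 75.0:
        return "high"
    if i2 >= 50.0:
        return "moderate"
    if i2 >= 25.0:
        return "low"
    return "below low benchmark"


# Illustrative values only: Q = 20 across 5 trials.
example = i_squared(20.0, 5)
print(round(example, 2), heterogeneity_label(example))
```

Because I2 is a ratio built on Q, it is itself an estimate, which is why reporting its 95% CI matters when few trials are pooled.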
Several recent SRs examining the effectiveness of PT treatment for musculoskeletal impairments have completed a synthesis of findings with meta-analysis while reporting high levels of heterogeneity on some of their comparisons (reporting an I2 greater than 50%), including Devonshire et al. (0–93.6%) [8], Gränicher et al. (0–91%) [6], Mueller et al. (0–98%) [9], Paraskevopoulos et al. (0–88%) [2], Piano et al. (0–92%) [4], Prat-Luri et al. (0–97.52%) [7], Runge et al. (0–91%) [5], Saito et al. (0–89%) [3], and Satpute et al. (>75%, but as high as 97%) [1]. These authors do not report 95% confidence intervals (CIs) for I2 [1–9], leaving the variation in the precision of these I2 estimates unknown. The authors varied in their responses to high heterogeneity. For example, Gränicher et al. [6] downgraded the certainty of evidence due to inconsistency, while Runge et al. [5] performed a sensitivity analysis in the meta-analyses with I2 >50%, demonstrating a reduction in I2 when studies were removed from synthesis. O’Connell et al. [15] have shown a median decrease in I2 of 27% when removing ‘untrustworthy’ trials that: 1) do not have transparent pre-registration of protocols, 2) do not have appropriate ethical approval and transparent data stewardship, and 3) have potential indicators of research misconduct.
Kaplan and colleagues have shown that heterogeneity decreases and null effects become normal when prospective registration is required and verified by the grant funder [16]. Although this is a known issue in research, there has yet to be a comparable strict enforcement of prospective clinical trial registration in PT-published research as a requirement for funding. In 2012, the policy of mandatory prospective registration was heralded as a means of reducing ‘selective reporting and publication bias,’ with many International Society of Physiotherapy Journal Editors (ISPJE) journals allowing a grace period from 2013 to 2015 [17]. Prospective registration ensures a greater internal validity for RCTs [18]. O’Connell et al. [15] have shown that in addition to reductions in heterogeneity and narrowing of CIs, null effects are increased when ‘untrustworthy’ RCTs are removed from synthesis. They recommend that authors evaluate protocol registration and best research practices of the RCTs included within an SR [15]. Recent SRs have employed protocols [19] to ensure that RCTs included within SRs can be used to make strong clinical practice recommendations by having high confidence in estimated effects (also known as trustworthiness) [20,21]. Certainty in the estimated effects is established through the RCTs’ prospective validity with the publicly available research record, external validity, internal validity, p-values, effect sizes, and CIs that do not overlap [22].
It is widely accepted in many healthcare fields that scientific research is experiencing a replication (also known as ‘trustworthiness’) crisis [23]. One reason for this crisis is the unexplained heterogeneity in research findings, which has been reported to be as high as 75% [24]. Although heterogeneity of research findings is expected, unexplained heterogeneity challenges the reproducibility of published studies. Since replication is a fundamental principle of science, identifying the variables that limit reproducibility, such as publication bias and unexplained heterogeneity, becomes important.
Interpreting the results of RCTs included in SRs requires creating ‘trustworthiness’ and generalizability by minimizing publication bias and unexplained heterogeneity. This is critical to ensuring clinical relevance when applying research to clinical practice [25].
Objectives
The objective of this review was to determine whether publication bias, heterogeneity, and clinical recommendations for the clinical management of patients with neuromusculoskeletal impairments are affected by: 1) prospective registration of SRs, 2) inclusion of ‘trustworthy’ RCTs with established external validity and at least a moderate level of internal validity [19], and 3) evaluation of high confidence in estimated effects (null or otherwise) when examining statistical significance through estimation, clinically relevant metrics of change, and precision of CIs. The ISPJE position is that statistical inference through estimation using effect sizes and CIs should be used when making decisions regarding the clinical efficacy of an intervention [26].
Materials and methods
Protocol and registration
The protocol for this review was reviewed by the Louisiana State University Health Sciences Center at Shreveport Institutional Review Board (IRB) and was considered exempt from IRB oversight (STUDY00002509). This methodological review was prospectively registered (https://doi.org/10.17605/OSF.IO/QK24E). All of the amendments to the protocol, including dates and reasons, can be found in Appendix 1.
This protocol was modified on 5 February 2024, during the data collection process to include the I2 values, 95% CIs of the I2 values, and the prediction interval (PI) values if they were present for the SRs. Additionally, the Grading of Recommendations, Assessment, Development, and Evaluations (GRADE) rating was determined to be present or not; if present, the GRADE certainty rating was recorded, and the recommendations based on the GRADE rating were captured. Portions of this protocol follow Riley et al. [27] for sampling and screening the SRs and Riley et al. [19] for establishing ‘trustworthy’ living SRs in the screening of the RCTs, with prospective modifications more specific to the scope of this paper, including not being specific to an individual body region/joint or a particular PT intervention.
Design
This review was reported in agreement with the PRISMA 2020 statement and flow diagram [28].
Eligibility criteria for SRs
We included SRs with meta-analyses involving musculoskeletal PT interventions treating patients with neuromusculoskeletal impairments and assessing patient improvement in pain outcomes (only via the Visual Analog Scale [VAS] and/or Numeric Pain Rating Scale [NPRS]) published in English between 1 January 2018 and 25 October 2023. SRs using other pain outcomes with the exclusion of, or in addition to, the VAS and NPRS were excluded. These modifications to the protocol were made during the data collection process as it was discovered that many of the screened SRs pooled results from RCTs that either did not include a pain measure or used a wide variety of different pain measures. Given that this would increase methodological heterogeneity and uncertainty of our findings when looking at the RCTs within the SRs, we modified the protocol during the data collection process and before analyzing the data (20 December 2023). The search date range was chosen to maximize the number of prospectively registered SRs, given the low baselines reported by Oliveira et al. [29] (19%) in 2018 and Riley et al. [27] (15.4%) in 2022, and to increase the number of registered RCTs within those SRs, given the expiration of the grace periods [17]. SRs were excluded if they included pilot studies, non-randomized trials, work not published in a peer-reviewed journal, or research that did not involve musculoskeletal interventions.
Information sources
The search included SRs indexed in the Medical Literature Analysis and Retrieval System Online (MEDLINE). The information sources used for this methodological SR have been previously published in a methods review [27]. The purpose of the search strategy was to create a sample of SRs, and using MEDLINE alone yields conclusions similar to those obtained from several databases [30].
Search strategy
The search strategy for this methodological SR has been previously published [27], with an edit made to update the end date (see Appendix 2). The Peer Review of Electronic Search Strategies (PRESS) checklist [31] was used by a professional librarian with expertise in SR search strategies to create the search strategy for the bibliographic database as described by Furlan et al. [32] and Lefebvre et al. [33].
Selection process and screening
Appendix 3 outlines the selection process and screening of the SR and RCTs, along with screening for confidence in estimated effects.
Data collection process
Assessment of SRs and RCTs
The criteria reported by Riley et al. [19,27] were used to assess the SRs and the ‘trustworthiness’ of their included RCTs. Given the redundancy of including these methods here, they are included in Appendix 4.
Study records
Data management
Title screening of the SRs was performed in EndNote, with results copied and listed in Excel for further review. Two blinded authors (DF and SS) screened all abstracts and performed a full-text review for the SRs and RCTs.
Data items
For the included SRs, we reported the registration and whether it was prospective or retrospective, compliance with the registered PICO question, the AMSTAR 2 ratings, the presence and values (if applicable) of the I2, I2 CI, and PI for the VAS or NPRS, and GRADE ratings. Identified RCTs were assessed as previously discussed to ensure they had established external validity on the PEDro, moderate-to-high internal validity on the PEDro, moderate to low risk of bias on the RoB 2, and represented moderate-to-high certainty of estimated effect on the GRADE criteria and/or provided detailed results for evaluation of confidence in estimated effects through statistical inference through estimation by evaluating p-values, estimated effects, and CIs. The data items included measures of pain (VAS, NPRS). As previously discussed, the time points of interest were Immediate (closest to immediately following the intervention), Short-term (closest to 1 month), Intermediate-term (closest to 6 months), and Long-term (closest to 12 months or longer) [34]. Data for the outcomes of interest were extracted in a spreadsheet on Google Drive. These data were then converted to an Excel spreadsheet and uploaded into MedCalc Statistical Software version 23.0.2 (MedCalc Software bvba, Ostend, Belgium; http://www.medcalc.org).
Data syntheses
All VAS and NPRS scores were scaled from 0 to 100 and pooled for analysis. Descriptive statistics and statistical analyses were performed in MedCalc Statistical Software version 23.0.2. Statistical significance was set at α = .05.
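The rescaling step can be sketched as a simple linear transformation. This is a minimal Python illustration that assumes each instrument runs from 0 to a known maximum (for example, 10 for an 11-point NPRS or a 10 cm VAS, and 100 for a 100 mm VAS). The function names and these maxima are assumptions for illustration; the exact conversion applied in this review is not detailed in the text.

```python
def to_0_100(score: float, scale_max: float) -> float:
    """Linearly rescale a pain score from the range 0..scale_max to 0..100."""
    if scale_max <= 0:
        raise ValueError("scale_max must be positive")
    if not 0 <= score <= scale_max:
        raise ValueError("score lies outside the instrument's range")
    return score * 100.0 / scale_max


def sd_to_0_100(sd: float, scale_max: float) -> float:
    """Rescale a standard deviation by the same factor as the scores."""
    if scale_max <= 0 or sd < 0:
        raise ValueError("scale_max must be positive and sd non-negative")
    return sd * 100.0 / scale_max


# Hypothetical trial arms: NPRS mean 6.2 (SD 1.8) and VAS mean 48 mm (SD 15 mm).
print(to_0_100(6.2, 10), sd_to_0_100(1.8, 10))
print(to_0_100(48, 100), sd_to_0_100(15, 100))
```

Standard deviations must be multiplied by the same factor as the means so that standardized effect sizes are unchanged by the rescaling.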
Publication bias was assessed using a modification of Egger’s test [35], Begg’s test, and funnel plots [36]. Heterogeneity was assessed using the reported 95% PIs and the I2 statistic with its corresponding 95% CI whenever possible. The PI was used to indicate the dispersion of the effects [37].
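To make the logic of a regression-based asymmetry test concrete, the sketch below implements the classical form of Egger's regression in Python: each trial's standardized effect (effect divided by its standard error) is regressed on its precision (1 divided by the standard error), and an intercept far from zero suggests small-study effects. This is an illustrative assumption about the general technique, not a reproduction of the modified test computed by MedCalc, and the function name is hypothetical. A p-value would additionally require a t distribution with k − 2 degrees of freedom, which is omitted here.

```python
import math


def egger_intercept(effects, std_errors):
    """Classical Egger regression: return (intercept, se_intercept, t_statistic)."""
    k = len(effects)
    if k != len(std_errors) or k < 3:
        raise ValueError("need at least three trials with matching inputs")
    x = [1.0 / se for se in std_errors]                 # precision
    y = [e / se for e, se in zip(effects, std_errors)]  # standardized effect
    mean_x = sum(x) / k
    mean_y = sum(y) / k
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance with k - 2 degrees of freedom.
    residual_ss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = residual_ss / (k - 2)
    se_intercept = math.sqrt(s2 * (1.0 / k + mean_x ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else math.copysign(math.inf, intercept)
    return intercept, se_intercept, t_stat


# Hypothetical data in which smaller trials (larger SE) report larger effects.
ses = [0.10, 0.20, 0.30, 0.40]
effs = [0.2 + 1.5 * se for se in ses]
print(egger_intercept(effs, ses)[0])
```

In this constructed example the effect grows linearly with the standard error, so the recovered intercept equals the built-in small-study slope of 1.5 (up to floating-point error).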
Treatment effect sizes of .2 to .49 were considered small, .5 to .79 moderate, and .8 or greater large [38]. The clinically meaningful differences for pain intensity were established by taking the most conservative thresholds reported in the literature for the tools identified [39], given the wide variability of these values [40].
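The effect size bands can be expressed as a small lookup. The following Python sketch assumes that the sign of the standardized mean difference is ignored when classifying magnitude and that values under .2 receive a separate label; neither choice is stated in the text, and the function name is hypothetical.

```python
def effect_size_band(smd: float) -> str:
    """Classify a standardized mean difference by absolute magnitude."""
    magnitude = abs(smd)
    if magnitude >= 0.8:
        return "large"
    if magnitude >= 0.5:
        return "moderate"
    if magnitude >= 0.2:
        return "small"
    # Assumption: the text does not name effects smaller than .2.
    return "below small"


for value in (0.15, 0.35, -0.62, 0.9):
    print(value, effect_size_band(value))
```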
Results
Study selection
Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) diagrams are included (Figures 1 and 2), outlining the search and selection process for the SRs and RCTs included in this review. This methodological review also meets all items on the PRISMA 2020 Checklist. A total of 677 SRs were initially screened, with a final N of 13. A total of 109 RCTs (including duplicates) were extracted from those SRs, with a final N of 8 making it through the verification of prospective registration, quality, and risk-of-bias screening before being evaluated using the GRADE to determine the certainty in estimated effects. Not all RCTs could be included in the quantitative analysis because some included more than two pain contextual assessments and treatment groups, so the counts in Figures 1 and 2, which outline the screening process, vary slightly from those in the tables for the synthesized data (see Quantitative analyses).
Figure 1.

Study selection.
Figure 2.

Prospective registration, quality, and risk of bias screening.
The SRs included in the review are provided in Table 1. Excluded SRs are provided in Appendix 5. The screening results for the SRs are presented in Table 2. Of the 109 RCTs (including duplicates) extracted from the SRs, only 35 were registered, and only 16 were registered prospectively. In eleven of these, the discussion and conclusions matched the primary outcome. Excluded RCTs are provided in Appendix 6.
Table 1.
Full-text systematic reviews included (N = 13).
| Authors | Title | Journal | Year | Volume | Issue | Pages |
|---|---|---|---|---|---|---|
| Bernet et al.[41] | The effects of hip-targeted physical therapy interventions on low back pain: A systematic review and meta-analysis. | Musculoskelet Sci Pract | 2019 | 39 | | 91–100 |
| Bizzarri et al.[42] | Thoracic manual therapy is not more effective than placebo thoracic manual therapy in patients with shoulder dysfunctions: A systematic review with meta-analysis. | Musculoskelet Sci Pract | 2018 | 33 | | 1–10 |
| Campo et al.[43] | The effectiveness of biofeedback for improving pain, disability and work ability in adults with neck pain: A systematic review and meta-analysis. | Musculoskelet Sci Pract | 2021 | 52 | | 102317 |
| Ceballos-Laita et al.[44] | The effectiveness of hip interventions in patients with low-back pain: A systematic review and meta-analysis. | Braz J Phys Ther | 2023 | 27 | 2 | 100502 |
| de Oliveira Silva et al.[45] | Patient education for patellofemoral pain: A systematic review. | J Orthop Sports Phys Ther | 2020 | 50 | 7 | 388–396 |
| de Oliveira et al.[46] | Dose-response effect of lower limb resistance training volume on pain and function of women with patellofemoral pain: A systematic review and meta-regression. | Phys Ther Sport | 2023 | 63 | | 95–103 |
| Garzonio et al.[47] | Effectiveness of specific exercise for deep cervical muscles in nonspecific neck pain: A systematic review and meta-analysis. | Phys Ther | 2022 | 102 | 5 | 1–13 |
| Lew et al.[48] | Comparison of dry needling and trigger point manual therapy in patients with neck and upper back myofascial pain syndrome: A systematic review and meta-analysis. | J Man Manip Ther | 2021 | 29 | 3 | 136–146 |
| Ma et al.[49] | Effect of aquatic physical therapy on chronic low back pain: A systematic review and meta-analysis. | BMC Musculoskelet Disord | 2022 | 23 | 1 | 1050 |
| Ramírez-Vélez et al.[50] | Effects of kinesio taping alone versus sham taping in individuals with musculoskeletal conditions after intervention for at least one week: A systematic review and meta-analysis. | Physiotherapy | 2019 | 105 | 4 | 412–420 |
| Sørensen et al.[51] | Spinal manipulative therapy for nonspecific low back pain: Does targeting a specific vertebral level make a difference?: A systematic review with meta-analysis. | J Orthop Sports Phys Ther | 2023 | 53 | 9 | 529–539 |
| Vanti et al.[52] | Effectiveness of mechanical traction for lumbar radiculopathy: A systematic review and meta-analysis. | Phys Ther | 2021 | 101 | 3 | 1–13 |
| Vier et al.[53] | The effectiveness of dry needling for patients with orofacial pain associated with temporomandibular dysfunction: A systematic review and meta-analysis. | Braz J Phys Ther | 2019 | 23 | 1 | 3–11 |
Table 2.
Results of registration status and A MeaSurement Tool to Assess systematic Reviews 2 (AMSTAR 2) analyses for the included systematic reviews (N = 13).
| SR | Was SR Registered? | Was SR Prospectively or Retrospectively Registered? | Number of 16 AMSTAR 2 domains met by SR | Number of 7 critical AMSTAR 2 domains met by SR | AMSTAR 2 Rating of overall confidence in results of SR | I2 Value | GRADE rating present? | GRADE ratings | GRADE recommendations |
|---|---|---|---|---|---|---|---|---|---|
| Bernet et al.[41] | Yes | Retrospective | 10 | 3 | Critically Low | 83.21% | Yes | High risk of bias due to issues with concealed, randomized allocation, blinding, and obvious signs of heterogeneity. | Low |
| Bizzarri et al.[42] | No | NA | 10 | 3 | Critically Low | 0% pain at present; 0% pain during movement | Yes | Pain at present: Imprecision and indirectness; Pain during movement: Risk of bias, Imprecision and indirectness | Pain at present: Low; Pain during movement: Very low |
| Campo et al.[43] | Yes | Prospective | 14 | 6 | Low | Short-term: Pressure 51%, EMG 82%, combined 74%, residual 75%; Intermediate-term: Pressure NA, EMG 39%, combined 30%, residual 39% | Yes | Short-term: high risk of bias; Intermediate-term: high risk of bias and imprecision | Short-term: moderate; Intermediate-term: Low |
| Ceballos-Laita et al.[44] | Yes | Retrospective | 12 | 4 | Critically Low | Short-term = 0% strength+low back exercise vs. specific low back intervention, 0% hip mobilization and hip strength + specific low back intervention vs. specific low back intervention; medium-term = 0%. Long-term = 0% | Yes | Inconsistency, imprecision, and risk of bias | Very low |
| de Oliveira Silva et al.[45] | Yes | Prospective | 12 | 5 | Critically Low | Health education vs exercise short-term = 94%; Health care professional education vs. exercise short-term = 82%; Health care professional education vs exercise medium-term = 75% | Yes | Education material vs supervised exercise plus education material: short-term = inconsistency and imprecision, long-term = inconsistency, imprecision and publication bias; Health professional delivered education vs exercise plus health professional delivered education: short-term = inconsistency and publication bias, medium-term = inconsistency and publication bias | Education material vs supervised exercise plus education material: short-term = Low, long-term = Very Low; Health professional delivered education vs exercise plus health professional delivered education: short-term = Low, medium-term = Low |
| de Oliveira et al.[46] | Yes | Retrospective | 8 | 4 | Critically Low | Intervention groups overall effect = 87%; Intervention vs. control groups overall effect = 92%; Subgroups: sedentary = 92%, physically active = 87%. | No | ||
| Garzonio et al.[47] | Yes | Prospective | 13 | 6 | Low | Craniocervical flexion training vs. cervical flexion training = 0%; Craniocervical flexion training vs neck exercise = 81%; total = 68%; subgroup differences = 62.7% | Yes | Ranged from very low to moderate: Moderate = 9 ratings; Low = 6 ratings; Very Low = 5 ratings | |
| Lew et al.[48] | Yes | Retrospective | 10 | 5 | Critically Low | 34.80% | No | ||
| Ma et al.[49] | Yes | Prospective | 12 | 5 | Critically Low | Aquatic physical therapy vs No Aquatic physical therapy = 41%, Aquatic physical therapy vs No Aquatic physical therapy at rest = 82%, Aquatic physical therapy vs land based exercise = 8%, Aquatic physical therapy vs land based exercise at rest = 82% | Yes | VAS: serious issues at risk of bias and imprecision; VAS at rest: Serious risk of bias, inconsistency, and imprecision | VAS = Low; VAS at rest = Very Low |
| Ramírez-Vélez et al.[50] | Yes | Retrospective | 13 | 6 | Low | Kinesio taping vs Sham taping: post-treatment = 92.18%, follow-up = 88.77%; Kinesio taping (pre & post): post- treatment = 86.13%, follow-up = 81.29%; Sham taping (pre & post): post- treatment = 94.26%, follow-up = 89.29%. | Yes | Pain immediate: serious at inconsistency, indirectness, strongly suspect publication bias; Pain at follow-up: serious at inconsistency, indirectness, and publication bias highly suspected. | Pain immediate: Very low; Pain follow-up: very low |
| Sørensen et al.[51] | Yes | Prospective | 12 | 5 | Critically Low | Post-treatment = 58.65%; follow-up closest to 12 mo. = 18.57% | Yes | Post-intervention serious for imprecision; follow-up: serious for imprecision | Post-intervention: moderate; follow-up: moderate |
| Vanti et al.[52] | Yes | Prospective | 13 | 6 | Low | Traction + physical therapy vs physical therapy = 30.44%; Traction + physical therapy vs physical therapy = 0%. | Yes | Prone Traction + physical therapy vs. physical therapy: no concerns recorded; Prone traction vs. transcutaneous electrical nerve stimulation: −2 imprecision; Supine traction + physical therapy vs. physical therapy:-1 risk of bias and −1 imprecision; Supine traction vs. laser: −2 imprecision; Supine traction vs. US: −2 imprecision; Supine traction vs. meds: −2 imprecision | Prone traction + physical therapy vs physical therapy: High; Prone Traction vs transcutaneous electrical nerve stimulation: Low; Supine traction + physical therapy vs physical therapy: Low; Supine traction vs laser: Low; Supine traction vs ultrasound: Low; Supine traction vs meds: Low |
| Vier et al.[53] | Yes | Retrospective | 11 | 5 | Critically Low | Dry needling vs Sham = 24%; Dry needling vs other interventions = 24% | Yes | Dry needling vs Sham = serious risk of bias, inconsistency, indirectness, and very serious imprecision; Dry needling vs other interventions = serious risk of bias, inconsistency, indirectness, and very serious imprecision | DN vs Sham = Very Low; DN vs other interventions = Very Low |
SR: systematic review; GRADE: Grading of Recommendations, Assessment, Development, and Evaluations.
SR registration and quality assessment
Of the 13 included SRs, all but one [42] were registered; however, only six were registered prospectively [43,45,47,49,51,52]. No SRs contained a PICO question matching the registration, regardless of status (prospective or retrospective). The confidence in the estimated effect based on the methodological quality of the SRs using the AMSTAR 2 ratings was critically low or low. The interpretation of these findings is ‘The review has more than one critical flaw and should not be relied on’ and ‘The review has a critical flaw and may not provide an accurate and comprehensive summary of the available studies that address the question of interest,’ respectively [54]. In this situation, the synthesis of RCTs using SR methodology creates unreliable evidence based on the SR’s methodological strength. When there is an absence of reliable evidence, the most honest interpretation is that this evidence cannot be translated into clinical practice recommendations [55].
All the reviews reported heterogeneity in the form of the I2 statistic, but none included the I2 95% CI or the PI. Forty-two I2 values were reported across the included SRs, with variation across time points, specific comparisons, etc. The percent of these I2 values falling within each heterogeneity quartile is provided in Table 3.
Table 3.
Percent of I2 values from the included systematic reviews by quartile.
| Level of Heterogeneity by I2 Quartile | Included SRs (42 values) | Retrospectively or Non-registered SRs (20 values) | Prospectively Registered SRs (22 values) |
|---|---|---|---|
| 0–24% | 28.6% | 40.0% | 18.2% |
| 25–49% | 14.3% | 5.0% | 22.7% |
| 50–74% | 11.9% | 0% | 22.7% |
| 75–100% | 45.2% | 55.0% | 36.4% |
SR: systematic review.
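The tabulation by heterogeneity band can be sketched in Python as follows. The sample input uses only the eight I2 values reported in Table 2 for Bernet et al., Bizzarri et al., Lew et al., Sørensen et al., and Vier et al., so its output is illustrative and does not reproduce the full 42-value breakdown. The function name is hypothetical, and treating each band as half-open (for example, 25 up to but not including 50) is an assumption about how boundary values are assigned.

```python
def band_shares(i2_values):
    """Return the percentage of I2 values falling in each 25-point band."""
    if not i2_values:
        raise ValueError("at least one I2 value is required")
    labels = ["0-24%", "25-49%", "50-74%", "75-100%"]
    counts = {label: 0 for label in labels}
    for value in i2_values:
        if not 0 <= value <= 100:
            raise ValueError("I2 must lie between 0 and 100")
        # Assumption: bands are half-open; a value of exactly 100 joins the top band.
        index = min(int(value // 25), 3)
        counts[labels[index]] += 1
    total = len(i2_values)
    return {label: round(100.0 * n / total, 1) for label, n in counts.items()}


# Subset of I2 values (percent) taken from Table 2.
sample = [83.21, 0, 0, 34.80, 58.65, 18.57, 24, 24]
print(band_shares(sample))
```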
All but two [46,48] of the SRs included GRADE ratings and recommendations. A total of 44 GRADE ratings were reported across the remaining SRs, some with a single rating and others with many across several comparisons. Most GRADE ratings were from the prospectively registered SRs (N = 37) versus those from the remaining SRs (N = 7). Across these GRADE ratings, 29.5% were very low, 40.9% were low, 27.3% were moderate, and 2.3% were high. The GRADE ratings for the prospectively registered SRs were as follows: 21.6% very low, 43.2% low, 32.4% moderate, and 2.7% high. The GRADE ratings for the retrospectively or non-registered SRs were: 71.4% very low and 28.6% low.
Quality and risk of bias assessment of RCTs
After removing duplicates, 10 RCTs were included to assess quality and risk of bias using the PEDro and RoB 2. The results are presented in Table 4.
Table 4.
Quality and risk of bias assessment (N = 10).
| Authors | Official PEDro Scale | PEDro Criterion 1 | PEDro Scores | RoB 2 Randomization Process | RoB 2 Deviations from the intended interventions | RoB 2 Missing outcome data | RoB 2 Measurement of the outcome | RoB 2 Selection of the reported result | RoB 2 Overall Risk |
|---|---|---|---|---|---|---|---|---|---|
| Almeida et al.[56] | Yes | Yes | 7 | Low | Low | Low | Low | Some concerns | Some concerns |
| Bade et al.[57] | Yes | Yes | 5 | NA | NA | NA | NA | NA | NA |
| Burns et al.[58] | Yes | Yes | 7 | Low | Low | Low | Low | Some concerns | Some concerns |
| de Medeiros et al.[59] | Yes | Yes | 8 | Low | Low | Some concerns | Low | Some concerns | Some concerns |
| de Oliveira et al.[60] | Yes | Yes | 8 | Low | Low | Low | Low | Some concerns | Some concerns |
| de Oliveira et al.[61] | Yes | Yes | 8 | Low | Low | Low | Low | Some concerns | Some concerns |
| Dolak et al.[62] | Yes | Yes | 6 | Low | Low | High | Low | Some concerns | High |
| Donaldson et al.[63] | Yes | Yes | 8 | Low | Low | Low | Low | Some concerns | Some concerns |
| Parreira et al.[64] | No | Yes | 8 | Low | Low | Low | Low | Some concerns | Some concerns |
| Silva et al.[65] | Yes | Yes | 8 | Low | Low | Low | Low | Low | Low |
Certainty of evidence from RCTs
We determined the certainty of evidence from the remaining eight RCTs [56,58–63,65] with PEDro scores ≥ 6 and low to moderate risk of bias on the RoB 2 using the GRADE criteria. The certainty of the evidence was rated high because none of the domains (limitations in study design, inconsistency of results, indirectness, imprecision, and risk of bias) required downgrading. Moderate-to-high PEDro scores and meeting the first PEDro criterion, low heterogeneity from the eight RCTs, a total N of 702 participants across the eight RCTs, and low to moderate risk of bias were used to justify these ratings. Confidence in estimated effects was also evaluated by determining statistical significance through estimation assessing p-values, effect sizes (through standardized mean differences), corresponding CIs, and considering minimal detectable changes (MDCs).
The effect of ‘trustworthiness’ of RCTs on estimated effects and heterogeneity
The 105 RCTs, remaining after excluding four duplicates (see Figures 1 and 2), were extracted from the 13 included SRs assessing outcomes at 107 time points. Thirty-eight RCTs compared: 1) more than two interventions (N = 21); 2) more than two between-group contextual pain comparisons (N = 11); and 3) comparisons involving more than two interventions and two between-group contextual pain comparisons (N = 6) (see Appendix 7).
This methodological variability, with RCTs making more than two between-group intervention and/or contextual pain comparisons, was unexpected. We omitted RCTs that included more than two interventions and/or utilized more than two different pain assessments at the time points of interest (N = 38) as they were irrelevant to the research question, added variability based on research design, and added variability related to the patient assessment. This decision was made based on Cochrane Collaboration guidance [66] and to control for methodological heterogeneity related to research methods and patient-reported outcome measures. This left 71 RCTs for further analysis.
Of the 71 RCTs identified from the SRs, 15 were missing mean and/or standard deviation data that were either not reported or reported in graphic form only. One author (SR) contacted the authors who had RCTs with missing data. Three responses were received, creating a 20% response rate. These data were added to the data set. Given that there was a known reason why the data were missing (they were not reported) and that they were not missing at random (which is a requirement for the imputation of missing data), the analysis was performed on the RCTs with complete data (N = 59). The RCTs with missing data are included in Appendix 8.
Across the 12 RCTs with missing data, 18 of the 107 between-group comparisons did not have data from all time points of interest, resulting in 16.8% missing data. Data were missing for these 12 RCTs before screening, and there were no missing data for the RCTs for which prospective registration was verified.
Publication bias
When publication bias was assessed for the RCTs with complete data before the assessment of prospective registration, Egger’s test was statistically significant (p < .001), with an intercept of −3.2159 and a 95% CI of −4.7758 to −1.6560. Additionally, Begg’s test identified statistically significant findings (p < .001) with a Kendall’s tau of −.4015. The funnel plot findings are provided below (Figure 3).
Figure 3. Funnel plot for randomized controlled trials before the assessment of prospective registration.
When publication bias was similarly assessed for the verified prospectively registered RCTs, Egger’s test was not statistically significant (p = .336), with an intercept of −0.7028 and a 95% CI of −2.1754 to .7698. Additionally, Begg’s test was not statistically significant (p = .467), with a Kendall’s tau of −.09360. These statistical findings are supplemented by the funnel plot below (Figure 4).
Figure 4. Funnel plot for the prospectively registered randomized controlled trials.
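As a rough illustration of the two asymmetry tests reported above, the sketch below computes the intercept of Egger’s regression and the Kendall rank correlation used in Begg’s test. It is not the analysis code used in this review: the function names and the example data are hypothetical, the rank correlation ignores ties, and p-values are omitted because they require a t or normal reference distribution.

```python
import math


def egger_intercept(effects, std_errors):
    """Egger's regression: regress effect/SE on 1/SE.

    Returns the intercept and its standard error. An intercept far from
    zero suggests funnel plot asymmetry.
    """
    y = [e / s for e, s in zip(effects, std_errors)]  # standardized effects
    x = [1.0 / s for s in std_errors]                 # precisions
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = residual_ss / (n - 2)                    # residual variance
    se_intercept = math.sqrt(sigma2 * (1.0 / n + mean_x ** 2 / sxx))
    return intercept, se_intercept


def begg_kendall_tau(effects, std_errors):
    """Begg's test statistic: Kendall's tau (no tie correction) between
    standardized deviates of the effects and their variances."""
    variances = [s ** 2 for s in std_errors]
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_var = 1.0 / sum(weights)
    deviates = [(e - pooled) / math.sqrt(v - pooled_var)
                for e, v in zip(effects, variances)]
    n = len(effects)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (deviates[i] - deviates[j]) * (variances[i] - variances[j])
            if product > 0:
                concordant += 1
            elif product < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical data: smaller studies (larger SE) show more negative effects.
example_ses = [0.1, 0.2, 0.3, 0.4, 0.5]
example_effects = [-0.1 - 2.0 * s for s in example_ses]
example_intercept, example_se = egger_intercept(example_effects, example_ses)
example_tau = begg_kendall_tau(example_effects, example_ses)
```

In this constructed example the effects become more negative as the standard error grows, so both statistics are negative, which is the same direction of asymmetry as the pre-registration results reported above.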
Quantitative analyses
A quantitative analysis was performed on the remaining 55 RCTs (see Appendix 9), using pooled random effects, to analyze how each step of the screening process contributed to changes in the estimated effects and heterogeneity of the included data points.
The results of these analyses are presented in Table 5. The numbers of RCTs in each phase of the screening process differ from those presented previously (see Figures 1 and 2) because some RCTs could not be included in the quantitative analysis owing to the methodological heterogeneity outlined above. The standard error remained steady throughout the execution of the ‘trustworthy’ protocol, ranging from .042 to .071 across the five points of RCT screening for ‘trustworthiness’. The p-value increased once only registered RCTs were included and moved from significant (p < .05) to non-significant upon inclusion of only those RCTs with a moderate-to-high PEDro score and low-to-moderate risk of bias. Heterogeneity, as measured by the I2 statistic, decreased from 73.92% (95% CI [64.74, 80.72]) to 0% (95% CI [0.00, 37.44]), and remained at 0%, once RCTs with verified prospective registration were grouped together. The PI became smaller throughout the screening process and proportionally crossed zero in all cases.
Table 5.
Results of quantitative analysis of pooled estimated effects and heterogeneity of randomized controlled trials.
| Screening step | Number of RCTs with two treatment groups that included pain outcome measures | Total number of participants in the intervention group for all time points | Total number of participants in the comparison group for all time points | Total number of participants for all time points | Estimated effects: p-value | Estimated effects: SMD (Effect) | Estimated effects: CI | Estimated effects: SE | Heterogeneity: Tau | Heterogeneity: I2 | Heterogeneity: I2 (95% CI) | Heterogeneity: SD PI | Heterogeneity: PI (95% CI) Min | Heterogeneity: PI (95% CI) Max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RCTs included from SRs | 55 | 2708 | 2691 | 5399 | <.001 | −.415 | −.540 to −.290 | .064 | −6.517 | 80.11 | 75.88 to 83.60 | 6.517 | −16.382 | 15.552 |
| Registered RCTs | 22 | 1642 | 1628 | 3270 | .045 | −.143 | −.282 to −.003 | .071 | −2.008 | 73.92 | 64.74 to 80.72 | 2.009 | −5.066 | 4.780 |
| Verified prospectively registered RCTs | 14 | 1144 | 1141 | 2285 | .049 | −.082 | −.163 to −.0002 | .042 | −1.967 | 0.00 | 0.00 to 37.44 | 1.967 | −4.902 | 4.738 |
| RCTs with PEDro score ≥ 6 | 9 | 801 | 805 | 1606 | .746 | −.016 | −.113 to .081 | .050 | −.325 | 0 | 0.00 to 30.55 | .329 | −.822 | .789 |
| RCTs with low or moderate risk of bias | 8 | 750 | 757 | 1507 | .897 | .003 | −.097 to .103 | .051 | .058 | 0 | 0.00 to 18.55 | .078 | −.187 | .193 |
RCTs: randomized controlled trials, SMD: standardized mean difference, CI: confidence interval, SE: standard error, SD: standard deviation, PI: prediction interval, SRs: systematic reviews.
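To make the pooled random-effects estimate and the I2 statistic in Table 5 more concrete, the following minimal sketch pools a set of effects and reports between-study variance and I2. It assumes the DerSimonian-Laird estimator, which the text does not name, so this is an illustrative assumption rather than a description of the software or estimator used in this review. The function name and the example inputs are hypothetical.

```python
import math


def random_effects_pool(effects, std_errors):
    """DerSimonian-Laird random-effects pooling (assumed estimator).

    Returns (pooled effect, SE of pooled effect, tau^2, I^2 in percent).
    """
    w = [1.0 / s ** 2 for s in std_errors]            # inverse-variance weights
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                     # between-study variance
    w_star = [1.0 / (s ** 2 + tau2) for s in std_errors]
    pooled = sum(wi * e for wi, e in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    # I^2: share of total variability attributed to heterogeneity
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, se, tau2, i2


# Hypothetical inputs: widely scattered effects versus identical effects.
scattered = random_effects_pool([-1.0, 0.0, 1.0], [0.1, 0.1, 0.1])
identical = random_effects_pool([-0.1, -0.1, -0.1], [0.1, 0.2, 0.15])
```

With the scattered inputs almost all variability is attributed to heterogeneity, whereas identical effects give an I2 of 0%, mirroring the contrast between the first and last rows of Table 5.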
Discussion
This methodological review aimed to determine whether publication bias, heterogeneity, and clinical recommendations for neuromusculoskeletal impairments were affected by prospective registration of the SRs, the ‘trustworthiness’ of the RCTs included in those SRs, and the evaluation of high certainty in estimated effects using the GRADE criteria. The answer is ‘yes’ to all three queries. Prospective registration eliminated publication bias, and eliminating ‘untrustworthy’ evidence reduced the heterogeneity of the included data and led to an inability to reject the null when examining between-group effects.
For the SRs, prospective registration resulted in I2 values ranging from low to high, with 18.2% less than 25% and 36.4% higher than 75%; however, SRs that were either retrospectively registered or not registered at all had I2 values that were split between the two ends of the spectrum (low and high), with very few ranging from 25% to 74%. These non-registered or retrospectively registered SRs had 40% of their I2 values less than 25%, but more concerning was that 55% were over 75%. This rate of I2 values over 75% was 18.6 percentage points higher than that of the prospectively registered SRs. Additionally, only the prospectively registered SRs had moderate or high GRADE ratings (alongside some very low and low ratings). In contrast, the non-registered/retrospectively registered SRs only had very low and low ratings. It should be considered whether there is any value in publishing SRs rated as very low or low certainty using the GRADE criteria. Very low and low certainty evidence on the GRADE is interpreted as ‘The true effect is probably markedly different from the estimated effect’ and ‘The true effect might be markedly different from the estimated effect’, respectively. This represents an absence of reliable evidence that can be confidently translated into meaningful practice recommendations. Again, the most honest interpretation is that this evidence cannot be translated into clinical practice recommendations [55]. Should a clinician be faced with a situation where the research evidence no longer supports an intervention they have been utilizing, they should conduct a thorough review of the current literature, consider alternative evidence-based interventions, collaborate with colleagues, and possibly modify their practice.
For the RCTs, the most noteworthy findings are related to the change in significance noted via the interpretation of the p-value and the data heterogeneity via the I2 statistic. Heterogeneity was not present once verified prospective registration was considered, as interpreted from the I2 statistic, from our pooled data set with a reduction from 73.92%, 95% CI [64.74, 80.72] (observed in all registered RCTs) to 0%, 95% CI [0.00, 37.44] (RCTs with a verified prospective registry). The decrease in heterogeneity and increase in null findings is very similar to what was previously noted by Kaplan et al. [16] once prospective registration was required for RCTs to acquire funding. Our findings demonstrate that retrospective registration does not reduce the variation in estimated effects in the literature. Statistical significance (p = .049) was still evident at this point in our methodology, and a narrow CI did not cross zero, albeit with a small effect (−.082). Statistical significance did not exist (p = .746) in the pooled estimated effects in the next step of our methodology, once external and internal validity were assessed via the PEDro scores, requiring a score ≥ 6. Therefore, although verifiable prospective intent does appear to solve the issue of publication bias and unexplained heterogeneity, the validity of the study design seems to make a difference regarding the accuracy of statistical significance testing assessed using p-values. None of the observed effects, ‘trustworthy’ or otherwise, met the MDC for the variables in question, meaning there was a lack of certainty in estimated effects based on the GRADE criteria regardless of our screening process.
Effect sizes that met the MDC were absent in the final eight ‘trustworthy’ RCTs because all eight reported null effects for their between-group comparisons. It has been reported that scores on the NPRS and the VAS are highly correlated [67,68], and even using the lowest possible interpretation of MDC values reported in the literature (i.e. 2 points) [69–72], the lower bound of the PI (95% CI) across the included RCTs before the ‘trustworthy’ screening process was −16.382 (see Table 5). Because we scaled the data to a 100-point range to combine the values for the VAS and NPRS, the MDC values reported above must be multiplied by 10 (i.e. 20 points on the 100-point scale). This makes the lack of meaningful effects across the included RCTs even more pronounced, given that, at best, there was still no meaningful effect before the initiation of the screening process. This is notable given that at this point, there were statistically significant effects across the included studies via interpretation of the p-value (<.001). More striking was the effect observed once only the eight ‘trustworthy’ RCTs were considered, with the lower bound of the PI (95% CI) being −.187.
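The comparison described above reduces to simple arithmetic, sketched below under the assumption, taken from the text, that the PI bounds in Table 5 can be read on the same 100-point scale as the rescaled MDC. The function name and its parameters are hypothetical and are introduced only for illustration.

```python
def reaches_mdc(pi_lower, pi_upper, mdc_points=2.0, scale_factor=10.0):
    """Rescale the MDC to the 100-point scale and check whether either
    prediction-interval bound is at least that large in magnitude."""
    mdc_scaled = mdc_points * scale_factor
    reaches = abs(pi_lower) >= mdc_scaled or abs(pi_upper) >= mdc_scaled
    return mdc_scaled, reaches


# PI bounds taken from Table 5.
before_screening = reaches_mdc(-16.382, 15.552)   # RCTs included from SRs
trustworthy_only = reaches_mdc(-0.187, 0.193)     # low or moderate risk of bias
print(before_screening)   # (20.0, False)
print(trustworthy_only)   # (20.0, False)
```

In both cases the rescaled MDC of 20 points lies outside the prediction interval, which is the basis for the statement that no meaningful effect was present before or after screening.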
The interpretation of the data based on statistical inference through estimation, consistent with the position statement of the ISPJE, did not change under any condition. The p-value and standardized mean effect changed, but the effects were not meaningful (through interpretation of the MDC) even at the start. This suggests that the average treatment effect identifies what may be meaningful for most patients, and the variability of the effect is the most appropriate way to determine whether the effect will likely be clinically meaningful and applicable to an individual patient [73–75]. Through the interpretation of the PI at each step of the screening process, we observed the interpretation of PT shift from ‘treatment may be moderately or trivially harmful, have no effect, or have a trivial or moderate benefit’ [73] to ‘treatment has no effect, is trivially effective, or is trivially harmful’ [73] when looking at the trustworthy RCTs. In applying the evidence to an individual patient in the clinic based on the variability of the effect, a reasoned approach would be to choose a variability that is interpreted as the ‘Treatment is effective, but the effect is too small to be meaningful’ [73] at a minimum. In a patient-centered model, a patient should be informed if, based on the variability of the effect, there is a chance that the treatment may harm them.
The screening process employed in this methodological review revealed an absence of evidence and created a strong case for evidence of absence. Our findings provide data-driven support for the challenges of using null hypothesis statistical testing and p-values when interpreting research results and support the position statement of the ISPJE that statistical inference through estimation using effect sizes and CIs should be used when making decisions regarding the clinical efficacy of an intervention in a patient-centered model [26].
Limitations
Several unforeseen trial designs were encountered, including RCTs reporting multiple pain comparisons across multiple pain assessments (i.e. at rest versus with activity) and numerous group comparisons, apart from repeated comparisons across different time points. This made quantitative analysis difficult and would have increased the heterogeneity of the included data had all of these results been included in our analysis. We were also surprised to find data missing from some RCTs, requiring us to proceed with our analysis only with those with complete data for accurate reporting. The limited number of ‘trustworthy’ RCTs within the analyzed SRs also remains a challenge. Additionally, the findings of this study are only generalizable to the sampling methods we used and may not be generalizable to all physiotherapy SRs and RCTs published. Finally, the clinical variability of conditions and treatments included should have increased the variability of our findings, which should be considered when examining our findings.
Conclusions
This methodological review examined the effect of ‘trustworthiness’ on the evidence for treating persons with neuromusculoskeletal conditions with musculoskeletal PT interventions. ‘Untrustworthy’ evidence indicates statistically significant findings with high levels of heterogeneity. The current state of the ‘trustworthy’ evidence reports non-significant findings with no heterogeneity. In short, our current best evidence consistently indicates no meaningful between-group effects for PT interventions in decreasing patients’ pain compared to other treatments, sham interventions, or no treatment. Based on the selected journals included in this methodological review and two previously published studies [15,16], findings of null effects in PT research may be more likely in studies where prospective validity is verified and that are moderate to high in methodological quality. This suggests that significant findings should be more clinically meaningful when discovered and scrutinized to ensure they can be ‘trusted’ before publication. More research is needed to determine whether this is true for all published PT research.
Biographies
Daniel W. Flowers is an Associate Professor in the Doctor of Physical Therapy, PhD in Rehabilitation Sciences, and orthopedic residency programs at LSU Health Shreveport. He is board-certified in orthopedic physical therapy. His research interests include modifying the gait impairments of patients with knee osteoarthritis, systematic reviews, clinical reasoning, post-traumatic rehabilitation, and educational outcomes of physical therapy students.
Brian T. Swanson is an Associate Professor at the University of Hartford. He serves as Chair of the Department of Rehabilitation Sciences, Director of the Doctor of Physical Therapy Program, and co-director of the University of Hartford/HHCRN orthopedic physical therapy residency program. He is a board certified specialist in orthopedic physical therapy and a Fellow of the American Academy of Orthopaedic Manual Physical Therapists. Dr. Swanson’s research interests include validating tests and measures in orthopedic manual physical therapy, developing a further understanding of the mechanisms of manual physical therapy interventions, and evidence-based practice/research methodology.
Stephen M. Shaffer is a residency and fellowship trained clinical specialist, educator, and scientist with twenty-one years of experience in the physiotherapy profession. He also earned an academic doctorate in advanced orthopedic physical therapy. He has worked primarily in orthopedic settings and is a Fellow of the Canadian Academy of Manipulative Physiotherapy as well as the American Academy of Orthopaedic Manual Physical Therapists. Dr. Shaffer has co-authored numerous peer-reviewed scientific papers and has presented at local, state, national, and international venues. He also has experience lecturing at both the doctoral and post-professional levels.
Derek Clewley’s area of expertise and training is in orthopedics and manual physical therapy. He has achieved board certification in orthopedics and is recognized as a fellow of the American Academy of Orthopaedic Manual Physical Therapists. He is the associate editor of BMC Musculoskeletal Disorders and an AAOMPT Board Member.
Matthew Martin is an instructor at LSU Health Shreveport. He completed a neurologic physical therapy residency program at LSU Health Shreveport in July 2020. He became a board-certified clinical specialist in neurologic physical therapy in June 2021 with subsequent subspecialty certification in vestibular rehabilitation in 2022. He has presented research on dual-task training following CNS infection, blood flow restriction training in individuals with incomplete spinal cord injury, and vestibular habituation to reduce motion sensitivity related to chemotherapy treatment at state and national conferences.
Nicholas A. Russell is an Instructor of Physical Therapy at LSU Health Shreveport. He is residency trained in orthopedic physical therapy and is a Board-Certified Clinical Specialist in Orthopaedic Physical Therapy through the American Board of Physical Therapy Specialties. Dr. Russell is a Certified Strength and Conditioning Specialist through the NSCA and is certified in Dry Needling. He practices in the faculty rehabilitation clinic within the School of Allied Health Professions where he treats acute, chronic, and post-surgical orthopedic conditions of all types.
Sean P. Riley is a Lead Physical Therapist in the Hartford Healthcare Rehabilitation Network. He is board-certified in orthopedics and a Fellow of the American Academy of Orthopaedic Manual Physical Therapists. Dr. Riley’s research interests include symptom modification approaches to evaluating and treating neuromusculoskeletal disorders, evidence-based practice, research methodology, and clinical reasoning.
Funding Statement
The author(s) reported there is no funding associated with the work featured in this article.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Supplementary material
Supplemental data for this article can be accessed online at https://doi.org/10.1080/10669817.2025.2464548.
References
- [1]. Satpute K, Reid S, Mitchell T, et al. Efficacy of mobilization with movement (MWM) for shoulder conditions: a systematic review and meta-analysis. J Man Manip Ther. 2022;30(1):13–32. doi: 10.1080/10669817.2021.1955181
- [2]. Paraskevopoulos E, Plakoutsis G, Chronopoulos E, et al. Effectiveness of combined program of manual therapy and exercise vs exercise only in patients with rotator cuff-related shoulder pain: a systematic review and meta-analysis. Sports Health. 2022;15(5):727–735. doi: 10.1177/19417381221136104
- [3]. Saito H, Harrold ME, Cavalheri V, et al. Scapular focused interventions to improve shoulder pain and function in adults with subacromial pain: a systematic review and meta-analysis. Physiother Theory Pract. 2018;34(9):653–670. doi: 10.1080/09593985.2018.1423656
- [4]. Piano L, Ritorto V, Vigna I, et al. Individual patient education for managing acute and/or subacute low back pain: little additional benefit for pain and function compared to placebo. A systematic review with meta-analysis of randomized controlled trials. J Orthop Sports Phys Ther. 2022;52(7):432–445. doi: 10.2519/jospt.2022.10698
- [5]. Runge N, Aina A, May S. The benefits of adding manual therapy to exercise therapy for improving pain and function in patients with knee or hip osteoarthritis: a systematic review with meta-analysis. J Orthop Sports Phys Ther. 2022;52(10):675–A13. doi: 10.2519/jospt.2022.11062
- [6]. Gränicher P, Mulder L, Lenssen T, et al. Prehabilitation improves knee functioning before and within the first year after total knee arthroplasty: a systematic review with meta-analysis. J Orthop Sports Phys Ther. 2022;52(11):709–725. doi: 10.2519/jospt.2022.11160
- [7]. Prat-Luri A, de Los Rios-Calonge J, Moreno-Navarro P, et al. Effect of trunk-focused exercises on pain, disability, quality of life, and trunk physical fitness in low back pain and how potential effect modifiers modulate their effects: a systematic review with meta-analyses. J Orthop Sports Phys Ther. 2023;53(2):64–93. doi: 10.2519/jospt.2023.11091
- [8]. Devonshire JJ, Wewege MA, Hansford HJ, et al. Effectiveness of cognitive functional therapy for reducing pain and disability in chronic low back pain: a systematic review and meta-analysis. J Orthop Sports Phys Ther. 2023;53(5):244–285. doi: 10.2519/jospt.2023.11447
- [9]. Mueller J, Weinig J, Niederer D, et al. Resistance, motor control and mindfulness-based exercises are effective for treating chronic non-specific neck pain: a systematic review with meta-analysis and dose-response meta-regression. J Orthop Sports Phys Ther. 2023;53(8):420–459. doi: 10.2519/jospt.2023.11820
- [10]. Israel H, Richter RR. A guide to understanding meta-analysis. J Orthop Sports Phys Ther. 2011;41(7):496–504. doi: 10.2519/jospt.2011.3333
- [11]. Fletcher J. What is heterogeneity and is it important? BMJ. 2007;334(7584):94–96. doi: 10.1136/bmj.39057.406644.68
- [12]. Deeks JJ, Higgins JPT, Altman DG, et al. Chapter 10: Analysing data and undertaking meta-analyses [last updated November 2024]. In: Higgins JPT, Thomas J, Chandler J, et al. (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.5. Cochrane, 2024. Available from www.training.cochrane.org/handbook.
- [13]. Higgins JP, Thompson SG, Deeks JJ, et al. Measuring inconsistency in meta-analyses. BMJ. 2003;327(7414):557–560. doi: 10.1136/bmj.327.7414.557
- [14]. Higgins JP. Commentary: heterogeneity in meta-analysis should be expected and appropriately quantified. Int J Epidemiol. 2008;37(5):1158–1160. doi: 10.1093/ije/dyn204
- [15]. O’Connell NE, Moore R, Stewart G, et al. Trials we cannot trust: investigating their impact on systematic reviews and clinical guidelines in spinal pain. J Pain. 2023;24(12):2103–2130. doi: 10.1016/j.jpain.2023.07.003
- [16]. Kaplan RM, Irvin VL, Garattini S. Likelihood of null effects of large NHLBI clinical trials has increased over time. PLOS ONE. 2015;10(8):e0132382. doi: 10.1371/journal.pone.0132382
- [17]. Costa LOP, Lin C-W, Grossi DB, et al. Clinical trial registration in physiotherapy journals: recommendations from the international society of physiotherapy journal editors. J Orthop Sports Phys Ther. 2012;42(12):978–981. doi: 10.2519/jospt.2012.0111
- [18]. Riley SP, Swanson BT, Shaffer SM, et al. The unknown prevalence of postrandomization bias in 15 physical therapy journals: a methods review. J Orthop Sports Phys Ther. 2021;51(11):542–550. doi: 10.2519/jospt.2021.10491
- [19]. Riley SP, Swanson BT, Shaffer SM, et al. Protocol for the development of a ‘trustworthy’ living systematic review and meta analyses of manual therapy interventions to treat neuromusculoskeletal impairments. J Man Manip Ther. 2022;31(4):220–230. doi: 10.1080/10669817.2022.2119528
- [20]. Riley SP, Shaffer SM, Flowers DW, et al. Manual therapy for non-radicular cervical spine related impairments: establishing a ‘Trustworthy’ living systematic review and meta-analysis. J Man Manip Ther. 2023;31(4):231–245. doi: 10.1080/10669817.2023.2201917
- [21]. Riley SP, Swanson BT, Shaffer SM, et al. Does manual therapy meaningfully change quantitative sensory testing and patient reported outcome measures in patients with musculoskeletal impairments related to the spine?: A ‘trustworthy’ systematic review and meta-analysis. J Man Manip Ther. 2023;32(1):51–66. doi: 10.1080/10669817.2023.2247235
- [22]. Riley SP, Swanson BT, Cook CE. “Trustworthiness,” confidence in estimated effects, and confidently translating research into clinical practice. Arch Physiother. 2023;13(1):1–5. doi: 10.1186/s40945-023-00162-9
- [23]. Anvari F, Lakens D. The replicability crisis and public trust in psychological science. Compr Results Soc Psychol. 2018;3(3):266–286. doi: 10.1080/23743603.2019.1684822
- [24]. Stanley TD, Carter EC, Doucouliagos H. What meta-analyses reveal about the replicability of psychological research. Psychol Bull. 2018;144(12):1325–1346. doi: 10.1037/bul0000169
- [25]. Riley SP, Swanson BT, Shaffer SM, et al. Why do ‘trustworthy’ living systematic reviews matter? J Man Manip Ther. 2023;31(4):215–219. doi: 10.1080/10669817.2023.2229610
- [26]. Elkins MR, Pinto RZ, Verhagen A, et al. Statistical inference through estimation: recommendations from the International Society of Physiotherapy Journal Editors. Physiotherapy. 2022;115:A1–A6. doi: 10.1016/j.physio.2021.12.003
- [27]. Riley SP, Swanson BT, Shaffer SM, et al. Is the quality of systematic reviews influenced by prospective registration: a methods review of systematic musculoskeletal physical therapy reviews. J Man Manip Ther. 2023;31(3):184–197. doi: 10.1080/10669817.2022.2110419
- [28]. Page MJ, Moher D, Bossuyt PM, et al. PRISMA 2020 explanation and elaboration: updated guidance and exemplars for reporting systematic reviews. BMJ. 2021;372:n160. doi: 10.1136/bmj.n160
- [29]. Oliveira CB, Elkins MR, Lemes ÍR, et al. A low proportion of systematic reviews in physical therapy are registered: a survey of 150 published systematic reviews. Braz J Phys Ther. 2018;22(3):177–183. doi: 10.1016/j.bjpt.2017.09.009
- [30]. Ewald H, Klerings I, Wagner G, et al. Searching two or more databases decreased the risk of missing relevant studies: a metaresearch study. J Clin Epidemiol. 2022;149:154–164. doi: 10.1016/j.jclinepi.2022.05.022
- [31]. McGowan J, Sampson M, Salzwedel DM, et al. PRESS peer review of electronic search strategies: 2015 guideline statement. J Clin Epidemiol. 2016;75(Supplement C):40–46. doi: 10.1016/j.jclinepi.2016.01.021
- [32]. Furlan ADMM, Chou MD. Updated method guideline for systematic reviews in the Cochrane back and neck group. Spine (Phila Pa 1976). 2015;40(21):1660–1673. doi: 10.1097/BRS.0000000000001061
- [33]. Lefebvre C, Glanville J, Briscoe S, et al. Chapter 4: Searching for and selecting studies [last updated September 2024]. In: Higgins JPT, Thomas J, Chandler J, et al. (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.5. Cochrane, 2024. Available from www.training.cochrane.org/handbook.
- [34]. Blanpied PR, Gross AR, Elliott JM, et al. Neck pain: revision 2017. J Orthop Sports Phys Ther. 2017;47(7):A1–A83. doi: 10.2519/jospt.2017.0302
- [35]. Pustejovsky JE, Rodgers MA. Testing for funnel plot asymmetry of standardized mean differences. Res Synth Methods. 2019;10(1):57–71. doi: 10.1002/jrsm.1332
- [36]. Sterne JA, Sutton AJ, Ioannidis JP, et al. Recommendations for examining and interpreting funnel plot asymmetry in meta-analyses of randomised controlled trials. BMJ. 2011;343:d4002. doi: 10.1136/bmj.d4002
- [37]. Borenstein M. Research note: In a meta-analysis, the I(2) index does not tell us how much the effect size varies across studies. J Physiother. 2020;66(2):135–139. doi: 10.1016/j.jphys.2020.02.011
- [38]. Lee DK. Alternatives to P value: confidence interval and effect size. Korean J Anesthesiol. 2016;69(6):555–562. doi: 10.4097/kjae.2016.69.6.555
- [39]. Tagliaferri SD, Mitchell UH, Saueressig T, et al. Classification approaches for treating low back pain have small effects that are not clinically meaningful: a systematic review with meta-analysis. J Orthop Sports Phys Ther. 2022;52(2):67–84. doi: 10.2519/jospt.2022.10761
- [40]. Schwind J, Learman K, O’Halloran B, et al. Different minimally important clinical difference (MCID) scores lead to different clinical prediction rules for the Oswestry disability index for the same sample of patients. J Man Manip Ther. 2013;21(2):71–78. doi: 10.1179/2042618613Y.0000000028
- [41].Bernet BA, Peskura ET, Meyer ST, et al. The effects of hip-targeted physical therapy interventions on low back pain: a systematic review and meta-analysis. Musculoskelet Sci Pract. 2019;39:91–100. doi: 10.1016/j.msksp.2018.11.016 [DOI] [PubMed] [Google Scholar]
- [42].Bizzarri P, Buzzatti L, Cattrysse E, et al. Thoracic manual therapy is not more effective than placebo thoracic manual therapy in patients with shoulder dysfunctions: a systematic review with meta-analysis. Musculoskelet Sci Pract. 2018;33:1–10. doi: 10.1016/j.msksp.2017.10.006 [DOI] [PubMed] [Google Scholar]
- [43].Campo M, Zadro JR, Pappas E, et al. The effectiveness of biofeedback for improving pain, disability and work ability in adults with neck pain: a systematic review and meta-analysis. Musculoskelet Sci Pract. 2021;52:102317. doi: 10.1016/j.msksp.2021.102317 [DOI] [PubMed] [Google Scholar]
- [44].Ceballos-Laita L, Estebanez-de-Miguel E, Jimenez-Rejano JJ, et al. The effectiveness of hip interventions in patients with low-back pain: a systematic review and meta-analysis. Braz J Phys Ther. 2023;27(2):100502. doi: 10.1016/j.bjpt.2023.100502 [DOI] [PMC free article] [PubMed] [Google Scholar]
- [45].de Oliveira Silva D, Pazzinatto MF, Rathleff MS, et al. Patient education for patellofemoral pain: a systematic review. J Orthop Sports Phys Ther. 2020;50(7):388–396. doi: 10.2519/jospt.2020.9400 [DOI] [PubMed] [Google Scholar]
- [46].de Oliveira NT, Lopez P, Severo-Silveira L, et al. Dose-response effect of lower limb resistance training volume on pain and function of women with patellofemoral pain: a systematic review and meta-regression lower limb resistance training in women with patellofemoral pain. Phys Ther Sport. 2023;63:95–103. doi: 10.1016/j.ptsp.2023.07.006 [DOI] [PubMed] [Google Scholar]
- [47].Garzonio S, Arbasetti C, Geri T, et al. Effectiveness of specific exercise for deep cervical muscles in nonspecific neck pain: a systematic review and meta-analysis. Phys Ther. 2022;102(5):zac001. doi: 10.1093/ptj/pzac001 [DOI] [PubMed] [Google Scholar]
- [48].Lew J, Kim J, Nair P. Comparison of dry needling and trigger point manual therapy in patients with neck and upper back myofascial pain syndrome: a systematic review and meta-analysis. J Man Manip Ther. 2021;29(3):136–146. doi: 10.1080/10669817.2020.1822618 [DOI] [PMC free article] [PubMed] [Google Scholar]
- [49].Ma J, Zhang T, He Y, et al. Effect of aquatic physical therapy on chronic low back pain: a systematic review and meta-analysis. BMC Musculoskelet Disord. 2022;23(1):1050. doi: 10.1186/s12891-022-05981-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
- [50].Ramírez-Vélez R, Hormazábal-Aguayo I, Izquierdo M, et al. Effects of kinesio taping alone versus sham taping in individuals with musculoskeletal conditions after intervention for at least one week: a systematic review and meta-analysis. Physiotherapy. 2019;105(4):412–420. doi: 10.1016/j.physio.2019.04.001 [DOI] [PubMed] [Google Scholar]
- [51].Sørensen PW, Nim CG, Poulsen E, et al. Spinal manipulative therapy for nonspecific low back pain: does targeting a specific vertebral level make a difference?: A systematic review with meta-analysis. J Orthop Sports Phys Ther. 2023;53(9):529–539. doi: 10.2519/jospt.2023.11962 [DOI] [PubMed] [Google Scholar]
- [52].Vanti C, Panizzolo A, Turone L, et al. Effectiveness of mechanical traction for lumbar radiculopathy: a systematic review and meta-analysis. Phys Ther. 2021;101(3):zaa231. doi: 10.1093/ptj/pzaa231 [DOI] [PubMed] [Google Scholar]
- [53].Vier C, de Almeida MB, Neves ML, et al. The effectiveness of dry needling for patients with orofacial pain associated with temporomandibular dysfunction: a systematic review and meta-analysis. Braz J Phys Ther. 2019;23(1):3–11. doi: 10.1016/j.bjpt.2018.08.008 [DOI] [PMC free article] [PubMed] [Google Scholar]
- [54].Shea BJ, Reeves BC, Wells G, et al. AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ. 2017. Sep 21;358:j4008. doi: 10.1136/bmj.j4008 [DOI] [PMC free article] [PubMed] [Google Scholar]
- [55].Feres M, Feres MFN. Absence of evidence is not evidence of absence. J Appl Oral Sci. 2023 Mar 27;31:ed001. doi: 10.1590/1678-7757-2023-ed001
- [56].Almeida GPL, Das Neves Rodrigues HL, Coelho BAL, et al. Anteromedial versus posterolateral hip musculature strengthening with dose-controlled in women with patellofemoral pain: a randomized controlled trial. Phys Ther Sport. 2021;49:149–156. doi: 10.1016/j.ptsp.2021.02.016
- [57].Bade M, Cobo‐Estevez M, Neeley D, et al. Effects of manual therapy and exercise targeting the hips in patients with low‐back pain—A randomized controlled trial. J Eval Clin Pract. 2017;23(4):734–740. doi: 10.1111/jep.12705
- [58].Burns SA, Cleland JA, Rivett DA, et al. When treating coexisting low back pain and hip impairments, focus on the back: adding specific hip treatment does not yield additional benefits—A randomized controlled trial. J Orthop Sports Phys Ther. 2021;51(12):581–601. doi: 10.2519/jospt.2021.10593
- [59].Medeiros SAD, Silva HJDA, Nascimento RMD, et al. Mat pilates is as effective as aquatic aerobic exercise in treating women with fibromyalgia: a clinical, randomized and blind trial. Adv Rheumatol. 2020;60(1):21. doi: 10.1186/s42358-020-0124-2
- [60].de Oliveira RF, Costa LOP, Nascimento LP, et al. Directed vertebral manipulation is not better than generic vertebral manipulation in patients with chronic low back pain: a randomised trial. J Physiother. 2020;66(3):174–179. doi: 10.1016/j.jphys.2020.06.007
- [61].de Oliveira RF, Liebano RE, Costa LDCM, et al. Immediate effects of region-specific and non–region-specific spinal manipulative therapy in patients with chronic low back pain: a randomized controlled trial. Phys Ther. 2013;93(6):748–756. doi: 10.2522/ptj.20120256
- [62].Dolak KL, Silkman C, McKeon JM, et al. Hip strengthening prior to functional exercises reduces pain sooner than quadriceps strengthening in females with patellofemoral pain syndrome: a randomized clinical trial. J Orthop Sports Phys Ther. 2011;41(8):560–570. doi: 10.2519/jospt.2011.3499
- [63].Donaldson M, Petersen S, Cook C, et al. A prescriptively selected nonthrust manipulation versus a therapist-selected nonthrust manipulation for treatment of individuals with low back pain: a randomized clinical trial. J Orthop Sports Phys Ther. 2016;46(4):243–250. doi: 10.2519/jospt.2016.6318
- [64].Parreira Pdo C, da Costa C, Takahashi R, et al. Kinesio taping to generate skin convolutions is not better than sham taping for people with chronic non-specific low back pain: a randomised trial. J Physiother. 2014;60(2):90–96. doi: 10.1016/j.jphys.2014.05.003
- [65].Silva NC, de Castro Silva M, Tamburús NY, et al. Adding neuromuscular training to a strengthening program did not produce additional improvement in clinical or kinematic outcomes in women with patellofemoral pain: a blinded randomised controlled trial. Musculoskelet Sci Pract. 2023;63:102720. doi: 10.1016/j.msksp.2023.102720
- [66].Higgins JPT, Eldridge S, Li T. Chapter 23: including variants on randomized trials [last updated October 2019]. In: Higgins J, Thomas J, Chandler J, Cumpston M, Li T, Page M, Welch V, editors. Cochrane handbook for systematic reviews of interventions version 6.5. Cochrane; 2024. Available from: www.training.cochrane.org/handbook
- [67].Shafshak TS, Elnemr R. The visual analogue scale versus numerical rating scale in measuring pain severity and predicting disability in low back pain. J Clin Rheumatol. 2021;27(7):282–285. doi: 10.1097/RHU.0000000000001320
- [68].Bijur PE, Latimer CT, Gallagher EJ. Validation of a verbally administered numerical rating scale of acute pain for use in the emergency department. Acad Emerg Med. 2003;10(4):390–392. doi: 10.1197/aemj.10.4.390
- [69].Childs JD, Piva SR, Fritz JM. Responsiveness of the numeric pain rating scale in patients with low back pain. Spine (Phila Pa 1976). 2005;30(11):1331–1334. doi: 10.1097/01.brs.0000164099.92112.29
- [70].Selhorst M, Rice W, Degenhart T, et al. Evaluation of a treatment algorithm for patients with patellofemoral pain syndrome: a pilot study. Int J Sports Phys Ther. 2015;10(2):178.
- [71].Modarresi S, Lukacs MJ, Ghodrati M, et al. A systematic review and synthesis of psychometric properties of the numeric pain rating scale and the visual analog scale for use in people with neck pain. Clin J Pain. 2022;38(2):132–148. doi: 10.1097/AJP.0000000000000999
- [72].Mintken PE, Glynn P, Cleland JA. Psychometric properties of the shortened disabilities of the arm, shoulder, and hand questionnaire (QuickDASH) and numeric pain rating scale in patients with shoulder pain. J Shoulder Elbow Surg. 2009;18(6):920–926. doi: 10.1016/j.jse.2008.12.015
- [73].Kamper SJ. Confidence intervals: linking evidence to practice. J Orthop Sports Phys Ther. 2019;49(10):763–764. doi: 10.2519/jospt.2019.0706
- [74].Kamper SJ. Showing confidence (intervals). Braz J Phys Ther. 2019;23(4):277–278. doi: 10.1016/j.bjpt.2019.01.003
- [75].Elkins MR, Pinto RZ, Verhagen A, et al. Corrigendum to ‘statistical inference through estimation: recommendations from the international society of physiotherapy journal editors’. J Physiother. 2022;68(2):89. doi: 10.1016/j.jphys.2022.03.008
