ABSTRACT
Introduction
Reliable, valid, and responsive outcome measures are foundational to addressing concerns about the risks and benefits of performing spinal manipulation and mobilization in pediatric populations. The aim of this systematic review was to synthesize evidence on measurement properties from cohort/case-control/cross-sectional/randomized studies on patient-reported (SQLI – Scoliosis Quality of Life Index; VAS – Visual Analog Scale; PAQLQ – Pediatric Asthma Quality of Life Questionnaire), observer-reported (Crying Diaries; ATEC – Autism Treatment Evaluation Checklist) and mixed (PedsQL – Pediatric Quality of Life Inventory) outcome measurements identified through a scoping review on manipulation and mobilization for pediatric populations with diverse medical conditions.
Method and Analysis
Electronic databases, ClinicalTrials.gov and Ebsco Open Dissertations were searched up to 21 October 2022. Two independent reviewers selected studies, extracted data, and assessed risk of bias. Qualitative synthesis was performed using COSMIN and Cochrane GRADE methodology to establish the certainty of evidence and overall rating: sufficient (+), insufficient (-), inconsistent (±), indeterminate (?).
Results
Eighteen studies (2 SQLI for scoliosis; 1 VAS – perceived influence of exertion or movement/position on low back problems; 1 PAQLQ for asthma; 1 Crying Diaries for infantile colic; 8 ATEC for autism; 5 PedsQL for cerebral palsy/scoliosis/healthy) with 9653 participants were selected. ATEC and PedsQL had overall sufficient (+) measurement properties with moderate certainty evidence. PAQLQ had indeterminate measurement properties with moderate certainty evidence. Very low certainty of evidence identified measurement properties to be indeterminate (?) for SQLI, Crying Diaries, and VAS- perceived influence of exertion or movement/position on low back problems.
Conclusion
ATEC for autism and PedsQL for asthma may be suitable clinical outcome assessments (COAs); additional validation studies on responsiveness and the minimal important difference are needed. Other COAs require further validation.
KEYWORDS: Psychometrics; outcome measures, mobilization, manipulation, pediatric
1. Introduction
There is a great need to accurately map the scientific evidence on manipulation and mobilization using clinical outcome assessments (COAs) with sufficient psychometric measurement properties (Box 1). The safety of spinal manipulation and mobilization for pediatric populations presenting with diverse conditions is of concern and is being questioned by policy makers, clinicians, and guardians internationally. The use of spinal manipulation and mobilization for pediatric populations (infants, children, and adolescents) with diverse clinical conditions has been associated with risks and uncertainty of treatment effect, so much so that the Australian state of Victoria has restricted its practice in infants [1]. In a scoping review on spinal manipulation and mobilization for pediatric populations, several COAs were identified as being used to assess the effectiveness of manipulation and mobilization therapy (Figure 1 [2]).
Box 1.
Clinical outcome assessments (COA) with defined psychometric measurement properties are categorized into four outcome measures as defined by [3].
Patient-reported outcome measure
A measurement based on a patient’s self-report of difficulty completing a task or feelings experienced during activities of daily living.
Observer-reported outcome measure
A measurement based on observable signs, events or behaviors related to a patient’s health condition by someone other than the patient or a health professional. Generally, observer-reported outcomes are reported by a parent, caregiver, or someone who observes the patient in daily life. They are particularly useful for patients who cannot report for themselves (e.g. infants or individuals who are cognitively impaired). An observer-reported outcome measure does not include medical judgment or interpretation.
Clinician-reported outcome measure
A measurement based on observations of a patient’s health condition from a trained health-care professional. Most clinician-reported outcome measures involve a clinical judgment or interpretation of the observable signs, behaviors, or other manifestations related to a disease or condition. Clinician-reported outcome measures cannot directly assess symptoms that are known only to the patient.
Performance-based outcome measure
A measurement based on standardized task(s) performed by a patient that is administered and evaluated by an appropriately trained individual or is independently completed.
Psychometric measurement property definitions of three domains
-Reliability
‘The degree to which the measurement is free from measurement error’ [4]. This domain contains three measurement properties: internal consistency, reliability (test-retest, intra/inter-rater), and measurement error.
-Validity
‘The degree to which (an instrument) is an adequate reflection of the construct to be measured’ [3].
If an instrument does not have adequate construct or content validity, then it may not be assessing the skills that it purports to. This domain contains three measurement properties: content validity, construct validity (structural, hypothesis testing, cross-cultural), and criterion validity.
-Responsiveness
The ability to detect change over time in the construct to be measured [3,4].
Figure 1.

Clinical outcome assessment (COA) categories that were identified by a scoping review on manipulation and mobilization in pediatric populations [2]. Patient-reported and observer-reported outcome measures are reported within this systematic review, while clinician-reported and performance-based outcomes are part of a larger systematic review project. All outcomes were included in our search; however, the bolded outcomes were those identified to have psychometric properties for pediatric populations.
When evaluating the studies reported in this scoping review, it was determined that psychometric properties of the COAs used were poorly reported. Specifically, it was concluded that 14/69 (20%) studies provided supporting evidence of psychometric properties and 0/69 (0%) provided psychometric values. Knowing the psychometric properties of key COAs has a critical role in determining the effects on outcomes such as pain, function, motor development, mobility difficulties, and participation in life activities. The aim of this systematic review was to systematically evaluate the psychometric properties of key patient-reported outcome measures and observer-reported outcome measures identified by the Task Force on Spinal Manipulation in Children scoping review for infants, children, and adolescents aged birth up to 18 years across varied medical conditions. We evaluated content validity and internal structure (structural validity, internal consistency), reliability, construct validity (structural, hypothesis testing, cross-cultural), criterion validity, and responsiveness.
2. Method
This systematic review was designed in accordance with the PRISMA statement [5] and Consensus-based Standards for the Selection of Health Measurement Instruments (COSMIN) guideline for systematic reviews [6]. The protocol was registered with Open Science Framework (DOI 10.17605/OSF.IO/RN4UX). This review was part of a larger project designed to depict COAs by subcategories: 1. Patient-reported and Observer-reported outcomes; 2. Clinician-reported and Performance-based outcomes [7].
2.1. Eligibility criteria
Type of participants: Paediatric patients (birth up to 18 years) with varied medical conditions identified by a scoping review [2]: Enuresis, Otitis Media, Colic, Excessive Crying/Colic, Breastfeeding, Low Back Pain, Headaches, Cerebral Palsy, Neck Pain, Scoliosis, Attention Deficit Disorder, Autism, Torticollis, Asthma, Chronic Respiratory Illness, KISS (kinetic imbalances due to suboccipital strain), and Dysfunctional Voiding. All conditions were considered, including healthy participants. Studies in which more than 20% of the population was older than 18 years were excluded.
Type of clinical outcome assessment: The psychometric properties of patient-reported and observer-reported outcome measures.
Type of study design: Cohort, case-control, cross-sectional, case-series, randomized controlled trials. Questionnaires, surveys, screening tools, case reports and studies that assessed a version of the COA that had been superseded were excluded.
Type of psychometric outcome: Studies that addressed a minimum of one measurement property or aspect of a measurement property from any of the three domains:
Validity: content validity (i.e. relevance, comprehensiveness, comprehensibility, outcome measurement items assessed and appropriately worded), construct validity (structural validity, hypothesis testing, cross-cultural validity), criterion validity
Reliability: internal consistency, test-retest reliability, intra-rater reliability, inter-rater reliability, measurement error (i.e. standard error of measurement (SEM), smallest detectable change (SDC) or limits of agreement (LoA))
Responsiveness (i.e. SDC, minimal important change, area under the receiver operating characteristic (ROC) curve)
Interpretability (i.e. floor and ceiling effect)
We excluded trials where the COA was used as an outcome measure or in validation of another instrument.
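The measurement-error statistics named in the eligibility criteria (SEM, SDC, LoA) can be illustrated with a minimal Python sketch. It assumes the commonly used formulas SEM = SD × √(1 − ICC), SDC = 1.96 × √2 × SEM, and Bland-Altman limits of agreement (mean difference ± 1.96 × SD of the differences); the included studies may have used other variants, and all function names and input values below are hypothetical.

```python
import math
from statistics import mean, stdev


def standard_error_of_measurement(sd, icc):
    """SEM from the sample SD and a reliability coefficient (assumed formula)."""
    return sd * math.sqrt(1.0 - icc)


def smallest_detectable_change(sem):
    """SDC at the individual level, 95% confidence (assumed formula)."""
    return 1.96 * math.sqrt(2.0) * sem


def limits_of_agreement(test, retest):
    """Bland-Altman 95% limits of agreement for paired test/retest scores."""
    diffs = [b - a for a, b in zip(test, retest)]
    centre = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return centre - spread, centre + spread


# Hypothetical inputs, not data from any included study.
sem = standard_error_of_measurement(sd=10.0, icc=0.84)  # about 4.0
sdc = smallest_detectable_change(sem)                   # about 11.09
low, high = limits_of_agreement([10, 12, 14, 16], [11, 12, 15, 18])
```

A change score smaller than the SDC cannot be distinguished from measurement error, which is why SDC is reported alongside the minimal important change.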
2.2. Information sources and search strategy
A medical librarian and information specialist (JM) designed, tailored, and performed an electronic search of PubMed, Embase, and CINAHL from inception to 26 February 2021, with an update on 21 October 2022. All languages were searched, but translation was restricted to the team members' translation capacity (English, Dutch, German, Spanish, French, Korean, Hungarian, Russian, and Croatian), and non-translated studies were recorded. The search strategy combined each COA identified by a previously completed scoping review [2] with a modified instrument properties search block for PubMed created by Terwee and colleagues [8], as well as a measurement property filter 1) for the Index to Chiropractic Literature created by JM, 2) for Embase translated by EP Jansma [8], and 3) for CINAHL developed by FS van Etten (COSMIN website). By selectively adding a chiropractic filter and a pediatric filter, we balanced the search strategy between recall and precision. A sample of the search parameters can be found in Box 2. We searched gray literature in ClinicalTrials.gov and Ebsco Open Dissertations on 29 March 2021. The reference lists of all included articles were manually scanned (TH) for additional relevant studies.
Box 2.
Sample MEDLINE search strategy and parameters including a search block for psychometric properties and a search block for manipulation-mobilization.
Search block for psychometric properties:
(MH ‘Psychometrics’) or (TI psychometr* or AB psychometr*) or (TI clinimetr* or AB clinimetr*) or (TI clinometr* OR AB clinometr*) or (MH ‘Outcome Assessment’) or (TI outcome assessment or AB outcome assessment) or (TI outcome measure* or AB outcome measure*) or (MH ‘Health Status Indicators’) or (MH ‘Reproducibility of Results’) or (MH ‘Discriminant Analysis’) or ((TI reproducib* or AB reproducib*) or (TI reliab* or AB reliab*) or (TI unreliab* or AB unreliab*)) or ((TI valid* or AB valid*) or (TI coefficient or AB coefficient) or (TI homogeneity or AB homogeneity)) or (TI homogeneous or AB homogeneous) or (TI ‘coefficient of variation’ or AB ‘coefficient of variation’) or (TI ‘internal consistency’ or AB ‘internal consistency’) or (MH ‘Internal Consistency+’) or (MH ‘Reliability+’) or (MH ‘Measurement Error+’) or (MH ‘Content Validity+’) or ‘hypothesis testing’ or ‘structural validity’ or ‘cross-cultural validity’ or (MH ‘Criterion-Related Validity+’) or ‘responsiveness’ or ‘interpretability’ or (TI reliab* or AB reliab*) and ((TI test or AB test) OR (TI retest or AB retest)) or (TI stability or AB stability) or (TI interrater or AB interrater) or (TI inter-rater or AB inter-rater) or (TI intrarater or AB intrarater) or (TI intra-rater or AB intrarater) or (TI intertester or AB intertester) or (TI inter-tester or AB inter-tester) or (TI intratester or AB intratester) or (TI intra-tester or AB intra-tester) or (TI interobserver or AB interobserver) or (TI inter-observer or AB inter-observer) or (TI intraobserver or AB intraobserver) or (TI intra-observer or AB intra-observer) or (TI intertechnician or AB intertechnician) or (TI inter-technician or AB inter-technician) or (TI intratechnician or AB intratechnician) or (TI intra-technician or AB intra-technician) or (TI interexaminer or AB interexaminer) or (TI inter-examiner or AB inter-examiner) or (TI intraexaminer or AB intraexaminer) or (TI intra-examiner or AB intra-examiner) or (TI intra-examiner or AB 
intraexaminer) or (TI interassay or AB interassay) or (TI inter-assay or AB inter-assay) or (TI intraassay or AB intraassay) or (TI intra-assay or AB intra-assay) or (TI interindividual or AB interindividual) or (TI inter-individual or AB inter-individual) OR (TI intraindividual or AB intraindividual) or (TI intra-individual or AB intra-individual) or (TI interparticipant or AB interparticipant) or (TI inter-participant or AB inter-participant) or (TI intraparticipant or AB intraparticipant) or (TI intra-participant or AB intra-participant) or (TI kappa or AB kappa) or (TI kappa’s or AB kappa’s) or (TI kappas or AB kappas) or (TI repeatab* or AB repeatab*) or (TI responsive* or AB responsive*) or (TI interpretab* or AB interpretab*)
Search block for spinal mobilisation/manipulation:
‘spinal manipulation’ OR ‘spinal manipulations’ OR ‘spinal mobilization’ OR ‘spinal mobilizations’ OR ‘spinal mobilization’ OR ‘spinal mobilizations’ OR ‘spinal adjustment’ OR ‘spinal adjustments’ OR ‘spinal manual therapy’ OR ‘high velocity low amplitude thrust’ OR ‘HVLA’ OR ‘musculoskeletal of the spine’ OR ‘spinal musculoskeletal’ OR ‘manual therapy of the spine’ OR ‘cervical manual therapy’ OR ‘thoracic manual therapy’ OR ‘lumbar manual therapy’ OR ‘manual therapy of the lumbar spine’ OR ‘manual therapy of the thoracic spine’ OR ‘manual therapy of the cervical spine’ OR ‘spinal osteopathy’ OR ‘osteopathy of the spine’ OR ‘osteopathy of the cervical spine’ OR ‘osteopathy of the thoracic spine’ OR ‘osteopathy of the lumbar spine’ OR ‘spinal osteopathies’ OR ‘osteopathies of the spine’ OR ‘osteopathies of the cervical spine’ OR ‘osteopathies of the thoracic spine’ OR ‘osteopathies of the lumbar spine’ OR chirop* OR ‘spinal manipulative therapy’ OR ‘spinal manipulative therapies’
Key: TI = Title; AB = abstract; MH = MeSH heading (MeSH is the MEDLINE index)
2.3. Study identification and selection
The review manager COVIDENCE (Veritas Health Innovation, Melbourne, Australia) was used for study screening, selection, and data extraction. Each stage of our review used pre-piloted forms and two independent reviewers (screening and selection review teams: JP/TH, AB/AG, KO/DC, NM/OA). Screening of study titles and abstracts and full-text selection were performed after a 10-study calibration period, and consistency was maintained through two additional meetings between the reviewers (a priori Kappa thresholds: 0.4 for moderate and 0.75 for good agreement). Disagreements were resolved by discussion and with reference to a third reviewer when needed (JP).
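As an illustration of the agreement check described above, the following sketch computes Cohen's kappa for two reviewers' include/exclude decisions. It assumes the unweighted two-rater form of kappa; the source does not state which variant was used, and the decisions shown are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters judging the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical 10-study calibration set (not the review's actual decisions).
reviewer_1 = ["include", "include", "exclude", "exclude", "include",
              "exclude", "exclude", "include", "exclude", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude", "include",
              "exclude", "exclude", "include", "exclude", "include"]

kappa = cohens_kappa(reviewer_1, reviewer_2)  # about 0.58
meets_moderate_threshold = kappa >= 0.4
```

With 8 of 10 decisions in agreement and chance agreement of 0.52, kappa is about 0.58, which falls between the moderate (0.4) and good (0.75) thresholds set a priori.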
2.4. Data extraction
A standardized data collection form was used independently by a pair of researchers (review teams: JP/TH, AB/TH, OA/AG) in addition to the COSMIN checklist [6]. Missing data from primary studies were addressed by consulting the outcome measurement website/administrator when available and by contacting the author for key data. We did not check whether studies were registered prior to their start.
2.5. Data items
The following data were extracted: 1. Classification and scoring of the COA; 2. Population demographic characteristics (medical condition, country, number of participants, age, sex, professional involved); 3. Inclusion/exclusion criteria; 4. Results per measurement property; and 5. Statistical methods used.
2.6. Risk of bias (RoB) assessment
Using the COSMIN Risk of Bias checklist, two review teams (JP/AB, TH/OA) independently evaluated the RoB of each included study, and uncertainties were resolved through discussion [6,9]. The COSMIN checklist uses a 4-point rating system: very good (VG), adequate (A), doubtful (D), and inadequate (I) (see Box 3 – Step 1). The RoB was judged per psychometric property reported, not per study. The overall score for each measurement property on the COSMIN checklist was determined by a ‘worst score counts’ approach.
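The per-property rating rule can be sketched in a few lines of Python. This is an illustrative reading of the "worst score counts" principle on the 4-point scale above, not the COSMIN tool itself; the function name and the example item ratings are hypothetical.

```python
# Ratings ordered from best to worst, as listed in the COSMIN 4-point scale.
RATING_ORDER = ["VG", "A", "D", "I"]


def worst_score_counts(item_ratings):
    """Return the lowest rating among the checklist items for one property."""
    if not item_ratings:
        raise ValueError("at least one item rating is required")
    # list.index raises ValueError for any rating outside the 4-point scale.
    return max(item_ratings, key=RATING_ORDER.index)


# Hypothetical item ratings for one measurement property in one study.
overall = worst_score_counts(["VG", "A", "D", "VG"])  # "D"
```

A single doubtful item therefore caps the rating for that property at doubtful, regardless of how well the remaining items were rated.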
Box 3. COSMIN steps in evaluating the evidence [30].
2.7. Measures of psychometric values
Results for reliability (internal consistency, test-retest/intra-rater/inter-rater reliability, measurement error), validity (a. content validity including relevance, comprehensiveness, comprehensibility, and whether outcome measurement items were assessed and appropriately worded [10]; b. structural validity; c. hypothesis testing for construct validity, cross-cultural validity, and criterion validity), responsiveness, and the statistical methods used were reported in Table 4.
Table 4.
Psychometric properties of patient-reported and observer-reported outcome measures.
| Author (year)/Country | COA | n | Psychometric property | RoB | Results |
|---|---|---|---|---|---|
| Parent et al. [11], Canada | SQLI | 95 | Construct Validity | VG | Floor effect 1/95, Ceiling effect 12/22 − 6/22 hitting ceiling effect 20 to 50% of the time |
| Feise et al. [12], US | SQLI | 84 | Internal Consistency | VG | Global α = 0.89 (Domains: 0.82 to 0.85) |
| | | | Reliability (test-retest) | D | Global ICC = 0.80 (Domains: 0.46 to 0.85) |
| | | | Construct Validity | D | Comparison with Quality-of-Life Profile for Spinal Deformities: rs = 0.79 (Domains: 0.46 to 0.81) |
| | | | | | Gwet’s agreement: 0.73 to 0.98 |
| Staes [13], Belgium | VAS¶ (0 to 10) | 61 | Reliability (test-retest) | D | k > 0.50 |
| Juniper et al. [14], Canada | PAQLQ | 52 | Reliability (test-retest) | A | Overall ICC = 0.95; Emotional ICC = 0.89; Activity Limitation ICC = 0.84 |
| | | | Construct Validity | A | Pearson Correlation Coefficient (ρ) global rating of change: Symptoms: ρ = 0.44 to 0.71; Activities: ρ = 0.40 to 0.63; Emotions: ρ = 0.41 to 0.60 |
| | | | Responsiveness | A | Minimal Important Difference: Overall: 0.42; Moderate changes quality of life: 1.03; Responsiveness Index: overall 0.59 |
| Barr et al. [15], Canada | Crying Diaries | 10 | Construct Validity | D | Comparison of crying hours on tape: rs = 0.64 |
| Abaoud et al. [16], Saudi Arabia | ATEC | 363 | Internal Consistency | A | Correlation between items & subscales total ρ = 0.133 to 0.601 (high internal consistency) |
| | | | | D | Cronbach’s alpha, Cronbach’s alpha if item deleted, Spearman-Brown’s split-half were computed to establish the checklist’s reliability and the contribution of each item to reliability: α = 0.91 1st subscale; α = 0.72 2nd subscale; α = 0.97 3rd subscale |
| | | | Structural Validity | A | exploratory factorial analysis using principal component analysis was performed to check the reliability of the checklist configuration and the one-dimensionality of its four factors; analysis used Kaiser Criterion in which the factor is considered substantial if its latent root is ≤1.0; latent roots for the third and the fourth subscale ranged between 2.08 and 5.9 and 3.6 and 10.19, respectively, this supports the univariate nature of the dimensions. |
| | | | Content Validity | I | 10 expert referees evaluated for clear word usage, linguistic errors, and suitability with 90% minimal agreement level; the researchers made minor necessary adjustments to ensure the adapted version of the ATEC fit within a Saudi cultural context. |
| | | | Construct Validity (discriminate) | A | The correlation between the scores of the total ATEC and educational stages rs = 0.140; p = 0.01 |
| Al Backer [17], Saudi Arabia | ATEC | 40 | Construct Validity | VG | Comparison to Childhood Autism Rating Scale: rs = 0.15, p = 0.926 meaning that there was no correlation between CARS and ATEC scales |
| Freire et al. [18], US | ATEC | 42 | Reliability (test-retest) | D | rs = 0.90 |
| | | | Construct Validity | I | Comparison of Childhood Autism Rating Scale: ICC = 0.80 |
| Geier et al. [19], US | ATEC | 56 | Construct Validity | VG | Comparison of Childhood Autism Rating Scale: rs = 0.71 |
| | | | Construct Validity (predictive) | VG | To predict severely effected child: Cut-off point: total 49; Sensitivity 96%, Specificity 67% |
| Magiati et al. [20], UK | ATEC | 22 | Internal Consistency | VG | α: total score: 0.91 to 0.96 |
| | | | | I | Follow-up 1: predicted 46% of the variance between follow-up 1 and follow-up 2: (R2 = 0.46, p = 0.001); Follow-up 2: outcome ranks (R2 = 0.64, p = 0.001) |
| Mahapatra et al. [21], US | ATEC | 2272 | Construct Validity (predictive) | D | Visit 8-Visit 1 Least squared means: 2 to 3 years: −28.35 (SE 1.30; p < 0.0001); 3 to 6 years: −19.73 (SE 0.72; p < 0.0001); 6 to 12 years: −13.80 (SE 0.96; p < 0.0001) |
| Memari et al. [22], Iran | ATEC | 134 | Internal Consistency | VG | α = 0.93; λ = 0.77 |
| | | | Reliability (test-retest) | D | ICC = 0.89 |
| | | | Construct Validity | D | Comparison of Autism Diagnostic Interview - Revised: rs = 0.38 to 0.79 |
| Sunakarach and Kessomboon [23], Thailand | ATEC | 160 | Reliability (inter-rater) | A | ICC = 0.97 |
| | | | Construct Validity (predictive) | D | Cut-off of 8, to predict mild from moderate to severe: Sensitivity 94%, Specificity 62%, Positive Predictive Value 83%, Negative Predictive Value 81% |
| Cheung et al. [24], Hong Kong | PedsQL | 566 | Construct Validity | A | Comparison of Scoliosis Research Society-22 item: 8 to 12 years old: rs = 0.29 (p < 0.06) (Domains: 0.11 to 0.27); 13 to 18 years old: Total rs = 0.37 (p < 0.01) (Domains: 0.28 to 0.34) |
| | | 566 | Construct Validity (predictive) | A | Floor and ceiling effects: Total score: 8 to 12 years old: 0% floor effect, 10.6% ceiling effect; 13 to 18 years old: 0% floor effect, 8.7% ceiling effect |
| Seid et al. [25], US | PedsQL | 252 | Internal Consistency; Measurement Error | VG | α = 0.58 to 0.84; Total scores: Child α = 0.84, Parent α = 0.82; SEM: child self-report: 6.12, parent proxy scale: 5.84 |
| | | | Construct Validity (discriminative) | I | Child self-report total scores: asthma vs healthy: paired t-test 0.56; mild vs moderate: paired t-test 0.47; mild vs severe: paired t-test 0.70; severe vs mild: paired t-test 0.25. Parent proxy-report: asthma vs healthy: paired t-test 0.73; mild vs moderate: paired t-test 0.37; mild vs severe: paired t-test 0.54; moderate vs severe: paired t-test 0.19 |
| | | | Construct Validity (convergent) | A | Parent proxy-report and child-report total score: ICC = 0.47 |
| | | | Responsiveness | D | Child self-report total: Effect Size for all patients = 0.58, Effect Size for stable patients only = 0.75; Parent proxy-report total: Effect Size for all patients = 0.62, Effect Size for stable patients only = 0.71 |
| Tantilipikorn et al. [26], Thailand | PedsQL | 97 | Internal Consistency | A | Parent reports α range = 0.74 to 0.94 |
| | | 54 | | A | Child self-reports α range = 0.67 to 0.91 |
| | | 97 | Reliability (test-retest) | D | Parent reports ICC range = 0.75 to 0.95 |
| | | 54 | Reliability (test-retest) | D | Child self-reports ICC range = 0.68 to 0.93 |
| | | 97 | Construct Validity | I | Parent reports: Factor Loading KMO = 0.787 |
| Varni et al. [27], US | PedsQL | 288 | Construct Validity | I | Analysis of sensitivity between subgroups: F = 6.63 to 7.57 between Class I/II, Class III/IV, and healthy participants and they were significant at p = 0.001 |
| | | 43 | Construct Validity (predictive) | I | Parents report: paired t-test = physical health 5.78, psychosocial health 6.29; total score 7.68 |
| | | 30 | Construct Validity (predictive) | I | Child self-report: paired t-test = physical health 3.18, psychosocial health 3.18, total score 3.26 |
| Varni et al. [28], US | PedsQL | 1677 | Internal Consistency | VG | Parent proxy report: α = 0.90; Child self-report: α = 0.88 |
| | | 1629 | Construct Validity | VG | Analysis between healthy, acute, and chronically ill, have been found to distinguish between healthy children and children with acute and chronic health conditions, and to distinguish severity of illness within chronic health conditions. |
| | | 963 | Construct Validity | VG | Parents report: F = 38.90 to 128.30, p = 0.001; Child self-report: F = 4.84 to 15.05, p = 0.001 |
| | | | | VG | Comparison of indicators of morbidity and parent/child report: ICC = 0.13 to 0.05 |
Key: ¶ VAS (0 to 100) = Visual Analog Scale perceived influence of exertion or movement/position on low back problems (with anchors “no influence, worst increase in pain”); SQLI = Scoliosis Quality of Life Index, PedsQL = Pediatric Quality of Life Inventory, PAQLQ = Pediatric Asthma Quality of Life Questionnaire, ATEC = Autism Treatment Evaluation Checklist, “VG” very good, “A” adequate, “D” doubtful, “I” inadequate; Effect Size = the difference between the group means, divided by the pooled standard deviation (Cohen’s d effect size); α = Cronbach’s alpha, ICC = Intraclass correlation coefficient, k = Kappa Concordance Coefficient, KMO = Kaiser-Meyer-Olkin Test, rs = Spearman’s rank correlation coefficient; ρ = Pearson correlation coefficient (PCC); p = p-value level of significance, F = F-statistic, SE = Standard Error, SEM = Standard Error of Measurement, λ = Guttman Split Half, R2 = coefficient of determination from linear regression.
The heterogeneity of the study population and the properties evaluated was qualitatively assessed. No further quantitative analysis or meta-analysis was completed by psychometric value due to the heterogeneity of study populations and the properties evaluated. Statistics depicted in this paper are directly quoted as the authors presented them. Confidence Intervals (CI) and standard deviations (SD) are presented if they were reported.
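Two of the statistics defined in the key to Table 4 can be made concrete with a short sketch: Cronbach's alpha (internal consistency) and Cohen's d (difference between group means divided by the pooled standard deviation). The formulas are the standard textbook forms; the function names and the small data sets are hypothetical and are not taken from any included study.

```python
import math
from statistics import mean, variance


def cronbach_alpha(scores):
    """Cronbach's alpha; `scores` holds one row of item scores per respondent."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the respondents' total scores (must be non-zero).
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def cohens_d(group_a, group_b):
    """Difference in group means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    pooled_variance = ((n_a - 1) * variance(group_a)
                       + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_variance)


# Hypothetical data: three perfectly consistent items, four respondents.
alpha = cronbach_alpha([[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]])
# Hypothetical groups whose means differ by half a pooled SD.
d = cohens_d([2, 4, 6], [1, 3, 5])
```

In this toy example the items move together exactly, so alpha reaches its maximum of 1.0, and the two groups differ by 1 point against a pooled SD of 2, giving d = 0.5.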
2.8. Data synthesis and analysis methods
A descriptive synthesis of findings for the psychometric properties of each COA across all identified studies was summarized in Table 1, using the updated criteria for good measurement properties established by COSMIN (Appendix A [9]). The result for each measurement property (content validity, structural validity, internal consistency, reliability, measurement error, hypothesis testing for construct validity, cross-cultural validity, criterion validity and responsiveness) was rated as either sufficient (+), insufficient (–), or indeterminate (?) (Box 3 – Step 2; see Appendix A for definitions [9,30]). The results of all available studies on measurement properties were then qualitatively or quantitatively (pooled results) summarized using COSMIN-defined criteria for an OVERALL RATING; this was based on our reviewers’ consensus confidence that the overall ratings were trustworthy and on grading the quality of the evidence (Box 3 – Step 3). The overall ratings of each measurement property were rated as sufficient (+), insufficient (–), or inconsistent (±). This indicates how confident we were that the pooled results or overall ratings are trustworthy. Box 3 – Step 4 is detailed in section 2.9. Finally, interpretability (Appendix B) and feasibility (Appendix C), although not formally considered measurement properties, were reported because they inform the clinician when selecting a COA and when identifying the ease of application within its intended context.
Table 1.
Characteristics of included studies.
| Author (year) | Country of Publication | COA | Medical Conditions Studied | Professional | Age range (years) | Sex ratio (M:F) | Participant number | Psychometric Domain Addressed |
|---|---|---|---|---|---|---|---|---|
| Parent et al. [11] | Canada | SQLI | Scoliosis | Physical Therapist | 8 to 20 | 0:95 | 95 | Validity |
| Feise et al. [12] | US | SQLI | Scoliosis | Chiropractor | 0 to 18 | 14:70 | 84 | Reliability, Validity |
| Staes et al. [13] | Belgium | VAS¶ | Low back problems | Physical Therapist | 16 to 18 | 20:41 | 61 | Reliability |
| Juniper et al. [14] | Canada | PAQLQ | Asthma | NR | 7 to 17 | 30:22 | 52 | Reliability, Validity, Responsiveness |
| Barr et al. [15] | Canada | parental rated crying diary | Excessive crying, i.e., infantile colic | Medical Doctor | 5 to 7 weeks | 5:5 | 10 | Validity |
| Abaoud et al. [16] | Saudi Arabia | ATEC | Autism | Teacher | NR | 268:95 | 363 | Reliability |
| Al Backer [17] | Saudi Arabia | ATEC | Autism | Psychologist | 3 to 12 | 33:7 | 40 | Validity |
| Freire et al. [18] | Brazil | ATEC | Autism | NR | 2 to 6 | 34:8 | 42 | Reliability, Validity |
| Geier et al. [19] | US | ATEC | Autism | NR | 2 to 16 | 49:7 | 56 | Validity |
| Magiati et al. [20] | UK | ATEC | Autism | NR | 2 to 4 | 22:0 | 22 | Reliability, Validity |
| Mahapatra et al. [21] | US | ATEC | Autism | NR | 2 to 12 | 1181:391 | 2272 | Validity |
| Memari et al. [22] | Iran | ATEC | Autism | Medical Doctor | 6 to 15 | 111:23 | 134 | Reliability, Validity |
| Sunakarach and Kessomboon [23] | Thailand | ATEC | Autism | Psychiatrist, psychologist and two translators | 3 to 18 | 47:113 | 160 | Reliability, Validity |
| Cheung et al. [24] | Hong Kong | PedsQL | Scoliosis | NR | 8 to 18 | 183:383 | 566 | Validity |
| Desai et al. [29] | US | PedsQL | Hospitalized children otherwise healthy | Medical Doctor | 0 to 18 | 2465:2172 | 4637 | Validity |
| Tantilipikorn et al. [26] | Thailand | PedsQL | Cerebral Palsy | Physical Therapist | 5 to 18 | 29:25 | Child: 54; Parent: 97 | Reliability, Validity |
| Seid et al. [25] | US | PedsQL | Asthma | NR | 3 to 14 | 155:97 | 252 | Reliability, Validity, Responsiveness |
| Varni et al. [28] | US | PedsQL | Chronically ill, acutely ill, healthy participants | NR | 5 to 18 | 815:830 | 963 | Reliability, Validity |
Key: ATEC = Autism Treatment Evaluation Checklist, SQLI = Scoliosis Quality of Life Index, ¶ VAS (0 to 10) = Visual Analog Scale perceived influence of exertion or movement/position on low back problems (with anchors ‘no influence, worst increase in pain’), PedsQL = Pediatric Quality of Life Inventory, PAQLQ = Pediatric Asthma Quality of Life Questionnaire, NR: not reported.
2.9. Certainty assessment
We used the COSMIN version of the GRADE (Grading of Recommendations, Assessment, Development and Evaluations) approach to summarize the certainty of the evidence across all psychometric properties: 1. High, 2. Moderate, 3. Low, and 4. Very Low (see Box 3 – Step 4) and reported this in a summary of findings table (Table 3). A detailed explanation of the objective requirements for downgrading, including inconsistency (−1 serious, −2 very serious), imprecision (−1 total n = 50 to 100; −2 total n < 50), indirectness (−1 serious; −2 very serious), and RoB (−1 serious; −2 very serious; −3 extremely serious), can be found in the COSMIN guidelines (pages 33–36 [9]).
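The downgrading logic can be sketched as a small Python function. This is a simplified illustration that assumes the rating starts at High and drops one level per downgrade step, with the imprecision thresholds taken from the paragraph above; the function names and example inputs are hypothetical, and the reviewers' actual judgments involved consensus that code cannot capture.

```python
CERTAINTY_LEVELS = ["High", "Moderate", "Low", "Very Low"]


def imprecision_penalty(total_n):
    """Downgrade steps for imprecision based on the pooled sample size."""
    if total_n < 50:
        return 2
    if total_n <= 100:
        return 1
    return 0


def grade_certainty(total_n, risk_of_bias=0, inconsistency=0, indirectness=0):
    """Start at High (assumed) and move down one level per downgrade step.

    risk_of_bias: 0 to 3 steps; inconsistency and indirectness: 0 to 2 steps.
    """
    steps = (risk_of_bias + inconsistency + indirectness
             + imprecision_penalty(total_n))
    # The rating cannot fall below the lowest level.
    return CERTAINTY_LEVELS[min(steps, len(CERTAINTY_LEVELS) - 1)]


# Hypothetical example: one study, n = 84, very serious risk of bias.
rating = grade_certainty(total_n=84, risk_of_bias=2)  # "Very Low"
```

Because a single small study already loses a level for imprecision, any additional concern about risk of bias quickly pushes the evidence toward low or very low certainty.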
Table 3.
Summary of findings for each COA included: 1) criteria rating for good measurement property (sufficient (+), insufficient (-), inconsistent (±), indeterminate (?); see gray highlighted ROW) for each measurement property (content validity, structural validity, internal consistency, reliability, measurement error, hypothesis testing for construct validity, cross-cultural validity, criterion validity and responsiveness), 2) overall rating across measurement properties for each COA (sufficient (+), insufficient (–), or indeterminate (?); see gray highlighted COLUMN), and 3) certainty of evidence (GRADE). Note that each cited study is followed by its risk of bias score for that psychometric property, rated as very good (VG), adequate (A), doubtful (D), or inadequate (I) (e.g., [12] (D)).
| COA |
Content Validity Structural Validity |
Reliability |
Validity |
Overall Rating across measurement properties |
Certainty of Evidence (GRADE) |
|---|---|---|---|---|---|
| Criteria Rating | Internal Consistency | Measurement error | Responsiveness | ||
| SQLI |
Content Validity: NR Structural Validity: NR Internal Consistency: α = 0.89 (1 study, n = 84) Feise et al. [12] (VG) |
Test-retest: ICC = 0.80, (1 study, n = 84) Feise et al. [12] (D) Measurement error: - SEM NR - SDC NR - LoA NR |
Construct Validity:-Ceiling effect 20 to 50%, (1 study, n = 95), [11] (VG);-Comparison of Quality-of-Life Profile for Spinal Deformities rs = 0.79, (1 study, n = 84) Feise et al. [12] (D) Cross-cultural: NR Criterion Validity: NR Responsiveness: SDC NR, MIC NR, ROC NR |
(?) indeterminate | VERY LOW ⊝⊝⊝⊝ downgraded due to limitation in
|
| Criteria Rating | ?|?|+ | + | ? | +|?|?|? | ||
| VAS¶ |
Content Validity: NR Structural Validity: NR Internal Consistency: NR |
Test-retest: k > 0.50, (1 study, n = 61) Staes et al. [13] (D) Measurement error: - SEM NR - SDC NR - LoA NR |
Construct Validity: NR Cross-cultural: NR Criterion Validity: NR Responsiveness: SDC NR, MIC NR, ROC NR |
(?) indeterminate | VERY LOW ⊝⊝⊝⊝ downgraded due to limitation in
|
| Criteria Rating | ?|?|? | - |? | ?|?|?|? | ||
| PAQLQ |
Content Validity: NR Structural Validity: NR Internal Consistency: NR |
Inter-rater: NA Test-retest: ICC = 0.95 (1 study, n = 52) Juniper et al. [14] (A) Measurement error: - SEM NR - SDC NR - LoA NR |
Construct Validity: Correlation with global rating of change: ρ = 0.40–0.71 (1 study, n = 52) Juniper et al. [14] (A) Cross-cultural: NR Criterion Validity: NR Responsiveness: MIC = 0.42 on a 1 to 7 scale, also available for subscores (1 study, n = 52) Juniper et al. [14] (A) |
(?) indeterminate | MODERATE ⊕⊕⊕⊝ downgraded due to limitation in
|
| Criteria Rating | ?|?|? | +|? | +|?|?|+ | ||
| Crying Diaries |
Content Validity: NR Structural Validity: NR Internal Consistency: NR |
Reliability: NR Measurement error: - SEM NR - SDC NR - LoA NR |
Construct Validity: Crying hour taped rs = 0.64 (1 study, n = 10) Barr et al. [15] (D) Cross-cultural: NR Criterion Validity: NR Responsiveness: SDC NR, MIC NR, ROC NR |
(?) indeterminate | VERY LOW ⊝⊝⊝⊝ downgraded due to limitation in
|
| Criteria Rating | ?|?|? | ?|? | ?|?|?|? | ||
| ATEC |
Content Validity:
Relevance - researchers made minor necessary adjustments to ensure the adapted version of the ATEC fit within a Saudi cultural context. (1 study, n = 363) Abaoud et al. [16] (A); Comprehensiveness - NR; Comprehensibility: NR; PROM item’s assessed and appropriately worded (1 study, n = 363) Abaoud et al. [16] (A) Structural Validity: exploratory factor analysis supports the univariate nature of the dimensions using Kaiser Criterion (1 study, n = 363) Abaoud et al. [16] (A) Internal Consistency: α = 0.72 to 0.97 and PCC between items and subscales = 0.133 to 0.601 (3 studies, n = 519) Abaoud et al. [16] (A), Memari et al. [22] (VG), Magiati et al. [20] (VG) |
Inter-rater: ICC = 0.97 (1 study, n = 160) Sunakarach et al. [23] (A) Test-retest: ICC = 0.89 (1 study, n = 134) Memari et al. [22] (D) ρ = 0.90 (1 study, n = 42) Freire et al. [18] (D) Measurement error: - SEM NR - SDC NR - LoA NR |
Construct Validity (5 studies, n = 2544): -Correlation with Childhood Autism Rating scale (3 studies, n = 138): rs = 0.15 Al Backer et al. [17] (VG) rs = 0.71 Geier et al. [19] (VG) ICC = 0.80 Freire et al. [18] (I) -Comparison of Autism Diagnostic interview rs = 0.38 to 0.79 (1 study, n = 134) Memari et al. [22] (D) -Least square means Visit 8 - Visit 1 (1 study, n = 2272) Mahapatra et al. [21] (D) - Discriminative: rs = 0.14 (1 study, n = 363) Abaoud et al. [16] (A) Cross-cultural: NR Criterion Validity: - Predictive: Cutoff 8: Sensitivity 94%, Specificity 62% (1 study, n = 164) Sunakarach et al. [23] (D) Cutoff 49 severely affected child: Sensitivity 96%, Specificity 67%, (1 study, n = 56) Geier et al. [19] (VG) Least squared means by age (1 study, n = 2,472) Mahapatra et al. [21] (D) Responsiveness: SDC NR, MIC NR, ROC NR |
(+) sufficient | MODERATE ⊕⊕⊕⊝ downgraded due to limitation in
|
| Criteria Rating | +|+|+ | +|? | +|?|+|? | ||
| PedsQL |
Content Validity: NR Structural Validity: NR Internal Consistency: Child self-report α = 0.67 to 0.91 (3 studies, n = 1983) Tantilipikom et al. [26] (A) Varni et al. [28] (VG) Seid et al. [25] (VG) Parent proxy α = 0.74 to 0.94 (3 studies, n = 2026) Tantilipikom et al. [26] (A) Varni et al. [28] (VG) Seid et al. [25] (VG) |
Test-retest: Parent proxy ICC = 0.75 to 0.95 (1 study, n = 97) Tantilipikom et al. [26] (D) Child self-reports ICC = 0.63 to 0.93 (1 study, n = 54) Tantilipikom et al. [26] (D) Measurement error: child self-report: SEM 6.12, parent proxy scale: SEM 5.84 (1 study, n = 252) Seid et al. [25] (D) - SDC NR - LoA NR |
Construct Validity (6 studies, n = 7,468): -Compared to known medical complexity Cohen d=0.32–0.40 (1 study, n = 4637) Desai et al. [29] (VG) -Compared healthy, acute/chronic illness, and severity of illness F tests were significant between groups (1 study, n = 1629) Varni et al. [28] (VG) -Comparison of indicators of morbidity: ICC = 0.13 to 0.05 (1 study, n = 963) Varni et al. [28] (VG) -Compared to Scoliosis Research society-22 varied ages: ICC = 0.29 to 0.37 (1 study, n = 566) Cheung et al. [24] (A) Convergent validity -Parent proxy-report and child report: ICC = 0.47 (1 study, n = 252) Seid et al. [25] (A) |
(+) sufficient | MODERATE ⊕⊕⊕⊝ downgraded due to limitation in
|
|
-F = 4.84 to 15.05 (2 studies, n = 1881) Varni et al. [28] (VG) Seid et al. [25] (I) -Analysis of sensitivity between subgroups F = 6.63 to 7.57, p = 0.001 (1 study, n = 288) Varni et al. [27] (I) -Parent report: F = 38.90 to 128.30, p = 0.001 (1 study, n = 963) Varni et al. [28] (VG) -Patient report: Factor loading KMO 0.787 (1 study, n = 97) |
|||||
| Tantilipikom et al. [26] (D) - Discriminative validity Child self-report (total): - asthma vs healthy: t = 0.56 - mild vs severe: t = 0.70 Parent proxy-report: - asthma vs healthy: t = 0.73 - mild vs severe: t = 0.54 (1 study, n = 252) Seid et al. [25] (I) Cross-cultural: NR Criterion Validity: - Predictive Validity (3 studies, n = 5273): -Loss of school days, readmission, Emergency visit (1 study, n = 4637) Desai et al. [29] (I) -Parent report physical health t-test 5.78 Child report physical health t-test 3.18 (1 study, n = 43 and 30) Varni et al. [27] (I) -Ceiling effect 10% (1 study, n = 566) Cheung et al. [24] (A) Responsiveness: -Child self-report Effect Size-1 for all patients Cohen d = 0.58 small (1 study, n = 252) Seid et al. [25] (D) -Parent proxy report Effect Size-1 for all patients Cohen d = 0.62 small (1 study, n = 252) Seid et al. [25] (D) |
|||||
| Criteria Rating | ?|?|+ | +|? | -|?|-|? |
Key: COA = clinical outcome assessment; criteria rating: “+” = sufficient, “−” = insufficient, “?” = indeterminate, consistent with criteria defined in Table 1 from [30] in Appendix A; ATEC = Autism Treatment Evaluation Checklist; SQLI = Scoliosis Quality of Life Index; ¶VAS = Visual Analog Scale perceived influence of exertion or movement/position on low back problems (with anchors “no influence, worst increase in pain”); PedsQL = Pediatric Quality of Life Inventory; PAQLQ = Pediatric Asthma Quality of Life Questionnaire; PROM = patient-reported outcome measure; n = sample size; MIC = minimal important change; ρ = Pearson correlation coefficient (PCC); α = Cronbach’s alpha; rs = Spearman’s rank correlation coefficient (rho); ICC = intraclass correlation coefficient; k = kappa concordance coefficient (Cohen’s kappa); t = paired t-test; p = p-value level of significance; NR = not reported; Effect Size = the difference between the group means divided by the pooled standard deviation (Cohen d); RoB = risk of bias; SEM = standard error of measurement; SDC = smallest detectable change (i.e., 1.96 × √2 × SEM, the measurement error of a change score); LoA = limits of agreement.
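The SDC formula in the key (1.96 × √2 × SEM) can be illustrated with the two SEM values that Table 3 reports for the PedsQL. The sketch below is arithmetic for illustration only: the SDC was not reported in the included studies, the function name is hypothetical, and the results should not be read as findings of this review.

```python
import math


def smallest_detectable_change(sem: float, z: float = 1.96) -> float:
    """SDC = z * sqrt(2) * SEM, the measurement error of a change score."""
    return z * math.sqrt(2) * sem


# SEM values as listed for the PedsQL in Table 3 (Seid et al. [25]).
reported_sem = {"child self-report": 6.12, "parent proxy": 5.84}

for label, sem in reported_sem.items():
    sdc = smallest_detectable_change(sem)
    print(f"{label}: SEM = {sem}, illustrative SDC = {sdc:.2f}")
```

A change score smaller than the SDC cannot be distinguished from measurement error with 95% confidence, which is why reporting the SEM without the SDC or MIC leaves responsiveness hard to interpret.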
3.0. Results
3.1. Study selection
We identified 2348 unique records through database searches, including four studies from reference list screening. Details of the search are presented in the PRISMA flow diagram (Figure 2).
Figure 2.

The PRISMA diagram for study flow.
Following the title and abstract screening, 248 studies were retrieved for full text review. Of these, 153 did not meet the inclusion criteria and were therefore excluded. We found 95 records that addressed the psychometrics of the COAs investigated. Of these 95, 18 studies researched patient-reported, observer-reported, or mixed patient- and observer-reported COAs, while a further 77 studies assessing clinician-reported or performance-based outcomes were excluded and are reported elsewhere [7]. No trials were awaiting assessment or ongoing. Of the COAs identified in the inclusion criteria (Figure 1), no articles addressing psychometrics in pediatrics (0 up to 18 years) were found for the following: Global Perceived Effect Scale, Satisfaction Scales, 10-point Likert Scales for symptom complaints, Global Improvement Scale, Patient Asthma Specific Quality of Life Scale, Roland Morris Scale, and Symptom Diaries.
3.1.1. Included studies
We included 18 studies with 9653 participants analyzed. These studies discussed:
3 patient-reported COAs: Scoliosis Quality of Life Index (SQLI, n = 2 studies), Visual Analogue Scale (VAS perceived influence of exertion or movement/position on low back problems, n = 1 study), Pediatric Asthma Quality of Life Questionnaire (PAQLQ, n = 1 study);
2 observer-reported COAs: Crying Diaries (n = 1 study), Autism Treatment Evaluation Checklist (ATEC, n = 8 studies); and
1 mixed patient- and observer-reported COA: Pediatric Quality of Life Inventory (PedsQL, n = 5 studies).
Most studies assessed a range of psychometric properties, but few were comprehensive; a detailed description of the included studies can be found in Table 1.
Most populations were convenience samples drawn from community and government supported healthcare centers, schools, and daycares. Autism was more prevalent in males [31,32], whereas scoliosis was more prevalent in females [33]. A summary of each population’s age and diagnostic category can be found in Table 2.
Table 2.
Population by clinical outcome assessment.
| COA | Age Study Population | Diagnostic Categories (sample size) |
|---|---|---|
| Scoliosis Quality of Life Index (SQLI) | 0 to 20 years | Idiopathic Adolescent Scoliosis (n = 179) |
| Visual Analogue Scale¶ (VAS) | 16 to 18 years | Low back problems (n = 61) |
| Pediatric Asthma Quality of Life Questionnaire (PAQLQ) | 7 to 17 years | Asthma (n = 52) |
| Crying diaries | 5 to 7 weeks | Excessively Crying Infants (n = 10) i.e., infantile colic |
| Autism Treatment Evaluation Checklist (ATEC) | 2 to 18 years | Autism Spectrum Disorder (n = 3089) |
| Pediatric Quality of Life Inventory (PedsQL) | 3 to 18 years | Cerebral palsy (n = 151); Scoliosis (n = 566); Orthopaedic injuries (n = 48); Acutely ill (n = 4785); Chronically ill (n = 367); Healthy (n = 401); Asthma (n = 252) |
Key: ¶ VAS (0 to 10) = Visual Analog Scale perceived influence of exertion or movement/position on low back problems (with anchors “no influence, worst increase in pain”).
3.1.2. Excluded studies
Almost 50% (84/169) of the excluded articles were excluded because they studied an adult population or a COA not on the inclusion list. Twenty-three articles did not address the psychometric properties of reliability, validity, or responsiveness; these included guides for the COAs, treatment approaches, or perspectives on etiology. Sixteen articles were judged not to have sufficient information for data extraction (e.g., conference abstracts). Three articles were written in Chinese, and we did not have a translator.
3.2. Risk of bias
From the 18 included articles, 35 psychometric properties were studied. RoB was judged to be adequate or very good for 54% (19/35) of the properties presented. The most common sources of bias were poor statistical analysis, unclear methodology and small population numbers.
3.3. Main results by clinical outcome assessment
A summary of the main findings, with the overall rating by COA and the certainty of evidence (GRADE) with reasons for downgrading, is found in Table 3. Further supporting evidence for Table 3, with a description of the results including RoB and detailed statistical values, can be found in Table 4. Some COAs (e.g., PedsQL) were evaluated in different medical conditions (see Table 2), and those results could not be directly compared with each other.
3.3.1. Patient-reported outcomes
3.3.1.1. Scoliosis quality of life index (SQLI)
The SQLI is a 22-item self-reported health-related quality-of-life questionnaire covering 5 domains. It was adapted from the Scoliosis Research Society-22 (SRS-22) questionnaire with the intention of being more applicable to adolescents with idiopathic scoliosis. It takes 2 to 3 minutes to complete and is scored by summing each domain [12]. Two studies [11,12] assessed the measurement properties of the SQLI, including internal consistency, floor and ceiling effects, test-retest reliability, and construct validity. It is important to note that the population of the Feise study ranged from 8 to 20 years old. Overall, the measurement properties of the SQLI were indeterminate (?) and the quality of evidence was low (GRADE), downgraded due to limitations in imprecision (−1) and indirectness (−1).
3.3.1.2. Visual analogue scale (VAS) perceived influence of exertion and movement/position on low back problems
The VAS for perceived influence of exertion and movement/position on low back problems is a 10 mm line with anchors of (0) no influence at one end and (10) worst increase in pain at the other end. The version used by [13] had 14 questions assessing two domains (perceived influence of exertion, and of movement/position, on low back problems in adolescents) and used the anchors ‘no influence’ and ‘worst increase in pain’. Note that these anchors are unusual and unvalidated; the anchors for a pain VAS are typically ‘no pain’ and ‘worst possible pain’ [34]. Overall, the measurement properties of the VAS for perceived influence of exertion and movement/position on low back problems were indeterminate (?) and the quality of evidence was very low (GRADE), downgraded due to limitations in risk of bias (−3) and imprecision (−1).
3.3.1.3. Pediatric asthma quality of life questionnaire (PAQLQ)
The PAQLQ has 23 items (mean score 1 to 7) in three domains: Activity (5 items), Symptoms (10 items), and Emotional Function (8 items). Response options use either a Blue Card (1 = extremely bothered to 7 = not bothered) or a Green Card (1 = all of the time to 7 = none of the time). A single small study [14] assessed test-retest reliability, construct validity, and responsiveness; each psychometric property was of adequate quality when assessed using the risk of bias checklist. The overall rating was indeterminate (?) and the certainty of evidence was moderate, downgraded due to imprecision (−1: 1 study in total, n < 100).
3.3.2. Observer-reported outcomes
3.3.2.1. Crying diaries
Crying Diaries are an observer-reported COA. A caregiver records the number of hours the infant spends fussing (paroxysmal fussing in infancy, sometimes called ‘colic’), crying, sleeping, and content; the included study compared the crying hours reported in the diary with audio-taped recordings [15]. Crying is a complex act and is often interpreted as a negative emotion. It has elements of movement, facial expression, and voice. Considerable variation between subjects suggested a wide range of individual recording styles. One extremely small study of doubtful quality ([15]; n = 10) assessed construct validity. The overall rating was indeterminate (?) and the certainty of evidence was very low, downgraded due to risk of bias (−3; only one study of doubtful quality was available) and imprecision (−2; 1 study, n < 50).
3.3.2.2. Autism treatment evaluation checklist (ATEC)
The ATEC is available in 25 languages [35]. It contains 77 questions that are classified into four subscales proposed to determine the effect of treatment on children with autism spectrum disorder: I. speech/language/communication (14 items); II. sociability (20 items); III. sensory/cognitive awareness (18 items); and IV. health/physical/behavior (25 items). It is designed to be completed by parents, teachers, or caretakers. Eight studies [16–23] evaluated seven elements of validity and reliability (see Table 4). All were in the English language. ATEC had a sufficient (+) overall rating, and the certainty of evidence was moderate, downgraded due to limitations in risk of bias (−1; there were multiple studies of doubtful quality but also several studies of at least adequate quality). The feasibility seems acceptable: it is short and simple, with an easy online or pencil-and-paper application that is free and accessible (see Appendix B and C).
3.3.3. Mixed patient- and observer-reported outcomes
3.3.3.1. Pediatric quality of life inventory (PedsQL)
The PedsQL consists of 23 brief items that assess health-related quality of life in children and young people in the community, school, and clinic, with four subscales for physical (8 items), emotional (5 items), social (5 items), and school (5 items) influences [27]. It is a mixed patient-reported (the Self-Report) and parent-reported (the Proxy Report) COA. Six studies addressed its psychometric properties [24–29]. Reliability and validity are detailed in Table 4. Construct validity had varied results depending on the construct hypothesis. Risk of bias ranged from inadequate to very good depending on the domain being assessed. The overall rating was sufficient (+) and the certainty of the evidence was considered moderate for the medical conditions of cerebral palsy, scoliosis, orthopedic injuries, acutely ill, chronically ill, and healthy participants, downgraded due to limitations in risk of bias (−1; there were a number of studies of doubtful or inadequate quality).
4. Discussion
We evaluated psychometric properties of pre-identified COAs used to determine the effectiveness of manipulation and mobilization for various medical conditions in a pediatric population [2]. Of nine (9) COAs, three (3) were classified as patient-reported, two (2) as observer-reported (for example, parent-reported), and one (1) as mixed. Although some information was available on the VAS (perceived influence of exertion or movement/position on low back problems), Crying Diaries, SQLI, and PAQLQ, the evidence was of very low certainty for the former three and of moderate certainty for PAQLQ, and the overall indeterminate rating of their measurement properties did not allow a conclusion regarding clinical and research use. For children with asthma, PedsQL had sufficient evidence (moderate certainty). Note that the PedsQL website (https://www.pedsql.org/) explores other medical conditions, and the rating may change by disorder classification. While ATEC had sufficient information with moderate certainty evidence and was judged to be feasible, its clinical and research interpretability is limited with respect to responsiveness, that is, its ability to measure meaningful change over time. It may be an appropriate observer-reported outcome measure for a child or an adolescent with autism who is seeking treatment. The ATEC is available at the following web address: https://autism.org/autism-treatment-evaluation-checklist/.
4.1. Overall completeness and applicability of evidence
The context and purpose of our research question was to evaluate clinical assessment tools used to assess the effectiveness of manipulation and mobilization for various medical conditions in a pediatric population. Of the nine patient-reported or observer-reported COAs, ATEC was found to have sufficient evidence for clinical use. We question, however, whether this COA is fit for evaluating the effects of manipulation and mobilization therapy; it is important to consider whether these tools are appropriate for evaluating such outcomes. Manipulation or mobilization therapy, manipulation in particular, has been suggested as a treatment technique to reduce the symptoms of autism [36–38]. However, the research is decidedly weak: one case series, 11 case reports, and one randomized clinical trial were identified [39]. All articles reported an improvement in symptoms. One randomized trial used ATEC to assess the clinical effects of spinal manipulation [37]. In this study, 14 children previously diagnosed with autism were treated with a percussion adjustment instrument. The ATEC was scored initially and after three months of treatment. Improvement was demonstrated through an overall decrease in ATEC scores, with the reported difference ranging from 62.5 to 48. It is important to note that responsiveness to change, including the minimal detectable change and minimal important difference, has not been established for ATEC. Authors have suggested that improvement in autism symptoms was due to correcting neural disturbances, allowing ideal healing and healthier development [38], or to decreasing the afferent sensory information that allows a neurological dysafferentation [36], although research evidence for this suggested model is lacking. While ATEC shows moderate certainty evidence of reliability and validity, its responsiveness to change has not been fully reported, and using it to evaluate change may lead to misleading conclusions.
ATEC may be acceptable for measuring the severity of autism symptoms, but without knowledge of its responsiveness it may have limited value in evaluating change. For PAQLQ in asthma, another trial on manipulation [40] reported that increases in quality of life were greater than the minimally important difference in both treatment and control groups at two and four months. However, there were no significant differences between the groups overall. In this case, the interpretability of responsiveness helps to identify an important change from baseline to end of study, even though no significant between-group difference was observed.
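The distinction drawn above, between a within-group change that exceeds the minimal important change and a between-group difference that supports a treatment effect, can be sketched as follows. The function name and the mean change values are hypothetical and chosen only for illustration; they are not data from the trial [40]. The MIC of 0.42 on the 1 to 7 scale is the value listed for the PAQLQ in Table 3.

```python
def interpret_change(treatment_change: float, control_change: float,
                     mic: float, between_group_significant: bool) -> dict:
    """Separate two questions: did each group change by at least the MIC,
    and does the between-group comparison support a treatment effect?"""
    return {
        "treatment_exceeds_mic": treatment_change >= mic,
        "control_exceeds_mic": control_change >= mic,
        # A treatment effect needs a significant between-group difference
        # in favor of the treatment group, not only a within-group change.
        "supports_treatment_effect": (between_group_significant
                                      and treatment_change > control_change),
    }


# Hypothetical mean changes on the 1-to-7 PAQLQ scale (not trial data).
result = interpret_change(treatment_change=0.9, control_change=0.8,
                          mic=0.42, between_group_significant=False)
print(result)
```

With these illustrative inputs both groups improve by more than the MIC, yet the comparison does not support a treatment effect, which mirrors the pattern described for the asthma trial.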
Tools used to assess responsiveness to manual therapy treatment for pain and function in the pediatric population (≤18 years) are extrapolated from adult data to adolescents and older children. For example, the Neck Disability Index has been used to assess neck function in pediatric populations but is not valid for pediatric or adolescent neck pain. The Neck Disability Index assesses adult activities, such as work, driving, reading, and concentration. These activities do not apply to many children and apply to only a few adolescents. However, there are measurement tools that may be more suitable for examining the effects of treatment to the spine, including the Young Spine Questionnaire (YSQ) [41]. It contains questions that validly assess the presence of spinal pain (cervical, thoracic, lumbar), as well as the frequency and intensity of spinal pain and its consequences for participation, in children aged 9 to 11 years. Most recently, the Young Disability Questionnaire (spine) has been developed [42] and validated [43] for measuring the consequences of neck, midback, and low back pain in schoolchildren (9 to 12 years).
For self-report of pain intensity, numerous measures have been identified as having acceptable reliability, validity, and responsiveness in pediatric populations. Three commonly used tools that have been validated for measuring pain intensity generally in pediatric populations (6 to 8 years+) are the Faces Pain Scale – Revised (FPS-R) [44], the Visual Analog Scale (VAS) with anchors ‘no pain’ and ‘worst possible pain’ [45], and the Numerical Rating Scale (NRS-11) [46]. Nonetheless, the following have been suggested as most appropriate for clinical use according to the age of the child: Pieces of Hurt Tool (3 to 5 years); Faces Pain Scale – Revised (6 to 11 years); and Verbal Numeric Rating Scale-11 (VNRS-11) (12 to 18 years) [47]. However, it is important to note that these instruments were tested mostly in pediatric populations with arthritis and acute post-surgical pain and have not been independently examined for validly measuring spinal pain in pediatric populations. Whilst new measures are being developed to validly assess pain in pediatric populations, numerous age and psychometric evidence gaps remain for measuring pain-related outcomes related to the spine. Responsiveness and minimal important change are two essential properties that must be evaluated to allow better interpretation of treatment outcomes relevant to pediatric patients.
Finally, longer-established instruments for health-related quality of life, such as EQ-5D-Y and HUI2/3 [48], are generic multi-attribute utility instruments with acceptable psychometric properties (comprehensive content, reliable, valid, and responsive) and comprehensive evidence on their psychometric performance, but they were not used in the assessment of treatment outcomes for spinal manipulation or mobilization in pediatric populations.
4.2. Quality of the evidence
Our thorough methodological assessment of included studies using the robust and validated COSMIN checklist across a variety of COAs is a strength of our review. COSMIN was developed in 2016 with structural and developmental validity [49]. Initial inter-rater reliability showed acceptable percentage agreement but low kappa values (61%, k < 0.40) [50]. Reliability has since been improved through consensus on terminology [51]. A second edition has improved validity and clarity [52]. Some argue that the ‘worst score counts’ principle may have a floor effect, but it is these serious flaws that contribute to the low/poor overall interpretability of a COA. The main flaws in the retrieved articles were poor statistical analysis, poor methodological reporting, and low population numbers.
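The ‘worst score counts’ principle mentioned above can be sketched in a few lines: the overall risk of bias rating for a measurement property in a study is the lowest rating given to any checklist item for that property. This is a minimal illustration, assuming the four-level scale used in Table 3; the function name and the example item ratings are hypothetical.

```python
# Ratings ordered from worst to best, matching the scale used in Table 3.
RATING_ORDER = ["inadequate", "doubtful", "adequate", "very good"]


def worst_score_counts(item_ratings: list[str]) -> str:
    """Return the lowest rating among the checklist items for one property."""
    if not item_ratings:
        raise ValueError("at least one item rating is required")
    # list.index raises ValueError for a rating outside the scale.
    return min(item_ratings, key=RATING_ORDER.index)


# Hypothetical item ratings for one measurement property in one study.
print(worst_score_counts(["very good", "adequate", "doubtful", "very good"]))
```

A single doubtful item therefore sets the rating for the whole property, which is the floor effect that critics of the principle point to.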
4.3. Potential biases in the review process
We were unable to limit the search parameters to the medical conditions discussed in the scoping review [2] due to the necessity to include norms and vague definitions of the medical conditions treated in the research. Our review was comprehensive: the search strategy identified more than 3000 citations for potential inclusion, and 248 studies were retrieved at full text. There were also two articles that we could not retrieve, although every effort was made to contact the authors. Furthermore, three articles could not be translated by members of the IFOMPT/IOPTP Paediatric Spinal Manipulation Task Force. The COSMIN checklist is determined by expert opinion and the inter-rater reliability for COSMIN can be poor; we balanced this by holding group consensus discussions to ensure calibration and interpretation of checklist items. Our review was restricted primarily to the English language, although five Spanish and three Dutch articles were translated. Test validation across varied medical condition groups may be perceived as a limitation; however, test validation across populations should be explored because it adds to the generalizability of the measure’s application.
4.4. Agreements and disagreements with other studies or reviews
To our knowledge, there are no other systematic reviews assessing the psychometric properties of SQLI for scoliosis; VAS for perceived influence of exertion and movement/position for low back problems; PAQLQ for asthma; Crying Diaries for infantile colic; ATEC for autism; or PedsQL for cerebral palsy or scoliosis. It is also important to note the research paper presented by Griffiths and colleagues [53]. This study looked at the psychometrics of seven outcome measures used with a pediatric population. However, none of the assessment tools examined in that article were identified in the scoping review [2] as a tool used to determine the effectiveness of manual therapy.
5. Conclusions
5.1. Implications for practice
From the six (6) COAs used in 18 studies to measure outcomes of spinal manipulation or mobilization in pediatric populations in a previous scoping review, only two (ATEC and PedsQL) had moderate certainty of evidence for sufficient measurement properties to be used clinically.
5.2. Implications for research
Research on the full psychometric properties of many COAs continues to be needed. Validity was found to be indeterminate in the majority of the COAs studied. For ATEC and PedsQL, specific attention to responsiveness, including minimal detectable change and minimal clinical difference, is needed.
Acknowledgements
Jurgen Mollema, University of Applied Sciences, The Netherlands was our medical information specialist and research librarian. Derek Clewney, Duke University, USA is a member of the collaborative International Federation of Orthopaedic Manipulative Physical Therapists (IFOMPT) and International Organisation of Physical Therapists in Paediatrics (IOPTP) Task Force on Spinal Manipulation in Children and assisted with data screening.
Biographies
Tricia Hayton is a private practitioner in Oakville, Ontario, Canada. She was a graduate student of the OMPT program at McMaster University and completed this work as part of her research requirements.
Anita Gross is an Associate Clinical Professor at McMaster University in the School of Rehabilitation Sciences, leading their advanced orthopedic musculoskeletal-manipulative physical therapy (OMPT) program. She is a lecturer in the Master’s of Clinical Science program in Manipulative Therapy at Western University and the Canadian Physiotherapy Association AIM program. She is the chair of the IFOMPT/IOPTP Taskforce on Pediatric Manipulation informing PT policy with systematic reviews and evidence gap maps. She is a clinician scientist and educator. She has over 150 peer reviewed publications, has been principal/co-investigator on 30 grants and has been an invited speaker at 20 international conferences. She coordinates the Cervical Overview Group, an International Network that conducts and maintains Cochrane systematic reviews on neck pain and participates in randomized clinical trials on back pain (Welback). She works in private practice OMPT and is a Fellow of the Canadian Academy of Manipulative Physiotherapy (FCAMPT).
Annalie Basson is a clinician and part-time lecturer at the University of Witwatersrand, Johannesburg, South Africa working in private practice in Pretoria.
Ken Olson is the president and co-owner of the physical therapy private practice Northern Rehab Physical Therapy Specialists in DeKalb, Illinois and is adjunct faculty for Northern Illinois University. He is a Past-President of both the International Federation of Orthopaedic Manipulative Physical Therapists (IFOMPT) and the American Academy of Orthopaedic Manual Physical Therapists (AAOMPT).
Oliver Ang’s primary research interests are innovative interventions using digital technologies to address cervical disorders and contextual factors, particularly therapeutic alliance, in physical therapy treatments. He is currently involved in the Spinal Manipulation and Patient Self-Management for Preventing Chronic Back Pain (PACBACK), the Integrated Supported Biopsychosocial Self-Management for Back Related Leg Pain (SUPPORT) and Partners4Pain studies, funded by the US National Institutes of Health (NIH). He is a member of the validity assessment team of the Cervical Overview Group.
Nikki Milne works as an Associate Professor of Physiotherapy (Paediatrics) at Bond University where she has worked for the past 16 years. Prior to starting work in the academic setting Nikki worked as a Paediatric physiotherapist for NSW Health which led to her research interests in child health and wellbeing and paediatric curriculum. Nikki has a special interest in child health, learning and paediatric physiotherapy and is passionate about the inclusion of paediatric curriculum in entry-level physiotherapy programs, to ensure that all graduates of accredited entry-level programs have knowledge and skills to safely and effectively work with children.
Jan Pool has worked as an Associate Professor at the Institute of Human Movement Studies, Faculty of Health Care, and as Coordinator/Head of the Master Program Physical Therapy, Orthopedic Manual Therapy division. He was a senior researcher in the Research Group Lifestyle and Health, HU University of Applied Sciences Utrecht, Utrecht, The Netherlands. He worked as a manual therapist for over 30 years in a private clinic. His scientific interest culminated in a master's degree in epidemiology in 2003 and a doctorate in medicine in 2007, both at the Free University Amsterdam. He has written numerous articles on neck pain, chronic pain, and manipulative therapy and has a special interest in clinimetry. He was a member of the board of the Dutch Association of Manual Therapy in The Netherlands (NVMT) from 1990 to 1998. From 2000 to 2016 he was a member of the Standard Committee of the International Federation of Manipulative Physical Therapy (IFOMPT). Jan became a member of the Spinal Manipulation Taskforce in 2020.
Funding Statement
Funding was provided by the Canadian Academy of Manipulative Physiotherapy (CAMPT), Research Fund for CAMPT Accredited Programs. This is a private fund held within this organization. https://manippt.org/member-resources/.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Ethics and dissemination
Ethical approval was not required for this systematic review. Results were disseminated via a peer reviewed journal, profession specific position guidelines, and conferences.
Supplementary material
Supplemental data for this article can be accessed online at https://doi.org/10.1080/10669817.2023.2281650

