2024 Jul 30;33(11):3003–3012. doi: 10.1007/s11136-024-03742-9

Measurement invariance of the PROMIS emotional distress and subjective well-being domains among autistic and General Population adolescents

Elizabeth A Kaplan-Kahn 1, Rachel M Benecke 1,2, Whitney Guthrie 1,3, Benjamin E Yerys 1,3, Laura Graham Holmes 4, Judith S Miller 1,3
PMCID: PMC11541282  PMID: 39080090

Abstract

Purpose

Quality of life (QoL) is identified as a clinical and research priority by the autistic community. Researchers have the responsibility to ensure that instruments used to measure QoL do so reliably and accurately among autistic participants.

Methods

Our study evaluated measurement invariance of Emotional Distress (Depression, Anxiety, Anger, Psychological Stress) and Subjective Well-Being (Life Satisfaction, Positive Affect, and Meaning & Purpose) scales of the Patient-Reported Outcomes Measurement Information System (PROMIS) among groups of autistic (N=140, n per scale=132–140) and general population (N=1,224, n per scale=406–411) teenagers (14–17 years). These scales were included in the PROMIS Autism Battery-Lifespan, which uses PROMIS scales to measure QoL domains most relevant for autistic people.

Results

Multi-group confirmatory factor analyses using permutation tests demonstrated that Depression and Positive Affect scales exhibited scalar invariance between groups, indicating that scores can be meaningfully compared across autistic and general population teens. Anger and Psychological Stress scales demonstrated metric invariance between groups, indicating that these scales measure the same latent trait in both groups, but group comparisons are not supported.

Conclusion

We provide guidance as to how these scales can be used in psychometrically supported ways to capture constructs relevant for understanding QoL among autistic teens.

Supplementary Information

The online version contains supplementary material available at 10.1007/s11136-024-03742-9.

Keywords: Autism, Adolescence, Measurement invariance, PROMIS, Self-report, Patient-reported outcomes

Introduction

Autistic advocates and their families prioritize research and clinical services focused on outcomes [3–6]. Prominent among these outcomes is quality of life (QoL), which the World Health Organization defines as “an individual’s perception of their position in life in the context of the culture and value systems in which they live and in relation to their goals, expectations, standards, and concerns” [7]. Maximizing QoL among autistic individuals is often identified as a goal across a wide range of disciplines in autism research [8]. Agencies and organizations at national (e.g., the Interagency Autism Coordinating Committee) and international (e.g., the World Health Organization) levels have also emphasized large gaps in our current understanding of how support services and care models can increase QoL among autistic people [9]. However, to maximize QoL among autistic people, researchers need to ensure that they are using psychometrically reliable and valid tools to measure autistic QoL.

Psychometric validation of QoL measures is critical when considering their use in autistic populations. If a measure’s functioning has not been tested within autistic populations, autism researchers cannot assume they are accurately measuring QoL. Making the assumption that a measure functions the same way across populations without evidence poses a threat to the measure’s validity, impeding its clinical utility [10]. In qualitative studies, autistic individuals and their parents discussed the items of popular QoL measures and indicated multiple themes impacting the validity of their use within autistic populations, including emotional vocabulary and misinterpretation of items [11]. Thus, there is a clear need to test the psychometric properties of QoL instruments developed for use in the general population in order to confidently use these measures in autistic populations.

Literature examining psychometric properties of QoL measures among autistic individuals is largely focused on adults [12]. Findings from this nascent literature demonstrate potentially questionable psychometric support for popular QoL measures developed for the general population. For example, the World Health Organization Quality of Life-Brief Version (WHOQOL-BREF) [13] demonstrated mixed fit indices in a sample of autistic adults and required four iterations of post-hoc modifications to achieve acceptable fit [14]. Psychometric support for the autism-specific supplement to the WHOQOL-BREF, the Autism Spectrum Quality of Life [14, 15], has also been mixed [16]. The Patient-Reported Outcomes Measurement Information System Global Health survey (PROMIS Global-10) demonstrated promising psychometric properties and minimal differential item functioning across subgroups of autistic adults [17]. These results provide the evidence necessary to confidently use the PROMIS Global-10 as a reliable and valid indicator of QoL in autistic adults. However, this work did not explore whether such a measure performed similarly across autistic and general population groups. Therefore, for research seeking to compare QoL scores across groups of participants, open questions remain about whether the PROMIS scales demonstrate measurement invariance across autistic and general population groups.

When comparing scores from any given scale across two or more groups, researchers make an implicit, yet consequential, assumption that the scale measures the same construct in the same way across the groups. This assumption is known as measurement invariance, which is defined as “whether or not, under different conditions of observing and studying phenomena, measurement operations yield measures of the same attribute” [18]. Comparisons and interpretations of scores between groups are only meaningful if the scale demonstrates measurement invariance [19, 20]. Research has explored measurement invariance across autistic and general population groups for domains such as cognitive functioning [21–23], behavioral concerns [24], sleep impairment [25], and depression [26, 27]. Given the burgeoning emphasis on incorporating QoL outcomes into autism research, it is crucial to evaluate measurement invariance in QoL measures.

To measure autistic QoL, Graham Holmes et al. (2020) leveraged patient-reported outcome measures from the National Institutes of Health (NIH) [28] to curate a specialized battery for assessing QoL across the lifespan for individuals on the autism spectrum: the PROMIS Autism Battery – Lifespan (PAB-L). Each measure includes scales assessing various domains of QoL, including health, emotional distress, subjective well-being, and social functioning. The PAB-L demonstrated good reliability, feasibility, and acceptability among a large (N = 912) sample of autistic individuals and their families [29]. Despite these strengths, it remains unclear whether items on the PAB-L contribute to the measurement of QoL in the same way across autistic and general populations.

The primary objective of the current study was to evaluate the measurement invariance of the PAB-L Emotional Distress (Depression, Anger, Anxiety, Psychological Stress) and Subjective Well-Being (Life Satisfaction, Positive Affect, and Meaning & Purpose) scales among autistic and general population teens. Using existing data from the original PAB-L study [29] and from publicly available PROMIS pediatric scores from a nationally representative sample of participants [30], we conducted secondary data analyses using multi-group confirmatory factor analyses (CFA) to assess for measurement invariance across autistic and general population adolescents. We focused our analyses on self-report because we were primarily interested in the measurement of lived experience of QoL, which is best captured by self-report.

Methods

Participants

Teenagers and their parents participated via online surveys. Parents reported on their teen’s demographic information in both autistic and general population groups. Participants were considered eligible if they were between the ages of 14 and 17 years old. Demographic characteristics, including age, gender, race, and ethnicity, are reported for participants with available data for at least one PROMIS scale (Table 1). All recruitment and study procedures were approved by the institutional review board at the Children’s Hospital of Philadelphia.

Table 1.

Demographic information

Variable | Autistic Group | General Population Group
Totals | 140 (132–140 per scale) | 1,224 (406–411 per scale)
Age
 14 | 35 (25.00%) | 293 (23.94%)
 15 | 36 (25.71%) | 313 (25.57%)
 16 | 36 (25.71%) | 321 (26.23%)
 17 | 33 (23.57%) | 297 (24.26%)
Gender
 Female | 37 (26.43%) | 604 (49.35%)
 Male | 100 (71.43%) | 620 (50.65%)
 Transgender | 3 (2.14%) | N/A
Race
 Asian | 0 (0%) | 46 (3.76%)
 Black | 5 (3.57%) | 114 (9.31%)
 Native American | 0 (0%) | 5 (0.41%)
 White | 130 (92.86%) | 924 (75.49%)
 Other or more than one race | 5 (3.57%) | 165 (13.48%)
 Missing | 0 (0%) | 1 (0.08%)
Ethnicity
 Hispanic or Latino/a | 9 (6.43%) | 165 (13.48%)
 Not Hispanic or Latino/a | 131 (93.57%) | 1,058 (86.44%)
 Missing | 0 (0%) | 1 (0.08%)

Autistic sample

For our autistic sample, we used adolescent self-report data from Graham Holmes et al. (2020). Participants were recruited via the Children’s Hospital of Philadelphia or the Interactive Autism Network between November 2017 and June 2018. A total of 140 participants responded to at least one PROMIS scale (Table 2). The sample of autistic adolescents was well distributed across ages and was predominantly white (92.86%) and non-Hispanic/Latino (93.57%). Approximately three times as many autistic participants identified as male (n = 100; 71.43%) as identified as female (n = 37; 26.43%), and three autistic adolescents identified as transgender (Table 1).

Table 2.

Domain summaries by group

Scale | Autistic: n | Autistic: Cronbach’s alpha | Autistic: M (SD) | General Population: n | General Population: Cronbach’s alpha | General Population: M (SD)
Emotional Distress
 Anger | 132 | 0.89 | 51.13 (11.10) | 406 | 0.91 | 46.15 (10.52)
 Anxiety | 139 | 0.93 | 53.08 (10.90) | 406 | 0.92 | 44.29 (9.77)
 Depressive symptoms | 139 | 0.95 | 53.42 (11.43) | 411 | 0.95 | 46.85 (10.40)
 Psychological stress | 138 | 0.95 | 59.06 (9.70) | 407 | 0.94 | 52.11 (9.74)
Subjective Well-Being
 Life satisfaction | 139 | 0.96 | 43.63 (10.17) | 407 | 0.96 | 46.96 (9.43)
 Meaning and purpose | 138 | 0.94 | 41.67 (10.10) | 411 | 0.93 | 48.60 (9.12)
 Positive affect | 140 | 0.93 | 45.64 (8.95) | 406 | 0.94 | 48.12 (8.89)

Note M = Mean, SD = Standard Deviation

General population sample

Our comparison sample comes from a nationally representative normative sample of the PROMIS [30]. Participants were recruited by GfK Knowledge Panel, an extant online probability panel of participants in the United States. For adolescent self-report, 1,627 participants responded to at least one PROMIS scale, with approximately 400 participants responding per scale (Table 2). Like the autistic sample, general population teens were distributed evenly across the range of ages. Recruitment for the general population sample aimed to be representative of the United States population, reflected in a balanced gender ratio (50.52% identifying as male) and increased racial and ethnic diversity, albeit still mostly white (76.58%) and non-Hispanic/Latino (86.11%). No response option for identifying as transgender was included for the general population sample (Table 1).

Measure

The PROMIS pediatric scales are measures of QoL, publicly available through the NIH [31]. Each scale is unidimensional [32–38], assessing a single factor driving QoL, and can be administered individually or as part of a wider QoL battery. PROMIS scales were validated on representative samples of the U.S. population [39]. Responses for each scale item range from 1 to 5, with higher scores corresponding to higher endorsement of the construct being measured. Most PROMIS scales are available in three formats: parent proxy (for 5–17-year-olds), pediatric self-report (for 8–17-year-olds), and adult self-report (for 18+ year-olds). Many scales include short forms, which consist of 4 to 8 items and are scored through the Assessment Center (www.assessmentcenter.net) [40]. Short forms of all the self-report scales used in the current study have demonstrated no meaningful differential item functioning as a function of gender [32–37]; thus, while autistic and general population teens differed in their gender ratios (Table 1), prior evidence supports the conclusion that the scales perform similarly across genders.

The present study evaluated adolescent self-report PROMIS scales related to the domains of Emotional Distress (Anger, Anxiety, Depression, Psychological Stress scales) and Subjective Well-Being (Life Satisfaction, Positive Affect, Meaning & Purpose scales). Though the Emotional Distress and Subjective Well-Being domains do not represent higher order factors, they are drawn from past literature and theoretical judgement of the PROMIS scales [29, 39, 41]. We evaluated the short form version of each scale, consisting of 5 items for Anger and 8 items for all other scales (i.e., Anxiety, Depression, Psychological Stress, Life Satisfaction, Positive Affect, and Meaning & Purpose).

Data analysis

We first fit CFA models to the baseline PROMIS models for the combined group of participants using the lavaan package in R [42]. Models were identified by setting the factor variance to 1. Given the ordinal nature of the PROMIS data (reported on a Likert scale), we used the robust diagonally weighted least squares estimator and handled missing values with pairwise deletion. Due to limitations of standard fit indices under diagonally weighted least squares estimation [43], we evaluated model fit using fit indices based on robust categorical maximum-likelihood estimators, including the Comparative Fit Index (CFIcML), Tucker-Lewis Index (TLIcML), and Standardized Root Mean Squared Residual (SRMRcML). Adequate model fit was defined as CFIcML and TLIcML values of ≥ 0.95 and SRMRcML values of ≤ 0.08 [44, 45]. Root Mean Square Error of Approximation (RMSEA) values exhibit large standard errors in models with few items and factors [46], and above-threshold RMSEA values are common among the PROMIS pediatric scales, even among their validation samples [33, 35, 47]; thus, given that each PROMIS scale is unidimensional and short forms of each scale were used in the current study, we did not evaluate model fit using RMSEA.

Next, to evaluate whether measurement of QoL domains varied between the general population sample and the autistic sample, we fit a series of increasingly restrictive multigroup CFA models to the data [45, 48] using lavaan [42] and semTools [49, 50]. Multigroup CFA models were conducted for each scale separately. First, multigroup CFA models were tested for configural invariance by allowing all factor loadings and item intercepts to vary freely for both groups. This analysis examined whether the overall factor structure of the measure fit well for both groups. Second, metric invariance was evaluated by testing for differences in fit between configural CFA models and models in which factor loadings were constrained to be equivalent across groups while allowing item intercepts to vary freely. This analysis tested whether the pattern of item loadings was equivalent across groups. Third, we tested the CFA models for scalar invariance by testing for differences in fit between the metric CFA models and models in which both factor loadings and item intercepts were constrained to be equivalent across groups.
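The three nested models described above can be written compactly. The following is a sketch in generic linear factor-model notation; the symbols are assumptions introduced here, not notation from the study. For ordinal items estimated with diagonally weighted least squares, item thresholds play the role that intercepts play in this linear form.

```latex
% x_{ijg}: response of person i to item j in group g (g = 1 autistic, g = 2 general population)
% \eta_{ig}: latent trait, \lambda_{jg}: factor loading, \nu_{jg}: item intercept, \varepsilon_{ijg}: residual
\begin{align*}
x_{ijg} &= \nu_{jg} + \lambda_{jg}\,\eta_{ig} + \varepsilon_{ijg} \\
\text{Configural:}\quad & \nu_{jg},\ \lambda_{jg}\ \text{free in each group } g \\
\text{Metric:}\quad & \lambda_{j1} = \lambda_{j2}\ \text{for all items } j \\
\text{Scalar:}\quad & \lambda_{j1} = \lambda_{j2},\ \ \nu_{j1} = \nu_{j2}\ \text{for all items } j
\end{align*}
```

Each step adds equality constraints to the previous model, so a significant loss of fit at a given step indicates that the newly constrained parameters differ between groups.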

Recommendations for criteria to evaluate changes in model fit within invariance models vary widely and depend on several relevant factors such as sample size, difference in sample size between the groups, and level of invariance [44, 46, 51, 52]. Given these limitations, we used permutation tests [49] to evaluate the hypothesis of measurement invariance and determine whether the ∆χ² and the ∆CFI between the increasingly restrictive models were significant. As reported in Jorgensen et al. (2018), permutation tests provide better Type-I error control than commonly used benchmarks. We generated null distributions for all measurement invariance tests based on 1,000 permuted datasets, and non-significant p-values (≥ 0.01) indicated evidence of measurement invariance (i.e., no significant decrease in model fit between increasingly restrictive models). To establish invariance, we required both ∆χ² and ∆CFI to be non-significant.
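The logic of a permutation test and of the decision rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's procedure: the study refit nested CFA models to each permuted dataset in R, whereas the stand-in statistic here is a simple absolute mean difference, and all function names are hypothetical.

```python
import random
from statistics import mean


def abs_mean_difference(group_a, group_b):
    """Stand-in statistic. The study instead used the change in chi-square
    and CFI between nested CFA models, which requires refitting the models."""
    return abs(mean(group_a) - mean(group_b))


def permutation_p_value(group_a, group_b, statistic, n_permutations=1000, seed=0):
    """Share of label-shuffled datasets whose statistic is at least as large
    as the observed one. For an index where lower values are worse (such as
    a change in CFI), the comparison direction would need to be reversed."""
    rng = random.Random(seed)
    observed = statistic(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group membership at random
        if statistic(pooled[:n_a], pooled[n_a:]) >= observed:
            at_least_as_extreme += 1
    # The +1 terms count the observed labeling as one permutation
    # (a common convention, assumed here).
    return (at_least_as_extreme + 1) / (n_permutations + 1)


def supports_invariance(p_chisq, p_cfi, alpha=0.01):
    """Decision rule from the text: both permutation p-values must be >= alpha."""
    return p_chisq >= alpha and p_cfi >= alpha
```

Applied to the permutation p-values reported in Table 5, `supports_invariance(0.290, 0.222)` (Depression, metric step) returns `True`, while `supports_invariance(0.019, 0.007)` (Anxiety, metric step) returns `False` because the ∆CFI test is significant.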

There were cases on the Anxiety and Positive Affect Scales where a very small number of participants endorsed a particular response for an item, such that after pairwise deletion and permutation, no instances of that response were present in one group or the other. This occurred in both the autistic and general population teen samples. To address this issue, we collapsed adjacent response categories (Table S1) for these scales when testing for measurement invariance [53, 54].
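Collapsing adjacent response categories can be illustrated with a short Python sketch. The actual recoding used in the study is given in Table S1, which is not reproduced here; the merge direction below (a sparse category joins its lower neighbor, or its upper neighbor if it is the lowest category) and all names are assumptions made for illustration.

```python
from collections import Counter


def empty_categories(responses_by_group, categories=(1, 2, 3, 4, 5)):
    """For each group, list the response categories that nobody endorsed."""
    result = {}
    for group, responses in responses_by_group.items():
        counts = Counter(responses)
        result[group] = [c for c in categories if counts[c] == 0]
    return result


def collapse_mapping(sparse, categories=(1, 2, 3, 4, 5)):
    """Build an old-code -> new-code map that merges each sparse category
    into an adjacent one and keeps the new codes consecutive."""
    mapping = {}
    code = 1
    for i, category in enumerate(categories):
        if i > 0:
            merge_down = category in sparse
            # A sparse lowest category is merged upward into the second one.
            merge_up = i == 1 and categories[0] in sparse
            if not (merge_down or merge_up):
                code += 1
        mapping[category] = code
    return mapping


def recode(responses, mapping):
    """Apply the same mapping to every group so the item stays comparable."""
    return [mapping[r] for r in responses]


# Hypothetical item responses: nobody in the first group chose category 5.
data = {"autistic": [1, 2, 2, 3, 4, 4], "general": [1, 2, 3, 3, 4, 5]}
sparse = {c for cats in empty_categories(data).values() for c in cats}
mapping = collapse_mapping(sparse)
collapsed = {group: recode(r, mapping) for group, r in data.items()}
```

The same mapping is applied to both groups, because constraining thresholds or intercepts across groups only makes sense when each group uses the same set of response codes.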

When configural invariance was not established based on at least one significant permuted fit indicator, model modification indices were subsequently examined for each scale and modifications to the model were implemented after ensuring a theoretical justification for the modification. In line with past literature testing measurement invariance between autistic and nonautistic groups [22], the modified models were then used in subsequent measurement invariance steps (where modifications were required; see Table S2).

Results

Internal reliability

Across both autistic and general population teens, the PROMIS scales demonstrated good to excellent internal consistency (alpha range 0.89–0.96, Table 2). The Anger scale demonstrated the lowest Cronbach’s alpha values for both groups, which is expected given that this scale had the fewest items.
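A short Python sketch shows how Cronbach's alpha is computed and why a shorter scale tends to yield a lower value. It is illustrative only: the function names are hypothetical, the raw-score function assumes complete cases, and the average inter-item correlation used in the usage note is a made-up value, not an estimate from the study data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha from raw scores.

    rows: one list of k item scores per respondent (complete cases only).
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def standardized_alpha(k, mean_inter_item_r):
    """Standardized alpha for k items with a given average inter-item correlation."""
    return k * mean_inter_item_r / (1 + (k - 1) * mean_inter_item_r)
```

Holding the average inter-item correlation fixed at a hypothetical 0.6, `standardized_alpha(5, 0.6)` is smaller than `standardized_alpha(8, 0.6)`, which is the pattern the text describes for the 5-item Anger scale relative to the 8-item scales.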

Confirmatory factor analysis

Initial baseline model fits for the PROMIS scales for the combined sample were adequate to excellent for all scales across all fit indices (Table 3). Likewise, the fits of the scale models in the autistic and general population samples were adequate to excellent across all fit indices (Table 4).

Table 3.

Confirmatory factor analysis model results – combined sample

Model | χ² | df | p | CFIcML | TLIcML | SRMRcML
Emotional Distress
 Anger | 102.737 | 5 | < 0.001 | 0.948 | 0.895 | 0.039
 Anxiety | 124.666 | 20 | < 0.001 | 0.940 | 0.916 | 0.032
 Depressive Symptoms | 125.392 | 19 | < 0.001 | 0.964 | 0.946 | 0.020
 Psychological Stress | 104.996 | 19 | < 0.001 | 0.978 | 0.968 | 0.021
Subjective Well-Being
 Life Satisfaction | 242.282 | 19 | < 0.001 | 0.954 | 0.933 | 0.026
 Meaning & Purpose | 216.716 | 19 | < 0.001 | 0.942 | 0.915 | 0.035
 Positive Affect | 180.392 | 18 | < 0.001 | 0.948 | 0.919 | 0.032

Note df = Degrees of Freedom, CFI = Comparative Fit Index, TLI = Tucker-Lewis Index, SRMR = Standardized Root Mean Squared Residual, cML = Categorical Maximum Likelihood Estimator
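As a reading aid for the df column, the degrees of freedom of a one-factor model with p ordinal items can be worked out by counting. The sketch below assumes the factor variance is fixed at 1 and item thresholds are freely estimated, so the model fits p(p − 1)/2 polychoric correlations with p loadings.

```latex
\begin{align*}
df &= \frac{p(p-1)}{2} - p = \frac{p(p-3)}{2} \\
p = 5:\quad df &= \frac{5 \cdot 2}{2} = 5 \\
p = 8:\quad df &= \frac{8 \cdot 5}{2} = 20
\end{align*}
```

These values match the 5-item Anger scale (df = 5) and the 8-item Anxiety scale (df = 20). The values of 19 and 18 for the other 8-item scales would be consistent with one or two additional freely estimated parameters; this is an inference from the counts, not something stated in the table.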

Table 4.

Confirmatory factor analysis model results by group

Model | χ² | df | p | CFIcML | TLIcML | SRMRcML
Autistic Teens
Emotional Distress
 Anger | 69.530 | 5 | < 0.001 | 0.961 | 0.921 | 0.074
 Anxiety | 92.229 | 20 | < 0.001 | 0.978 | 0.969 | 0.050
 Depressive Symptoms | 61.743 | 19 | < 0.001 | 0.995 | 0.993 | 0.031
 Psychological Stress | 59.820 | 19 | < 0.001 | 0.993 | 0.989 | 0.034
Subjective Well-Being
 Life Satisfaction | 85.842 | 19 | < 0.001 | 0.920 | 0.883 | 0.035
 Meaning & Purpose | 101.802 | 19 | < 0.001 | 0.984 | 0.976 | 0.049
 Positive Affect | 40.630 | 18 | < 0.001 | 0.957 | 0.934 | 0.030
General Population Teens
Emotional Distress
 Anger | 35.340 | 5 | < 0.001 | 0.996 | 0.991 | 0.028
 Anxiety | 73.744 | 20 | < 0.001 | 0.993 | 0.991 | 0.029
 Depressive Symptoms | 67.167 | 19 | < 0.001 | 0.997 | 0.996 | 0.020
 Psychological Stress | 88.006 | 19 | < 0.001 | 0.995 | 0.992 | 0.028
Subjective Well-Being
 Life Satisfaction | 239.379 | 19 | < 0.001 | 0.962 | 0.944 | 0.023
 Meaning & Purpose | 133.062 | 19 | < 0.001 | 0.987 | 0.980 | 0.040
 Positive Affect | 170.662 | 18 | < 0.001 | 0.932 | 0.894 | 0.040

Note df = Degrees of Freedom, CFI = Comparative Fit Index, TLI = Tucker-Lewis Index, SRMR = Standardized Root Mean Squared Residual, cML = Categorical Maximum Likelihood Estimator

Measurement invariance

Results of the measurement invariance analyses for the Emotional Distress scales are presented in Table 5. The Depression scale was the only Emotional Distress measure that demonstrated metric and scalar invariance, as evidenced by nonsignificant permutation tests (all p’s ≥ 0.01) with increasing equality constraints. The Anger and Psychological Stress scales both demonstrated metric invariance but not scalar invariance. Specifically, the chi-square difference test and ∆CFI values for the configural and metric models for the Anger scale were nonsignificant; however, the Anger scale exhibited significant chi-square difference (p = .009) and ∆CFI values (p = .009) when testing for scalar invariance. The Psychological Stress scale also exhibited similar fit between the configural and metric models, but significant deterioration in model fit when testing for scalar invariance, with significant chi-square difference (p < .001) and ∆CFI values (p < .001). Finally, the Anxiety scale did not demonstrate metric or scalar invariance based on ∆CFI (p = .007).

Table 5.

Model comparison statistics (w/ permutation) – emotional distress

Model | Models compared | Δχ² | p (Δχ²) | ΔCFI | p (ΔCFI)
Anger
 1 – Configural | Null vs. 1 | 51.956 | 0.211 | 0.997 | 0.156
 2 – Metric | 1 vs. 2 | 4.190 | 0.754 | <-0.001 | 0.854
 3 – Scalar | 2 vs. 3 | 14.166 | 0.009* | <-0.001 | 0.009*
Anxiety
 1 – Configural | Null vs. 1 | 62.083 | 0.841 | 0.999 | 0.299
 2 – Metric | 1 vs. 2 | 40.601 | 0.019 | −0.002 | 0.007*
 3 – Scalar | 2 vs. 3 | -1.831 | 0.439 | 0.001 | 0.615
Depression
 1 – Configural | Null vs. 1 | 52.711 | 0.682 | 1.000 | 0.461
 2 – Metric | 1 vs. 2 | 24.082 | 0.290 | <-0.001 | 0.222
 3 – Scalar | 2 vs. 3 | -2.044 | 0.486 | < 0.001 | 0.673
Psychological stress
 1 – Configural | Null vs. 1 | 56.710 | 0.042 | < 0.999 | 0.017
 2 – Metric | 1 vs. 2 | 14.474 | 0.488 | <-0.001 | 0.448
 3 – Scalar | 2 vs. 3 | 38.374 | < 0.001* | <-0.001 | < 0.001*

Note ΔCFI = Change in Comparative Fit Index, * indicates significant p-value < 0.01

Results of the measurement invariance analyses for the Subjective Well-Being scales are presented in Table 6. The Positive Affect scale demonstrated metric and scalar invariance, with nonsignificant chi-square difference test and ∆CFI values between increasingly restrictive models. The Life Satisfaction scale did not demonstrate metric or scalar invariance based on both the chi-square difference test (p = .008) and the ∆CFI value (p = .005). The Meaning & Purpose scale did not demonstrate metric or scalar invariance based on a significant ∆CFI (p = .007).

Table 6.

Model comparison statistics (w/ permutation) – subjective wellbeing

Model | Models compared | Δχ² | p (Δχ²) | ΔCFI | p (ΔCFI)
Life Satisfaction
 1 – Configural | Null vs. 1 | 100.734 | 0.999 | > 0.999 | 0.993
 2 – Metric | 1 vs. 2 | 85.488 | 0.008* | <-0.001 | 0.005*
 3 – Scalar | 2 vs. 3 | -60.401 | 0.991 | 0.001 | 0.993
Meaning & Purpose
 1 – Configural | Null vs. 1 | 102.971 | 0.241 | 0.998 | 0.010
 2 – Metric | 1 vs. 2 | 52.788 | 0.020 | −0.001 | 0.007*
 3 – Scalar | 2 vs. 3 | -21.834 | 0.922 | 0.001 | 0.974
Positive Affect
 1 – Configural | Null vs. 1 | 103.168 | 0.295 | 0.999 | 0.228
 2 – Metric | 1 vs. 2 | 24.401 | 0.379 | <-0.001 | 0.363
 3 – Scalar | 2 vs. 3 | 9.573 | 0.081 | < 0.001 | 0.086

Note ΔCFI = Change in Comparative Fit Index, * indicates significant p-value < 0.01

Discussion

The current study is the first to document the varying levels of measurement invariance between the self-reported QoL of autistic and general population teenagers across seven PROMIS pediatric scales. Testing measurement invariance is a critical step towards assuring that measures that were developed and validated on non-autistic populations function in the same way for autistic individuals. Our focus on the self-report of teenagers emphasizes the lived experience of QoL across the community-prioritized domains of Emotional Distress (Depression, Anger, Anxiety, and Psychological Stress) and Subjective Well-Being (Life Satisfaction, Positive Affect, and Meaning & Purpose). Such efforts are essential for validating patient-reported outcomes for use in autism research [55]. Taken together, the current study addresses an important gap in the literature on autistic QoL by providing the psychometric validation necessary to confidently use these scales to measure various dimensions of QoL among autistic youth.

Our results highlight varying degrees of measurement invariance on the Emotional Distress and Subjective Well-Being self-report scales for autistic and general population teens. The Depression and Positive Affect scales demonstrated configural, metric, and scalar invariance between the two groups. These results suggest that the constructs are measured in a similar fashion among autistic teens as they are in the general population and that the scores from these scales can be compared meaningfully across autistic and general population groups.

The Anger and Psychological Stress scales both demonstrated configural and metric, but not scalar, invariance between the self-report of autistic teens and general population teens. These results suggest that the scales’ items capture the respective latent constructs across both groups, but researchers should not compare means between autistic and general populations on these self-report scales. The scales demonstrate a degree of measurement bias such that equivalent scores on the scales do not necessarily imply equal levels of anger or psychological stress in autistic versus non-autistic teens. Finally, the Anxiety, Life Satisfaction, and Meaning & Purpose teen self-report scales did not demonstrate metric or scalar invariance between the groups, indicating that these PROMIS scales may function differently between autistic and general population teens. These results suggest that the scales do not capture the intended constructs in autistic teens in the same way as they do in non-autistic teens.

Our finding of varying levels of measurement invariance of the PROMIS scales between autistic and general population teens is not surprising given that these measures were not specifically developed to measure QoL among autistic teens. The PROMIS pediatric Depression, Anxiety, and Anger scales were developed and standardized on a large sample of children and teens recruited from pediatric clinics and school settings [36–38]. Similarly, the PROMIS pediatric Psychological Stress [32], Life Satisfaction [33], Positive Affect [35], and Meaning & Purpose [34] scales were developed and standardized with samples of children from opt-in online panels, school districts, and hospital clinic settings. These samples included children with chronic health conditions, such as asthma, Attention-Deficit/Hyperactivity Disorder, and gastrointestinal disorders; however, autistic children were not reported as a part of the measures’ development samples. The fact that autistic individuals were not knowingly included in these development samples should not negate the tremendous amount of work that goes into creating and validating important patient reported outcome measures to capture different dimensions of QoL in pediatric populations. However, researchers and clinicians using these measures as outcomes in autism research and/or clinical work should do so with the knowledge that most of the PAB-L scales are not specifically validated for use with autistic teens.

To our knowledge, the present investigation is the first study to investigate the factorial validity of the self-report of autistic teens using the PROMIS scales. While previous research has investigated measurement invariance of PROMIS scales between parent-proxy reports for autistic and non-autistic children [25], self-report of QoL domains, particularly among autistic individuals, is likely a better estimation of an individual’s lived experiences of QoL. Future research will benefit from investigations of how PROMIS scales function similarly or differently depending on reporter (e.g., parent-proxy vs. self-report) in capturing autistic QoL. Previous research has highlighted ways in which the measurement of subjective experiences differs between parent-report and autistic adolescent self-report, including social anxiety [56] and sensory sensitivity [57]. Thus, psychometric investigations of QoL measures which include autistic self-report are likely to improve our measurement, and therefore understanding, of autistic QoL.

Beyond the individual scale findings presented, the current investigation highlights an important gap in the field of autism research. Put simply, the rigor with which we measure QoL in autism lags significantly behind the state of measurement science [55]. Specifically, standard psychometric practices, including testing whether a measure that was developed using non-autistic samples functions similarly in autistic populations, are infrequently applied before utilizing the scale in autism research. Such practices are particularly crucial in the adoption of patient-reported outcome measures, including QoL, as these constructs tap into subjective experiences, rather than objective or observable indicators. Researchers using the PROMIS scales have the advantage of a rigorous development and validation process in the general population or for clinical groups that were included in the initial construction. However, this rigorous development process is only advantageous to autism researchers if the scale functions in the same way in autistic populations.

Clinical implications

The degree to which a violation of measurement invariance is problematic depends on how the instrument is used across groups [58]. Some research seeks to compare autistic individuals to non-autistic individuals through comparing scores on a single measure and drawing conclusions based on differences (or lack of differences) between group means. To accurately compare group means, the instrument should demonstrate scalar invariance between the two groups. In the present study, the Depression and Positive Affect scales of the PROMIS self-report demonstrated scalar invariance between autistic and general population teens. Put differently, our results demonstrate that similar scores on the PROMIS self-report Depression and Positive Affect subscales imply similar levels of depression and positive affect between autistic and non-autistic teens. As such, these scales can be validly used to compare these groups.

In contrast, other studies may use a teen’s self-report as an outcome within an autistic population, rather than compare across autistic and non-autistic teenagers. In this case, an instrument demonstrating metric (but not scalar) invariance may be sufficient given the research question. Our finding that the PROMIS Anger and Psychological Stress scales demonstrated invariance at the metric level suggests that these scales function similarly in the measurement of anger and psychological stress across groups. However, these scales did not demonstrate scalar invariance, suggesting test bias and that equal observed scores on the PROMIS Anger and Psychological Stress scales do not necessarily imply equal levels of anger or psychological stress in autistic versus non-autistic teens.
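The reason metric invariance alone does not license mean comparisons can be made explicit with a short derivation. This is a sketch in generic linear factor-model notation, with symbols assumed here rather than taken from the study: for item j in group g, let ν be the item intercept, λ the loading (equal across groups under metric invariance), and κ the latent mean.

```latex
\begin{align*}
E[x_{jg}] &= \nu_{jg} + \lambda_{j}\,\kappa_{g} \\
E[x_{j1}] - E[x_{j2}] &= (\nu_{j1} - \nu_{j2}) + \lambda_{j}\,(\kappa_{1} - \kappa_{2})
\end{align*}
```

Under scalar invariance the first term is zero, so an observed difference reflects only the latent mean difference. When intercepts differ between groups, the observed difference mixes item-level bias with any true difference in the construct, and the two cannot be separated from observed scores alone.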

Limitations

Results of our study are limited by several factors related to our samples of autistic and general population teens. First, given that this was secondary data analysis based on virtual surveys, limited information was available regarding the cognitive functioning, verbal abilities, or adaptive behavior skills of our samples. Future work able to characterize more multidimensional characteristics would be helpful in validating the generalizability of results presented here. Relatedly, our current study did not include a representative sample of autistic teens who were reported to be minimally verbal or have co-occurring intellectual disability. It is unknown to what extent our current findings would extend to autistic teens with co-occurring intellectual disability. Given the high prevalence rate of intellectual disability among autistic individuals [59], additional research is necessary to understand how to reliably and validly measure QoL among this population across the lifespan. Finally, autism diagnostic status was not reported in the general population sample data, so it is possible that the general population sample contained some autistic adolescents.

As is common in autism research given the imbalanced sex and gender ratio of individuals diagnosed with autism, our autistic sample had a higher proportion of teens identifying as male than female or transgender. In addition, since only the autistic group was asked about transgender identity, there is no comparison group for transgender adolescents in the general population sample. Future work may benefit from oversampling methods for autistic women, transgender, and nonbinary individuals to characterize their lived experience of QoL. Additionally, both autistic and general population groups were more white and non-Hispanic/Latino than is reflected in the current United States population. Increasing racial and ethnic diversity is largely recognized as a high-priority area of growth across a wide range of autism research [60, 61].

Conclusion

The results of this study provide evidence that the Depression and Positive Affect scales of the PROMIS can be used confidently in research and clinical work based on the self-report of autistic and non-autistic teens. Scores on the PROMIS Depression and Positive Affect scales can also be compared across these two groups given the psychometric properties of the scales. The PROMIS Anger and Psychological Stress scales demonstrated metric invariance and can be used to measure these QoL constructs in autistic teens, though caution should be used in research seeking to compare autistic and general population teens on these measures. Finally, neither scalar nor metric invariance was indicated for the Anxiety, Life Satisfaction, and Meaning & Purpose scales, suggesting that these scales measure these constructs differently in autistic and non-autistic teens. The inclusion of QoL outcomes is a welcome advancement in autism research; however, validating QoL measures for use in autistic populations is an ongoing process necessary to support such research. We believe the past and current work on the PAB-L exemplifies this process: the PAB-L leveraged rigorously developed PROMIS scales and consulted with autistic individuals, their families, and care providers to select scales that reflect constructs most meaningful for this population. Feasibility, acceptability, and reliability of the PAB-L were favorable [29], and the current work extends this process to examine the psychometric properties of the scales for autistic teenagers as compared with general population teenagers. Multi-group CFAs that demonstrate measurement invariance between autistic and general populations help researchers make empirically supported decisions regarding the measures they use to capture QoL outcomes in their research.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1 (15.6KB, docx)

Acknowledgements

The authors would like to thank Zachary J. Williams for the generous amounts of time and expertise he provided consulting on the methods for this manuscript.

Author contributions

EAKK: Conceptualization, Methodology, Data Curation, Formal analysis, Writing – Original Draft; RMB: Conceptualization, Methodology, Data Curation, Formal analysis, Writing – Review & Editing; WG: Supervision, Writing – Review & Editing; BEY: Supervision, Writing – Review & Editing; LGH: Data Curation, Supervision, Writing – Review & Editing; JSM: Supervision, Writing – Review & Editing.

Funding

This work was supported by the McMorris Family Foundation and the Children’s Hospital of Philadelphia Women’s Committee.

Declarations

Ethics approval

This study was performed in line with the principles of the Declaration of Helsinki. Approval was granted by the Institutional Review Board of the Children’s Hospital of Philadelphia.

Consent to participate

Informed Consent was obtained from the parents of the teenagers included in the study.

Competing interests

The authors have no relevant financial or non-financial interests to disclose.

Footnotes

1

‘Identity-first’ language is preferred by the majority of autistic adults [1, 2] and will be used throughout this manuscript.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1.Kenny, L., Hattersley, C., Molins, B., et al. (2015). Which terms should be used to describe autism? Perspectives from the UK autism community. Autism, 20, 442–462. [DOI] [PubMed] [Google Scholar]
  • 2.Bottema-Beutel, K., Kapp, S. K., Lester, J. N., et al. (2021). Avoiding Ableist language: Suggestions for Autism Researchers. Autism Adulthood, 3, 18–29. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Benevides, T. W., Shore, S. M., Palmer, K., et al. (2020). Listening to the autistic voice: Mental health priorities to guide research and practice in autism from a stakeholder-driven project. Autism, 24, 822–833. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Frazier, T. W., Dawson, G., Murray, D., et al. (2018). Brief report: A Survey of Autism Research priorities across a Diverse Community of stakeholders. Journal of Autism and Developmental Disorders, 48, 3965–3971. [DOI] [PubMed] [Google Scholar]
  • 5.Gotham, K., Marvin, A. R., Taylor, J. L., et al. (2015). Characterizing the daily life, needs, and priorities of adults with autism spectrum disorder from interactive Autism Network data. Autism, 19, 794–804. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Pellicano, E., Dinsmore, A., & Charman, T. (2014). What should autism research focus upon? Community views and priorities from the United Kingdom. Autism, 18, 756–770. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.World Health Organization, Division of Mental Health and Prevention of Substance Abuse. (1997). WHOQOL: Measuring quality of life (p. 63482). Geneva. http://www.who.int/iris/handle/10665/.
  • 8.Brown, H. M., Dwyer, P. S. R., Gassner, D. L., et al. (2022). The Autism Intervention Research Network on Physical Health Autistic Researcher Review Board. Pediatrics, 149, e2020049437F. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Lord, C., Charman, T., Havdahl, A., et al. (2022). The Lancet Commission on the future of care and clinical research in autism. The Lancet, 399, 271–334. [DOI] [PubMed] [Google Scholar]
  • 10.Hays, D. G., & Wood, C. (2017). Stepping outside the normed Sample: Implications for Validity. Measurement and Evaluation in Counseling and Development, 50, 282–288. [Google Scholar]
  • 11.Tavernor, L., Barron, E., Rodgers, J., et al. (2012). Finding out what matters: Validity of quality of life measurement in young people with ASD: Validity of QoL measurement in ASD. Child: Care, Health and Development, 39, 592–601. [DOI] [PubMed] [Google Scholar]
  • 12.Ayres, M., Parr, J. R., Rodgers, J., et al. (2018). A systematic review of quality of life of adults on the autism spectrum. Autism, 22, 774–783. [DOI] [PubMed] [Google Scholar]
  • 13.Harper, A., & Power, M. (1998). Development of the World Health Organization WHOQOL-BREF quality of life assessment. Psychological Medicine, 28, 551–558. [DOI] [PubMed] [Google Scholar]
  • 14.McConachie, H., Mason, D., Parr, J. R., et al. (2018). Enhancing the validity of a quality of life measure for autistic people. Journal of Autism and Developmental Disorders, 48, 1596–1611. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.McConachie, H., Wilson, C., Mason, D., et al. (2020). What is important in measuring quality of life? Reflections by autistic adults in four countries. Autism Adulthood, 2, 4–12. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Williams, Z. J., & Gotham, K. O. (2021). Assessing general and autism-relevant quality of life in autistic adults: A psychometric investigation using item response theory. Autism Res, aur.2519. [DOI] [PMC free article] [PubMed]
  • 17.Williams, Z. J., Cascio, C. J., & Woynaroski, T. G. (2022). Measuring subjective quality of life in autistic adults with the PROMIS global–10: Psychometric study and development of an autism-specific scoring method. Autism, 136236132210853. [DOI] [PMC free article] [PubMed]
  • 18.Horn, J. L., & Mcardle, J. J. (1992). A practical and theoretical guide to measurement invariance in aging research. Experimental Aging Research, 18, 117–144. [DOI] [PubMed] [Google Scholar]
  • 19.Chen, F. F. (2008). What happens if we compare chopsticks with forks? The impact of making inappropriate comparisons in cross-cultural research. Journal of Personality and Social Psychology, 95, 1005–1018. [DOI] [PubMed] [Google Scholar]
  • 20.Davidov, E., Meuleman, B., Cieciuch, J., et al. (2014). Measurement equivalence in Cross-national Research. Annual Review of Sociology, 40, 55–75. [Google Scholar]
  • 21.Clements, C. C., Watkins, M. W., Schultz, R. T., et al. (2020). Does the factor structure of IQ Differ between the Differential ability scales (DAS-II) normative sample and autistic children? Autism Research, 13, 1184–1194. [DOI] [PubMed] [Google Scholar]
  • 22.Stephenson, K. G., Beck, J. S., & South, M. (2021). Validity of the WISC-V in Youth with Autism Spectrum disorder: Factor structure and measurement invariance. Journal of Clinical Child and Adolescent Psychology, 1–13. [DOI] [PubMed]
  • 23.Stephenson, K. G., Levine, A., Russell, N. C. C., et al. (2023). Measuring intelligence in autism and ADHD: Measurement invariance of the Stanford-Binet 5th edition and impact of subtest scatter on abbreviated IQ accuracy. Autism Research, 16, 2350–2363. [DOI] [PubMed] [Google Scholar]
  • 24.Medeiros, K., Mazurek, M. O., & Kanne, S. (2017). Investigating the factor structure of the child Behavior Checklist in a large sample of children with autism spectrum disorder. Research in Autism Spectrum Disorders, 40, 24–40. [Google Scholar]
  • 25.Meltzer, L. J., Forrest, C. B., de la Motte, A., et al. (2020). Clinical validity of the PROMIS Pediatric Sleep measures across populations of children with chronic illnesses and Neurodevelopment disorders. Journal of Pediatric Psychology, 45, 319–327. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.McKernan, E. P. Factor Invariance of Anxiety and Depression Measures in Autism.
  • 27.Williams, Z. J., Everaert, J., & Gotham, K. O. (2021). Measuring Depression in autistic adults: Psychometric validation of the Beck Depression Inventory–II. Assessment, 28, 858–876. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Reeve, B. B., Hays, R. D., Bjorner, J. B., et al. (2007). Psychometric evaluation and calibration of Health-Related Quality of Life Item banks: Plans for the patient-reported outcomes Measurement Information System (PROMIS). Medical Care, 45, S22–S31. [DOI] [PubMed] [Google Scholar]
  • 29.Graham Holmes, L., Zampella, C. J., Clements, C., et al. (2020). A Lifespan Approach to patient-reported outcomes and quality of life for people on the Autism Spectrum. Autism Research, 13, 970–987. [DOI] [PubMed] [Google Scholar]
  • 30.Carle, A. C., Bevans, K. B., Tucker, C. A., et al. (2021). Using nationally representative percentiles to interpret PROMIS pediatric measures. Quality of Life Research, 30, 997–1004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Psychometric Evaluation and Calibration of Health-Related Quality of Life Item Banks. (2021). Plans for the patient-reported outcomes Measurement Information System (PROMIS). Medical Care, 45, 11. [DOI] [PubMed] [Google Scholar]
  • 32.Bevans, K. B., Gardner, W., Pajer, K. A., et al. (2018). Psychometric evaluation of the PROMIS Pediatric Psychological and physical stress experiences measures. Journal of Pediatric Psychology, 43, 15. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Forrest, C. B., Devine, J., Bevans, K. B., et al. (2018). Development and psychometric evaluation of the PROMIS Pediatric life satisfaction item banks, child-report, and parent-proxy editions. Quality of Life Research, 27, 18. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Forrest, C. B., Bevans, K. B., Filus, A., et al. (2019). Assessing children’s Eudaimonic Well-Being: The PROMIS Pediatric meaning and purpose item banks. Journal of Pediatric Psychology, 44, 1074–1082. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Forrest, C. B., Ravens-Sieberer, U., Devine, J., et al. (2018). Development and evaluation of the PROMIS® Pediatric Positive Affect Item Bank, child-report and parent-Proxy editions. Journal of Happiness Studies, 19, 699–718. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Irwin, D. E., Stucky, B. D., Langer, M. M., et al. (2012). PROMIS Pediatric anger scale: An item response theory analysis. Quality of Life Research, 21, 697–706. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Irwin, D. E., Stucky, B., Langer, M. M., et al. (2010). An item response analysis of the pediatric PROMIS anxiety and depressive symptoms scales. Quality of Life Research, 19, 595–607. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Irwin, D. E., Stucky, B. D., Thissen, D., et al. (2010). Sampling plan and patient characteristics of the PROMIS pediatrics large-scale survey. Quality of Life Research, 19, 585–594. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Cella, D., Riley, W., Stone, A., et al. (2010). The patient-reported outcomes Measurement Information System (PROMIS) developed and tested its first wave of adult self-reported health outcome item banks: 2005–2008. Journal of Clinical Epidemiology, 63, 1179–1194. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Cella, D., Gershon, R., Bass, M. (2017). Assessment Center User Manual, https://www.assessmentcenter.net/ac1/assessmentcenter_manual.pdf.
  • 41.Ravens-Sieberer, U., Devine, J., Bevans, K., et al. (2014). Subjective well-being measures for children were developed within the PROMIS project: Presentation of first results. Journal of Clinical Epidemiology, 67, 207–218. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Rosseel, Y. (2012). Lavaan: An R Package for Structural equation modeling. Journal of Statistical Software, 48. 10.18637/jss.v048.i02. Epub ahead of print.
  • 43.Xia, Y., & Yang, Y. (2019). RMSEA, CFI, and TLI in structural equation modeling with ordered categorical data: The story they tell depends on the estimation methods. Behavior Research Methods, 51, 409–428. [DOI] [PubMed] [Google Scholar]
  • 44.Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling Multidisciplinary Journal, 6, 1–55. [Google Scholar]
  • 45.van de Schoot, R., Lugtig, P., & Hox, J. (2012). A checklist for testing measurement invariance. European Journal of Developmental Psychology, 9, 486–492. [Google Scholar]
  • 46.Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for Testing Measurement Invariance. Structural Equation Modeling Multidisciplinary Journal, 9, 233–255. [Google Scholar]
  • 47.Forrest, C. B., Bevans, K. B., Pratiwadi, R., et al. (2014). Development of the PROMIS® pediatric global health (PGH-7) measure. Quality of Life Research, 23, 1221–1231. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Kline, R. B. (2015). Principles and practice of structural equation modeling. Guilford Press.
  • 49.Jorgensen, T. D., Kite, B. A., Chen, P-Y., et al. (2018). Permutation randomization methods for testing measurement equivalence and detecting differential item functioning in multiple-group confirmatory factor analysis. Psychological Methods, 23, 708–728. [DOI] [PubMed] [Google Scholar]
  • 50.Jorgensen, T. D., Pornprasertmanit, S., Schoemann, A. M. (2021). Useful tools for structural equation modeling. R package version 0.5-5, https://cran.r-project.org/web/packages/semTools/semTools.pdf.
  • 51.Chen, F. F. (2007). Sensitivity of goodness of fit indexes to lack of Measurement Invariance. Structural Equation Modeling Multidisciplinary Journal, 14, 464–504. [Google Scholar]
  • 52.Meade, A. W., Johnson, E. C., & Braddy, P. W. (2008). Power and sensitivity of alternative fit indices in tests of measurement invariance. Journal of Applied Psychology, 93, 568–592. [DOI] [PubMed] [Google Scholar]
  • 53.Colvin, K. F., & Gorgun, G. Collapsing Scale Categories: Comparing the Psychometric Properties of Resulting Scales. 25.
  • 54.Tsai, C-L., Wind, S., & Estrada, S. (2024). Exploring the effects of collapsing rating scale categories in polytomous item response theory analyses: An illustration and Simulation Study. Measurement-Interdisciplinary Research and Perspectives; 1–24.
  • 55.Schiltz, H. K., Williams, Z. J., Zheng, S. (2024). Measurement matters: A commentary on the state of the science on patient reported outcome measures (PROMs) in autism research. Autism Research, aur.3114. [DOI] [PubMed]
  • 56.Schiltz, H. K., Magnus, B. E., McVey, A. J., et al. (2021). A psychometric analysis of the social anxiety scale for adolescents among Youth with Autism Spectrum Disorder: Caregiver–adolescent agreement, factor structure, and Validity. Assessment, 28, 100–115. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Keith, J. M. (2019). The importance of adolescent self-report in Autism Spectrum Disorder: Integration of Questionnaire and autonomic measures. Journal of Abnormal Child Psychology, 14. [DOI] [PMC free article] [PubMed]
  • 58.Borsboom, D. (2006). When does Measurement Invariance Matter? Medical Care, 44, S176–S181. [DOI] [PubMed] [Google Scholar]
  • 59.Maenner, M. J., Shaw, K. A., Bakian, A. V., et al. (2021). Prevalence and characteristics of Autism Spectrum Disorder among children aged 8 years — Autism and Developmental Disabilities Monitoring Network, 11 sites, United States, 2018. MMWR Surveill Summaries, 70, 1–16. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Pierce, N. P., O’Reilly, M. F., Sorrells, A. M., et al. (2014). Ethnicity Reporting Practices for Empirical Research in three autism-related journals. Journal of Autism and Developmental Disorders, 44, 1507–1519. [DOI] [PubMed] [Google Scholar]
  • 61.West, E. A., Travers, J. C., Kemper, T. D., et al. (2016). Racial and Ethnic Diversity of Participants in Research supporting evidence-based practices for Learners with Autism Spectrum Disorder. Journal of Special Education, 50, 151–163. [Google Scholar]


Articles from Quality of Life Research are provided here courtesy of Springer
