Abstract
The Penn State Worry Questionnaire for Children (PSWQ-C), developed by Bruce F. Chorpita, is used worldwide to evaluate worry in children. The original US English PSWQ-C has been translated into several other languages, and there is evidence for good psychometric properties of both the original and several translated versions. However, the cross-cultural measurement equivalence of the PSWQ-C has not yet been tested. The aim of this study is to evaluate the measurement invariance of the PSWQ-C across Poland and the US, using data from children in Poland responding to the Polish adaptation of the PSWQ-C (N = 191) and children in the US responding to the original US English PSWQ-C (N = 191). Data analysis involved two steps. First, the factor structure of the original US English and Polish versions of the PSWQ-C was tested. Second, multigroup confirmatory factor analysis was conducted to compare three levels of measurement equivalence (configural, metric, and scalar) across the two samples. The bifactor model provided the best fit to the data in both groups. Nevertheless, the one-factor model also met the acceptability criteria and showed equivalence at all three levels when both samples were analyzed jointly. These results demonstrate the cross-cultural measurement invariance of the PSWQ-C across Poland and the United States of America.
Keywords: PSWQ-C; Worry; Children; Adolescents; Measurement invariance; Cross-cultural; Instrument; Poland; US
Subject terms: Health care, Psychology
Introduction
Worry is a common experience that affects everyone to some degree. Scientific research into this phenomenon began in the 1980s with Borkovec's pioneering work on excessive mental activity that interfered with falling asleep1. Since then, numerous studies have addressed worry and have led to the conclusion that it is a cognitive process involving primarily thoughts (less often images) that focus mainly on uncertainty about future events1,2. These thoughts are typically negative and repetitive and, in pathological worry, both intrusive and uncontrollable3. They give rise to unpleasant emotions, primarily anxiety4. From a clinical perspective, excessive worry is a key feature of anxiety disorders, particularly generalized anxiety disorder (GAD)5, but also of separation anxiety, social phobia6, obsessive-compulsive disorder, panic attacks, post-traumatic stress disorder (PTSD)7, and depressive disorders8. Worry affects individuals of all ages, from children9 and adolescents10 to the elderly11. Several methods have been developed to assess it. The most widely used instrument is the Penn State Worry Questionnaire, developed by Meyer and colleagues12, and its derivative for children and young people, the Penn State Worry Questionnaire for Children (PSWQ-C), developed by Chorpita13. The PSWQ-C consists of 14 items assessing children's tendency to worry. Respondents indicate how often each item applies to them by choosing "never," "sometimes," "often," or "always." Responses are scored on a 4-point Likert scale from 0 (never) to 3 (always), so the total sum score can range from 0 to 42, with higher scores indicating a greater tendency to worry.
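As an illustration of the scoring rule just described, the sketch below computes a PSWQ-C total score. It assumes responses are already coded 0–3 and that items 2, 7, and 9 are the reverse-keyed items (the items marked as reversed in Table 5), which are recoded as 3 − x before summing. The function name `pswqc_total` is hypothetical and not part of any published scoring package.

```python
from typing import Sequence

# Items treated as reverse-keyed (1-based), following the reversed items in Table 5.
REVERSED_ITEMS = {2, 7, 9}
MAX_RESPONSE = 3  # 0 = never, 1 = sometimes, 2 = often, 3 = always


def pswqc_total(responses: Sequence[int]) -> int:
    """Return the PSWQ-C total sum score (0-42) for one respondent.

    `responses` holds the 14 item responses in questionnaire order, each coded 0-3.
    Reverse-keyed items are recoded as 3 - x before summing.
    """
    if len(responses) != 14:
        raise ValueError("PSWQ-C has exactly 14 items")
    total = 0
    for item_number, value in enumerate(responses, start=1):
        if not 0 <= value <= MAX_RESPONSE:
            raise ValueError(f"item {item_number}: response {value} outside 0-3")
        total += MAX_RESPONSE - value if item_number in REVERSED_ITEMS else value
    return total


# Usage: a child answering "sometimes" (1) to every item scores 11 * 1 + 3 * 2.
print(pswqc_total([1] * 14))  # 17
```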
In community samples, the PSWQ-C demonstrated favorable psychometric properties13, including a moderate correlation (r = .52) with Kovacs's Children's Depression Inventory (CDI) and a high correlation (r = .71) with the Worry/Oversensitivity subscale of Reynolds and Richmond's Revised Children's Manifest Anxiety Scale (RCMAS).
The PSWQ-C has shown comparable psychometric properties in clinical samples13,14. Reported internal consistency coefficients include 0.82 in one study15, 0.89 in the original validation study13, and 0.91 in another study14. The one-week test–retest reliability in a clinical sample was r = .92 (Chorpita et al.13).
Several adaptations of the PSWQ-C have been developed, including French16, Korean17, Danish5, Romanian18, Italian19, Indonesian20, Chinese21, and Polish22 versions. However, no published study has evaluated its cross-cultural measurement invariance. Only two studies have evaluated the intra-cultural measurement equivalence of the PSWQ-C (i.e., within a single country). The first, conducted with Romanian youth, assessed the measurement invariance of the Romanian translation of the PSWQ-C across gender, age, and clinical status, the clinical group comprising patients with various psychiatric diagnoses, including anxiety disorders, depression, sleep disorders, and anorexia nervosa18. Its findings support configural, metric, and scalar invariance of PSWQ-C scores across boys and girls, children and adolescents, and community and clinical samples. The second study examined a Chinese-language adaptation in a sample of Chinese adolescents, testing invariance across age and gender and longitudinally across two time points (the beginning and end of a semester)23. Its results also confirmed configural, metric, and scalar invariance of the abbreviated measure (PSWQ-A) across gender, age, and time. However, this study used an abbreviated 8-item version of the PSWQ, which limits the applicability of its findings to the full PSWQ-C.
Thus, the evidence for intra-cultural measurement invariance of the PSWQ-C is promising. For cross-cultural research, however, it is essential to verify measurement equivalence between groups from different countries. Three levels of equivalence are typically regarded as critical and are most commonly tested [cf. 24–29]: configural invariance, which assesses whether respondents from different countries understand the measured construct in the same way; metric invariance, which verifies that the units of measurement are comparable across groups; and scalar invariance, which verifies whether individuals with the same level of the measured variable (e.g., people with similar levels of worry) obtain the same scores on the same scale. Strict invariance additionally requires equal measurement error variances, but it is rarely tested or required in the assessment of measurement equivalence26. Establishing the measurement equivalence of an instrument supports the comparability of test results obtained from different samples.
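The nested levels described above can be stated compactly with a standard linear common-factor model. The notation below is ours and conventional, not the exact AMOS parameterization used in this study: x is an item response, τ its intercept, λ its loading, η latent worry, ε the residual with variance θ, i indexes the 14 items, and g indexes the country.

```latex
\begin{aligned}
x_{ig} &= \tau_{ig} + \lambda_{ig}\,\eta_{g} + \varepsilon_{ig},
  \qquad \operatorname{Var}(\varepsilon_{ig}) = \theta_{ig},
  \qquad i = 1,\dots,14,\; g \in \{\mathrm{PL},\mathrm{US}\} \\[4pt]
\text{configural:}&\quad \text{same pattern of zero and non-zero } \lambda_{ig} \text{ in both groups} \\
\text{metric:}&\quad \lambda_{i,\mathrm{PL}} = \lambda_{i,\mathrm{US}} \quad \forall i \\
\text{scalar:}&\quad \text{metric} \;+\; \tau_{i,\mathrm{PL}} = \tau_{i,\mathrm{US}} \quad \forall i \\
\text{strict:}&\quad \text{scalar} \;+\; \theta_{i,\mathrm{PL}} = \theta_{i,\mathrm{US}} \quad \forall i
\end{aligned}
```

Under scalar invariance the expected item score, E(x_i | η) = τ_i + λ_i η, is the same function of η in both countries, which is what makes differences in latent (and observed) means interpretable as differences in worry.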
Previous studies have reported different factor structures for the PSWQ-C. Exploratory factor analysis in the original US English validation study indicated a single-factor structure13. A study of the French translation16 reported a two-factor structure, with positively and negatively worded items loading on distinct factors, and the same result was obtained by Esbjørn et al.5, Pestle et al.14, and Talik22. A two-factor solution has also been identified in the adult PSWQ, which prompted debate over whether the factors are theoretically meaningful or are simply method factors related to item wording (positive vs. negative), with the worry construct itself being unidimensional30,31. Brown32 and Hazlett-Stevens, Ullman, and Craske33 proposed a resolution: using confirmatory factor analysis (CFA), they showed that all PSWQ items measure a unitary construct of worry, but that some items share additional common variance related to their positive or negative wording. Reise, Moore, and Haviland34 proposed the use of a bifactor model, consisting either of one general trait factor combined with one method factor for the negatively worded items, or of one general trait factor combined with two distinct method factors for the positively and negatively worded items, respectively. Accordingly, the current standard in research on the factor structure of the PSWQ-C (and the PSWQ) is to test four models35:
Model M1: a single-factor model with all 14 items loading on one factor;
Model M2: a two-factor model with two latent variables representing positively and negatively worded items;
Model M3: a bifactor model with one general trait factor and two method factors representing positively and negatively worded items, respectively;
Model M4: a bifactor model consisting of a general trait factor representing all 14 items and one method factor representing negatively worded items.
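To make the four specifications concrete, the sketch below encodes each model as a mapping from a factor label to the items that load on it. Following the text, items 2, 7, and 9 are treated as the negatively worded (reverse-keyed) items and the remaining eleven as positively worded. The labels `G`, `POS`, and `NEG` are hypothetical, and the sketch covers only the loading pattern, not estimation; in bifactor models such as M3 and M4 the method factors are conventionally specified as orthogonal to the general factor, whereas the two factors in M2 are allowed to correlate.

```python
ITEMS = list(range(1, 15))          # PSWQ-C items 1-14
NEGATIVE = [2, 7, 9]                # negatively worded (reverse-keyed) items
POSITIVE = [i for i in ITEMS if i not in NEGATIVE]

# Each model maps a (hypothetical) factor label to the items loading on it.
MODELS = {
    "M1": {"G": ITEMS},                                     # single factor
    "M2": {"POS": POSITIVE, "NEG": NEGATIVE},               # two correlated factors
    "M3": {"G": ITEMS, "POS": POSITIVE, "NEG": NEGATIVE},   # bifactor, two method factors
    "M4": {"G": ITEMS, "NEG": NEGATIVE},                    # bifactor, one method factor
}


def n_item_factor_paths(model: dict) -> int:
    """Count item-factor paths (loadings) before any identification constraint."""
    return sum(len(items) for items in model.values())


def loadings_per_item(model: dict) -> dict:
    """Return how many factors each item loads on."""
    return {i: sum(i in items for items in model.values()) for i in ITEMS}


for name, spec in MODELS.items():
    print(name, n_item_factor_paths(spec))
```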
The present study
The aim of this study is to evaluate the measurement invariance of the PSWQ-C across Poland and the US. The analysis involved two steps. First, we checked whether the factor structure of the PSWQ-C was consistent across both countries. Second, a multigroup confirmatory factor analysis was conducted to compare three levels of measurement equivalence (configural, metric, and scalar) across the two samples, one from Poland and one from the US.
Materials and methods
Participants and procedure
The study participants comprised two groups. Group 1 consisted of 191 participants from Poland aged 8–19 years (M = 14.21, SD = 3.31), including 102 girls (53.4%) and 89 boys (46.6%). This group was a subsample of the larger group of participants (N = 620) involved in the Polish adaptation of the PSWQ-C. Participants were selected using a percentage-based random sampling procedure, set so that the expected number of selected cases matched the US sample size. All participants (100%) were Caucasian; 63.1% lived in urban areas and 36.9% in rural areas (villages).
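The subsampling step can be illustrated with simple random sampling without replacement of exactly 191 of the 620 case indices. This is a sketch under that assumption, not the SPSS percentage-based procedure the authors used: percentage-based selection generally yields only approximately the target size, whereas `random.Random.sample` returns exactly `n_target` cases. The seed value and the helper name `draw_subsample` are arbitrary, hypothetical choices.

```python
import random


def draw_subsample(n_population: int, n_target: int, seed: int = 2022) -> list[int]:
    """Return sorted 0-based indices of a simple random sample without replacement."""
    if n_target > n_population:
        raise ValueError("target size exceeds population size")
    rng = random.Random(seed)  # local generator: does not touch global random state
    return sorted(rng.sample(range(n_population), n_target))


polish_ids = draw_subsample(n_population=620, n_target=191)
print(len(polish_ids), len(set(polish_ids)))  # 191 191
```

Fixing the seed makes the selection reproducible, which matters if the same subsample has to be re-drawn for robustness checks.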
Group 2 consisted of 191 participants from the US aged 8–19 years (M = 13.23, SD = 2.64), including 104 girls (54.4%) and 85 boys (44.5%); data on gender and race/ethnicity were missing for one participant. In terms of ethnicity, 147 (76.6%) were Caucasian, 20 (10.4%) African American, 13 (6.8%) Asian American, and 11 (5.7%) of another ethnicity.
The research in Poland was conducted in April 2022 in randomly selected schools in Lublin, Kraków, and Wrocław. Questionnaires were completed in paper-and-pencil form during a single school lesson. Further details on the Polish sample can be found in Talik22.
Participants in the US were recruited from community schools (grades 1 through 12) in the Niskayuna and Albany school districts. Experimenters visited the schools to describe the study and to distribute child and parental consent forms. The children were informed about the confidential and voluntary nature of the study. Testing sessions lasted approximately 15 min. The US data are re-used from the original US English PSWQ-C validation study; further details about these data and the original US study can be found in Chorpita et al.13. To ensure age homogeneity between the Polish and American groups, children aged 6 and 7 years (N = 8) were excluded from the American sample for the present analyses.
Informed consent was obtained from all participants and/or their legal guardians. All experimental protocols involving human participants were approved by the Institute of Psychology at the John Paul II Catholic University of Lublin (approval/grant no. 05-0611-2). All experiments were performed in accordance with relevant guidelines and regulations.
Measures
The study used the original version of the PSWQ-C by Chorpita13 and its Polish adaptation by Talik22. The original US English PSWQ-C demonstrated acceptable psychometric properties: its internal consistency (Cronbach's α) was 0.89, higher in older children (aged 12–19, α = 0.90) than in younger ones (aged 6–11, α = 0.81). The questionnaire correlated highly (r = .71, p < .05) with a subscale of the Revised Children's Manifest Anxiety Scale (RCMAS) and moderately (r = .52, p < .05) with the Children's Depression Inventory (CDI).
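For reference, Cronbach's α can be computed directly from a respondents × items matrix as α = k/(k − 1) · (1 − Σ item variances / variance of total scores). The sketch below is a minimal standard-library implementation of that textbook formula; the function name and the toy data in the usage line are hypothetical and unrelated to the PSWQ-C samples.

```python
from statistics import pvariance
from typing import Sequence


def cronbach_alpha(data: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents x items matrix (rows = respondents).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    Population variances are used throughout; the ratio is unchanged if sample
    variances are used consistently instead.
    """
    k = len(data[0])
    if k < 2:
        raise ValueError("need at least two items")
    items = list(zip(*data))                         # columns = items
    item_var_sum = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in data])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Toy data: four respondents, three items.
print(round(cronbach_alpha([[1, 2, 3], [2, 3, 3], [3, 3, 4], [4, 5, 5]]), 3))  # 0.964
```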
In a recent validation study, the Polish adaptation also showed satisfactory psychometric properties: McDonald's omega reliability was ω = .96. The four-week test–retest correlation was ρ = 0.71 (n = 43), which, according to Nunnally and Bernstein49, may be regarded as acceptable for research purposes, although it reflects only moderate reliability. Convergent and discriminant validity were also satisfactory, with correlations of r = .71 with anxiety (State-Trait Personality Inventory, STPI) and r = .58 with depression (Children's Depression Inventory, CDI), and of r = −.44 with self-efficacy (Generalized Self-Efficacy Scale, GSES) and r = −.46 with optimism (Revised Life Orientation Test, LOT-R)22.
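McDonald's omega for a single-factor model can be computed from standardized loadings as ω = (Σλ)² / [(Σλ)² + Σθ], with residual variances θᵢ = 1 − λᵢ². The sketch below assumes standardized loadings and uncorrelated residuals; it illustrates the formula only and is not claimed to reproduce the coefficient reported for the Polish adaptation22, which came from that study's own model.

```python
from typing import Sequence


def omega_total_one_factor(loadings: Sequence[float]) -> float:
    """McDonald's omega for a single-factor model with standardized loadings.

    omega = (sum lambda)^2 / ((sum lambda)^2 + sum theta), theta_i = 1 - lambda_i^2,
    i.e. residual variances are implied by the standardized loadings.
    """
    if any(not 0 <= abs(lam) <= 1 for lam in loadings):
        raise ValueError("standardized loadings must lie in [-1, 1]")
    common = sum(loadings) ** 2
    unique = sum(1 - lam ** 2 for lam in loadings)
    return common / (common + unique)


# Toy example: four items, each loading 0.5 -> 4 / (4 + 3).
print(round(omega_total_one_factor([0.5] * 4), 3))  # 0.571
```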
Data analysis
First, descriptive statistics were computed, and the normality of the PSWQ-C scores was examined with the Kolmogorov–Smirnov test. Because both samples contained missing data, missing values were replaced by mean imputation. Missing responses occurred in 9 cases in the Polish sample, each missing one item, and in 27 cases in the US sample, 26 missing three or fewer items and one missing exactly five items. We decided not to remove this last case (1) because of the relatively small sample size (N = 191) and (2) for consistency with the PSWQ-C adaptation study in which 191 subjects participated13.
Second, confirmatory factor analysis (CFA) was used to examine the construct validity of the PSWQ-C in both groups.
Third, multigroup confirmatory factor analysis (MGCFA) was used to investigate three levels of measurement equivalence (configural, metric, and scalar) across the two groups. All calculations were performed using SPSS 28 and AMOS.
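The text does not state whether means were taken across items within a person or across persons within an item. The sketch below assumes item-mean (column-mean) imputation applied to one country's sample at a time, with missing responses represented as `None`; the helper name is hypothetical. Because every imputed value sits exactly at the item mean, this approach shrinks item variances, which is the concern raised in the Limitations section.

```python
from statistics import mean
from typing import Optional, Sequence


def impute_item_means(data: Sequence[Sequence[Optional[float]]]) -> list[list[float]]:
    """Replace None with the mean of the observed responses to the same item.

    Rows are respondents and columns are items; apply to one sample at a time.
    """
    n_items = len(data[0])
    item_means = []
    for j in range(n_items):
        observed = [row[j] for row in data if row[j] is not None]
        if not observed:
            raise ValueError(f"item {j + 1} has no observed responses")
        item_means.append(mean(observed))
    return [
        [item_means[j] if value is None else value for j, value in enumerate(row)]
        for row in data
    ]


sample = [[1, 2, None], [3, None, 0], [2, 2, 2]]
print(impute_item_means(sample))  # [[1, 2, 1], [3, 2, 0], [2, 2, 2]]
```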
Results
Descriptive statistics
Table 1 presents the descriptive statistics of the PSWQ-C for both groups: Poland (Group 1) and the US (Group 2).
Table 1.
Descriptive statistics of the PSWQ-C total sum score in group 1 (Poland) and group 2 (US).
| Subgroup | Poland N | Poland M | Poland SD | Poland Min | Poland Max | US N | US M | US SD | US Min | US Max |
|---|---|---|---|---|---|---|---|---|---|---|
| Children¹ | 96 | 18.43 | 8.97 | 1 | 40 | 57 | 15.93 | 6.47 | 5 | 37 |
| Adolescents² | 95 | 22.48 | 9.91 | 0 | 42 | 134 | 19.06 | 8.25 | 0 | 39 |
| Total | 191 | 20.44 | 9.65 | 0 | 42 | 191 | 18.13 | 7.88 | 0 | 39 |
| Boys | 89 | 15.75 | 8.40 | 0 | 42 | 86 | 16.15 | 7.09 | 0 | 38 |
| Girls | 102 | 24.54 | 8.78 | 3 | 42 | 104 | 19.84 | 8.13 | 2 | 39 |
Annotation. 1 Age range = 8–11 years; 2 Age range = 12–19 years.
The mean PSWQ-C total score was higher in the Polish group than in the US group (M = 20.44, SD = 9.65 vs. M = 18.13, SD = 7.88). The difference was statistically significant (t(380) = 2.574, p < .01, Cohen's d = 0.263), but the effect size was small. The distribution of PSWQ-C scores deviated from normality in the US sample (K-S(191) = 0.098, p < .001), whereas in the Polish sample normality was not rejected by the Kolmogorov–Smirnov test (K-S(191) = 0.062, p = .070).
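The group comparison can be approximately re-derived from the summary statistics in Table 1. The sketch below assumes a pooled-variance (Student) independent-samples t-test, which is consistent with the reported 380 degrees of freedom, and Cohen's d standardized by the pooled SD. Because the published means and SDs are rounded, it gives t ≈ 2.56 and d ≈ 0.26 rather than exactly the reported 2.574 and 0.263.

```python
from math import sqrt


def pooled_t_and_d(m1: float, sd1: float, n1: int, m2: float, sd2: float, n2: int):
    """Student's independent-samples t (pooled variance), its df, and Cohen's d."""
    df = n1 + n2 - 2
    pooled_sd = sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    t = (m1 - m2) / (pooled_sd * sqrt(1 / n1 + 1 / n2))
    d = (m1 - m2) / pooled_sd
    return t, df, d


# Poland vs. US, total scores from Table 1.
t, df, d = pooled_t_and_d(20.44, 9.65, 191, 18.13, 7.88, 191)
print(f"t({df}) = {t:.2f}, d = {d:.2f}")  # t(380) = 2.56, d = 0.26
```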
Before conducting the multigroup CFA to test measurement invariance, model fit was tested separately in each group (Poland and US). Because the data were not normally distributed, as multivariate kurtosis statistics also confirmed (critical ratio, c.r.: PL = 8.464, US = 13.605), maximum likelihood estimation with bootstrapping was used in the CFA. Goodness of fit was evaluated with several commonly used indices selected on the basis of Yu's36 recommendations (Table 2).
Table 2.
Selected indicators of model fit in CFA.
| | χ² | p | χ²/df | RMSEA | LO 90 | HI 90 | PCLOSE | CFI | GFI | Hoelter's CN (p < .05) |
|---|---|---|---|---|---|---|---|---|---|---|
| Criterion | – | ≥ 0.05 | [2–5] | ≤ 0.05 | ≤ 0.05 | ≤ 0.08 | ≥ 0.05 (≤ 1) | ≥ 0.95 | ≥ 0.90 | ≥ 200 |
Source: Yu36.
Table 2 gives the critical values for accepting or rejecting a model. Good model fit means that the model-implied covariance matrix is similar to the observed covariance matrix; in that case, badness-of-fit indices such as the root mean square error of approximation (RMSEA) should be close to 0, whereas goodness-of-fit indices such as the CFI and GFI should be close to 1. PCLOSE is the p value for a test of close fit (H0: RMSEA ≤ .05), so larger values are better. The cut-offs used were ≤ .05 for RMSEA, ≥ .05 for PCLOSE, ≥ .90 for GFI, and ≥ .95 for CFI. Hoelter's critical N indicates the largest sample size for which the model's χ² would remain non-significant at the chosen significance level (here p < .05).
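The two main indices follow standard textbook definitions, sketched below: RMSEA = √(max(χ² − df, 0) / (df(N − 1))) and CFI = 1 − max(χ²_M − df_M, 0) / max(χ²_B − df_B, χ²_M − df_M, 0), where B denotes the independence (baseline) model. Some programs use N rather than N − 1 in the RMSEA denominator. The usage line assumes df = 69 for M1 in the Polish sample, inferred from χ² = 87.160 and χ²/df = 1.263 in Table 3; the baseline χ² is not reported, so the CFI function is shown without real data.

```python
from math import sqrt


def rmsea(chi2: float, df: int, n: int) -> float:
    """Point estimate of RMSEA; uses n - 1 in the denominator (some programs use n)."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model: float, df_model: int, chi2_base: float, df_base: int) -> float:
    """Comparative fit index relative to the independence (baseline) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)  # d_model >= 0, so d_base >= 0
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


# Polish sample, M1 (Table 3): chi2 = 87.160 with an inferred df of 69, N = 191.
print(round(rmsea(87.160, 69, 191), 3))  # 0.037
```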
The four models described in the Introduction (M1–M4) were then tested in both groups.
The results are presented in Tables 3 (Polish sample) and 4 (US sample).
Table 3.
The fit indices of the four models for the sample from Poland (N = 191).
| Model | χ² | p | χ²/df | RMSEA | LO 90 | HI 90 | PCLOSE | CFI | GFI | Hoelter's CN (p < .05) |
|---|---|---|---|---|---|---|---|---|---|---|
| M1 | 87.160 | 0.069 | 1.263 | 0.037 | 0.000 | 0.059 | 0.811 | 0.989 | 0.940 | 195 |
| M2 | 81.186 | 0.150 | 1.177 | 0.030 | 0.000 | 0.054 | 0.902 | 0.992 | 0.942 | 143 |
| **M3** | **71.038** | **0.202** | **1.146** | **0.028** | **0.000** | **0.054** | **0.914** | **0.994** | **0.949** | **218** |
| M4 | 85.807 | 0.071 | 1.262 | 0.037 | 0.000 | 0.059 | 0.811 | 0.989 | 0.941 | 196 |
Annotation. Fit indices of the best-fitting model are in bold. M1: a one-factor model with all 14 items loading on one factor; M2: a two-factor model with two latent variables representing positively and negatively worded items; M3: a bifactor model with a general trait factor and two method factors representing positively and negatively worded items, respectively; M4: a bifactor model consisting of one general trait factor representing all 14 items and one method factor representing negatively worded items.
In the Polish sample, all models showed satisfactory fit, with the best fit indices observed for M3, the bifactor model with one general trait factor and two method factors representing positively and negatively worded items.
Fit indices for the four models were then calculated for the second group (US sample); the results are presented in Table 4.
Table 4.
The fit indices of the four models for the sample from the US (N = 191).
| Model | χ² | p | χ²/df | RMSEA | LO 90 | HI 90 | PCLOSE | CFI | GFI | Hoelter's CN (p < .05) |
|---|---|---|---|---|---|---|---|---|---|---|
| M1 | 65.139 | 0.437 | 1.018 | 0.010 | 0.000 | 0.045 | 0.978 | 0.999 | 0.956 | 245 |
| M2 | 64.866 | 0.377 | 1.046 | 0.016 | 0.000 | 0.047 | 0.967 | 0.997 | 0.958 | 239 |
| **M3** | **53.877** | **0.698** | **0.898** | **0.000** | **0.000** | **0.036** | **0.995** | **1.00** | **0.964** | **279** |
| M4 | 83.160 | 0.172 | 1.157 | 0.029 | 0.000 | 0.053 | 0.921 | 0.988 | 0.944 | 217 |
Annotation. Fit indices of the best-fitting model are in bold. M1: a one-factor model with all 14 items loading on one factor; M2: a two-factor model with two latent variables representing positively and negatively worded items; M3: a bifactor model with one general trait factor and two method factors representing positively and negatively worded items, respectively; M4: a bifactor model consisting of one general trait factor representing all 14 items and one method factor representing negatively worded items.
As in the Polish sample, all four models met the acceptability criteria in the US sample, and the best-fitting model was again M3, the bifactor model.
Although the bifactor model provided the best overall fit to the data, we took into account ongoing methodological critiques, namely its documented tendency to show superior goodness of fit across diverse data-generating processes. Such superiority may reflect overfitting rather than the true latent structure. As Murray and Johnson37 (p. 420) observed, "the bi-factor model fits better, but not necessarily because it is a better description of ability structure"38–43.
Before proceeding to the multigroup CFA, we evaluated a bifactor specification comprising a general trait factor loading on all 14 PSWQ-C items and a wording-related method factor loading on the negatively keyed items (2, 7, 9). To evaluate the strength of the general factor, we computed the explained common variance (ECV), omega hierarchical (ωH), and omega total (ωT). Prior methodological work suggests that higher ECV and ωH values indicate a stronger general factor, and values of roughly ECV ≥ 0.60–0.70 and ωH ≥ 0.70–0.80 are often interpreted as supporting a primarily general-factor interpretation of the total score [e.g.,44]. In the Polish sample (N = 191), bifactor diagnostics indicated a dominant general factor (ωH = 0.908, ωT = 0.942, ECV = 0.854). In the US sample (N = 191), the pattern was comparable (ωH = 0.88, ωT = 0.90, ECV = 0.89). Taken together, these results support an essentially unidimensional structure: although some method variance due to negative wording is present, the total score primarily reflects the common trait and can be interpreted as a reliable indicator of generalized worry.
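For readers who wish to recompute these indices from a bifactor solution, the sketch below implements the usual formulas for standardized loadings, orthogonal factors, and uncorrelated residuals: ECV = Σλ_G² / (Σλ_G² + ΣΣλ_s²), ωH = (Σλ_G)² / D, and ωT = [(Σλ_G)² + Σ_s(Σλ_s)²] / D, where D adds the residual variances to the numerator of ωT. The loadings in the usage line are made-up toy values, not estimates from either sample, and the function name is hypothetical.

```python
from typing import Sequence


def bifactor_indices(general: Sequence[float], specific: Sequence[Sequence[float]]):
    """Return (ECV, omega_H, omega_T) for a standardized, orthogonal bifactor model.

    `general` has one loading per item; each entry of `specific` is a full-length
    list of loadings on one method factor, with 0.0 for items not loading on it.
    """
    n = len(general)
    if any(len(s) != n for s in specific):
        raise ValueError("each specific-factor vector must cover every item")
    gen_sq = sum(lam ** 2 for lam in general)
    spec_sq = sum(lam ** 2 for s in specific for lam in s)
    ecv = gen_sq / (gen_sq + spec_sq)

    # Residual variance of each item: 1 minus its communality.
    residuals = [1 - general[i] ** 2 - sum(s[i] ** 2 for s in specific) for i in range(n)]
    gen_part = sum(general) ** 2
    spec_part = sum(sum(s) ** 2 for s in specific)
    denom = gen_part + spec_part + sum(residuals)
    return ecv, gen_part / denom, (gen_part + spec_part) / denom


# Toy model: four items load 0.6 on G; items 1-2 also load 0.3 on one method factor.
ecv, omega_h, omega_t = bifactor_indices([0.6] * 4, [[0.3, 0.3, 0.0, 0.0]])
print(round(ecv, 3), round(omega_h, 3), round(omega_t, 3))
```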
We also examined the standardized factor loadings (i.e., standardized regression weights) for Model 1 in both groups (Table 5).
Table 5.
Standardized regression weights for the one-factor model (M1) in the Polish (N = 191) and US (N = 191) sample.
| Item | PL: general trait factor | US: general trait factor |
|---|---|---|
| 1 | 0.723*** | 0.579*** |
| 2 (reversed) | 0.340*** | 0.396*** |
| 3 | 0.814*** | 0.679*** |
| 4 | 0.710*** | 0.677*** |
| 5 | 0.726*** | 0.575*** |
| 6 | 0.863*** | 0.746*** |
| 7 (reversed) | 0.370*** | 0.374*** |
| 8 | 0.769*** | 0.653*** |
| 9 (reversed) | 0.435*** | 0.333*** |
| 10 | 0.798*** | 0.673*** |
| 11 | 0.761*** | 0.578*** |
| 12 | 0.657*** | 0.664*** |
| 13 | 0.798*** | 0.733*** |
| 14 | 0.791*** | 0.438*** |
Note. *** p <.001; ** p <.01; * p <.05.
All factor loadings were significant and comparable between the two groups. According to common recommendations, standardized loadings should exceed 0.70, although 0.40 is accepted as a less restrictive minimum cut-off45. All standardized loadings on the general trait factor exceeded 0.40 except those of the reversed items: items 2 and 7 in the Polish sample and items 2, 7, and 9 in the US sample.
Consequently, the unidimensional solution (M1) was selected for further analyses.
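The comparison of Table 5 against both cut-offs mentioned above can be scripted as follows. The loadings are copied from Table 5 (items 1–14 in order); the helper name `items_below` is ours.

```python
# Standardized M1 loadings from Table 5, items 1-14 in order.
LOADINGS = {
    "PL": [0.723, 0.340, 0.814, 0.710, 0.726, 0.863, 0.370,
           0.769, 0.435, 0.798, 0.761, 0.657, 0.798, 0.791],
    "US": [0.579, 0.396, 0.679, 0.677, 0.575, 0.746, 0.374,
           0.653, 0.333, 0.673, 0.578, 0.664, 0.733, 0.438],
}


def items_below(loadings: list[float], cutoff: float = 0.40) -> list[int]:
    """Return 1-based item numbers whose loading falls below `cutoff`."""
    return [i for i, lam in enumerate(loadings, start=1) if lam < cutoff]


for group, values in LOADINGS.items():
    print(group, items_below(values), items_below(values, cutoff=0.70))
```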
Multigroup CFA
To test measurement invariance across the two groups, a multigroup confirmatory factor analysis (MGCFA) with bootstrapped maximum likelihood was conducted, following the procedures described by Byrne (1989). To assess equivalence, each model was compared with the preceding, less constrained model using additional criteria: a decrease in CFI of no more than 0.010 (ΔCFI ≥ −0.010), an increase in RMSEA of no more than 0.015 (ΔRMSEA ≤ 0.015), and changes of more than 1 point in the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC)46.
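The ΔCFI and ΔRMSEA rule can be applied mechanically to the fit statistics reported in Table 6. The sketch below is a hypothetical helper, not part of AMOS. It rounds differences to three decimals to match the precision at which Table 6 reports CFI and RMSEA, and it reports the χ² difference without a significance test.

```python
from dataclasses import dataclass


@dataclass
class Fit:
    name: str
    df: int
    chi2: float
    rmsea: float
    cfi: float


# Values from Table 6 (one-factor model, N = 382).
STEPS = [
    Fit("configural", 144, 219.221, 0.036, 0.971),
    Fit("metric", 157, 236.562, 0.036, 0.969),
    Fit("scalar", 158, 245.869, 0.037, 0.966),
]


def compare(prev: Fit, curr: Fit, max_cfi_drop: float = 0.010, max_rmsea_rise: float = 0.015) -> dict:
    """Compare a constrained model with the preceding, less constrained one."""
    d_cfi = round(curr.cfi - prev.cfi, 3)
    d_rmsea = round(curr.rmsea - prev.rmsea, 3)
    return {
        "step": curr.name,
        "delta_chi2": round(curr.chi2 - prev.chi2, 3),
        "delta_df": curr.df - prev.df,
        "delta_cfi": d_cfi,
        "delta_rmsea": d_rmsea,
        "invariance_supported": d_cfi >= -max_cfi_drop and d_rmsea <= max_rmsea_rise,
    }


for prev, curr in zip(STEPS, STEPS[1:]):
    print(compare(prev, curr))
```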
The first step involved testing the one-factor model (M1) without equality constraints across groups, that is, with factor loadings and intercepts estimated freely in each group.
A multigroup analysis was conducted to assess configural invariance (MCI). The configural model showed very good fit to the data (CFI = 0.971, RMSEA = 0.036, 90% CI [0.026, 0.046]; Table 6). This indicates that the same factor structure, with the PSWQ-C items loading on the same latent factor, holds in both groups. In other words, participants in both samples (Poland and the US) appear to attribute the same meaning to the construct under study (worry), with a similar structure underlying the items28, namely a one-factor structure.
Table 6.
Assessment of PSWQ-C measurement equivalence levels across the samples from Poland and the US (N = 382): one-factor model (M1).
| Model | df | χ² | χ²/df | RMSEA | LO 90 | HI 90 | CFI | AIC | BIC | Δχ² (Δdf) |
|---|---|---|---|---|---|---|---|---|---|---|
| MCI | 144 | 219.221 | 1.522 | 0.036 | 0.026 | 0.046 | 0.971 | 407.221 | 422.631 | – |
| MMI | 157 | 236.562 | 1.507 | 0.036 | 0.026 | 0.045 | 0.969 | 398.562 | 411.841 | 17.341 (13) |
| MSI | 158 | 245.869 | 1.556 | 0.037 | 0.028 | 0.046 | 0.966 | 405.869 | 418.984 | 9.307 (1) |
Annotation: MCI – model of configural invariance; MMI – model of metric invariance; MSI – model of scalar invariance.
Next, the metric invariance model (MMI) was tested, in which all factor loadings were constrained to be equal across groups. Its fit indices were acceptable, and the change in fit relative to the configural model did not exceed the cut-off points: RMSEA did not change [MMI (0.036) − MCI (0.036) = 0.000], and CFI decreased by less than 0.010 [MMI (0.969) − MCI (0.971) = −0.002]. AIC and BIC both decreased by more than one point (Table 6), favoring the more constrained metric model. Support for metric invariance indicates that the units of measurement are comparable in both groups.
Finally, the scalar invariance model (MSI) was tested, in which item intercepts were additionally constrained to be equal across groups. Again, the decrease in fit relative to the metric model did not exceed the cut-off points: ΔCFI was within the 0.010 cut-off [MSI (0.966) − MMI (0.969) = −0.003], and ΔRMSEA was below 0.015 [MSI (0.037) − MMI (0.036) = 0.001] (Table 6). Support for scalar invariance indicates that respondents with the same level of the trait under investigation (worry) are expected to give the same responses in both groups.
In conclusion, the one-factor model demonstrated an appropriate fit to the data in both groups, and achieved all levels of measurement equivalence (configural, metric and scalar).
Discussion
The aim of our study was to assess the measurement equivalence of the PSWQ-C across two cultural groups: one from Poland and the other from the US. This is the first study investigating the cross-cultural measurement invariance of the PSWQ-C.
In our study, a bifactor model comprising a general factor and two wording-related method factors (positively and negatively keyed items) yielded the best overall fit when the Polish and US samples were analyzed separately. In light of methodological critiques of the bifactor specification, particularly its propensity to inflate fit indices, we undertook supplementary analyses, which indicated that, although wording effects are present, the method factors account for only a small share of the variance. By contrast, the general factor accounted for most of the common variance, consistent with a shared construct indexing generalized worry. These results supported retaining a single-factor solution, in line with theoretical models of worry positing that a general propensity to worry accounts for most of the common variance. The additional factors in M2–M4 are best interpreted as wording-related method effects (positively vs. negatively keyed items). Previous studies have also supported a unidimensional structure for the PSWQ-C13,19, particularly for the 11-item short form obtained by removing the negatively worded items15,17,18,47.
Measurement invariance of the one-factor model was supported at all evaluated levels (configural, metric, and scalar) across the Polish and US samples. This suggests that respondents in both groups understand the construct of worry in a comparable way (configural invariance) and that the units of measurement in the two samples are comparable (metric invariance). Furthermore, individuals with the same level of worry are expected to obtain the same scores regardless of whether they complete the Polish or the original English-language version of the PSWQ-C (scalar invariance). Overall, the results indicate that the PSWQ-C is best characterized by a single general factor, with the total score indexing a general propensity to worry.
The confirmation of configural, metric, and scalar invariance of the PSWQ-C across the Polish and US samples supports meaningful comparisons in cross-cultural research. Specifically, scalar invariance indicates that between-country differences in scores can be interpreted as differences in the level of latent worry rather than as measurement artifacts. This, in turn, permits direct comparisons of observed scores and justifies comparisons of latent means between Poland and the USA. Together, these findings underscore the usefulness of the scale for cross-cultural studies and enable reliable comparisons of worry levels in both populations. Given the support for a dominant general factor and the evidence of configural, metric, and scalar invariance, the total PSWQ-C score can be used as a valid and comparable indicator of generalized worry in both countries.
Limitations
The research presented here has several limitations. The first concerns missing data: the number of missing responses in the US sample was three times higher than in the Polish sample. With incomplete data sets, it remains possible that the results would have differed had the data been complete. Additionally, mean imputation may have led to an underestimation of variability and an attenuation of associations between variables, which in turn could have affected the covariance matrix used in the CFA. It would therefore be advisable to re-run all analyses using more advanced methods for handling missing data (e.g., multiple imputation) and to compare the results. A second limitation concerns the choice of estimator in the CFA. Because of non-normality, we used bootstrapped maximum likelihood (ML) even though the scale items were ordinal (0–3). This decision is nonetheless consistent with common applied practice, as Likert-type responses are often treated as approximations of continuous variables when modeling latent structure50. Importantly, simulation research suggests that, under many conditions, conclusions regarding global fit and factor relationships can be similar across ML and categorical approaches, although factor loadings may be estimated more accurately with categorical methods when the number of response categories is small48. We therefore recalculated all models using the WLSMV estimator and obtained convergent conclusions regarding measurement equivalence across countries. Given this convergence, we retained the original ML-based results in the manuscript and report the WLSMV analyses as an additional robustness check. Multigroup CFA with the WLSMV estimator showed good fit of the configural model (CFI = 0.990, TLI = 0.988, RMSEA = 0.083, SRMR = 0.076). Imposing equality constraints on factor loadings was associated with only a small change in fit (CFI = 0.987, RMSEA = 0.092), and the scalar model for ordinal data (equal loadings and thresholds) maintained comparable fit (CFI = 0.989, RMSEA = 0.081). Overall, these results support configural, metric, and scalar invariance of the scale between Poland and the US.
A further limitation is the cross-sectional design of the study, which precludes any conclusions about the temporal stability of the worry construct. Future research should therefore investigate the longitudinal invariance of the PSWQ-C. It would also be informative to test measurement equivalence across groups that differ in sex and clinical status.
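The WLSMV robustness check judges each added set of constraints by the change in fit between nested models. The sketch below applies this check to the reported values. It assumes widely used heuristic cutoffs: a CFI drop of at most 0.010 and an RMSEA increase of at most 0.015. This section does not state which criteria the authors applied, and the helper name `compare` is hypothetical.

```python
# Assumed heuristic cutoffs for nested invariance models (not stated in the source).
DELTA_CFI_MAX_DROP = 0.010
DELTA_RMSEA_MAX_RISE = 0.015

# WLSMV fit values as reported in the limitations section.
fits = {
    "configural": {"cfi": 0.990, "rmsea": 0.083},
    "metric": {"cfi": 0.987, "rmsea": 0.092},
    "scalar": {"cfi": 0.989, "rmsea": 0.081},
}


def compare(less_constrained, more_constrained):
    """Return (delta_cfi, delta_rmsea, supported) for a pair of nested models."""
    a, b = fits[less_constrained], fits[more_constrained]
    d_cfi = round(b["cfi"] - a["cfi"], 3)
    d_rmsea = round(b["rmsea"] - a["rmsea"], 3)
    supported = d_cfi >= -DELTA_CFI_MAX_DROP and d_rmsea <= DELTA_RMSEA_MAX_RISE
    return d_cfi, d_rmsea, supported


for step in [("configural", "metric"), ("metric", "scalar")]:
    d_cfi, d_rmsea, ok = compare(*step)
    print(f"{step[0]} -> {step[1]}: dCFI = {d_cfi:+.3f}, "
          f"dRMSEA = {d_rmsea:+.3f}, invariance supported: {ok}")
```

Under these assumed cutoffs, both steps (configural to metric, metric to scalar) stay within bounds, which agrees with the stated conclusion. This compares fit-index changes only. It does not replace the scaled chi-square difference test usually reported with WLSMV (for example, the DIFFTEST option in Mplus).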
Author contributions
All authors contributed to the study conception and design. Elżbieta Talik: Conceptualization, Methodology, Formal analysis and investigation, Writing - original draft preparation, Writing - review and editing, Funding acquisition, Resources, Supervision. Manuel Sprung: Conceptualization, Methodology, Formal analysis and investigation, Writing - original draft preparation, Writing - review and editing, Resources, Supervision. Bruce Chorpita: Methodology, Writing - review and editing, Resources, Supervision. All authors read and approved the final manuscript.
Funding
This work was supported by the Faculty of Social Sciences, The John Paul II Catholic University of Lublin, Discipline of Psychology under Grant number 05-0611-2.
Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Declarations
Competing interests
The authors declare no competing interests.
Reporting
We report how we determined our sample size, all data exclusions, all manipulations, and all measures in the study. The data for the US group were re-used from the original validation study of the US English PSWQ-C. Further details about these data and the original US study can be found in Chorpita et al. (1997). The Polish group comprised a subsample (N = 199) of the larger sample of participants (N = 620) recruited for the Polish adaptation of the PSWQ-C. More detailed information on the Polish sample can be found in Talik (2024).
Ethical approval
This work was approved by the Faculty of Social Sciences, The John Paul II Catholic University of Lublin, Discipline of Psychology under Grant number 05-0611-2.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Borkovec, T. D. The nature, functions and origins of worry. In Worrying: Perspectives on Theory, Assessment and Treatment (eds Davey, G. C. L. & Tallis, F.) 5–33 (Wiley, 1994).
- 2. Borkovec, T. D., Ray, W. J. & Stöber, J. Worry: a cognitive phenomenon intimately linked to affective, physiological, and interpersonal behavioral processes. Cogn. Ther. Res. 22, 561–576 (1998).
- 3. Hirsch, C. R. & Mathews, A. A cognitive model of pathological worry. Behav. Res. Ther. 50, 636–646 (2012).
- 4. Kelly, W. & Miller, M. A discussion of worry with suggestions for counselors. Couns. Values. 44, 55–66 (1999).
- 5. Esbjørn, B. H. et al. Meta-worry, worry, and anxiety in children and adolescents. J. Clin. Child. Adolesc. Psychol. 44, 145–156 (2015).
- 6. Weems, C. F., Silverman, W. K. & La Greca, A. M. Worry and anxiety disorders in youth. J. Abnorm. Child. Psychol. 28, 63–72 (2000).
- 7. Dar, K. A. & Iqbal, N. Worry and rumination in generalized anxiety disorder and obsessive compulsive disorder. J. Psychol. 149, 866–880 (2015).
- 8. Dar, K. A., Iqbal, N. & Mushtaq, A. Intolerance of uncertainty, depression, and anxiety: indirect and moderating effects of worry. Asian J. Psychiatry. 29, 129–133 (2017).
- 9. Vasey, M. W., Crnic, K. A. & Carter, W. G. Worry in childhood. Cogn. Ther. Res. 18, 529–549 (1994).
- 10. Talik, E. Worry and stress coping strategies among youth. Rev. Psychol. 65, 113–128 (2022).
- 11. Janowski, K. A tendency to worry in the elderly. In Starzenie się z godnością (eds Steuden, S., Stanowska, M. & Janowski, K.) 231–240 (Wydawnictwo KUL, 2011).
- 12. Meyer, T. J., Miller, M. L., Metzger, R. L. & Borkovec, T. D. Development and validation of the Penn State Worry Questionnaire. Behav. Res. Ther. 28, 487–495 (1990).
- 13. Chorpita, B. F., Tracey, S. A., Brown, T. A., Collica, T. J. & Barlow, D. H. Penn State Worry Questionnaire for Children. J. Clin. Child. Psychol. 26, 369–379 (1997).
- 14. Pestle, S. L., Chorpita, B. F. & Schiffman, J. Psychometric properties of the PSWQ-C in a clinical sample. J. Clin. Child. Adolesc. Psychol. 37, 465–471 (2008).
- 15. Muris, P., Meesters, C. & Gobel, M. Reliability and validity of the Penn State Worry Questionnaire in children. J. Behav. Ther. Exp. Psychiatry. 32, 63–72 (2001).
- 16. Gosselin, P. et al. Psychometric properties of the French version of the Penn State Worry Questionnaire for Children. Can. Psychol. 43, 270–277 (2002).
- 17. Kang, S. G., Shin, J. H. & Song, S. W. Reliability and validity of the Korean version of the Penn State Worry Questionnaire for Children. J. Korean Med. Sci. 25, 1210–1216 (2010).
- 18. Păsărelu, C. R. et al. Age, gender and clinical invariance of the PSWQ-C. Child. Psychiatry Hum. Dev. 48, 359–369 (2017).
- 19. Benedetto, L., Trobia, S., Di Blasi, D. & Ingrassia, M. Penn State Worry Questionnaire for Children (PSWQ-C): Italian adaptation of a measure of worry in children and adolescents. Psicol. Clin. Sviluppo. 23, 283–294 (2019).
- 20. Ediati, A. & Utari, A. Psychometric evaluation of the Indonesian version of the Penn State Worry Questionnaire for Children. J. Educ. Health Community Psychol. 8, 43–61 (2019).
- 21. Liu, Y. & Zhong, J. Psychometric properties of the Penn State Worry Questionnaire for Children in Chinese adolescents. J. Abnorm. Child. Psychol. 48, 1499–1510 (2020).
- 22. Talik, E. Polish adaptation and validation of the PSWQ-C. Ann. Psychol. (2024).
- 23. Xie, S. S., Xiao, H. W. & Lin, R. M. Abbreviated PSWQ for Chinese adolescents. Front. Psychiatry 14 (2023).
- 24. Byrne, B. M. Testing instrument equivalence across cultural groups: basic concepts, testing strategies, and common complexities. In Evidence-based Psychological Practice with Ethnic Minorities (eds Zane, N., Bernal, G. & Leong, F. T. L.) 125–143 (American Psychological Association, 2016).
- 25. Cieciuch, J. & Davidov, E. Establishing measurement invariance across online and offline samples: a tutorial with Amos and Mplus. Stud. Psychol. Theor. Prax. 15, 83–99 (2015).
- 26. Lubiewska, K. & Głogowska, K. Measurement equivalence analysis in psychological research. Pol. Forum Psychol. 23, 330–356 (2018).
- 27. Schmitt, N. & Kuljanin, G. Measurement invariance: review and implications. Hum. Resour. Manag. Rev. 18, 210–222 (2008).
- 28. Welkenhuysen-Gybels, J. & van de Vijver, F. J. Methods for the evaluation of construct equivalence in studies involving many groups. In Proceedings of the 56th Annual Conference of the American Association for Public Opinion Research (2001).
- 29. Van de Schoot, R., Lugtig, P. & Hox, J. A checklist for testing measurement invariance. Eur. J. Dev. Psychol. 9, 486–492 (2012).
- 30. Gana, K. et al. Factorial structure of a French version of the Penn State Worry Questionnaire. Eur. J. Psychol. Assess. 18, 158–168 (2002).
- 31. Hopko, D. R. et al. Assessing worry in older adults. Psychol. Assess. 15, 173–183 (2003).
- 32. Brown, T. A. Confirmatory factor analysis of the Penn State Worry Questionnaire: multiple factors or method effects? Behav. Res. Ther. 41, 1411–1426 (2003).
- 33. Hazlett-Stevens, H., Ullman, J. B. & Craske, M. G. Factor structure of the Penn State Worry Questionnaire. Assessment 11, 361–370 (2004).
- 34. Reise, S. P., Moore, T. M. & Haviland, M. G. Bifactor models and rotations. J. Pers. Assess. 92, 544–559 (2010).
- 35. Pajkossy, P., Simor, P., Szendi, I. & Racsmány, M. Hungarian validation of the Penn State Worry Questionnaire. Eur. J. Psychol. Assess. 30, 238–247 (2014).
- 36. Yu, C. Y. Evaluating Cutoff Criteria of Model Fit Indices for Latent Variable Models with Binary and Continuous Outcomes. Doctoral dissertation, University of California, Los Angeles (2002).
- 37. Murray, A. L. & Johnson, W. Limitations of model fit in bifactor versus higher-order models. Intelligence 41, 407–422 (2013).
- 38. Bonifay, W., Lane, S. P. & Reise, S. P. Three concerns with applying a bifactor model as a structure of psychopathology. Clin. Psychol. Sci. 5, 184–186 (2017).
- 39. Decker, S. Don’t use a bifactor model unless you believe the true structure is bifactor. J. Psychoeduc. Assess. 39, 39–49 (2021).
- 40. Flores-Kanter, P. E., Domínguez-Lara, S., Trógolo, M. A. & Medrano, L. A. Best practices in the use of bifactor models: conceptual grounds, fit indices and complementary indicators. Revista Evaluar. 18, 44–48 (2018).
- 41. Markon, K. E. Bifactor and hierarchical models. Annu. Rev. Clin. Psychol. 15, 51–69 (2019).
- 42. Morgan, G. B. et al. Are fit indices biased in favor of bifactor models? J. Intell. 3, 2–20 (2015).
- 43. Reise, S. et al. Is the bifactor model better? Multivar. Behav. Res. 51, 818–838 (2016).
- 44. Rodriguez, A., Reise, S. P. & Haviland, M. G. Evaluating bifactor models. Psychol. Methods. 21, 137–150 (2016).
- 45. Bedyńska, S. & Książek, M. Statystyczny drogowskaz: praktyczny przewodnik wykorzystania modeli regresji oraz równań strukturalnych (Wydawnictwo Akademickie Sedno, 2012).
- 46. Zercher, F. et al. Comparability of universalism values over time. Front. Psychol. 6, 733 (2015).
- 47. Barajas Mosquera, H. N. & Ruiz, F. J. Validity evidence of the Penn State Worry Questionnaire–Children in Colombian children. Rev. Psicol. Clín. Niños Adolesc. 12, 50–57 (2025).
- 48. Rhemtulla, M., Brosseau-Liard, P. É. & Savalei, V. When can categorical variables be treated as continuous? Psychol. Methods. 17, 354–373 (2012).
- 49. Nunnally, J. C. & Bernstein, I. H. Psychometric Theory (McGraw-Hill, 1994).
- 50. Robitzsch, A. Why ordinal variables can almost always be treated as continuous. Front. Educ. 5, 589965 (2020).
