Abstract
The Kessler psychological distress scale is a useful tool for identifying possible psychological problems and has been widely used in research and health services. Unfortunately, its application in various populations has not always been psychometrically supported. For this reason, the present study investigated the psychometric properties of its Spanish version in adolescents, verifying its factorial structure, measurement invariance by gender, internal consistency and the discrimination and difficulty parameters of its items according to the Item Response Theory (IRT). A sample of 5132 Ecuadorian adolescents was evaluated. The sample is equally distributed between male and female participants (50%) and basic and higher education (51% the former). All participants were between 11 and 20 years old. The results show that a 9-item version with correlated intercepts presents the best fit. In addition, it is invariant by gender at a strict level and has adequate internal consistency. IRT analyses indicated that all the items, except for item eight, present adequate discrimination and difficulty. Based on these results, we conclude that the 9-item version of the Psychological Distress Scale is the most appropriate for this population.
Keywords: Adolescents, Anxiety disorders, Depression disorder, Measurement invariance, Validity
Introduction
Psychological problems are now reported more often than before, for various reasons [1]. There seems to be greater interest in and awareness of this type of problem within the general population, as well as a greater ability to identify them in primary health care services [2, 3]. This leads to an increase in the demand for psychological services, generating pressure on health systems both to identify these problems reliably and to treat them. This need for psychological attention is particularly relevant in the adolescent population, since it presents greater psychological vulnerability due to the transition to adulthood [4]. It has been observed, for example, that adolescents present higher impulsivity when compared to adults [5]. Studies report the prevalence of mental disorders in adolescence to be between 18% and 30% [6], with social dysfunction and anxiety the most prevalent, followed by somatisation and depression [7]. However, identifying these symptoms can be difficult in Latin American countries due to the limited availability of locally adapted instruments.
In order to evaluate psychological distress in large groups, it is necessary to build and adapt brief assessment instruments. Brief instruments allow psychological evaluations to be performed in primary health care services and, thus, problems to be identified at early stages of development. However, rapid assessment tests that measure general distress, such as the 12-item version of the General Health Questionnaire (GHQ-12) [8] or the Brief Symptom Check List (LSB-50) [9], and instruments evaluating more specific conditions, such as the Generalized Anxiety Disorder 7-item scale (GAD-7) [10] for anxiety, are not commonly used as diagnostic resources in primary care with adolescents. Based on this premise, Kessler and Mroczek [11] created the Kessler Psychological Distress Scale (K-10) to assess general psychological distress in hospital care. The final version of the instrument included a set of 10 items that showed optimal sensitivity and specificity for measuring psychological distress and depression [12]. It is important to highlight that the instrument is not designed for the detection of specific disorders, but as a screening tool for the identification of possible symptoms related to anxiety and depression [13]. This type of instrument is particularly useful in settings where a rapid evaluation is needed and it may be difficult to perform a comprehensive assessment. Unfortunately, most instruments of this type are not properly validated or adapted to the populations that probably need them most, such as populations with reduced health care budgets [14].
The K-10 has been widely used internationally since its inception. Among its advantages are the short time required for its application, the easy understanding of its items, and its simple scoring and interpretation process. In addition, at the instrumental level, it has demonstrated good psychometric properties. The K-10 has shown factorial validity (in its original version in English it has a single-factor structure) and good internal consistency (ranging from adequate to high). Due to these aspects, the K-10 has been translated and adapted in various countries such as the United States of America [15], China [16], Mexico [17], Canada [18], Palestine [19], Tanzania [20] and Brazil [21]. In addition, the K-10 shows convergent validity with measures of anxiety, depression [15], mental health [16], somatic symptoms [18] and quality of life [19]. However, in Latin America, its psychometric properties have not been sufficiently investigated in non-adult populations.
Psychometric Properties of the K-10
There are several studies in which the factorial structure of the K-10 has been analysed. Although the K-10 is presented as an instrument evaluating only one factor (psychological distress), there is considerable debate that questions this view. Some evidence indicates that the unifactorial model of the K-10 presents a better fit when the model also includes correlated errors between the item pairs Nervous-Very Nervous, Restless-Very Restless and Sad-Very Sad [22, 23]. Other studies consider, rather, a two-factor structure: (a) anxiety, including the items Fatigued, Nervous, Very Nervous, Restless, Very Restless and No Energy; and (b) depression, including the items Hopeless, Sad, Very Sad and Useless [15, 16, 20]. In this respect, there does not seem to be a consensus.
Another problem frequently found in factorial studies of the K-10 is the lack of consensus regarding the appropriate number of items for its composition. Several authors recommend the elimination of items to improve its fit indices. Some of these proposed versions include a 9-item version [24], a 7-item version [25], and even a 6-item version [19, 26, 27]. The reasons for the better fit of the models after the elimination of the items are not clear. These could be related to internal validity factors associated with the statistical estimators used, as well as to linguistic and cultural differences between the investigated populations.
Most of the studies investigating its psychometric properties have been performed using adult samples [15–21], but the use of the K-10 in adolescents is not uncommon. It is important to consider that the adult and adolescent populations present important differences. In fact, many instruments present different versions for each population. This is one of the reasons for analysing the performance of the K-10 in adolescents, particularly given that adolescents are considered a vulnerable group [4].
The evidence of its validity in the adolescent population is quite limited and is reduced to two studies, both using the 6-item version (K6): one performed in Indonesia [28] and the other in the United States of America [29]. We have not been able to find studies investigating the psychometric properties of the full K-10 in adolescents.
One last limitation is the lack of exploration of other important psychometric properties of the K-10, such as measurement invariance, which is usually necessary for multigroup studies. For example, several authors have found that women tend to score higher than men on distress [24, 30–32]. However, it is not clear whether these differences arise from different interpretations of the items or from real differences in the experience of distress [33]. Although it is common in this type of comparative study to assume that all between-group differences detected by an instrument derive from differences in the underlying constructs [34], this may not always be true [35–37]. We have only been able to find one study investigating measurement invariance in the child population [38]. Another limitation in the psychometric study of the K-10 is that only properties from Classical Test Theory (CTT) have been analysed. Item Response Theory (IRT) can provide a more informative analysis of the performance of this test, by reviewing indicators such as the discrimination, difficulty and pseudo-guessing of each test item.
Within the Latin American context, particularly in Ecuador, there are no previous studies investigating measurement invariance and IRT properties of this instrument in the adolescent population. This lack of a psychometric evaluation introduces the possibility that the results of studies or interventions carried out with this instrument in the adolescent population are biased by problems of measurement invariance or psychometric validity.
Method
Aims of the Study and Design
Based on the gaps identified in prior research, the following aims are proposed for the study: (a) verify the factorial structure of the Spanish version of the K-10, in search of the model that presents the best fit in a sample of adolescents from Ecuador, (b) confirm the measurement invariance by gender of the K-10, (c) determine the internal consistency of the scale; and (d) evaluate the discrimination and difficulty parameters of the K-10 items according to the IRT. With this, it is hypothesized that the Spanish version of the K-10 presents a unifactorial structure (H1); is invariant by gender (H2) and the discrimination and difficulty parameters are adequate (H3). All these aims will be investigated using a cross-sectional design [39].
Participants
The study involved 5132 adolescents from 9 cities in the province of Tungurahua in Ecuador. Regarding gender distribution, 50.3% of the participants were men (n = 2567) and 49.7% were women (n = 2537). Regarding age, 28.4% were between 11 and 13 years old (early adolescence), 46.6% between 14 and 16 years old (middle adolescence) and 24.9% between 17 and 20 years old (late adolescence). All participants were enrolled in the 2020–2021 school year at various educational centres in the province. Fifty-one per cent of the participants were in primary school and the remainder (48.8%) were in high school.
Instruments
The Kessler Psychological Distress Scale [12]. Spanish version adapted to the Ecuadorian adult population [24]. This scale measures general psychological distress. It consists of 10 affirmative items such as ‘During the past 30 days, about how often did you feel nervous?’ and ‘During the past 30 days, about how often did you feel that everything was an effort?’. It uses a 5-point Likert scale to register responses, where 1 is “None of the time” and 5 is “All of the time”. A total score is calculated from all responses.
Procedure
All school directors from district 18D02, Zone 3 in Tungurahua (Ecuador) were contacted and invited to participate. Directors from 86 centres from 9 cities accepted the invitation to participate in this study. Due to Covid-19, the evaluation was performed online using Google Forms during the first semester of 2021. Participants completed the questionnaires at home. In order to prevent participants from responding to the questionnaires more than once, only one evaluation per student was admitted per device. A preliminary pilot study with 50 adolescents showed that the Spanish translation of the K-10 originally adapted for adults did not require any special adaptation for adolescents, and thus, the same version was used [24]. Before the questionnaire was administered, legal consent was obtained from the schools and legal guardians of the adolescents. Informed consent was presented before the questionnaire, indicating that participation was voluntary and anonymous and that the data would only be used for research purposes. No economic compensation was given to participants. However, all participants received training on good mental health habits, school adaptation, and coping strategies for academic stress. The present study adheres to the ethical standards and recommendations of the Declaration of Helsinki.
Data Analysis
We began by performing descriptive statistical analyses (measures of central tendency, dispersion and distribution), since these results allow the assumption of univariate normality to be verified; this assumption is considered fulfilled when skewness and kurtosis are within the range ± 1.5 [40]. The assumption of multivariate normality was analysed using Mardia’s skewness and kurtosis tests [41]; it is considered fulfilled in the absence of statistical significance [42].
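The univariate screening rule described above can be sketched as follows. This is a minimal illustration, assuming moment-based estimators of skewness (g1) and excess kurtosis (g2); the estimators actually used by the R packages may apply small-sample corrections, and the function names and example responses are hypothetical.

```python
from statistics import fmean


def skewness_and_kurtosis(values):
    """Return (g1, g2): moment-based skewness and excess kurtosis."""
    n = len(values)
    mean = fmean(values)
    # Central moments of order 2, 3 and 4 (population form, divided by n).
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3.0
    return g1, g2


def within_range(values, limit=1.5):
    """Apply the +/- 1.5 rule to both skewness and excess kurtosis."""
    g1, g2 = skewness_and_kurtosis(values)
    return abs(g1) <= limit and abs(g2) <= limit


# Hypothetical responses to one 5-point item (not data from the study).
responses = [1, 2, 2, 3, 3, 3, 4, 4, 5, 2, 1, 3]
g1, g2 = skewness_and_kurtosis(responses)
print(f"g1 = {g1:.2f}, g2 = {g2:.2f}, within range: {within_range(responses)}")
```

The same check would be applied to each item separately, which is how the g1 and g2 columns of a descriptive table are typically read.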
After this, a Confirmatory Factor Analysis (CFA) was performed using a polychoric correlation matrix and Diagonally Weighted Least Squares (DWLS) estimation, which are appropriate in the absence of multivariate normality and with categorical response variables [43]. The factorial validity of the scale was interpreted using the absolute and incremental fit indicators Chi-Square (χ2), Normed Chi-Square (χ2/df), Standardised Root Mean Square Residual (SRMR), Comparative Fit Index (CFI), Tucker-Lewis Index (TLI), and Root Mean Square Error of Approximation (RMSEA). We also analysed the factor loadings of each item (λ) to determine the contribution of each item to the latent construct. The model is considered adequate when the indicators fall within the following cut-off points: the p-value of χ2 is not significant, the χ2/df is less than 4, the CFI and the TLI are greater than 0.95, the SRMR and the RMSEA are less than 0.08 and the λ are greater than 0.50 [34, 44–46].
Measurement invariance by gender was analysed by performing a Multiple-group Confirmatory Factor Analysis (CFA-MG) with DWLS estimation. Invariance can be tested at three hierarchical levels of restriction (metric, strong and strict). For each level, a restricted model is created and compared to the previous one. If the fit of the new model is not significantly worse than that of the previous one, a higher level of invariance is considered to have been demonstrated. We used χ2, the CFI and the RMSEA to test this. In the case of the CFI and RMSEA fit indices, we used 0.2 as the cut-off point [33, 44]. When strong invariance holds, the differences by gender in the latent means (ΔK) are analysed. For this, the intercepts in the CFA of the group of men are set to 0 and those of the group of women are left free, in order to estimate possible significant differences (p < 0.05) along with the effect size (ΔK*), which may be small (ΔK* > 0.2), moderate (ΔK* > 0.5) or large (ΔK* > 0.8). We also analysed the internal consistency of the K-10 by calculating McDonald’s ω coefficient [47].
The IRT analysis of the K-10 consisted of the study of the discrimination and difficulty parameters of its items. This was done after verifying the assumption of unidimensionality according to the CFA previously performed. The analysis of discrimination and difficulty of items was performed using a Graded Response Model (GRM), an extension of the 2-Parameter Logistic Model (2-PLM) for ordered polytomous items [48–50]. The discrimination parameter describes the slope with which the probability of responding in a higher category changes as a function of the person’s level on the latent trait. In line with common practice and in order to allow easy comparison with other studies, we standardised the resulting scores from the analyses, where 0 represents the mean and 1 a standard deviation. Discrimination is considered adequate when the values of the items are greater than 1. The difficulty parameter, on the other hand, locates the item along the latent trait scale: each threshold marks the trait level at which the probability of responding at or above the corresponding category is 0.50. Since the K-10 responses are registered on a five-point scale, there are four difficulty estimates (one for each threshold), whose values are expected to increase with the response category. Finally, information curves were created for the scale (Test Information Curve, TIC) and for the items (Item Information Curves, IIC).
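To make the roles of the discrimination and difficulty parameters concrete, the following sketch computes the category probabilities of a single item under a graded response model. It assumes the logistic form without a scaling constant; the function names and the item parameters are hypothetical and are not taken from the fitted model.

```python
import math


def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def grm_category_probabilities(theta, a, thresholds):
    """Return P(X = k) for each response category of one item.

    theta: latent trait level (standardised scale).
    a: discrimination parameter.
    thresholds: ordered difficulty parameters (four for a five-point scale).
    """
    # Cumulative probabilities P(X >= k), padded with 1 for the lowest
    # category and 0 beyond the highest one.
    cumulative = [1.0] + [logistic(a * (theta - b)) for b in thresholds] + [0.0]
    # Each category probability is the difference of adjacent cumulatives.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical item: discrimination 2.0 and four ordered thresholds.
probs = grm_category_probabilities(theta=0.0, a=2.0, thresholds=[-1.0, 0.0, 1.0, 2.0])
print([round(p, 3) for p in probs])
```

In this sketch, a respondent whose trait level equals a threshold has a probability of 0.50 of answering at or above the corresponding category, and a larger discrimination value makes that transition steeper.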
Statistical analyses were performed in R version 4.1.1 [51] using the MVN, lavaan, MBESS, semTools and ltm packages.
Results
Preliminary Analysis of the Items
Table 1 presents the descriptive statistics of the K-10 items for men and women. The response trend across the items is homogeneous, fluctuating between M(item 3) = 1.81 (SD = 0.95) and M(item 8) = 3.01 (SD = 1.22) in men and between M(item 3) = 1.95 (SD = 1.01) and M(item 8) = 3.25 (SD = 1.21) in women. In the analysis of the data distribution, both in men and women, the values of skewness (g1) and kurtosis (g2) are within the range of ± 1.5, which indicates that the assumption of normality is met at the univariate level of the items. However, the assumption of multivariate normality is not met according to the results of Mardia’s skewness and kurtosis tests.
Table 1.
Descriptive statistics and normality analysis
| Item | Men: M | Men: SD | Men: g1 | Men: g2 | Women: M | Women: SD | Women: g1 | Women: g2 |
|---|---|---|---|---|---|---|---|---|
| Item 1 | 2.49 | 1.00 | 0.41 | − 0.18 | 2.70 | 1.04 | 0.32 | − 0.31 |
| Item 2 | 2.57 | 0.94 | 0.40 | − 0.09 | 2.74 | 0.95 | 0.32 | − 0.22 |
| Item 3 | 1.81 | 0.95 | 1.05 | 0.51 | 1.95 | 1.01 | 0.89 | 0.09 |
| Item 4 | 2.12 | 0.95 | 0.66 | 0.04 | 2.36 | 1.06 | 0.48 | − 0.33 |
| Item 5 | 2.33 | 1.00 | 0.61 | 0.04 | 2.41 | 1.00 | 0.52 | − 0.12 |
| Item 6 | 2.04 | 1.03 | 0.83 | 0.08 | 2.05 | 1.02 | 0.77 | − 0.09 |
| Item 7 | 2.27 | 1.09 | 0.65 | − 0.26 | 2.65 | 1.14 | 0.27 | − 0.76 |
| Item 8 | 3.01 | 1.22 | 0.06 | − 0.95 | 3.25 | 1.21 | − 0.07 | − 1.01 |
| Item 9 | 2.07 | 1.08 | 0.85 | − 0.03 | 2.48 | 1.17 | 0.40 | − 0.76 |
| Item 10 | 2.08 | 1.19 | 0.94 | − 0.04 | 2.30 | 1.30 | 0.65 | − 0.72 |
| K-10 | 20.80 | 6.19 | 0.69 | 0.066 | 22.84 | 6.51 | 0.50 | − 0.07 |
| Mardia | | | 2392.9*** | 47.5*** | | | 1510.0*** | 31.18*** |
*** p value < .001; M arithmetic mean, SD standard deviation, g1 Skewness, g2 Kurtosis
Confirmatory Factor Analysis
Table 2 shows the CFA of the K-10 with DWLS estimation. Various models are analysed for both the 10-item version of the K-10 and a 9-item version (K-9) that excludes item 8. Three models are tested for both compositions of the scale: (a) a unifactorial model, (b) a unifactorial model with correlated intercepts, and (c) a two-factor model (i.e., anxiety and depression).
Table 2.
Confirmatory factor analysis of the K-10 and K-9 with DWLS estimations
| Model | χ2 | df | χ2/df | CFI | TLI | SRMR | RMSEA [90% CI] |
|---|---|---|---|---|---|---|---|
| Unifactorial (K-10) | 1380.62*** | 35 | 39.4 | 0.986 | 0.982 | 0.053 | 0.087 [0.083–0.091] |
| Correlated intercepts (K-10) | 624.89*** | 32 | 19.5 | 0.994 | 0.991 | 0.038 | 0.060 [0.056–0.064] |
| Two factors (K-10) | 1072.64*** | 34 | 31.5 | 0.989 | 0.985 | 0.048 | 0.077 [0.073–0.081] |
| Unifactorial (K-9) | 1302.14*** | 27 | 48.2 | 0.987 | 0.982 | 0.055 | 0.096 [0.092–0.100] |
| Correlated intercepts (K-9) | 531.39*** | 24 | 22.1 | 0.995 | 0.992 | 0.037 | 0.064 [0.060–0.069] |
| Two factors (K-9) | 987.53*** | 26 | 38.0 | 0.990 | 0.986 | 0.049 | 0.085 [0.081–0.090] |
χ2 chi-squared, df degrees of freedom, χ2/df normed Chi-squared, CFI comparative fit index, TLI Tucker-Lewis index, RMSEA root-mean-square error of approximation, SRMR standardised root-mean-squared residual
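As a worked check on two of the indices in Table 2, the sketch below recomputes the normed chi-square and the RMSEA point estimate from the reported χ2 and df values. It assumes the conventional formula RMSEA = sqrt(max(χ2 − df, 0) / (df · (N − 1))) with N = 5132; robust or scaled variants produced by the estimation software may differ slightly, and the function names are illustrative.

```python
import math

N = 5132  # total sample size reported in the abstract


def normed_chi_square(chi2, df):
    """Chi-square divided by its degrees of freedom."""
    return chi2 / df


def rmsea(chi2, df, n):
    """RMSEA point estimate under the conventional (N - 1) formula."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# (chi2, df) pairs copied from Table 2.
models = {
    "Unifactorial (K-10)": (1380.62, 35),
    "Correlated intercepts (K-10)": (624.89, 32),
    "Two factors (K-10)": (1072.64, 34),
    "Unifactorial (K-9)": (1302.14, 27),
    "Correlated intercepts (K-9)": (531.39, 24),
    "Two factors (K-9)": (987.53, 26),
}

for name, (chi2, df) in models.items():
    ratio = normed_chi_square(chi2, df)
    print(f"{name}: chi2/df = {ratio:.1f}, RMSEA = {rmsea(chi2, df, N):.3f}")
```

Under these assumptions the recomputed values can be compared directly with the χ2/df and RMSEA columns of the table, which is a quick way to spot transcription errors in reported fit statistics.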
The best fits are found in the 10- and 9-item models with correlated intercepts, since these show acceptable fit indices (CFI, TLI, SRMR, and RMSEA). However, in the 10-item version, the saturation of item 8 (λ = 0.10) is poor. It should also be noted that the χ2 and χ2/df values show a poor performance. This is likely due to the large sample size, since these indicators are sensitive to it.
Table 3.
Results from factorial invariance at different levels of restriction
| Restrictions | χ2 | CFI | RMSEA | Δχ2 | ΔCFI | ΔRMSEA |
|---|---|---|---|---|---|---|
| Baseline—men | (24) 331.8 | 0.994 | 0.991 | – | – | – |
| Baseline—women | (24) 224.5 | 0.995 | 0.992 | – | – | – |
| Unrestricted | (48) 210.9 | 0.994 | 0.036 | – | – | – |
| Metric | (56) 219.7 | 0.994 | 0.034 | (8) 8.8 | 0.000 | 0.003 |
| Strong | (64) 335.8 | 0.991 | 0.041 | (8) 116.0*** | 0.004 | 0.007 |
| Strict | (65) 790.6 | 0.975 | 0.016 | (1) 454.8*** | 0.016 | 0.025 |
*** p < .001; χ2 chi-squared, CFI comparative fit index, RMSEA Root-mean-square error of approximation, Δ difference
Regarding the saturations of the other items, these show loadings above 0.50, fluctuating between λ(item 1) = 0.64 and λ(item 7) = 0.81 (see Table 4). This indicates that the unifactorial model adequately explains the variance across the items of the 9-item version.
Table 4.
Item saturation and internal consistency of the K-9
| Item | Sample | Women | Men | Δλ |
|---|---|---|---|---|
| Item 1 | 0.66 | 0.68 | 0.62 | 0.06 |
| Item 2 | 0.63 | 0.62 | 0.63 | 0.01 |
| Item 3 | 0.74 | 0.74 | 0.73 | 0.01 |
| Item 4 | 0.79 | 0.80 | 0.78 | 0.02 |
| Item 5 | 0.68 | 0.72 | 0.65 | 0.07 |
| Item 6 | 0.64 | 0.68 | 0.62 | 0.06 |
| Item 7 | 0.76 | 0.78 | 0.74 | 0.04 |
| Item 8 | – | – | – | – |
| Item 9 | 0.75 | 0.75 | 0.75 | 0.00 |
| Item 10 | 0.77 | 0.78 | 0.76 | 0.02 |
| ω [CI 95%] | 0.886 [0.881–0.892] | 0.875 [0.866–0.884] | 0.893 [0.887–0.901] | Δω = − 0.014 [0.00–− 0.02] |
Δλ saturation difference, ω McDonald’s coefficient, CI 95% confidence interval at 95%
Measurement Invariance and Internal Consistency
Table 3 shows the results from the factorial invariance analysis of the K-9 segmented by gender. The baseline model for both men and women has an adequate fit in each of the segments. Furthermore, significant differences between the fit of each model were not found on χ2, with Δχ2 = 124.94 (p > 0.05) indicating measurement invariance at this level.
Regarding the CFA-MG with DWLS estimation, it is observed that as the restriction levels are added (metric, strong and strict), no significant differences are found between each consecutive restricted model according to the ΔCFI and ΔRMSEA indices. This is an indication of measurement invariance at a strict level.
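The stepwise comparison of nested models can be sketched as follows, using the multigroup values reported in Table 3. This is only an illustration of the arithmetic: the helper names are hypothetical, the cut-off is left as an argument rather than fixed, and differences recomputed from rounded table entries may deviate from the reported Δ values in the third decimal.

```python
# (label, chi2, df, CFI, RMSEA) for the multigroup models in Table 3.
MODELS = [
    ("Unrestricted", 210.9, 48, 0.994, 0.036),
    ("Metric", 219.7, 56, 0.994, 0.034),
    ("Strong", 335.8, 64, 0.991, 0.041),
    ("Strict", 790.6, 65, 0.975, 0.016),
]


def nested_differences(models):
    """Compare each restricted model with the previous, less restricted one."""
    rows = []
    for previous, current in zip(models, models[1:]):
        rows.append({
            "model": current[0],
            "d_chi2": round(current[1] - previous[1], 1),
            "d_df": current[2] - previous[2],
            "d_cfi": round(abs(current[3] - previous[3]), 3),
            "d_rmsea": round(abs(current[4] - previous[4]), 3),
        })
    return rows


def within_cutoff(row, cutoff):
    """True when both the CFI and RMSEA changes stay at or below the cut-off."""
    return row["d_cfi"] <= cutoff and row["d_rmsea"] <= cutoff


rows = nested_differences(MODELS)
for row in rows:
    print(row)
```

A call such as `within_cutoff(rows[0], cutoff)` would then apply whichever cut-off point the analyst adopts to a given step; the χ2 difference would additionally be tested against a chi-square distribution with `d_df` degrees of freedom.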
In addition, Table 4 shows homogeneous λ values between men and women in the different items of the K-9 and, all of them above 0.50. The difference between the loadings of the fit models for the groups is minimal, so these are considered to be equivalent at a strict level. According to this level of invariance, the difference previously observed on the latent means between men and women indicates greater psychological distress in women than in men with ΔK = − 0.107.3 (p < 0.001) and a small effect size ΔK* = 0.27.
In relation to the internal consistency of the K-9, Table 4 also shows the ω coefficient point estimate and 95% confidence interval to be acceptable.
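For readers less familiar with the ω coefficient, the sketch below shows how it can be approximated from standardised loadings. It assumes a one-factor model with uncorrelated errors, so it is a simplification of the model fitted here, and the function name is hypothetical. The loadings are the total-sample values in Table 4, with the item 9 entry read as 0.75.

```python
def mcdonald_omega(loadings):
    """Omega for a one-factor model with standardised loadings.

    Assumes uncorrelated errors, so each residual variance is 1 - loading^2.
    """
    common = sum(loadings) ** 2
    residual = sum(1.0 - lam ** 2 for lam in loadings)
    return common / (common + residual)


# Total-sample standardised loadings from Table 4 (item 8 excluded;
# the item 9 entry is read as 0.75, an assumption about the table).
K9_LOADINGS = [0.66, 0.63, 0.74, 0.79, 0.68, 0.64, 0.76, 0.75, 0.77]

omega = mcdonald_omega(K9_LOADINGS)
print(f"omega = {omega:.3f}")
```

Because this simplified formula ignores any correlated error terms in the fitted model, it is expected to differ somewhat from the coefficient reported in Table 4, which was obtained from the full model.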
Graded Response Model
Since the K-9 CFA evidences a unifactorial structure, the unidimensionality and local independence assumptions are met, which allows the analysis of item discrimination and item difficulty based on the IRT to be performed.
As seen in Table 5, all items except item 8 present an adequate discriminant capacity (values above 1). As for the difficulty parameter, the threshold estimates (b1–b4) increase monotonically for all items, as expected; however, the thresholds of item 8 take extreme values at both ends of the scale (b1 = − 12.667 and b4 = 8.422), far outside the range observed for the other items.
Table 5.
Item response theory item parameters for the K-10
| Item | a | b1 | b2 | b3 | b4 |
|---|---|---|---|---|---|
| Item 1 | 1.544 | − 1.606 | − 0.085 | 1.393 | 2.573 |
| Item 2 | 1.506 | − 2.007 | − 0.135 | 1.396 | 2.834 |
| Item 3 | 1.933 | − 0.161 | 0.892 | 2.006 | 3.073 |
| Item 4 | 2.256 | − 0.842 | 0.436 | 1.526 | 2.714 |
| Item 5 | 1.752 | − 1.218 | 0.342 | 1.585 | 2.720 |
| Item 6 | 1.544 | − 0.498 | 0.774 | 1.953 | 3.182 |
| Item 7 | 2.332 | − 0.922 | 0.148 | 1.059 | 2.081 |
| Item 8 | 0.188 | − 12.667 | − 3.727 | 2.338 | 8.422 |
| Item 9 | 2.248 | − 0.634 | 0.387 | 1.226 | 2.258 |
| Item 10 | 1.990 | − 0.386 | 0.499 | 1.234 | 2.044 |
a discrimination parameter, b difficulty parameter
Figure 1 shows the Test Information Curve (TIC) and the Item Information Curves (IIC) of the 9 items of the K-9. The Item Information Curves show that items 7, 4 and 9 are the most relevant and precise for evaluating the latent variable of psychological distress, since they show the greatest discrimination capacity (see Table 5). In the same figure, the TIC reveals that the K-9 is most precise in the range of the scale between − 0.5 and 3.
Fig. 1.
Item information curves and results from the K-9 test. IIC item information curves, TIC test information curve
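The link between item parameters and item information can be illustrated with the sketch below, which evaluates the standard graded response model information function for two items using the parameters reported in Table 5. It assumes the logistic form without a scaling constant, and the function names are hypothetical; it does not reproduce the plotted curves.

```python
import math


def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def grm_item_information(theta, a, thresholds):
    """Item information at theta: sum over categories of P_k'(theta)^2 / P_k(theta)."""
    # Cumulative probabilities P(X >= k), padded with 1 and 0 at the ends.
    cum = [1.0] + [logistic(a * (theta - b)) for b in thresholds] + [0.0]
    info = 0.0
    for k in range(len(thresholds) + 1):
        p_k = cum[k] - cum[k + 1]
        # Derivative of the category probability with respect to theta.
        d_k = a * (cum[k] * (1.0 - cum[k]) - cum[k + 1] * (1.0 - cum[k + 1]))
        if p_k > 0.0:
            info += d_k ** 2 / p_k
    return info


# (a, [b1, b2, b3, b4]) copied from Table 5.
ITEM_7 = (2.332, [-0.922, 0.148, 1.059, 2.081])
ITEM_8 = (0.188, [-12.667, -3.727, 2.338, 8.422])

for theta in (-1.0, 0.0, 1.0, 2.0):
    info_7 = grm_item_information(theta, *ITEM_7)
    info_8 = grm_item_information(theta, *ITEM_8)
    print(f"theta = {theta:+.1f}: item 7 = {info_7:.3f}, item 8 = {info_8:.3f}")
```

Summing this function over the retained items at each trait level gives the test information, which is why an item with very low discrimination contributes almost nothing to the precision of the scale.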
Discussion
The present study sought to evaluate the psychometric properties of the K-10 by verifying its factorial structure, confirming its measurement invariance by gender, determining its internal consistency and contrasting the discrimination and difficulty parameters of its items according to the IRT in a large sample of adolescents from Ecuador. It is important to highlight that several of the previous studies investigating its psychometric properties have used relatively small sample sizes [19, 24].
The CFA with DWLS estimation (see Table 2) revealed that the unifactorial and two-factor models of the 10-item scale present a good fit to the data, although the model with correlated intercepts is the one presenting the best fit. This is in line with previous reports [22, 23]. However, item eight showed poor factor saturation. This problem was already identified in an Ecuadorian adult sample [24]; thus, we also consider its removal appropriate. Considering this, we tested similar models with a 9-item version of the scale (K-9) and found the unifactorial model with correlated intercepts to present the best fit. We have not found previous studies reporting similar results, particularly with the adolescent population. Our results support the unifactorial model of the scale previously reported in adult populations [15–21] and adolescents [28, 29]. These findings allow us to consider that the interpretation of the K-9 in adolescents is similar to that of Ecuadorian adults, that is, it also measures psychological distress.
Regarding measurement invariance by gender of the K-9, we found that the scale is invariant at a strict level according to the CFI and RMSEA difference indices. However, the χ2 difference test reported significant differences in two of the restricted models (the strong and strict models in Table 3), indicating a lack of strong and strict invariance. This contradiction between indices is likely due to the difference in their sensitivity for detecting inequality across models. Some authors [36] have argued that χ2 is overly sensitive in large samples, as in our case. Previous studies with the K-10 [38] have found similar results. In this respect, our findings represent a contribution to the understanding of the characteristics of this instrument.
Given the presence of strong equivalence, a latent mean analysis was performed in which differences between groups were found, with greater distress experienced by women when compared to men. We could not find prior studies investigating the K-10 in adolescent samples that reported similar results. Since the K-9 was shown to be invariant by gender, the higher scores of female participants can be attributed to differences between the groups in the underlying construct rather than to differences in how the items are interpreted. It is not surprising to find that women report higher distress than men; in adolescents, however, it could be a proxy for future psychological problems, and special attention should be given to it.
In the analyses of the internal consistency of the 9-item version of the K-10, we found that the McDonald’s ω coefficients, at both its point estimate and 95% confidence intervals, were above 0.85 in all cases. This was true for the total sample and the groups of men and women. These findings are consistent with preliminary studies in adult populations for both the 9-item version [24] and versions using correlated intercepts models [22, 23]. Again, this is the first study reporting these results on adolescent populations.
Finally, regarding the analyses of item discrimination and difficulty according to the IRT, the Graded Response Model analysis for ordered polytomous items [48, 49] indicated that all the items have adequate discrimination and difficulty, except for item 8. These results further support the elimination of this item, both from the classical test theory and item response theory, showing poor factor loading, poor discrimination and great differences in difficulty between points of the Likert scale and very high values at both ends of the scale.
At a psychometric level, the scale has shown good properties on adolescents both on factor analyses and its internal consistency based on the CTT, as well as good discrimination and difficulty of its items based on the IRT. These findings support the use of the 9-item version (K-9) of the K-10 on adolescent populations. The unifactorial structure also supports the notion that it evaluates a single construct (i.e., Psychological Distress) in adolescents and adults despite the anxiety and depression theme that could be found at a face validity level.
Regarding the implications of the study, our results indicate that the 10-item K-10 does not perform adequately in adolescents; however, removing item 8 improved its performance. Having this instrument will allow regional researchers to carry out more precise comparative and diagnostic studies as well as to understand its limitations. Furthermore, our study used robust estimations to assess the reliability and validity of the K-9. This represents a contribution, since the type of data analysed demands robust estimations and these analyses are not commonly seen in studies investigating the K-10. Another significant contribution of this study is the psychometric analysis of the K-10 according to item response theory, which allows the difficulty and discrimination of each item to be estimated.
Limitations
One of the greatest limitations of the study is the unusual context in which the data were collected, that is, during the restrictions imposed by the Ecuadorian government to reduce the spread of COVID-19 and the uncertainty faced by the population. Although many of these measures had a positive impact on reducing the spread of the virus, they also led to an increase in anxiety and depression in the Ecuadorian population [7, 52]. This particular context could have affected the responses given by participants, elevating the score on some items more than others. For these reasons, other studies should be carried out in contexts without such a significant stressor, that is, in more stable conditions in which emotions are less exacerbated, to evaluate whether similar results are found.
Another important limitation is that we have not been able to establish a cut-off point to differentiate between the clinical and healthy populations. Although t-scores can indeed be established from the data provided, it is always useful to contrast these elevations with the identification of clinical patients, since a standard deviation is not always synonymous with adaptive or functional maladjustment. For this reason, diagnostic validity studies are recommended to determine the capacity of the K-9 to screen the population based on conditions that require specialised psychological or psychiatric care. Likewise, convergent validity tests are required in relation to other variables such as depression, anxiety or stress to better understand the clinical utility of the test.
The age range of participants could potentially be a limitation as well. Our sample consisted of adolescents from 11 to 20 years old, which provides a richer sample in terms of the age spectrum. However, this age range also incorporates different developmental stages, in which different response patterns could be observed due to a different understanding of the items. From this point of view, measurement invariance across age groups should also be analysed in such a sample.
Summary
The study provides evidence of good factorial validity of the 9-item Kessler Scale (K-9), as well as of its gender invariance. These findings are in line with other studies around the world that provide evidence of its validity for use in research and intervention. The results also make a significant contribution to the psychometric study of this scale, since the study was carried out in a large sample of adolescents from Ecuador, where few validation studies exist. In addition, the study adds a new element to the psychometric exploration of the measure by analysing item difficulty and discrimination through Item Response Theory (IRT), which, to our knowledge, has not been done before. In conclusion, the 9-item version of the Kessler Scale is suitable for the evaluation of psychological distress in Ecuadorian adolescents.
Declarations
Conflict of interest
No potential conflict of interest was reported by the authors.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Richter D, Wall A, Bruen A, Whittington R. Is the global prevalence rate of adult mental illness increasing? Systematic review and meta-analysis. Act Psych Scandinavica. 2019;140(5):393–407. doi: 10.1111/acps.13083.
- 2. Kohrt BA, Luitel NP, Acharya P, Jordans MJ. Detection of depression in low resource settings: validation of the patient health questionnaire (PHQ-9) and cultural concepts of distress in Nepal. BMC Psychiatry. 2016;16(1):1–14. doi: 10.1186/s12888-016-0768-y.
- 3. Staab JP, Datto CJ, Weinrieb RM, Gariti P, Rynn M, Evans DL. Detection and diagnosis of psychiatric disorders in primary medical care settings. Med Clin North Am. 2001;85(3):579–596. doi: 10.1016/S0025-7125(05)70330-8.
- 4. Holder MK, Blaustein JD. Puberty and adolescence as a time of vulnerability to stressors that alter neurobehavioral processes. Front Neuroendoc. 2014;35(1):89–110. doi: 10.1016/j.yfrne.2013.10.004.
- 5. Aponte-Zurita G, Moreta-Herrera R. Evidencias de validez y fiabilidad de una Escala de Impulsividad en adolescentes del Ecuador. Psychol Soc Educ. 2022;14(3):48–56. doi: 10.21071/psye.v14i3.14976.
- 6. Deighton J, Lereya ST, Casey P, Patalay P, Humphrey N, Wolpert M. Prevalence of mental health problems in schools: poverty and other risk factors among 28 000 adolescents in England. Br J Psychiatry. 2019;215(3):565–656. doi: 10.1192/bjp.2019.19.
- 7. Zumba-Tello D, Moreta-Herrera R. Afectividad, Regulación Emocional, Estrés y Salud Mental en adolescentes del Ecuador en tiempos de pandemia. Revista de Psicología de la Salud UHM. 2022;10(1):117–129.
- 8. Goldberg D. Manual del General Health Questionnaire. Windsor: NFER Publishing; 1978. pp. 8–12.
- 9. Abuín MR, de Rivera L. La medición de síntomas psicológicos y psicosomáticos: el Listado de Síntomas Breve (LSB-50). Clínica y Salud. 2014;25(2):131–141. doi: 10.1016/j.clysa.2014.06.001.
- 10. Spitzer RL, Kroenke K, Williams JB, Löwe B. A brief measure for assessing generalized anxiety disorder: the GAD-7. Arch Intern Med. 2006;166(10):1092–1097. doi: 10.1001/archinte.166.10.1092.
- 11. Kessler RC, Barker PR, Colpe LJ, Epstein JF, Gfroerer JC, Hiripi E, et al. Screening for serious mental illness in the general population. Arch Gen Psychiatry. 2003;60(2):184–189. doi: 10.1001/archpsyc.60.2.184.
- 12. Kessler R, Mroczek D. Final versions of our non-specific psychological distress scale. Ann Arbor: Survey Research Center of the Institute for Social Research; 1994.
- 13. Brenlla ME, Aranguren M. Adaptación argentina de la Escala de Malestar Psicológico de Kessler (K-10). Revista de Psicología. 2010. doi: 10.18800/psico.201002.005.
- 14. Moreta-Herrera R, Perdomo-Pérez M, Reyes-Valenzuela C, Torres-Salazar C, Ramírez-Iglesias G. Invarianza factorial según nacionalidad y fiabilidad de la Escala de Afecto Positivo y Negativo (PANAS) en universitarios de Colombia y Ecuador. Anuario de Psicología. 2021;51(2):76–85.
- 15. Bessaha ML. Factor structure of the Kessler psychological distress scale (K6) among emerging adults. Res on Soc Work Prac. 2015;27(5):616–624. doi: 10.1177/1049731515594425.
- 16. Bu XQ, You LM, Li Y, Liu K, Zheng J, Yan TB, et al. Psychometric properties of the Kessler 10 scale in Chinese parents of children with cancer. Cancer Nurs. 2017;40(4):297–304. doi: 10.1097/NCC.0000000000000392.
- 17. Vargas Terrez BE, Villamil Salcedo V, Rodríguez Estrada C, Pérez Romero J, Cortés SJ. Validación de la escala Kessler 10 (K-10) en la detección de depresión y ansiedad en el primer nivel de atención. Propiedades psicométricas. Salud mental. 2011;34(4):323–331.
- 18. Sampasa-Kanyinga H, Zamorski MA, Colman I. The psychometric properties of the 10-item Kessler psychological distress scale (K10) in Canadian military personnel. PLoS ONE. 2018;13(4):e0196562. doi: 10.1371/journal.pone.0196562.
- 19. Easton SD, Safadi NS, Wang Y, Hasson RG. The Kessler psychological distress scale: translation and validation of an Arabic version. Health Qual Life Outcomes. 2017;15(1):215. doi: 10.1186/s12955-017-0783-9.
- 20. Ricardo J, Vissoci N, Vaca SD, El-gabri D, Oliveira LP, De Mvungi M, et al. Cross-cultural adaptation and psychometric properties of the Kessler scale of psychological distress to a traumatic brain injury population in Swahili and the Tanzanian Setting. Health Qual Life Outcomes. 2018;16(147):4–11. doi: 10.1186/s12955-018-0973-0.
- 21. da Silva BFP, Santos-Vitti L, Faro A. Kessler psychological distress scale: internal structure and relation to other variables. Psico-USF. 2021;26(1):91–101. doi: 10.1590/1413-82712021260108.
- 22. Milkias B, Ametaj A, Alemayehu M, Girma E, Yared M, Kim HH, et al. Psychometric properties and factor structure of the Kessler-10 among Ethiopian adults. J Affec Dis. 2022;303:180–196. doi: 10.1016/j.jad.2022.02.013.
- 23. Sunderland M, Mahoney A, Andrews G. Investigating the factor structure of the Kessler psychological distress scale in community and clinical samples of the Australian population. J Psychopathol Behav Assess. 2012;34:253–259. doi: 10.1007/s10862-012-9276-7.
- 24. Larzabal-Fernandez A, Ramos-Noboa MI, Jaramillo-Zambrano A, Hong-Hong AE. Propiedades psicométricas de la Escala de Malestar Subjetivo de Kessler (K10) en adultos ecuatorianos. CienciAmérica. 2020;9(3):27. doi: 10.33210/ca.v9i3.265.
- 25. Uddin MN, Islam FMA, Al MA. Psychometric evaluation of an interview-administered version of the Kessler 10-item questionnaire (K10) for measuring psychological distress in rural Bangladesh. BMJ Open. 2018;8(6):1–11. doi: 10.1136/bmjopen-2018-022967.
- 26. Cotton SM, Menssink J, Filia K, Rickwood D. The psychometric characteristics of the Kessler psychological distress scale (K6) in help-seeking youth: What do you miss when using it as an outcome measure? Psych Res. 2021;305:114182. doi: 10.1016/j.psychres.2021.114182.
- 27. Peiper N, Lee A, Deashner N, Wing J. The performance of the K6 scale in a large school sample: a follow-up study evaluating measurement invariance on the Idaho youth prevention survey. Psychol Assess. 2016;28(6):775–779. doi: 10.1037/pas0000188.
- 28. Tran TD, Kaligis F, Wiguna T, Willenberg L, Nguyen HTM, Luchters S, et al. Screening for depressive and anxiety disorders among adolescents in Indonesia: formal validation of the centre for epidemiologic studies depression scale—revised and the Kessler psychological distress scale. J Affect Dis. 2019;246:189–194. doi: 10.1016/j.jad.2018.12.042.
- 29. Mewton L, Kessler RC, Slade T, Hobbs MJ, Brownhill L, Birrell L, et al. The psychometric properties of the Kessler psychological distress scale (K6) in a general population sample of adolescents. Psychol Assess. 2016;28(10):1232–1242. doi: 10.1037/pas0000239.
- 30. Álvarez D, Soler MJ, Cobo R. Bienestar psicológico en adolescentes: relaciones con autoestima, autoeficacia, malestar psicológico y síntomas depresivos. R Orient Educ. 2019;33(63):23–43.
- 31. Baillie AJ. Predictive gender and education bias in Kessler’s psychological distress Scale (K10). Soc Psych and Psych Epid. 2005;40(9):743–748. doi: 10.1007/s00127-005-0935-9.
- 32. Pilco Ushiña KX, Larzabal-Fernandez A. Relación entre autoeficacia, estrés percibido y malestar psicológico en una muestra de adolescentes de Tungurahua. R Psicol Unemi. 2022;6(10):86–95.
- 33. Asparouhov T, Muthén B. Multiple-group factor analysis alignment. Struct Equat Mod. 2014;21(4):495–508. doi: 10.1080/10705511.2014.919210.
- 34. Byrne BM. Structural equation modeling with EQS: basic concepts, applications, and programming. Lawrence Erlbaum Associates; 2006.
- 35. Caycho-Rodríguez T, Vilca LW, Cervigni M, Gallegos M, Martino P, Portillo N, et al. Fear of COVID-19 scale: validity, reliability and factorial invariance in Argentina’s general population. Death Stud. 2020;46(3):543–552. doi: 10.1080/07481187.2020.1836071.
- 36. Meade AW, Johnson EC, Braddy PW. Power and sensitivity of alternative fit indices in tests of measurement invariance. J Applied Psych. 2008;93:568–592. doi: 10.1037/0021-9010.93.3.568.
- 37. Moreta-Herrera R, Dominguez-Lara S, Vaca-Quintana D, Zambrano-Estrella J, Gavilanes-Gómez D, Ruperti-Lucero E, et al. Psychometric properties of the general health questionnaire (GHQ-28) in Ecuadorian college students. Psihologijske teme. 2021;30(3):573–590. doi: 10.31820/pt.30.3.9.
- 38. Ren Q, Li Y, Chen DG. Measurement invariance of the Kessler psychological distress scale (K10) among children of Chinese rural-to-urban migrant workers. Brain and Behav. 2021;11(12):e2417. doi: 10.1002/brb3.2417.
- 39. Ato M, López J, Benavente A. Un sistema de clasificación de los diseños de investigación en psicología. Anales de Psicología. 2013;29(3):1038–1059. doi: 10.6018/analesps.29.3.178511.
- 40. Ferrando PJ, Anguiano-Carrasco C. El análisis factorial como técnica de investigación en psicología. Papeles del Psicólogo. 2010;31(1):18–33.
- 41. Mardia KV. Measures of multivariate skewness and kurtosis with applications. Biometrika. 1970;57:519. doi: 10.2307/2334770.
- 42. Cain MK, Zhang Z, Yuan KH. Univariate and multivariate skewness and kurtosis for measuring nonnormality: prevalence, influence and estimation. Behav Res Methods. 2017;49(5):1716–1735. doi: 10.3758/s13428-016-0814-1.
- 43. Li CH. Confirmatory factor analysis with ordinal data: Comparing robust maximum likelihood and diagonally weighted least squares. Behav Res Methods. 2016;48(3):936–949. doi: 10.3758/s1342.
- 44. Brown TA. Confirmatory factor analysis for applied research. 2nd ed. New York: Guilford Publications; 2015.
- 45. Dominguez-Lara S. Propuesta de puntos de corte para cargas factoriales: una perspectiva de fiabilidad de constructo. Enferm Clin. 2018;28(6):401–402. doi: 10.1016/j.enfcli.2018.06.002.
- 46. Wolf EJ, Harrington KM, Clark SL, Miller MW. Sample size requirements for structural equation models: an evaluation of power, bias, and solution propriety. Educ Psychol Measur. 2013;73(6):913–934. doi: 10.1177/0013164413495237.
- 47. McDonald RP. Test theory: a unified treatment. Lawrence Erlbaum Associates; 1999.
- 48. Hambleton RK, van der Linden WJ, Wells CS. IRT models for the analysis of polytomously scored data: Brief and selected history of model building advances. In: Nering ML, Ostini R, editors. Handbook of polytomous item response theory models. Beijing: Routledge; 2010. pp. 21–42.
- 49. Samejima F. Graded response model. In: van der Linden WJ, Hambleton RK, editors. Springer; 1997. pp. 85–100.
- 50. Moreta-Herrera R, Caycho-Rodríguez T, Salinas A, Jiménez-Borja M, Gavilanes-Gómez D, Jiménez-Mosquera C. Factorial validity, reliability, measurement invariance and the graded response model for the COVID-19 anxiety scale in a sample of ecuadorians. OMEGA—J Death Dying. 2022. doi: 10.1177/00302228221116515.
- 51. R Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2019.
- 52. Rodas JA, Jara-Rizzo MF, Greene CM, Moreta-Herrera R, Oleas D. Cognitive emotion regulation strategies and psychological distress during lockdown due to COVID-19. Int J Psychol. 2022;57(3):315–324. doi: 10.1002/ijop.12818.

