Towards standardization of measuring anxiety and depression: Differential item functioning for language and Dutch reference values of PROMIS item banks

Ellen B M Elsman; Gerard Flens; Edwin de Beurs; Leo D Roorda; Caroline B Terwee

doi:10.1371/journal.pone.0273287

2022 Aug 23;17(8):e0273287.



Editor: Thiago Machado Ardenghi

PMCID: PMC9398458 PMID: 35998333

Abstract

Introduction

Anxiety and depression are frequently measured by healthcare providers to assess the impact of a disease, but with numerous different instruments. PROMIS item banks provide an opportunity for standardized measurement. Cross-cultural validity of measures and the availability of reference values are prerequisites for standardized measurement.

Methods

The PROMIS Anxiety and Depression item banks were completed by a representative sample of 1002 Dutch adults. To evaluate cross-cultural validity, data from US participants in PROMIS wave 1 were used, and differential item functioning (DIF) was investigated using an iterative hybrid of logistic regression and item response theory. A change in McFadden’s pseudo R² of 2% was used as the critical threshold. The impact of DIF on the full item banks and short forms was investigated. To obtain Dutch reference values, T-scores for anxiety and depression were calculated for the complete Dutch sample and for age-group and gender subpopulations. Thresholds corresponding to normal limits and to mild, moderate and severe symptoms were computed.

Results

In each item bank, two items showed DIF, but with minimal impact on population-level T-scores for the full item banks and short forms. The Dutch general population had a mean T-score of 49.9 for anxiety and 49.6 for depression, similar to the mean of 50.0 in the US general population. T-scores for age-group and gender subpopulations were also similar to those of the corresponding US subpopulations. Thresholds for mild, moderate and severe anxiety and depression were set at 55, 60 and 70, identical to the US thresholds.

Conclusions

The limited number of items with DIF and their minimal impact enable the use of the standard (US) item parameters and comparisons of scores between Dutch and US populations. The Dutch reference values provide an important tool for healthcare professionals and researchers to evaluate and interpret symptoms of anxiety and depression, stimulating the uptake of PROMIS measures and contributing to standardized outcome measurement.

Introduction

Symptoms of anxiety and depression are prevalent among patients with various conditions, such as diabetes [1], cancer [2], cardiovascular diseases [3] and numerous mental health disorders [4, 5]. These symptoms are commonly measured by healthcare providers to assess the impact of disease and its treatment. The importance of measuring anxiety and depression is reflected in the widespread inclusion of both outcomes in the Standard Sets for major medical conditions developed by the International Consortium for Health Outcomes Measurement (ICHOM) [6]. Currently, anxiety and depression are included in 16 of the 28 Standard Sets, making them among the most commonly included outcomes [7].

To assess anxiety and depression in patients or the general population, researchers and clinicians use patient-reported outcome measures (PROMs) [8–10]. Numerous PROMs assessing anxiety or depression exist, although not all of them meet the standards for reliability, validity and feasibility [11–13]. Uniformity in PROMs for anxiety and depression is lacking. This makes it difficult to compare scores and hinders benchmarking and quality-of-care improvement. Moreover, building different PROMs and their scoring algorithms into electronic health records is labor-intensive and costly. Healthcare providers also find it difficult to use different PROMs for patients with different conditions and to interpret the results correctly. Finally, patients with multiple conditions are burdened with completing different PROMs that measure the same construct, and the scores of these PROMs are not shared between healthcare providers [14–16].

The Patient-Reported Outcomes Measurement Information System (PROMIS)® initiative might provide opportunities to work towards standardized outcome measurement of anxiety and depression and to overcome the challenges mentioned above (see [17, 18] for an overview of PROMIS and the early aims and findings of the initiative). PROMIS aims to develop and maintain a state-of-the-art assessment system to measure patient-reported health with highly accurate, precise and short measures [17, 18]. The initiative has produced a wide range of universally applicable (generic) item banks for use across patient populations. These target various constructs and include item banks for measuring anxiety and depression [19–21]. PROMIS item banks can be used to create fixed questionnaires with a small number of items (also known as short forms), or they can be administered as a more dynamic computerized adaptive test (CAT) [22, 23]. In a CAT, items are selected from an item bank based on a person’s previous responses, and administration stops when a pre-specified criterion is met. As a result, the administration burden is reduced with a negligible loss of precision.
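The adaptive logic just described can be sketched in a few lines. The example below is a minimal illustration under stated assumptions. It is not the PROMIS or Assessment Center CAT engine, and it makes the following choices:

- **Item bank:** a hypothetical five-item bank (`BANK`) scored with the graded response model (GRM).
- **Item selection:** the next item is the one with maximum Fisher information at the current expected a posteriori (EAP) estimate. This is one common selection rule; operational engines may use another.
- **Stopping rule:** administration stops once the standard error is at most 0.3 on the θ scale (3 T-score points) after at least four items, or when a maximum number of items is reached.

All item parameters and stopping values are illustrative assumptions.

```python
import math

# Hypothetical item bank: item id -> (slope a, thresholds b1..b4); 5 response categories.
BANK = {
    "item_a": (3.2, [-0.6, 0.3, 1.2, 2.1]),
    "item_b": (2.5, [-0.2, 0.7, 1.6, 2.5]),
    "item_c": (4.0, [0.1, 0.9, 1.7, 2.4]),
    "item_d": (1.8, [-1.0, 0.0, 1.0, 2.0]),
    "item_e": (2.9, [0.4, 1.1, 1.9, 2.8]),
}
QUAD = [-4.0 + 0.1 * i for i in range(81)]  # quadrature grid for the EAP estimate


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def cumulative(theta, a, bs):
    """P(X >= k) for each threshold, padded with 1 (lowest) and 0 (above highest)."""
    return [1.0] + [logistic(a * (theta - b)) for b in bs] + [0.0]


def category_probs(theta, a, bs):
    star = cumulative(theta, a, bs)
    return [star[k] - star[k + 1] for k in range(len(bs) + 1)]


def item_information(theta, a, bs):
    """GRM Fisher information: sum over categories of P_k'(theta)^2 / P_k(theta)."""
    star = cumulative(theta, a, bs)
    info = 0.0
    for k in range(len(bs) + 1):
        p = star[k] - star[k + 1]
        dp = a * (star[k] * (1 - star[k]) - star[k + 1] * (1 - star[k + 1]))
        if p > 1e-12:
            info += dp * dp / p
    return info


def eap(responses):
    """Posterior mean and SD of theta under a standard normal prior."""
    weights = []
    for t in QUAD:
        w = math.exp(-0.5 * t * t)
        for item, cat in responses.items():
            w *= category_probs(t, *BANK[item])[cat]
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(QUAD, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(QUAD, weights)) / total
    return mean, math.sqrt(var)


def run_cat(answer, se_stop=0.3, min_items=4, max_items=12):
    """answer(item_id) returns a category index 0..4 (0 = never, 4 = always)."""
    responses, theta, se = {}, 0.0, 1.0  # start at the prior mean
    while len(responses) < min(max_items, len(BANK)):
        remaining = [i for i in BANK if i not in responses]
        nxt = max(remaining, key=lambda i: item_information(theta, *BANK[i]))
        responses[nxt] = answer(nxt)
        theta, se = eap(responses)
        if len(responses) >= min_items and se <= se_stop:
            break
    return list(responses), 50 + 10 * theta, 10 * se  # report on the T-score metric
```

For example, `run_cat(lambda item: 0)` simulates a respondent who answers "never" to every item and returns the administered items, a T-score below 50 and its standard error. Keeping selection and stopping in separate, explicit steps makes it easy to swap in another selection rule or stopping criterion.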

The PROMIS v1.0 Anxiety item bank contains 29 items [24], whereas the v1.0 Depression item bank contains 28 items [25]. Both item banks can be administered as short forms or as a CAT. Fixed-length short forms of 4, 6, 7 and 8 items exist for anxiety (the PROMIS Short Form v1.0 – Anxiety 4a, 6a, 7a and 8a, respectively). For depression, short forms of 4, 6 and 8 items exist (the PROMIS Short Form v1.0 – Depression 4a and 6a, and the two 8-item forms 8a and 8b) [24, 25]. Longer short forms yield more precise scores. Instruments intended for large-scale data collection and comparisons of large groups can therefore be short. Instruments intended for obtaining individual scores, diagnosing and comparing small groups should be longer. Moreover, instruments intended to monitor health status over time require more precision and thus need to be longer as well. For all these intended uses, CAT-based assessment is a good option because it combines efficiency and precision [26].

Several studies have compared PROMIS anxiety and depression instruments with legacy measures for anxiety and depression, such as the following [27–32]:

- the Patient Health Questionnaire;
- the Beck Depression Inventory;
- the Generalized Anxiety Disorder scale;
- the Center for Epidemiologic Studies Depression Scale.

These studies concluded that PROMIS anxiety and depression instruments perform similarly to these legacy measures. They can be used to screen for and evaluate depression and anxiety in the general population as well as in patient groups [27–32].

PROMIS instruments have been implemented in various institutions and health disciplines, such as orthopedics [33–35], oncology [36] and diabetes [37]. Major translation efforts have been conducted [28, 38–41], including the translation of 17 adult item banks into Dutch-Flemish [42]. The Dutch-Flemish PROMIS item banks for anxiety and depression have been validated in a representative sample of the Dutch general population as well as a clinical sample with common mental disorders [43–45], and they can be used in clinical practice and research involving the Dutch population.

Cross-cultural validity of measures and the availability of reference values are important prerequisites for standardized measurement of anxiety and depression and for giving contextual meaning to scores. Cross-cultural validity, in terms of differential item functioning (DIF) for language, has not yet been investigated for the Dutch-Flemish PROMIS Anxiety and Depression item banks [46]. By PROMIS convention, the US scoring algorithm is the default. Items should be free of DIF so that this algorithm is appropriate for use in other countries and country-specific scores are not biased. Only then can scores be compared between countries. PROMIS item banks are scaled such that the US general population has a mean score of 50 with a standard deviation of 10 [17]. However, some studies have shown that reference values of PROMIS scales in other countries deviate from this US mean of 50 [47, 48]. This study therefore has three aims:

1. to investigate DIF for language between the Netherlands and the US for the PROMIS Anxiety and Depression item banks;
2. to assess the impact of any DIF;
3. to provide reference values for these item banks for the Dutch general population.

Materials and methods

The Medical Ethical Committee of Amsterdam UMC, location VUmc, the Netherlands, confirmed that the study protocol was exempt from ethical approval according to the Dutch Medical Research in Human Subjects Act (WMO), as no experiments were conducted. The study adhered to the tenets of the Declaration of Helsinki.

Participants and procedures

Data were collected in 2014 [43, 44]. A data collection company (Desan Research Solutions) recruited participants from an existing internet panel of the Dutch general population. The sample needed to be representative of the Dutch general population with respect to the following characteristics:

- age distribution;
- gender;
- educational level (low, middle, high);
- region of residence (north, east, south, west);
- ethnicity (native Dutch, first- and second-generation western immigrant, first- and second-generation non-western immigrant).

Representativeness was checked against 2013 data from Statistics Netherlands, with a maximum allowable deviation of 2.5%. Participants were asked to complete the full Dutch-Flemish PROMIS Anxiety and Depression item banks through a web-based survey in which skipping items was not allowed. Additionally, participants answered questions about their sociodemographic characteristics.

For evaluating DIF for language, data from US participants were obtained from the HealthMeasures Dataverse [49], which contains PROMIS wave 1 data of 21,113 participants. The calibration subsamples of the anxiety and depression item banks were used [21], comprising 14,836 and 14,839 respondents, respectively.

To investigate how often the DIF items were included in CATs, we examined CATs administered in an ongoing study [32]. That study included adult patients who started outpatient treatment for common mental disorders.

Measures

The PROMIS Item Bank v1.0 – Anxiety consists of 29 items that assess self-reported fear (panic, fearfulness), anxious misery (dread, worry), hyperarousal (nervousness, restlessness, tension) and somatic symptoms related to arousal (dizziness, racing heart) [21, 24]. Example items include ‘I felt anxious’, ‘I felt fearful’ and ‘I felt worried’. The PROMIS Item Bank v1.0 – Depression consists of 28 items that assess self-reported negative mood (guilt, sadness), views of self (worthlessness, self-criticism), social cognition (interpersonal alienation, loneliness) and decreased positive affect and engagement (loss of purpose, meaning and interest) [21, 25]. Example items include ‘I felt depressed’, ‘I felt sad’ and ‘I felt lonely’.

All items have a 7-day recall period and are scored on a 5-point Likert scale with response options 1 = never, 2 = rarely, 3 = sometimes, 4 = often and 5 = always. Total scores are derived from the original US IRT model, the Graded Response Model (GRM) [50]. They are expressed as T-scores, with a mean of 50 and a standard deviation of 10 in the US general population [17]. Higher scores represent more anxiety or depression. In line with PROMIS convention, T-scores were calculated from the item parameters of the original US calibration sample using expected a posteriori (EAP) estimates [19].

T-scores can be calculated in either of two ways:

- by uploading item scores to the online HealthMeasures Scoring Service, provided by the US Assessment Center [20];
- by using the conversion tables in the PROMIS anxiety and depression scoring manuals to convert raw sum scores into T-scores [24, 25].

The Scoring Service is the most accurate scoring method available because it uses IRT-based response pattern scoring and can handle missing data. The conversion tables can only be used when all items are completed. The Scoring Service was therefore used to obtain the Dutch reference values in this study.
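A small EAP example illustrates why response-pattern scoring can differ from sum-score conversion. The sketch below is a simplification under the following assumptions:

- **Items:** two hypothetical GRM items (`steep` and `flat`), not actual PROMIS parameters.
- **Prior:** a standard normal prior evaluated on a fixed quadrature grid.
- **Metric:** scores are converted with T = 50 + 10θ.

Two response patterns with the same raw sum score receive different T-scores, because the items differ in discrimination and location. A missing answer is simply left out of the likelihood, which a sum-score table cannot do.

```python
import math

# Hypothetical GRM items: (slope a, thresholds b1..b4); categories 0..4 = never..always.
ITEMS = {
    "steep": (4.0, [-0.5, 0.3, 1.2, 2.1]),
    "flat": (1.5, [0.0, 0.8, 1.6, 2.4]),
}
QUAD = [-4.0 + 0.05 * i for i in range(161)]  # quadrature grid over theta


def category_probs(theta, a, bs):
    """GRM category probabilities from differences of cumulative probabilities."""
    star = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in bs] + [0.0]
    return [star[k] - star[k + 1] for k in range(len(bs) + 1)]


def eap_t_score(responses):
    """EAP T-score for a dict item -> category; None marks a missing answer."""
    num = den = 0.0
    for t in QUAD:
        w = math.exp(-0.5 * t * t)  # standard normal prior (unnormalized)
        for item, cat in responses.items():
            if cat is not None:  # missing answers contribute nothing to the likelihood
                w *= category_probs(t, *ITEMS[item])[cat]
        num += t * w
        den += w
    return 50.0 + 10.0 * num / den


# Same raw sum score (2 + 0 = 0 + 2), different response patterns:
t1 = eap_t_score({"steep": 2, "flat": 0})
t2 = eap_t_score({"steep": 0, "flat": 2})
t3 = eap_t_score({"steep": 2, "flat": None})  # one item missing
print(round(t1, 1), round(t2, 1), round(t3, 1))
```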

Statistical analyses

Descriptive statistics were used to summarize the sociodemographic characteristics of participants. DIF analyses were conducted with an iterative hybrid of logistic regression and IRT using the lordif package [51] in R. In the logistic regression framework, three nested models were compared:

- **model 1:** item responses are predicted by the latent trait;
- **model 2:** item responses are predicted by the latent trait and group membership (US or NL);
- **model 3:** item responses are predicted by the latent trait, group membership and their interaction.

Uniform DIF was assessed by comparing model 1 with model 2, and non-uniform DIF by comparing model 2 with model 3. DIF was tested with likelihood-ratio χ² tests, using the R² change as the detection criterion. McFadden’s pseudo R² was used as a measure of DIF magnitude, with a change of 2% (0.02) as the critical threshold [51, 52]. Monte Carlo simulations implemented in the lordif package (1000 replications) were performed to check for type I error inflation [51].
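The effect-size part of this procedure reduces to simple arithmetic on the log-likelihoods of the three nested models. The sketch below is not the lordif implementation, and it rests on these assumptions:

- **Inputs:** the log-likelihoods of an intercept-only model (`ll0`) and of models 1 to 3 (`ll1`, `ll2`, `ll3`) are already available, for example from ordinal logistic regression fits. The values in the example are hypothetical.
- **Flagging boundary:** an item is flagged when the R² change reaches 0.02. Whether lordif uses ≥ or > at the boundary is an assumption.

The quantities involved are:

- McFadden’s pseudo R² is 1 − LL_model / LL0.
- The likelihood-ratio statistic is 2(LL_larger − LL_smaller).
- The χ² upper tail is computed in closed form for 1 and 2 degrees of freedom.

```python
import math


def mcfadden_r2(ll_model, ll_null):
    """McFadden's pseudo R^2 = 1 - LL(model) / LL(intercept-only model)."""
    return 1.0 - ll_model / ll_null


def chi2_sf(x, df):
    """Upper-tail chi-square probability; closed forms for df = 1 and df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 or 2 is needed for the three model comparisons")


def dif_statistics(ll0, ll1, ll2, ll3):
    """ll0: intercept only; ll1: trait; ll2: trait + group; ll3: + trait x group."""
    r1, r2, r3 = (mcfadden_r2(ll, ll0) for ll in (ll1, ll2, ll3))
    return {
        "R2_12": r2 - r1, "R2_13": r3 - r1, "R2_23": r3 - r2,
        "p_12": chi2_sf(2 * (ll2 - ll1), 1),  # uniform DIF (1 extra parameter)
        "p_13": chi2_sf(2 * (ll3 - ll1), 2),  # total DIF (2 extra parameters)
        "p_23": chi2_sf(2 * (ll3 - ll2), 1),  # non-uniform DIF (interaction term)
    }


def classify_dif(r2_12, r2_23, threshold=0.02):
    """Label DIF type from the R^2 changes, using the 0.02 critical threshold."""
    kinds = []
    if r2_12 >= threshold:
        kinds.append("uniform")
    if r2_23 >= threshold:
        kinds.append("non-uniform")
    return " and ".join(kinds) or "no DIF"


stats = dif_statistics(ll0=-1000.0, ll1=-700.0, ll2=-690.0, ll3=-688.0)  # hypothetical
```

Applied to the R² changes later reported in Table 2, `classify_dif(0.021, 0.011)` returns `"uniform"` (EDANX03) and `classify_dif(0.010, 0.033)` returns `"non-uniform"` (EDANX30).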

The impact of DIF on item and total scores was assessed by visual inspection of category response curves (CRCs) and test characteristic curves (TCCs) per group. To assess the impact of DIF on short-form and full item bank T-scores, T-scores were calculated in two ways and then compared:

- **Original approach:** the original US item parameters with EAP estimates from the GRM (obtained from HealthMeasures). This is standard practice for PROMIS measures.
- **Hybrid approach:** a hybrid set of item parameters, consisting of the original (US) item parameters for the non-DIF items and rescaled Dutch item parameters for the DIF items.

Dutch item parameters were obtained by fitting a GRM to the Dutch general population sample using the mirt package [53] in R. The Dutch parameters for the DIF items were then rescaled to the US metric with the Stocking-Lord method [54, 55]. The equate function in the lordif package computes linear transformation constants, using the DIF-free items as anchors [51]. These constants equate the Dutch item parameters to the scale of the US item parameters by minimizing the squared difference between the test characteristic curves. The constants were then used to transform the Dutch discrimination (α) and location (β) parameters of the DIF items into new item parameters (α_new and β_new) on the US metric.
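In formula form, suppose the Dutch metric relates to the US metric as θ_US = A·θ_NL + B. A GRM item’s parameters then transform as α_new = α/A and β_new = A·β + B. The Stocking-Lord constants A and B minimize Σ_θ [TCC_US(θ) − TCC_NL→US(θ)]² over the anchor (DIF-free) items.

The sketch below implements that criterion. It makes three simplifying assumptions:

- **Grid and optimizer:** it uses an unweighted θ grid and a coarse-then-fine grid search. lordif’s `equate` function may weight θ points and use a different optimizer.
- **Anchor items:** the anchor parameters are hypothetical, constructed so that the true constants are known (A = 1.1, B = −0.2).
- **DIF item:** the final line applies the constants to a hypothetical DIF item.

```python
import math

THETA = [-4.0 + 0.25 * i for i in range(33)]  # unweighted evaluation grid


def expected_score(theta, a, bs):
    """GRM expected item score (categories scored 0..m) = sum of P(X >= k)."""
    return sum(1.0 / (1.0 + math.exp(-a * (theta - b))) for b in bs)


def to_us_metric(a, bs, A, B):
    """Linear rescaling: alpha_new = alpha / A, beta_new = A * beta + B."""
    return a / A, [A * b + B for b in bs]


def sl_loss(A, B, target_tcc, anchors_nl):
    """Squared TCC difference summed over the theta grid (Stocking-Lord criterion)."""
    loss = 0.0
    for theta, target in zip(THETA, target_tcc):
        tcc = sum(expected_score(theta, *to_us_metric(a, bs, A, B)) for a, bs in anchors_nl)
        loss += (target - tcc) ** 2
    return loss


def grid_search(target_tcc, anchors_nl, a_values, b_values):
    return min((sl_loss(A, B, target_tcc, anchors_nl), A, B) for A in a_values for B in b_values)


def stocking_lord(anchors_us, anchors_nl):
    """Coarse grid, then a finer grid around the coarse optimum."""
    target = [sum(expected_score(t, a, bs) for a, bs in anchors_us) for t in THETA]
    _, A0, B0 = grid_search(target, anchors_nl,
                            [0.5 + 0.05 * i for i in range(21)],
                            [-1.0 + 0.05 * i for i in range(41)])
    _, A, B = grid_search(target, anchors_nl,
                          [A0 - 0.05 + 0.005 * i for i in range(21)],
                          [B0 - 0.05 + 0.005 * i for i in range(21)])
    return A, B


# Hypothetical anchor items on the US metric, and the same items expressed on a
# Dutch metric generated with known constants A = 1.1, B = -0.2.
ANCHORS_US = [(3.0, [-0.5, 0.4, 1.3, 2.2]), (2.2, [0.0, 0.8, 1.5, 2.6]),
              (3.6, [-0.2, 0.6, 1.4, 2.3])]
ANCHORS_NL = [(a * 1.1, [(b + 0.2) / 1.1 for b in bs]) for a, bs in ANCHORS_US]

A, B = stocking_lord(ANCHORS_US, ANCHORS_NL)
alpha_new, beta_new = to_us_metric(2.0, [-0.3, 0.5, 1.4, 2.3], A, B)  # hypothetical DIF item
```

Only the anchor items enter the criterion. The DIF items are transformed afterwards with the resulting constants, which mirrors the role of the DIF-free anchors described above.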

To investigate the impact of DIF on T-scores at the group level, the mean T-score of respondents was calculated for the original and hybrid approaches. To investigate the impact on individual T-scores, the absolute difference between the original and hybrid approaches was calculated for each respondent. To investigate the impact of DIF on CATs, we assessed how often the DIF items were included in CATs from an ongoing study [32]. This was based on 4047 CATs for anxiety and 4293 CATs for depression.
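The group-level and individual-level comparisons amount to a few summary statistics over paired T-scores. The sketch below assumes that each respondent’s two T-scores (original and hybrid) are available as paired lists. The three respondents in the example are hypothetical.

```python
import statistics


def compare_approaches(original, hybrid):
    """Group-level means and individual-level absolute T-score differences."""
    if len(original) != len(hybrid):
        raise ValueError("score lists must be paired per respondent")
    diffs = [abs(o - h) for o, h in zip(original, hybrid)]
    return {
        "mean_original": statistics.mean(original),
        "mean_hybrid": statistics.mean(hybrid),
        "mean_abs_diff": statistics.mean(diffs),
        "sd_abs_diff": statistics.stdev(diffs),
        "range_abs_diff": (min(diffs), max(diffs)),
    }


# Hypothetical T-scores for three respondents (original vs hybrid parameters)
summary = compare_approaches([40.0, 50.0, 60.0], [39.5, 50.2, 59.0])
```

Reporting the mean absolute difference alongside its range, as in Table 3, separates small average shifts from occasional larger individual shifts.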

To provide reference values for the Dutch general population, T-scores on the complete item banks were calculated with the original US item parameters. This was done for the entire group of participants and for age-range (18–34, 35–44, 45–54, 55–64, 65–74 and ≥75 years) and gender subpopulations, in accordance with the available subpopulation reference scores of the US population [56]. T-scores of the Dutch general population were compared with the US general population and with the US age-range and gender subpopulation reference scores. T-score ranges corresponding to within normal limits and to mild, moderate and severe symptoms [57] were computed using thresholds at the mean plus 0.5, 1 and 2 standard deviations. Subsequently, the percentage of participants falling within each category was calculated.
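The threshold rule and the categorization can be written out directly. The sketch below makes the following assumptions:

- **Rounding:** cut-offs are rounded to whole T-score points.
- **Boundaries:** a score exactly at a cut-off belongs to the lower category, matching the ranges ≤55, 56–60, 61–70 and >70 used in the Results.

The inputs 49.9 (SD 10.1) and 49.6 (SD 10.0) are the Dutch means reported in Table 4.

```python
def severity_thresholds(mean, sd, multiples=(0.5, 1.0, 2.0)):
    """Cut-offs at mean + k * SD, rounded to whole T-score points (assumed rounding)."""
    return tuple(round(mean + k * sd) for k in multiples)


def severity(t, cuts=(55, 60, 70)):
    """Map a T-score to a symptom category; boundary scores fall in the lower category."""
    mild, moderate, severe = cuts
    if t <= mild:
        return "within normal limits"
    if t <= moderate:
        return "mild"
    if t <= severe:
        return "moderate"
    return "severe"


def category_percentages(t_scores, cuts=(55, 60, 70)):
    """Percentage of respondents in each symptom category."""
    counts = {}
    for t in t_scores:
        label = severity(t, cuts)
        counts[label] = counts.get(label, 0) + 1
    return {label: 100.0 * n / len(t_scores) for label, n in counts.items()}


anxiety_cuts = severity_thresholds(49.9, 10.1)     # Dutch anxiety mean and SD
depression_cuts = severity_thresholds(49.6, 10.0)  # Dutch depression mean and SD
```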

Results

A total of 1486 participants were invited, of whom 1055 completed the PROMIS Anxiety and Depression item banks (response rate 71%). Because of suspicious response patterns (e.g. all responses in one category combined with short response times), 53 participants were excluded from the analysis. Sociodemographic characteristics of the remaining 1002 participants are presented in Table 1. Differences in sociodemographic characteristics between the study participants and the Dutch general population in 2013 were all less than 2.5%, except for ethnicity.

Table 1. Sociodemographic characteristics of participants and the Dutch general population.

Sociodemographic characteristic	Study participants^* (n = 1002)	Dutch adult population 2013^a (n = 13.3 million)
Age in years, mean ± SD (range)	49 ± 17 (18–100)
18–39	34.3	34
40–64	44.4	44
≥65	21.3	22
Gender
Male	47.9	49
Female	52.1	51
Educational level
Low	32.0	32
Middle	39.9	40
High	28.0	28
Region of residence
North	11.5	10
East	20.5	21
South	21.5	22
West	46.6	47
Ethnicity
Native	79.6	80
1st and 2nd generation western immigrant	12.6	10
1st and 2nd generation non-western immigrant	7.8	10
Living situation
Single	29.2
Married/living together	60.0
Relationship, not living together	4.0
Living with parents	5.7
Other	1.1
Currently treated for psychological complaints
Yes	10.1
No	89.9


* all results expressed as % unless otherwise noted.

SD: standard deviation;

^a Based on data from Statistics Netherlands (https://www.cbs.nl)
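The representativeness check can be reproduced from Table 1. The sketch below assumes the 2.5% criterion refers to absolute differences in percentage points between the sample and population shares. It includes only the rows of Table 1 that have population figures. The row labels are abbreviated.

```python
# (sample %, Dutch adult population 2013 %) for the rows of Table 1 with population data
TABLE1 = {
    ("Age", "18-39"): (34.3, 34), ("Age", "40-64"): (44.4, 44), ("Age", ">=65"): (21.3, 22),
    ("Gender", "Male"): (47.9, 49), ("Gender", "Female"): (52.1, 51),
    ("Education", "Low"): (32.0, 32), ("Education", "Middle"): (39.9, 40),
    ("Education", "High"): (28.0, 28),
    ("Region", "North"): (11.5, 10), ("Region", "East"): (20.5, 21),
    ("Region", "South"): (21.5, 22), ("Region", "West"): (46.6, 47),
    ("Ethnicity", "Native"): (79.6, 80),
    ("Ethnicity", "Western immigrant"): (12.6, 10),
    ("Ethnicity", "Non-western immigrant"): (7.8, 10),
}


def flag_deviations(table, max_dev=2.5):
    """Rows whose sample share deviates from the population by more than max_dev points."""
    return [key for key, (sample, population) in table.items()
            if abs(sample - population) > max_dev]


flagged = flag_deviations(TABLE1)
```

Only the western-immigrant row exceeds the criterion (12.6% vs 10%), which agrees with the statement that all differences were below 2.5% except for ethnicity.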

Monte Carlo simulations indicated that the type I error rate for DIF detection was well controlled. The empirical thresholds for the probability associated with the χ² statistic were all close to the nominal α level of 0.01, ranging from 0.009 to 0.01 for both item banks. Empirical thresholds established through Monte Carlo simulations were therefore not needed [51]. The McFadden’s pseudo R² thresholds from the Monte Carlo simulations were all very small (≤0.0004 for anxiety and ≤0.0003 for depression). Because pseudo R² is an effect size measure, applying a threshold substantially below what would be considered a small but meaningful effect (e.g. 0.02) would not be meaningful by any standard [51]. Therefore, the nominal α level of 0.01 and the McFadden’s pseudo R² threshold of 0.02 were maintained.

Table 2 shows the results of the DIF analyses. In the anxiety item bank, two items showed DIF:

- ‘It scared me when I felt nervous’ (EDANX03) showed uniform DIF and was administered in 1% of the CAT-based assessments.
- ‘I felt worried’ (EDANX30) showed non-uniform DIF and was administered in 3% of the CAT-based assessments. This item is included in the PROMIS anxiety 7a short form.

In the depression item bank, two items showed uniform DIF:

- ‘I felt worthless’ (EDDEP04) is included in the PROMIS depression 4a, 6a, 8a and 8b short forms and was administered in 8% of the CAT-based assessments.
- ‘I felt unhappy’ (EDDEP36) is included in the 6a, 8a and 8b short forms and was administered in all CAT-based assessments.

For the item ‘It scared me when I felt nervous’, the threshold parameters for the Dutch population were mostly slightly lower than those for the US population. This indicates that Dutch respondents endorse higher response categories at the same level of anxiety. The same applied to the item ‘I felt worthless’. For the item ‘I felt unhappy’, the threshold parameters for the Dutch population were slightly higher than those for the US population, indicating that Dutch respondents endorse lower response categories at the same level of depression.

Fig 1 illustrates the impact of DIF on respondents’ total scores. The plots on the left show the impact of DIF when all items are considered, and the plots on the right show the impact when only the DIF items are considered. The plots show that DIF had minimal impact on the total score when all items of each item bank are administered. S1 Fig shows the impact of DIF on item scores per group for the items displaying DIF.
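The link between threshold direction and endorsement can be checked numerically. The sketch below computes the GRM expected item score (1–5 scale) at θ = 0 for the two depression DIF items, using the slope and threshold values reported in Table 2. It treats the Dutch and US parameters as if they were on a common θ metric, which is a simplifying assumption.

```python
import math

# Slope and thresholds from Table 2, treated here as if on a common theta metric.
PARAMS = {
    "EDDEP04": {"NL": (2.93, [-0.17, 0.58, 1.56, 2.61]), "US": (4.37, [0.29, 0.88, 1.61, 2.36])},
    "EDDEP36": {"NL": (4.21, [-0.14, 0.61, 1.33, 2.20]), "US": (3.44, [-0.64, 0.23, 1.20, 2.17])},
}


def expected_item_score(theta, a, bs):
    """GRM expected score on the 1-5 response scale: 1 + sum over thresholds of P(X >= k)."""
    return 1.0 + sum(1.0 / (1.0 + math.exp(-a * (theta - b))) for b in bs)


for item, groups in PARAMS.items():
    nl = expected_item_score(0.0, *groups["NL"])
    us = expected_item_score(0.0, *groups["US"])
    print(f"{item}: NL {nl:.2f} vs US {us:.2f} at theta = 0")
```

At the same θ, lower Dutch thresholds (EDDEP04) give a higher expected response and higher Dutch thresholds (EDDEP36) a lower one, which is the interpretation given in the text.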

Table 2. McFadden’s pseudo R² and IRT parameters for items displaying DIF.

Item bank	Item with DIF	DIF type	McFadden’s pseudo R²	NL slope; threshold parameters	US slope; threshold parameters	Included in CAT^d
Anxiety	EDANX03: It scared me when I felt nervous	Uniform	R²₁₂ = 0.021, R²₂₃ = 0.011	2.62; 0.15, 0.97, 2.02	3.74; 0.59, 1.18, 1.95	1%
Anxiety	EDANX30: I felt worried^a	Non-uniform	R²₁₂ = 0.010, R²₂₃ = 0.033	2.16; -1.12, -0.10, 1.29, 2.64	3.14; -0.57, 0.24, 1.22, 2.12	3%
Depression	EDDEP04: I felt worthless^b	Uniform	R²₁₂ = 0.024, R²₂₃ = 0.013	2.93; -0.17, 0.58, 1.56, 2.61	4.37; 0.29, 0.88, 1.61, 2.36	8%
Depression	EDDEP36: I felt unhappy^c	Uniform	R²₁₂ = 0.037, R²₂₃ = 0.001	4.21; -0.14, 0.61, 1.33, 2.20	3.44; -0.64, 0.23, 1.20, 2.17	100%


The bold population had lower thresholds compared to the other population, indicating that this population endorses higher item response categories at the same level of the domain (anxiety, depression)

^a present in the anxiety 7a short form

^b present in the depression 4a, 6a, 8a and 8b short forms

^c present in the depression 6a, 8a and 8b short forms

^d Based on 4047 CAT-based assessments for anxiety and 4293 CAT-based assessments for depression

Table 3 displays the impact of DIF on T-scores for the full item banks and for the short forms that contain DIF items. At the population level, mean anxiety T-scores based on the hybrid parameters were approximately 0.5 points lower than those based on the original US parameters, for both the full item bank and the short form. Population-level differences were even smaller for depression T-scores of the item bank and most short forms. The exception was the 4a short form, for which mean depression T-scores based on the hybrid parameters were 1 point lower than those based on the original US parameters.

At the individual level, absolute T-score differences between the original and hybrid approaches for anxiety ranged from 0 to 1.7 for the full item bank and from 0 to 1.9 for the short form. For the depression item bank and most depression short forms, absolute differences for individuals ranged from 0 to 1.2. A maximum difference of 2.6 was found for the depression short form 4a. S2 Fig shows the difference for each full item bank and short form in relation to individuals’ T-scores. Notably, the largest differences were found for participants with T-scores at the lower end of the scale.

Table 3. PROMIS anxiety and depression T-scores^a based on different sets of item parameters for different versions of the instruments.

Version	Mean population T-score (SD) original approach^b	Mean population T-score (SD) hybrid approach^c	Mean absolute T-score difference (SD), [range]
Anxiety
Full item bank	49.9 (10.1)	49.5 (10.3)	0.40 (0.29) [0.00–1.67]
Short form 7a	50.3 (9.2)	49.8 (9.6)	0.59 (0.48) [0.00–1.88]
Depression
Full item bank	49.6 (10.0)	49.7 (9.9)	0.15 (0.09) [0.00–0.73]
Short form 4a	50.9 (8.5)	49.9 (8.7)	1.02 (0.52) [0.00–2.57]
Short form 6a	49.7 (9.3)	49.8 (9.0)	0.52 (0.29) [0.00–1.24]
Short form 8a	50.3 (9.2)	50.4 (8.9)	0.42 (0.25) [0.01–1.10]
Short form 8b	50.2 (9.3)	50.2 (9.1)	0.38 (0.22) [0.00–1.15]


SD: standard deviation

^a T-scores, higher scores represent more anxiety/depression

^b All items have the US item parameters

^c Non-DIF items have the US item parameters, DIF items have the Dutch item parameters rescaled to the US metric

Dutch reference values for anxiety and depression and comparisons with the US general population, using the original US item parameters, are presented in Table 4. Differences between T-scores of the Dutch general population and T-scores of the US general population for anxiety and depression were small (difference of 0.1 and 0.4 for anxiety and depression, respectively). Differences between T-scores of the Dutch general population and T-scores of the US general population for age-range and gender subpopulations were also small (differences between 0.1 and 0.7 for anxiety and between 0.1 and 1.4 for depression). T-scores of the Dutch general population and US general population showed similar patterns, with males scoring lower (i.e. less anxious or depressed) than females and lower scores for older age groups.

Table 4. PROMIS anxiety and depression Dutch reference values^a by age and gender and comparisons with the US general population [58].

Group	N Dutch population (%)	Anxiety: N US population (%)	Anxiety: Dutch mean T-score (SD)	Anxiety: US mean T-score (SD)	Depression: N US population (%)	Depression: Dutch mean T-score (SD)	Depression: US mean T-score (SD)
Total	1002 (100)	2724 (100)	49.9 (10.1)	50.0 (10.0)	2160 (100)	49.6 (10.0)	50.0 (10.0)
Gender
Male	480 (48)	1069 (39)	49.0 (10.0)	48.6 (9.5)	890 (41)	48.8 (10.1)	48.7 (9.7)
Female	522 (52)	1654 (61)	50.6 (10.1)	50.9 (10.2)	1269 (59)	50.4 (9.9)	50.9 (10.1)
Age in years
18–34	253 (25)	659 (24)	51.8 (9.9)	52.4 (10.7)	496 (23)	52.0 (9.3)	52.3 (10.9)
35–44	147 (15)	496 (18)	51.4 (10.9)	50.9 (11.1)	366 (17)	50.5 (10.8)	50.6 (10.9)
45–54	173 (17)	417 (15)	50.0 (10.9)	50.1 (9.5)	359 (17)	50.0 (11.0)	50.8 (10.0)
55–64	216 (22)	442 (16)	48.9 (9.4)	49.3 (9.5)	373 (17)	48.8 (9.8)	49.5 (9.7)
65–74	191 (19)	365 (13)	47.5 (9.1)	48.1 (8.8)	290 (13)	47.0 (8.9)	48.4 (8.8)
75+	22 (2)	345 (13)	46.2 (9.4)	46.9 (7.9)	276 (13)	46.0 (9.6)	46.5 (7.2)


SD: standard deviation

^a T-scores, higher scores represent more anxiety/depression; T-scores were calculated based on the original US item parameters

Using the mean plus 0.5, 1 and 2 standard deviations, thresholds for mild, moderate and severe anxiety were set at 55, 60 and 70, respectively. The same thresholds applied to depression (Fig 2). Applying these thresholds to participants’ T-scores gave the following distributions:

- **Anxiety:** 70% fell within normal limits (≤55), 14% had mild symptoms (56–60), 15% had moderate symptoms (61–70) and 1% had severe symptoms (>70).
- **Depression:** 71% fell within normal limits (≤55), 15% had mild symptoms (56–60), 13% had moderate symptoms (61–70) and 1% had severe symptoms (>70).

Fig 2. The blue lines represent hypothetical data showing the course of symptoms over three consecutive assessments (T1, T2 and T3).

Discussion

This study assessed DIF for language between the Netherlands and the US for the PROMIS Anxiety and Depression item banks and presented Dutch reference values for the general population and relevant subpopulations. We found some items with DIF, but the impact of DIF on population-level T-scores was small for both full item banks and short forms. This supports the applicability of the US scoring algorithm in the Netherlands and strengthens the cross-cultural validity of the Dutch-Flemish PROMIS Anxiety and Depression item banks. It also enables comparison of scores between Dutch and US populations. The established Dutch reference values can be used to interpret symptoms of anxiety and depression in research and clinical practice.

We found only a limited number of items with DIF, and these had negligible impact on total scores when all items in the item banks were administered. However, the impact of DIF might be larger in short forms containing DIF items, because only a small number of items is administered. At the population level, the impact of DIF on T-scores was small for all short forms. Because DIF in the two depression items went in opposite directions, its effect may have cancelled out in the short forms that contain both items. At the individual level the impact was larger. This was especially so for the depression short form 4a, which contains only one of the two depression DIF items (‘I felt worthless’) and showed a maximum difference of 2.6 points. This is close to the 3 points generally considered minimally important [60–63]. This short form might therefore not be the best option for assessing symptoms of depression in individuals. Most DIF items were not frequently administered in CAT-based assessments, but the item ‘I felt unhappy’ was included in all CATs [32]. A future study could explore the impact of DIF on CAT T-scores. It could also explore whether omitting DIF items from the item bank would yield equally precise scores with a similar number of items administered.
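The cancellation argument can be made concrete with the Table 2 parameters. The sketch below computes the Dutch-minus-US difference in expected item score for each depression DIF item, and for the two combined, at a few θ values. As before, it treats both parameter sets as if on a common θ metric, which is a simplifying assumption. The degree of cancellation depends on θ, so the printed rows should be inspected rather than assumed. A short form such as 4a, which contains only EDDEP04, carries that item’s difference without any offset.

```python
import math

# Table 2 parameters for the two depression DIF items: group -> (slope, thresholds)
EDDEP04 = {"NL": (2.93, [-0.17, 0.58, 1.56, 2.61]), "US": (4.37, [0.29, 0.88, 1.61, 2.36])}
EDDEP36 = {"NL": (4.21, [-0.14, 0.61, 1.33, 2.20]), "US": (3.44, [-0.64, 0.23, 1.20, 2.17])}


def expected_item_score(theta, a, bs):
    """GRM expected score on the 1-5 response scale."""
    return 1.0 + sum(1.0 / (1.0 + math.exp(-a * (theta - b))) for b in bs)


def nl_minus_us(item, theta):
    """Dutch minus US expected item score at the same theta."""
    return expected_item_score(theta, *item["NL"]) - expected_item_score(theta, *item["US"])


for theta in (-1.0, 0.0, 1.0, 2.0):
    d04, d36 = nl_minus_us(EDDEP04, theta), nl_minus_us(EDDEP36, theta)
    print(f"theta={theta:+.1f}  EDDEP04 {d04:+.2f}  EDDEP36 {d36:+.2f}  combined {d04 + d36:+.2f}")
```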

DIF for language could be caused by a lack of translational equivalence [64]. During the development of PROMIS measures, a translatability review is generally performed, but only for Spanish. In a translatability review, the original measure is reviewed to determine its suitability for future translations. Such a review is best conducted as early as possible during the development of a new measure, preferably before quantitative testing, while changes to the measure can still be made. Translation of PROMIS measures into other languages is increasingly common. A broader translatability review might therefore be a useful additional step in the development process, making new measures more suitable for translation. During the Dutch-Flemish translation of the PROMIS Anxiety and Depression item banks, no particular difficulties were experienced in translating the items that showed DIF for language [42]. The items ‘I felt worried’ and ‘I felt unhappy’ also showed DIF for language in a study comparing the Brazilian with the US version [41], whereas studies in Germany and Spain found different DIF items [65, 66]. Although a translatability review might reduce translation difficulties, it does not replace the evaluation of DIF for language, as DIF can also arise from cultural differences [67]. DIF studies are therefore recommended after every translation [52].

The negligible impact of DIF made it possible to compare item bank scores of the Dutch general population with those of the US general population. T-scores of the Dutch general population were similar to those of the US general population for the total population (differences of 0.1 for anxiety and 0.4 for depression). They were also similar for age-range and gender subpopulations (differences of at most 0.7 for anxiety and 1.4 for depression). It is not yet clear what constitutes a minimal important difference in scores between groups for anxiety and depression [68], although most studies suggest that a within-person change of at least 3 points is meaningful [60–63]. Because all observed differences were well below 3 points, we think it is safe to conclude that T-scores of the Dutch general population were similar to those of the US general population. Given the similar means and standard deviations, the thresholds for mild, moderate and severe symptoms of anxiety and depression were identical to the thresholds based on the US data [57, 63].

The inclusion of anxiety and depression in many ICHOM Standard Sets shows that measuring these outcomes is relevant for many patient groups and for persons without disease, not only for those with mental disorders [6, 7]. The Standard Set for Overall Adult Health advocates measuring anxiety and depression with the PROMIS Scale v1.2 – Global Health [69]. This scale yields a global mental health score [70], for which Dutch reference values have recently been published [48]. In the other 16 Standard Sets that include anxiety and depression, a range of PROMs is advocated, including disease-specific, cancer-specific, anxiety/depression-specific and generic PROMs [6, 7]. A more universal and standardized approach to measuring anxiety and depression will facilitate outcome measurement in clinical practice and comparisons of scores across patient groups [14, 15]. PROMIS anxiety and depression instruments offer opportunities here, and the results of this study expand their utility.

PROMIS anxiety and depression instruments have several advantages over current legacy instruments. First, PROMIS instruments are applicable across the general population and various patient groups, including patients with multimorbidity, rare diseases, or no definite diagnosis [17, 18, 20]. This enables comparison of patient groups and benchmarking, and can help improve the quality of care. Second, the PROMIS Anxiety and Depression item banks can be administered as CAT, which reduces response burden while maintaining high measurement precision, making them valuable in clinical practice [23]. Currently, a limited number of countries, including the Netherlands, have access to technical solutions for CAT applications, but this is expected to expand rapidly in the near future. Third, several crosswalk studies have linked scores of legacy instruments to PROMIS anxiety and depression instruments [71–75], which facilitates the uptake of PROMIS instruments and the interpretation of scores, even when legacy instruments have been used in the past. Last, PROMIS is a sustainable, state-of-the-art measurement system that is actively maintained by the PROMIS Health Organization to facilitate its widespread use and adoption in research and clinical practice.
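To illustrate how CAT keeps response burden low, the following Python sketch selects, at each step, the not-yet-administered item that carries the most Fisher information at the current theta estimate under Samejima's graded response model, the IRT model used to calibrate PROMIS item banks. Maximum-information selection is one common rule and is assumed here; the item bank, its parameters and the function names are hypothetical, and theta re-estimation and stopping rules are omitted.

```python
import math


def _cumulative(theta, a, thresholds):
    """Cumulative probabilities [1, P*(X>=1), ..., P*(X>=K-1), 0] under the GRM."""
    inner = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    return [1.0] + inner + [0.0]


def grm_category_probs(theta, a, thresholds):
    """Category probabilities P(X = k), k = 0..K-1."""
    cum = _cumulative(theta, a, thresholds)
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


def grm_item_information(theta, a, thresholds):
    """Item information: sum over categories of (dP_k/dtheta)^2 / P_k."""
    cum = _cumulative(theta, a, thresholds)
    # Derivative of each cumulative curve; it is zero for the fixed ends 1 and 0.
    dcum = [a * p * (1.0 - p) for p in cum]
    info = 0.0
    for k in range(len(thresholds) + 1):
        p_k = cum[k] - cum[k + 1]
        if p_k > 0.0:
            info += (dcum[k] - dcum[k + 1]) ** 2 / p_k
    return info


def next_item(theta_hat, bank, administered):
    """Return the unadministered item with maximum information at theta_hat."""
    candidates = [name for name in bank if name not in administered]
    return max(candidates, key=lambda name: grm_item_information(theta_hat, *bank[name]))


# Hypothetical bank: name -> (discrimination, ordered category thresholds).
BANK = {
    "item_low": (2.0, [-2.0, -1.5, -1.0, -0.5]),
    "item_mid": (2.0, [0.0, 0.5, 1.0, 1.5]),
    "item_high": (2.0, [2.0, 2.5, 3.0, 3.5]),
}
```

At a provisional estimate of theta = 0.7, `next_item(0.7, BANK, set())` picks `"item_mid"`, whose thresholds bracket that estimate; items targeted far away from the respondent add little information and are skipped, which is what shortens the test.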

A strength of the present study is that we not only assessed the impact of DIF when all items were considered, but also applied Stocking-Lord constants to investigate the impact of DIF on T-scores of full item banks and short forms. Moreover, the large sample size ensured that the Dutch reference values were estimated reliably. This study also has limitations. First, some subgroups (especially adults aged 75 years and older) were relatively small. Second, although our sample was broadly representative of the Dutch general population on some important characteristics, we cannot be certain that this is also the case for other important characteristics, such as income level and employment status. One could argue that persons who have the time to participate in an internet panel and complete item banks might more often be persons without full-time employment, which might in turn be caused by physical or mental problems. The non-probabilistic selection procedure might therefore have affected the general population reference scores presented in this article.
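The Stocking-Lord method [54, 55] mentioned above places two sets of item parameters on a common metric by finding linking constants A and B (theta_target = A * theta_source + B) that minimize the squared difference between the test characteristic curves over a grid of theta values. The sketch below shows that idea for graded-response items in pure Python. It is not the procedure used in this study: the coarse-to-fine grid search stands in for a proper numerical optimizer, the unweighted theta grid from -4 to 4 is an assumption, and all function names are hypothetical.

```python
import math


def expected_item_score(theta, a, thresholds):
    """Expected score (0..K-1) of a graded-response item: sum of cumulative probabilities."""
    return sum(1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds)


def tcc(theta, items):
    """Test characteristic curve: expected total score at theta."""
    return sum(expected_item_score(theta, a, bs) for a, bs in items)


def rescale(items, A, B):
    """Map parameters onto the target metric: a* = a / A, b* = A * b + B."""
    return [(a / A, [A * b + B for b in bs]) for a, bs in items]


def stocking_lord(target_items, source_items, rounds=4):
    """Return (A, B) minimizing the squared TCC difference over a theta grid."""
    grid = [-4.0 + 0.2 * i for i in range(41)]
    target = [tcc(t, target_items) for t in grid]

    def loss(A, B):
        moved = rescale(source_items, A, B)
        return sum((y - tcc(t, moved)) ** 2 for t, y in zip(grid, target))

    best, span_a, span_b = (1.0, 0.0), 0.5, 1.0
    for _ in range(rounds):
        # Search a 21 x 21 grid around the current best, keeping A positive.
        candidates = [
            (best[0] + span_a * i / 10, best[1] + span_b * j / 10)
            for i in range(-10, 11)
            for j in range(-10, 11)
            if best[0] + span_a * i / 10 > 0
        ]
        best = min(candidates, key=lambda p: loss(*p))
        span_a, span_b = span_a / 5, span_b / 5  # zoom in for the next round
    return best
```

As a self-check with hypothetical parameters, generating `target = rescale(source, 1.2, 0.3)` from any set of source items and then calling `stocking_lord(target, source)` should recover constants close to (1.2, 0.3).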

Conclusions

The limited number of items with DIF in the PROMIS Anxiety and Depression item banks, with small impact on population T-scores, supports the applicability of the US scoring algorithm and enables comparison of scores between the Dutch and US populations. The Dutch general population had a T-score of 49.9 for anxiety and 49.6 for depression, similar to the mean T-score of 50 of the US general population. The Dutch reference values reported in this study provide an important tool for healthcare professionals and researchers to evaluate and interpret symptoms of anxiety and depression. The reference values for subpopulations allow a more tailored and relevant interpretation and understanding of these symptoms. Incorporating the Dutch reference values and thresholds into the feedback that patients and healthcare professionals receive on mental health status, as assessed with PROMIS anxiety and depression instruments, will facilitate interpretation of scores by both groups. The availability of Dutch reference values may stimulate the uptake of PROMIS instruments for anxiety and depression and contribute to standardized measurement of anxiety and depression.

Supporting information

S1 Fig. Category response curves for items displaying DIF.

(TIF)


S2 Fig. Relation between T-scores and differences in T-scores original vs. hybrid approach.

(TIF)


S1 Data. Anxiety data of respondents.

(POR)


S2 Data. Depression data of respondents.

(POR)


Data Availability

All relevant data are within the article and its Supporting Information files.

Funding Statement

The PROMIS Health Organization (PHO) is a non-profit charitable foundation, and the Dutch-Flemish PROMIS National Center is a network of local PHO members who are developing or applying PROMIS measures in the Netherlands or Belgium. Neither organization provided support in the form of salaries for authors CT and LR, and neither had any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section. The author(s) received no specific funding for this work.

References

1. Lloyd C, Dyer P, Barnett A. Prevalence of symptoms of depression and anxiety in a diabetes clinic population. Diabetic medicine. 2000;17(3):198–202. doi: 10.1046/j.1464-5491.2000.00260.x
2. Singer S, Das-Munshi J, Brähler E. Prevalence of mental health conditions in cancer patients in acute care - a meta-analysis. Annals of Oncology. 2010;21(5):925–30. doi: 10.1093/annonc/mdp515
3. Hare DL, Toukhsati SR, Johansson P, Jaarsma T. Depression and cardiovascular disease: a clinical review. European heart journal. 2014;35(21):1365–72. doi: 10.1093/eurheartj/eht462
4. Frank JD. Psychotherapy: The restoration of morale. American Journal of Psychiatry. 1974;131(3):271–4. doi: 10.1176/ajp.131.3.271
5. Clarke DM, Kissane DW. Demoralization: its phenomenology and importance. Australian & New Zealand Journal of Psychiatry. 2002;36(6):733–42. doi: 10.1046/j.1440-1614.2002.01086.x
6. ICHOM. International Consortium for Health Outcomes Measurement (ICHOM) 2021. Available from: www.ichom.org.
7. Terwee CB, Zuidgeest M, Vonkeman HE, Cella D, Haverman L, Roorda LD. Common patient-reported outcomes across ICHOM Standard Sets: the potential contribution of PROMIS®. BMC Medical Informatics and Decision Making. 2021;21(1):1–13.
8. Nezu AM, Ronan GF, Meadows EA, McClure KS. Practitioner’s guide to empirically-based measures of depression: Springer Science & Business Media; 2000.
9. Roemer L. Measures for anxiety and related constructs. Practitioner’s guide to empirically based measures of anxiety: Springer; 2002. p. 49–83.
10. Antony MM, Stein MB. Oxford handbook of anxiety and related disorders: Oxford University Press; 2008.
11. Vodermaier A, Linden W, Siu C. Screening for emotional distress in cancer patients: a systematic review of assessment instruments. Journal of the National Cancer Institute. 2009;101(21):1464–88. doi: 10.1093/jnci/djp336
12. McHugh RK, Rasmussen JL, Otto MW. Comprehension of self-report evidence-based measures of anxiety. Depression and Anxiety. 2011;28(7):607–14. doi: 10.1002/da.20827
13. Nelson CJ, Cho C, Berk AR, Holland J, Roth AJ. Are gold standard depression measures appropriate for use in geriatric cancer patients? A systematic evaluation of self-report depression instruments used with geriatric, cancer, and geriatric cancer samples. Journal of Clinical Oncology. 2010;28(2):348. doi: 10.1200/JCO.2009.23.0201
14. Calvert M, Kyte D, Price G, Valderas JM, Hjollund NH. Maximising the impact of patient reported outcome assessment for patients and society. BMJ. 2019;364. doi: 10.1136/bmj.k5267
15. Jim HS, Hoogland AI, Brownstein NC, Barata A, Dicker AP, Knoop H, et al. Innovations in research and clinical care using patient-generated health data. CA: a cancer journal for clinicians. 2020;70(3):182–99. doi: 10.3322/caac.21608
16. Eton DT, Beebe TJ, Hagen PT, Halyard MY, Montori VM, Naessens JM, et al. Harmonizing and consolidating the measurement of patient-reported information at health care institutions: a position statement of the Mayo Clinic. Patient Related Outcome Measures. 2014;5:7. doi: 10.2147/PROM.S55069
17. Cella D, Riley W, Stone A, Rothrock N, Reeve B, Yount S, et al. The Patient-Reported Outcomes Measurement Information System (PROMIS) developed and tested its first wave of adult self-reported health outcome item banks: 2005–2008. Journal of clinical epidemiology. 2010;63(11):1179–94. doi: 10.1016/j.jclinepi.2010.04.011
18. Cella D, Yount S, Rothrock N, Gershon R, Cook K, Reeve B, et al. The Patient-Reported Outcomes Measurement Information System (PROMIS): progress of an NIH Roadmap cooperative group during its first two years. Medical care. 2007;45(5 Suppl 1):S3. doi: 10.1097/01.mlr.0000258615.42478.55
19. Pilkonis PA, Yu L, Dodds NE, Johnston KL, Maihoefer CC, Lawrence SM. Validation of the depression item bank from the Patient-Reported Outcomes Measurement Information System (PROMIS®) in a three-month observational study. Journal of psychiatric research. 2014;56:112–9. doi: 10.1016/j.jpsychires.2014.05.010
20. Schalet BD, Pilkonis PA, Yu L, Dodds N, Johnston KL, Yount S, et al. Clinical validity of PROMIS depression, anxiety, and anger across diverse clinical samples. Journal of clinical epidemiology. 2016;73:119–27. doi: 10.1016/j.jclinepi.2015.08.036
21. Pilkonis PA, Choi SW, Reise SP, Stover AM, Riley WT, Cella D, et al. Item banks for measuring emotional distress from the Patient-Reported Outcomes Measurement Information System (PROMIS®): depression, anxiety, and anger. Assessment. 2011;18(3):263–83. doi: 10.1177/1073191111411667
22. Embretson SE, Reise SP. Item response theory: Psychology Press; 2013.
23. Cella D, Gershon R, Lai J-S, Choi S. The future of outcomes measurement: item banking, tailored short-forms, and computerized adaptive assessment. Quality of Life Research. 2007;16(1):133–41.
24. HealthMeasures. PROMIS Anxiety Scoring Manual [cited 2020]. Available from: https://www.healthmeasures.net/images/PROMIS/manuals/PROMIS_Anxiety_Scoring_Manual.pdf.
25. HealthMeasures. PROMIS Depression Scoring Manual [cited 2020]. Available from: https://www.healthmeasures.net/images/PROMIS/manuals/PROMIS_Depression_Scoring_Manual.pdf.
26. HealthMeasures. How to select a HealthMeasure 2020 [cited 2020 December]. Available from: https://www.healthmeasures.net/applications-of-healthmeasures/guidance/selecting-a-healthmeasure.
27. Purvis TE, Neuman BJ, Riley LH, Skolasky RL. Comparison of PROMIS Anxiety and Depression, PHQ-8, and GAD-7 to screen for anxiety and depression among patients presenting for spine surgery. Journal of Neurosurgery: Spine. 2019;30(4):524–31.
28. Sunderland M, Batterham P, Calear A, Carragher N. Validity of the PROMIS depression and anxiety common metrics in an online sample of Australian adults. Quality of Life Research. 2018;27(9):2453–8. doi: 10.1007/s11136-018-1905-5
29. Clover K, Lambert SD, Oldmeadow C, Britton B, King MT, Mitchell AJ, et al. PROMIS depression measures perform similarly to legacy measures relative to a structured diagnostic interview for depression in cancer patients. Quality of Life Research. 2018;27(5):1357–67. doi: 10.1007/s11136-018-1803-x
30. Amtmann D, Kim J, Chung H, Bamer AM, Askew RL, Wu S, et al. Comparing CESD-10, PHQ-9, and PROMIS depression instruments in individuals with multiple sclerosis. Rehabilitation psychology. 2014;59(2):220. doi: 10.1037/a0035919
31. Freedland KE, Steinmeyer BC, Carney RM, Rubin EH, Rich MW. Use of the PROMIS® Depression scale and the Beck Depression Inventory in patients with heart failure. Health Psychology. 2019;38(5):369. doi: 10.1037/hea0000682
32. Flens G, Terwee CB, Smits N, Williams G, Spinhoven P, Roorda LD, et al. Construct validity, responsiveness, and utility of change indicators of the Dutch-Flemish PROMIS item banks for depression and anxiety administered as computerized adaptive test (CAT): A comparison with the Brief Symptom Inventory (BSI). Psychological Assessment. 2021.
33. Lizzio VA, Blanchett J, Borowsky P, Meldau JE, Verma NN, Muh S, et al. Feasibility of PROMIS CAT administration in the ambulatory sports medicine clinic with respect to cost and patient compliance: a single-surgeon experience. Orthopaedic journal of sports medicine. 2019;7(1):2325967118821875. doi: 10.1177/2325967118821875
34. Beleckas CM, Prather H, Guattery J, Wright M, Kelly M, Calfee RP. Anxiety in the orthopedic patient: using PROMIS to assess mental health. Quality of Life Research. 2018;27(9):2275–82. doi: 10.1007/s11136-018-1867-7
35. Papuga MO, Dasilva C, McIntyre A, Mitten D, Kates S, Baumhauer J. Large-scale clinical implementation of PROMIS computer adaptive testing with direct incorporation into the electronic medical record. Health Systems. 2018;7(1):1–12. doi: 10.1057/s41306-016-0016-1
36. Wagner LI, Schink J, Bass M, Patel S, Diaz MV, Rothrock N, et al. Bringing PROMIS to practice: brief and precise symptom screening in ambulatory cancer care. Cancer. 2015;121(6):927–34. doi: 10.1002/cncr.29104
37. Scholle SH, Morton S, Homco J, Rodriguez K, Anderson D, Hahn E, et al. Implementation of the PROMIS-29 in routine care for people with diabetes: challenges and opportunities. The Journal of ambulatory care management. 2018;41(4):274–87. doi: 10.1097/JAC.0000000000000248
38. HealthMeasures. Available translations 2020 [cited 2020 December]. Available from: https://www.healthmeasures.net/explore-measurement-systems/promis/intro-to-promis/available-translations.
39. Vilagut G, Forero C, Adroher N, Olariu E, Cella D, Alonso J, et al. Testing the PROMIS® Depression measures for monitoring depression in a clinical sample outside the US. Journal of psychiatric research. 2015;68:140–50. doi: 10.1016/j.jpsychires.2015.06.009
40. Jakob T, Nagl M, Gramm L, Heyduck K, Farin E, Glattacker M. Psychometric properties of a German translation of the PROMIS® depression item bank. Evaluation & the health professions. 2017;40(1):106–20.
41. de Castro NFC, Pinto RdMC, da Silva Mendonça TM, da Silva CHM. Psychometric validation of PROMIS® Anxiety and Depression Item Banks for the Brazilian population. Quality of Life Research. 2020;29(1):201–11. doi: 10.1007/s11136-019-02319-1
42. Terwee C, Roorda L, De Vet H, Dekker J, Westhovens R, Van Leeuwen J, et al. Dutch–Flemish translation of 17 item banks from the patient-reported outcomes measurement information system (PROMIS). Quality of Life Research. 2014;23(6):1733–41. doi: 10.1007/s11136-013-0611-6
43. Flens G, Smits N, Terwee CB, Dekker J, Huijbrechts I, de Beurs E. Development of a computer adaptive test for depression based on the Dutch-Flemish version of the PROMIS item bank. Evaluation & the health professions. 2017;40(1):79–105. doi: 10.1177/0163278716684168
44. Flens G, Smits N, Terwee CB, Dekker J, Huijbrechts I, Spinhoven P, et al. Development of a computerized adaptive test for anxiety based on the Dutch–Flemish version of the PROMIS item bank. Assessment. 2019;26(7):1362–74. doi: 10.1177/1073191117746742
45. Flens G, Smits N, Terwee CB, Pijck L, Spinhoven P, de Beurs E. Practical Significance of Longitudinal Measurement Invariance Violations in the Dutch–Flemish PROMIS Item Banks for Depression and Anxiety: An Illustration With Ordered-Categorical Data. Assessment. 2021;28(1):277–94. doi: 10.1177/1073191119880967
46. van Bebber J, Flens G, Wigman JT, de Beurs E, Sytema S, Wunderink L, et al. Application of the Patient-Reported Outcomes Measurement Information System (PROMIS) item parameters for Anxiety and Depression in the Netherlands. International journal of methods in psychiatric research. 2018;27(4):e1744. doi: 10.1002/mpr.1744
47. Fischer F, Gibbons C, Coste J, Valderas JM, Rose M, Leplège A. Measurement invariance and general population reference values of the PROMIS Profile 29 in the UK, France, and Germany. Quality of Life Research. 2018;27(4):999–1014. doi: 10.1007/s11136-018-1785-8
48. Elsman EB, Roorda LD, Crins MH, Boers M, Terwee CB. Dutch reference values for the Patient-Reported Outcomes Measurement Information System Scale v1.2-Global Health (PROMIS-GH). Journal of Patient-Reported Outcomes. 2021;5(1):1–9.
49. Cella D. PROMIS 1 Wave 1. Harvard Dataverse; 2015.
50. Reeve BB, Hays RD, Bjorner JB, Cook KF, Crane PK, Teresi JA, et al. Psychometric evaluation and calibration of health-related quality of life item banks: plans for the Patient-Reported Outcomes Measurement Information System (PROMIS). Medical care. 2007:S22–S31. doi: 10.1097/01.mlr.0000250483.85507.04
51. Choi SW, Gibbons LE, Crane PK. Lordif: An R package for detecting differential item functioning using iterative hybrid ordinal logistic regression/item response theory and Monte Carlo simulations. Journal of statistical software. 2011;39(8):1. doi: 10.18637/jss.v039.i08
52. HealthMeasures. Minimum requirements for the release of PROMIS instruments after translation and recommendations for further psychometric evaluation. 2014.
53. Chalmers RP. mirt: A multidimensional item response theory package for the R environment. Journal of statistical Software. 2012;48(1):1–29.
54. Stocking ML, Lord FM. Developing a common metric in item response theory. Applied psychological measurement. 1983;7(2):201–10.
55. Kolen MJ, Brennan RL. Test equating, scaling, and linking: Methods and practices: Springer Science & Business Media; 2014.
56. HealthMeasures. PROMIS reference populations 2021 [cited 2021 March]. Available from: https://www.healthmeasures.net/score-and-interpret/interpret-scores/promis/reference-populations.
57. HealthMeasures. PROMIS Score Cut-Points [cited 2020]. Available from: http://www.healthmeasures.net/score-and-interpret/interpret-scores/promis/promis-score-cut-points.
58. HealthMeasures. Gender and Age Range Sub-norms for Adult PROMIS Measures Centered on the US General Census 2000 [cited 2020]. Available from: http://www.healthmeasures.net/score-and-interpret/interpret-scores/promis/reference-populations.
59. van Muilekom MM, Luijten MA, van Oers HA, Terwee CB, van Litsenburg RR, Roorda LD, et al. From statistics to clinics: the visual feedback of PROMIS® CATs. Journal of Patient-Reported Outcomes. 2021;5(1):1–14.
60. Swanholm E, McDonald W, Makris U, Noe C, Gatchel R. Estimates of Minimally Important Differences (MIDs) for Two Patient-Reported Outcomes Measurement Information System (PROMIS) Computer-Adaptive Tests in Chronic Pain Patients. Journal of Applied Biobehavioral Research. 2014;19(4):217–32.
61. Yost KJ, Eton DT, Garcia SF, Cella D. Minimally important differences were estimated for six Patient-Reported Outcomes Measurement Information System-Cancer scales in advanced-stage cancer patients. Journal of clinical epidemiology. 2011;64(5):507–16. doi: 10.1016/j.jclinepi.2010.11.018
62. Lee AC, Driban JB, Price LL, Harvey WF, Rodday AM, Wang C. Responsiveness and minimally important differences for 4 patient-reported outcomes measurement information system short forms: physical function, pain interference, depression, and anxiety in knee osteoarthritis. The Journal of Pain. 2017;18(9):1096–110. doi: 10.1016/j.jpain.2017.05.001
63. Kroenke K, Stump TE, Chen CX, Kean J, Bair MJ, Damush TM, et al. Minimally important differences and severity thresholds are estimated for the PROMIS depression scales from three randomized clinical trials. Journal of affective disorders. 2020;266:100–8. doi: 10.1016/j.jad.2020.01.101
64. Scott NW, Fayers PM, Aaronson NK, Bottomley A, de Graeff A, Groenvold M, et al. Differential item functioning (DIF) analyses of health-related quality of life instruments using logistic regression. Health and quality of life outcomes. 2010;8(1):1–9. doi: 10.1186/1477-7525-8-81
65. Vilagut G, Forero CG, Castro-Rodriguez JI, Olariu E, Barbaglia G, Astals M, et al. Measurement equivalence of PROMIS depression in Spain and the United States. Psychological assessment. 2019;31(2):248. doi: 10.1037/pas0000665
66. Fischer HF, Wahl I, Nolte S, Liegl G, Brähler E, Löwe B, et al. Language-related differential item functioning between English and German PROMIS Depression items is negligible. International journal of methods in psychiatric research. 2017;26(4):e1530. doi: 10.1002/mpr.1530
67. Acquadro C, Patrick DL, Eremenco S, Martin ML, Kuliś D, Correia H, et al. Emerging good practices for translatability assessment (TA) of patient-reported outcome (PRO) measures. Journal of patient-reported outcomes. 2018;2(1):1–11.
68. HealthMeasures. Meaningful change for PROMIS [cited 2020]. Available from: http://www.healthmeasures.net/score-and-interpret/interpret-scores/promis/meaningful-change.
69. ICHOM. Overall Adult Health 2021 [cited 2021 February]. Available from: https://www.ichom.org/portfolio/overall-adult-health/.
70. HealthMeasures. PROMIS Global Health Scoring Manual [cited 2020]. Available from: http://www.healthmeasures.net/images/PROMIS/manuals/PROMIS_Global_Scoring_Manual.pdf.
71. Victorson D, Schalet BD, Kundu S, Helfand BT, Novakovic K, Penedo F, et al. Establishing a common metric for self-reported anxiety in patients with prostate cancer: Linking the Memorial Anxiety Scale for Prostate Cancer with PROMIS Anxiety. Cancer. 2019;125(18):3249–58. doi: 10.1002/cncr.32189
72. Kaat AJ, Newcomb ME, Ryan DT, Mustanski B. Expanding a common metric for depression reporting: linking two scales to PROMIS® depression. Quality of Life Research. 2017;26(5):1119–28. doi: 10.1007/s11136-016-1450-z
73. Kim J, Chung H, Askew RL, Park R, Jones SM, Cook KF, et al. Translating CESD-20 and PHQ-9 scores to PROMIS depression. Assessment. 2017;24(3):300–7. doi: 10.1177/1073191115607042
74. Choi SW, Schalet B, Cook KF, Cella D. Establishing a common metric for depressive symptoms: linking the BDI-II, CES-D, and PHQ-9 to PROMIS depression. Psychological assessment. 2014;26(2):513. doi: 10.1037/a0035768
75. Schalet BD, Cook KF, Choi SW, Cella D. Establishing a common metric for self-reported anxiety: linking the MASQ, PANAS, and GAD-7 to PROMIS Anxiety. Journal of anxiety disorders. 2014;28(1):88–96. doi: 10.1016/j.janxdis.2013.11.006

PLoS One. doi: 10.1371/journal.pone.0273287.r001

Decision Letter 0

Vanessa Carels

16 Dec 2021

PONE-D-21-18689

Towards standardization of measuring anxiety and depression: Differential item functioning for language and Dutch reference values of PROMIS item banks

PLOS ONE

Dear Dr. Terwee,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. The manuscript has been evaluated by two reviewers, and their comments are available below. The reviewers have raised a number of concerns that need attention, and they request additional information on methodological aspects of the study and analyses, as well as additional discussion and contextualization of the current work. Could you please revise the manuscript to carefully address the concerns raised? Please submit your revised manuscript by Jan 28 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Vanessa Carels

Staff Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf.

2. Thank you for stating the following in the Competing Interests/Financial Disclosure section:

“CB Terwee and LD Roorda are members of the PROMIS Health Organization and the Dutch-Flemish PROMIS National Center, which aim to improve health outcomes by developing, maintaining, improving, and encouraging the application of PROMIS in research and clinical practice. The other authors have no conflict of interest.”

We note that one or more of the authors are employed by a commercial company: PROMIS Health Organization and the Dutch-Flemish PROMIS National Center

a. Please provide an amended Funding Statement declaring this commercial affiliation, as well as a statement regarding the Role of Funders in your study. If the funding organization did not play a role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript and only provided financial support in the form of authors' salaries and/or research materials, please review your statements relating to the author contributions, and ensure you have specifically and accurately indicated the role(s) that these authors had in your study. You can update author roles in the Author Contributions section of the online submission form.

Please also include the following statement within your amended Funding Statement.

“The funder provided support in the form of salaries for authors [insert relevant initials], but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.”

If your commercial affiliation did play a role in your study, please state and explain this role within your updated Funding Statement.

b. Please also provide an updated Competing Interests Statement declaring this commercial affiliation along with any other relevant declarations relating to employment, consultancy, patents, products in development, or marketed products, etc.

Within your Competing Interests Statement, please confirm that this commercial affiliation does not alter your adherence to all PLOS ONE policies on sharing data and materials by including the following statement: “This does not alter our adherence to PLOS ONE policies on sharing data and materials.” (as detailed online in our guide for authors http://journals.plos.org/plosone/s/competing-interests). If this adherence statement is not accurate and there are restrictions on sharing of data and/or materials, please state these. Please note that we cannot proceed with consideration of your article until this information has been declared.

Please include both an updated Funding Statement and Competing Interests Statement in your cover letter. We will change the online submission form on your behalf.

3. PLOS requires an ORCID iD for the corresponding author in Editorial Manager on papers submitted after December 6th, 2016. Please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field. This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager. Please see the following video for instructions on linking an ORCID iD to your Editorial Manager account: https://www.youtube.com/watch?v=_xcclfuvtxQ

4. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Thank you for the opportunity to review your manuscript on testing DIF by language for the PROMIS Depression and Anxiety measures. Overall, the paper is well-written and the methods are rigorous. There were a few places that could use a bit more clarity and expanded discussion (noted below) but overall the manuscript provides a nice addition to the PROM literature on the use of PROMIS instruments across different countries.

1. As someone who has familiarity with developing and using PROMIS instruments, I had an easy time following along, but for readers who do not have such familiarity, you might want to explicitly point them to some of the early PROMIS papers that describe the initiative instead of just citing them (current references 17 and 18 could be called out in the text with something such as "see [17, 18] for an overview of PROMIS for additional insight into the aims and early findings of this initiative" or something like that).

2. Even with my familiarity, there were a few places where I got confused. I went into this assuming there were calibrated and normed item parameters for the Dutch version of the anxiety and depression instruments but now I'm not sure there actually are. Maybe this could be stated upfront? I got confused in the results section (pg. 12, line 254) as to whether you were talking about the Dutch sample scored on the US metric or the Dutch sample scored on the Dutch metric (if there is one?). I think the nuanced difference is that you refer to the Dutch general population versus reference population, but you do say “reference values for…the Dutch general population” so that muddies things. I think this part is referring to the Dutch sample scored with the US algorithm compared to the US reference scores but this could be clarified, especially for people who are not familiar with the PROMIS terminology. Even just saying it bluntly somewhere in the results, e.g., "T-scores between the Dutch sample scored with the US algorithm compared to the US reference T-scores…"

3. In general, the discussion doesn’t really focus on the applicability of the findings and mostly restates the introduction. It would be helpful to focus more on how performing this DIF analysis relates to being able to use the PROMIS measures - does it expand their utility to enable reliable and valid global comparison between two countries? Does it say anything about language differences across the Netherlands and the US that may be important for consideration when developing new measures or interpreting current ones? Generally we do a Spanish translatability review so it’s unsurprising that the Spain DIF didn’t find differences - do your results suggest the US development process should include a broader translatability review? Or are there too many natural linguistic differences across countries that such an initiative would be too hard to conduct and better to consider via DIF afterwards? For the Netherlands specifically, did any of the US items have to be translated to have close but not the exact same wording that might be playing a role here? I reviewed the Terwee et al. 2014 paper and it doesn't look like there were issues for Anxiety and Depression but may be worth referring back to in this paper as a reason to retain the items with minimal DIF as well. Overall, I think the discussion could be strengthened by including more considerations like these instead of just restating what's in the introduction.

Reviewer #2: This is a methodological article on the performance of the Dutch version of two PROMIS domains on the mental dimension of health, the PROMIS Depression and Anxiety domains. The article has two main objectives: first, it aims to assess differential item functioning of the Dutch PROMIS Depression and Anxiety item banks compared to the original English version; and second, to provide population reference norms. The article is relevant as it provides further evidence on the validity and comparability of the Dutch version of these measures for their use in an international context. Additionally, population reference norms obtained for the Dutch population provide additional aids to facilitate interpretation of the scores within the Dutch context.

The article is clearly written and follows state-of-the-art methods to pursue its objectives, and a large, adequately sized sample is used. Here I include some comments intended to provide the authors with suggestions to enhance understanding of the methods followed in some sections and improve the overall quality of the manuscript.

Specific comments:

- Participants and Methods, page 6: The sample was selected from an internet panel. Although evidence is presented that the sample has broadly the same distribution as the Dutch adult population with regard to the main variables (age, sex, education level, region and ethnicity), my main concern has to do with the representativeness of the sample, as it was obtained using non-probabilistic methods in which not all individuals in the Dutch general population had an equal probability of selection. This selection procedure may have an impact on the general population norms presented, and should at least be acknowledged in the limitations section of the article.

- Statistical analysis, page 7: A reference should be added to justify the selected cut-off point used to determine DIF on McFadden’s pseudo R2. Moreover, more details on the Monte Carlo simulation, its usefulness and the results obtained from these simulations should be provided. Are the Monte Carlo simulations applied the ones implemented in the lordif package in R? If so, these simulations generate empirical distributions of McFadden’s pseudo R2 statistic under no-DIF conditions while preserving observed group differences in ability level. This would serve to determine an empirical cut-off point for the statistic used to determine DIF. How many replications were conducted? What were the results of the Monte Carlo simulations? How did the Monte Carlo simulation results differ from the threshold used for McFadden’s pseudo R2?

- Statistical analysis, page 8: The part describing the equating of the Dutch item parameters with DIF to the US metric using the Stocking-Lord transformation is disproportionately long, and I would recommend reducing it.

- Statistical analysis, page 8: “T-scores were calculated with the original US item parameters, as well as with a hybrid set of item parameters, and subsequently compared”: It should be indicated how the T-scores were obtained, i.e. was expected a posteriori (EAP) estimation from the GRM used?

- Statistical analysis, page 8: “To investigate the impact of DIF on CATs, it was assessed how often the DIF items were included in CATs, based on 4047 CATs for anxiety and 4293 CATs for depression from an ongoing study [32].” Please clarify the source of the sample from this study. Is it a general population sample or a patient sample? The items selected for a CAT may depend on the ability of the individual assessed.

- Discussion, page 15, second paragraph: “The inclusion of anxiety and depression outcomes in many ICHOM Standard Sets shows that measuring anxiety and depression is relevant for many patient groups, and not only those with mental disorders (…). A more universal and standardized approach to measuring anxiety and depression will facilitate outcome measurement in clinical practice and comparisons of scores across patient groups [14, 15]. PROMIS anxiety and depression instruments offer opportunities here.” This paragraph is quite redundant with the first paragraph of the introduction, and it does not add information to the discussion related to the results obtained. Therefore, I suggest eliminating or reducing it.

Minor comments:

- Participants and procedures, page 6: “Participants needed to be representative for the Dutch general population with respect to age distribution, gender, educational level (low, middle, high), region of residence (north, east, south, west) and ethnicity (native Dutch, first- and second-generation western immigrant, first- and second-generation non-western immigrant).”

- Measures section, page 6, description of the PROMIS Item Banks V1.0 - Anxiety and Depression: The reference for the article describing the development of the PROMIS mental health domains (Pilkonis et al., 2011. doi:10.1177/1073191111411667) should be added here.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2022 Aug 23;17(8):e0273287. doi: 10.1371/journal.pone.0273287.r002

Author response to Decision Letter 0

10 Jan 2022

Please note that we have also uploaded a 'response to reviewers' Word document.

PLOS ONE

Staff editor: Vanessa Carels

10 January 2022

Manuscript ID: PONE-D-21-18689

Title: Towards standardization of measuring anxiety and depression: Differential item functioning for language and Dutch reference values of PROMIS item banks

Dear Dr. Carels,

First of all, we would like to thank you for the opportunity to submit a revised version of our manuscript. We thank the reviewers for their careful examination of the manuscript and useful suggestions to improve the quality of the paper. Below is our point-by-point response to each of the reviewers’ comments. Our manuscript contains tracked changes to highlight all changes that have been made to the manuscript. Additionally, we have included an unmarked version of the manuscript without tracked changes. Please note that the page numbers in the authors’ reply refer to the revised manuscript with tracked changes.

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf.

We have taken careful consideration that our revised manuscript meets the PLOS ONE style requirements.

2. Thank you for stating the following in the Competing Interests/Financial Disclosure section:

We note that one or more of the authors are employed by a commercial company: PROMIS Health Organization and the Dutch-Flemish PROMIS National Center

Please also include the following statement within your amended Funding Statement.

If your commercial affiliation did play a role in your study, please state and explain this role within your updated Funding Statement.

We have updated the Author Contributions section in the online submission form.

Please include both an updated Funding Statement and Competing Interests Statement in your cover letter. We will change the online submission form on your behalf.

Thank you for changing this information on our behalf. For the Funding statement, the following text can be inserted:

“The PROMIS Health Organization is a non-profit charitable foundation and the Dutch-Flemish PROMIS National Center is a network of local members of the PHO who are developing or applying PROMIS measures in the Netherlands or Belgium. Neither organization provided support in the form of salaries for authors CT and LR, and neither had any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.”

For the Competing Interest statement, the following text can be inserted:

We have linked the corresponding author’s ORCID ID to editorial manager.

We have reviewed our reference list and updated the references that were previously indicated as ‘submitted’ but at this time have been published. We could not identify any papers that have been retracted in our reference list.

Review Comments to the Author

Reviewer #1:

Thank you for the opportunity to review your manuscript on testing DIF by language for the PROMIS Depression and Anxiety measures. Overall, the paper is well-written and the methods are rigorous. There were a few places that could use a bit more clarity and expanded discussion (noted below) but overall the manuscript provides a nice addition to the PROM literature on the use of PROMIS instruments across different countries.

We thank the reviewer for the compliments and will reply to the more detailed comments below.

We agree with the reviewer that less familiar readers might benefit from being explicitly pointed towards some of the early PROMIS papers, and have changed the manuscript accordingly (page 3-4).

We thank the reviewer for this valuable comment. Although Dutch item parameters were used in the Netherlands for a while as an experiment, they are no longer used. The T-scores of the anxiety and depression instruments were therefore calculated in this study using the US calibrated and normed item parameters, which is standard practice for PROMIS measures (see also highlighted sentence page 7). This is also why we investigated DIF for language: the US scoring algorithm is appropriate, and scores can be compared between countries, only if items are free of DIF. We estimated Dutch item parameters from a graded response model fitted on the Dutch general population sample (see also highlighted sentence page 8), and used these to obtain a hybrid set of parameters for those items that had DIF for language. We could then compare the scores resulting from the hybrid set of item parameters with the scores resulting from the original US item parameters. For the comparison of scores between the Dutch population and the US population, we calculated T-scores with the original US item parameters (see also highlighted sentence on page 9). We agree with the reviewer that some information could be stated more upfront and clarified in relevant parts throughout the manuscript. We also agree that providing reference values for the Dutch population and comparing these to the US reference population is confusing. We have changed the manuscript to resolve these issues (page 2, 5, 7, 8, 9, 13, 15, 16, 18).
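The hybrid parameter set described in this response can be pictured as the US parameter set with only the DIF items swapped for their (equated) Dutch estimates. The sketch below is a minimal illustration under stated assumptions: the item names and parameter values are hypothetical, each item is stored as a graded response model pair (discrimination, ordered thresholds), and the Dutch values are assumed to be already transformed to the US metric. It is not the authors' code.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: item -> (discrimination a, ordered thresholds b_1..b_k)
Params = Dict[str, Tuple[float, List[float]]]


def build_hybrid(us: Params, dutch_equated: Params, dif_items: List[str]) -> Params:
    """Return a new parameter set that keeps the US parameters for every item
    except the DIF items, which take the Dutch parameters (already equated to
    the US metric). The input dictionaries are not modified."""
    hybrid = dict(us)  # shallow copy so the original US set stays intact
    for item in dif_items:
        if item not in dutch_equated:
            raise KeyError(f"no Dutch parameters available for DIF item {item!r}")
        hybrid[item] = dutch_equated[item]
    return hybrid


# Usage with hypothetical items and values
us = {"item1": (2.0, [-1.0, 0.0, 1.0, 2.0]), "item2": (1.5, [-0.5, 0.5, 1.5, 2.5])}
dutch = {"item1": (2.1, [-0.9, 0.1, 1.1, 2.0]), "item2": (1.7, [-0.2, 0.8, 1.7, 2.6])}
hybrid = build_hybrid(us, dutch, dif_items=["item2"])
```

Scoring the same responses once with `us` and once with `hybrid` then isolates the effect of the DIF items on the resulting T-scores.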

The reviewer makes some valuable suggestions that will strengthen our discussion section. With respect to the utility of PROMIS measures and comparing scores between two countries, we believe that the DIF analyses and the minimal impact on population level T-scores support the applicability of the US scoring algorithm and strengthen the validity of the Dutch-Flemish item banks. Because of the results of the DIF analyses, we were able to validly compare Dutch population scores with US population scores. We have adapted the first paragraph of the discussion to reflect on this (page 14). Conducting a broader translatability review for new PROMIS measures than just Spanish might be a valuable additional step in the developmental process, especially because PROMIS measures are increasingly being translated into many different languages. However, although it might reduce difficulties with translations of items, it does not replace the evaluation of DIF for language after a measure has been developed and translated, because DIF may also occur because of cultural differences. For the DIF items in our study, no particular difficulties were encountered during their translation, but it is remarkable that some of these items also showed DIF for language in other countries. We have elaborated on this in the discussion (page 15). We have also adapted the introduction and discussion to diminish the amount of overlap in these sections (page 3, 14-15, 16).

Reviewer #2:

This is a methodological article on the performance of the Dutch version of two PROMIS domains on the mental dimension of health, the PROMIS Depression and Anxiety domains. The article has two main objectives: first, it aims to assess differential item functioning of the Dutch PROMIS Depression and Anxiety item banks compared to the original English version; and second, to provide population reference norms. The article is relevant as it provides further evidence on the validity and comparability of the Dutch version of these measures for their use in an international context. Additionally, population reference norms obtained for the Dutch population provide additional aids to facilitate interpretation of the scores within the Dutch context. The article is clearly written and follows state-of-the-art methods to pursue its objectives, and a large, adequately sized sample is used. Here I include some comments intended to provide the authors with suggestions to enhance understanding of the methods followed in some sections and improve the overall quality of the manuscript.

We thank the reviewer for acknowledging the relevance of our article and the other compliments. We reply to the more detailed comments below.

Specific comments:

The reviewer makes a valid remark. Indeed, although the sample is broadly representative of the Dutch population on important characteristics, we cannot be sure this is true for other important characteristics, such as income level, employment and presence of disease. For example, those who have time to participate in an internet panel and complete these item banks voluntarily might more often be persons without full-time employment, for example because of physical or mental complaints. We have now explained this in the discussion (page 17).

We thank the reviewer for these questions. We have added references to justify the selected cut-off point for McFadden’s pseudo R2 (page 8). Indeed, the Monte Carlo simulations were those implemented in the lordif package in R. The Monte Carlo simulations implemented in the lordif package are driven by type I error control (i.e. control of false positive results). We conducted 1000 replications (this took approximately 2 hours per item bank on a new, high-speed computer), and found that the empirical threshold values for the probability associated with the χ2 statistic were all close to the nominal α (=0.01) level: for anxiety mean 1-2=0.009, mean 1-3=0.01, mean 2-3=0.009, and for depression mean 1-2=0.01, mean 1-3=0.01, mean 2-3=0.01. This suggests that the type I error rate is well controlled, and there is no need for establishing empirical thresholds through Monte Carlo simulations. The resulting McFadden’s pseudo R2 thresholds from the Monte Carlo simulations were all very small (≤0.0004 for anxiety and ≤0.0003 for depression), and as this is an effect size measure, applying a threshold that is substantially less than what would be considered a small but meaningful effect (e.g. 0.02) would not be meaningful according to any standard [3]. Based on these results, we concluded that the type I error rate was well controlled for both item banks, and we maintained the nominal α level of 0.01 and the McFadden’s pseudo R2 value of 0.02. We have elaborated on this in the manuscript (page 8 and 10-11).
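To make the two criteria in this response concrete, the sketch below computes McFadden's pseudo R2 for three nested ordinal logistic models of the kind lordif compares (1: trait only; 2: trait + group; 3: trait + group + trait × group), the R2 changes 1-2, 1-3 and 2-3, and the corresponding likelihood-ratio χ2 p-values. It is a minimal illustration, not the lordif implementation: the log-likelihoods are hypothetical inputs (fitting the models is not shown), and flagging an item when any of the three comparisons meets a criterion is an assumption of this sketch.

```python
import math
from typing import Dict


def mcfadden_r2(ll_model: float, ll_null: float) -> float:
    """McFadden's pseudo R2 = 1 - LL(model) / LL(intercept-only model)."""
    return 1.0 - ll_model / ll_null


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability; closed forms exist for df = 1 and 2."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


def dif_summary(ll0: float, ll1: float, ll2: float, ll3: float,
                r2_threshold: float = 0.02, alpha: float = 0.01) -> Dict[str, object]:
    """Compare nested models 1-2, 1-3 and 2-3 by pseudo R2 change and LR test."""
    lls = {"1": ll1, "2": ll2, "3": ll3}
    r2 = {m: mcfadden_r2(ll, ll0) for m, ll in lls.items()}
    comparisons = {"12": ("1", "2", 1), "13": ("1", "3", 2), "23": ("2", "3", 1)}
    r2_change, p_values = {}, {}
    for key, (small, large, df) in comparisons.items():
        r2_change[key] = r2[large] - r2[small]
        chisq = 2.0 * (lls[large] - lls[small])  # likelihood-ratio statistic
        p_values[key] = chi2_sf(chisq, df)
    return {
        "r2_change": r2_change,
        "p_values": p_values,
        "flag_r2": any(c >= r2_threshold for c in r2_change.values()),
        "flag_chisq": any(p < alpha for p in p_values.values()),
    }


# Hypothetical log-likelihoods: intercept-only, then models 1, 2 and 3
result = dif_summary(ll0=-1000.0, ll1=-600.0, ll2=-590.0, ll3=-585.0)
```

With these made-up values, every χ2 comparison except 2-3 is significant at α = 0.01, yet all R2 changes stay below 0.02, which illustrates why an effect-size threshold is used alongside significance testing in large samples.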

We agree with the reviewer that this section could be reduced, and have done so accordingly (page 8-9).

Indeed, expected a posteriori estimates from the GRM were used (see also highlighted sentence page 7). We have now added this to the above sentence as well (page 8).
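As an illustration of EAP scoring under the graded response model, the sketch below computes the posterior mean of θ on a fixed quadrature grid with a standard normal prior and converts it to a T-score as T = 50 + 10·θ. Assumptions of this sketch: responses are coded 0 to m-1, the grid runs from -4 to 4 in 81 points, and the item parameters in the usage lines are hypothetical. It is not the official PROMIS scoring engine.

```python
import math
from typing import List, Sequence, Tuple

Item = Tuple[float, Sequence[float]]  # (discrimination a, ordered thresholds b_1 < ... < b_{m-1})


def grm_category_probs(theta: float, a: float, b: Sequence[float]) -> List[float]:
    """Category probabilities P(X = k | theta), k = 0..m-1, under the GRM."""
    # Cumulative probabilities P(X >= k), with P(X >= 0) = 1 and P(X >= m) = 0
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - bk))) for bk in b] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(b) + 1)]


def eap_theta(items: Sequence[Item], responses: Sequence[int],
              lo: float = -4.0, hi: float = 4.0, n_points: int = 81) -> float:
    """Expected a posteriori theta: posterior mean over a quadrature grid."""
    step = (hi - lo) / (n_points - 1)
    num = den = 0.0
    for i in range(n_points):
        theta = lo + i * step
        weight = math.exp(-0.5 * theta * theta)  # N(0, 1) prior, unnormalized
        for (a, b), x in zip(items, responses):
            weight *= grm_category_probs(theta, a, b)[x]  # likelihood of response x
        num += theta * weight
        den += weight
    return num / den


def t_score(theta: float) -> float:
    """PROMIS-style T-score metric: mean 50, SD 10 on the theta scale."""
    return 50.0 + 10.0 * theta


# Usage with two hypothetical five-category items
items = [(2.0, [-1.5, -0.5, 0.5, 1.5]), (2.0, [-1.5, -0.5, 0.5, 1.5])]
t_mid = t_score(eap_theta(items, [1, 3]))
t_low = t_score(eap_theta(items, [0, 0]))
t_high = t_score(eap_theta(items, [4, 4]))
```

With symmetric thresholds and mirrored responses (1 and 3), the likelihood is symmetric around θ = 0, so the EAP estimate is 0 and the T-score is 50.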

It concerned a clinical population sample of adult patients who started outpatient treatment for common mental disorders. We have added this to the manuscript (page 6).

We agree with the reviewer that some of the information was redundant, and we have adapted the introduction and reduced this paragraph in the discussion to eliminate redundant information (page 3, page 16).

Minor comments:

We are not sure what the reviewer means with this comment.

We have added this reference (page 6-7).

We hope we adequately addressed all points and comments made by the reviewers. Thank you for reconsidering our manuscript. We look forward to receiving your response in due course.

Yours sincerely,

The authors

Attachment

Submitted filename: Response to Reviewers.docx

(31.6KB, docx)

PLoS One. doi: 10.1371/journal.pone.0273287.r003

Decision Letter 1

Jianhong Zhou

30 Mar 2022

PONE-D-21-18689R1

Towards standardization of measuring anxiety and depression: Differential item functioning for language and Dutch reference values of PROMIS item banks

PLOS ONE

Dear Dr. Terwee,

Specifically, please address the remaining concerns from Reviewer 2.

Please submit your revised manuscript by May 14 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

We look forward to receiving your revised manuscript.

Kind regards,

Jianhong Zhou

Associate Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Reviewer #1: Thank you for addressing all of my comments. The manuscript is a great contribution to the measurement science literature as well as to researchers and clinicians implementing PROMIS.

Reviewer #2: I really appreciate the authors’ careful consideration of the reviewers’ comments and clear responses to them. I only have a few additional minor suggestions for further clarification (the indication of page and line numbers refer to the version with track changes):

- Page 5, line 107; and Page 9, line 194: The indication that the reference values provided refer to the Dutch general population has been deleted. However, I think it is important to indicate which is the reference population for which the reference values are provided, as reference values (or sub-norms) can be provided for any relevant population (see the norms and sub-norms chapter in the PROMIS Score and Interpret section of the PROMIS web page: https://www.healthmeasures.net/score-and-interpret/interpret-scores/promis/reference-populations). Therefore, I would keep the indication that the reference values obtained are Dutch general population-based reference values for the overall population and by age and gender.

- Page 7, line 143: “expressed as T-scores, with a mean of 50 and a standard deviation of 10 for the US general population [16]”: Please check this reference, as I think it does not refer to scoring interpretation of PROMIS. Should this be reference 17 instead?

- Pages 10 (line 216) and 11 (line 220), reference 58: Given this reference is a personal e-mail sent to the author, I don’t think this can be included among the references of the article.

- Page 12, line 248: “The Stocking-Lord constants were A=0.7604 and B=-0.1070 for anxiety and A=0.7786 and B=0.0339 for depression”: Following my suggestions, the information regarding the Stocking-Lord transformation has been substantially reduced, which I think is good. Now, I don’t think it is relevant to include the SL constants A and B for anxiety and depression. Otherwise, if they are included, an additional indication should be provided of what A and B mean (perhaps keep the formulas).
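For readers wondering what the constants A and B do, the sketch below shows the usual linear transformation of graded response model parameters (a* = a / A, b* = A·b + B) and the Stocking-Lord criterion, i.e. the squared difference between the target and transformed test characteristic curves summed over a θ grid, which A and B are chosen to minimize. Assumptions of this sketch: the constants map the Dutch metric onto the US metric as θ_US = A·θ_NL + B, the anchor items are listed in the same order in both sets, and the item parameters are hypothetical; the anxiety constants quoted in the comment are used only as example values. The numerical minimization itself is not shown.

```python
import math
from typing import List, Sequence, Tuple

Item = Tuple[float, List[float]]  # (discrimination a, ordered thresholds)


def transform_item(item: Item, A: float, B: float) -> Item:
    """Place GRM parameters on the target metric: a* = a / A, b* = A * b + B."""
    a, b = item
    return a / A, [A * bk + B for bk in b]


def expected_score(theta: float, a: float, b: Sequence[float]) -> float:
    """GRM expected item score with categories 0..m-1: sum over k of P(X >= k)."""
    return sum(1.0 / (1.0 + math.exp(-a * (theta - bk))) for bk in b)


def stocking_lord_loss(A: float, B: float, source_items: Sequence[Item],
                       target_items: Sequence[Item], grid: Sequence[float]) -> float:
    """Sum over the grid of squared differences between the target test
    characteristic curve and the curve of the transformed source items."""
    loss = 0.0
    for theta in grid:
        tcc_target = sum(expected_score(theta, a, b) for a, b in target_items)
        tcc_source = sum(expected_score(theta, *transform_item(it, A, B))
                         for it in source_items)
        loss += (tcc_target - tcc_source) ** 2
    return loss


# Hypothetical anchor items; targets built with the quoted anxiety constants
A_anx, B_anx = 0.7604, -0.1070
source = [(1.8, [-1.0, 0.0, 1.0, 2.0]), (2.2, [-0.5, 0.5, 1.2, 2.1])]
target = [transform_item(it, A_anx, B_anx) for it in source]
grid = [-4.0 + 0.5 * i for i in range(17)]
loss_true = stocking_lord_loss(A_anx, B_anx, source, target, grid)
loss_identity = stocking_lord_loss(1.0, 0.0, source, target, grid)
```

Because the targets here were generated from the same constants, the loss is zero at (A, B) and positive for the identity transformation (1, 0); with real data an optimizer searches for the pair that minimizes it.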

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Gemma Vilagut

PLoS One. 2022 Aug 23;17(8):e0273287. doi: 10.1371/journal.pone.0273287.r004

Author response to Decision Letter 1

25 Apr 2022

Reviewer #1:

Thank you for addressing all of my comments. The manuscript is a great contribution to the measurement science literature as well as to researchers and clinicians implementing PROMIS.

We are pleased to read that we have addressed the reviewer’s comments adequately and for the reviewer’s acknowledgement of the value of our manuscript.

Reviewer #2:

I really appreciate the authors’ careful consideration of the reviewers’ comments and clear responses to them. I only have a few additional minor suggestions for further clarification (the indication of page and line numbers refer to the version with track changes):

We thank the reviewer for these compliments and will respond to the additional suggestions below.

We thank the reviewer for this remark, and agree that the reference values are established for the Dutch general population and that this notion is important. We changed this because these reference values, although established for the Dutch general population, are often used in clinical care as a benchmark against which scores from clinical populations are compared. However, we realize that changing these sentences might have led to confusion, and have therefore undone these changes (p. 5, line 104; p. 8, line 182).

The reviewer is correct, this should be reference 17. We have changed this in the manuscript (p. 7, line 140).

- Pages 10 (line 216) and 11 (line 220), reference 58: Given this reference is a personal e-mail sent to the author, I don’t think this can be included among the references of the article.

We have checked the journal requirements and the reviewer is correct. As such, we have deleted this reference from the manuscript (p. 10, lines 204 and 208).

The reviewer makes a valid remark. We have removed the SL constants from the manuscript (p. 11, lines 236-237).

Attachment

Submitted filename: Response to Reviewers.docx

(22.6KB, docx)

PLoS One. doi: 10.1371/journal.pone.0273287.r005

Decision Letter 2

Thiago Machado Ardenghi

8 Aug 2022

Towards standardization of measuring anxiety and depression: Differential item functioning for language and Dutch reference values of PROMIS item banks

PONE-D-21-18689R2

Dear Dr. Terwee,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Thiago Machado Ardenghi

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

Reviewer #3: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

6. Review Comments to the Author

Reviewer #1: (No Response)

Reviewer #2: The authors have adequately addressed all my comments and I think the article is adequate and ready for publication.

Reviewer #3: The present study assessed DIF for language between the Netherlands and the US for the PROMIS Anxiety and Depression item banks, and presented Dutch reference values for the general population and relevant subpopulations.

I believe the paper satisfies the criteria to be accepted for publication in PLOS ONE.

The study presents the results of original research. Analyses are performed to a high technical standard and are described in sufficient detail. Conclusions are presented in an appropriate fashion and are supported by the data. Also, the article is presented in an intelligible fashion and is written in standard English.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: Yes: Marilia Leão Goettems

**********

PLoS One. doi: 10.1371/journal.pone.0273287.r006

Acceptance letter

Thiago Machado Ardenghi

10 Aug 2022

PONE-D-21-18689R2

Towards standardization of measuring anxiety and depression: Differential item functioning for language and Dutch reference values of PROMIS item banks

Dear Dr. Terwee:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Thiago Machado Ardenghi

Academic Editor

PLOS ONE

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

S1 Fig. Category response curves for items displaying DIF.

(TIF, 89.3 KB)

S2 Fig. Relation between T-scores and differences in T-scores original vs. hybrid approach.

(TIF, 199.8 KB)

S1 Data. Anxiety data of respondents.

(POR, 102.2 KB)

S2 Data. Depression data of respondents.

(POR, 99.6 KB)

Attachment

Submitted filename: Response to Reviewers.docx (31.6 KB)

Attachment

Submitted filename: Response to Reviewers.docx (22.6 KB)

Data Availability Statement

All relevant data are within the article and its Supporting Information files.

1. Lloyd C, Dyer P, Barnett A. Prevalence of symptoms of depression and anxiety in a diabetes clinic population. Diabetic medicine. 2000;17(3):198–202. doi: 10.1046/j.1464-5491.2000.00260.x

2. Singer S, Das-Munshi J, Brähler E. Prevalence of mental health conditions in cancer patients in acute care – a meta-analysis. Annals of Oncology. 2010;21(5):925–30. doi: 10.1093/annonc/mdp515

3. Hare DL, Toukhsati SR, Johansson P, Jaarsma T. Depression and cardiovascular disease: a clinical review. European heart journal. 2014;35(21):1365–72. doi: 10.1093/eurheartj/eht462

4. Frank JD. Psychotherapy: The restoration of morale. American Journal of Psychiatry. 1974;131(3):271–4. doi: 10.1176/ajp.131.3.271

5. Clarke DM, Kissane DW. Demoralization: its phenomenology and importance. Australian & New Zealand Journal of Psychiatry. 2002;36(6):733–42. doi: 10.1046/j.1440-1614.2002.01086.x

6. ICHOM. International Consortium for Health Outcomes Measurement (ICHOM) 2021. Available from: www.ichom.org.

7. Terwee CB, Zuidgeest M, Vonkeman HE, Cella D, Haverman L, Roorda LD. Common patient-reported outcomes across ICHOM Standard Sets: the potential contribution of PROMIS®. BMC Medical Informatics and Decision Making. 2021;21(1):1–13.

8. Nezu AM, Ronan GF, Meadows EA, McClure KS. Practitioner’s guide to empirically-based measures of depression: Springer Science & Business Media; 2000.

9. Roemer L. Measures for anxiety and related constructs. Practitioner’s guide to empirically based measures of anxiety: Springer; 2002. p. 49–83.

10. Antony MM, Stein MB. Oxford handbook of anxiety and related disorders: Oxford University Press; 2008.

11. Vodermaier A, Linden W, Siu C. Screening for emotional distress in cancer patients: a systematic review of assessment instruments. Journal of the National Cancer Institute. 2009;101(21):1464–88. doi: 10.1093/jnci/djp336

12. McHugh RK, Rasmussen JL, Otto MW. Comprehension of self-report evidence-based measures of anxiety. Depression and Anxiety. 2011;28(7):607–14. doi: 10.1002/da.20827

13. Nelson CJ, Cho C, Berk AR, Holland J, Roth AJ. Are gold standard depression measures appropriate for use in geriatric cancer patients? A systematic evaluation of self-report depression instruments used with geriatric, cancer, and geriatric cancer samples. Journal of Clinical Oncology. 2010;28(2):348. doi: 10.1200/JCO.2009.23.0201

14. Calvert M, Kyte D, Price G, Valderas JM, Hjollund NH. Maximising the impact of patient reported outcome assessment for patients and society. BMJ. 2019;364. doi: 10.1136/bmj.k5267

15. Jim HS, Hoogland AI, Brownstein NC, Barata A, Dicker AP, Knoop H, et al. Innovations in research and clinical care using patient-generated health data. CA: a cancer journal for clinicians. 2020;70(3):182–99. doi: 10.3322/caac.21608

16. Eton DT, Beebe TJ, Hagen PT, Halyard MY, Montori VM, Naessens JM, et al. Harmonizing and consolidating the measurement of patient-reported information at health care institutions: a position statement of the Mayo Clinic. Patient Related Outcome Measures. 2014;5:7. doi: 10.2147/PROM.S55069

17. Cella D, Riley W, Stone A, Rothrock N, Reeve B, Yount S, et al. The Patient-Reported Outcomes Measurement Information System (PROMIS) developed and tested its first wave of adult self-reported health outcome item banks: 2005–2008. Journal of clinical epidemiology. 2010;63(11):1179–94. doi: 10.1016/j.jclinepi.2010.04.011

18. Cella D, Yount S, Rothrock N, Gershon R, Cook K, Reeve B, et al. The Patient-Reported Outcomes Measurement Information System (PROMIS): progress of an NIH Roadmap cooperative group during its first two years. Medical care. 2007;45(5 Suppl 1):S3. doi: 10.1097/01.mlr.0000258615.42478.55

19. Pilkonis PA, Yu L, Dodds NE, Johnston KL, Maihoefer CC, Lawrence SM. Validation of the depression item bank from the Patient-Reported Outcomes Measurement Information System (PROMIS®) in a three-month observational study. Journal of psychiatric research. 2014;56:112–9. doi: 10.1016/j.jpsychires.2014.05.010

20. Schalet BD, Pilkonis PA, Yu L, Dodds N, Johnston KL, Yount S, et al. Clinical validity of PROMIS depression, anxiety, and anger across diverse clinical samples. Journal of clinical epidemiology. 2016;73:119–27. doi: 10.1016/j.jclinepi.2015.08.036

21. Pilkonis PA, Choi SW, Reise SP, Stover AM, Riley WT, Cella D, et al. Item banks for measuring emotional distress from the Patient-Reported Outcomes Measurement Information System (PROMIS®): depression, anxiety, and anger. Assessment. 2011;18(3):263–83. doi: 10.1177/1073191111411667

22. Embretson SE, Reise SP. Item response theory: Psychology Press; 2013.

23. Cella D, Gershon R, Lai J-S, Choi S. The future of outcomes measurement: item banking, tailored short-forms, and computerized adaptive assessment. Quality of Life Research. 2007;16(1):133–41.

24. HealthMeasures. PROMIS Anxiety Scoring Manual [cited 2020]. Available from: https://www.healthmeasures.net/images/PROMIS/manuals/PROMIS_Anxiety_Scoring_Manual.pdf.

25. HealthMeasures. PROMIS Depression Scoring Manual [cited 2020]. Available from: https://www.healthmeasures.net/images/PROMIS/manuals/PROMIS_Depression_Scoring_Manual.pdf.

26. HealthMeasures. How to select a HealthMeasure 2020 [cited 2020 December]. Available from: https://www.healthmeasures.net/applications-of-healthmeasures/guidance/selecting-a-healthmeasure.

27. Purvis TE, Neuman BJ, Riley LH, Skolasky RL. Comparison of PROMIS Anxiety and Depression, PHQ-8, and GAD-7 to screen for anxiety and depression among patients presenting for spine surgery. Journal of Neurosurgery: Spine. 2019;30(4):524–31.

28. Sunderland M, Batterham P, Calear A, Carragher N. Validity of the PROMIS depression and anxiety common metrics in an online sample of Australian adults. Quality of Life Research. 2018;27(9):2453–8. doi: 10.1007/s11136-018-1905-5

29. Clover K, Lambert SD, Oldmeadow C, Britton B, King MT, Mitchell AJ, et al. PROMIS depression measures perform similarly to legacy measures relative to a structured diagnostic interview for depression in cancer patients. Quality of Life Research. 2018;27(5):1357–67. doi: 10.1007/s11136-018-1803-x

30. Amtmann D, Kim J, Chung H, Bamer AM, Askew RL, Wu S, et al. Comparing CESD-10, PHQ-9, and PROMIS depression instruments in individuals with multiple sclerosis. Rehabilitation psychology. 2014;59(2):220. doi: 10.1037/a0035919

31. Freedland KE, Steinmeyer BC, Carney RM, Rubin EH, Rich MW. Use of the PROMIS® Depression scale and the Beck Depression Inventory in patients with heart failure. Health Psychology. 2019;38(5):369. doi: 10.1037/hea0000682

32. Flens G, Terwee CB, Smits N, Williams G, Spinhoven P, Roorda LD, et al. Construct validity, responsiveness, and utility of change indicators of the Dutch-Flemish PROMIS item banks for depression and anxiety administered as computerized adaptive test (CAT): A comparison with the Brief Symptom Inventory (BSI). Psychological Assessment. 2021.

33. Lizzio VA, Blanchett J, Borowsky P, Meldau JE, Verma NN, Muh S, et al. Feasibility of PROMIS CAT administration in the ambulatory sports medicine clinic with respect to cost and patient compliance: a single-surgeon experience. Orthopaedic journal of sports medicine. 2019;7(1):2325967118821875. doi: 10.1177/2325967118821875

34. Beleckas CM, Prather H, Guattery J, Wright M, Kelly M, Calfee RP. Anxiety in the orthopedic patient: using PROMIS to assess mental health. Quality of Life Research. 2018;27(9):2275–82. doi: 10.1007/s11136-018-1867-7

35. Papuga MO, Dasilva C, McIntyre A, Mitten D, Kates S, Baumhauer J. Large-scale clinical implementation of PROMIS computer adaptive testing with direct incorporation into the electronic medical record. Health Systems. 2018;7(1):1–12. doi: 10.1057/s41306-016-0016-1

36. Wagner LI, Schink J, Bass M, Patel S, Diaz MV, Rothrock N, et al. Bringing PROMIS to practice: brief and precise symptom screening in ambulatory cancer care. Cancer. 2015;121(6):927–34. doi: 10.1002/cncr.29104

37. Scholle SH, Morton S, Homco J, Rodriguez K, Anderson D, Hahn E, et al. Implementation of the PROMIS-29 in routine care for people with diabetes: challenges and opportunities. The Journal of ambulatory care management. 2018;41(4):274–87. doi: 10.1097/JAC.0000000000000248

38. HealthMeasures. Available translations 2020 [cited 2020 December]. Available from: https://www.healthmeasures.net/explore-measurement-systems/promis/intro-to-promis/available-translations.

39. Vilagut G, Forero C, Adroher N, Olariu E, Cella D, Alonso J, et al. Testing the PROMIS® Depression measures for monitoring depression in a clinical sample outside the US. Journal of psychiatric research. 2015;68:140–50. doi: 10.1016/j.jpsychires.2015.06.009

40. Jakob T, Nagl M, Gramm L, Heyduck K, Farin E, Glattacker M. Psychometric properties of a German translation of the PROMIS® depression item bank. Evaluation & the health professions. 2017;40(1):106–20.

41. de Castro NFC, Pinto RdMC, da Silva Mendonça TM, da Silva CHM. Psychometric validation of PROMIS® Anxiety and Depression Item Banks for the Brazilian population. Quality of Life Research. 2020;29(1):201–11. doi: 10.1007/s11136-019-02319-1

42. Terwee C, Roorda L, De Vet H, Dekker J, Westhovens R, Van Leeuwen J, et al. Dutch–Flemish translation of 17 item banks from the patient-reported outcomes measurement information system (PROMIS). Quality of Life Research. 2014;23(6):1733–41. doi: 10.1007/s11136-013-0611-6

43. Flens G, Smits N, Terwee CB, Dekker J, Huijbrechts I, de Beurs E. Development of a computer adaptive test for depression based on the Dutch-Flemish version of the PROMIS item bank. Evaluation & the health professions. 2017;40(1):79–105. doi: 10.1177/0163278716684168

44. Flens G, Smits N, Terwee CB, Dekker J, Huijbrechts I, Spinhoven P, et al. Development of a computerized adaptive test for anxiety based on the Dutch–Flemish version of the PROMIS item bank. Assessment. 2019;26(7):1362–74. doi: 10.1177/1073191117746742

45. Flens G, Smits N, Terwee CB, Pijck L, Spinhoven P, de Beurs E. Practical Significance of Longitudinal Measurement Invariance Violations in the Dutch–Flemish PROMIS Item Banks for Depression and Anxiety: An Illustration With Ordered-Categorical Data. Assessment. 2021;28(1):277–94. doi: 10.1177/1073191119880967

46. van Bebber J, Flens G, Wigman JT, de Beurs E, Sytema S, Wunderink L, et al. Application of the Patient-Reported Outcomes Measurement Information System (PROMIS) item parameters for Anxiety and Depression in the Netherlands. International journal of methods in psychiatric research. 2018;27(4):e1744. doi: 10.1002/mpr.1744

47. Fischer F, Gibbons C, Coste J, Valderas JM, Rose M, Leplège A. Measurement invariance and general population reference values of the PROMIS Profile 29 in the UK, France, and Germany. Quality of Life Research. 2018;27(4):999–1014. doi: 10.1007/s11136-018-1785-8

48. Elsman EB, Roorda LD, Crins MH, Boers M, Terwee CB. Dutch reference values for the Patient-Reported Outcomes Measurement Information System Scale v1.2-Global Health (PROMIS-GH). Journal of Patient-Reported Outcomes. 2021;5(1):1–9.

49. Cella D. PROMIS 1 Wave 1. Harvard Dataverse; 2015.

50. Reeve BB, Hays RD, Bjorner JB, Cook KF, Crane PK, Teresi JA, et al. Psychometric evaluation and calibration of health-related quality of life item banks: plans for the Patient-Reported Outcomes Measurement Information System (PROMIS). Medical care. 2007:S22–S31. doi: 10.1097/01.mlr.0000250483.85507.04

51. Choi SW, Gibbons LE, Crane PK. lordif: An R package for detecting differential item functioning using iterative hybrid ordinal logistic regression/item response theory and Monte Carlo simulations. Journal of statistical software. 2011;39(8):1. doi: 10.18637/jss.v039.i08

52. HealthMeasures. Minimum requirements for the release of PROMIS instruments after translation and recommendations for further psychometric evaluation. 2014.

53. Chalmers RP. mirt: A multidimensional item response theory package for the R environment. Journal of statistical Software. 2012;48(1):1–29.

54. Stocking ML, Lord FM. Developing a common metric in item response theory. Applied psychological measurement. 1983;7(2):201–10.

55. Kolen MJ, Brennan RL. Test equating, scaling, and linking: Methods and practices: Springer Science & Business Media; 2014.

56. HealthMeasures. PROMIS reference populations 2021 [cited 2021 March]. Available from: https://www.healthmeasures.net/score-and-interpret/interpret-scores/promis/reference-populations.

57. HealthMeasures. PROMIS Score Cut-Points [cited 2020]. Available from: http://www.healthmeasures.net/score-and-interpret/interpret-scores/promis/promis-score-cut-points.

58. HealthMeasures. Gender and Age Range Sub-norms for Adult PROMIS Measures Centered on the US General Census 2000 [cited 2020]. Available from: http://www.healthmeasures.net/score-and-interpret/interpret-scores/promis/reference-populations.

59. van Muilekom MM, Luijten MA, van Oers HA, Terwee CB, van Litsenburg RR, Roorda LD, et al. From statistics to clinics: the visual feedback of PROMIS® CATs. Journal of Patient-Reported Outcomes. 2021;5(1):1–14.

60. Swanholm E, McDonald W, Makris U, Noe C, Gatchel R. Estimates of Minimally Important Differences (MIDs) for Two Patient-Reported Outcomes Measurement Information System (PROMIS) Computer-Adaptive Tests in Chronic Pain Patients. Journal of Applied Biobehavioral Research. 2014;19(4):217–32.

61. Yost KJ, Eton DT, Garcia SF, Cella D. Minimally important differences were estimated for six Patient-Reported Outcomes Measurement Information System-Cancer scales in advanced-stage cancer patients. Journal of clinical epidemiology. 2011;64(5):507–16. doi: 10.1016/j.jclinepi.2010.11.018

62. Lee AC, Driban JB, Price LL, Harvey WF, Rodday AM, Wang C. Responsiveness and minimally important differences for 4 patient-reported outcomes measurement information system short forms: physical function, pain interference, depression, and anxiety in knee osteoarthritis. The Journal of Pain. 2017;18(9):1096–110. doi: 10.1016/j.jpain.2017.05.001

63. Kroenke K, Stump TE, Chen CX, Kean J, Bair MJ, Damush TM, et al. Minimally important differences and severity thresholds are estimated for the PROMIS depression scales from three randomized clinical trials. Journal of affective disorders. 2020;266:100–8. doi: 10.1016/j.jad.2020.01.101

64. Scott NW, Fayers PM, Aaronson NK, Bottomley A, de Graeff A, Groenvold M, et al. Differential item functioning (DIF) analyses of health-related quality of life instruments using logistic regression. Health and quality of life outcomes. 2010;8(1):1–9. doi: 10.1186/1477-7525-8-81

65. Vilagut G, Forero CG, Castro-Rodriguez JI, Olariu E, Barbaglia G, Astals M, et al. Measurement equivalence of PROMIS depression in Spain and the United States. Psychological assessment. 2019;31(2):248. doi: 10.1037/pas0000665

66. Fischer HF, Wahl I, Nolte S, Liegl G, Brähler E, Löwe B, et al. Language-related differential item functioning between English and German PROMIS Depression items is negligible. International journal of methods in psychiatric research. 2017;26(4):e1530. doi: 10.1002/mpr.1530

67. Acquadro C, Patrick DL, Eremenco S, Martin ML, Kuliś D, Correia H, et al. Emerging good practices for translatability assessment (TA) of patient-reported outcome (PRO) measures. Journal of patient-reported outcomes. 2018;2(1):1–11.

68. HealthMeasures. Meaningful change for PROMIS [cited 2020]. Available from: http://www.healthmeasures.net/score-and-interpret/interpret-scores/promis/meaningful-change.

69. ICHOM. Overall Adult Health 2021 [cited 2021 February]. Available from: https://www.ichom.org/portfolio/overall-adult-health/.

70. HealthMeasures. PROMIS Global Health Scoring Manual [cited 2020]. Available from: http://www.healthmeasures.net/images/PROMIS/manuals/PROMIS_Global_Scoring_Manual.pdf.

71. Victorson D, Schalet BD, Kundu S, Helfand BT, Novakovic K, Penedo F, et al. Establishing a common metric for self-reported anxiety in patients with prostate cancer: Linking the Memorial Anxiety Scale for Prostate Cancer with PROMIS Anxiety. Cancer. 2019;125(18):3249–58. doi: 10.1002/cncr.32189

72. Kaat AJ, Newcomb ME, Ryan DT, Mustanski B. Expanding a common metric for depression reporting: linking two scales to PROMIS® depression. Quality of Life Research. 2017;26(5):1119–28. doi: 10.1007/s11136-016-1450-z

73. Kim J, Chung H, Askew RL, Park R, Jones SM, Cook KF, et al. Translating CESD-20 and PHQ-9 scores to PROMIS depression. Assessment. 2017;24(3):300–7. doi: 10.1177/1073191115607042

74. Choi SW, Schalet B, Cook KF, Cella D. Establishing a common metric for depressive symptoms: linking the BDI-II, CES-D, and PHQ-9 to PROMIS depression. Psychological assessment. 2014;26(2):513. doi: 10.1037/a0035768

75. Schalet BD, Cook KF, Choi SW, Cella D. Establishing a common metric for self-reported anxiety: linking the MASQ, PANAS, and GAD-7 to PROMIS Anxiety. Journal of anxiety disorders. 2014;28(1):88–96. doi: 10.1016/j.janxdis.2013.11.006



