Abstract
Limited psychometric information is available to guide best practices for measuring youth irritability. This report compares the performance of irritability measures using item response theory (IRT). Study 1 used a sample of 482 early adolescents and compared the parent- and youth-report affective reactivity index (ARI) and irritability factors derived from the parent-report Child Behavior Checklist (CBCL) and the clinician-administered Kiddie Schedule for Affective Disorders and Schizophrenia (K-SADS). Study 2 combined data from three childhood samples (N = 811) and compared the performance of the parent-report ARI and CBCL and the clinician-administered Preschool Age Psychiatric Assessment (PAPA). The parent-report ARI emerged as the best measure of irritability in both developmental periods, while the CBCL and K-SADS provided an adequate amount of information in early adolescents. No measure reliably assessed irritability at modest severity levels. Applying IRT across large pools of developmental samples and measures is needed to guide the field in measuring youth irritability.
Keywords: item response theory, IRT, irritability, measurement, assessment
Youth irritability is broadly defined as a mood of low frustration tolerance characterized by anger and temper outbursts (Brotman et al., 2017; Stringaris, Zavos, et al., 2012). Irritability, a transdiagnostic construct, is one of the most common reasons for pediatric psychiatric evaluation and predicts adult depression, anxiety, suicidality, and lower educational and financial attainment (Brotman et al., 2017; Copeland et al., 2014; Hawes et al., 2020; Stringaris et al., 2009). Despite growing research on youth irritability, few empirical data are available to guide researchers and clinicians in its assessment. This is a critical gap, as the identification of reliable and valid measures is the cornerstone of scientific inquiry; the gap also hinders both the integration of findings across studies and clinical decision making.
Parent-report measures, and to a lesser extent self-report measures in older youth, are the most widely used methods to assess childhood irritability (Aebi et al., 2013; Melvin et al., 2018; Stringaris, Goodman, et al., 2012; Stringaris, Zavos, et al., 2012; Wakschlag et al., 2012). Informant- and self-report measures are economical and efficient, and can assess symptoms across time and contexts. Researchers have also derived ad hoc irritability scales from clinical interviews assessing irritability symptoms across psychiatric disorders (depression, oppositional defiant disorder [ODD], disruptive mood dysregulation disorder [DMDD]; Dougherty et al., 2015; Melvin et al., 2018; Merwin et al., 2018; Pagliaccio et al., 2018; Wakschlag et al., 2015).
Few measures have been specifically designed to assess youth irritability. Notably, the affective reactivity index (ARI; Stringaris, Goodman, et al., 2012) was developed as a brief measure of the severity of youth’s irritable mood (e.g., “easily annoyed by others,” “often loses temper”) over the past 6 months. The parent- and youth-report ARI consists of six severity items rated on a 3-point scale from 0 (not true) to 2 (certainly true), plus an additional item assessing functional impairment. Prior work supports the ARI’s unidimensionality and demonstrates excellent internal consistency, good test–retest reliability, and convergent and predictive validity in clinical and community samples (Mulraney et al., 2014; Stringaris, Goodman, et al., 2012). The ARI has been used with school-aged children (aged 9 years and older) and adolescents (Melvin et al., 2018; Mulraney et al., 2014; Tseng et al., 2017). In addition to the ARI, Wakschlag and colleagues (Wakschlag et al., 2012; Wakschlag et al., 2015) used developmentally sensitive quantitative methods, including item response theory (IRT), to develop the 22-item parent-report Multidimensional Assessment Profile of Disruptive Behavior (MAP-DB) Temper Loss scale for preschool-age children; data support the scale as a reliable and valid measure that captures the full spectrum of irritability in early childhood (Wakschlag et al., 2012; Wakschlag et al., 2015; Wiggins et al., 2018).
Another commonly used parent-report measure of childhood irritability is an empirically derived irritability factor score (Evans et al., 2020; Roberson-Nay et al., 2015; Stringaris, Zavos, et al., 2012; Wiggins et al., 2014) from the parent-rated Child Behavior Checklist (CBCL; Achenbach & Rescorla, 2001). The CBCL irritability factor scale includes three items assessing irritable mood, sudden changes in mood, and temper outbursts, each rated on a 3-point scale from 0 (not true) to 2 (very/often true). The parent-report irritability factor has been used with children from preschool age through adolescence and demonstrates adequate internal consistency, concurrent validity, and structural invariance over time (Aebi et al., 2013; Roberson-Nay et al., 2015; Stringaris, Zavos, et al., 2012; Tseng et al., 2017; Wiggins et al., 2014).
Irritability scales have also been derived from clinical interviews, including the Kiddie Schedule for Affective Disorders and Schizophrenia (K-SADS) and the Preschool Age Psychiatric Assessment (PAPA). Because these are psychiatric diagnostic interviews developed to assess clinically significant symptoms, the irritability scales derived from them likely assess the higher end of the irritability continuum. For instance, the K-SADS rates items as absent, subthreshold, or threshold based on the interview’s guidelines for deriving psychiatric diagnoses. Furthermore, the ad hoc PAPA scale uses frequency cutoffs based on Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) DMDD criteria to capture chronic and severe irritability. The ad hoc K-SADS scale has been used in adolescents (Melvin et al., 2018), and the ad hoc PAPA scale has been used in preschool- and school-age children (Dougherty et al., 2015; Merwin et al., 2018; Pagliaccio et al., 2018; Wakschlag et al., 2015). Although these scales have been used, few data exist on their psychometric properties.
The current report used IRT analyses to compare the functioning of irritability measures in two developmental periods. IRT estimates an individual’s level of a latent trait (in this case, irritability) from item responses and indicates which levels of the latent trait an assessment is best suited to measure. Because IRT places items from a single measure or from multiple measures on the same latent factor, it can be used to compare multiple measures of a single construct, such as irritability, on a common metric (Olino et al., 2013).
We examined irritability measures separately in adolescent and childhood samples because measurement practices have differed across ages. Moreover, as irritability is more prevalent at younger ages (Wiggins et al., 2014), the construct may differ across development (Wakschlag et al., 2012; Wakschlag et al., 2015). We did not test measurement invariance across ages because no single sample had data on these measures over time. In Study 1, we used a large early adolescent sample (n = 482; Mage = 12.66, SD = 0.46) and included irritability scales derived from the parent- and youth-report ARI, CBCL, and K-SADS. Study 2 combined three childhood samples for a total of 811 children (Mage = 6.63, SD = 0.75) and derived irritability scales from the parent-report ARI, CBCL, and PAPA. Based on a review of each measure’s items, we hypothesized that all measures at both ages would provide reliable information at higher levels of irritability but little reliable information at lower levels.
Study 1: Early Adolescent Sample
Method
Data for this study were drawn from the Stony Brook Temperament Study (SBTS), a community-based longitudinal study examining the role of early child temperament in the development of childhood depression and anxiety (Klein & Finsaas, 2017). A total of 609 typically developing children have been enrolled in the study. Data collection began when the children were 3 years old and has continued at 3-year intervals (ages 6, 9, and 12 years). The institutional review board (IRB) approved all study procedures, and informed parental consent and youth assent were obtained. At age 12 (M = 12.66 years, SD = 0.46), when the ARI was first introduced, 482 parents and children completed at least one measure of irritability and were included in the IRT analysis (see Table 1).
Table 1.
Sample Characteristics.

| Characteristic | Study 1 (n = 482) | Study 2, Sample 1 (n = 521) | Study 2, Sample 2 (n = 115) | Study 2, Sample 3 (n = 175) |
|---|---|---|---|---|
| Child age, M (SD), range | 12.66 (0.46), 11.50-14.17 | 6.07 (0.44), 5.08-7.58 | 7.28 (0.96), 5.50-10.0 | 6.53 (0.84), 5.0-9.17 |
| Child sex, % male | 54 | 55 | 49 | 49 |
| Child race, % | | | | |
| White | 89 | 87.9 | 44.7 | 60.3 |
| Black | 7.9 | 9.2 | 31.6 | 10.3 |
| Asian | 2.3 | 2.7 | 0.9 | 8 |
| Other | 0.9 | 0.2 | 22.6 | 21.2 |
| Hispanic ethnicity, % | 12.9 | 12.9 | 15.7 | 15.5 |
| Parent with a 4-year college degree, % | 68.3 | 62.0 | 79.1 | 79.4 |
| Median family income | $70,000-$90,000 | $70,000-$90,000 | $70,000-$100,000 | $70,000-$100,000 |
| Irritability measures, M (SD), range | | | | |
| M-ARI | 1.5 (1.96), 0-10 | — | 1.93 (2.39), 0-10 | 2.31 (2.74), 0-12 |
| Y-ARI | 1.73 (2.03), 0-12 | — | — | — |
| M-CBCL irritability | 0.75 (1.18), 0-6 | 0.97 (1.21), 0-5 | 0.88 (1.36), 0-6 | — |
| K-SADS irritability | 0.64 (1.73), 0-12 | — | — | — |
| PAPA irritability | — | 0.77 (1.33), 0-6 | 1.27 (1.48), 0-7 | 0.92 (1.29), 0-6 |
Note. M-ARI = mother-reported affective reactivity index; Y-ARI = youth-reported affective reactivity index; M-CBCL = mother-reported Child Behavior Checklist; K-SADS = Kiddie Schedule for Affective Disorders and Schizophrenia; PAPA = Preschool Age Psychiatric Assessment.
The ARI (Stringaris, Goodman, et al., 2012) assesses youth’s irritable mood over the past 6 months using six items rated on a 3-point scale from 0 (not true) to 2 (certainly true; Table 2). The ARI was completed by mothers (α = .85, n = 467) and youth (α = .77, n = 444).
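The α values reported for each measure are Cronbach’s alpha coefficients. As a reminder of what α summarizes, the following minimal Python sketch implements the standard formula, α = k/(k − 1) × (1 − Σ item variances / total-score variance). It assumes complete item responses (no missing data), and the toy responses are hypothetical, not study data.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix with no missing values.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 0-2 ratings from four respondents on three items.
toy = [[0, 0, 1], [1, 1, 1], [2, 1, 2], [2, 2, 2]]
print(round(cronbach_alpha(toy), 3))
```

Using sample variances for both items and totals keeps the ratio consistent; population variances would give the same α because the n − 1 factors cancel.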
Table 2.
Item Parameters.

| Item | Content | Discrimination (a) | Difficulty (b1) | Difficulty (b2) |
|---|---|---|---|---|
| Study 1: Early adolescent sample | | | | |
| M-ARI-1 | Easily annoyed by others | 2.05 | 0.46 | **2.46** |
| M-ARI-2 | Often loses temper | 4.77 | 0.79 | **1.82** |
| M-ARI-3 | Stays angry for a long time | 2.96 | **1.89** | **3.33** |
| M-ARI-4ᵃ | Angry most of the time | 4.04 | **1.98** | — |
| M-ARI-5 | Gets angry frequently | 3.69 | 1.26 | **2.25** |
| M-ARI-6 | Loses temper easily | 5.67 | 0.83 | **1.75** |
| Y-ARI-1 | Easily annoyed by others | 0.62 | −0.51 | **4.45** |
| Y-ARI-2 | Often loses temper | 1.09 | 1.04 | **3.29** |
| Y-ARI-3 | Stays angry for a long time | 0.55 | **2.81** | **7.73** |
| Y-ARI-4 | Angry most of the time | 0.80 | **4.12** | **6.29** |
| Y-ARI-5 | Gets angry frequently | 0.77 | **2.17** | **5.55** |
| Y-ARI-6 | Loses temper easily | 0.86 | 1.40 | **3.73** |
| M-CBCL-1 | Stubborn, sullen, or irritable | 2.45 | 0.63 | **2.34** |
| M-CBCL-2 | Sudden changes in mood or feelings | 1.82 | 1.17 | **3.20** |
| M-CBCL-3 | Temper tantrums or hot temper | 4.00 | 0.93 | **2.20** |
| K-SADS-1 | Irritability and anger | 1.86 | **2.11** | **3.08** |
| K-SADS-2 | Argues with adults | 2.22 | 1.37 | **2.18** |
| K-SADS-3 | Loses temper | 2.53 | 1.37 | **2.31** |
| K-SADS-4 | Easily annoyed or touchy | 2.74 | **1.80** | **2.47** |
| K-SADS-5 | Temper outbursts | 1.95 | **2.45** | **3.32** |
| K-SADS-6 | Persistent negative mood | 2.57 | **1.86** | **2.11** |
| Study 2: Linked childhood sample | | | | |
| M-ARI-1 | Easily annoyed by others | 1.65 | 0.10 | **2.07** |
| M-ARI-2 | Often loses temper | 6.17 | 0.18 | 1.23 |
| M-ARI-3 | Stays angry for a long time | 3.03 | 1.17 | **2.00** |
| M-ARI-4 | Angry most of the time | 3.16 | **1.67** | **2.71** |
| M-ARI-5 | Gets angry frequently | 4.39 | 0.79 | **1.84** |
| M-ARI-6 | Loses temper easily | 4.77 | 0.45 | 1.39 |
| M-CBCL-1 | Stubborn, sullen, or irritable | 1.70 | 0.48 | **2.64** |
| M-CBCL-2 | Sudden changes in mood or feelings | 1.56 | 1.36 | **3.42** |
| M-CBCL-3 | Temper tantrums or hot temper | 1.67 | 0.61 | **2.54** |
| PAPA-1 | Touchy or easily annoyed | 1.86 | **1.93** | — |
| PAPA-2 | Displays of anger | 2.47 | 1.51 | — |
| PAPA-3 | Easily frustrated | 1.40 | **1.94** | — |
| PAPA-4 | Loses temper | 1.46 | **1.74** | — |
| PAPA-5 | Irritable mood | 2.19 | 0.92 | — |
| PAPA-6 | Temper tantrums | 2.08 | **2.14** | — |
| PAPA-7 | Duration criterion met | 0.59 | **3.10** | — |
Note. Item thresholds (difficulty parameters) greater than the 95th percentile (i.e., 1.65) are displayed in bold. a = discrimination parameter; b = difficulty parameter; M-ARI = mother-reported affective reactivity index; Y-ARI = youth-reported affective reactivity index; M-CBCL = mother-reported Child Behavior Checklist; K-SADS = Kiddie Schedule for Affective Disorders and Schizophrenia; PAPA = Preschool Age Psychiatric Assessment. PAPA items had two response options (absent/present), so each has a single difficulty parameter.
ᵃNo mother endorsed the response “2 = certainly true” for this item, so only one difficulty parameter was estimated.
The Child Behavior Checklist Ages 6-18 Version (Achenbach & Rescorla, 2001) is a parent-report measure of childhood problems. The irritability factor score comprises three items rated on a 3-point scale from 0 (not true) to 2 (very/often true; Table 2; Evans et al., 2020; Roberson-Nay et al., 2015; Stringaris, Zavos, et al., 2012). Maternal reports were used to derive the irritability score (α = .76, n = 472).
The Kiddie Schedule for Affective Disorders and Schizophrenia (K-SADS; Kaufman et al., 2016) is a semistructured, clinician-administered psychiatric interview based on DSM-5. Parents and youth were interviewed separately, and clinicians’ final ratings integrated both informants’ responses. The K-SADS irritability scale consists of six items that assess irritable mood and correspond to the ARI: (1) irritable and angry mood (depression screener); (2) argues with adults (ODD screener); (3) loses temper (ODD screener); (4) easily annoyed or touchy (ODD supplement); (5) temper outbursts (DMDD screener); and (6) persistent negative (irritable/cranky/angry) mood (DMDD screener; Table 2). Items were rated as “absent,” “subthreshold,” or “threshold” according to the scoring guidelines, and the six items were summed to create the K-SADS irritability score (α = .85, n = 476).
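The summing step can be sketched as follows. The numeric coding (absent = 0, subthreshold = 1, threshold = 2) is an assumption inferred from the 0–12 score range in Table 1, since the text names the rating levels but not their values, and the item keys are hypothetical labels rather than K-SADS variable names.

```python
# Assumed numeric coding, inferred from the 0-12 score range in Table 1
# (the text names the rating levels but not their numeric values).
KSADS_CODES = {"absent": 0, "subthreshold": 1, "threshold": 2}

# Hypothetical keys for the six items listed above (not K-SADS variable names).
KSADS_ITEMS = (
    "irritable_angry_mood",      # depression screener
    "argues_with_adults",        # ODD screener
    "loses_temper",              # ODD screener
    "easily_annoyed_touchy",     # ODD supplement
    "temper_outbursts",          # DMDD screener
    "persistent_negative_mood",  # DMDD screener
)


def ksads_irritability_score(ratings: dict[str, str]) -> int:
    """Sum the six clinician ratings into a 0-12 irritability score."""
    missing = [item for item in KSADS_ITEMS if item not in ratings]
    if missing:
        raise KeyError(f"missing K-SADS items: {missing}")
    return sum(KSADS_CODES[ratings[item]] for item in KSADS_ITEMS)


example = {item: "absent" for item in KSADS_ITEMS}
example["irritable_angry_mood"] = "subthreshold"
example["loses_temper"] = "threshold"
print(ksads_irritability_score(example))
```

Raising on a missing item, rather than treating it as absent, keeps incomplete interviews from silently lowering the score.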
Data Analytic Plan
IRT refers to a family of models that express the probability of a response to an item as a function of an individual’s level of an underlying latent trait, theta (θ), and the item’s parameters (Lord, 1980). The graded response model (Samejima, 1969) is commonly used for polytomous item responses (e.g., Likert-type scales). This model produces one discrimination parameter (a) and n − 1 difficulty parameters (b) per item, where n is the number of response options. The discrimination parameter indicates how well item responses differentiate individuals with different levels of θ; discrimination parameters greater than 1.00 indicate that the item provides an acceptable amount of information about the latent trait. Each difficulty parameter is the level of θ at which an individual has a 50% probability of responding in a given category or higher rather than in a lower category. These parameters define item characteristic curves and item information functions; summing item information across items yields a test information function, which indicates the range of θ over which a measure provides reliable information.
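To make the graded response model concrete, the following minimal Python sketch computes category probabilities and Samejima’s item information for a single item. It assumes the logistic form without the 1.7 scaling constant (whether the reported parameters use exactly this metric is an assumption of the sketch), and it uses the Study 1 mother-report ARI item “often loses temper” (a = 4.77, b1 = 0.79, b2 = 1.82; Table 2) purely as an illustration.

```python
import math


def boundary_probs(theta: float, a: float, bs: list[float]) -> list[float]:
    """P*(X >= k | theta) for k = 0..m+1, with P*(X >= 0) = 1 and P*(X >= m+1) = 0."""
    return [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in bs] + [0.0]


def category_probs(theta: float, a: float, bs: list[float]) -> list[float]:
    """P(X = k | theta) for each response category k = 0..m (m = len(bs))."""
    cum = boundary_probs(theta, a, bs)
    return [cum[k] - cum[k + 1] for k in range(len(bs) + 1)]


def item_information(theta: float, a: float, bs: list[float]) -> float:
    """Samejima item information: sum over categories of P_k'(theta)^2 / P_k(theta)."""
    cum = boundary_probs(theta, a, bs)
    slopes = [a * p * (1.0 - p) for p in cum]  # derivative of each boundary curve
    info = 0.0
    for k in range(len(bs) + 1):
        p_k = cum[k] - cum[k + 1]
        if p_k > 0:
            info += (slopes[k] - slopes[k + 1]) ** 2 / p_k
    return info


# Illustration: Study 1 mother-report ARI "often loses temper" (Table 2).
a, bs = 4.77, [0.79, 1.82]
for theta in (0.0, 0.79, 1.82, 3.0):
    probs = category_probs(theta, a, bs)
    print(theta, [round(p, 3) for p in probs], round(item_information(theta, a, bs), 2))
```

At θ = b1 the probability of responding in category 1 or higher is exactly .50, which is the definition of a difficulty parameter given above; the item’s information peaks between its two thresholds and falls off toward low θ.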
Before running the IRT, we tested the unidimensionality of each measure with exploratory factor analysis (EFA) and one-factor confirmatory factor analysis (CFA). In the EFAs, we used the ratio of the first to second eigenvalues to assess dimensionality; ratios greater than 4 support unidimensionality (Lord, 1980). In the one-factor CFAs, we assessed goodness of fit with the comparative fit index (CFI > 0.95), Tucker–Lewis index (TLI > 0.95), and root mean square error of approximation (RMSEA < 0.10; Lord, 1980; Steiger, 1990). All EFA and CFA analyses were conducted in Mplus Version 8 with the weighted least squares mean- and variance-adjusted (WLSMV) estimator. To recover IRT parameters, models were re-estimated with the robust maximum likelihood estimator.
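The decision rules above can be expressed as a small check. This sketch only applies the stated cutoffs to the values reported in Table 3; it does not re-estimate any model. Treating missing CFA indices (a just-identified three-item model) as “decide on the EFA ratio alone” mirrors the approach described in the Results and is otherwise an assumption of the sketch.

```python
from typing import Optional

# Cutoffs stated in the Data Analytic Plan.
EIGEN_RATIO_MIN = 4.0
CFI_MIN, TLI_MIN, RMSEA_MAX = 0.95, 0.95, 0.10


def supports_unidimensionality(
    ratio: float,
    cfi: Optional[float] = None,
    tli: Optional[float] = None,
    rmsea: Optional[float] = None,
) -> bool:
    """Eigenvalue-ratio rule plus, when the CFA is estimable, the three fit cutoffs.

    If any CFA index is None (e.g., a just-identified three-item model with no
    testable fit), the decision rests on the EFA eigenvalue ratio alone.
    """
    if ratio <= EIGEN_RATIO_MIN:
        return False
    if cfi is None or tli is None or rmsea is None:
        return True
    return cfi > CFI_MIN and tli > TLI_MIN and rmsea < RMSEA_MAX


# (sample, measure, eigenvalue ratio, CFI, TLI, RMSEA) as reported in Table 3.
TABLE_3_ROWS = [
    ("Early adolescent", "M-ARI", 12.10, 1.00, 1.00, 0.04),
    ("Early adolescent", "Y-ARI", 5.07, 0.97, 0.96, 0.09),
    ("Early adolescent", "M-CBCL", 7.18, None, None, None),
    ("Early adolescent", "K-SADS", 8.71, 0.99, 0.98, 0.08),
    ("Linked childhood", "M-ARI", 9.84, 1.00, 1.00, 0.06),
    ("Linked childhood", "M-CBCL", 5.63, None, None, None),
    ("Linked childhood", "PAPA", 4.20, 0.99, 0.98, 0.04),
]

for sample, measure, ratio, cfi, tli, rmsea in TABLE_3_ROWS:
    print(sample, measure, supports_unidimensionality(ratio, cfi, tli, rmsea))
```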
Results
Table 1 provides descriptive statistics for all measures. Correlations between the measures (Supplementary Material Table S1, available online) ranged from .23 (youth-report ARI with mother-report CBCL) to .75 (mother-report ARI with mother-report CBCL). Raw scores by percentile rank for each measure are provided in Supplementary Material Table S2 (available online); on the mother-report ARI, youth-report ARI, mother-report CBCL, and K-SADS, a score of 1 was at the 60th, 40th, 70th, and 80th percentile, respectively.
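The text does not state which percentile-rank convention underlies Table S2, so the following sketch exposes three common ones (percent scoring below, at or below, or at the midpoint of a given raw score). The toy scores are hypothetical and are not study data.

```python
def percentile_rank(scores: list[int], value: int, method: str = "at_or_below") -> float:
    """Percentage of the sample scoring below, at or below, or at the midpoint of `value`."""
    if not scores:
        raise ValueError("scores must be non-empty")
    below = sum(1 for s in scores if s < value)
    equal = sum(1 for s in scores if s == value)
    counts = {
        "below": below,
        "at_or_below": below + equal,
        "mid": below + 0.5 * equal,  # half of the tied scores count as below
    }
    if method not in counts:
        raise ValueError(f"unknown method: {method}")
    return 100.0 * counts[method] / len(scores)


# Hypothetical raw scale scores for ten respondents.
toy_scores = [0, 0, 0, 1, 1, 2, 3, 4, 5, 6]
for m in ("below", "at_or_below", "mid"):
    print(m, percentile_rank(toy_scores, 1, m))
```

With floor-heavy scales such as these, the conventions can differ by tens of percentile points for the same raw score, so the choice matters when reading Table S2.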
Dimensionality and IRT.
Table 3 summarizes model fit for the EFA and one-factor CFA of each measure. For the mother-report ARI, youth-report ARI, and K-SADS irritability scale, EFA and CFA results indicated good model fit and supported unidimensionality. Model fit could not be assessed for the CBCL because a one-factor CFA with three items is just-identified (it has zero degrees of freedom); we therefore relied on the EFA results, which supported unidimensionality (ratio of the first to second eigenvalue > 4).
Table 3.
Results of the Unidimensionality Analyses.

| Measure | EFA 1st eigenvalue | EFA 2nd eigenvalue | EFA ratio | CFA CFI | CFA TLI | CFA RMSEA |
|---|---|---|---|---|---|---|
| Early adolescent sample | | | | | | |
| M-ARI | 5.08 | 0.42 | 12.10 | 1.00 | 1.00 | 0.04 |
| Y-ARI | 3.92 | 0.77 | 5.07 | 0.97 | 0.96 | 0.09 |
| M-CBCL | 2.46 | 0.34 | 7.18 | — | — | — |
| K-SADS | 4.89 | 0.56 | 8.71 | 0.99 | 0.98 | 0.08 |
| Linked childhood sample | | | | | | |
| M-ARI | 4.76 | 0.48 | 9.84 | 1.00 | 1.00 | 0.06 |
| M-CBCL | 2.29 | 0.41 | 5.63 | — | — | — |
| PAPA | 4.07 | 0.97 | 4.20 | 0.99 | 0.98 | 0.04 |
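Note. EFA = exploratory factor analysis; CFA = confirmatory factor analysis; CFI = comparative fit index; TLI = Tucker–Lewis index; RMSEA = root mean square error of approximation; M-ARI = mother-reported affective reactivity index; Y-ARI = youth-reported affective reactivity index; M-CBCL = mother-reported Child Behavior Checklist; K-SADS = Kiddie Schedule for Affective Disorders and Schizophrenia; PAPA = Preschool Age Psychiatric Assessment. CFA fit indices are not available (—) for the three-item M-CBCL, whose one-factor model is just-identified.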
We fit an IRT graded response model in which items from all four measures loaded onto a single latent irritability factor. Table 2 provides discrimination and difficulty parameters for each item. Discrimination parameters for the mother-report ARI ranged from 2.05 to 5.67; first and second difficulty parameters ranged from 0.46 to 1.98 and 1.75 to 3.33, respectively. For the youth-report ARI, discrimination parameters ranged from 0.55 to 1.09; first and second difficulty parameters ranged from −0.51 to 4.12 and 3.29 to 7.73, respectively. Discrimination parameters for the CBCL ranged from 1.82 to 4.00; first and second difficulty parameters ranged from 0.63 to 1.17 and 2.20 to 3.20, respectively. For the K-SADS, discrimination parameters ranged from 1.86 to 2.74; first and second difficulty parameters ranged from 1.37 to 2.45 and 2.11 to 3.32, respectively.
Figure 1 shows the total information curves for each measure. Test information above 5 corresponds approximately to a reliability of .80, and test information above 10 to a reliability of .90. Only the mother-report ARI provided information above 10, measuring irritability severity well from approximately 0.50 to 2.50 standard deviations above the mean. The CBCL provided acceptable information (test information > 5) from 0.50 to 1.25 and from 2.00 to 2.50 standard deviations above the mean. The K-SADS provided acceptable information over a slightly more severe range, from 1.25 to 3.00 standard deviations above the mean. The youth-report ARI performed poorly relative to the other three measures in this sample and provided no reliable information at any level of irritability.
Figure 1.
Study 1: Total information curves for the early adolescent sample: Mother- and youth-report ARI and irritability factors derived from the CBCL and K-SADS.
Note. ARI = affective reactivity index; CBCL = Child Behavior Checklist; K-SADS = Kiddie Schedule for Affective Disorders and Schizophrenia.
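The rule of thumb linking test information to reliability, and the summation of item information into a total information curve, can be sketched in Python. This is a minimal illustration under stated assumptions: it uses the rounded Study 1 mother-report ARI parameters from Table 2, assumes the logistic graded response metric without the 1.7 scaling constant, and assumes θ is standardized so that reliability ≈ 1 − 1/I(θ). Because of rounding and these assumptions, the θ range it returns need not match Figure 1 exactly.

```python
import math

# Study 1 mother-report ARI parameters from Table 2, as (a, [b1, b2]).
# M-ARI-4 has one threshold because no mother endorsed "certainly true".
M_ARI_STUDY1 = {
    "M-ARI-1": (2.05, [0.46, 2.46]),
    "M-ARI-2": (4.77, [0.79, 1.82]),
    "M-ARI-3": (2.96, [1.89, 3.33]),
    "M-ARI-4": (4.04, [1.98]),
    "M-ARI-5": (3.69, [1.26, 2.25]),
    "M-ARI-6": (5.67, [0.83, 1.75]),
}


def item_information(theta: float, a: float, bs: list[float]) -> float:
    """Graded response model item information (logistic metric, no 1.7 constant)."""
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in bs] + [0.0]
    slopes = [a * p * (1.0 - p) for p in cum]  # derivatives of the boundary curves
    info = 0.0
    for k in range(len(bs) + 1):
        p_k = cum[k] - cum[k + 1]
        if p_k > 0:
            info += (slopes[k] - slopes[k + 1]) ** 2 / p_k
    return info


def total_information(theta: float, items: dict) -> float:
    """Test (total) information is the sum of the item information functions."""
    return sum(item_information(theta, a, bs) for a, bs in items.values())


def reliability_from_information(info: float) -> float:
    """Approximate reliability 1 - 1/I, assuming theta has variance 1 (SE^2 = 1/I)."""
    return 1.0 - 1.0 / info


def theta_range_above(items: dict, cutoff: float, lo: float = -3.0,
                      hi: float = 4.0, step: float = 0.05):
    """Smallest and largest grid values of theta where total information exceeds cutoff."""
    n_steps = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n_steps + 1)]
    above = [t for t in grid if total_information(t, items) > cutoff]
    return (min(above), max(above)) if above else None


print(reliability_from_information(5.0), reliability_from_information(10.0))
print(theta_range_above(M_ARI_STUDY1, 10.0))
```

The grid search is deliberately simple; a finer step or root finding would locate the crossing points more precisely, but the coarse grid is enough to see where a measure is informative.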
Study 2: Childhood Sample
Method
Data for Study 2 were drawn from three samples (see Table 1) to obtain a sufficiently large sample size for IRT and to acquire data across the ARI, CBCL, and PAPA.
Sample 1.
The SBTS sample is described in Study 1. At age 6 (M = 6.07 years, SD = 0.44), 521 parents completed irritability measures, which included the irritability factor score derived from the mother-reported CBCL (described above; α = .68, n = 468) and the PAPA (n = 516). The PAPA is a semistructured, parent-report diagnostic interview designed to assess DSM psychiatric disorders in children aged 2 to 6 years, although it has been used in children up to age 9 years (Luby et al., 2013; Merwin et al., 2018). As described in Dougherty et al. (2013), six PAPA items were used to assess chronic irritability: irritable mood, prone to feelings of anger (touchy/easily annoyed), prone to displays of anger, prone to feelings of frustration, discrete episodes of temper without violence (loses temper), and discrete episodes of excessive temper (shouting/crying/stamping) and/or with violence or attempts at damage (temper tantrums). Items were rated for intensity, frequency, and duration. Each item was coded present if the behaviors were intrusive, interfering, and generalized across activities at threshold level and occurred at least 45 times over the past 3 months. To assess whether the child experienced prolonged irritable mood states, a duration item was coded present if the child was rated as having irritable mood, frustration, annoyance, or anger lasting at least 30 minutes, or difficulty recovering from temper tantrums. The total irritability scale (range 0–7) was the sum of the seven items (six symptoms plus duration) coded present according to these intensity, frequency, and duration criteria (α = .77).
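The PAPA coding rules described above can be sketched as a small scoring function. The record structure and field names are hypothetical simplifications (the PAPA’s own variables and intensity ratings are richer than a single threshold flag); only the cutoffs (at least 45 occurrences in 3 months, at least 30 minutes) come from the text.

```python
from dataclasses import dataclass

FREQUENCY_MIN = 45         # occurrences over the past 3 months (inclusive)
DURATION_MIN_MINUTES = 30  # duration criterion for irritable mood states


@dataclass
class PapaSymptom:
    """Hypothetical summary of one PAPA irritability symptom (field names are illustrative)."""
    threshold_level: bool  # intrusive, interfering, and generalized across activities
    count_3_months: int    # number of occurrences over the past 3 months


def symptom_present(symptom: PapaSymptom) -> bool:
    return symptom.threshold_level and symptom.count_3_months >= FREQUENCY_MIN


def papa_irritability_score(
    symptoms: list[PapaSymptom],
    mood_duration_minutes: float,
    slow_tantrum_recovery: bool,
) -> int:
    """Six symptom items plus one duration item, each coded 0/1 (range 0-7)."""
    if len(symptoms) != 6:
        raise ValueError("expected the six PAPA irritability symptom items")
    duration_item = mood_duration_minutes >= DURATION_MIN_MINUTES or slow_tantrum_recovery
    return sum(symptom_present(s) for s in symptoms) + int(duration_item)


example = [
    PapaSymptom(True, 60),   # present
    PapaSymptom(True, 44),   # below the frequency cutoff
    PapaSymptom(False, 90),  # frequent but not threshold level
    PapaSymptom(True, 45),   # present (the cutoff is inclusive)
    PapaSymptom(False, 0),
    PapaSymptom(False, 0),
]
print(papa_irritability_score(example, mood_duration_minutes=35, slow_tantrum_recovery=False))
```

Requiring both the threshold-level flag and the frequency count makes explicit why the scale targets chronic, severe irritability rather than everyday variation.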
Sample 2.
The second sample was obtained from a longitudinal study investigating neuroendocrine function and risk for depression (Dougherty, Tolep, Smith, & Rose, 2013). The University of Maryland IRB approved all study procedures, and informed parental consent was obtained. Parents and their preschool-age children (n = 175) were recruited at baseline, and parents with a history of depression were overselected. At the 3-year follow-up (child age M = 7.28 years, SD = 0.96), 115 parents completed the ARI (α = .85, n = 112), CBCL (α = .80, n = 113), and/or PAPA (α = .66, n = 113).
Sample 3.
The third sample was obtained from a longitudinal study examining early childhood internalizing symptoms (Bufferd, Dougherty, & Olino, 2017). At baseline, 299 preschool-age children were recruited from the communities surrounding the University of Maryland and California State University San Marcos. The IRBs at both institutions approved the study, and informed consent was obtained from parents. Two years later (child age M = 6.53 years, SD = 0.84), 175 parents completed the ARI (α = .87, n = 172) and/or PAPA (α = .68, n = 166).
Data Analytic Plan
Study 2 followed the analytic approach of Study 1. However, because the measures were administered across three different samples, we used concurrent calibration to link the samples’ data in the IRT (Olino et al., 2013). This method assumes that all ARI, CBCL, and PAPA items measure the same underlying construct; items not administered in a particular sample were coded as missing. Discrimination and difficulty parameters for all items were estimated in a single model.
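The data layout behind concurrent calibration can be sketched as follows: respondents from all samples are stacked over the union of items, items a sample did not administer are left missing by design, and those missing items simply contribute nothing to a respondent’s likelihood. This is a minimal illustration, not the Mplus estimation itself; the toy responses are invented, and the item parameters are borrowed from Table 2 only to exercise the code under the logistic graded response metric (an assumption).

```python
import math
from typing import Optional

Response = Optional[int]  # None = item not administered in that sample (missing by design)


def stack_samples(samples: dict[str, list[dict[str, int]]]) -> tuple[list[str], list[list[Response]]]:
    """Stack respondents from several samples over the union of all items."""
    items = sorted({item for rows in samples.values() for row in rows for item in row})
    matrix = [[row.get(item) for item in items] for rows in samples.values() for row in rows]
    return items, matrix


def person_log_likelihood(theta: float, responses: list[Response],
                          params: list[tuple[float, list[float]]]) -> float:
    """Graded-response log-likelihood for one person; missing items contribute nothing."""
    total = 0.0
    for x, (a, bs) in zip(responses, params):
        if x is None:
            continue  # skipped, so each sample informs only the items it administered
        cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in bs] + [0.0]
        total += math.log(cum[x] - cum[x + 1])
    return total


# Hypothetical toy data: item keys echo Table 2 labels, responses are invented.
samples = {
    "sample_1": [{"M-CBCL-1": 1, "PAPA-1": 0}],
    "sample_3": [{"M-ARI-1": 2, "PAPA-1": 1}],
}
items, matrix = stack_samples(samples)
print(items)
print(matrix)

# Study 2 parameters for these items (Table 2), aligned with `items`.
params = [(1.65, [0.10, 2.07]), (1.70, [0.48, 2.64]), (1.86, [1.93])]
print(person_log_likelihood(0.5, matrix[1], params))
```

Because items shared across samples (here, PAPA-1) appear in several samples’ rows, they anchor all samples to one θ scale, which is what allows every item’s parameters to be estimated in a single model.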
Results
There were no differences between the samples in the proportion of boys versus girls (p = .22). Children in Sample 2 were older, F(2, 964) = 91.93, p < .001, and less likely to have married parents, χ²(2, n = 673) = 21.48, p < .001, than children in Samples 1 and 3. More children in Sample 3 were non-White, χ²(2, n = 674) = 172.20, p < .001, than in Samples 1 and 2. Children in Sample 1 were more likely than children in the other two samples to have families earning more than $40,000 per year, χ²(2, n = 613) = 19.46, p < .001.
Correlations between the measures (Table S1, available online) ranged from .42 (PAPA with CBCL) to .68 (ARI with CBCL). Raw scores by percentile rank for each measure are provided in Table S2 (available online); on the mother-report ARI, CBCL, and PAPA, a score of 1 was at the 40th, 60th, and 70th percentile, respectively.
Dimensionality and IRT.
Table 3 summarizes EFA and CFA results for each measure; the results support the assumption of unidimensionality for each measure. Table 2 lists parameter estimates for each item included in the IRT. Discrimination parameters for the mother-report ARI ranged from 1.65 to 6.17; first and second difficulty parameters ranged from 0.10 to 1.67 and 1.23 to 2.71, respectively. For the mother-report CBCL, discrimination parameters ranged from 1.56 to 1.70; first and second difficulty parameters ranged from 0.48 to 1.36 and 2.54 to 3.42, respectively. Discrimination parameters for the PAPA ranged from 0.59 to 2.47, and its single difficulty parameters ranged from 0.92 to 3.10. Figure 2 shows the total information curves for each measure. The ARI provided good test information (> 10) from 0 to 2 standard deviations above the mean of irritability severity, and acceptable test information from 0.25 standard deviations below to 2.75 standard deviations above the mean. Both the CBCL and PAPA provided poor information about irritability, as neither yielded test information above 5.¹
Figure 2.
Study 2: Total information curves for the childhood linked sample: Mother-report ARI and irritability factors derived from the CBCL and PAPA.
Note. ARI = affective reactivity index; CBCL = Child Behavior Checklist; PAPA = Preschool Age Psychiatric Assessment.
Discussion
This is the first study to use IRT to examine multiple irritability measures across ages. IRT approaches yield foundational information about how measures function and inform the selection of reliable measures across development. In the adolescent sample, the mother-report ARI and the irritability factors derived from the CBCL and K-SADS provided reliable information, yet for all measures, information was restricted to higher levels of irritability. The youth-report ARI did not provide reliable information about the latent construct. In the childhood sample, the mother-report ARI similarly provided acceptable to good information only at higher levels of irritability, whereas the CBCL and PAPA were unreliable at all levels of irritability.
In the adolescent sample, all measures were unidimensional. Discrimination parameters showed that all items on the mother-report ARI, CBCL, and K-SADS provided acceptable amounts of information about the latent trait (discrimination > 1; Baker, 2001). In contrast, only one youth-report ARI item (“often loses temper”) contributed acceptable information about the latent trait. Difficulty parameters further showed that the items on the mother-report ARI, youth-report ARI, CBCL, and K-SADS captured information at higher levels of severity and were poorer at capturing variation at lower levels. These findings suggest that these measures, particularly the mother-report ARI, may be appropriate for identifying clinically significant irritability that warrants psychiatric attention. However, none of these measures provided reliable information at average or low levels of irritability, and thus none captures the full normal–abnormal continuum.
The youth-report ARI may have performed poorly for several reasons. IRT assumes that all items assess the same underlying construct, and youth reports may capture a different underlying construct of irritability than parent and clinician reports. In addition, there is evidence that parent-reported youth irritability, in contrast to youth reports, better predicts youth-reported clinical outcomes in adulthood (Stringaris et al., 2009), which raises concerns about the utility of youth report. Nevertheless, youth report may provide incremental information if assessed with ecological momentary assessment, which allows youth to report on their moods in real time. Future research is needed to compare multiple youth-report measures alongside parent and clinician ratings, to determine whether the measures capture common or distinct clinical phenomena, and to evaluate the clinical utility of information gathered from various informants and methods.
In the childhood sample, all measures were similarly unidimensional. Discrimination parameters indicated that all items on the mother-report ARI and the CBCL and PAPA irritability scales provided acceptable information about the latent irritability factor (discrimination > 1), except for the PAPA duration item. Difficulty parameters indicated that the mother-report ARI items “easily annoyed by others,” “stays angry for a long time,” “angry most of the time,” and “gets angry frequently” captured information at severe levels of irritability, whereas “often loses temper” and “loses temper easily” did not capture severe levels (95th percentile). As temper loss is common at younger ages, these latter items may not discriminate impairing irritability in this age group. Difficulty parameters for the CBCL revealed that all three items captured only severe levels of irritability. Similarly, although most PAPA items coded present captured severe levels of irritability, the items “displays of anger” and “irritable mood” (occurring ≥ 45 times in the past 3 months) did not capture adequate information at severe levels (95th percentile), calling into question these thresholds in this age range. Importantly, although its information was restricted to severe levels of irritability, only the mother-report ARI provided reliable information about the latent irritability construct in the childhood sample, suggesting that the CBCL and PAPA irritability scores do not reliably capture this construct in this age range. Reliability of the latter measures might improve if the CBCL included more irritability items or if empirically derived cutoffs were used for the PAPA.
Across the adolescent and childhood samples, no measure provided reliable information across the full spectrum of irritability. In contrast, prior IRT analyses have shown that the MAP-DB Temper Loss scale, developed to assess irritability in preschool-age children, reliably captures the full spectrum of irritability in early childhood (Wakschlag et al., 2012; Wakschlag et al., 2015). Research on the measurement of youth irritability would benefit from a similar full-spectrum approach in older youth, which may aid in identifying developmentally sensitive clinical thresholds and in detecting risk before it develops into difficult-to-treat clinical disorders.
Although the current report provides important information about the functioning of multiple irritability measures in two different developmental periods, limitations are worth noting. The functioning of these measures was not examined in early childhood and older adolescence, which are important developmental periods when irritability is prevalent. In addition, other measures of irritability have been employed in the literature and were not examined here.
As measurement is the cornerstone of advancing a field, what lessons can be drawn, and what are potential next steps? Our findings demonstrate that the parent-report ARI, relative to the other measures examined, yielded the most reliable information over the widest range of irritability; we recommend it as the first-line instrument for assessing irritability in these developmental periods. However, it should be tested further against other measures to identify contexts that may yield different findings. Moving forward, it is critical to apply developmentally sensitive quantitative methods to compare existing irritability measures and determine their psychometric functioning. This work would be facilitated by a data-sharing repository for irritability measures across a diverse set of samples (an approach advocated at the 3rd Congress on Pediatric Irritability and Dysregulation, 2019). Applying quantitative methods across a wide range of measures (e.g., clinical and temperament assessments) and ages will identify best-measurement practices and gaps in current assessments. Moreover, IRT can be used to calibrate measures on a common metric so that comparisons can be made and irritability can be better assessed longitudinally (Kaat et al., 2019). Ultimately, increased attention to the measurement of irritability will pave the way for advances in the etiology, pathophysiology, identification, and treatment of impairing irritability.
Acknowledgments
We would like to thank all the parents and youth who contributed data for the current report.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Support for this research was provided by funding through NIMH R01 MH069942 (Klein), the University of Maryland (UMD) College of Behavioral and Social Sciences Dean’s Research Initiative Award (Dougherty), the UMD Research and Scholars Award (Dougherty), a California State University San Marcos (CSUSM) Grant Proposal Seed Money Award (Bufferd), a CSUSM University Professional Development Award (Bufferd), and the National Science Foundation Graduate Research Fellowship Program (Chad-Friedman).
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethics Approval
Approval was obtained from the ethics committees of the University of Maryland College Park, Stony Brook University, and California State University San Marcos. The procedures used in this study adhere to the tenets of the Declaration of Helsinki.
Supplemental Material
Supplemental material for this article is available online.
To evaluate violations of local independence, we examined modification indices from the initial WLSMV CFA model, which included all items simultaneously, in both the early adolescent and the linked childhood samples. These indices identify areas of strain in the model attributable to residual covariances. In the early adolescent sample, one residual covariance path suggested local dependence; we therefore re-estimated the model with this residual covariance freed. Freeing it had minimal impact on the discrimination and difficulty parameters, so we retained the results from the more restricted model. In the linked childhood sample, no modification indices suggested violations of local independence.
References
- Achenbach TM, & Rescorla LA (2001). Manual for the ASEBA school-age forms and profiles. University of Vermont Research Center for Children, Youth and Families.
- Aebi M, Plattner B, Metzke CW, Bessler C, & Steinhausen H-C (2013). Parent- and self-reported dimensions of oppositionality in youth: Construct validity, concurrent validity, and the prediction of criminal outcomes in adulthood. Journal of Child Psychology and Psychiatry, 54(9), 941–949. 10.1111/jcpp.12039
- Baker FB (2001). The basics of item response theory. ERIC Clearinghouse on Assessment and Evaluation. https://files.eric.ed.gov/fulltext/ED458219.pdf
- Brotman MA, Kircanski K, & Leibenluft E (2017). Irritability in children and adolescents. Annual Review of Clinical Psychology, 13(1), 317–341. 10.1146/annurev-clinpsy-032816-044941
- Bufferd SJ, Dougherty LR, & Olino TM (2017). Mapping the frequency and severity of depressive behaviors in preschool-aged children. Child Psychiatry & Human Development, 48(6), 934–943. 10.1007/s10578-017-0715-2
- Copeland WE, Shanahan L, Egger H, Angold A, & Costello EJ (2014). Adult diagnostic and functional outcomes of DSM-5 disruptive mood dysregulation disorder. American Journal of Psychiatry, 171(6), 668–674. 10.1176/appi.ajp.2014.13091213
- Dougherty LR, Smith VC, Bufferd SJ, Kessel E, Carlson GA, & Klein DN (2015). Preschool irritability predicts child psychopathology, functional impairment, and service use at age nine. Journal of Child Psychology and Psychiatry, 56(9), 999–1007. 10.1111/jcpp.12403
- Dougherty LR, Tolep MR, Smith VC, & Rose S (2013). Early exposure to parental depression and parenting: Associations with young offspring’s stress physiology and oppositional behavior. Journal of Abnormal Child Psychology, 41(8), 1299–1310. 10.1007/s10802-013-9763-7
- Evans SC, Bonadio FT, Bearman SK, Ugueto AM, Chorpita BF, & Weisz JR (2020). Assessing the irritable and defiant dimensions of youth oppositional behavior using CBCL and YSR items. Journal of Clinical Child & Adolescent Psychology, 49(6), 804–819. 10.1080/15374416.2019.1622119
- Hawes MT, Carlson GA, Finsaas MC, Olino TM, Seeley JR, & Klein DN (2020). Dimensions of irritability in adolescents: Longitudinal associations with psychopathology in adulthood. Psychological Medicine, 50(16), 2759–2767. 10.1017/S0033291719002903
- Kaat AJ, Blackwell CK, Estabrook CR, Burns JL, Petitclerc A, Briggs-Gowan MJ, Gershon RC, Cella D, Perlman SB, & Wakschlag LS (2019). Linking the Child Behavior Checklist (CBCL) with the Multidimensional Assessment Profile of Disruptive Behavior (MAP-DB): Advancing a dimensional spectrum approach to disruptive behavior. Journal of Child and Family Studies, 28(2), 343–353. 10.1007/s10826-018-1272-4
- Kaufman J, Birmaher B, Axelson D, Perepletchikova F, Brent D, & Ryan ND (2016). K-SADS-PL DSM-5. https://www.kennedykrieger.org/sites/default/files/library/documents/faculty/ksads-dsm-5-screener.pdf
- Klein DN, & Finsaas MC (2017). The Stony Brook Temperament Study: Early antecedents and pathways to emotional disorders. Child Development Perspectives, 11(4), 257–263. 10.1111/cdep.12242
- Lord FM (1980). Applications of item response theory to practical testing problems. Taylor & Francis. 10.4324/9780203056615
- Luby J, Belden A, Botteron K, Marrus N, Harms MP, Babb C, Nishino T, & Barch D (2013). The effects of poverty on childhood brain development: The mediating effect of caregiving and stressful life events. JAMA Pediatrics, 167(12), 1135–1142. 10.1001/jamapediatrics.2013.3139
- Melvin GA, Tonge BJ, Mulraney M, Gordon M, Taffe J, & Klimkeit E (2018). The Cranky Thermometers: Visual analogue scales measuring irritability in youth. Journal of Adolescence, 64, 146–154. 10.1016/j.adolescence.2018.02.008
- Merwin SM, Barrios C, Smith VC, Lemay EP, & Dougherty LR (2018). Outcomes of early parent-child adrenocortical attunement in the high-risk offspring of depressed parents. Developmental Psychobiology, 60(4), 468–482. 10.1002/dev.21623
- Mulraney MA, Melvin GA, & Tonge BJ (2014). Psychometric properties of the Affective Reactivity Index in Australian adults and adolescents. Psychological Assessment, 26(1), 148–155. 10.1037/a0034891
- Olino TM, Yu L, McMakin DL, Forbes EE, Seeley JR, Lewinsohn PM, & Pilkonis PA (2013). Comparisons across depression assessment instruments in adolescence and young adulthood: An item response theory study using two linking methods. Journal of Abnormal Child Psychology, 41(8), 1267–1277. 10.1007/s10802-013-9756-6
- Pagliaccio D, Pine DS, Barch DM, Luby JL, & Leibenluft E (2018). Irritability trajectories, cortical thickness, and clinical outcomes in a sample enriched for preschool depression. Journal of the American Academy of Child & Adolescent Psychiatry, 57(5), 336–342.e6. 10.1016/j.jaac.2018.02.010
- Roberson-Nay R, Leibenluft E, Brotman MA, Myers J, Larsson H, Lichtenstein P, & Kendler KS (2015). Longitudinal stability of genetic and environmental influences on irritability: From childhood to young adulthood. American Journal of Psychiatry, 172(7), 657–664. 10.1176/appi.ajp.2015.14040509
- Samejima F (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph Supplement, 34(4, Pt. 2), 1–97. 10.1007/BF03372160
- Steiger JH (1990). Structural model evaluation and modification: An interval estimation approach. Multivariate Behavioral Research, 25(2), 173–180. 10.1207/s15327906mbr2502_4
- Stringaris A, Cohen P, Pine DS, & Leibenluft E (2009). Adult outcomes of youth irritability: A 20-year prospective community-based study. American Journal of Psychiatry, 166(9), 1048–1054. 10.1176/appi.ajp.2009.08121849
- Stringaris A, Goodman R, Ferdinando S, Razdan V, Muhrer E, Leibenluft E, & Brotman MA (2012). The Affective Reactivity Index: A concise irritability scale for clinical and research settings. Journal of Child Psychology and Psychiatry, 53(11), 1109–1117. 10.1111/j.1469-7610.2012.02561.x
- Stringaris A, Zavos H, Leibenluft E, Maughan B, & Eley T (2012). Adolescent irritability: Phenotypic associations and genetic links with depressed mood. American Journal of Psychiatry, 169(1), 47–54. 10.1176/appi.ajp.2011.10101549
- Tseng W-L, Moroney E, Machlin L, Roberson-Nay R, Hettema JM, Carney D, Stoddard J, Towbin KA, Pine DS, Leibenluft E, & Brotman MA (2017). Test-retest reliability and validity of a frustration paradigm and irritability measures. Journal of Affective Disorders, 212, 38–45. 10.1016/j.jad.2017.01.024
- Wakschlag LS, Choi SW, Carter AS, Hullsiek H, Burns J, McCarthy K, Leibenluft E, & Briggs-Gowan MJ (2012). Defining the developmental parameters of temper loss in early childhood: Implications for developmental psychopathology. Journal of Child Psychology and Psychiatry, 53(11), 1099–1108. 10.1111/j.1469-7610.2012.02595.x
- Wakschlag LS, Estabrook R, Petitclerc A, Henry D, Burns JL, Perlman SB, Voss JL, Pine DS, Leibenluft E, & Briggs-Gowan M (2015). Clinical implications of a dimensional approach: The normal:abnormal spectrum of early irritability. Journal of the American Academy of Child & Adolescent Psychiatry, 54(8), 626–634. 10.1016/j.jaac.2015.05.016
- Wiggins JL, Briggs-Gowan MJ, Estabrook R, Brotman MA, Pine DS, Leibenluft E, & Wakschlag LS (2018). Identifying clinically significant irritability in early childhood. Journal of the American Academy of Child & Adolescent Psychiatry, 57(3), 191–199.e2. 10.1016/j.jaac.2017.12.008
- Wiggins JL, Mitchell C, Stringaris A, & Leibenluft E (2014). Developmental trajectories of irritability and bidirectional associations with maternal depression. Journal of the American Academy of Child & Adolescent Psychiatry, 53(11), 1191–1205.e4. 10.1016/j.jaac.2014.08.005