Abstract
The aim of the present study was to develop an abbreviated social cognition (SC) battery for individuals with schizophrenia spectrum disorders (SSD) to reduce the heterogeneity of and increase the frequency of assessment of SC impairment. To this end, the present study utilized Item Response Theory to develop brief versions of SC tasks administered to individuals with SSD (n = 386) and individuals without a psychiatric diagnosis (n = 292) during the Social Cognition Psychometric Evaluation (SCOPE) Study. Seven brief measures of SC were evaluated (i.e., Ambiguous Intentions and Hostility Questionnaire [AIHQ], Bell Lysaker Emotion Recognition Task [BLERT], Penn Emotion Recognition Task, Reading the Mind in the Eyes Task, Hinting Task, Intentionality Bias Task, Relationships Across Domains Task), and the existing brief version of The Awareness of Social Inference Test was reviewed. Psychometric properties for each brief SC measure were evaluated and compared to the original measures. Based on psychometric properties and relationships with other measures of SC, neurocognition, and functioning, two brief tasks (AIHQ, BLERT) and the full-length Hinting task were recommended for inclusion in a brief battery of SC tasks from the SCOPE Study (BB-SCOPE). The resulting BB-SCOPE is efficient, with an estimated administration time of 15 minutes, and comprehensively assesses three domains of SC (i.e., attributional bias, emotion processing, theory of mind) to identify severe SC impairment. Scoring of BB-SCOPE is also straightforward and includes a recommended cut-point of 60 for identifying SC impairment.
Keywords: schizophrenia, schizoaffective disorder, social cognition, measurement
Social cognition (SC) comprises how individuals think about themselves, others, social situations, and social interactions, and impairments in SC are well-documented in schizophrenia spectrum disorders (SSD; Hajdúk et al., 2018; Penn et al., 1997). SC impairments correlate with current symptoms, predict poorer outcome trajectories, and demonstrate reliable and unique relationships to functional outcomes, highlighting the role of SC as a compelling treatment target (Green et al., 2010; Halverson et al., 2019; Harvey and Penn, 2010; Velthorst et al., 2017).
While many individuals with SSD exhibit SC impairment, research suggests 25% of individuals with SSD do not show impairment in SC (Hajdúk et al., 2018). Therefore, accurate assessment of SC in SSD is important to identify individuals who may optimally benefit from treatments targeting these impairments. To this end, the multi-phase Social Cognition Psychometric Evaluation (SCOPE) Study was designed to identify existing SC tasks that optimally assess SC in SSD. SC tasks were classified as “acceptable as is”, “acceptable with modifications”, or “not recommended” based on psychometric properties and expert consensus (Pinkham et al., 2018, 2016, 2014). The SCOPE Study evaluated 11 tasks of SC and identified six SC tasks rated as “acceptable” assessing the following domains of SC: emotion processing – Penn Emotion Recognition Task (ER40), Bell Lysaker Emotion Recognition Task (BLERT); mental state attribution – Eyes Task, The Awareness of Social Inference Test – Part III (TASIT-III), Hinting Task; attributional style – Intentionality Bias Task (IBT). Five tasks were not recommended (i.e., Ambiguous Intentions and Hostility Questionnaire [AIHQ], Trustworthiness Task, Relationships Across Domains Task [RAD], Social Attribution Test – Multiple Choice [SAT-MC], Mini Profile of Nonverbal Sensitivity [MiniPONS]), including all tasks of social perception (i.e., RAD, SAT-MC, MiniPONS).
Altogether, the tasks with an acceptable rating from the SCOPE Study (i.e., ER40, BLERT, Eyes, TASIT, Hinting, IBT) have an estimated mean total administration time of about one hour (Pinkham et al., 2018). An administration length of one hour is less than ideal for individuals in an acute illness phase (e.g., inpatient setting) or for individuals attending a standard 50-minute community care outpatient appointment. Given the relationship of SC with functional impairment and the importance of early intervention, a brief battery of SC is imperative for efficient treatment planning to obtain optimal functional improvements.
Similar challenges (i.e., long administration times, heterogeneity in measurement) existed in the assessment of neurocognition (NC) in SSD before the development of the Brief Assessment of Cognition in Schizophrenia (BACS; Keefe et al., 2004). The BACS is a comprehensive assessment of key domains of NC in SSD with strong relationships to functional outcomes specifically developed as a brief battery for use in clinical trials. With a total administration time of 35 minutes, the BACS is feasible to administer, demonstrates good psychometric properties, and is strongly correlated with standard batteries of NC with longer administration times.
An analogous approach to brief assessment of SC in SSD is needed to prompt similar dissemination of a standard battery to identify SC impairment. A more clinic-friendly battery may reduce heterogeneity of SC measurement in research and increase SC assessment in clinical practice. Whereas the BACS was developed de novo, the SCOPE battery offers a useful starting point to create a brief battery of SC for several reasons. First, SCOPE measures were identified according to expert survey and finalized based on thorough psychometric evaluation, suggesting this is one of the most comprehensively evaluated batteries available. Second, with over 650 individuals across all phases, the SCOPE Study is one of the largest samples to examine task performance in both SSD and individuals without a psychiatric diagnosis. Third, the SCOPE Study carefully assessed symptoms, NC, and functioning, allowing for comprehensive validity investigation.
Item Response Theory (IRT) is a compelling approach for developing brief versions of measures (Bock, 1997; Thissen and Orlando, 2001) due to an emphasis on individual item performance (DeVellis, 2017). For each item, IRT estimates a parameter for item difficulty and item discrimination. Estimation of these parameters and graphical presentation allow for identification of items that optimally discriminate different levels of a latent trait (e.g., emotion perception) as well as ensure the entire trait continuum is covered (e.g., low to high levels of emotion perception). There are several examples of using an IRT approach for development and validation of brief measures (Bortolotti et al., 2013; Petrillo et al., 2015), including in SSD (Ventura et al., 2010).
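For reference, the two-parameter logistic (2PL) form for a dichotomous item i illustrates how these two parameters enter the model. This is the standard textbook parameterization, shown as an illustration rather than as the specific model selected for any SCOPE task; a_i denotes discrimination, b_i denotes difficulty, and θ denotes the latent trait:

$$
P_i(\theta) = \frac{1}{1 + e^{-a_i(\theta - b_i)}}, \qquad I_i(\theta) = a_i^{2}\, P_i(\theta)\,\bigl[1 - P_i(\theta)\bigr]
$$

Under this form, item information I_i(θ) peaks at θ = b_i and scales with a_i², so highly discriminating items whose difficulty sits near the trait level of interest contribute the most measurement precision at that level.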
The aims of the present study are to 1) develop brief versions of all tasks from the SCOPE Study with “acceptable” ratings; 2) examine brief versions of tasks with previous “not recommended” ratings where concerns may be addressed by shortening the task (e.g., long administration times such as in the RAD Task); 3) examine psychometric properties of brief versions of all tasks (i.e., group differences, internal reliability, utility as a repeated measure, and relationships with functional outcomes); and 4) make recommendations for a comprehensive brief battery of SCOPE (BB-SCOPE) to efficiently assess SC impairment in SSD.
Method
Participants
Participants were individuals with SSD (n = 386) and healthy controls (HC; n = 292) recruited from Southern Methodist University (n = 165), The University of Miami Miller School of Medicine (n = 227), The University of Texas at Dallas (n = 135), and the University of North Carolina at Chapel Hill (n = 151). Institutional Review Boards at all institutions approved study procedures.
Participants during phase three (SSD n = 179, HC n = 104) and phase five (SSD n = 158, HC n = 153) completed a baseline visit and a retest assessment two to four weeks later. Participants provided informed consent and completed SC and NC tasks as well as functional outcome measures (counterbalanced order). For individuals with SSD, symptom severity was assessed using the Positive and Negative Syndrome Scale (PANSS; Kay et al., 1987; rater ICCs > .80). Participants during phase four (SSD n = 49, HC n = 35) completed a single study visit of SC tasks rated as “acceptable with modifications” from phase three. Visit procedures were similar to phases three and five with the exception of no retest visit. Detailed study methods and procedures are published elsewhere (Cornacchio et al., 2017; Pinkham et al., 2018, 2016).
Measures
Social Cognition Measures
Attributional Style/Bias.
The Ambiguous Intentions and Hostility Questionnaire (AIHQ; Combs et al., 2007) presents five vignettes and asks participants to rate the extent to which the person from the vignette performed an action on purpose, how angry the action would make them feel, and how much they would blame the other person. Ratings for each vignette are summed and then averaged (i.e., sum of vignette total scores divided by five) to create a Blame Index (range 3-16). The original AIHQ includes open-ended responses scored by trained raters; however, research suggests scoring only the Likert items may improve AIHQ performance, and thus only the Blame Index will be examined in the present study (Buck et al., 2017).
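The Blame Index arithmetic described above can be sketched as follows. This is a minimal illustration, not the study's scoring script: the function name and the example ratings are hypothetical, and the only validation applied is the 3-16 range for each vignette total stated in the text.

```python
from statistics import mean


def aihq_blame_index(vignettes):
    """Average the per-vignette totals (intent + anger + blame ratings)."""
    if len(vignettes) != 5:
        raise ValueError("the AIHQ version described here uses five vignettes")
    totals = []
    for intent, anger, blame in vignettes:
        total = intent + anger + blame
        # Each vignette total should fall in the 3-16 range given in the text.
        if not 3 <= total <= 16:
            raise ValueError(f"vignette total {total} is outside 3-16")
        totals.append(total)
    return mean(totals)


# Hypothetical ratings for one participant: (intent, anger, blame) per vignette.
ratings = [(4, 3, 3), (2, 2, 1), (5, 4, 4), (3, 3, 2), (1, 1, 1)]
blame_index = aihq_blame_index(ratings)
print(blame_index)
```

Because the index is an average of vignette totals, it stays on the same 3-16 scale as a single vignette, which makes scores comparable if a vignette count other than five were ever used.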
The Intentionality Bias Task (IBT; Rosset, 2008) consists of 24 short sentences describing simple actions. Participants indicate whether these actions occurred “on purpose” or “by accident.” Intentionality bias is calculated as the percentage of trials indicated as intentional, with higher scores indicating greater intentionality bias (range 0 to 100).
Tasks of attributional bias tend to show stronger relationships to paranoia and suspiciousness than other domains of SC, and therefore correlations with symptoms were also examined (Buck et al., 2017; Pinkham et al., 2016).
Emotion Perception and Processing.
The Bell Lysaker Emotion Recognition Task (BLERT; Bryson et al., 1997) includes 21 videos of an actor displaying different emotions. Participants select which affect state (e.g., happiness, sadness) was most prominently displayed. Correct responses are summed for a total score (range 0-21).
The Penn Emotion Recognition Test (ER40; Kohler et al., 2003) includes 40 photographs of faces expressing four basic emotions as well as neutral expressions. Participants choose the correct emotion from five choices. An accuracy score is calculated by summing correctly identified emotions (range 0-40).
Mental State Attribution.
The Awareness of Social Inference Test Part III (TASIT; McDonald et al., 2003) measures the ability to detect lies or sarcasm and has two forms (i.e., form A and form B) for counterbalanced administration across retesting. Participants watch videos of social interactions and answer questions regarding the intentions, beliefs, and meanings of the speakers and their interactions. A short version of the TASIT, the TASIT-S (Honan et al., 2016), was developed after initial data collection and will be re-examined in the present study. Because the TASIT-S has a single form that includes items from both form A and form B, and forms A and B were counterbalanced across administrations in the SCOPE Study, alternate form reliability will not be examined (total range of correctly answered questions 0-36).
The Reading the Mind in the Eyes Test (Eyes; Baron-Cohen et al., 2001) consists of 36 black-and-white photographs of eye regions expressing different thoughts and feelings. Participants select the thought/feeling portrayed in the photograph from a list of four options. Correct items are summed for a total score (range 0-36).
The Hinting Task (Corcoran et al., 1995) includes short passages of interactions between two characters ending with an indirect statement. Participants are prompted to explain the intention of this statement. Responses are coded as “2” (correctly described intention), “1” (correctly described intention after additional information is provided), or “0” (did not provide accurate description of intention) with a total score range 0-20, see Klein (2020) for recommended scoring criteria.
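The scoring rule above amounts to summing item codes. The sketch below assumes the 10-item task implied by the 0-20 range; the function name and the example codes are hypothetical, and assigning a 0, 1, or 2 to a response still requires the published scoring criteria.

```python
def hinting_total(item_codes, n_items=10):
    """Sum item codes (0, 1, or 2) into a Hinting Task total score."""
    if len(item_codes) != n_items:
        raise ValueError(f"expected {n_items} item codes, got {len(item_codes)}")
    if any(code not in (0, 1, 2) for code in item_codes):
        raise ValueError("each item must be coded 0, 1, or 2")
    return sum(item_codes)


# Hypothetical codes for one participant on the 10-item task.
codes = [2, 2, 1, 0, 2, 1, 2, 2, 0, 1]
total = hinting_total(codes)
print(total)
```

Passing a different `n_items` (for example, a shortened item set) changes only the maximum possible total, which is always twice the number of items.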
Social Perception.
The Relationships Across Domains Task (RAD; Sergi et al., 2009) presents interactions consistent with four relational models. Participants respond “yes” or “no” if they think a future behavior is likely based on the vignette. A total correct score is calculated (range 0-45). The RAD was not recommended after phase three due to long administration time (i.e., 15 minutes); however, a brief version with careful item selection may address long administration time.
Neurocognition Measures
A subset of the MATRICS Consensus Cognitive Battery (MCCB; Nuechterlein et al., 2008) assessed NC. MCCB tasks administered included: Trail Making Test – Part A, BACS – Symbol Coding, Category Fluency: Animal Naming, Letter-Number Span, and the Hopkins Verbal Learning Test-Revised.
Functional Outcomes
The UCSD Performance-Based Skills Assessment – Brief (UPSA-B; Mausbach et al., 2007) assessed functional capacity. Social competence was assessed with the Social Skills Performance Assessment (SSPA; Patterson et al., 2001). Real-world functioning was assessed by informants (SSD only) as well as self-reports using the Specific Levels of Functioning Scale (SLOF; Schneider and Struening, 1983).
Data Analytic Plan
Measures were assessed for multidimensionality by comparing unidimensional and multidimensional IRT models using the mirt package for R (Chalmers, 2012) with Bayesian Information Criterion (BIC) values emphasized since this criterion rewards parsimonious models (Kuha, 2004). For measures where multidimensional models were recommended, unidimensional IRT models were fit to each dimension separately to preserve original task factors. IRT model fitting and test statistics were computed for all tasks (i.e., one-, two-, or three-parameter models for dichotomous responses and graded response, graded partial credit, and nominal models for polytomous responses). Once an appropriate model fit was established, individual item difficulty (b) and discrimination (a) statistics were reviewed. Item information values were evaluated at an ability level (θ) one standard deviation below average, since a brief battery is intended to identify individuals with impaired SC. Candidate items were compiled into brief versions of each SC task.
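To make the item-review step concrete, the sketch below ranks items by the information they provide at θ = −1 under a two-parameter logistic model. It is an illustration under stated assumptions, not the study's procedure: the actual analyses were run in mirt with several candidate models, the item names and (a, b) values here are hypothetical, and a simple ranking stands in for the authors' judgment-based selection.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct response at trait level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """2PL item information: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


# Hypothetical (discrimination a, difficulty b) estimates for four items.
items = {
    "item01": (1.8, -1.0),  # discriminating, difficulty at the target level
    "item02": (0.6, -1.0),  # same difficulty, weak discrimination
    "item03": (1.8, 1.5),   # discriminating, but far too difficult
    "item04": (1.2, -0.5),  # moderate on both counts
}

theta_target = -1.0  # one standard deviation below average ability
info = {name: item_information(theta_target, a, b) for name, (a, b) in items.items()}
ranked = sorted(info, key=info.get, reverse=True)

for name in ranked:
    print(f"{name}: information at theta = -1 is {info[name]:.3f}")
```

In this toy example the hard but discriminating item (item03) ranks last, which shows why discrimination alone is not enough when the goal is precision in the impaired range.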
Psychometric properties of each brief measure were examined: internal consistency (coefficient omegas), test-retest (Pearson’s r correlation coefficient), utility as a repeated measure (paired-samples t-test comparing administration timepoints), floor/ceiling effects (proportion of participants performing at chance level/achieving perfect scores), and group differences (t-test comparing healthy control performance with SSD sample). Relationships with NC task performance and functional outcome measures were also assessed (Pearson’s r). Pearson’s r values greater than .6 were considered acceptable (Akoglu, 2018; Kraemer et al., 2012). Relationships with indicators of functioning were assessed through a series of regressions with SC measures as predictors and functional outcome measures as dependent variables. Regression models including both NC and brief SC measures as predictors of functional outcomes tested incremental validity. Relationships between original and brief SC measures were also examined (Pearson’s r). While some approaches to develop brief batteries emphasize a single efficient test, SC is comprised of meaningful domains (Buck et al., 2016; Riedel et al., 2021). Therefore, development of BB-SCOPE emphasized representation of SC measures across SC domains with good psychometric properties and unique relationships with functional outcomes. Receiver operating characteristic (ROC) analyses identified an optimal BB-SCOPE total score based on Youden’s Index maximizing sensitivity and specificity (Robin et al., 2011; Youden, 1950) for identifying SC impairment. Choosing an optimal cut-score balanced identification of individuals one standard deviation below HC performance and high area under the curve (AUC) values for identifying social competence and functional capacity.
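The cut-point step can be illustrated with a small sketch of Youden's Index, J = sensitivity + specificity − 1, evaluated at every observed score. The scores and impairment labels below are hypothetical, and the convention that scores at or below the cut are flagged as impaired is an assumption made for this example; it is not a reproduction of the study's ROC analysis.

```python
def youden_cutpoint(scores, impaired):
    """Return (cut, J, sensitivity, specificity) maximizing Youden's J.

    A participant is classified as impaired when score <= cut.
    Ties are resolved in favor of the lowest cut.
    """
    n_pos = sum(1 for y in impaired if y)
    n_neg = len(impaired) - n_pos
    best = None
    for cut in sorted(set(scores)):
        true_pos = sum(1 for s, y in zip(scores, impaired) if y and s <= cut)
        true_neg = sum(1 for s, y in zip(scores, impaired) if not y and s > cut)
        sens = true_pos / n_pos
        spec = true_neg / n_neg
        j = sens + spec - 1.0
        if best is None or j > best[1]:
            best = (cut, j, sens, spec)
    return best


# Hypothetical total scores and impairment status (1 = impaired).
scores = [40, 45, 50, 55, 58, 62, 65, 70, 75, 80]
impaired = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

cut, j, sens, spec = youden_cutpoint(scores, impaired)
print(f"cut = {cut}, J = {j:.3f}, sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Maximizing J weights sensitivity and specificity equally; a setting that prioritizes catching every impaired individual might reasonably accept a higher cut with lower specificity.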
Results
Participant demographic and clinical characteristics are presented in Table 1. Performance on SC tasks, NC tasks, and functional outcomes are presented in Table 2. SSD individuals completed fewer years of education and had lower WRAT-3 scores compared with HC and performed significantly lower on all indices of SC, NC, and functional outcomes.
Table 1.
Sample Characteristics by Diagnostic Group
| | SSD (n = 386) | HC (n = 292) | p-valueᵃ |
|---|---|---|---|
| Age, years | 41.5 ± 11.9 | 41.2 ± 12.7 | .74 |
| Male, % (n) | 66.0 (254) | 57.2 (167) | .02 |
| Education, years | 12.9 ± 2.4 | 13.9 ± 1.9 | <.01 |
| WRAT-3 Standard Score | 94.6 ± 15.0 | 98.8 ± 12.3 | <.01 |
| Race/Ethnicityᵇ, % (n) | |||
| White | 48.4 (187) | 46.6 (136) | .63 |
| Black | 45.3 (175) | 46.6 (136) | .75 |
| Other | 6.2 (24) | 6.8 (20) | .74 |
| Hispanic/Latinx | 17.6 (68) | 19.5 (57) | .64 |
| Diagnosis, % (n) | |||
| Schizophrenia | 51.8 (200) | ||
| Schizoaffective | 47.2 (182) | ||
| Psychosis NOS | 1.0 (4) | ||
| Medication Typeᶜ, % (n) | |||
| Typical | 12.7 (49) | ||
| Atypical | 74.6 (288) | ||
| Combination | 4.9 (19) | ||
| No Antipsychotic | 7.8 (30) | ||
| PANSS | |||
| Positive | 16.3 ± 5.3 | ||
| Negative | 14.1 ± 5.4 | ||
| General | 31.9 ± 8.0 | ||
| Total | 62.2 ± 14.8 |
Note: ᵃ Chi-squared test for categorical variables (sex, race/ethnicity), t-test for continuous variables (age, education, WRAT standard score); ᵇ individuals were able to identify more than one race/ethnicity, so percentages will not add up to 100; ᶜ medication information was unavailable for 7 participants. WRAT = Wide Range Achievement Test – 3rd Edition, PANSS = Positive and Negative Syndrome Scale, SSD = schizophrenia spectrum disorder, HC = healthy control; all values presented are M ± SD unless otherwise noted.
Table 2.
Social Cognition Tasks and Functional Outcome Measures by Diagnostic Group
| | SSD | HC | p-value |
|---|---|---|---|
| Social Cognition Tasks | |||
| AIHQᵃ | 8.7 ± 2.9 | 7.0 ± 2.4 | <.01 |
| BLERT | 13.6 ± 4.1 | 15.8 ± 2.7 | <.01 |
| ER40 | 30.5 ± 5.1 | 33.0 ± 3.4 | <.01 |
| Eyes | 21.0 ± 5.5 | 24.3 ± 4.5 | <.01 |
| Hinting | 13.4 ± 3.8 | 16.1 ± 2.5 | <.01 |
| IBTᵇ | 44.5 ± 17.9 | 40.4 ± 14.6 | .03 |
| RADᵃ | 24.8 ± 5.8 | 29.8 ± 5.2 | <.01 |
| TASITᶜ | 44.6 ± 7.6 | 51.1 ± 6.3 | <.01 |
| Neurocognition Tasks | |||
| Animal Naming | 19.2 ± 5.7 | 22.6 ± 5.9 | <.01 |
| HVLT | 20.8 ± 5.6 | 25.1 ± 4.6 | <.01 |
| Letter Number Sequence | 11.9 ± 4.1 | 14.6 ± 3.8 | <.01 |
| Symbol Coding | 42.9 ± 11.6 | 53.2 ± 12.5 | <.01 |
| Trails A | 40.4 ± 18.3 | 31.5 ± 10.9 | <.01 |
| Functional Outcome Measures | |||
| SSPA Average | 4.1 ± 0.5 | 4.6 ± 0.4 | <.01 |
| SSPA1 | 4.2 ± 0.6 | 4.6 ± 0.4 | <.01 |
| SSPA2 | 4.0 ± 0.6 | 4.4 ± 0.5 | <.01 |
| SLOF Self-Report Average | 4.2 ± 0.6 | 4.6 ± 0.4 | <.01 |
| Interpersonal Relationships | 3.6 ± 0.9 | 4.1 ± 0.7 | <.01 |
| Social Acceptability | 4.5 ± 0.5 | 4.7 ± 0.4 | <.01 |
| Activities of Community Living | 4.4 ± 0.8 | 4.8 ± 0.5 | <.01 |
| Work Skills | 4.1 ± 0.8 | 4.7 ± 0.5 | <.01 |
| SLOF Informant Averageᵈ | 4.0 ± 0.6 | ||
| Interpersonal Relationships | 3.4 ± 0.9 | ||
| Social Acceptability | 4.4 ± 0.6 | ||
| Activities of Community Living | 4.4 ± 0.8 | ||
| Work Skills | 3.6 ± 0.9 | ||
| UPSA-B Totalᵈ | 70.3 ± 14.2 |
Note: ᵃ only collected during initial phase; ᵇ IBT only collected during final phase; ᶜ task version counterbalanced across visits, values presented are for Form A; ᵈ only administered to SSD group. AIHQ = Ambiguous Intentions and Hostility Questionnaire Blame Index, BLERT = Bell Lysaker Emotion Recognition Task, ER40 = Penn Emotion Recognition Task, Eyes = Reading the Mind in the Eyes Task, Hinting = Hinting Task, IBT = Intentionality Bias Task, RAD = Relationships Across Domains Task; TASIT = The Awareness of Social Inference Test, HVLT = Hopkins Verbal Learning Test, SSPA = Social Skills Performance Assessment, SLOF = Specific Levels of Functioning, UPSA-B = UCSD Performance-Based Skills Assessment - Brief, HC = healthy control, SSD = schizophrenia spectrum disorder; all values presented are M ± SD.
Development of Brief Social Cognition Tasks
Assessment of SC task dimensionality, IRT fit statistics, and optimal model fit are presented in Supplementary Materials. Items retained for brief versions are presented in Supplementary Tables 1-8. The number of items retained for brief versions was guided by a preference to select efficient items balanced with number of items needed for acceptable psychometric properties. Efforts were made to retain original factors as intended by the original task authors. Figure 1 presents IRT analysis of original and brief SC tasks.
Figure 1. Item and Test Characteristic Curves.
Note: Dashed lines represent original task items, solid lines represent items retained for brief versions of tasks; Test information curves presented separately by task factor for multidimensional tasks (i.e., IBT, TASIT, RAD); AIHQ = Ambiguous Intentions and Hostility Questionnaire, BLERT = Bell Lysaker Emotion Recognition Task, ER40 = Penn Emotion Recognition Task, Eyes = Reading the Mind in the Eyes Task, Hinting = Hinting Task, IBT = Intentionality Bias Task, RAD = Relationship Across Domains Task, TASIT = The Awareness of Social Inference Test.
Evaluation of Brief Social Cognition Tasks
HC outperformed SSD on all brief SC tasks except for the brief IBT (IBT-B; see Table 3). Brief SC tasks exhibited better psychometric properties within SSD compared with HC (see Table 4). In general, brief SC tasks retained similar, albeit slightly reduced, psychometric properties and demonstrated strong relationships with original tasks (i.e., Pearson’s rs .87 - .96, ps <.05; see Table 5). The brief ER40 (ER40-B) and the brief RAD (RAD-B) demonstrated reduced relationships with indicators of functioning compared with original task performance while the brief Eyes (Eyes-B) demonstrated improved incremental validity and stronger relationships with indicators of functioning.
Table 3.
Performance on Brief Social Cognition Tasks by Diagnostic Group
| Brief Task | Items | Range | SSD | HC | Group Difference t-test | Cohen’s d | p | Pearson’s r with Original Task |
|---|---|---|---|---|---|---|---|---|
| Attributional Bias | ||||||||
| AIHQ - B | 15 | 3 – 16 | 8.7 ± 2.9 | 7.0 ± 2.4 | t(303.6) = 7.1 | 0.64 | <.01 | - |
| IBT-B | 14 | 0 – 100 | 44.4 ± 17.2 | 40.6 ± 14.8 | t(278)= 1.9 | 0.23 | .05 | .91 |
| Emotion Processing | ||||||||
| BLERT-B | 10 | 0 – 10 | 6.4 ± 2.3 | 7.5 ± 1.7 | t(674.1) = 7.4 | 0.55 | <.01 | .91 |
| ER40 - B | 18 | 0 – 18 | 13.4 ± 3.2 | 15.0 ± 2.1 | t(666.5) = 7.8 | 0.64 | <.01 | .92 |
| Social Perception | ||||||||
| RAD - B | 21 | 0 – 21 | 11.8 ± 3.5 | 14.7 ± 2.8 | t(253.9) = 7.6 | 0.79 | <.01 | .91 |
| Theory of Mind | ||||||||
| Eyes - B | 18 | 0 – 18 | 11.2 ± 3.6 | 13.3 ± 2.8 | t(674.8) = 8.4 | 0.64 | <.01 | .91 |
| Hinting - B | 8 | 0 – 16 | 11.1 ± 3.2 | 13.2 ± 2.3 | t(672.6) = 9.9 | 0.76 | <.01 | .96 |
| TASIT-S | 36 | 0 – 36 | 24.0 ± 4.7 | 27.5 ± 4.1 | t(345.9) = 10.4 | 0.80 | <.01 | .93 |
Note: AIHQ-B = Ambiguous Intentions and Hostility Questionnaire Blame Index- Brief, BLERT-B = Bell Lysaker Emotion Recognition Task - Brief, ER40-B = Penn Emotion Recognition Task - Brief, Eyes-B = Reading the Mind in the Eyes Task - Brief, Hinting-B = Hinting Task-Brief, IBT-B = Intentionality Bias Task - Brief, RAD-B = Relationships Across Domains Task - Brief; TASIT-S= The Awareness of Social Inference Test Part III – short version, SSD = schizophrenia spectrum disorder, HC = healthy control; all values presented are M ± SD. AIHQ-B includes only Likert items from AIHQ (i.e., not rater-scored items) so no comparison (i.e., Pearson’s r) with original task presented.
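The group-difference columns in Table 3 can be approximated from the printed summary statistics. The fractional degrees of freedom suggest an unequal-variance (Welch) t-test, although that is an inference from the table rather than something stated in the text, and the pooled-SD form of Cohen’s d used below is likewise an assumption. The sketch uses the BLERT-B row with the full group sizes (SSD n = 386, HC n = 292).

```python
import math


def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def cohens_d(m1, s1, n1, m2, s2, n2):
    """Cohen's d using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)


# BLERT-B summary values as printed in Table 3 (HC first, then SSD).
hc = (7.5, 1.7, 292)
ssd = (6.4, 2.3, 386)

t, df = welch_t(*hc, *ssd)
d = cohens_d(*hc, *ssd)
print(f"t({df:.1f}) = {t:.2f}, d = {d:.2f}")
```

Working from rounded means and standard deviations gives roughly t ≈ 7.2, df ≈ 676, and d ≈ 0.53, close to but not identical with the published t(674.1) = 7.4 and d = 0.55, which were presumably computed from the raw data.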
Table 4.
Comparison of Brief Social Cognition Tasks by Diagnostic Group
| Brief Social Cognition Task | Group | n | Internal Consistency: McDonald’s ω | Test-Retest: Pearson’s r | Repeated t-test | Cohen’s d | % Floor/Chance | % Ceiling | Relationship with Original Task: Pearson’s r |
|---|---|---|---|---|---|---|---|---|---|
| Attributional Bias | | | | | | | | | |
| AIHQ-B | HC | 138 | .86 | .76** | t(126) = 4.14** | 0.37 | - | - | - |
| | SSD | 228 | .87 | .67** | t(213) = 1.75** | 0.12 | - | - | - |
| IBT-B | HC | 150 | .57 | .50** | t(135) = 3.04** | 0.26 | - | - | .90** |
| | SSD | 130 | .65 | .58** | t(116) = 2.90** | 0.27 | - | - | .92** |
| Emotion Processing | | | | | | | | | |
| BLERT-B | HC | 292 | .53 | .61** | t(274) = 2.65** | 0.16 | 0 | 8.22 | .87** |
| | SSD | 386 | .71 | .69** | t(361) = 4.86** | 0.26 | 2.59 | 5.96 | .91** |
| ER40-B | HC | 292 | .55 | .60** | t(274) = 0.98 | 0.06 | 0 | 4.79 | .87** |
| | SSD | 386 | .72 | .71** | t(360) = 4.13** | 0.22 | 0.26 | 1.04 | .93** |
| Social Perception | | | | | | | | | |
| RAD-B | HC | 104 | .63 | .66** | t(97) = 0.35 | 0.04 | 9.62 | 0 | .87** |
| | SSD | 175 | .76 | .66** | t(165) = 1.97 | 0.15 | 40.57 | 0.57 | .90** |
| Theory of Mind | | | | | | | | | |
| Eyes-B | HC | 292 | .62 | .71** | t(274) = 0.49 | 0.03 | 0.34 | 3.08 | .88** |
| | SSD | 386 | .73 | .76** | t(361) = 0.75 | 0.04 | 3.63 | 1.55 | .91** |
| Hinting-B | HC | 288 | .65 | .61** | t(273) = 3.02** | 0.18 | 0 | 12.15 | .94** |
| | SSD | 384 | .68 | .65** | t(359) = 3.37** | 0.18 | 0.26 | 2.08 | .96** |
| TASIT-S | HC | 286 | .64 | - | - | - | 1.40 | 1.05 | .92** |
| | SSD | 378 | .74 | - | - | - | 7.14 | 0 | .92** |
Note: ** p < .01; Floor and ceiling effects not reported for measures of attributional bias (i.e., AIHQ-B and IBT-B) since scores reflect bias rather than correct or incorrect responses, TASIT-S test-retest and repeated t-test not reported since TASIT-S is based on TASIT Form A which was only administered to each participant once; HC = healthy control group; SSD = schizophrenia spectrum disorder group; AIHQ-B = Ambiguous Intentions and Hostility Questionnaire Blame Index- Brief, BLERT-B = Bell Lysaker Emotion Recognition Task - Brief, ER40-B = Penn Emotion Recognition Task - Brief, Eyes-B = Reading the Mind in the Eyes Task - Brief, Hinting-B = Hinting Task-Brief, IBT-B = Intentionality Bias Task - Brief, RAD-B = Relationships Across Domains Task - Brief; TASIT-S= The Awareness of Social Inference Test Short Form.
Table 5.
Comparison of Original and Brief Social Cognition Tasks
| SC Task | Version | Items | Internal Consistency: McDonald’s ω | Test-Retest: Pearson’s r | Repeated t-test | Cohen’s d | % Floor/Chance | % Ceiling | Convergent Validity: Proportion of SC with rs p <.05 | Divergent Validity: Proportion of NC with rs p <.05 | Functional Outcomes Predicted | Functional Outcomes Predicted (after NC) | Time (Minutes) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Attributional Bias | | | | | | | | | | | | | |
| AIHQ | Original | 15 | - | - | - | - | - | - | - | - | - | - | - |
| | Brief | 15 | .88 | .73** | t(340) = 3.5** | 0.19 | - | - | 2/6 | 0/5 | SLOF-Inf, SSPA | SSPA | 6.35 |
| IBT | Original | 24 | .68 | .58** | t(245) = 3.9** | 0.25 | - | - | 3/5 | 2/5 | UPSA-B | UPSA-B | 5.43 |
| | Brief | 14 | .63 | .55** | t(252) = 4.2** | 0.26 | - | - | 1/5 | 1/5 | SSPA, UPSA-B | UPSA-B | 3.16 |
| Emotion Processing | | | | | | | | | | | | | |
| BLERT | Original | 21 | .76 | .75** | t(636) = 7.4** | 0.29 | 0 | 1.18 | 7/7 | 5/5 | SLOF-Inf | SLOF-Inf | 7.09 |
| | Brief | 10 | .69 | .68** | t(636) = 5.5** | 0.22 | 1.48 | 6.93 | 6/7 | 5/5 | SLOF-Inf | SLOF-Inf | 3.37 |
| ER40 | Original | 40 | .81 | .74** | t(625) = 4.1** | 0.16 | 0.15 | 0.15 | 7/7 | 5/5 | SSPA | - | 3.21 |
| | Brief | 18 | .71 | .70** | t(635) = 3.9** | 0.15 | 0.15 | 2.65 | 5/7 | 5/5 | - | - | 1.44 |
| Social Perception | | | | | | | | | | | | | |
| RAD | Original | 45 | .78 | .79** | t(263) = 3.6** | 0.22 | 31.9 | 0 | 6/6 | 5/5 | UPSA-B | - | 15.84 |
| | Brief | 21 | .76 | .71** | t(263) = 1.9 | 0.11 | 29.03 | 0.36 | 5/6 | 5/5 | - | - | 7.39 |
| Theory of Mind | | | | | | | | | | | | | |
| Eyes | Original | 36 | .74 | .79** | t(636) = 0.3 | 0.01 | 1.18 | 0 | 7/7 | 5/5 | - | - | 6.56 |
| | Brief | 18 | .72 | .76** | t(636) = 0.3 | 0.01 | 2.21 | 2.21 | 7/7 | 5/5 | UPSA-B | UPSA-B | 3.28 |
| Hinting | Original | 10 | .72 | .68** | t(633) = 4.8** | 0.19 | 0.15 | 2.83 | 7/7 | 5/5 | SSPA, UPSA-B | SSPA, UPSA-B | 6.13 |
| | Brief | 8 | .71 | .66** | t(633) = 4.5** | 0.18 | 0.15 | 6.40 | 5/7 | 5/5 | SSPA, UPSA-B | SLOF-SR, SSPA, UPSA-B | 4.90 |
| TASIT | Original | 64 | .82 | .64** | t(634) = 5.3** | 0.21 | 3.23 | 0 | 6/7 | 5/5 | SLOF-Inf, UPSA-B | - | 17.92 |
| | Brief | 36 | .71 | - | - | - | 4.46 | 0.46 | 5/7 | 5/5 | SLOF-Inf, SSPA, UPSA-B | - | 10.08 |
Note: ** p < .01; Bolded values indicate tasks selected for the final brief battery of social cognition (BB-SCOPE); Floor and ceiling effects not reported for measures of attributional bias (i.e., AIHQ-B and IBT-B) since scores reflect bias rather than correct or incorrect responses, TASIT-S test-retest and repeated t-test not reported since TASIT-S is based on TASIT Form A which was only administered to each participant once; AIHQ-B = Ambiguous Intentions and Hostility Questionnaire Blame Index- Brief, BLERT-B = Bell Lysaker Emotion Recognition Task - Brief, ER40-B = Penn Emotion Recognition Task - Brief, Eyes-B = Reading the Mind in the Eyes Task - Brief, Hinting-B = Hinting Task-Brief, IBT-B = Intentionality Bias Task - Brief, RAD-B = Relationships Across Domains Task - Brief; TASIT-S= The Awareness of Social Inference Test Short Form.
Internal Consistency
Most brief SC tasks retained good internal consistency (i.e., McDonald’s coefficient omegas [ω] = .71 - .88) with higher internal consistency observed in SSD compared with HC, as also seen in SCOPE. The IBT-B demonstrated acceptable internal consistency (ω = .63), similar to the original IBT (ω = .68).
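For readers less familiar with coefficient omega, the sketch below computes it from the standardized loadings of a single-factor model, using the textbook formula ω = (Σλ)² / [(Σλ)² + Σ(1 − λ²)]. The loadings are hypothetical, and the text does not specify how omega was estimated in the study, so this shows the general idea rather than the authors' computation.

```python
def mcdonald_omega(loadings):
    """Coefficient omega for a unidimensional model with standardized loadings."""
    common = sum(loadings) ** 2                   # variance due to the factor
    unique = sum(1.0 - l * l for l in loadings)   # summed item uniquenesses
    return common / (common + unique)


# Hypothetical standardized loadings for a five-item scale.
loadings = [0.7, 0.6, 0.5, 0.6, 0.7]
omega = mcdonald_omega(loadings)
print(f"omega = {omega:.3f}")
```

Unlike coefficient alpha, this formula does not assume equal loadings across items, which matters when a brief form keeps a mix of stronger and weaker items.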
Test-Retest Reliability
Test-retest reliability was acceptable (rs .66-.76), except for the IBT-B, which demonstrated poor test-retest reliability (r = .55), similar to the original IBT (r = .58). Test-retest reliability for brief tasks was slightly higher within SSD compared with HC for most tasks (see Table 4).
Utility as a Repeated Measure
Similar performance was observed for brief measures compared with original measures. Brief tasks demonstrated significant differences in performance between testing visits (repeated t-test ps <.05) except for the RAD-B and the Eyes-B. Effect sizes for repeated performance were small (i.e., Cohen’s ds .04 - .27 in SSD) and similar to original tasks. Effect sizes for repeated performance were generally larger in SSD compared with HC except for the brief Hinting task (Hinting-B) and brief AIHQ (AIHQ-B) which demonstrated smaller or identical effect sizes in SSD compared with HC.
Floor/chance and ceiling effects were mostly limited and comparable between brief and original tasks except for the RAD-B (29.05% performance at chance). The pronounced chance-level performance on the RAD-B was similar to that of the original RAD (31.90% performance at chance). Overall, SSD had slightly higher floor/chance and lower ceiling performance compared with HC.
Convergent Validity
Brief SC tasks demonstrated significant relationships with other brief SC measures (see Table 6).
Table 6.
Social Cognition, Neurocognition, and Functional Outcome Correlations in Schizophrenia Spectrum Disorders
| | Measure | AIHQ-B | IBT-B | BLERT-B | ER40-B | RAD-B | Eyes-B | Hinting-B | TASIT-S |
|---|---|---|---|---|---|---|---|---|---|
| Domain of Brief Task | | Attributional Bias | Attributional Bias | Emotion Processing | Emotion Processing | Social Perception | Theory of Mind | Theory of Mind | Theory of Mind |
| Convergent Validity | AIHQ | - | - | −.15* | −.07 | −.08 | −.14* | .04 | −.04 |
| | IBT | - | - | −.19 | −.30 | - | −.18* | −.12 | −.03 |
| | BLERT | −.18** | −.16 | - | .59** | .39** | .58** | .35** | .47** |
| | ER40 | −.07 | −.27** | .64** | - | .36** | .56** | .31** | .39** |
| | RAD | −.07 | - | .47** | .45** | - | .51** | .31** | .48** |
| | Eyes | −.15* | −.16 | .61** | .56** | .58** | - | .41** | .45** |
| | Hinting | .04 | −.14 | .38** | .36** | .35** | .39** | - | .39** |
| | TASIT | −.05 | −.04 | .54** | .48** | .61** | .57** | .40** | - |
| Divergent Validity | Animal Naming | −.13 | −.04 | .34** | .29** | .25** | .37** | .22** | .34** |
| | HVLT-R | −.03 | −.19* | .43** | .38** | .51** | .40** | .30** | .44** |
| | Letter Number Span | −.10 | −.12 | .45** | .40** | .55** | .48** | .35** | .43** |
| | Symbol Coding | −.02 | −.12 | .38** | .35** | .39** | .42** | .25** | .43** |
| | Trails A | .04 | −.08 | −.27** | −.24** | −.24** | −.28** | −.17** | −.28** |
| Relationships with Indicators of Functioning | SLOF Informant Total | −.13 | −.15 | .23** | .14* | .17* | .13* | .15* | .19** |
| | Activities of Community Living | −.15 | −.14 | .17* | .13* | .15 | .10 | .13* | .11 |
| | Interpersonal Relationships | −.05 | −.05 | .19** | .10 | .08 | .09 | .12* | .14* |
| | Social Acceptability | −.16* | −.13 | .11 | .04 | .03 | .10 | .01 | .01 |
| | Work Skills | .02 | −.14 | .24** | .14* | .22** | .13* | .17** | .25** |
| | SLOF Self-Report Total | - | .03 | .12 | .13 | - | .05 | −.08 | −.05 |
| | Activities of Community Living | - | −.05 | .22* | .28** | - | .25** | .08 | .12 |
| | Interpersonal Relationships | - | .08 | −.01 | −.03 | - | −.11 | −.13 | −.22** |
| | Social Acceptability | - | −.08 | .02 | −.01 | - | −.06 | −.13 | .01 |
| | Work Skills | - | .15 | .02 | .02 | - | −.07 | −.12 | −.10 |
| | SSPA Average | .12 | −.27** | .35** | .33** | .23** | .32** | .41** | .35** |
| | SSPA 1 | .04 | .19* | .30** | .29** | .11 | .28** | .32** | .29** |
| | SSPA 2 | .17* | −.28** | .31** | .31** | .32** | .29** | .40** | .33** |
| | UPSA-B Total | .03 | −.31** | .36** | .34** | .40** | .43** | .43** | .39** |
Note: Correlations between original social cognition tasks are presented in gray; the SLOF Self-Report was not administered during the same phases as the AIHQ and RAD tasks. * p <.05, ** p <.01. AIHQ = Ambiguous Intentions and Hostility Questionnaire, BLERT = Bell Lysaker Emotion Recognition Task, ER40 = Penn Emotion Recognition Task, Eyes = Reading the Mind in the Eyes Task, Hinting = Hinting Task, IBT = Intentionality Bias Task, RAD = Relationships Across Domains Task, TASIT = The Awareness of Social Inference Test, -B and -S after task names indicate brief versions of tasks, HVLT = Hopkins Verbal Learning Test, SSPA = Social Skills Performance Assessment, SLOF = Specific Levels of Functioning, UPSA-B = UCSD Performance-Based Skills Assessment - Brief, SSD = schizophrenia spectrum disorder.
Divergent Validity
Divergent validity of brief SC tasks was weaker, with most brief SC tasks demonstrating significant relationships with NC tasks in SSD, similar to original SC tasks (see Table 6). Measures of attributional bias (AIHQ-B, IBT-B) demonstrated stronger divergent validity, with few significant correlations with NC tasks.
Relationships with Indicators of Functioning
Correlations with functional outcomes in SSD are presented in Table 6. Brief SC tasks demonstrated correlations with functional outcomes similar to original versions. With the exception of the ER40-B and the RAD-B, brief SC tasks also demonstrated unique relationships with functional outcomes in a series of linear regressions (see Table 7). Follow-up analyses restricted to ratings from high-quality informants (i.e., high-contact family members or friends) yielded similar results, except that no significant relationships were observed between the informant SLOF and the AIHQ-B.
Table 7.
Regressions Predicting Functional Outcomes in Schizophrenia Spectrum Disorders
| | | Predicting Functional Outcomes | | | | Incremental Validity Predicting Functional Outcomes Beyond NC Performance | | | |
|---|---|---|---|---|---|---|---|---|---|
| Task Administration | Model Predictors | SLOF Informant | SLOF Self-Report | SSPA | UPSA-B | SLOF Informant | SLOF Self-Report | SSPA | UPSA-B |
| All Phases (n = 316) | Block: NC Tasks | | | | | | | | |
| | Adjusted R² | - | - | - | - | .06 | <.01 | .16 | .30 |
| | Block: SC Tasks ᵃ | | | | | | | | |
| | Adjusted R² | .06 | .02 | .24 | .28 | Δ<.01 | Δ<.01 | Δ.08 | Δ.06 |
| | SC Brief Tasks (β) | | | | | | | | |
| | BLERT-B | .19 | .20 | .11 | .06 | .19 | .14 | .06 | −.01 |
| | ER40-B | −.02 | .13 | .10 | .05 | −.06 | .15 | .06 | .02 |
| | Eyes-B | −.06 | −.06 | .04 | .17 | −.07 | −.06 | .01 | .12 |
| | Hinting-B | .06 | −.14 | .28 | .26 | .08 | −.21 | .26 | .22 |
| | TASIT-S | .16 | −.12 | .13 | .17 | .06 | −.17 | .07 | .06 |
| Phases Three & Four (n = 163) | Block: SC Tasks ᵇ | | | | | | | | |
| | Adjusted R² | .11 | - | .20 | .30 | Δ.04 | - | Δ.09 | Δ.08 |
| | SC Brief Tasks (β) | | | | | | | | |
| | AIHQ-B | −.16 | - | .17 | .05 | −.14 | - | .21 | .06 |
| | RAD-B | −.01 | - | .01 | .15 | −.01 | - | −.10 | .01 |
| Phase Five (n = 123) | Block: SC Tasks ᶜ | | | | | | | | |
| | Adjusted R² | .11 | .01 | .30 | .31 | Δ.05 | Δ<.01 | Δ.12 | Δ.07 |
| | SC Brief Tasks (β) | | | | | | | | |
| | IBT-B | −.10 | .10 | −.17 | −.25 | −.06 | .15 | −.13 | −.20 |
Note:
ᵃ Model includes tasks from all phases (i.e., BLERT, ER40, Eyes, Hinting, TASIT)
ᵇ Model includes tasks administered during all phases with RAD and AIHQ-B
ᶜ Model includes tasks administered during all phases with IBT-B; Δ indicates change in adjusted R² after accounting for NC block; VIF values for all predictors across models < 2.5 indicating acceptable multicollinearity, bold values indicate p<.05; SLOF Self-Report not administered during phases Three and Four; AIHQ-B = Ambiguous Intentions and Hostility Questionnaire Blame Index, BLERT-B = Bell Lysaker Emotion Recognition Task, ER40-B = Penn Emotion Recognition Task, Eyes = Reading the Mind in the Eyes Task, Hinting = Hinting Task, IBT-B = Intentionality Bias Task, RAD = Relationships Across Domains Task, TASIT-S = The Awareness of Social Inference Test, SSPA = Social Skills Performance Assessment, SLOF = Specific Levels of Functioning, UPSA-B = UCSD Performance-Based Skills Assessment - Brief.
Incremental Validity
The AIHQ-B, IBT-B, BLERT-B, Eyes-B, and Hinting-B demonstrated incremental validity after controlling for NC task performance (see Table 7). Altogether, brief SC tasks explained an additional 1-12% of variance in functional outcomes. These results were similar to incremental validity observed in original SC tasks with the exception that Eyes-B demonstrated incremental validity with the UPSA-B, whereas the original Eyes did not demonstrate incremental validity.
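The Δ values in Table 7 are changes in adjusted R² when the SC block is added to a model that already contains the NC block. The following is a minimal sketch of that arithmetic, not the authors' analysis code. The R² inputs are hypothetical, and the predictor counts (five NC tasks, five SC tasks) are an assumption based on the tasks listed in Tables 6 and 7.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for a linear model with p predictors and n cases."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def delta_adjusted_r2(r2_nc, p_nc, r2_full, p_full, n):
    """Change in adjusted R-squared when the SC block is added to the NC block."""
    return adjusted_r2(r2_full, n, p_full) - adjusted_r2(r2_nc, n, p_nc)


# Hypothetical inputs for illustration only (not values from the study):
# n matches the "All Phases" sample size reported in Table 7.
n = 316
r2_nc, p_nc = 0.18, 5        # NC block alone (assumed 5 NC predictors)
r2_full, p_full = 0.27, 10   # NC block plus SC block (assumed 5 SC predictors)

delta = delta_adjusted_r2(r2_nc, p_nc, r2_full, p_full, n)
print(f"Delta adjusted R2 = {delta:.3f}")
```

Because adjusted R² penalizes each added predictor, a block of SC tasks can raise raw R² while leaving Δ close to zero, which is consistent with the Δ<.01 entries in Table 7.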
Final BB-SCOPE Recommendations
Three tasks were identified for inclusion in BB-SCOPE based on parsimonious representation of one task per SC domain, good psychometric properties, and unique relationships with functional outcomes: the AIHQ-B, the BLERT-B, and the original Hinting task. The Hinting-B only achieved adequate psychometric properties with inclusion of 80% of original items. Because this shortened administration by just one minute, the full-length Hinting Task, which can itself be considered brief, is recommended instead.
Utility of a BB-SCOPE Total Score
Total scores from the AIHQ-B, BLERT-B, and the Hinting task were each divided by the total possible score for that task, multiplied by 33.33 to weight each task equally, and then summed to yield a BB-SCOPE total score ranging from 0-100 (see Appendix A for example scoring and scoring template). The SSD group (M = 61.29, SD = 12.49) scored significantly lower than the HC group (M = 67.13, SD = 7.82; t(361.5) = 5.49, p <.01). ROC analyses evaluated the utility of BB-SCOPE total scores for predicting impaired functioning, defined as performance more than one standard deviation below the HC average on each functional index or, for the UPSA-B, as a score below 60 based on previous literature (Mausbach et al., 2007), since this measure was not administered to the HC group (see Table 8). Balancing the optimal cutoff scores across indices against the BB-SCOPE score one standard deviation below HC performance, a cutoff score of 60 was recommended to identify SC impairment associated with impairments in functional capacity and social competence (see Supplementary Figure 1 for distribution of SSD and HC BB-SCOPE total scores).
Table 8.
Utility of BB-SCOPE Total Score Predicting Impairment across Functional Indices
| Predicted Outcome | Optimal Cutoff Score ᵇ | AUC [95% CI] | Sensitivity | Specificity | 60 Cutoff Score | Sensitivity | Specificity |
|---|---|---|---|---|---|---|---|
| SSPA Total Score < 1 SD below average HC | 58 | .69 [.62, .76] | .63 | .65 | 60 | .60 | .67 |
| SLOF Self-Report total score < 1 SD below average HC | 66 | .62 [.47, .78] | .18 | .84 | 60 | .37 | .78 |
| SLOF Informant-Report total score < 1 SD below average HC ᵃ | 51 | .53 [.44, .61] | .50 | .51 | 60 | .39 | .60 |
| UPSA-B total score < 60 | 61 | .75 [.68, .83] | .80 | .59 | 60 | .80 | .59 |
Note: A total BB-SCOPE score of 59.31 corresponds to performance one standard deviation below the average score of healthy control participants
ᵃ the same criterion as for impaired SLOF Self-Report was used (i.e., cutoff score of 4.2) since the SLOF informant-report was only administered to informants of SSD participants
ᵇ optimal cutoff score determined using Youden’s Index; SSPA = Social Skills Performance Assessment, SLOF = Specific Levels of Functioning, UPSA-B = UCSD Performance-Based Skills Assessment – Brief.
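The optimal cutoffs in Table 8 were chosen with Youden's Index, J = sensitivity + specificity − 1. The following is a minimal sketch of that selection rule on invented toy data, not a reproduction of the study's ROC analyses. It assumes that a score strictly below the cutoff is classified as impaired and that only observed scores are tried as candidate cutoffs, so its thresholds may differ from those produced by ROC software.

```python
def youden_cutoff(scores, impaired):
    """Return the candidate cutoff that maximizes Youden's J.

    scores   : BB-SCOPE-style total scores (higher = better performance)
    impaired : parallel list of booleans, True if the functional outcome is impaired
    A score strictly below the cutoff is classified as impaired (assumption).
    """
    n_pos = sum(impaired)
    n_neg = len(impaired) - n_pos
    best = None
    for cutoff in sorted(set(scores)):
        true_pos = sum(1 for s, y in zip(scores, impaired) if y and s < cutoff)
        true_neg = sum(1 for s, y in zip(scores, impaired) if not y and s >= cutoff)
        sensitivity = true_pos / n_pos
        specificity = true_neg / n_neg
        j = sensitivity + specificity - 1
        if best is None or j > best["J"]:
            best = {"cutoff": cutoff, "sensitivity": sensitivity,
                    "specificity": specificity, "J": j}
    return best


# Invented toy data for illustration only (not study data).
toy_scores = [45, 52, 55, 57, 58, 61, 63, 64, 66, 70, 75]
toy_impaired = [True, True, True, True, False, False,
                True, False, False, False, False]

best = youden_cutoff(toy_scores, toy_impaired)
print(best)
```

Because J weights sensitivity and specificity equally, the cutoff it selects can differ across outcomes, which is why Table 8 also reports sensitivity and specificity at the single recommended cutoff of 60.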
Discussion
The aim of the present study was to develop a brief battery to identify SC impairment in SSD to reduce heterogeneity of SC measurement and facilitate more widespread assessment. To this end, BB-SCOPE consists of three tasks (AIHQ-B, BLERT-B, original Hinting) for a total administration time of 15 minutes to assess three SC domains of attributional bias, emotion perception and processing, and theory of mind. The BB-SCOPE is also easy to score (see Appendix A) with a recommended cutoff score of 60 for identifying marked SC impairment associated with impairments in functional capacity and social competence.
The AIHQ-B was selected to assess attributional bias due to good psychometric properties and utility in uniquely predicting social competence beyond NC performance with good divergent validity. The AIHQ-B was selected over the IBT-B because of its better psychometric properties and the high level of missingness (close to 10%) observed in the IBT-B due to limited response times. Recommendation of the AIHQ-B replicates previous work demonstrating improved performance of the AIHQ when using only the Likert items and a unique relationship with suspiciousness/persecution (Buck et al., 2017, 2016).
The BLERT-B and ER40-B tasks demonstrated comparable psychometric properties, but the BLERT-B was selected to assess emotion processing due to utility in uniquely predicting real-world functioning beyond NC performance. The ER40-B did not demonstrate incremental validity. The BLERT-B did not demonstrate divergent validity; however, recent findings suggest NC and SC may be less distinct than previously thought, and therefore relationships observed between the BLERT-B and NC may accurately reflect shared variance (Deckler et al., 2018).
The original Hinting task was selected over the Eyes-B and TASIT-S to assess theory of mind based on stronger relationships with indicators of functioning and incremental validity. The TASIT-S did not demonstrate incremental validity. Additionally, prior work demonstrates that performance on the Eyes task is influenced by social class and culture, and that this task may be better conceptualized as an assessment of cognitive performance or emotion recognition than SC (Deckler et al., 2018; Dodell-Feder et al., 2020; Kittel et al., 2021; Oakley et al., 2016).
No measures of social perception were recommended for inclusion in BB-SCOPE. The RAD-B demonstrated acceptable consistency and test-retest reliability but pronounced chance effects and limited validity.
Limitations
One limitation is the use of HC and SSD samples to estimate item parameters and psychometric properties. Although final recommendations for BB-SCOPE were primarily based on performance within SSD, HC were included to maximize ability range (e.g., individuals high and low in SC ability) and thereby better estimate performance of items as they relate to the full spectrum of SC ability. Previous work demonstrates that the majority of individuals with SSD, but not all, experience impairment in SC (Hajdúk et al., 2018). Therefore, inclusion of a HC group accounts for task performance estimates of individuals who may not experience SC impairment.
Additionally, BB-SCOPE tests share considerably more variance with performance-based measures of competence (SSPA, UPSA-B) than with ratings of real-world functioning. This was also the case in SCOPE, and abbreviating the battery has not increased the correlation between SC measures and rating scale indices of everyday outcomes. Thus, the broad limitation that traditional assessments of SC impairment manifest somewhat limited relationships with real-world functioning applies here as well. While BB-SCOPE demonstrates utility for assessing SC impairment, there are informant and self-report measures of SC such as the Observable Social Cognition Rating Scale (OSCARS; Halverson et al., 2020; Healey et al., 2015), similar to the Schizophrenia Cognition Rating Scale (SCoRS; Keefe et al., 2006) for NC, which demonstrate stronger relationships with indicators of functioning (Jones et al., 2019; Silberstein et al., 2018). However, in the SCOPE studies, obtaining high-quality clinician informant reports was possible for fewer than half of the participants, cautioning against reliance on informant reports, and self-reports of cognitive abilities can be biased in SSD participants (Burton et al., 2016; Lysaker et al., 2013).
Additionally, more nuanced approaches to SC assessment, such as the inclusion of introspective accuracy (self-evaluation of one’s own ability compared to observed ability; Harvey and Pinkham, 2015) and overconfidence (overestimation of task performance captured by confidence ratings; Silberstein and Harvey, 2019), have been shown to improve correlations with functioning and with performance on other tasks, and to improve differentiation of SSD from HC individuals (Badal et al., 2021; Perez et al., 2020). While BB-SCOPE offers an efficient and straightforward approach to establishing marked SC impairment, other approaches should be considered, especially when there is a focus on SC ability or change over time (rather than identification of impairment) and relationships with functional outcomes.
A final limitation is that the estimates and psychometric properties of brief SC tasks are based on secondary analysis of the SCOPE Study. While the estimated time of BB-SCOPE is 15 minutes, BB-SCOPE needs to be validated with additional data collection. Future research administering BB-SCOPE can provide information about observed battery duration and agreement with estimated psychometric properties.
Future Directions
While BB-SCOPE was developed to improve dissemination of SC assessment, there is ample opportunity for the improvement of brief assessment of SC. Importantly, there remains a gap in the measurement of social perception, with few tasks demonstrating acceptable psychometric properties and unique relationships with functioning. Another future direction is the development of brief SC tasks with demonstrated sensitivity to change appropriate for use in clinical trials, rather than the focus of BB-SCOPE on identification of marked SC impairment (Vaskinn and Horan, 2020). The BB-SCOPE may be best utilized as a short screen to identify severe SC impairment in SSD and to better understand if remedial interventions are broadly effective in this group.
A final future direction for consideration is the underlying structure of SC in SSD. The present study conceptualized SC as a four-dimensional construct (i.e., attributional bias, emotion processing, social perception, and theory of mind) identified by experts in the field of SC in SSD (Pinkham et al., 2014), although more recent research indicates simpler structures of SC may also be appropriate (Buck et al., 2016a; Mike et al., 2019; Riedel et al., 2021). Future work exploring the underlying factor structure of SC may identify more parsimonious and efficient SC batteries to assess SC in SSD.
Conclusion
The BB-SCOPE is a brief battery for identifying severe SC impairment which includes three tasks (i.e., AIHQ-B, BLERT-B, Hinting Task – full length) assessing three domains of SC (i.e., attributional bias, emotion processing, and theory of mind). BB-SCOPE is an efficient battery with a duration of 15 minutes, good psychometric properties, and good criterion and predictive validity. Development of BB-SCOPE is meant to improve dissemination of SC assessment as well as decrease heterogeneity in identification of SC impairment. To this end, BB-SCOPE now meets most criteria outlined by the National Institute of Mental Health when selecting the Computerized Neurocognitive Battery as a measure of neurocognition for the PhenX Toolkit for psychosis: comprehensive, easily administered, scalable in a variety of settings, and established psychometric properties in SSD (Öngür et al., 2020). The BB-SCOPE is also an efficient battery for use in clinical practice to identify individuals with marked SC impairment who may optimally benefit from psychosocial interventions targeting SC and related constructs to improve functioning.
Supplementary Material
Appendix A
BB-SCOPE Sample Scoring
To score the AIHQ-B, BLERT-B, and Hinting tasks into a composite score, raw scores are converted as follows: sum total points for each task and divide by the total number of points possible. Multiply this proportion correct by 33.33 to weight each task equally, round to the nearest integer (e.g., 2.61 rounds to 3, 2.42 rounds to 2), and then sum the three individual task scores. Scores below 60 correspond with functional impairment and suggest impaired social cognition. Scores at or above 60 correspond with intact functioning. An example of the scoring is provided below.

BB-SCOPE Scoring
To score the AIHQ-B, BLERT-B, and Hinting tasks into a composite score, raw scores are converted as follows: sum total points for each task and divide by the total number of points possible. Multiply this proportion correct by 33.33 to weight each task equally, round to the nearest integer (e.g., 2.61 rounds to 3, 2.42 rounds to 2), and then sum the three individual task scores. Scores below 60 correspond with functional impairment and suggest impaired social cognition. Scores at or above 60 correspond with intact functioning.
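The scoring steps above can be sketched in code. This is a minimal illustration under stated assumptions, not an official scoring program: the maximum point values in `MAX_POINTS` and the raw scores are hypothetical placeholders and should be replaced with the values from the actual task forms. Any task-specific recoding of raw scores before this step is not shown.

```python
import math

# Hypothetical maximum raw points per task (placeholders, not from the source).
MAX_POINTS = {"AIHQ-B": 25, "BLERT-B": 14, "Hinting": 20}

CUTOFF = 60  # recommended cut-point: totals below 60 suggest SC impairment


def round_half_up(x):
    """Round to the nearest integer, with .5 rounding up (2.61 -> 3, 2.42 -> 2)."""
    return math.floor(x + 0.5)


def task_score(raw, max_points):
    """Proportion of possible points, weighted by 33.33 and rounded."""
    if not 0 <= raw <= max_points:
        raise ValueError("raw score must be between 0 and max_points")
    return round_half_up(raw / max_points * 33.33)


def bb_scope_total(raw_scores, max_points=MAX_POINTS):
    """Sum of the three weighted, rounded task scores."""
    return sum(task_score(raw_scores[task], max_points[task])
               for task in max_points)


# Hypothetical raw scores for one participant.
example_raw = {"AIHQ-B": 15, "BLERT-B": 10, "Hinting": 13}
total = bb_scope_total(example_raw)
impaired = total < CUTOFF
print(total, "impaired" if impaired else "intact")
```

An explicit round-half-up helper is used because Python's built-in `round` rounds exact halves to the nearest even integer, which may not match hand scoring.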

CRediT Author Statement
Tate Halverson: Conceptualization, Formal analysis, Writing – original draft, Visualization; Amy Pinkham: Conceptualization, Methodology, Investigation, Resources, Writing – review and editing, Funding acquisition, Project administration; Philip Harvey: Conceptualization, Methodology, Investigation, Resources, Writing – review and editing, Funding acquisition, Project administration; David Penn: Conceptualization, Methodology, Investigation, Writing – review and editing, Resources, Supervision, Funding acquisition, Project administration.
Conflicts of Interest
Dr. Harvey has received consulting fees or travel reimbursements from Alkermes, Bio Excel, Boehringer Ingelheim, Karuna Pharma, Merck Pharma, Minerva Pharma, SK Pharma, and Sunovion Pharma during the past year. He receives royalties from the Brief Assessment of Cognition in Schizophrenia (Owned by Verasci, Inc. and contained in the MCCB). He is chief scientific officer of i-Function, Inc. Dr. Halverson, Dr. Pinkham, and Dr. Penn have no conflicts of interest to disclose.
References
- Akoglu H, 2018. User’s guide to correlation coefficients. Turkish J. Emerg. Med 18, 91–93. 10.1016/j.tjem.2018.08.001 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Badal VD, Depp CA, Hitchcock PF, Penn DL, Harvey PD, Pinkham AE, 2021. Computational methods for integrative evaluation of confidence, accuracy, and reaction time in facial affect recognition in schizophrenia. Schizophr. Res. Cogn 25, 100196. 10.1016/j.scog.2021.100196 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Baron-Cohen S, Wheelwright S, Hill J, Raste Y, Plumb I, Hill J, Plumb I, Wheelwright S, 2001. The “Reading the Mind in the Eyes” Test Revised Version: A study with normal adults, and adults with asperger syndrome or high-functioning autism. J. Child Psychol. Psychiatry 42, 241–251. 10.1111/1469-7610.00715 [DOI] [PubMed] [Google Scholar]
- Bock RD, 1997. A Brief History of Item Theory Response. Educ. Meas. Issues Pract 16, 21–33. 10.1111/j.1745-3992.1997.tb00605.x [DOI] [Google Scholar]
- Bortolotti SLV, Tezza R, de Andrade DF, Bornia AC, de Sousa Júnior AF, 2013. Relevance and advantages of using the item response theory. Qual. Quant 47, 2341–2360. 10.1007/s11135-012-9684-5 [DOI] [Google Scholar]
- Bryson G, Bell M, Lysaker P, 1997. Affect recognition in schizophrenia: A function of global impairment or a specific cognitive deficit. Psychiatry Res. 71, 105–113. 10.1016/S0165-1781(97)00050-4 [DOI] [PubMed] [Google Scholar]
- Buck B, Kern RS, Marder SR, Penn DL, Healey KM, Green MF, Lee J, Reise SP, Horan WP, Iwanski C, Healey KM, Green MF, Horan WP, Kern RS, Lee J, Marder SR, Reise SP, Penn DL, 2017. Improving measurement of attributional style in schizophrenia; A psychometric evaluation of the Ambiguous Intentions Hostility Questionnaire (AIHQ). J. Psychiatr. Res 89, 48–54. 10.1016/j.jpsychires.2017.01.004 [DOI] [PubMed] [Google Scholar]
- Buck BE, Healey KM, Gagen EC, Roberts DL, Penn DL, 2016. Social cognition in schizophrenia: Factor structure, clinical and functional correlates. J. Ment. Heal 25, 330–337. 10.3109/09638237.2015.1124397 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Buck BE, Pinkham AE, Harvey PD, Penn DL, 2016. Revisiting the validity of measures of social cognitive bias in schizophrenia: Additional results from the social cognition psychometric evaluation (SCOPE) study. Br. J. Clin. Psychol 55, 441–454. 10.1007/s11065-015-9294-9. Functional [DOI] [PMC free article] [PubMed] [Google Scholar]
- Burton CZ, Harvey PD, Patterson TL, Twamley EW, 2016. Neurocognitive insight and objective cognitive functioning in schizophrenia. Schizophr. Res 171, 131–136. 10.1016/j.schres.2016.01.021 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chalmers RP, 2012. mirt: A multidimensional item response theory package for the R environment. J. Stat. Softw 48, 1–29. 10.18637/jss.v048.i06 [DOI] [Google Scholar]
- Combs DR, Penn DL, Wicher M, Waldheter E, 2007. The Ambiguous Intentions Hostility Questionnaire (AIHQ): A new measure for evaluating hostile social-cognitive biases in paranoia. Cogn. Neuropsychiatry 12, 128–143. 10.1080/13546800600787854 [DOI] [PubMed] [Google Scholar]
- Corcoran R, Mercer G, Frith CD, 1995. Schizophrenia, symptomatology and social inference: Investigating “theory of mind” in people with schizophrenia. Schizophr. Res 17, 5–13. 10.1016/0920-9964(95)0024-g [DOI] [PubMed] [Google Scholar]
- Cornacchio D, Pinkham AE, Penn DL, Harvey PD, 2017. Self-assessment of social cognitive ability in individuals with schizophrenia: Appraising task difficulty and allocation of effort. Schizophr. Res 179, 85–90. 10.1016/j.schres.2016.09.033 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Deckler E, Hodgins GE, Pinkham AE, Penn DL, Harvey PD, 2018. Social cognition and neurocognition in schizophrenia and healthy controls: Intercorrelations of performance and effects of manipulations aimed at increasing task difficulty. Front. Psychiatry 9, 1–10. 10.3389/fpsyt.2018.00356 [DOI] [PMC free article] [PubMed] [Google Scholar]
- DeVellis RF, 2017. Scale Development: Theory and Applications (4th ed.). Thousand Oaks, CA: Sage. [Google Scholar]
- Dodell-Feder D, Ressler KJ, Germine LT, 2020. Social cognition or social class and culture? On the interpretation of differences in social cognitive performance. Psychol. Med 50, 133–145. 10.1017/S003329171800404X [DOI] [PubMed] [Google Scholar]
- Green MF, Horan WP, Lee J, 2010. Social cognition in schizophrenia. Curr. Dir. Psychol. Sci 19, 243–248. 10.1038/nrn4005 [DOI] [Google Scholar]
- Hajdúk M, Harvey PD, Penn DL, Pinkham AE, 2018. Social cognitive impairments in individuals with schizophrenia vary in severity. J. Psychiatr. Res 104, 65–71. 10.1016/j.jpsychires.2018.06.017 [DOI] [PubMed] [Google Scholar]
- Halverson TF, Hajdúk M, Pinkham AE, Harvey PD, Jarskog LF, Nye L, Penn DL, 2020. Psychometric properties of the Observable Social Cognition Rating Scale (OSCARS): Self-report and informant-rated social cognitive abilities in schizophrenia. Psychiatry Res. 286, 112891. 10.1016/j.psychres.2020.112891 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Halverson TF, Orleans-Pobee M, Merritt C, Sheeran P, Fett A-K, Penn DL, 2019. Pathways to functional outcomes in schizophrenia spectrum disorders: Meta-analysis of social cognitive and neurocognitive predictors. Neurosci. Biobehav. Rev 105, 212–219. 10.1016/j.neubiorev.2019.07.020 [DOI] [PubMed] [Google Scholar]
- Harvey PD, Penn D, 2010. Social cognition: the key factor predicting social outcome in people with schizophrenia? Psychiatry (Edgmont). 7, 41–4. [PMC free article] [PubMed] [Google Scholar]
- Harvey PD, Pinkham A, 2015. Impaired self-assessment in schizophrenia: Why patients misjudge their cognition and functioning. Curr. Psychiatr 14, 53–59. [Google Scholar]
- Healey KM, Combs DR, Gibson CM, Keefe RSE, Roberts DL, Penn DL, 2015. Observable Social Cognition - A Rating Scale: An interview-based assessment for schizophrenia. Cogn. Neuropsychiatry 20, 198–221. 10.1080/13546805.2014.999915 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Honan CA, McDonald S, Sufani C, Hine DW, Kumfor F, 2016. The awareness of social inference test: Development of a shortened version for use in adults with acquired brain injury. Clin. Neuropsychol 30, 243–264. 10.1080/13854046.2015.1136691 [DOI] [PubMed] [Google Scholar]
- Jones MT, Deckler E, Laurrari C, Jarskog LF, Penn DL, Pinkham AE, Harvey PD, 2019. Confidence, performance, and accuracy of self-assessment of social cognition: A comparison of schizophrenia patients and healthy controls. Schizophr. Res. Cogn 0–1. 10.1016/j.scog.2019.01.002 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kay SR, Fiszbein A, Opler LA, 1987. The positive and negative syndrome scale (PANSS) for schizophrenia. Schizophr. Bull 13, 261–276. [DOI] [PubMed] [Google Scholar]
- Keefe RSE, Goldberg TE, Harvey PD, Gold JM, Poe MP, Coughenour L, 2004. The Brief Assessment of Cognition in Schizophrenia: Reliability, sensitivity, and comparison with a standard neurocognitive battery. Schizophr. Res 68, 283–297. 10.1016/j.schres.2003.09.011 [DOI] [PubMed] [Google Scholar]
- Keefe RSE, Poe M, Walker TM, Kang JW, Harvey PD, 2006. The Schizophrenia Cognition Rating Scale: An interview-based assessment and its relationship to cognition, real-world functioning, and functional capacity. Am. J. Psychiatry 163, 426–432. 10.1176/appi.ajp.163.3.426 [DOI] [PubMed] [Google Scholar]
- Kittel AFD, Olderbak S, Wilhelm o., 2021. Sty in the mind’s eye: A meta-analytic investigation of the nomological network and internal consistency of the “Reading the Mind in the Eyes” test. Assessment (ahead of print). 10.1177/1073191121996469 [DOI] [PubMed] [Google Scholar]
- Klein HS, Springfield CR, Bass E, Ludwig K, Penn DL, Harvey PD, Pinkham AE, 2020. Measuring mentalizing: A comparison of scoring methods for the hinting task. Int. J. Methods Psychiatr. Res 29, 1–11. 10.1002/mpr.1827 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kohler CG, Turner TH, Bilker WB, Brensinger CM, Siegel SJ, Kanes SJ, Gur RE, Gur RC, 2003. Facial emotion recognition in schizophrenia: Intensity effects and error pattern. Am. J. Psychiatry 160, 1768–1774. 10.1176/appi.ajp.160.10.1768 [DOI] [PubMed] [Google Scholar]
- Kraemer HC, Kupfer DJ, Clarke DE, Narrow WE, Regier DA, 2012. DSM-5: How Reliable is Reliable Enough? Am J Psychiatry 169, 13–15. 10.1176/appi.ajp.2011.11010050 [DOI] [PubMed] [Google Scholar]
- Kuha J, 2004. AIC and BIC: Comparisons of assumptions and performance. Sociol. Methods Res 33, 188–229. 10.1177/0049124103262065 [DOI] [Google Scholar]
- Lysaker PH, Vohs J, Ohayon IH, Kukla M, Wierwille J, Dimaggio G, 2013. Depression and insight in schizophrenia: Comparisons of levels of deficits in social cognition and metacognition and internalized stigma across three profiles. Schizophr. Res 148, 18–23. 10.1016/j.schres.2013.05.025 [DOI] [PubMed] [Google Scholar]
- Mausbach BT, Harvey PD, Goldman SR, Jeste DV, Patterson TL, 2007. Development of a brief scale of everyday functioning in persons with serious mental illness. Schizophr. Bull 33, 1364–1372. 10.1093/schbul/sbm014 [DOI] [PMC free article] [PubMed] [Google Scholar]
- McDonald S, Flanagan S, Rollins J, Kinch J, 2003. TASIT: A new clinical tool for assessing social perception after traumatic brain injury. J. Head Trauma Rehabil 18, 219–238. 10.1097/00001199-200305000-00001 [DOI] [PubMed] [Google Scholar]
- Mike L, Guimond S, Kelly S, Thermenos H, Mesholam-Gately R, Eack S, Keshavan M, 2019. Social cognition in early course of schizophrenia: Exploratory factor analysis. Psychiatry Res. 272, 737–743. 10.1016/j.psychres.2018.12.152 [DOI] [PubMed] [Google Scholar]
- Nuechterlein KH, Green MF, Kern RS, Baade LE, Barch DM, Cohen JD, Essock S, Fenton WS, Frese FJ 3rd, Gold JM, Goldberg T, Heaton RK, Keefe RSE, Kraemer H, Mesholam-Gately R, Seidman LJ, Stover E, Weinberger DR, Young AS, Zalcman S, Marder SR, 2008. The MATRICS Consensus Cognitive Battery, part 1: Test selection, reliability, and validity. Am. J. Psychiatry 165, 203–213. 10.1176/appi.ajp.2007.07010042 [DOI] [PubMed] [Google Scholar]
- Oakley BFM, Brewer R, Bird G, Catmur C, 2016. Theory of mind is not theory of emotion: A cautionary note on the Reading the Mind in the Eyes Test. J. Abnorm. Psychol 125, 818. 10.1037/abn0000182 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Öngür D, Carter CS, Gur RE, Perkins D, Sawa A, Seidman LJ, Tamminga C, Huggins W, Hamilton C, 2020. Common data elements for national institute of mental health–funded translational early psychosis research. Biol. Psychiatry Cogn. Neurosci. Neuroimaging 5, 10–22. 10.1016/j.bpsc.2019.06.009 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Patterson TL, Moscona S, McKibbin CL, Davidson K, Jeste DV 2001. Social skills performance assessment among older patients with schizophrenia. Schizophr. Res 48, 351–360. 10.1016/s0920-9964(00)00109-2 [DOI] [PubMed] [Google Scholar]
- Penn DL, Corrigan PW, Bentall RP, Racenstein JM, Newman L, 1997. Social cognition in schizophrenia. Psychol. Bull 121, 114–132. 10.1037/0033-2909.121.1.114 [DOI] [PubMed] [Google Scholar]
- Perez MM, Tercero BA, Penn DL, Pinkham AE, Harvey PD, 2020. Overconfidence in social cognitive decision making: Correlations with social cognitive and neurocognitive performance in participants with schizophrenia and healthy individuals. Schizophr. Res 224, 51–57. 10.1016/j.schres.2020.10.005 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Petrillo J, Cano SJ, McLeod LD, Coon CD, 2015. Using classical test theory, item response theory, and rasch measurement theory to evaluate patient-reported outcome measures: A comparison of worked examples. Value Heal. 18, 25–34. 10.1016/j.jval.2014.10.005 [DOI] [PubMed] [Google Scholar]
- Pinkham AE, Harvey PD, Penn DL, 2018. Social Cognition Psychometric Evaluation: Results of the Final Validation Study. Schizophr. Bull 44, 737–748. 10.1093/schbul/sbx117 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Pinkham AE, Harvey PD, Penn DL, 2016a. Paranoid individuals with schizophrenia show greater social cognitive bias and worse social functioning than non-paranoid individuals with schizophrenia. Schizophr. Res. Cogn 3, 33–38. 10.1016/j.scog.2015.11.002
- Pinkham AE, Penn DL, Green MF, Buck B, Healey K, Harvey PD, 2014. The social cognition psychometric evaluation study: Results of the expert survey and RAND Panel. Schizophr. Bull 40, 813–823. 10.1093/schbul/sbt081
- Pinkham AE, Penn DL, Green MF, Harvey PD, 2016b. Social cognition psychometric evaluation: Results of the initial psychometric study. Schizophr. Bull 42, 494–504. 10.1093/schbul/sbv056
- Riedel P, Horan WP, Lee J, Hellemann GS, Green MF, 2021. The factor structure of social cognition in schizophrenia: A focus on replication with confirmatory factor analysis and machine learning. Clin. Psychol. Sci 9, 38–52. 10.1177/2167702620951527
- Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez JC, Muller M, 2011. pROC: An open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 12, 77. 10.1186/1471-2105-12-77
- Rosset E, 2008. It’s no accident: Our bias for intentional explanations. Cognition 108, 771–780. 10.1016/j.cognition.2008.07.001
- Schneider LC, Struening EL, 1983. SLOF: A behavioral rating scale for assessing the mentally ill. Soc. Work Res. Abstr 19, 9–21.
- Sergi MJ, Fiske AP, Horan WP, Kern RS, Kee KS, Subotnik KL, Nuechterlein KH, Green MF, 2009. Development of a measure of relationship perception in schizophrenia. Psychiatry Res. 166, 54–62. 10.1016/j.psychres.2008.03.010
- Silberstein J, Harvey PD, 2019. Cognition, social cognition, and self-assessment in schizophrenia: Prediction of different elements of everyday functional outcomes. CNS Spectr. 1–6. 10.1017/S1092852918001414
- Silberstein JM, Pinkham AE, Penn DL, Harvey PD, 2018. Self-assessment of social cognitive ability in schizophrenia: Association with social cognitive test performance, informant assessments of social cognitive ability, and everyday outcomes. Schizophr. Res 199, 75–82. 10.1016/j.schres.2018.04.015
- Thissen D, Orlando M, 2001. Item response theory for items scored in two categories.
- Vaskinn A, Horan WP, 2020. Social cognition and schizophrenia: Unresolved issues and new challenges in a maturing field of research. Schizophr. Bull 46, 464–470. 10.1093/schbul/sbaa034
- Velthorst E, Fett AKJ, Reichenberg A, Perlman G, Van Os J, Bromet EJ, Kotov R, 2017. The 20-year longitudinal trajectories of social functioning in individuals with psychotic disorders. Am. J. Psychiatry 174, 1075–1085. 10.1176/appi.ajp.2016.15111419
- Ventura J, Reise SP, Keefe RSE, Baade LE, Gold JM, Green MF, Kern RS, Mesholam-Gately R, Nuechterlein KH, Seidman LJ, Bilder RM, 2010. The Cognitive Assessment Interview (CAI): Development and validation of an empirically derived, brief interview-based measure of cognition. Schizophr. Res 121, 24–31. 10.1016/j.schres.2010.04.016
- Youden WJ, 1950. Index for rating diagnostic tests. Cancer 3, 32–35.