2019 Jul 4;10:425. doi: 10.3389/fpsyt.2019.00425

Table 3.

Strengths and limitations of more popular social cognitive assessments used in neuropsychiatric populations.

Measure	Strengths	Limitations
Sally Anne Task	• Can be used with children • Tests understanding of both first- and second-order belief • False belief tasks in general are established tests of ToM available in a variety of forms • Relatively pure measure of cognitive ToM	• Not originally designed for adults • Executive functions affect performance (93, 94, 139) • Format of presentation will also influence performance (139, 470)
Strange Stories	• Validity, e.g., correlated with measures of relational perspective taking (156, 471) and the Faux Pas Task (157) • Associated with social competence in epilepsy (157) • Includes control-type “physical” stories • Insight offered by multiple scoring techniques including number of mental states attributed, appropriateness and quality (149) • Naturalistic style task (149)	• Performance is affected by reading comprehension (155), IQ (153, 163, 471–473), and executive function (145, 161, 163) • General inferential ability, social norms, and autobiographical memory may influence performance (474) • Typical children don’t reach ceiling (474) • Different studies use different length versions • Lack of vocal cues limits ecological validity (146) • Physical (control) stories are not well matched (152) • Age effects (145, 475)
The Yoni Task	• Tests both cognitive and affective mental states, and first- and second-order belief • Visual task which could reduce working memory demand • Easy to present and works well with children (175) • Validity supported by correlations with, e.g., false belief tasks (67) • Affective trials can be related to a quality of life measure in Parkinson’s disease (172) • The authors also developed a related task to assess understanding of socially competitive emotions	• Executive functions (175, 176, 454) and IQ can affect performance (176, 454) • It is not clear whether these factors differentially influence the cognitive and affective aspects, i.e., whether the demands of all trials are comparable • Simply relying on eye gaze direction may help answer some trials, although there are some control trials with eye gaze straight ahead
Animations Task	• Can be used to reveal both hypo- and hyper-mentalizing • Can assess spontaneous mental state reasoning, therefore has good ecological validity, and may be more challenging and sensitive than some other tasks • Non-facial as well as non-verbal stimuli • Multiple scores meaning complex patterns of performance and selective deficits can be identified • Can be related to social, school, and occupational functioning in schizophrenia (476) • Responses can be scored for length as a control	• Complex scoring and transcription required, and multiple raters are needed • The clips are short: standardized instructions are required regarding the number of viewings permitted • Experimenter must avoid providing cues as to the nature of the task • Verbal abilities from speech to vocabulary will influence response quality (e.g., 186) and visual attention may affect performance • Possible gender effect (186) • The video clips are not matched across conditions in terms of length or complexity
Intentions Comic Strip Task	• Avoids verbal demands, which makes it accessible across cultures and enhances the purity of the measure • Useful for fMRI experiments (e.g., 191) • Contains useful control conditions • Factor analysis supports the validity of the three conditions (477) • Taps implicit reasoning • Fairly pure measure of cognitive ToM	• Possible ceiling effect in controls (478) • Used in few clinical groups overall • Studies have yet to explore the contribution of, e.g., executive functions to task performance
Pictures of Facial Affect	• Can be used to reveal emotion-specific deficits • Suitable for use with children (479) • May be a sensitive measure in terms of tracking disorder state (e.g., 210) • Performance can indicate carer burden (480) • Inclusion of neutral trials can offer particular insight (481) • Validity supported by associations with other social cognitive tasks (223)	• Only assesses recognition of basic emotions and mainly negative emotions • Motor contribution unknown • Associated with global cognition or education (238, 482) and IQ (211, 216) • Interpretation is complex as performance could be impaired by self-awareness (483), problems with motor simulation, or memory • Possible gender (484) and age effects (479, 485, 486) • Time-limited format may lead to guessing (222) • Little ethnic variation in stimuli, grayscale, old-fashioned (479) • Ecological validity is limited by the use of static images • Possible effects of field of presentation (487)
The Assessment of Social Inference Test	• No ceiling effect (488) • Linked to functional outcome/social skills in schizophrenia (238, 274) and in traumatic brain injury (489), as well as caregiver burden (231, 232) • Comprehensive and naturalistic, as taps ability to use a range of skills in combination, including facial expression and other non-verbal cues (490) • Good construct and convergent validity as related to other perspective taking measures (230) and IRI (242) • More challenging and less contrived than facial expressions • Lots of norms available for scoring • Dynamic, not static, so better predictive value (491) • Indexes frontal lobe volume loss in fronto-temporal dementia (234) • Good psychometrics (223)	• Age effect (228, 238, 492–494) • Performance is influenced by vocabulary (494, 495), IQ (249, 489), education (238), and executive functions (228–230, 245, 496) including processing speed and working memory (223) • Motor component is unclear (497) • Lengthy task for impaired patients, although a short version is now available (496) • Surprise items are poor (230) • Forced-choice response format limits ecological validity (242) • Impairments could simply reflect poor face emotion recognition as this is correlated (209, 249, 489)
Movie for the Assessment of Social Cognition	• Can detect both hypo- and hyper-mentalizing • Tests understanding of both cognitive and affective mental state reasoning and fine-grained assessment that can reveal selective deficits (69, 259) • Reliable in adolescents (260, 498) • Good psychometrics (250) including internal consistency and reliability (263, 273) • Ecologically valid (267) • Not related to verbal IQ (69) • Validity supported by correlations with other social cognitive tasks (150, 151, 260, 499) but not always correlated with other social cognitive tasks (273) • Not affected by culture or social desirability (150, 151)	• Depression, IQ, and executive functions can affect performance (255, 265, 501) • Age effects (265, 270, 499) • Uses only second-person perspective and participant is observer (499), should add self-referent aspect (271) • Long time to administer and score—45–70 min (150, 151) • Use of contextual cues could mask a deficit (468) • Stress can affect performance (502) • Need trained raters (69, 259) • Doesn’t tap implicit social cognition (250) • Further psychometric analysis would be helpful
Hinting Task	• Takes less than 10 min to administer (278) • Strong test–retest reliability and good internal consistency (500) • Not associated with IQ (294, 503) • Validity supported by correlation with spoken prosody (504) and correlates with other social cognitive tasks, e.g., emotion recognition (505) • Related to social functioning in schizophrenia (274, 506) • Not associated with referential thinking in general (507, 508)	• Potential ceiling effect (274, 275, 300) • Only assesses cognitive ToM • Poor test–retest reliability and practice effect (274) • Highly dependent on verbal comprehension (293) and associated with IQ (509) • Executive function may affect performance (504, 510–514), especially processing speed and memory (297) • Age effect (301)
Reading the Mind in the Eyes Test	• Validity supported by strong association with other social cognitive measures, e.g., Hinting task (506), IRI-PT (515) but perhaps only a weak correlation with autism spectrum quotient (516) • No ceiling in controls, can examine positive, negative, and neutral trials separately (e.g., 382–384) and use RT to offer insight (382–384, 517) • Scores remain stable over time (518) • Short administration time (typically 10–15 min) • Can be used across cultures (349), and many translations exist • Not just basic emotion recognition (519) • Associated with social factors such as maternal functioning (520), social isolation (506), and clinical change in psychosis (521) • Test–retest reliability is fairly good for the child version of RMET and one study demonstrated no learning effects (522)	• Gender effects are debated (361, 515, 518, 523–525) • Performance is associated with visuospatial skills (512), reading (526), autobiographical memory (527), IQ (528–532), and executive function (298, 533, 534) • Debate as to whether stress affects performance (502, 535) • Age effects (160, 523, 536) • Cronbach’s alpha can be low (312, 537) • The stimuli were restricted to only Caucasians in the original task, and there is a gender confound, as the males are older, less attractive, and more negative (538) • Ecological validity is also weakened by static images, specificity of cues, and forced-choice response format • Better control tasks are needed (539) • Debate over whether the task measures cognitive or affective ToM, or empathy, or emotion recognition (261) • Some items have floor or ceiling effects
Faux Pas Task	• Used to test cognitive and affective ToM, with multiple layers of difficulty, and fine-grained analysis possible • Control stories are included and can indicate hyper-mentalizing as well as hypo-mentalizing • Mimics real life • Associated with other social cognitive tasks and quality of life in epilepsy (373) • Can adapt to other cultures (137) • Associated with prosody deficit/indirect speech understanding (540, 541) and RMET performance in some studies (542) but not others (543, 544) • Associated with carer behavior ratings (545) and mixed findings for social functioning in schizophrenia (366, 546)	• A verbal task that makes cognitive demands beyond mental state reasoning (474) • Accuracy may reflect use of social norms and scripts, not just online reasoning about mental states, making this a “top-down” task (547) • Associated with education (548) and IQ (549), and executive function can affect performance (339, 378, 382–385, 546, 550, 551) • Scoring differences across studies (160) and some responses are difficult to score • The cognitive and affective questions may not be of comparable difficulty • Controls don’t always perform at ceiling • Antipsychotic medications may affect performance (552) • Little psychometric data
Interpersonal Reactivity Index	• A multidimensional measure that can be used to assess cognitive and affective empathy • Fast to administer, 15 min (447) • High convergent and discriminant validity (553) • Often associated with other social cognitive tasks (e.g., 341) • Psychophysiological data support the difference between cognitive and affective aspects (430) • Stable over time in schizophrenia (554) • Predicts functional capacity/psychosocial functioning in schizophrenia (555, 556) and psychosocial function in bipolar disorder (557) as well as being associated with carer burden (231, 232, 461) • Proxy version available and scores can be correlated, e.g., between parents and their adolescent children (558)	• Not associated with other empathy measures (559) • Self-report means potential for bias and difficulties due to insight or anosognosia (541) • Social desirability can be a problem, e.g., in forensic populations (560), so more objective measures are needed (561) • Cognitive and affective subscales and combinations have questionable validity (562) and the factor structure can be challenged (563): the scale may be less valid for affective empathy (564) • The PD subscale has weakest internal consistency (565), plus this subscale is self-oriented and neither it nor the F subscale measures true empathy (566) • Gender effect (567–569) • Scores can be associated with executive function (450) • Age effect (570)

Limitations are raised by the author where no reference is given. Factors such as ceiling effects and the specificity of the measure could be considered both strengths and limitations. A ceiling effect in controls could mean a task can highlight a profound deficit in patients, but no ceiling effect may mean greater sensitivity, whereas task specificity can help to reveal a precise deficit to target with intervention, although a more global perspective on social cognitive performance may also be needed.