Author manuscript; available in PMC: 2018 Dec 1.
Published in final edited form as: Psychol Assess. 2017 Feb 23;29(12):1537–1542. doi: 10.1037/pas0000439

Test-Retest Reliability of the Facial Expression Labeling Task

Jennifer L Cecilione, Lance M Rappaport, Brad Verhulst, Dever M Carney, RJR Blair, Melissa A Brotman, Ellen Leibenluft, Daniel S Pine, Roxann Roberson-Nay, John M Hettema
PMCID: PMC5568997  NIHMSID: NIHMS846327  PMID: 28230406

Abstract

Recognizing others’ emotional expressions is vital for socioemotional development; impairments in this ability occur in several psychiatric disorders. Further study is needed to map the development of this ability and to evaluate its components as potential transdiagnostic endophenotypes. Before doing so, however, research is required to substantiate the test-retest reliability of scores from the face emotion identification tasks linked to developmental psychopathology. The current study estimated the test-retest reliability of scores from one such task, the facial expression labeling task (FELT), in a sample of twin children (N = 157; ages 9–14). Participants completed the FELT at two visits two to five weeks apart. Participants identified the emotion shown by faces depicting six emotions (i.e., happiness, anger, sadness, fear, surprise, and disgust), each morphed with a neutral face to provide 10 levels of increasing emotional expressivity. The present study found strong test-retest reliability (Pearson r) of the FELT scores across all emotions. Results suggested that data from this task may be effectively analyzed using a latent growth curve model to estimate overall ability (i.e., intercept; r’s = 0.76–0.85) and improvement as emotions become clearer (i.e., linear slope; r’s = 0.69–0.83). Evidence of high test-retest reliability of this task’s scores informs future developmental research and the potential identification of transdiagnostic endophenotypes for child psychopathology.

Keywords: test-retest reliability, face emotion identification, facial expression labeling task, children


The ability to identify facial expressions is crucial for healthy socioemotional development (Marsh & Blair, 2008; Trentacosta & Fine, 2010). Deficits in face emotion identification have been associated with several psychiatric disorders (Brotman et al., 2008; Shaw, Stringaris, Nigg, & Leibenluft, 2016; Blair, Colledge, Murray, & Mitchell, 2001; Trentacosta & Fine, 2010). Evidence that face emotion identification deficits are implicated in a myriad of psychiatric disorders suggests that components of face emotion identification may be transdiagnostic endophenotypes of psychiatric illnesses.

Research regarding face emotion identification deficits in internalizing disorders has resulted in mixed findings (e.g., Easter et al., 2005, but see Guyer et al., 2007). This may be because different paradigms are used to assess face emotion identification ability, which hinders the ability to reconcile discordant findings in this field (Bourke et al., 2010). Some of the various face emotion identification tasks include the Child and Adult Facial Expressions subtests of the Diagnostic Analysis of Nonverbal Accuracy (Nowicki & Duke, 1994), the Penn emotion recognition test (Kohler et al., 2003), and the Bell Lysaker emotion recognition task (Pinkham, Penn, Green, & Harvey, 2016). Other studies use variations of the facial expression labeling task (FELT), such as the dynamic emotion identification task (Kirsh & Mounts, 2007) and the emotional expression multimorph task (Blair et al., 2001).

Since task heterogeneity creates a methodological confound in aggregating results to draw reliable conclusions about face emotion identification ability, the current study sought to evaluate a promising version of the task by determining the test-retest reliability of FELT scores; this task has been used in several previous studies in varying forms (e.g., Averbeck, Bobin, Evans, & Shergill, 2012; Kirkpatrick, Lee, Wardle, Jacob, & de Wit, 2014; Marsh, Yu, Pine, & Blair, 2010). The present study’s version of the FELT may be a more sensitive and preferred measure of face emotion identification ability than prior variations for several reasons.

First, this version uses all six of Ekman’s exemplar emotions, which is particularly important since prior research suggests that certain psychiatric disorders (e.g., major depressive disorder) may be associated with difficulty in, or a predilection toward, identifying particular emotions (e.g., sadness; Bourke et al., 2010). Including fewer emotions (e.g., Nowicki & Duke, 1994) may preclude establishing emotion-specific deficits. Second, this task presents each emotion at 10 to 100 percent emotional expressivity in 10 percent increments (Harmer, Rogers, Tunbridge, Cowen, & Goodwin, 2003; Marsh et al., 2010); this may provide a more realistic, nuanced display of facial expressions than other tasks (e.g., Nowicki & Duke, 1994). Third, in the present task, participants select a response choice after seeing each of 360 static pictures of faces; this is preferred to previous tasks wherein only one data point is collected when participants press a key to identify an emotion in a continuously morphing image (e.g., Kirsh & Mounts, 2007). The current paradigm provides information on face emotion identification ability at each level of emotional expressivity. Finally, in the present task, trials are presented in random order. When trials are presented in sequential order (e.g., Blair et al., 2001), responses at various levels of expressivity become non-independent, such that a participant’s response at one level may inform their response at the next.

Moreover, the analytic strategy used to assess accuracy often varies across studies of face emotion identification. One approach is to compute an individual’s raw accuracy, which conflates an individual’s ability to correctly identify an emotion with his or her tendency to endorse the emotion. The current study utilized the “unbiased hit rate”. This approach adjusts an individual’s raw accuracy (i.e., proportion of trials correctly identified) for an individual’s differential accuracy (i.e., proportion of correct uses of an emotion). The resulting score is then adjusted for guessing and non-normality (Marsh et al., 2010; Wagner, 1993). Additionally, in all variations of face emotion identification tasks, levels of emotional expressivity are nested within individuals and present a gradient over which one can estimate an intercept and rate of change as the expressed emotion becomes clearer (i.e., linear slope). At a minimum, the analysis of data from this and related tasks should account for the nonindependence of trials within person. However, existing research has not examined the test-retest reliability of face emotion identification tasks’ scores through analytic approaches that account for the structure of these data (e.g., multilevel/hierarchical modeling or latent growth curve modeling).

The present study provides the first examination of test-retest reliability of change in scores over increasing expressivity of emotional expression in pre-adolescent children. Prior research has reported adequate test-retest reliability of FELT scores among an adult community sample (Adams et al., 2015). Since the test-retest reliability of scores of any face emotion identification task using a child sample has not yet been established, the current study’s approach is novel in that it assesses test-retest reliability of FELT scores in a genetic epidemiological sample of pre-adolescent children. It is important to study face emotion identification ability in this demographic, as pre-adolescent children are learning, developing, and solidifying various psychological characteristics that will affect their developmental trajectories into adolescence and adulthood (Steinberg & Morris, 2001). Given that deficits in face emotion identification ability are correlated with a variety of psychopathology, intervening at this early stage in development is critical. Similar to previous research (Adams et al., 2015), the present study hypothesized that test-retest reliability would be high. Since findings from research using adult participants (i.e., Adams et al., 2015) cannot be assumed to apply to children, demonstrating high test-retest reliability in the current study is necessary to inform future developmental research on face emotion identification. In particular, high test-retest reliability of scores on a face emotion identification task is essential for future research to examine face emotion identification deficits in childhood as a marker for abnormal development and psychopathology. Additionally, this study used an analytic approach that is more consistent with the nature of the task (i.e., a latent growth curve model), in which performance can be examined as a function of decreasing task difficulty. 
This analytic approach permitted examining the reliability of both level, which is similar to the research described by Adams et al. (2015), and change as difficulty decreases. In this manner, the present study sought to extend the research by Adams et al. (2015) by demonstrating test-retest reliability in a child and adolescent sample and by examining the test-retest reliability of change during the task.

Method

Participants and Procedure

Participants (N = 796 individuals; 52.4% female) were recruited as twin pairs from the Mid-Atlantic Twin Registry (Lilley & Silberg, 2013) to participate in a study on juvenile anxiety risk. This genetic epidemiological sample comprised Caucasian twin children (aged 9 to 14 years; M = 10.78 years) who lived in the mid-Atlantic region. This study’s protocol was approved by Virginia Commonwealth University’s institutional review board. Consent was obtained from participants’ legal guardians and assent was obtained from participants. Participants attended two laboratory-based sessions that were two to five weeks apart. At visit 1 (V1), 393 twin pairs (786 individuals) completed the study’s full protocol. Of those who participated in visit 2 (V2), 157 individuals were assigned to a randomization order that included the FELT (i.e., a planned missingness design). The study’s full protocol is detailed elsewhere (Carney et al., 2015).

Measures

To measure zygosity, a parent or legal guardian of participants completed questionnaires about twins that included questions regarding physical similarities between them; these items have previously been used to determine the zygosity of twin participants, and the interpretations of the questionnaire’s results have shown good validity compared to blood testing (Kasriel & Eaves, 1976) and DNA evaluation of zygosity (Jackson, Snieder, Davis, & Treiber, 2001).

The task used herein is most similar to that described by Marsh et al. (2010). Participants sat with an experimenter in front of a computer and were given these instructions: “In this task, you will be shown pictures of people with various emotional expressions. After viewing the faces, please choose the correct emotion from the choices provided on the next screen”. Participants then completed one practice trial during which they were shown a picture of a man making a happy face and asked to identify the emotion from the list of Ekman’s six emotions.

Pictures were created by morphing a static photo of a person expressing the target emotion with a photo of the same person making a neutral expression. The lowest level of emotional expressivity was 10%, and the highest level was 100%; each emotion had 10 levels of emotional expressivity. Emotions reflected Ekman’s six basic emotions (i.e., happiness, sadness, disgust, fear, anger and surprise; Ekman & Friesen, 1976). All faces were Caucasian adults (50% female). Participants were presented with six trials of each emotion at each expressivity level for a total of 360 trials (6 trials by 6 emotions by 10 expressivity levels). Trials were presented in random order and consisted of a fixation cross for 250 ms, then the target face for 500 ms. Following each trial, participants were asked to choose (at their own pace) the emotion displayed from a list of the six possible emotions.
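The trial structure described above can be illustrated with a short sketch. It is not the E-Prime script used in the study; the names `build_trials`, `cell_counts`, `FIXATION_MS`, and `FACE_MS` are hypothetical, and the sketch ignores which actor's face appears on each trial because the text does not specify how faces were assigned.

```python
import random
from collections import Counter

# Design constants taken from the task description.
EMOTIONS = ["happiness", "sadness", "disgust", "fear", "anger", "surprise"]
LEVELS = list(range(10, 101, 10))  # 10%, 20%, ..., 100% expressivity
REPEATS = 6                        # trials per emotion per expressivity level
FIXATION_MS = 250                  # fixation cross duration
FACE_MS = 500                      # target face duration


def build_trials(seed=None):
    """Return all (emotion, level) trials in random order."""
    rng = random.Random(seed)
    trials = [
        (emotion, level)
        for emotion in EMOTIONS
        for level in LEVELS
        for _ in range(REPEATS)
    ]
    rng.shuffle(trials)  # random rather than sequential presentation
    return trials


def cell_counts(trials):
    """Count how many trials fall in each emotion-by-level cell."""
    return Counter(trials)


trials = build_trials(seed=0)
print(len(trials))  # 6 trials x 6 emotions x 10 levels
```

Shuffling the full list, rather than stepping through expressivity levels in order, reflects the random presentation order that the task relies on to keep responses at adjacent levels from informing one another.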

Data Analysis

Raw data were collected using E-Prime 2.0 software, scored in E-Merge software (Schneider, Eschman, & Zuccolotto, 2002) and exported to R for analysis. An unbiased hit rate was computed at each expressivity level for each emotion as the product of raw accuracy (i.e., number of trials correct / number of trials of an emotion) and differential accuracy (i.e., number of trials correct / number of times an emotion was endorsed; Marsh et al., 2010; Wagner, 1993). To account for accuracy due to guessing, the result was adjusted by subtracting 1/6 and then transformed using an arcsin transformation to improve normality over participants. Experimenters noted that 33 participants from V1 and 17 participants from V2 stopped engaging with the task (e.g., rapidly and repeatedly pressed the same number) before its completion. Data from these cases were removed due to concerns about data quality.
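As a rough illustration of the scoring steps just described, the sketch below computes an unbiased hit rate for one emotion at one expressivity level and then applies the guessing correction and arcsine transformation. The function names are hypothetical, and two details are assumptions rather than the study's documented procedure: the endorsement count is taken over the same set of trials as the accuracy count, and the transformation is a plain arcsine of the chance-corrected value (an arcsine square-root variant could not be applied directly, since the corrected value can be negative).

```python
import math

N_RESPONSE_OPTIONS = 6  # six emotion labels, so chance accuracy is 1/6


def unbiased_hit_rate(n_correct, n_presented, n_endorsed):
    """Product of raw accuracy and differential accuracy (Wagner, 1993)."""
    if n_presented <= 0:
        raise ValueError("n_presented must be positive")
    if n_endorsed == 0:
        # The label was never chosen, so it was never chosen correctly.
        return 0.0
    raw_accuracy = n_correct / n_presented           # correct / trials of the emotion
    differential_accuracy = n_correct / n_endorsed   # correct / times label was used
    return raw_accuracy * differential_accuracy


def felt_score(n_correct, n_presented, n_endorsed):
    """Chance-corrected, arcsine-transformed unbiased hit rate (assumed form)."""
    hu = unbiased_hit_rate(n_correct, n_presented, n_endorsed)
    corrected = hu - 1.0 / N_RESPONSE_OPTIONS  # subtract accuracy expected by guessing
    return math.asin(corrected)                # corrected lies in [-1/6, 5/6]


# Example: 4 of 6 angry faces labeled correctly, "anger" chosen 8 times in total.
print(unbiased_hit_rate(4, 6, 8))
print(felt_score(4, 6, 8))
```

In the example, a participant who labels four of six angry faces correctly but chooses "anger" eight times receives an unbiased hit rate of one third rather than the raw accuracy of two thirds, which is the penalty for over-endorsing the label.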

Test-retest reliability of FELT scores (i.e., unbiased hit rates) was assessed using two approaches. First, test-retest reliability was assessed as the correlation between unbiased hit rates at V1 and V2, calculated for each emotion at each expressivity level. Non-independence of twins due to familial aggregation was accounted for by estimating correlations within a biometrical model; a multivariate biometrical model was fit using structural equation modeling for V1 and V2 nested within person, nested within family. The correlation between V1 and V2 was taken from the resulting estimated variance-covariance matrix. This model provided an appropriate adjustment for non-independence due to nesting twins within families and an adjustment for greater non-independence among monozygotic twins.
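For readers unfamiliar with the underlying statistic, the sketch below shows a plain Pearson correlation between V1 and V2 scores with a Fisher z confidence interval. It is a simplification, not the study's method: it treats participants as independent, whereas the estimates reported here come from a biometrical structural equation model that accounts for twins nested within families. Applied to twin data, this simpler interval would likely be too narrow. The function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between paired scores (e.g., V1 and V2)."""
    n = len(x)
    if n != len(y) or n < 4:
        raise ValueError("need at least 4 paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for r via the Fisher z transformation.

    Assumes independent observations, which does not hold for twins.
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Illustrative call with made-up scores, not data from the study.
v1 = [0.10, 0.25, 0.40, 0.55, 0.30, 0.65]
v2 = [0.15, 0.20, 0.45, 0.50, 0.35, 0.60]
r = pearson_r(v1, v2)
print(r, fisher_ci(r, len(v1)))
```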

Clinical and neurological correlates of the FELT may be clarified by analyzing data using a latent growth curve model or similar multilevel modeling to estimate the rate of change in accuracy as emotional expression becomes clearer and to account for the nesting of levels of emotional expressivity within person. Therefore, test-retest reliability of FELT scores (i.e., unbiased hit rates) was also assessed using a latent growth curve model to estimate the intercept, linear slope, and quadratic slope over difficulty, which describes a participant’s improvement in detecting an emotion with increasing emotional expressivity. Test-retest reliability of FELT scores was computed using structural equation modeling to fit a latent growth curve to the 10 expressivity levels for each person at each visit. Six separate structural equation models were fit to estimate the test-retest reliability of participants’ accuracy at detecting each of the six emotions. Similar to estimating the correlation between V1 and V2 for each emotion at each expressivity level, test-retest reliability for intercept, linear slope, and quadratic slope over difficulty was estimated within a biometrical model to account for non-independence due to nesting of twins within family and greater non-independence among monozygotic twins. A latent growth curve was estimated for each visit such that four latent growth curves were fit for each model (2 per twin by 2 twins). Because the factor loadings in a latent growth curve model can influence the correlations between the latent parameters, orthogonal loadings were used to minimize the impact of non-essential multicollinearity on the reliability estimates (Cohen, Cohen, West, & Aiken, 2003). Orthogonal loadings were centered so that the intercept for each model corresponded to 55% expressivity (i.e., the midpoint of the 10% to 100% range). The correlations between V1 and V2 for intercept, linear slope, and quadratic slope were extracted from the resulting estimated variance-covariance matrix.
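The role of the orthogonal, centered loadings can be made concrete with a small sketch. It builds orthogonal polynomial loadings for the 10 expressivity levels by Gram-Schmidt orthogonalization and recovers one person's intercept, linear slope, and quadratic slope by ordinary least squares. This is a simplified, per-person stand-in for the latent growth curve model: the study estimated these parameters as latent variables within a structural equation model with twin structure, and the exact scaling of its loadings is not reported. All function names are hypothetical.

```python
LEVELS = [10 * k for k in range(1, 11)]  # 10%, ..., 100% expressivity


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthogonal_loadings(levels, degree=2):
    """Gram-Schmidt on 1, x, x^2 with x centered at the mean level."""
    center = sum(levels) / len(levels)  # 55 for levels 10..100
    x = [level - center for level in levels]
    basis = []
    for d in range(degree + 1):
        v = [xi ** d for xi in x]
        for b in basis:
            coef = dot(v, b) / dot(b, b)
            v = [vi - coef * bi for vi, bi in zip(v, b)]  # remove overlap with lower orders
        basis.append(v)
    return center, basis


def growth_parameters(scores, basis):
    """OLS coefficients; orthogonality lets each one be computed separately."""
    return [dot(scores, b) / dot(b, b) for b in basis]


center, basis = orthogonal_loadings(LEVELS)
# Made-up scores rising linearly with expressivity, not data from the study.
scores = [0.01 * level for level in LEVELS]
intercept, linear, quadratic = growth_parameters(scores, basis)
print(center, intercept, linear, quadratic)
```

Because the three loading vectors are mutually orthogonal, the intercept equals the person's mean score across levels and is unaffected by how steep or curved the trajectory is, which is the property that keeps the reliability estimates for level and change from contaminating one another.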

Results and Discussion

Results revealed strong test-retest reliability of FELT scores as a measure of face emotion identification ability among juveniles. Reliability was higher for emotion presentations at higher levels of expressivity (see Table 1). At 100% expressivity of emotion, V1 and V2 unbiased hit rate scores were correlated for each emotion: anger (r = 0.409), fear (r = 0.432), happiness (r = 0.542), sadness (r = 0.467), disgust (r = 0.462), and surprise (r = 0.390). Reliability of task performance scores (i.e., unbiased hit rates) improved as emotions became easier to recognize and approached an asymptote at approximately 60% expressivity (see Figure 1). That test-retest reliability approaches an asymptote is further supported by overlapping confidence intervals, which are consistent with no improvement in test-retest reliability of scores for each emotion from 60% to 100% expressivity (see Table 1). Reliability of scores is similar across all six emotions: at 100% expressivity, the reliability estimate for each emotion falls within the confidence intervals of most of the other emotions.

Table 1.

Test-Retest Reliability of Unbiased Hit Rates at Each Level of Emotional Expressivity

Expressivity  Anger                       Fear                        Happy
(%)           Estimate  CI                Estimate  CI                Estimate  CI
10            −0.082    (−0.248, 0.088)   0.146     (−0.058, 0.338)   −0.016    (−0.187, 0.157)
20            0.236     (0.072, 0.386)    −0.101    (−0.295, 0.102)   −0.009    (−0.177, 0.159)
30            0.441     (0.293, 0.567)    0.085     (−0.102, 0.264)   0.156     (−0.009, 0.313)
40            0.069     (−0.096, 0.230)   0.264     (0.098, 0.416)    0.429     (0.279, 0.560)
50            0.275     (0.113, 0.423)    0.263     (−0.728, 0.88)    0.431     (0.284, 0.561)
60            0.411     (0.263, 0.541)    0.240     (−0.560, 0.928)   0.516     (0.380, 0.631)
70            0.475     (0.333, 0.598)    0.465     (0.318, 0.592)    0.581     (0.457, 0.685)
80            0.315     (0.153, 0.464)    0.374     (0.210, 0.520)    0.587     (0.461, 0.691)
90            0.508     (0.375, 0.621)    0.388     (0.236, 0.524)    0.546     (0.412, 0.659)
100           0.409     (0.259, 0.541)    0.432     (0.282, 0.563)    0.542     (0.408, 0.655)

Expressivity  Sad                         Disgust                     Surprise
(%)           Estimate  CI                Estimate  CI                Estimate  CI
10            0.152     (−0.017, 0.313)   0.320     (0.156, 0.464)    −0.116    (−0.332, 0.114)
20            0.040     (−0.131, 0.209)   0.120     (−0.049, 0.283)   0.230     (0.020, 0.417)
30            0.020     (−0.153, 0.193)   0.072     (−0.100, 0.241)   0.022     (−0.170, 0.214)
40            0.107     (−0.068, 0.278)   0.164     (−0.010, 0.327)   0.178     (0.006, 0.338)
50            0.358     (0.209, 0.495)    0.499     (0.361, 0.615)    0.284     (0.123, 0.429)
60            0.397     (0.244, 0.532)    0.574     (0.447, 0.680)    0.440     (0.291, 0.568)
70            0.352     (0.194, 0.493)    0.600     (0.480, 0.698)    0.428     (0.280, 0.555)
80            0.442     (0.293, 0.572)    0.555     (0.425, 0.662)    0.388     (0.235, 0.525)
90            0.414     (0.261, 0.548)    0.555     (0.428, 0.662)    0.364     (0.209, 0.501)
100           0.467     (0.323, 0.590)    0.462     (0.314, 0.588)    0.390     (0.237, 0.525)

Note. CI refers to a 95% confidence interval.
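The overlap argument for an asymptote near 60% expressivity can be checked directly against the confidence intervals in Table 1. The sketch below compares each emotion's interval at 60% with its interval at 100%. The helper `intervals_overlap` is hypothetical, and overlapping intervals are only an informal indication that two estimates do not clearly differ, not a formal test of their difference.

```python
def intervals_overlap(a, b):
    """True if closed intervals a = (low, high) and b = (low, high) overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


# 95% confidence intervals copied from Table 1.
CI_60 = {
    "anger": (0.263, 0.541),
    "fear": (-0.560, 0.928),
    "happy": (0.380, 0.631),
    "sad": (0.244, 0.532),
    "disgust": (0.447, 0.680),
    "surprise": (0.291, 0.568),
}
CI_100 = {
    "anger": (0.259, 0.541),
    "fear": (0.282, 0.563),
    "happy": (0.408, 0.655),
    "sad": (0.323, 0.590),
    "disgust": (0.314, 0.588),
    "surprise": (0.237, 0.525),
}

overlap = {emotion: intervals_overlap(CI_60[emotion], CI_100[emotion]) for emotion in CI_60}
for emotion, result in overlap.items():
    print(emotion, result)
```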

Figure 1.

Test-retest reliability for unbiased hit rates of each emotion at each level of expressivity.

Findings demonstrate high test-retest reliability for FELT scores when data are analyzed using latent growth curve modeling that aggregates over levels of expressivity to estimate performance as both an intercept and change within person. Intercept scores represent a participant’s mean performance for each emotion; the correlation between intercepts therefore reflects the reliability of performance aggregated over expressivity levels for each emotion (as opposed to considering scores at each expressivity level independently). In contrast to a participant’s mean performance, the linear slope indexes the extent to which a participant’s performance improved as the presentation of emotion became clearer. The correlations between intercepts at V1 and V2 (anger: r = 0.811, fear: r = 0.796, happy: r = 0.831, sad: r = 0.763, disgust: r = 0.847, surprise: r = 0.789) as well as correlations between linear slopes at V1 and V2 (anger: r = 0.759, fear: r = 0.688, happy: r = 0.765, sad: r = 0.734, disgust: r = 0.831, surprise: r = 0.692) are high and similar for all emotions (see Table 2). Notably, estimated reliability is substantially higher for unbiased hit rates aggregated over expressivity levels than for unbiased hit rates at any single expressivity level (see Table 1), lending support for this analytic approach in future research. There were no consistent differences in reliability of scores based on gender (see Supplemental Table 1) or age (see Supplemental Table 2). The estimated growth parameters (i.e., intercept, linear slope, and quadratic slope) for visit 1 and visit 2 were examined to evaluate potential practice effects. There is evidence that performance on the task followed a non-linear trajectory (see Supplemental Table 3). There was little inter-individual variance in the quadratic slope, which precludes estimating test-retest reliability. While there is some improvement of performance at visit 2 (see Supplemental Table 3), consistent improvement in performance may reduce estimated correlations describing the test-retest reliability of performance on the FELT.

Table 2.

Reliability of Intercept, Linear Slope, and Quadratic Slope over Difficulty

Emotion   Intercept                   Linear Slope                Quadratic Slope
          Estimate  CI                Estimate  CI                Estimate  CI
Anger     0.811     (0.661, 0.829)    0.759     (0.535, 0.793)    −0.068    (−0.430, 0.627)
Fear      0.796     (0.652, 0.897)    0.688     (0.429, 0.695)    0.652     (−0.050, 0.651)
Happy     0.831     (0.741, 0.899)    0.765     (0.629, 0.860)    0.711     (0.155, 0.873)
Sad       0.763     (0.619, 0.864)    0.734     (0.578, 0.892)    0.136     (−0.613, 0.267)
Disgust   0.847     (0.715, 0.934)    0.831     (0.679, 0.927)    0.686     (−0.044, 0.887)
Surprise  0.789     (0.606, 0.904)    0.692     (0.431, 0.692)    0.874     (−0.133, 0.874)

Note. CI refers to a 95% confidence interval.

While this study demonstrates high test-retest reliability of FELT scores, some limitations should be mentioned. First, the FELT requires participants to attend for the entire duration of the task (approximately 25 min), and lack of attention was evident for some participants in this study. It is unclear whether a shorter task with fewer trials would provide an acceptable signal-to-noise ratio. Second, prior studies have found effects of IQ on participants’ ability to identify emotional facial expressions (Lawrence et al., 2015); however, those effects were not considered in these reliability estimates. Finally, this sample is limited to Caucasian families due to its primary genetic aims, so these findings might not be generalizable to other races.

Despite these limitations, evidence of high test-retest reliability of FELT scores supports the use of this task in longitudinal research on children’s socioemotional developmental trajectories; test-retest reliability of a task’s scores is necessary to use that task to study change in a process over time (Khoo, West, Wu, & Kwok, 2006). This task and the analytic methods of the current study may also be used in future research on the development and clinical correlates of face emotion identification in children. Test-retest reliability of performance on the FELT is needed to clarify the association of face emotion identification deficits with psychopathology. Reliability is important for generalizability and for establishing that variance in scores reflects differences related to psychopathology rather than noise. This task may also be used in future research on emotion regulation. Since recruiting assistance from others in coping with distress and providing assistance to others in need are key components of interpersonal emotion regulation (Hofmann, 2014), face emotion identification seems to play a crucial role in this process as well.

Moreover, since deficits in face emotion identification ability are associated with a variety of psychiatric disorders, deficits in the detection of some emotions may represent a transdiagnostic endophenotype (i.e., a broad, intermediate risk factor for psychiatric illness). This might be possible given that some emotions activate a common group of brain regions while others are region-specific (Fusar-Poli et al., 2009). Endophenotypes may be less genetically complex and thus provide a link between genetic code and symptoms indicative of psychiatric disorders (Cannon & Keller, 2006; Insel et al., 2010). Since one of the criteria of an endophenotype is that it is heritable (Gottesman & Gould, 2003), and test-retest reliability is needed to assess heritability based on genetic epidemiological research (Cannon & Keller, 2006), this study’s findings recommend this variation of the FELT for future research on the heritability of face emotion identification; this future work will utilize twin models. This research, using the FELT and corresponding analyses demonstrated in the current study, will be critical in order to evaluate whether components of face emotion identification are transdiagnostic endophenotypes.

The findings of the current study establish the reliability of the FELT to assess children’s face emotion labeling ability, which is necessary to support the use of this task in future studies of children’s face emotion identification ability. Additionally, the analytic approach utilized in the current study (i.e., latent growth curve analysis using unbiased hit rates) is preferred to other methods (e.g., bivariate correlations using hit rates), as the analytic strategies of this study take into account change in accuracy as the emotion becomes clearer and adjust for accuracy due to guessing and non-normality when determining participants’ overall ability to identify emotions.

Supplementary Material

1
2
3

Public Significance Statement.

This study demonstrated strong test-retest reliability of the facial expression labeling task scores in a sample of child twins aged 9 to 14. Evidence from this study of strong test-retest reliability supports use of this task in future research on the psychobiological etiology and longitudinal development of face emotion identification ability.

Acknowledgments

This study was supported by the National Institutes of Health (R01MH098055 to JMH, NIMH-T32MH020030 to LMR and NIMH-IRP-ziamh002781 to DSP).

Footnotes

The material contained in this manuscript has not been published nor is it under consideration elsewhere. There are no previous publications based on this data. The authors do not have any financial interests that might influence this research.

Contributor Information

Jennifer L. Cecilione, Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA

Lance M. Rappaport, Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA

Brad Verhulst, Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA

Dever M. Carney, Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA

R.J.R. Blair, Section on Affective Cognitive Neuroscience, National Institute of Mental Health, National Institutes of Health, Bethesda, MD

Melissa A. Brotman, Emotion and Development Branch, National Institute of Mental Health, National Institutes of Health, Bethesda, MD

Ellen Leibenluft, Emotion and Development Branch, National Institute of Mental Health, National Institutes of Health, Bethesda, MD

Daniel S. Pine, Emotion and Development Branch, National Institute of Mental Health, National Institutes of Health, Bethesda, MD

Roxann Roberson-Nay, Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA

John M. Hettema, Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA

References

  1. Adams T, Pounder Z, Preston S, Hanson A, Gallagher P, Harmer CJ, McAllister-Williams RH. Test–retest reliability and task order effects of emotional cognitive tests in healthy subjects. Cognition and Emotion. 2015:1–13. doi: 10.1080/02699931.2015.1055713.
  2. Averbeck BB, Bobin T, Evans S, Shergill SS. Emotion recognition and oxytocin in patients with schizophrenia. Psychological Medicine. 2012;42(2):259–266. doi: 10.1017/S0033291711001413.
  3. Blair RJR, Colledge E, Murray L, Mitchell DGV. A selective impairment in the processing of sad and fearful expressions in children with psychopathic tendencies. Journal of Abnormal Child Psychology. 2001;29(6):491–498. doi: 10.1023/a:1012225108281.
  4. Bourke C, Douglas K, Porter R. Processing of facial emotion expression in major depression: a review. Australian and New Zealand Journal of Psychiatry. 2010;44(8):681–696. doi: 10.3109/00048674.2010.496359.
  5. Brotman MA, Guyer AE, Lawson ES, Horsey SE, Rich BA, Dickstein DP, … Leibenluft E. Facial emotion labeling deficits in children and adolescents at risk for bipolar disorder. American Journal of Psychiatry. 2008. doi: 10.1176/appi.ajp.2007.06122050. Retrieved from http://ajp.psychiatryonline.org/doi/pdf/10.1176/appi.ajp.2007.06122050.
  6. Cannon TD, Keller MC. Endophenotypes in the Genetic Analyses of Mental Disorders. Annual Review of Clinical Psychology. 2006;2(1):267–290. doi: 10.1146/annurev.clinpsy.2.022305.095232.
  7. Carney DM, Moroney E, Machlin L, Hahn S, Savage JE, Lee M, Towbin KA, Brotman MA, Pine DS, Leibenluft E, Roberson-Nay R, Hettema JM. The Twin Study of Negative Valence Emotional Constructs. Twin Research and Human Genetics. doi: 10.1017/thg.2016.59. In press.
  8. Cohen J, Cohen P, West SG, Aiken LS. Applied multiple regression/correlation analysis for the behavioral sciences. 3rd ed. Mahwah, NJ: Lawrence Erlbaum Associates; 2003.
  9. Easter J, McClure EB, Monk CS, Dhanani M, Hodgdon H, Leibenluft E, … Ernst M. Emotion recognition deficits in pediatric anxiety disorders: implications for amygdala research. Journal of Child & Adolescent Psychopharmacology. 2005;15(4):563–570. doi: 10.1089/cap.2005.15.563.
  10. Ekman P, Friesen WV. Pictures of facial affect. Consulting Psychologists Press; 1976.
  11. Fusar-Poli P, Placentino A, Carletti F, Landi P, Abbamonte M. Functional atlas of emotional faces processing: a voxel-based meta-analysis of 105 functional magnetic resonance imaging studies. Journal of Psychiatry & Neuroscience. 2009;34(6):418.
  12. Gottesman II, Gould TD. The endophenotype concept in psychiatry: etymology and strategic intentions. American Journal of Psychiatry. 2003;160(4):636–645. doi: 10.1176/appi.ajp.160.4.636.
  13. Guyer AE, McClure EB, Adler AD, Brotman MA, Rich BA, Kimes AS, … Leibenluft E. Specificity of facial expression labeling deficits in childhood psychopathology: Face-emotion labeling deficits. Journal of Child Psychology and Psychiatry. 2007;48(9):863–871. doi: 10.1111/j.1469-7610.2007.01758.x.
  14. Harmer CJ, Rogers RD, Tunbridge E, Cowen PJ, Goodwin GM. Tryptophan depletion decreases the recognition of fear in female volunteers. Psychopharmacology. 2003;167(4):411–417. doi: 10.1007/s00213-003-1401-6.
  15. Hofmann SG. Interpersonal Emotion Regulation Model of Mood and Anxiety Disorders. Cognitive Therapy and Research. 2014;38(5):483–492. doi: 10.1007/s10608-014-9620-1.
  16. Insel T, Cuthbert B, Garvey M, Heinssen R, Pine DS, Quinn K, … Wang P. Research domain criteria (RDoC): toward a new classification framework for research on mental disorders. American Journal of Psychiatry. 2010;167(7):748–751. doi: 10.1176/appi.ajp.2010.09091379.
  17. Jackson RW, Snieder H, Davis H, Treiber FA. Determination of twin zygosity: a comparison of DNA with various questionnaire indices. Twin Research. 2001;4(1):12–18. doi: 10.1375/1369052012092.
  18. Kasriel J, Eaves L. The zygosity of twins: Further evidence on the agreement between diagnosis by blood groups and written questionnaires. Journal of Biosocial Science. 1976;8(3):263–266. doi: 10.1017/s0021932000010737.
  19. Khoo S-T, West S, Wu W, Kwok O-M. Longitudinal methods. Handbook of Multimethod Measurement in Psychology. 2006:301–317.
  20. Kirkpatrick MG, Lee R, Wardle MC, Jacob S, de Wit H. Effects of MDMA and intranasal oxytocin on social and emotional processing. Neuropsychopharmacology. 2014;39(7):1654–1663. doi: 10.1038/npp.2014.12.
  21. Kirsh SJ, Mounts JRW. Violent video game play impacts facial emotion recognition. Aggressive Behavior. 2007;33(4):353–358. doi: 10.1002/ab.20191.
  22. Lawrence K, Campbell R, Skuse D. Age, gender, and puberty influence the development of facial emotion recognition. Frontiers in Psychology. 2015;6:761. doi: 10.3389/fpsyg.2015.00761.
  23. Lilley ECH, Silberg JL. The Mid-Atlantic Twin Registry, Revisited. Twin Research and Human Genetics. 2013;16(1):424–428. doi: 10.1017/thg.2012.125.
  24. Marsh AA, Blair RJR. Deficits in facial affect recognition among antisocial populations: A meta-analysis. Neuroscience & Biobehavioral Reviews. 2008;32(3):454–465. doi: 10.1016/j.neubiorev.2007.08.003.
  25. Marsh AA, Yu HH, Pine DS, Blair RJR. Oxytocin improves specific recognition of positive facial expressions. Psychopharmacology. 2010;209(3):225–232. doi: 10.1007/s00213-010-1780-4.
  26. Nowicki S, Duke MP. Individual differences in the nonverbal communication of affect: The Diagnostic Analysis of Nonverbal Accuracy Scale. Journal of Nonverbal Behavior. 1994;18(1):9–35.
  27. Pinkham AE, Penn DL, Green MF, Harvey PD. Social Cognition Psychometric Evaluation: Results of the Initial Psychometric Study. Schizophrenia Bulletin. 2016;42(2):494–504. doi: 10.1093/schbul/sbv056.
  28. Schneider W, Eschman A, Zuccolotto A. E-prime computer software and manual. Pittsburgh, PA: Psychology Software Tools Inc; 2002.
  29. Shaw P, Stringaris A, Nigg J, Leibenluft E. Emotion dysregulation in attention deficit hyperactivity disorder. Focus. 2016;14(1):127–144. doi: 10.1176/appi.focus.140102.
  30. Steinberg L, Morris AS. Adolescent development. Journal of Cognitive Education and Psychology. 2001;2(1):55–87.
  31. Trentacosta CJ, Fine SE. Emotion Knowledge, Social Competence, and Behavior Problems in Childhood and Adolescence: A Meta-analytic Review. Social Development. 2010;19(1):1–29. doi: 10.1111/j.1467-9507.2009.00543.x.
  32. Wagner HL. On measuring performance in category judgment studies of nonverbal behavior. Journal of Nonverbal Behavior. 1993;17(1):3–28.
