Published in final edited form as: Image Vis Comput. 2009 Nov 1;27(12):1741–1742. doi: 10.1016/j.imavis.2009.07.001

Visual and Multimodal Analysis of Human Spontaneous Behavior: Introduction to the Special Issue of Image & Vision Computing Journal

Maja Pantic and Jeffrey F. Cohn

A widely shared expectation in HCI is that computing will move to the background, weaving itself into the fabric of our everyday living and bringing the human user into the foreground. To realize this goal, next-generation computing (also known as pervasive computing, ambient intelligence, and human computing) will need to develop human-centered user interfaces that respond readily to naturally occurring, multimodal human communication. These interfaces will need the capacity to perceive, understand, and respond appropriately to human intentions and cognitive-emotional states as communicated by social and affective signals.

Motivated by this vision of the future, automated analysis of nonverbal behavior has attracted increasing attention in diverse disciplines, including psychology, computer science, linguistics, and neuroscience. Promising approaches have been reported, especially in the areas of facial expression and multimodal communication. Until recently, much of this work focused on posed, often exaggerated expressions (for reviews, see Pantic & Rothkrantz, 2003; Tian et al., 2005; Zeng et al., 2009). Yet increasing evidence suggests that deliberate or posed behavior differs in appearance and timing from behavior that occurs in daily life. For example, brow raises have larger amplitude, faster onset, and shorter duration when posed than when spontaneous (Schmidt et al., 2009). The morphology of facial actions differs as well. As Littlewort et al. (2009) report in this issue, facial actions differ systematically between spontaneous and feigned pain. Approaches to automatic behavior analysis that have been trained on deliberate, typically exaggerated behaviors may therefore fail to generalize to the complexity of expressive behavior found in real-world settings.

This Special Issue of Image and Vision Computing brings together cutting-edge work on the automatic analysis of non-posed, real-world human behavior. It includes state-of-the-art reviews of computational approaches to conversation analysis (Gatica-Perez, 2009) and social signal processing (Vinciarelli et al., 2009), recent advances in generic face modeling (Lucey et al., 2009), automated detection of pain from facial behavior (Littlewort et al., 2009; Ashraf et al., 2009), detection of the cognitive state of interest from facial, vocal, and gestural behavior (Schuller et al., 2009), automatic detection of diverse human activities from spatio-temporal features (Oikonomopoulos et al., 2009), and automatic recognition of American Sign Language (Ding & Martinez, 2009). These papers represent an exciting advance toward human-centered interfaces that can perceive and understand real-world human behavior.

Of course, as discussed by Zeng et al. (2009) and, in this issue, by Vinciarelli et al. (2009) and Gatica-Perez (2009), significant scientific and technical challenges remain to be addressed. We are nevertheless optimistic about continued progress. A principal reason is that automatic multimodal analysis of naturalistic human behavior is a prerequisite for achieving next-generation, human-centered computing (Jaimes et al., 2006; Pantic et al., 2006, 2009), and the area is poised to become one of the most active research topics in the computer vision and signal processing communities. To support these efforts, infrastructure is emerging from the extensive efforts of investigators, international sponsors, and professional societies. A sampling of research activities includes basic research on machine analysis of human behavior (e.g., the European Research Council (ERC) MAHNOB project), automatic analysis of face-to-face and small-group interactions (e.g., the projects of the MIT Human Dynamics Laboratory and the European Commission (EC) FP6 AMIDA project), social signaling (e.g., the EC FP7 Social Signal Processing NoE project), human-computer interaction (e.g., the EC FP7 Semaine project), applications in mental health (Cohn et al., 2009), and other areas. The contributions in this Special Issue highlight recent advances and point to continued progress toward the goal of human-centered interfaces that can understand human intentions and behavior and respond intelligently.

Acknowledgments

The Guest Editors of this Special Issue thank the reviewers who volunteered their time to provide valuable feedback to the authors. They also thank the contributors for making this issue an important addition to the literature in the field. Many thanks go to the editorial staff of Image and Vision Computing for their help during the preparation of this issue.


Contributor Information

Maja Pantic, Imperial College London, UK / University of Twente, NL.

Jeffrey F. Cohn, University of Pittsburgh / Carnegie Mellon University, USA.

References

1. Ashraf AB, Lucey S, Cohn JF, Chen T, Prkachin KM, Solomon P. The painful face: Pain expression recognition using active appearance models. Image and Vision Computing. 2009; this issue. doi: 10.1016/j.imavis.2009.05.007.
2. Cohn JF, Simon Kreuz T, Matthews I, Yang Y, Nguyen MH, Tejera Padilla M, et al. Detecting depression from facial actions and vocal prosody. Proceedings Int'l Conf Affective Computing and Intelligent Interaction; 2009.
3. Ding L, Martinez AM. Modelling and recognition of the linguistic components in American Sign Language. Image and Vision Computing. 2009; this issue. doi: 10.1016/j.imavis.2009.02.005.
4. Gatica-Perez D. Automatic nonverbal analysis of social interaction in small groups: A review. Image and Vision Computing. 2009; this issue.
5. Jaimes A, Sebe N, Gatica-Perez D. Human-centered computing: A multimedia perspective. Proceedings ACM Int'l Conf Multimedia; 2006. pp. 855–864.
6. Littlewort GC, Bartlett MS, Lee K. Automatic coding of facial expressions displayed during posed and genuine pain. Image and Vision Computing. 2009; this issue.
7. Lucey S, Wang Y, Cox M, Sridharan S, Cohn JF. Efficient constrained local model fitting for non-rigid face alignment. Image and Vision Computing. 2009; this issue. doi: 10.1016/j.imavis.2009.03.002.
8. Oikonomopoulos A, Pantic M, Patras I. Sparse B-spline polynomial descriptors for human activity recognition. Image and Vision Computing. 2009; this issue.
9. Pantic M, Pentland A, Nijholt A, Huang TS. Human computing and machine understanding of human behavior: A survey. Proceedings ACM Int'l Conf Multimodal Interfaces; 2006. pp. 239–248.
10. Pantic M, Pentland A, Nijholt A. Guest editorial: Special issue on human computing. IEEE Transactions on Systems, Man, and Cybernetics – Part B. 2009;39(1):3–93. doi: 10.1109/tsmcb.2008.2008372.
11. Pantic M, Rothkrantz LJM. Toward an affect-sensitive multimodal human-computer interaction. Proceedings of the IEEE. 2003;91:1371–1390.
12. Schmidt KL, Bhattacharya S, Denlinger R. Comparison of deliberate and spontaneous facial movement in smiles and eyebrow raises. Journal of Nonverbal Behavior. 2009;33:35–45. doi: 10.1007/s10919-008-0058-6.
13. Schuller B, Müller R, Eyben F, Gast J, Hörnler B, Wöllmer M, et al. Being bored? Recognising natural interest by extensive audiovisual integration for real-life application. Image and Vision Computing. 2009; this issue.
14. Tian Y, Cohn JF, Kanade T. Facial expression analysis. In: Li SZ, Jain AK, editors. Handbook of face recognition. New York: Springer; 2005. pp. 247–276.
15. Vinciarelli A, Pantic M, Bourlard H. Social signal processing: Survey of an emerging domain. Image and Vision Computing. 2009; this issue.
16. Zeng Z, Pantic M, Roisman GI, Huang TS. A survey of affect recognition methods: Audio, visual, and spontaneous expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2009;31(1):39–58. doi: 10.1109/TPAMI.2008.52.
