Abstract
Multimodal emotional expressions play an essential role in real-life communication. Mehrabian and colleagues suggested that facial expressions may have the greatest emotional impact, followed by vocal and verbal expressions. However, no study has examined all three modalities in face-to-face situations in a single experiment, possibly due to limitations in human acting. We postulated that an android could be a useful solution to this problem. In this study, the android Nikola systematically changed its facial, vocal, and verbal expressions of negative, neutral, and positive emotions in a face-to-face situation. Participants rated the emotional valence of the expressions. The modalities were ranked from the greatest to the least emotional impact, as follows: facial expressions, then vocal expressions, and finally verbal expressions. Additional experiments with human raters and ChatGPT showed comparable emotional valence for facial, vocal, and verbal expressions presented unimodally. The results provide the first evidence validating Mehrabian’s model, demonstrating the importance of facial or nonverbal expressions in face-to-face emotional communication.
Supplementary Information
The online version contains supplementary material available at 10.1038/s41598-025-11745-w.
Keywords: Android, Facial expression, Mehrabian’s model, Multimodal emotional communication, Prosody, Verbal content, ChatGPT.
Subject terms: Social behaviour, Human behaviour
Introduction
Emotional information is communicated between individuals multimodally through facial, vocal, and verbal expressions, greatly influencing the quality of social interactions1. A series of studies by Mehrabian2–4 and several subsequent studies (e.g., 5,6) have suggested that facial expressions have the greatest emotional impact, followed by vocal and verbal expressions, when the messages conveyed by these different modalities differ. Mehrabian and colleagues initiated an investigation into this issue, reasoning that while many studies had investigated unimodal emotional expressions, those involving multimodal emotional expressions were theoretically important but less frequently studied2. In these studies, each modality of emotional expression was systematically manipulated (e.g., photographs of models’ facial expressions of negative, neutral, and positive emotions were each combined with speakers’ negative, neutral, and positive vocal tones while reading a neutral word [maybe]2), and the participants rated the stimuli in terms of emotional attitudes2,3 or states5,6 such as emotional valence (i.e., the qualitative component of emotion, ranging from positive to negative7,8). Although there were some inconsistent findings9, many studies support the proposed order6,9. Mehrabian and Ferris2 further calculated relative importance values for facial, vocal, and verbal expressions as 55%, 38%, and 7%, respectively. However, subsequent studies reported different values9. One study5 examined individual differences in the relative importance of nonverbal (i.e., facial and vocal) versus verbal emotional expressions, and found that the relative importance ranged from 55 to 100%, with a mean value of approximately 90%, suggesting that testing different samples could produce different ratios. Collectively, regardless of the relative importance values, evidence suggests that facial expressions are the most important, followed by vocal and verbal expressions6,9.
This phenomenon was later called Mehrabian’s rule10 and became one of the most widely cited academic findings in the field of communication11.
However, to the best of our knowledge, no study has examined the three emotional expression modalities in a single experiment; prior studies manipulated only two modalities (e.g., facial and vocal expressions2 or vocal and verbal expressions3). This may be due to limitations in human acting capabilities. Another problem with previous studies is their lack of ecological validity9. No study has examined live face-to-face interaction. Previous studies reported that psychological/neural responses to emotional expressions could differ between live and pre-recorded expressions12,13. Given the lack of supportive evidence, it was even suggested that Mehrabian’s rule could be an urban legend or myth10,14.
Androids may provide a solution to this problem. A recently developed android named Nikola is capable of human-like facial expressions and can also produce verbal utterances with prosody15. Using this android in face-to-face situations, we aimed to validate the hypothesized order of importance of facial, vocal, and verbal expressions regarding emotional impact.
In the main experiment of this study, Nikola systematically showed a combination of facial, vocal, and verbal expressions of negative (sadness), neutral, and positive (happiness) emotions in a face-to-face situation. Participants were asked to rate the expressed emotions on a 9-point emotional valence scale. We aimed to assess the emotion of the expressor (i.e., Nikola), as in previous studies assessing the “emotional state” of stimulus persons using valence scales5,6. However, as a previous study showed that people generally believe that humanoid robots have no subjective experience, akin to dead people16, we asked participants to rate the “expressed emotion” at the behavioral level, without referring to subjective emotional states. Additionally, we cautioned the participants not to rate their own emotions, which had been assessed in some previous studies17. The relative importance of the three modalities was evaluated using main effect F-values in an analysis of variance (ANOVA), as in a previous study18. To demonstrate that the stimuli for each modality could communicate the target emotional valence in a comparable manner without the effects of other modalities, we conducted additional experiments presenting unimodal stimuli to human raters and an artificial intelligence (AI) agent, ChatGPT19, and asked for their valence ratings.
Results
Main experiment
Valence ratings (Fig. 1) were analyzed using a 3 (facial) × 3 (vocal) × 3 (verbal) repeated-measures ANOVA model. First, planned contrasts of simple-simple main effects were tested to confirm the validity of our emotional expression manipulation (negative versus neutral versus positive) using one-tailed t-values. The results showed that the differences among emotional expressions were significant in all conditions (t[289.141] > 3.31, p < 0.005).
Fig. 1.
Mean (with standard error) valence ratings for Nikola’s facial, vocal, and verbal emotional expressions in the main experiment.
A three-way ANOVA was then conducted. The results showed that all main effects (facial, vocal, and verbal expressions: F[2, 76] = 138.20, 89.81, and 61.21, all p < 0.001, and η²p = 0.78, 0.74, and 0.61, respectively), all two-way interactions (facial × vocal, facial × verbal, and vocal × verbal expressions: F[4, 152] = 11.80, 14.13, and 7.02, all p < 0.001, η²p = 0.23, 0.27, and 0.15, respectively), and the three-way interaction (F[8, 304] = 7.23, p < 0.001, η²p = 0.16) were significant.
The relative importance of the different modalities was evaluated by comparing F-values for the main effects, as in a previous study18 (Fig. 2). The F-value was highest for facial expressions, followed by vocal and verbal expressions, corresponding to 47.8%, 31.1%, and 21.2%, respectively. Following a previous study2, a weighted sum model (i.e., a model without interactions20) was constructed, even though the interactions were significant in our study. The results showed the same patterns as the above results, indicating significant main effects of facial, vocal, and verbal expressions (F[2, 76] = 137.62, 91.64, and 61.02, all p < 0.001, η²p = 0.78, 0.71, and 0.62), which corresponded to 47.4%, 31.6%, and 21.0%, respectively.
Fig. 2.

Relative F-values for the main effects of facial, vocal, and verbal expressions in the main experiment.
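The percentages reported for the relative importance of each modality follow from simple arithmetic on the main-effect F-values: each F-value is divided by the sum of the three. The short Python sketch below reproduces that arithmetic from the reported values. The function name `relative_importance` is an illustrative choice and is not taken from the authors' analysis, which was run in JASP.

```python
def relative_importance(f_values):
    """Express each main-effect F-value as a percentage of their sum."""
    total = sum(f_values.values())
    return {name: 100.0 * f / total for name, f in f_values.items()}


# Main-effect F-values reported for the full three-way ANOVA.
full_model = {"facial": 138.20, "vocal": 89.81, "verbal": 61.21}
# Main-effect F-values reported for the weighted sum (no-interaction) model.
weighted_sum = {"facial": 137.62, "vocal": 91.64, "verbal": 61.02}

for label, f_values in (("full model", full_model), ("weighted sum", weighted_sum)):
    shares = relative_importance(f_values)
    print(label, {name: round(pct, 1) for name, pct in shares.items()})
```

Both models yield the same ordering (facial, then vocal, then verbal), and the rounded shares match the percentages given in the text.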
Additional experiments
We conducted two additional experiments to validate the stimuli for each modality. For these experiments, we used emotional expressions from only a single modality: facial (i.e., videos of facial expressions without sound), vocal (i.e., sounds of vocal expressions with a neutral verbal expression), and verbal (i.e., written texts) expressions of negative, neutral, or positive emotions.
In the first experiment, we presented these stimuli to human raters in an online environment. They rated the stimuli using a 9-point valence scale, which was also used in the main experiment. Valence ratings (Table 1) were analyzed using a two-way mixed design ANOVA with modality (facial, vocal, and verbal) as a between-subjects factor and emotion (negative, neutral, and positive) as a within-subjects factor. The results showed that only the main effect of emotion was significant (F[2, 114] = 195.38, p < 0.001, η²p = 0.77). The main effect of modality showed a non-significant tendency (F[2, 57] = 2.76, p = 0.072, η²p = 0.09), suggesting more positive valence ratings for verbal than for other expressions. The interaction between modality and emotion was not significant (F[4, 114] = 1.86, p = 0.123, η²p = 0.06). The planned contrasts for the main effect of emotion showed that the expected differences (i.e., negative versus neutral versus positive) were significant (t[114] = 19.76, p < 0.001).
Table 1.
Mean (with standard error) valence ratings for Nikola’s facial, vocal, and verbal emotional expressions by human and ChatGPT raters in the additional experiments.
| Rater | Modality | Negative | Neutral | Positive |
|---|---|---|---|---|
| Human | Facial | −1.9 (0.2) | −0.3 (0.2) | 1.6 (0.3) |
| Human | Vocal | −1.5 (0.2) | 0.1 (0.2) | 1.7 (0.2) |
| Human | Verbal | −2.0 (0.2) | 0.1 (0.1) | 2.4 (0.2) |
| ChatGPT^a | Facial | −3.0 | 0.0 | 3.5 |
| ChatGPT^a | Vocal | −2.5 | 0.5 | 3.5 |
| ChatGPT^a | Verbal | −3.5 | 0.0 | 3.0 |

Values for human raters are means with standard errors in parentheses.
^a ChatGPT o4-mini-high was used on 17 June 2025.
Next, we collected objective and automated valence ratings using the AI agent ChatGPT to complement the human ratings. Previous studies have shown that, similar to humans, ChatGPT can effectively conduct emotion perception tasks using facial21, vocal22, and verbal23 emotional expressions. We presented the unimodal stimuli used in the human ratings and asked ChatGPT to rate the emotional expressions using a 9-point valence scale. As expected, the valence ratings differed across emotion conditions (i.e., negative versus neutral versus positive), but were similar (difference of ≤ 1.0) across the modalities (Table 1).
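The two checks described for the ChatGPT ratings (the expected ordering across emotions, and a spread of at most 1.0 across modalities) can be verified directly from the Table 1 values. The Python sketch below does so; the helper names `spread_across_modalities` and `ordered_as_expected` are illustrative and are not part of the original study.

```python
# ChatGPT valence ratings copied from Table 1 (modality -> emotion -> rating).
chatgpt_ratings = {
    "facial": {"negative": -3.0, "neutral": 0.0, "positive": 3.5},
    "vocal": {"negative": -2.5, "neutral": 0.5, "positive": 3.5},
    "verbal": {"negative": -3.5, "neutral": 0.0, "positive": 3.0},
}


def spread_across_modalities(ratings):
    """Return, for each emotion, the max minus min rating across modalities."""
    emotions = list(next(iter(ratings.values())))
    return {
        emotion: max(r[emotion] for r in ratings.values())
        - min(r[emotion] for r in ratings.values())
        for emotion in emotions
    }


def ordered_as_expected(ratings):
    """Check negative < neutral < positive within every modality."""
    return all(
        r["negative"] < r["neutral"] < r["positive"] for r in ratings.values()
    )


print(spread_across_modalities(chatgpt_ratings))
print(ordered_as_expected(chatgpt_ratings))
```

The largest spread occurs for the negative expressions (1.0, between the vocal and verbal stimuli); the neutral and positive expressions each differ by 0.5.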
In summary, these human and machine ratings suggested that the stimuli for each modality could express the target emotional valence in a comparable manner.
Discussion
The emotional valence conveyed by the facial, vocal, and verbal expressions of the android Nikola was recognized by participants in both the main and additional experiments. These results corroborate earlier studies on emotion perception through an android’s facial expressions15 and expand our understanding to include emotion perception based on prosody and verbal utterances.
More importantly, the order of importance with respect to emotional impact was as follows (from most to least): facial, vocal, and verbal expressions. This is consistent with the model proposed by Mehrabian and Ferris2 and the findings of several subsequent studies5,6. However, no previous study had empirically validated this model using all three modalities in face-to-face situations. Some researchers even consider Mehrabian’s model an urban legend10,14. To the best of our knowledge, this study provides the first empirical evidence supporting Mehrabian’s communication model.
In this study, facial, vocal, and verbal expressions accounted for approximately 48%, 31%, and 21% of the variance in emotional valence ratings, respectively. These values are not fully compatible with those reported by Mehrabian and Ferris2 (i.e., 55%, 38%, and 7%, respectively). Several other studies have also reported relative importance values that differ from those of Mehrabian and colleagues9. Jacob et al.5 suggested that the relative importance of modalities can vary substantially across participants. Our study aligns with these findings and provides the first data on the relative importance of different emotional expression modalities measured simultaneously.
Our findings supporting Mehrabian’s model have theoretical implications. As suggested by Mehrabian4 and many subsequent studies9, our results showed that nonverbal expressions play a more important role than verbal expressions in emotional communication, despite the dominant role of verbal communication in modern society. The results could lead to an emphasis on refining nonverbal expressions in several communication domains, such as leadership, counseling, education, and sales. For example, leadership development programs could focus more on facial expressions and tone of voice than on producing perfect speech. In addition, the results point to the importance of nonverbal messages in communication media. For example, text-only messages sent via social network services may not be able to effectively communicate emotional nuances compared with text messages supported by visual and vocal expressions. In summary, our results empirically support Mehrabian’s model2 and emphasize the greater importance of nonverbal expressions for emotional communication compared with verbal expressions.
Our findings also have practical implications. We found that robots have the potential to communicate emotions in a multimodal manner, which has been suggested as a capability of humans1 and could encourage the use of robots for real-life social applications. Social robots have become popular in the home and the workplace as a complement to human labor24. To fulfil this purpose, social robots are expected to interact and communicate with humans in a natural manner25. Our data suggest that androids can effectively convey multimodal emotional expressions to humans, which could lead to optimized collaborations between humans and robots in the future.
Several limitations of this study should be acknowledged. First, the control of emotional impact across modalities in the main experiment was not complete. Although the results of our additional experiments suggested that the stimuli for each modality communicated a comparable level of emotional valence, the data should be interpreted with caution because the vocal expressions contained neutral verbal expressions, and facial and verbal unimodal expressions were unrealistic (i.e., videos without sound and text messages, respectively). It is highly likely that increasing the emotional impact of one modality would enhance its relative importance when presented in combination with other modalities. Future studies should systematically control the emotional impact of the modalities to investigate their effect on Mehrabian’s model.
Second, we employed only the emotion perception task. Several previous studies have indicated that decoding other communication aspects could lead to different results regarding the relative importance of the facial, vocal, and verbal communication modalities9. For example, a previous study showed that verbal information was more important than vocal information in a task in which participants inferred the objective meaning of messages26. Testing the relative importance of the communication modalities using different tasks is an important target for further investigations of android–human communication.
Finally, although we expected a humanlike android to simulate naturalistic inter-human interactions, such an android may have introduced specific artifacts. Previous studies have suggested that humans may feel an aversion to artificial entities that closely resemble humans, a phenomenon known as the “uncanny valley” effect27. In our previous study, we obtained participants’ uncanniness ratings for Nikola’s and humans’ happy facial expressions, and found comparable ratings between these expressions28. However, we did not investigate sad facial expressions or multimodal emotional expressions of any emotion. Since subtle flaws in appearance or motion can be uncanny in very humanlike robots29, slightly unnatural facial, verbal, or vocal emotional expressions may have produced the uncanny valley effect. It would be useful to investigate the uncanny feeling in response to android multimodal emotional expressions in future research.
In summary, we found that an android’s facial, vocal, and verbal expressions explained 47.8%, 31.1%, and 21.2% of the variance in the ratings of emotional valence, respectively. This study is the first to validate Mehrabian’s model, showing that facial expression is the most important aspect of face-to-face emotional communication.
Methods
Main experiment
Participants
This experiment enrolled 39 participants (22 women and 17 men; mean ± standard deviation age, 38.0 ± 8.8 years). The sample size was determined through an a priori power analysis conducted using G*Power software30 (ver. 3.1.9.2). We assumed a repeated-measures ANOVA testing the main effect of emotional expression (negative versus neutral versus positive) and aimed to detect a medium-sized effect (i.e., f = 0.25) with an α-level of 0.05 and a power (1 – β) of 0.80. The power analysis showed that 28 participants were required. The participants were recruited through advertisements placed in the local community. All participants had normal or corrected-to-normal visual acuity. After the experimental procedures had been fully explained, all participants provided written informed consent. The experiment was approved by the Ethics Committee of RIKEN and was performed in accordance with the Declaration of Helsinki.
Experimental design
The experiment used a three-factor within-subjects design, with facial (negative, neutral, and positive), vocal (negative, neutral, and positive), and verbal (negative, neutral, and positive) expressions as the factors.
Apparatus
The android Nikola15 was used to convey multimodal emotional expressions. Nikola was developed at RIKEN for the purpose of studying robot–human emotional interactions. Only the head is robotic, with a height of approximately 28.5 cm and a weight of about 4.6 kg; we used a mannequin body for the other body parts. The android has an appearance similar to that of a male human child. Nikola has 35 actuators: 29 for facial muscle actions, 3 for head movements (roll, pitch, and yaw rotation), and 3 for eyeball control (panning movements of the individual eyeballs and tilt movements of both eyeballs). The movements are driven by pneumatic (air) actuators. The surface of the entire head, except for the back part, is covered in a soft silicone skin. An extended system comprising control valves, an air compressor, and computers controls the actuators and sensors. Nikola was positioned about 1 m in front of the participant.
The experiments were controlled using PsychoPy 2022-2.44031 running on a Windows 11 computer (Alienware Aurora R13; Dell, Round Rock, TX, USA). The participants made their responses using a keyboard. AITalk 6 software (AI Inc., Tokyo, Japan) was used to allow Nikola to produce speech with prosody. Additional in-house programs created using Python (Python Software Foundation, Fredericksburg, VA, USA) and ROS (Open Robotics, Mountain View, CA, USA) were used to control the android’s facial expressions and mouth motions.
Stimuli
As facial expression stimuli, two slightly different facial expressions for each of the sad, neutral, and happy emotions were produced by Nikola (Fig. 3; Movie S1). The sad and happy facial expressions were validated in a previous study15. In that study15, we presented photographs of sad and happy facial expressions of Nikola, which activated the same action units as the stimuli in this study, and asked 30 Japanese participants to label the photographs of these expressions. The recognition accuracy for both sad and happy facial expressions was higher than chance level.
Fig. 3.

Stimulus presentation. The android Nikola produced facial, vocal, and verbal expressions conveying negative (sad), neutral, and positive (happy) emotions in face-to-face situations. In the figure, Nikola is producing a facially sad, vocally happy, and verbally neutral expression. See also Movie S1.
As prosody stimuli, two slightly different vocal tones were produced using AITalk software (AI Inc., Tokyo, Japan) for the sadness, neutral, and happiness conditions. The software has been used in some previous studies and has been shown to appropriately communicate negative, neutral, and positive emotional tone in agents’ voice32,33. AITalk added a sad or happy tone to the voice, with an emotional intensity ranging from 0 to 1. Our preliminary assessments showed that the reading of text messages could lead to vocal artifacts when using intensity > 0.8; hence, we used intensity values of 0.6 and 0.7.
As verbal content stimuli, two synonymous and short self-referential sentences were constructed in Japanese for the sadness, neutral, and happiness conditions. The sentences were constructed based on those used in a previous study4. The following six sentences were used.
I feel sad.
I feel sorrowful.
I feel neutral.
I feel normal emotions.
I feel happy.
I feel delighted.
Procedure
The experiments were conducted in a chamber on an individual basis. The participants were seated comfortably in front of Nikola; the chair height could be adjusted to allow eye contact with the android, and there was a table between the participant and the android. The participants were instructed to rate the emotion expressed by Nikola in terms of emotional valence, from − 4 (very negative) to + 4 (very positive). We also clarified that the task was not to rate their own emotional states.
The participants completed a total of 216 trials, distributed over four blocks of 54 trials. Each block comprised an equal number of facial, vocal, and verbal expression conditions. The order of the conditions was randomized within each block. A break was provided after each block. At the beginning of the experiment, we showed all possible (i.e., 27) combinations of Nikola’s emotional expressions to participants to illustrate the stimuli. After that, we asked the participants to complete 15 practice trials under randomly selected conditions to make them familiar with the rating task.
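The block structure described above (27 facial × vocal × verbal conditions, 54 trials per block, four blocks, randomized order within each block) can be sketched in Python as follows. This is an illustrative reconstruction, not the authors' PsychoPy script. It assumes that each of the 27 conditions appears twice per block, and it does not model how the two stimulus variants per modality were allocated to trials, which the text does not specify. The names `LEVELS` and `build_blocks` and the `seed` parameter are hypothetical.

```python
import itertools
import random

LEVELS = ("negative", "neutral", "positive")


def build_blocks(n_blocks=4, reps_per_block=2, seed=None):
    """Return a list of blocks, each a shuffled list of conditions.

    Each condition is a (facial, vocal, verbal) tuple of emotion levels.
    """
    rng = random.Random(seed)
    conditions = list(itertools.product(LEVELS, repeat=3))  # 27 combinations
    blocks = []
    for _ in range(n_blocks):
        block = conditions * reps_per_block  # assumed: 2 repetitions per block
        rng.shuffle(block)  # randomize order within the block
        blocks.append(block)
    return blocks


blocks = build_blocks(seed=0)
print(len(blocks), len(blocks[0]), sum(len(b) for b in blocks))
```

With the default arguments this gives four blocks of 54 trials and 216 trials in total, so every level of every modality occurs equally often within each block.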
In each trial, Nikola looked up from an initially slumped position, opened its eyes, and said, “Let’s go.” Then, the android produced facial and vocal/verbal expressions simultaneously. The facial expressions were displayed for 2,000 ms, and the vocal/verbal expressions lasted for 1,020–1,360 ms. After each expression, Nikola’s face returned to a neutral expression within 1,000 ms before the android slumped down and closed its eyes. A beep sound played for 500 ms as Nikola slumped down, and the participants rated its expression in terms of emotional valence (range: from − 4 [very negative] to 4 [very positive]). During the 3,000–5,000 ms intertrial intervals, Nikola continuously made subtle motions (e.g., breathing) to enhance its human-like impression.
Data analysis
Data were analyzed using JASP 0.14.1 software34. The mean valence rating was calculated for each condition and participant. The ratings were analyzed using a 3 (facial) × 3 (vocal) × 3 (verbal) repeated-measures ANOVA model. As a manipulation check, planned contrasts of the simple-simple main effects (27 conditions; e.g., differences between the negative, neutral, and positive in verbal content for the facially negative and prosodically negative condition) were tested to confirm emotional differences (i.e., negative < neutral < positive) using one-tailed t-values. Then, ANOVA was conducted, and the relative importance of the communication modalities was evaluated by comparing the F-values of the main effects, as in previous studies18. As preliminary analyses showed the same order of F-values in male and female participants and our sample size was insufficient to test for sex differences, the results were reported without considering sex as a factor. The results of all statistical tests were deemed statistically significant at p < 0.05.
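Before the ANOVA, trial-level ratings are reduced to one mean per participant and condition. A minimal Python sketch of that aggregation step is shown below. The record layout (dictionary keys such as `participant` and `rating`), the function name `cell_means`, and the example values are assumptions made for illustration; the actual analysis was carried out in JASP.

```python
from collections import defaultdict
from statistics import mean


def cell_means(trials):
    """Average ratings per (participant, facial, vocal, verbal) cell."""
    cells = defaultdict(list)
    for trial in trials:
        key = (
            trial["participant"],
            trial["facial"],
            trial["vocal"],
            trial["verbal"],
        )
        cells[key].append(trial["rating"])
    return {key: mean(ratings) for key, ratings in cells.items()}


# Made-up example records; ratings use the -4 to +4 valence scale.
example_trials = [
    {"participant": 1, "facial": "negative", "vocal": "positive",
     "verbal": "neutral", "rating": -2},
    {"participant": 1, "facial": "negative", "vocal": "positive",
     "verbal": "neutral", "rating": -1},
    {"participant": 1, "facial": "positive", "vocal": "positive",
     "verbal": "positive", "rating": 4},
]

means = cell_means(example_trials)
print(means[(1, "negative", "positive", "neutral")])
```

The resulting participant × condition means form the 3 × 3 × 3 table that a repeated-measures ANOVA takes as input.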
Additional experiment 1
Participants
This experiment enrolled 60 participants (30 women and 30 men; mean ± standard deviation age, 35.2 ± 4.6 years). The sample size was determined through an a priori power analysis conducted using G*Power software30 (ver. 3.1.9.2). We assumed a two-way mixed-design ANOVA to test the interaction between modality and emotion, aiming to detect a medium-sized effect (i.e., f = 0.25), as in the main experiment, with an α-level of 0.05 and a power (1 – β) of 0.95. We chose this high power so that a null result could be interpreted as evidence of no meaningful effect35. The power analysis showed that 54 participants were required. Twenty participants were randomly assigned to each of the three modality conditions. The participants were recruited through online advertisements in CrowdWorks (Tokyo, Japan). After the experimental procedures had been fully explained, all participants provided written informed consent. The experiments were approved by the Ethics Committee of RIKEN and were performed in accordance with the Declaration of Helsinki.
Experimental design
The experiment used a two-factor mixed randomized-repeated design, with modality (facial, vocal, and verbal) as a between-subjects factor and emotion (negative, neutral, and positive) as a within-subjects factor.
Stimuli
For facial expression stimuli, the six videotaped facial expressions of negative, neutral, and positive emotions from the main experiment were used without sound (i.e., no vocal or verbal information).
For vocal expression stimuli, the recorded sounds of the six vocal negative, neutral, and positive expressions used in the main experiment, each speaking one neutral verbal expression (i.e., “I feel normal emotions”), were presented without video (i.e., with no facial information and with neutral verbal information).
For verbal expression stimuli, the texts of verbal negative, neutral, and positive expressions used in the main experiment were presented (i.e., with no facial or vocal information).
Procedure
The experiments were conducted via the Qualtrics online platform (Seattle, WA). Each participant was randomly assigned to either the facial, vocal, or verbal modality condition. The participants completed a total of six trials, with two trials per emotion condition. The trial order was randomized. At the beginning of the experiment, all stimuli were presented to the participants so that they could familiarize themselves with them. In each trial, the stimuli were displayed, and the participants rated the expressions in terms of emotional valence (range: −4 [very negative] to 4 [very positive]).
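The between-subjects assignment (60 participants, 20 per modality condition) can be illustrated with a balanced random assignment. The Python sketch below shows one common way to do this: a list containing each condition 20 times is shuffled and paired with the participants. It is an assumption for illustration only; the text does not describe the mechanism used on the Qualtrics platform, and the names `MODALITIES` and `assign_balanced` are hypothetical.

```python
import random

MODALITIES = ("facial", "vocal", "verbal")


def assign_balanced(participant_ids, seed=None):
    """Randomly assign participants to modalities with equal group sizes."""
    if len(participant_ids) % len(MODALITIES) != 0:
        raise ValueError("number of participants must be divisible by 3")
    rng = random.Random(seed)
    per_group = len(participant_ids) // len(MODALITIES)
    # One slot per participant, with each modality repeated per_group times.
    slots = [m for m in MODALITIES for _ in range(per_group)]
    rng.shuffle(slots)
    return dict(zip(participant_ids, slots))


assignment = assign_balanced(list(range(1, 61)), seed=0)
print({m: sum(1 for v in assignment.values() if v == m) for m in MODALITIES})
```

Shuffling a pre-balanced list guarantees equal group sizes, which independent random draws per participant would not.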
Data analysis
Data were analyzed using JASP 0.14.1 software34. The mean valence rating was calculated for each emotion condition and participant. The ratings were analyzed using a 3 (modality) × 3 (emotion) ANOVA. For the significant main effect of emotion, planned contrasts were applied to examine expected emotional differences (i.e., negative < neutral < positive) using one-tailed t-values. The results of all statistical tests were deemed statistically significant at p < 0.05.
Additional experiment 2
Procedure
We presented the unimodal stimuli used in additional experiment 1 to ChatGPT o4-mini-high (OpenAI, San Francisco, CA, USA; https://chatgpt.com/) on 17 June 2025, and asked it to rate the expressions in terms of emotional valence (range: −4 [very negative] to 4 [very positive]). ChatGPT is a state-of-the-art AI agent based on large language models19 and o4-mini-high is one of the latest versions. Previous studies have demonstrated that, similar to humans, ChatGPT can perceive emotions from facial21, vocal22, and verbal23 emotional expressions.
Acknowledgements
The authors thank Megumi Sakiyama for her technical support.
Author contributions
Conceived and designed the experiments: WS, KS, and MT. Performed the experiments: WS, and KS. Analyzed the data: WS. Wrote the paper: WS, KS, and MT.
Data availability
All data analyzed during this study are included in this published article and its supplementary information files.
Declarations
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Argyle, M. Bodily communication (Methuen, 1975).
- 2. Mehrabian, A. & Ferris, S. R. Inference of attitudes from nonverbal communication in two channels. J. Consult. Psychol. 31, 248–252 (1967).
- 3. Mehrabian, A. & Wiener, M. Decoding of inconsistent communication. J. Pers. Soc. Psychol. 6, 109–114 (1967).
- 4. Mehrabian, A. Nonverbal communication in Nebraska Symposium on Motivation 1971 (ed. Cole, J. K.) 107–161 (University of Nebraska Press, 1971).
- 5. Jacob, H. et al. Nonverbal signals speak up: association between perceptual nonverbal dominance and emotional intelligence. Cogn. Emot. 27, 783–799 (2013).
- 6. Pelzl, M. A. et al. Reduced impact of nonverbal cues during integration of verbal and nonverbal emotional information in adults with high-functioning autism. Front. Psychiatry 13, 1069028 (2023).
- 7. Lang, P. J., Bradley, M. M. & Cuthbert, B. N. Emotion, motivation, and anxiety: brain mechanisms and psychophysiology. Biol. Psychiatry 44, 1248–1263 (1998).
- 8. Reisenzein, R. Pleasure-arousal theory and the intensity of emotions. J. Pers. Soc. Psychol. 67, 525–539 (1994).
- 9. Noller, P. Video primacy-A further look. J. Nonverbal Behav. 9, 28–47 (1985).
- 10. Amsel, T. T. An urban legend called: the 7/38/55 ratio rule. Eur. Polygraph 13, 95–99 (2019).
- 11. Lapakko, D. Three cheers for language: A closer examination of a widely cited study of nonverbal communication. Commun. Educ. 46, 63–67 (1997).
- 12. Hsu, C. T., Sato, W. & Yoshikawa, S. Enhanced emotional and motor responses to live vs. videotaped dynamic facial expressions. Sci. Rep. 10, 16825 (2020).
- 13. Hsu, C. T. et al. Enhanced mirror neuron network activity and effective connectivity during live interaction among female subjects. Neuroimage 263, 119655 (2022).
- 14. Lapakko, D. Communication is 93% nonverbal: an urban legend proliferates. Commun. Theater Assoc. Minn. J. 34, 7–19 (2007).
- 15. Sato, W. et al. An android for emotional interaction: Spatiotemporal validation of its facial expressions. Front. Psychol. 12, 800657 (2022).
- 16. Gray, H. M., Gray, K. & Wegner, D. M. Dimensions of mind perception. Science 315, 619 (2007).
- 17. Sato, W. & Yoshikawa, S. Enhanced experience of emotional arousal in response to dynamic facial expressions. J. Nonverbal Behav. 31, 119–135 (2007).
- 18. Argyle, M., Alkema, F. & Gilmour, R. The communication of friendly and hostile attitudes by verbal and non-verbal signals. Eur. J. Soc. Psychol. 1, 385–402 (1971).
- 19. Liu, J. ChatGPT: Perspectives from human-computer interaction and psychology. Front. Artif. Intell. 7, 1418869 (2024).
- 20. Anderson, N. H. Note on weighted sum and linear operator models. Psychon. Sci. 1, 189–190 (1964).
- 21. Kramer, R. S. S. Identifying basic emotions and action units from facial photographs with ChatGPT. J. Nonverbal Behav. (2025).
- 22. Santoso, J., Ishizuka, K. & Hashimoto, T. Large language model-based emotional speech annotation using context and acoustic feature for speech emotion recognition. ICASSP 2024–2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2024).
- 23. Banimelhem, O. & Amayreh, W. The performance of ChatGPT in emotion classification. 2023 14th International Conference on Information and Communication Systems (ICICS) (2023).
- 24. Vishwakarma, L. P., Singh, R. K., Mishra, R., Demirkol, D. & Daim, T. The adoption of social robots in service operations: A comprehensive review. Technol. Soc. 76, 102441 (2024).
- 25. Bal, F., Tekerek, M., Gök, M. & Şimşir, R. Human-robot interaction with social humanoid robots. J. Sci. Eng. 11, 94–102 (2024).
- 26. Solomon, D. & Yaeger, J. Effects of content and intonation on perceptions of verbal reinforcers. Percept. Mot. Skills 28, 319–327 (1969).
- 27. Mori, M. The uncanny valley. Energy 7, 33–35 (1970).
- 28. Diel, A., Sato, W., Hsu, C. T. & Minato, T. Asynchrony enhances uncanniness in human, android, and virtual dynamic facial expressions. BMC Res. Notes 16, 368 (2023).
- 29. MacDorman, K. F. & Ishiguro, H. The uncanny advantage of using androids in cognitive and social science research. Interact. Stud. 7, 297–337 (2006).
- 30. Faul, F., Erdfelder, E., Lang, A. G. & Buchner, A. G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behav. Res. Methods 39, 175–191 (2007).
- 31. Peirce, J. W. et al. PsychoPy2: experiments in behavior made easy. Behav. Res. Methods 51, 195–203 (2019).
- 32. Lee, J. Generating robotic speech prosody for human robot interaction: A preliminary study. Appl. Sci. 11, 3468 (2021).
- 33. Shimabe, T., Yoshimura, E., Tsuchiya, S. & Watabe, H. Speech synthesis expressing emotions for communication robots. Forum Inform. Technol. 11, 241–242 (2012).
- 34. JASP Team. JASP (Version 0.14.1) [Computer software] (2020).
- 35. Quertemont, E. How to statistically show the absence of an effect. Psychol. Belg. 51, 109–127 (2011).