Abstract
Atypical facial expression is one of the early symptoms of autism spectrum disorder (ASD) characterized by reduced regularity and lack of coordination of facial movements. Automatic quantification of these behaviors can offer novel biomarkers for screening, diagnosis, and treatment monitoring of ASD. In this work, 40 toddlers with ASD and 396 typically developing toddlers were shown developmentally-appropriate and engaging movies presented on a smart tablet during a well-child pediatric visit. The movies consisted of social and non-social dynamic scenes designed to evoke certain behavioral and affective responses. The front-facing camera of the tablet was used to capture the toddlers’ faces. Facial landmarks’ dynamics were then automatically computed using computer vision algorithms. Subsequently, the complexity of the landmarks’ dynamics was estimated for the eyebrows and mouth regions using multiscale entropy. Compared to typically developing toddlers, toddlers with ASD showed higher complexity (i.e., less predictability) in these landmarks’ dynamics. This complexity in facial dynamics contained novel information not captured by traditional facial affect analyses. These results suggest that computer vision analysis of facial landmark movements is a promising approach for detecting and quantifying early behavioral symptoms associated with ASD.
Keywords: Autism spectrum disorder, Affective state, Facial dynamics, Complexity, Sample entropy, Multiscale entropy
1. Introduction
Facial expressions are often used as a mode of communication to initiate social interaction with others [1], [2], and are one of the key social behaviors used by infants during early development [3]. Individuals with autism spectrum disorder (ASD) often experience challenges in establishing social communication coupled with difficulties in recognizing facial expressions and using them to communicate with others [4]. Reduced sharing of affect and differences in use of facial expressions for communication are core symptoms of ASD and are assessed as part of standard diagnostic evaluations, such as the Autism Diagnostic Observation Schedule (ADOS) [5]. Children with ASD more often display neutral affect and ambiguous expressions compared to children with other developmental delays and typically developing (TD) children [6].
Standardized observational assessments of ASD symptoms require highly trained and experienced clinicians [7]. Research on facial expressions usually involves manual coding of facial expressions observed in recorded videos, based on complex and time-intensive facial action coding systems. These methods are difficult to deploy at scale and universally. Therefore, researchers have been employing technological advancements to capture facial expressions using motion capture and computer vision (CV) [8], [9]. The application of CV can help to quantify the intensity of emotional expression and the atypicality of facial expressions [7]. Prior work in CV shows that quantification of the differential ability to produce facial expressions can distinguish children with typical development versus ASD [10]. CV can also help in understanding the lag in the developmental stages of facial expression production, offering cues for understanding the emotional-competence challenges faced by individuals with ASD [11].
Exploiting CV, it was shown that the facial expressions of children with ASD were often ambiguous [9], a result in agreement with a study using a non-CV approach [6]. A recent study [12] extracted the dynamics of facial landmarks to estimate the group differences across various emotional expressions. Individuals with ASD exhibited a higher range of facial landmarks’ dynamics compared to TD individuals across all emotions assessed. To quantify the complexity of facial landmarks’ dynamics, researchers have started to explore computational tools such as autoregressive models [13] and entropy measures [14]. These studies found that individuals with ASD exhibit distinctive complexity in their facial dynamics, compared to TD individuals when they were asked to mimic given emotions. One of the standard measures used to analyze the complexity of physiological signals (e.g., facial dynamics) is the multiscale entropy (MSE) [14]–[18], discussed and extended in this work.
The present work focuses on analyzing the complexity of spontaneous facial dynamics of toddlers with and without ASD. Toddlers watched developmentally-appropriate and engaging movies presented on a smart tablet. Simultaneously the frontal camera of the tablet was used to record the toddlers’ faces, providing the opportunity for the automatic analysis via CV. Specifically, we studied the facial landmarks’ dynamics of the toddlers with ASD versus TD, quantified in terms of a complexity estimate derived from MSE analysis.
We hypothesized that the complexity in landmarks’ dynamics would differentiate toddlers with and without ASD, offering a distinctive biomarker. Specifically, we hypothesized that toddlers with ASD would exhibit higher complexity (i.e., less predictability) in the landmarks’ dynamics associated with regions such as the eyebrows, reflecting their distinctive eyebrow-raising behavior [19], and the mouth, potentially related to atypical vocalization patterns [20]–[22]. Furthermore, we were interested in exploring whether our findings would support previous work showing atypical eyebrow [19] and mouth [22] movements in the ASD population. Lastly, we examined whether the complexity in landmarks’ dynamics provides complementary, nonredundant information relative to the affective expressions that the toddlers manifested in response to the presented movies. In one of our previous studies [19], we examined variation in affect (i.e., emotional expressions) over time while the toddlers were engaged and watched the presented movies. Although that work [19] demonstrated the feasibility of distinguishing between the ASD and TD groups based on patterns of affective expression, in the current work we investigate the possibility of using the complexity of the raw facial landmarks’ dynamics without considering any variation in affect. This is motivated in part by the fact that individuals with ASD are prone to display a mixture of emotions at the same time [6]; the complexity of the raw landmarks’ dynamics can therefore offer further confidence and add information beyond affect. Thus, for a much larger dataset and re-designed stimuli, we replicate an affect-related analysis similar to [19], and show that the complexity in facial dynamics is an independent and more powerful measure.
In this work, we demonstrate the following: (a) the MSE, as here extended to handle time-series with partially missing data and to compare across subjects, can characterize complexity in facial landmarks’ dynamics; (b) the complexity in landmarks’ dynamics can distinguish between ASD and TD groups; and (c) this complexity information is complementary to information about affective expressions estimated from computer vision-based algorithms.
2. Data Collection and Stimuli
2.1. Participants and Study Procedures
Toddlers between 17–36 months of age were recruited at four pediatric primary care clinics during their well-child visit. Toddlers received a commonly-used, caregiver-completed autism screening questionnaire, the Modified Checklist for Autism in Toddlers – Revised with Follow-up (M-CHAT-R/F) [23], as part of routine clinical care. If a child screened positive on the M-CHAT-R/F, or a caregiver/clinician expressed any developmental concern, the child was evaluated by a child psychologist based on the Autism Diagnostic Observation Schedule – Toddler (ADOS-T) module [24]. The exclusion criteria were: (i) known hearing or vision impairments; (ii) child too upset; (iii) caregiver expressed no interest, not enough time, needed to take care of siblings, or was unable to give consent in English or Spanish; (iv) child did not complete study procedures; and (v) clinical information missing. A total of 436 children (TD = 396 and ASD = 40) participated; 82.8% of caregivers had a college degree, 15.5% had more than a high school education, and 1.8% did not have a high school education; for additional demographics please see Table 1. Caregivers/legal guardians provided written informed consent, and the study was approved by the Duke University Health System Institutional Review Board (Pro00085434, Pro00085435).
TABLE 1.
Demographics of the study participants
| | TD | ASD |
|---|---|---|
| Number of Participants | 396 | 40 |
| N (%) | | |
| Males | 194 (48.9%) | 31 (77.5%) |
| Females | 202 (51.1%) | 9 (22.5%) |
| Race: White/Caucasian | 308 (77.7%) | 19 (47.5%) |
| Race: African American | 34 (8.6%) | 6 (15.0%) |
| Race: Others | 54 (13.6%) | 15 (37.5%) |
| Hispanic/Latino | 29 (7.3%) | 13 (32.5%) |
| M-CHAT-R/F (Positive) | 1 (0.25%) | 35 (87.5%) |
| M±SD | | |
| ADOS-T Total | 3.0±0.0 | 7.64±1.69 |
| Age in months | 20.62±3.21 | 24.21±4.72 |
Note: One participant in the TD group screened positive on the M-CHAT-R/F, was evaluated with the ADOS-T, and was not diagnosed with ASD.
Caregivers were asked to hold their child on their lap, and an iPad was placed at a distance of ~60 cm from the child. The tablet’s frontal camera recorded the child’s behavior while short movies (each less than a minute and a half long) were presented on the tablet’s screen. The movies consisted of both social and non-social components; see Fig. 1 for illustrative snapshots. To minimize any distractions during the study, all other family members and the practitioners who administered the study were asked to stay behind both the caregiver and the child.
Fig. 1.
Movies used in this study: The social movies were (a) Blowing Bubbles, (b) Spinning Top, (c) Rhymes and Toys, and (d) Make Me Laugh; the non-social movies were (e) Mechanical Puppy, and (f) Dog in Grass Right-Right-Left.
2.2. Movies
The movies were presented through an application (app) on a tablet. These movies were strategically designed to elicit autism-relevant behaviors. For our current analysis, we categorized the movies according to whether they contained primarily social versus non-social elements (Fig. 1). During the social movies, human actors demonstrated engaging actions, such as making eye contact, smiling, making gestures, acting silly, and narrating rhymes. The non-social movies consisted of dynamic toys or animations. We studied the participants’ responses to the 6 movies briefly described next.
Blowing Bubbles (~64 secs): A man held a bubble wand and blew bubbles, with some attempts succeeding and some failing, while making eye contact, smiling, and frowning. The movie included limited talking from the actor.
Spinning Top (~53 secs): An actress played with a spinning top, with both successful and unsuccessful attempts, along with eye contact, smiling, and frowning. The movie included limited talking from the actress.
Rhymes and Toys (~49 secs): An actress recited nursery rhymes, such as Itsy-Bitsy Spider, while smiling and gesturing, followed by a series of dynamic, noise-making toys which were shown without the presence of the actress on the scene.
Make Me Laugh (~56 secs): An actress engaged in silly, funny actions while smiling and making eye contact.
Mechanical Puppy (~25 secs): A mechanical toy puppy barked and walked towards vegetable toys.
Dog in Grass Right-Right-Left (RRL) (~40 secs): A barking puppy was shown in different parts of the screen, followed by a series of appearances in a right-right-left (RRL) pattern.
3. Methods and Analysis
3.1. Facial Landmark Detection and Preprocessing
A face detection algorithm was first used to detect the faces present in each of the recorded video frames [25]. Using a low-dimensional facial embedding [26], [27], we ensured that we tracked only the participant’s face throughout the video, ignoring other detected faces associated with the caregiver, siblings (if any), and clinical practitioners. Once the child’s face was detected and tracked, we extracted 49 facial landmark points [28]. The 2D positional coordinates of these 49 facial landmarks were time-synchronized with the presented movies.
The facial landmarks were then preprocessed in two steps for our further analysis: (1) compensating for the effects of global head motion via global shape alignment, and (2) removing the time segments when the participants were not attending to the stimuli. For step 1, we utilized the points from the corners of the eyes and the nose (Fig. 2(a)) to align and normalize the data to a canonical face model through an affine transformation (refer to [29] for more details). During this process, we also estimated the head pose angles (see the upper plot in Fig. 2(b)) as angular coordinates relative to the camera. Step 1 of preprocessing was crucial since we did not want the landmarks’ dynamics to be contaminated by additional noise caused by head motion; rather, we wanted to study only the actual facial expressions. In addition, the videos were recorded at 30 frames per second, and we removed the high-frequency components of the landmark signals (generally associated with noise). To this end, we filtered out signal components above 15 Hz; this cutoff was chosen based on the facts that at least 260 ms are required to capture a facial expression [30], and 150–250 ms constitute a valid gaze fixation duration [31], [32].
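To make the alignment of step 1 concrete, the following is a minimal sketch of mapping one frame’s landmarks to a canonical face via a least-squares affine fit; the anchor indices (`anchor_idx`) and canonical template coordinates (`canonical_anchors`) are illustrative placeholders, not the exact values or implementation used in the study.

```python
import numpy as np

def align_to_canonical(landmarks, anchor_idx, canonical_anchors):
    """Align one frame's 49 landmarks to a canonical face model.

    landmarks: (49, 2) array of 2D landmark coordinates for one frame.
    anchor_idx: indices of the anchor points (eye corners, nose) -- placeholder.
    canonical_anchors: (k, 2) canonical template coordinates of those anchors.
    """
    src = landmarks[anchor_idx]                              # anchor points in the frame
    src_h = np.hstack([src, np.ones((len(src), 1))])         # homogeneous coordinates
    # Least-squares affine transform M (3x2) such that src_h @ M ~= canonical_anchors.
    M, *_ = np.linalg.lstsq(src_h, canonical_anchors, rcond=None)
    all_h = np.hstack([landmarks, np.ones((len(landmarks), 1))])
    return all_h @ M                                         # (49, 2) aligned landmarks
```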
Fig. 2.
Landmarks and associated features: (a) the 49 landmark points, and (b) an example of the time-series of extracted features, such as headpose angles, landmarks’ dynamics associated with the eyebrows and mouth regions, and affect probabilities, for a 20-second window of a randomly chosen TD participant. The dotted circles with red-colored landmark points in (a) were used for the preprocessing (face transformation to the canonical form). The blue dots represent the eyebrow landmarks and the green dots the mouth landmarks. The red-colored masks in (b) exemplify filtered-out intervals in the time-series.
For step 2 of the preprocessing, since we were interested in studying the dynamics of the facial landmarks as a spontaneous response to the presented movies, we focused our analysis on time segments in which the participants were considered to be engaged with the presented movies. To this end, we filtered the segments considering two criteria: (1) extreme non-frontal headpose, and (2) rapid head movement (which, in part, can render the computation of landmarks very unstable). A non-frontal headpose was defined as frames where the headpose angles lay outside predefined frontal ranges. Frames containing rapid head movement were removed by analyzing the angular speed of the headpose. We calculated a one-second moving average of the headpose angle time-series. Using these smoothed versions of the headpose coordinates, we excluded frames where the difference between the current frame and the previous frame exceeded a threshold, estimated using Equation (1),
$$\left| \bar{\theta}_i - \bar{\theta}_{i-1} \right| > \theta_{th}, \quad (1)$$

where $\bar{\theta}_i$ denotes the smoothed (one-second moving average) headpose angle at frame $i$ and $\theta_{th}$ is the angular-speed threshold.
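A minimal sketch of this engagement filter is shown below, assuming per-frame yaw and pitch angles in degrees; the pose and angular-speed thresholds are illustrative placeholders, since the study’s exact values are not reproduced here.

```python
import numpy as np
import pandas as pd

def engaged_frames(yaw, pitch, fps=30, pose_limit=30.0, speed_limit=5.0):
    """Return a boolean mask of frames with roughly frontal, slowly moving headpose.

    yaw, pitch: per-frame headpose angles in degrees (placeholder inputs).
    pose_limit, speed_limit: illustrative thresholds, not the paper's values.
    """
    frontal = (np.abs(yaw) < pose_limit) & (np.abs(pitch) < pose_limit)
    # One-second moving average of the headpose coordinates.
    yaw_s = pd.Series(yaw).rolling(fps, min_periods=1).mean().to_numpy()
    pitch_s = pd.Series(pitch).rolling(fps, min_periods=1).mean().to_numpy()
    # Frame-to-frame change of the smoothed angles (cf. Equation (1)).
    speed = np.hypot(np.diff(yaw_s, prepend=yaw_s[0]),
                     np.diff(pitch_s, prepend=pitch_s[0]))
    return frontal & (speed < speed_limit)
```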
3.2. Computation of Facial Landmarks’ Dynamics
After the above-described preprocessing, and considering only the valid/attending frames, we extracted the landmarks’ dynamics, concentrating on the eyebrows and mouth regions (Fig. 2(a)). To do so, we estimated each landmark’s Euclidean displacement from the previous frame i−1 to the current frame i, for all the points belonging to the eyebrows and mouth regions (represented as blue and green points in Fig. 2(a)). If either frame i or frame i−1 was missing, the respective Euclidean displacement was considered missing. Finally, an average value was computed over all the landmark points belonging to either the eyebrows or the mouth region to form a one-dimensional time-series for each of the two regions (middle plot in Fig. 2(b)).
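The computation of the regional landmarks’ dynamics can be sketched as follows, assuming aligned landmark coordinates with NaN entries marking frames removed during preprocessing; `region_idx` (e.g., the eyebrow or mouth indices) is a placeholder.

```python
import numpy as np

def region_dynamics(landmarks, region_idx):
    """One-dimensional landmarks' dynamics for a facial region.

    landmarks: (T, 49, 2) aligned coordinates; invalid frames filled with NaN.
    region_idx: indices of the landmarks in the region (eyebrows or mouth).
    Returns a length T-1 series of per-frame displacements averaged over the region.
    """
    pts = landmarks[:, region_idx, :]                        # (T, k, 2)
    disp = np.linalg.norm(np.diff(pts, axis=0), axis=2)      # Euclidean step per landmark
    # If frame i or i-1 was missing, the NaNs propagate and the value stays missing.
    return disp.mean(axis=1)                                 # average over region landmarks
```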
3.3. Multiscale Entropy for Measuring the Complexity of Landmarks’ Dynamics
Multiscale entropy (MSE) is used as a measure of dynamic complexity, quantifying the randomness or unpredictability of a physiological time-series operating at multiple temporal scales [15], [16], including facial dynamics [14], [17]. Briefly, entropy quantifies the unpredictability or randomness in a sequence of numbers; the higher the entropy, the higher the unpredictability. Sample entropy is a modified version widely used to assess the complexity of time-series [33], [34]. These concepts are formalized next.
The MSE is estimated by calculating the sample entropy of a time-series $\{x_1, x_2, \ldots, x_N\}$ of length $N$ at multiple timescales $\tau$. To this end, the time-series is represented at multiple resolutions by coarse-graining as

$$y_j^{(\tau)} = \frac{1}{\tau}\sum_{i=(j-1)\tau + 1}^{j\tau} x_i, \qquad 1 \le j \le \lfloor N/\tau \rfloor. \quad (2)$$
Here, we coarse-grained the landmarks’ dynamics time-series across the frames for scales 1 to 30. During this process, a coarse-grained data point $y_j^{(\tau)}$ was filled with the average of the corresponding $x_i$ values only when a minimum proportion of those $x_i$-s were not missing (see Fig. 3(a), where dotted circles represent the missing data points). Once we coarse-grained the landmarks’ dynamics for the eyebrows and mouth regions, the sample entropy was computed for each scale to obtain the MSE, further detailed next.
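A sketch of the coarse-graining with missing data is given below; the minimum fraction of observed samples required per window (`min_valid_frac`) is an illustrative choice, since the exact criterion is not reproduced here.

```python
import numpy as np

def coarse_grain(x, scale, min_valid_frac=0.5):
    """Coarse-grain a 1-D series (Eq. (2)) where NaN marks missing samples.

    Each coarse-grained point is the mean of a non-overlapping window of length
    `scale`, kept only if at least `min_valid_frac` of the window is observed
    (an illustrative threshold).
    """
    x = np.asarray(x, dtype=float)
    n = len(x) // scale
    y = np.full(n, np.nan)
    for j in range(n):
        window = x[j * scale:(j + 1) * scale]
        observed = ~np.isnan(window)
        if observed.mean() >= min_valid_frac:
            y[j] = window[observed].mean()
    return y
```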
Fig. 3.
Schematic of measuring multiscale entropy (MSE) using sample entropy (SampEn) with missing values. (a) Example of generating coarse-grained time-series data for three different scales. (b) Example of calculating SampEn with m=2. Template vectors of length m and m+1 were defined from a time-series signal with varying amplitude, having N=28 data points across time (columns), with amplitude levels spaced by the tolerance r (represented as equally spaced dotted lines/rows). The dotted circles indicate that the specific data point was missing in the original time-series. The negative natural logarithm was computed for the ratio between the numbers of matching (or repeating) templates associated with the m- and (m+1)-dimensional vectors. For example, in the 2-dimensional (m=2) case, the template vector denoted as X was repeated 2 times, # was repeated 3 times, * was repeated 4 times, and so on; in total, the number of repeated vector sequences was Cm=16. Similarly, for the (m+1)-component vectors the number of repeated vector sequences was Cm+1=5. The SampEn for this example was therefore −ln(5/16). Note that embedding vectors containing missing data points (marked with a red rectangular area) were not considered while estimating Cm and Cm+1. If we had instead simply concatenated the data points, template 16 or 17 (skipping the missing value) would have matched with 25 and 26, thereby (artificially) increasing Cm without necessarily increasing Cm+1, or vice versa, leading to inaccurate results.
As mentioned before, the sample entropy is a measure of the irregularity of a signal. For a given embedding dimension $m$ and a positive scalar tolerance $r$, the sample entropy is the negative logarithm of the conditional probability that if sets of $m$ simultaneous data points repeat within the distance $r$, then the corresponding sets of $m+1$ simultaneous data points also repeat within the distance $r$. Consider a time-series (landmarks’ dynamics) of length $N$, $\{x_1, x_2, \ldots, x_N\}$, from which the $m$-dimensional vectors $X_m(i) = \{x_i, x_{i+1}, \ldots, x_{i+m-1}\}$ are formed. The distance between any two such vectors is defined as

$$d\left[X_m(i), X_m(j)\right] = \max_{k=0,\ldots,m-1} \left| x_{i+k} - x_{j+k} \right|. \quad (3)$$
Consider $C_m$ to be the cumulative count of repeating vectors in the $m$-dimension (see Fig. 3(b)) under the condition $d\left[X_m(i), X_m(j)\right] \le r$, with $i \ne j$, and analogously let $C_{m+1}$ be the cumulative count of repeating vectors in the $(m+1)$-dimension. Then, the sample entropy (SampEn) is defined as

$$\mathrm{SampEn}(m, r, N) = -\ln\left(\frac{C_{m+1}}{C_m}\right). \quad (4)$$
When estimating $C_m$ and $C_{m+1}$, the choice of the dimension $m$, the tolerance value $r$ [34], and the missing data in the landmarks’ dynamics [34], [35] play a vital role. This is discussed next.
For our study, we set m=2 because it is the most commonly used value in previous similar studies, e.g., [14]–[17]; the choice of different m values did not greatly affect our findings (see Appendix I). We selected r=0.15σ, where σ is the standard deviation of the time-series. The value 0.15 was chosen as suggested in other studies, e.g., [16], [35], [36]. Note that a time-series containing noisy spikes can inflate the tolerance r, because r is a function of σ, which is vulnerable to noise [34]. As a result, even if the time-series is complex (irregular), a larger r allows a higher degree of tolerance when matching repeating vector sequences in both the m and m+1 dimensions, resulting in a lower SampEn value. This can be mitigated by removing the noisy spikes, which we already addressed for our landmarks’ dynamics via the previously described preprocessing steps. Another challenge arises when comparing the SampEn computed on the landmarks’ dynamics of two different participants with two different values of σ influencing the tolerance r. There is a persisting misconception in the literature of using a σ calculated on the time-series at the per-participant level and then comparing the resulting SampEn across participants, causing severe bias in the final outcome. To overcome this spurious effect and compare signals consistently across participants, we used the population standard deviation, rather than each participant’s own standard deviation, in the definition of r.1 See Appendix II for an extended discussion and numerical examples demonstrating its importance.
Finally, individuals with ASD may exhibit large amounts of head movement, causing a large amount of missing data in the time-series and challenging the SampEn estimation. To handle such missing data and consistently estimate the SampEn, we selected a segment in the m+1 dimension only if the respective vector was embedded with no missing data (see Fig. 3). Only the data points belonging to these segments were considered during the computation of $C_m$ and $C_{m+1}$ (similar to [35]). Additionally, at least 40% of the data is necessary to perform an effective complexity analysis [34]–[36]; below this threshold the estimation of the SampEn may not be reliable. For our analysis, this 40% criterion corresponds to a different absolute duration for each movie, since the movies differ in length. Participants having less than 40% valid data for a specific movie were removed from the analysis of that movie.
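The following sketch combines the above ingredients into a missing-data-aware SampEn, in the spirit of Fig. 3(b): only NaN-free template vectors are used, and the tolerance is derived from a population-level standard deviation so entropies are comparable across participants. The exact bookkeeping of the study’s implementation may differ.

```python
import numpy as np

def sample_entropy(x, m=2, pop_std=None, r_factor=0.15):
    """SampEn of a 1-D series that may contain NaNs (missing samples).

    pop_std: population-level standard deviation used to set r = r_factor * pop_std,
    so that the tolerance is shared across participants (falls back to the series'
    own std if not provided, which is NOT recommended for cross-subject comparison).
    """
    x = np.asarray(x, dtype=float)
    r = r_factor * (pop_std if pop_std is not None else np.nanstd(x))

    def count_matches(dim):
        # Keep only complete (NaN-free) template vectors; the same starting
        # positions are scanned for both dimensions m and m+1.
        idx = [i for i in range(len(x) - m) if not np.isnan(x[i:i + dim]).any()]
        templates = [x[i:i + dim] for i in idx]
        count = 0
        for a in range(len(templates)):
            for b in range(a + 1, len(templates)):
                # Chebyshev distance between templates (Eq. (3)).
                if np.max(np.abs(templates[a] - templates[b])) <= r:
                    count += 1
        return count

    c_m, c_m1 = count_matches(m), count_matches(m + 1)
    if c_m == 0 or c_m1 == 0:
        return np.nan                      # undefined when no matches are found
    return -np.log(c_m1 / c_m)             # Eq. (4)
```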
3.4. Estimation of Affective States
In addition to studying the complexity of facial dynamics, we also considered the more standard approach of investigating affective expressions, and explicitly show that these two measurements are not redundant. For consistency, we used the pose-invariant affect method from our previous work [37], although other approaches could be used for this secondary analysis. We estimated the probabilities of three categories of affective expression from four facial expressions: positive (happy expression), neutral (neutral expression), and negative (angry and sad expressions). Figure 2(b) illustrates the time-series of the probabilities associated with positive, neutral, and negative affect. These quantities were estimated, at a rate of 30 frames per second, for those frames during which the participant was considered to be engaged (Section 3.1). To analyze the evolution of the displayed affect, we considered the first derivative of each of these affect-based time-series. Finally, we computed the rate of change for each affect, defined as a moving average over 10 frames (1/3 second), followed by a cumulative sum of these values to obtain the energy of the signal. Previous work [29] presented validation results comparing human coding and the computer vision-based automatic coding of ‘positive’ versus ‘other’ (neutral and negative) affective expressions on a frame-by-frame basis for 136,450 frames (belonging to different participants), showing excellent inter-rater reliability with an intra-class correlation coefficient (ICC) of 0.9.
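As a sketch, the affect “energy” described above can be computed as follows; whether the absolute value of the derivative is taken before averaging is an assumption of this illustration.

```python
import numpy as np
import pandas as pd

def affect_energy(prob, window=10):
    """Energy of the first derivative of an affect-probability time-series.

    prob: per-frame probability of one affect category (engaged frames, 30 fps).
    The frame-to-frame change is smoothed with a 10-frame (1/3 s) moving average
    and accumulated; taking the absolute change is an illustrative assumption.
    """
    change = np.abs(np.diff(prob))                               # first derivative
    rate = pd.Series(change).rolling(window, min_periods=1).mean()
    return rate.cumsum().iloc[-1]                                # cumulative sum = energy
```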
3.5. Statistical Analysis
Statistically significant differences between the groups’ distributions were tested using the Mann–Whitney U test; in particular, the python function pingouin.mwu was used. Within-group comparisons (e.g., between the eyebrows and mouth regions) were performed using the Wilcoxon signed-rank test with the python function pingouin.wilcoxon. Effect sizes were estimated with the standard r value for both significance tests. Since covariates such as (1) the ‘percentage of missing data’ removed during preprocessing and (2) the ‘variation in the landmark movements’ could confound our complexity measures, we performed additional analyses, e.g., analysis of covariance (ANCOVA) using pingouin.ancova. Additionally, for correlation analysis, we computed Spearman’s ρ using scipy.stats.spearmanr in python. A decision tree-based classifier [38] was used to assess the possible separation between the ASD and TD groups using our complexity and affect-related measures, considering each individual movie; ‘Gini impurity’ was used as the automatic splitting criterion. The differences between the Areas Under the Curve (AUCs) of the Receiver Operating Characteristic (ROC) curves based on the different features and movies were compared using the DeLong method [39]. Additionally, logistic regression (with the python function sklearn.linear_model.LogisticRegression) was used to estimate odds ratios for predicting the risk of ASD in toddlers. For the logistic regression, we again used both the proposed landmarks’ complexity and the affect-related measures.
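For reference, the statistical tests listed above can be invoked as sketched below; the variable and column names (`entropy_asd`, `df['pct_missing']`, etc.) are placeholders for illustration.

```python
import pingouin as pg
from scipy.stats import spearmanr

# Between-group comparison (ASD vs. TD) of integrated entropy for one movie.
mwu_res = pg.mwu(entropy_asd, entropy_td, alternative='two-sided')

# Within-group comparison between the eyebrows and mouth regions.
wilcoxon_res = pg.wilcoxon(entropy_eyebrows, entropy_mouth)

# ANCOVA adjusting for the two possible confounds.
ancova_res = pg.ancova(data=df, dv='integrated_entropy', between='group',
                       covar=['pct_missing', 'landmark_variation'])

# Spearman correlation between integrated entropy and the affect measure.
rho, p_value = spearmanr(df['integrated_entropy'], df['positive_energy'])
```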
4. Results
The two main questions explored in this section are: (1) whether the estimation of complexity in landmarks’ dynamics can be used as a distinctive biomarker to distinguish between ASD and TD participants, and (2) whether the estimated complexity measure adds value beyond traditional measures of facial emotional expressions.
4.1. Complexity of Facial Landmarks’ Dynamics
To address our first research question, we estimated the MSE for the eyebrows and mouth regions of the participants in both the ASD and TD groups (Fig. 4). Irrespective of whether the movie contained social or non-social content, the participants in the ASD group exhibited higher complexity (higher MSE) in their landmarks’ dynamics, reflecting a higher level of ambiguous movements of these facial landmarks. The SampEn values were significantly different across the first 20 resolutions of the time-series for 4 out of the 6 presented movies (Fig. 4). Though the values were still significantly different between the ASD and TD groups, the mean difference was less pronounced across the multiple scales during the non-social movies (Dog in Grass RRL and Mechanical Puppy, with p < .01 / .05 and medium effect sizes (r) in the .32–.4 range; see Fig. 4) than during the social movies (p < .0001 with large to very large effect sizes (r) in the .51–.82 range).
Fig. 4.
Results of the MSE for all the social and non-social movies, with significant differences (p-values) between the ASD and TD groups, represented across the scales (1–30). The number of participants in the ASD and TD groups varies across the movies since we considered only individuals having at least 40% of valid data for each movie. Symbols: Δ → p < .0001, + → p < .001, ○ → p < .01, and ☆ → p < .05.
Additionally, we computed the cumulative sum of SampEn values to compare the social and non-social movies. To do so, we aggregated the SampEn values associated with the first 20 scales for each participant (integrated entropy; see Fig. 5). The results indicated that the integrated entropy was highly significantly different between the ASD and TD groups for all the social movies, with p < .00001 and large or very large effect sizes (r) for both the eyebrows and mouth regions (Fig. 5). In contrast, the effect sizes (r) were small or medium for the non-social movies, though the groups were still significantly different (p < .01). In short, the integrated entropy computed from the facial dynamics offered more confidence in distinguishing the ASD and TD groups, particularly while the toddlers watched the social movies.
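Using the `coarse_grain` and `sample_entropy` sketches from Section 3.3, the integrated entropy can be computed as follows (a sketch, not the exact implementation).

```python
import numpy as np

def integrated_entropy(series, pop_std, scales=range(1, 21), m=2):
    """Sum of SampEn values over the first 20 scales of the MSE curve."""
    values = [sample_entropy(coarse_grain(series, s), m=m, pop_std=pop_std)
              for s in scales]
    return np.nansum(values)
```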
Fig. 5.
Comparative analysis of integrated entropy between the ASD and TD groups for all the social and non-social movies. Symbols: Δ → p < .00001 and ○ → p < .01. The numbers inside the figure represent the effect size r. BB = Blowing Bubbles, MML = Make Me Laugh, ST = Spinning Top, RAT = Rhymes and Toys, RRL = Dog in Grass RRL, and Mpuppy = Mechanical Puppy. Refer to Fig. A6 in the supplementary material for scatter plot distributions of the groups.
Though the integrated entropy was significantly different between the ASD and TD groups, we still wanted to check whether possible confounds such as (1) the ‘percentage of missing data’ and (2) the ‘variation in the landmark dynamics’ had any influence on our findings. To do so, we performed a correlation analysis. The results indicated only weak correlations (ranging from 0.2 to 0.42) between the integrated entropy and both the ‘percentage of missing data’ and the ‘variation in the landmark dynamics’ (see Appendix V for additional numerical values). These low correlation values indicate that (1) the integrated entropy is not driven by the ‘percentage of missing data,’ and (2) the integrated entropy is not merely capturing the ‘variation in the landmarks’ dynamics.’ Though we considered only the participants that had a minimum of 40% of landmarks’ data after the preprocessing, we also tested whether the ‘percentage of missing data’ was statistically different between the groups; the results indicate that it was not (see Appendix V for detailed information). Additionally, we used these two possible confounds as covariates in an ANCOVA to further verify the statistical difference between the ASD and TD groups. Even after adjusting for the two confounds, either individually or together, the significant difference between the ASD and TD groups was maintained, with p<0.0001 for the social movies and p<0.001 for the non-social movies, indicating that the integrated entropy was not influenced by these confounds.
Additionally, to understand whether the two regions of interest (eyebrows and mouth) had different levels of complexity, we compared the integrated entropy values between these two regions within the ASD and TD groups. The results indicated that the integrated entropy values for the two regions were not significantly different from each other (p > .05 with effect size (r) < 0.1), irrespective of the movie, for both the ASD and TD groups.
Additionally, in order to understand the individualized range of change in the integrated entropy across the tasks, we estimated the range, defined as the difference between the maximum and minimum integrated entropy, for each participant across the six presented movies. A statistical analysis revealed that the ASD group had a significantly larger change in these values than the TD group across the tasks, in both the eyebrows and mouth regions, with medium effect sizes. This observation again reflected that the participants with ASD had complex landmarks’ dynamics that varied across the different movies presented.
4.2. Landmarks’ Complexity and Affective State
This section addresses our second research question. A comparative analysis of the energy calculated from the first derivative of the time-series related to neutral, positive, and negative affect indicated that the participants expressed a higher probability of neutral affect in response to all the movies (Table 2). This finding is consistent with previous work from our group [19]. Our movies were not designed to elicit negative affect and, as expected, we did not observe any significant negative facial emotions.
TABLE 2.
Energy of the first-derivative of affect-based time-series data
| Affect | Group | BB | ST | RAT | MML | RRL | Mpuppy |
|---|---|---|---|---|---|---|---|
| Nu | TD | 1.0±0.8 | 0.8±0.7 | 1.0±0.7 | 1.0±0.8 | 1.1±0.8 | 0.5±0.4 |
| | ASD | 1.8±1.0 | 1.9±2.0 | 2.1±1.0 | 2.1±1.4 | 0.5±0.8 | 0.7±0.5 |
| P | TD | 0.5±0.7 | 0.3±0.5 | 0.3±0.5 | 0.2±0.4 | 0.6±0.8 | 0.3±0.5 |
| | ASD | 1.2±1.4 | 0.7±0.7 | 0.9±0.9 | 0.6±0.7 | 1.2±1.0 | 0.4±0.5 |
| N | TD | 0.0±0.0 | 0.0±0.1 | 0.04±0.0 | 0.1±0.0 | 0.1±0.1 | 0.0±0.0 |
| | ASD | 0.1±0.0 | 0.1±0.1 | 0.07±0.1 | 0.1±0.0 | 0.1±0.1 | 0.0±0.0 |
Note: Nu = neutral affect, P = positive affect, N = negative affect, BB = Blowing Bubbles, ST = Spinning Top, RAT = Rhymes and Toys, MML = Make Me Laugh, RRL = Dog in Grass RRL, and Mpuppy = Mechanical Puppy
We now consider the energy calculated from the first derivative of the positive affect time-series (PositiveEnergy’) for further analysis. To understand whether the observed complexity in facial landmarks’ dynamics was simply an outcome of expressing positive affect in response to the movies, we computed the correlation coefficient between the integrated entropy and PositiveEnergy’ for all the movies (combining the ASD and TD groups). The results (Table 3) indicated that although some dependency existed for certain movies, such as Blowing Bubbles (BB) and Mechanical Puppy (Mpuppy), this was not the case for the other movies, namely Spinning Top (ST), Rhymes and Toys (RAT), Make Me Laugh (MML), and Dog in Grass RRL (RRL). The results were similar even when the analysis was done separately for the ASD and TD groups. Thus, the landmarks’ dynamics were possibly a combination of affect and other manifestations, such as atypical mouth movements [20]–[22] and frequently raised eyebrows and open mouth, potentially reflecting the level of attentional engagement [19]. Furthermore, the correlation coefficient (ρ) was comparatively more pronounced between the integrated entropy of the mouth region and PositiveEnergy’, where positive affect (smiling) can be more prominent than in the eyebrows. It is evident from these results that although a partial correlation exists with the affect data, the complexity of the landmarks’ dynamics offers a complementary measure with additional unique information.
TABLE 3.
Correlation Coefficient Between integrated entropy and PositiveEnergy’
| | BB | ST | RAT | MML | RRL | Mpuppy |
|---|---|---|---|---|---|---|
| Eyebrows | .49 | .08 | .05 | .11 | .01 | .41 |
| Mouth | .60 | .13 | .05 | .15 | .06 | .49 |
Note: The values represent Spearman’s ρ
4.3. Classification approach using Landmarks’ Complexity
To assess the feasibility of using the proposed complexity measure to automatically classify individuals with ASD versus TD, we used the integrated entropy values of the eyebrows and mouth regions as inputs to a decision tree-based model. Since the statistical tests showed that the significant differences in integrated entropy between the groups had larger effect sizes during the social movies, we considered only those movies for this analysis. Fig. 6 shows the ROC curves, and Table 4 shows other performance measures such as accuracy, precision, and recall using Leave-One-Out (LOO) cross-validation. Overall, the results indicated that the integrated entropy can serve as a promising biomarker to classify the ASD and TD groups. In addition, we also tested whether the affect data, e.g., PositiveEnergy’, could improve the performance when included as an additional input to the classifier. The classification performance remained similar, indicating that PositiveEnergy’ did not add value to the proposed integrated entropy. In fact, when the affect data was the only input to the classifier, the performance was inferior to that obtained using only the integrated entropy as input.
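A sketch of this classification analysis is shown below, assuming a feature matrix `X` (integrated entropy of the eyebrows and mouth for one social movie) and labels `y` (1 = ASD); these variables are placeholders.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score

# X: (n_participants, 2) integrated entropy features; y: (n_participants,) labels.
clf = DecisionTreeClassifier(criterion='gini', random_state=0)  # Gini impurity splits

# Leave-One-Out predictions: each participant is scored by a model trained on the rest.
y_prob = cross_val_predict(clf, X, y, cv=LeaveOneOut(), method='predict_proba')[:, 1]
y_pred = (y_prob >= 0.5).astype(int)

print('accuracy :', accuracy_score(y, y_pred))
print('precision:', precision_score(y, y_pred))
print('recall   :', recall_score(y, y_pred))
print('ROC AUC  :', roc_auc_score(y, y_prob))
```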
Fig. 6.
ROC curves for each of the social movies
The AUC of the ROC was higher when the classifier was trained with only the integrated entropy compared to either PositiveEnergy’ alone or the combination of both measures (except for Spinning Top).
TABLE 4.
Cross-validation results for the decision tree model
| Movies | Accuracy (A) | Accuracy (B) | Accuracy (C) | Precision (A) | Precision (B) | Precision (C) | Recall (A) | Recall (B) | Recall (C) |
|---|---|---|---|---|---|---|---|---|---|
| Blowing Bubbles | 78.3% | 75.0% | 79.1% | 72.4% | 75.0% | 75.7% | 93.5% | 77.0% | 77.4% |
| Make Me Laugh | 78.3% | 65.0% | 79.5% | 70.5% | 56.0% | 75.1% | 88.8% | 82.5% | 85.1% |
| Rhymes and Toys | 76.6% | 70.0% | 75.0% | 70.0% | 69.1% | 72.4% | 89.3% | 82.1% | 75.0% |
| Spinning Top | 76.7% | 61.6% | 63.3% | 70.3% | 62.5% | 61.3% | 89.6% | 51.7% | 65.5% |
Note: A = integrated entropy from the eyebrows and mouth regions, B = PositiveEnergy’, and C = features from both A and B. The classifiers using only the integrated entropy as input showed stable and higher performance across all the social movies. In contrast, PositiveEnergy’ as the only input showed the lowest performance. Combining both the integrated entropy and PositiveEnergy’ as inputs to the classifier showed a slight increase in accuracy and precision only for Blowing Bubbles and Make Me Laugh (not otherwise), while the recall was reduced. Thus, combining PositiveEnergy’ with the integrated entropy does not add information for the classification between the ASD and TD groups.
Furthermore, the odds ratios calculated from the logistic regression (Table 5) indicate that higher values of integrated entropy increase the predicted risk of ASD by up to 1.8 times for the social movies. On the other hand, the changes in affective expression (i.e., PositiveEnergy’) did not offer a positive odds ratio (i.e., 0.8, which is less than 1). Nevertheless, both the integrated entropy and PositiveEnergy’ contributed significantly (p<0.05) to the fit of the logistic regression model. Again, using only the integrated entropy offered the same results as when combined with PositiveEnergy’, indicating that the integrated entropy was independent of the affect measure and powerful enough to distinguish the ASD and TD groups.
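A sketch of the odds-ratio computation is given below; the feature matrix `X` and labels `y` are placeholders, and standardizing the features beforehand (so that odds ratios correspond to a one standard-deviation increase) is an assumption of this illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# X columns: integrated entropy (eyebrows, mouth) and PositiveEnergy'; y: 1 = ASD.
X_std = StandardScaler().fit_transform(X)          # per-feature standardization (assumed)
model = LogisticRegression().fit(X_std, y)
odds_ratios = np.exp(model.coef_[0])               # exponentiated coefficients
print(dict(zip(['eyebrows', 'mouth', 'positive_energy'], odds_ratios)))
```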
TABLE 5.
Odds ratio results from linear logistic regression
| Movies | Integrated entropy (Eyebrows): Odds | P-value | Integrated entropy (Mouth): Odds | P-value | PositiveEnergy’: Odds | P-value |
|---|---|---|---|---|---|---|
| Blowing Bubbles | 1.49 | .004 | 1.48 | .005 | 0.80 | .018 |
| Make Me Laugh | 1.54 | .004 | 1.51 | .004 | 0.82 | .030 |
| Rhymes and Toys | 1.52 | .004 | 1.80 | .004 | 0.81 | .026 |
| Spinning Top | 1.48 | .005 | 1.52 | .004 | 0.85 | .026 |
5. Discussion and Conclusions
We designed an iPad-based application (app) that displayed strategically designed, developmentally appropriate short movies with social and non-social components. Children diagnosed with ASD and children with typical development (TD) took part in our study, watching the movies during their well-child visit to pediatric clinics. The device’s front-facing camera was used to record the children’s behavior and capture ASD-related features. Our current work focused on exploring biomarkers related to facial dynamics. We analyzed the children’s facial dynamics in the eyebrows and mouth regions using multiscale entropy (MSE) to study the complexity of the facial landmarks’ dynamics. The complexity analysis may give insights about the level of irregularity (or ambiguity) in the facial dynamics that can potentially be used as a biomarker to distinguish between children with ASD and those with typical development. In essence, complexity estimates based on entropy convey how easy it is to predict the facial landmark dynamics, rather than just their variation. Specifically, a time-series with large but highly periodic variations will result in an extremely low entropy value. Similarly, when the time-series is almost stable, with very low variation, the entropy value will also be low (which was the case for our TD participants). In contrast, a time-series with high variability and irregular, non-periodic movements will yield a high entropy value, indicating higher complexity in facial dynamics (which was the case for the ASD participants). We speculate that the greater predictability, with minor facial movements, observed in the TD toddlers reflects a higher level and more consistent understanding of the shared social meaning of the content of the movies (e.g., rhymes, a conversation). If so, the TD toddlers might be expected to respond more predictably to the stimuli, whereas the responses of the children with autism may be more idiosyncratic. It has previously been documented (e.g., [9]) that children with autism make more atypical and ambiguous facial expressions and that these vary across children with autism.
As expected, the results of our modified approach to MSE analysis captured distinctive landmarks’ dynamics in children with ASD, characterized by a significantly higher level of complexity in both the eyebrows and mouth regions when compared to typically developing children. This measure can be robust and complementary to other measures such as affective state [19]. The observations based on the integrated entropy support recent work indicating that individuals with ASD often exhibit a higher probability of neutral expression [19]; neutral expressions might be interpreted by others as more ambiguous in terms of the affective state they convey. The results reported here are also in agreement with other work related to atypical speech and mouth movements [20], [22], offering directions for further exploration. It has also been shown that individuals with ASD have difficulties in affect coordination during interpersonal social interaction [40]; it would be interesting to study the potential of complexity/coordination in facial dynamics in such a context in the future. Finally, we observed that the proposed integrated entropy (the sum of SampEn over scales 1–20 of the MSE) might hold promise in distinguishing children with ASD not only from TD children but also from children with other developmental disorders (e.g., developmental delay/language delay; see Appendix III). Additionally, the integrated entropy offered better performance in classifying the ASD and TD groups when using machine learning-based classifiers, e.g., a decision tree-based model, offering an avenue to build an automated decision-making pipeline for conveying a probability of risk of ASD while using the app in home-based settings. That said, in addition to the complexity measure, it would be necessary to combine other features, such as gaze-related indices, name-call response, signatures of motor deficits, and more, as mentioned in [7], before deploying such an automated decision-making tool. Complementary to the facial landmarks’ dynamics, the known deficits in motor control in children with ASD can be manifested in the form of poor postural control [41], which can be easily captured with our data (see Appendix V). Exploring complexity estimates of these head motions is an interesting direction for future work.
Limitations of this study include the following: landmark and affect detection were based on algorithms trained primarily on adults, although our previous work showed they were still reliable for toddlers; other measures of complexity might be more robust than the MSE in their ability to discriminate children with and without ASD; and the study sample, while relatively large, still had a limited number of ASD participants and did not have sufficient power to determine the impact of demographic characteristics on the results.
To conclude, in this work, we introduced a newly normalized measure of MSE more suited for across-subject comparison and demonstrated that the complexity of facial dynamics has potential as an ASD biomarker beyond more traditional measures of affective expression. Our findings were consistent with the previous work on patterns of affective expression in children with ASD, while adding new discoveries underscoring the value of dynamic facial primitives. Considering that autism is a very heterogeneous condition, the combination of the novel biomarkers described here with additional biomarkers has the potential to improve the development of scalable screening, diagnosis, and treatment monitoring tools.
Supplementary Material
Acknowledgment
This work was supported by an NIH Autism Center of Excellence Award NICHD 1P50HD093074, NIMH R01MH121329 and NIMH R01MH120093 with additional support provided by The Marcus Foundation, the Simons Foundation, NSF-1712867, ONR N00014-18-1-2143 and N00014-20-1-233, NGA HM04761912010, Apple, Inc., Microsoft, Inc., Amazon Web Services, and Google, Inc. The funders had no direct role in the study design, data analysis, decision to publish, or preparation of the manuscript. The authors wish to thank the many caregivers and children for their participation in the study, without whom this research would not have been possible. The authors gratefully acknowledge the collaboration of the physicians and nurses in Duke Children’s Primary Care, and the contributions of several research staff and clinical research specialists. Data analysis for this project is constantly and independently reviewed by Scott Compton, who has no conflict of interest related to this work. Sapiro, Dawson, Carpenter, and Espinosa developed technology related to the app that has been licensed to Apple, Inc., and both they and Duke University have benefited financially. Dawson is on the Scientific Advisory Boards of Janssen Research and Development, Akili Interactive, Inc, LabCorp, Inc, Roche Pharmaceutical Company, and Tris Pharma, and is a consultant to Apple, Gerson Lehrman Group, Guidepoint Global, Inc, and is CEO of DASIO, LLC. Dr. Dawson has stock interests in Neuvana, Inc. Dr. Dawson has received book royalties from Guilford Press, Oxford University Press, Springer Nature Press. In addition, Dr. Dawson has the following patent No. 10,912,801 and patent applications: 62,757,234, 25,628,402, and 62,757,226. Dr. Dawson has developed products that have been licensed to Cryocell, Inc. and Dawson and Duke University have benefited financially. Sapiro was a consultant for Apple, Volvo, Restore3D, and SIS during parts of this work, and is CEO of DASIO, LLC. He was on the Board of SIS and Tanku and has invention disclosures and patent applications registered at the Duke Office of Licensing and Venture. He received speaking fees from Janssen while parts of this work were performed. He is currently affiliated with Apple, while all this work was completed before this affiliation started. The remaining authors have declared that they have no competing or potential conflicts of interest.
Biographies
Pradeep Raj Krishnappa Babu is a postdoctoral research associate in Electrical and Computer Engineering at the Pratt School of Engineering, Duke University, USA. He received his B.E. in Computer Science Engineering from an Anna University affiliated institute, India, in 2012, and graduated with an M.E. in Virtual Prototyping and Digital Manufacturing from P.S.G. College of Technology, Coimbatore, India, in 2014. He received his Ph.D. in Cognitive Science from the Indian Institute of Technology Gandhinagar, India, in 2019. During his Ph.D. he received a prestigious fellowship from the ‘Visvesvaraya PhD Scheme’ offered by the Ministry of Electronics and Information Technology, India. His research interests include human-computer interaction, psychophysiological-based user interaction strategies, computer vision, machine learning, virtual reality based user interaction systems, and affect-sensitive system modeling.
J. Matias Di Martino was born in Montevideo, Uruguay, in January 1987. He received the B.Sc. and Ph.D. degrees in Electrical Engineering from “Universidad de la Republica”, Uruguay, in 2011 and 2015, respectively. During 2016–2017 he was a Research Associate at Ecole Normale Superieure de Cachan, Paris. Since 2017 he has been working as an Assistant Professor at the Physics Department of the School of Engineering, “Universidad de la Republica”, and since 2019 he has also been a Research Assistant Professor at the Department of Electrical and Computer Engineering, Duke University, US. His main areas of interest are applied optics, machine learning, facial and behavioral analysis, and image processing.
Zhuoqing Chang received his B.S. from Tianjin University in Measuring and Controlling Technologies and Instruments. He received his M.S. and Ph.D. from Duke University in Electrical and Computer Engineering with a focus on computer vision and machine learning. He is currently a Machine Learning Engineer at Peloton Interactive.
Sam J. Perochon was born in Niort, France. He received his B.E. in Electrical Engineering from the Ecole Normale Supérieure Paris-Saclay in France (2018) and graduated with an M.E. in Applied Mathematics and Computer Science, Master MVA (Mathematics - Vision - Learning), from the Ecole Normale Supérieure Paris-Saclay (2021). He is a Ph.D. candidate in applied mathematics with the Ecole Normale Supérieure Paris-Saclay in France and Duke University, USA, starting September 2021. Since 2019, he has been a regular collaborator with the Department of Electrical and Computer Engineering at the Pratt School of Engineering, Duke University, USA, as a Visiting Research Scholar working on human behavioral analysis using machine learning and computer vision. His research interests include the analysis of multimodal information for the screening, monitoring, or treatment of neurophysiological disorders, computer vision, machine learning, and statistics.
Kimberly L.H. Carpenter is a neurobiologist specializing in translational developmental neuroscience, with expertise in functional and structural neuroimaging in clinical and pediatric populations. Dr. Carpenter’s research focuses on three primary content areas: (A) the neuroscience of early childhood mental health, (B) risk factors for psychiatric and neurodevelopmental disorders in preschool-age children, and (C) the development of new technologies for evidence-based screening for neurodevelopmental and psychiatric disorders in young children. Through this work, she aims to increase access to, and provide a solid neurobiological foundation for, evidence-based screening, diagnosis, and treatment of autism and associated psychiatric comorbidities in children from birth to 5 years of age. Dr. Carpenter has a Ph.D. in Neurobiology from the University of North Carolina at Chapel Hill, a Bachelor of Science in biology with a minor in chemistry, and a Bachelor of Arts in psychology from the University of North Carolina at Wilmington.
Steven Espinosa received his B.S. in Computer Science from the University of South Carolina (2008). Since then, he has worked on various projects in several industries including video game development, higher education, computer vision and medical research. His current role is Senior IT Analyst at Duke University where he focuses on creating research experiments using mobile devices as a platform for delivery.
Scott Compton, PhD is an associate professor in the Department of Psychiatry and Behavioral Sciences at Duke University Medical Center and Department of Psychology at Duke University. He is a senior clinical trials researcher and has conducted multiple large NIH-funded comparative treatment trials. His research interests are in understanding, developing, evaluating, and disseminating psychosocial interventions for pediatric and young adult anxiety disorders, depressive disorders, OCD-spectrum disorders, suicide, and Tourette syndrome. Dr. Compton serves on the scientific advisory board of the Tourette Association of America and Anxiety and Depression of America. In the recent past, he served as the Deputy Editor for Journal of Consulting and Clinical Psychology. His research has been published in major journals including Journal of the American Medical Association and the New England Journal of Medicine.
Geraldine Dawson is the William Cleland Distinguished Professor of Psychiatry and Behavioral Sciences at Duke University, where she also is Professor of Pediatrics and Psychology & Neuroscience and directs the Duke Center for Autism and Brain Development. Dawson was previously Professor of Psychology at the University of Washington, where she directed the UW Autism Center. She served as President of the International Society of Autism Research, member of the Interagency Autism Coordinating Committee, and Chief Science Officer for Autism Speaks. Dawson’s research has focused on autism early detection, treatment, and brain function. She is an elected member of the American Academy of Arts and Sciences and received the American Psychological Association Distinguished Career Award, Association for Psychological Science Lifetime Achievement Award, and Clarivate Top 1% Cited Researcher Across All Scientific Fields. Dawson received a Ph.D. in Developmental and Child Clinical Psychology from the University of Washington and completed a clinical internship at UCLA.
Guillermo Sapiro was born in Montevideo, Uruguay, on April 3, 1966. He received his B.Sc. (summa cum laude), M.Sc., and Ph.D. from the Department of Electrical Engineering at the Technion, Israel Institute of Technology, in 1989, 1991, and 1993, respectively. After postdoctoral research at MIT, Dr. Sapiro became Member of Technical Staff at the research facilities of HP Labs in Palo Alto, California. He was with the Department of Electrical and Computer Engineering at the University of Minnesota, where he held the position of Distinguished McKnight University Professor and Vincentine Hermes-Luh Chair in Electrical and Computer Engineering. Currently he is a James B. Duke School Professor with Duke University. G. Sapiro works on theory and applications in computer vision, computer graphics, medical imaging, image analysis, and machine learning. He has authored and co-authored over 450 papers in these areas and has written a book published by Cambridge University Press, January 2001. G. Sapiro was awarded the Gutwirth Scholarship for Special Excellence in Graduate Studies in 1991, the Ollendorff Fellowship for Excellence in Vision and Image Understanding Work in 1992, the Rothschild Fellowship for Post-Doctoral Studies in 1993, the Office of Naval Research Young Investigator Award in 1998, the Presidential Early Career Awards for Scientists and Engineers (PECASE) in 1998, the National Science Foundation Career Award in 1999, and the National Security Science and Engineering Faculty Fellowship in 2010. He received the Test-of-Time award at ICCV 2011 and at ICML 2019, the only researcher to receive Test-of-Time awards in both major computer vision and machine learning venues. He was elected to the American Academy of Arts and Sciences in 2018. G. Sapiro is a Fellow of IEEE, SIAM, and the American Academy of Arts and Sciences (AAAS). G. Sapiro was the founding Editor-in-Chief of the SIAM Journal on Imaging Sciences.
Footnotes
It is interesting to note that in spite of the relatively popular use of MSE in health research, this problem has never been addressed before, and often authors report lower MSE for data that is clearly more complex and vice-versa. Without taking this into consideration, it is challenging to use MSE to compare the complexity among signals that come from different participants.
Contributor Information
Pradeep Raj Krishnappa Babu, Department of Electrical and Computer Engineering, Duke University, Durham, NC, USA..
J. Matias Di Martino, Department of Electrical and Computer Engineering, Duke University, Durham, NC, USA..
Zhuoqing Chang, Department of Electrical and Computer Engineering, Duke University, Durham, NC, USA..
Sam Perochon, Department of Electrical and Computer Engineering, Duke University, Durham, NC, USA..
Kimberly L.H. Carpenter, Duke Center for Autism and Brain Development, Department of Psychiatry and Behavioral Sciences, Duke University, Durham, NC, USA.
Scott Compton, Duke Center for Autism and Brain Development, Department of Psychiatry and Behavioral Sciences, Duke University, Durham, NC, USA..
Steven Espinosa, Office of Information Technology, Duke University, Durham, NC, USA..
Geraldine Dawson, Duke Center for Autism and Brain Development, Department of Psychiatry and Behavioral Sciences, Duke University, Durham, NC. USA..
Guillermo Sapiro, Department of Electrical and Computer Engineering, Biomedical Engineering, Mathematics, and Computer Sciences, Duke University, Durham, NC, USA..
References
- [1].Ekman P, “Emotional and Conversational Nonverbal Signals,” in Language, Knowledge, and Representation, 2004, pp. 39–50. [Google Scholar]
- [2].Ekman P and Friesen WV, “Constants across cultures in the face and emotion,” J. Pers. Soc. Psychol, vol. 17, no. 2, pp. 124–129, 1971. [DOI] [PubMed] [Google Scholar]
- [3].Izard CE, Huebner RR, Risser D, and Dougherty L, “The young infant’s ability to produce discrete emotion expressions,” Dev. Psychol, 1980. [Google Scholar]
- [4].American Psychiatric Association, “DSM-5 Diagnostic Classification,” in Diagnostic and Statistical Manual of Mental Disorders, 2013. [Google Scholar]
- [5].Lord C et al., “The Autism Diagnostic Observation Schedule-Generic: A standard measure of social and communication deficits associated with the spectrum of autism,” J. Autism Dev. Disord, vol. 30, no. 3, pp. 205–223, 2000. [PubMed] [Google Scholar]
- [6].Yirmiya N, Kasari C, Sigman M, and Mundy P, “Facial Expressions of Affect in Autistic, Mentally Retarded and Normal Children,” J. Child Psychol. Psychiatry, vol. 30, no. 5, pp. 725–735, 1989. [DOI] [PubMed] [Google Scholar]
- [7].Dawson G and Sapiro G, “Potential for Digital Behavioral Measurement Tools to Transform the Detection and Diagnosis of Autism Spectrum Disorder,” JAMA Pediatrics. 2019, doi: 10.1001/jamapediatrics.2018.5269. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [8].Metallinou A, Grossman RB, and Narayanan S, “Quantifying atypicality in affective facial expressions of children with autism spectrum disorders,” in Proceedings - IEEE International Conference on Multimedia and Expo, 2013, doi: 10.1109/ICME.2013.6607640. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [9].Grossard C et al., “Children with autism spectrum disorder produce more ambiguous and less socially meaningful facial expressions: An experimental study using random forest classifiers,” Mol. Autism, vol. 11, no. 1, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [10].Manfredonia J et al., “Automatic Recognition of Posed Facial Expression of Emotion in Individuals with Autism Spectrum Disorder,” J. Autism Dev. Disord, vol. 49, no. 1, pp. 279–293, 2019. [DOI] [PubMed] [Google Scholar]
- [11].Leo M et al., “Computational analysis of deep visual data for quantifying facial expression production,” Appl. Sci, vol. 9, no. 21, 2019, doi: 10.3390/app9214542. [DOI] [Google Scholar]
- [12].Zane E, Yang Z, Pozzan L, Guha T, Narayanan S, and Grossman RB, “Motion-Capture Patterns of Voluntarily Mimicked Dynamic Facial Expressions in Children and Adolescents With and Without ASD,” J. Autism Dev. Disord, vol. 49, no. 3, pp. 1062–1079, 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [13].Guha T et al., “On quantifying facial expression-related atypicality of children with Autism Spectrum Disorder,” in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2015, pp. 803–807. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [14].Guha T, Yang Z, Grossman RB, and Narayanan SS, “A Computational Study of Expressive Facial Dynamics in Children with Autism,” IEEE Trans. Affect. Comput, vol. 9, no. 1, pp. 14–20, 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [15].Costa M, Goldberger AL, and Peng CK, “Multiscale Entropy Analysis of Complex Physiologic Time Series,” Phys. Rev. Lett, vol. 89, no. 6, p. 068102, 2002. [DOI] [PubMed] [Google Scholar]
- [16].Costa M, Goldberger AL, and Peng CK, “Multiscale entropy analysis of biological signals,” Phys. Rev. E - Stat. Nonlinear, Soft Matter Phys., vol. 71, no. 2, p. 021906, 2005. [DOI] [PubMed] [Google Scholar]
- [17].Harati S, Crowell A, Mayberg H, Kong J, and Nemati S, “Discriminating clinical phases of recovery from major depressive disorder using the dynamics of facial expression,” in Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS, 2016, pp. 2254–2257. [DOI] [PubMed] [Google Scholar]
- [18].Zhao Z et al. , “Atypical Head Movement during Face-to-Face Interaction in Children with Autism Spectrum Disorder,” Autism Res, 2021, doi: 10.1002/aur.2478. [DOI] [PubMed] [Google Scholar]
- [19].Carpenter KLH et al. , “Digital Behavioral Phenotyping Detects Atypical Pattern of Facial Expression in Toddlers with Autism,” Autism Res, 2020, doi: 10.1002/aur.2391. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [20].Green JR, Moore CA, and Reilly KJ, “The sequential development of jaw and lip control for speech,” J. Speech, Lang. Hear. Res, vol. 45, no. 1, pp. 66–79, 2002. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [21].Shriberg LD, Green JR, Campbell TF, McSweeny JL, and Scheer AR, “A diagnostic marker for childhood apraxia of speech: The coefficient of variation ratio,” Clinical Linguistics and Phonetics, vol. 17, no. 7, pp. 575–595, 2003. [DOI] [PubMed] [Google Scholar]
- [22].Tenenbaum EJ et al., “A Six-Minute Measure of Vocalizations in Toddlers with Autism Spectrum Disorder,” Autism Res, vol. 13, no. 8, pp. 1373–1382, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [23].Robins DL, Casagrande K, Barton M, Chen CMA, Dumont-Mathieu T, and Fein D, “Validation of the modified checklist for autism in toddlers, revised with follow-up (M-CHAT-R/F),” Pediatrics, vol. 133, no. 1, pp. 37–45, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [24].Luyster R et al. , “The autism diagnostic observation schedule - Toddler module: A new module of a standardized diagnostic measure for autism spectrum disorders,” J. Autism Dev. Disord, vol. 39, no. 9, pp. 1305–1320, 2009. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [25].King DE, “Dlib-ml: A machine learning toolkit,” J. Mach. Learn. Res, vol. 10, pp. 1755–1758, 2009. [Google Scholar]
- [26].Chang Z et al., “Computational Methods to Measure Patterns of Gaze in Toddlers With Autism Spectrum Disorder,” JAMA Pediatr., pp. 1–10, 2021, doi: 10.1001/jamapediatrics.2021.0530. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [27].Perochon S et al., “A scalable computational approach to assessing response to name in toddlers with autism,” J. Child Psychol. Psychiatry Allied Discip, 2021, doi: 10.1111/jcpp.13381. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [28].Baltrusaitis T, Zadeh A, Lim YC, and Morency LP, “OpenFace 2.0: Facial behavior analysis toolkit,” in Proceedings - 13th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2018, 2018, pp. 59–66. [Google Scholar]
- [29].Hashemi J et al., “Computer Vision Analysis for Quantification of Autism Risk Behaviors,” IEEE Trans. Affect. Comput, 2018, doi: 10.1109/TAFFC.2018.2868196. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [30].Yan WJ, Wu Q, Liang J, Chen YH, and Fu X, “How Fast are the Leaked Facial Expressions: The Duration of Micro-Expressions,” J. Nonverbal Behav, vol. 37, no. 4, pp. 217–230, 2013. [Google Scholar]
- [31].Salthouse TA and Ellis CL, “Determinants of eye-fixation duration,” Am. J. Psychol., vol. 93, no. 2, pp. 207–234, 1980. [PubMed] [Google Scholar]
- [32].Galley N, Betz D, and Biniossek C, “Fixation durations - Why are they so highly variable?,” in Advances in Visual Perception Research, 2015, pp. 83–106. [Google Scholar]
- [33].Richman JS and Moorman JR, “Physiological time-series analysis using approximate and sample entropy,” Am. J. Physiol. - Heart Circ. Physiol, vol. 278, no. 6, pp. H2039–H2049, 2000. [DOI] [PubMed] [Google Scholar]
- [34].Lake DE, Richman JS, Pamela Griffin M, and Randall Moorman J, “Sample entropy analysis of neonatal heart rate variability,” Am. J. Physiol. - Regul. Integr. Comp. Physiol, vol. 283, no. 3, pp. R789–R797, 2002. [DOI] [PubMed] [Google Scholar]
- [35].Dong X et al., “An improved method of handling missing values in the analysis of sample entropy for continuous monitoring of physiological signals,” Entropy, vol. 21, no. 3, p. 274, 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [36].Cirugeda-Roldan E, Cuesta-Frau D, Miro-Martinez P, and Oltra-Crespo S, “Comparative study of entropy sensitivity to missing biosignal data,” Entropy, vol. 16, no. 11, pp. 5901–5918, 2014. [Google Scholar]
- [37].Hashemi J, Qiu Q, and Sapiro G, “Cross-modality pose-invariant facial expression,” in Proceedings - International Conference on Image Processing, ICIP, 2015, pp. 4007–4011. [Google Scholar]
- [38].Breiman L, Friedman JH, Olshen RA, and Stone CJ, Classification and regression trees. Routledge, 2017. [Google Scholar]
- [39].DeLong ER, DeLong DM, and Clarke-Pearson DL, “Comparing the Areas under Two or More Correlated Receiver Operating Characteristic Curves: A Nonparametric Approach,” Biometrics, vol. 44, no. 3, p. 837, 1988. [PubMed] [Google Scholar]
- [40].Zampella CJ, Bennetto L, and Herrington JD, “Computer Vision Analysis of Reduced Interpersonal Affect Coordination in Youth With Autism Spectrum Disorder,” Autism Res, 2020, doi: 10.1002/aur.2334. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [41].Dawson G, Campbell K, Hashemi J, Lippmann SJ, Smith V, Carpenter K, Egger H, Espinosa S, Vermeer S, Baker J, and Sapiro G, “Atypical postural control can be detected via computer vision analysis in toddlers with autism spectrum disorder,” Scientific Reports, vol. 8, no. 9, pp. 1–7, 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]