Published in final edited form as: J Autism Dev Disord. 2023 Apr 27;54(6):2286–2297. doi: 10.1007/s10803-023-05973-0

Computer Vision Analysis of Caregiver–Child Interactions in Children with Neurodevelopmental Disorders: A Preliminary Report

Dmitry Yu Isaev 1, Maura Sabatos-DeVito 2, J Matias Di Martino 3, Kimberly Carpenter 2, Rachel Aiello 2, Scott Compton 2, Naomi Davis 4, Lauren Franz 2,5, Connor Sullivan 2, Geraldine Dawson 2, Guillermo Sapiro 3,6

Abstract

We report preliminary results of computer vision analysis of caregiver–child interactions during free play with children diagnosed with autism (N = 29, 41–91 months), attention-deficit/hyperactivity disorder (ADHD, N = 22, 48–100 months), or combined autism + ADHD (N = 20, 56–98 months), and neurotypical children (NT, N = 7, 55–95 months). We conducted micro-analytic analysis of ‘reaching to a toy,’ as a proxy for initiating or responding to a toy play bout. Dyadic analysis revealed two clusters of interaction patterns, which differed in frequency of ‘reaching to a toy’ and caregivers’ contingent responding to the child’s reach for a toy by also reaching for a toy. Children in dyads with higher caregiver responsiveness had less developed language, communication, and socialization skills. Clusters were not associated with diagnostic groups. These results hold promise for automated methods of characterizing caregiver responsiveness in dyadic interactions for assessment and outcome monitoring in clinical trials.

Keywords: Autism, ADHD, Caregiver–child interaction, Dyadic data analysis, Micro-analytic coding, Computer vision

Introduction

Dyadic interactions between a caregiver and child are foundational to children’s development in multiple domains, including joint attention, language, and self-regulation. For example, longitudinal studies of autistic children have found that caregivers’ responsiveness to their child’s attention and activity during play predicted the rate of children’s language growth (Adamson et al., 2019; Siller & Sigman, 2002, 2008). Measurement of play-based caregiver–child interactions is important for assessing proximal clinical outcomes in clinical trials, especially those that involve caregiver-mediated interventions (Conrad et al., 2021; Nevill et al., 2018; Rogers et al., 2012). A traditional method of measuring behaviors observed during caregiver–child interactions involves human coding, a time-consuming and labor-intensive process that requires training and assessment of inter-rater reliability, making it less scalable for large clinical trials. Thus, there is a need for automated and objective methods for assessing dyadic interactions.

Both autism and attention-deficit/hyperactivity disorder (ADHD) have been found to be associated with differences in early caregiver–child interaction. Reduced frequency of joint engagement and attention has been found to characterize interactions between young autistic children and their caregivers (Adamson et al., 2019, 2021). Caregivers interacting with their children with ADHD have been found to have a more directive style of interaction (Tallmadge & Barkley, 1983). In the present study, we examined caregiver–child interactions with children diagnosed with autism alone, autism and co-occurring ADHD, ADHD alone, and neurotypical development in order to better characterize variations in caregiver–child interaction using a novel computer vision-based method.

Computer Vision Approaches for Behavior Analysis

Computer vision-based approaches have been used to objectively characterize behavioral patterns in children, including those associated with autism (e.g., Campbell et al., 2019; Carpenter et al., 2021; Chang et al., 2021; Dawson et al., 2018; Lidstone et al., 2021; Tunçgenç et al., 2021). Using a combination of strategically designed stimuli, sensing technologies, and automatic analysis via computer vision and machine learning, digital approaches have been used to measure facial dynamics (Krishnappa Babu et al., 2021), emotions (Carpenter et al., 2021; Egger et al., 2018), head turns in response to name (Perochon et al., 2021), social attention (Chang et al., 2021), imitative behaviors (Lidstone et al., 2021; Tunçgenç et al., 2021), and interactive movement dynamics (Chen et al., 2021). Chen et al. (2021) automatically tracked head movements in mothers and infants during a face-to-face still face procedure and found that infant–mother head movement dynamics varied based on the infant’s attachment security.

To the best of our knowledge, only one study so far (Kojovic et al., 2021) has used computer vision analysis (CVA) to measure dyadic interactions in autistic and neurotypical children. Kojovic and colleagues (2021) successfully distinguished autistic from neurotypical children based on automated analysis of clinician–child interactions recorded during the Autism Diagnostic Observation Schedule (Lord et al., 2000), including the Toddler Module and Modules 1 and 2, which are sensitive to non-verbal communication (Lord et al., 2012). Although promising, this approach faces several challenges, including the great variability observed in unconstrained behavior during naturalistic adult–child interactions, the lack of tools that can reliably detect 3D human poses from 2D recordings, and a lack of 3D recordings of adult–child interactions.

Analysis of the Interactive Behaviors

In the present study, we analyzed behaviors of caregivers and children recorded during a 6-min period of free play with a set of standardized, developmentally appropriate toys. Interactive behaviors in a dyadic context are complex and multimodal, often involving a combination of facial expressions, gestures, vocalizations, gaze, and movements of hands and body in a sequence which can be contingent or non-contingent on the previous behavioral patterns of a partner. As a first step in applying CVA techniques to an unconstrained dyadic free play setting, we focused our measurement on a commonly observed behavior that was exhibited by both the child and their caregiver, namely, reaching to one of the toys. Given that caregivers and children were sitting on the floor with a set of toys located between them, reaching for a toy was observed frequently for both the caregiver and child, and we considered it a proxy for initiating or responding to a toy play bout by the partner. Furthermore, previous studies, some of which used eye-tracking to examine the moment-to-moment interaction between caregivers and children, have reported that the child’s and caregiver’s use of hands/manual contact with toys is important for facilitating children’s sustained attention and developing joint attention and engagement (Suarez-Rivera et al., 2019; Yurkovic et al., 2021). We were particularly interested in whether the caregiver’s reaching behavior was contingent on the child’s behavior, as this could be interpreted as responding to the child’s initiation. Caregiver responsiveness has been identified as a mechanism for facilitating autistic children’s language acquisition and social communication in natural play and in the context of caregiver-mediated interventions (Davis et al., 2022; Siller & Sigman, 2002, 2008; Warlaumont et al., 2014). Using CVA, we automatically extracted time-points of ‘reaching to a toy’ behaviors from 2D video recordings of dyadic interaction and analyzed initiating-responding patterns in this behavior using dyadic data analysis approaches (Kenny et al., 2006). We then used time-series analysis of the ‘reaching to a toy’ behavior to identify distinct clusters of dyadic interaction styles based on the contingent or non-contingent (i.e., following or not following the other person’s lead) relationship between caregiver and child reaching. Finally, we examined whether these distinct clusters were correlated with the child’s communication and socialization abilities (as reported by the caregiver on a standardized measure) and language skills (verbal IQ). We hypothesized that cluster membership would be associated with diagnostic group and with level of social and communicative abilities across diagnostic groups. We viewed this as a first step towards exploring the validity of interpretable automatic annotation of dyadic behavior second by second (i.e., micro-analytic coding; Bakeman & Gottman, 1997) to assess outcomes in clinical trials.

Methods

Participants

Participants were seventy-eight children, ranging from 41 to 100 months of age, and their caregivers, who were part of a study funded by the National Institutes of Health. All caregivers were familiar to the child and included 6 fathers, 71 mothers, and 1 grandmother. Children were recruited through brochures posted on the university website and given out at community events attended by families with children with developmental disabilities (e.g., walks), email, and social media. The ethnic and racial composition of the sample was as follows: White, 70.51%; Black, 10.26%; Asian, 5.13%; other and mixed race, 14.10%; Hispanic, 10.26%. The mean level of maternal education was a bachelor’s degree. The sample included 29 children diagnosed with autism spectrum disorder (ASD; mean age = 66 months, SD = 14.5, 23 males), 20 children diagnosed with both autism and attention-deficit/hyperactivity disorder (ADHD; mean age = 78.7, SD = 13.5, 13 males), 22 children diagnosed with ADHD (mean age = 76.1, SD = 13.7, 18 males), and 7 children with neurotypical development (NT; mean age = 79.6, SD = 14.9, 4 males). Information regarding demographic characteristics by group is provided in Table 1. A chi-square test was performed to check for associations between diagnostic group and race, ethnicity, and maternal education level. Inclusion criteria for each subgroup were as follows: (1) autism alone: DSM-5 diagnosis of ASD, based on the Autism Diagnostic Observation Schedule, Second Edition (ADOS-2) and the Autism Diagnostic Interview-Revised (ADI-R; Le Couteur et al., 2003), and a score on the ADHD Rating Scale (ADHD-RS; DuPaul et al., 1998; McGoey et al., 2007) < 80th percentile; (2) ADHD alone: a score on the ADHD-RS ≥ 93rd percentile, an expert consensus DSM-5 diagnosis of ADHD, and a score on the Social Responsiveness Scale, Second Edition (SRS-2; Constantino & Gruber, 2012) of < 60; (3) autism + ADHD: DSM-5 criteria for ASD based on the ADOS-2 and ADI-R, a score > 93rd percentile on the ADHD-RS, and an expert consensus DSM-5 diagnosis of ADHD; (4) NT: a score < 80th percentile on the ADHD-RS, < 60 on the SRS-2, and Full Scale IQ > 80 as assessed by the Differential Ability Scales, Second Edition (DAS-II; Elliott, 2007). These diagnostic measures were administered as part of this study by clinical psychologists who were research-reliable on the ADOS-2 and ADI-R. Exclusion criteria included a known genetic or neurological syndrome or condition; a history of epilepsy or current seizure disorder; significant vision, hearing, and/or serious motor impairment; and, for the clinical groups, a clinically elevated score (t score ≥ 65) on the Child Behavior Checklist (Achenbach & Rescorla, 2000) in domains other than those related to autism or ADHD. We excluded children with other mental health conditions to better understand the specific effects of autism and/or ADHD on caregiver–child interactions. NT participants were excluded if they had a known or suspected developmental, neurological, or psychiatric disorder; clinically elevated scores on the SRS-2, ADHD-RS, and/or Child Behavior Checklist; and/or a sibling or other first-degree relative with autism or ADHD. All caregivers/legal guardians of participants gave written, informed consent, and the study protocol was approved by the Duke University Health System Institutional Review Board (Protocol number Pro00085156). Methods were carried out in accordance with institutional, state, and federal guidelines and regulations.

Table 1.

Sex, racial and ethnic breakdown and maternal education level per diagnostic group

Group | # Participants | Age range (months) | # Male | # Hispanic/Latino | Race breakdowna (# W / # B / # A / # MTO / # Other) | Mean level of maternal education
NT | 7 | 55–95 | 4 | 2 | 7 / 0 / 0 / 0 / 0 | Some graduate school
Autism | 28 | 41–91 | 22 | 2 | 15 / 4 / 4 / 3 / 2 | Bachelor’s degree
ADHD | 22 | 48–100 | 18 | 1 | 13 / 3 / 0 / 6 / 0 | Bachelor’s degree
Autism + ADHD | 20 | 56–98 | 13 | 3 | 19 / 1 / 0 / 0 / 0 | Bachelor’s degree
a Race breakdown: W = White, B = Black, A = Asian, MTO = more than one race

Social, Communication and Language Assessments

Social and communication skills were assessed with a caregiver-report measure, the Vineland Adaptive Behavior Scales, Third Edition (VABS-3; Sparrow et al., 2016). Verbal IQ was assessed with the Differential Ability Scales, Second Edition (DAS-II; Elliott, 2007). Means and standard deviations for the VABS Socialization (VSoc), VABS Communication (VCom), and DAS Verbal IQ (VIQ) scores for the autism, ADHD, autism + ADHD, and NT groups are shown in Table 2.

Table 2.

Means and Standard Deviations for Vineland Adaptive Behavior Scale (VABS-3) Communication and Socialization Domain Scores, and Verbal IQ scores

Group | VABS-3 Communication | VABS-3 Socialization | Verbal IQa
NT | 107.29 (6.95) | 104.71 (6.02) | 114.43 (7.00)
Autism | 75.76 (17.02) | 73.28 (15.27) | 76.97 (31.86)
ADHD | 93.09 (11.84) | 87.68 (10.49) | 113.77 (21.35)
Autism + ADHD | 83.45 (10.81) | 76.95 (14.74) | 97.60 (20.08)
a Differential Ability Scales (DAS-II) Verbal Ability standard score

Caregiver–Child Interaction Assessment

As part of a series of caregiver–child play-based interactions, participants were invited to play together on the floor “as they naturally would” for 6 min. Caregivers and children wore purple and red T-shirts, respectively, so that they could be more easily detected by CVA. A standardized set of toys representing 7 toy categories (building, pretend, puzzle, game, transportation, books, drawing/writing) was placed in a bin in the center of the room. Toys (puzzle, sensory) were also available at a table in the corner of the room, but we analyzed behavior only in the center of the room because reaching events (REs) could not be reliably detected at the table. The interaction was recorded synchronously at 30 frames per second by 2 RGB cameras located in the corners of the room, resulting in video recordings from 2 viewpoints (see Fig. 1).

Fig. 1

Cosine of bending angle (BA) signals from the left and right video streams for the child (top left) and caregiver (bottom left). Detected contiguous reaching events (CREs) are shown as blue-shaded vertical bars on each graph. Vertical lines A, B, and C correspond to the video frames on the right: at A, the child sits straight and the caregiver touches a toy; at B, the child touches a toy and the caregiver sits straight; at C, the caregiver touches a toy. The child’s CRE occurs between A and B, and the caregiver’s CRE occurs between B and C

Automated Coding via Computer Vision Analysis

Preprocessing and Feature Extraction

Body pose landmarks were extracted first. After initial extraction of the landmark time series, we transformed them into Bending Angle (BA) time series, a composite metric describing the rotation of the torso towards the floor that accompanies a reaching movement. Contiguous reaching events (CREs) were obtained by segmenting the BA time series based on constraints on the rate of change of the cosine of the bending angle (described below). Finally, REs were extracted as a binary time series indicating whether the participant exhibited a reaching movement (i.e., a CRE) with 0.5-s precision.

Landmark Detection

Videos were split into left and right streams, corresponding to the 2 cameras. Participants were then detected in each video using the DensePose algorithm (Güler et al., 2018) and identified as child or caregiver using in-house code that classifies T-shirt colors. A 3D multi-person pose estimation algorithm (3DMPPE; Moon et al., 2019) was then run separately on the left and right video streams. 3DMPPE provides estimated 3D coordinates of body landmarks for both participants in the same 3D space relative to the camera axis. The OpenPose (Cao et al., 2021) 2D pose detection algorithm was run in parallel, providing confidence scores per landmark as a proxy for data quality. Parts of the signals with low confidence scores (average of upper-body confidence scores < 0.45) were removed from further analysis (the mean and standard deviation of upper-body confidence scores were 0.62 ± 0.14 for the entire dataset).
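For illustration, the confidence-based filtering step could be sketched as follows. This is not the authors’ implementation: the landmark indices, array shapes, and the helper name drop_low_confidence_frames are assumptions; only the 0.45 threshold comes from the text.

```python
import numpy as np

# Hypothetical indices of upper-body landmarks; the real ordering depends on the
# OpenPose output format used in the pipeline.
UPPER_BODY_IDX = [0, 1, 2, 3, 4, 5, 6, 7]

def drop_low_confidence_frames(landmarks_3d, confidences, threshold=0.45):
    """Exclude frames whose mean upper-body confidence is below `threshold`.

    landmarks_3d : (n_frames, n_landmarks, 3) array from the 3D pose model
    confidences  : (n_frames, n_landmarks) array of OpenPose confidence scores
    Returns the landmarks with excluded frames set to NaN, plus the boolean keep-mask.
    """
    upper_conf = confidences[:, UPPER_BODY_IDX].mean(axis=1)
    keep = upper_conf >= threshold           # frames below 0.45 are dropped (per the text)
    cleaned = landmarks_3d.astype(float)
    cleaned[~keep] = np.nan                  # excluded from further analysis
    return cleaned, keep
```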

Bending Angle Time Series Construction

Torso directions, a vertical vector, and bending angles were computed (from the above-mentioned landmarks) as preliminary steps for the detection of the RE time series, which was our primary variable of interest for the analysis. A torso direction (ToD) vector was computed for each participant as the normalized cross-product of the left–right shoulder and neck–pelvis landmark vectors. We observed that when the caregiver and child sit in front of each other (the angle between their ToDs is about 180 degrees), their spines are vertical. This observation was used to compute the vertical vector (VV) as the average of the child’s and caregiver’s pelvis–neck vectors, taken together across all frames where the scalar product ToD_child · ToD_caregiver < −0.95. After the VV for each video stream was computed, the BA was defined as BA(t) = arccos(ToD(t) · VV). See Fig. 2 for a visualization of these vectors and angles. When the child or caregiver reaches for a toy, they bend their torso towards the floor, which is reflected in a temporary increase in BA. For each dyad, the BA is a time-series signal per participant (caregiver/child) and camera view (left/right).
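A minimal sketch of these geometric computations in Python/NumPy is given below. It is illustrative only: the landmark indices, vector orientation conventions, and function names are assumptions; the −0.95 threshold and the BA definition follow the text and Fig. 2.

```python
import numpy as np

# Hypothetical landmark indices; the real indices depend on the 3DMPPE skeleton layout.
L_SHOULDER, R_SHOULDER, NECK, PELVIS = 11, 14, 1, 0

def torso_direction(landmarks):
    """Torso direction (ToD): normalized cross-product of the left-right shoulder
    vector and the pelvis-neck (spine) vector, per frame.
    landmarks: (n_frames, n_landmarks, 3)"""
    shoulder_vec = landmarks[:, R_SHOULDER] - landmarks[:, L_SHOULDER]
    spine_vec = landmarks[:, NECK] - landmarks[:, PELVIS]
    tod = np.cross(shoulder_vec, spine_vec)
    return tod / np.linalg.norm(tod, axis=1, keepdims=True)

def vertical_vector(child_lm, caregiver_lm, facing_threshold=-0.95):
    """Vertical vector (VV): mean pelvis-neck direction over frames where the
    caregiver and child face each other (ToD_child . ToD_caregiver < -0.95)."""
    tod_c = torso_direction(child_lm)
    tod_g = torso_direction(caregiver_lm)
    facing = np.einsum("ij,ij->i", tod_c, tod_g) < facing_threshold
    spines = np.concatenate([
        child_lm[facing, NECK] - child_lm[facing, PELVIS],
        caregiver_lm[facing, NECK] - caregiver_lm[facing, PELVIS],
    ])
    vv = spines.mean(axis=0)
    return vv / np.linalg.norm(vv)

def bending_angle(landmarks, vv):
    """Bending angle BA(t) = arccos(ToD(t) . VV), in radians."""
    tod = torso_direction(landmarks)
    return np.arccos(np.clip(tod @ vv, -1.0, 1.0))
```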

Fig. 2

A schema of the skeleton landmarks of two subjects (a caregiver and a child) playing on the floor. The torso direction (ToD) vector is computed as the cross-product of the pelvis–neck and left–right shoulder vectors. The vertical vector (VV) is computed as the average of the pelvis–neck vectors over all frames where the cosine of the angle between ToD_caregiver and ToD_child is less than −0.95 (the torsos face each other). Bending angles (BA) are the angles between each person’s ToD and the VV

Contiguous Reaching Events Extraction

To remove noise, the signal was low-pass filtered at 5 Hz; then a sliding window of 15 frames (0.5 s) with a step of 1 frame was applied, a first-order linear approximation to the signal was fitted in each window, and the slope of the linear approximation was computed. In parallel, a confidence signal was computed as the median confidence of the landmarks in each window. The final BA signal per participant was combined from the left and right BA signals by selecting, at each timepoint, the signal with the higher confidence.

CREs were then defined as contiguous periods during which the slope of the cos(BA(t)) signal was bounded between −1.0 and −0.1 (see Fig. 1). These boundaries were selected empirically by reviewing the time series of reaching events together with the participants’ videos; the specificity of CRE detection was assessed independently (see below).
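A rough sketch of this segmentation step, assuming the BA signal has already been converted to cos(BA), is shown below. The filter design (a Butterworth filter), the time scale of the slope (per frame here; the text does not state the units of the bounds), and all function names are assumptions; only the 5 Hz cut-off, the 15-frame window with a 1-frame step, and the −1.0 to −0.1 bounds come from the text.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FPS = 30          # video frame rate
WIN = 15          # 15-frame (0.5 s) sliding window, step of 1 frame

def lowpass_5hz(signal, fs=FPS, order=4):
    """Low-pass filter the cos(BA) signal at 5 Hz (Butterworth design assumed)."""
    b, a = butter(order, 5.0, btype="low", fs=fs)
    return filtfilt(b, a, signal)

def windowed_slope(signal, win=WIN):
    """Slope of a first-order linear fit within each sliding window (units: per frame)."""
    half = win // 2
    t = np.arange(win)
    slopes = np.full(len(signal), np.nan)
    for i in range(half, len(signal) - half):
        seg = signal[i - half:i + half + 1]
        slopes[i] = np.polyfit(t, seg, 1)[0]   # coefficient of the linear term
    return slopes

def segment_cres(cos_ba, lo=-1.0, hi=-0.1):
    """Contiguous reaching events: runs of frames whose cos(BA) slope lies in [lo, hi].

    A negative slope corresponds to cos(BA) decreasing, i.e., the torso bending
    towards the floor. The scaling of the bounds relative to the slope units here
    is an assumption.
    """
    slope = windowed_slope(lowpass_5hz(np.asarray(cos_ba, dtype=float)))
    inside = (slope >= lo) & (slope <= hi)
    edges = np.diff(inside.astype(int), prepend=0, append=0)
    starts = np.flatnonzero(edges == 1)        # first frame of each contiguous run
    ends = np.flatnonzero(edges == -1)         # one past the last frame of each run
    return list(zip(starts, ends))
```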

Reaching Events Time Series Construction

We then discretized the entire duration of the experiment into 1-s windows with 0.5-s overlap. For the caregiver and the child separately, we defined a reaching event time series (‘RE time series’) as a binary value per window: the window was assigned a value of “1” if a CRE start timepoint fell within it, and “0” otherwise. The RE time series thus defines a binary signal of ‘beginning of a reach towards a toy’ at a frequency of 2 Hz. Each dyad has two RE time series (one for the caregiver and one for the child), and these are the basis of the time-series dyadic analysis we performed to reveal patterns of interaction.
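Continuing the sketch above (under the same caveats: function names and array conventions are assumptions), the discretization could be implemented as:

```python
import numpy as np

def re_time_series(cre_start_frames, n_frames, fps=30):
    """Binary RE series at 2 Hz: 1-s windows starting every 0.5 s; a window is 1
    if any CRE starts within it."""
    starts_sec = np.asarray(cre_start_frames, dtype=float) / fps
    window_starts = np.arange(0.0, n_frames / fps, 0.5)   # a new window every 0.5 s
    re = np.zeros(len(window_starts), dtype=int)
    for k, w0 in enumerate(window_starts):
        if np.any((starts_sec >= w0) & (starts_sec < w0 + 1.0)):
            re[k] = 1
    return re
```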

Specificity Assessment

For each dyad video, ten CREs per caregiver and ten per child were randomly selected from all CREs (1560 segments total). A random subsample of 200 caregiver segments and 200 child segments was then selected from this sample and labeled by two independent manual raters. Reaching event extraction was performed by a fully automatic pipeline; human raters were involved only in this specificity assessment. A CRE was manually labeled as a true detection if the person bent or touched/moved a toy with their hand; otherwise, the CRE was labeled as a false detection. Inter-rater reliability (Cohen’s kappa) and specificity, measured as the percentage of consensus true-positive labels, were computed. Labeling was performed with the ‘pigeon’ widget for Jupyter Notebook (https://github.com/agermanidis/pigeon).
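As an illustration (not the authors’ code), the agreement and specificity computations could be sketched with scikit-learn; the input format (one boolean label per rater per sampled CRE) and the function name are assumptions.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

def rater_agreement(labels_rater1, labels_rater2):
    """Cohen's kappa between two raters, plus specificity taken here as the
    fraction of sampled CREs that both raters labeled as true detections.
    (Confidence interval estimation, e.g., via bootstrap, is not shown.)"""
    a = np.asarray(labels_rater1, dtype=bool)
    b = np.asarray(labels_rater2, dtype=bool)
    kappa = cohen_kappa_score(a, b)
    consensus_true_rate = float(np.mean(a & b))
    return kappa, consensus_true_rate
```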

Time-Series Dyadic Analysis

For each dyad, the pair of binary RE time series was transformed into a single time series with four states (‘No RE,’ ‘RE Child,’ ‘RE Caregiver,’ ‘RE Both’). Dyadic data analysis methods (Kenny et al., 2006), specifically the Actor–Partner Interdependence Model (APIM) implemented as Markov and mixture Markov models (Fuchs et al., 2017; Helske & Helske, 2019; van de Pol & Langeheine, 1990), were then applied to this time series. Transition probabilities between the states of the Markov model (MM) characterize how one participant’s RE state at the current timepoint influences both their own and their partner’s state at the next timepoint, capturing the interdependence between participants. Applying a mixture Markov model (MixMM) to the same data reveals clusters with different transition probability matrices, a proxy for different interaction patterns across dyads. The Bayesian Information Criterion (BIC; Schwarz, 1978) was used to measure goodness of fit of the models. Cluster stability was additionally assessed via model-based distance methods and bootstrapping (see Supplemental Materials for details).
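The mixture Markov models themselves were fit with dedicated tools (e.g., the seqHMM R package cited above). The Python sketch below shows only two preliminary pieces, under assumed input conventions: constructing the four-state dyadic sequence from the two binary RE series, and a plain maximum-likelihood estimate of a single transition matrix; it is not the mixture model.

```python
import numpy as np

STATES = ["No RE", "RE Child", "RE Caregiver", "RE Both"]

def dyadic_states(re_child, re_caregiver):
    """Combine the two binary RE series into one four-state series:
    0 = No RE, 1 = RE Child, 2 = RE Caregiver, 3 = RE Both."""
    return np.asarray(re_child) + 2 * np.asarray(re_caregiver)

def transition_matrix(states, n_states=4):
    """Maximum-likelihood estimate of first-order Markov transition probabilities."""
    counts = np.zeros((n_states, n_states))
    for s, s_next in zip(states[:-1], states[1:]):
        counts[s, s_next] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
```

Relative differences between cluster-specific transition matrices, as reported in Table 3, can then be computed as (TM2 − TM1)/TM1 × 100%.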

Potential Influence of Demographic Factors

To assess whether the CVA methods were biased with respect to participant race, we tested whether the number of frames dropped during the reaching event detection process was associated with race. We also tested whether the number of detected CREs was associated with demographic factors, including age, sex, maternal education level, race, and ethnicity. Results of this analysis are reported in the Supplemental Materials.

Association of Cluster Membership and Clinical Measures

After clustering the dyadic time series, we examined the association between cluster membership and clinical measures, including VCom, VSoc, and VIQ. Our aim was to examine whether the Cluster variable helps to explain variance in the clinical scores. To do so, we ran sequential analyses of variance on linear model fits (models M1–M10). In the model formulations below, Measure is either VCom, VSoc, or VIQ, and Group stands for the diagnostic group.

Models M1–M4 serve as baselines, allowing us to estimate how well the variance in Measure is explained by the Group and Age variables alone.

M1: Measure ~ Age
M2: Measure ~ Group
M3: Measure ~ Age + Group
M4: Measure ~ Age + Group + Group × Age

Models M5–M10 are designed to test whether adding the Cluster variable helps to explain variance in Measure beyond what is explained by the Group variable. Additionally, models M9 and M10 explore potential interactions between Cluster and Age or Group.

M5: Measure ~ Cluster
M6: Measure ~ Age + Cluster
M7: Measure ~ Group + Cluster
M8: Measure ~ Age + Group + Cluster
M9: Measure ~ Age + Group + Cluster + Age × Cluster
M10: Measure ~ Age + Group + Cluster + Group × Cluster

Model parameters, such as between- and within-group degrees of freedom, are provided in Table SM2. Additionally, we performed a power sensitivity analysis using the G*Power software package (Faul et al., 2009) to assess the effect size needed to reject the null hypothesis with a sample size of 78 subjects at a power level of 0.8. Finally, we also checked for an association between diagnostic group, race, or ethnicity and cluster membership with a chi-square test.
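The software used to fit models M1–M10 is not specified in the text. As one hedged illustration, a sequential (Type I) ANOVA for, e.g., model M9 could be run in Python with statsmodels, assuming a data frame with hypothetical columns ‘VCom’, ‘Age’, ‘Group’, and ‘Cluster’:

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

def sequential_anova_m9(df: pd.DataFrame, measure: str = "VCom"):
    """Fit model M9 (Measure ~ Age + Group + Cluster + Age x Cluster) and return the
    sequential (Type I) ANOVA table plus adjusted R^2. Column names are assumptions."""
    formula = f"{measure} ~ Age + C(Group) + C(Cluster) + Age:C(Cluster)"
    fit = smf.ols(formula, data=df).fit()
    return anova_lm(fit, typ=1), fit.rsquared_adj
```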

Results

Diagnostic group was associated with race (χ²(12, N = 78) = 23.8, p = 0.022) but was not associated with ethnicity or maternal education level, based on chi-square test statistics. See Table 1 and Table SM1 in the Supplemental Materials for details.

Cohen’s kappas for CRE detections for children and caregivers were 0.85 and 0.86, respectively. Specificity of detections was 0.76 (95% CI: 0.67–0.82) for children and 0.70 (95% CI: 0.67–0.82) for caregivers. A Kruskal–Wallis test (Kruskal & Wallis, 1952) comparing the specificity of CRE detections between diagnostic groups was not significant for either children (H(3) = 0.58, p = 0.90) or caregivers (H(3) = 0.78, p = 0.86).

The MixMM with 2 clusters showed a better fit than the MM when compared by the Bayesian Information Criterion (BIC = 86302.33 for the MixMM; BIC = 87166.67 for the MM). Behavioral differences between Cluster 1 and Cluster 2 are presented in Table 3. Clusters were highly stable based on the bootstrapping assessment (see Supplemental Materials). Cluster 2 was characterized by more frequent caregiver responsiveness to the child’s behavior and more overall reaching movement. Compared to Cluster 1, Cluster 2 had more transitions from the ‘RE Child’ to the ‘RE Caregiver’ state (by 130.2%), from ‘RE Caregiver’ to ‘RE Child’ and from ‘No RE’ to ‘RE Child’ (by 15.5% and 29.2%, respectively), and overall more transitions to ‘RE Both’ and fewer transitions to ‘No RE’ (i.e., a greater amount of reaching movement).

Table 3.

Cluster 2–Cluster 1 differences in transition matrices (%)

From state | To: No RE | To: RE Child | To: RE Caregiver | To: RE Both
No RE | −20.4 | 29.2 | 122.9 | 221.0
RE Child | −23.3 | −10.1 | 130.2 | 142.2
RE Caregiver | −23.0 | 15.5 | −1.5 | 48.7
RE Both | −31.7 | −17.4 | −2.3 | 18.4

Computed as (TM2 − TM1)/TM1 × 100%

There are four potential states at each point of the behavioral time series: No Reaching Event (No RE), Child Reaching Event (RE Child), Caregiver Reaching Event (RE Caregiver), and Caregiver and Child Reaching Events happening at the same timepoint (RE Both). Each potential state transition is defined by an entry in the matrix. The value corresponding to each transition is the relative percentage difference between the transition probabilities of dyads in Cluster 2 and Cluster 1, referenced to dyads in Cluster 1

Relationships Between Cluster Classification and Clinical Characteristics

Clusters obtained with the MixMM also differed significantly in their VABS-3 Communication and Socialization domain scores and Verbal IQ, as measured by one-way ANOVA (F(1,76) = 9.522, f = 0.35, p = 0.003 for VABS Communication; F(1,76) = 5.863, f = 0.28, p = 0.018 for VABS Socialization; F(1,76) = 10.53, f = 0.37, p = 0.002 for Verbal IQ). Children in Cluster 2 were found to have lower communication, socialization, and language skills (see Fig. 3).

Fig. 3

Differences in VABS Communication and Socialization and DAS Verbal IQ scores for Cluster 1 versus Cluster 2, as revealed by the mixture Markov model. Cluster 2 is characterized by more frequent caregiver responsiveness to the child’s behavior and more overall reaching movement

We next examined whether children in the autism, autism + ADHD, ADHD, and NT groups were more likely to fall into Cluster 1 or 2. Results showed that clusters were not associated with diagnostic group or with demographic characteristics based on chi-square tests (see Table SM1 in the Supplemental Materials for details). Additionally, clusters were not associated with age (F(1,76) = 0.008, p = 0.927). Thus, levels of language and social abilities were related to cluster membership, but not to clinical diagnosis.

Next, using models M1–M10 listed above, we examined how adding the Cluster variable to Group and Age helps to explain variance in the clinical scores. F-statistics, p-values, and effect sizes (eta-squared and Cohen’s f) for each variable of the sequentially applied linear models (M1–M10), along with the adjusted R² for each model, are shown in Table SM3.

In the set of baseline models, although Age was significantly associated with the clinical scores (model M1, adjusted R² = 0.10, 0.05, and 0.09 for VCom, VSoc, and VIQ, respectively), the largest single contribution to the variance in the clinical scores came from diagnostic Group (model M2, adjusted R² = 0.33, 0.32, and 0.27 for VCom, VSoc, and VIQ, respectively). Adding the Group variable to explain variance left unexplained by Age further improved the fit to the data (models M3 and M4).

In the set of models exploring the potential of the Cluster variable to explain variance in the data, Cluster by itself (model M5) yielded adjusted R² = 0.10, 0.06, and 0.11 for the VCom, VSoc, and VIQ measures. However, separately adding Age (model M6) and Group (model M7), and then further adding both Age and Group (model M8), the interaction of Cluster and Age (model M9), and the interaction of Cluster and Group (model M10), significantly increased the models’ explained variance. The model that explained the data best was M9, which included the Age × Cluster interaction term; it showed adjusted R² = 0.44, 0.40, and 0.36 for VCom, VSoc, and VIQ, respectively. Note that for the VIQ measure, model M10 was on par with M9 in adjusted R²; however, neither the Age × Cluster term in M9 nor the Group × Cluster term in M10 was significant. For VCom and VSoc, the Age × Cluster interaction term in model M9 was significant (F(1,71) = 4.216, f = 0.24, p = 0.044 and F(1,71) = 4.798, f = 0.26, p = 0.032 for VCom and VSoc, respectively). This implies different patterns of association between VCom and VSoc and Age for the two clusters, as shown in Fig. 4. Specifically, in Cluster 2, older age was associated with higher VCom and VSoc scores, whereas there was no significant association between age and VABS scores in Cluster 1.

Fig. 4

Association of VABS scores and age in Cluster 1 (C1) and Cluster 2 (C2). In Cluster 2, which was characterized by more caregiver responsiveness and more overall reaching behavior, VABS Communication and Socialization scores are positively associated with age

Discussion

We applied computer vision analytics to a 6-minute, lab-based, caregiver–child free play interaction to measure initiating and responding to reaching for a toy in caregiver–child dyads, and we examined correlations of these interaction patterns with standardized clinical measures of children’s communication, social, and verbal skills. These methods detected two clusters of dyads that differed in their overall amount of reaching movement and in their patterns of caregiver responsiveness. Specifically, compared to Cluster 1, dyads in Cluster 2 exhibited reaching movements more frequently, and the caregiver was more likely to respond to the child’s reaching for a toy by also reaching for a toy. Children in Cluster 2 were also found to have lower levels of communication, social, and verbal skills. Conversely, children who had higher social and language skills were more likely to be in Cluster 1, in which caregivers were less likely to respond contingently to their child’s reaching for a toy. Furthermore, we found that within Cluster 2, which was characterized by caregiver responsiveness and more movement, children’s age was also correlated with level of social abilities; specifically, older children had higher social abilities. Interestingly, this relationship was not found for children in Cluster 1, which was characterized by less contingent caregiver responding and less movement. We also found that cluster membership was not associated with a child’s diagnostic status or age. Cluster membership was thus predictive of a child’s language and social abilities, but not of their clinical diagnosis of autism or ADHD.

Caregiver Responsiveness and Language Abilities

While caregiver responsiveness has been identified as a mechanism for facilitating language and social growth in autistic children, longitudinal studies of caregiver–child interaction have identified important changes in the nature of caregiver responsiveness as a child acquires more advanced social and language skills. Bornstein and colleagues defined caregiver responsiveness as prompt, contingent responses to a child’s activities, which can include toy exploration and verbalizations (Bornstein et al., 2008). In a prospective longitudinal study, these authors found that as children acquired more advanced language skills, caregivers reduced their responsiveness to children’s exploratory toy behavior (i.e., manipulating objects) and increased their responsiveness to their child’s verbalizations by imitating and expanding on them. These results may help to explain the finding in the present study that children with more advanced language were more likely to be in Cluster 1, which was characterized by lower levels of caregiver contingent responding to the child’s reaching for a toy. It would be of interest in future analyses to explore whether the lower level of caregivers’ contingent responding to the child’s reaching behavior was accompanied by an increased level of contingent responding to the child’s verbalizations.

It was of particular interest that caregivers’ contingent responsiveness was associated with children’s language and communication abilities but did not differ based on diagnostic group. This finding suggests that the relationship between a key feature of caregiver behavior in the context of caregiver–child interaction, namely contingent responsiveness, and language acquisition is trans-diagnostic, reflecting general principles regarding the importance of caregiver–child interaction for facilitating children’s acquisition of communication skills (Tomasello & Farrar, 1986).

Limitations

A limitation of the study is that only reaching behavior was considered in the analysis. As an initial attempt to apply computer vision to the measurement of dyadic interaction, we chose reaching events derived from the bending angle because this signal has a high signal-to-noise ratio. Furthermore, it is possible that the more frequent reaching behaviors in Cluster 2 might reflect greater overall movement activity among children in Cluster 2, although the lack of an association between an ADHD diagnosis and Cluster 2 membership does not support this. Because responsiveness is a multimodal concept, a physical behavior such as a reaching movement can elicit a reply in other modalities, such as the verbalizations mentioned above. Adding automated detection of vocalizations is a natural extension of this study.

An additional limitation of the study is the quality of reconstructing 3D poses from 2D video and extracting ‘reaching’ signals, which is reflected in RE detection specificity levels of 0.76 and 0.70 for children and caregivers, respectively. Using 3D depth cameras and estimating landmarks from the 3D data is a potential solution in the future. The use of purple and red T-shirts to identify the participants constrains the proposed method’s scalability; this can be addressed in future studies by using more advanced machine learning technologies for human detection.

Analyzing behavior only in the center of the room is another limitation arising from the method’s constraints. When children are standing, which inevitably happens when they approach the table in the corner of the room or walk away from the toys on the floor, it is not possible to detect REs, both because of the position of the child’s body relative to the camera and because reaching for an object on the table does not involve the torso rotation the method relies on. Nevertheless, this happens rarely, because the toys for this part of the caregiver–child interaction are intentionally placed on the floor and participants are specifically instructed to move to the floor in the center of the room to play with the toys from the box.

Another limitation of the study is the sample, which was relatively small, of limited diversity, and included a small neurotypical comparison group. Seventy percent of the sample was White, and the mean maternal education level was at least a bachelor’s degree. Children with other co-occurring conditions, such as anxiety, were not included, which limits the generalizability of the findings. Future studies of CVA of caregiver–child interaction should address this by including a more diverse sample. In particular, the Age × Cluster interaction findings, which had the smallest effect sizes, should be verified in future studies.

Conclusions

The long-term goal of this initial feasibility study is to use automated methods for micro-analytic coding to measure changes in caregiver–child interactions in the context of clinical trials. This preliminary work is a first step in applying automated, tailored methods to naturalistic caregiver–child interactions to detect different behavioral patterns of dyads during free play. We presented a first attempt at micro-analytic coding of caregiver–child interactions with a focus on reaching behavior, which was used as a proxy for initiating and responding to a toy play bout. Our findings offer promise that CVA of dyadic behavior may provide a useful way of automatically and objectively characterizing clinically meaningful aspects of caregiver–child interaction.

Supplementary Material

Supplementary file 1
Supplementary file 2

Acknowledgements

This work was funded by grants from NICHD (P50HD093074, Dawson, PI), NIMH (R01MH120093, Sapiro and Dawson, MPI), and the Stylli Translational Neuroscience Award. Additional support from ONR, NGA, and NSF is acknowledged. We gratefully acknowledge the caregivers and children who participated in this study.

Footnotes

Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/s10803-023-05973-0.

Conflict of interest Dr. Dawson is on the Scientific Advisory Boards of Janssen Research and Development, Akili Interactive, Inc, LabCorp, Inc, Roche Pharmaceutical Company, Zynerba, and Tris Pharma, and is a consultant to Apple, Gerson Lehrman Group, and Guidepoint Global, Inc. Dr. Dawson has stock interests in Neuvana, Inc unrelated to this work. Drs. Dawson, Sapiro, and Carpenter have a patent (11158403B1) related to digital phenotyping methods. Drs. Dawson, Sapiro and Carpenter developed technology, not covering this work, that has been licensed to Apple, Inc. and they and Duke University have benefited financially. Dr. Sapiro is also affiliated with Apple.

References

1. Achenbach TM, & Rescorla LA (2000). Manual for the ASEBA school-age forms & profiles: An integrated system of multi-informant assessment. University of Vermont.
2. Adamson LB, Bakeman R, Suma K, & Robins DL (2019). An expanded view of joint attention: Skill, engagement, and language in typical development and autism. Child Development, 90(1), e1–e18. 10.1111/cdev.12973
3. Adamson LB, Bakeman R, Suma K, & Robins DL (2021). Autism adversely affects auditory joint engagement during parent–toddler interactions. Autism Research, 14(2), 301–314. 10.1002/aur.2355
4. Bakeman R, & Gottman JM (1997). Observing interaction: An introduction to sequential analysis (2nd ed.). Cambridge University Press. 10.1017/CBO9780511527685
5. Bornstein MH, Tamis-LeMonda CS, Hahn CS, & Haynes OM (2008). Maternal responsiveness to young children at three ages: Longitudinal analysis of a multidimensional, modular, and specific parenting construct. Developmental Psychology, 44(3), 867–874. 10.1037/0012-1649.44.3.867
6. Campbell K, Carpenter KL, Hashemi J, Espinosa S, Marsan S, Borg JS, Chang Z, Qiu Q, Vermeer S, Adler E, Tepper M, Egger HL, Baker JP, Sapiro G, & Dawson G (2019). Computer vision analysis captures atypical attention in toddlers with autism. Autism: The International Journal of Research and Practice, 23(3), 619–628. 10.1177/1362361318766247
7. Cao Z, Hidalgo G, Simon T, Wei SE, & Sheikh Y (2021). OpenPose: Realtime multi-person 2D pose estimation using part affinity fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(1), 172–186. 10.1109/TPAMI.2019.2929257
8. Carpenter KLH, Hashemi J, Campbell K, Lippmann SJ, Baker JP, Egger HL, Espinosa S, Vermeer S, Sapiro G, & Dawson G (2021). Digital behavioral phenotyping detects atypical pattern of facial expression in toddlers with autism. Autism Research, 14(3), 488–499. 10.1002/aur.2391
9. Chang Z, Di Martino JM, Aiello R, Baker J, Carpenter K, Compton S, Davis N, Eichner B, Espinosa S, Flowers J, Franz L, Harris A, Howard J, Perochon S, Perrin EM, Krishnappa Babu PR, Spanos M, Sullivan C, Walter BK, … Sapiro G (2021). Computational methods to measure patterns of gaze in toddlers with autism spectrum disorder. JAMA Pediatrics, 175(8), 827–836. 10.1001/jamapediatrics.2021.0530
10. Chen M, Chow SM, Hammal Z, Messinger DS, & Cohn JF (2021). A person- and time-varying vector autoregressive model to capture interactive infant–mother head movement dynamics. Multivariate Behavioral Research, 56(5), 739–767. 10.1080/00273171.2020.1762065
11. Conrad CE, Rimestad ML, Rohde JF, Petersen BH, Korfitsen CB, Tarp S, Cantio C, Lauritsen MB, & Händel MN (2021). Parent-mediated interventions for children and adolescents with autism spectrum disorders: A systematic review and meta-analysis. Frontiers in Psychiatry, 12, 773604. 10.3389/fpsyt.2021.773604
12. Constantino JN, & Gruber CP (2012). Social Responsiveness Scale, Second Edition (SRS-2): Manual. Western Psychological Services (WPS).
13. Davis PH, Elsayed H, Crais ER, Watson LR, & Grzadzinski R (2022). Caregiver responsiveness as a mechanism to improve social communication in toddlers: Secondary analysis of a randomized controlled trial. Autism Research, 15(2), 366–378. 10.1002/aur.2640
14. Dawson G, Campbell K, Hashemi J, Lippmann SJ, Smith V, Carpenter K, Egger H, Espinosa S, Vermeer S, Baker J, & Sapiro G (2018). Atypical postural control can be detected via computer vision analysis in toddlers with autism spectrum disorder. Scientific Reports, 8(1), 17008. 10.1038/s41598-018-35215-8
15. DuPaul GJ, Power TJ, Anastopoulos AD, & Reid R (1998). ADHD Rating Scale—IV: Checklists, norms, and clinical interpretation. Guilford Press. 10.1177/0734282905285792
16. Egger HL, Dawson G, Hashemi J, Carpenter KLH, Espinosa S, Campbell K, Brotkin S, Schaich-Borg J, Qiu Q, Tepper M, Baker JP, Bloomfield RA, & Sapiro G (2018). Automatic emotion and attention analysis of young children at home: A ResearchKit autism feasibility study. npj Digital Medicine, 1(1), 20. 10.1038/s41746-018-0024-6
17. Elliott CD (2007). Differential Ability Scales (2nd ed.). Harcourt Assessment.
18. Faul F, Erdfelder E, Buchner A, & Lang A-G (2009). Statistical power analyses using G*Power 3.1: Tests for correlation and regression analyses. Behavior Research Methods, 41(4), 1149–1160. 10.3758/BRM.41.4.1149
19. Fuchs P, Nussbeck FW, Meuwly N, & Bodenmann G (2017). Analyzing dyadic sequence data—research questions and implied statistical models. Frontiers in Psychology, 8, 429. 10.3389/fpsyg.2017.00429
20. Güler RA, Neverova N, & Kokkinos I (2018). DensePose: Dense human pose estimation in the wild. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 7297–7306). 10.1109/CVPR.2018.00762
21. Helske S, & Helske J (2019). Mixture hidden Markov models for sequence data: The seqHMM package in R. Journal of Statistical Software, 88(3), 1–32. 10.18637/jss.v088.i03
22. Kenny DA, Kashy DA, & Cook WL (2006). Chapter 7: The Actor–Partner Interdependence Model. In Dyadic data analysis. Guilford Press.
23. Kojovic N, Natraj S, Mohanty SP, Maillart T, & Schaer M (2021). Using 2D video-based pose estimation for automated prediction of autism spectrum disorders in young children. Scientific Reports, 11(1), 1–10. 10.1038/s41598-021-94378-z
24. Krishnappa Babu PR, Di Martino JM, Chang Z, Perochon SP, Carpenter KLH, Compton S, Espinosa S, Dawson G, & Sapiro G (2021). Exploring complexity of facial dynamics in autism spectrum disorder. IEEE Transactions on Affective Computing. 10.1109/TAFFC.2021.3113876
25. Kruskal WH, & Wallis WA (1952). Use of ranks in one-criterion variance analysis. Journal of the American Statistical Association, 47(260), 583–621. 10.2307/2280779
26. Le Couteur A, Lord C, & Rutter M (2003). The Autism Diagnostic Interview-Revised. Western Psychological Services. 10.1017/CBO9781107415324.004
27. Lidstone DE, Rochowiak R, Pacheco C, Tunçgenç B, Vidal R, & Mostofsky SH (2021). Automated and scalable Computerized Assessment of Motor Imitation (CAMI) in children with autism spectrum disorder using a single 2D camera: A pilot study. Research in Autism Spectrum Disorders, 87, 101840. 10.1016/j.rasd.2021.101840
28. Lord C, DiLavore PC, Gotham K, Guthrie W, Luyster RJ, Risi S, & Rutter M (2012). Autism Diagnostic Observation Schedule: ADOS-2. Western Psychological Services.
29. Lord C, Risi S, Lambrecht L, Cook EH, Leventhal BL, Dilavore PC, Pickles A, & Rutter M (2000). The Autism Diagnostic Observation Schedule—Generic: A standard measure of social and communication deficits associated with the spectrum of autism. Journal of Autism and Developmental Disorders, 30(3), 205–223. 10.1023/A:1005592401947
30. McGoey KE, DuPaul GJ, Haley E, & Shelton TL (2007). Parent and teacher ratings of attention-deficit/hyperactivity disorder in preschool: The ADHD Rating Scale-IV Preschool Version. Journal of Psychopathology and Behavioral Assessment, 29(4), 269–276. 10.1007/s10862-007-9048-y
31. Moon G, Chang JY, & Lee KM (2019). Camera distance-aware top-down approach for 3D multi-person pose estimation from a single RGB image. In Proceedings of the IEEE International Conference on Computer Vision (pp. 10132–10141). 10.1109/ICCV.2019.01023
32. Nevill RE, Lecavalier L, & Stratis EA (2018). Meta-analysis of parent-mediated interventions for young children with autism spectrum disorder. Autism: The International Journal of Research and Practice, 22(2), 84–98. 10.1177/1362361316677838
33. Perochon S, Di Martino M, Aiello R, Baker J, Carpenter K, Chang Z, Compton S, Davis N, Eichner B, Espinosa S, Flowers J, Franz L, Gagliano M, Harris A, Howard J, Kollins SH, Perrin EM, Raj P, Spanos M, … Dawson G (2021). A scalable computational approach to assessing response to name in toddlers with autism. Journal of Child Psychology and Psychiatry, 62(9), 1120–1131. 10.1111/jcpp.13381
34. Rogers SJ, Estes A, Lord C, Vismara L, Winter J, Fitzpatrick A, Guo M, & Dawson G (2012). Effects of a brief Early Start Denver Model (ESDM)-based parent intervention on toddlers at risk for autism spectrum disorders: A randomized controlled trial. Journal of the American Academy of Child and Adolescent Psychiatry, 51(10), 1052–1065. 10.1016/j.jaac.2012.08.003
35. Schwarz G (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461–464. 10.1214/aos/1176344136
36. Siller M, & Sigman M (2002). The behaviors of parents of children with autism predict the subsequent development of their children’s communication. Journal of Autism and Developmental Disorders, 32(2), 77–89. 10.1023/A:1014884404276
37. Siller M, & Sigman M (2008). Modeling longitudinal change in the language abilities of children with autism: Parent behaviors and child characteristics as predictors of change. Developmental Psychology, 44(6), 1691–1704. 10.1037/a0013771
38. Sparrow SS, Saulnier CA, Cicchetti DV, & Doll EA (2016). Vineland Adaptive Behavior Scales, Third Edition. Pearson, Inc.
39. Suarez-Rivera C, Smith LB, & Yu C (2019). Multimodal parent behaviors within joint attention support sustained attention in infants. Developmental Psychology, 55(1), 96–109. 10.1037/dev0000628
40. Tallmadge J, & Barkley RA (1983). The interactions of hyperactive and normal boys with their fathers and mothers. Journal of Abnormal Child Psychology, 11(4), 565–579. 10.1007/BF00917085
41. Tomasello M, & Farrar MJ (1986). Joint attention and early language. Child Development, 57(6), 1454–1463.
42. Tunçgenç B, Pacheco C, Rochowiak R, Nicholas R, Rengarajan S, Zou E, Messenger B, Vidal R, & Mostofsky SH (2021). Computerized assessment of motor imitation as a scalable method for distinguishing children with autism. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 6(3), 321–328. 10.1016/j.bpsc.2020.09.001
43. van de Pol F, & Langeheine R (1990). Mixed Markov latent class models. Sociological Methodology, 20, 213. 10.2307/271087
44. Warlaumont AS, Richards JA, Gilkerson J, & Oller DK (2014). A social feedback loop for speech development and its reduction in autism. Psychological Science, 25(7), 1314–1324. 10.1177/0956797614531023
45. Yurkovic JR, Lisandrelli G, Shaffer RC, Dominick KC, Pedapati EV, Erickson CA, Kennedy DP, & Yu C (2021). Using head-mounted eye tracking to examine visual and manual exploration during naturalistic toy play in children with and without autism spectrum disorder. Scientific Reports, 11(1), 3578. 10.1038/s41598-021-81102-0
