Abstract
This study focused on the development and initial psychometric evaluation of a set of online, webcam-collected, and artificial intelligence-derived patient performance measures for neurodevelopmental genetic syndromes (NDGS). Initial testing and qualitative input were used to develop four stimulus paradigms capturing social and cognitive processes, including social attention, receptive vocabulary, processing speed, and single-word reading. The paradigms were administered to a sample of 375 participants, including 163 with NDGS, 56 with idiopathic neurodevelopmental disability (NDD), and 156 neurotypical controls. Twelve measures were created from the four stimulus paradigms. Valid completion rates varied from 87% to 100% across measures, with lower but adequate completion rates in participants with intellectual disability. Adequate to excellent internal consistency reliability (α=.67 to .95) was observed across measures. Test-retest reproducibility at 1-month follow-up and stability at 4-month follow-up were fair to good (r=.40–.73) for 8 of 12 measures. All gaze-based measures showed evidence of convergent and discriminant validity with parent-report measures of other cognitive and behavioral constructs. Comparisons across NDGS groups revealed distinct patterns of social and cognitive functioning, including a less impaired overall pattern in people with PTEN mutations and greater attentional, processing speed, and social processing difficulties in people with SYNGAP1 mutations relative to people with NFIX mutations. Webcam-collected performance measures appear to be a reliable and potentially useful method for objective characterization and monitoring of social and cognitive processes in NDGS and idiopathic NDD. Additional validation work, including more detailed convergent and discriminant validity analyses and examination of sensitivity to change, is needed to replicate and extend these observations.
1. Introduction
Advances in identifying pathogenic variation linked to neurodevelopmental disability (NDD) have accelerated the discovery of a growing number of specific neurodevelopmental genetic syndromes (NDGS). As NDGS are identified, natural history investigations have begun to characterize a wide spectrum of medical conditions and neurobehavioral strengths and weaknesses associated with each condition (Busch et al., 2023; Mulder et al., 2020; Vlaskamp et al., 2019). This work is crucial to developing patient support guidelines and ensuring that patients with NDGS receive appropriate supports that maximize their development. For example, in individuals with PTEN hamartoma tumor syndrome (PHTS) resulting from germline heterozygous mutations in PTEN, a spectrum of frontal-systems deficits has been identified, ranging from no impairment to very severe impairment associated with intellectual disability (ID) and autism spectrum disorder (ASD) (Busch et al., 2019; Ciaccio et al., 2018; Frazier et al., 2015; Steele et al., 2021). This pattern has been found to be stable over a period of 2 years (Busch et al., 2023), even in young children, and the specific profile of frontal systems impairment can be used to inform clinical and educational care (Frazier, 2019).
While there have been some initial attempts to provide more detailed characterization of neurobehavioral profiles across different NDGS, yield from natural history and neurobehavioral studies has been limited by the lack of comprehensive and sensitive instruments appropriate for evaluations with geographically-dispersed populations. For example, within the Rare Disease Clinical Research Network – Developmental Synaptopathies Consortium natural history study of individuals with PHTS and ASD (Busch et al., 2019), in-person cognitive assessments were limited to annual visits and often required several hours of testing to collect data from relevant neurocognitive domains. Because of the extensive effort required, the related pilot clinical trial initiated within this network was limited to three in-person assessments over a six-month study period (Hardan et al., 2021; Srivastava et al., 2022). The infrequency, difficulty, and burden of these traditional approaches highlight the need for new phenotyping methods.
Identification of NDGS has also accelerated the development of syndrome-specific patient advocacy groups and foundations, as well as programs of research designed to better understand and translate molecular, cellular, and circuitry findings into intervention strategies. A primary goal of these patient advocacy groups - and the research programs they support - is to develop and evaluate the efficacy of personalized interventions. Recent reviews of NDGS have emphasized the need to understand pathophysiology and neurobehavioral profiles to generate personalized therapeutic strategies (Frazier, 2019; Sahin & Sur, 2015). Yet, given the small number of specialty clinics focused on each NDGS, and practical geographic constraints, many patients remain under-served and many clinics lack resources to collect extensive neurobehavioral assessments during clinic visits. Relatedly, due to the rare nature of many NDGS, natural history studies often rely on small sample sizes, which limits their value in identifying clinical endpoints for trials. In these small-sample longitudinal contexts, it is important to have reliable, stable indicators of individual performance, as compared to larger group studies where statistical certainty can be bolstered by adding participants. Having repeatable, online measures of neurobehavioral function could substantially improve the statistical power of translational and clinical studies and increase the ability to rapidly and sensitively identify individual differences in the pattern of intervention response. Administration of these measures in the individual’s home rather than within a clinic setting would not only broaden access to research participation but might also reduce biases resulting from collection of neurobehavioral information in an unfamiliar setting.
Research in NDGS and idiopathic NDD is also limited by reliance on subjective measurements acquired from parents/caregivers and/or observations by clinician scientists, which has precipitated a call for the development of objective measures (Sahin et al., 2018). As a result, a number of tools have been developed and show promise for objectively evaluating and tracking key functions relevant to neurodevelopment (Amit et al., 2020; Dawson et al., 2018; Egger et al., 2018; Goodwin et al., 2019; Manfredonia et al., 2019; McPartland et al., 2020; Ness et al., 2019; Tuncgenc et al., 2021). However, with a few notable exceptions, these measures have been developed solely for in-person evaluation, limiting their application and temporal sensitivity. In addition, these measures have predominantly focused on the evaluation of single domains rather than providing a more detailed characterization of multiple social, developmental, and cognitive domains. Furthermore, a high percentage of individuals with NDGS have significant cognitive and functional impairments. A relatively brief and repeatable battery of objective measures that can reliably capture a wide range of cognitive and behavioral capacities could supplement existing tools while simultaneously increasing sensitivity to intervention effects.
One possibility that can increase the objectivity of NDGS evaluations and simultaneously overcome accessibility barriers is to augment traditional characterization methods with appropriately-designed, remotely-administered measures of neurobehavioral function. Designing remote measures for maximal accessibility has the potential to lower burden for providers as well as patients. Webcam-based eye tracking is a remote data collection method that uses cameras on everyday computing devices, coupled with artificial intelligence / machine learning algorithms, to capture individual looking patterns toward probes such as videos and images. Webcam data collection also permits frame-by-frame automated facial expression analysis using machine learning algorithms that match expressions to prototypes learned from large training datasets. The potential for these methods to inform neurodevelopment is strong and, increasingly, both webcam-collected data (Simmatis et al., 2023) and artificial intelligence / machine learning algorithms (Nerusil et al., 2021) are being applied to create novel biometric measures for assessing child development and neurological conditions. A key advantage of webcam-based data collection is that the paradigms can be administered without direct real-time clinical supervision. Thus, an online, webcam-collected patient performance battery, capturing relevant social and cognitive measurements in an objective way, could supplement in-person assessment of NDGS patients and provide a more temporally-sensitive picture of neurobehavioral development in these populations. This is particularly true for individuals with medical and mental health comorbidities and cognitive impairments who merit closer surveillance but are currently underserved (Vlaskamp et al., 2019).
Unfortunately, at present, there are no accessible, scalable objective measures specifically designed for rapid and repeated evaluation of multiple social and cognitive domains important to NDGS and idiopathic NDD. The primary aim of this study was to address this limitation and develop social and cognitive stimulus paradigms that could be paired with webcam collection and artificial intelligence algorithms to measure key neurocognitive processes relevant to NDGS. Webcam-collected measures were developed in conjunction with clinician-scientist experts, patients, and parents/caregivers, following gold-standard principles of measure development (Boateng et al., 2018) and inclusive practices (FDA, 2009), to complement our recently developed and validated informant-report survey scales (Frazier et al., 2023). Individual paradigms were created to be brief (3–4 minutes) and to require only spontaneous or directed gaze, without motor or speech responses, making them appropriate for a wide range of developmental and cognitive levels. Stimuli followed best practices in gaze collection (Sasson & Elison, 2012) and test development (Boateng et al., 2018), including teaching parents to facilitate data collection (when needed) without interfering in the evaluation, presenting large elements within the visual field to limit accuracy issues in webcam gaze collection (Semmelmann & Weigelt, 2018), and, where relevant, focusing on very easy initial items with a graded increase in task difficulty. Based on careful attention to applicability to a wide range of individuals with NDGS, valid measure collection was expected to be achieved in the majority of participants, including those with intellectual disability.
A secondary aim of this study was to conduct initial psychometric evaluation of these measures in several distinct NDGS groups, people with idiopathic NDD, and neurotypical controls. Initial evaluation included estimation of scale reliability, test-retest reproducibility (1-month follow-up), and stability (4-month follow-up). Initial convergent and discriminant validity was assessed using data from other informant(parent)-reported clinical information (Frazier et al., 2023). In addition, given the importance of detecting autism within NDGS to ensure access to appropriate services, concurrent validity with ASD diagnoses and autism symptom levels was evaluated. Finally, using baseline data, exploratory analyses examined the pattern of cognitive and behavioral functioning across NDGS and idiopathic NDD.
2. Methods
2.1. Initial Stimulus Development
The stimulus paradigm development process is outlined in Appendix 1. Briefly, this included identifying or creating appropriate target items and stimuli across a wide range of ages (3–45) and ability levels (moderate to severe cognitive impairment to average ability); collecting feasibility data; updating items and stimuli based on initial feedback; conducting a pilot administration of performance measures with 10 clinician-scientist experts and 9 parents and patients with NDGS and/or idiopathic NDD; and administering a post-evaluation survey to collect additional feedback and create the final performance paradigms.
The social paradigm and associated stimuli were chosen based on the combination of empirical work (Frazier et al., 2018) and comprehensive review of the literature (Chita-Tegmark, 2016; Frazier et al., 2017). Specifically, a variety of social stimuli were selected, in part, due to the high rates of ASD occurrence in NDGS and the broader relevance of social attention to neurodevelopment as a transdiagnostic construct (Frazier, Uljarevic, et al., 2021; Salley & Colombo, 2016). The processing speed paradigm was selected because of its potential to capture attentional scanning across the stimulus field and measure speed of object detection via gaze, its ease of administration in individuals with NDGS, particularly those with limited speech or motor difficulties, and the ability to create easier stimuli relevant to individuals with more significant intellectual impairments. Importantly, processing speed has been shown to be a very sensitive index of brain development and neuropathophysiological processes (Bove et al., 2021; Kail, 1991). The receptive vocabulary paradigm was selected because receptive language is a strong indicator of developmental trajectory and functional outcome (Frazier, Klingemier, et al., 2021) and can validly estimate results from standardized in-person testing using gaze to visual targets (Frazier et al., 2020). The single-word reading paradigm was developed based on a recommendation by clinician-scientist experts for identifying early reading, including in people with limited or no speech, where reading is more difficult to assess. This paradigm was also included based on its potential to monitor development of reading throughout childhood and early adulthood in NDGS. Additional information on receptive vocabulary and single-word reading target selection and stimulus creation is provided in Appendices 2–3.
Example screenshots for each of the performance paradigms are included in Appendices 4–7, and stimulus/target order and composition information are provided in Appendices 8–11.
2.2. Clinician-Scientist Experts and Parent Pilot Evaluation Feedback
Ten clinician-scientist experts were recruited based on their clinical and/or research expertise with a specific NDGS group or idiopathic NDD. Nine parent-patient pairs were recruited from the respective groups (6 PHTS, 1 NFIX, 1 SYNGAP1, 1 ADNP, and 1 idiopathic ASD). Patients were intentionally selected to represent a range of ages and cognitive levels. After completing a pilot administration of performance paradigms, clinician-scientist experts and parents - who facilitated the webcam administration for the patient participant - completed a post-evaluation survey. Questions are provided in Appendices 12–13. This information was used to generate final stimulus videos and to improve the training of parents in facilitating administration to the child.
2.3. Parent/Caregiver Administration Support Training
Based on initial feedback, a parent/caregiver training process was developed (Appendix 14). This process included the following elements: 1) introduction to webcam technology, 2) training video, 3) parent completion of a “practice” stimulus set, 4) online training in valid task completion, and 5) virtual support meetings during initial and follow-up administrations. All of the elements were optional, but most participants used at least one option, and nearly all participants completed the parent “practice” stimuli.
2.4. Webcam Collection of Gaze
Participants were instructed to use a device with at least a 10” screen size based on results of initial pilot testing, which indicated that smaller screen sizes could reduce accuracy of point-of-regard relative to specific areas-of-interest. Webcam data were collected and processed using proprietary CoolTool software. The software was originally intended as a neuromarketing tool, but initial feasibility testing, including with several young children with neurodevelopmental disabilities, indicated good potential for use as a data collection platform. The minimum required camera resolution was 720p at 30fps. The gaze collection algorithm included a five-point calibration routine prior to each paradigm administration. This routine is coupled with a machine learning algorithm designed to detect webcam position within 3-D space and maximize gaze accuracy. On a frame-by-frame basis, gaze position relative to the 2-D screen was estimated. While accurate calibration is desirable, the gaze estimation model often functions adequately when less than ideal calibration data are acquired, making the system well-suited to young and more impaired participants. Similar systems have been shown to achieve ~3–5 degrees of calibration uncertainty, translating to accurate detection of areas >10% of screen size (Semmelmann & Weigelt, 2018; Shehu et al., 2021). The present stimulus paradigms were built with large areas-of-interest to be tolerant of higher levels of gaze uncertainty. Importantly, any reductions in gaze accuracy should reduce the reliability and validity of gaze-based measurements; thus, observations of high reliability and evidence of convergent validity would suggest minimal impact of sub-optimal gaze calibration.
To guard against reductions in gaze calibration and accuracy negatively impacting neurobehavioral measurements, no indices were scored if total time with eyes on screen was estimated at less than 30 seconds overall (out of a possible 15 minutes of gaze time to the screen).
Areas-of-interest were generated for each stimulus. For social attention stimuli, these included both socially-relevant (e.g., faces, target objects) and socially-irrelevant areas (e.g., foreground and background distractors, non-target objects), based on our prior research (Frazier et al., 2018). For processing speed, receptive vocabulary, and single-word reading stimuli, areas-of-interest included target items/objects. For all stimuli, areas-of-interest were temporally defined based on expected gaze patterns from prior research (social attention) (Frazier et al., 2018) or on the period after the verbal directive was given (cognitive paradigms) (Frazier et al., 2020).
2.5. Automated Scoring of Facial Expressions
The webcam software also includes a proprietary algorithm for automatically scoring facial expressions. Facial landmarks are identified in 3-dimensional space and the artificial intelligence algorithm is applied to these landmarks on a frame-by-frame basis to generate probability scores based on accuracy of classification from training data (Kuntzler et al., 2021). Probability scores represent a match between the facial landmark configuration and known sets of facial expressions (fear, anger, disgust, sadness, surprise, joy, and neutral), with closer matches interpreted as higher intensities of expression (range 0–100%). For the present study, and because specific affect recognition intensities can be prone to error for more subtle expressions (Kuntzler et al., 2021), specific expressions were aggregated into positive and negative categories to maximize reliability. Facial expression measures were collected only during the social attention stimuli, as these showed the greatest range of non-neutral expressions in preliminary data.
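The positive/negative aggregation step can be sketched as follows. This is an illustration only: the valence grouping (joy as positive; fear, anger, disgust, and sadness as negative; surprise and neutral left unclassified) is an assumption, not the software's documented mapping.

```python
# Assumed valence grouping for illustration; surprise and neutral are
# deliberately left unclassified because their valence is ambiguous.
POSITIVE = {"joy"}
NEGATIVE = {"fear", "anger", "disgust", "sadness"}

def aggregate_expressions(frames):
    """frames: per-frame dicts mapping expression name -> match probability
    (0-100). Returns mean positive and mean negative intensity across frames."""
    pos = sum(sum(f[e] for e in POSITIVE) for f in frames) / len(frames)
    neg = sum(sum(f[e] for e in NEGATIVE) for f in frames) / len(frames)
    return pos, neg
```

Aggregating over categories and frames in this way trades expression-level detail for more reliable summary intensities, which matches the stated rationale.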
2.6. Development of a Priori Validity Criteria and Scoring
For each social and cognitive paradigm, the investigative team a priori identified possible gaze and facial expression measures that would be relevant to evaluating social and cognitive processes in NDGS and idiopathic NDD. The only exception was the social attention measure, which was empirically developed following our prior published methodology (Frazier et al., 2018) (see Appendix 15 for additional information). Appendix 16 presents operational definitions for each performance measure. Each gaze-based measure was only scored if stringent validity criteria were met. Appendix 17 includes validity criteria for all 12 webcam-collected measures. For each measure, validity criteria ensured that the participant attended to the stimuli for at least 30 seconds and that at least 8 valid targets or 4 valid stimuli were collected. Fixations were scored by identifying at least 66ms of gaze point samples within a 100-pixel dispersion. Four gaze metrics were calculated for each area-of-interest – fixation duration, fixation count, glance count, and time-to-first fixation (Appendix 18). These metrics were used to score the 12 performance measures evaluated in this study.
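A dispersion-threshold fixation pass of this kind can be sketched as follows. This is a minimal I-DT-style illustration under stated assumptions: the ~33 ms sampling interval (30 fps), the combined x+y range as the dispersion metric, and the greedy run-extension rule are all assumptions about the scoring software, not its documented algorithm.

```python
def dispersion(points):
    """Combined x + y range of a set of gaze points (metric is an assumption)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def detect_fixations(samples, sample_ms=33.3, min_dur_ms=66, max_disp_px=100):
    """Dispersion-threshold (I-DT style) fixation detection.

    samples: (x, y) gaze points at a fixed sampling interval. A fixation is
    a maximal run lasting at least min_dur_ms whose dispersion stays within
    max_disp_px. Returns (start_index, end_index, duration_ms) tuples.
    """
    min_len = max(2, round(min_dur_ms / sample_ms))  # 66 ms ~ 2 samples at 30fps
    fixations, i = [], 0
    while i + min_len <= len(samples):
        if dispersion(samples[i:i + min_len]) <= max_disp_px:
            j = i + min_len
            # Grow the window while dispersion stays under threshold.
            while j < len(samples) and dispersion(samples[i:j + 1]) <= max_disp_px:
                j += 1
            fixations.append((i, j, (j - i) * sample_ms))
            i = j
        else:
            i += 1
    return fixations
```

Fixation duration, fixation count, glance count, and time-to-first-fixation per area-of-interest can all be derived from the tuples this pass returns.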
2.7. Participants for Initial Measure Evaluation
NDGS groups included participants with PTEN Hamartoma Tumor Syndrome (PHTS), ADNP, SYNGAP1, or NFIX recruited via contacts through the PTEN Hamartoma Tumor Syndrome Foundation with the support of the PTEN Research Foundation, the ADNP Kids Foundation, the SYNGAP Research Fund, and the Malan Syndrome Foundation. Other individuals with NDGS were recruited via the Simons Foundation Searchlight registry and included people with mutations in GRIN2B, CSNK2A1, HIVEP2, SCN2A, MED13L, and STXBP1. Given the relatively small sample sizes for ADNP (n=11) and these NDGS groups, they were combined into a single “other NDGS” group (n=63). Individuals were included if they were between the ages of 3 and 45 at enrollment and had an available parent or other close relative/caregiver to complete informant-report measures. Siblings of individuals with NDGS were also eligible to participate, and unrelated neurotypical controls were recruited using StudyKik, a national recruitment service. Siblings and unrelated controls who were reported to have an idiopathic neurodevelopmental disability were included in a separate group.
2.8. Procedure
Parent/caregiver informants first completed a demographic and clinical information questionnaire followed by 11 neurobehavioral evaluation tool (NET) survey scales (Frazier et al., 2023). These survey scales included 6 measures of symptoms/problems (anxiety, attention-deficit/hyperactivity disorder, restricted/repetitive behavior, challenging behavior, mood, and sleep problems) and 5 measures of skills/functioning (motor skills, daily living skills, social communication/interaction skills, executive functioning, and quality of life). After NET survey completion, informants and participants were instructed to complete webcam-collected performance measures and were sent links via email or text to facilitate completion. For young and/or impaired children, performance measure administration began by having the parent complete a practice version, so that they understood how the webcam collection works and how best to help their child. Parents and older patients also were offered a video call with the research coordinator to review best practices in performance measure administration and were provided a set of recommendations to improve evaluation validity.
Performance measure administration began with the 5-point calibration that included dots presented in the four corners and center of the screen. Next, videos were presented for each paradigm in succession – social attention, receptive language, processing speed, and single-word reading. Re-calibration automatically occurred prior to each paradigm.
Survey and webcam measures were collected at baseline, 1-month, and 4-month follow-up timepoints. The maximum total administration time across all paradigms was 15 minutes (social attention – 4 min, receptive vocabulary – 4 min, processing speed – 3 min, single-word reading – 4 min), with videos separated into 1-minute segments to permit breaks. A button press was required to advance to the next video. Participants were instructed to complete all of the social attention and processing speed videos, but were permitted to stop after the first two minutes of the receptive vocabulary paradigm and after the first minute of the single-word reading paradigm, depending on the parent’s appraisal of the patient’s capacity to engage with the paradigm. Participants could proceed through all paradigms or take breaks between paradigms but were encouraged to finish all videos in one sitting if possible.
IRB approval was obtained for all of the qualitative and quantitative procedures of the study, including administration of the final NET scales, and parents/legally-authorized representatives and adult patients provided informed consent prior to completing any study procedures. Assent for minors was also obtained, where appropriate.
2.9. Statistical Analyses
2.9.1. Sample Characterization
Descriptive statistics for demographic and clinical factors were computed to characterize the sample, and Chi-square or univariate ANOVA were used to compare across the seven study groups (PHTS, SYNGAP1, NFIX, other NDGS, idiopathic NDD, sibling controls, and unrelated neurotypical controls).
2.9.2. Evaluation and Measure Validity
Using validity criteria for each of the 12 performance measures, the sum of valid measures was computed and compared across study groups using univariate ANOVA. Proportions of validity by measure were also computed overall and by parent-reported intellectual disability status.
2.9.3. Reliability
Scale reliability (internal consistency) was calculated using Cronbach’s alpha (α) (Streiner & Norman, 1995). Scale reliability estimates falling in the ranges .70 to .79, .80 to .89, and >.90 were considered fair, good, and excellent (Nunnally & Bernstein, 1994), respectively. Test-retest reproducibility (one-month follow-up) and stability (4-month follow-up) were estimated using Pearson’s bivariate correlations. Test-retest estimates <.40 were considered poor, .40 to .59 fair, .60 to .74 good, and .75+ excellent (Cicchetti et al., 2006).
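As a worked illustration, Cronbach's alpha can be computed directly from its definition. This is a minimal sketch; the study's analyses were run in SPSS, and the function names here are hypothetical.

```python
def variance(xs):
    """Unbiased sample variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(items):
    """items: one list per scale item, each holding all respondents' scores
    in the same order. alpha = k/(k-1) * (1 - sum(item vars)/var(totals))."""
    k = len(items)
    totals = [sum(person) for person in zip(*items)]  # per-respondent totals
    return (k / (k - 1)) * (1 - sum(variance(it) for it in items) / variance(totals))
```

Two perfectly correlated items yield alpha = 1.0; alpha falls as item-level variance grows relative to total-score variance.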
2.9.4. Convergent and Discriminant Validity
To evaluate convergent and discriminant validity, other clinical information based on informant-report was a priori selected as either measuring similar constructs (convergent validity) or dissimilar constructs (discriminant validity) for each performance measure. Informant-report information included: estimated IQ; speech level (5-point scale from non- or minimally-speaking to fluent speech); reading level (5-point scale from no reading to paragraph level or higher); ADHD, anxiety, mood, challenging behavior, social communication/interaction, and restricted/repetitive behavior symptoms; sleep problems; daily living skills; executive functioning; and motor skills. Bivariate correlations were computed between each performance measure and the convergent and discriminant validity measures selected. To compute aggregate correlations over multiple measures, correlations were converted to Fisher’s z, averaged, and transformed back to a correlation metric. The test of the significance of the difference in dependent correlations was used to examine whether convergent validity correlations were higher than discriminant validity correlations (Cohen & Cohen, 1983).
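The Fisher z aggregation step can be sketched as follows (illustrative only; the function name is hypothetical). Fisher's z is simply the inverse hyperbolic tangent of r, and averaging on the z scale before back-transforming avoids the downward bias of averaging raw correlations.

```python
import math

def average_correlations(rs):
    """Aggregate correlations: Fisher z-transform (atanh), average the z
    values, then back-transform (tanh) to the correlation metric."""
    zs = [math.atanh(r) for r in rs]
    return math.tanh(sum(zs) / len(zs))
```

Because atanh stretches correlations near ±1, the back-transformed mean of a mixed set (e.g., .3 and .7) sits slightly above the arithmetic mean of the raw values.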
2.9.5. Concurrent Validity with ID, ASD Diagnoses, and Autism Symptom Levels
To examine concurrent validity of performance measures with parent-reported clinical ID diagnosis, independent samples t-tests were computed with each measure as the dependent variable and ID status (yes, no) as the grouping variable. Cohen’s d was computed to estimate the magnitude of group differences. To evaluate potential diagnostic validity, receiver operating characteristic (ROC) curve analyses were calculated in the training, testing, validation, and testing plus validation sub-samples, separately for baseline, 1-month, and 4-month follow-up data. Areas under the curve (AUCs) evaluated diagnostic validity. A rough guideline for evaluating AUC values is: <.60 = poor; .60–.69 = fair; .70–.79 = good; .80–.89 = excellent, if the comparison group is clinically meaningful; and .90–1.00 = exceptional, only if the design and comparison are appropriate (Youngstrom et al., 2019). To evaluate concurrent validity with autism symptom levels, symptom levels derived from the neurobehavioral evaluation survey scales were correlated with each performance measure in the same subsamples as the ROC analyses.
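The AUC itself has a simple rank-based interpretation: the probability that a randomly chosen positive case scores above a randomly chosen negative case, with ties counted as half. A minimal sketch of that equivalence (the study's analyses used the R package pROC; this function name is hypothetical):

```python
def auc(pos_scores, neg_scores):
    """AUC via the Mann-Whitney identity: the proportion of
    (positive, negative) pairs where the positive case scores higher,
    counting ties as half a win."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))
```

An AUC of .50 corresponds to chance-level discrimination and 1.00 to perfect separation of the diagnostic groups.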
2.9.6. Neurobehavioral Patterns across NDGS and idiopathic NDD Groups
To explore unique patterns of social and cognitive function, webcam measures were first normed using regression-based norming in unrelated healthy controls, with age, the square of age (to capture non-linear developmental trends), and sex included as predictors in each equation. This approach puts each measure on a z-score metric relative to healthy controls. Using these standardized residual scores, univariate analysis of variance models were computed, with each of the seven groups as the independent variable and the performance measure scores as dependent variables in separate analyses.
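The regression-based norming step can be sketched as follows. This is a minimal illustration: fitting the control-group regression by ordinary least squares and using the control residual SD as the norming SD are assumptions about the implementation, and the function name is hypothetical.

```python
import numpy as np

def regression_norm(ctrl_age, ctrl_sex, ctrl_score, age, sex, score):
    """Norm a raw score against control-group expectations.

    Fits score ~ 1 + age + age^2 + sex by least squares in the control
    sample, then expresses any participant's observed score as a z-score
    relative to the control-based prediction. Using the control residual
    SD as the norming SD is an assumption for this sketch.
    """
    X = np.column_stack([np.ones(len(ctrl_age)), ctrl_age,
                         np.square(ctrl_age), ctrl_sex])
    y = np.asarray(ctrl_score, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid_sd = (y - X @ beta).std(ddof=X.shape[1])  # df-corrected residual SD
    predicted = np.array([1.0, age, age ** 2, sex]) @ beta
    return float((score - predicted) / resid_sd)
```

A participant scoring above the age- and sex-expected control value receives a positive z-score, and below it a negative one, so group profiles can be compared on a common metric.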
2.9.7. Statistical Power
Assuming total sample sizes of 200+ for reliability and validity analyses, statistical power to detect a bivariate correlation of r≥.40 was excellent (>.99; one-tailed p-value of .05). Assuming minimum sub-sample sizes of at least 18 ASD and 40 non-ASD diagnosed individuals, power to detect AUCs≥.72 was at least good (≥.80). Statistical power to detect group differences across webcam performance measures, assuming a minimum group size of 24, was at least adequate (>.82) if large group differences were observed (d≥.80; α=.05, two-tailed). For larger group sizes (n>40), power was adequate even for medium effects (d≥.50).
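The correlation power figure can be reproduced with the standard Fisher z approximation, power = Φ(atanh(r)·√(n−3) − z_crit). This is a sketch under that approximation; the software the authors used for power calculations is not specified, and the function names here are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_ppf(p):
    """Standard normal quantile by bisection (ample precision here)."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if norm_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def power_correlation(r, n, alpha=0.05):
    """One-tailed power to detect a true correlation r at sample size n,
    using the Fisher z approximation."""
    return norm_cdf(math.atanh(r) * math.sqrt(n - 3) - norm_ppf(1.0 - alpha))
```

With r=.40 and n=200 this formula returns power well above .99, matching the figure reported in the text.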
2.9.8. Statistical Analysis Implementation
Statistical significance was set at α=.05, two-tailed, and effect size magnitude was emphasized. Data preparation, descriptive analyses, internal consistency reliability using Cronbach’s alpha (α), and bivariate correlations were computed in SPSS v28 (IBM Corp, 2021). ROC analyses were computed using the R package pROC and implemented in version 4.1.2 (R Core Team, 2021) using R Studio version 2021.09.1.
3. Results
3.1. Pilot Evaluation Results
Clinicians used a wide range of hardware setups and reported high relevance of the paradigms to their respective NDGS or idiopathic NDD group (Appendix 19). Clarity of instructions and quality of audio and visual stimuli was rated as high. Timing was rated as generally moderate (neither fast nor slow). Several potential concerns about target difficulty levels were raised and used to adjust the final stimuli.
Parents rated the overall experience as positive and of relatively moderate difficulty across paradigms (Appendix 20). Patient participants did not require breaks, looked away from the screen with variable frequency (every 5–10 seconds to only a few losses of attention to screen), covered or touched their face only infrequently, and required variable levels of physical, gestural, or verbal assistance to maintain motivation and attention. Unexpected intrusions and adjustments to lighting were infrequent. Overall attention was rated as average to good. Paradigm relevance to the patient’s condition was rated as “relevant” to “highly relevant” across paradigms. Quality of audio and visual stimuli was rated as high, and timing was judged to be generally moderate to fast. These data were used to adjust parent training processes and to include reminders to limit assistance to motivation and general attention (not specific to a stimulus or desired response).
3.2. Sample Characteristics
A total of 395 individuals enrolled to participate before 04/05/2023 (recruitment is ongoing). Of these, 20 did not attempt baseline webcam paradigms, but of the 375 who did attempt the paradigms, all achieved at least 1 valid measure (Appendix 21). Longitudinal attrition was modest at 1-month follow-up (n=54 did not attempt; n=341 attempted) but higher at 4-month follow-up (n=100 did not attempt; n=295 attempted).
Table 1 presents sample characteristics. Findings were highly consistent with findings in our recent survey validation study (Frazier et al., 2023). Specifically, participants were younger in the NFIX and SYNGAP1 groups and older in the PHTS and idiopathic NDD groups, with high rates of spousal informants in the latter groups. All groups had very high proportions of White/Caucasian participants, although Hispanic ethnicity approximated US population proportions in most groups, and the sample had a wide range of household incomes. Estimated cognitive levels were lowest in the NFIX, SYNGAP1, and other NDGS groups and to a lesser extent in the PHTS group relative to control groups. Informant-reported developmental diagnoses were highly variable across NDGS groups, but with elevated rates of ASD, ID, anxiety, and motor disorder in NFIX, SYNGAP1, and other NDGS groups compared to controls. Participants were predominantly from the US (n=325, 87%), but a small minority of participants with informants fluent in English were also included from other countries (United Kingdom n=17, Canada n=24, Australia n=4, New Zealand n=1, Ireland n=2, Netherlands n=1, Israel n=1).
Table 1.
Demographic and clinical characteristics by study group.
 | Sibling Controls | Unrelated Controls | PHTS | NFIX | SYNGAP1 | Other NDGS | NDD | X² / F (p)
---|---|---|---|---|---|---|---|---
 | n (%) | n (%) | n (%) | n (%) | n (%) | n (%) | n (%) |
N | 40 | 116 | 33 | 24 | 43 | 63 | 56 | |
Informant Age (M, SD) | 42 (6) | 42 (9) | 43 (8) | 41 (10) | 42 (8) | 44 (8) | 42 (8) | 0.6 (.718) |
Informant Sex (% Female) | 37 (93%) | 95 (82%) | 28 (85%) | 21 (88%) | 39 (91%) | 61 (97%) | 51 (91%) | 12.3 (.424) |
Informant Relationship to Participant | 39.3 (.003) | |||||||
Biological Parent | 39 (98%) | 99 (85%) | 25 (76%) | 23 (96%) | 40 (93%) | 59 (93%) | 44 (79%) | |
Adoptive or Custodial Parent | 0 (0%) | 3 (3%) | 1 (3%) | 1 (4%) | 1 (2%) | 4 (6%) | 2 (4%) | |
Other Biological Relative / Sibling | 1 (2%) | 7 (6%) | 0 (0%) | 0 (0%) | 1 (2%) | 0 (0%) | 3 (5%) | |
Spouse/Other Non-Biological Relative | 0 (0%) | 7 (6%) | 7 (21%) | 0 (0%) | 1 (2%) | 0 (0%) | 7 (12%) | |
Household Income (US $) | 79.7 (.013) | |||||||
<$25,000 | 1 (3%) | 5 (4%) | 2 (6%) | 0 (0%) | 0 (0%) | 2 (3%) | 8 (14%) | |
$25,000–$34,999 | 2 (5%) | 8 (7%) | 0 (0%) | 2 (8%) | 1 (2%) | 1 (2%) | 2 (4%) | |
$35,000–$49,999 | 1 (3%) | 5 (4%) | 1 (3%) | 3 (13%) | 3 (7%) | 3 (5%) | 6 (11%) | |
$50,000–$74,999 | 6 (15%) | 18 (16%) | 9 (27%) | 4 (17%) | 3 (7%) | 4 (6%) | 11 (20%) | |
$75,000–$99,999 | 2 (5%) | 21 (18%) | 3 (9%) | 3 (13%) | 4 (9%) | 6 (10%) | 4 (7%) | |
$100,000–$149,999 | 7 (18%) | 28 (24%) | 7 (21%) | 4 (17%) | 10 (23%) | 16 (25%) | 11 (20%) | |
$150,000–$199,999 | 7 (18%) | 14 (12%) | 4 (12%) | 5 (21%) | 10 (23%) | 6 (10%) | 6 (11%) | |
$200,000+ | 6 (15%) | 13 (11%) | 2 (6%) | 2 (8%) | 7 (16%) | 12 (19%) | 5 (9%) | |
Did not report | 8 (20%) | 4 (3%) | 5 (15%) | 1 (4%) | 5 (11%) | 13 (21%) | 3 (5%) | |
Participant Age (M, SD) | 11 (5) | 12 (8) | 17 (13) | 10 (7) | 10 (7) | 11 (6) | 16 (9) | 4.8 (<.001) |
Participant Sex (% Female) | 23 (58%) | 63 (54%) | 13 (39%) | 12 (50%) | 19 (44%) | 36 (57%) | 21 (38%) | 8.6 (.197) |
Participant Race / Ethnicity | ||||||||
White / Caucasian | 36 (90%) | 95 (82%) | 30 (91%) | 24 (100%) | 37 (86%) | 58 (92%) | 46 (82%) | 9.6 (.142) |
Black / African American | 3 (8%) | 9 (8%) | 2 (6%) | 0 (0%) | 5 (12%) | 5 (8%) | 8 (14%) | 5.6 (.473) |
Middle Eastern or North African | 2 (5%) | 1 (1%) | 1 (3%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 9.1 (.167) |
East Asian | 2 (5%) | 9 (8%) | 3 (9%) | 0 (0%) | 2 (5%) | 5 (8%) | 2 (4%) | 3.8 (.697) |
South Asian | 2 (5%) | 8 (7%) | 0 (0%) | 0 (0%) | 1 (2%) | 3 (5%) | 0 (0%) | 8.2 (.223) |
Native American / Alaskan Native | 0 (0%) | 3 (3%) | 1 (3%) | 1 (4%) | 0 (0%) | 0 (0%) | 1 (2%) | 4.5 (.605) |
Native Hawaiian / Pacific Islander | 0 (0%) | 1 (1%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 2.2 (.896) |
Hispanic | 7 (18%) | 21 (18%) | 1 (3%) | 5 (21%) | 7 (17%) | 2 (3%) | 11 (20%) | 18.7 (.096) |
Unknown | 0 (0%) | 2 (2%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 4.5 (.611) |
Did not report | 0 (0%) | 2 (2%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (2%) | 3.6 (.734) |
Cognitive Level (informant-estimated) | 337.9 (<.001) | |||||||
Very high or above (120+) | 6 (15%) | 12 (10%) | 3 (9%) | 0 (0%) | 0 (0%) | 1 (2%) | 10 (18%) | |
High Average (110–119) | 18 (45%) | 58 (50%) | 6 (18%) | 0 (0%) | 0 (0%) | 0 (0%) | 19 (34%) | |
Average (90–109) | 13 (33%) | 42 (36%) | 15 (46%) | 0 (0%) | 1 (2%) | 2 (3%) | 22 (39%) | |
Below average (80–89) | 0 (0%) | 0 (0%) | 1 (3%) | 2 (8%) | 4 (9%) | 6 (10%) | 2 (4%) | |
Borderline impairment (70–79) | 0 (0%) | 0 (0%) | 2 (6%) | 2 (8%) | 1 (2%) | 2 (3%) | 0 (0%) | |
Mild impairment (55–69) | 0 (0%) | 0 (0%) | 1 (3%) | 5 (21%) | 6 (14%) | 12 (19%) | 3 (5%) | |
Moderate impairment (40–54) | 0 (0%) | 0 (0%) | 2 (6%) | 9 (38%) | 11 (26%) | 17 (27%) | 0 (0%) | |
Severe impairment (21 to 39) | 0 (0%) | 0 (0%) | 0 (0%) | 2 (8%) | 10 (23%) | 12 (19%) | 0 (0%) | |
Profound impairment (<20) | 0 (0%) | 0 (0%) | 0 (0%) | 2 (8%) | 5 (12%) | 3 (5%) | 0 (0%) | |
Did not report | 3 (8%) | 4 (3%) | 3 (9%) | 2 (8%) | 5 (12%) | 8 (13%) | 0 (0%) | |
Cognitive Estimate from Prior Testing | 6 (15%) | 19 (16%) | 16 (49%) | 13 (54%) | 21 (49%) | 30 (48%) | 26 (46%) | 57.1 (<.001) |
Developmental Diagnoses (n, %) | ||||||||
ASD | ‐ | ‐ | 9 (27%) | 5 (21%) | 35 (81%) | 32 (51%) | 8 (14%) | 54.8 (<.001) |
ID/GDD | ‐ | ‐ | 10 (30%) | 21 (88%) | 39 (91%) | 58 (92%) | 1 (2%) | 141.3 (<.001) |
Speech/language disorder | ‐ | ‐ | 9 (27%) | 11 (46%) | 32 (74%) | 40 (64%) | 10 (18%) | 44.2 (<.001) |
ADHD | ‐ | ‐ | 5 (15%) | 1 (4%) | 6 (14%) | 16 (25%) | 26 (46%) | 24.0 (<.001) |
ODD/CD | ‐ | ‐ | 0 (0%) | 1 (4%) | 4 (9%) | 2 (3%) | 4 (7%) | 4.4 (.353) |
Anxiety disorder | ‐ | ‐ | 7 (21%) | 8 (33%) | 8 (19%) | 10 (16%) | 18 (32%) | 6.4 (.174) |
Specific learning disorder | ‐ | ‐ | 2 (6%) | 0 (0%) | 1 (2%) | 4 (6%) | 5 (9%) | 3.6 (.460) |
Motor / coordination disorder | ‐ | ‐ | 4 (12%) | 6 (25%) | 24 (56%) | 21 (33%) | 0 (0%) | 45.5 (<.001) |
Depressive disorder | ‐ | ‐ | 5 (15%) | 0 (0%) | 0 (0%) | 0 (0%) | 10 (18%) | 23.8 (<.001) |
Bipolar disorder / mania | ‐ | ‐ | 0 (0%) | 0 (0%) | 0 (0%) | 1 (2%) | 1 (2%) | 1.7 (.789) |
Obsessive compulsive disorder | ‐ | ‐ | 0 (0%) | 0 (0%) | 4 (9%) | 2 (3%) | 2 (4%) | 6.1 (.192) |
Tic disorder | ‐ | ‐ | 0 (0%) | 0 (0%) | 1 (2%) | 1 (2%) | 1 (2%) | 1.2 (.882) |
Feeding / eating disorder | ‐ | ‐ | 0 (0%) | 0 (0%) | 11 (26%) | 10 (16%) | 0 (0%) | 27.5 (<.001) |
Baseline Webcam Evaluation Validity | 57.4 (<.001) | |||||||
1–3 measures valid (n, %) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (4%) | 0 (0%) | 3 (6%) | 0 (0%) | |
4–11 measures valid (n, %) | 9 (22%) | 29 (25%) | 7 (21%) | 10 (42%) | 30 (69.8%) | 30 (47%) | 13 (23%) | |
All measures valid (n, %) | 31 (78%) | 87 (75%) | 26 (79%) | 13 (54%) | 13 (30.2%) | 30 (47%) | 43 (77%) | |
Number of Valid Measures (M, SD) | 11.3 (1) | 11.2 (2) | 11.1 (2) | 9.9 (3) | 10.0 (2) | 9.8 (3) | 11.3 (2) | 6.3 (<.001) |
Note. ASD=autism spectrum disorder. ID/GDD=intellectual disability/global developmental delay, ADHD=Attention-Deficit/Hyperactivity disorder; ODD/CD=oppositional defiant disorder/conduct disorder. Non-ASD diagnoses do not sum to 100% because children could be diagnosed with more than one condition. Note that race/ethnicity categories are not mutually exclusive and participants were encouraged to select all options that apply. For statistical tests with low cell sizes, Fisher’s exact test was also computed, but results were highly consistent with the chi-square analysis. For this reason, chi-square is reported with the associated p-value.
3.3. Evaluation Validity
Evaluation validity was high across all groups, but the NFIX, SYNGAP1, and other NDGS groups had higher proportions of individuals with at least one invalid measure (Table 1). On average, all groups had at least 10 valid performance measures. Participants with reported ID had lower measure validity proportions than participants without ID, but measure validity never dropped below 84% (Table 2).
Table 2.
Valid administration and reliability metrics for webcam-based performance measures.
# | Measure | Stimulus Paradigm | # of Indicators | Evaluation Validity Overall % | % Valid, No ID | % Valid, ID | Internal Consistency Reliability (Cronbach’s α) | 1-Month Test-Retest Reproducibility (r) | 4-Month Test-Retest Stability (r)
---|---|---|---|---|---|---|---|---|---
1 | Overall Attention | All | 15 | 100% | 100% | 100% | .89 | .52 | .50 |
2 | Attentional Scanning | Processing Speed | 12 | 87% | 89% | 84% | .94 | .66 | .64 |
3 | Positive Emotion | Social | 32 | 100% | 100% | 100% | .93 | .63 | .62 |
4 | Negative Emotion | Social | 32 | 100% | 100% | 100% | .95 | .44 | .38 |
5 | Social Attention | Social | 141 | 92% | 95% | 89% | .89 | .62 | .64 |
6 | Social Preference | Social | 69 | 92% | 95% | 89% | .75 | .48 | .40 |
7 | Face Preference | Social | 28 | 92% | 94% | 88% | .90 | .37 | .29 |
8 | Non-social Preference | Social | 42 | 92% | 95% | 89% | .67 | .31 | .31 |
9 | Receptive Vocabulary | Receptive Vocabulary | 39 | 94% | 96% | 89% | .93 | .73 | .72 |
10 | Speed to Faces | Social | 28 | 92% | 94% | 88% | .93 | .29 | .29 |
11 | Speed to Object | Processing Speed | 12 | 87% | 89% | 84% | .95 | .53 | .51 |
12 | Reading Accuracy | Single-word Reading | 46 | 96% | 99% | 91% | .91 | .68 | .72 |
Note. # of indicators refers to the number of areas-of-interest (these could be whole videos or whole stimuli if areas-of-interest are combined) included in computing the measure. Validity proportions are given for baseline data and are estimated by including all individuals who attempted to complete the webcam performance paradigm. Fair test-retest reliability values for overall attention are likely due in part to restricted range, as many individuals obtain values near 95–100%. Low test-retest reliability values for negative emotion are likely a function of a very limited score range, with many individuals falling at 0% expression intensity.
Score distributions were variable across measures, with many showing near normal distributions, and all but negative emotion suggesting a good quantitative range (Appendix 22). The latter was highly skewed and kurtotic with scores clustered close to 0%.
3.4. Reliability
Internal consistency reliability was good to excellent for all performance measures (α=.89–.95; Table 2), with the exception of non-social preference, where reliability was lower but still adequate for a low-frequency behavior (α=.67). Test-retest reproducibility estimates were fair or above for 9 of the 12 measures (r=.44–.73), with the two face-processing measures and the non-social preference measure showing less stability. Test-retest stability was fair or above for 8 of the 12 measures (r=.40–.72), and the highest stability estimates were for receptive vocabulary and single-word reading. The face processing, non-social preference, and negative emotional expressiveness scales showed lower stability, the last of which just missed the cutoff for fair test-retest stability. Similar levels were observed when only NDGS patients were examined.
3.5. Convergent and Discriminant Validity
All performance measures, except positive and negative emotional expressiveness, showed strong evidence of convergent and discriminant validity (Table 3). Given the unique nature of gaze-based measures and the difference in measurement modality (gaze vs. informant-report), convergent validity was generally quite good (r=.21–.62). Similarly, discriminant validity estimates were generally quite low (r=.07–.24). The lack of convergent validity for the emotional expressiveness measures is likely because no close behavioral construct was assessed by any available informant-report measure.
Table 3.
Predicted convergent and discriminant validity associations for selected webcam measures.
Webcam Measure | Convergent Validity Measures | Average |r| | Discriminant Validity Measures | Average |r| | t (p)
---|---|---|---|---|---
Overall Attention | Estimated IQ, ADHD Symptoms, Executive Functioning | .30 | Anxiety, Mood, Challenging Behavior | .17 | 2.53 (.012) |
Attentional Scanning | Estimated IQ, ADHD Symptoms, Executive Functioning | .43 | Anxiety, Mood, Challenging Behavior | .23 | 4.10 (<.001) |
Positive Emotion | Mood-Hypomania, Anxiety | .10 | Motor, Daily Living Skills | .07 | 0.47 (.635) |
Negative Emotion | Mood-Emotion Regulation, Anxiety | .09 | Motor, Daily Living Skills | .11 | −0.33 (.742) |
Social Attention | Autism Symptoms | .55 | Anxiety, Mood | .23 | 6.95 (<.001) |
Social Preference | Social Communication / Interaction Symptoms | .36 | Anxiety, Mood | .16 | 3.69 (<.001) |
Face Preference | Social Communication / Interaction Symptoms | .26 | Anxiety, Mood | .12 | 2.50 (.013) |
Non-social Preference | Social Communication / Interaction Symptoms, Restricted / Repetitive Behavior | .21 | Anxiety, Mood | .09 | 2.27 (.024) |
Receptive Vocabulary | Estimated IQ, Speech Level, Social Communication / Interaction Symptoms | .29 | Anxiety, Mood, Sleep | .14 | 2.38 (.018) |
Speed to Faces | Social Communication / Interaction Symptoms | .25 | Anxiety, Mood, Challenging Behavior | .12 | 2.43 (.016) |
Speed to Object | Estimated IQ | .47 | Anxiety, Mood, Challenging Behavior | .24 | 3.70 (<.001) |
Reading Accuracy | Reading Fluency Level | .62 | Anxiety, Mood, Sleep | .14 | 8.05 (<.001) |
Note. Convergent and discriminant validity correlations were averaged after conversion to Fisher’s z and then re-converted to correlations. Average convergent and discriminant validity correlations were compared using the test of dependent correlations with the nuisance correlation being the average of the inter-correlations between the convergent and discriminant validity measures.
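The averaging procedure described in the note can be sketched as follows. This is an illustrative re-implementation, not the study’s code, and it omits the test of dependent correlations, which additionally requires the nuisance inter-correlations and the sample size:

```python
import numpy as np

def average_correlations(rs) -> float:
    """Average a set of correlations via Fisher's z-transform:
    z = atanh(r), take the arithmetic mean of z, back-transform with tanh."""
    z = np.arctanh(np.asarray(rs, dtype=float))
    return float(np.tanh(z.mean()))
```

Because the z-transform stretches the scale near |r|=1, averaging in z-space gives slightly more weight to larger correlations than a raw arithmetic mean would (e.g., averaging 0.0 and 0.8 yields 0.50 rather than 0.40).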
Inter-correlations among the performance measures tended to be small to moderate (Appendix 23), with a few notable exceptions (speed to faces with face preference r=−.79 and receptive vocabulary with reading accuracy r=.78). The former may suggest redundancy of these measures but the latter correlation is likely due to the close relationship between vocabulary and reading and represents a realistic estimate of the association of these two constructs.
3.6. Concurrent Validity with ID, ASD Diagnosis, and Autism Symptom Level
Participants with ID showed statistically significant differentiation from participants without ID across all performance measures (Table 4), including lower levels of general attention, attentional scanning, social attention, social preference, face preference, receptive vocabulary, and single-word reading, and slower speed to faces and objects. Interestingly, individuals with ID showed higher positive and negative emotional expressiveness.
Table 4.
Descriptive statistics for webcam-collected performance measures across cases with and without Intellectual Disability (ID).
Measure | No ID (n=224) M (SD) | ID (n=151) M (SD) | Raw Δ | t (p) | Cohen’s d
---|---|---|---|---|---
Overall Attention (%) | 82.1 (14) | 70.5 (17) | +11.6% (1.8 min total) | 7.1 (<.001) | .75 | |
Attentional Scanning (Count) | 11.6 (3.4) | 7.9 (2.5) | +3.7 glances to each target | 9.9 (<.001) | 1.20 | |
Positive Emotion (%) | 6.4 (8.7) | 10.3 (9.1) | −3.9% intensity | −4.2 (<.001) | −.44 | |
Negative Emotion (%) | 2.2 (3.0) | 3.4 (4.1) | −1.2% intensity | −3.3 (.001) | −.35 | |
Social Attention (z) | −.02 (1.0) | −1.52 (1.3) | +1.5 control SDs | 11.9 (<.001) | 1.14 | |
Social Preference (FD) | 1.4 (0.3) | 1.2 (0.3) | +0.2 seconds per AOI | 6.0 (<.001) | .68 | |
Face Preference (FD) | 1.3 (0.8) | 0.8 (0.5) | +0.5 seconds per AOI | 6.1 (<.001) | .70 | |
Non-social Preference (FD) | 1.1 (0.4) | 1.2 (0.4) | −0.1 seconds per AOI | −2.1 (.038) | −.24 | |
Receptive Vocabulary (FD) | 41.9 (25.7) | 17.1 (13.6) | +24.8 seconds to all targets | 10.1 (<.001) | 1.13 | |
Speed to Faces (TFF) | 7.2 (2.1) | 8.0 (1.8) | −0.8 seconds per AOI | −3.3 (<.001) | −.37 | |
Speed to Object (TFF) | 4.9 (1.3) | 6.1 (1.2) | −1.2 seconds per AOI | −7.2 (<.001) | −.87 | |
Reading Accuracy (FD) | 37.9 (22.8) | 16.6 (14.1) | +21.3 seconds to all targets | 9.2 (<.001) | 1.06 |
Note. ID=Intellectual disability (defined as parent-report of ID/GDD or estimated IQ<70). Overall attention (%) is the percentage of time on screen throughout all stimulus paradigms. Count=sum of glances to all targets averaged across stimuli. TFF=time to first fixation; values represent averages across all stimuli, including those that were not fixated, where the length of the stimulus was imputed. AOI=area-of-interest. Values for positive and negative emotion represent estimated intensities with a range of 0–100%. Higher values are preferable for all measures except Speed to Faces and Speed to Objects, where higher values indicate slower time to the AOIs; Non-social Preference, where higher values indicate a preference for non-social information; and the Positive and Negative Emotion measures, where higher scores simply indicate more expressiveness. Social attention is presented as a z-score (based on the neurotypical control mean) because this measure is created by averaging multiple different metrics (fixation duration, fixation count, and time-to-first fixation) after standardization.
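The composite z-score construction described in the note can be sketched as below. This is an illustrative re-implementation; the sign-flip for metrics where lower raw values indicate better performance (e.g., time-to-first-fixation) is an assumption about orientation, not a detail confirmed by the source:

```python
import numpy as np

def social_attention_composite(metrics, control_mean, control_sd, sign) -> float:
    """Standardize each gaze metric against neurotypical control norms,
    orient so that higher z means more social attention (sign = -1 for
    metrics where lower raw values are better), and average into one z."""
    metrics = np.asarray(metrics, dtype=float)
    z = np.asarray(sign) * (metrics - np.asarray(control_mean)) / np.asarray(control_sd)
    return float(z.mean())
```

A participant exactly at the control mean on every metric scores z=0 by construction; the Table 4 group means are interpretable relative to that anchor.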
Across subsamples, timepoints, and ages, the social attention measure showed moderate to high correlations (r=.32–.62) with autism symptom level (Appendix 24). Similarly, concurrent validity with ASD diagnosis consistently fell in the good to excellent range (AUC=.69–.88; Appendix 25), with evidence that diagnostic validity is maintained across evaluation timepoints. When the social attention measure was divided into clinically useful score ranges, multi-level likelihood ratios suggested meaningful reductions in ASD probability for low scores (z≤0.1) and increases in ASD probability for high scores (z≥1.81). The optimal cut score was 1.49, yielding 70% sensitivity and 87% specificity (Appendix 26).
3.7. Group Profiles Across Performance Measures
Group differences were statistically significant across all performance measures (largest p=.041; eta-squared=.04–.36). In general, the NFIX, SYNGAP1, and other NDGS groups showed a more impaired neurobehavioral phenotype, including lower attention, higher non-social preference, worse receptive vocabulary and single-word reading, and slower speed to faces and objects (Figure 1). PHTS patients showed lower social attention and social preference and higher non-social preference, consistent with high rates of ASD in this group, but only mild reductions in receptive vocabulary and single-word reading and no deficits in overall attention or attentional scanning. Interestingly, SYNGAP1 and other NDGS patients had higher negative emotional expressiveness scores, while NFIX and other NDGS patients showed higher positive emotional expressiveness scores, implying syndrome-specific patterns even among more significantly impaired groups (Appendix 27). Taken together, these findings provide preliminary evidence of concurrent (known-groups) validity of the performance measures.
Figure 1.
NDGS group differences across webcam measures.
Note. SC=sibling controls, UC=unrelated controls, PHTS=PTEN Hamartoma Tumor Syndrome. Other NDGS=other neurodevelopmental genetic syndromes, and NDD=idiopathic neurodevelopmental disability.
4. Discussion
This research described a comprehensive process for creating a set of objective webcam-collected measures, derived using artificial intelligence algorithms that capture gaze and facial expression information, based on gold-standard measurement development guidelines (Boateng et al., 2018) as well as principles of inclusive research practices (FDA, 2009). The process involved both clinician-scientists and families and was undertaken to provide a preliminary validation of these patient performance measures by examining a range of key psychometric characteristics. Results suggest that these measures may serve as promising new objective evaluation tools and useful complements to our recently validated informant-report survey scales (Frazier et al., 2023), permitting multi-method characterization of key social and cognitive characteristics among individuals with NDGS. To our knowledge, the webcam measures and associated survey instruments are the first dedicated set specifically developed to assess the wide range of neurobehavioral and neurodevelopmental presentations seen in NDGS, including individuals with significant cognitive challenges. This initial validation demonstrated that the performance measures are psychometrically sound instruments with potential utility in characterizing the varied clinical and functional spectra seen in many people with NDGS and idiopathic NDD. The validation further highlights the potential value of artificial intelligence / machine learning algorithms for collecting key biometric information that can be used to better understand individuals with NDGS.
All of the measures showed strong evaluation validity and can be collected in many individuals with mild to moderate cognitive dysfunction. There was a clear gradient of invalid collection in people with more severe cognitive dysfunction, but some individuals reported to be at the more severe levels could validly complete one or more performance measures. Scale reliability was fair to excellent across all webcam measures, indicating good ability to measure individual differences cross-sectionally across each of the neurobehavioral processes assessed. Test-retest reproducibility and stability were at least acceptable across the majority of measures. Specifically, test-retest reliability was good for attentional scanning, positive emotional expressiveness, social attention, receptive vocabulary, and single-word reading and was fair for sustained attention, social preference, and speed to objects. This indicates that changes in these measures are relatively stable over time, increasing the likelihood that changes reflect real differences in neurobehavioral functioning. Test-retest reliability estimates were lower for negative emotional expressiveness, non-social attentional preference, face preference, and speed to faces. When considered in light of adequate or better scale reliability for these measures, the present results suggest these measures may be more state-like in nature. Observations of the score distributions for negative emotional expression and non-social preference suggest that lower test-retest reliability for these measures may be influenced by floor effects and, therefore, may be under-estimated. Future work is needed to examine score stability over a longer time interval to ensure an adequate balance of stability and sensitivity to change. 
If sensitivity to change is demonstrated, the quantitative nature, relative brevity, and high evaluation validity of webcam measures might allow for more frequent assessments in the context of intervention studies, thereby increasing statistical power and reducing the sample size needed for clinical trials. This is particularly important for studies of rare NDGS.
Lower test-retest reliability for measures of face processing is intriguing and may be due to factors influencing attention to faces, including the fact that many stimuli included multiple faces as well as other target or background stimuli. It is possible that follow-up evaluations may bias attention towards novel faces (faces not processed as comprehensively in the baseline assessment) or other novel environmental stimuli. It is also possible that face processing is simply more state-like in nature, with reliable collection at each assessment, but rapid changes in quantitative level across hours or days. Future work is needed to tease out these possibilities and examine whether stimulus complexity moderates stability for these measures. Beyond floor effects, lower stability for non-social preference is likely, in part, a function of the less frequent nature of attention to socially-irrelevant information. It may also be useful for future iterations of the social stimuli to include a larger number of non-social or background objects to increase the reliability of this measure. Lower stability for negative emotional expressiveness may be, at least partly, due to the low number of negative facial expressions observed across all participants and is likely influenced by the state-like nature of emotional expressiveness. Adding stimuli that specifically pull for negative emotionality could enhance the test-retest reliability of this measure. Even with these exceptions, all performance measures showed group differences in the baseline data collection, suggesting good known-groups validity and potential value for cross-sectional characterization.
Given their scalability, webcam-collected performance measures may also have utility in clinical contexts by supplementing collection of traditional neurobehavioral measures, allowing more frequent collection between clinical visits, greater inclusion in research, and higher quality data via home-based collection. If offered at minimal cost with automated administration, scoring, and reporting functions to reduce clinician burden, these measures could become a key part of ongoing developmental monitoring strategies. This is further supported by the brevity (maximum 15 minutes) of administering all 4 paradigms and the potential to collect only those measures that are relevant to a given patient in future clinical assessments. Future research and collection of large-scale normative data are warranted to determine whether this potential clinical value might be realized and, more importantly, to further evaluate psychometric performance.
Finally, the present results provide preliminary evidence of concurrent (known-groups) validity of webcam measures across NDGS and in comparison to neurotypical controls and idiopathic NDD. The pattern of substantial reductions in many cognitive processes in NFIX, SYNGAP1, and other NDGS is consistent with our recently published informant-report patterns for many neurobehavioral domains (Frazier et al., 2023). Interestingly, there are some unique patterns among these groups, particularly in the pattern for positive and negative emotional expressiveness, but also in the magnitude of impairments for other domains. For example, people with SYNGAP1 mutations showed generally worse attention, slower processing speed to faces and objects, and lower social but higher non-social preference than people with NFIX mutations.
Relative to other NDGS groups, individuals with PHTS tended to show a less impacted social and cognitive profile. Specifically, this group showed no significant impairment in overall attention, attentional scanning, or processing speed measures and only slight reductions in receptive vocabulary and reading accuracy. This is consistent with a spectrum of neurobehavioral dysfunction in PHTS (Busch et al., 2023) and the observation that many individuals have either no or mild reductions in neurocognitive function relative to normative expectation (Busch et al., 2013). Additional data collection in larger NDGS samples will be required to replicate and extend the findings reported here. This work will also need to evaluate the influence of additional clinical factors (e.g., seizures, ID, etc.) on developmental trends.
Several limitations of the current study warrant mention. The genetic syndromes included in this study have a low prevalence and, thus, sample sizes remain modest, particularly given the wide age range. While our power analysis indicated at least adequate power for group comparisons and psychometric analyses were well powered in the full sample, our current data should nevertheless be treated as preliminary, and studies with larger group sample sizes should be completed to replicate our findings and ensure they generalize to the larger population of these NDGS. Given the online nature of the research, it was not feasible to conduct in-person clinical characterization. As a result, this study could not independently confirm the diagnostic status of participants and was not able to administer dedicated in-person cognitive and behavioral assessments. However, previous studies have demonstrated that parent-report of children’s IQ strongly correlates with standardized clinical IQ testing (Shu et al., 2022), and a substantial minority of estimates in this study were based on prior testing (42%). Future work should collect well-validated in-person cognitive assessments to more accurately characterize the sample and examine how webcam measures relate to traditional standardized measures of cognitive and behavioral functioning.
Longitudinal investigations with larger NDGS samples and longer follow-up will also be critical for evaluating age effects and changes in neurobehavioral processes across development, as well as sensitivity to intervention effects. Further, given the preliminary nature of this study, it was not possible to include a comprehensive set of additional instruments to establish convergent and divergent validity. Thus, additional validation work, including convergent and discriminant validity analyses, is needed to provide further support for these webcam measures.
In spite of noted limitations, the present results suggest that webcam-collected gaze and facial expression-based performance measures are promising with evidence that they may function as reliable and valid assessment tools, covering key social and cognitive domains not easily evaluated by informant-report surveys. As such, they may be useful for detailed phenotypic characterization and, ultimately, as reliable, objective, and feasible outcome measures in clinical trials. With additional validation, and sufficient norming, these measures could also facilitate surveillance and clinical assessment for NDGS and idiopathic NDD.
5. Conclusions
The present study provides preliminary evidence that webcam-collected performance measures, derived using artificial intelligence algorithms for capturing gaze and facial expression data, can reliably capture individual and between group differences in neurobehavioral function. Future longitudinal investigations with larger NDGS and idiopathic NDD samples will be crucial to further evaluate these measures and determine their potential clinical and research utility.
Acknowledgments
We are sincerely indebted to the generosity of the families and individuals who contributed their time and effort to this study. We would also like to thank the PTEN Hamartoma Tumor Syndrome Foundation, the PTEN Research Foundation, the SYNGAP Research Fund, the Malan Syndrome Foundation, and the ADNP Kids Foundation for their support of this project.
We are grateful to all of the families at the participating Simons Searchlight sites as well as the Simons Searchlight Consortium, formerly the Simons VIP Consortium. We also appreciate obtaining access to the phenotypic data on SFARI Base. Approved researchers can obtain the Simons Searchlight population dataset described in this study by applying at https://base.sfari.org.
CE is the Sondra J. and Stephen R. Hardis Endowed Chair of Cancer Genomic Medicine at the Cleveland Clinic and an ACS Clinical Research Professor. MS is the Rosamund Stone Zander Chair at Boston Children’s Hospital.
Conflict of Interest
Dr. Frazier has received funding or research support from, acted as a consultant to, received travel support from, and/or received a speaker’s honorarium from the PTEN Research Foundation, SYNGAP Research Fund, Malan Syndrome Foundation, ADNP Kids Research Foundation, Quadrant Biosciences, Autism Speaks, Impel NeuroPharma, F. Hoffmann-La Roche AG Pharmaceuticals, the Cole Family Research Fund, Simons Foundation, Ingalls Foundation, Forest Laboratories, Ecoeos, IntegraGen, Kugona LLC, Shire Development, Bristol-Myers Squibb, National Institutes of Health, and the Brain and Behavior Research Foundation; is employed by and has equity options in Quadrant Biosciences/Autism Analytica; has equity options in MaraBio and Springtide; and has an investor stake in Autism EYES LLC and iSCAN-R. Dr. Kolevzon has received funding or research support from, or acted as a consultant to, the ADNP Kids Research Foundation, David Lynch Foundation, Klingenstein Third Generation Foundation, Ovid Therapeutics, Ritrova Therapeutics, Acadia, Alkermes, Jaguar Therapeutics, GW Pharmaceuticals, Neuren Pharmaceuticals, Scioto Biosciences, and Biogen. Dr. Sahin reports grant support from Novartis, Biogen, Astellas, Aeovian, Bridgebio, and Aucta. He has served on Scientific Advisory Boards for Novartis, Roche, Regenxbio, SpringWorks Therapeutics, Jaguar Therapeutics, and Alkermes. Dr. Hardan is a consultant to Beaming Health and IAMA Therapeutics. He also has equity options in Quadrant Biosciences/Autism Analytica and has an investor stake in iSCAN-R. Dr. Shic has acted as a consultant to F. Hoffmann-La Roche AG Pharmaceuticals and Janssen Pharmaceuticals. The remaining authors have no competing interests to disclose.
Funding:
This study was funded by the PTEN Research Foundation (to Frazier and Uljarević), with additional support from the SYNGAP Research Fund, the Malan Syndrome Foundation, the ADNP Kids Foundation, Autism Speaks, and the Simons Foundation Autism Research Initiative. The content is solely the responsibility of the authors and does not necessarily represent the official views of any of the funding agencies.
Appendix 1. Performance paradigm creation process.
Detailed description: The process included adapting social stimuli from our prior work and identifying cognitive paradigms that could be collected without speech or motor responses; identifying target items and creating individual stimuli for each paradigm; collecting feasibility data from 15 participants, including several young children with neurodevelopmental disabilities, across a wide age span (ages 3 to 68) to evaluate the viability of online data collection and to inform grading of stimulus and item difficulty (where applicable); updating the stimulus paradigms and administering them to 10 clinician-scientist experts, 8 NDGS participants (5 PTEN, 1 SYNGAP1, 1 NFIX, and 1 ADNP), and 1 idiopathic NDD participant with ASD; and administering a post-evaluation questionnaire to assess ease of completion and any potential issues arising from performance paradigm administration.
Appendix 2. Receptive vocabulary target selection and stimulus creation.
Receptive vocabulary words were selected to be graded from very easy to moderately difficult. To identify candidate words, the research team first reviewed vocabulary word lists spanning infant/toddler/preschool through high school, as well as SAT word lists, from common websites (e.g., www.education.com, www.time4learning.com, www.vocabulary.com). Up to 16 words were selected for each level (infant/toddler, preschool, kindergarten, grades 1–3, grades 4–6, grades 7–12, common SAT words). Emphasis was given to easier words in order to create stimuli that best differentiate between very low vocabulary levels (SS<55, age-equivalent<3), low vocabulary levels (55<SS<70, 3<age-equivalent<6), borderline vocabulary levels (70<SS<80, 6<age-equivalent<10), and low average vocabulary levels (80<SS<100, 10<age-equivalent<12). Once words were identified, the iWeb 14-billion-word web corpus (https://www.english-corpora.org/iweb/) was used to examine word frequency; iWeb is related to the one-billion-word Corpus of Contemporary American English (COCA), a large, up-to-date corpus of English balanced across many genres. Potential words were then sorted by word frequency and compared to existing receptive vocabulary tests (Peabody Picture Vocabulary Test – Fifth Edition and Receptive One-Word Picture Vocabulary Test-4) to identify approximate age levels for sets of words. This was done by comparing the word frequency of words from the existing instruments to the word frequency of identified candidate words.
Final word sets were then chosen to be comparable to the lowest age levels (~ages 2–3) of existing instruments with very high word frequency (>990,000; 8 words chosen), preschool to grade 1 (~ages 4–6) with high word frequency (200k to 900k; 7 words chosen), grades 2–5 (~ages 7–10) with moderate word frequency (100k to 199k; 7 words chosen), grades 6–12 (~ages 11–17) with moderate to low word frequency (15k to 99k; 10 words chosen), and grade 13+/SAT (~ages 18+) with low to very low word frequency (<15k; 5 words chosen). This resulted in 37 total words. Clip art and photos were then chosen to represent each word, with selections verified by the principal investigator and by a parent of a child with autism spectrum disorder and intellectual disability who was not affiliated with the investigative team.
A set of 37 distractor words was also identified, chosen to be roughly equivalent in word frequency to the target words. Pictures for these distractors were chosen in the same manner as described above.
Appendix 3. Single word reading test item selection and stimulus creation
A similar set of item-selection procedures was followed in developing the single-word reading test, with the following exceptions:
Words were chosen by first inspecting existing word reading tests (WIAT-3, WRAT-4, etc.), identifying unused words from the receptive vocabulary lists above, and looking for words of comparable word frequency to those used in existing single-word reading tests. Word lists of common 2-letter through 6-letter words were searched for easy to moderately difficult words. Synonyms and words with pronunciation difficulty similar to difficult words from existing tests were used to populate candidate difficult words.
To ensure heavy coverage of easier words and allow detection of simple reading in impaired individuals, the focus was placed on 2-letter through 5-letter words. Four 2-letter words, ten 3-letter words, four 4-letter words, and five 5-letter words considered very easy to moderately easy to read were chosen for the final list (23 easy reading difficulty words). Next, two 5-letter words, four 6-letter words, and six 7-letter words of moderate difficulty were chosen (12 moderate reading difficulty words). Finally, eleven 5- to 10-letter words deemed of moderate to high reading difficulty rounded out the final list (46 total words). Difficulty was assessed by matching each word to words of similar length and complexity on existing single-word reading tests and by inspecting word frequency results from COCA (see the Corpus of Contemporary American English link above). Specifically, word length and word frequency were used to identify very easy words (2- and 3-letter words with very high frequency, 1,000,000+), easy words (4- and 5-letter words with high to very high frequency, 250,000+), moderate words (5- to 7-letter words with silent letters in pronunciation and/or moderate to very high frequency, 50k+), and difficult words (6- to 10-letter words with complex pronunciation and generally low word frequency, <250k).
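The length-and-frequency cutoffs above can be expressed as a simple binning rule. The sketch below is illustrative only: the function name and the example frequency counts are hypothetical, not taken from iWeb/COCA or from the study's materials; only the length and frequency thresholds come from the text.

```python
# Hypothetical sketch of the difficulty-binning rules described above.
# Example frequencies are illustrative, not actual corpus counts.

def reading_difficulty(word: str, frequency: int) -> str:
    """Assign a reading-difficulty bin from word length and corpus frequency,
    following the cutoffs described in Appendix 3."""
    n = len(word)
    if n <= 3 and frequency >= 1_000_000:   # very easy: 2-3 letters, very high frequency
        return "very easy"
    if n <= 5 and frequency >= 250_000:     # easy: 4-5 letters, high to very high frequency
        return "easy"
    if 5 <= n <= 7 and frequency >= 50_000: # moderate: 5-7 letters, moderate+ frequency
        return "moderate"
    return "difficult"                      # difficult: longer and/or low frequency

print(reading_difficulty("it", 5_000_000))    # very easy
print(reading_difficulty("tree", 800_000))    # easy
print(reading_difficulty("orange", 120_000))  # moderate
print(reading_difficulty("factitious", 900))  # difficult
```

Note that the real selection process also weighed pronunciation complexity (e.g., silent letters), which a purely length/frequency rule cannot capture.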
Stimulus creation
Stimulus creation procedures were similar to those used for the receptive vocabulary test, with the following exceptions:
Rather than pictures representing vocabulary words, words are spelled out in white font on a black screen.
Words are divided into 14 stimuli of varying sizes (2, 3, or 4 target words). Stimuli with 2–3 target words are presented in varying arrays (left/right; top/bottom; top/middle/bottom; 4 squares). See Table 2 for the target word, spatial array, and timing for each word.
Because multiple target words are presented for each stimulus, targets are randomly arrayed so that participants cannot anticipate which word to find first.
Each stimulus is presented for 4 seconds, with the first 1–1.5 seconds consisting of the voiceover “Find the word (target word)”; the 8 most difficult words are presented for 5 seconds.
For the most difficult words, two words beginning with the same letter are placed in each array so that participants cannot guess the target from its initial sound.
Appendix 4. Screenshots of example social paradigm stimuli.
Facial Affect ID
Joke
Joint Attention
Social vs. Abstract
Naturalistic Scene
Appendix 5. Screenshots of example processing speed paradigm stimuli.
Appendix 6. Screenshots of example receptive vocabulary paradigm stimuli.
Directive: “Look at the Baby”
Directive: “Look at the Fish”
Appendix 7. Screenshots of example single-word reading paradigm stimuli.
Directive: “Find the word ‘it’”
Directive: “Find the word ‘on’”
Appendix 8. Stimulus order and composition for all social attention stimuli.
Stimulus # | Stimulus Type | Duration (sec) |
---|---|---|
1 | Instructions | 4.1 |
2 | Facial Affect ID | 6 |
3 | Facial Affect ID | 6 |
4 | Facial Affect ID | 6 |
5 | Facial Affect ID | 6 |
6 | Joke | 6.5 |
7 | Break | 0.7 |
8 | Joint Attention | 4.6 |
9 | Joke | 6.8 |
10 | Break – blank screen | 0.7 |
11 | Social vs Abstract | 8 |
12 | Social vs Abstract | 6 |
13 | Instructions | 4.4 |
14 | Facial Affect ID | 6 |
15 | Facial Affect ID | 6 |
16 | Facial Affect ID | 6 |
17 | Facial Affect ID | 6 |
18 | Joke | 5.9 |
19 | Break – blank screen | 0.7 |
20 | Joint Attention | 4.3 |
21 | Joke | 7.3 |
22 | Break – blank screen | 0.7 |
23 | Social vs Abstract | 8 |
24 | Social vs Abstract | 6.5 |
25 | Instructions | 4.4 |
26 | Joint Attention | 3.7 |
27 | Joint Attention | 4.5 |
28 | Social vs. Abstract | 8 |
29 | Joint Attention | 4 |
30 | Joint Attention | 4 |
31 | Social vs. Abstract | 5.9 |
32 | Naturalistic Scene | 12 |
33 | Naturalistic Scene | 12 |
34 | Instructions | 4.4 |
35 | Naturalistic Scene | 10 |
36 | Joint Attention | 3.8 |
37 | Naturalistic Scene | 7.8 |
38 | Joke | 6.7 |
39 | Break - blank screen | 0.7 |
40 | Social vs. Abstract | 6 |
41 | Naturalistic Scene | 7.7 |
42 | Social vs. Abstract | 6 |
43 | Naturalistic Scene | 10.5 |
Note. Facial affect ID = side-by-side faces with instructions to look at a specific facial expression. Joke stimuli involved a person telling a corny joke. Social vs. Abstract included half the screen with an abstract shape or numerical representation and the other half with one or more people interacting. Joint attention scenes involved a variety of target and distractor objects with one person pointing toward and/or directing their gaze toward the target objects. Naturalistic scenes involved people interacting in various ways (e.g., having a conversation, playing a board game, entering an elevator, etc.).
Appendix 9. Stimulus order and composition for all processing speed stimuli.
Trial # | # of Stimuli | # of Target Stimuli | # of Distractor Stimuli | Duration (sec) | Target Stimuli | Distractor Stimuli |
---|---|---|---|---|---|---|
1 | 5 | 3 | 2 | 7 | Pink Flower | Green Tree, Green Leaf |
2 | 7 | 4 | 3 | 7 | Yellow Star | Pink Hearts, Blue Circles |
3 | 7 | 4 | 3 | 8 | Shoe | Shirts, Sweater |
4 | 9 | 5 | 4 | 10 | White truck | Blueberry, White Airplane |
5 | 9 | 5 | 4 | 10 | Fork | Spoon, Knife |
6 | 11 | 6 | 5 | 10 | Fly | Ant, Ladybug |
7 | 11 | 6 | 5 | 10 | Red Apple | Red Ball, Red Balloon |
8 | 11 | 6 | 5 | 10 | Orange Hat | Orange Cup, Pumpkin |
9 | 13 | 7 | 6 | 10 | Bird Head | Cat Head, Lion |
10 | 15 | 8 | 7 | 10 | Jelly Fish | Octopus, Squid |
11 | 15 | 8 | 7 | 10 | Cart | Truck, Pile |
12 | 15 | 8 | 7 | 10 | Rook | Pawn, Bishop |
Appendix 10. Stimulus order and composition for all receptive vocabulary stimuli.
Stimulus # | Stimulus Type | Target # | Target Word | Distractor(s) | Duration (sec) | Position |
---|---|---|---|---|---|---|
1 | 2X2 | 1 | baby | hat | 5 | 1 |
2 | 2X2 | 2 | fish | lion | 5 | 2 |
3 | 2X2 | 3 | shoes | sweater | 5 | 2 |
3 | 2X2 | 4 | apple | banana | 5 | 3 |
4 | 2X2 | 5 | eating | ring | 5 | 4 |
4 | 2X2 | 6 | ball | umbrella | 5 | 1 |
5 | 2X2 | 7 | drinking | eating | 5 | 3 |
5 | 2X2 | 8 | running | mouth | 5 | 4 |
6 | 2X2 | 9 | socks | pants | 5 | 4 |
6 | 2X2 | 10 | sleeping | dancing | 5 | 3 |
7 | 2X2 | 11 | kicking | drinking | 5 | 2 |
7 | 2X2 | 12 | fence | gift | 5 | 4 |
8 | 3X2 | 13 | mouth | ball | 5 | 4 |
8 | 3X2 | 14 | umbrella | kicking | 5 | 5 |
8 | 3X2 | 15 | muffin | plant | 5 | 6 |
9 | 3X2 | 16 | ring | raccoon | 5 | 3 |
9 | 3X2 | 17 | fountain | muffin | 5 | 2 |
9 | 3X2 | 18 | elbow | fence | 5 | 5 |
10 | 3X2 | 19 | dentist | timber | 5 | 4 |
10 | 3X2 | 20 | aquarium | culinary | 5 | 1 |
10 | 3X2 | 21 | yacht | miniature | 5 | 5 |
11 | 3X2 | 22 | culinary | gesture | 5 | 2 |
11 | 3X2 | 23 | compass | toxic | 5 | 3 |
11 | 3X2 | 24 | wedge | fungus | 5 | 6 |
12 | 3X2 | 25 | wrench | gauge | 5 | 6 |
12 | 3X2 | 26 | reptile | dictator | 5 | 5 |
12 | 3X2 | 27 | trumpet | virtuoso | 5 | 1 |
13 | 4X2 | 28 | gift | fountain | 5 | 4 |
13 | 4X2 | 29 | jewelry | elbow | 5 | 2 |
13 | 4X2 | 30 | map | shirt | 5 | 6 |
13 | 4X2 | 31 | raccoon | eating | 5 | 8 |
14 | 4X2 | 32 | duet | banister | 5 | 8 |
14 | 4X2 | 33 | noxious | irregular | 5 | 2 |
14 | 4X2 | 34 | admonish | parallel | 5 | 3 |
14 | 4X2 | 35 | aviator | physician | 5 | 6 |
15 | 4X2 | 36 | carnivore | herbivore | 5 | 7 |
15 | 4X2 | 37 | speedometer | thermometer | 5 | 5 |
15 | 4X2 | 38 | amorphous | admonish | 5 | 2 |
15 | 4X2 | 39 | virulent | apathetic | 5 | 3 |
Note. 2X2 stimuli present 4 objects in each quadrant of the screen. 3X2 stimuli present 3 objects across the top row and 3 across the bottom row. 4X2 stimuli present 4 objects across the top row and 4 objects across the bottom row. Position is listed in order from top left to bottom right.
Appendix 11. Stimulus order and composition for all single-word reading stimuli.
Stimulus # | Stimulus Orientation | Target # | Target Word | Distractor Word(s) |
---|---|---|---|---|
1 | left/right | 1 | it | to |
2 | left/right | 2 | so | up |
3 | top/bottom | 3 | me | do |
3 | top/bottom | 4 | on | |
4 | left top/middle/right bottom | 5 | dog | hot, eat |
5 | left top/middle/right bottom | 6 | win | buy |
5 | left top/middle/right bottom | 7 | car | |
6 | left top/middle/right bottom | 8 | map | bag, hard |
7 | left top/middle/right bottom | 9 | few | how |
7 | left top/middle/right bottom | 10 | out | |
8 | left top/middle/right bottom | 11 | run | set, why |
9 | right top/middle/left bottom | 12 | cat | leg |
9 | right top/middle/left bottom | 13 | all | |
10 | right top/middle/left bottom | 14 | boy | own, fly |
11 | 4 squares | 15 | tree | throw |
11 | 4 squares | 16 | from | find |
12 | 4 squares | 17 | time | take |
12 | 4 squares | 18 | fall | fine |
13 | 4 squares | 19 | large | leave |
13 | 4 squares | 20 | cheat | chain |
14 | 4 squares | 21 | adult | assist |
14 | 4 squares | 22 | spoon | stone |
15 | 4 squares | 23 | orange | onion |
15 | 4 squares | 24 | silver | stream |
16 | 4 squares | 25 | people | program |
16 | 4 squares | 26 | office | often |
17 | 4 squares | 27 | stretch | soldier |
17 | 4 squares | 28 | match | model |
18 | 4 squares | 29 | service | student |
18 | 4 squares | 30 | magic | marry |
19 | 4 squares | 31 | capacity | citizen |
19 | 4 squares | 32 | facility | foreign |
20 | 4 squares | 33 | railway | rapidly |
20 | 4 squares | 34 | opinion | object |
21 | 4 squares | 35 | listener | logical |
21 | 4 squares | 36 | absolute | abandon |
22 | 4 squares | 37 | aircraft | apparent |
22 | 4 squares | 38 | language | launch |
23 | 4 squares | 39 | flaccid | freckle |
23 | 4 squares | 40 | regiment | resilient |
24 | 4 squares | 41 | factitious | fungible |
24 | 4 squares | 42 | reprimand | repudiate |
25 | 4 squares | 43 | generic | glorious |
25 | 4 squares | 44 | niche | nausea |
26 | 4 squares | 45 | neurotic | neophyte |
26 | 4 squares | 46 | gnarled | gimmicky |
Appendix 12. Pilot testing post-evaluation questions for clinician-scientist-experts.
Expert Feedback | |||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Device type performance measure was completed on: | Mini Tablet | Standard Tablet | Laptop with internal webcam | Laptop with external webcam | Desktop with internal webcam | Desktop with external webcam | |||||||||||||
What is the screen size on the device you used? | Less than 10 inches | 10 – 12 inches | 12 – 18 inches | Greater than 20 inches | |||||||||||||||
Is there anything you felt would have been helpful to know before beginning the [paradigm measure]? | (text entry) | ||||||||||||||||||
Please rate the overall relevance of the [paradigm measure] to the neurodevelopment / genetic disorder you represent: | Extremely relevant | Very relevant | Somewhat relevant | Not relevant at all | |||||||||||||||
Please rate the instructions for the section of the assessment: | Very clear | Somewhat clear | Somewhat difficult to follow | Very difficult to follow | |||||||||||||||
Specific comments regarding the instructions for this section: | (text entry) | ||||||||||||||||||
Please rate the quality of the audio during the section of this assessment: | The audio was very clear | The audio was somewhat clear | The audio was not clear | ||||||||||||||||
Please rate the quality of pictures used during this section of the assessment | High quality | Medium quality | Low quality | ||||||||||||||||
If you answered that some or all photos were low quality, please indicate which photos you felt were not high quality | (text entry) | ||||||||||||||||||
Please rate the timing of the assessment: | Very fast | Somewhat fast | Neither slow nor fast | Somewhat slow | Very slow | ||||||||||||||
Specific comments regarding the timing for the assessment | (text entry) | ||||||||||||||||||
How appropriate was the level of difficulty of the [paradigm measures] targets? | Very appropriate | Appropriate | Somewhat inappropriate | Inappropriate | |||||||||||||||
Please share any concerns you have regarding the level of difficulty or the array of the [paradigm measure] targets: | (text entry) | ||||||||||||||||||
Please check below any specific concerns you have (check all that apply) | Too many easy targets | Too many hard targets | Too few easy targets | Too few hard targets | Too many moderate difficulty targets | ||||||||||||||
Too few moderate difficulty targets | Targets were too close together | Targets were too far apart | Target array was not appropriate | Other concerns (text entry) | |||||||||||||||
What aspects of completing the measure do you think will be the most difficult for participants? | (text entry) ||||||||||||||||||
Is there anything you feel that could help a caregiver or guardian administer this measure at home? | (text entry) | ||||||||||||||||||
Any additional comments regarding the measure you feel researchers should be aware of? | (text entry) |
Appendix 13. Pilot testing post-evaluation questions for parents assisting patient participants.
Parents assisting participants were asked the following sets of questions for each paradigm:
Overview
Breaks / Eye Calibration
Environment
Open answer
Support of participants
Paradigm Specifics
Overview (asked for all paradigms) | ||||||
---|---|---|---|---|---|---|
Please rate your overall experience with this assessment. | Extremely positive | Somewhat positive | Neither positive nor negative | Somewhat negative | Extremely negative | |
How easy / hard was it for you to complete this assessment? | Extremely easy | Somewhat easy | Neither easy nor hard | Somewhat hard | Extremely hard | |
What type of device did you complete this assessment on? | Tablet with stand | Laptop | PC/MAC Desktop Computer | Other (text entry) |
Were you sitting still during the video? | Yes | No | ||||
Did you engage in any sensory movements (rocking back and forth, hand flicking or flapping, going up and down on tippy toes, etc.)? | Yes | No | ||||
Please provide specific details on sensory movements you engaged in during the video? | (text entry) |
Breaks / Eye Calibration (asked for all paradigms) | |||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
How many breaks did you need to complete the [paradigms name] performance measure? | No breaks | One break | Two or three breaks | Four or five breaks | |||||||||
How long were the breaks on average? | No breaks taken | Less than 5 minutes | 5 – 15 minutes | 16 – 30 minutes | 30 minutes to 1 hour | 1 hour or more | |||||||
How often did you look away? | Very often | Often | Sometimes | Infrequently | Very infrequently | Did not look away from the screen | |||||||
How often did you cover or touch your face during the assessment? | Very often | Often | Infrequently | Very infrequently | Did not touch face during the assessment | ||||||||
Did you end the video early at any point? | Yes | No |
Environment (asked for all paradigms) | ||||||
---|---|---|---|---|---|---|
During any point of the assessment, did an unexpected noise occur within the environment? | No occurrences of unexpected noises | 1 occurrence | 2 – 3 occurrences | 4 – 5 occurrences | 5+ occurrences of unexpected noises | |
During any point of the assessment, were you required to adjust the lighting in the room? | No adjustments to lighting | 1 adjustment | 2–3 adjustments | 4–5 adjustments | 5+ adjustments | |
During any point of the assessment, did you experience internet connection difficulties (i.e. disconnection, weak connection, slow speed, etc.) | No internet difficulties | 1 occurrence | 2 – 3 occurrences | 4 – 5 occurrences | 5 + occurrences |
Ease of Completion (asked for all paradigms) | |
---|---|
What gave you the most difficulty when completing this assessment? | (text entry) |
What was the easiest thing about completing this assessment? | (text entry) |
Assistance (asked for all paradigms) | ||||||
---|---|---|---|---|---|---|
Please rate your child’s overall attention level during the assessment? | Excellent | Good | Average | Poor | Terrible | |
Please indicate the level of Physical Assistance (i.e., staying seated, position head, etc.) | Did not provide physical assistance | Assisted one time | Assisted part of the time | Assisted most of the time | ||
Please indicate the level of Gestural Assistance (i.e., using your finger to point things on the screen, point to screen to get their attention, etc.) | Did not provide gestural assistance | Assisted one time | Assisted part of the time | Assisted most of the time | ||
Please indicate the level of Verbal Assistance (i.e., “look here” or “watch the video”, etc.) | Did not provide verbal assistance | Assisted one time | Assisted part of the time | Assisted most of the time | ||
Please provide additional information on the level of assistance you provided. Type n/a if no assistance was provided | (text entry) | |||||
What was the most difficult part of aiding someone in completing this assessment? | (text entry) | |||||
Was there something that could have made it easier for you to assist someone in completing this assessment? | (text entry) |
Paradigm specific: Social Attention | ||||||||
---|---|---|---|---|---|---|---|---|
Please rate the overall relevance of the Social Attention assessment regarding a performance measure for the neurodevelopment / genetic disorder: | Extremely relevant | Very relevant | Slightly relevant | Not relevant at all | ||||
Please rate the quality of the audio during the section of this assessment: | The audio was very clear | The audio was somewhat clear | The audio was not clear | |||||
Please rate the quality of videos used during this section of the assessment | High quality | Medium quality | Low quality | |||||
Specific comments regarding the quality of videos of the assessment: | (text entry) | |||||||
If you answered that some or all videos were low quality, please indicate which ones you felt were not high quality: | (text entry) |||||||
Please rate the timing of the assessment: | Very fast | Somewhat fast | Neither slow nor fast | Somewhat slow | Very slow |||
Specific comments regarding the timing of the assessment | (text entry) | |||||||
Is there anything you felt would have been helpful to know before beginning the Social Attention assessment? | (text entry) | |||||||
Any additional comments regarding the Social Attention assessment you feel researchers should be aware of? | (text entry) |
Paradigm specific: Processing Speed / Receptive Language / Single Word Reading | ||||||||
---|---|---|---|---|---|---|---|---|
Please rate the overall relevance of the [specific paradigm] assessment regarding a performance measure for the neurodevelopment / genetic disorder: | Extremely relevant | Very relevant | Slightly relevant | Not relevant at all | ||||
Please rate the instructions for this section of the assessment: | Very clear | Somewhat clear | Somewhat difficult to follow | Very difficult to follow | ||||
Specific comments regarding the instructions | (text entry) | |||||||
Please rate the quality of the audio during the section of this assessment: | The audio was clear | The audio was somewhat clear | The audio was not clear | |||||
Please rate the quality of pictures used during this section of the assessment: | High quality | Medium quality | Low quality | |||||
If you answered that some or all photos were low quality, please indicate which photos you felt were not high quality: | (text entry) | |||||||
Please rate the timing of the assessment: | Very fast | Somewhat fast | Neither slow nor fast | Somewhat slow | Very slow | |||
Specific comments regarding the timing for the assessment: | (text entry) | |||||||
Is there anything you felt would have been helpful to know before beginning the [specific paradigm] assessment? | (text entry) |||||||
Any additional comments regarding [specific paradigm] assessment researchers should be aware of? | (text entry) |
Appendix 14. Parent/caregiver administration support training process.
Introductory Call | Introduction / Training video (optional) | Practice Performance Measure (optional) | Zoom Training (optional) | Virtual Support Meetings (optional) |
---|---|---|---|---|
Appendix 15. Methodological details for computing social attention.
While all other webcam-collected patient performance measures were calculated based on a priori criteria, the social attention measure was calculated using empirical criteria. Specifically, following our prior research methods, the social attention measure was derived to determine whether a more sensitive indicator of ASD diagnosis and autism symptom level could be identified from data collected during the social attention stimulus paradigm. For this measure, the correlations between ASD diagnosis and each fixation metric for each area-of-interest were evaluated in a training sub-sample, randomly selected from all baseline webcam administrations (60% of participants). Fixation metric/area-of-interest combinations with statistically significant correlations (r>.18) were selected. These metrics were then combined into an aggregate social attention index and tested in separate testing (20%) and validation (20%) sub-samples as well as across timepoints.
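The selection logic described above can be sketched in a few lines. This is an illustrative reconstruction, not the study's code: the function names, the z-score aggregation, and the use of a simple Pearson correlation are assumptions; only the 60% training split and the r > .18 threshold come from the appendix.

```python
# Illustrative sketch of the empirical derivation in Appendix 15: screen each
# fixation-metric/AOI feature against diagnosis in a 60% training split, keep
# features with |r| > .18, and aggregate selected features as mean z-scores.
import random
import statistics

def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0

def build_social_attention_index(features, diagnosis, train_frac=0.6, r_cut=0.18):
    """features: {name: [value per participant]}; diagnosis: [0/1 per participant].
    Returns (selected feature names, aggregate index per participant)."""
    n = len(diagnosis)
    train = random.sample(range(n), int(train_frac * n))  # training sub-sample
    selected = [name for name, vals in features.items()
                if abs(pearson_r([vals[i] for i in train],
                                 [diagnosis[i] for i in train])) > r_cut]
    index = []
    for i in range(n):  # aggregate index = mean z-score of selected features
        zs = []
        for name in selected:
            vals = features[name]
            sd = statistics.pstdev(vals)
            zs.append((vals[i] - statistics.mean(vals)) / sd if sd else 0.0)
        index.append(statistics.mean(zs) if zs else 0.0)
    return selected, index
```

In practice the resulting index would then be evaluated in the held-out testing and validation sub-samples, which this sketch omits.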
Appendix 16. Operational definitions for all webcam performance measures.
# | Measure | Operational Definition |
---|---|---|
1 | Overall Attention | Average percentage viewing time to the screen across all 4 stimulus paradigms (15 one-minute blocks of stimuli) |
2 | Attentional Scanning | Average number of glances to processing speed stimuli across all quadrants of the screen |
3 | Positive Emotion | Average intensity rating for happy and surprised emotions across all social attention stimuli |
4 | Negative Emotion | Average intensity rating for fear, anger, disgust, and sadness emotions across all social attention stimuli |
5 | Social Attention | Empirically-derived measure of attention to social versus non-social information using standardized fixation duration, fixation count, and time-to-first-fixation |
6 | Social Preference | Average of all fixation durations to social areas-of-interest across all social attention stimuli |
7 | Face Preference | Average of all fixation durations to face areas-of-interest across all social attention stimuli |
8 | Non-social Preference | Average of all fixation durations to non-social areas-of-interest across all social attention stimuli |
9 | Receptive Vocabulary | Sum of all fixation durations across vocabulary target word-picture combinations (27 total targets) |
10 | Speed to Faces | Average time-to-first-fixation on the most prominent face areas-of-interest across all social attention stimuli |
11 | Speed to Object | Average time-to-first-fixation on each target object area-of-interest across all processing speed stimuli |
12 | Reading accuracy | Sum of all fixation durations across target reading words (38 total targets) |
Note. Fixations were defined as at least 66ms of gaze point samples within a 100-pixel dispersion area. Glances were defined as an entry to an AOI with at least one fixation. To increment glance count, gaze must leave the AOI, with at least one fixation outside the AOI, and then return to the AOI with at least one fixation. With the exception of the social attention measure, which was empirically-derived, social and non-social areas-of-interest were defined a priori based on our prior investigations. Non-social areas-of-interest include non-target or distractor (extraneous) objects within social scenes. Receptive vocabulary words were chosen to range in difficulty from preschool (age 2) to adult (college) words. Word frequency was also used to select target words and distractors for the single-word reading task.
Appendix 17. Validity guidelines for all webcam performance measures.
For each paradigm, a stimulus was considered valid when it received at least 50% fixation duration.
# | Measure | Validity Guidelines |
---|---|---|
1 | Overall Attention | At least 50% time on screen to at least one 1-minute video (≥30 seconds with gaze on-screen) |
2 | Attentional Scanning | At least 4 valid processing speed stimuli (≥40 seconds with gaze on-screen) |
3 | Positive Emotion | At least 50% time on screen to at least one 1-minute video (≥30 seconds with gaze on-screen) |
4 | Negative Emotion | At least 50% time on screen to at least one 1-minute video (≥30 seconds with gaze on-screen) |
5 | Social Attention | At least 8 valid stimuli with at least 8 social or 8 non-social AOIs empirically-identified |
6 | Social Preference | At least 8 valid social stimuli and at least 8 valid social AOIs (≥50 seconds with gaze on-screen) |
7 | Face Preference | At least 8 valid social attention stimuli with faces (≥40 seconds with gaze on-screen) |
8 | Non-social Preference | At least 8 valid social stimuli and 8 valid non-social areas-of-interest (≥50 seconds gaze on-screen) |
9 | Receptive Vocabulary | At least 8 valid target words (≥40 seconds with gaze on screen) |
10 | Speed to Faces | At least 8 valid social attention stimuli with faces (≥40 seconds with gaze on-screen) |
11 | Speed to Object | At least 4 valid processing speed stimuli (≥40 seconds with gaze on-screen) |
12 | Reading accuracy | At least 8 valid target words (≥32 seconds with gaze on screen) |
Appendix 18. Definitions for the four distinct gaze metrics collected.
Gaze Metric | Definition |
---|---|
Fixation duration | The total duration in milliseconds of an identified fixation from the first sample to the last sample included in the fixation definition. Fixation was defined as at least 66ms of gaze samples within a 100-pixel dispersion area. The total fixation duration was determined by summing the fixation duration for all identified fixations within an area-of-interest. |
Fixation count | A count of all fixations detected within an area-of-interest. |
Glance count | The number of times gaze entered and left an area-of-interest. Gaze to the area of interest was defined by at least one identified fixation. |
Time-to-first-fixation | The time elapsed from the start of the temporal area-of-interest to the first sample of the first identified fixation within an area-of-interest. |
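The fixation and glance definitions above can be made concrete with a small sketch. This is a hedged reconstruction, not the study's processing pipeline: the dispersion-threshold (I-DT style) algorithm variant and the sample timing in the example are assumptions; only the 66 ms minimum duration, the 100-pixel dispersion area, and the glance-entry rule come from Appendices 16 and 18.

```python
# Hedged sketch of the gaze-metric definitions in Appendix 18. Only the
# 66 ms / 100-pixel thresholds and the glance rule come from the text;
# the specific dispersion algorithm is an assumption for illustration.

def detect_fixations(samples, min_dur_ms=66, dispersion_px=100):
    """samples: list of (t_ms, x, y) gaze points in time order. Returns
    fixations as (start_ms, end_ms, cx, cy): maximal runs of samples whose
    bounding box stays within dispersion_px and that last >= min_dur_ms."""
    fixations, i, n = [], 0, len(samples)
    while i < n:
        j = i
        xs, ys = [samples[i][1]], [samples[i][2]]
        while j + 1 < n:
            x, y = samples[j + 1][1], samples[j + 1][2]
            if (max(xs + [x]) - min(xs + [x]) <= dispersion_px and
                    max(ys + [y]) - min(ys + [y]) <= dispersion_px):
                xs.append(x); ys.append(y); j += 1
            else:
                break
        if samples[j][0] - samples[i][0] >= min_dur_ms:
            fixations.append((samples[i][0], samples[j][0],
                              sum(xs) / len(xs), sum(ys) / len(ys)))
            i = j + 1
        else:
            i += 1
    return fixations

def glance_count(fixations, aoi):
    """aoi: (x0, y0, x1, y1). A new glance is counted each time a fixation
    lands in the AOI after one or more fixations outside it."""
    count, inside = 0, False
    for _, _, cx, cy in fixations:
        now_inside = aoi[0] <= cx <= aoi[2] and aoi[1] <= cy <= aoi[3]
        if now_inside and not inside:
            count += 1
        inside = now_inside
    return count
```

Under these definitions, total fixation duration to an AOI is simply the sum of (end_ms − start_ms) over fixations whose centroid falls in the AOI, and time-to-first-fixation is the start time of the first such fixation.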
Appendix 19. Clinician-scientist performance paradigm feedback.
Processing Speed | Receptive Vocabulary | Single-Word Reading | |
---|---|---|---|
Number of experts completing | 9 | 10 | 9 |
Device and webcam type (webcam: internal or external) | 1 Mini tablet; 5 Laptop internal; 1 Laptop external; 2 Desktop external | 6 Laptop internal; 1 Laptop external; 3 Desktop external | 5 Laptop internal; 4 Desktop external |
Screen size | 1 <10”; 3 10–12”; 3 12–18”; 4 19+” | 2 10–12”; 3 13–18”; 4 19+” | 2 10–12”; 3 13–18”; 4 19+” |
Paradigm relevance (M, SD, range) (1=highly relevant to 4=not relevant) | 1.6 (0.7, 1–3) | 1.6 (0.8, 1–3) | 2.4 (1.0, 1–4) |
Clarity of instructions (M, SD, range) (1=very clear to 4=very difficult to follow) | 1.1 (0.3, 1–2) | 1.2 (0.4, 1–2) | 1.3 (1.0, 1–4) |
Quality of audio (M, SD, range) (1=very clear to 4=not clear) | 1 (0) | 1.7 (0.9, 1–4) | 1.1 (0.3, 1–2) |
Quality of pictures / words (M, SD, range) (1=high, 2=medium, 3=low) | 1 (0) | 1.1 (0.3, 1–2) | 1.0 (0) |
Timing of administration (M, SD, range) (1=very fast, 3=neither fast nor slow, 5=very slow) | 3 (0.3, 3–4) | 3.1 (0.3, 3–4) | 2.9 (0.3, 2–3) |
Difficulty level (M, SD, range) (1=very appropriate to 4=inappropriate) | 1.7 (0.5, 1–2) | 2.4 (0.9, 1–4) | 2.3 (1.3, 1–4) |
Possible concerns (n) | |||
Too many easy targets | 0 | 0 | 0 |
Too many moderate diff targets | 0 | 0 | 0 |
Too many hard targets | 0 | 4 | 2 |
Too few easy targets | 0 | 1 | 1 |
Too few moderate diff targets | 0 | 0 | 0 |
Too few hard targets | 0 | 0 | 0 |
Targets too close together | 2 | 0 | 0 |
Targets too far apart | 0 | 0 | 0 |
Target array not appropriate | 0 | 0 | 0 |
Note. Clinician-scientist experts did not provide feedback on the social attention paradigm because it was adapted from our prior eye-tracking investigations, and its stimuli had already received expert input during development. Specific qualitative feedback is not reproduced here but was used to improve the administration flow and the instructions given to the parent facilitating the administration.
Appendix 20. Parent performance paradigm feedback.
| | Social Attention | Processing Speed | Receptive Vocabulary | Single-Word Reading |
|---|---|---|---|---|
| Number of parents completing | 8 | 8 | 9 | 8 |
| Overall experience, M (SD, range) (1=extremely positive to 5=extremely negative) | 1.9 (0.8, 1–3) | 2.0 (0.8, 1–3) | 2.1 (1.0, 1–4) | 2.9 (1.3, 1–5) |
| Difficulty, M (SD, range) (1=extremely easy to 5=extremely hard) | 2.0 (0.9, 1–4) | 2.5 (1.5, 1–5) | 2.7 (1.0, 2–4) | 3.3 (1.7, 1–5) |
| Device type | 6 Laptop; 1 PC/Mac; 1 Did not report | 7 Laptop; 1 PC/Mac | 8 Laptop; 1 PC/Mac | 7 Laptop; 1 PC/Mac |
| Sitting during evaluation | 6 Yes; 2 No | 7 Yes; 1 No | 7 Yes; 1 No | 5 Yes; 3 No |
| Sensory-related movements during evaluation | 2 Yes; 6 No | 2 Yes; 6 No | 4 Yes; 5 No | 5 Yes; 3 No |
| Breaks | 8 No | 8 No | 8 No; 1 One break after each segment | 8 No |
| Look away from screen (1=very often to 6=no looking away) | 3.8 (2.3, 1–6) | 3.0 (2.1, 1–6) | 3.8 (1.7, 2–6) | 3.4 (2.2, 1–6) |
| Cover or touch face (1=very often to 5=no covering or touching) | 4.1 (1.1, 2–5) | 4.1 (0.8, 3–5) | 3.3 (1.4, 2–5) | 3.5 (1.5, 1–5) |
| End video early | 8 No | 8 No | 9 No | 8 No |
| Unexpected noise during evaluation | 5 No noise; 3 One unexpected noise | 7 No noise; 1 Five+ occurrences | 7 No noise; 2 One unexpected noise | 6 No noise; 2 One unexpected noise |
| Adjust lighting during evaluation | 8 No adjustments | 8 No adjustments | 9 No adjustments | 8 No adjustments |
| Connection problems | 4 No; 2 One occurrence; 2 Two or three occurrences | 8 No | 9 No | 7 No; 1 One occurrence |
| Overall attention (1=excellent to 3=average to 5=very poor) | 1.9 (1.2, 1–4) | 2.4 (1.3, 1–4) | 2.2 (1.2, 1–4) | 2.5 (1.5, 1–5) |
| Physical assistance | 4 None; 1 One time; 3 Most of the time | 3 None; 1 One time; 1 Part-time; 3 Most of the time | 3 None; 2 Part-time; 3 Most of the time; 1 Did not report | 3 None; 4 Part-time; 1 Most of the time |
| Gestural assistance | 5 None; 1 One time; 2 Part-time | 3 None; 4 Part-time; 1 Most of the time | 5 None; 3 Part-time; 1 Did not report | 4 None; 3 Part-time; 1 Most of the time |
| Verbal assistance | 4 None; 2 One time; 2 Most of the time | 4 None; 1 One time; 1 Part-time; 2 Most of the time | 3 None; 1 One time; 2 Part-time; 2 Most of the time; 1 Did not report | 3 None; 1 One time; 3 Part-time; 1 Most of the time |
| Paradigm relevance, M (SD, range) (1=highly relevant to 4=not relevant) | 1.8 (0.4, 1–2) | 1.8 (0.7, 1–3) | 1.2 (0.4, 1–2) | 2.3 (0.9, 1–3) |
| Quality of audio (1=very clear to 4=not clear) | 1.1 (0.3, 1–2) | 1.0 (0) | 1.0 (0) | 1.3 (0.4, 1–2) |
| Quality of video / pictures / words (1=high, 2=medium, 3=low) | 1.3 (0.5, 1–2) | 1.4 (0.7, 1–3) | 1.0 (0) | 1.1 (0.3, 1–2) |
| Timing of administration (1=very fast to 3=neither fast nor slow to 5=very slow) | 2.6 (0.7, 1–3) | 2.3 (0.9, 1–3) | 2.0 (0.8, 1–3) | 2.5 (0.8, 1–3) |
Note. Two NDGS participants who were recruited to give feedback on the neurobehavioral surveys gave no feedback or provided feedback on only one performance paradigm due to life stress. As a result, only 8–9 participants completed each performance paradigm during pilot testing. All were participants who required parental support, with parents completing the post-evaluation questionnaire.
Appendix 21. Participant accounting.
Note. Invalid cases attempted at least one webcam performance measure but achieved less than 30 seconds with gaze on screen.
Appendix 22. Histograms for each webcam-collected performance measure with super-imposed normal distribution curve.
Note. A=Overall Attention, B=Attentional Scanning, C=Positive Emotional Expressiveness, D=Negative Emotional Expressiveness, E=Social Attention, F=Social Preference, G=Face Preference, H=Non-social Preference, I=Receptive Vocabulary, J=Speed to Faces, K=Speed to Objects, L=Reading Accuracy.
Appendix 23. Inter-correlations among the performance measures.
| | 1: OA | 2: AS | 3: PE | 4: NE | 5: SA | 6: SP | 7: FP | 8: NP | 9: RV | 10: SF | 11: SO | 12: RA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1: Overall Attention (OA) | - | .34* | −.19* | −.08 | −.45* | .13 | .33* | −.25* | .54* | −.26* | −.28* | .55* |
| 2: Attentional Scanning (AS) | | - | −.24* | −.21* | −.49* | .23* | .27* | −.25* | .54* | −.21* | −.78* | .48* |
| 3: Positive Emotion (PE) | | | - | −.06 | .23* | −.15 | −.15 | .08 | −.23* | .02 | .22* | −.27* |
| 4: Negative Emotion (NE) | | | | - | .22* | −.15 | −.11 | .15 | −.23* | .08 | .19* | −.28* |
| 5: Social Attention (SA) | | | | | - | −.54 | −.52* | .19* | −.59* | .30* | .45* | −.62* |
| 6: Social Preference (SP) | | | | | | - | .64* | −.05 | .35* | −.30* | −.24* | .38* |
| 7: Face Preference (FP) | | | | | | | - | −.50* | .38* | −.79* | −.34* | .38* |
| 8: Non-social Preference (NP) | | | | | | | | - | −.18 | .62* | .33* | −.20 |
| 9: Receptive Vocabulary (RV) | | | | | | | | | - | −.22 | −.55* | .78* |
| 10: Speed to Faces (SF) | | | | | | | | | | - | .30* | −.21 |
| 11: Speed to Object (SO) | | | | | | | | | | | - | −.47* |
| 12: Reading Accuracy (RA) | | | | | | | | | | | | - |
Note. Social attention is keyed so that higher scores are more consistent with autism spectrum disorder. Speed to faces and speed to objects are keyed so that higher scores indicate a longer time to fixate the target. Sample sizes vary from n=248 to n=322. * designates significant correlations, p<.001.
Appendix 24. Association of the webcam social attention measure with autism symptom level and ASD diagnosis.
Autism Symptom Level | |||||
---|---|---|---|---|---|
Baseline | 1-Month Follow-Up | 4-Month Follow-Up | |||
Total n | ASD n | r | r | r | |
Training | 192 | 42 | .53 | .42 | .54 |
Testing | 76 | 18 | .49 | .32 | .54 |
Validation | 77 | 22 | .61 | .53 | .65 |
Testing + Validation | 153 | 40 | .55 | .43 | .58 |
Ages 3–8 | 64 | 20 | .54 | .48 | .62 |
Ages 9+ | 89 | 20 | .57 | .37 | .59 |
ASD Diagnosis | |||||
Baseline | 1-Month Follow-Up | 4-Month Follow-Up | |||
Total n | ASD n | AUC (SE) | AUC (SE) | AUC (SE) | |
Training | 192 | 42 | .809 (.036) | .735 (.041) | .804 (.039) |
Testing | 76 | 18 | .790 (.066) | .693 (.073) | .755 (.081) |
Validation | 77 | 22 | .857 (.050) | .790 (.067) | .883 (.051) |
Testing + Validation | 153 | 40 | .821 (.041) | .744 (.049) | .815 (.048) |
Ages 3–8 | 64 | 20 | .806 (.060) | .730 (.072) | .836 (.062) |
Ages 9+ | 89 | 20 | .833 (.058) | .749 (.071) | .822 (.072) |
Note. ASD=Autism Spectrum Disorder. All correlations are statistically significant, p<.01. The training sub-sample included a randomly selected 60% of all administrations; the testing and validation sub-samples each consisted of a random selection of 20% of all administrations. Autism Symptom Level is based on averaging scores on the social communication / interaction and restricted repetitive behavior scales of the neurobehavioral evaluation tool informant-report survey.
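For context on the AUC values reported in Appendix 24: an AUC is equivalent to the Mann–Whitney U statistic scaled to [0, 1], i.e., the probability that a randomly chosen individual with ASD scores higher on the measure than a randomly chosen individual without ASD. A minimal sketch of this computation (illustrative Python, not the study's analysis code; names are assumptions):

```python
def auc(case_scores, control_scores):
    """AUC = P(case > control), counting ties as 0.5 (Mann-Whitney form)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```

Read this way, the baseline AUC of .821 in the combined testing + validation sample means a randomly selected participant with ASD scored above a randomly selected participant without ASD about 82% of the time.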
Appendix 25. Receiver operating characteristic curve analyses evaluating the predictive validity of the social attention measure for ASD diagnosis in the combined testing and validation baseline subsamples.
Appendix 26. Multi-level likelihood ratios (mLRs) and sensitivity and specificity for ASD diagnosis at relevant cut scores using the social attention measure (baseline only).
| Score range (z-score) [NT mean=0, SD=1] | mLR | Interpretation | Cut score (z-score) | Sensitivity | Specificity |
|---|---|---|---|---|---|
| < +0.1 | 0.153 | Reduced Probability | 0.1 | 92% | 50% |
| 0.1 to 1.8 | 0.947 | No Change | 1.0 | 77% | 65% |
| 1.81 to 2.45 | 3.92 | Increased Probability | 1.8 | 55% | 90% |
| 2.46+ | 6.41 | Strongly Increased Probability | 2.45 | 36% | 95% |
| Youden’s J | - | - | 1.49 | 70% | 87% |
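The quantities in this table can be reproduced directly from raw scores and diagnoses: sensitivity and specificity fix a single cut score, multi-level likelihood ratios compare how often scores fall inside each interval in cases versus non-cases, and Youden's J selects the cut maximizing sensitivity + specificity − 1. A sketch of these computations (illustrative Python with assumed names, not the study's analysis code):

```python
def sens_spec(case_scores, control_scores, cut):
    """Sensitivity and specificity when scores >= cut are called positive."""
    sens = sum(s >= cut for s in case_scores) / len(case_scores)
    spec = sum(s < cut for s in control_scores) / len(control_scores)
    return sens, spec

def interval_lr(case_scores, control_scores, lo, hi):
    """Multi-level likelihood ratio for the interval [lo, hi):
    P(score in interval | case) / P(score in interval | control)."""
    p_case = sum(lo <= s < hi for s in case_scores) / len(case_scores)
    p_control = sum(lo <= s < hi for s in control_scores) / len(control_scores)
    return p_case / p_control if p_control else float("inf")

def youden_j_cut(case_scores, control_scores):
    """Cut score maximizing J = sensitivity + specificity - 1."""
    cuts = sorted(set(case_scores) | set(control_scores))
    return max(cuts,
               key=lambda c: sum(sens_spec(case_scores, control_scores, c)) - 1)
```

Unlike a single cut score, interval likelihood ratios preserve information from the whole score distribution, which is why a score of 2.46+ shifts the probability of ASD more strongly (mLR 6.41) than merely exceeding the 1.8 cut.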
Appendix 27. Means and standard deviations for webcam measures (z-scores) in each patient group.
| Webcam Measure | PTEN M | (SD) | NFIX M | (SD) | SYNGAP1 M | (SD) | Other NDGS M | (SD) | Idiopathic NDD M | (SD) | Pattern |
|---|---|---|---|---|---|---|---|---|---|---|---|
Overall Attention | −0.07 | (1.03) | −0.55 | (1.18) | −0.78 | (1.04) | −1.08 | (1.29) | −0.16 | (1.04) | Low scores for NFIX, SYNGAP1, other NDGS |
Attentional Scanning | −0.12 | (1.58) | −0.85 | (0.65) | −1.30 | (0.80) | −1.28 | (0.77) | −0.05 | (1.25) | Low scores for NFIX, SYNGAP1, other NDGS |
Positive Emotion | 0.05 | (0.74) | 0.35 | (1.23) | −0.07 | (0.71) | 0.32 | (1.02) | −0.06 | (0.65) | High scores for NFIX and other NDGS |
Negative Emotion | 0.18 | (1.67) | −0.09 | (0.92) | 0.62 | (1.67) | 0.59 | (1.81) | 0.10 | (0.70) | High scores for SYNGAP1 and other NDGS |
Social Attention | −0.34 | (1.00) | −1.78 | (1.09) | −1.70 | (1.34) | −1.78 | (1.25) | −0.11 | (1.11) | Low scores for all but idiopathic NDD |
Social Preference | −0.34 | (0.95) | −0.67 | (0.85) | −1.27 | (1.31) | −0.79 | (1.19) | 0.04 | (1.54) | Low scores for all but idiopathic NDD |
Face Preference | −0.10 | (0.90) | −0.32 | (0.80) | −0.54 | (0.78) | −0.47 | (0.85) | 0.13 | (1.27) | Low scores for NFIX, SYNGAP1, other NDGS |
Non-social Preference | 0.40 | (1.01) | 0.55 | (0.94) | 0.82 | (1.44) | 0.62 | (1.24) | 0.11 | (1.07) | High scores for all but idiopathic NDD |
Receptive Vocabulary | −0.26 | (1.04) | −0.94 | (0.91) | −0.82 | (0.91) | −1.01 | (0.72) | −0.13 | (1.00) | Low scores for all but idiopathic NDD |
Speed to Faces | −0.09 | (1.12) | 0.19 | (0.74) | 0.50 | (0.82) | 0.38 | (0.84) | −0.11 | (1.10) | Slow for NFIX, SYNGAP1, and other NDGS |
Speed to Object | 0.08 | (1.43) | 0.53 | (0.74) | 1.00 | (0.83) | 1.08 | (0.74) | 0.09 | (1.13) | Slow for NFIX, SYNGAP1, and other NDGS |
Reading accuracy | −0.21 | (0.96) | −0.70 | (0.73) | −0.74 | (0.91) | −0.93 | (0.64) | −0.35 | (1.01) | Low scores for all but idiopathic NDD |
Note. Scores represent z-scores adjusted for age, the square of age, and sex derived from neurotypical control norms. Sibling controls not shown as they did not significantly deviate from the healthy control mean for any measure.
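The adjustment described in the note reflects a common normative workflow: regress each raw measure on age, age squared, and sex within the neurotypical controls, then express every participant's score as a residual standardized by the controls' residual SD. The sketch below illustrates that workflow under those assumptions (not the authors' code); names are illustrative, and numpy is used for the least-squares fit.

```python
import numpy as np

def control_adjusted_z(raw, age, sex, is_control):
    """Regression-based norms: z-scores adjusted for age, age^2, and sex,
    standardized against the neurotypical control group."""
    # Design matrix: intercept, age, age squared, sex.
    X = np.column_stack([np.ones_like(age), age, age ** 2, sex])
    # Fit least-squares norms using controls only.
    beta, *_ = np.linalg.lstsq(X[is_control], raw[is_control], rcond=None)
    # Everyone's deviation from their demographic expectation, in control SD units.
    resid = raw - X @ beta
    return resid / resid[is_control].std(ddof=1)
```

By construction, control z-scores have mean 0 and SD 1, so a group mean of −1.70 (e.g., social attention in SYNGAP1) reads directly as 1.7 control SDs below demographic expectation.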
Data Availability Statement
The datasets analyzed during the current study are available from the corresponding author on reasonable request.
References
- Amit M, Chukoskie L, Skalsky AJ, Garudadri H, & Ng TN (2020). Flexible Pressure Sensors for Objective Assessment of Motor Disorders. Advanced Functional Materials, 30(20). 10.1002/adfm.201905241
- Boateng GO, Neilands TB, Frongillo EA, Melgar-Quinonez HR, & Young SL (2018). Best Practices for Developing and Validating Scales for Health, Social, and Behavioral Research: A Primer. Front Public Health, 6, 149. 10.3389/fpubh.2018.00149
- Bove R, Rowles W, Zhao C, Anderson A, Friedman S, Langdon D, Alexander A, Sacco S, Henry R, Gazzaley A, Feinstein A, & Anguera JA (2021). A novel in-home digital treatment to improve processing speed in people with multiple sclerosis: A pilot study. Multiple Sclerosis, 27(5), 778–789. 10.1177/1352458520930371
- Busch RM, Chapin JS, Mester J, Ferguson L, Haut JS, Frazier TW, & Eng C (2013). Cognitive characteristics of PTEN hamartoma tumor syndromes. Genet Med, 15(7), 548–553. 10.1038/gim.2013.1
- Busch RM, Frazier TW II, Sonneborn C, Hogue O, Klaas P, Srivastava S, Hardan AY, Martinez-Agosto JA, Sahin M, & Eng C (2023). Longitudinal neurobehavioral profiles in children and young adults with PTEN hamartoma tumor syndrome and reliable methods for assessing neurobehavioral change. J Neurodev Disord, 15(1), 3. 10.1186/s11689-022-09468-4
- Busch RM, Srivastava S, Hogue O, Frazier TW, Klaas P, Hardan A, Martinez-Agosto JA, Sahin M, Eng C, & the Developmental Synaptopathies Consortium (2019). Neurobehavioral phenotype of autism spectrum disorder associated with germline heterozygous mutations in PTEN. Transl Psychiatry, 9(1), 253. 10.1038/s41398-019-0588-1
- Chita-Tegmark M (2016). Social attention in ASD: A review and meta-analysis of eye-tracking studies. Research in Developmental Disabilities, 48, 79–93. 10.1016/j.ridd.2015.10.011
- Ciaccio C, Saletti V, D’Arrigo S, Esposito S, Alfei E, Moroni I, Tonduti D, Chiapparini L, Pantaleoni C, & Milani D (2018). Clinical spectrum of PTEN mutation in pediatric patients. A bicenter experience. Eur J Med Genet. 10.1016/j.ejmg.2018.12.001
- Cicchetti D, Bronen R, Spencer S, Haut S, Berg A, Oliver P, & Tyrer P (2006). Rating scales, scales of measurement, issues of reliability: Resolving some critical issues for clinicians and researchers. The Journal of Nervous and Mental Disease, 194(8), 557–564.
- Cohen J, & Cohen P (1983). Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences (2nd ed.). L. Erlbaum Associates.
- Dawson G, Campbell K, Hashemi J, Lippmann SJ, Smith V, Carpenter K, Egger H, Espinosa S, Vermeer S, Baker J, & Sapiro G (2018). Atypical postural control can be detected via computer vision analysis in toddlers with autism spectrum disorder. Sci Rep, 8(1), 17008. 10.1038/s41598-018-35215-8
- Egger HL, Dawson G, Hashemi J, Carpenter KLH, Espinosa S, Campbell K, Brotkin S, Schaich-Borg J, Qiu Q, Tepper M, Baker JP, Bloomfield RA Jr., & Sapiro G (2018). Automatic emotion and attention analysis of young children at home: a ResearchKit autism feasibility study. NPJ Digit Med, 1, 20. 10.1038/s41746-018-0024-6
- FDA (2009). Patient-reported outcome measures: use in medical product development to support labeling claims. Guidance for Industry. United States Food and Drug Administration.
- Frazier TW (2019). Autism Spectrum Disorder Associated with Germline Heterozygous PTEN Mutations. In Eng C, Ngeow J, & Stambolic V (Eds.), (Vol. 9, p. a037002). Cold Spring Harbor Laboratory Press. 10.1101/cshperspect.a037002
- Frazier TW, Busch RM, Klaas P, Lachlan K, Jeste S, Kolevzon A, Loth E, Harris J, Speer L, Pepper T, Anthony K, Graglia JM, Delagrammatikas C, Bedrosian-Sermone S, Beekhuyzen J, Smith-Hicks C, Sahin M, Eng C, Hardan AY, & Uljarevic M (2023). Development of informant-report neurobehavioral survey scales for PTEN hamartoma tumor syndrome and related neurodevelopmental genetic syndromes. Am J Med Genet A. 10.1002/ajmg.a.63195
- Frazier TW, Embacher R, Tilot AK, Koenig K, Mester J, & Eng C (2015). Molecular and phenotypic abnormalities in individuals with germline heterozygous PTEN mutations and autism. Molecular Psychiatry, 20(9), 1132–1138. 10.1038/mp.2014.125
- Frazier TW, Hauschild KM, Klingemier E, Strauss MS, Hardan AY, & Youngstrom EA (2020). Rapid eye-tracking evaluation of language in children and adolescents referred for assessment of neurodevelopmental disorders. Journal of Intellectual & Developmental Disability, 45(3), 222–235. 10.3109/13668250.2019.1698287
- Frazier TW, Klingemier EW, Anderson CJ, Gengoux GW, Youngstrom EA, & Hardan AY (2021). A Longitudinal Study of Language Trajectories and Treatment Outcomes of Early Intensive Behavioral Intervention for Autism. Journal of Autism and Developmental Disorders, 51(12), 4534–4550. 10.1007/s10803-021-04900-5
- Frazier TW, Klingemier EW, Parikh S, Speer L, Strauss MS, Eng C, Hardan AY, & Youngstrom EA (2018). Development and validation of objective and quantitative eye tracking-based measures of autism risk and symptom levels. Journal of the American Academy of Child and Adolescent Psychiatry, 57(11), 858–866. 10.1016/j.jaac.2018.06.023
- Frazier TW, Strauss M, Klingemier EW, Zetzer EE, Hardan AY, Eng C, & Youngstrom EA (2017). A Meta-Analysis of Gaze Differences to Social and Nonsocial Information Between Individuals With and Without Autism. Journal of the American Academy of Child and Adolescent Psychiatry, 56(7), 546–555. 10.1016/j.jaac.2017.05.005
- Frazier TW, Uljarevic M, Ghazal I, Klingemier EW, Langfus J, Youngstrom EA, Aldosari M, Al-Shammari H, El-Hag S, Tolefat M, Ali M, & Al-Shaban FA (2021). Social attention as a cross-cultural transdiagnostic neurodevelopmental risk marker. Autism Res. 10.1002/aur.2532
- Goodwin MS, Mazefsky CA, Ioannidis S, Erdogmus D, & Siegel M (2019). Predicting aggression to others in youth with autism using a wearable biosensor. Autism Res, 12(8), 1286–1296. 10.1002/aur.2151
- Hardan AY, Jo B, Frazier TW, Klaas P, Busch RM, Dies KA, Filip-Dhima R, Snow AV, Eng C, Hanna R, Zhang B, & Sahin M (2021). A randomized double-blind controlled trial of everolimus in individuals with PTEN mutations: Study design and statistical considerations. Contemp Clin Trials Commun, 21, 100733. 10.1016/j.conctc.2021.100733
- IBM Corp. (2021). IBM SPSS Statistics for Windows (Version 28.0). IBM Corp.
- Kail R (1991). Developmental change in speed of processing during childhood and adolescence. Psychological Bulletin, 109(3), 490–501. 10.1037/0033-2909.109.3.490
- Kuntzler T, Hofling TTA, & Alpers GW (2021). Automatic Facial Expression Recognition in Standardized and Non-standardized Emotional Expressions. Front Psychol, 12, 627561. 10.3389/fpsyg.2021.627561
- Manfredonia J, Bangerter A, Manyakov NV, Ness S, Lewin D, Skalkin A, Boice M, Goodwin MS, Dawson G, Hendren R, Leventhal B, Shic F, & Pandina G (2019). Automatic Recognition of Posed Facial Expression of Emotion in Individuals with Autism Spectrum Disorder. Journal of Autism and Developmental Disorders, 49(1), 279–293. 10.1007/s10803-018-3757-9
- McPartland JC, Bernier RA, Jeste SS, Dawson G, Nelson CA, Chawarska K, Earl R, Faja S, Johnson SP, Sikich L, Brandt CA, Dziura JD, Rozenblit L, Hellemann G, Levin AR, Murias M, Naples AJ, Platt ML, Sabatos-DeVito M, … & the Autism Biomarkers Consortium for Clinical Trials (2020). The Autism Biomarkers Consortium for Clinical Trials (ABC-CT): Scientific Context, Study Design, and Progress Toward Biomarker Qualification. Front Integr Neurosci, 14, 16. 10.3389/fnint.2020.00016
- Mulder PA, van Balkom IDC, Landlust AM, Priolo M, Menke LA, Acero IH, Alkuraya FS, Arias P, Bernardini L, Bijlsma EK, Cole T, Coubes C, Dapia I, Davies S, Di Donato N, Elcioglu NH, Fahrner JA, Foster A, Gonzalez NG, … Hennekam RC (2020). Development, behaviour and sensory processing in Marshall-Smith syndrome and Malan syndrome: phenotype comparison in two related syndromes. Journal of Intellectual Disability Research, 64(12), 956–969. 10.1111/jir.12787
- Nerusil B, Polec J, Skunda J, & Kacur J (2021). Eye tracking based dyslexia detection using a holistic approach. Sci Rep, 11(1), 15687. 10.1038/s41598-021-95275-1
- Ness SL, Bangerter A, Manyakov NV, Lewin D, Boice M, Skalkin A, Jagannatha S, Chatterjee M, Dawson G, Goodwin MS, Hendren R, Leventhal B, Shic F, Frazier JA, Janvier Y, King BH, Miller JS, Smith CJ, Tobe RH, & Pandina G (2019). An Observational Study With the Janssen Autism Knowledge Engine (JAKE®) in Individuals With Autism Spectrum Disorder. Front Neurosci, 13, 111. 10.3389/fnins.2019.00111
- Nunnally JC, & Bernstein IH (1994). Psychometric Theory (3rd ed.). McGraw-Hill, Inc.
- R Core Team (2021). R: A language and environment for statistical computing. https://www.R-project.org/
- Sahin M, Jones SR, Sweeney JA, Berry-Kravis E, Connors BW, Ewen JB, Hartman AL, Levin AR, Potter WZ, & Mamounas LA (2018). Discovering translational biomarkers in neurodevelopmental disorders. Nat Rev Drug Discov. 10.1038/d41573-018-00010-7
- Sahin M, & Sur M (2015). Genes, circuits, and precision therapies for autism and related neurodevelopmental disorders. Science, 350(6263). 10.1126/science.aab3897
- Salley B, & Colombo J (2016). Conceptualizing Social Attention in Developmental Research. Soc Dev, 25(4), 687–703. 10.1111/sode.12174
- Sasson NJ, & Elison JT (2012). Eye tracking young children with autism. Journal of Visualized Experiments, (61), 3675. 10.3791/3675
- Semmelmann K, & Weigelt S (2018). Online webcam-based eye tracking in cognitive science: A first look. Behav Res Methods, 50(2), 451–465. 10.3758/s13428-017-0913-7
- Shehu IS, Wang YF, Athuman AM, & Fu XP (2021). Remote Eye Gaze Tracking Research: A Comparative Evaluation on Past and Recent Progress. Electronics, 10(24). 10.3390/electronics10243165
- Shu C, Green Snyder L, Shen Y, Chung WK, & the SPARK Consortium (2022). Imputing cognitive impairment in SPARK, a large autism cohort. Autism Res, 15(1), 156–170. 10.1002/aur.2622
- Simmatis L, Alavi Naeini S, Jafari D, Xie MKY, Tanchip C, Taati N, McKinlay S, Sran R, Truong J, Guarin DL, Taati B, & Yunusova Y (2023). Analytical Validation of a Webcam-Based Assessment of Speech Kinematics: Digital Biomarker Evaluation following the V3 Framework. Digit Biomark, 7(1), 7–17. 10.1159/000529685
- Srivastava S, Jo B, Zhang B, Frazier T, Gallagher AS, Peck F, Levin AR, Mondal S, Li Z, Filip-Dhima R, Geisel G, Dies KA, Diplock A, Eng C, Hanna R, Sahin M, Hardan A, & the Developmental Synaptopathies Consortium (2022). A randomized controlled trial of everolimus for neurocognitive symptoms in PTEN hamartoma tumor syndrome. Human Molecular Genetics, 31(20), 3393–3404. 10.1093/hmg/ddac111
- Steele M, Uljarevic M, Rached G, Frazier TW, Phillips JM, Libove RA, Busch RM, Klaas P, Martinez-Agosto JA, Srivastava S, Eng C, Sahin M, & Hardan AY (2021). Psychiatric Characteristics Across Individuals With PTEN Mutations. Front Psychiatry, 12, 672070. 10.3389/fpsyt.2021.672070
- Streiner DL, & Norman GR (1995). Health Measurement Scales: A Practical Guide to Their Development and Use (2nd ed.). Oxford University Press.
- Tuncgenc B, Pacheco C, Rochowiak R, Nicholas R, Rengarajan S, Zou E, Messenger B, Vidal R, & Mostofsky SH (2021). Computerized Assessment of Motor Imitation as a Scalable Method for Distinguishing Children With Autism. Biol Psychiatry Cogn Neurosci Neuroimaging, 6(3), 321–328. 10.1016/j.bpsc.2020.09.001
- Vlaskamp DRM, Shaw BJ, Burgess R, Mei D, Montomoli M, Xie H, Myers CT, Bennett MF, XiangWei W, Williams D, Maas SM, Brooks AS, Mancini GMS, van de Laar I, van Hagen JM, Ware TL, Webster RI, Malone S, Berkovic SF, … Scheffer IE (2019). SYNGAP1 encephalopathy: A distinctive generalized developmental and epileptic encephalopathy. Neurology, 92(2), e96–e107. 10.1212/WNL.0000000000006729
- Youngstrom EA, Salcedo S, Frazier TW, & Perez Algorta G (2019). Is the Finding Too Good to Be True? Moving from “More Is Better” to Thinking in Terms of Simple Predictions and Credibility. J Clin Child Adolesc Psychol, 48(6), 811–824. 10.1080/15374416.2019.1669158