Author manuscript; available in PMC: 2024 Sep 1.
Published in final edited form as: Am J Med Genet C Semin Med Genet. 2023 Aug 3;193(3):e32058. doi: 10.1002/ajmg.c.32058

Development of webcam-collected and artificial-intelligence-derived social and cognitive performance measures for neurodevelopmental genetic syndromes

TW Frazier 1,2, RM Busch 3,4, P Klaas 3, K Lachlan 5, S Jeste 6, A Kolevzon 7, E Loth 8, J Harris 9, L Speer 10, T Pepper 11, K Anthony 12, JM Graglia 13, CG Delagrammatikas 14, S Bedrosian-Sermone 15, C Smith-Hicks 9, K Huba 1, R Longyear 16, L Green-Snyder 17, F Shic 18, M Sahin 19, C Eng 4, AY Hardan 20, M Uljarević 20,21
PMCID: PMC10543620  NIHMSID: NIHMS1923134  PMID: 37534867

Abstract

This study focused on the development and initial psychometric evaluation of a set of online, webcam-collected, and artificial intelligence-derived patient performance measures for neurodevelopmental genetic syndromes (NDGS). Initial testing and qualitative input was used to develop four stimulus paradigms capturing social and cognitive processes, including social attention, receptive vocabulary, processing speed, and single-word reading. The paradigms were administered to a sample of 375 participants, including 163 with NDGS, 56 with idiopathic neurodevelopmental disability (NDD), and 156 neurotypical controls. Twelve measures were created from the 4 stimulus paradigms. Valid completion rates varied from 87% to 100% across measures, with lower but adequate completion rates in participants with intellectual disability. Adequate to excellent internal consistency reliability (α=.67 to .95) was observed across measures. Test-retest reproducibility at 1-month follow-up and stability at 4-month follow-up was fair to good (r=.40–.73) for 8 of 12 measures. All gaze-based measures showed evidence of convergent and discriminant validity with parent-report measures of other cognitive and behavioral constructs. Comparisons across NDGS groups revealed distinct patterns of social and cognitive functioning, including people with PTEN mutations showing a less impaired overall pattern and people with SYNGAP1 mutations showing more attentional, processing speed, and social processing difficulties relative to people with NFIX mutations. Webcam-collected performance measures appear to be a reliable and potentially useful method for objective characterization and monitoring of social and cognitive processes in NDGS and idiopathic NDD. Additional validation work, including more detailed convergent and discriminant validity analyses and examination of sensitivity to change, is needed to replicate and extend these observations.

1. Introduction

Advances in identifying pathogenic variation linked to neurodevelopmental disability (NDD) have accelerated the discovery of a growing number of specific neurodevelopmental genetic syndromes (NDGS). As NDGS are identified, natural history investigations have begun to characterize a wide spectrum of medical conditions and neurobehavioral strengths and weaknesses associated with each condition (Busch et al., 2023; Mulder et al., 2020; Vlaskamp et al., 2019). This work is crucial to developing patient support guidelines and ensuring that patients with NDGS receive appropriate supports that maximize their development. For example, in individuals with PTEN hamartoma tumor syndrome (PHTS) resulting from germline heterozygous mutations in PTEN, a spectrum of frontal-systems deficits has been identified, ranging from no impairment to very severe impairment associated with intellectual disability (ID) and autism spectrum disorder (ASD) (Busch et al., 2019; Ciaccio et al., 2018; Frazier et al., 2015; Steele et al., 2021). This pattern has been found to be stable over a period of 2 years (Busch et al., 2023), even in young children, and the specific profile of frontal systems impairment can be used to inform clinical and educational care (Frazier, 2019).

While there have been some initial attempts to provide more detailed characterization of neurobehavioral profiles across different NDGS, the yield from natural history and neurobehavioral studies has been limited by the lack of comprehensive, sensitive instruments appropriate for evaluating geographically-dispersed populations. For example, within the Rare Disease Clinical Research Network – Developmental Synaptopathies Consortium natural history study of individuals with PHTS and ASD (Busch et al., 2019), in-person cognitive assessments were limited to annual visits and often required several hours of testing to collect data from relevant neurocognitive domains. Because of the extensive effort required, the related pilot clinical trial initiated within this network was limited to three in-person assessments over a six-month study period (Hardan et al., 2021; Srivastava et al., 2022). The infrequency, difficulty, and burden of these traditional approaches highlight the need for new phenotyping methods.

Identification of NDGS has also accelerated the development of syndrome-specific patient advocacy groups and foundations, as well as programs of research designed to better understand and translate molecular, cellular, and circuitry findings into intervention strategies. A primary goal of these patient advocacy groups - and the research programs they support - is to develop and evaluate the efficacy of personalized interventions. Recent reviews of NDGS have emphasized the need to understand pathophysiology and neurobehavioral profiles to generate personalized therapeutic strategies (Frazier, 2019; Sahin & Sur, 2015). Yet, given the small number of specialty clinics focused on each NDGS, and practical geographic constraints, many patients remain under-served and many clinics lack resources to collect extensive neurobehavioral assessments during clinic visits. Relatedly, due to the rare nature of many NDGS, natural history studies often rely on small sample sizes, which limits their value in identifying clinical endpoints for trials. In these small-sample longitudinal contexts, it is important to have reliable, stable indicators of individual performance, as compared to larger group studies where statistical certainty can be bolstered by adding participants. Having repeatable, online measures of neurobehavioral function could substantially improve the statistical power of translational and clinical studies and increase the ability to more rapidly and sensitively identify individual differences in the pattern of intervention response. Administration of these measures in the individual's home rather than within a clinic setting would not only broaden access to research participation but might also reduce biases resulting from collection of neurobehavioral information in an unfamiliar setting.

Research in NDGS and idiopathic NDD is also limited by reliance on subjective measurements acquired from parents/caregivers and/or observations by clinician scientists, which has precipitated a call for the development of objective measures (Sahin et al., 2018). In response, a number of tools have been developed that show promise for objectively evaluating and tracking key functions relevant to neurodevelopment (Amit et al., 2020; Dawson et al., 2018; Egger et al., 2018; Goodwin et al., 2019; Manfredonia et al., 2019; McPartland et al., 2020; Ness et al., 2019; Tuncgenc et al., 2021). However, with a few notable exceptions, these measures have been developed solely for in-person evaluation, limiting their application and temporal sensitivity. In addition, these measures have predominantly focused on a single domain rather than providing a more detailed characterization of multiple social, developmental, and cognitive domains. Furthermore, a high percentage of individuals with NDGS have significant cognitive and functional impairments. A relatively brief and repeatable battery of objective measures that can reliably capture a wide range of cognitive and behavioral capacities could supplement existing tools while simultaneously increasing sensitivity to intervention effects.

One possibility that can increase the objectivity of NDGS evaluations and simultaneously overcome accessibility barriers is to augment traditional characterization methods with appropriately-designed, remotely-administered measures of neurobehavioral function. Designing remote measures for maximal accessibility has the potential to lower burden for providers as well as patients. Webcam-based eye tracking is a remote data collection method that uses cameras on everyday computing devices coupled with artificial-intelligence / machine learning algorithms to capture individual looking patterns toward probes such as the presentation of videos and images. Webcam data collection also permits frame-by-frame automated facial expression analysis using machine learning algorithms that match expressions to prototypes derived from large training datasets. The potential for these methods to inform neurodevelopment is strong and, increasingly, both webcam-collected data (Simmatis et al., 2023) and artificial intelligence / machine learning algorithms (Nerusil et al., 2021) are being applied to create novel biometric measures for assessing child development and neurological conditions. A key advantage of webcam-based data collection is that the paradigms can be administered without direct real-time clinical supervision. Thus, an online, webcam-collected patient performance battery, capturing relevant social and cognitive measurements in an objective way, could supplement in-person assessment of NDGS patients and provide a more temporally-sensitive picture of neurobehavioral development in these populations. This is particularly true for individuals with medical and mental health comorbidities and cognitive impairments that merit closer surveillance but are currently underserved (Vlaskamp et al., 2019).

Unfortunately, at present, there are no accessible, scalable objective measures specifically designed for rapid and repeated evaluation of multiple social and cognitive domains important to NDGS and idiopathic NDD. The primary aim of this study was to address this limitation and develop social and cognitive stimulus paradigms that could be paired with webcam collection and artificial intelligence algorithms to measure key neurocognitive processes relevant to NDGS. Webcam-collected measures were developed in conjunction with clinician-scientist experts, patients, and parents/caregivers, following gold-standard principles of measure development (Boateng et al., 2018) and inclusive practices (FDA, 2009), to complement our recently developed and validated informant-report survey scales (Frazier et al., 2023). Individual paradigms were created to be brief (3–4 minutes) and to require only spontaneous or directed gaze, without motor or speech responses, making them appropriate for a wide range of developmental and cognitive levels. Stimuli followed best practices in gaze collection (Sasson & Elison, 2012) and test development (Boateng et al., 2018), including teaching parents to facilitate data collection (when needed) without interfering in the evaluation, presenting large elements within the visual field to limit accuracy issues in webcam gaze collection (Semmelmann & Weigelt, 2018), and, where relevant, focusing on very easy initial items with a graded increase in task difficulty. Based on this careful attention to applicability across a wide range of individuals with NDGS, valid measure collection was expected to be achieved in the majority of participants, including those with intellectual disability.

A secondary aim of this study was to conduct initial psychometric evaluation of these measures in several distinct NDGS groups, people with idiopathic NDD, and neurotypical controls. Initial evaluation included estimation of scale reliability, test-retest reproducibility (1-month follow-up), and stability (4-month follow-up). Initial convergent and discriminant validity was assessed using data from other informant(parent)-reported clinical information (Frazier et al., 2023). In addition, given the importance of detecting autism within NDGS to ensure access to appropriate services, concurrent validity with ASD diagnoses and autism symptom levels was evaluated. Finally, using baseline data, exploratory analyses examined the pattern of cognitive and behavioral functioning across NDGS and idiopathic NDD.

2. Methods

2.1. Initial Stimulus Development

The stimulus paradigm development process is outlined in Appendix 1. Briefly, this included identifying or creating appropriate target items and stimuli across a wide range of ages (3–45) and ability levels (moderate to severe cognitive impairment to average ability levels); collecting feasibility data; updating items and stimuli based on initial feedback; conducting a pilot administration of performance measures with 10 clinician-scientist experts and 9 parents and patients with NDGS and/or idiopathic NDD; and administering a post-evaluation survey to collect additional feedback and create the final performance paradigms.

The social paradigm and associated stimuli were chosen based on the combination of empirical work (Frazier et al., 2018) and comprehensive review of the literature (Chita-Tegmark, 2016; Frazier et al., 2017). Specifically, a variety of social stimuli were selected, in part, due to the high rates of ASD occurrence in NDGS and the broader relevance of social attention to neurodevelopment as a transdiagnostic construct (Frazier, Uljarevic, et al., 2021; Salley & Colombo, 2016). The processing speed paradigm was selected because of its potential to capture attentional scanning across the stimulus field and measure speed of object detection via gaze, its ease of administration in individuals with NDGS, particularly those with limited speech or motor difficulties, and the ability to create easier stimuli relevant to individuals with more significant intellectual impairments. Importantly, processing speed has been shown to be a very sensitive index of brain development and neuropathophysiological processes (Bove et al., 2021; Kail, 1991). The receptive vocabulary paradigm was selected because receptive language is a strong indicator of developmental trajectory and functional outcome (Frazier, Klingemier, et al., 2021) and can validly estimate results from standardized in-person testing using gaze to visual targets (Frazier et al., 2020). The single-word reading paradigm was developed based on a recommendation by clinician-scientist experts for identifying early reading, including in people with limited or no speech where reading is more difficult to assess. This paradigm was also included based on its potential to monitor development of reading throughout childhood and early adulthood in NDGS. Additional information on receptive vocabulary and single-word reading target selection and stimulus creation is provided in Appendices 2–3.
Example screenshots for each of the performance paradigms are included in Appendices 4–7, and stimulus/target order and composition information are provided in Appendices 8–11.

2.2. Clinician-Scientist Experts and Parent Pilot Evaluation Feedback

Ten clinician-scientist experts were recruited based on their clinical and/or research expertise with a specific NDGS group or idiopathic NDD. Nine parent-patient pairs were recruited from the respective groups (6 PHTS, 1 NFIX, 1 SYNGAP1, 1 ADNP, and 1 idiopathic ASD). Patients were intentionally selected to represent a range of ages and cognitive levels. After completing a pilot administration of performance paradigms, clinician-scientist experts and parents - who facilitated the webcam administration for the patient participant - completed a post-evaluation survey. Questions are provided in Appendices 12–13. This information was used to generate final stimulus videos and to improve the training of parents in facilitating administration to the child.

2.3. Parent/Caregiver Administration Support Training

Based on initial feedback, a parent/caregiver training process was developed (Appendix 14). This process included the following elements: 1) introduction to webcam technology, 2) training video, 3) parent completion of a “practice” stimulus set, 4) online training in valid task completion, and 5) virtual support meetings during initial and follow-up administrations. All of the elements were optional, but most participants used at least one option, and nearly all participants completed the parent “practice” stimuli.

2.4. Webcam Collection of Gaze

Participants were instructed to use a device with at least a 10” screen size based on results of initial pilot testing, which indicated that smaller screen sizes could reduce accuracy of point-of-regard relative to specific areas-of-interest. Webcam data were collected and processed using proprietary CoolTool software. The software was originally intended as a neuromarketing tool, but initial feasibility testing, including with several young children with neurodevelopmental disabilities, indicated good potential for use as a data collection platform. The minimum required camera resolution was 720p at 30fps. The gaze collection algorithm included a five-point calibration routine prior to each paradigm administration. This routine is coupled with a machine learning algorithm designed to detect webcam position within the 3-D space and intended to maximize gaze accuracy. On a frame-by-frame basis, gaze position relative to the 2-D screen was estimated. While accurate calibration is desirable, the gaze estimation model often functions adequately when less than ideal calibration data are acquired, making the system well-suited to young and more impaired participants. Similar systems have been shown to achieve ~3–5 degrees of calibration uncertainty, translating to accurate detection of areas >10% of screen size (Semmelmann & Weigelt, 2018; Shehu et al., 2021). The present stimulus paradigms were built with large areas-of-interest to be tolerant of higher levels of gaze uncertainty. Importantly, any reductions in gaze accuracy should reduce the reliability and validity of gaze-based measurements. Thus, observations of high reliability and evidence of convergent validity would suggest minimal impact of sub-optimal gaze calibration.
To offset concerns regarding possible reductions in gaze calibration and accuracy negatively impacting neurobehavioral measurements, no indices were scored if total time with eyes on screen was estimated to be less than 30 seconds overall (out of 15 minutes of possible gaze time to the screen).

Areas-of-interest were generated for each stimulus. For social attention stimuli, these included both socially-relevant (e.g., faces, target objects) and socially-irrelevant stimuli (e.g., foreground and background distractors, non-target objects), based on our prior research (Frazier et al., 2018). For processing speed, receptive vocabulary, and single-word reading stimuli, areas-of-interest included target items/objects. For all stimuli, areas-of-interest were temporally-defined based on expected gaze patterns from prior research (social attention) (Frazier et al., 2018) or after the verbal directive had been given (cognitive paradigms) (Frazier et al., 2020).

2.5. Automated Scoring of Facial Expressions

The webcam software also includes a proprietary algorithm for automatically scoring facial expressions. Facial landmarks are identified in the 3-dimensional space and the artificial intelligence algorithm is applied to these landmarks on a frame-by-frame basis to generate probability scores based on accuracy of classification from training data (Kuntzler et al., 2021). Probability scores represent a match between the facial landmark configuration and known sets of facial expressions (fear, anger, disgust, sadness, surprise, joy, and neutral), with closer matches being interpreted as higher intensities of expression (range 0–100%). For the present study, and because specific affect recognition intensities can be prone to error for more subtle expressions (Kuntzler et al., 2021), specific expressions were aggregated into positive and negative categories to maximize reliability. Facial expression measures were only collected to the social attention stimuli, as these showed the greatest range of non-neutral expressions in preliminary data.
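The aggregation of frame-level expression probabilities into positive and negative categories can be sketched as follows. This is an illustrative reimplementation, not the proprietary algorithm; the assignment of specific expressions to each category and the data layout are assumptions (the paper does not enumerate the grouping).

```python
# Assumed grouping for illustration: joy -> positive; fear, anger, disgust,
# and sadness -> negative; surprise and neutral are excluded here.
POSITIVE = {"joy"}
NEGATIVE = {"fear", "anger", "disgust", "sadness"}

def aggregate_expressions(frames):
    """frames: list of dicts mapping expression name -> probability (0-100),
    one dict per video frame."""
    # For each frame, take the strongest match within each category,
    # then average across frames to get a per-stimulus intensity score.
    pos = [max(f.get(e, 0.0) for e in POSITIVE) for f in frames]
    neg = [max(f.get(e, 0.0) for e in NEGATIVE) for f in frames]
    return {"positive_mean": sum(pos) / len(frames),
            "negative_mean": sum(neg) / len(frames)}
```

Aggregating to categories in this way trades expression-level detail for reliability, consistent with the rationale given above for subtle expressions.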

2.6. Development of a Priori Validity Criteria and Scoring

For each social and cognitive paradigm, the investigative team a priori identified possible gaze and facial expression measures that would be relevant to evaluating social and cognitive processes in NDGS and idiopathic NDD. The only exception to this is the social attention measure which was empirically-developed following our prior published methodology (Frazier et al., 2018) (see Appendix 15 for additional information). Appendix 16 presents operational definitions for each performance measure. Each gaze-based measure was only scored if stringent validity criteria were met. Appendix 17 includes validity criteria for all 12 webcam-collected measures. For each measure, validity criteria ensure that the participant attended to the stimuli for at least 30 seconds, and at least 8 valid targets or 4 valid stimuli were collected. Fixations were scored by identifying at least 66ms of gaze point samples within a 100-pixel dispersion. Four gaze metrics are calculated for each area-of-interest – fixation duration, fixation count, glance count, and time-to-first fixation (Appendix 18). These metrics were used to score the 12 performance measures evaluated in this study.
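The fixation-scoring rule above (at least 66 ms of gaze samples within a 100-pixel dispersion) corresponds to a standard dispersion-based identification (I-DT) scheme. A minimal sketch is shown below; this is an illustrative reimplementation under those two thresholds, not the study's proprietary scoring code.

```python
# Dispersion-based fixation detection (I-DT style). Thresholds follow the
# text: a fixation requires >= 66 ms of gaze samples whose combined
# horizontal + vertical extent stays within 100 pixels.

def detect_fixations(samples, min_duration_ms=66, max_dispersion_px=100):
    """samples: time-ordered list of (timestamp_ms, x, y) gaze points."""
    fixations = []
    start = 0
    while start < len(samples):
        end = start
        window = [samples[start]]
        # Grow the window until dispersion exceeds the threshold
        while end + 1 < len(samples):
            candidate = window + [samples[end + 1]]
            xs = [p[1] for p in candidate]
            ys = [p[2] for p in candidate]
            if (max(xs) - min(xs)) + (max(ys) - min(ys)) > max_dispersion_px:
                break
            window = candidate
            end += 1
        duration = samples[end][0] - samples[start][0]
        if duration >= min_duration_ms and end > start:
            # Record the fixation at the window centroid
            cx = sum(p[1] for p in window) / len(window)
            cy = sum(p[2] for p in window) / len(window)
            fixations.append({"start_ms": samples[start][0],
                              "duration_ms": duration, "x": cx, "y": cy})
            start = end + 1
        else:
            start += 1
    return fixations
```

At 30fps (one sample every ~33 ms), the 66 ms minimum corresponds to roughly three consecutive frames, which is why the minimum camera frame rate noted earlier matters for fixation scoring.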

2.7. Participants for Initial Measure Evaluation

NDGS groups included participants with PTEN Hamartoma Tumor Syndrome (PHTS), ADNP, SYNGAP1, or NFIX recruited via contacts through the PTEN Hamartoma Tumor Syndrome Foundation with the support of the PTEN Research Foundation, the ADNP Kids Foundation, the SYNGAP Research Fund, and the Malan Syndrome Foundation. Other individuals with NDGS were recruited via the Simons Foundation Searchlight registry and included people with mutations in GRIN2B, CSNK2A1, HIVEP2, SCN2A, MED13L, and STXBP1. Given the relatively small sample sizes for ADNP (n=11) and these NDGS groups, they were combined into a single “other NDGS” group (n=63). Individuals were included if they were between the ages of 3 and 45 at enrollment and had an available parent or other close relative/caregiver to complete informant-report measures. Siblings of individuals with NDGS were also eligible to participate, and unrelated neurotypical controls were recruited using StudyKik, a national recruitment service. Siblings and unrelated controls who were reported to have an idiopathic neurodevelopmental disability were included in a separate group.

2.8. Procedure

Parent/caregiver informants first completed a demographic and clinical information questionnaire followed by 11 neurobehavioral evaluation tool (NET) survey scales (Frazier et al., 2023). These survey scales included 6 measures of symptoms/problems (anxiety, attention-deficit/hyperactivity disorder, restricted/repetitive behavior, challenging behavior, mood, and sleep problems) and 5 measures of skills/functioning (motor skills, daily living skills, social communication/interaction skills, executive functioning, and quality of life). After NET survey completion, informants and participants were instructed to complete webcam-collected performance measures and were sent links via email or text to facilitate completion. For young and/or impaired children, performance measure administration began by having the parent complete a practice version, so that they understood how the webcam collection works and how best to help their child. Parents and older patients also were offered a video call with the research coordinator to review best practices in performance measure administration and were provided a set of recommendations to improve evaluation validity.

Performance measure administration began with the 5-point calibration that included dots presented in the four corners and center of the screen. Next, videos were presented for each paradigm in succession – social attention, receptive language, processing speed, and single-word reading. Re-calibration automatically occurred prior to each paradigm.

Survey and webcam measures were collected at baseline, 1-month, and 4-month follow-up timepoints. The maximum total administration time across all paradigms was 15 minutes (social attention – 4 min, receptive vocabulary – 4 min, processing speed – 3 min, single-word reading – 4 min) with videos separated into 1-minute segments to permit breaks. A button press was required to advance to the next video. Participants were instructed to complete all of the social attention and processing speed videos, but were permitted to complete only the first two minutes of the receptive vocabulary paradigm and complete only one minute of the single-word reading paradigm dependent on the parent’s appraisal of the patient’s capacity to engage with the paradigm. Participants could proceed through all paradigms or take breaks between paradigms but were encouraged to finish all videos in one sitting if possible.

IRB approval was obtained for all of the qualitative and quantitative procedures of the study, including administration of the final NET scales, and parents/legally-authorized representatives and adult patients provided informed consent prior to completing any study procedures. Assent for minors was also obtained, where appropriate.

2.9. Statistical Analyses

2.9.1. Sample Characterization

Descriptive statistics for demographic and clinical factors were computed to characterize the sample, and Chi-square or univariate ANOVA were used to compare across the seven study groups (PHTS, SYNGAP1, NFIX, other NDGS, idiopathic NDD, sibling controls, and unrelated neurotypical controls).

2.9.2. Evaluation and Measure Validity

Using validity criteria for each of the 12 performance measures, the sum of valid measures was computed and compared across study groups using univariate ANOVA. Proportions of validity by measure were also computed overall and by parent-reported intellectual disability status.

2.9.3. Reliability

Scale reliability (internal consistency) was calculated using Cronbach’s alpha (α) (Streiner & Norman, 1995). Scale reliability estimates falling in the ranges .70 to .79, .80 to .89, and >.90 were considered fair, good, and excellent (Nunnally & Bernstein, 1994), respectively. Test-retest reproducibility (one-month follow-up) and stability (4-month follow-up) were estimated using Pearson’s bivariate correlations. Test-retest estimates <.40 were considered poor, .40 to .59 fair, .60 to .74 good, and .75+ excellent (Cicchetti et al., 2006).
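For readers unfamiliar with the statistic, Cronbach's alpha can be computed directly from per-item scores as sketched below. This is a generic illustration of the formula, not the SPSS implementation used in the study; variable names are illustrative.

```python
# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals),
# where k is the number of items. Sample (n-1) variances are used throughout.
import statistics

def cronbach_alpha(item_scores):
    """item_scores: one inner list per item, aligned across the same
    participants (all inner lists have equal length)."""
    k = len(item_scores)
    item_variances = sum(statistics.variance(item) for item in item_scores)
    totals = [sum(person) for person in zip(*item_scores)]
    return (k / (k - 1)) * (1 - item_variances / statistics.variance(totals))
```

Alpha rises toward 1 as items covary more strongly relative to their individual variances, which is why it is read as internal consistency.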

2.9.4. Convergent and Discriminant Validity

To evaluate convergent and discriminant validity, other clinical information based on informant-report was a priori selected as either measuring similar constructs (convergent validity) or measuring dissimilar constructs (discriminant validity) for each performance measure. Informant-report information included: estimated IQ; speech level (5-point scale from non- or minimally-speaking to fluent speech); reading level (5-point scale from no reading to paragraph level or higher); ADHD, anxiety, mood, challenging behavior, social communication/interaction, and restricted/repetitive behavior symptoms; sleep problems; daily living skills; executive functioning; and motor skills. Bivariate correlations were computed between each performance measure and the convergent and discriminant validity measures selected. To compute aggregate correlations over multiple measures, correlations were converted to Fisher’s z, averaged, and transformed back to a correlation metric. The test of the significance of the difference in dependent correlations was used to examine whether convergent validity correlations were higher than discriminant validity correlations (Cohen & Cohen, 1983).
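The correlation-aggregation step described above (Fisher z-transform, average, back-transform) can be sketched in a few lines; this is a generic illustration of the standard procedure, not study code.

```python
# Average correlations on the Fisher z scale rather than averaging r values
# directly, since r is bounded and not additive.
import math

def average_correlations(rs):
    """rs: list of Pearson correlations, each strictly between -1 and 1."""
    zs = [math.atanh(r) for r in rs]   # Fisher's r-to-z transform
    mean_z = sum(zs) / len(zs)
    return math.tanh(mean_z)           # back-transform to the r metric
```

Because atanh stretches correlations near the bounds, the Fisher-averaged value can differ slightly from the arithmetic mean of the raw correlations, especially when the correlations being pooled are large or heterogeneous.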

2.9.5. Concurrent Validity with ID, ASD Diagnoses, and Autism Symptom Levels

To examine concurrent validity of performance measures with parent-report clinical ID diagnosis, independent samples t-tests were computed with each measure as the dependent variable and ID status (yes, no) as the grouping variable. Cohen’s d was computed to estimate the magnitude of group differences. To evaluate potential diagnostic validity, receiver operating characteristic (ROC) curve analyses were calculated in the training, testing, validation, and testing plus validation sub-samples, separately for baseline, 1-month, and 4-month follow-up data. Areas under the curve (AUCs) evaluated diagnostic validity. A rough guideline for evaluating AUC values is: <.60 = poor; .60–.69 = fair; .70–.79 = good; .80–.89 = excellent, if the comparison group is clinically meaningful; and .90–1.00 = exceptional, only if the design and comparison are appropriate (Youngstrom et al., 2019). To evaluate concurrent validity with autism symptom levels, autism symptom levels derived from neurobehavioral evaluation survey scales were calculated and correlations were computed in the same subsamples as ROC analyses.
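The AUC statistic reported from the ROC analyses has a simple interpretation: the probability that a randomly chosen case scores higher than a randomly chosen non-case. The study used the R package pROC; the sketch below is an equivalent pure-Python illustration via the Mann-Whitney formulation, not the study's analysis code.

```python
# AUC = P(case score > control score) + 0.5 * P(tie), computed by
# comparing every case/control pair of scores.

def auc(case_scores, control_scores):
    wins = 0.0
    for c in case_scores:
        for n in control_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5   # ties count as half
    return wins / (len(case_scores) * len(control_scores))
```

An AUC of .50 means the measure separates groups no better than chance, which is why the guideline above treats values below .60 as poor.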

2.9.6. Neurobehavioral Patterns across NDGS and idiopathic NDD Groups

To explore unique patterns of social and cognitive function, webcam measures were first normed using regression-based norming in unrelated healthy controls, with age, the square of age (to capture non-linear developmental trends), and sex included as predictors in each equation. This approach puts each measure on a z-score metric relative to healthy controls. Using these standardized residual scores, univariate analysis of variance models were computed, with each of the seven groups as the independent variable and the performance measure scores as dependent variables in separate analyses.
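The regression-based norming step can be sketched as follows: fit the normative model (age, age squared, sex) in the unrelated-control group only, then express every participant's score as a residual from that model in control standard-deviation units. This is a minimal illustration of the general technique under assumed data layouts, not the study's implementation.

```python
# Regression-based norming: z-scored residuals relative to a control-group
# normative model with predictors [age, age**2, sex].
import numpy as np

def norm_scores(control_X, control_y, all_X, all_y):
    """control_X / all_X: arrays of rows [age, age**2, sex];
    control_y / all_y: raw measure scores."""
    # Fit the normative regression (with intercept) in controls only
    Xc = np.column_stack([np.ones(len(control_X)), control_X])
    beta, *_ = np.linalg.lstsq(Xc, control_y, rcond=None)
    # Control residual SD defines the z-score unit (ddof = number of params)
    resid_sd = np.std(control_y - Xc @ beta, ddof=Xc.shape[1])
    Xa = np.column_stack([np.ones(len(all_X)), all_X])
    return (all_y - Xa @ beta) / resid_sd

# Toy usage: controls normed against themselves yield mean-zero z-scores
ages = np.array([5.0, 10.0, 15.0, 20.0, 25.0, 30.0])
X = np.column_stack([ages, ages**2, [0, 1, 0, 1, 0, 1]])
y = 50 + 0.5 * ages + np.array([0.3, -0.2, 0.1, -0.3, 0.2, -0.1])
z = norm_scores(X, y, X, y)
```

Because the model includes an intercept, control residuals average exactly zero, so a clinical participant's z-score reads directly as distance from age- and sex-expected performance.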

2.9.7. Statistical Power

Assuming total sample sizes of 200+ for reliability and validity analyses, statistical power to detect a bivariate correlation of r≥.40 was excellent (>.99; one-tailed p-value of .05). Assuming minimal sub-sample sizes of at least 18 ASD and 40 non-ASD diagnosed individuals, power to detect AUCs≥.72 was at least good (≥.80). Statistical power to detect group differences across webcam performance measures, assuming a minimum sample size of 24, was at least adequate (>.82) if large group differences were observed (d≥.80; α=.05, two-tailed). For larger group sizes (n>40), power was adequate, even for medium effects (d≥.50).

2.9.8. Statistical Analysis Implementation

Statistical significance was set at α=.05, two-tailed, and effect size magnitude was emphasized. Data preparation, descriptive analyses, internal consistency reliability using Cronbach’s alpha (α), and bivariate correlations were computed in SPSS v28 (IBM Corp, 2021). ROC analyses were computed using the R package pROC and implemented in version 4.1.2 (R Core Team, 2021) using R Studio version 2021.09.1.

3. Results

3.1. Pilot Evaluation Results

Clinicians used a wide range of hardware setups and reported high relevance of the paradigms to their respective NDGS or idiopathic NDD group (Appendix 19). Clarity of instructions and quality of audio and visual stimuli was rated as high. Timing was rated as generally moderate (neither fast nor slow). Several potential concerns about target difficulty levels were raised and used to adjust the final stimuli.

Parents rated the overall experience as positive and of relatively moderate difficulty across paradigms (Appendix 20). Patient participants did not require breaks, looked away from the screen with variable frequency (every 5–10 seconds to only a few losses of attention to screen), covered or touched their face only infrequently, and required variable levels of physical, gestural, or verbal assistance to maintain motivation and attention. Unexpected intrusions and adjustments to lighting were infrequent. Overall attention was rated as average to good. Paradigm relevance to the patient’s condition was rated as “relevant” to “highly relevant” across paradigms. Quality of audio and visual stimuli was rated as high, and timing was judged to be generally moderate to fast. These data were used to adjust parent training processes and to include reminders to limit assistance to motivation and general attention (not specific to a stimulus or desired response).

3.2. Sample Characteristics

A total of 395 individuals enrolled to participate before 04/05/2023 (recruitment is ongoing). Of these, 20 did not attempt the baseline webcam paradigms; all 375 who did attempt them achieved at least 1 valid measure (Appendix 21). Longitudinal attrition was modest at 1-month follow-up (n=54 did not attempt; n=341 attempted) but higher at 4-month follow-up (n=100 did not attempt; n=295 attempted).

Table 1 presents sample characteristics, which were highly consistent with those in our recent survey validation study (Frazier et al., 2023). Specifically, participants were younger in the NFIX and SYNGAP1 groups and older in the PHTS and idiopathic NDD groups, with high rates of spousal informants in the latter groups. All groups had very high proportions of White/Caucasian participants, although Hispanic ethnicity approximated US population proportions in most groups, and the sample had a wide range of household incomes. Estimated cognitive levels were lowest in the NFIX, SYNGAP1, and other NDGS groups and, to a lesser extent, in the PHTS group relative to control groups. Informant-reported developmental diagnoses were highly variable across NDGS groups, but with elevated rates of ASD, ID, anxiety, and motor disorder in the NFIX, SYNGAP1, and other NDGS groups compared to controls. Participants were predominantly from the US (n=325, 87%), but a small minority of participants with informants fluent in English were also included from other countries (United Kingdom n=17, Canada n=24, Australia n=4, New Zealand n=1, Ireland n=2, Netherlands n=1, Israel n=1).

Table 1.

Demographic and clinical characteristics by study group.

Sibling Controls | Unrelated Controls | PHTS | NFIX | SYNGAP1 | Other NDGS | NDD | χ² / F (p)
(values are n (%) unless noted as M, SD)
N 40 116 33 24 43 63 56
Informant Age (M, SD) 42 (6) 42 (9) 43 (8) 41 (10) 42 (8) 44 (8) 42 (8) 0.6 (.718)
Informant Sex (% Female) 37 (93%) 95 (82%) 28 (85%) 21 (88%) 39 (91%) 61 (97%) 51 (91%) 12.3 (.424)
Informant Relationship to Participant 39.3 (.003)
 Biological Parent 39 (98%) 99 (85%) 25 (76%) 23 (96%) 40 (93%) 59 (93%) 44 (79%)
 Adoptive or Custodial Parent 0 (0%) 3 (3%) 1 (3%) 1 (4%) 1 (2%) 4 (6%) 2 (4%)
 Other Biological Relative / Sibling 1 (2%) 7 (6%) 0 (0%) 0 (0%) 1 (2%) 0 (0%) 3 (5%)
 Spouse/Other Non-Biological Relative 0 (0%) 7 (6%) 7 (21%) 0 (0%) 1 (2%) 0 (0%) 7 (12%)
Household Income (US $) 79.7 (.013)
 <$25,000 1 (3%) 5 (4%) 2 (6%) 0 (0%) 0 (0%) 2 (3%) 8 (14%)
 $25,000–$34,999 2 (5%) 8 (7%) 0 (0%) 2 (8%) 1 (2%) 1 (2%) 2 (4%)
 $35,000–$49,999 1 (3%) 5 (4%) 1 (3%) 3 (13%) 3 (7%) 3 (5%) 6 (11%)
 $50,000–$74,999 6 (15%) 18 (16%) 9 (27%) 4 (17%) 3 (7%) 4 (6%) 11 (20%)
 $75,000–$99,999 2 (5%) 21 (18%) 3 (9%) 3 (13%) 4 (9%) 6 (10%) 4 (7%)
 $100,000–$149,999 7 (18%) 28 (24%) 7 (21%) 4 (17%) 10 (23%) 16 (25%) 11 (20%)
 $150,000–$199,999 7 (18%) 14 (12%) 4 (12%) 5 (21%) 10 (23%) 6 (10%) 6 (11%)
 $200,000+ 6 (15%) 13 (11%) 2 (6%) 2 (8%) 7 (16%) 12 (19%) 5 (9%)
 Did not report 8 (20%) 4 (3%) 5 (15%) 1 (4%) 5 (11%) 13 (21%) 3 (5%)
Participant Age (M, SD) 11 (5) 12 (8) 17 (13) 10 (7) 10 (7) 11 (6) 16 (9) 4.8 (<.001)
Participant Sex (% Female) 23 (58%) 63 (54%) 13 (39%) 12 (50%) 19 (44%) 36 (57%) 21 (38%) 8.6 (.197)
Participant Race / Ethnicity
 White / Caucasian 36 (90%) 95 (82%) 30 (91%) 24 (100%) 37 (86%) 58 (92%) 46 (82%) 9.6 (.142)
 Black / African American 3 (8%) 9 (8%) 2 (6%) 0 (0%) 5 (12%) 5 (8%) 8 (14%) 5.6 (.473)
 Middle Eastern or North African 2 (5%) 1 (1%) 1 (3%) 0 (0%) 0 (0%) 0 (0%) 0 (0%) 9.1 (.167)
 East Asian 2 (5%) 9 (8%) 3 (9%) 0 (0%) 2 (5%) 5 (8%) 2 (4%) 3.8 (.697)
 South Asian 2 (5%) 8 (7%) 0 (0%) 0 (0%) 1 (2%) 3 (5%) 0 (0%) 8.2 (.223)
 Native American / Alaskan Native 0 (0%) 3 (3%) 1 (3%) 1 (4%) 0 (0%) 0 (0%) 1 (2%) 4.5 (.605)
 Native Hawaiian / Pacific Islander 0 (0%) 1 (1%) 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%) 2.2 (.896)
 Hispanic 7 (18%) 21 (18%) 1 (3%) 5 (21%) 7 (17%) 2 (3%) 11 (20%) 18.7 (.096)
 Unknown 0 (0%) 2 (2%) 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%) 4.5 (.611)
 Did not report 0 (0%) 2 (2%) 0 (0%) 0 (0%) 0 (0%) 0 (0%) 1 (2%) 3.6 (.734)
Cognitive Level (informant-estimated) 337.9 (<.001)
 Very high or above (120+) 6 (15%) 12 (10%) 3 (9%) 0 (0%) 0 (0%) 1 (2%) 10 (18%)
 High Average (110–119) 18 (45%) 58 (50%) 6 (18%) 0 (0%) 0 (0%) 0 (0%) 19 (34%)
 Average (90–109) 13 (33%) 42 (36%) 15 (46%) 0 (0%) 1 (2%) 2 (3%) 22 (39%)
 Below average (80–89) 0 (0%) 0 (0%) 1 (3%) 2 (8%) 4 (9%) 6 (10%) 2 (4%)
 Borderline impairment (70–79) 0 (0%) 0 (0%) 2 (6%) 2 (8%) 1 (2%) 2 (3%) 0 (0%)
 Mild impairment (55–69) 0 (0%) 0 (0%) 1 (3%) 5 (21%) 6 (14%) 12 (19%) 3 (5%)
 Moderate impairment (40–54) 0 (0%) 0 (0%) 2 (6%) 9 (38%) 11 (26%) 17 (27%) 0 (0%)
 Severe impairment (21–39) 0 (0%) 0 (0%) 0 (0%) 2 (8%) 10 (23%) 12 (19%) 0 (0%)
 Profound impairment (<20) 0 (0%) 0 (0%) 0 (0%) 2 (8%) 5 (12%) 3 (5%) 0 (0%)
 Did not report 3 (8%) 4 (3%) 3 (9%) 2 (8%) 5 (12%) 8 (13%) 0 (0%)
Cognitive Estimate from Prior Testing 6 (15%) 19 (16%) 16 (49%) 13 (54%) 21 (49%) 30 (48%) 26 (46%) 57.1 (<.001)
Developmental Diagnoses (n, %)
 ASD 9 (27%) 5 (21%) 35 (81%) 32 (51%) 8 (14%) 54.8 (<.001)
 ID/GDD 10 (30%) 21 (88%) 39 (91%) 58 (92%) 1 (2%) 141.3 (<.001)
 Speech/language disorder 9 (27%) 11 (46%) 32 (74%) 40 (64%) 10 (18%) 44.2 (<.001)
 ADHD 5 (15%) 1 (4%) 6 (14%) 16 (25%) 26 (46%) 24.0 (<.001)
 ODD/CD 0 (0%) 1 (4%) 4 (9%) 2 (3%) 4 (7%) 4.4 (.353)
 Anxiety disorder 7 (21%) 8 (33%) 8 (19%) 10 (16%) 18 (32%) 6.4 (.174)
 Specific learning disorder 2 (6%) 0 (0%) 1 (2%) 4 (6%) 5 (9%) 3.6 (.460)
 Motor / coordination disorder 4 (12%) 6 (25%) 24 (56%) 21 (33%) 0 (0%) 45.5 (<.001)
 Depressive disorder 5 (15%) 0 (0%) 0 (0%) 0 (0%) 10 (18%) 23.8 (<.001)
 Bipolar disorder / mania 0 (0%) 0 (0%) 0 (0%) 1 (2%) 1 (2%) 1.7 (.789)
 Obsessive compulsive disorder 0 (0%) 0 (0%) 4 (9%) 2 (3%) 2 (4%) 6.1 (.192)
 Tic disorder 0 (0%) 0 (0%) 1 (2%) 1 (2%) 1 (2%) 1.2 (.882)
 Feeding / eating disorder 0 (0%) 0 (0%) 11 (26%) 10 (16%) 0 (0%) 27.5 (<.001)
Baseline Webcam Evaluation Validity 57.4 (<.001)
1–3 measures valid (n, %) 0 (0%) 0 (0%) 0 (0%) 1 (4%) 0 (0%) 3 (6%) 0 (0%)
4–11 measures valid (n, %) 9 (22%) 29 (25%) 7 (21%) 10 (42%) 30 (69.8%) 30 (47%) 13 (23%)
All measures valid (n, %) 31 (78%) 87 (75%) 26 (79%) 13 (54%) 13 (30.2%) 30 (47%) 43 (77%)
Number of Valid Measures (M, SD) 11.3 (1) 11.2 (2) 11.1 (2) 9.9 (3) 10.0 (2) 9.8 (3) 11.3 (2) 6.3 (<.001)

Note. ASD=autism spectrum disorder. ID/GDD=intellectual disability/global developmental delay, ADHD=Attention-Deficit/Hyperactivity disorder; ODD/CD=oppositional defiant disorder/conduct disorder. Non-ASD diagnoses do not sum to 100% because children could be diagnosed with more than one condition. Note that race/ethnicity categories are not mutually exclusive and participants were encouraged to select all options that apply. For statistical tests with low cell sizes, Fisher’s exact test was also computed, but results were highly consistent with the chi-square analysis. For this reason, chi-square is reported with the associated p-value.

3.3. Evaluation Validity

Evaluation validity was high across all groups, but the NFIX, SYNGAP1, and other NDGS groups had higher proportions of individuals with at least one invalid measure (Table 1). On average, all groups had at least 10 valid performance measures. Participants with reported ID had lower measure validity proportions than participants without ID, but measure validity never dropped below 84% (Table 2).

Table 2.

Valid administration and reliability metrics for webcam-based performance measures.

# | Measure | Stimulus Paradigm | # of Indicators | Evaluation Validity Overall % | % Valid, no ID | % Valid, ID | Internal Consistency Reliability (Cronbach's α) | 1-Month Test-Retest Reproducibility (r) | 4-Month Test-Retest Stability (r)
1 Overall Attention All 15 100% 100% 100% .89 .52 .50
2 Attentional Scanning Processing Speed 12 87% 89% 84% .94 .66 .64
3 Positive Emotion Social 32 100% 100% 100% .93 .63 .62
4 Negative Emotion Social 32 100% 100% 100% .95 .44 .38
5 Social Attention Social 141 92% 95% 89% .89 .62 .64
6 Social Preference Social 69 92% 95% 89% .75 .48 .40
7 Face Preference Social 28 92% 94% 88% .90 .37 .29
8 Non-social Preference Social 42 92% 95% 89% .67 .31 .31
9 Receptive Vocabulary Receptive Vocabulary 39 94% 96% 89% .93 .73 .72
10 Speed to Faces Social 28 92% 94% 88% .93 .29 .29
11 Speed to Object Processing Speed 12 87% 89% 84% .95 .53 .51
12 Reading Accuracy Single-word Reading 46 96% 99% 91% .91 .68 .72

Note. # of indicators refers to the number of areas-of-interest (these could be whole videos or whole stimuli if areas-of-interest are combined) included in computing the measure. Validity proportions are given for baseline data and are estimated by including all individuals who attempted to complete the webcam performance paradigm. Fair test-retest reliability values for overall attention are likely due in part to restricted range, as many individuals obtain values near 95–100%. Low test-retest reliability values for negative emotion are likely a function of a very limited score range, with many individuals falling at 0% expression intensity.

Score distributions were variable across measures, with many showing near normal distributions, and all but negative emotion suggesting a good quantitative range (Appendix 22). The latter was highly skewed and kurtotic with scores clustered close to 0%.

3.4. Reliability

Internal consistency reliability was good to excellent for nearly all performance measures (α=.75–.95; Table 2), the exception being non-social preference, where reliability was lower but still adequate for a low-frequency behavior (α=.67). Test-retest reproducibility estimates were fair or above for 9 of the 12 scales (r=.44–.73), with the two face-processing measures and the non-social preference measure showing less stability. Test-retest stability was fair or above for 8 of the 12 measures (r=.40–.72), and the highest stability estimates were for receptive vocabulary and single-word reading. The face-processing, non-social preference, and negative emotional expressiveness scales showed lower stability, the last of which just missed the cutoff for fair test-retest stability. Similar levels were observed when only NDGS patients were examined.
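Cronbach's α, the internal consistency index reported above, can be computed directly from an items × respondents score matrix; the following is a minimal sketch with toy data, not the study's scoring pipeline:

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score lists (items x respondents).
    Uses population variances consistently in numerator and denominator."""
    k = len(items)                 # number of items
    n = len(items[0])              # number of respondents

    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    total = [sum(item[j] for item in items) for j in range(n)]  # total scores
    item_var_sum = sum(var(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / var(total))

# Two perfectly parallel toy items -> alpha = 1.0
print(cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4]]))  # 1.0
```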

3.5. Convergent and Discriminant Validity

All performance measures, except positive and negative emotional expressiveness, showed strong evidence of convergent and discriminant validity (Table 3). Given the unique nature of gaze-based measures and the difference in measurement modality (gaze vs. informant-report), convergent validity was generally quite good (r=.21–.62). Similarly, discriminant validity estimates were generally quite low (r=.07–.24). The lack of convergent validity for the emotional expressiveness measures likely reflects the absence of closely related behavioral constructs among the available informant-report measures.

Table 3.

Predicted convergent and discriminant validity associations for selected webcam measures.

Webcam Measure | Convergent Validity Measures | Convergent Average |r| | Discriminant Validity Measures | Discriminant Average |r| | t (p)
Overall Attention Estimated IQ, ADHD Symptoms, Executive Functioning .30 Anxiety, Mood, Challenging Behavior .17 2.53 (.012)
Attentional Scanning Estimated IQ, ADHD Symptoms, Executive Functioning .43 Anxiety, Mood, Challenging Behavior .23 4.10 (<.001)
Positive Emotion Mood-Hypomania, Anxiety .10 Motor, Daily Living Skills .07 0.47 (.635)
Negative Emotion Mood-Emotion Regulation, Anxiety .09 Motor, Daily Living Skills .11 −0.33 (.742)
Social Attention Autism Symptoms .55 Anxiety, Mood .23 6.95 (<.001)
Social Preference Social Communication / Interaction Symptoms .36 Anxiety, Mood .16 3.69 (<.001)
Face Preference Social Communication / Interaction Symptoms .26 Anxiety, Mood .12 2.50 (.013)
Non-social Preference Social Communication / Interaction Symptoms, Restricted / Repetitive Behavior .21 Anxiety, Mood .09 2.27 (.024)
Receptive Vocabulary Estimated IQ, Speech Level, Social Communication / Interaction Symptoms .29 Anxiety, Mood, Sleep .14 2.38 (.018)
Speed to Faces Social Communication / Interaction Symptoms .25 Anxiety, Mood, Challenging Behavior .12 2.43 (.016)
Speed to Object Estimated IQ .47 Anxiety, Mood, Challenging Behavior .24 3.70 (<.001)
Reading Accuracy Reading Fluency Level .62 Anxiety, Mood, Sleep .14 8.05 (<.001)

Note. Convergent and discriminant validity correlations were averaged after conversion to Fisher’s z and then re-converted to correlations. Average convergent and discriminant validity correlations were compared using the test of dependent correlations with the nuisance correlation being the average of the inter-correlations between the convergent and discriminant validity measures.
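The Fisher z averaging described in the note (transform each correlation, average, back-transform) can be sketched as follows; the values are illustrative:

```python
from math import atanh, tanh

def average_correlations(rs):
    """Average correlations via Fisher z: transform each r with atanh,
    take the mean, and back-transform with tanh."""
    zs = [atanh(r) for r in rs]
    return tanh(sum(zs) / len(zs))

# Fisher z averaging weights large correlations more than a raw mean would:
print(average_correlations([0.2, 0.8]))  # slightly above the raw mean of 0.5
```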

Inter-correlations among the performance measures tended to be small to moderate (Appendix 23), with a few notable exceptions (speed to faces with face preference r=−.79 and receptive vocabulary with reading accuracy r=.78). The former may suggest redundancy of these measures, but the latter correlation likely reflects the close relationship between vocabulary and reading and represents a realistic estimate of the association between these two constructs.

3.6. Concurrent Validity with ID, ASD Diagnosis, and Autism Symptom Level

Participants with ID showed statistically significant differences from those without ID across all performance measures (Table 4), including lower levels of general attention, attentional scanning, social attention, social preference, face preference, receptive vocabulary, and single-word reading, and slower speed to faces and objects. Interestingly, individuals with ID showed higher positive and negative emotional expressiveness.
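The effect sizes in Table 4 can be approximated from the summary statistics alone; the following sketch assumes the standard pooled-SD form of Cohen's d (the exact variant used is not stated):

```python
from math import sqrt

def cohens_d_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d from group means, SDs, and sizes using the pooled SD;
    positive when group 1 scores higher than group 2."""
    pooled = sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled

# Overall Attention row of Table 4, using the rounded table values:
print(round(cohens_d_from_summary(82.1, 14, 224, 70.5, 17, 151), 2))  # 0.76
```

With the rounded table values this gives approximately .76, close to the reported d of .75.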

Table 4.

Descriptive statistics for webcam-collected performance measures across cases with and without Intellectual Disability (ID).

Measure | No ID (n=224) M (SD) | ID (n=151) M (SD) | Raw Δ | t (p) | Cohen's d
Overall Attention (%) 82.1 (14) 70.5 (17) +11.6% (1.8 min total) 7.1 (<.001) .75
Attentional Scanning (Count) 11.6 (3.4) 7.9 (2.5) +3.7 glances to each target 9.9 (<.001) 1.20
Positive Emotion (%) 6.4 (8.7) 10.3 (9.1) −3.9% intensity −4.2 (<.001) −.44
Negative Emotion (%) 2.2 (3.0) 3.4 (4.1) −1.2% intensity −3.3 (.001) −.35
Social Attention (z) −.02 (1.0) −1.52 (1.3) +1.5 control SDs 11.9 (<.001) 1.14
Social Preference (FD) 1.4 (0.3) 1.2 (0.3) +0.2 seconds per AOI 6.0 (<.001) .68
Face Preference (FD) 1.3 (0.8) 0.8 (0.5) +0.5 seconds per AOI 6.1 (<.001) .70
Non-social Preference (FD) 1.1 (0.4) 1.2 (0.4) −0.1 seconds per AOI −2.1 (.038) −.24
Receptive Vocabulary (FD) 41.9 (25.7) 17.1 (13.6) +24.8 seconds to all targets 10.1 (<.001) 1.13
Speed to Faces (TFF) 7.2 (2.1) 8.0 (1.8) −0.8 seconds per AOI −3.3 (<.001) −.37
Speed to Object (TFF) 4.9 (1.3) 6.1 (1.2) −1.2 seconds per AOI −7.2 (<.001) −.87
Reading Accuracy (FD) 37.9 (22.8) 16.6 (14.1) +21.3 seconds to all targets 9.2 (<.001) 1.06

Note. ID=Intellectual disability (defined as parent-report of ID/GDD or estimated IQ<70). Overall attention (%) is the percentage of time on screen throughout all stimulus paradigms. Count=sum of glances to all targets averaged across stimuli. TFF=time to first fixation – values represent averages across all stimuli, including those that were not fixated where the length of the stimulus was imputed. AOI=area-of-interest. Values for positive and negative emotion represent estimated intensities with a range of 0–100%. Higher values are preferable for all measures except Speed to Faces and Speed to Objects where higher values indicate slower time to the AOIs, Non-Social Preference where higher values indicate a preference for non-social information, and Positive and Negative Emotion measures where higher scores simply indicate more expressiveness. Social attention is presented as a z-score (based on the neurotypical control mean) because this measure is created by averaging multiple different metrics (fixation duration, fixation count, and time-to-first fixation) after standardization.
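The composite construction described in the note (standardize each metric against the neurotypical control distribution, reverse-score metrics where lower raw values are better, then average) can be sketched as follows; the metric names and values are hypothetical:

```python
def composite_z(metrics, control_means, control_sds, reverse=()):
    """Average several metrics into one composite z-score. Each metric is
    standardized against the neurotypical control mean/SD; metrics named in
    `reverse` (e.g. time-to-first-fixation, where lower is better) are
    sign-flipped so that higher composite values always mean better performance."""
    zs = []
    for name, value in metrics.items():
        z = (value - control_means[name]) / control_sds[name]
        zs.append(-z if name in reverse else z)
    return sum(zs) / len(zs)

# Hypothetical participant: longer fixations and faster orienting than controls
score = composite_z(
    {"fixation_duration": 1.5, "time_to_first_fixation": 4.0},
    control_means={"fixation_duration": 1.0, "time_to_first_fixation": 5.0},
    control_sds={"fixation_duration": 0.5, "time_to_first_fixation": 1.0},
    reverse=("time_to_first_fixation",),
)
print(score)  # 1.0
```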

Across subsamples, timepoints, and ages, the social attention measure showed moderate to high correlations (r=.32–.62) with autism symptom level (Appendix 24). Similarly, concurrent validity with ASD diagnosis consistently fell in the good to excellent range (AUC=.69–.88; Appendix 25), with evidence that diagnostic validity was maintained across evaluation timepoints. When the social attention measure was divided into clinically useful score ranges, multi-level likelihood ratios suggested meaningful reductions in ASD probability for low scores (z≤0.1) and increases in ASD probability for high scores (z≥1.81). The optimal cut score was 1.49, yielding 70% sensitivity and 87% specificity (Appendix 26).
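The cut-score statistics above (sensitivity, specificity, and likelihood ratios) follow directly from a 2×2 classification at a threshold; the sketch below uses toy labels, scores, and a threshold direction that are illustrative, not the study's data:

```python
def diagnostic_stats(labels, scores, cut):
    """Sensitivity, specificity, LR+, and LR- when score >= cut is called
    positive. Assumes at least one case on each side of the cut in each
    group, so no division by zero."""
    tp = sum(1 for l, s in zip(labels, scores) if l == 1 and s >= cut)
    fn = sum(1 for l, s in zip(labels, scores) if l == 1 and s < cut)
    tn = sum(1 for l, s in zip(labels, scores) if l == 0 and s < cut)
    fp = sum(1 for l, s in zip(labels, scores) if l == 0 and s >= cut)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return sens, spec, sens / (1 - spec), (1 - sens) / spec

# Toy data: 1=ASD, 0=non-ASD; threshold of 0.5
sens, spec, lr_pos, lr_neg = diagnostic_stats(
    [1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.2, 0.7, 0.1, 0.05], 0.5
)
print(sens, spec, lr_pos, lr_neg)
```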

3.7. Group Profiles Across Performance Measures

Group differences were statistically significant across all performance measures (largest p=.041; eta-squared=.04–.36). In general, the NFIX, SYNGAP1, and other NDGS groups showed a more impaired neurobehavioral phenotype, including lower attention, higher non-social preference, worse receptive vocabulary and single-word reading, and slower speed to faces and objects (Figure 1). PHTS patients showed lower social attention and social preference and higher non-social preference, consistent with the high rate of ASD in this group, but only mild reductions in receptive vocabulary and single-word reading and no deficits in overall attention or attentional scanning. Interestingly, SYNGAP1 and other NDGS patients had higher negative emotional expressiveness scores, while NFIX and other NDGS patients showed higher positive emotional expressiveness scores, implying syndrome-specific patterns even among more significantly impaired groups (Appendix 27). Taken together, these findings provide preliminary evidence of concurrent (known-groups) validity of the performance measures.
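Eta-squared, the effect size reported for these group comparisons, is the ratio of between-group to total sum of squares in a one-way layout; a minimal sketch with toy groups:

```python
def eta_squared(groups):
    """Eta-squared for a one-way design: between-group SS / total SS.
    `groups` is a list of lists of scores, one list per group."""
    all_vals = [x for g in groups for x in g]
    grand = sum(all_vals) / len(all_vals)
    ss_total = sum((x - grand) ** 2 for x in all_vals)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups
    )
    return ss_between / ss_total

# Identical group means -> no between-group variance explained
print(eta_squared([[1, 2, 3], [1, 2, 3]]))  # 0.0
```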

Figure 1.

Figure 1.

NDGS group differences across webcam measures.

Note. SC=sibling controls, UC=unrelated controls, PHTS=PTEN Hamartoma Tumor Syndrome. Other NDGS=other neurodevelopmental genetic syndromes, and NDD=idiopathic neurodevelopmental disability.

4. Discussion

This research aimed to describe a comprehensive process for creating a set of objective, webcam-collected measures, derived using artificial intelligence algorithms that capture gaze and facial expression information, and based on gold-standard measurement development guidelines (Boateng et al., 2018) as well as principles of inclusive research practices (FDA, 2009). The process involved both clinician-scientists and families and was undertaken to provide a preliminary validation of these patient performance measures by examining a range of key psychometric characteristics. Results suggest that these measures may serve as promising new objective evaluation tools that can complement our recently validated informant-report survey scales (Frazier et al., 2023), permitting multi-method characterization of key social and cognitive characteristics among individuals with NDGS. To our knowledge, the webcam measures and associated survey instruments are the first dedicated set specifically developed to assess the wide range of neurobehavioral and neurodevelopmental presentations seen in NDGS, including individuals with significant cognitive challenges. This initial validation demonstrated that the performance measures are psychometrically sound instruments with potential utility in characterizing the varied clinical and functional spectra seen in many people with NDGS and idiopathic NDD. The validation further highlights the potential value of artificial intelligence / machine learning algorithms for collecting key biometric information that can be used to better understand individuals with NDGS.

All of the measures showed strong evaluation validity and can be collected in many individuals with mild to moderate cognitive dysfunction. There was a clear gradient of invalid collection in people with more severe cognitive dysfunction, but some individuals reported to be at the more severe levels could validly complete one or more performance measures. Scale reliability was fair to excellent across all webcam measures, indicating good ability to measure individual differences cross-sectionally across each of the neurobehavioral processes assessed. Test-retest reproducibility and stability were at least acceptable across the majority of measures. Specifically, test-retest reliability was good for attentional scanning, positive emotional expressiveness, social attention, receptive vocabulary, and single-word reading and was fair for sustained attention, social preference, and speed to objects. This indicates that changes in these measures are relatively stable over time, increasing the likelihood that changes reflect real differences in neurobehavioral functioning. Test-retest reliability estimates were lower for negative emotional expressiveness, non-social attentional preference, face preference, and speed to faces. When considered in light of adequate or better scale reliability for these measures, the present results suggest these measures may be more state-like in nature. Observations of the score distributions for negative emotional expression and non-social preference suggest that lower test-retest reliability for these measures may be influenced by floor effects and, therefore, may be under-estimated. Future work is needed to examine score stability over a longer time interval to ensure an adequate balance of stability and sensitivity to change. 
If sensitivity to change is demonstrated, the quantitative nature, relative brevity, and high evaluation validity of webcam measures might allow for more frequent assessments in the context of intervention studies, thereby increasing statistical power and reducing the sample size needed for clinical trials. This is particularly important for studies of rare NDGS.

Lower test-retest reliability for measures of face processing is intriguing and may be due to factors influencing attention to faces, including the fact that many stimuli included multiple faces as well as other target or background stimuli. It is possible that follow-up evaluations may bias attention towards novel faces (faces not processed as comprehensively in the baseline assessment) or other novel environmental stimuli. It is also possible that face processing is simply more state-like in nature, with reliable collection at each assessment, but rapid changes in quantitative level across hours or days. Future work is needed to tease out these possibilities and examine whether stimulus complexity moderates stability for these measures. Beyond floor effects, lower stability for non-social preference is likely, in part, a function of the less frequent nature of attention to socially-irrelevant information. It may also be useful for future iterations of the social stimuli to include a larger number of non-social or background objects to increase the reliability of this measure. Lower stability for negative emotional expressiveness may be, at least partly, due to the low number of negative facial expressions observed across all participants and is likely influenced by the state-like nature of emotional expressiveness. Adding stimuli that specifically pull for negative emotionality could enhance the test-retest reliability of this measure. Even with these exceptions, all performance measures showed group differences in the baseline data collection, suggesting good known-groups validity and potential value for cross-sectional characterization.

Given their scalability, webcam-collected performance measures also may have utility in clinical contexts for supplementing collection of traditional neurobehavioral measures, allowing more frequent collection between clinical visits, greater inclusion in research, and higher quality data via home-based collection. If offered at minimal cost with automated administration, scoring, and reporting functions to reduce clinician burden, these measures could become a key part of ongoing developmental monitoring strategies. This is further supported by the brevity (max 15 minutes) of administering all 4 paradigms and the potential to collect only those measures that are relevant to a given patient in future clinical assessments. Future research and collection of large-scale normative data is warranted to determine whether this potential clinical value might be realized and, more importantly, to further evaluate psychometric performance.

Finally, the present results provide preliminary evidence of concurrent (known-groups) validity of webcam measures across NDGS and in comparison to neurotypical controls and idiopathic NDD. The pattern of substantial reductions in many cognitive processes in NFIX, SYNGAP1, and other NDGS is consistent with our recently published informant-report patterns for many neurobehavioral domains (Frazier et al., 2023). Interestingly, there are some unique patterns among these groups, particularly in the pattern for positive and negative emotional expressiveness, but also in the magnitude of impairments for other domains. For example, people with SYNGAP1 mutations showed generally worse attention, slower processing speed to faces and objects, and lower social but higher non-social preference than people with NFIX mutations.

Relative to other NDGS groups, individuals with PHTS tended to show a less impacted social and cognitive profile. Specifically, this group showed no significant impairment in overall attention, attentional scanning, or processing speed measures and only slight reductions in receptive vocabulary and reading accuracy. This is consistent with a spectrum of neurobehavioral dysfunction in PHTS (Busch et al., 2023) and the observation that many individuals have either no or mild reductions in neurocognitive function relative to normative expectation (Busch et al., 2013). Additional data collection in larger NDGS samples will be required to replicate and extend the findings reported here. This work will also need to evaluate the influence of additional clinical factors (e.g., seizures, ID, etc.) on developmental trends.

Several limitations of the current study warrant mention. The genetic syndromes included in this study have a low prevalence and, thus, sample sizes remain modest, particularly given the wide age range. While our power analysis indicated at least adequate power for group comparisons and psychometric analyses were well powered in the full sample, our current data should nevertheless be treated as preliminary, and studies with larger group sample sizes should be completed to replicate our findings and ensure they generalize to the larger population of these NDGS. Given the online nature of the research, it was not feasible to conduct in-person clinical characterization. As a result, this study could not independently confirm the diagnostic status of participants and was not able to administer dedicated in-person cognitive and behavioral assessments. However, previous studies have demonstrated that parent-report of children’s IQ strongly correlates with standardized clinical IQ testing (Shu et al., 2022), and a substantial minority of estimates in this study were based on prior testing (42%). Future work should collect well-validated in-person cognitive assessments to more accurately characterize the sample and examine how webcam measures relate to traditional standardized measures of cognitive and behavioral functioning.

Longitudinal investigations with larger NDGS samples and longer follow-up will also be critical for evaluating age effects and changes in neurobehavioral processes across development, as well as sensitivity to intervention effects. Further, given the preliminary nature of this study, it was not possible to include a comprehensive set of additional instruments to establish convergent and divergent validity. Thus, additional validation work, including convergent and discriminant validity analyses, is needed to provide further support for these webcam measures.

In spite of noted limitations, the present results suggest that webcam-collected gaze and facial expression-based performance measures are promising with evidence that they may function as reliable and valid assessment tools, covering key social and cognitive domains not easily evaluated by informant-report surveys. As such, they may be useful for detailed phenotypic characterization and, ultimately, as reliable, objective, and feasible outcome measures in clinical trials. With additional validation, and sufficient norming, these measures could also facilitate surveillance and clinical assessment for NDGS and idiopathic NDD.

5. Conclusions

The present study provides preliminary evidence that webcam-collected performance measures, derived using artificial intelligence algorithms for capturing gaze and facial expression data, can reliably capture individual and between group differences in neurobehavioral function. Future longitudinal investigations with larger NDGS and idiopathic NDD samples will be crucial to further evaluate these measures and determine their potential clinical and research utility.

Acknowledgments

We are sincerely indebted to the generosity of the families and individuals who contributed their time and effort to this study. We would also like to thank the PTEN Hamartoma Tumor Syndrome Foundation, the PTEN Research Foundation, the SYNGAP Research Fund, the Malan Syndrome Foundation, and the ADNP Kids Foundation for their support of this project.

We are grateful to all of the families at the participating Simons Searchlight sites as well as the Simons Searchlight Consortium, formerly the Simons VIP Consortium. We also appreciate obtaining access to the phenotypic data on SFARI Base. Approved researchers can obtain the Simons Searchlight population dataset described in this study by applying at https://base.sfari.org.

CE is the Sondra J. and Stephen R. Hardis Endowed Chair of Cancer Genomic Medicine at the Cleveland Clinic and an ACS Clinical Research Professor. MS is the Rosamund Stone Zander Chair at Boston Children’s Hospital.

Conflict of Interest

Dr. Frazier has received funding or research support from, acted as a consultant to, received travel support from, and/or received a speaker’s honorarium from the PTEN Research Foundation, SYNGAP Research Fund, Malan Syndrome Foundation, ADNP Kids Research Foundation, Quadrant Biosciences, Autism Speaks, Impel NeuroPharma, F. Hoffmann-La Roche AG Pharmaceuticals, the Cole Family Research Fund, Simons Foundation, Ingalls Foundation, Forest Laboratories, Ecoeos, IntegraGen, Kugona LLC, Shire Development, Bristol-Myers Squibb, National Institutes of Health, and the Brain and Behavior Research Foundation, is employed by and has equity options in Quadrant Biosciences/Autism Analytica, has equity options in MaraBio and Springtide, and has an investor stake in Autism EYES LLC and iSCAN-R. Dr. Kolevzon has received funding or research support from, or acted as a consultant to ADNP Kids Research Foundation, David Lynch Foundation, Klingenstein Third Generation Foundation, Ovid Therapeutics, Ritrova Therapeutics, Acadia, Alkermes, Jaguar Therapeutics, GW Pharmaceuticals, Neuren Pharmaceuticals, Scioto Biosciences, and Biogen. Dr. Sahin reports grant support from Novartis, Biogen, Astellas, Aeovian, Bridgebio, and Aucta. He has served on Scientific Advisory Boards for Novartis, Roche, Regenxbio, SpringWorks Therapeutics, Jaguar Therapeutics and Alkermes. Dr. Hardan is a consultant to Beaming Health and IAMA therapeutics. He also has equity options in Quadrant Biosciences/Autism Analytica, and has an investor stake in iSCAN-R. Dr. Shic has acted as a consultant to F. Hoffmann-La Roche AG Pharmaceuticals and Janssen Pharmaceuticals. The remaining authors have no competing interests to disclose.

Funding:

This study was funded by the PTEN Research Foundation (to Frazier and Uljarević), with additional support from the SYNGAP Research Fund, the Malan Syndrome Foundation, the ADNP Kids Foundation, Autism Speaks, and the Simons Foundation Autism Research Initiative. The content is solely the responsibility of the authors and does not necessarily represent the official views of any of the funding agencies.

Appendix 1. Performance paradigm creation process.

graphic file with name nihms-1923134-f0002.jpg

Detailed description: The process included adapting social stimuli from our prior work and identifying cognitive paradigms that could be collected without speech or motor responses; identifying target items and creating individual stimuli for each paradigm; collecting feasibility data from 15 participants, including several young children with neurodevelopmental disabilities, across a wide age span (ages 3 to 68) to evaluate the viability of online data collection and inform grading of stimulus and item difficulty (where applicable); updating the stimulus paradigms and administering them to 10 clinician-scientist experts, 8 NDGS participants (5 PTEN, 1 SYNGAP1, 1 NFIX, and 1 ADNP), and 1 idiopathic NDD participant with ASD; and administering a post-evaluation questionnaire to assess ease of completion and any potential issues arising from performance paradigm administration.

Appendix 2. Receptive vocabulary target selection and stimulus creation.

Receptive vocabulary words were selected to be graded from very easy to moderately difficult. Specifically, to identify words, the research team first reviewed lists of vocabulary words spanning infant/toddler/preschool through high school, as well as SAT word lists, from common educational websites (e.g., https://www.education.com, www.time4learning.com, www.vocabulary.com). Up to 16 words were selected for each level (infant/toddler, preschool, kindergarten, grades 1–3, grades 4–6, grades 7–12, common SAT words). Emphasis was given to easier words in order to create stimuli that best differentiate between very low vocabulary levels (SS<55, age-equivalent<3), low vocabulary levels (55<SS<70, 3<age-equivalent<6), borderline vocabulary levels (70<SS<80, 6<age-equivalent<10), and low average vocabulary levels (80<SS<100, 10<age-equivalent<12). Once words were identified, the iWeb: 14 Billion Word Web Corpus (https://www.english-corpora.org/iweb/) was used to examine word frequency. The iWeb corpus is part of the English-Corpora.org family that also includes the one-billion-word Corpus of Contemporary American English (COCA), a large, up-to-date corpus of English balanced across many genres. Potential words were then sorted by word frequency and compared to existing receptive vocabulary tests (Peabody Picture Vocabulary Test – Fifth Edition and Receptive One-Word Picture Vocabulary Test-4) to identify approximate age levels for sets of words. This was done by comparing the word frequency of items from the existing instruments to the word frequency of candidate words.

Final word sets were then chosen to be comparable to the lowest age levels of existing instruments (~ages 2–3) with very high word frequency (>990,000; 8 words chosen), preschool to grade 1 (~ages 4–6) with high word frequency (200k to 900k; 7 words chosen), grade 2 to grade 5 (~ages 7–10) with moderate word frequency (100k to 199k; 7 words chosen), grades 6–12 (~ages 11–17) with moderate to low word frequency (15k to 99k; 10 words chosen), and grade 13+/SAT (~ages 18+) with low to very low word frequency (<15k; 5 words chosen). This resulted in 37 total words. Clip art and photos were then chosen to represent each word, with selections verified by the principal investigator and a parent of a child with autism spectrum disorder and intellectual disability not affiliated with the investigative team.
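The frequency-based grading above can be summarized as a simple banding function. This is an illustrative sketch only: the cut points approximate the ranges listed in this appendix (the source leaves a small gap between 900k and 990k, bridged here), and real counts would come from iWeb/COCA frequency exports.

```python
# Hypothetical sketch of the word-frequency difficulty banding described above.
# Thresholds approximate the appendix ranges; not the authors' actual tooling.

def difficulty_band(word_frequency: int) -> str:
    """Map a corpus frequency count to the grade band used for word selection."""
    if word_frequency > 990_000:
        return "ages 2-3 (very high frequency)"
    if word_frequency >= 200_000:
        return "preschool-grade 1 (high frequency)"
    if word_frequency >= 100_000:
        return "grades 2-5 (moderate frequency)"
    if word_frequency >= 15_000:
        return "grades 6-12 (moderate-to-low frequency)"
    return "grade 13+/SAT (low to very low frequency)"
```

For example, a word with an iWeb count of 150,000 would fall in the grades 2–5 band, matching the 100k–199k range above.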

A set of 37 distractor words was also identified to be roughly equivalent in word frequency to the target words. Pictures were also chosen for these distractors in the same manner as described above.

Appendix 3. Single word reading test item selection and stimulus creation

A similar set of procedures for item selection as described above was followed in development of the single-word reading test, with the following exceptions:

  1. Words were chosen by first inspecting existing word reading tasks (WIAT-3, WRAT-4, etc.), identifying words from the above receptive vocabulary lists that were not used, and looking for words of comparable word frequency to those used in existing single-word reading tests. Word lists of common 2-letter through 6-letter words were searched for easy to moderately difficult words. Synonyms and words with pronunciation difficulty similar to difficult words from existing tests were used to populate potential difficult words.

  2. To ensure heavy coverage of easier words to allow detection of simple reading in impaired individuals, the focus was placed on 2-letter through 5-letter words. Four 2-letter words, ten 3-letter words, four 4-letter words, and five 5-letter words considered very to moderately easy to read were chosen for the final list (23 easy reading difficulty words). Next, two 5-letter words, four 6-letter words, and six 7-letter words of moderate difficulty were chosen (12 moderate reading difficulty words). Finally, eleven 5–10 letter words deemed of moderate to high reading difficulty rounded out the final list (46 total words). Difficulty was assessed by matching each word to words of similar length and complexity on existing single-word reading tests and by inspecting word frequency results from COCA (see link above for the Corpus of Contemporary American English). Specifically, word length and word frequency were used to identify very easy (2- and 3-letter words with very high frequency, 1,000,000+), easy (4- and 5-letter words with high to very high frequency, 250,000+), moderate (5–7 letter words with silent letters in pronunciation and/or moderate to very high frequency, 50k+), and difficult words (6–10 letters with complex pronunciation and generally low word frequency, <250k).

Stimulus creation

A similar set of procedures for stimulus creation was used to that implemented for the receptive vocabulary test, except:

  1. Rather than pictures representing vocabulary words, words were spelled out in white font on a black screen.

  2. Words were divided into 14 stimuli of various sizes – 2 target words, 3 target words, and 4 target words. Stimuli with 2–3 target words were presented in varying arrays (left/right; top/bottom; top/middle/bottom; 4 squares). See Table 2 for the target word, spatial array, and timing for each word.

  3. Since multiple target words were presented for each stimulus, each target was randomly arrayed so that participants could not anticipate which word to find first.

  4. Each stimulus was presented for 4 seconds, with the first 1–1.5 seconds being the voiceover “Find the word (target word),” with the exception of the final 8 most difficult words, which were presented for 5 seconds.

  5. For the most difficult words, two words beginning with the same letter were placed in each array to prevent participants from guessing the target word by its initial sound.

Appendix 4. Screenshots of example social paradigm stimuli.

graphic file with name nihms-1923134-f0003.jpg

Facial Affect ID

graphic file with name nihms-1923134-f0004.jpg

Joke

graphic file with name nihms-1923134-f0005.jpg

Joint Attention

graphic file with name nihms-1923134-f0006.jpg

Social vs. Abstract

graphic file with name nihms-1923134-f0007.jpg

Naturalistic Scene

graphic file with name nihms-1923134-f0008.jpg

Appendix 5. Screenshots of example processing speed paradigm stimuli.

graphic file with name nihms-1923134-f0009.jpg

graphic file with name nihms-1923134-f0010.jpg

Appendix 6. Screenshots of example receptive vocabulary paradigm stimuli.

graphic file with name nihms-1923134-f0011.jpg

Directive: “Look at the Baby”

graphic file with name nihms-1923134-f0012.jpg

Directive: “Look at the Fish”

graphic file with name nihms-1923134-f0013.jpg

Appendix 7. Screenshots of example single-word reading paradigm stimuli.

graphic file with name nihms-1923134-f0014.jpg

Directive: “Find the word ‘it’”

graphic file with name nihms-1923134-f0015.jpg

Directive: “Find the word ‘on’”

graphic file with name nihms-1923134-f0016.jpg

Appendix 8. Stimulus order and composition for all social attention stimuli.

Stimulus # Stimulus Type Duration (sec)
1 Instructions 4.1
2 Facial Affect ID 6
3 Facial Affect ID 6
4 Facial Affect ID 6
5 Facial Affect ID 6
6 Joke 6.5
7 Break 0.7
8 Joint Attention 4.6
9 Joke 6.8
10 Break – blank screen 0.7
11 Social vs Abstract 8
12 Social vs Abstract 6
13 Instructions 4.4
14 Facial Affect ID 6
15 Facial Affect ID 6
16 Facial Affect ID 6
17 Facial Affect ID 6
18 Joke 5.9
19 Break – blank screen 0.7
20 Joint Attention 4.3
21 Joke 7.3
22 Break – blank screen 0.7
23 Social vs Abstract 8
24 Social vs Abstract 6.5
25 Instructions 4.4
26 Joint Attention 3.7
27 Joint Attention 4.5
28 Social vs. Abstract 8
29 Joint Attention 4
30 Joint Attention 4
31 Social vs. Abstract 5.9
32 Naturalistic Scene 12
33 Naturalistic Scene 12
34 Instructions 4.4
35 Naturalistic Scene 10
36 Joint Attention 3.8
37 Naturalistic Scene 7.8
38 Joke 6.7
39 Break - blank screen 0.7
40 Social vs. Abstract 6
41 Naturalistic Scene 7.7
42 Social vs. Abstract 6
43 Naturalistic Scene 10.5

Note. Facial affect ID = side-by-side faces with instructions to look at a specific facial expression. Joke stimuli involved a person telling a corny joke. Social vs. Abstract included half the screen with an abstract shape or numerical representation and the other half with one or more people interacting. Joint attention scenes involved a variety of target and distractor objects with one person pointing toward and/or directing their gaze toward the target objects. Naturalistic scenes involved people interacting in various ways (e.g., having a conversation, playing a board game, entering an elevator, etc.).

Appendix 9. Stimulus order and composition for all processing speed stimuli.

Trial # # of Stimuli # of Target Stimuli # of Distractor Stimuli Duration (sec) Target Stimuli Distractor Stimuli
1 5 3 2 7 Pink Flower Green Tree, Green Leaf
2 7 4 3 7 Yellow Star Pink Hearts, Blue Circles
3 7 4 3 8 Shoe Shirts, Sweater
4 9 5 4 10 White truck Blueberry, White Airplane
5 9 5 4 10 Fork Spoon, Knife
6 11 6 5 10 Fly Ant, Ladybug
7 11 6 5 10 Red Apple Red Ball, Red Balloon
8 11 6 5 10 Orange Hat Orange Cup, Pumpkin
9 13 7 6 10 Bird Head Cat Head, Lion
10 15 8 7 10 Jelly Fish Octopus, Squid
11 15 8 7 10 Cart Truck, Pile
12 15 8 7 10 Rook Pawn, Bishop

Appendix 10. Stimulus order and composition for all receptive vocabulary stimuli.

Stimulus # Stimulus Type Target # Target Word Distractor(s) Duration (sec) Position
1 2X2 1 baby hat 5 1
2 2X2 2 fish lion 5 2
3 2X2 3 shoes sweater 5 2
3 2X2 4 apple banana 5 3
4 2X2 5 eating ring 5 4
4 2X2 6 ball umbrella 5 1
5 2X2 7 drinking eating 5 3
5 2X2 8 running mouth 5 4
6 2X2 9 socks pants 5 4
6 2X2 10 sleeping dancing 5 3
7 2X2 11 kicking drinking 5 2
7 2X2 12 fence gift 5 4
8 3X2 13 mouth ball 5 4
8 3X2 14 umbrella kicking 5 5
8 3X2 15 muffin plant 5 6
9 3X2 16 ring raccoon 5 3
9 3X2 17 fountain muffin 5 2
9 3X2 18 elbow fence 5 5
10 3X2 19 dentist timber 5 4
10 3X2 20 aquarium culinary 5 1
10 3X2 21 yacht miniature 5 5
11 3X2 22 culinary gesture 5 2
11 3X2 23 compass toxic 5 3
11 3X2 24 wedge fungus 5 6
12 3X2 25 wrench gauge 5 6
12 3X2 26 reptile dictator 5 5
12 3X2 27 trumpet virtuoso 5 1
13 4X2 28 gift fountain 5 4
13 4X2 29 jewelry elbow 5 2
13 4X2 30 map shirt 5 6
13 4X2 31 raccoon eating 5 8
14 4X2 32 duet banister 5 8
14 4X2 33 noxious irregular 5 2
14 4X2 34 admonish parallel 5 3
14 4X2 35 aviator physician 5 6
15 4X2 36 carnivore herbivore 5 7
15 4X2 37 speedometer thermometer 5 5
15 4X2 38 amorphous admonish 5 2
15 4X2 39 virulent apathetic 5 3

Note. 2X2 stimuli present 4 objects, one in each quadrant of the screen. 3X2 stimuli present 3 objects across the top row and 3 across the bottom row. 4X2 stimuli present 4 objects across the top row and 4 objects across the bottom row. Position is listed in order from top left to bottom right.

Appendix 11. Stimulus order and composition for all single-word reading stimuli.

Stimulus # Stimulus Orientation Target # Target Word Distractor Word(s)
1 left/right 1 it to
2 left/right 2 so up
3 top/bottom 3 me do
3 top/bottom 4 on
4 left top/middle/right bottom 5 dog hot, eat
5 left top/middle/right bottom 6 win buy
5 left top/middle/right bottom 7 car
6 left top/middle/right bottom 8 map bag, hard
7 left top/middle/right bottom 9 few how
7 left top/middle/right bottom 10 out
8 left top/middle/right bottom 11 run set, why
9 right top/middle/left bottom 12 cat leg
9 right top/middle/left bottom 13 all
10 right top/middle/left bottom 14 boy own, fly
11 4 squares 15 tree throw
11 4 squares 16 from find
12 4 squares 17 time take
12 4 squares 18 fall fine
13 4 squares 19 large leave
13 4 squares 20 cheat chain
14 4 squares 21 adult assist
14 4 squares 22 spoon stone
15 4 squares 23 orange onion
15 4 squares 24 silver stream
16 4 squares 25 people program
16 4 squares 26 office often
17 4 squares 27 stretch soldier
17 4 squares 28 match model
18 4 squares 29 service student
18 4 squares 30 magic marry
19 4 squares 31 capacity citizen
19 4 squares 32 facility foreign
20 4 squares 33 railway rapidly
20 4 squares 34 opinion object
21 4 squares 35 listener logical
21 4 squares 36 absolute abandon
22 4 squares 37 aircraft apparent
22 4 squares 38 language launch
23 4 squares 39 flaccid freckle
23 4 squares 40 regiment resilient
24 4 squares 41 factitious fungible
24 4 squares 42 reprimand repudiate
25 4 squares 43 generic glorious
25 4 squares 44 niche nausea
26 4 squares 45 neurotic neophyte
26 4 squares 46 gnarled gimmicky

Appendix 12. Pilot testing post-evaluation questions for clinician-scientist experts.

Expert Feedback
Device type the performance measure was completed on: Mini Tablet Standard Tablet Laptop with internal webcam Laptop with external webcam Desktop with internal webcam Desktop with external webcam
What is the screen size on the device you used? Less than 10 inches 10 – 12 inches 12 – 18 inches Greater than 20 inches
Is there anything you felt would have been helpful to know before beginning the [paradigm measure]? (text entry)
Please rate the overall relevance of the [paradigm measure] to the neurodevelopment / genetic disorder you represent: Extremely relevant Very relevant Somewhat relevant Not relevant at all
Please rate the instructions for this section of the assessment: Very clear Somewhat clear Somewhat difficult to follow Very difficult to follow
Specific comments regarding the instructions for this section: (text entry)
Please rate the quality of the audio during this section of the assessment: The audio was very clear The audio was somewhat clear The audio was not clear
Please rate the quality of pictures used during this section of the assessment High quality Medium quality Low quality
If you answered that some or all photos were low quality, please indicate which photos you felt were not high quality (text entry)
Please rate the timing of the assessment: Very fast Somewhat fast Neither slow nor fast Somewhat slow Very slow
Specific comments regarding the timing for the assessment (text entry)
How appropriate was the level of difficulty of the [paradigm measures] targets? Very appropriate Appropriate Somewhat inappropriate Inappropriate
Please share any concerns you have regarding the level of difficulty or the array of the [paradigm measure] targets: (text entry)
Please check below any specific concerns you have (check all that apply) Too many easy targets Too many hard targets Too few easy targets Too few hard targets Too many moderate difficulty targets Too few moderate difficulty targets Targets were too close together Targets were too far apart Target array was not appropriate Other concerns (text entry)
What aspects of completing measure do you think will be the most difficult for participants? (text entry)
Is there anything you feel that could help a caregiver or guardian administer this measure at home? (text entry)
Any additional comments regarding the measure you feel researchers should be aware of? (text entry)

Appendix 13. Pilot testing post-evaluation questions for parents assisting patient participants.

Parents assisting participants were asked the following sets of questions for each paradigm:

  • Overview

  • Breaks / Eye Calibration

  • Environment

  • Ease of Completion

  • Assistance

  • Paradigm Specifics

Overview (asked for all paradigms)
Please rate your overall experience with this assessment. Extremely positive Somewhat positive Neither positive nor negative Somewhat negative Extremely negative
How easy / hard was it for you to complete this assessment? Extremely easy Somewhat easy Neither easy nor hard Somewhat hard Extremely hard
What type of device did you complete this assessment on? Tablet with stand Laptop PC/MAC Desktop Computer Other
(text entry)
Were you sitting still during the video? Yes No
Did you engage in any sensory movements (rocking back and forth, hand flicking or flapping, going up and down on tippy toes, etc.)? Yes No
Please provide specific details on sensory movements you engaged in during the video? (text entry)
Breaks / Eye Calibration (asked for all paradigms)
How many breaks did you need to complete the [paradigms name] performance measure? No breaks One break Two or three breaks Four or five breaks
How long were the breaks on average? No breaks taken Less than 5 minutes 5 – 15 minutes 16 – 30 minutes 30 minutes to 1 hour 1 hour or more
How often did you look away? Very often Often Sometimes Infrequently Very infrequently Did not look away from the screen
How often did you cover or touch your face during the assessment? Very often Often Infrequently Very infrequently Did not touch face during the assessment
Did you end the video early at any point? Yes No
Environment (asked for all paradigms)
During any point of the assessment, did an unexpected noise occur within the environment? No occurrences of unexpected noises 1 occurrence 2 – 3 occurrences 4 – 5 occurrences 5+ occurrences of unexpected noises
During any point of the assessment, were you required to adjust the lighting in the room? No adjustments to lighting 1 adjustment 2–3 adjustments 4–5 adjustments 5+ adjustments
During any point of the assessment, did you experience internet connection difficulties (i.e. disconnection, weak connection, slow speed, etc.) No internet difficulties 1 occurrence 2 – 3 occurrences 4 – 5 occurrences 5 + occurrences
Ease of Completion (asked for all paradigms)
What gave you the most difficulty when completing this assessment? (text entry)
What was the easiest thing about completing this assessment? (text entry)
Assistance (asked for all paradigms)
Please rate your child’s overall attention level during the assessment? Excellent Good Average Poor Terrible
Please indicate the level of Physical Assistance (i.e., staying seated, position head, etc.) Did not provide physical assistance Assisted one time Assisted part of the time Assisted most of the time
Please indicate the level of Gestural Assistance (i.e., using your finger to point things on the screen, point to screen to get their attention, etc.) Did not provide gestural assistance Assisted one time Assisted part of the time Assisted most of the time
Please indicate the level of Verbal Assistance (i.e., “look here” or “watch the video”, etc.) Did not provide verbal assistance Assisted one time Assisted part of the time Assisted most of the time
Please provide additional information on the level of assistance you provided. Type n/a if no assistance was provided (text entry)
What was the most difficult part of aiding someone in completing this assessment? (text entry)
Was there something that could have made it easier for you to assist someone in completing this assessment? (text entry)
Paradigm specific: Social Attention
Please rate the overall relevance of the Social Attention assessment regarding a performance measure for the neurodevelopment / genetic disorder: Extremely relevant Very relevant Slightly relevant Not relevant at all
Please rate the quality of the audio during the section of this assessment: The audio was very clear The audio was somewhat clear The audio was not clear
Please rate the quality of videos used during this section of the assessment High quality Medium quality Low quality
Specific comments regarding the quality of videos of the assessment: (text entry)
If you answered that some or all videos were low quality, please indicate which ones you felt were not high quality: (text entry)
Please rate the timing of the assessment? Very fast Somewhat fast Neither slow nor fast Somewhat slow Very slow
Specific comments regarding the timing of the assessment (text entry)
Is there anything you felt would have been helpful to know before beginning the Social Attention assessment? (text entry)
Any additional comments regarding the Social Attention assessment you feel researchers should be aware of? (text entry)
Paradigm specific: Processing Speed / Receptive Language / Single Word Reading
Please rate the overall relevance of the [specific paradigm] assessment regarding a performance measure for the neurodevelopment / genetic disorder: Extremely relevant Very relevant Slightly relevant Not relevant at all
Please rate the instructions for this section of the assessment: Very clear Somewhat clear Somewhat difficult to follow Very difficult to follow
Specific comments regarding the instructions (text entry)
Please rate the quality of the audio during the section of this assessment: The audio was clear The audio was somewhat clear The audio was not clear
Please rate the quality of pictures used during this section of the assessment: High quality Medium quality Low quality
If you answered that some or all photos were low quality, please indicate which photos you felt were not high quality: (text entry)
Please rate the timing of the assessment: Very fast Somewhat fast Neither slow nor fast Somewhat slow Very slow
Specific comments regarding the timing for the assessment: (text entry)
Is there anything you felt would have been helpful to know before beginning the [specific paradigm] assessment? (text entry)
Any additional comments regarding [specific paradigm] assessment researchers should be aware of? (text entry)

Appendix 14. Parent/caregiver administration support training process.

Introductory Call

  1. Scheduled prior to enrollment.

  2. Description of the webcam performance process, including descriptions of paradigms (i.e., “Social attention is the first video [participant’s name] will be asked to watch”), environmental set-up (lighting, device type, etc.), and descriptions of the assistance to provide to participants (prompts, limitations, etc.).

  3. Addressed any concerns regarding participants’ qualifications (i.e., attention concerns, engagement in sensory behaviors, etc.).

Introduction / Training Video (optional)

  1. Sent to parents if concerns were expressed regarding the child’s ability to complete measures.

  2. Video provided a visual overview of environmental set-up and assisting participants.

  3. Video provided visual content of another child completing the webcam performance measures. This allowed parents to see that a child could move and be engaged without a negative impact on results.

Practice Performance Measure (optional)

  1. Sent to parents during the consenting process.

  2. Allowed parents to experience the calibration process as well as example stimuli from each paradigm.

Zoom Training (optional)

  1. Provided to families if they expressed concerns about implementing procedures independently.

  2. Research team provided step-by-step guidance via Zoom or telephone to set up the environment and launch the webcam performance measures. Offered at times best suited to the family’s needs.

Virtual Support Meetings (optional)

  1. Provided to families if technology difficulties arose during the practice performance measure and/or paradigm measures.

  2. Research team provided step-by-step guidance via Zoom or telephone to set up the environment and launch the webcam performance measures. Offered at times best suited to the family’s needs.

Appendix 15. Methodological details for computing social attention.

While all other webcam-collected patient performance measures were calculated based on a priori criteria, the social attention measure was calculated using empirical criteria. Specifically, the social attention measure was calculated following our prior research methods to determine whether a more sensitive indicator of ASD diagnosis and autism symptom level could be identified from data collected during the social attention stimulus paradigm. For this measure, the correlations between ASD diagnosis and each fixation metric for each area-of-interest were evaluated in a training sub-sample randomly selected from all baseline webcam administrations (60% of participants). Fixation metric/area-of-interest combinations with statistically significant correlations (r>.18) were selected. These metrics were then combined into an aggregate social attention index and tested in separate testing (20%) and validation (20%) sub-samples as well as across timepoints.
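The empirical selection step above can be sketched as follows. This is an illustrative sketch, not the authors' code: the data are simulated, the 60/20/20 split and the r>.18 cut follow the appendix, and the sign-aligned z-score averaging is one plausible way to form the aggregate index.

```python
# Illustrative sketch of the empirical feature-selection step described above:
# correlate each fixation-metric/AOI feature with ASD diagnosis in a 60%
# training sub-sample, keep features with |r| > .18, and average their
# sign-aligned z-scores into an aggregate social attention index.
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 12                       # participants, fixation-metric/AOI features
X = rng.normal(size=(n, k))          # gaze features (simulated)
dx = rng.integers(0, 2, size=n)      # ASD diagnosis, 0/1 (simulated)

train = rng.permutation(n)[: int(0.6 * n)]   # 60% training sub-sample
r = np.array([np.corrcoef(X[train, j], dx[train])[0, 1] for j in range(k)])
selected = np.where(np.abs(r) > 0.18)[0]     # combinations passing the cut

# Aggregate index: mean of the selected features' z-scores, sign-aligned so
# that higher values consistently point in the same diagnostic direction.
z = (X - X.mean(axis=0)) / X.std(axis=0)
if selected.size:
    index = (z[:, selected] * np.sign(r[selected])).mean(axis=1)
else:
    index = np.zeros(n)
```

The remaining 40% of participants would then be split into testing and validation sub-samples to check that the index generalizes.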

Appendix 16. Operational definitions for all webcam performance measures.

# Measure Operational Definition
1 Overall Attention Average percentage viewing time to the screen across all 4 stimulus paradigms (15 one-minute blocks of stimuli)
2 Attentional Scanning Average number of glances to processing speed stimuli across all quadrants of the screen
3 Positive Emotion Average intensity rating for happy and surprised emotions across all social attention stimuli
4 Negative Emotion Average intensity rating for fear, anger, disgust, and sadness emotions across all social attention stimuli
5 Social Attention Empirically-derived measure of attention to social versus non-social information using standardized fixation duration, fixation count, and time-to-first-fixation
6 Social Preference Average of all fixation durations to social areas-of-interest across all social attention stimuli
7 Face Preference Average of all fixation durations to face areas-of-interest across all social attention stimuli
8 Non-social Preference Average of all fixation durations to non-social areas-of-interest across all social attention stimuli
9 Receptive Vocabulary Sum of all fixation durations across vocabulary target word-picture combinations (27 total targets)
10 Speed to Faces Average time-to-first-fixation on the most prominent face areas-of-interest across all social attention stimuli
11 Speed to Object Average time-to-first-fixation on each target object area-of-interest across all processing speed stimuli
12 Reading Accuracy Sum of all fixation durations across target reading words (38 total targets)

Note. Fixations were defined as at least 66ms of gaze point samples within a 100-pixel dispersion area. Glances were defined as an entry to an AOI with at least one fixation. To increment glance count, gaze must leave the AOI, with at least one fixation outside the AOI, and then return to the AOI with at least one fixation. With the exception of the social attention measure, which was empirically-derived, social and non-social areas-of-interest were defined a priori based on our prior investigations. Non-social areas-of-interest include non-target or distractor (extraneous) objects within social scenes. Receptive vocabulary words were chosen to range in difficulty from preschool (age 2) to adult (college) words. Word frequency was also used to select target words and distractors for the single-word reading task.

Appendix 17. Validity guidelines for all webcam performance measures.

For each paradigm, a valid stimulus was determined by requiring at least 50% fixation duration to that stimulus.

# Measure Validity Guidelines
1 Overall Attention At least 50% time on screen to at least one 1-minute video (≥30 seconds with gaze on-screen)
2 Attentional Scanning At least 4 valid processing speed stimuli (≥40 seconds with gaze on-screen)
3 Positive Emotion At least 50% time on screen to at least one 1-minute video (≥30 seconds with gaze on-screen)
4 Negative Emotion At least 50% time on screen to at least one 1-minute video (≥30 seconds with gaze on-screen)
5 Social Attention At least 8 valid stimuli with at least 8 social or 8 non-social AOIs empirically-identified
6 Social Preference At least 8 valid social stimuli and at least 8 valid social AOIs (≥50 seconds with gaze on-screen)
7 Face Preference At least 8 valid social attention stimuli with faces (≥40 seconds with gaze on-screen)
8 Non-social Preference At least 8 valid social stimuli and 8 valid non-social areas-of-interest (≥50 seconds with gaze on-screen)
9 Receptive Vocabulary At least 8 valid target words (≥40 seconds with gaze on-screen)
10 Speed to Faces At least 8 valid social attention stimuli with faces (≥40 seconds with gaze on-screen)
11 Speed to Object At least 4 valid processing speed stimuli (≥40 seconds with gaze on-screen)
12 Reading Accuracy At least 8 valid target words (≥32 seconds with gaze on-screen)
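The validity rule above can be expressed as a short check. This is a minimal sketch under the stated assumptions (function and parameter names are illustrative): a stimulus is valid when fixation duration covers at least 50% of its duration, and a measure is scored only when enough valid stimuli are available.

```python
# Minimal sketch of the validity guidelines described above (names illustrative).

def valid_stimuli(fixation_secs, stimulus_secs):
    """Return indices of stimuli with >=50% fixation duration coverage."""
    return [i for i, (fix, dur) in enumerate(zip(fixation_secs, stimulus_secs))
            if dur > 0 and fix / dur >= 0.5]

def measure_is_scorable(fixation_secs, stimulus_secs, min_valid=8):
    """A measure is scored only if it has at least `min_valid` valid stimuli
    (e.g., 8 for most measures, 4 for the processing speed measures)."""
    return len(valid_stimuli(fixation_secs, stimulus_secs)) >= min_valid
```

For the processing speed measures, `min_valid` would be set to 4 per the table above.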

Appendix 18. Definitions for the four distinct gaze metrics collected.

Gaze Metric Definition
Fixation duration The total duration in milliseconds of an identified fixation from the first sample to the last sample included in the fixation definition. Fixation was defined as at least 66ms of gaze samples within a 100-pixel dispersion area. The total fixation duration was determined by summing the fixation duration for all identified fixations within an area-of-interest.
Fixation count A count of all fixations detected within an area-of-interest.
Glance count The number of times gaze entered and left an area-of-interest. Gaze to the area of interest was defined by at least one identified fixation.
Time-to-first-fixation The time elapsed from the start of the temporal area-of-interest to the first sample of the first identified fixation within an area-of-interest.
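The fixation definition above (≥66 ms of gaze samples within a 100-pixel dispersion area) corresponds to a dispersion-threshold approach. The sketch below is a hedged illustration, not the authors' implementation: the sampling interval, helper name, and return format are assumptions.

```python
# Hedged sketch of the dispersion-based fixation definition above (I-DT style):
# a fixation is >=66 ms of consecutive gaze samples whose combined x/y spread
# stays within a 100-pixel dispersion window. Sampling interval is assumed.

def detect_fixations(samples, min_dur_ms=66, dispersion_px=100, sample_ms=33):
    """samples: list of (x, y) gaze points at a fixed sampling interval.
    Returns a list of (start_index, end_index, duration_ms) tuples."""
    fixations, start = [], 0
    min_samples = max(1, round(min_dur_ms / sample_ms))
    while start < len(samples):
        end = start + 1
        window = [samples[start]]
        while end < len(samples):
            window.append(samples[end])
            xs, ys = zip(*window)
            # dispersion = (max x - min x) + (max y - min y)
            if (max(xs) - min(xs)) + (max(ys) - min(ys)) > dispersion_px:
                window.pop()
                break
            end += 1
        if end - start >= min_samples:
            fixations.append((start, end - 1, (end - start) * sample_ms))
            start = end
        else:
            start += 1
    return fixations
```

Total fixation duration for an area-of-interest would then be the sum of durations for fixations falling within that AOI, and glance counting would track exits and re-entries as defined above.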

Appendix 19. Clinician-scientist performance paradigm feedback.

Values are reported for Processing Speed / Receptive Vocabulary / Single-Word Reading.
Number of experts completing: 9 / 10 / 9
Device and webcam type (webcam: internal or external):
 Processing Speed: 1 mini tablet, 5 laptop internal, 1 laptop external, 2 desktop external
 Receptive Vocabulary: 6 laptop internal, 1 laptop external, 3 desktop external
 Single-Word Reading: 5 laptop internal, 4 desktop external
Screen size:
 Processing Speed: 1 <10”, 3 10–12”, 3 12–18”, 4 19+”
 Receptive Vocabulary: 2 10–12”, 3 13–18”, 4 19+”
 Single-Word Reading: 2 10–12”, 3 13–18”, 4 19+”
Paradigm relevance (M, SD, range; 1=highly relevant to 4=not relevant): 1.6 (0.7, 1–3) / 1.6 (0.8, 1–3) / 2.4 (1.0, 1–4)
Clarity of instructions (M, SD, range; 1=very clear to 4=very difficult to follow): 1.1 (0.3, 1–2) / 1.2 (0.4, 1–2) / 1.3 (1.0, 1–4)
Quality of audio (M, SD, range; 1=very clear to 4=not clear): 1 (0) / 1.7 (0.9, 1–4) / 1.1 (0.3, 1–2)
Quality of pictures/words (M, SD, range; 1=high, 2=medium, 3=low): 1 (0) / 1.1 (0.3, 1–2) / 1.0 (0)
Timing of administration (M, SD, range; 1=very fast, 3=neither fast nor slow, 5=very slow): 3 (0.3, 3–4) / 3.1 (0.3, 3–4) / 2.9 (0.3, 2–3)
Difficulty level (M, SD, range; 1=very appropriate to 4=inappropriate): 1.7 (0.5, 1–2) / 2.4 (0.9, 1–4) / 2.3 (1.3, 1–4)
Possible concerns (n):
 Too many easy targets: 0 / 0 / 0
 Too many moderate difficulty targets: 0 / 0 / 0
 Too many hard targets: 0 / 4 / 2
 Too few easy targets: 0 / 1 / 1
 Too few moderate difficulty targets: 0 / 0 / 0
 Too few hard targets: 0 / 0 / 0
 Targets too close together: 2 / 0 / 0
 Targets too far apart: 0 / 0 / 0
 Target array not appropriate: 0 / 0 / 0

Note. Clinician-scientist experts did not provide feedback on the social attention paradigm as this was adapted from our prior eye tracking investigations and the stimuli for this paradigm had already received prior input from clinician-scientist experts in their development. Specific qualitative feedback is not included but was used to improve the administration flow and provide better instructions to the parent facilitating the administration.

Appendix 20. Parent performance paradigm feedback.

| | Social Attention | Processing Speed | Receptive Vocabulary | Single-Word Reading |
| --- | --- | --- | --- | --- |
| Number of parents completing | 8 | 8 | 9 | 8 |
| Overall experience, M (SD, range) (1=extremely positive to 5=extremely negative) | 1.9 (0.8, 1–3) | 2.0 (0.8, 1–3) | 2.1 (1.0, 1–4) | 2.9 (1.3, 1–5) |
| Difficulty, M (SD, range) (1=extremely easy to 5=extremely hard) | 2.0 (0.9, 1–4) | 2.5 (1.5, 1–5) | 2.7 (1.0, 2–4) | 3.3 (1.7, 1–5) |
| Device and webcam type | 6 laptop; 1 PC/Mac; 1 did not report | 7 laptop; 1 PC/Mac | 8 laptop; 1 PC/Mac | 7 laptop; 1 PC/Mac |
| Sitting during evaluation | 6 yes; 2 no | 7 yes; 1 no | 7 yes; 1 no | 5 yes; 3 no |
| Sensory-related movements during evaluation | 2 yes; 6 no | 2 yes; 6 no | 4 yes; 5 no | 5 yes; 3 no |
| Breaks | 8 no | 8 no | 8 no; 1 one break after each segment | 8 no |
| Look away from screen (1=very often to 6=no looking away) | 3.8 (2.3, 1–6) | 3.0 (2.1, 1–6) | 3.8 (1.7, 2–6) | 3.4 (2.2, 1–6) |
| Cover or touch face (1=very often to 5=no covering or touching) | 4.1 (1.1, 2–5) | 4.1 (0.8, 3–5) | 3.3 (1.4, 2–5) | 3.5 (1.5, 1–5) |
| End video early | 8 no | 8 no | 9 no | 8 no |
| Unexpected noise during evaluation | 5 no noise; 3 one unexpected noise | 7 no noise; 1 five+ occurrences | 7 no noise; 2 one unexpected noise | 6 no noise; 2 one unexpected noise |
| Adjust lighting during evaluation | 8 no adjustments | 8 no adjustments | 9 no adjustments | 8 no adjustments |
| Connection problems | 4 no; 2 one occurrence; 2 two or three occurrences | 8 no | 9 no | 7 no; 1 one occurrence |
| Overall attention (1=excellent, 3=average, 5=very poor) | 1.9 (1.2, 1–4) | 2.4 (1.3, 1–4) | 2.2 (1.2, 1–4) | 2.5 (1.5, 1–5) |
| Physical assistance | 4 none; 1 one time; 3 most of the time | 3 none; 1 one time; 1 part-time; 3 most of the time | 3 none; 2 part-time; 3 most of the time; 1 did not report | 3 none; 4 part-time; 1 most of the time |
| Gestural assistance | 5 none; 1 one time; 2 part-time | 3 none; 4 part-time; 1 most of the time | 5 none; 3 part-time; 1 did not report | 4 none; 3 part-time; 1 most of the time |
| Verbal assistance | 4 none; 2 one time; 2 most of the time | 4 none; 1 one time; 1 part-time; 2 most of the time | 3 none; 1 one time; 2 part-time; 2 most of the time; 1 did not report | 3 none; 1 one time; 3 part-time; 1 most of the time |
| Paradigm relevance, M (SD, range) (1=highly relevant to 4=not relevant) | 1.8 (0.4, 1–2) | 1.8 (0.7, 1–3) | 1.2 (0.4, 1–2) | 2.3 (0.9, 1–3) |
| Quality of audio (1=very clear to 4=not clear) | 1.1 (0.3, 1–2) | 1.0 (0) | 1.0 (0) | 1.3 (0.4, 1–2) |
| Quality of video/pictures/words (1=high, 2=medium, 3=low) | 1.3 (0.5, 1–2) | 1.4 (0.7, 1–3) | 1.0 (0) | 1.1 (0.3, 1–2) |
| Timing of administration (1=very fast, 3=neither fast nor slow, 5=very slow) | 2.6 (0.7, 1–3) | 2.3 (0.9, 1–3) | 2.0 (0.8, 1–3) | 2.5 (0.8, 1–3) |

Note. Two NDGS participants recruited to give feedback on the neurobehavioral surveys provided no feedback, or feedback on only one performance paradigm, due to life stress. As a result, only 8–9 participants completed each performance paradigm during pilot testing. All of these participants required parental support, with parents completing the post-evaluation questionnaire.

Appendix 21. Participant accounting.

[Figure: participant accounting flow diagram.]

Note. Invalid cases attempted at least one webcam performance measure but achieved less than 30 seconds with gaze on screen.

Appendix 22. Histograms for each webcam-collected performance measure with superimposed normal distribution curve.

[Figures: histograms for the 12 performance measures, panels A–L, across two images.]

Note. A=Overall Attention, B=Attentional Scanning, C=Positive Emotional Expressiveness, D=Negative Emotional Expressiveness, E=Social Attention, F=Social Preference, G=Face Preference, H=Non-social Preference, I=Receptive Vocabulary, J=Speed to Faces, K=Speed to Objects, L=Reading Accuracy.

Appendix 23. Inter-correlations among the performance measures.

| Measure | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1. Overall Attention (OA) | – | .34* | −.19* | −.08 | −.45* | .13 | .33* | −.25* | .54* | −.26* | −.28* | .55* |
| 2. Attentional Scanning (AS) | | – | −.24* | −.21* | −.49* | .23* | .27* | −.25* | .54* | −.21* | −.78* | .48* |
| 3. Positive Emotion (PE) | | | – | −.06 | .23* | −.15 | −.15 | .08 | −.23* | .02 | .22* | −.27* |
| 4. Negative Emotion (NE) | | | | – | .22* | −.15 | −.11 | .15 | −.23* | .08 | .19* | −.28* |
| 5. Social Attention (SA) | | | | | – | −.54 | −.52* | .19* | −.59* | .30* | .45* | −.62* |
| 6. Social Preference (SP) | | | | | | – | .64* | −.05 | .35* | −.30* | −.24* | .38* |
| 7. Face Preference (FP) | | | | | | | – | −.50* | .38* | −.79* | −.34* | .38* |
| 8. Non-social Preference (NP) | | | | | | | | – | −.18 | .62* | .33* | −.20 |
| 9. Receptive Vocabulary (RV) | | | | | | | | | – | −.22 | −.55* | .78* |
| 10. Speed to Faces (SF) | | | | | | | | | | – | .30* | −.21 |
| 11. Speed to Objects (SO) | | | | | | | | | | | – | −.47* |
| 12. Reading Accuracy (RA) | | | | | | | | | | | | – |

Note. Social attention is keyed so that higher scores are more consistent with autism spectrum disorder. Speed to faces and speed to objects are keyed so that higher scores indicate longer time to fixate the target. Sample sizes vary from n=248 to n=322.

* designates significant correlations, p<.001.
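The varying sample sizes in the note (n=248 to n=322) imply that each correlation was computed over pairwise-complete observations. As an illustration of that convention (an assumption about the analysis, not the authors' code), here is a minimal Pearson correlation that drops a case only from correlations involving a measure that case is missing:

```python
# Pearson r with pairwise deletion: a participant is excluded only from
# cells of the correlation matrix involving a measure they are missing
# (represented as None), so n varies from cell to cell.

def pearson_pairwise(x, y):
    """Return (r, n) over cases where both x and y are observed."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / (sxx * syy) ** 0.5, n

# Two hypothetical measures; the 4th and 5th cases each miss one measure,
# so this cell of the matrix uses only the 3 complete pairs:
print(pearson_pairwise([1, 2, 3, None, 5], [2, 4, 6, 8, None]))  # → (1.0, 3)
```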

Appendix 24. Association of the webcam social attention measure with autism symptom level and ASD diagnosis.

Autism Symptom Level

| Sub-sample | Total n | ASD n | Baseline r | 1-Month Follow-Up r | 4-Month Follow-Up r |
| --- | --- | --- | --- | --- | --- |
| Training | 192 | 42 | .53 | .42 | .54 |
| Testing | 76 | 18 | .49 | .32 | .54 |
| Validation | 77 | 22 | .61 | .53 | .65 |
| Testing + Validation | 153 | 40 | .55 | .43 | .58 |
| Ages 3–8 | 64 | 20 | .54 | .48 | .62 |
| Ages 9+ | 89 | 20 | .57 | .37 | .59 |

ASD Diagnosis

| Sub-sample | Total n | ASD n | Baseline AUC (SE) | 1-Month Follow-Up AUC (SE) | 4-Month Follow-Up AUC (SE) |
| --- | --- | --- | --- | --- | --- |
| Training | 192 | 42 | .809 (.036) | .735 (.041) | .804 (.039) |
| Testing | 76 | 18 | .790 (.066) | .693 (.073) | .755 (.081) |
| Validation | 77 | 22 | .857 (.050) | .790 (.067) | .883 (.051) |
| Testing + Validation | 153 | 40 | .821 (.041) | .744 (.049) | .815 (.048) |
| Ages 3–8 | 64 | 20 | .806 (.060) | .730 (.072) | .836 (.062) |
| Ages 9+ | 89 | 20 | .833 (.058) | .749 (.071) | .822 (.072) |

Note. ASD=Autism Spectrum Disorder. All correlations are statistically significant, p<.01. The training sub-sample included a randomly selected 60% of all administrations; the testing and validation sub-samples each consisted of a random 20% of all administrations. Autism symptom level is based on averaging scores on the social communication/interaction and restricted/repetitive behavior scales of the neurobehavioral evaluation tool informant-report survey.
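The AUC values above can be read through the Mann-Whitney formulation of the statistic. The sketch below (toy data, not the study data or the authors' code) computes AUC as the probability that a randomly chosen ASD case scores higher on the measure than a randomly chosen non-ASD case, counting ties as one half:

```python
# AUC via the Mann-Whitney formulation: fraction of (ASD, non-ASD) pairs
# in which the ASD case has the higher score, with ties counted as 0.5.

def auc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]  # ASD cases
    neg = [s for s, y in zip(scores, labels) if y == 0]  # non-ASD cases
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical social-attention z-scores and diagnoses (1 = ASD):
scores = [2.6, 1.9, 0.8, 0.3, -0.4, 1.1, 0.0, 2.9]
labels = [1, 1, 1, 0, 0, 0, 0, 1]
print(auc(scores, labels))  # → 0.9375 (15 of the 16 pairs rank correctly)
```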

Appendix 25. Receiver operating characteristic curve analyses evaluating the predictive validity of the social attention measure for ASD diagnosis in the combined testing and validation baseline subsamples.

[Figure: receiver operating characteristic curve for ASD diagnosis.]

Appendix 26. Multi-level likelihood ratios (mLRs) and sensitivity and specificity for ASD diagnosis at relevant cut scores using the social attention measure (baseline only).

| Score range (z-score; NT mean=0, SD=1) | mLR | Interpretation | Cut score (z-score) | Sensitivity | Specificity |
| --- | --- | --- | --- | --- | --- |
| < +0.1 | .153 | Reduced probability | 0.1 | 92% | 50% |
| +0.1 to +1.8 | .947 | No change | 1.0 | 77% | 65% |
| +1.81 to +2.45 | 3.92 | Increased probability | 1.8 | 55% | 90% |
| +2.46 or higher | 6.41 | Strongly increased probability | 2.45 | 36% | 95% |
| Youden’s J | – | – | 1.49 | 70% | 87% |
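The two quantities in this appendix follow standard definitions: a multi-level likelihood ratio compares the proportion of ASD cases falling in a score band to the proportion of non-ASD cases in that band, and Youden's J selects the cut score maximizing sensitivity + specificity − 1. A minimal sketch with hypothetical counts and scores (not the study's analysis code):

```python
def interval_lr(cases_in_band, total_cases, controls_in_band, total_controls):
    """Multi-level likelihood ratio for one score band:
    P(score in band | ASD) / P(score in band | no ASD)."""
    return (cases_in_band / total_cases) / (controls_in_band / total_controls)

def youden_cut(scores, labels, candidate_cuts):
    """Return (cut, J) maximizing J = sensitivity + specificity - 1,
    treating scores >= cut as a positive prediction."""
    best = None
    for c in candidate_cuts:
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= c)
        fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < c)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < c)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= c)
        j = tp / (tp + fn) + tn / (tn + fp) - 1
        if best is None or j > best[1]:
            best = (c, j)
    return best

# Hypothetical band: 10 of 40 ASD cases but only 5 of 80 controls score in
# it, so a score in this band multiplies the odds of ASD by 4:
print(interval_lr(10, 40, 5, 80))  # → 4.0
```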

Appendix 27. Means and standard deviations for webcam measures (z-scores) in each patient group.

| Webcam Measure | PTEN M (SD) | NFIX M (SD) | SYNGAP1 M (SD) | Other NDGS M (SD) | Idiopathic NDD M (SD) | Pattern |
| --- | --- | --- | --- | --- | --- | --- |
| Overall Attention | −0.07 (1.03) | −0.55 (1.18) | −0.78 (1.04) | −1.08 (1.29) | −0.16 (1.04) | Low scores for NFIX, SYNGAP1, other NDGS |
| Attentional Scanning | −0.12 (1.58) | −0.85 (0.65) | −1.30 (0.80) | −1.28 (0.77) | −0.05 (1.25) | Low scores for NFIX, SYNGAP1, other NDGS |
| Positive Emotion | 0.05 (0.74) | 0.35 (1.23) | −0.07 (0.71) | 0.32 (1.02) | −0.06 (0.65) | High scores for NFIX and other NDGS |
| Negative Emotion | 0.18 (1.67) | −0.09 (0.92) | 0.62 (1.67) | 0.59 (1.81) | 0.10 (0.70) | High scores for SYNGAP1 and other NDGS |
| Social Attention | −0.34 (1.00) | −1.78 (1.09) | −1.70 (1.34) | −1.78 (1.25) | −0.11 (1.11) | Low scores for all but idiopathic NDD |
| Social Preference | −0.34 (0.95) | −0.67 (0.85) | −1.27 (1.31) | −0.79 (1.19) | 0.04 (1.54) | Low scores for all but idiopathic NDD |
| Face Preference | −0.10 (0.90) | −0.32 (0.80) | −0.54 (0.78) | −0.47 (0.85) | 0.13 (1.27) | Low scores for NFIX, SYNGAP1, other NDGS |
| Non-social Preference | 0.40 (1.01) | 0.55 (0.94) | 0.82 (1.44) | 0.62 (1.24) | 0.11 (1.07) | High scores for all but idiopathic NDD |
| Receptive Vocabulary | −0.26 (1.04) | −0.94 (0.91) | −0.82 (0.91) | −1.01 (0.72) | −0.13 (1.00) | Low scores for all but idiopathic NDD |
| Speed to Faces | −0.09 (1.12) | 0.19 (0.74) | 0.50 (0.82) | 0.38 (0.84) | −0.11 (1.10) | Slow for NFIX, SYNGAP1, and other NDGS |
| Speed to Objects | 0.08 (1.43) | 0.53 (0.74) | 1.00 (0.83) | 1.08 (0.74) | 0.09 (1.13) | Slow for NFIX, SYNGAP1, and other NDGS |
| Reading Accuracy | −0.21 (0.96) | −0.70 (0.73) | −0.74 (0.91) | −0.93 (0.64) | −0.35 (1.01) | Low scores for all but idiopathic NDD |

Note. Scores are z-scores adjusted for age, the square of age, and sex, derived from neurotypical control norms. Sibling controls are not shown because they did not significantly deviate from the neurotypical control mean on any measure.
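Regression-based norming of the kind described in the note can be sketched as follows. This is an illustration under simplifying assumptions, not the authors' code: it adjusts for age alone, whereas the study's norms adjusted for age, the square of age, and sex. The idea is to fit a least-squares model on neurotypical controls and express each participant's score as a deviation from the age-expected value in units of the control residual SD.

```python
def fit_norms(control_ages, control_scores):
    """Least-squares fit score ~ a + b*age on neurotypical controls;
    returns (a, b, residual_sd) for converting raw scores to z-scores."""
    n = len(control_ages)
    mx = sum(control_ages) / n
    my = sum(control_scores) / n
    sxx = sum((x - mx) ** 2 for x in control_ages)
    sxy = sum((x - mx) * (y - my)
              for x, y in zip(control_ages, control_scores))
    b = sxy / sxx
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(control_ages, control_scores)]
    sd = (sum(r * r for r in resid) / (n - 2)) ** 0.5  # residual SD
    return a, b, sd

def z_score(age, score, norms):
    """Deviation from the age-expected control score, in residual-SD units."""
    a, b, sd = norms
    return (score - (a + b * age)) / sd

# Hypothetical control data where scores rise roughly linearly with age;
# a patient aged 8 scoring 16 falls below the age expectation of 17.8:
norms = fit_norms([4, 6, 8, 10], [10, 15, 18, 21])
print(round(z_score(8, 16, norms), 2))  # → -2.32
```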

Data Availability Statement

The datasets analyzed during the current study are available from the corresponding author on reasonable request.

