Abstract
This study focused on the development and initial psychometric evaluation of a set of online, webcam-collected, and artificial intelligence-derived patient performance measures for neurodevelopmental genetic syndromes (NDGS). Initial testing and qualitative input were used to develop four stimulus paradigms capturing social and cognitive processes, including social attention, receptive vocabulary, processing speed, and single-word reading. The paradigms were administered to a sample of 375 participants, including 163 with NDGS, 56 with idiopathic neurodevelopmental disability (NDD), and 156 neurotypical controls. Twelve measures were created from the four stimulus paradigms. Valid completion rates varied from 87% to 100% across measures, with lower but adequate completion rates in participants with intellectual disability. Adequate to excellent internal consistency reliability (α=.67 to .95) was observed across measures. Test-retest reproducibility at 1-month follow-up and stability at 4-month follow-up were fair to good (r=.40–.73) for 8 of 12 measures. All gaze-based measures showed evidence of convergent and discriminant validity with parent-report measures of other cognitive and behavioral constructs. Comparisons across NDGS groups revealed distinct patterns of social and cognitive functioning, including a less impaired overall pattern in people with PTEN mutations and greater attentional, processing speed, and social processing difficulties in people with SYNGAP1 mutations relative to people with NFIX mutations. Webcam-collected performance measures appear to be a reliable and potentially useful method for objective characterization and monitoring of social and cognitive processes in NDGS and idiopathic NDD. Additional validation work, including more detailed convergent and discriminant validity analyses and examination of sensitivity to change, is needed to replicate and extend these observations.
1. Introduction
Advances in identifying pathogenic variation linked to neurodevelopmental disability (NDD) have accelerated the discovery of a growing number of specific neurodevelopmental genetic syndromes (NDGS). As NDGS are identified, natural history investigations have begun to characterize a wide spectrum of medical conditions and neurobehavioral strengths and weaknesses associated with each condition (Busch et al., 2023; Mulder et al., 2020; Vlaskamp et al., 2019). This work is crucial to developing patient support guidelines and ensuring that patients with NDGS receive appropriate supports that maximize their development. For example, in individuals with PTEN hamartoma tumor syndrome (PHTS) resulting from germline heterozygous mutations in PTEN, a spectrum of frontal-systems deficits has been identified, ranging from no impairment to very severe impairment associated with intellectual disability (ID) and autism spectrum disorder (ASD) (Busch et al., 2019; Ciaccio et al., 2018; Frazier et al., 2015; Steele et al., 2021). This pattern has been found to be stable over a period of 2 years (Busch et al., 2023), even in young children, and the specific profile of frontal systems impairment can be used to inform clinical and educational care (Frazier, 2019).
While there have been some initial attempts to provide more detailed characterization of neurobehavioral profiles across different NDGS, yield from natural history and neurobehavioral studies has been limited by the lack of comprehensive and sensitive instruments appropriate for evaluations with geographically-dispersed populations. For example, within the Rare Disease Clinical Research Network – Developmental Synaptopathies Consortium natural history study of individuals with PHTS and ASD (Busch et al., 2019), in-person cognitive assessments were limited to annual visits and often required several hours of testing to collect data from relevant neurocognitive domains. Because of the extensive effort required, the related pilot clinical trial initiated within this network was limited to three in-person assessments over a six-month study period (Hardan et al., 2021; Srivastava et al., 2022). The infrequency, difficulty, and burden of these traditional approaches highlight the need for new phenotyping methods.
Identification of NDGS has also accelerated the development of syndrome-specific patient advocacy groups and foundations, as well as programs of research designed to better understand and translate molecular, cellular, and circuitry findings into intervention strategies. A primary goal of these patient advocacy groups - and the research programs they support - is to develop and evaluate the efficacy of personalized interventions. Recent reviews of NDGS have emphasized the need to understand pathophysiology and neurobehavioral profiles to generate personalized therapeutic strategies (Frazier, 2019; Sahin & Sur, 2015). Yet, given the small number of specialty clinics focused on each NDGS, and practical geographic constraints, many patients remain under-served and many clinics lack resources to collect extensive neurobehavioral assessments during clinic visits. Relatedly, due to the rare nature of many NDGS, natural history studies often rely on small sample sizes, which limits their value in identifying clinical endpoints for trials. In these small-sample longitudinal contexts, it is important to have reliable, stable indicators of individual performance, as compared to larger group studies where statistical certainty can be bolstered by adding participants. Having repeatable, online measures of neurobehavioral function could substantially improve the statistical power of translational and clinical studies and increase the ability to rapidly and sensitively identify individual differences in the pattern of intervention response. Administration of these measures in the individual’s home rather than within a clinic setting would not only broaden access to research participation but might also reduce biases resulting from collection of neurobehavioral information in an unfamiliar setting.
Research in NDGS and idiopathic NDD is also limited by reliance on subjective measurements acquired from parents/caregivers and/or observations by clinician scientists, which has precipitated a call for the development of objective measures (Sahin et al., 2018). As a result, a number of tools have been developed and show promise for objectively evaluating and tracking key functions relevant to neurodevelopment (Amit et al., 2020; Dawson et al., 2018; Egger et al., 2018; Goodwin et al., 2019; Manfredonia et al., 2019; McPartland et al., 2020; Ness et al., 2019; Tuncgenc et al., 2021). However, with a few notable exceptions, these measures have been developed solely for in-person evaluation, limiting their application and temporal sensitivity. In addition, these measures have predominantly focused on the evaluation of single domains rather than providing a more detailed characterization of multiple social, developmental, and cognitive domains. Furthermore, a high percentage of individuals with NDGS have significant cognitive and functional impairments. A relatively brief and repeatable battery of objective measures that can reliably capture a wide range of cognitive and behavioral capacities could supplement existing tools while simultaneously increasing sensitivity to intervention effects.
One possibility that can increase the objectivity of NDGS evaluations and simultaneously overcome accessibility barriers is to augment traditional characterization methods with appropriately-designed, remotely-administered measures of neurobehavioral function. Designing remote measures for maximal accessibility has the potential to lower burden for providers as well as patients. Webcam-based eye tracking is a remote data collection method that uses cameras on everyday computing devices, coupled with artificial intelligence / machine learning algorithms, to capture individual looking patterns toward probes such as videos and images. Webcam data collection also permits frame-by-frame automated facial expression analysis using machine learning algorithms that match expressions to prototypes learned from large training datasets. The potential for these methods to inform neurodevelopment is strong and, increasingly, both webcam-collected data (Simmatis et al., 2023) and artificial intelligence / machine learning algorithms (Nerusil et al., 2021) are being applied to create novel biometric measures for assessing child development and neurological conditions. A key advantage of webcam-based data collection is that the paradigms can be administered without direct real-time clinical supervision. Thus, an online, webcam-collected patient performance battery, capturing relevant social and cognitive measurements in an objective way, could supplement in-person assessment of NDGS patients and provide a more temporally-sensitive picture of neurobehavioral development in these populations. This is particularly true for individuals with medical and mental health comorbidities and cognitive impairments who merit closer surveillance but are currently underserved (Vlaskamp et al., 2019).
Unfortunately, at present, there are no accessible, scalable objective measures specifically designed for rapid and repeated evaluation of multiple social and cognitive domains important to NDGS and idiopathic NDD. The primary aim of this study was to address this limitation and develop social and cognitive stimulus paradigms that could be paired with webcam collection and artificial intelligence algorithms to measure key neurocognitive processes relevant to NDGS. Webcam-collected measures were developed in conjunction with clinician-scientist experts, patients, and parents/caregivers, following gold-standard principles of measure development (Boateng et al., 2018) and inclusive practices (FDA, 2009), to complement our recently developed and validated informant-report survey scales (Frazier et al., 2023). Individual paradigms were created to be brief (3–4 minutes) and to require only spontaneous or directed gaze, without motor or speech responses, making them appropriate for a wide range of developmental and cognitive levels. Stimuli followed best practices in gaze collection (Sasson & Elison, 2012) and test development (Boateng et al., 2018), including teaching parents to facilitate data collection (when needed) without interfering in the evaluation, presenting large elements within the visual field to limit accuracy issues in webcam gaze collection (Semmelmann & Weigelt, 2018), and, where relevant, focusing on very easy initial items with a graded increase in task difficulty. Based on careful attention to applicability to a wide range of individuals with NDGS, valid measure collection was expected to be achieved in the majority of participants, including those with intellectual disability.
A secondary aim of this study was to conduct initial psychometric evaluation of these measures in several distinct NDGS groups, people with idiopathic NDD, and neurotypical controls. Initial evaluation included estimation of scale reliability, test-retest reproducibility (1-month follow-up), and stability (4-month follow-up). Initial convergent and discriminant validity was assessed using data from other informant(parent)-reported clinical information (Frazier et al., 2023). In addition, given the importance of detecting autism within NDGS to ensure access to appropriate services, concurrent validity with ASD diagnoses and autism symptom levels was evaluated. Finally, using baseline data, exploratory analyses examined the pattern of cognitive and behavioral functioning across NDGS and idiopathic NDD.
2. Methods
2.1. Initial Stimulus Development
The stimulus paradigm development process is outlined in Appendix 1. Briefly, this included identifying or creating appropriate target items and stimuli across a wide range of ages (3–45) and ability levels (moderate to severe cognitive impairment to average ability); collecting feasibility data; updating items and stimuli based on initial feedback; conducting a pilot administration of performance measures with 10 clinician-scientist experts and 9 parents and patients with NDGS and/or idiopathic NDD; and administering a post-evaluation survey to collect additional feedback and create the final performance paradigms.
The social paradigm and associated stimuli were chosen based on the combination of empirical work (Frazier et al., 2018) and comprehensive review of the literature (Chita-Tegmark, 2016; Frazier et al., 2017). Specifically, a variety of social stimuli were selected, in part, due to the high rates of ASD occurrence in NDGS and the broader relevance of social attention to neurodevelopment as a transdiagnostic construct (Frazier, Uljarevic, et al., 2021; Salley & Colombo, 2016). The processing speed paradigm was selected because of its potential to capture attentional scanning across the stimulus field and measure speed of object detection via gaze, its ease of administration in individuals with NDGS, particularly those with limited speech or motor difficulties, and the ability to create easier stimuli relevant to individuals with more significant intellectual impairments. Importantly, processing speed has been shown to be a very sensitive index of brain development and neuropathophysiological processes (Bove et al., 2021; Kail, 1991). The receptive vocabulary paradigm was selected because receptive language is a strong indicator of developmental trajectory and functional outcome (Frazier, Klingemier, et al., 2021) and can validly estimate results from standardized in-person testing using gaze to visual targets (Frazier et al., 2020). The single-word reading paradigm was developed based on a recommendation by clinician-scientist experts for identifying early reading, including in people with limited or no speech, where reading is more difficult to assess. This paradigm was also included based on its potential to monitor development of reading throughout childhood and early adulthood in NDGS. Additional information on receptive vocabulary and single-word reading target selection and stimulus creation is provided in Appendices 2–3.
Example screenshots for each of the performance paradigms are included in Appendices 4–7, and stimulus/target order and composition information are provided in Appendices 8–11.
2.2. Clinician-Scientist Experts and Parent Pilot Evaluation Feedback
Ten clinician-scientist experts were recruited based on their clinical and/or research expertise with a specific NDGS group or idiopathic NDD. Nine parent-patient pairs were recruited from the respective groups (6 PHTS, 1 NFIX, 1 SYNGAP1, 1 ADNP, and 1 idiopathic ASD). Patients were intentionally selected to represent a range of ages and cognitive levels. After completing a pilot administration of performance paradigms, clinician-scientist experts and parents - who facilitated the webcam administration for the patient participant - completed a post-evaluation survey. Questions are provided in Appendices 12–13. This information was used to generate final stimulus videos and to improve the training of parents in facilitating administration to the child.
2.3. Parent/Caregiver Administration Support Training
Based on initial feedback, a parent/caregiver training process was developed (Appendix 14). This process included the following elements: 1) introduction to webcam technology, 2) training video, 3) parent completion of a “practice” stimulus set, 4) online training in valid task completion, and 5) virtual support meetings during initial and follow-up administrations. All of the elements were optional, but most participants used at least one option, and nearly all participants completed the parent “practice” stimuli.
2.4. Webcam Collection of Gaze
Participants were instructed to use a device with at least a 10” screen size based on results of initial pilot testing, which indicated that smaller screen sizes could reduce accuracy of point-of-regard relative to specific areas-of-interest. Webcam data were collected and processed using proprietary CoolTool software. The software was originally intended as a neuromarketing tool, but initial feasibility testing, including with several young children with neurodevelopmental disabilities, indicated good potential for use as a data collection platform. The minimum required camera resolution was 720p at 30fps. The gaze collection algorithm included a five-point calibration routine prior to each paradigm administration. This routine is coupled with a machine learning algorithm designed to detect webcam position within 3-D space and maximize gaze accuracy. On a frame-by-frame basis, gaze position relative to the 2-D screen was estimated. While accurate calibration is desirable, the gaze estimation model often functions adequately when less than ideal calibration data are acquired, making the system well-suited to young and more impaired participants. Similar systems have been shown to achieve ~3–5 degrees of calibration uncertainty, translating to accurate detection of areas >10% of screen size (Semmelmann & Weigelt, 2018; Shehu et al., 2021). The present stimulus paradigms were built with large areas-of-interest to be tolerant of higher levels of gaze uncertainty. Importantly, any reductions in gaze accuracy should reduce the reliability and validity of gaze-based measurements; thus, observations of high reliability and evidence of convergent validity would suggest minimal impact of sub-optimal gaze calibration.
To guard against reductions in gaze calibration and accuracy negatively impacting neurobehavioral measurements, no indices were scored if total time with eyes on screen was estimated at less than 30 seconds overall (out of a possible 15 minutes of gaze time to the screen).
Areas-of-interest were generated for each stimulus. For social attention stimuli, these included both socially-relevant (e.g., faces, target objects) and socially-irrelevant areas (e.g., foreground and background distractors, non-target objects), based on our prior research (Frazier et al., 2018). For processing speed, receptive vocabulary, and single-word reading stimuli, areas-of-interest included target items/objects. For all stimuli, areas-of-interest were temporally defined based on expected gaze patterns from prior research (social attention) (Frazier et al., 2018) or on the period after the verbal directive was given (cognitive paradigms) (Frazier et al., 2020).
2.5. Automated Scoring of Facial Expressions
The webcam software also includes a proprietary algorithm for automatically scoring facial expressions. Facial landmarks are identified in 3-dimensional space and the artificial intelligence algorithm is applied to these landmarks on a frame-by-frame basis to generate probability scores based on accuracy of classification from training data (Kuntzler et al., 2021). Probability scores represent a match between the facial landmark configuration and known sets of facial expressions (fear, anger, disgust, sadness, surprise, joy, and neutral), with closer matches interpreted as higher intensities of expression (range 0–100%). For the present study, and because specific affect recognition intensities can be prone to error for more subtle expressions (Kuntzler et al., 2021), specific expressions were aggregated into positive and negative categories to maximize reliability. Facial expression measures were collected only during the social attention stimuli, as these showed the greatest range of non-neutral expressions in preliminary data.
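The positive/negative aggregation step can be sketched as follows. This is an illustration only: the valence grouping (joy as positive; fear, anger, disgust, and sadness as negative; surprise and neutral left unclassified) is an assumption, not the software's documented mapping.

```python
# Assumed valence grouping for illustration; surprise and neutral are
# deliberately left unclassified because their valence is ambiguous.
POSITIVE = {"joy"}
NEGATIVE = {"fear", "anger", "disgust", "sadness"}

def aggregate_expressions(frames):
    """frames: per-frame dicts mapping expression name -> match probability
    (0-100). Returns mean positive and mean negative intensity across frames."""
    pos = sum(sum(f[e] for e in POSITIVE) for f in frames) / len(frames)
    neg = sum(sum(f[e] for e in NEGATIVE) for f in frames) / len(frames)
    return pos, neg
```

Aggregating over categories and frames in this way trades expression-level detail for more reliable summary intensities, which matches the stated rationale.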
2.6. Development of a Priori Validity Criteria and Scoring
For each social and cognitive paradigm, the investigative team a priori identified possible gaze and facial expression measures that would be relevant to evaluating social and cognitive processes in NDGS and idiopathic NDD. The only exception was the social attention measure, which was empirically developed following our prior published methodology (Frazier et al., 2018) (see Appendix 15 for additional information). Appendix 16 presents operational definitions for each performance measure. Each gaze-based measure was only scored if stringent validity criteria were met. Appendix 17 includes validity criteria for all 12 webcam-collected measures. For each measure, validity criteria ensured that the participant attended to the stimuli for at least 30 seconds and that at least 8 valid targets or 4 valid stimuli were collected. Fixations were scored by identifying at least 66ms of gaze point samples within a 100-pixel dispersion. Four gaze metrics were calculated for each area-of-interest – fixation duration, fixation count, glance count, and time-to-first fixation (Appendix 18). These metrics were used to score the 12 performance measures evaluated in this study.
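A dispersion-threshold fixation pass of this kind can be sketched as follows. This is a minimal I-DT-style illustration under stated assumptions: the ~33 ms sampling interval (30 fps), the combined x+y range as the dispersion metric, and the greedy run-extension rule are all assumptions about the scoring software, not its documented algorithm.

```python
def dispersion(points):
    """Combined x + y range of a set of gaze points (metric is an assumption)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def detect_fixations(samples, sample_ms=33.3, min_dur_ms=66, max_disp_px=100):
    """Dispersion-threshold (I-DT style) fixation detection.

    samples: (x, y) gaze points at a fixed sampling interval. A fixation is
    a maximal run lasting at least min_dur_ms whose dispersion stays within
    max_disp_px. Returns (start_index, end_index, duration_ms) tuples.
    """
    min_len = max(2, round(min_dur_ms / sample_ms))  # 66 ms ~ 2 samples at 30fps
    fixations, i = [], 0
    while i + min_len <= len(samples):
        if dispersion(samples[i:i + min_len]) <= max_disp_px:
            j = i + min_len
            # Grow the window while dispersion stays under threshold.
            while j < len(samples) and dispersion(samples[i:j + 1]) <= max_disp_px:
                j += 1
            fixations.append((i, j, (j - i) * sample_ms))
            i = j
        else:
            i += 1
    return fixations
```

Fixation duration, fixation count, glance count, and time-to-first-fixation per area-of-interest can all be derived from the tuples this pass returns.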
2.7. Participants for Initial Measure Evaluation
NDGS groups included participants with PTEN Hamartoma Tumor Syndrome (PHTS), ADNP, SYNGAP1, or NFIX recruited via contacts through the PTEN Hamartoma Tumor Syndrome Foundation with the support of the PTEN Research Foundation, the ADNP Kids Foundation, the SYNGAP Research Fund, and the Malan Syndrome Foundation. Other individuals with NDGS were recruited via the Simons Foundation Searchlight registry and included people with mutations in GRIN2B, CSNK2A1, HIVEP2, SCN2A, MED13L, and STXBP1. Given the relatively small sample sizes for ADNP (n=11) and these NDGS groups, they were combined into a single “other NDGS” group (n=63). Individuals were included if they were between the ages of 3 and 45 at enrollment and had an available parent or other close relative/caregiver to complete informant-report measures. Siblings of individuals with NDGS were also eligible to participate, and unrelated neurotypical controls were recruited using StudyKik, a national recruitment service. Siblings and unrelated controls who were reported to have an idiopathic neurodevelopmental disability were included in a separate group.
2.8. Procedure
Parent/caregiver informants first completed a demographic and clinical information questionnaire followed by 11 neurobehavioral evaluation tool (NET) survey scales (Frazier et al., 2023). These survey scales included 6 measures of symptoms/problems (anxiety, attention-deficit/hyperactivity disorder, restricted/repetitive behavior, challenging behavior, mood, and sleep problems) and 5 measures of skills/functioning (motor skills, daily living skills, social communication/interaction skills, executive functioning, and quality of life). After NET survey completion, informants and participants were instructed to complete webcam-collected performance measures and were sent links via email or text to facilitate completion. For young and/or impaired children, performance measure administration began by having the parent complete a practice version, so that they understood how the webcam collection works and how best to help their child. Parents and older patients also were offered a video call with the research coordinator to review best practices in performance measure administration and were provided a set of recommendations to improve evaluation validity.
Performance measure administration began with the 5-point calibration that included dots presented in the four corners and center of the screen. Next, videos were presented for each paradigm in succession – social attention, receptive language, processing speed, and single-word reading. Re-calibration automatically occurred prior to each paradigm.
Survey and webcam measures were collected at baseline, 1-month, and 4-month follow-up timepoints. The maximum total administration time across all paradigms was 15 minutes (social attention – 4 min, receptive vocabulary – 4 min, processing speed – 3 min, single-word reading – 4 min), with videos separated into 1-minute segments to permit breaks. A button press was required to advance to the next video. Participants were instructed to complete all of the social attention and processing speed videos, but were permitted to stop after the first two minutes of the receptive vocabulary paradigm and after the first minute of the single-word reading paradigm, depending on the parent’s appraisal of the patient’s capacity to engage with the paradigm. Participants could proceed through all paradigms or take breaks between paradigms but were encouraged to finish all videos in one sitting if possible.
IRB approval was obtained for all of the qualitative and quantitative procedures of the study, including administration of the final NET scales, and parents/legally-authorized representatives and adult patients provided informed consent prior to completing any study procedures. Assent for minors was also obtained, where appropriate.
2.9. Statistical Analyses
2.9.1. Sample Characterization
Descriptive statistics for demographic and clinical factors were computed to characterize the sample, and Chi-square or univariate ANOVA were used to compare across the seven study groups (PHTS, SYNGAP1, NFIX, other NDGS, idiopathic NDD, sibling controls, and unrelated neurotypical controls).
2.9.2. Evaluation and Measure Validity
Using validity criteria for each of the 12 performance measures, the sum of valid measures was computed and compared across study groups using univariate ANOVA. Proportions of validity by measure were also computed overall and by parent-reported intellectual disability status.
2.9.3. Reliability
Scale reliability (internal consistency) was calculated using Cronbach’s alpha (α) (Streiner & Norman, 1995). Scale reliability estimates falling in the ranges .70 to .79, .80 to .89, and >.90 were considered fair, good, and excellent (Nunnally & Bernstein, 1994), respectively. Test-retest reproducibility (one-month follow-up) and stability (4-month follow-up) were estimated using Pearson’s bivariate correlations. Test-retest estimates <.40 were considered poor, .40 to .59 fair, .60 to .74 good, and .75+ excellent (Cicchetti et al., 2006).
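As a worked illustration, Cronbach's alpha can be computed directly from its definition. This is a minimal sketch; the study's analyses were run in SPSS, and the function names here are hypothetical.

```python
def variance(xs):
    """Unbiased sample variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(items):
    """items: one list per scale item, each holding all respondents' scores
    in the same order. alpha = k/(k-1) * (1 - sum(item vars)/var(totals))."""
    k = len(items)
    totals = [sum(person) for person in zip(*items)]  # per-respondent totals
    return (k / (k - 1)) * (1 - sum(variance(it) for it in items) / variance(totals))
```

Two perfectly correlated items yield alpha = 1.0; alpha falls as item-level variance grows relative to total-score variance.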
2.9.4. Convergent and Discriminant Validity
To evaluate convergent and discriminant validity, other clinical information based on informant-report was a priori selected as either measuring similar constructs (convergent validity) or dissimilar constructs (discriminant validity) for each performance measure. Informant-report information included: estimated IQ; speech level (5-point scale from non- or minimally-speaking to fluent speech); reading level (5-point scale from no reading to paragraph level or higher); ADHD, anxiety, mood, challenging behavior, social communication/interaction, and restricted/repetitive behavior symptoms; sleep problems; daily living skills; executive functioning; and motor skills. Bivariate correlations were computed between each performance measure and the convergent and discriminant validity measures selected. To compute aggregate correlations over multiple measures, correlations were converted to Fisher’s z, averaged, and transformed back to a correlation metric. The test of the significance of the difference in dependent correlations was used to examine whether convergent validity correlations were higher than discriminant validity correlations (Cohen & Cohen, 1983).
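The Fisher z aggregation step can be sketched as follows (illustrative only; the function name is hypothetical). Fisher's z is simply the inverse hyperbolic tangent of r, and averaging on the z scale before back-transforming avoids the downward bias of averaging raw correlations.

```python
import math

def average_correlations(rs):
    """Aggregate correlations: Fisher z-transform (atanh), average the z
    values, then back-transform (tanh) to the correlation metric."""
    zs = [math.atanh(r) for r in rs]
    return math.tanh(sum(zs) / len(zs))
```

Because atanh stretches correlations near ±1, the back-transformed mean of a mixed set (e.g., .3 and .7) sits slightly above the arithmetic mean of the raw values.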
2.9.5. Concurrent Validity with ID, ASD Diagnoses, and Autism Symptom Levels
To examine concurrent validity of performance measures with parent-reported clinical ID diagnosis, independent samples t-tests were computed with each measure as the dependent variable and ID status (yes, no) as the grouping variable. Cohen’s d was computed to estimate the magnitude of group differences. To evaluate potential diagnostic validity, receiver operating characteristic (ROC) curve analyses were calculated in the training, testing, validation, and testing plus validation sub-samples, separately for baseline, 1-month, and 4-month follow-up data. Areas under the curve (AUCs) evaluated diagnostic validity. A rough guideline for evaluating AUC values is: <.60 = poor; .60–.69 = fair; .70–.79 = good; .80–.89 = excellent, if the comparison group is clinically meaningful; and .90–1.00 = exceptional, only if the design and comparison are appropriate (Youngstrom et al., 2019). To evaluate concurrent validity with autism symptom levels, symptom levels derived from the neurobehavioral evaluation survey scales were correlated with each performance measure in the same subsamples as the ROC analyses.
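The AUC itself has a simple rank-based interpretation: the probability that a randomly chosen positive case scores above a randomly chosen negative case, with ties counted as half. A minimal sketch of that equivalence (the study's analyses used the R package pROC; this function name is hypothetical):

```python
def auc(pos_scores, neg_scores):
    """AUC via the Mann-Whitney identity: the proportion of
    (positive, negative) pairs where the positive case scores higher,
    counting ties as half a win."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))
```

An AUC of .50 corresponds to chance-level discrimination and 1.00 to perfect separation of the diagnostic groups.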
2.9.6. Neurobehavioral Patterns across NDGS and idiopathic NDD Groups
To explore unique patterns of social and cognitive function, webcam measures were first normed using regression-based norming in unrelated healthy controls, with age, the square of age (to capture non-linear developmental trends), and sex included as predictors in each equation. This approach puts each measure on a z-score metric relative to healthy controls. Using these standardized residual scores, univariate analysis of variance models were computed, with each of the seven groups as the independent variable and the performance measure scores as dependent variables in separate analyses.
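The regression-based norming step can be sketched as follows. This is a minimal illustration: fitting the control-group regression by ordinary least squares and using the control residual SD as the norming SD are assumptions about the implementation, and the function name is hypothetical.

```python
import numpy as np

def regression_norm(ctrl_age, ctrl_sex, ctrl_score, age, sex, score):
    """Norm a raw score against control-group expectations.

    Fits score ~ 1 + age + age^2 + sex by least squares in the control
    sample, then expresses any participant's observed score as a z-score
    relative to the control-based prediction. Using the control residual
    SD as the norming SD is an assumption for this sketch.
    """
    X = np.column_stack([np.ones(len(ctrl_age)), ctrl_age,
                         np.square(ctrl_age), ctrl_sex])
    y = np.asarray(ctrl_score, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid_sd = (y - X @ beta).std(ddof=X.shape[1])  # df-corrected residual SD
    predicted = np.array([1.0, age, age ** 2, sex]) @ beta
    return float((score - predicted) / resid_sd)
```

A participant scoring above the age- and sex-expected control value receives a positive z-score, and below it a negative one, so group profiles can be compared on a common metric.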
2.9.7. Statistical Power
Assuming total sample sizes of 200+ for reliability and validity analyses, statistical power to detect a bivariate correlation of r≥.40 was excellent (>.99; one-tailed p-value of .05). Assuming minimum sub-sample sizes of at least 18 ASD and 40 non-ASD diagnosed individuals, power to detect AUCs≥.72 was at least good (≥.80). Statistical power to detect group differences across webcam performance measures, assuming a minimum group size of 24, was at least adequate (>.82) if large group differences were observed (d≥.80; α=.05, two-tailed). For larger group sizes (n>40), power was adequate even for medium effects (d≥.50).
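The correlation power figure can be reproduced with the standard Fisher z approximation, power = Φ(atanh(r)·√(n−3) − z_crit). This is a sketch under that approximation; the software the authors used for power calculations is not specified, and the function names here are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_ppf(p):
    """Standard normal quantile by bisection (ample precision here)."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if norm_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def power_correlation(r, n, alpha=0.05):
    """One-tailed power to detect a true correlation r at sample size n,
    using the Fisher z approximation."""
    return norm_cdf(math.atanh(r) * math.sqrt(n - 3) - norm_ppf(1.0 - alpha))
```

With r=.40 and n=200 this formula returns power well above .99, matching the figure reported in the text.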
2.9.8. Statistical Analysis Implementation
Statistical significance was set at α=.05, two-tailed, and effect size magnitude was emphasized. Data preparation, descriptive analyses, internal consistency reliability using Cronbach’s alpha (α), and bivariate correlations were computed in SPSS v28 (IBM Corp, 2021). ROC analyses were computed using the R package pROC and implemented in version 4.1.2 (R Core Team, 2021) using R Studio version 2021.09.1.
3. Results
3.1. Pilot Evaluation Results
Clinicians used a wide range of hardware setups and reported high relevance of the paradigms to their respective NDGS or idiopathic NDD group (Appendix 19). Clarity of instructions and quality of audio and visual stimuli was rated as high. Timing was rated as generally moderate (neither fast nor slow). Several potential concerns about target difficulty levels were raised and used to adjust the final stimuli.
Parents rated the overall experience as positive and of relatively moderate difficulty across paradigms (Appendix 20). Patient participants did not require breaks, looked away from the screen with variable frequency (every 5–10 seconds to only a few losses of attention to screen), covered or touched their face only infrequently, and required variable levels of physical, gestural, or verbal assistance to maintain motivation and attention. Unexpected intrusions and adjustments to lighting were infrequent. Overall attention was rated as average to good. Paradigm relevance to the patient’s condition was rated as “relevant” to “highly relevant” across paradigms. Quality of audio and visual stimuli was rated as high, and timing was judged to be generally moderate to fast. These data were used to adjust parent training processes and to include reminders to limit assistance to motivation and general attention (not specific to a stimulus or desired response).
3.2. Sample Characteristics
A total of 395 individuals enrolled to participate before 04/05/2023 (recruitment is ongoing). Of these, 20 did not attempt baseline webcam paradigms, but of the 375 who did attempt the paradigms, all achieved at least 1 valid measure (Appendix 21). Longitudinal attrition was modest at 1-month follow-up (n=54 did not attempt; n=341 attempted) but higher at 4-month follow-up (n=100 did not attempt; n=295 attempted).
Table 1 presents sample characteristics. Findings were highly consistent with findings in our recent survey validation study (Frazier et al., 2023). Specifically, participants were younger in the NFIX and SYNGAP1 groups and older in the PHTS and idiopathic NDD groups, with high rates of spousal informants in the latter groups. All groups had very high proportions of White/Caucasian participants, although Hispanic ethnicity approximated US population proportions in most groups, and the sample had a wide range of household incomes. Estimated cognitive levels were lowest in the NFIX, SYNGAP1, and other NDGS groups and to a lesser extent in the PHTS group relative to control groups. Informant-reported developmental diagnoses were highly variable across NDGS groups, but with elevated rates of ASD, ID, anxiety, and motor disorder in NFIX, SYNGAP1, and other NDGS groups compared to controls. Participants were predominantly from the US (n=325, 87%), but a small minority of participants with informants fluent in English were also included from other countries (United Kingdom n=17, Canada n=24, Australia n=4, New Zealand n=1, Ireland n=2, Netherlands n=1, Israel n=1).
Table 1.
Demographic and clinical characteristics by study group.
 | Sibling Controls | Unrelated Controls | PHTS | NFIX | SYNGAP1 | Other NDGS | NDD | X² / F (p)
---|---|---|---|---|---|---|---|---
 | n (%) | n (%) | n (%) | n (%) | n (%) | n (%) | n (%) |
N | 40 | 116 | 33 | 24 | 43 | 63 | 56 | |
Informant Age (M, SD) | 42 (6) | 42 (9) | 43 (8) | 41 (10) | 42 (8) | 44 (8) | 42 (8) | 0.6 (.718) |
Informant Sex (% Female) | 37 (93%) | 95 (82%) | 28 (85%) | 21 (88%) | 39 (91%) | 61 (97%) | 51 (91%) | 12.3 (.424) |
Informant Relationship to Participant | 39.3 (.003) | |||||||
Biological Parent | 39 (98%) | 99 (85%) | 25 (76%) | 23 (96%) | 40 (93%) | 59 (93%) | 44 (79%) | |
Adoptive or Custodial Parent | 0 (0%) | 3 (3%) | 1 (3%) | 1 (4%) | 1 (2%) | 4 (6%) | 2 (4%) | |
Other Biological Relative / Sibling | 1 (2%) | 7 (6%) | 0 (0%) | 0 (0%) | 1 (2%) | 0 (0%) | 3 (5%) | |
Spouse/Other Non-Biological Relative | 0 (0%) | 7 (6%) | 7 (21%) | 0 (0%) | 1 (2%) | 0 (0%) | 7 (12%) | |
Household Income (US $) | 79.7 (.013) | |||||||
<$25,000 | 1 (3%) | 5 (4%) | 2 (6%) | 0 (0%) | 0 (0%) | 2 (3%) | 8 (14%) | |
$25,000–$34,999 | 2 (5%) | 8 (7%) | 0 (0%) | 2 (8%) | 1 (2%) | 1 (2%) | 2 (4%) | |
$35,000–$49,999 | 1 (3%) | 5 (4%) | 1 (3%) | 3 (13%) | 3 (7%) | 3 (5%) | 6 (11%) | |
$50,000–$74,999 | 6 (15%) | 18 (16%) | 9 (27%) | 4 (17%) | 3 (7%) | 4 (6%) | 11 (20%) | |
$75,000–$99,999 | 2 (5%) | 21 (18%) | 3 (9%) | 3 (13%) | 4 (9%) | 6 (10%) | 4 (7%) | |
$100,000–$149,999 | 7 (18%) | 28 (24%) | 7 (21%) | 4 (17%) | 10 (23%) | 16 (25%) | 11 (20%) | |
$150,000–$199,999 | 7 (18%) | 14 (12%) | 4 (12%) | 5 (21%) | 10 (23%) | 6 (10%) | 6 (11%) | |
$200,000+ | 6 (15%) | 13 (11%) | 2 (6%) | 2 (8%) | 7 (16%) | 12 (19%) | 5 (9%) | |
Did not report | 8 (20%) | 4 (3%) | 5 (15%) | 1 (4%) | 5 (11%) | 13 (21%) | 3 (5%) | |
Participant Age (M, SD) | 11 (5) | 12 (8) | 17 (13) | 10 (7) | 10 (7) | 11 (6) | 16 (9) | 4.8 (<.001) |
Participant Sex (% Female) | 23 (58%) | 63 (54%) | 13 (39%) | 12 (50%) | 19 (44%) | 36 (57%) | 21 (38%) | 8.6 (.197) |
Participant Race / Ethnicity | ||||||||
White / Caucasian | 36 (90%) | 95 (82%) | 30 (91%) | 24 (100%) | 37 (86%) | 58 (92%) | 46 (82%) | 9.6 (.142) |
Black / African American | 3 (8%) | 9 (8%) | 2 (6%) | 0 (0%) | 5 (12%) | 5 (8%) | 8 (14%) | 5.6 (.473) |
Middle Eastern or North African | 2 (5%) | 1 (1%) | 1 (3%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 9.1 (.167) |
East Asian | 2 (5%) | 9 (8%) | 3 (9%) | 0 (0%) | 2 (5%) | 5 (8%) | 2 (4%) | 3.8 (.697) |
South Asian | 2 (5%) | 8 (7%) | 0 (0%) | 0 (0%) | 1 (2%) | 3 (5%) | 0 (0%) | 8.2 (.223) |
Native American / Alaskan Native | 0 (0%) | 3 (3%) | 1 (3%) | 1 (4%) | 0 (0%) | 0 (0%) | 1 (2%) | 4.5 (.605) |
Native Hawaiian / Pacific Islander | 0 (0%) | 1 (1%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 2.2 (.896) |
Hispanic | 7 (18%) | 21 (18%) | 1 (3%) | 5 (21%) | 7 (17%) | 2 (3%) | 11 (20%) | 18.7 (.096) |
Unknown | 0 (0%) | 2 (2%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 4.5 (.611) |
Did not report | 0 (0%) | 2 (2%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (2%) | 3.6 (.734) |
Cognitive Level (informant-estimated) | 337.9 (<.001) | |||||||
Very high or above (120+) | 6 (15%) | 12 (10%) | 3 (9%) | 0 (0%) | 0 (0%) | 1 (2%) | 10 (18%) | |
High Average (110–119) | 18 (45%) | 58 (50%) | 6 (18%) | 0 (0%) | 0 (0%) | 0 (0%) | 19 (34%) | |
Average (90–109) | 13 (33%) | 42 (36%) | 15 (46%) | 0 (0%) | 1 (2%) | 2 (3%) | 22 (39%) | |
Below average (80–89) | 0 (0%) | 0 (0%) | 1 (3%) | 2 (8%) | 4 (9%) | 6 (10%) | 2 (4%) | |
Borderline impairment (70–79) | 0 (0%) | 0 (0%) | 2 (6%) | 2 (8%) | 1 (2%) | 2 (3%) | 0 (0%) | |
Mild impairment (55–69) | 0 (0%) | 0 (0%) | 1 (3%) | 5 (21%) | 6 (14%) | 12 (19%) | 3 (5%) | |
Moderate impairment (40–54) | 0 (0%) | 0 (0%) | 2 (6%) | 9 (38%) | 11 (26%) | 17 (27%) | 0 (0%) | |
Severe impairment (21 to 39) | 0 (0%) | 0 (0%) | 0 (0%) | 2 (8%) | 10 (23%) | 12 (19%) | 0 (0%) | |
Profound impairment (<20) | 0 (0%) | 0 (0%) | 0 (0%) | 2 (8%) | 5 (12%) | 3 (5%) | 0 (0%) | |
Did not report | 3 (8%) | 4 (3%) | 3 (9%) | 2 (8%) | 5 (12%) | 8 (13%) | 0 (0%) | |
Cognitive Estimate from Prior Testing | 6 (15%) | 19 (16%) | 16 (49%) | 13 (54%) | 21 (49%) | 30 (48%) | 26 (46%) | 57.1 (<.001) |
Developmental Diagnoses (n, %) | ||||||||
ASD | ‐ | ‐ | 9 (27%) | 5 (21%) | 35 (81%) | 32 (51%) | 8 (14%) | 54.8 (<.001) |
ID/GDD | ‐ | ‐ | 10 (30%) | 21 (88%) | 39 (91%) | 58 (92%) | 1 (2%) | 141.3 (<.001) |
Speech/language disorder | ‐ | ‐ | 9 (27%) | 11 (46%) | 32 (74%) | 40 (64%) | 10 (18%) | 44.2 (<.001) |
ADHD | ‐ | ‐ | 5 (15%) | 1 (4%) | 6 (14%) | 16 (25%) | 26 (46%) | 24.0 (<.001) |
ODD/CD | ‐ | ‐ | 0 (0%) | 1 (4%) | 4 (9%) | 2 (3%) | 4 (7%) | 4.4 (.353) |
Anxiety disorder | ‐ | ‐ | 7 (21%) | 8 (33%) | 8 (19%) | 10 (16%) | 18 (32%) | 6.4 (.174) |
Specific learning disorder | ‐ | ‐ | 2 (6%) | 0 (0%) | 1 (2%) | 4 (6%) | 5 (9%) | 3.6 (.460) |
Motor / coordination disorder | ‐ | ‐ | 4 (12%) | 6 (25%) | 24 (56%) | 21 (33%) | 0 (0%) | 45.5 (<.001) |
Depressive disorder | ‐ | ‐ | 5 (15%) | 0 (0%) | 0 (0%) | 0 (0%) | 10 (18%) | 23.8 (<.001) |
Bipolar disorder / mania | ‐ | ‐ | 0 (0%) | 0 (0%) | 0 (0%) | 1 (2%) | 1 (2%) | 1.7 (.789) |
Obsessive compulsive disorder | ‐ | ‐ | 0 (0%) | 0 (0%) | 4 (9%) | 2 (3%) | 2 (4%) | 6.1 (.192) |
Tic disorder | ‐ | ‐ | 0 (0%) | 0 (0%) | 1 (2%) | 1 (2%) | 1 (2%) | 1.2 (.882) |
Feeding / eating disorder | ‐ | ‐ | 0 (0%) | 0 (0%) | 11 (26%) | 10 (16%) | 0 (0%) | 27.5 (<.001) |
Baseline Webcam Evaluation Validity | 57.4 (<.001) | |||||||
1–3 measures valid (n, %) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (4%) | 0 (0%) | 3 (6%) | 0 (0%) | |
4–11 measures valid (n, %) | 9 (22%) | 29 (25%) | 7 (21%) | 10 (42%) | 30 (69.8%) | 30 (47%) | 13 (23%) | |
All measures valid (n, %) | 31 (78%) | 87 (75%) | 26 (79%) | 13 (54%) | 13 (30.2%) | 30 (47%) | 43 (77%) | |
Number of Valid Measures (M, SD) | 11.3 (1) | 11.2 (2) | 11.1 (2) | 9.9 (3) | 10.0 (2) | 9.8 (3) | 11.3 (2) | 6.3 (<.001) |
Note. ASD=autism spectrum disorder. ID/GDD=intellectual disability/global developmental delay, ADHD=Attention-Deficit/Hyperactivity disorder; ODD/CD=oppositional defiant disorder/conduct disorder. Non-ASD diagnoses do not sum to 100% because children could be diagnosed with more than one condition. Note that race/ethnicity categories are not mutually exclusive and participants were encouraged to select all options that apply. For statistical tests with low cell sizes, Fisher’s exact test was also computed, but results were highly consistent with the chi-square analysis. For this reason, chi-square is reported with the associated p-value.
3.3. Evaluation Validity
Evaluation validity was high across all groups, but the NFIX, SYNGAP1, and other NDGS groups had higher proportions of individuals with at least one invalid measure (Table 1). On average, all groups had at least 10 valid performance measures. Participants with reported ID had lower measure validity proportions than participants without ID, but measure validity never dropped below 84% (Table 2).
Table 2.
Valid administration and reliability metrics for webcam-based performance measures.
# | Measure | Stimulus Paradigm | # of Indicators | Evaluation Validity Overall % | % Valid, No ID | % Valid, ID | Internal Consistency Reliability (Cronbach’s α) | 1-Month Test-Retest Reproducibility (r) | 4-Month Test-Retest Stability (r)
---|---|---|---|---|---|---|---|---|---
1 | Overall Attention | All | 15 | 100% | 100% | 100% | .89 | .52 | .50 |
2 | Attentional Scanning | Processing Speed | 12 | 87% | 89% | 84% | .94 | .66 | .64 |
3 | Positive Emotion | Social | 32 | 100% | 100% | 100% | .93 | .63 | .62 |
4 | Negative Emotion | Social | 32 | 100% | 100% | 100% | .95 | .44 | .38 |
5 | Social Attention | Social | 141 | 92% | 95% | 89% | .89 | .62 | .64 |
6 | Social Preference | Social | 69 | 92% | 95% | 89% | .75 | .48 | .40 |
7 | Face Preference | Social | 28 | 92% | 94% | 88% | .90 | .37 | .29 |
8 | Non-social Preference | Social | 42 | 92% | 95% | 89% | .67 | .31 | .31 |
9 | Receptive Vocabulary | Receptive Vocabulary | 39 | 94% | 96% | 89% | .93 | .73 | .72 |
10 | Speed to Faces | Social | 28 | 92% | 94% | 88% | .93 | .29 | .29 |
11 | Speed to Object | Processing Speed | 12 | 87% | 89% | 84% | .95 | .53 | .51 |
12 | Reading Accuracy | Single-word Reading | 46 | 96% | 99% | 91% | .91 | .68 | .72 |
Note. # of indicators refers to the number of areas-of-interest (these could be whole videos or whole stimuli if areas-of-interest are combined) included in computing the measure. Validity proportions are given for baseline data and are estimated by including all individuals who attempted to complete the webcam performance paradigm. Fair test-retest reliability values for overall attention are likely due in part to restricted range, as many individuals obtain values near 95–100%. Low test-retest reliability values for negative emotion are likely a function of a very limited score range, with many individuals falling at 0% expression intensity.
Score distributions were variable across measures, with many showing near normal distributions, and all but negative emotion suggesting a good quantitative range (Appendix 22). The latter was highly skewed and kurtotic with scores clustered close to 0%.
3.4. Reliability
Internal consistency reliability was good to excellent for all performance measures (α=.89–.95; Table 2), with the exception of non-social preference, where reliability was lower but still adequate for a low-frequency behavior (α=.67). Test-retest reproducibility estimates were fair or above for 9 of the 12 measures (r=.44–.73), with the two face-processing measures and the non-social preference measure showing less stability. Test-retest stability was fair or above for 8 of the 12 measures (r=.40–.72), and the highest stability estimates were for receptive vocabulary and single-word reading. The face processing, non-social preference, and negative emotional expressiveness scales showed lower stability, the last of which just missed the cutoff for fair test-retest stability. Similar levels were observed when only NDGS patients were examined.
3.5. Convergent and Discriminant Validity
All performance measures, except positive and negative emotional expressiveness, showed strong evidence of convergent and discriminant validity (Table 3). Given the unique nature of gaze-based measures and the difference in measurement modality (gaze vs. informant-report), convergent validity was generally quite good (r=.21–.62). Similarly, discriminant validity estimates were generally quite low (r=.07–.24). The lack of convergent validity for the emotional expressiveness measures is likely because no close behavioral construct was assessed by any available informant-report measure.
Table 3.
Predicted convergent and discriminant validity associations for selected webcam measures.
Webcam Measure | Convergent Validity Measures | Average |r| | Discriminant Validity Measures | Average |r| | t (p)
---|---|---|---|---|---
Overall Attention | Estimated IQ, ADHD Symptoms, Executive Functioning | .30 | Anxiety, Mood, Challenging Behavior | .17 | 2.53 (.012) |
Attentional Scanning | Estimated IQ, ADHD Symptoms, Executive Functioning | .43 | Anxiety, Mood, Challenging Behavior | .23 | 4.10 (<.001) |
Positive Emotion | Mood-Hypomania, Anxiety | .10 | Motor, Daily Living Skills | .07 | 0.47 (.635) |
Negative Emotion | Mood-Emotion Regulation, Anxiety | .09 | Motor, Daily Living Skills | .11 | −0.33 (.742) |
Social Attention | Autism Symptoms | .55 | Anxiety, Mood | .23 | 6.95 (<.001) |
Social Preference | Social Communication / Interaction Symptoms | .36 | Anxiety, Mood | .16 | 3.69 (<.001) |
Face Preference | Social Communication / Interaction Symptoms | .26 | Anxiety, Mood | .12 | 2.50 (.013) |
Non-social Preference | Social Communication / Interaction Symptoms, Restricted / Repetitive Behavior | .21 | Anxiety, Mood | .09 | 2.27 (.024) |
Receptive Vocabulary | Estimated IQ, Speech Level, Social Communication / Interaction Symptoms | .29 | Anxiety, Mood, Sleep | .14 | 2.38 (.018) |
Speed to Faces | Social Communication / Interaction Symptoms | .25 | Anxiety, Mood, Challenging Behavior | .12 | 2.43 (.016) |
Speed to Object | Estimated IQ | .47 | Anxiety, Mood, Challenging Behavior | .24 | 3.70 (<.001) |
Reading Accuracy | Reading Fluency Level | .62 | Anxiety, Mood, Sleep | .14 | 8.05 (<.001) |
Note. Convergent and discriminant validity correlations were averaged after conversion to Fisher’s z and then re-converted to correlations. Average convergent and discriminant validity correlations were compared using the test of dependent correlations with the nuisance correlation being the average of the inter-correlations between the convergent and discriminant validity measures.
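The averaging procedure described in the note can be sketched as follows. This is an illustrative re-implementation, not the study’s code, and it omits the test of dependent correlations, which additionally requires the nuisance inter-correlations and the sample size:

```python
import numpy as np

def average_correlations(rs) -> float:
    """Average a set of correlations via Fisher's z-transform:
    z = atanh(r), take the arithmetic mean of z, back-transform with tanh."""
    z = np.arctanh(np.asarray(rs, dtype=float))
    return float(np.tanh(z.mean()))
```

Because the z-transform stretches the scale near |r|=1, averaging in z-space gives slightly more weight to larger correlations than a raw arithmetic mean would (e.g., averaging 0.0 and 0.8 yields 0.50 rather than 0.40).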
Inter-correlations among the performance measures tended to be small to moderate (Appendix 23), with a few notable exceptions (speed to faces with face preference r=−.79 and receptive vocabulary with reading accuracy r=.78). The former may suggest redundancy of these measures but the latter correlation is likely due to the close relationship between vocabulary and reading and represents a realistic estimate of the association of these two constructs.
3.6. Concurrent Validity with ID, ASD Diagnosis, and Autism Symptom Level
Participants with ID showed statistically significant differentiation from participants without ID across all performance measures (Table 4), including lower levels of general attention, attentional scanning, social attention, social preference, face preference, receptive vocabulary, and single-word reading, and slower speed to faces and objects. Interestingly, individuals with ID showed higher positive and negative emotional expressiveness.
Table 4.
Descriptive statistics for webcam-collected performance measures across cases with and without Intellectual Disability (ID).
Measure | No ID (n=224) M (SD) | ID (n=151) M (SD) | Raw Δ | t (p) | Cohen’s d
---|---|---|---|---|---
Overall Attention (%) | 82.1 (14) | 70.5 (17) | +11.6% (1.8 min total) | 7.1 (<.001) | .75 | |
Attentional Scanning (Count) | 11.6 (3.4) | 7.9 (2.5) | +3.7 glances to each target | 9.9 (<.001) | 1.20 | |
Positive Emotion (%) | 6.4 (8.7) | 10.3 (9.1) | −3.9% intensity | −4.2 (<.001) | −.44 | |
Negative Emotion (%) | 2.2 (3.0) | 3.4 (4.1) | −1.2% intensity | −3.3 (.001) | −.35 | |
Social Attention (z) | −.02 (1.0) | −1.52 (1.3) | +1.5 control SDs | 11.9 (<.001) | 1.14 | |
Social Preference (FD) | 1.4 (0.3) | 1.2 (0.3) | +0.2 seconds per AOI | 6.0 (<.001) | .68 | |
Face Preference (FD) | 1.3 (0.8) | 0.8 (0.5) | +0.5 seconds per AOI | 6.1 (<.001) | .70 | |
Non-social Preference (FD) | 1.1 (0.4) | 1.2 (0.4) | −0.1 seconds per AOI | −2.1 (.038) | −.24 | |
Receptive Vocabulary (FD) | 41.9 (25.7) | 17.1 (13.6) | +24.8 seconds to all targets | 10.1 (<.001) | 1.13 | |
Speed to Faces (TFF) | 7.2 (2.1) | 8.0 (1.8) | −0.8 seconds per AOI | −3.3 (<.001) | −.37 | |
Speed to Object (TFF) | 4.9 (1.3) | 6.1 (1.2) | −1.2 seconds per AOI | −7.2 (<.001) | −.87 | |
Reading Accuracy (FD) | 37.9 (22.8) | 16.6 (14.1) | +21.3 seconds to all targets | 9.2 (<.001) | 1.06 |
Note. ID=Intellectual disability (defined as parent-report of ID/GDD or estimated IQ<70). Overall attention (%) is the percentage of time on screen throughout all stimulus paradigms. Count=sum of glances to all targets averaged across stimuli. TFF=time to first fixation; values represent averages across all stimuli, including those that were not fixated, where the length of the stimulus was imputed. AOI=area-of-interest. Values for positive and negative emotion represent estimated intensities with a range of 0–100%. Higher values are preferable for all measures except Speed to Faces and Speed to Objects, where higher values indicate slower time to the AOIs; Non-social Preference, where higher values indicate a preference for non-social information; and the Positive and Negative Emotion measures, where higher scores simply indicate more expressiveness. Social attention is presented as a z-score (based on the neurotypical control mean) because this measure is created by averaging multiple different metrics (fixation duration, fixation count, and time-to-first fixation) after standardization.
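The composite z-score construction described in the note can be sketched as below. This is an illustrative re-implementation; the sign-flip for metrics where lower raw values indicate better performance (e.g., time-to-first-fixation) is an assumption about orientation, not a detail confirmed by the source:

```python
import numpy as np

def social_attention_composite(metrics, control_mean, control_sd, sign) -> float:
    """Standardize each gaze metric against neurotypical control norms,
    orient so that higher z means more social attention (sign = -1 for
    metrics where lower raw values are better), and average into one z."""
    metrics = np.asarray(metrics, dtype=float)
    z = np.asarray(sign) * (metrics - np.asarray(control_mean)) / np.asarray(control_sd)
    return float(z.mean())
```

A participant exactly at the control mean on every metric scores z=0 by construction; the Table 4 group means are interpretable relative to that anchor.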
Across subsamples, timepoints, and ages, the social attention measure showed moderate to high correlations (r=.32–.62) with autism symptom level (Appendix 24). Similarly, concurrent validity with ASD diagnosis consistently fell in the good to excellent range (AUC=.69–.88; Appendix 25), with evidence that diagnostic validity is maintained across evaluation timepoints. When the social attention measure was divided into clinically useful score ranges, multi-level likelihood ratios suggested meaningful reductions in ASD probability for low scores (z≤0.1) and increases in ASD probability for high scores (z≥1.81). The optimal cut score was 1.49, yielding 70% sensitivity and 87% specificity (Appendix 26).
3.7. Group Profiles Across Performance Measures
Group differences were statistically significant across all performance measures (largest p=.041; eta-squared=.04–.36). In general, the NFIX, SYNGAP1, and other NDGS groups showed a more impaired neurobehavioral phenotype, including lower attention, higher non-social preference, worse receptive vocabulary and single-word reading, and slower speed to faces and objects (Figure 1). PHTS patients showed lower social attention and social preference and higher non-social preference, consistent with high rates of ASD in this group, but only mild reductions in receptive vocabulary and single-word reading and no deficits in overall attention or attentional scanning. Interestingly, SYNGAP1 and other NDGS patients had higher negative emotional expressiveness scores, while NFIX and other NDGS patients showed higher positive emotional expressiveness scores, implying syndrome-specific patterns even among more significantly impaired groups (Appendix 27). Taken together, these findings provide preliminary evidence of concurrent (known-groups) validity of the performance measures.
Figure 1.
NDGS group differences across webcam measures.
Note. SC=sibling controls, UC=unrelated controls, PHTS=PTEN Hamartoma Tumor Syndrome. Other NDGS=other neurodevelopmental genetic syndromes, and NDD=idiopathic neurodevelopmental disability.
4. Discussion
This research described a comprehensive process for creating a set of objective webcam-collected measures, derived using artificial intelligence algorithms that capture gaze and facial expression information, based on gold-standard measurement development guidelines (Boateng et al., 2018) as well as principles of inclusive research practices (FDA, 2009). The process involved both clinician-scientists and families and was undertaken to provide a preliminary validation of these patient performance measures by examining a range of key psychometric characteristics. Results suggest that these measures may serve as promising new objective evaluation tools and useful complements to our recently validated informant-report survey scales (Frazier et al., 2023), permitting multi-method characterization of key social and cognitive characteristics among individuals with NDGS. To our knowledge, the webcam measures and associated survey instruments are the first dedicated set specifically developed to assess the wide range of neurobehavioral and neurodevelopmental presentations seen in NDGS, including individuals with significant cognitive challenges. This initial validation demonstrated that the performance measures are psychometrically sound instruments with potential utility in characterizing the varied clinical and functional spectra seen in many people with NDGS and idiopathic NDD. The validation further highlights the potential value of artificial intelligence / machine learning algorithms for collecting key biometric information that can be used to better understand individuals with NDGS.
All of the measures showed strong evaluation validity and can be collected in many individuals with mild to moderate cognitive dysfunction. There was a clear gradient of invalid collection in people with more severe cognitive dysfunction, but some individuals reported to be at the more severe levels could validly complete one or more performance measures. Scale reliability was fair to excellent across all webcam measures, indicating good ability to measure individual differences cross-sectionally across each of the neurobehavioral processes assessed. Test-retest reproducibility and stability were at least acceptable across the majority of measures. Specifically, test-retest reliability was good for attentional scanning, positive emotional expressiveness, social attention, receptive vocabulary, and single-word reading and was fair for sustained attention, social preference, and speed to objects. This indicates that changes in these measures are relatively stable over time, increasing the likelihood that changes reflect real differences in neurobehavioral functioning. Test-retest reliability estimates were lower for negative emotional expressiveness, non-social attentional preference, face preference, and speed to faces. When considered in light of adequate or better scale reliability for these measures, the present results suggest these measures may be more state-like in nature. Observations of the score distributions for negative emotional expression and non-social preference suggest that lower test-retest reliability for these measures may be influenced by floor effects and, therefore, may be under-estimated. Future work is needed to examine score stability over a longer time interval to ensure an adequate balance of stability and sensitivity to change. 
If sensitivity to change is demonstrated, the quantitative nature, relative brevity, and high evaluation validity of webcam measures might allow for more frequent assessments in the context of intervention studies, thereby increasing statistical power and reducing the sample size needed for clinical trials. This is particularly important for studies of rare NDGS.
Lower test-retest reliability for measures of face processing is intriguing and may be due to factors influencing attention to faces, including the fact that many stimuli included multiple faces as well as other target or background stimuli. It is possible that follow-up evaluations may bias attention towards novel faces (faces not processed as comprehensively in the baseline assessment) or other novel environmental stimuli. It is also possible that face processing is simply more state-like in nature, with reliable collection at each assessment, but rapid changes in quantitative level across hours or days. Future work is needed to tease out these possibilities and examine whether stimulus complexity moderates stability for these measures. Beyond floor effects, lower stability for non-social preference is likely, in part, a function of the less frequent nature of attention to socially-irrelevant information. It may also be useful for future iterations of the social stimuli to include a larger number of non-social or background objects to increase the reliability of this measure. Lower stability for negative emotional expressiveness may be, at least partly, due to the low number of negative facial expressions observed across all participants and is likely influenced by the state-like nature of emotional expressiveness. Adding stimuli that specifically pull for negative emotionality could enhance the test-retest reliability of this measure. Even with these exceptions, all performance measures showed group differences in the baseline data collection, suggesting good known-groups validity and potential value for cross-sectional characterization.
Given their scalability, webcam-collected performance measures may also have utility in clinical contexts by supplementing collection of traditional neurobehavioral measures, allowing more frequent collection between clinical visits, greater inclusion in research, and higher quality data via home-based collection. If offered at minimal cost with automated administration, scoring, and reporting functions to reduce clinician burden, these measures could become a key part of ongoing developmental monitoring strategies. This is further supported by the brevity (maximum 15 minutes) of administering all 4 paradigms and the potential to collect only those measures that are relevant to a given patient in future clinical assessments. Future research and collection of large-scale normative data are warranted to determine whether this potential clinical value might be realized and, more importantly, to further evaluate psychometric performance.
Finally, the present results provide preliminary evidence of concurrent (known-groups) validity of webcam measures across NDGS and in comparison to neurotypical controls and idiopathic NDD. The pattern of substantial reductions in many cognitive processes in NFIX, SYNGAP1, and other NDGS is consistent with our recently published informant-report patterns for many neurobehavioral domains (Frazier et al., 2023). Interestingly, there are some unique patterns among these groups, particularly in the pattern for positive and negative emotional expressiveness, but also in the magnitude of impairments for other domains. For example, people with SYNGAP1 mutations showed generally worse attention, slower processing speed to faces and objects, and lower social but higher non-social preference than people with NFIX mutations.
Relative to other NDGS groups, individuals with PHTS tended to show a less impacted social and cognitive profile. Specifically, this group showed no significant impairment in overall attention, attentional scanning, or processing speed measures and only slight reductions in receptive vocabulary and reading accuracy. This is consistent with a spectrum of neurobehavioral dysfunction in PHTS (Busch et al., 2023) and the observation that many individuals have either no or mild reductions in neurocognitive function relative to normative expectation (Busch et al., 2013). Additional data collection in larger NDGS samples will be required to replicate and extend the findings reported here. This work will also need to evaluate the influence of additional clinical factors (e.g., seizures, ID, etc.) on developmental trends.
Several limitations of the current study warrant mention. The genetic syndromes included in this study have a low prevalence and, thus, sample sizes remain modest, particularly given the wide age range. While our power analysis indicated at least adequate power for group comparisons and psychometric analyses were well powered in the full sample, our current data should nevertheless be treated as preliminary, and studies with larger group sample sizes should be completed to replicate our findings and ensure they generalize to the larger population of these NDGS. Given the online nature of the research, it was not feasible to conduct in-person clinical characterization. As a result, this study could not independently confirm the diagnostic status of participants and was not able to administer dedicated in-person cognitive and behavioral assessments. However, previous studies have demonstrated that parent-report of children’s IQ strongly correlates with standardized clinical IQ testing (Shu et al., 2022), and a substantial minority of estimates in this study were based on prior testing (42%). Future work should collect well-validated in-person cognitive assessments to more accurately characterize the sample and examine how webcam measures relate to traditional standardized measures of cognitive and behavioral functioning.
Longitudinal investigations with larger NDGS samples and longer follow-up will also be critical for evaluating age effects and changes in neurobehavioral processes across development, as well as sensitivity to intervention effects. Further, given the preliminary nature of this study, it was not possible to include a comprehensive set of additional instruments to establish convergent and divergent validity. Thus, additional validation work, including convergent and discriminant validity analyses, is needed to provide further support for these webcam measures.
In spite of noted limitations, the present results suggest that webcam-collected gaze and facial expression-based performance measures are promising with evidence that they may function as reliable and valid assessment tools, covering key social and cognitive domains not easily evaluated by informant-report surveys. As such, they may be useful for detailed phenotypic characterization and, ultimately, as reliable, objective, and feasible outcome measures in clinical trials. With additional validation, and sufficient norming, these measures could also facilitate surveillance and clinical assessment for NDGS and idiopathic NDD.
5. Conclusions
The present study provides preliminary evidence that webcam-collected performance measures, derived using artificial intelligence algorithms for capturing gaze and facial expression data, can reliably capture individual and between group differences in neurobehavioral function. Future longitudinal investigations with larger NDGS and idiopathic NDD samples will be crucial to further evaluate these measures and determine their potential clinical and research utility.
Acknowledgments
We are sincerely indebted to the generosity of the families and individuals who contributed their time and effort to this study. We would also like to thank the PTEN Hamartoma Tumor Syndrome Foundation, the PTEN Research Foundation, the SYNGAP Research Fund, the Malan Syndrome Foundation, and the ADNP Kids Foundation for their support of this project.
We are grateful to all of the families at the participating Simons Searchlight sites as well as the Simons Searchlight Consortium, formerly the Simons VIP Consortium. We also appreciate obtaining access to the phenotypic data on SFARI Base. Approved researchers can obtain the Simons Searchlight population dataset described in this study by applying at https://base.sfari.org.
CE is the Sondra J. and Stephen R. Hardis Endowed Chair of Cancer Genomic Medicine at the Cleveland Clinic and an ACS Clinical Research Professor. MS is the Rosamund Stone Zander Chair at Boston Children’s Hospital.
Conflict of Interest
Dr. Frazier has received funding or research support from, acted as a consultant to, received travel support from, and/or received a speaker’s honorarium from the PTEN Research Foundation, SYNGAP Research Fund, Malan Syndrome Foundation, ADNP Kids Research Foundation, Quadrant Biosciences, Autism Speaks, Impel NeuroPharma, F. Hoffmann-La Roche AG Pharmaceuticals, the Cole Family Research Fund, Simons Foundation, Ingalls Foundation, Forest Laboratories, Ecoeos, IntegraGen, Kugona LLC, Shire Development, Bristol-Myers Squibb, National Institutes of Health, and the Brain and Behavior Research Foundation; is employed by and has equity options in Quadrant Biosciences/Autism Analytica; has equity options in MaraBio and Springtide; and has an investor stake in Autism EYES LLC and iSCAN-R. Dr. Kolevzon has received funding or research support from, or acted as a consultant to, the ADNP Kids Research Foundation, David Lynch Foundation, Klingenstein Third Generation Foundation, Ovid Therapeutics, Ritrova Therapeutics, Acadia, Alkermes, Jaguar Therapeutics, GW Pharmaceuticals, Neuren Pharmaceuticals, Scioto Biosciences, and Biogen. Dr. Sahin reports grant support from Novartis, Biogen, Astellas, Aeovian, Bridgebio, and Aucta. He has served on Scientific Advisory Boards for Novartis, Roche, Regenxbio, SpringWorks Therapeutics, Jaguar Therapeutics, and Alkermes. Dr. Hardan is a consultant to Beaming Health and IAMA Therapeutics. He also has equity options in Quadrant Biosciences/Autism Analytica and has an investor stake in iSCAN-R. Dr. Shic has acted as a consultant to F. Hoffmann-La Roche AG Pharmaceuticals and Janssen Pharmaceuticals. The remaining authors have no competing interests to disclose.
Funding:
This study was funded by the PTEN Research Foundation (to Frazier and Uljarević), with additional support from the SYNGAP Research Fund, the Malan Syndrome Foundation, the ADNP Kids Foundation, Autism Speaks, and the Simons Foundation Autism Research Initiative. The content is solely the responsibility of the authors and does not necessarily represent the official views of any of the funding agencies.
Appendix 1. Performance paradigm creation process.
Detailed description: The process included adapting social stimuli from our prior work and identifying cognitive paradigms that could be collected without speech or motor responses; identifying target items and creating individual stimuli for each paradigm; collecting feasibility data from 15 participants, including several young children with neurodevelopmental disabilities, across a wide age span (ages 3 to 68) to evaluate the viability of online data collection and to inform grading of stimulus and item difficulty (where applicable); updating the stimulus paradigms and administering them to 10 clinician-scientist experts, 8 NDGS participants (5 PTEN, 1 SYNGAP1, 1 NFIX, and 1 ADNP), and 1 idiopathic NDD participant with ASD; and administering a post-evaluation questionnaire to assess ease of completion and any potential issues arising from performance paradigm administration.
Appendix 2. Receptive vocabulary target selection and stimulus creation.
Receptive vocabulary words were selected to be graded from very easy to moderately difficult. To identify candidate words, the research team first reviewed vocabulary word lists spanning infant/toddler/preschool through high school, as well as SAT word lists, from common websites (e.g., www.education.com, www.time4learning.com, www.vocabulary.com). Up to 16 words were selected for each level (infant/toddler, preschool, kindergarten, grades 1–3, grades 4–6, grades 7–12, common SAT words). Emphasis was given to easier words in order to create stimuli that best differentiate between very low vocabulary levels (SS<55, age-equivalent<3), low vocabulary levels (55<SS<70, 3<age-equivalent<6), borderline vocabulary levels (70<SS<80, 6<age-equivalent<10), and low average vocabulary levels (80<SS<100, 10<age-equivalent<12). Once words were identified, the iWeb 14-billion-word web corpus (https://www.english-corpora.org/iweb/) was used to examine word frequency; iWeb is related to the one-billion-word Corpus of Contemporary American English (COCA), a large, up-to-date corpus of English balanced across many genres. Potential words were then sorted by word frequency and compared to existing receptive vocabulary tests (Peabody Picture Vocabulary Test – Fifth Edition and Receptive One-Word Picture Vocabulary Test-4) to identify approximate age levels for sets of words. This was done by comparing the word frequency of words from the existing instruments to the word frequency of identified candidate words.
Final word sets were then chosen to be comparable to the lowest age levels (~ages 2–3) of existing instruments with very high word frequency (>990,000; 8 words chosen), preschool to grade 1 (~ages 4–6) with high word frequency (200k to 900k; 7 words chosen), grades 2–5 (~ages 7–10) with moderate word frequency (100k to 199k; 7 words chosen), grades 6–12 (~ages 11–17) with moderate to low word frequency (15k to 99k; 10 words chosen), and grade 13+/SAT (~ages 18+) with low to very low word frequency (<15k; 5 words chosen). This resulted in 37 total words. Clip art and photos were then chosen to represent each word, with selections verified by the principal investigator and by a parent of a child with autism spectrum disorder and intellectual disability who was not affiliated with the investigative team.
A set of 37 distractor words was also identified, chosen to be roughly equivalent in word frequency to the target words. Pictures for these distractors were chosen in the same manner as described above.
Appendix 3. Single word reading test item selection and stimulus creation
A similar set of item-selection procedures was followed in developing the single-word reading test, with the following exceptions:
Words were chosen by first inspecting existing word reading tests (WIAT-3, WRAT-4, etc.), identifying unused words from the receptive vocabulary lists above, and looking for words of comparable word frequency to those used in existing single-word reading tests. Word lists of common 2-letter through 6-letter words were searched for easy to moderately difficult words. Synonyms and words with pronunciation difficulty similar to difficult words from existing tests were used to populate candidate difficult words.
To ensure heavy coverage of easier words and allow detection of simple reading in impaired individuals, the focus was placed on 2-letter through 5-letter words. Four 2-letter words, ten 3-letter words, four 4-letter words, and five 5-letter words considered very easy to moderately easy to read were chosen for the final list (23 easy reading difficulty words). Next, two 5-letter words, four 6-letter words, and six 7-letter words of moderate difficulty were chosen (12 moderate reading difficulty words). Finally, eleven 5- to 10-letter words deemed of moderate to high reading difficulty rounded out the final list (46 total words). Difficulty was assessed by matching each word to words of similar length and complexity on existing single-word reading tests and by inspecting word frequency results from COCA (see the Corpus of Contemporary American English link above). Specifically, word length and word frequency were used to identify very easy words (2- and 3-letter words with very high frequency, 1,000,000+), easy words (4- and 5-letter words with high to very high frequency, 250,000+), moderate words (5- to 7-letter words with silent letters in pronunciation and/or moderate to very high frequency, 50k+), and difficult words (6- to 10-letter words with complex pronunciation and generally low word frequency, <250k).
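The length-and-frequency cutoffs above can be expressed as a simple binning rule. The sketch below is illustrative only: the function name and the example frequency counts are hypothetical, not taken from iWeb/COCA or from the study's materials; only the length and frequency thresholds come from the text.

```python
# Hypothetical sketch of the difficulty-binning rules described above.
# Example frequencies are illustrative, not actual corpus counts.

def reading_difficulty(word: str, frequency: int) -> str:
    """Assign a reading-difficulty bin from word length and corpus frequency,
    following the cutoffs described in Appendix 3."""
    n = len(word)
    if n <= 3 and frequency >= 1_000_000:   # very easy: 2-3 letters, very high frequency
        return "very easy"
    if n <= 5 and frequency >= 250_000:     # easy: 4-5 letters, high to very high frequency
        return "easy"
    if 5 <= n <= 7 and frequency >= 50_000: # moderate: 5-7 letters, moderate+ frequency
        return "moderate"
    return "difficult"                      # difficult: longer and/or low frequency

print(reading_difficulty("it", 5_000_000))    # very easy
print(reading_difficulty("tree", 800_000))    # easy
print(reading_difficulty("orange", 120_000))  # moderate
print(reading_difficulty("factitious", 900))  # difficult
```

Note that the real selection process also weighed pronunciation complexity (e.g., silent letters), which a purely length/frequency rule cannot capture.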
Stimulus creation
Stimulus creation procedures were similar to those used for the receptive vocabulary test, with the following exceptions:
Rather than pictures representing vocabulary words, words are spelled out in white font on a black screen.
Words are divided into 14 stimuli of varying sizes (2, 3, or 4 target words). Stimuli with 2–3 target words are presented in varying arrays (left/right; top/bottom; top/middle/bottom; 4 squares). See Table 2 for the target word, spatial array, and timing for each word.
Because multiple target words are presented for each stimulus, targets are randomly arrayed so that participants cannot anticipate which word to find first.
Each stimulus is presented for 4 seconds, with the first 1–1.5 seconds consisting of the voiceover “Find the word (target word)”; the 8 most difficult words are presented for 5 seconds.
For the most difficult words, two words beginning with the same letter are placed in each array so that participants cannot guess the target from its initial sound.
Appendix 4. Screenshots of example social paradigm stimuli.
Facial Affect ID
Joke
Joint Attention
Social vs. Abstract
Naturalistic Scene
Appendix 5. Screenshots of example processing speed paradigm stimuli.
Appendix 6. Screenshots of example receptive vocabulary paradigm stimuli.
Directive: “Look at the Baby”
Directive: “Look at the Fish”
Appendix 7. Screenshots of example single-word reading paradigm stimuli.
Directive: “Find the word ‘it’”
Directive: “Find the word ‘on’”
Appendix 8. Stimulus order and composition for all social attention stimuli.
Stimulus # | Stimulus Type | Duration (sec) |
---|---|---|
1 | Instructions | 4.1 |
2 | Facial Affect ID | 6 |
3 | Facial Affect ID | 6 |
4 | Facial Affect ID | 6 |
5 | Facial Affect ID | 6 |
6 | Joke | 6.5 |
7 | Break | 0.7 |
8 | Joint Attention | 4.6 |
9 | Joke | 6.8 |
10 | Break – blank screen | 0.7 |
11 | Social vs Abstract | 8 |
12 | Social vs Abstract | 6 |
13 | Instructions | 4.4 |
14 | Facial Affect ID | 6 |
15 | Facial Affect ID | 6 |
16 | Facial Affect ID | 6 |
17 | Facial Affect ID | 6 |
18 | Joke | 5.9 |
19 | Break – blank screen | 0.7 |
20 | Joint Attention | 4.3 |
21 | Joke | 7.3 |
22 | Break – blank screen | 0.7 |
23 | Social vs Abstract | 8 |
24 | Social vs Abstract | 6.5 |
25 | Instructions | 4.4 |
26 | Joint Attention | 3.7 |
27 | Joint Attention | 4.5 |
28 | Social vs. Abstract | 8 |
29 | Joint Attention | 4 |
30 | Joint Attention | 4 |
31 | Social vs. Abstract | 5.9 |
32 | Naturalistic Scene | 12 |
33 | Naturalistic Scene | 12 |
34 | Instructions | 4.4 |
35 | Naturalistic Scene | 10 |
36 | Joint Attention | 3.8 |
37 | Naturalistic Scene | 7.8 |
38 | Joke | 6.7 |
39 | Break - blank screen | 0.7 |
40 | Social vs. Abstract | 6 |
41 | Naturalistic Scene | 7.7 |
42 | Social vs. Abstract | 6 |
43 | Naturalistic Scene | 10.5 |
Note. Facial affect ID = side-by-side faces with instructions to look at a specific facial expression. Joke stimuli involved a person telling a corny joke. Social vs. Abstract included half the screen with an abstract shape or numerical representation and the other half with one or more people interacting. Joint attention scenes involved a variety of target and distractor objects with one person pointing toward and/or directing their gaze toward the target objects. Naturalistic scenes involved people interacting in various ways (e.g., having a conversation, playing a board game, entering an elevator, etc.).
Appendix 9. Stimulus order and composition for all processing speed stimuli.
Trial # | # of Stimuli | # of Target Stimuli | # of Distractor Stimuli | Duration (sec) | Target Stimuli | Distractor Stimuli |
---|---|---|---|---|---|---|
1 | 5 | 3 | 2 | 7 | Pink Flower | Green Tree, Green Leaf |
2 | 7 | 4 | 3 | 7 | Yellow Star | Pink Hearts, Blue Circles |
3 | 7 | 4 | 3 | 8 | Shoe | Shirts, Sweater |
4 | 9 | 5 | 4 | 10 | White truck | Blueberry, White Airplane |
5 | 9 | 5 | 4 | 10 | Fork | Spoon, Knife |
6 | 11 | 6 | 5 | 10 | Fly | Ant, Ladybug |
7 | 11 | 6 | 5 | 10 | Red Apple | Red Ball, Red Balloon |
8 | 11 | 6 | 5 | 10 | Orange Hat | Orange Cup, Pumpkin |
9 | 13 | 7 | 6 | 10 | Bird Head | Cat Head, Lion |
10 | 15 | 8 | 7 | 10 | Jelly Fish | Octopus, Squid |
11 | 15 | 8 | 7 | 10 | Cart | Truck, Pile |
12 | 15 | 8 | 7 | 10 | Rook | Pawn, Bishop |
Appendix 10. Stimulus order and composition for all receptive vocabulary stimuli.
Stimulus # | Stimulus Type | Target # | Target Word | Distractor(s) | Duration (sec) | Position |
---|---|---|---|---|---|---|
1 | 2X2 | 1 | baby | hat | 5 | 1 |
2 | 2X2 | 2 | fish | lion | 5 | 2 |
3 | 2X2 | 3 | shoes | sweater | 5 | 2 |
3 | 2X2 | 4 | apple | banana | 5 | 3 |
4 | 2X2 | 5 | eating | ring | 5 | 4 |
4 | 2X2 | 6 | ball | umbrella | 5 | 1 |
5 | 2X2 | 7 | drinking | eating | 5 | 3 |
5 | 2X2 | 8 | running | mouth | 5 | 4 |
6 | 2X2 | 9 | socks | pants | 5 | 4 |
6 | 2X2 | 10 | sleeping | dancing | 5 | 3 |
7 | 2X2 | 11 | kicking | drinking | 5 | 2 |
7 | 2X2 | 12 | fence | gift | 5 | 4 |
8 | 3X2 | 13 | mouth | ball | 5 | 4 |
8 | 3X2 | 14 | umbrella | kicking | 5 | 5 |
8 | 3X2 | 15 | muffin | plant | 5 | 6 |
9 | 3X2 | 16 | ring | raccoon | 5 | 3 |
9 | 3X2 | 17 | fountain | muffin | 5 | 2 |
9 | 3X2 | 18 | elbow | fence | 5 | 5 |
10 | 3X2 | 19 | dentist | timber | 5 | 4 |
10 | 3X2 | 20 | aquarium | culinary | 5 | 1 |
10 | 3X2 | 21 | yacht | miniature | 5 | 5 |
11 | 3X2 | 22 | culinary | gesture | 5 | 2 |
11 | 3X2 | 23 | compass | toxic | 5 | 3 |
11 | 3X2 | 24 | wedge | fungus | 5 | 6 |
12 | 3X2 | 25 | wrench | gauge | 5 | 6 |
12 | 3X2 | 26 | reptile | dictator | 5 | 5 |
12 | 3X2 | 27 | trumpet | virtuoso | 5 | 1 |
13 | 4X2 | 28 | gift | fountain | 5 | 4 |
13 | 4X2 | 29 | jewelry | elbow | 5 | 2 |
13 | 4X2 | 30 | map | shirt | 5 | 6 |
13 | 4X2 | 31 | raccoon | eating | 5 | 8 |
14 | 4X2 | 32 | duet | banister | 5 | 8 |
14 | 4X2 | 33 | noxious | irregular | 5 | 2 |
14 | 4X2 | 34 | admonish | parallel | 5 | 3 |
14 | 4X2 | 35 | aviator | physician | 5 | 6 |
15 | 4X2 | 36 | carnivore | herbivore | 5 | 7 |
15 | 4X2 | 37 | speedometer | thermometer | 5 | 5 |
15 | 4X2 | 38 | amorphous | admonish | 5 | 2 |
15 | 4X2 | 39 | virulent | apathetic | 5 | 3 |
Note. 2X2 stimuli present 4 objects in each quadrant of the screen. 3X2 stimuli present 3 objects across the top row and 3 across the bottom row. 4X2 stimuli present 4 objects across the top row and 4 objects across the bottom row. Position is listed in order from top left to bottom right.
Appendix 11. Stimulus order and composition for all single-word reading stimuli.
Stimulus # | Stimulus Orientation | Target # | Target Word | Distractor Word(s) |
---|---|---|---|---|
1 | left/right | 1 | it | to |
2 | left/right | 2 | so | up |
3 | top/bottom | 3 | me | do |
3 | top/bottom | 4 | on | |
4 | left top/middle/right bottom | 5 | dog | hot, eat |
5 | left top/middle/right bottom | 6 | win | buy |
5 | left top/middle/right bottom | 7 | car | |
6 | left top/middle/right bottom | 8 | map | bag, hard |
7 | left top/middle/right bottom | 9 | few | how |
7 | left top/middle/right bottom | 10 | out | |
8 | left top/middle/right bottom | 11 | run | set, why |
9 | right top/middle/left bottom | 12 | cat | leg |
9 | right top/middle/left bottom | 13 | all | |
10 | right top/middle/left bottom | 14 | boy | own, fly |
11 | 4 squares | 15 | tree | throw |
11 | 4 squares | 16 | from | find |
12 | 4 squares | 17 | time | take |
12 | 4 squares | 18 | fall | fine |
13 | 4 squares | 19 | large | leave |
13 | 4 squares | 20 | cheat | chain |
14 | 4 squares | 21 | adult | assist |
14 | 4 squares | 22 | spoon | stone |
15 | 4 squares | 23 | orange | onion |
15 | 4 squares | 24 | silver | stream |
16 | 4 squares | 25 | people | program |
16 | 4 squares | 26 | office | often |
17 | 4 squares | 27 | stretch | soldier |
17 | 4 squares | 28 | match | model |
18 | 4 squares | 29 | service | student |
18 | 4 squares | 30 | magic | marry |
19 | 4 squares | 31 | capacity | citizen |
19 | 4 squares | 32 | facility | foreign |
20 | 4 squares | 33 | railway | rapidly |
20 | 4 squares | 34 | opinion | object |
21 | 4 squares | 35 | listener | logical |
21 | 4 squares | 36 | absolute | abandon |
22 | 4 squares | 37 | aircraft | apparent |
22 | 4 squares | 38 | language | launch |
23 | 4 squares | 39 | flaccid | freckle |
23 | 4 squares | 40 | regiment | resilient |
24 | 4 squares | 41 | factitious | fungible |
24 | 4 squares | 42 | reprimand | repudiate |
25 | 4 squares | 43 | generic | glorious |
25 | 4 squares | 44 | niche | nausea |
26 | 4 squares | 45 | neurotic | neophyte |
26 | 4 squares | 46 | gnarled | gimmicky |
Appendix 12. Pilot testing post-evaluation questions for clinician-scientist-experts.
Expert Feedback | |||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Device type performance measure was completed on: | Mini Tablet | Standard Tablet | Laptop with internal webcam | Laptop with external webcam | Desktop with internal webcam | Desktop with external webcam | |||||||||||||
What is the screen size on the device you used? | Less than 10 inches | 10 – 12 inches | 12 – 18 inches | Greater than 20 inches | |||||||||||||||
Is there anything you felt would have been helpful to know before beginning the [paradigm measure]? | (text entry) | ||||||||||||||||||
Please rate the overall relevance of the [paradigm measure] to the neurodevelopment / genetic disorder you represent: | Extremely relevant | Very relevant | Somewhat relevant | Not relevant at all | |||||||||||||||
Please rate the instructions for the section of the assessment: | Very clear | Somewhat clear | Somewhat difficult to follow | Very difficult to follow | |||||||||||||||
Specific comments regarding the instructions for this section: | (text entry) | ||||||||||||||||||
Please rate the quality of the audio during the section of this assessment: | The audio was very clear | The audio was somewhat clear | The audio was not clear | ||||||||||||||||
Please rate the quality of pictures used during this section of the assessment | High quality | Medium quality | Low quality | ||||||||||||||||
If you answered that some or all photos were low quality, please indicate which photos you felt were not high quality | (text entry) | ||||||||||||||||||
Please rate the timing of the assessment: | Very fast | Somewhat fast | Neither slow nor fast | Somewhat slow | Very slow | ||||||||||||||
Specific comments regarding the timing for the assessment | (text entry) | ||||||||||||||||||
How appropriate was the level of difficulty of the [paradigm measures] targets? | Very appropriate | Appropriate | Somewhat inappropriate | Inappropriate | |||||||||||||||
Please share any concerns you have regarding the level of difficulty or the array of the [paradigm measure] targets: | (text entry) | ||||||||||||||||||
Please check below any specific concerns you have (check all that apply) | Too many easy targets | Too many hard targets | Too few easy targets | Too few hard targets | Too many moderate difficulty targets | ||||||||||||||
Too few moderate difficulty targets | Targets were too close together | Targets were too far apart | Target array was not appropriate | Other concerns (text entry) | |||||||||||||||
What aspects of completing the measure do you think will be the most difficult for participants? | (text entry) ||||||||||||||||||
Is there anything you feel that could help a caregiver or guardian administer this measure at home? | (text entry) | ||||||||||||||||||
Any additional comments regarding the measure you feel researchers should be aware of? | (text entry) |
Appendix 13. Pilot testing post-evaluation questions for parents assisting patient participants.
Parents assisting participants were asked the following sets of questions for each paradigm:
Overview
Breaks / Eye Calibration
Environment
Open answer
Support of participants
Paradigm Specifics
Overview (asked for all paradigms) | ||||||
---|---|---|---|---|---|---|
Please rate your overall experience with this assessment. | Extremely positive | Somewhat positive | Neither positive nor negative | Somewhat negative | Extremely negative | |
How easy / hard was it for you to complete this assessment? | Extremely easy | Somewhat easy | Neither easy nor hard | Somewhat hard | Extremely hard | |
What type of device did you complete this assessment on? | Tablet with stand | Laptop | PC/MAC Desktop Computer | Other (text entry) |
Were you sitting still during the video? | Yes | No | ||||
Did you engage in any sensory movements (rocking back and forth, hand flicking or flapping, going up and down on tippy toes, etc.)? | Yes | No | ||||
Please provide specific details on sensory movements you engaged in during the video? | (text entry) |
Breaks / Eye Calibration (asked for all paradigms) | |||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
How many breaks did you need to complete the [paradigms name] performance measure? | No breaks | One break | Two or three breaks | Four or five breaks | |||||||||
How long were the breaks on average? | No breaks taken | Less than 5 minutes | 5 – 15 minutes | 16 – 30 minutes | 30 minutes to 1 hour | 1 hour or more | |||||||
How often did you look away? | Very often | Often | Sometimes | Infrequently | Very infrequently | Did not look away from the screen | |||||||
How often did you cover or touch your face during the assessment? | Very often | Often | Infrequently | Very infrequently | Did not touch face during the assessment | ||||||||
Did you end the video early at any point? | Yes | No |
Environment (asked for all paradigms) | ||||||
---|---|---|---|---|---|---|
During any point of the assessment, did an unexpected noise occur within the environment? | No occurrences of unexpected noises | 1 occurrence | 2 – 3 occurrences | 4 – 5 occurrences | 5+ occurrences of unexpected noises | |
During any point of the assessment, were you required to adjust the lighting in the room? | No adjustments to lighting | 1 adjustment | 2–3 adjustments | 4–5 adjustments | 5+ adjustments | |
During any point of the assessment, did you experience internet connection difficulties (i.e. disconnection, weak connection, slow speed, etc.) | No internet difficulties | 1 occurrence | 2 – 3 occurrences | 4 – 5 occurrences | 5 + occurrences |
Ease of Completion (asked for all paradigms) | |
---|---|
What gave you the most difficulty when completing this assessment? | (text entry) |
What was the easiest thing about completing this assessment? | (text entry) |
Assistance (asked for all paradigms) | ||||||
---|---|---|---|---|---|---|
Please rate your child’s overall attention level during the assessment? | Excellent | Good | Average | Poor | Terrible | |
Please indicate the level of Physical Assistance (i.e., staying seated, position head, etc.) | Did not provide physical assistance | Assisted one time | Assisted part of the time | Assisted most of the time | ||
Please indicate the level of Gestural Assistance (i.e., using your finger to point things on the screen, point to screen to get their attention, etc.) | Did not provide gestural assistance | Assisted one time | Assisted part of the time | Assisted most of the time | ||
Please indicate the level of Verbal Assistance (i.e., “look here” or “watch the video”, etc.) | Did not provide verbal assistance | Assisted one time | Assisted part of the time | Assisted most of the time | ||
Please provide additional information on the level of assistance you provided. Type n/a if no assistance was provided | (text entry) | |||||
What was the most difficult part of aiding someone in completing this assessment? | (text entry) | |||||
Was there something that could have made it easier for you to assist someone in completing this assessment? | (text entry) |
Paradigm specific: Social Attention | ||||||||
---|---|---|---|---|---|---|---|---|
Please rate the overall relevance of the Social Attention assessment regarding a performance measure for the neurodevelopment / genetic disorder: | Extremely relevant | Very relevant | Slightly relevant | Not relevant at all | ||||
Please rate the quality of the audio during the section of this assessment: | The audio was very clear | The audio was somewhat clear | The audio was not clear | |||||
Please rate the quality of videos used during this section of the assessment | High quality | Medium quality | Low quality | |||||
Specific comments regarding the quality of videos of the assessment: | (text entry) | |||||||
If you answered that some or all videos were low quality, please indicate which ones you felt were not high quality: | (text entry) |||||||
Please rate the timing of the assessment: | Very fast | Somewhat fast | Neither slow nor fast | Somewhat slow | Very slow |||
Specific comments regarding the timing of the assessment | (text entry) | |||||||
Is there anything you felt would have been helpful to know before beginning the Social Attention assessment? | (text entry) | |||||||
Any additional comments regarding the Social Attention assessment you feel researchers should be aware of? | (text entry) |
Paradigm specific: Processing Speed / Receptive Language / Single Word Reading | ||||||||
---|---|---|---|---|---|---|---|---|
Please rate the overall relevance of the [specific paradigm] assessment regarding a performance measure for the neurodevelopment / genetic disorder: | Extremely relevant | Very relevant | Slightly relevant | Not relevant at all | ||||
Please rate the instructions for this section of the assessment: | Very clear | Somewhat clear | Somewhat difficult to follow | Very difficult to follow | ||||
Specific comments regarding the instructions | (text entry) | |||||||
Please rate the quality of the audio during the section of this assessment: | The audio was clear | The audio was somewhat clear | The audio was not clear | |||||
Please rate the quality of pictures used during this section of the assessment: | High quality | Medium quality | Low quality | |||||
If you answered that some or all photos were low quality, please indicate which photos you felt were not high quality: | (text entry) | |||||||
Please rate the timing of the assessment: | Very fast | Somewhat fast | Neither slow nor fast | Somewhat slow | Very slow | |||
Specific comments regarding the timing for the assessment: | (text entry) | |||||||
Is there anything you felt would have been helpful to know before beginning the [specific paradigm] assessment? | (text entry) |||||||
Any additional comments regarding [specific paradigm] assessment researchers should be aware of? | (text entry) |
Appendix 14. Parent/caregiver administration support training process.
Introductory Call | Introduction / Training video (optional) | Practice Performance Measure (optional) | Zoom Training (optional) | Virtual Support Meetings (optional) |
---|---|---|---|---|
Appendix 15. Methodological details for computing social attention.
While all other webcam-collected patient performance measures were calculated based on a priori criteria, the social attention measure was calculated using empirical criteria. Specifically, following our prior research methods, the social attention measure was derived to determine whether a more sensitive indicator of ASD diagnosis and autism symptom level could be identified from data collected during the social attention stimulus paradigm. For this measure, the correlations between ASD diagnosis and each fixation metric for each area-of-interest were evaluated in a training sub-sample, randomly selected from all baseline webcam administrations (60% of participants). Fixation metric/area-of-interest combinations with statistically significant correlations (r>.18) were selected. These metrics were then combined into an aggregate social attention index and tested in separate testing (20%) and validation (20%) sub-samples as well as across timepoints.
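The selection logic described above can be sketched in a few lines. This is an illustrative reconstruction, not the study's code: the function names, the z-score aggregation, and the use of a simple Pearson correlation are assumptions; only the 60% training split and the r > .18 threshold come from the appendix.

```python
# Illustrative sketch of the empirical derivation in Appendix 15: screen each
# fixation-metric/AOI feature against diagnosis in a 60% training split, keep
# features with |r| > .18, and aggregate selected features as mean z-scores.
import random
import statistics

def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0

def build_social_attention_index(features, diagnosis, train_frac=0.6, r_cut=0.18):
    """features: {name: [value per participant]}; diagnosis: [0/1 per participant].
    Returns (selected feature names, aggregate index per participant)."""
    n = len(diagnosis)
    train = random.sample(range(n), int(train_frac * n))  # training sub-sample
    selected = [name for name, vals in features.items()
                if abs(pearson_r([vals[i] for i in train],
                                 [diagnosis[i] for i in train])) > r_cut]
    index = []
    for i in range(n):  # aggregate index = mean z-score of selected features
        zs = []
        for name in selected:
            vals = features[name]
            sd = statistics.pstdev(vals)
            zs.append((vals[i] - statistics.mean(vals)) / sd if sd else 0.0)
        index.append(statistics.mean(zs) if zs else 0.0)
    return selected, index
```

In practice the resulting index would then be evaluated in the held-out testing and validation sub-samples, which this sketch omits.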
Appendix 16. Operational definitions for all webcam performance measures.
# | Measure | Operational Definition |
---|---|---|
1 | Overall Attention | Average percentage viewing time to the screen across all 4 stimulus paradigms (15 one-minute blocks of stimuli) |
2 | Attentional Scanning | Average number of glances to processing speed stimuli across all quadrants of the screen |
3 | Positive Emotion | Average intensity rating for happy and surprised emotions across all social attention stimuli |
4 | Negative Emotion | Average intensity rating for fear, anger, disgust, and sadness emotions across all social attention stimuli |
5 | Social Attention | Empirically-derived measure of attention to social versus non-social information using standardized fixation duration, fixation count, and time-to-first-fixation |
6 | Social Preference | Average of all fixation durations to social areas-of-interest across all social attention stimuli |
7 | Face Preference | Average of all fixation durations to face areas-of-interest across all social attention stimuli |
8 | Non-social Preference | Average of all fixation durations to non-social areas-of-interest across all social attention stimuli |
9 | Receptive Vocabulary | Sum of all fixation durations across vocabulary target word-picture combinations (27 total targets) |
10 | Speed to Faces | Average time-to-first-fixation on the most prominent face areas-of-interest across all social attention stimuli |
11 | Speed to Object | Average time-to-first-fixation on each target object area-of-interest across all processing speed stimuli |
12 | Reading accuracy | Sum of all fixation durations across target reading words (38 total targets) |
Note. Fixations were defined as at least 66ms of gaze point samples within a 100-pixel dispersion area. Glances were defined as an entry to an AOI with at least one fixation. To increment glance count, gaze must leave the AOI, with at least one fixation outside the AOI, and then return to the AOI with at least one fixation. With the exception of the social attention measure, which was empirically-derived, social and non-social areas-of-interest were defined a priori based on our prior investigations. Non-social areas-of-interest include non-target or distractor (extraneous) objects within social scenes. Receptive vocabulary words were chosen to range in difficulty from preschool (age 2) to adult (college) words. Word frequency was also used to select target words and distractors for the single-word reading task.
Appendix 17. Validity guidelines for all webcam performance measures.
For each paradigm, a stimulus was considered valid when it received at least 50% fixation duration.
# | Measure | Validity Guidelines |
---|---|---|
1 | Overall Attention | At least 50% time on screen to at least one 1-minute video (≥30 seconds with gaze on-screen) |
2 | Attentional Scanning | At least 4 valid processing speed stimuli (≥40 seconds with gaze on-screen) |
3 | Positive Emotion | At least 50% time on screen to at least one 1-minute video (≥30 seconds with gaze on-screen) |
4 | Negative Emotion | At least 50% time on screen to at least one 1-minute video (≥30 seconds with gaze on-screen) |
5 | Social Attention | At least 8 valid stimuli with at least 8 social or 8 non-social AOIs empirically-identified |
6 | Social Preference | At least 8 valid social stimuli and at least 8 valid social AOIs (≥50 seconds with gaze on-screen) |
7 | Face Preference | At least 8 valid social attention stimuli with faces (≥40 seconds with gaze on-screen) |
8 | Non-social Preference | At least 8 valid social stimuli and 8 valid non-social areas-of-interest (≥50 seconds gaze on-screen) |
9 | Receptive Vocabulary | At least 8 valid target words (≥40 seconds with gaze on screen) |
10 | Speed to Faces | At least 8 valid social attention stimuli with faces (≥40 seconds with gaze on-screen) |
11 | Speed to Object | At least 4 valid processing speed stimuli (≥40 seconds with gaze on-screen) |
12 | Reading accuracy | At least 8 valid target words (≥32 seconds with gaze on screen) |
Appendix 18. Definitions for the four distinct gaze metrics collected.
Gaze Metric | Definition |
---|---|
Fixation duration | The total duration in milliseconds of an identified fixation from the first sample to the last sample included in the fixation definition. Fixation was defined as at least 66ms of gaze samples within a 100-pixel dispersion area. The total fixation duration was determined by summing the fixation duration for all identified fixations within an area-of-interest. |
Fixation count | A count of all fixations detected within an area-of-interest. |
Glance count | The number of times gaze entered and left an area-of-interest. Gaze to the area of interest was defined by at least one identified fixation. |
Time-to-first-fixation | The time elapsed from the start of the temporal area-of-interest to the first sample of the first identified fixation within an area-of-interest. |
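The fixation and glance definitions above can be made concrete with a small sketch. This is a hedged reconstruction, not the study's processing pipeline: the dispersion-threshold (I-DT style) algorithm variant and the sample timing in the example are assumptions; only the 66 ms minimum duration, the 100-pixel dispersion area, and the glance-entry rule come from Appendices 16 and 18.

```python
# Hedged sketch of the gaze-metric definitions in Appendix 18. Only the
# 66 ms / 100-pixel thresholds and the glance rule come from the text;
# the specific dispersion algorithm is an assumption for illustration.

def detect_fixations(samples, min_dur_ms=66, dispersion_px=100):
    """samples: list of (t_ms, x, y) gaze points in time order. Returns
    fixations as (start_ms, end_ms, cx, cy): maximal runs of samples whose
    bounding box stays within dispersion_px and that last >= min_dur_ms."""
    fixations, i, n = [], 0, len(samples)
    while i < n:
        j = i
        xs, ys = [samples[i][1]], [samples[i][2]]
        while j + 1 < n:
            x, y = samples[j + 1][1], samples[j + 1][2]
            if (max(xs + [x]) - min(xs + [x]) <= dispersion_px and
                    max(ys + [y]) - min(ys + [y]) <= dispersion_px):
                xs.append(x); ys.append(y); j += 1
            else:
                break
        if samples[j][0] - samples[i][0] >= min_dur_ms:
            fixations.append((samples[i][0], samples[j][0],
                              sum(xs) / len(xs), sum(ys) / len(ys)))
            i = j + 1
        else:
            i += 1
    return fixations

def glance_count(fixations, aoi):
    """aoi: (x0, y0, x1, y1). A new glance is counted each time a fixation
    lands in the AOI after one or more fixations outside it."""
    count, inside = 0, False
    for _, _, cx, cy in fixations:
        now_inside = aoi[0] <= cx <= aoi[2] and aoi[1] <= cy <= aoi[3]
        if now_inside and not inside:
            count += 1
        inside = now_inside
    return count
```

Under these definitions, total fixation duration to an AOI is simply the sum of (end_ms − start_ms) over fixations whose centroid falls in the AOI, and time-to-first-fixation is the start time of the first such fixation.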
Appendix 19. Clinician-scientist performance paradigm feedback.
Processing Speed | Receptive Vocabulary | Single-Word Reading | |
---|---|---|---|
Number of experts completing | 9 | 10 | 9 |
Device and webcam type (webcam: internal or external) | 1 Mini tablet; 5 Laptop internal; 1 Laptop external; 2 Desktop external | 6 Laptop internal; 1 Laptop external; 3 Desktop external | 5 Laptop internal; 4 Desktop external |
Screen size | 1 <10”; 3 10–12”; 3 12–18”; 4 19+” | 2 10–12”; 3 13–18”; 4 19+” | 2 10–12”; 3 13–18”; 4 19+” |
Paradigm relevance (M, SD, range) (1=highly relevant to 4=not relevant) | 1.6 (0.7, 1–3) | 1.6 (0.8, 1–3) | 2.4 (1.0, 1–4) |
Clarity of instructions (M, SD, range) (1=very clear to 4=very difficult to follow) | 1.1 (0.3, 1–2) | 1.2 (0.4, 1–2) | 1.3 (1.0, 1–4) |
Quality of audio (M, SD, range) (1=very clear to 4=not clear) | 1 (0) | 1.7 (0.9, 1–4) | 1.1 (0.3, 1–2) |
Quality of pictures / words (M, SD, range) (1=high, 2=medium, 3=low) | 1 (0) | 1.1 (0.3, 1–2) | 1.0 (0) |
Timing of administration (M, SD, range) (1=very fast, 3=neither fast nor slow, 5=very slow) | 3 (0.3, 3–4) | 3.1 (0.3, 3–4) | 2.9 (0.3, 2–3) |
Difficulty level (M, SD, range) (1=very appropriate to 4=inappropriate) | 1.7 (0.5, 1–2) | 2.4 (0.9, 1–4) | 2.3 (1.3, 1–4) |
Possible concerns (n) | |||
Too many easy targets | 0 | 0 | 0 |
Too many moderate diff targets | 0 | 0 | 0 |
Too many hard targets | 0 | 4 | 2 |
Too few easy targets | 0 | 1 | 1 |
Too few moderate diff targets | 0 | 0 | 0 |
Too few hard targets | 0 | 0 | 0 |
Targets too close together | 2 | 0 | 0 |
Targets too far apart | 0 | 0 | 0 |
Target array not appropriate | 0 | 0 | 0 |
Note. Clinician-scientist experts did not provide feedback on the social attention paradigm because it was adapted from our prior eye-tracking investigations, and its stimuli had already received expert input during development. Specific qualitative feedback is not reproduced here but was used to improve the administration flow and the instructions given to the parent facilitating the administration.
Appendix 20. Parent performance paradigm feedback.
| | Social Attention | Processing Speed | Receptive Vocabulary | Single-Word Reading |
|---|---|---|---|---|
| Number of parents completing | 8 | 8 | 9 | 8 |
| Overall experience, M (SD, range) (1=extremely positive to 5=extremely negative) | 1.9 (0.8, 1–3) | 2.0 (0.8, 1–3) | 2.1 (1.0, 1–4) | 2.9 (1.3, 1–5) |
| Difficulty, M (SD, range) (1=extremely easy to 5=extremely hard) | 2.0 (0.9, 1–4) | 2.5 (1.5, 1–5) | 2.7 (1.0, 2–4) | 3.3 (1.7, 1–5) |
| Device type | 6 Laptop; 1 PC/Mac; 1 Did not report | 7 Laptop; 1 PC/Mac | 8 Laptop; 1 PC/Mac | 7 Laptop; 1 PC/Mac |
| Sitting during evaluation | 6 Yes; 2 No | 7 Yes; 1 No | 7 Yes; 1 No | 5 Yes; 3 No |
| Sensory-related movements during evaluation | 2 Yes; 6 No | 2 Yes; 6 No | 4 Yes; 5 No | 5 Yes; 3 No |
| Breaks | 8 No | 8 No | 8 No; 1 One break after each segment | 8 No |
| Look away from screen (1=very often to 6=no looking away) | 3.8 (2.3, 1–6) | 3.0 (2.1, 1–6) | 3.8 (1.7, 2–6) | 3.4 (2.2, 1–6) |
| Cover or touch face (1=very often to 5=no covering or touching) | 4.1 (1.1, 2–5) | 4.1 (0.8, 3–5) | 3.3 (1.4, 2–5) | 3.5 (1.5, 1–5) |
| End video early | 8 No | 8 No | 9 No | 8 No |
| Unexpected noise during evaluation | 5 No noise; 3 One unexpected noise | 7 No noise; 1 Five+ occurrences | 7 No noise; 2 One unexpected noise | 6 No noise; 2 One unexpected noise |
| Adjust lighting during evaluation | 8 No adjustments | 8 No adjustments | 9 No adjustments | 8 No adjustments |
| Connection problems | 4 No; 2 One occurrence; 2 Two or three occurrences | 8 No | 9 No | 7 No; 1 One occurrence |
| Overall attention (1=excellent to 3=average to 5=very poor) | 1.9 (1.2, 1–4) | 2.4 (1.3, 1–4) | 2.2 (1.2, 1–4) | 2.5 (1.5, 1–5) |
| Physical assistance | 4 None; 1 One time; 3 Most of the time | 3 None; 1 One time; 1 Part-time; 3 Most of the time | 3 None; 2 Part-time; 3 Most of the time; 1 Did not report | 3 None; 4 Part-time; 1 Most of the time |
| Gestural assistance | 5 None; 1 One time; 2 Part-time | 3 None; 4 Part-time; 1 Most of the time | 5 None; 3 Part-time; 1 Did not report | 4 None; 3 Part-time; 1 Most of the time |
| Verbal assistance | 4 None; 2 One time; 2 Most of the time | 4 None; 1 One time; 1 Part-time; 2 Most of the time | 3 None; 1 One time; 2 Part-time; 2 Most of the time; 1 Did not report | 3 None; 1 One time; 3 Part-time; 1 Most of the time |
| Paradigm relevance, M (SD, range) (1=highly relevant to 4=not relevant) | 1.8 (0.4, 1–2) | 1.8 (0.7, 1–3) | 1.2 (0.4, 1–2) | 2.3 (0.9, 1–3) |
| Quality of audio (1=very clear to 4=not clear) | 1.1 (0.3, 1–2) | 1.0 (0) | 1.0 (0) | 1.3 (0.4, 1–2) |
| Quality of video / pictures / words (1=high, 2=medium, 3=low) | 1.3 (0.5, 1–2) | 1.4 (0.7, 1–3) | 1.0 (0) | 1.1 (0.3, 1–2) |
| Timing of administration (1=very fast to 3=neither fast nor slow to 5=very slow) | 2.6 (0.7, 1–3) | 2.3 (0.9, 1–3) | 2.0 (0.8, 1–3) | 2.5 (0.8, 1–3) |
Note. Two NDGS participants who were recruited to give feedback on the neurobehavioral surveys gave no feedback or provided feedback on only one performance paradigm due to life stress. As a result, only 8–9 participants completed each performance paradigm during pilot testing. All were participants who required parental support, with parents completing the post-evaluation questionnaire.
Appendix 21. Participant accounting.
Note. Invalid cases attempted at least one webcam performance measure but achieved less than 30 seconds with gaze on screen.
Appendix 22. Histograms for each webcam-collected performance measure with super-imposed normal distribution curve.
Note. A=Overall Attention, B=Attentional Scanning, C=Positive Emotional Expressiveness, D=Negative Emotional Expressiveness, E=Social Attention, F=Social Preference, G=Face Preference, H=Non-social Preference, I=Receptive Vocabulary, J=Speed to Faces, K=Speed to Objects, L=Reading Accuracy.
Appendix 23. Inter-correlations among the performance measures.
| | 1: OA | 2: AS | 3: PE | 4: NE | 5: SA | 6: SP | 7: FP | 8: NP | 9: RV | 10: SF | 11: SO | 12: RA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1: Overall Attention (OA) | - | .34* | −.19* | −.08 | −.45* | .13 | .33* | −.25* | .54* | −.26* | −.28* | .55* |
| 2: Attentional Scanning (AS) | | - | −.24* | −.21* | −.49* | .23* | .27* | −.25* | .54* | −.21* | −.78* | .48* |
| 3: Positive Emotion (PE) | | | - | −.06 | .23* | −.15 | −.15 | .08 | −.23* | .02 | .22* | −.27* |
| 4: Negative Emotion (NE) | | | | - | .22* | −.15 | −.11 | .15 | −.23* | .08 | .19* | −.28* |
| 5: Social Attention (SA) | | | | | - | −.54 | −.52* | .19* | −.59* | .30* | .45* | −.62* |
| 6: Social Preference (SP) | | | | | | - | .64* | −.05 | .35* | −.30* | −.24* | .38* |
| 7: Face Preference (FP) | | | | | | | - | −.50* | .38* | −.79* | −.34* | .38* |
| 8: Non-social Preference (NP) | | | | | | | | - | −.18 | .62* | .33* | −.20 |
| 9: Receptive Vocabulary (RV) | | | | | | | | | - | −.22 | −.55* | .78* |
| 10: Speed to Faces (SF) | | | | | | | | | | - | .30* | −.21 |
| 11: Speed to Object (SO) | | | | | | | | | | | - | −.47* |
| 12: Reading Accuracy (RA) | | | | | | | | | | | | - |
Note. Social attention is keyed so that higher scores are more consistent with autism spectrum disorder. Speed to faces and speed to objects are keyed so that higher scores indicate a longer time to fixate the target. Sample sizes vary from n=248 to n=322. * designates significant correlations, p<.001.
Appendix 24. Association of the webcam social attention measure with autism symptom level and ASD diagnosis.
Autism Symptom Level | |||||
---|---|---|---|---|---|
Baseline | 1-Month Follow-Up | 4-Month Follow-Up | |||
Total n | ASD n | r | r | r | |
Training | 192 | 42 | .53 | .42 | .54 |
Testing | 76 | 18 | .49 | .32 | .54 |
Validation | 77 | 22 | .61 | .53 | .65 |
Testing + Validation | 153 | 40 | .55 | .43 | .58 |
Ages 3–8 | 64 | 20 | .54 | .48 | .62 |
Ages 9+ | 89 | 20 | .57 | .37 | .59 |
ASD Diagnosis | |||||
Baseline | 1-Month Follow-Up | 4-Month Follow-Up | |||
Total n | ASD n | AUC (SE) | AUC (SE) | AUC (SE) | |
Training | 192 | 42 | .809 (.036) | .735 (.041) | .804 (.039) |
Testing | 76 | 18 | .790 (.066) | .693 (.073) | .755 (.081) |
Validation | 77 | 22 | .857 (.050) | .790 (.067) | .883 (.051) |
Testing + Validation | 153 | 40 | .821 (.041) | .744 (.049) | .815 (.048) |
Ages 3–8 | 64 | 20 | .806 (.060) | .730 (.072) | .836 (.062) |
Ages 9+ | 89 | 20 | .833 (.058) | .749 (.071) | .822 (.072) |
Note. ASD=Autism Spectrum Disorder. All correlations are statistically significant, p<.01. The training sub-sample included a randomly selected 60% of all administrations; the testing and validation sub-samples each consisted of a random selection of 20% of all administrations. Autism Symptom Level is based on averaging scores on the social communication / interaction and restricted repetitive behavior scales of the neurobehavioral evaluation tool informant-report survey.
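For context on the AUC values reported in Appendix 24: an AUC is equivalent to the Mann–Whitney U statistic scaled to [0, 1], i.e., the probability that a randomly chosen individual with ASD scores higher on the measure than a randomly chosen individual without ASD. A minimal sketch of this computation (illustrative Python, not the study's analysis code; names are assumptions):

```python
def auc(case_scores, control_scores):
    """AUC = P(case > control), counting ties as 0.5 (Mann-Whitney form)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```

Read this way, the baseline AUC of .821 in the combined testing + validation sample means a randomly selected participant with ASD scored above a randomly selected participant without ASD about 82% of the time.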
Appendix 25. Receiver operating characteristic curve analyses evaluating the predictive validity of the social attention measure for ASD diagnosis in the combined testing and validation baseline subsamples.
Appendix 26. Multi-level likelihood ratios (mLRs) and sensitivity and specificity for ASD diagnosis at relevant cut scores using the social attention measure (baseline only).
| Score range (z-score) [NT mean=0, SD=1] | mLR | Interpretation | Cut score (z-score) | Sensitivity | Specificity |
|---|---|---|---|---|---|
| < +0.1 | 0.153 | Reduced Probability | 0.1 | 92% | 50% |
| 0.1 to 1.8 | 0.947 | No Change | 1.0 | 77% | 65% |
| 1.81 to 2.45 | 3.92 | Increased Probability | 1.8 | 55% | 90% |
| 2.46+ | 6.41 | Strongly Increased Probability | 2.45 | 36% | 95% |
| Youden’s J | - | - | 1.49 | 70% | 87% |
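The quantities in this table can be reproduced directly from raw scores and diagnoses: sensitivity and specificity fix a single cut score, multi-level likelihood ratios compare how often scores fall inside each interval in cases versus non-cases, and Youden's J selects the cut maximizing sensitivity + specificity − 1. A sketch of these computations (illustrative Python with assumed names, not the study's analysis code):

```python
def sens_spec(case_scores, control_scores, cut):
    """Sensitivity and specificity when scores >= cut are called positive."""
    sens = sum(s >= cut for s in case_scores) / len(case_scores)
    spec = sum(s < cut for s in control_scores) / len(control_scores)
    return sens, spec

def interval_lr(case_scores, control_scores, lo, hi):
    """Multi-level likelihood ratio for the interval [lo, hi):
    P(score in interval | case) / P(score in interval | control)."""
    p_case = sum(lo <= s < hi for s in case_scores) / len(case_scores)
    p_control = sum(lo <= s < hi for s in control_scores) / len(control_scores)
    return p_case / p_control if p_control else float("inf")

def youden_j_cut(case_scores, control_scores):
    """Cut score maximizing J = sensitivity + specificity - 1."""
    cuts = sorted(set(case_scores) | set(control_scores))
    return max(cuts,
               key=lambda c: sum(sens_spec(case_scores, control_scores, c)) - 1)
```

Unlike a single cut score, interval likelihood ratios preserve information from the whole score distribution, which is why a score of 2.46+ shifts the probability of ASD more strongly (mLR 6.41) than merely exceeding the 1.8 cut.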
Appendix 27. Means and standard deviations for webcam measures (z-scores) in each patient group.
| Webcam Measure | PTEN M | (SD) | NFIX M | (SD) | SYNGAP1 M | (SD) | Other NDGS M | (SD) | Idiopathic NDD M | (SD) | Pattern |
|---|---|---|---|---|---|---|---|---|---|---|---|
Overall Attention | −0.07 | (1.03) | −0.55 | (1.18) | −0.78 | (1.04) | −1.08 | (1.29) | −0.16 | (1.04) | Low scores for NFIX, SYNGAP1, other NDGS |
Attentional Scanning | −0.12 | (1.58) | −0.85 | (0.65) | −1.30 | (0.80) | −1.28 | (0.77) | −0.05 | (1.25) | Low scores for NFIX, SYNGAP1, other NDGS |
Positive Emotion | 0.05 | (0.74) | 0.35 | (1.23) | −0.07 | (0.71) | 0.32 | (1.02) | −0.06 | (0.65) | High scores for NFIX and other NDGS |
Negative Emotion | 0.18 | (1.67) | −0.09 | (0.92) | 0.62 | (1.67) | 0.59 | (1.81) | 0.10 | (0.70) | High scores for SYNGAP1 and other NDGS |
Social Attention | −0.34 | (1.00) | −1.78 | (1.09) | −1.70 | (1.34) | −1.78 | (1.25) | −0.11 | (1.11) | Low scores for all but idiopathic NDD |
Social Preference | −0.34 | (0.95) | −0.67 | (0.85) | −1.27 | (1.31) | −0.79 | (1.19) | 0.04 | (1.54) | Low scores for all but idiopathic NDD |
Face Preference | −0.10 | (0.90) | −0.32 | (0.80) | −0.54 | (0.78) | −0.47 | (0.85) | 0.13 | (1.27) | Low scores for NFIX, SYNGAP1, other NDGS |
Non-social Preference | 0.40 | (1.01) | 0.55 | (0.94) | 0.82 | (1.44) | 0.62 | (1.24) | 0.11 | (1.07) | High scores for all but idiopathic NDD |
Receptive Vocabulary | −0.26 | (1.04) | −0.94 | (0.91) | −0.82 | (0.91) | −1.01 | (0.72) | −0.13 | (1.00) | Low scores for all but idiopathic NDD |
Speed to Faces | −0.09 | (1.12) | 0.19 | (0.74) | 0.50 | (0.82) | 0.38 | (0.84) | −0.11 | (1.10) | Slow for NFIX, SYNGAP1, and other NDGS |
Speed to Object | 0.08 | (1.43) | 0.53 | (0.74) | 1.00 | (0.83) | 1.08 | (0.74) | 0.09 | (1.13) | Slow for NFIX, SYNGAP1, and other NDGS |
Reading accuracy | −0.21 | (0.96) | −0.70 | (0.73) | −0.74 | (0.91) | −0.93 | (0.64) | −0.35 | (1.01) | Low scores for all but idiopathic NDD |
Note. Scores represent z-scores adjusted for age, the square of age, and sex derived from neurotypical control norms. Sibling controls not shown as they did not significantly deviate from the healthy control mean for any measure.
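The adjustment described in the note reflects a common normative workflow: regress each raw measure on age, age squared, and sex within the neurotypical controls, then express every participant's score as a residual standardized by the controls' residual SD. The sketch below illustrates that workflow under those assumptions (not the authors' code); names are illustrative, and numpy is used for the least-squares fit.

```python
import numpy as np

def control_adjusted_z(raw, age, sex, is_control):
    """Regression-based norms: z-scores adjusted for age, age^2, and sex,
    standardized against the neurotypical control group."""
    # Design matrix: intercept, age, age squared, sex.
    X = np.column_stack([np.ones_like(age), age, age ** 2, sex])
    # Fit least-squares norms using controls only.
    beta, *_ = np.linalg.lstsq(X[is_control], raw[is_control], rcond=None)
    # Everyone's deviation from their demographic expectation, in control SD units.
    resid = raw - X @ beta
    return resid / resid[is_control].std(ddof=1)
```

By construction, control z-scores have mean 0 and SD 1, so a group mean of −1.70 (e.g., social attention in SYNGAP1) reads directly as 1.7 control SDs below demographic expectation.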
Data Availability Statement
The datasets analyzed during the current study are available from the corresponding author on reasonable request.
References
- Amit M, Chukoskie L, Skalsky AJ, Garudadri H, & Ng TN (2020). Flexible Pressure Sensors for Objective Assessment of Motor Disorders. Advanced Functional Materials, 30(20). 10.1002/adfm.201905241
- Boateng GO, Neilands TB, Frongillo EA, Melgar-Quinonez HR, & Young SL (2018). Best Practices for Developing and Validating Scales for Health, Social, and Behavioral Research: A Primer. Front Public Health, 6, 149. 10.3389/fpubh.2018.00149
- Bove R, Rowles W, Zhao C, Anderson A, Friedman S, Langdon D, Alexander A, Sacco S, Henry R, Gazzaley A, Feinstein A, & Anguera JA (2021). A novel in-home digital treatment to improve processing speed in people with multiple sclerosis: A pilot study. Multiple Sclerosis, 27(5), 778–789. 10.1177/1352458520930371
- Busch RM, Chapin JS, Mester J, Ferguson L, Haut JS, Frazier TW, & Eng C (2013). Cognitive characteristics of PTEN hamartoma tumor syndromes. Genet Med, 15(7), 548–553. 10.1038/gim.2013.1
- Busch RM, Frazier TW II, Sonneborn C, Hogue O, Klaas P, Srivastava S, Hardan AY, Martinez-Agosto JA, Sahin M, & Eng C (2023). Longitudinal neurobehavioral profiles in children and young adults with PTEN hamartoma tumor syndrome and reliable methods for assessing neurobehavioral change. J Neurodev Disord, 15(1), 3. 10.1186/s11689-022-09468-4
- Busch RM, Srivastava S, Hogue O, Frazier TW, Klaas P, Hardan A, Martinez-Agosto JA, Sahin M, Eng C, & the Developmental Synaptopathies Consortium (2019). Neurobehavioral phenotype of autism spectrum disorder associated with germline heterozygous mutations in PTEN. Transl Psychiatry, 9(1), 253. 10.1038/s41398-019-0588-1
- Chita-Tegmark M (2016). Social attention in ASD: A review and meta-analysis of eye-tracking studies. Research in Developmental Disabilities, 48, 79–93. 10.1016/j.ridd.2015.10.011
- Ciaccio C, Saletti V, D’Arrigo S, Esposito S, Alfei E, Moroni I, Tonduti D, Chiapparini L, Pantaleoni C, & Milani D (2018). Clinical spectrum of PTEN mutation in pediatric patients. A bicenter experience. Eur J Med Genet. 10.1016/j.ejmg.2018.12.001
- Cicchetti D, Bronen R, Spencer S, Haut S, Berg A, Oliver P, & Tyrer P (2006). Rating scales, scales of measurement, issues of reliability: Resolving some critical issues for clinicians and researchers. The Journal of Nervous and Mental Disease, 194(8), 557–564.
- Cohen J, & Cohen P (1983). Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences (2nd ed.). L. Erlbaum Associates.
- Dawson G, Campbell K, Hashemi J, Lippmann SJ, Smith V, Carpenter K, Egger H, Espinosa S, Vermeer S, Baker J, & Sapiro G (2018). Atypical postural control can be detected via computer vision analysis in toddlers with autism spectrum disorder. Sci Rep, 8(1), 17008. 10.1038/s41598-018-35215-8
- Egger HL, Dawson G, Hashemi J, Carpenter KLH, Espinosa S, Campbell K, Brotkin S, Schaich-Borg J, Qiu Q, Tepper M, Baker JP, Bloomfield RA Jr., & Sapiro G (2018). Automatic emotion and attention analysis of young children at home: a ResearchKit autism feasibility study. NPJ Digit Med, 1, 20. 10.1038/s41746-018-0024-6
- FDA (2009). Patient-reported outcome measures: use in medical product development to support labeling claims. Guidance for Industry. United States Food and Drug Administration.
- Frazier TW (2019). Autism Spectrum Disorder Associated with Germline Heterozygous PTEN Mutations. In Eng C, Ngeow J, & Stambolic V (Eds.), (Vol. 9, p. a037002). Cold Spring Harbor Laboratory Press. 10.1101/cshperspect.a037002
- Frazier TW, Busch RM, Klaas P, Lachlan K, Jeste S, Kolevzon A, Loth E, Harris J, Speer L, Pepper T, Anthony K, Graglia JM, Delagrammatikas C, Bedrosian-Sermone S, Beekhuyzen J, Smith-Hicks C, Sahin M, Eng C, Hardan AY, & Uljarevic M (2023). Development of informant-report neurobehavioral survey scales for PTEN hamartoma tumor syndrome and related neurodevelopmental genetic syndromes. Am J Med Genet A. 10.1002/ajmg.a.63195
- Frazier TW, Embacher R, Tilot AK, Koenig K, Mester J, & Eng C (2015). Molecular and phenotypic abnormalities in individuals with germline heterozygous PTEN mutations and autism. Molecular Psychiatry, 20(9), 1132–1138. 10.1038/mp.2014.125
- Frazier TW, Hauschild KM, Klingemier E, Strauss MS, Hardan AY, & Youngstrom EA (2020). Rapid eye-tracking evaluation of language in children and adolescents referred for assessment of neurodevelopmental disorders. Journal of Intellectual & Developmental Disability, 45(3), 222–235. 10.3109/13668250.2019.1698287
- Frazier TW, Klingemier EW, Anderson CJ, Gengoux GW, Youngstrom EA, & Hardan AY (2021). A Longitudinal Study of Language Trajectories and Treatment Outcomes of Early Intensive Behavioral Intervention for Autism. Journal of Autism and Developmental Disorders, 51(12), 4534–4550. 10.1007/s10803-021-04900-5
- Frazier TW, Klingemier EW, Parikh S, Speer L, Strauss MS, Eng C, Hardan AY, & Youngstrom EA (2018). Development and validation of objective and quantitative eye tracking-based measures of autism risk and symptom levels. Journal of the American Academy of Child and Adolescent Psychiatry, 57(11), 858–866. 10.1016/j.jaac.2018.06.023
- Frazier TW, Strauss M, Klingemier EW, Zetzer EE, Hardan AY, Eng C, & Youngstrom EA (2017). A Meta-Analysis of Gaze Differences to Social and Nonsocial Information Between Individuals With and Without Autism. Journal of the American Academy of Child and Adolescent Psychiatry, 56(7), 546–555. 10.1016/j.jaac.2017.05.005
- Frazier TW, Uljarevic M, Ghazal I, Klingemier EW, Langfus J, Youngstrom EA, Aldosari M, Al-Shammari H, El-Hag S, Tolefat M, Ali M, & Al-Shaban FA (2021). Social attention as a cross-cultural transdiagnostic neurodevelopmental risk marker. Autism Res. 10.1002/aur.2532
- Goodwin MS, Mazefsky CA, Ioannidis S, Erdogmus D, & Siegel M (2019). Predicting aggression to others in youth with autism using a wearable biosensor. Autism Res, 12(8), 1286–1296. 10.1002/aur.2151
- Hardan AY, Jo B, Frazier TW, Klaas P, Busch RM, Dies KA, Filip-Dhima R, Snow AV, Eng C, Hanna R, Zhang B, & Sahin M (2021). A randomized double-blind controlled trial of everolimus in individuals with PTEN mutations: Study design and statistical considerations. Contemp Clin Trials Commun, 21, 100733. 10.1016/j.conctc.2021.100733
- IBM Corp. (2021). IBM SPSS Statistics for Windows (Version 28.0). IBM Corp.
- Kail R (1991). Developmental change in speed of processing during childhood and adolescence. Psychological Bulletin, 109(3), 490–501. 10.1037/0033-2909.109.3.490
- Kuntzler T, Hofling TTA, & Alpers GW (2021). Automatic Facial Expression Recognition in Standardized and Non-standardized Emotional Expressions. Front Psychol, 12, 627561. 10.3389/fpsyg.2021.627561
- Manfredonia J, Bangerter A, Manyakov NV, Ness S, Lewin D, Skalkin A, Boice M, Goodwin MS, Dawson G, Hendren R, Leventhal B, Shic F, & Pandina G (2019). Automatic Recognition of Posed Facial Expression of Emotion in Individuals with Autism Spectrum Disorder. Journal of Autism and Developmental Disorders, 49(1), 279–293. 10.1007/s10803-018-3757-9
- McPartland JC, Bernier RA, Jeste SS, Dawson G, Nelson CA, Chawarska K, Earl R, Faja S, Johnson SP, Sikich L, Brandt CA, Dziura JD, Rozenblit L, Hellemann G, Levin AR, Murias M, Naples AJ, Platt ML, Sabatos-DeVito M, … & the Autism Biomarkers Consortium for Clinical Trials (2020). The Autism Biomarkers Consortium for Clinical Trials (ABC-CT): Scientific Context, Study Design, and Progress Toward Biomarker Qualification. Front Integr Neurosci, 14, 16. 10.3389/fnint.2020.00016
- Mulder PA, van Balkom IDC, Landlust AM, Priolo M, Menke LA, Acero IH, Alkuraya FS, Arias P, Bernardini L, Bijlsma EK, Cole T, Coubes C, Dapia I, Davies S, Di Donato N, Elcioglu NH, Fahrner JA, Foster A, Gonzalez NG, … Hennekam RC (2020). Development, behaviour and sensory processing in Marshall-Smith syndrome and Malan syndrome: phenotype comparison in two related syndromes. Journal of Intellectual Disability Research, 64(12), 956–969. 10.1111/jir.12787
- Nerusil B, Polec J, Skunda J, & Kacur J (2021). Eye tracking based dyslexia detection using a holistic approach. Sci Rep, 11(1), 15687. 10.1038/s41598-021-95275-1
- Ness SL, Bangerter A, Manyakov NV, Lewin D, Boice M, Skalkin A, Jagannatha S, Chatterjee M, Dawson G, Goodwin MS, Hendren R, Leventhal B, Shic F, Frazier JA, Janvier Y, King BH, Miller JS, Smith CJ, Tobe RH, & Pandina G (2019). An Observational Study With the Janssen Autism Knowledge Engine (JAKE®) in Individuals With Autism Spectrum Disorder. Front Neurosci, 13, 111. 10.3389/fnins.2019.00111
- Nunnally JC, & Bernstein IH (1994). Psychometric Theory (3rd ed.). McGraw-Hill, Inc.
- R Core Team (2021). R: A language and environment for statistical computing. https://www.R-project.org/
- Sahin M, Jones SR, Sweeney JA, Berry-Kravis E, Connors BW, Ewen JB, Hartman AL, Levin AR, Potter WZ, & Mamounas LA (2018). Discovering translational biomarkers in neurodevelopmental disorders. Nat Rev Drug Discov. 10.1038/d41573-018-00010-7
- Sahin M, & Sur M (2015). Genes, circuits, and precision therapies for autism and related neurodevelopmental disorders. Science, 350(6263). 10.1126/science.aab3897
- Salley B, & Colombo J (2016). Conceptualizing Social Attention in Developmental Research. Soc Dev, 25(4), 687–703. 10.1111/sode.12174
- Sasson NJ, & Elison JT (2012). Eye tracking young children with autism. Journal of Visualized Experiments, (61), 3675. 10.3791/3675
- Semmelmann K, & Weigelt S (2018). Online webcam-based eye tracking in cognitive science: A first look. Behav Res Methods, 50(2), 451–465. 10.3758/s13428-017-0913-7
- Shehu IS, Wang YF, Athuman AM, & Fu XP (2021). Remote Eye Gaze Tracking Research: A Comparative Evaluation on Past and Recent Progress. Electronics, 10(24). 10.3390/electronics10243165
- Shu C, Green Snyder L, Shen Y, Chung WK, & the SPARK Consortium (2022). Imputing cognitive impairment in SPARK, a large autism cohort. Autism Res, 15(1), 156–170. 10.1002/aur.2622
- Simmatis L, Alavi Naeini S, Jafari D, Xie MKY, Tanchip C, Taati N, McKinlay S, Sran R, Truong J, Guarin DL, Taati B, & Yunusova Y (2023). Analytical Validation of a Webcam-Based Assessment of Speech Kinematics: Digital Biomarker Evaluation following the V3 Framework. Digit Biomark, 7(1), 7–17. 10.1159/000529685
- Srivastava S, Jo B, Zhang B, Frazier T, Gallagher AS, Peck F, Levin AR, Mondal S, Li Z, Filip-Dhima R, Geisel G, Dies KA, Diplock A, Eng C, Hanna R, Sahin M, Hardan A, & the Developmental Synaptopathies Consortium (2022). A randomized controlled trial of everolimus for neurocognitive symptoms in PTEN hamartoma tumor syndrome. Human Molecular Genetics, 31(20), 3393–3404. 10.1093/hmg/ddac111
- Steele M, Uljarevic M, Rached G, Frazier TW, Phillips JM, Libove RA, Busch RM, Klaas P, Martinez-Agosto JA, Srivastava S, Eng C, Sahin M, & Hardan AY (2021). Psychiatric Characteristics Across Individuals With PTEN Mutations. Front Psychiatry, 12, 672070. 10.3389/fpsyt.2021.672070
- Streiner DL, & Norman GR (1995). Health Measurement Scales: A Practical Guide to Their Development and Use (2nd ed.). Oxford University Press.
- Tuncgenc B, Pacheco C, Rochowiak R, Nicholas R, Rengarajan S, Zou E, Messenger B, Vidal R, & Mostofsky SH (2021). Computerized Assessment of Motor Imitation as a Scalable Method for Distinguishing Children With Autism. Biol Psychiatry Cogn Neurosci Neuroimaging, 6(3), 321–328. 10.1016/j.bpsc.2020.09.001
- Vlaskamp DRM, Shaw BJ, Burgess R, Mei D, Montomoli M, Xie H, Myers CT, Bennett MF, XiangWei W, Williams D, Maas SM, Brooks AS, Mancini GMS, van de Laar I, van Hagen JM, Ware TL, Webster RI, Malone S, Berkovic SF, … Scheffer IE (2019). SYNGAP1 encephalopathy: A distinctive generalized developmental and epileptic encephalopathy. Neurology, 92(2), e96–e107. 10.1212/WNL.0000000000006729
- Youngstrom EA, Salcedo S, Frazier TW, & Perez Algorta G (2019). Is the Finding Too Good to Be True? Moving from “More Is Better” to Thinking in Terms of Simple Predictions and Credibility. J Clin Child Adolesc Psychol, 48(6), 811–824. 10.1080/15374416.2019.1669158