Abstract
Objective:
This study aimed to develop a classifier for infants at 12 months of age based on a parent-report measure (the First Year Inventory v.2.0; FYI), to: (1) classify infants at elevated risk, above and beyond that attributable to familial risk status for ASD; and, (2) serve as a starting point to refine an approach for risk estimation in population samples.
Method:
Fifty-four high familial risk (HR) infants later diagnosed with ASD (HR-ASD), 183 HR infants not diagnosed with ASD at 24 months of age (HR-Neg), and 72 low risk controls participated in the study. All infants contributed FYI data at 12 months of age and had a diagnostic assessment for ASD at age 24 months. A data-driven, cross-validated analytic approach was used to develop a classifier to determine screening accuracy (e.g. sensitivity) of the FYI to classify HR-ASD and HR-Neg.
Results:
The newly developed FYI classifier had an estimated sensitivity of 0.71 (95% CI: 0.50, 0.91) and specificity of 0.72 (95% CI: 0.49, 0.91).
Conclusion:
This classifier demonstrates the potential to improve current screening for ASD risk at 12 months of age in infants already at elevated familial risk for ASD, increasing opportunities for detection of autism risk in infancy. Findings from this study highlight the utility of combining parent-report measures with machine learning approaches.
Keywords: screening, autism spectrum disorder, high-risk, first year, parent report
INTRODUCTION
Autism spectrum disorder (ASD) is a neurodevelopmental disorder defined by social deficits and restrictive and repetitive behaviors present in early childhood.1 The latest U.S. prevalence estimates indicate that ASD affects 1 in 54 school-age children.2 Despite tremendous effort by stakeholders, ASD continues to be a lifelong challenge for most affected individuals and their families. Prospective studies of infants at high familial risk have demonstrated that the defining behaviors of ASD do not generally consolidate into a diagnosable condition until around two years of age or later, an age which is preceded by a prodromal period in the first year of life.3-5 Although measures have been developed to screen for ASD in the second or third year of life, e.g., the Modified Checklist for Autism in Toddlers Revised (M-CHAT-R/F),6 there remains a need for development of cost-effective screening measures (e.g., parent/caregiver report) for identifying elevated ASD risk during this prodromal period in the first year of life. Developing such a measure would enable early detection of risk and, possibly, prodromal intervention for infants who are at the highest risk for developing ASD.
Currently available ASD-specific screeners that caregivers/parents complete when infants are 12 months of age include the First Year Inventory (FYI),7 the Parent Observation of Early Markers Scale (POEMS),8 and the Autism Parent Screen for Infants (APSI).9 Evaluating the strength of these measures as screening tools for ASD risk should include consideration of performance estimates such as sensitivity and specificity, as well as sample size and method of analysis, all of which are critical in determining the stability and generalizability of estimates. A review of these parameters and methods for the above-mentioned parent report measures is presented in Table 1. In addition to limited sample size8,10,11 and/or screening accuracy (e.g., sensitivity),9-12 none of these studies employed a cross validation (CV) approach to prediction. One of the strengths of CV is that it accurately estimates how the stated values of performance (e.g., sensitivity) will generalize to an independent dataset.13 Without the use of CV, reported estimates of performance are likely to be overly optimistic compared to true performance in an independent dataset.
Table 1:
Currently Available Parent Report Screening Measures Specifically designed for Autism Spectrum Disorder Risk in Infants at 12 Months of Age
Screening measure | Ages assessed | Sample | Sample size for ASD outcome | Method used for determining predictive utility | Sensitivity | Specificity |
---|---|---|---|---|---|---|
At 12-month time point | ||||||
FYI10 | 12 months (mo) | Community ascertained | 9 | Risk score derived using Factor Analysis; No CV; ROC Analysis | 0.44 | 0.99 |
POEMS8 | 3, 6, 9, 12, 18, 24 mo | HR-ASD | Ranges from 2 – 9; at 12 mo: n = 7 | Mixed model regression; No CV | 0.71a | 0.68a |
APSI9 | 6, 9, 12, 15, 18, 24 mo | HR-ASD | Ranges from 21 – 56; at 12 mo: n = 54 | Logistic Regression; No CV; ROC Analysis | 0.59a | 0.72a |
FYI11 | 12 mo | HR-ASD | 16 | CART; No CV | 0.63a | 0.93a |
FYI12 | 12 mo | HR-ASD | 29 | Risk score derived using Factor Analysis; No CV; ROC Analysis | 0.34 | 0.91 |
Note: APSI = Autism Parent Screen for Infants; ASD = autism spectrum disorder; CART = Classification And Regression Tree; CV = cross validation; FYI = First Year Inventory; HR-ASD = high familial risk for ASD; POEMS = Parent Observation of Early Markers Scale; ROC = receiver operating characteristic.
a Estimates were calculated from the same sample used to construct the classifier.
In the current study we examined FYI data on 309 subjects from a prospective, longitudinal study of infants at high and low familial risk for autism, who participated in the NIH-funded Infant Brain Imaging Study (IBIS). This high familial risk sample included a relatively large number (N=54) of infants who went on to receive a diagnosis of ASD, maximizing power to examine the screening accuracy of the FYI items using supervised machine learning – a critical first step in the process of validating this tool for potential widespread use. Although the overarching goal is to develop a valid classifier that could potentially be employed in the general population to identify elevated risk for ASD in infants, in the current study we aimed to address an important part of this goal by: (1) developing an FYI classifier on a sample at heightened risk for ASD, that may have particular and more immediate clinical application for infants at high familial risk for ASD, and (2) serving as a starting point for refining an analytic approach to generate a risk estimate that can be employed with a population sample.
METHOD
Participants
A total of 309 infants participated in this study from four clinical data collection sites: University of North Carolina at Chapel Hill; Children’s Hospital of Philadelphia; Washington University in St. Louis; and University of Washington. All parents provided written, informed consent. Research protocols were approved by institutional review boards from respective clinical sites. Infants within the high-risk group (HR; n=237) had an older sibling with a community diagnosis of ASD who met criteria for ASD on the Social Communication Questionnaire (SCQ)14 and Autism Diagnostic Interview (ADI-R).15 Infants in the low risk group (LR; n=72) had an older, typically developing sibling, determined by parent interview on the Family Interview for Genetic Studies (FIGS),16 and no first-degree relatives with a developmental disability. Infants were excluded from the study for any of the following: evidence of a genetic condition or syndrome; significant medical or neurological condition affecting development; significant vision or hearing impairment; birth weight <2000 g or gestational age <36 weeks; perinatal brain injury secondary to birth complications or exposure to specific medication or neurotoxins during gestation; contraindication for MRI; predominant home language other than English; being adopted or a half sibling; having a first-degree relative with psychosis, schizophrenia, or bipolar disorder; or being a twin.
Measures
Primary measure:
First Year Inventory version 2.0 (FYI):
The FYI is a parent questionnaire that was designed to identify 12-month-olds at elevated risk for developing ASD.7,17 It consists of 63 items: 46 rated on a 4-point scale (‘never’, ‘seldom’, ‘sometimes’, and ‘often’ occurs), 14 multiple choice questions, two open-ended questions about development and caregiver concerns, and one on speech sounds during babbling. There are two domains within the FYI: Social Communication (SC) and Sensory-Regulatory functions (SR). A risk score is computed for each domain. Higher scores indicate more numerous and/or more extreme symptoms (for full details see Reznick et al).17 In this study we report group level differences based on the two domain scores. All individual items except the two open-ended questions and the one question pertaining to speech sounds were used for the prediction analysis (i.e., 60 items), based on raw scores.
The following developmental and diagnostic assessments were used to characterize the groups: The Mullen Scales of Early Learning (MSEL)18 is a standardized, developmental assessment for children aged 0–68 months. It provides an Early Learning Composite (ELC) standard score which indexes overall development, and five subscale T-scores (fine motor, gross motor, visual reception, expressive and receptive language). The Autism Diagnostic Observation Schedule (ADOS)19 is a semi-structured, observational play assessment of social interaction, communication, and repetitive behaviors for diagnosis and classification of ASD. Conventional scoring algorithms were applied to create a total calibrated severity score (CSS).20
Procedure
All participants (HR=237; LR=72) had complete data on the 60 FYI items we employed at 12 months of age and a best estimate diagnostic outcome determined at 24 months by experienced, licensed clinicians. Although clinicians were blind to the FYI scores, they were not blind to the risk status of the subject. Participants with missing FYI data were not included in the study. See Supplement 1, available online, for more details on rationale for exclusion and Table S1, available online, for participant characteristics with and without missing data. Fifty-four HR subjects (HR-ASD; 22.7%) and one LR subject (LR-ASD) received a diagnosis of either DSM-IV-TR autistic disorder or pervasive developmental disorder not otherwise specified (referred to here as ASD), based on a clinical best estimate supported by all available assessment data, including the ADOS and MSEL. The single LR subject with an ASD diagnosis was excluded from the analysis because one subject is too small a comparison group. One hundred eighty-three HR subjects receiving the above evaluation and seventy-two LR subjects did not meet criteria for ASD and comprised the HR-negative (HR-Neg) and LR groups, respectively. See Estes et al21 for a detailed description of the diagnostic procedures.
Statistical Analysis
Analyses were performed using R software, version 3.4 (R Core Team, 2017).22 Group differences in demographics were examined using analysis of variance (ANOVA), chi-square tests, or Fisher’s exact tests where appropriate. To examine the validity of the FYI for use in predicting ASD on the HR sample, we first assessed group differences in the FYI domain scores using ANOVA, controlling for education level of the respondent (i.e., caregiver completing the FYI). The domain scores were used to minimize the number of comparisons being done and maximize statistical power. Least square means and corresponding 95% confidence intervals (CIs) are reported for each domain (SR and SC) by diagnosis group (HR-ASD, HR-Neg, LR). Omnibus tests of the domain scores were followed by post-hoc pairwise tests using Holm’s correction.23 Cohen’s d effect sizes were computed for all comparisons.
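The Holm step-down adjustment used for the post-hoc pairwise tests can be illustrated with a short sketch. This is a minimal standard-library Python illustration, not the authors' R code; the function name `holm_adjust` and the raw p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values, in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (rank = k - 1) is multiplied by (m - k + 1).
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted p-value never drops below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values for three pairwise comparisons.
raw = [0.01, 0.04, 0.03]
adj = holm_adjust(raw)
```

Each adjusted value is then compared against the nominal significance level in the usual way.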
To address the primary objective of developing a classifier based on the raw scores of the 60 individual FYI items to predict ASD diagnosis at 24 months in HR infants, we used a Random Forest (RF) machine learning algorithm.24 Due to the categorical structure of the FYI items (all items are either multiple-choice or items that are rated on a 4-point rating scale), we considered tree-based classification methods to construct an ASD prediction algorithm using the FYI. RF was chosen to construct the algorithm because, in comparison to other tree-based classifiers (e.g., CART), RF improves prediction accuracy by reducing the degree of overfitting to the dataset used for training.24 We limited the analysis to HR infants given our aim to develop a classifier for potential use in high familial risk infants. RF is composed of a multitude of individual decision tree-based classifiers. Each individual tree is created using Classification And Regression Tree (CART), a single decision tree approach, after random sampling with replacement from a given dataset.25 This process is repeated a specified number of times to create a set of trees, i.e., the forest. For a given subject, FYI items (i.e., predictors) are inputted into each tree, with each returning a predicted outcome (HR-ASD or HR-Neg). RF selects HR-ASD as the final predicted outcome for the subject if the proportion of trees returning ASD is above a chosen threshold (probability threshold).
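The final voting step described above can be sketched as follows. This is a minimal illustration in standard-library Python, assuming each tree returns one of the two labels; the function name `forest_predict` and the example votes are hypothetical and do not come from the published classifier.

```python
def forest_predict(tree_votes, threshold=0.5):
    """Aggregate per-tree labels into an estimated probability and a final label.

    tree_votes: list of labels, one per tree ("HR-ASD" or "HR-Neg").
    threshold:  probability threshold above which HR-ASD is predicted.
    """
    prob_asd = sum(1 for v in tree_votes if v == "HR-ASD") / len(tree_votes)
    label = "HR-ASD" if prob_asd > threshold else "HR-Neg"
    return prob_asd, label


# Hypothetical forest of 10 trees, 6 of which vote HR-ASD.
votes = ["HR-ASD"] * 6 + ["HR-Neg"] * 4
prob, label = forest_predict(votes, threshold=0.5)
```

Raising the threshold trades sensitivity for specificity: with the same votes, a threshold of 0.7 would return HR-Neg.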
In order to construct and test the accuracy of the RF algorithm (classifier), we conducted 5-fold cross validation (CV) stratified by diagnosis (HR-ASD vs HR-Neg) and repeated 50 times. Five-fold CV refers to the process of randomly partitioning the data into five equal sub-samples (folds). In each iteration, one of the five folds is used as a test set and the remaining four are used as the training set to set the tuning parameters for and then construct the RF. This process is done five times with each of the folds used exactly once as the test set. For each iteration, we first applied the Synthetic Minority Over-sampling Technique (SMOTE)26, with 200 percent oversampling of the minority group (HR-ASD) to the training set before tuning and then constructing the RF. This was done because RF is known to perform poorly when used with unbalanced data. The tuning parameters for RF are the number of trees constructed and the number of predictors considered at each split. The number of trees was selected as the lowest number of trees such that the out-of-bag error rate in the training set (OOB) was stable based on visual inspection. After selecting the number of trees, the number of predictors considered was selected as the number such that the OOB error rate was minimized. Based on these tuning parameters, the RF was then constructed using the training set. Using this algorithm, an estimated probability of being diagnosed ASD positive was calculated for each child in the test set using the proportion of trees in the RF which result in an ASD positive prediction. Using these estimated probabilities, a receiver operating characteristic (ROC) curve was created with a corresponding area under the curve (AUC).
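The partitioning scheme described above can be sketched in a few lines. The following is a minimal standard-library Python illustration of repeated, stratified 5-fold CV on a sample with the same class sizes as the HR group; the function names and the seed are hypothetical, and the SMOTE, tuning, and RF steps are only marked by a comment.

```python
import random


def stratified_folds(labels, k, rng):
    """Assign every index to one of k folds, spreading each class evenly."""
    folds = [[] for _ in range(k)]
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for pos, i in enumerate(idxs):
            folds[pos % k].append(i)
    return folds


def repeated_stratified_cv(labels, k=5, repeats=50, seed=0):
    """Yield (train_idx, test_idx) pairs: k folds per repeat, `repeats` repeats."""
    rng = random.Random(seed)
    for _ in range(repeats):
        for test_idx in stratified_folds(labels, k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(labels)) if i not in held_out]
            # In the study, SMOTE, tuning, and RF fitting use train_idx only;
            # test_idx is reserved for evaluation.
            yield train_idx, test_idx


labels = ["HR-ASD"] * 54 + ["HR-Neg"] * 183
splits = list(repeated_stratified_cv(labels))
```

With 54 HR-ASD infants and five folds, each test fold holds 10 or 11 HR-ASD cases, which helps explain why the fold-level estimates vary as widely as the reported intervals suggest.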
We evaluated the performance of the algorithm based on results from the test set for each iteration of the 5-fold CV, which was repeated 50 times (250 total iterations). For each test set, the probability threshold was chosen that corresponded to the point closest to the upper left of the ROC for a given data set. Based on this threshold and the estimated probabilities from RF, predicted ASD diagnoses were computed for each subject in the test set. The following measures were calculated to evaluate the performance of the algorithm: sensitivity (proportion of ASD positive children who were predicted as ASD positive), specificity (proportion of ASD negative children who were predicted as ASD negative), positive predictive value (proportion of predicted ASD positive children who were actually ASD positive), negative predictive value (proportion of predicted ASD negative children who were actually ASD negative), misclassification rate (proportion of children whose predicted diagnosis differs from their actual diagnosis), area under the receiver operating curve (AUC), positive likelihood ratio (LHR+; ratio of true positive rate to false positive rate), and negative likelihood ratio (LHR−; ratio of false negative rate to true negative rate). This process was repeated for each iteration (250 iterations), resulting in 250 estimates for each measure. We report the mean of these estimates along with a 95% CI composed of the 2.5th and 97.5th percentiles as the interval endpoints.
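The evaluation steps for a single test fold can be sketched as below. This is a minimal standard-library Python illustration, not the authors' code: the function names, the confusion-matrix counts, and the ROC points are hypothetical. The likelihood ratios use the standard definitions, sensitivity / (1 − specificity) and (1 − sensitivity) / specificity, which are consistent with the reported point estimates (0.71 / 0.28 ≈ 2.54 and 0.29 / 0.72 ≈ 0.40).

```python
def screening_metrics(tp, fp, tn, fn):
    """Screening accuracy measures from confusion-matrix counts.

    Assumes every denominator is non-zero (e.g., specificity < 1 for LHR+).
    """
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "mcr": (fp + fn) / (tp + fp + tn + fn),
        "lhr_pos": sens / (1 - spec),
        "lhr_neg": (1 - sens) / spec,
    }


def closest_to_top_left(roc_points):
    """Pick the threshold whose (FPR, TPR) point is nearest to (0, 1).

    roc_points: iterable of (threshold, fpr, tpr) tuples.
    """
    return min(roc_points, key=lambda p: p[1] ** 2 + (1 - p[2]) ** 2)[0]


def percentile_interval(values, lower=2.5, upper=97.5):
    """Percentile interval with linear interpolation between order statistics."""
    ordered = sorted(values)

    def pct(q):
        pos = (len(ordered) - 1) * q / 100
        lo = int(pos)
        hi = min(lo + 1, len(ordered) - 1)
        return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

    return pct(lower), pct(upper)


# Hypothetical test fold: 11 HR-ASD and 37 HR-Neg infants.
m = screening_metrics(tp=8, fp=10, tn=27, fn=3)
best = closest_to_top_left([(0.3, 0.50, 0.90), (0.5, 0.28, 0.71), (0.7, 0.10, 0.40)])
ci = percentile_interval(list(range(101)))
```

In the study, the per-fold estimates of each measure (250 in total) would be passed to a percentile step like `percentile_interval` to obtain the reported 95% CIs.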
To create a single prediction algorithm to be used in a clinical setting, we applied RF on the entire dataset (i.e., the entire dataset served as the training set), after rebalancing using SMOTE with 200% oversampling, with the tuning parameters selected using the OOB error rates as per the process used in the CV procedure described above. The mean and CI for sensitivity, specificity, etc., from the repeated CV procedure, serve as accurate estimates of the expected performance of this RF when applied to an independent dataset.13 To assess the classifier’s performance outside of the HR sample, we tested the classifier on the LR (n=72) sample for accuracy, using a probability threshold of 0.5. The RF was not applied to the LR-ASD data as the sample size was too small to properly evaluate the classifier’s performance on this group (n=1). This RF classifier, accessed using the R computing language, can be found at https://github.com/kmdono02/FYI_Random_Forest.
RESULTS
Demographics
The HR-ASD and HR-Neg groups did not differ by age at time of FYI completion, age at outcome assessment, or race. As expected, there were more male children in the HR-ASD (81.5%) compared to the HR-Neg (54.6%) group. In the majority of cases across both groups, the FYI was completed by the mother only (88.7%). There were significant differences in maternal education between groups (p=0.004). Table 2 includes tests of group differences on descriptive and demographic data. Further, in comparing subjects with complete FYI data to those excluded on the basis of not having complete FYI data, we note that sex was significantly different in the HR-ASD group (p=0.04) and child race was significantly different in the HR-ASD (p=0.04) and the LR (p=0.04) groups. See Table S1, available online.
Table 2:
Descriptive and Demographic Data
Variable | HR-ASD (n = 54) | HR-Neg (n = 183) | LR (n = 72) | p | Test statistic | Pair-wise comparisons |
---|---|---|---|---|---|---|
mean (SD) | ||||||
Age at completion of FYI (mo) | 12.2 (0.3) | 12.3 (0.4) | 12.3 (0.3) | 0.63 | 0.47 | NA |
12-Month clinic visit | ||||||
Age (mo) | 12.6 (0.6) | 12.5 (0.5) | 12.4 (0.4) | 0.09 | 2.48 | NA |
MSEL ELC (standard score) | 94.3 (14.7) | 101.3 (12.7) | 105.5 (12.9) | <0.001 | 11.34 | a,b,c |
MSEL Expressive Language (t-score) | 41.7 (11.8) | 47.1 (11.5) | 48.1 (12.0) | 0.005 | 5.46 | a,b |
MSEL Fine Motor (t-score) | 55.4 (10.2) | 57.1 (8.7) | 60.1 (10.0) | 0.014 | 4.37 | b,c |
MSEL Receptive Language (t-score) | 38.9 (8.7) | 43.5 (9.3) | 45.7 (7.7) | <0.001 | 9.27 | a,b |
MSEL Visual Reception (t-score) | 51.6 (10.3) | 54.5 (9.3) | 56.8 (8.2) | 0.009 | 4.8 | b |
24-Month clinic visit | ||||||
Age (mo) | 24.7 (1.1) | 24.7 (0.9) | 24.7 (1.4) | 1 | 0 | NA |
MSEL ELC (standard score) | 79.3 (19.1) | 103.0 (15.6) | 110.9 (13.6) | <0.001 | 64.28 | a,b,c |
MSEL Expressive Language (t-score) | 37.2 (13.4) | 49.1 (11.5) | 53.2 (9.2) | <0.001 | 32.01 | a,b,c |
MSEL Fine Motor (t-score) | 40.1 (10.6) | 49.8 (9.8) | 55.1 (7.9) | <0.001 | 38.54 | a,b,c |
MSEL Receptive Language (t-score) | 35.1 (16.1) | 52.4 (9.9) | 56.8 (8.9) | <0.001 | 65.15 | a,b,c |
MSEL Visual Reception (t-score) | 41.9 (12.1) | 54.4 (10.4) | 56.6 (11.0) | <0.001 | 33.03 | a,b |
ADOS Calibrated Severity Score | 6.2 (1.8) | 1.7 (1.1) | 1.5 (1.0) | <0.001 | 306.13 | a,b |
Primary respondent of FYI | n (%) | |||||
Father | 3 (5.6) | 10 (5.6) | 2 (2.8) | 0.28d | 5.09 | NA |
Mother | 46 (85.2) | 157 (87.7) | 69 (95.8) | |||
Both | 5 (9.3) | 12 (6.7) | 1 (1.4) | |||
Child sex | ||||||
Female | 10 (18.5) | 83 (45.4) | 30 (41.7) | 0.002 | 12.67 | a,b |
Male | 44 (81.5) | 100 (54.6) | 42 (58.3) | |||
Child race | ||||||
Asian | 0 (0.0) | 2 (1.1) | 1 (1.4) | 0.89e | NA | NA |
Black | 0 (0.0) | 2 (1.1) | 1 (1.4) | |||
Other | 6 (11.1) | 15 (8.2) | 4 (5.5) | |||
White | 48 (88.9) | 164 (89.6) | 67 (91.8) | |||
Father’s highest level of education | ||||||
High school graduate | 17 (32.1) | 55 (30.4) | 15 (22.7) | 0.50d | 5.39 | NA |
Some college | 21 (39.6) | 70 (38.7) | 30 (45.5) | |||
College graduate | 15 (28.3) | 56 (30.9) | 20 (30.3) | |||
Missing | 1 (1.9) | 2 (1.1) | 6 (8.3) | |||
Mother’s highest level of education | ||||||
High school graduate | 21 (38.9) | 55 (30.2) | 10 (15.2) | 0.004d | 15.11 | b,c |
Some college | 17 (31.5) | 80 (44.0) | 25 (37.9) | |||
College graduate | 16 (29.6) | 47 (25.8) | 31 (47.0) | |||
Missing | 0 (0.0) | 1 (0.5) | 6 (8.3) |
Note: ADOS = Autism Diagnostic Observation Schedule; ASD = autism spectrum disorder; ELC = Early Learning Composite; FYI = First Year Inventory; HR = high familial risk; LR = low familial risk; Neg = negative; MSEL = Mullen Scales of Early Learning.
a HR-ASD vs HR-Neg
b HR-ASD vs LR
c HR-Neg vs LR
d Chi-Square test p value
e Fisher’s exact test p value (no corresponding test statistic)
Group Differences on FYI Domain Scores
Comparison of group means of domain risk scores on diagnostic group, controlling for respondent education, demonstrated significant differences in the FYI domain scores of SC (HR-ASD>HR-Neg>LR; p <0.001) and SR (HR-ASD>HR-Neg>LR; p= 0.01). Post-hoc analyses indicated that HR-ASD infants scored significantly higher (poorer) on the SC (e.g., HR-ASD vs HR-Neg: d= 0.9, a large effect size) and SR domains (e.g. HR-ASD vs HR-Neg: d=0.38, a small to medium effect size). Effect sizes between the HR-Neg and LR groups were negligible (Table-3). These results provide evidence that the FYI broadly differentiates the HR-ASD and HR-Neg groups.
Table 3:
Mean FYI Domain Scores and ANOVA Results by Diagnostic Group
Domain | HR-ASD (n=54), LSM (95% CI) | HR-Neg (n=183), LSM (95% CI) | LR (n=72), LSM (95% CI) | p | HR-ASD vs HR-Neg, p (effect size)a | HR-ASD vs LR, p (effect size)a | HR-Neg vs LR, p (effect size)a |
---|---|---|---|---|---|---|---|
Social communication | 21.21 (17.48, 24.95) | 10.66 (7.96, 13.35) | 8.69 (5.55, 11.84) | <0.001 | <0.001 (0.9) | <0.001 (1.07) | 0.24 (0.17) |
Sensory regulatory functions | 8.51 (5.83, 11.19) | 5.32 (3.39, 7.24) | 4.09 (1.84, 6.34) | 0.01 | 0.03 (0.38) | 0.01 (0.53) | 0.3 (0.15) |
Note: ASD = autism spectrum disorder; FYI = First Year Inventory; HR = high familial risk; LSM = least square means; LR = low familial risk; Neg = negative.
a Pair-wise comparisons using Holm adjustment
Predicting ASD Outcome at age 24 months using individual FYI items
Based on results from the 5-fold cross-validated RF algorithms repeated 50 times on HR infants, the 60-item FYI had an estimated sensitivity of 0.71 (95% CI: 0.50, 0.91), i.e., 71% of infants with ASD were correctly classified by the proposed FYI algorithm at 12 months of age, and a specificity of 0.72 (95% CI: 0.49, 0.91), i.e., 72% of infants not diagnosed with ASD were also correctly classified. The positive predictive value (PPV) was 0.45 (95% CI: 0.28, 0.69) and the negative predictive value (NPV) was 0.89 (95% CI: 0.81, 0.97). AUC was 0.71 (95% CI: 0.53, 0.87). Other measures of predictive accuracy are presented in Table 4. The FYI classifier correctly classified 85% of LR subjects as not having ASD.
Table 4:
Screening Accuracy for the FYI-RF classifier for 60 items and a subset of 20 items
Item set | Sensitivity | Specificity | PPV | NPV | MCR | AUC | LHR+ | LHR− |
---|---|---|---|---|---|---|---|---|
Estimate (95% CI)a | ||||||||
60 FYI items | 0.71 (0.5, 0.91) | 0.72 (0.49, 0.91) | 0.45 (0.28, 0.69) | 0.89 (0.81, 0.97) | 0.28 (0.15, 0.46) | 0.71 (0.53, 0.87) | 2.54 (0.98, 10.11) | 0.40 (0.10, 1.02) |
Subset of 20 FYI items | 0.73 (0.45, 0.91) | 0.72 (0.48, 0.92) | 0.45 (0.28, 0.70) | 0.9 (0.82, 0.97) | 0.28 (0.13, 0.46) | 0.74 (0.54, 0.89) | 2.61 (0.90, 11.38) | 0.38 (0.10, 1.10) |
Note: AUC = area under the receiver operating curve; FYI = First Year Inventory; LHR+ = positive likelihood ratio; LHR− = negative likelihood ratio MCR = mis-classification rate; NPV = negative predictive value; PPV = positive predictive value; RF = random forest.
a Values calculated using 5-fold cross validation repeated 50 times
An Abbreviated Version of the FYI
After determining the predictive utility of the 60-item FYI, we conducted an exploratory analysis to examine whether a subset of items could achieve a similar level of performance. A briefer screening measure would be more time-efficient and, as a result, likely to increase caregiver compliance. We first sorted the 60 individual items from highest to lowest based on their corresponding mean Gini decrease (a measure of variable importance) from the RF algorithm generated from the entire HR sample. Next, we chose a priori different subsets (e.g., first [highest] 20 items, first 30 items, etc.) and predicted ASD outcomes on these subsets. Details on the rationale for using different subsets of items are provided in Supplement 2, Table S2, Table S3, and Fig S1, available online, along with prediction results. We predicted ASD outcome on different subsets of items using the exact same process used above for the 60-item prediction analysis. A subset of 20 items had a sensitivity of 0.73 (95% CI: 0.45, 0.91) and specificity of 0.72 (95% CI: 0.48, 0.92), comparable to classification metrics with the 60 items – see Table 4. This list of 20 items appears in Table 5.
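To illustrate the quantity behind this ranking, the sketch below computes the Gini impurity decrease for a single split and then selects the top-ranked items from an importance table. It is a minimal standard-library Python illustration; the function names and the eight-subject split are hypothetical, and the aggregation of per-split decreases across all trees that produces the forest-level index is not reproduced here. The four importance values are taken from Table 5.

```python
def gini(labels):
    """Gini impurity of a set of class labels (0.0 for an empty or pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_decrease(parent, left, right):
    """Impurity of the parent node minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted


def top_items(importance, k):
    """Return the k item names with the largest importance values."""
    return sorted(importance, key=importance.get, reverse=True)[:k]


# Hypothetical split of 8 subjects on one FYI item.
left = ["ASD", "ASD", "ASD", "Neg"]
right = ["ASD", "Neg", "Neg", "Neg"]
decrease = gini_decrease(left + right, left, right)

# Importance values for four items, as listed in Table 5.
importance = {"Q.35": 7.2, "Q.38": 13.06, "Q.22": 7.27, "Q.49": 8.51}
top_two = top_items(importance, 2)
```

An item that produces purer child nodes whenever it is used for a split accumulates a larger decrease, which is why it rises toward the top of the ranking.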
Table 5:
Abbreviated Version of the First Year Inventory (FYI)
Sl. No. | Variable-importance index | Original FYI item/question No.a | Brief description of itemb |
---|---|---|---|
1 | 13.06 | Q.38 | Use of pointing to communicate |
2 | 8.51 | Q.49 | Interest in new games |
3 | 7.27 | Q.22 | Get attention to play physical games |
4 | 7.2 | Q.35 | Look at person/object named |
5 | 6.13 | Q.19 | Get attention to show something interesting |
6 | 6.13 | Q.59 | Object in mouth |
7 | 5.64 | Q.20 | Get attention to play games |
8 | 4.89 | Q.25 | Copy actions |
9 | 4.61 | Q.11 | Play alone |
10 | 4.58 | Q.57 | Typical mood |
11 | 4.4 | Q.9 | Spit out textures |
12 | 4.23 | Q.21 | Get attention for food/toy |
13 | 4.12 | Q.56 | Gross motor skill |
14 | 3.86 | Q.2 | Bothered by sounds |
15 | 3.84 | Q.6 | Avoid looking |
16 | 3.73 | Q.47 | Play with favorite toy |
17 | 3.52 | Q.34 | Use of gestures |
18 | 3.44 | Q.15 | Upset when switching activity |
19 | 3.3 | Q.37 | Stuck on toy part |
20 | 3.29 | Q.4 | Excited by familiar games |
DISCUSSION
The primary aim of this study was to examine whether FYI items could serve as the basis for a new classification algorithm to screen for ASD risk (above that associated with familial risk status) at age 12 months, in high familial risk infants. Our relatively large sample (compared to earlier studies) of children with ASD, recruited through a prospective, longitudinal study, permitted us to examine predictive utility of the FYI by employing an analytic strategy of supervised machine learning with cross validation. We report higher than previously reported sensitivity (0.71) and specificity (0.72). In addition, we demonstrate that the PPV of the classifier (0.45) provides a 91% increase in the number of infants correctly identified as ASD positive, compared to the percentage of those identified solely by high familial risk status (~23% of HR sample in this study).
Measurement of risk at 12 months of age is complicated, owing to the variable timing and pattern of emergence of clinically observable behaviors (see Towle and Patrick for a review).27 Even on the background of these complicating factors, we report sensitivity and specificity of over 0.70 based on parent report. Although we report a lower specificity than other studies utilizing the FYI, we caution that our results (including sensitivity) are not comparable with previous reports due to methodological differences. Here, we report values based on cross validated test set performance in contrast to previous studies that have reported results based on the training set only (which also served simultaneously as the test set). To this point, we note that the training set sensitivity and specificity of the 60 item FYI classifier in our study are 0.86 and 0.94, respectively. This is very likely an overly optimistic estimate of the classifier’s performance due to overfitting to the training set.13
Though we report a relatively low PPV of 0.45, we report an NPV of 0.89. Low PPV is likely due to the low prevalence of ASD in the high familial risk population (~ 1 in 5).28 We also note that sensitivity is the proportion of individuals with the disorder, in the population being studied, who are identified by a measure. In contrast, PPV is defined as the proportion of individuals who have the disorder among those who are identified by the measure. Thus, we view the FYI classifier developed in this study as having utility as a clinical screening tool among high familial risk infants that identifies those children who should receive further assessment regarding risk and diagnosis, as opposed to use as a measure leading to definitive action (e.g., intervention). Hence, we believe that sensitivity is a better measure than PPV of the utility of the FYI in the multistage screening process leading to further assessment and intervention. Taken together, results from this study suggest that the FYI classifier developed here should have clinical utility for infants from high risk families, although we acknowledge that the sample studied here is predominantly white and the classifier will need evaluation in samples that represent ethnic and racial diversity. In addition, results from this study can serve as the basis for developing and testing a classifier for use in 12-month-old infants in the general population. In partial support of this, we observed that 85% of LR infants, who were not diagnosed with ASD, were correctly classified. It should be noted that due to the lower prevalence of ASD in the general population (1 in 54),2 the low PPV of the FYI classifier would be accentuated.
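The dependence of PPV on prevalence can be made concrete with Bayes' rule. The sketch below is a minimal standard-library Python illustration that assumes, purely for the sake of the example, that the reported sensitivity (0.71) and specificity (0.72) carry over unchanged to a population sample, which this study does not establish; the function name is hypothetical.

```python
def ppv_from_prevalence(sensitivity, specificity, prevalence):
    """PPV implied by sensitivity, specificity, and prevalence (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Prevalence in the HR sample of this study (54 of 237) versus the
# general-population estimate cited in the text (1 in 54).
ppv_hr = ppv_from_prevalence(0.71, 0.72, 54 / 237)
ppv_population = ppv_from_prevalence(0.71, 0.72, 1 / 54)

print(f"PPV at HR prevalence:         {ppv_hr:.2f}")
print(f"PPV at population prevalence: {ppv_population:.2f}")
```

Under these assumptions the implied PPV is about 0.43 in the HR sample, close to the reported 0.45 (presumably the difference arises because the reported value is a mean over cross-validation folds), and falls below 0.05 at a prevalence of 1 in 54.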
Several caveats should be noted when interpreting these findings. First, given the difference in frequency of occurrence of ASD in high familial risk (~ 1 in 5)28 versus population (1 in 54)2 samples, the utility of the FYI classifier we developed requires examination in a population sample to determine if comparable sensitivity and specificity are maintained. Second, the success of this classifier for infants in the general population will, in part, depend on whether the pattern of ASD features is comparable to those at high familial risk. Although there is evidence to suggest that community-based/clinically referred samples are likely to be more severely affected than those identified through high familial risk studies, we are unaware of evidence demonstrating qualitative differences in symptom patterns in these differently ascertained groups.30,31 Third, parents in our high-risk group may have prior knowledge about ASD symptoms based on experience with the affected proband that may affect responses to the FYI items in HR infants, in comparison to subjects in single-incidence ASD families with no prior personal experience of ASD. Countering this claim, however, is our finding that the difference in group means for the SC and SR domains, between the HR-negative and LR groups, was negligible (d=0.20 and d=0.17 respectively). Alternatively, it is equally plausible that parents with a typically developing older child might be more sensitive to aberrant early development in their infants. Clearly these issues suggest the need for examination of this FYI classifier in a population sample. Although we report screening accuracy measures (i.e. sensitivity, specificity etc.) based on cross validated test set performance, we did not conduct an out-of-sample validation. We are currently addressing this limitation in a new cohort of HR infants. 
We recognize that a specificity of 72%, for the FYI classifier we developed, could be viewed as a limitation for application to the population, in that, on its face, this might lead to identification of a high number (~28 %) of false positives in the population. We surmise that this may be due to variability in the onset of ASD symptoms and presence of subgroups of children with ASD for whom risk may not be evident until after age 12 months. This highlights the importance of viewing screening for ASD as an ongoing process and recognizing that age 12 months should not be considered a definitive endpoint for risk determination. We also acknowledge that because of a lack of systematic ascertainment of the sample we are unable to assess the degree to which an unknown bias of ascertainment may have impacted the results. Finally, we note that the classifier may perform differently in infants who are at increased risk of ASD for reasons other than family history (e.g. premature birth).32 Future work should assess generalizability of the classifier in sufficiently powered samples beyond the current context of high familial risk, to establish its value for use in the general population.
Identification of a cost-effective, efficient screening tool for ASD risk in the general population could serve as a first-phase community strategy for early detection of clinically actionable levels of risk. Researchers have recently demonstrated high predictive utility (i.e., sensitivity and PPV of over 0.80) of MRI in the first year of life to predict ASD at 24 months of age.33,34 If replicated, MRI scanning may be useful for determining which infants are at ultra-high risk for ASD, which in turn could lead to intervention in the first year of life. However, imaging all infants in the population is neither feasible nor cost-effective. Hence, if future work demonstrates that the FYI classifier maintains predictive utility when tested on a population sample, it could serve as a first-phase screen prior to imaging in the community. Such a two-phase screening process would promote early detection of risk and more efficient strategies for intervention during the prodromal period in the development of autistic symptoms. Indeed, there is emerging evidence, although modest, of the benefits of initiating intervention in this very early period.35,36
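The trade-off in such a two-phase design can be sketched numerically. The example below estimates, for a hypothetical cohort of 10,000 infants at population prevalence (1 in 54), how many would be referred on to imaging by a first-phase screen and how many eventual ASD cases that screen would miss. It again assumes that the sensitivity (0.71) and specificity (0.72) estimated in this high-risk sample hold in the population, which remains to be shown. The cohort size and the function name `first_phase_referrals` are illustrative only.

```python
def first_phase_referrals(n: int, prevalence: float,
                          sensitivity: float, specificity: float) -> dict:
    """Expected outcome counts for a first-phase screen applied to n infants."""
    cases = n * prevalence                      # infants who will develop ASD
    non_cases = n - cases
    true_pos = sensitivity * cases              # cases passed on to phase two
    false_pos = (1.0 - specificity) * non_cases # non-cases passed on to phase two
    referred = true_pos + false_pos
    return {
        "referred": referred,                   # infants sent for imaging
        "referred_fraction": referred / n,
        "missed_cases": cases - true_pos,       # cases never reaching imaging
    }


# Hypothetical cohort; SENS/SPEC are this study's point estimates and are
# assumed (not shown) to hold at population prevalence.
result = first_phase_referrals(n=10_000, prevalence=1 / 54,
                               sensitivity=0.71, specificity=0.72)

for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions, the first-phase screen would spare roughly 71% of infants from imaging, at the cost of about 29% of eventual cases never reaching the second phase.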
Finally, based on an exploratory analysis, we identified 20 items that had similar performance characteristics to the 60 FYI items we tested. If, with further study, the performance of this abbreviated measure is maintained, use of the shorter instrument could increase compliance by caregivers. We present the items of this shorter version in this paper for subsequent study.
In summary, we propose that the classifier developed in this study, based on parent report of behavioral risk markers, has clinical utility for identifying high familial risk infants who are at elevated risk beyond that attributable to familial risk only. Identification of this population at age 12 months will facilitate early detection of risk for developing the full syndrome and augment the potential for prodromal intervention. The classifier developed in this study serves as proof-of-principle that parent-report screening measures may be clinically useful in other risk groups (e.g. premature infants) as well as in the general population. Follow-up studies in sufficiently large and representative samples will be necessary to determine the performance of this or related approaches to screening before clinical application is warranted.
Acknowledgments
The authors thank the children and their families for their ongoing participation in this longitudinal study, as well as all research assistants and volunteers who have worked on this project through the years. The authors also wish to thank Dr. J. Steven Reznick (deceased) for inspiring this work; Rahul Pawar, PhD, Georgia Institute of Technology, Mahmoud Mostapha, PhD, University of North Carolina at Chapel Hill, and Jessica Girault, PhD, University of North Carolina at Chapel Hill, for their input on the analysis plan; and staff members at the Piven lab, University of North Carolina at Chapel Hill, Heidi McNeilly, MA, Leigh Anne Weisenfeld, MA, Michael Graves, BS, Chad Chappell, MA, Shannon Sweeney, BS, and Rachel Smith, BS, for their help with data collection and consolidation.
This work was supported by grants through the National Institutes of Health (R01-HD055741, PI Piven; U54HD079124, PI Piven). Additional funding support has been provided from the Simons Foundation (SFARI grant #140209, PI Piven). Dr. Meera was supported by the Fulbright-Nehru Post-Doctoral fellowship grant USIEF-2264/FNDPR/2017. The funders had no role in study design, data collection, analysis, data interpretation, or writing of the report.
Disclosure:
Dr. Piven has received grant or research support from the Eunice Kennedy Shriver National Institute of Child Health and Human Development, the US Department of Health and Human Services Administration on Intellectual and Developmental Disabilities, the National Institute of Mental Health, NIH, the National Institute of Neurological Disorders and Stroke, the National Institute of Environmental Health Sciences, and the Simons Foundation. He has served on the John Merck Fund Scientific Advisory Board. He is the Editor-in-Chief of the Journal of Neurodevelopmental Disorders. He is co-inventor of UNC file 16-0185, patent application PCT/US2017/040032, “Methods, Systems, and Computer Readable Media for Utilizing Functional Connectivity Brain Imaging for Diagnosis of a Neurobehavioral Disorder.” Drs. Meera, Wolff, Zwaigenbaum, Elison, Kinh, Shen, Estes, Hazlett, Watson, Baranek, Swanson, St. John, Burrows, Schultz, Dager, Botteron, Pandey and Mr. Donovan have reported no biomedical financial interests or potential conflicts of interest.
Appendix
**The Infant Brain Imaging Study (IBIS) Network is an NIH funded Autism Center of Excellence project and consists of a consortium of 8 universities in the US and Canada. Clinical Sites: University of North Carolina: J. Piven (IBIS Network PI), H.C. Hazlett, C. Chappell; University of Washington: S. Dager, A. Estes, D. Shaw; Washington University: K. Botteron, R. McKinstry, J. Constantino, J. Pruett; Children’s Hospital of Philadelphia: R. Schultz, J. Pandey, S. Paterson; University of Alberta: L. Zwaigenbaum; University of Minnesota: J. Elison, J. Wolff; Data Coordinating Center: Montreal Neurological Institute: A.C. Evans, D.L. Collins, G.B. Pike, V. Fonov, P. Kostopoulos, S. Das, L. MacIntyre; Image Processing Core: University of Utah: G. Gerig; University of North Carolina: M. Styner; Statistical Analysis Core: University of North Carolina: H. Gu.
Footnotes
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Declaration: All authors have read and contributed to the manuscript and approve of this submission. This manuscript is not under review elsewhere.
References
- 1. American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders. American Psychiatric Association; 2013. doi: 10.1176/appi.books.9780890425596
- 2. Maenner MJ, Shaw KA, Baio J, et al. Prevalence of Autism Spectrum Disorder Among Children Aged 8 Years — Autism and Developmental Disabilities Monitoring Network, 11 Sites, United States, 2016. MMWR Surveill Summ. 2020;69(4):1–12. doi: 10.15585/mmwr.ss6904a1
- 3. Zwaigenbaum L, Bryson S, Rogers T, Roberts W, Brian J, Szatmari P. Behavioral manifestations of autism in the first year of life. Int J Dev Neurosci. 2005;23(2-3):143–152. doi: 10.1016/j.ijdevneu.2004.05.001
- 4. Landa RJ, Gross AL, Stuart EA, Faherty A. Developmental Trajectories in Children With and Without Autism Spectrum Disorders: The First 3 Years. Child Dev. 2013;84(2):429–442. doi: 10.1111/j.1467-8624.2012.01870.x
- 5. Ozonoff S, Iosif A, Baguio F. A prospective study of the emergence of early behavioral signs of autism. 2012;100(2):130–134. doi: 10.1016/j.pestbp.2011.02.012.Investigations
- 6. Robins DL, Casagrande K, Barton M, Chen C-MA, Dumont-Mathieu T, Fein D. Validation of the Modified Checklist for Autism in Toddlers, Revised With Follow-up (M-CHAT-R/F). Pediatrics. 2014;133(1):37–45. doi: 10.1542/peds.2013-1813
- 7. Baranek GT, Watson LR, Crais ER, Reznick JS. First Year Inventory (FYI) 2.0. Chapel Hill, NC: University of North Carolina at Chapel Hill; 2003.
- 8. Feldman MA, Ward RA, Savona D, et al. Development and initial validation of a parent report measure of the behavioral development of infants at risk for autism spectrum disorders. J Autism Dev Disord. 2012;42(1):13–22. doi: 10.1007/s10803-011-1208-y
- 9. Sacrey L-AR, Bryson S, Zwaigenbaum L, et al. The Autism Parent Screen for Infants: Predicting risk of autism spectrum disorder based on parent-reported behavior observed at 6–24 months of age. Autism. 2018;22(3):322–334. doi: 10.1177/1362361316675120
- 10. Turner-Brown LM, Baranek GT, Reznick JS, Watson LR, Crais ER. The First Year Inventory: a longitudinal follow-up of 12-month-old to 3-year-old children. Autism. 2013;17(5):527–540. doi: 10.1177/1362361312439633
- 11. Rowberry J, Macari S, Chen G, et al. Screening for Autism Spectrum Disorders in 12-Month-Old High-Risk Siblings by Parental Report. J Autism Dev Disord. 2014;45(1):221–229. doi: 10.1007/s10803-014-2211-x
- 12. Lee HY, Vigen C, Zwaigenbaum L, et al. The Performance of the First Year Inventory (FYI) Screening on a Sample of High-Risk 12-Month-Olds Diagnosed with Autism Spectrum Disorder (ASD) at 36 Months. J Autism Dev Disord. September 2019. doi: 10.1007/s10803-019-04208-5
- 13. Kim J-H. Estimating Classification Error Rate: Repeated Cross-validation, Repeated Hold-out and Bootstrap. Comput Stat Data Anal. 2009;53(11):3735–3745. doi: 10.1016/j.csda.2009.04.009
- 14. Rutter M, Bailey A, Lord C, Berument S. Social Communication Questionnaire. Los Angeles: Western Psychological Services; 2003.
- 15. Lord C, Rutter M, Le Couteur A. Autism Diagnostic Interview-Revised: A revised version of a diagnostic interview for caregivers of individuals with possible pervasive developmental disorders. J Autism Dev Disord. 1994;24(5):659–685. doi: 10.1007/BF02172145
- 16. Maxwell EM. Family Interview for Genetic Studies (FIGS): A Manual for FIGS. Bethesda, MD: National Institutes of Mental Health; 1992.
- 17. Reznick JS, Baranek GT, Reavis S, Watson LR, Crais ER. A parent-report instrument for identifying one-year-olds at risk for an eventual diagnosis of autism: The first year inventory. J Autism Dev Disord. 2007;37(9):1691–1710. doi: 10.1007/s10803-006-0303-y
- 18. Mullen EM. Mullen Scales of Early Learning. Circle Pines, MN: AGS Publishing; 1995.
- 19. Lord C, Risi S, Lambrecht L, et al. The Autism Diagnostic Observation Schedule-Generic: A Standard Measure of Social and Communication Deficits Associated with the Spectrum of Autism. J Autism Dev Disord. 2000;30(3):205–223. doi: 10.1023/A:1005592401947
- 20. Gotham K, Pickles A, Lord C. Standardizing ADOS Scores for a Measure of Severity in Autism Spectrum Disorders. J Autism Dev Disord. 2009;39(5):693–705. doi: 10.1007/s10803-008-0674-3
- 21. Estes A, Zwaigenbaum L, Gu H, et al. Behavioral, cognitive, and adaptive development in infants with autism spectrum disorder in the first 2 years of life. J Neurodev Disord. 2015;7(1):1–10. doi: 10.1186/s11689-015-9117-6
- 22. R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2017. https://www.r-project.org/.
- 23. Holm S. A Simple Sequentially Rejective Multiple Test Procedure. Scand J Stat. 1979;6(2):65–70. http://www.jstor.org/stable/4615733.
- 24. Breiman L. Random Forests. Mach Learn. 2001;45(1):5–32. doi: 10.1023/A:1010933404324
- 25. Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification And Regression Trees. Routledge; 2017. doi: 10.1201/9781315139470
- 26. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: Synthetic Minority Over-sampling Technique. J Artif Intell Res. 2002;16:321–357. doi: 10.1613/jair.953
- 27. Towle PO, Patrick PA. Autism Spectrum Disorder Screening Instruments for Very Young Children: A Systematic Review. Autism Res Treat. 2016;2016:1–29. doi: 10.1155/2016/4624829
- 28. Ozonoff S, Young GS, Carter A, et al. Recurrence Risk for Autism Spectrum Disorders: A Baby Siblings Research Consortium Study. Pediatrics. 2011;128(3):e488–e495. doi: 10.1542/peds.2010-2825
- 29. Grimes DA, Schulz KF. Uses and abuses of screening tests. Lancet. 2002;359(9309):881–884. doi: 10.1016/S0140-6736(02)07948-5
- 30. Sacrey L-AR, Zwaigenbaum L, Szatmari P, et al. Brief Report: Characteristics of preschool children with ASD vary by ascertainment. J Autism Dev Disord. 2017;47(5):1542–1550. doi: 10.1007/s10803-017-3062-z
- 31. Micheletti M, McCracken C, Constantino JN, Mandell D, Jones W, Klin A. Research Review: Outcomes of 24- to 36-month-old children with autism spectrum disorder vary by ascertainment strategy: a systematic review and meta-analysis. J Child Psychol Psychiatry. April 2019:jcpp.13057. doi: 10.1111/jcpp.13057
- 32. Joseph RM, O’Shea TM, Allred EN, et al. Prevalence and associated features of autism spectrum disorder in extremely low gestational age newborns at age 10 years. Autism Res. 2017;10(2):224–232. doi: 10.1002/aur.1644
- 33. Emerson RW, Adams C, Nishino T, et al. Functional neuroimaging of high-risk 6-month-old infants predicts a diagnosis of autism at 24 months of age. Sci Transl Med. 2017;9(393):eaag2882. doi: 10.1126/scitranslmed.aag2882
- 34. Hazlett HC, Gu H, Munsell BC, et al. Early brain development in infants at high risk for autism spectrum disorder. Nature. 2017;542(7641):348–351. doi: 10.1038/nature21369
- 35. Green J, Charman T, Pickles A, et al. Parent-mediated intervention versus no intervention for infants at high risk of autism: A parallel, single-blind, randomised trial. The Lancet Psychiatry. 2015;2(2):133–140. doi: 10.1016/S2215-0366(14)00091-1
- 36. Green J, Pickles A, Pasco G, et al. Randomised trial of a parent-mediated intervention for infants at high risk for autism: longitudinal outcomes to age 3 years. J Child Psychol Psychiatry Allied Discip. 2017;58(12):1330–1340. doi: 10.1111/jcpp.12728