J Child Psychol Psychiatry. Author manuscript; available in PMC: 2016 Sep 1.
Published in final edited form as: J Child Psychol Psychiatry. 2015 Apr 29;56(9):988–998. doi: 10.1111/jcpp.12421

Diagnostic stability in young children at risk for autism spectrum disorder: a baby siblings research consortium study

Sally Ozonoff 1, Gregory S Young 1, Rebecca J Landa 2, Jessica Brian 3, Susan Bryson 4, Tony Charman 5, Katarzyna Chawarska 6, Suzanne L Macari 6, Daniel Messinger 7, Wendy L Stone 8, Lonnie Zwaigenbaum 9, Ana-Maria Iosif 1
PMCID: PMC4532646  NIHMSID: NIHMS675997  PMID: 25921776

Abstract

Background

The diagnosis of autism spectrum disorder (ASD) made before age 3 has been found to be remarkably stable in clinic- and community-ascertained samples. The stability of an ASD diagnosis in prospectively ascertained samples of infants at risk for ASD due to familial factors has not yet been studied, however. The American Academy of Pediatrics recommends intensive surveillance and screening for this high-risk group, which may afford earlier identification. Therefore, it is critical to understand the stability of an ASD diagnosis made before age 3 in young children at familial risk.

Methods

Data were pooled across 7 sites of the Baby Siblings Research Consortium. Evaluations of 418 later-born siblings of children with ASD were conducted at 18, 24, and 36 months of age and a clinical diagnosis of ASD or Not ASD was made at each age.

Results

The stability of an ASD diagnosis at 18 months was 93% and at 24 months was 82%. There were relatively few children diagnosed with ASD at 18 or 24 months whose diagnosis was not confirmed at 36 months. There were, however, many children with ASD outcomes at 36 months who had not yet been diagnosed at 18 months (63%) or 24 months (41%).

Conclusions

The stability of an ASD diagnosis in this familial-risk sample was high at both 18 and 24 months of age and comparable with previous data from clinic- and community-ascertained samples. However, almost half of children with ASD outcomes were not identified as being on the spectrum at 24 months and did not receive an ASD diagnosis until 36 months. Thus, longitudinal follow-up is critical for children with early signs of social-communication difficulties, even if they do not meet diagnostic criteria at initial assessment. A public health implication of these data is that screening for ASD may need to be repeated multiple times in the first years of life. These data also suggest that there is a period of early development in which ASD features unfold and emerge but have not yet reached levels supportive of a diagnosis.

Keywords: Pre-school children, autism spectrum disorders, diagnosis

Introduction

The stability of an autism spectrum disorder (ASD) diagnosis made at a young age is of high interest, given the impact of early intervention, the provision of which requires early identification. While studies performed over the past two decades robustly demonstrated a high degree of stability in children aged three years or older at initial diagnosis (Woolfenden et al., 2012), there was initial concern about the stability of diagnosis for children identified before age 3. Both clinicians and researchers raised important questions, given the costs of early autism treatment, about the youngest age at which a reliable diagnosis could be made. In many communities, there was a general reluctance to diagnose children before age three. Questions about the permanence of diagnosis have been highlighted by recent empirical reports of children who, in middle or later childhood, no longer meet criteria for ASD (Anderson, Liang & Lord, 2014; Fein et al., 2013; Orinstein et al., 2014). However, in recent years, multiple studies have demonstrated impressive stability in children diagnosed before three years as well, with a meta-analysis reporting an overall stability rate of 86.3% for maintaining an ASD diagnosis over time (Rondeau et al., 2011). Similar findings were reported by Woolfenden et al. (2012) in a systematic review of 10 studies of toddlers diagnosed before their third birthday.

A review of stability and other classification indices from all previous studies of children younger than 36 months at first diagnosis was conducted and can be seen in Table S1 [available online]. The table aggregates across ASD subtypes and uses dichotomous classifications of ASD and Not ASD. The first set of studies reported in Table S1 followed only children with a diagnosis of ASD and did not include a comparable sample of children without ASD. Positive predictive value, which reflects stability of the diagnosis, is high, with a range of 63% to 100% across five investigations. Although stability rates and numbers of false positives can be calculated from these studies, they cannot address another important aspect of classification accuracy, false negative rates. False negatives may reflect missed diagnoses, later onset of symptoms, and/or borderline phenotypes that result in initial clinical uncertainty and caution in making early diagnoses. Therefore, longitudinal follow-up of children without autism spectrum diagnoses at the initial evaluation is critical to understanding clinical decision-making, although it is not formally needed for calculation of stability.

The next group of studies reported in Table S1 includes children with and without ASD at Time 1 so that additional classification parameters can be calculated (sensitivity, specificity, etc.). Across 8 studies with clinically ascertained samples, the positive predictive value ranged from 72% to 100% (with half of the studies over 93%) and the negative predictive value ranged from 67% to 100% (half the studies over 89%). These classification indices are highly influenced by the base rate of the condition in the samples studied (Altman & Bland, 1994). Using samples ascertained from clinics, where there has already been a degree of concern raised that was sufficient to bring the child to clinical attention, is likely to increase the base rate of ASD, in turn biasing rates of false positives and negatives, and increasing stability estimates.
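The dependence of predictive values on base rate follows from Bayes' rule. The sketch below is a minimal illustration, not an analysis from this study: it holds sensitivity and specificity fixed at the 24-month values reported in Table 2 and varies only the base rate. The two lower base rates are arbitrary illustrative values, not estimates from any sample, and `predictive_values` is a hypothetical helper name.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) implied by Bayes' rule at a given base rate."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity of the 24-month CBE (Table 2: 65/110, 294/308).
SENS_24 = 65 / 110
SPEC_24 = 294 / 308

# This sample's 36-month ASD rate (110/418), then two arbitrary lower base
# rates chosen only to illustrate the trend (not estimates from any sample).
for prevalence in (110 / 418, 0.10, 0.02):
    ppv, npv = predictive_values(SENS_24, SPEC_24, prevalence)
    print(f"base rate {prevalence:5.1%}: PPV {ppv:5.1%}  NPV {npv:5.1%}")
```

At this sample's base rate the function returns exactly the 24-month PPV and NPV in Table 2 (65/79 and 294/339); at lower base rates the same sensitivity and specificity yield a much lower PPV and a higher NPV.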

Community-ascertained samples have the potential to provide less biased psychometric indices of classification accuracy. Four such studies are summarized last in Table S1. The positive predictive value ranged from 83% to 100%, comparable to the estimates for clinically ascertained samples. For practical reasons, many community-based studies employ a pre-screening design, in which only those who screen positive at Time 1 are followed longitudinally. For example, van Daalen et al. (2009) screened 31,724 children through primary care visits at 14 months of age and then followed 131 of the screen-positives for 12 months to calculate stability indices. Similarly, Guthrie and colleagues (2013) performed a two-step screening of 5,419 children in primary care and then followed 82 children who screened positive for two years to provide their estimates of stability. Thus, even in these community-based studies, the base rates of ASD, and thus the stability estimates, may have been overestimated by the screening process and sampling frame.

Another type of sample that may contribute to understanding the stability of early ASD diagnoses is a familial-risk sample. In such studies, infants at familial risk for ASD by virtue of having an older affected sibling are generally enrolled in early infancy, before the initial behavioral signs are usually evident (Ozonoff et al., 2010) and before parents begin to report concerns (Hess & Landa, 2012; Ozonoff et al., 2009). They have not been ‘pre-screened’ based on symptoms before the initial evaluation, potentially reducing the sampling biases that may influence stability estimates. In addition to identifying young children with ASD outcomes to follow, such samples also identify children with typical development and those with a wide range of clinical presentations, including subclinical difficulties in the core areas associated with ASD (Messinger et al., 2013; Ozonoff et al., 2014). Given the potential for much earlier detection, diagnosis, and treatment of children with a positive family history (Johnson et al., 2007; Ozonoff et al., 2011), it is critical to examine the stability of early classification in young children at familial risk of ASD. The current study had two aims: 1) to examine the stability at 36 months of a clinical diagnosis of ASD made at 18 and 24 months of age in infants at familial risk for ASD and 2) to explore phenotypic differences among children who were correctly and incorrectly classified at 18 and 24 months. Addressing these aims required a large sample, and thus the present study used data from a multi-site cohort of infants collected as part of an international collaboration to study infants with an older sibling with ASD.

Method

Participants

The Baby Siblings Research Consortium (BSRC) is an international network that, with support from Autism Speaks, pools data from individually-funded research sites to study the development of infants at familial risk for ASD. The present analyses were carried out using data contributed from 7 sites (University of Alberta, Dalhousie University, Kennedy Krieger Institute, McMaster University, University of California – Davis, University of Toronto, Yale University) whose procedures and common measures permitted data pooling. Informed consent was obtained at each site prior to data collection, as well as Institutional Review Board approval to collect and analyze de-identified data from all sites.

Infant participants were later-born biological siblings of a child with ASD (99% were full siblings). Diverse community enrollment strategies were employed across sites, including recruitment from clinics and agencies serving individuals with ASD, community events (conferences, health fairs) targeted at families affected by ASD, other ASD studies at respective sites’ universities, websites targeted to ASD, word of mouth (parents referring other parents), fliers posted in the community, mailings, and media announcements. Inclusion required a documented diagnosis of DSM-IV Autistic Disorder, Asperger Disorder, or Pervasive Developmental Disorder Not Otherwise Specified in the affected older sibling and no identified neurological or genetic condition in the infant or older sibling that could account for an ASD diagnosis (e.g., fragile X syndrome). Additional inclusion criteria were maximum enrollment age of 18 months, outcome assessment age of 36 months, and availability of both a clinical diagnosis (ASD or not ASD) and scores on the Autism Diagnostic Observation Schedule (ADOS) at 18, 24, and 36 months of age. For families with multiple enrolled infants, only the infant recruited at the youngest age was included. All BSRC sites meeting these inclusion criteria were included in the present analyses, resulting in a total sample size of 418 participants across seven sites.

Measures

Clinical best estimate (CBE) diagnosis

Each site had established procedures for making clinical diagnoses at 18, 24, and 36 months, including: 1) ADOS administration by a research-reliable examiner, 2) clinical diagnosis using DSM-IV criteria, 3) diagnosis made or verified by licensed clinicians, and 4) 36-month outcome assessments performed by examiners unaware of risk group and previous diagnostic decisions. This study was initiated before the publication of DSM-5, so diagnoses were initially made using DSM-IV criteria. To be consistent with current practice, and given the inconsistent application of the DSM-IV subcategories (Lord et al., 2012), which may be especially pronounced in younger children, all clinical diagnoses were dichotomized as ASD or Not ASD for analyses.

Autism Diagnostic Observation Schedule (ADOS; Lord et al., 2002)

The ADOS is a standardized protocol that measures symptoms of ASD and provides an empirically derived cutoff for ASD; it has high inter-rater reliability and construct validity. The 2002 communication + social interaction algorithm score was used because item-level data, necessary for calculating the newer algorithms, were not available from all sites.

Mullen Scales of Early Learning (Mullen, 1995)

This is a standardized developmental test for children from birth to 68 months that provides T scores (mean = 50, SD = 10) for nonverbal cognitive skills, receptive and expressive language, and gross and fine motor skills. The Mullen scales have excellent internal consistency and test-retest reliability.

Demographic information was collected at each site (see Table 1). Parent-reported race and ethnicity classifications of the infant were collapsed for analysis into two dichotomous variables (Caucasian/Not Caucasian and Hispanic/Not Hispanic). Another dichotomous variable was created indicating whether the infant's family was simplex (one older sibling with ASD) or multiplex (more than one older sibling with ASD).

Table 1.

Characteristics of the sample (n = 418)

Age at enrollment in months, mean (SD) 7.0 (4.1)
Gender, n (%)
    Female 172 (41%)
    Male 246 (59%)
Outcome (36 months), n (%)
    ASD 110 (26%)
    Not ASD 308 (74%)
Race¹, n (%)
    Caucasian 308 (83%)
    Non-Caucasian 61 (17%)
Hispanic², n (%)
    No 260 (95%)
    Yes 14 (5%)
Multiplex status³, n (%)
    No 343 (89%)
    Yes 44 (11%)

Note: ASD = autism spectrum disorder.
¹ Frequency missing = 49.
² Frequency missing = 144.
³ Frequency missing = 31.
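The race, ethnicity, and multiplex percentages in Table 1 appear to use only children with non-missing data as the denominator, rather than all 418. A quick arithmetic check under that assumption is sketched below; the dictionary layout and the `valid_percentages` helper are hypothetical.

```python
# Counts from Table 1; "missing" holds each footnoted frequency missing.
TABLE1 = {
    "Race": {"Caucasian": 308, "Non-Caucasian": 61, "missing": 49},
    "Hispanic": {"No": 260, "Yes": 14, "missing": 144},
    "Multiplex status": {"No": 343, "Yes": 44, "missing": 31},
}


def valid_percentages(counts: dict[str, int]) -> dict[str, int]:
    """Percent of each category among children with non-missing data."""
    observed = {k: v for k, v in counts.items() if k != "missing"}
    denominator = sum(observed.values())
    return {k: round(100 * v / denominator) for k, v in observed.items()}


for variable, counts in TABLE1.items():
    # Observed plus missing should account for the full sample of 418.
    assert sum(counts.values()) == 418, variable
    print(variable, valid_percentages(counts))
```

Under this assumption every rounded percentage matches the table, and observed plus missing counts sum to 418 for each variable.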

Statistical approach

Psychometric measures of the performance of a CBE diagnostic classification at 18 and 24 months were computed. Differences in sensitivity and specificity for 18- and 24-month CBE diagnostic classification were tested using McNemar's test (Li & Fine, 2004). The positive and negative predictive values of the 18- and 24-month diagnoses were compared using Wald test statistics derived from the weighted least squares method for analyses of binary data (Wang, Davis & Soong, 2006).
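McNemar's test compares two paired proportions using only the discordant pairs: children classified differently at 18 and 24 months. A minimal sketch, assuming the exact (binomial) form of the test, which the text does not specify; the discordant counts are read off the 18- and 24-month positions of the patterns in Table 3, and `exact_mcnemar` is an illustrative name.

```python
from math import comb


def exact_mcnemar(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant-pair counts.

    Under the null hypothesis each discordant pair is equally likely to fall
    in either off-diagonal cell, so min(b, c) ~ Binomial(b + c, 0.5).
    """
    n = b + c
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Sensitivity, among the 110 children with ASD at 36 months:
#   A at 18 / N at 24 = pattern ANA (3); N at 18 / A at 24 = pattern NAA (27)
p_sensitivity = exact_mcnemar(3, 27)

# Specificity, among the 308 children without ASD at 36 months:
#   A at 18 / N at 24 = pattern ANN (1); N at 18 / A at 24 = pattern NAN (12)
p_specificity = exact_mcnemar(1, 12)

print(f"sensitivity change: p = {p_sensitivity:.2g}")
print(f"specificity change: p = {p_specificity:.4f}")
```

With these counts the specificity comparison gives p ≈ 0.0034 and the sensitivity comparison gives p < 0.001. The asymptotic chi-square version of McNemar's test would give somewhat different values for discordant counts this small.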

To examine group differences in ADOS and Mullen scores at the 18-, 24-, and 36-month visits, mixed-effects linear models (Laird & Ware, 1982) were employed. These models are flexible and allow for unequally spaced and missing observations. All core models included fixed effects for group membership, the linear and the quadratic effect of age (centered at 18 months), and the interaction between group and the linear age effect. To account for the correlated nature of the data, the core models included two random effects for child-specific intercepts and slopes, as well as a random effect for site. Additional fixed terms (for the interaction of the quadratic effect of age with group and for ADOS module) were also added to the core model and tested. These terms were retained in the models only if they were significant.
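The fixed-effects part of the core model can be sketched as one design-matrix row per visit. This is an illustration only: the column names and the choice of True Negatives as the reference group are assumptions (the reference coding is not stated), the optional group × quadratic-age and ADOS module terms are omitted, and the random effects (child-specific intercepts and slopes, site) are not represented because they are estimated by the mixed-model software rather than coded as columns.

```python
GROUPS = ("TN", "FN", "FP", "TP")  # the four stability groups
REFERENCE = "TN"                   # hypothetical reference level


def fixed_effects_row(group: str, age_months: float) -> dict[str, float]:
    """Core-model fixed-effect covariates for one visit (illustrative coding).

    Age is centred at 18 months; the row holds an intercept, indicators for the
    non-reference groups, linear and quadratic age, and group x linear-age terms.
    """
    if group not in GROUPS:
        raise ValueError(f"unknown stability group: {group!r}")
    age_c = age_months - 18.0
    row = {"intercept": 1.0, "age_c": age_c, "age_c_sq": age_c ** 2}
    for g in GROUPS:
        if g == REFERENCE:
            continue
        indicator = 1.0 if group == g else 0.0
        row[f"group_{g}"] = indicator
        row[f"group_{g}:age_c"] = indicator * age_c
    return row


# One hypothetical child in the False Positive group, at the three visits.
for age in (18, 24, 36):
    print(age, fixed_effects_row("FP", age))
```

Centring age at 18 months makes the group coefficients interpretable as group differences at the first visit, while the group × age terms let those differences change over time.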

Residual analyses and graphical diagnostics were used to verify that model assumptions were adequately met. Positive and negative predictive values for 18- and 24-month CBE were compared using the R program SCPVTBT (www.ugr.es/~bioest/software.htm). Mixed-effects analyses were conducted using PROC MIXED in SAS Version 9.4 (SAS Institute, Cary, NC). All tests were two-sided, with α = 0.05.

Results

Table 2 provides stability and other classification indices at 18 and 24 months of age (using diagnosis at 36 months as the outcome standard) for this sample of 418 children at familial risk for ASD. More ASD diagnoses were made at 24 months (n = 79) than at 18 months (n = 44). This results in a significant increase in sensitivity (p < 0.001) and a significant decrease in specificity (p = 0.003) from 18 to 24 months of age. There is also a small but statistically significant decrease (p = 0.02) in positive predictive value from 18 months (93%) to 24 months (82%), reflecting the greater number of false positives at 24 months (n = 14) than at 18 months (n = 3). The 18- and 24-month stability rates in this familial-risk sample fall within the range of, and are consistent with, the stability rates for children under age 3 in clinic- and community-ascertained samples reviewed in Table S1.

Table 2.

Stability and diagnostic classification parameters at 18 and 24 months

Earlier visit | TP (ASD at visit, ASD at 36 months) | FN (Not ASD at visit, ASD at 36 months) | FP (ASD at visit, Not ASD at 36 months) | TN (Not ASD at visit, Not ASD at 36 months) | Sensitivity (95% CI) | Specificity (95% CI) | PPV/Stability (95% CI) | NPV (95% CI)
18 months CBE | 41 | 69 | 3 | 305 | 37.3% (28%-47%) | 99.0% (97%-100%) | 93.2% (81%-99%) | 81.6% (77%-85%)
24 months CBE | 65 | 45 | 14 | 294 | 59.1% (49%-68%) | 95.5% (92%-97%) | 82.3% (72%-90%) | 86.7% (83%-90%)

Note: ASD = autism spectrum disorder; CBE = Clinical Best Estimate; CI = confidence interval; TP = True Positives; FN = False Negatives; FP = False Positives; TN = True Negatives; PPV = positive predictive value; NPV = negative predictive value. Separate analyses were conducted for the 18- and 24-month CBE.

Sensitivity is the percentage of those diagnosed at 36 months who were identified at the earlier visit, calculated as TP/(TP+FN) × 100.

Specificity is the percentage of those without a diagnosis at 36 months who were correctly identified as Not ASD at the earlier visit, calculated as TN/(TN+FP) × 100.

Positive predictive value (PPV) is the percentage of those identified with ASD at the earlier visit who retain the diagnosis at 36 months, calculated as TP/(TP+FP) × 100. Equivalent to stability.

Negative predictive value (NPV) is the percentage of those identified as Not ASD at the earlier visit who are verified free of diagnosis at 36 months, calculated as TN/(TN+FN) × 100.
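The four definitions above can be checked directly against the counts in Table 2. The sketch below is illustrative (the `TwoByTwo` class is a hypothetical helper, not part of the study's software) and reproduces only the point estimates; the 95% confidence intervals would require a separate interval method, which the table note does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Earlier-visit CBE cross-classified against the 36-month outcome."""
    tp: int  # ASD at visit, ASD at 36 months
    fn: int  # Not ASD at visit, ASD at 36 months
    fp: int  # ASD at visit, Not ASD at 36 months
    tn: int  # Not ASD at visit, Not ASD at 36 months

    def sensitivity(self) -> float:
        return 100 * self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return 100 * self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        # Equivalent to diagnostic stability.
        return 100 * self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return 100 * self.tn / (self.tn + self.fn)


# Counts from Table 2.
visits = {
    "18 months": TwoByTwo(tp=41, fn=69, fp=3, tn=305),
    "24 months": TwoByTwo(tp=65, fn=45, fp=14, tn=294),
}

for label, t in visits.items():
    print(f"{label}: sens {t.sensitivity():.1f}%, spec {t.specificity():.1f}%, "
          f"PPV {t.ppv():.1f}%, NPV {t.npv():.1f}%")
```

The printed values match the point estimates in Table 2 for both visits.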

As depicted in Figure 1, eight patterns of stability are generated when a dichotomous diagnostic decision (ASD or Not ASD) is made at three ages. Some children are consistently classified as ASD or Not ASD (i.e., the AAA or NNN patterns in Table 3), others are classified in a way that changes over time, in either direction (i.e., ANN, AAN, NAA, NNA), and still others move back and forth between ASD and Not ASD classifications at different ages (i.e., ANA, NAN). Because several of the subgroups were very small, and to allow comparison with other studies that use the language of classification science (e.g., true and false positives and negatives), we consolidated the eight patterns into four conservatively defined stability groups. Diagnosis at 36 months was used as the gold standard. A stable ‘positive’ early assessment was defined as meeting criteria for ASD at both 18 and 24 months (i.e., True Positives [TP] = AAA), while a stable ‘negative’ early assessment was defined as not meeting criteria for ASD at either 18 or 24 months (i.e., True Negatives [TN] = NNN). The unstable groups were also defined conservatively: a classification at either 18 or 24 months that differed from the classification at 36 months led to inclusion in these groups. Thus, False Positives [FP] met ASD criteria at 18 and/or 24 months but not at 36 months, while False Negatives [FN] failed to meet ASD criteria at 18 and/or 24 months but did meet them at 36 months. The resulting classifications can be seen in Table 3.
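The conservative grouping rule can be written as a small function over the eight patterns. This sketch uses the pattern counts from Table 3; the function names are illustrative. It also collapses the same patterns into per-visit 2 × 2 tables as a consistency check against Table 2.

```python
from collections import Counter

# Pattern counts from Table 3: CBE at 18, 24, and 36 months (A = ASD, N = Not ASD).
PATTERN_COUNTS = {
    "AAA": 38, "AAN": 2, "ANN": 1, "NAN": 12,
    "ANA": 3, "NAA": 27, "NNA": 42, "NNN": 293,
}


def stability_group(pattern: str) -> str:
    """Assign an 18/24/36-month pattern to one of the four stability groups.

    The 36-month CBE is the reference standard; a child is TP or TN only if
    both earlier classifications agree with it (the conservative rule).
    """
    early, outcome = pattern[:2], pattern[2]
    if outcome == "A":
        return "TP" if early == "AA" else "FN"
    return "TN" if early == "NN" else "FP"


def visit_table(visit: int) -> dict[str, int]:
    """Collapse patterns to a 2x2 table for one visit (0 = 18 mo, 1 = 24 mo)."""
    table = Counter()
    for pattern, n in PATTERN_COUNTS.items():
        early_is_asd, outcome_is_asd = pattern[visit] == "A", pattern[2] == "A"
        if outcome_is_asd:
            table["TP" if early_is_asd else "FN"] += n
        else:
            table["FP" if early_is_asd else "TN"] += n
    return dict(table)


groups = Counter()
for pattern, n in PATTERN_COUNTS.items():
    groups[stability_group(pattern)] += n

print(dict(groups))    # {'TP': 38, 'FP': 15, 'FN': 72, 'TN': 293}
print(visit_table(0))  # 18-month counts, as in Table 2
print(visit_table(1))  # 24-month counts, as in Table 2
```

The group sizes match those reported for the four stability groups in Table 4, and the ratio of false negatives to false positives (72 to 15, or 4.8 to 1) is the ratio the Discussion describes as approaching 5:1.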

Figure 1. Stability of Clinical Best Estimate outcome classifications across visits

Table 4.

Estimated scores and 95% confidence intervals for the four stability groups

Variable True Positives (n = 38) False Negatives (n = 72) False Positives (n = 15) True Negatives (n = 293)
ADOS Social-Communication Score
    18 months 14.0 (12.6 - 15.4) 7.0 (6.0 - 8.1)a,b 6.9 (4.7 - 9.0) a,b 4.5 (4.1 - 4.9)
    24 months 12.7 (11.5 - 14.0) 9.2 (8.3 - 10.1) a,b 10.8 (8.8 - 12.7)b 3.5 (3.2 - 2.9)
    36 months 12.4 (11.1 - 13.7) 11.6 (10.6 - 12.6)b,c 5.2 (3.2 - 7.2) a,b,c 3.0 (2.4 - 3.7)

Mullen Expressive Language T Score
    18 months 32.9 (28.5 - 37.3) 42.0 (38.2 - 45.8)a,b 40.6 (34.0 - 47.3)a,b 48.4 (46.1 - 50.8)
    24 months 36.5 (32.7 - 40.4) 45.1 (41.9 - 48.3)a,b 45.2 (39.5 - 50.9)a,b 52.2 (50.0 - 54.4)
    36 months 37.9 (33.9 - 41.9) 45.4 (42.1 - 48.8)a,b 48.4 (42.4 - 54.4)a 53.7 (51.6 - 55.9)

Mullen Receptive Language T Score
    18 months 26.6 (21.4 - 31.8) 37.9 (33.8 - 42.1)a,b 34.5 (25.9 - 43.1)b 48.0 (45.3 - 50.7)
    24 months 34.1 (29.8 - 38.5) 44.2 (40.7 - 47.6)a,b 42.3 (35.6 - 49.1)a,b 54.2 (51.7 - 56.7)
    36 months 36.4 (31.9 - 40.9) 43.8 (40.4 - 47.2)a,b 45.2 (38.6 - 51.7)a,b 53.7 (51.4 - 56.1)

Mullen Visual Reception T Score
    18 months 42.4 (38.7 - 46.1) 45.5 (42.6 - 48.5)b 44.4 (38.7 - 50.2)b 51.6 (49.5 - 53.6)
    24 months 40.0 (36.6 - 43.4) 45.8 (43.1 - 48.5)a,b 45.7 (40.5 - 50.8)a,b 53.7 (51.7 - 55.7)
    36 months 37.9 (32.8 - 42.9) 48.9 (45.2 - 52.6)a,b 50.8 (43.3 - 58.2)a,b 60.5 (58.2 - 62.8)

Mullen Fine Motor T Score
    18 months 44.3 (41.6 - 47.1) 50.1 (47.4 - 52.7)a,b 47.4 (43.1 - 51.8)b 52.8 (51.3 - 54.4)
    24 months 39.4 (36.8 - 42.0) 44.1 (42.9 - 47.3)a,b 44.4 (40.4 - 48.4)a,b 51.0 (49.6 - 52.4)
    36 months 34.0 (29.8 - 38.2) 39.7 (36.5 - 43.0)a,b 42.9 (36.5 - 49.2)a,b 52.0 (50.3 - 53.7)

Note: ADOS estimates are for Module 1 scores; scores on Module 2 were 1.2 points higher.
a Significant difference (p < .05) from True Positives.
b Significant difference (p < .05) from True Negatives.
c Significant difference (p < .05) between the False Positive and False Negative groups.

Table 3.

Patterns of Clinical Best Estimate outcome classifications across visits

CBE 18 months | CBE 24 months | CBE 36 months | Total (n = 418) | % of ASD at 36 months (n = 110) | % of Not ASD at 36 months (n = 308) | Classification
A | A | A | 38 | 35% | - | True Positives
A | A | N | 2 | - | 0.7% | False Positives
A | N | N | 1 | - | 0.3% | False Positives
N | A | N | 12 | - | 4% | False Positives
A | N | A | 3 | 3% | - | False Negatives
N | A | A | 27 | 25% | - | False Negatives
N | N | A | 42 | 38% | - | False Negatives
N | N | N | 293 | - | 95% | True Negatives

Note: ASD = autism spectrum disorder, A = ASD, N = Not ASD.

Table 4 presents estimated means and 95% confidence intervals from the mixed-models for ADOS and Mullen scores for the four stability groups. Full details of these models are provided in Table S2. Five sets of group differences were of interest (comparisons of the FP and FN groups to the TP and TN groups, as well as to each other) and are summarized in Table 4 and Figure 2.

Figure 2. Means ± 1 standard error for 18-, 24-, and 36-month ADOS and Mullen scores for the four stability groups

At 18 and 24 months, the clinical features of the FN group were intermediate between those of the TP and TN groups. They had higher Mullen and lower ADOS scores than the TP group, but lower Mullen and higher ADOS scores than the TN group, suggesting that, although they were not yet diagnosed with ASD, they were atypical at 18 and 24 months. By 36 months, the FN and TP groups had similar ADOS scores, but the FN group's Mullen scores remained higher than those of the TP group.

The patterns of group differences were quite similar for the FP group, who, like the FN group, had Mullen and ADOS scores intermediate between those of the TP and TN groups at 18 and 24 months, significantly different from both groups on many measures (see Table 4). At 36 months, the FP group's Mullen scores remained lower, and their mean ADOS score remained higher, than those of the TN group, indicating continued atypical development. However, their 36-month ADOS scores now differed from those of the TP group.

We found no statistically significant differences between the FP and FN groups at either 18 or 24 months; in addition, the confidence intervals were almost completely overlapping on every measure at both ages (see Table 4) and the effect sizes of the differences were in the small range (with the largest d across all scales = 0.3 at 18 months and 0.2 at 24 months). At 36 months, there continued to be no differences on the Mullen scales, but the FN group now had a significantly higher ADOS score than the FP group.

Discussion

This study had two aims: 1) to examine the stability at 36 months of a clinical diagnosis of ASD made at 18 and 24 months in young children at familial risk for ASD, and 2) to explore phenotypic differences among children who were correctly and incorrectly classified at 18 and 24 months. The familial-risk design had a number of strengths. Improving upon previous studies, three longitudinal visits were conducted, the ages of which corresponded with screening ages recommended by the American Academy of Pediatrics (AAP; Johnson et al., 2007). In addition, the familial-risk cohort was not biased by clinical ascertainment or by the pre-screening selection methods often applied to community-based samples.

Regarding Aim 1, the stability rates (i.e., positive predictive value estimates) of 93% at 18 months and 82% at 24 months in this familial-risk sample were comparable to previous studies of both clinically and community ascertained samples younger than age three. The consistent positive predictive value across different types of samples provides some reassurance that previously published stability rates were not overly influenced by ascertainment methods. The high rates of diagnostic stability across studies and methodologies indicate that when ASD is identified at 18 or 24 months, the diagnosis is very likely to be retained, so implementation of treatment should begin as soon as possible.

The low sensitivity of an ASD diagnosis at 18 months and the decrease in stability from 18 to 24 months suggest that age-dependent differences in clinical calibration may have been operating in this familial-risk sample. It appears that at 18 months, clinicians calibrated their decision-making so that, if the clinical picture was uncertain, they deferred the diagnosis until a later visit. Indeed, the ratio of false negatives to false positives approached 5:1 (72 vs. 15 children across the four stability groups), suggesting that clinicians’ ratings were conservative and biased towards committing as few positive identification errors as possible. But when clinicians were confident in identifying the phenotype, even at early ages (e.g., 18 months), and did make a diagnosis, they were generally correct, and the diagnosis was verified at subsequent visits. Another explanation for differences in clinical decision-making at the two ages may lie in the subclinical social and communication difficulties that have been documented in even very young siblings of children with ASD (Landa & Garrett-Mayer, 2006; Landa, Holman, & Garrett-Mayer, 2007; Messinger et al., 2013; Ozonoff et al., 2014). Clinicians in this study needed to differentiate between emerging signs of ASD and subclinical features more consistent with the broader autism phenotype, a much more subtle distinction than is ordinarily faced in a clinic setting. This may have encouraged clinicians in the current investigation to diagnose only the most affected children at 18 months of age.

While negative predictive value at 18 months was respectable (81.6%), the number of false negatives was quite high. For many families who already have a child with ASD, hearing that their 18-month-old does not meet criteria for a diagnosis will not be reassuring, given that the rate of missed diagnoses at this age (18.4%, i.e., 100% minus the NPV) is close to or higher than previously published recurrence rates for ASD (Ozonoff et al., 2011; Sandin et al., 2014). One public health implication of this study is that screening may need to be repeated after 24 months, since many toddlers with ASD in this sample were not identified until three years of age. While the AAP's screening guidelines (Johnson et al., 2007) were a step forward for clinical practice, our data suggest that they may need to go further still. For example, rescreening high-risk groups (e.g., siblings of children with ASD, children with developmental delays) at three years of age would likely identify some children whose ASD symptoms were not apparent at earlier ages.

The second aim of this study was to examine what differentiates the diagnostically stable and unstable groups. The FP and FN groups demonstrated an intermediate phenotype, with higher developmental levels and fewer ASD features than the TP group, but lower developmental functioning and more ASD symptoms than the TN group. The FP and FN groups were very similar to each other in global scores on the developmental and diagnostic tests at 18 and 24 months, so it is intriguing to speculate on the factors involved in clinical decision-making that led a clinician to diagnose one child with ASD and to classify another child with similar scores as non-ASD. There may have been particular symptom patterns that, when present, influenced clinicians to make (or not make) a clinical diagnosis. For example, a recent study identified several features at 18 months that were especially predictive of an ASD diagnosis, such as poor eye contact, lack of communicative gestures, and repetitive behaviors (Chawarska et al., 2014). It is possible that, even with similar ADOS algorithm scores, the FP and FN groups differed in individual symptoms or constellations of symptoms. Factors not measured in the current study, such as medical and developmental history, level of parent or pediatrician concern, or delays in additional areas, such as motor or adaptive functioning, may also have influenced clinicians to make versus hold off on a diagnosis at 18 and 24 months.

At each age, the FN group demonstrated significantly higher developmental functioning on the Mullen than the TP group. One interpretation of these data is that the FN group was composed of higher-functioning children with ASD who had a later onset of symptoms or whose symptoms were subtle at first and masked by age-appropriate language and cognitive abilities. These results converge with those of a recent study that employed a data mining approach, rather than a CBE diagnostic process, to classify ASD at 18 months (Chawarska et al., 2014). In that study, a decision-tree learning algorithm correctly identified over half of the ASD cases at 18 months but missed those who had less pronounced developmental delays and fewer symptoms of ASD. This suggests that the high rate of false negatives in the current study may be linked to the developmental dynamics of young children developing ASD rather than to particular classification methods.

At 36 months, the FP group continued to demonstrate significantly lower Mullen and higher ADOS scores than the TN group. Thus, they continued to experience developmental difficulties, even though they no longer met criteria for an ASD diagnosis. At 36 months, each participating site also assigned more differentiated clinical outcomes. Of the 15 children in the FP group, only two were considered to be typically developing or to have no diagnosis at 36 months. Over half (9 of the 15) demonstrated atypical social-communication features consistent with the broader autism phenotype, as has been found in other familial-risk samples (Georgiades et al., 2012; Messinger et al., 2013; Ozonoff et al., 2014). Of the remaining four, two were classified at 36 months with speech-language delays, one with global developmental delays, and one with other developmental concerns that did not meet criteria for another clinical classification. These findings indicate that a history of atypical social-communication behavior at 18 or 24 months is an important clinical indicator of later problems, and that these children should be monitored closely after age three, even if they no longer meet ASD criteria.

Some might wonder whether the false positive cases in this study were actually children with ‘optimal outcomes’ (Fein et al., 2013; Sutera et al., 2007), possibly secondary to early treatment. It is challenging, however, to compare the present investigation with previous studies of optimal outcome, which followed participants much longer, into later childhood. Intervention history data were available from only a few sites in the current study, and the small sample size precluded formal analysis. Previous studies, however, have generally not found that the number of intervention hours predicts outcome. In their systematic review of diagnostic stability, Woolfenden et al. (2012) note that none of the five studies that examined intervention hours as a predictor of outcome reported significant differences between the diagnostically stable and unstable groups. Anderson, Liang and Lord (2014) did not find that membership in their ‘very positive outcome group’ was predicted by hours of treatment in early childhood. Orinstein et al. (2014) reported that children who lost their diagnosis were more likely to have received applied behavior analysis services than children who retained a diagnosis, but the outcome groups did not differ in number of hours of therapy. To better address this question, future prospective studies will need to collect intervention history data systematically.

In this familial-risk sample, false negatives were much more common than false positives, which highlights some of the consequences of using 24 months as the final outcome age in infant sibling study designs (e.g., Shen et al., 2013; Wolff et al., 2014). Although the low rate of false positives and high stability may make this a tempting strategy in terms of funding and publication timelines, it comes at a cost. In this study, over 40% of the children diagnosed with ASD at outcome had not yet been identified at 24 months. The high false negative rate in studies that use 24 months as the final outcome age may appear simply to introduce a conservative bias, but the implications may be broader. Not only do false negatives cause misclassification at 24 months, potentially affecting the statistical significance of group differences, but they may also yield a non-representative sample. In the present study, the group diagnosed with ASD at 24 months had significantly more severe symptoms and lower developmental functioning than those who were not diagnosed until 36 months. As a result, findings from studies using a 24-month outcome may not generalize to the larger population of young children with ASD.

What lessons does this study offer for clinical decision-making and the diagnosis of ASD at 18 and 24 months? Could we have identified the false negatives any earlier? Does anything distinguish the false positives from the true positives that would have helped clinicians recognize that these children would not meet criteria later and that their initial diagnosis was inaccurate? The current dataset offers few answers to these questions. The FP and FN groups were both higher-functioning developmentally than the TP group, which may have clouded the clinical picture by interacting with the expression of autism symptoms. To improve early identification in these clinically complex later-born siblings of children with ASD, future research could examine whether particular symptom patterns are associated with accurate and inaccurate early classifications, as Chawarska and colleagues (2014) recently did in a larger familial-risk sample.

Although the labels false positive and false negative were used in this study in accordance with conventions in classification science, they may be misleading or even inappropriate. In classification science, these terms usually denote diagnostic errors, that is, failures of the assessment protocol to identify true underlying patterns. In this study, however, membership in these groups may also reflect later-emerging phenotypes or symptom patterns that change with age. Over time, ADOS scores clearly fell in the FP group and rose in the FN group. Because all sites maintained high standards for initial training and ongoing reliability of ADOS administration, it is unlikely that clinician error produced these changing patterns. It is more likely that shifting phenotypes in the toddlers (transient autism signs in the FP group and later-emerging signs in the FN group) are responsible for the changes in classification. Indeed, the rising ADOS scores in the FN group are consistent with multiple previous studies demonstrating a period in which symptoms are increasing but have not yet reached levels at which a diagnosis can be confirmed (Landa & Garrett-Mayer, 2006; Ozonoff et al., 2010, 2014). The current data suggest that unstable diagnostic classifications may be less diagnostic errors than reflections of an unfolding, emerging picture that goes in both directions (symptoms intensifying and lessening). Finally, it is worth reiterating that the ‘unstable’ FP and FN groups were defined very conservatively in this study: a misclassification at either 18 or 24 months led to inclusion in these groups.

While it may be alarming that such a large proportion of children with ASD went undiagnosed by expert clinicians in the second year of life, many of these children were likely nonetheless eligible for early intervention services, given their lower developmental functioning and higher levels of ASD symptoms relative to the TN group.
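To make the conservative grouping rule concrete, here is a minimal Python sketch; it is not the consortium's analysis code. It assumes that each child has a binary clinical call at each visit (True = ASD, False = Not ASD), that the 36-month call is the outcome, and that a missed 18- or 24-month visit is recorded as None. The function name `stability_group` is hypothetical.

```python
from typing import Optional


def stability_group(dx18: Optional[bool], dx24: Optional[bool], dx36: bool) -> str:
    """Hypothetical helper: classify a child as TP, FP, FN, or TN.

    Assumed rule (a sketch of the conservative definition described in the text):
    any early call that disagrees with the 36-month outcome makes the child
    'unstable'. dx18 and dx24 are the 18- and 24-month calls (True = ASD,
    False = Not ASD, None = visit missing); dx36 is the 36-month outcome.
    """
    # Keep only the early visits that actually took place.
    early = [dx for dx in (dx18, dx24) if dx is not None]
    if not early:
        raise ValueError("at least one early (18- or 24-month) call is required")
    if dx36:
        # ASD at outcome: an early Not ASD call at either age counts as a false negative.
        return "TP" if all(early) else "FN"
    # Not ASD at outcome: an early ASD call at either age counts as a false positive.
    return "FP" if any(early) else "TN"
```

Under this rule, a child classified as Not ASD at 18 months but ASD at 24 and 36 months, `stability_group(False, True, True)`, is counted as FN because the 18-month call disagreed with the outcome.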

Infancy is characterized by rapid developmental change as well as substantial moment-to-moment behavioral variability, features that themselves make early diagnosis challenging. Fisch (2012) cites low test-retest correlations across multiple developmental areas in infancy and points out the psychometric and norming limitations of many measures of infant development. Yet the stability of an ASD diagnosis, both in the present investigation and in numerous previous studies (Rondeau et al., 2011; Woolfenden et al., 2012), is impressive and substantially higher than the stability reported for developmental delay classifications. Moura et al. (2010) studied a population-based cohort of 3,907 infants, tested at 12 months and again at 24 months with the Battelle Developmental Screening Inventory. Of the 390 infants suspected of developmental delay at 12 months, only 58 continued to test positive at 24 months. This yields a stability estimate of 15%, considerably lower than the rates of 80% or better reported for ASD in the current and previous investigations.
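As a worked illustration, the stability figure for Moura et al. (2010) is the proportion of infants who screened positive at 12 months and still screened positive at 24 months. Read this way, it plays the same role as the ASD stability estimates, namely the share of early positive classifications that are confirmed at follow-up:

```latex
\text{stability} = \frac{\text{positive at 12 and 24 months}}{\text{positive at 12 months}} = \frac{58}{390} \approx 0.149 \approx 15\%
```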

This study had several limitations. Infant sibling study designs have inherent biases that differ from those of clinic- and community-based investigations. Biased enrollment of infants whose parents had higher levels of concern cannot be ruled out. This bias, or other unknown biases of the infant sibling methodology, may have contributed to the slightly elevated recurrence rate in this sample relative to previously reported rates (Gronberg et al., 2013; Ozonoff et al., 2011; Sandin et al., 2014). Given the restrictive inclusion criteria, this elevation is not surprising.

Currently, no published studies have compared the clinical phenotypes of familial and non-familial cohorts, so it is not known whether the results of the present investigation generalize to the general population of young children with ASD. This caveat notwithstanding, it is critical to understand the stability of early diagnoses in the familial-risk group. These children can potentially be identified early because the AAP recommends more intensive surveillance of infants with a positive family history of ASD (Johnson et al., 2007). The high stability and low rate of false positive diagnoses documented in the current study support the AAP guidelines for extra surveillance of this high-risk group and provide reassurance that early screening, assessment, and referral to intervention will not be wasted effort. However, the modest negative predictive value and high rate of false negatives at 18 and 24 months also suggest that, even with the intensified surveillance that occurs in infant sibling studies, not all children show clinical phenotypes clear enough to be identified before 36 months, particularly children with higher cognitive levels. More work is needed to guide future surveillance efforts for this population.
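The classification accuracy parameters discussed here (sensitivity, negative predictive value, and diagnostic stability, which corresponds to positive predictive value) follow standard definitions (e.g., Altman & Bland, 1994). Below is a minimal Python sketch. It assumes a simple 2×2 cross-tabulation of the early clinical call against the 36-month outcome. The names `TwoByTwo` and `accuracy_parameters` are hypothetical, and this is not the study's analysis code, which used mixed-effects models and other estimation methods.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TwoByTwo:
    """Hypothetical container: early clinical calls cross-tabulated against the 36-month outcome."""

    tp: int  # ASD early, ASD at 36 months
    fp: int  # ASD early, Not ASD at 36 months
    fn: int  # Not ASD early, ASD at 36 months
    tn: int  # Not ASD early, Not ASD at 36 months


def _ratio(num: int, den: int) -> float:
    """Proportion num/den, or NaN when the denominator (a table margin) is zero."""
    return num / den if den else float("nan")


def accuracy_parameters(t: TwoByTwo) -> Dict[str, float]:
    """Standard classification accuracy parameters for a 2x2 table."""
    return {
        # Share of children with ASD at 36 months who were already diagnosed early.
        "sensitivity": _ratio(t.tp, t.tp + t.fn),
        # Share of children without ASD at 36 months who were not diagnosed early.
        "specificity": _ratio(t.tn, t.tn + t.fp),
        # Share of early ASD diagnoses confirmed at 36 months (diagnostic stability).
        "ppv": _ratio(t.tp, t.tp + t.fp),
        # Share of early Not ASD calls that remain Not ASD at 36 months.
        "npv": _ratio(t.tn, t.tn + t.fn),
        # Share of ASD outcomes missed early; equals 1 - sensitivity.
        "miss_rate": _ratio(t.fn, t.tp + t.fn),
    }
```

Because the miss rate, FN / (TP + FN), is the complement of sensitivity, a high PPV (few early ASD calls overturned) can coexist with low sensitivity (many ASD outcomes not yet identified). This is the pattern described above for 18 and 24 months.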

Supplementary Material

Supp TableS1-S2

Key points.

  • Clinical diagnoses of ASD made before age 3 years have been shown in previous research to be quite stable in samples of children ascertained from clinics or the community.

  • Stability was comparably high in a large sample of children under age 3 at heightened familial risk for ASD. Few children were classified as having ASD at 18 or 24 months who were not confirmed at 36 months.

  • Sensitivity of the clinical diagnosis was relatively low at 18 and 24 months, with close to half of the children with ASD outcomes not diagnosed until 36 months of age.

  • These data suggest that screening for ASD should be repeated multiple times during the first years of life.

Acknowledgments

This original article was invited by the journal as part of a special issue; it has undergone full external peer review.

Data collection for this manuscript was supported by National Institutes of Health grants MH068398 (PI: Ozonoff), P01 HD003008, Project 1 (PI: Chawarska), R01MH087554 (PI: Chawarska), and R01MH059630 (PI: Landa), and grants from the Canadian Institutes of Health Research 62924 and 102665 (PIs: Zwaigenbaum, Bryson), NeuroDevNet (PI: Zwaigenbaum), and Autism Speaks Canada (PIs: Zwaigenbaum, Bryson). Autism Speaks provided funding for the Baby Siblings Research Consortium database. The authors gratefully acknowledge Diane Larzelere, UC Davis, for her editorial assistance with this manuscript, Alycia Halladay, for her organizational support at Autism Speaks, and especially the children and families who participated at each site.

Footnotes

Conflict of interest: No conflicts declared

The authors have declared that they have no competing or potential conflicts of interest.

Supporting information

Additional supporting information is provided along with the online version of this article.

Table S1. Classification accuracy parameters for previously published studies of children diagnosed with ASD before age 3

Table S2. Parameter estimates (standard errors) for the mixed-effects regression models

References

  1. Altman DG, Bland JM. Diagnostic tests 2: Predictive values. British Medical Journal. 1994;309:102. doi: 10.1136/bmj.309.6947.102.
  2. Anderson DK, Liang JW, Lord C. Predicting young adult outcome among more and less cognitively able individuals with autism spectrum disorders. Journal of Child Psychology and Psychiatry. 2014;55(5):485–494. doi: 10.1111/jcpp.12178.
  3. Chawarska K, Shic F, Macari S, Campbell DJ, Brian J, Landa R, Bryson S. Younger siblings of children with autism spectrum disorder: A baby siblings research consortium study. Journal of the American Academy of Child and Adolescent Psychiatry. 2014 doi: 10.1016/j.jaac.2014.09.015. Epub ahead of print.
  4. Fein D, Barton M, Eigsti IM, Kelley E, Naigles L, Schultz RT, Tyson K. Optimal outcome in individuals with a history of autism. Journal of Child Psychology and Psychiatry. 2013;54(2):195–205. doi: 10.1111/jcpp.12037.
  5. Fisch GS. Autism and epistemology III: Child development, behavioral stability, and reliability of measurement. American Journal of Medical Genetics Part A. 2012;158:969–979. doi: 10.1002/ajmg.a.35269.
  6. Georgiades S, Szatmari P, Zwaigenbaum L, Bryson S, Brian J, Garon N. A prospective study of autistic-like traits in unaffected siblings of probands with autism spectrum disorder. JAMA Psychiatry. 2012;70(1):42–48. doi: 10.1001/2013.jamapsychiatry.1.
  7. Gronberg TK, Schendel DE, Parner ET. Recurrence of autism spectrum disorders in full- and half-siblings and trends over time: A population-based cohort study. JAMA Pediatrics. 2013;167:947–953. doi: 10.1001/jamapediatrics.2013.2259.
  8. Guthrie W, Swineford LB, Nottke C, Wetherby AM. Early diagnosis of autism spectrum disorder: Stability and change in clinical diagnosis and symptom presentation. Journal of Child Psychology and Psychiatry. 2013;54(5):582–590. doi: 10.1111/jcpp.12008.
  9. Hess CR, Landa RJ. Predictive and concurrent validity of parent concern about young children at risk for autism. Journal of Autism and Developmental Disorders. 2012;42(4):575–584. doi: 10.1007/s10803-011-1282-1.
  10. Johnson CP, Myers SM, the Council on Children with Disabilities. Identification and evaluation of children with autism spectrum disorders. Pediatrics. 2007;120(5):1183–1215. doi: 10.1542/peds.2007-2361.
  11. Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics. 1982;38:963–974.
  12. Landa R, Garrett-Mayer E. Development in infants with autism spectrum disorders: A prospective study. Journal of Child Psychology and Psychiatry. 2006;47(6):629–638. doi: 10.1111/j.1469-7610.2006.01531.x.
  13. Landa RJ, Holman KC, Garrett-Mayer E. Social and communication development in toddlers with early and later diagnosis of autism spectrum disorders. Archives of General Psychiatry. 2007;64(7):853–864. doi: 10.1001/archpsyc.64.7.853.
  14. Li J, Fine J. On sample size for sensitivity and specificity in prospective diagnostic accuracy studies. Statistics in Medicine. 2004;23(16):2537–2550. doi: 10.1002/sim.1836.
  15. Lord C, Petkova E, Hus V, Gan W, Lu F, Martin DM, Risi S. A multisite study of the clinical diagnosis of different autism spectrum disorders. Archives of General Psychiatry. 2012;69:306–313. doi: 10.1001/archgenpsychiatry.2011.148.
  16. Lord C, Rutter M, DiLavore PC, Risi S. Autism Diagnostic Observation Schedule manual. WPS; Los Angeles: 2002.
  17. Messinger D, Young GS, Ozonoff S, Dobkins K, Carter A, Zwaigenbaum L, Sigman M. Beyond autism: a baby siblings research consortium study of high-risk children at three years of age. Journal of the American Academy of Child & Adolescent Psychiatry. 2013;52(3):300–308. doi: 10.1016/j.jaac.2012.12.011.
  18. Moura DR, Costa JC, Santos IS, Barros AJ, Matijasevich A, Halpern R, Barros FC. Natural history of suspected developmental delay between 12 and 24 months of age in the 2004 Pelotas birth cohort. Journal of Paediatrics & Child Health. 2010;46:329–336. doi: 10.1111/j.1440-1754.2010.01717.x.
  19. Mullen EM. Mullen scales of early learning. AGS; Circle Pines, MN: 1995.
  20. Orinstein AJ, Helt M, Troyb E, Tyson KE, Barton ML, Eigsti IM, Fein DA. Intervention for optimal outcome in children and adolescents with a history of autism. Journal of Developmental & Behavioral Pediatrics. 2014;35(4):247–256. doi: 10.1097/DBP.0000000000000037.
  21. Ozonoff S, Iosif AM, Baguio F, Cook IC, Hill MM, Hutman T, Young GS. A prospective study of the emergence of early behavioral signs of autism. Journal of the American Academy of Child and Adolescent Psychiatry. 2010;49(3):256–266.
  22. Ozonoff S, Young GS, Belding A, Hill MM, Hill A, Hutman T, Iosif A. The broader autism phenotype in infancy: When does it emerge? Journal of the American Academy of Child and Adolescent Psychiatry. 2014;53:398–407. doi: 10.1016/j.jaac.2013.12.020.
  23. Ozonoff S, Young GS, Carter A, Messinger D, Yirmiya N, Zwaigenbaum L, Stone WL. Recurrence risk for autism spectrum disorders: a Baby Siblings Research Consortium study. Pediatrics. 2011;128(3):e488–e495. doi: 10.1542/peds.2010-2825.
  24. Ozonoff S, Young GS, Steinfeld MB, Hill MM, Cook I, Hutman T, Sigman M. How early do parent concerns predict later autism diagnosis? Journal of Developmental and Behavioral Pediatrics. 2009;30(5):367. doi: 10.1097/dbp.0b013e3181ba0fcf.
  25. Rondeau E, Klein LS, Masse A, Bodeau N, Cohen D, Guilé JM. Is pervasive developmental disorder not otherwise specified less stable than autistic disorder? A meta-analysis. Journal of Autism and Developmental Disorders. 2011;41(9):1267–1276. doi: 10.1007/s10803-010-1155-z.
  26. Sandin S, Lichtenstein P, Kuja-Halkola R, Larsson H, Hultman CM, Reichenberg A. The familial risk of autism. JAMA. 2014;311:1770–1777. doi: 10.1001/jama.2014.4144.
  27. SAS Institute, Inc. SAS/STAT Version 9.4. Cary, NC, USA.
  28. Shen MD, Nordahl CW, Young GS, Wootton-Gorges SL, Lee A, Liston SE, Amaral DG. Early brain enlargement and elevated extra-axial fluid in infants who develop autism spectrum disorder. Brain. 2013 doi: 10.1093/brain/awt166.
  29. Sutera S, Pandey J, Esser EL, Rosenthal MA, Wilson LB, Barton M, Fein D. Predictors of optimal outcome in toddlers diagnosed with autism spectrum disorders. Journal of Autism and Developmental Disorders. 2007;37(1):98–107. doi: 10.1007/s10803-006-0340-6.
  30. Van Daalen E, Kemner C, Dietz C, Swinkels SH, Buitelaar JK, Van Engeland H. Inter-rater reliability and stability of diagnoses of autism spectrum disorder in children identified through screening at a very young age. European Child & Adolescent Psychiatry. 2009;18(11):663–674. doi: 10.1007/s00787-009-0025-8.
  31. Wang W, Davis CS, Soong SJ. Comparison of predictive values of two diagnostic tests from the same sample of subjects using weighted least squares. Statistics in Medicine. 2006;25(13):2215–2229. doi: 10.1002/sim.2332.
  32. Wolff JJ, Botteron KN, Dager SR, Elison JT, Estes AM, Gu H, Piven J. Longitudinal patterns of repetitive behavior in toddlers with autism. Journal of Child Psychology and Psychiatry. 2014;55(8):945–953. doi: 10.1111/jcpp.12207.
  33. Woolfenden S, Sarkozy V, Ridley G, Williams K. A systematic review of the diagnostic stability of autism spectrum disorder. Research in Autism Spectrum Disorders. 2012;6(1):345–354.
