JAMA Pediatr. 2022 Oct 17;176(12):1233–1241. doi: 10.1001/jamapediatrics.2022.3605

Clinician Diagnostic Certainty and the Role of the Autism Diagnostic Observation Schedule in Autism Spectrum Disorder Diagnosis in Young Children

William Barbaresi 1, Jaclyn Cacia 2, Sandra Friedman 3, Jill Fussell 4, Robin Hansen 5, Johannes Hofer 6, Nancy Roizen 7, Ruth E K Stein 8, Douglas Vanderbilt 9,10, Georgios Sideridis 1
PMCID: PMC9577880  PMID: 36251287

This study evaluates consistency between clinical diagnosis and diagnosis incorporating the Autism Diagnostic Observation Schedule and examines clinician and child factors that predict consistency between index and reference standard diagnoses.

Key Points

Question

What is the role of the Autism Diagnostic Observation Schedule (ADOS) for diagnosis of autism spectrum disorder (ASD) in young children?

Findings

In this diagnostic study of 349 children ages 18 months to 5 years, 11 months, there was 90.0% agreement between index diagnoses (ie, clinical diagnosis) and reference standard diagnoses (ie, diagnosis including information from ADOS). Clinician diagnostic certainty was the best predictor of consistency between index diagnoses and reference standard diagnoses.

Meaning

The ADOS is not required for ASD diagnosis in young children; specialist clinicians can identify children for whom the ADOS may contribute to accurate diagnosis.

Abstract

Importance

Autism spectrum disorder (ASD) affects 1 in 44 children. The Autism Diagnostic Observation Schedule (ADOS) is a semi-structured observation developed for use in research but is considered a component of gold standard clinical diagnosis. The ADOS adds time and cost to diagnostic assessments.

Objective

To evaluate consistency between clinical diagnosis (index ASD diagnosis) and diagnosis incorporating the ADOS (reference standard ASD diagnosis) and to examine clinician and child factors that predict consistency between index diagnoses and reference standard diagnoses.

Design, Setting, and Participants

This prospective diagnostic study was conducted between May 2019 and February 2020. Developmental-behavioral pediatricians (DBPs) made a diagnosis based on clinical assessment (index ASD diagnosis). The ADOS was then administered, after which the DBP made a second diagnosis (reference standard ASD diagnosis). DBPs self-reported diagnostic certainty at the time of the index diagnoses and reference standard diagnoses. The study took place at 8 sites (7 US and 1 European) that provided subspecialty assessments for children with concerns for ASD. Participants included children aged 18 months to 5 years, 11 months, without a prior ASD diagnosis, consecutively referred for possible ASD. Among 648 eligible children, 23 refused, 376 enrolled, and 349 completed the study. All 40 eligible DBPs participated.

Exposures

ADOS administered to all child participants.

Main Outcomes and Measures

Index diagnoses and reference standard diagnoses of ASD (yes/no).

Results

Among the 349 children (279 [79.9%] male; mean [SD] age, 39.9 [13.4] months), index diagnoses and reference standard diagnoses were consistent for 314 (90%) (ASD = 232; not ASD = 82) and changed for 35. Clinician diagnostic certainty was the most sensitive and specific predictor of diagnostic consistency (area under curve = 0.860; P < .001). In a multilevel logistic regression, no child or clinician factors improved prediction of diagnostic consistency based solely on clinician diagnostic certainty at time of index diagnosis.

Conclusions and Relevance

In this prospective diagnostic study, clinical diagnoses of ASD by DBPs with vs without the ADOS were consistent in 90.0% of cases. Clinician diagnostic certainty predicted consistency of index diagnoses and reference standard diagnoses. This study suggests that the ADOS is generally not required for diagnosis of ASD in young children by DBPs and that DBPs can identify children for whom the ADOS may be needed.

Introduction

Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized by deficits in social communication and the presence of restricted and repetitive behaviors.1 The increasing prevalence of ASD, estimated at 1 in 44 eight-year-old children, is an acknowledged public health crisis.2 Early, intensive treatment offers the best hope for improved function and outcomes.3 ASD is diagnosed according to behavioral criteria from the Diagnostic and Statistical Manual of Mental Disorders (Fifth Edition) (DSM-5), including deficits in social communication and social behavior (A criteria) and presence of restricted, repetitive behaviors or interests (B criteria).1 Diagnosis of ASD by a health care professional is required for a child to access treatment through insurance or government-funded early intervention and school programs, depending on the state of residence.4,5 There are long wait lists, often many months long, to access diagnostic services and significant disparities in the age of diagnosis by race, ethnicity, socioeconomic status, and geography.2,4,6,7,8,9,10

The Autism Diagnostic Observation Schedule, Second Edition (ADOS-2) is a semistructured observation that allows examiners to observe behaviors relevant to ASD.11,12 The ADOS was developed to establish ASD status among research patients and studies of its accuracy have primarily been conducted in research settings.13,14 The ADOS takes 45 to 60 minutes to administer, additional time to score and interpret, and requires specialized training. According to the ADOS manual, “…clinical judgment should overrule the ADOS-2 Classification in achieving a best-estimate clinical diagnosis.”11 Despite limited research in clinical settings, the ADOS is considered by some to be a component of a gold standard ASD clinical diagnosis, and is often required for access to early intervention, school-based intervention, and intensive behavioral treatment.2,4,5,6,7,15,16 The use of semistructured ASD assessments (including the ADOS) varies from 12.9% to 100% across 10 sites in the Developmental-Behavioral Pediatric Research Network (DBPNet).17

Given the prevalence of ASD, limited data on the clinical use of the ADOS, frequent requirement for ADOS results to access treatment, importance of early diagnosis, long wait lists, and disparities in age of diagnosis, it is critically important to evaluate both the accuracy and efficiency of different diagnostic approaches.18 Therefore, this multisite prospective diagnostic study was conducted to evaluate the consistency between clinical ASD diagnosis (index diagnosis) and diagnosis that incorporates results of the ADOS (reference standard diagnosis). Factors that predict consistency between the index and reference standard diagnoses were examined.

Methods

Study Design and Participants

This prospective diagnostic study collected data prior to completion of index diagnoses and reference standard diagnoses for ASD. The Standards for Reporting of Diagnostic Accuracy Studies (STARD) reporting guidelines were followed in this report. The study was approved by the institutional review boards of the Children’s Hospital of Philadelphia (DBPNet coordinating center) and the 8 participating sites (7 DBPNet and 1 Austrian site), each of which provides subspecialty assessments for children with ASD. The study was completed during the course of routine clinical care and included children from ages 18 months to 5 years, 11 months consecutively referred for possible ASD between May 2019 and February 2020. Written informed consent was obtained from the parent or legal guardian of each participant. Children were excluded if they had a prior diagnosis of ASD established through a multidisciplinary team assessment or the child was non-English speaking (non-German speaking at the Austrian site). Board eligible or certified Developmental-Behavioral Pediatricians (DBPs) who completed diagnostic assessments were also considered participants in this study and consented to participate. At the Austrian site, physicians were specialists trained to diagnose ASD.

Study Procedure

Screening Phase

Prior to enrollment of children, all participating DBPs completed a demographic form. Eligible children were identified based on a review of physician schedules and medical records for age, reason for referral, and confirmation of no prior ASD diagnosis (Figure).

Figure. Participants Included in Study of the Use of the Autism Diagnostic Observation Schedule (ADOS) to Diagnose Autism Spectrum Disorder (ASD) in Young Children.

ADOS indicates Autism Diagnostic Observation Schedule; ADOS-T, Autism Diagnostic Observation Schedule-Toddler Module.

a. One site was in Austria, where the language exclusion was non-German speaking.

b. Eight individuals were enrolled but did not have the index diagnosis documented by the developmental pediatrician.

Child Assessment Phase 1 and Completion of Index Diagnosis

A demographic form was completed including visit date, age, parent-reported race and ethnicity, insurance type, primary caregiver education level, and primary reason for referral. DBPs had access to clinical intake forms that included information about presenting concerns and referring clinician questions, any prior assessments, and, when available, results of the Modified Checklist for Autism in Toddlers-Revised screening tool.19 Each child was evaluated by a DBP who completed a medical and developmental history, physical examination, and clinical observation form developed for this study that included a DSM-5 ASD symptom checklist, DSM-5 severity ratings for A and B criteria, and visit characteristics (total time spent by the diagnosing clinician directly observing and/or interacting with the child, number of visits to complete assessment). The presence of significant aggressive, hyperactive, or inattentive/distractible behaviors was recorded. All child participants had a clinical evaluation determined by the usual approach and assessment tools at each site, including cognitive/developmental, language, adaptive, and social skills measures.

At the conclusion of phase 1, the DBP made and recorded a diagnosis of ASD yes or ASD no. This constituted the index diagnosis (clinical diagnosis). The DBPs self-rated the degree of certainty of their diagnosis on a 10-point Likert scale with poles labeled as follows: 1 indicated low (not at all certain) and 10 indicated high (very certain).

Child Assessment Phase 2 and Completion of Reference Standard Diagnosis

During phase 2, the ADOS-2 or Autism Diagnostic Observation Schedule-Toddler Module (ADOS-T) was administered to each child by clinical staff trained to administer the ADOS in a clinically reliable manner.11,12,20 The ADOS-2 yields 3 classifications (nonspectrum, spectrum, autism). The ADOS was required for this study and was therefore considered to be a study procedure. The DBP was provided with the results of the ADOS-2 or ADOS-T and, on the basis of the ADOS plus information from phase 1, again made a diagnosis of ASD-yes or ASD-no and self-rated their degree of diagnostic certainty. The diagnosis based on the combination of all clinically obtained data plus the results of the ADOS-2 or ADOS-T constituted the final, reference standard diagnosis (clinical diagnosis including ADOS). Index diagnoses and reference standard diagnoses were completed on the same day, except at 2 sites, where the ADOS-2 and ADOS-T were completed several days to weeks later due to site logistics.

Analyses

Frequencies and descriptive statistics describe participant and site characteristics, as well as the assessment results by site (eMethods in the Supplement). Intended sample size for univariate analyses was based on a power calculation for logistic regression to identify factors associated with differences between index diagnoses and reference standard diagnoses. Using sex as an example, a sample size of 300 observations (75% male) achieves 80% power at an α level of .05 (2-tailed test). Power calculations assumed a 2-tailed test with α = 5%. An exception was the multivariate, multilevel logistic model, in which the significance of a predictor and its entry into the model are judged considering any variance that the predictor shares with others. Thus, to avoid ignoring meaningful predictors when building the multivariate model, a 1-tailed test was the criterion for a predictor’s inclusion in the model.

Multilevel multivariate logistic regression was used to compare index diagnoses and reference standard diagnoses.21 The model was estimated by full maximum likelihood with 300 Laplace iterations. Deviance statistics were used across nested models to justify the inclusion of additional predictors based on their contribution to model fit (eMethods in the Supplement).

Power of the logistic regression coefficients was estimated so that an odds ratio (OR) of 2.0 would be significant 80% of the time using a nominal α level of 5% and a 2-tailed test.22,23 Given the observed ratio of consistency/inconsistency between index diagnoses and reference standard diagnoses, estimated sample size for power levels equal to 80% was 188 participants.

Receiver operating characteristic (ROC) analyses were used to supplement the multilevel logistic model by assessing the ability of variables to correctly classify consistency between index and reference standard diagnoses (sensitivity) vs inconsistency between index diagnoses and reference standard diagnoses (specificity). Thus, information on predictive ability, as well as the cutoff values in the predictor that maximize prediction, is also presented.24 Less than 60% area under the curve (AUC) represents chance classification accuracy.25 Both the significance of the curve and effect size conventions were used.

Power for the ROC curve was estimated using the following specifications: power equals 80%, type-II error equals 20%, null hypothesis ROC equals 50%; alternative hypothesis ROC equals 70%; ratio of positive vs negative cases 10:1. The required sample sizes for 80% power were equal to 18 and 180 participants for a total of n = 198. Missing data represented less than 1% of the sample across all analyses and were treated using pairwise deletion.

Results

Participants/Demographics

The reference standard diagnosis was completed for 349 children (Table 1). Diagnoses were made by 40 DBPs. Overall, 279 children (79.9%) were male, 276 children were non-Hispanic (80.5% of 343 with data available), and 212 children were White (60.7% of all 349 children; race was reported for 323). Scores on cognitive, language, and adaptive measures were typical for a population of young children referred for possible ASD.1 For example, among those with a reference standard diagnosis of ASD, 39.6% had mild, moderate, or severe cognitive impairment.

Table 1. Child Participant and Clinician Demographic Characteristics.

Child participants No. (%)
All sites Range across sites
No. of participants 349 28-65
Index diagnosis of ASD 249 (71.3) 17-46 (60.7-70.8)
Reference diagnosis of ASD 250 (71.6) 18-44 (64.3-78.6)
Child’s age, mean (SD), mo 39.90 (13.4) 33.47-49.86
Sex
Male 279 (79.9) 23-49 (74.4-85.7)
Female 70 (20.1) 4-16 (14.3-25.6)
Racea
>1 Race 33 (9.4) 0-8 (0-24.1)
Asian 22 (6.3) 0-8 (0-19.5)
Black 52 (14.9) 1-16 (3.0-28.6)
Hawaiian/Pacific Islander 1 (0.3) 0-1 (0-1.5)
Native American 2 (0.6) 0-1 (0-3.6)
White 212 (60.7) 12-43 (27.9-80.5)
Unknown/not reported 26 (7.4) 0-1 (0-1.5)
Ethnicitya
Hispanic 67 (19.5) 1-54 (1.5-9.1)
Non-Hispanic 276 (80.5) 19-54 (44.2-96.4)
Unknown 6 (1.7)
Insurance
Medicaid/SCHIP/CHIP 184 (52.7) 1-45 (3.6-100)
Private 155 (44.4) 0-44 (0-89.3)
Military 9 (2.6) 0-3 (0-11.5)
Self-pay 0 (0.0) 0 (0)
Primary caregiver education level 343
<HS 38 (11.1) 0-10 (0-24.4)
HS/GED 82 (23.9) 1-28 (1.8-68.3)
Some post-HS 63 (18.4) 0-18 (0-36.0)
College graduate 105 (30.6) 0-26 (0-53.5)
Graduate degree 55 (16.0) 1-18 (3.2-32.7)
Unknown 6 (1.7) 1-3 (2.3-10.7)
ASD diagnosis ASD Not ASD
Child participant assessment results by reference standard diagnosis 250 99
Behavior problems noted 225 99
Aggressive 39 (17) 27 (27)
Hyperactive 86 (38) 36 (36)
Inattentive/distractible 100 (44) 36 (36)
Cognitive assessmentb 154 65
Average to above average 55 (36) 40 (62)
Borderline 38 (25) 11 (17)
Mild impairment 41 (27) 10 (15)
Moderate impairment 13 (8) 3 (5)
Severe/profound impairment 7 (5) 1 (2)
Language assessmentb 138 50
Average to above average 11 (8) 14 (28)
Borderline 23 (17) 17 (34)
Mild impairment 38 (27) 12 (24)
Moderate impairment 49 (36) 3 (6)
Severe/profound impairment 17 (12) 4 (8)
Adaptive assessmentb 162 56
Average to above average 17 (10) 11 (20)
Borderline 46 (28) 29 (52)
Mild impairment 76 (47) 14 (25)
Moderate impairment 21 (13) 0 (0)
Severe/profound impairment 2 (1) 2 (3)
Participating DBP clinicians
No. 40 3-9
ADOS routinely used at site (Y/N) 34 (85.0) NA
Sex
Female 33 (82.5) NA
Male 7 (17.5) NA
Age, mean (SD) 48.10 (10.7) NA
Years of experience, mean (SD) 14.04 (11.9) NA
ASD care a primary responsibility 33 (82.5) NA

Abbreviations: ADOS, Autism Diagnostic Observation Schedule; ASD, autism spectrum disorder; CHIP, Children’s Health Insurance Program; DBP, developmental-behavioral pediatricians; GED, general educational development diploma; HS, high school; NA, not applicable; SCHIP, State Children’s Health Insurance Program.

a. Race and ethnicity were self-reported.

b. Scores for cognitive, language, and adaptive measures were obtained using different instruments across sites. Standard scores from individual test results were therefore categorized as average to above average, borderline, mild impairment, moderate impairment, or severe to profound impairment according to the following convention: average to above average, higher than 84; borderline: 69 to 84; mild impairment: 54 to 69; moderate impairment: 39 to 54; severe to profound impairment: less than 39.
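As an illustration of the banding convention in footnote b, the following minimal Python sketch maps a standard score to a category. The function name `impairment_category` is hypothetical. Because the stated ranges share endpoints (69 and 54) and the footnote does not say how a score of exactly 39 is handled beyond "less than 39" being severe, the sketch assumes that a shared endpoint falls in the higher-functioning band; that tie-breaking rule is an assumption, not something the study states.

```python
def impairment_category(standard_score: float) -> str:
    """Map a standard score to the footnote's descriptive band.

    Assumption (not stated in the source): a score on a shared
    endpoint (69, 54, or 39) is assigned to the higher-functioning band.
    """
    if standard_score > 84:
        return "average to above average"
    if standard_score >= 69:
        return "borderline"
    if standard_score >= 54:
        return "mild impairment"
    if standard_score >= 39:
        return "moderate impairment"
    return "severe to profound impairment"


# Example scores chosen only for illustration.
for score in (100, 84, 69, 60, 40, 38):
    print(score, impairment_category(score))
```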

Among participating DBPs, 33 were women (82.5%), mean age was 48.1 years, with a mean of 14 years since training was completed. Due to variability across sites, site of data collection was treated as a random variable across all models.

Assessment Characteristics and Index vs Reference Standard Diagnoses

Overall, there was consistency between index diagnoses and reference standard diagnoses for 314 children (90%) (ASD, 232; not ASD, 82), while diagnoses differed for 35 (index-ASD; reference-not ASD, 17; index-not ASD; reference-ASD, 18) (Table 2). Among 250 children who received a reference standard diagnosis of ASD, 232 children (92.8%) also received an index diagnosis of ASD. DBP diagnostic certainty increased from 7.5 at the time of the index diagnosis to 8.7 at the time of the reference standard diagnosis (P < .001). The reference standard diagnosis was consistent with the diagnostic categorization from the ADOS for 322 children (92.3%) and differed for 27 children (7.7%). These findings are consistent with rates of agreement noted in the ADOS manual.11,12,20
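The agreement figures can be rechecked from the four index-by-reference cells in the "All sites" column of Table 2. The short Python sketch below is only an arithmetic illustration; the function and key names are hypothetical and not part of the study's analysis code.

```python
# Cell counts from the "All sites" column of Table 2,
# keyed as (index diagnosis, reference standard diagnosis).
counts = {
    ("ASD", "ASD"): 232,
    ("not ASD", "ASD"): 18,
    ("ASD", "not ASD"): 17,
    ("not ASD", "not ASD"): 82,
}


def agreement_summary(cells):
    """Summarize agreement between index and reference diagnoses."""
    total = sum(cells.values())
    # A diagnosis is consistent when index and reference labels match.
    consistent = sum(n for (index, ref), n in cells.items() if index == ref)
    reference_asd = sum(n for (_, ref), n in cells.items() if ref == "ASD")
    return {
        "total": total,
        "consistent": consistent,
        "changed": total - consistent,
        "percent_agreement": 100 * consistent / total,
        "percent_index_asd_among_reference_asd":
            100 * cells[("ASD", "ASD")] / reference_asd,
    }


summary = agreement_summary(counts)
for name, value in summary.items():
    print(name, round(value, 1))
```

With these counts the sketch gives 314 consistent and 35 changed diagnoses out of 349, matching the totals reported in the text.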

Table 2. Assessment Characteristics by Site and Overall.

Characteristic No. (%)
Site 1 Site 2 Site 3 Site 4 Site 5 Site 6 Site 7 Site 8 All sites
Children assessed, No. 55 28 33 43 65 56 28 41 349
Index and reference standard diagnoses
Index = ASD, reference = ASD 38 (69.1) 20 (71.4) 26 (78.8) 29 (67.4) 39 (60) 43 (76.8) 17 (60.7) 20 (48.8) 232 (66.5)
Index = not-ASD, reference = ASD 5 (9.0) 1 (3.6) 1 (3) 7 (16.3) 2 (3.1) 1 (1.8) 1 (3.6) 0 (0) 18 (5.2)
Index = ASD, reference = not-ASD 2 (3.6) 0 (0) 2 (6.1) 3 (6.9) 7 (10.8) 3 (5.4) 0 (0) 0 (0) 17 (4.8)
Index = not-ASD, reference = not-ASD 10 (18.2) 7 (25) 4 (12.1) 4 (9.3) 17 (26.1) 9 (16.1) 10 (35.7) 21 (51.2) 82 (23.4)
Total No. of DBP assessments at site
DBP clinician who completed assessment 55 27 33 43 66 55 28 41 348
Attending MD 44 (80.0) 22 (81.5) 28 (84.8) 41 (95.3) 62 (93.9) 50 (90.9) 16 (57.1) 41 (100) 304 (87.4)
Supervised trainee 11 (20.0) 5 (18.5) 5 (15.2) 2 (4.7) 4 (6.1) 5 (9.1) 12 (42.9) 0 (0) 44 (12.6)
Time spent in direct observation
Children with data on time spent, No. 55 5a 32 42 65 56 27 41 323
Time spent, min
<30 6 (10.9) 0 (0) 10 (31.3) 2 (4.8) 0 (0) 2 (3.6) 14 (51.9) 0 (0) 34 (10.5)
31-60 36 (65.5) 1 (20.0) 6 (18.8) 6 (14.3) 51 (78.5) 35 (62.5) 11 (40.7) 0 (0) 146 (45.2)
61-90 13 (23.6) 4 (80.0) 16 (50.0) 34 (81.0) 14 (21.5) 19 (33.9) 2 (7.4) 41 (100) 143 (44.3)
No. of visits to complete assessment
Total No. of children with assessment results 55 27 32 43 65 55 28 41 346
1 55 (100) 16 (59.3) 30 (93.8) 24 (55.8) 2 (3.0) 24 (43.6) 6 (21.4) 32 (78.0) 189 (54.5)
2 0 (0) 11 (40.7) 1 (3.1) 7 (16.3) 64 (97.0) 31 (56.4) 22 (78.6) 9 (22.0) 145 (41.8)
>2 0 (0) 0 (0) 1 (3.1) 12 (27.9) 0 (0) 0 (0) 0 (0) 0 (0) 13 (3.7)
Assessment results available to diagnosing clinician at time of index diagnosis 55 28 33 43 66 56 29 41 657b
Cognitive/developmental 51 (92.7) 14 (50.0) 21 (63.6) 30 (69.8) 6 (9.1) 30 (53.6) 0 (0) 41 (100) 193 (29.4)
Language 0 (0) 1 (3.6) 1 (3.0) 24 (55.8) 35 (53.0) 47 (83.9) 1 (3.4) 40 (97.6) 149 (22.7)
Adaptive 17 (30.9) 9 (32.1) 24 (72.7) 19 (44.2) 44 (66.7) 22 (39.3) 0 (0) 0 (0) 135 (20.5)
Social 32 (58.2) 13 (46.4) 2 (6.1) 36 (83.7) 59 (89.4) 14 (25.0) 0 (0) 24 (58.5) 180 (27.4)

Abbreviations: ASD, autism spectrum disorder; DBP, developmental-behavioral pediatricians.

a. Data were missing for time spent in direct observation for all but 5 participants at site 2.

b. The total number of formal, standardized assessment results available would equal the total number per site times 4 (for the 4 assessments) if all children had all 4 types of assessment administered at each site. However, assessments were administered as part of routine clinical practice at each site and therefore this was not the case. For example, at site 1, 93% of the child participants had a cognitive assessment, none had a language assessment, 31% had an adaptive assessment, and 58% had a social assessment.

Diagnoses Among Child Participants Not Receiving Reference Standard Diagnosis of ASD

Among the 99 children who did not receive a reference standard diagnosis of ASD, 49 children (49.5%) received more than 1 diagnosis. Diagnoses included a range of language, developmental, motor, behavioral, and neurodevelopmental disorders (eMethods in the Supplement).

Prediction of Consistency Between Index and Reference Standard Diagnoses: Univariate Analyses

Univariate models, each containing a single predictor, were evaluated so that all available data would be used (eMethods in the Supplement). The importance of predictors was evaluated using both inferential criteria (P < .05) and effect size indicators of ORs (Table 3).26 Ratings of severe for DSM A (social/communication) and B criteria (restricted/repetitive behaviors) were associated with diagnostic consistency (OR, 2.962; 95% CI, 1.882-4.663 for A and OR, 2.044; 95% CI, 1.379-3.031 for B criteria). Higher DBP diagnostic certainty was also associated with diagnostic consistency: in binary logistic regression, an increase of 1 unit in certainty was associated with almost twice the odds of a consistent diagnosis (OR, 1.809; 95% CI, 1.589-2.059).
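The odds ratios quoted here are the exponentiated logistic regression coefficients listed in Table 3. As a minimal sketch of that relationship (the helper names are hypothetical, and the 2-unit example is an illustration rather than a result reported by the study):

```python
import math


def odds_ratio(coefficient: float) -> float:
    """Convert a logistic regression coefficient (log odds) to an odds ratio."""
    return math.exp(coefficient)


def odds_ratio_for_change(coefficient: float, units: float) -> float:
    """Odds ratio implied by a change of `units` in the predictor."""
    return math.exp(coefficient * units)


# Coefficients as listed in Table 3.
table3_coefficients = {
    "DSM-5 A severity": 1.086,
    "DSM-5 B severity": 0.715,
    "DBP diagnostic certainty": 0.593,
}

for name, coef in table3_coefficients.items():
    print(name, round(odds_ratio(coef), 3))

# Illustration only: odds ratio implied by a 2-point difference in certainty,
# assuming the effect is linear on the log-odds scale.
print(round(odds_ratio_for_change(0.593, 2), 2))
```

Exponentiating the three coefficients reproduces the tabulated ORs to within rounding of the published coefficients.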

Table 3. Coefficients, Odds Ratios, and Respective 95% CIs When Predicting the Consistency Between Index and Reference Autism Spectrum Disorder (ASD) Diagnoses Using Univariate Mixed-Effects Logistic Regression Models.

Parameter fixed effects Coefficient OR (95% CI) Effect size of OR
Predicted intercepta 2.142b 8.514 (5.401-13.420)b Large
Child-level predictors
Child age −0.006 0.994 (0.964-1.025) Small
Child sex (male) −0.353 0.702 (0.300-1.646) Small
Child race −0.005 0.995 (0.797-1.243) Small
Child ethnicity 0.577 1.782 (0.663-4.638) Small to medium
Insurance
Medicaid −0.060 0.941 (0.414-2.143) Small
Private −0.058 0.944 (0.421-2.118) Small
Militaryc
Severity DSM-5 A criteria 1.086b 2.962 (1.882-4.663)b Small to medium
Severity DSM-5 B criteria 0.715b 2.044 (1.379-3.031)b Small to medium
DBP clinician self-rated diagnostic certainty per child participant 0.593b 1.809 (1.589-2.059)b Small to medium
ADOS classificationd 0.448 1.565 (0.944-2.594) Small
ADOS module scores 0.086 1.089 (1.016-1.168) Small
Cognition −0.008 0.992 (0.974-1.010) Small
Language −0.029 0.972 (0.945-0.999) Small
Adaptive behavior −0.011 0.989 (0.948-1.031) Small
Child directly observed (time) 0.656b 1.926 (1.123-3.303)b Small to medium
Evaluation by trainee −0.780 0.459 (0.176-1.194) Small
Availability of measures
Cognitive 0.704 2.021 (0.960-4.256) Small to medium
Language 0.589 1.803 (0.833-3.901) Small to medium
Social function 0.653 1.921 (0.904-4.080) Small to medium
Adaptive behavior 0.328 1.388 (0.481-4.013) Small
Behavioral problems
Self-injurious 1.594 4.924 (0.800-30.285) Medium to large
Aggressive −0.080 0.923 (0.395-2.161) Small
Hyperactive −0.182 0.833 (0.416-1.668) Small
Inattentive/distractible −0.171 0.843 (0.370-1.920) Small
DBP clinician-level predictors
Sex (male) 1.124 3.076 (0.932-10.157) Small to medium
Age −0.028 0.973 (0.936-1.011) Small
Years past training −0.031 0.969 (0.936-1.004) Small
Experience with DP 0.249 1.283 (0.317-5.191) Small
Primary effort in ASD −0.531 0.588 (0.209-1.655) Small
ADOS routinely used 0.685 1.984 (0.398-9.884) Small to medium

Abbreviations: DP, Developmental Pediatrician defined by board certification in Developmental-Behavioral Pediatrics and/or Neurodevelopmental Disabilities; DSM-5, Diagnostic and Statistical Manual of Mental Disorders (Fifth Edition); OR, odds ratio.

a. Predicted intercept is from the null model. Results are based on univariate models to use all available data per predictor variable. The conventions used in the table refer to small, medium, and large effects as denoted by Cohen’s d values of .20, .50, and .80 standard deviations (in the OR metric). The variables child directly observed and evaluation by a trainee were conceptually sound as clinician-level variables but had estimates per child and were therefore used as child-level predictors.

b. P < .05, 2-tailed test. Significance was adjusted for multiple comparisons using the Benjamini-Hochberg correction and a false discovery rate equal to 5%.

c. Coefficients could not be estimated because of the low frequency of military insurance, which did not provide enough variability to predict a binary outcome (consistency of index and reference diagnoses).

d. Values indicate: (0 = non ASD, 1 = autism spectrum, 2 = autism).

For each additional 30 minutes spent observing the child, index diagnoses and reference standard diagnoses were twice as likely to be consistent (OR, 1.926; 95% CI, 1.123-3.303); however, there was no correlation between degree of diagnostic certainty and time spent with the child (r = -0.025; P = .68).

Due to site variability in clinic flow, cognitive, language, adaptive, and social measures were not always available to the DBP at the time of index diagnosis. The availability of 1 or more cognitive or social behavior measures at the time of index diagnosis was associated with increased consistency between index diagnosis and reference standard diagnosis (cognitive measure OR, 2.021; 95% CI, 0.960-4.256; and social measure OR, 1.921; 95% CI, 0.904-4.080). However, availability of measures was not correlated with clinician degree of diagnostic certainty at time of index diagnosis.

The presence of self-injurious behavior was associated with an increase in consistency of diagnosis (OR, 4.924; 95% CI, 0.800-30.285), a medium to large effect size, although this estimate did not reach statistical significance following false discovery rate correction.

Clinician male sex was associated with a 3-fold increase in diagnostic consistency (OR, 3.076; 95% CI, 0.932-10.157), although this was not statistically significant. The same was true of the ADOS classification results (OR, 1.565; 95% CI, 0.944-2.594), which were associated with an approximately 1.5-fold increase in consistency.

Prediction of Consistency Between Index and Reference Standard Diagnoses: Receiver Operating Curve (ROC)

The most important predictor was DBP diagnostic certainty at the time of the index diagnosis (AUC, 0.860; effect size, good) (eTable in the Supplement). A value of 7 on the 10-point Likert scale maximized prediction of a consistent diagnosis (sensitivity or correct classification of consistent diagnosis 76.43%; specificity or correct classification of inconsistent diagnosis 80%). Clinician ratings of severe DSM A criteria (AUC 0.732) and scores on language measures (AUC 0.707) were fair predictors of diagnostic consistency (eFigure in the Supplement).
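To make the cutoff logic concrete, the Python sketch below computes sensitivity and specificity for a certainty cutoff, treating a consistent diagnosis as the positive class. Everything in it is illustrative: the data are invented, the rule "certainty at or above the cutoff predicts consistency" is an assumption, and Youden's J is used as the selection criterion only because the report does not say which criterion defined the cutoff that "maximized prediction".

```python
def sensitivity_specificity(certainty, consistent, cutoff):
    """Sensitivity and specificity when certainty >= cutoff predicts consistency."""
    tp = fn = tn = fp = 0
    for score, is_consistent in zip(certainty, consistent):
        predicted_consistent = score >= cutoff
        if is_consistent and predicted_consistent:
            tp += 1
        elif is_consistent:
            fn += 1
        elif predicted_consistent:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(certainty, consistent, cutoffs=range(1, 11)):
    """Cutoff with the largest Youden's J (sensitivity + specificity - 1)."""
    def youden_j(cutoff):
        sens, spec = sensitivity_specificity(certainty, consistent, cutoff)
        return sens + spec - 1
    return max(cutoffs, key=youden_j)


# Hypothetical ratings on the 10-point scale; NOT study data.
certainty = [9, 8, 7, 10, 6, 8, 5, 4, 7, 3]
consistent = [True, True, True, True, True, True, False, False, False, False]

sens, spec = sensitivity_specificity(certainty, consistent, cutoff=7)
print(round(sens, 3), round(spec, 3))
print(best_cutoff(certainty, consistent))
```

For orientation, with 314 consistent and 35 changed diagnoses, the reported 76.43% and 80% are what 240 of 314 and 28 of 35 correctly classified cases would give; those cell counts are inferred from the percentages and are not stated in the report.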

Prediction of Consistency Between Index and Reference Standard Diagnoses: Multivariate Model

A multivariate model was employed, keeping predictors only if (1) they were significantly different from 0 using either 1-tailed or 2-tailed tests and (2) model fit improved in their presence, as judged by the deviance statistic (Table 4). DBP diagnostic certainty at index diagnosis, child ethnicity, and amount of time spent observing the child were important predictors of diagnostic consistency: higher certainty, Hispanic ethnicity, and more time spent with the child were each associated with increased diagnostic consistency. Among clinician-level predictors, no significant findings emerged in the multivariate model.

Table 4. Odds Ratios for the Prediction of Consistency Between Index and Reference ASD Diagnoses Using Child- and Clinician-Based Predictors Using Multivariate Multilevel Modela.

Parameter fixed effects Model 1 Model 2 Model 3 Model 4 Model 5
Predicted intercept 2.135b 2.777b 1.014b 0.239 0.029
Child-level predictor
Certainty NA 1.794b 1.536b 1.631b 1.817b
Ethnicity NA NA 2.659b 2.796b 2.459c
Child directly observed NA NA NA 1.947b 2.312b
Clinician-level predictor
Sex NA NA NA NA 3.084
Age NA NA NA NA 1.004
Years past training NA NA NA NA 0.970
ASD primary responsibility NA NA NA NA 2.156
ADOS routinely used NA NA NA NA 0.933
Model improvementd
Deviance based χ2 837.701 54.812 20.838 121.760 NA
df 2 3 4 1 NA
P value NA <.001 <.001 <.001 NA

Abbreviations: ADOS, Autism Diagnostic Observation Schedule; ASD, autism spectrum disorder; NA, not applicable.

a. Valid cases in the multivariate model were n = 341; the 8 missing cases (2.3%) were due to listwise deletion required by the multivariate model.

b. P < .05, 2-tailed test.

c. P < .05, 1-tailed test.

d. Variance reduction by use of a χ2 test based on the difference in the 2 models’ deviance estimates. Nested models involve only significant parameters in the multivariate model using either 1-tailed or 2-tailed tests. NA in model 5 denotes the absence of a comparison model 6.

Discussion

In this prospective diagnostic study including children ages 18 months to 5 years, 11 months who were referred for possible ASD, there was 90.0% agreement between the index diagnoses (ie, clinical diagnosis) and reference standard diagnoses (ie, clinical diagnosis plus information from the ADOS). In univariate analyses, factors associated with consistency between the index diagnoses and reference standard diagnoses included severity of ASD symptoms, clinician diagnostic certainty of the index diagnosis, time spent in direct observation of the child, availability of measures of development, and presence of self-injurious behavior. ROC and multivariate analyses indicated that clinician diagnostic certainty was the most robust predictor of consistency between index diagnoses and reference standard diagnoses.

A recent meta-analysis27 identified 14 studies on ASD classification using the ADOS alone compared with a reference standard assessment that included a focused clinical interview and the ADOS. Only 6 of these studies included toddler and preschool age children, and only 3 were conducted exclusively in clinical settings. The authors27 concluded that additional research on the ADOS is needed in the clinical setting. The current study was not designed to compare the ASD classification with the ADOS to a reference standard diagnosis that includes a clinical evaluation plus the ADOS; rather, it was aimed at evaluating the clinical utility of the ADOS. Clinical diagnoses by DBPs (index diagnosis), employing information from a clinical evaluation and, in some cases, results of standardized cognitive, language, and adaptive measures, were consistent with the diagnosis that also included information from the ADOS (reference standard diagnosis) in 90% of children. The likelihood that the index and reference standard diagnoses would be consistent was predicted by the DBP’s level of certainty in their clinical diagnosis.

While clinician diagnostic certainty was the most robust predictor of diagnostic consistency, several other factors also predicted consistency. Some children present with more severe and therefore diagnostically more salient ASD symptoms, and children in this study whose ASD symptoms were rated as more severe were also more likely to have consistency between the index diagnoses and reference standard diagnoses.2,3,4,15 The presence of self-injurious behaviors was found to predict consistent index diagnoses and reference standard diagnoses, suggesting that when self-injurious behaviors are noted in young children referred for possible ASD, there may be a higher likelihood that the child has ASD.

A previous study28 found that ASD diagnoses based on brief periods of direct observation of the child are often inaccurate. In this study, increased time of observation was associated with increased consistency between index diagnoses and reference standard diagnoses; however, post hoc analyses found that clinician diagnostic certainty did not vary as a function of time spent observing the child. In the clinical settings for this study,28 clinicians typically spent at least a half hour, often longer, in direct observation and/or interaction with the child.

Hispanic ethnicity predicted diagnostic consistency in multivariate analyses only. This important finding requires further consideration beyond the scope of this initial report.

This study has several strengths, including a large, diverse sample from 8 US sites and 1 international site. The study was conducted in the context of routine clinical care in specialty clinics, and findings are therefore likely to reflect the population of children referred for ASD diagnostic evaluations. Findings also reflect typical variations in clinical practice, including time spent evaluating the child, availability of standardized measures of cognitive, language, and adaptive function, and variability in the child’s neurodevelopmental profile.

Limitations

First, the study included only English-speaking (or German-speaking) children younger than 6 years, limiting generalizability. Second, the study was conducted in tertiary care referral centers, suggesting caution in generalizing findings to other settings. Third, DBPs who participated in this study are experienced subspecialists; therefore, study results are not applicable to nonspecialist clinicians who are often asked to diagnose ASD. Fourth, logistical and feasibility considerations precluded a design that incorporated an ASD evaluation independent of the clinician participants in the study.

Conclusions

In this prospective diagnostic study, clinical diagnoses of ASD by DBPs (index diagnosis) were consistent with diagnoses that incorporated information from the ADOS (reference standard diagnosis) in 90.0% of cases. Clinicians’ certainty in their clinical diagnosis predicted consistency between the index diagnoses and reference standard diagnoses. The ADOS may have clinical use in certain scenarios (eg, older children or evaluations by less highly trained specialist clinicians); however, this study suggests that the ADOS is generally not required for diagnosis of ASD by DBPs and that DBPs can identify children for whom the ADOS may contribute to accurate diagnosis. ASD diagnostic assessments that do not include the ADOS are less time-consuming and less costly, potentially leading to more streamlined assessments that could improve access to timely diagnosis for more children. Additionally, this study suggests that results from the ADOS should not be required by insurers, early intervention programs, or schools for children to access intervention and treatment for ASD.

Supplement.

eMethods. Statistical Analyses.

eTable. Predictions of Diagnostic Stability by Use of Receiver Operating Curve Analyses

eFigure. ROC Curves of Individual Predictors With at Least Fair Effect Sizes.

References

  • 1. American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders. 5th ed. American Psychiatric Association; 2013.
  • 2. Maenner MJ, Shaw KA, Bakian AV, et al. Prevalence and characteristics of autism spectrum disorder among children aged 8 years—Autism and Developmental Disabilities Monitoring Network, 11 sites, United States, 2018. MMWR Surveill Summ. 2021;70(11):1-16. doi: 10.15585/mmwr.ss7011a1
  • 3. Hyman SL, Levy SE, Myers SM; Council on Children With Disabilities, Section on Developmental and Behavioral Pediatrics. Identification, evaluation, and management of children with autism spectrum disorder. Pediatrics. 2020;145(1):e20193447. doi: 10.1542/peds.2019-3447
  • 4. Gwynette MF, McGuire K, Fadus MC, Feder JD, Koth KA, King BH. Overemphasis of the autism diagnostic observation schedule (ADOS) evaluation subverts a clinician’s ability to provide access to autism services. J Am Acad Child Adolesc Psychiatry. 2019;58(12):1222-1223. doi: 10.1016/j.jaac.2019.07.933
  • 5. Mandell DS, Barry CL, Marcus SC, et al. Effects of autism spectrum disorder insurance mandates on the treated prevalence of autism spectrum disorder. JAMA Pediatr. 2016;170(9):887-893. doi: 10.1001/jamapediatrics.2016.1049
  • 6. Kanne SM, Bishop SL. Editorial perspective: the autism waitlist crisis and remembering what families need. J Child Psychol Psychiatry. 2021;62(2):140-142. doi: 10.1111/jcpp.13254
  • 7. McNally Keehn R, Tomlin A, Ciccarelli MR. COVID-19 pandemic highlights access barriers for children with autism spectrum disorder. J Dev Behav Pediatr. 2021;42(7):599-601. doi: 10.1097/DBP.0000000000000988
  • 8. Constantino JN, Abbacchi AM, Saulnier C, et al. Timing of the diagnosis of autism in African American children. Pediatrics. 2020;146(3):e20193629. doi: 10.1542/peds.2019-3629
  • 9. Durkin MS, Maenner MJ, Baio J, et al. Autism spectrum disorder among US children (2002-2010): socioeconomic, racial, and ethnic disparities. Am J Public Health. 2017;107(11):1818-1826. doi: 10.2105/AJPH.2017.304032
  • 10. Wiggins LD, Durkin M, Esler A, et al. Disparities in documented diagnoses of autism spectrum disorder based on demographic, individual, and service factors. Autism Res. 2020;13(3):464-473. doi: 10.1002/aur.2255
  • 11. Lord C, Rutter M, DiLavore PC, Risi S, Gotham K, Bishop S. Autism Diagnostic Observation Schedule: ADOS-2. Western Psychological Services; 2012.
  • 12. Lord C, Rutter M, DiLavore PC, Risi S. Autism Diagnostic Observation Schedule: Toddler Module. Western Psychological Services; 2006.
  • 13. Akshoomoff N, Corsello C, Schmidt H. The role of the autism diagnostic observation schedule in the assessment of autism spectrum disorders in school and community settings. Calif School Psychol. 2006;11:7-19. doi: 10.1007/BF03341111
  • 14. Havdahl KA, Hus Bal V, Huerta M, et al. Multidimensional influences on autism symptom measures: implications for use in etiological research. J Am Acad Child Adolesc Psychiatry. 2016;55(12):1054-1063.e3. doi: 10.1016/j.jaac.2016.09.490
  • 15. Kaufman NK. Rethinking “gold standards” and “best practices” in the assessment of autism. Appl Neuropsychol Child. Published online May 5, 2022. doi: 10.1080/21622965.2020.1809414
  • 16. Johnson CP, Myers SM; American Academy of Pediatrics Council on Children With Disabilities. Identification and evaluation of children with autism spectrum disorders. Pediatrics. 2007;120(5):1183-1215. doi: 10.1542/peds.2007-2361
  • 17. Hansen RL, Blum NJ, Gaham A, Shults J; DBPNet Steering Committee. Diagnosis of autism spectrum disorder by developmental-behavioral pediatricians in academic centers. Pediatrics. 2016;137(suppl 2):S79-S89. doi: 10.1542/peds.2015-2851F
  • 18. Lord C, Charman T, Havdahl A, et al. The Lancet Commission on the future of care and clinical research in autism. Lancet. 2022;399(10321):271-334. doi: 10.1016/S0140-6736(21)01541-5
  • 19. Robins DL, Casagrande K, Barton M, Chen CM, Dumont-Mathieu T, Fein D. Validation of the modified checklist for autism in toddlers, revised with follow-up (M-CHAT-R/F). Pediatrics. 2014;133(1):37-45. doi: 10.1542/peds.2013-1813
  • 20. Luyster R, Gotham K, Guthrie W, et al. The Autism Diagnostic Observation Schedule-toddler module: a new module of a standardized diagnostic measure for autism spectrum disorders. J Autism Dev Disord. 2009;39(9):1305-1320. doi: 10.1007/s10803-009-0746-z
  • 21. Raudenbush SW, Bryk AS. Hierarchical Linear Models: Applications and Data Analysis Methods. 2nd ed. Sage; 2002.
  • 22. Hsieh FY, Bloch DA, Larsen MD. A simple method of sample size calculation for linear and logistic regression. Stat Med. 1998;17(14):1623-1634.
  • 23. NCSS Statistical Software. PASS 2021 power analysis and sample size. Accessed September 13, 2022. http://ncss.com/software/pass
  • 24. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143(1):29-36. doi: 10.1148/radiology.143.1.7063747
  • 25. Gallop RJ, Crits-Christoph P, Muenz LR, Tu XM. Determination and interpretation of the optimal operating point for ROC curves derived through generalized linear models. Underst Stat. 2003;2:219-242. doi: 10.1207/S15328031US0204_01
  • 26. Chen H, Cohen P, Chen S. How big is a big odds ratio? Interpreting the magnitudes of odds ratios in epidemiological studies. Commun Stat Simul Comput. 2010;39(4):860-864. doi: 10.1080/03610911003650383
  • 27. Lebersfeld JB, Swanson M, Clesi CD, O’Kelley SE. Systematic review and meta-analysis of the clinical utility of the ADOS-2 and the ADI-R in diagnosing autism spectrum disorders in children. J Autism Dev Disord. 2021;51(11):4101-4114. doi: 10.1007/s10803-020-04839-z
  • 28. Gabrielsen TP, Farley M, Speer L, Villalobos M, Baker CN, Miller J. Identifying autism in a brief observation. Pediatrics. 2015;135(2):e330-e338. doi: 10.1542/peds.2014-1428

Articles from JAMA Pediatrics are provided here courtesy of American Medical Association