2023 Feb 20;177(4):373–383. doi: 10.1001/jamapediatrics.2022.5975

Sensitivity and Specificity of the Modified Checklist for Autism in Toddlers (Original and Revised)

A Systematic Review and Meta-analysis

Andrea Trubanova Wieckowski, Lashae N Williams, Juliette Rando, Kristen Lyall, Diana L Robins
PMCID: PMC9941975  PMID: 36804771

Key Points

Question

What factors are associated with the sensitivity and specificity of the Modified Checklist for Autism in Toddlers (M-CHAT) and the M-CHAT, Revised With Follow-up (M-CHAT-R/F)—henceforth referred to as M-CHAT(-R/F)?

Findings

In this systematic review and meta-analysis of 50 studies, the pooled sensitivity of M-CHAT(-R/F) was 0.83, and the pooled specificity was 0.94. Heterogeneity analyses revealed greater diagnostic accuracy for low- vs high-likelihood samples, a concurrent vs prospective case confirmation strategy, a large vs small sample size, use of the M-CHAT(-R/F) Follow-up, and non-English vs primarily English administration.

Meaning

Findings suggest that M-CHAT(-R/F) shows strong performance as an autism spectrum disorder screener, but researchers and clinicians should be aware of the variability in the sensitivity and specificity based on multiple factors.

Abstract

Importance

The Modified Checklist for Autism in Toddlers (M-CHAT) and the M-CHAT, Revised With Follow-up (M-CHAT-R/F)—henceforth referred to as M-CHAT(-R/F)—are the most commonly used toddler screeners for autism spectrum disorder (ASD). Their use often differs from that in the original validation studies, resulting in a range of estimates of sensitivity and specificity. Also, given the variability in reports of the clinical utility of the M-CHAT(-R/F), researchers and practitioners lack guidance to inform autism screening protocols.

Objective

To synthesize variability in sensitivity and specificity of M-CHAT(-R/F) across multiple factors, including procedures for identifying missed cases, likelihood level, screening age, and single compared with repeated screenings.

Data Sources

A literature search was conducted with PubMed, Web of Science, and Scopus to identify studies published between January 1, 2001, and August 31, 2022.

Study Selection

Articles were included if the studies used the M-CHAT(-R/F) (ie, original or revised version) to identify new ASD cases, were published in English-language peer-reviewed journals, included at least 10 ASD cases, reported procedures for false-negative case identification, screened children by 48 months, and included information (or had information provided by authors when contacted) needed to conduct the meta-analysis.

Data Extraction and Synthesis

The systematic review and meta-analysis was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) reporting guideline. The Quality Assessment of Diagnostic Accuracy Studies–2 tool was used to evaluate bias in sample selection. Data extraction and quality assessment were performed by 2 authors independently. The overall diagnostic accuracy of the M-CHAT(-R/F) was assessed with the hierarchic summary receiver operating characteristic (HSROC) model.

Main Outcomes and Measures

Sensitivity, specificity, diagnostic odds ratios, and HSROC curves of M-CHAT(-R/F).

Results

The review included 50 studies with 51 samples. The pooled sensitivity of M-CHAT(-R/F) was 0.83 (95% CI, 0.77-0.88), and the pooled specificity was 0.94 (95% CI, 0.89-0.97). Heterogeneity analyses revealed greater diagnostic accuracy for low- vs high-likelihood samples, a concurrent vs prospective case confirmation strategy, a large vs small sample size, use of M-CHAT(-R/F) Follow-up, and non-English vs English only.

Conclusions and Relevance

Overall, results of this study suggest the utility of the M-CHAT(-R/F) as an ASD screener. The wide variability in psychometric properties of M-CHAT(-R/F) highlights differences in screener use that should be considered in research and practice.


This systematic review and meta-analysis assesses the sensitivity and specificity of the Modified Checklist for Autism in Toddlers (M-CHAT) and the M-CHAT, Revised With Follow-up as autism spectrum disorder screeners.

Introduction

Autism spectrum disorder (ASD) is a neurodevelopmental disorder marked by core deficits in social communication and restricted and repetitive behaviors. It can be accurately detected by 18 months of age, although the diagnosis usually occurs much later.1,2,3 Early detection of ASD informs autism-specific early intervention, which improves outcomes.4,5 The American Academy of Pediatrics (AAP) recommends all children undergo general developmental and autism-specific screening paired with developmental surveillance.6 Although most primary care professionals screen for autism with a standardized autism-specific tool (eg, 72%),7 implementation of screening is inconsistent, in part owing to the lack of a recommendation for universal screening from the US Preventive Services Task Force.5 Reports of low sensitivity of toddler screening8,9 may also affect universal implementation.

The Modified Checklist for Autism in Toddlers (M-CHAT)10 and the M-CHAT, Revised With Follow-up (M-CHAT-R/F)11, henceforth referred to as M-CHAT(-R/F), are the most commonly used ASD-specific screening tools.12 However, implementation often diverges from that in the original validation studies, including in whether the Follow-up portion of the M-CHAT-R/F is administered, which limits interpretation of sensitivity and specificity. Previous studies describing variability in the methods used to examine psychometric properties of ASD screening tools,13,14 and the M-CHAT specifically,15 have been limited in their assessment of contextual factors, such as the rigor of false-negative case identification strategies. Consequently, there is a lack of consensus regarding M-CHAT(-R/F) screening properties. The present systematic literature review and meta-analysis examines screening properties of the M-CHAT(-R/F) to assess variations in reported sensitivity and specificity across multiple factors, including procedures for identifying missed cases, likelihood level, screening age, single or repeated screenings, sample size, version, language administered, and use of the structured Follow-up portion of the M-CHAT(-R/F).

Methods

Search Strategy

This systematic review and meta-analysis was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) reporting guideline.16 We searched PubMed, Web of Science, and Scopus to identify peer-reviewed articles published between January 1, 2001, and August 31, 2022 (see eTable 1 in Supplement 1 for key words). This systematic review and meta-analysis has been registered with the International Prospective Register of Systematic Reviews (CRD42021232792).

Inclusion Criteria

Included articles used M-CHAT(-R/F) to screen for ASD and (1) were published in English-language peer-reviewed journals; (2) used M-CHAT(-R/F) to screen children younger than 48 months, before ASD diagnosis; (3) identified at least 10 ASD cases; (4) reported procedures for identifying false-negative cases (critical to estimate sensitivity); and (5) included information (or had information provided by authors when contacted) needed to conduct the meta-analysis (rates of true-positive, false-positive, false-negative, and true-negative cases). Reviews, commentaries, and conference articles were excluded. For studies with overlapping data sets, the study with the largest sample was included.

Data Extraction and Quality Assessment

Data in Table 1 were extracted independently by 2 authors (A.T.W. and L.N.W.).8,9,17-64 Screening metrics not provided in the studies were calculated from raw values. When possible, specificity was recalculated from the raw numbers so that true-negative cases included presumed true-negative ones (ie, screen-negative children who were not evaluated). Of 42 authors contacted for missing information, 27 either responded with the requested information or noted that it was not available.
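As a minimal sketch of how screening metrics can be recomputed from raw counts, the Python snippet below derives sensitivity and specificity from the 4 cell counts of a screening table. The function name `screening_metrics` is hypothetical and not part of the review's workflow; the example counts are those listed for Baduel et al17 in Table 1, with true negatives taken to include presumed true negatives as described above.

```python
def screening_metrics(tp, fn, fp, tn):
    """Return (sensitivity, specificity) from raw 2x2 counts.

    tn is assumed to include presumed true negatives
    (screen-negative children who were not evaluated).
    """
    sensitivity = tp / (tp + fn)  # share of ASD cases that screened positive
    specificity = tn / (tn + fp)  # share of non-ASD cases that screened negative
    return sensitivity, specificity


# Counts for Baduel et al (2017) as listed in Table 1.
sens, spec = screening_metrics(tp=12, fn=6, fp=8, tn=1201)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

For these counts the result rounds to the tabulated values of 0.667 and 0.993.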

Table 1. Study Characteristics and Screening Metrics for M-CHAT(-R/F).

| Source | Total No.[a] | Screen age, mo[b] | Evaluation age, mo[b] | HL, LL, or mixed | M-CHAT language | FU | ASD detection[c] | TP | FN | FP | TN | Sens | Spec[d] |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Baduel et al,17 2017 | 1250 | 22-26 | 24-34 | LL | French | Y | C: FN strategy | 12 | 6 | 8 | 1201 | 0.667 | 0.993 |
| Beacham et al,18 2018 | 154 | 16-45 | 16-45 | HL | English | N | C: all eval | 105 | 19 | 14 | 16 | 0.847 | 0.533 |
| Canal-Bedia et al,19 2011 | 2480 | 18-36 | 18-48 | Mixed[e] | Spanish | Y | C: weak | 23 | 0 | 43 | 2414 | 1.000 | 0.980[f] |
| Carbone et al,8 2020 | 26 364 | 16-30 | 46.8 (17.7)[g] | LL | English | Y/N[h] | P: record | 125 | 253 | 579 | 25 407 | 0.331 | 0.978 |
| Chang et al,20 2021 | 990 | 17-37 | 17-37 | LL | English | Y | C: FN strategy | 31 | 7 | 11 | 941 | 0.816 | 0.988 |
| Charman et al,21 2016 | 543 | 18-56 | 32-73 | HL | English | N | C: all eval | 45 | 10 | 32.5 | 32.5 | 0.818 | 0.500 |
| Chlebowski et al,22 2013 | 18 989 | 16-30 | 25.8 (4.5) | LL | English or Spanish | Y | C: FN strategy | 92 | 6 | 79 | 18 289 | 0.939 | 0.996 |
| Choueiri et al,23 2021 | 80 | 18-36 | 18-36 | HL | English | N | C: all eval | 53 | 3 | 0 | 24 | 0.946 | 1.000 |
| Christopher et al,24 2021 | 290 | 18-48 | 18-48 | HL | English | Y | C: all eval | 170 | 48 | 48 | 24 | 0.780 | 0.333 |
| Coelho-Medeiros et al,25 2019 | 120 | 16-30 | 16-30 | Mixed[e] | Spanish (Chile) | Y | C: FN strategy | 18 | 0 | 4 | 97 | 1.000 | 0.960 |
| Dereu et al,26 2012 | 199 | 16-31 | 13-51 | HL | Dutch | N | C: FN strategy | 10 | 4 | 22 | 163 | 0.714[i] | 0.881 |
| DiGuiseppi et al,27 2010 | 85 | 20-86 | 20-86 | HL | English or Spanish | N | C: FN strategy | 9 | 2 | 26 | 38 | 0.818 | 0.594 |
| Dudova et al,28 2014 | 157 | ≈24 | NA | HL | Czech[j] | N | C: FN strategy | 9 | 4 | 9 | 112 | 0.692 | 0.926 |
| Eaves et al,29 2006 | 84 | 17-48 | 22-53 | HL | English | N | C: all eval | 48 | 4 | 22 | 8 | 0.923 | 0.267 |
| Guo et al,30 2019 | 7928 | 16-30 | 23 (4) | LL | Chinese | Y | C: FN strategy | 72 | 10 | 103 | 7166 | 0.878 | 0.986 |
| Guthrie et al,9 2019[k] | 20 375 | 16-26 | 17-88 | LL | English or Spanish | Y/N[h] | P: record | 225 | 229 | 1247 | 18 674 | 0.496 | 0.937 |
| Harris et al,31 2021[l] | 360 | 24-48 | NA | LL | English or Spanish | Y | C: FN strategy | 8 | 29 | 4 | 315 | 0.216 | 0.987 |
| Hoang et al,32 2019 | 17 277 | 18-30 | 18-30 | LL | Vietnamese | N | C: FN strategy | 129 | 1 | 118 | 17 021 | 0.992 | 0.993 |
| Inada et al,33 2011 | 1187 | 17-23 | 35-44 | LL | Japanese | N | P: FN strategy | 11 | 9 | 46 | 1121 | 0.550 | 0.961 |
| Jonsdottir et al,34 2022 | 1586 | 31.7 (1.7) | NA[m] | LL | Icelandic | Y | P: record | 18 | 11 | 7 | 1549 | 0.621 | 0.988 |
| Kamio et al,35 2014 | 1851 | 17-26 | 33-73 | LL | Japanese | Y | P: FN strategy | 29[n] | 22 | 24 | 1661 | 0.569 | 0.986 |
| Kanne et al,36 2018 | 158 | 18-48 | 24-32 | HL | English | Y | C: all eval | 96 | 23 | 24 | 15 | 0.807 | 0.385 |
| Kara et al,37 2014 | 618 | 18-36 | 24-42 | Mixed[e] | Turkish | Y | C: FN strategy | 45 | 2 | 15 | 534 | 0.957 | 0.973 |
| Keehn et al,38 2021 | 605 | 18-48 | 18-48 | HL | English | Y | C: all eval | 198 | 31 | 234 | 142 | 0.865 | 0.378 |
| Kerub et al,39 2020 | 1591 | 18-36 | NA[o] | LL | Hebrew | Y | P: record | 7 | 3 | 43 | 1538 | 0.700 | 0.973 |
| Kim et al,40 2016 | 827 | 14-43[p] | 110-151 | HL | English | N | P: all eval | 30 | 28 | 123 | 646 | 0.517 | 0.840 |
| Kleinman et al,41 2008[q] | 1416 | 16-30 | 52.2 (8.0)[g] | Mixed[e] | English | Y | P: FN strategy | 73 | 7 | 51 | 1285 | 0.913 | 0.962 |
| Koh et al,42 2014 | 580 | 17-48 | 18-69 | HL | English or Chinese | N | C: all eval | 158 | 41 | 123 | 258 | 0.794 | 0.677 |
| Magán-Maganto et al,43 2020 | 6625 | 14-36 | 23-36 | LL | Spanish | Y | C: FN strategy | 15 | 4 | 24 | 6542 | 0.789 | 0.996 |
| Matson et al,44 2013 | 552 | 16-30 | 16-30 | HL | English | N | C: all eval | 150 | 101 | 150 | 151 | 0.598 | 0.502 |
| Oner and Munir,45 2020 | 6712 | 16-36 | 16-41 | LL | Turkish | Y | C: weak | 57 | 0 | 95 | 6388 | 1.000 | 0.985 |
| Robins et al,46 2014 | 16 041 | 16-30 | 26.2 (5.5) | LL | English | Y | C: FN strategy | 105 | 18 | 116 | 15 496 | 0.854 | 0.993 |
| Salim et al,47 2020 | 143 | 18-48 | 18-48 | HL | Indonesian | N | C: all eval | 16 | 1 | 27 | 99 | 0.941 | 0.786 |
| Salisbury et al,48 2018 | 485 | 16-48 | 16-48 | HL | English | N | C: all eval | 220 | 77 | 82 | 106 | 0.741 | 0.564 |
| Samadi and McConkey,49 2015 | 2941 | 24-60 | 24-60 | LL | Kurdish or Persian | N | C: FN strategy | 25 | 3 | 45 | 2380 | 0.903[r] | 0.981 |
| Schjølberg et al,50 2022[s] | 54 436 | 19.0 (1.2) | ≈42 | LL | Norwegian | N | P: record | 105 | 232 | 4048 | 50 078 | 0.312 | 0.925 |
| Smith et al,51 2013 | 217 | 18-48 | 18-48 | HL | English | N | C: all eval | 97 | 39 | 31 | 50 | 0.713 | 0.617 |
| Snow and Lecavalier,52 2008 | 56 | 18-48 | 18-48 | HL | English | N | C: all eval | 38 | 5 | 8 | 5 | 0.884 | 0.385 |
| Srisinghasongkram et al,53 2016 (sample 1) | 109 | 18-48 | 18-48 | HL | Thai | Y | C: all eval | 40 | 5 | 1 | 63 | 0.889 | 0.984 |
| Srisinghasongkram et al,53 2016 (sample 2) | 732 | 18-48 | 18-48 | LL | Thai | Y | C: FN strategy | 9 | 0 | 1 | 722 | 1.000 | 0.999 |
| Sturner et al,54 2016 | 5071 | 18-24 | 14-41 | LL | English | Y | C: FN strategy | 23 | 16 | 17 | 4772 | 0.590 | 0.996 |
| Sturner et al,55 2022 | 408 | 16-20 | 20.5 (1.9) | LL | English | Y | C: FN strategy | 46 | 17 | 118 | 227 | 0.730 | 0.658 |
| Taylor et al,56 2014 | 145 | 28.1 (4.8) | 28.1 (4.8) | HL | English | N | C: all eval | 74 | 12 | 26 | 33 | 0.860 | 0.559 |
| Toh et al,57 2018 | 19 297 | 15-36 | NA | LL | Malay, Chinese, or English | N | P: record | 18 | 32 | 20 | 19 227 | 0.360 | 0.999 |
| Tsai et al,58 2019 | 317 | 16-32 | 36-37 | Mixed | Mandarin Chinese | Y | P: all eval | 22 | 3 | 19 | 273 | 0.860[r] | 0.935 |
| Thi Vui et al,59 2022 | 40 243 | 18-30 | NA | LL | Vietnamese | N | C: FN strategy | 302 | 3 | 193 | 39 726 | 0.990 | 0.995 |
| Weitlauf et al,60 2015 | 74 | 16-21 | 18-43 | HL | English | Y | C: FN strategy | 21 | 6 | 7 | 29 | 0.778 | 0.806 |
| Wieckowski et al,61 2021[t] | 3052 | 17-22 | 18-60 | LL | English or Spanish | Y | C: FN strategy | 61 | 13 | 79 | 2729 | 0.824 | 0.972 |
| Windiani et al,62 2016 | 110 | 18-48 | 18-48 | HL | Indonesian | Y | C: all eval | 16 | 2 | 5 | 87 | 0.889 | 0.946 |
| Wong et al,63 2018 | 236 | 18-47 | 18-47 | HL | Chinese | N | C: all eval | 99 | 14 | 58 | 65 | 0.876 | 0.528 |
| Zhang et al,64 2022 | 11 190 | 18-24 | 23.1 (4.6) | LL | Chinese | Y | C: FN strategy | 33 | 15[u] | 56[u] | 11 056[u] | 0.688 | 0.995 |

Abbreviations: all eval, all participants evaluated; ASD, autism spectrum disorder; C, concurrent; FN, false negative; FP, false positive; FU, M-CHAT(-R/F) Follow-up administration; HL, high likelihood for ASD; LL, low likelihood for ASD; M-CHAT(-R/F), Modified Checklist for Autism in Toddlers and Modified Checklist for Autism in Toddlers, Revised With Follow-up (original and revised versions combined); N, no; NA, not available; P, prospective; record, medical record review; Sens, sensitivity; Spec, specificity; TN, true negative; TP, true positive; Y, yes; Y/N, Y, but not consistently.

[a] Refers to sample who received the M-CHAT(-R/F).

[b] Range reported for entire sample who received M-CHAT(-R/F), when available. If not available, mean (SD) is reported. If mean (SD) was not available, an estimate from the article or from communication with authors is reported.

[c] Strategy used to detect FN cases.

[d] Specificity was recalculated from the raw numbers so that TN cases included presumed TN results (ie, including children who screened negative but were not further evaluated) for consistency across studies, unless noted otherwise. Negative screen results were presumed to be TN unless there was other presented evidence.

[e] Reclassified as low risk for analyses because most participants were low risk.

[f] True negative and specificity taken directly from article and not recalculated due to missing information.

[g] Age of evaluation is for ASD sample only; age for non-ASD sample is unknown.

[h] Yes/no was reclassified for analyses: Carbone et al8 was reclassified as N based on few practices using the Follow-up portion of the M-CHAT-R/F, and Guthrie et al9 was reclassified as Y even though the Follow-up portion of the M-CHAT-R/F was not always used, given that it was built into the medical record system and intended to be used when indicated.

[i] Sensitivity differs slightly from that reported in the article because of a focus on M-CHAT(-R/F) only.

[j] Language presumed as not directly reported in the article.

[k] Values were obtained from the main author for repeated screenings and do not match those reported in the article for single screening.

[l] Subsample of children screened before 48 months of age only is reported. Values were obtained from communication with the main author.

[m] Age of evaluation was up to 18 months after the age of screening.

[n] TP value differs from the one reported in the article because of the addition of 9 nonresponders who needed the Follow-up portion of the M-CHAT-R/F but did not complete it and had confirmed ASD.

[o] Age of evaluation was within 10 months of screenings.

[p] Screen age reported is uncorrected for prematurity.

[q] Study 2 sample only presented and analyzed because of overlap of sample 1 with Chlebowski et al.22

[r] Sensitivity calculation slightly differs from article’s reported sensitivity owing to rounding in calculation from raw numbers.

[s] Information for M-CHAT23 is reported.

[t] Information is reported for 18-month screening start age only.

[u] Numbers adjusted to include the 12 screen-negative children with a diagnosis of ASD during subsequent well-child visit and follow-up.

Two authors (A.T.W. and L.N.W.) used the Quality Assessment of Diagnostic Accuracy Studies–2 (QUADAS-2)65 tool, adapting signaling questions (see Yuen et al15) to evaluate bias in sample selection, implementation of M-CHAT(-R/F), and diagnostic assessment procedures. QUADAS-2 assesses risk of bias across 4 domains (patient selection, index test, reference standard, and flow and timing) and applicability across 3 domains (patient selection, index test, and reference standard). Each study received ratings of low, high, or unclear risk of bias and applicability for each domain according to the signaling questions (eTables 2 and 3 in Supplement 1).

Data Analysis

Data analyses were conducted between March 11, 2022, and October 10, 2022. The overall diagnostic accuracy of the M-CHAT(-R/F) was assessed with the hierarchic summary receiver operating characteristic (HSROC)66,67 model using the MetaDAS SAS version 1.3.0 (SAS Institute Inc) macro.68 Models were run with and without covariates: likelihood level of sample, case confirmation strategy classification, sample size, M-CHAT version, use of the Follow-up, and language. The HSROC parameters output by each model were input into RevMan, version 5 (Cochrane) software to create the HSROC summary curves. Models were run with all included studies except those that were unable to be classified into main categories based on predominant data (eMethods in Supplement 1). The diagnostic odds ratio (DOR) is an estimate of overall diagnostic test accuracy, reflecting how many times higher the odds are of obtaining an M-CHAT(-R/F) score in the diagnostic range for a randomly selected person with ASD vs without ASD. The likelihood ratio test assesses the goodness of fit of 2 competing statistical models by testing whether the ratio of their likelihoods is significantly different from 1 and was used to assess whether the addition of each covariate to the model was significant (eMethods in Supplement 1).
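To make the DOR definition concrete, the following Python sketch computes a study-level DOR directly from 2x2 counts as (TP × TN) / (FN × FP). This is an illustration only: the pooled DORs in this review come from the HSROC model, not from this simple formula, and the function name `diagnostic_odds_ratio` is hypothetical. The counts are those listed for Baduel et al17 in Table 1.

```python
def diagnostic_odds_ratio(tp, fn, fp, tn):
    """Study-level DOR: odds of a positive screen with ASD vs without ASD."""
    if fn == 0 or fp == 0:
        # Without a continuity correction the ratio is undefined here.
        raise ValueError("DOR is undefined when FN or FP is 0")
    return (tp * tn) / (fn * fp)


# Counts for Baduel et al (2017) as listed in Table 1.
dor = diagnostic_odds_ratio(tp=12, fn=6, fp=8, tn=1201)
print(f"DOR = {dor:.2f}")
```

A DOR of 1 would indicate that a positive screen is equally likely with or without ASD; larger values indicate better discrimination. Studies with zero FN or FP cells (several appear in Table 1) need a correction or a model-based estimate, which is one reason a hierarchical model is used for pooling.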

Results

Search Results

Overall, 50 published studies met the criteria and were included in our systematic review (see eFigure 1 in Supplement 1 for exclusions). One study53 included 2 distinct samples, resulting in 51 study samples described in the present review and meta-analysis.

Study Characteristics

Table 1 and eTable 4 in Supplement 1 summarize the characteristics of the included studies. Almost half of the samples were small (<500; 21 of 51 studies [41%]); 26 (51%) administered the M-CHAT(-R/F) primarily in English. Thirty-two studies (63%) used the original version of M-CHAT, whereas 19 (37%) used M-CHAT-R. Thirty studies (59%) used the structured Follow-up portion of the M-CHAT-R/F with the original or revised M-CHAT, and 21 (41%) used only the initial items.

Studies differed based on case confirmation strategies. Most of the 51 studies (n = 40 [78%]) used concurrent detection methods, defined as evaluation to confirm ASD status within 6 months of screening. These studies included (1) additional rigorous false-negative case detection approaches, such as asking pediatric physicians to note ASD concerns, using additional screening tools for all or a subset of the sample, or both (n = 21)17,20,22,25,26,27,28,30,31,32,37,43,46,49,53,54,55,59,60,61,64; or (2) evaluation of all children regardless of screening results (n = 17).18,21,23,24,29,36,38,42,44,47,48,51,52,53,56,62,63 In addition, 2 studies used strategies that were classified as weak concurrent false-negative case confirmation strategies, defined as strategies that resulted in few children who had not screened positive being invited for an evaluation.19,45 The rest of the studies (n = 11 [22%]) used prospective approaches to identify children who received a diagnosis of ASD, defined as diagnostic confirmation occurring when children were older (>6 months from screening). These studies used strategies that included medical record reviews from both primary and specialty clinics (n = 6),8,9,34,39,50,57 additional rigorous strategies to identify false-negative cases through additional screening tools or physician-indicated concerns (n = 3),33,35,41 and evaluation of all children regardless of screening result (n = 2).40,58
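The 6-month cut point that separates concurrent from prospective confirmation can be expressed as a simple rule. The Python sketch below is illustrative only: the function name and the example intervals are hypothetical, and in the review itself studies were classified from their reported procedures rather than from a single interval value.

```python
def confirmation_strategy(months_from_screening_to_evaluation):
    """Label a study by its screening-to-evaluation interval.

    Concurrent: evaluation within 6 months of screening.
    Prospective: evaluation more than 6 months after screening.
    """
    if months_from_screening_to_evaluation < 0:
        raise ValueError("interval must be non-negative")
    if months_from_screening_to_evaluation <= 6:
        return "concurrent"
    return "prospective"


# Hypothetical intervals, in months, for three imaginary studies.
labels = [confirmation_strategy(m) for m in (0, 6, 18)]
print(labels)
```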

Studies also differed significantly based on the characteristics of the children screened. Twenty-four of the 51 studies (47%) recruited low-likelihood (LL) children, which included screening at well-child visits, childcare centers, kindergarten, or preschool. Twenty-two studies (43%) used high-likelihood (HL) samples, including children referred to community health services or for ASD-specific evaluations because of parent or practitioner concern about development,18,21,24,29,36,38,42,47,48,51,52,53,56,62,63 those at elevated likelihood for ASD according to another screening result,26 those receiving services through an early intervention system,23,44 those with Down syndrome,27 those with a history of prematurity or low birth weight,28,40 and those with older siblings who had received a diagnosis of ASD.60 In addition, 5 studies administered the M-CHAT(-R/F) to a mixed sample comprising both LL and HL children.19,25,37,41,58 Most of these studies included predominantly LL samples, however.

Methodological Quality of Included Studies

Figure 1 shows the results of the methodological quality assessment for all published studies included in the review. There was generally low risk of bias across domains with the exception of the flow and timing domain, for which 36 of the 51 studies (71%) showed high risk of bias. This was because many studies used LL samples and did not evaluate all children screened with the M-CHAT(-R/F) or because there was a longer interval between screening and evaluation in prospective case confirmation studies. Regarding applicability, all studies showed low concern for patient selection and reference standard; for the index test domain, only 1 study was rated as high concern and 1 as unclear. See eTable 3 in Supplement 1 for domain ratings for individual studies.

Figure 1. Methodological Quality Graph Depicting the Cumulative Findings of the Methodological Quality Analysis.

QUADAS-2 indicates Quality Assessment of Diagnostic Accuracy Studies–2.

Meta-analysis Results

Across all studies, M-CHAT(-R/F) sensitivity ranged from 0.22 to 1.00, with a pooled sensitivity of 0.83 (95% CI, 0.77-0.88). Specificity similarly ranged from 0.27 to 1.00, with a pooled specificity of 0.94 (95% CI, 0.89-0.97). See Table 1 for screening metrics for each identified study, Table 2 for HSROC model results, and Figure 2 for a forest plot of included studies, sorted by sensitivity.8,9,17-56,59-64 The SROC plot of the overall model is depicted in eFigure 2 in Supplement 1. Figure 3 displays sensitivity and specificity values grouped by several factors affecting the values, including strategy used for identifying missed cases, likelihood level, and size of the study.

Table 2. Sensitivity and Specificity of M-CHAT(-R/F) Overall and by Study Characteristics.

| Characteristic | No. | Sens | Spec | DOR (95% CI) | −2LL | P value |
|---|---|---|---|---|---|---|
| Overall | 49[a] | 0.83 | 0.94 | 75.40 (35.56-159.87) | NA | NA |
| M-CHAT version | | | | | | |
| M-CHAT | 31 | 0.83 | 0.94 | 63.47 (25.38-158.73) | 1.35 | .72 |
| M-CHAT-R | 18 | 0.83 | 0.95 | 102.69 (27.87-378.26) | | |
| Likelihood level | | | | | | |
| Low | 27 | 0.83 | 0.99 | 334.78 (156.69-715.31) | 58.28 | <.001[b] |
| High | 22 | 0.82 | 0.70 | 10.93 (4.45-26.83) | | |
| Case confirmation strategy | | | | | | |
| Concurrent | 40 | 0.86 | 0.93 | 85.84 (37.56-196.16) | 17.66 | .001[b] |
| Prospective | 9 | 0.57 | 0.97 | 38.71 (8.38-178.82) | | |
| Sample size | | | | | | |
| <500 | 20 | 0.83 | 0.78 | 17.31 (6.48-46.21) | 38.26 | <.001[b] |
| 500-5000 | 17 | 0.80 | 0.96 | 88.94 (29.58-267.43) | | |
| >5000 | 12 | 0.85 | 0.99 | 553.41 (159.34-1922.06) | | |
| Follow-up | | | | | | |
| Initial only | 22 | 0.82 | 0.84 | 24.61 (9.01-67.23) | 11.29 | .01[b] |
| Follow-up | 27 | 0.83 | 0.97 | 182.87 (71.12-470.22) | | |
| Language | | | | | | |
| English or primarily English | 26 | 0.78 | 0.85 | 20.19 (7.85-51.91) | 23.96 | <.001[b] |
| Other | 23 | 0.89 | 0.98 | 361.76 (145.80-897.58) | | |

Abbreviations: DOR, diagnostic odds ratio; M-CHAT, Modified Checklist for Autism in Toddlers; M-CHAT-R, Modified Checklist for Autism in Toddlers, Revised; NA, not applicable; Sens, sensitivity; Spec, specificity; −2LL, −2 log likelihood difference.

[a] Studies classified as “mixed” or “unknown” in any category were excluded from the analysis (n = 2).57,58 When the overall model was run with the full set of 51 studies, sensitivity was only slightly different, specificity was identical, and DOR was slightly higher (78.71 [95% CI, 38.11-162.57]). Therefore, for comparison across the subanalyses, the common set of 49 studies is reported.

[b] P < .05.
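As a rough illustration of how a −2 log likelihood difference maps to a P value, the Python sketch below evaluates the upper tail of a chi-square distribution using only the standard library. The degrees of freedom are an assumption, not something stated in this article: 3 is used here on the guess that a binary covariate adds 3 parameters (accuracy, threshold, and shape) to the HSROC model. The function name `chi2_sf_3df` is hypothetical.

```python
import math


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom.

    Uses the closed form for 3 df:
    erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
    """
    if x < 0:
        raise ValueError("test statistic must be non-negative")
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# -2LL differences taken from Table 2; df = 3 is an assumption (see lead-in).
for label, stat in [("M-CHAT version", 1.35), ("Follow-up", 11.29)]:
    print(f"{label}: -2LL difference = {stat}, P = {chi2_sf_3df(stat):.2f}")
```

Under this assumption the two statistics give P values of about .72 and .01, in line with the corresponding rows of Table 2. A covariate with more than 2 levels, such as sample size, would need more degrees of freedom and therefore a different tail function.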

Figure 2. Forest Plot of Modified Checklist for Autism in Toddlers and Modified Checklist for Autism in Toddlers, Revised With Follow-up (Original and Revised Versions Combined) Sorted by Sensitivity.

FN indicates false negative; FP, false positive; TN, true negative; and TP, true positive.

[a] Refers to sample 2 of Srisinghasongkram et al.53

Figure 3. Sensitivity and Specificity Estimates Grouped by Strategy Used for Identifying Missed Cases, Likelihood Level, and Size of Study Sample.

FN indicates false negative.

Analyses testing the association of covariates (likelihood level of sample, case confirmation strategy classification, sample size, M-CHAT version, Follow-up use, and language) suggested several significant covariates (overall: DOR, 75.40 [95% CI, 35.56-159.87]) (Table 2; eFigure 2 in Supplement 1). All covariates examined except for the M-CHAT(-R/F) version (original vs revised) were statistically significant (Table 2; eFigure 3 in Supplement 1). Mixed and unknown subgroups were excluded from the models presented. When included, sensitivity was not significantly different, specificity was identical, and DOR was slightly higher (78.71 [95% CI, 38.11-162.57]). The LL group reported a DOR more than 30 times greater than that of the HL group (334.78 [95% CI, 156.69-715.31] vs 10.93 [95% CI, 4.45-26.83]) (Table 2; eFigure 4 in Supplement 1). Sensitivity was similar between the 2 groups (0.83 vs 0.82), but specificity of the LL group was higher than that of the HL group (0.99 vs 0.70). Higher DOR was achieved by the concurrent rather than the prospective case confirmation strategy (85.84 [95% CI, 37.56-196.16] vs 38.71 [95% CI, 8.38-178.82]) (Table 2; eFigure 5 in Supplement 1). Although the concurrent group achieved a higher sensitivity than the prospective group (0.86 vs 0.57), there were no large differences in specificity based on the detection strategy (0.93 vs 0.97). Diagnostic accuracy increased with sample size; the small sample size group (<500) had a DOR 5 times smaller than that of the medium sample size group (500-5000), which was, in turn, approximately 6 times smaller than that of the large sample size group (>5000). The DOR was 17.31 (95% CI, 6.48-46.21), 88.94 (95% CI, 29.58-267.43), and 553.41 (95% CI, 159.34-1922.06), respectively (Table 2; eFigure 6 in Supplement 1). Specificity increased with increasing sample size (0.78, 0.96, and 0.99, respectively), but sensitivity did not (0.83, 0.80, and 0.85, respectively). 
Higher DOR, sensitivity, and specificity were observed for use of the Follow-up compared with initial screening only (DOR, 182.87 [95% CI, 71.12-470.22] vs 24.61 [95% CI, 9.01-67.23]; sensitivity, 0.83 vs 0.82; specificity, 0.97 vs 0.84) (Table 2; eFigure 7 in Supplement 1). Higher DOR, sensitivity, and specificity were observed for studies conducted in languages other than predominantly English (DOR, 361.76 [95% CI, 145.80-897.58] vs 20.19 [95% CI, 7.85-51.91]; sensitivity, 0.89 vs 0.78; specificity, 0.98 vs 0.85) (Table 2; eFigure 8 in Supplement 1).
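The fold differences quoted above follow directly from the pooled DORs in Table 2. The short Python sketch below reproduces that arithmetic; the dictionary keys are hypothetical labels chosen for this example, and the ratios compare point estimates only, ignoring the wide confidence intervals.

```python
# Pooled DOR point estimates from Table 2 (labels are illustrative).
pooled_dor = {
    "low_likelihood": 334.78,
    "high_likelihood": 10.93,
    "small_sample": 17.31,    # <500
    "medium_sample": 88.94,   # 500-5000
    "large_sample": 553.41,   # >5000
}

ll_vs_hl = pooled_dor["low_likelihood"] / pooled_dor["high_likelihood"]
medium_vs_small = pooled_dor["medium_sample"] / pooled_dor["small_sample"]
large_vs_medium = pooled_dor["large_sample"] / pooled_dor["medium_sample"]

print(f"LL vs HL: {ll_vs_hl:.1f}x")
print(f"Medium vs small sample: {medium_vs_small:.1f}x")
print(f"Large vs medium sample: {large_vs_medium:.1f}x")
```

The ratios come out at roughly 30.6, 5.1, and 6.2, matching the "more than 30 times", "5 times", and "approximately 6 times" statements in the text.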

Systematic Review Results

Two factors were examined in an insufficient number of studies to be assessed as covariates in meta-analyses: screening age and repeated screening. We therefore provide a descriptive review of these factors here. Although M-CHAT(-R/F) was initially validated for children between 16 and 30 months of age, it has been used with children up to 48 months of age. Only 5 included studies directly compared M-CHAT(-R/F) psychometric values for younger children within the validated age range (up to 30 months) vs children older than 30 months (eTable 5 in Supplement 1). Descriptively, all 5 studies found slightly lower sensitivity for children older than 30 months compared with younger than 30 months.18,24,36,42,48 However, specificity differed, with 1 study reporting lower specificity with children older than 30 months,18 whereas other studies showed higher specificity for the same age group.24,36,42,48

Only 6 of the 51 studies (12%) reported repeated screenings at 18 and 24 months, as is recommended by the AAP. Three of the repeated screening studies used concurrent false-negative case detection strategies,22,46,61 1 study used a prospective false-negative case detection strategy,41 and 2 studies used prospective record review.8,9 Three studies did not report enough information to allow for comparison of sensitivity for single compared with repeated screenings.22,41,46 The remaining 3 studies demonstrated 11% to 45% higher sensitivity across repeated screenings compared with sensitivity based on a single screening, without decreasing specificity (eTable 6 in Supplement 1).

Discussion

The AAP identifies sensitivity and specificity above 70% to be acceptable for screening measures.69 Estimates of these properties for the M-CHAT(-R/F) vary widely according to study methods and sample characteristics, and this variability affects use in both clinical and research settings. For example, after 2 large prospective studies reported low sensitivity for M-CHAT(-R/F),8,9 a recent study70 used an alternative screener that lacked long-term outcome data and had been tested only in small samples, which are insufficient for thorough validation. Overall, across the studies identified in this systematic review and meta-analysis, sensitivity and specificity of the M-CHAT(-R/F) were found to be strong, with pooled values of 0.83 and 0.94, respectively. The variability of the estimates of sensitivity and specificity of M-CHAT(-R/F), however, highlights a need to consider factors that influence screening performance.

Case confirmation strategies used to identify missed cases were closely associated with sensitivity estimates. Weak concurrent strategies used to detect false-negative results likely inflated sensitivity compared with rigorous false-negative strategies better equipped to identify missed cases. Prospective strategies, on the other hand, likely conflate missed children who may not have had measurable symptoms during toddlerhood (ie, children whose ASD symptoms emerged later in childhood) with children truly missed by screening (who should have been detectable), potentially because of parents’ inaccurate report or their not being willing or ready to endorse an increased likelihood of ASD behaviors during screening.71 Studies show that for some children with ASD, symptoms are subtle early in development or show a prolonged course of symptom development,72 consistent with the theory that although brain development may be different before birth, measurable symptoms of neurodevelopmental disorders may emerge gradually because children are expected to demonstrate more sophisticated behavior as they grow older.73 Although an advantage of reviewing medical records or registries lies in obtaining information over a range of ages, this broad age range may also be associated with ascertainment differences, given expected differences in medical record content for preschoolers compared with older children. Furthermore, prospective record review often includes community diagnoses that may not be as rigorous as diagnostic procedures in research74; therefore, some children identified through record review would not meet more stringent research classification.

Another factor associated with variability in M-CHAT(-R/F)’s specificity is the classification of study samples based on ASD likelihood. The M-CHAT(-R/F) casts a broad net for children in need of expert differential diagnosis. Therefore, it is not surprising that specificity is lower for HL samples than for LL samples because HL samples include many children with other developmental delays or co-occurring conditions. However, sensitivity (the ability to detect ASD when it is truly present) is equally high for both groups. The lower specificity (the ability to identify individuals without ASD) in HL samples supports the recommendation for comprehensive evaluation to assess differential diagnoses when M-CHAT(-R/F) is used, particularly in HL samples. Similarly, because the M-CHAT(-R/F) was validated for children between 16 and 30 months of age, it is not surprising that studies found slightly lower sensitivity for children older than 30 months than for those aged 30 months or younger, although M-CHAT(-R/F) has utility for children up to 48 months of age.
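To make the two definitions above concrete, the following minimal Python sketch computes sensitivity and specificity from the cells of a 2x2 screening table. The counts and the names Counts, sensitivity, and specificity are hypothetical, chosen only for illustration; they are not data from any included study. The two example tables are constructed to share the same sensitivity while differing in false positives, mirroring the pattern described for HL vs LL samples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Cells of a 2x2 screening table (hypothetical illustration)."""
    tp: int  # screened positive, ASD confirmed
    fn: int  # screened negative, ASD confirmed (missed case)
    fp: int  # screened positive, no ASD
    tn: int  # screened negative, no ASD


def sensitivity(c: Counts) -> float:
    """Proportion of children with ASD who screened positive."""
    return c.tp / (c.tp + c.fn)


def specificity(c: Counts) -> float:
    """Proportion of children without ASD who screened negative."""
    return c.tn / (c.tn + c.fp)


# Hypothetical counts, NOT taken from the meta-analysis.
low_likelihood = Counts(tp=40, fn=10, fp=50, tn=900)
high_likelihood = Counts(tp=80, fn=20, fp=60, tn=140)

for label, c in [("LL", low_likelihood), ("HL", high_likelihood)]:
    print(f"{label}: sensitivity={sensitivity(c):.2f}, "
          f"specificity={specificity(c):.2f}")
```

In this illustration the HL table has many more screen-positive children without ASD relative to its size, which lowers specificity while leaving sensitivity unchanged.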

Repeated screening is also an important factor in maximizing sensitivity. However, the results of this systematic review suggest that repeated screening is extremely underused: only 6 of the 51 studies reported systematically screening toddlers more than once. The 3 studies for which data allowed direct comparison suggested a large increase in sensitivity with repeated vs single screening, without decreasing specificity. Similarly, other studies have shown that repeated screenings for ASD at 18 and 24 months of age increase the likelihood of identifying children missed by earlier screening,75,76 and rescreening after 18 months of age detects children with ASD who initially screened negative.77 In addition to symptom emergence detected among children at later ages, parents’ limited knowledge of typical developmental milestones, or of ASD-specific symptoms, may be associated with negative screening results at 18 months. For example, some children demonstrate symptoms of ASD at 18 months, even though they screen negative for ASD at that age.78 These findings support the AAP’s recommendation for repeated screening at 18- and 24-month well-child visits; however, most studies did not adopt this recommendation in their study designs.

The studies included in the systematic review and meta-analysis differed in many aspects, including sample size, version of M-CHAT (original vs revised), whether the structured Follow-up was used appropriately for children whose initial scores were in the medium likelihood of ASD range, and language of M-CHAT(-R/F). Larger sample sizes were associated with higher DORs, possibly because of improved statistical stability in larger samples or because more resources were available in larger studies. Use of the Follow-up portion of the M-CHAT(-R/F) significantly improved the tool’s performance, with greatly increased specificity and a more than 7-fold increase in DOR, consistent with findings from the original validation studies of the M-CHAT(-R/F).22,46 Even though use of the Follow-up does not change sensitivity (nor would it be expected to), it greatly reduces false-positive rates, which in turn reduces the burden on tertiary care clinics that receive referrals. This finding highlights the importance and benefit of using the Follow-up portion of the M-CHAT(-R/F) during screening, even though only slightly more than half of the identified studies administered the Follow-up consistently. The structured Follow-up clarifies parents’ endorsements during initial screening, giving them the opportunity to explain behaviors beyond dichotomous response options, which improves accuracy. The improved performance of the M-CHAT(-R/F) in non-English languages is notable, but it is unclear whether other factors account for this finding. For example, non-English administration of M-CHAT(-R/F) appears to occur more often in HL studies. These potential interactions between variables need to be explored in future studies. In addition to the factors explored in the present meta-analysis, factors to explore in future studies include parents’ prior knowledge of ASD, which may be more common in HL compared with LL samples, and parent education, which may account for some of the findings.
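As a rough illustration of why higher specificity at unchanged sensitivity raises the DOR, the following Python sketch uses the standard definition of the diagnostic odds ratio as the ratio of the positive to the negative likelihood ratio. It is a simple point calculation and not the hierarchical model used in the meta-analysis, so its values will not match the modeled DORs. The function name diagnostic_odds_ratio and the "initial only" specificity of 0.70 are assumptions made for the example; only 0.83 and 0.94 are the pooled values reported above.

```python
def diagnostic_odds_ratio(sens: float, spec: float) -> float:
    """DOR = [sens / (1 - sens)] * [spec / (1 - spec)].

    Equivalent to the positive likelihood ratio divided by the
    negative likelihood ratio.
    """
    if not (0 < sens < 1 and 0 < spec < 1):
        raise ValueError("sens and spec must be strictly between 0 and 1")
    return (sens / (1 - sens)) * (spec / (1 - spec))


# Pooled sensitivity and specificity reported in this review.
pooled = diagnostic_odds_ratio(0.83, 0.94)

# Hypothetical comparison: same sensitivity, lower specificity
# (0.70 is an assumed value, not a result from the meta-analysis).
initial_only = diagnostic_odds_ratio(0.83, 0.70)

print(f"DOR at sens=0.83, spec=0.94: {pooled:.1f}")
print(f"DOR at sens=0.83, spec=0.70: {initial_only:.1f}")
print(f"Ratio of the two DORs: {pooled / initial_only:.1f}")
```

Because the specificity term enters as spec / (1 - spec), reducing false positives can multiply the DOR several times over even when sensitivity does not move.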

Limitations

A limitation of this study is that methodological issues were identified in a majority of the included studies, which could have biased reported accuracy measures. In particular, many studies with LL samples did not evaluate every child who was screened with the M-CHAT(-R/F), likely because of feasibility challenges. Children who screened negative and were not evaluated were therefore presumed to have true-negative results for analyses, but it is possible that some cases were missed. Similarly, variation in the reporting of ASD diagnostic criteria and in the type of clinicians who performed assessments was associated with between-study heterogeneity. In addition, the present systematic review was limited to studies published in English-language peer-reviewed journals.
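The effect of presuming unevaluated screen-negative children to be true negatives can be sketched with simple arithmetic. In the Python example below, all counts and the function name observed_vs_corrected_sensitivity are hypothetical; the sketch only shows the direction of the bias and makes no claim about its size in the included studies.

```python
def observed_vs_corrected_sensitivity(
    tp: int, fn_detected: int, fn_unverified: int
) -> tuple[float, float]:
    """Compare sensitivity with and without undetected missed cases.

    tp            -- screen-positive children with confirmed ASD
    fn_detected   -- screen-negative children with ASD who were identified
    fn_unverified -- screen-negative children with ASD who were never
                     evaluated and so were counted as true negatives
    """
    observed = tp / (tp + fn_detected)
    corrected = tp / (tp + fn_detected + fn_unverified)
    return observed, corrected


# Hypothetical counts, NOT taken from any included study.
observed, corrected = observed_vs_corrected_sensitivity(
    tp=45, fn_detected=5, fn_unverified=10
)
print(f"Observed sensitivity:  {observed:.2f}")
print(f"Corrected sensitivity: {corrected:.2f}")
```

Whenever at least one missed case sits among the unevaluated screen-negative children, the observed sensitivity is higher than the corrected value.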

Conclusions

When the M-CHAT(-R/F)’s utility in detecting toddlers at greater likelihood for ASD is evaluated, it is critical to consider study strategies, age of the children, and ASD likelihood status of the children in addition to other factors that tend to vary widely between the existing studies. Although the AAP recommends screening at both 18- and 24-month visits and studies emphasize the added value of repeated screenings, very few studies consistently rescreened participants, highlighting the need for continued effort in dissemination of best practice screening protocols. The results of this systematic review and meta-analysis illuminate important clinical implications for pediatric physicians. Even though no single measure will identify all children at increased likelihood for ASD, M-CHAT(-R/F) shows strong overall sensitivity and specificity, and research with the tool indicates that screening early and at multiple time points is critical to identify children at increased likelihood for ASD who are in need of access to ASD-specific early intervention services.

Overall, the results of this systematic review and meta-analysis highlight strong sensitivity and specificity for the M-CHAT(-R/F) across the 51 study samples, which is critical information for clinicians, researchers, and policy makers alike. However, the wide variability, ranging from poor to excellent screening metrics, highlights differences in how the screener is used, which should be considered when studies are designed. Critically, although the version of M-CHAT (original vs revised) does not significantly affect sensitivity and specificity, use of the Follow-up portion of the M-CHAT(-R/F) significantly reduces false-positive rates, which in turn reduces the burden on diagnostic and intervention systems. Other guidance based on this study’s findings includes emphasizing the importance of referral for comprehensive evaluations to discern symptoms of autism from symptoms observed in other developmental disorders, particularly when M-CHAT(-R/F) is used with HL populations. In addition, the difference in M-CHAT(-R/F)’s sensitivity in concurrent vs prospective studies indicates the need to account for timing and diagnostic rigor when designing studies.

Supplement 1.

eMethods. Diagnostic Accuracy of the M-CHAT(-R/F)

eTable 1. Database Search Terms

eTable 2. QUADAS-2 Description and Adapted Signaling Questions

eTable 3. Quality Assessment of Studies Included in the Systematic Review

eTable 4. Additional Study Characteristics and Psychometric Properties for M-CHAT(-R/F)

eTable 5. Sensitivity and Specificity for Younger and Older Samples

eTable 6. Sensitivity and Specificity for Single and Repeated Screening

eFigure 1. Study Selection Flow Chart Following PRISMA Guidelines

eFigure 2. Overall SROC of M-CHAT(-R/F) (n = 49 Studies)

eFigure 3. SROC Plot of M-CHAT(-R/F) by M-CHAT Version (M-CHAT n = 31, M-CHAT-R n = 18)

eFigure 4. SROC Plot of M-CHAT(-R/F) by Likelihood Level of Sample (Low Likelihood n = 27, High Likelihood n = 22)

eFigure 5. SROC Plot of M-CHAT(-R/F) by Case Confirmation Strategy (Concurrent n = 40, Prospective n = 9)

eFigure 6. SROC Plot of M-CHAT(-R/F) by Sample Size (Small n = 20, Medium n = 17, Large n = 12)

eFigure 7. SROC Plot of M-CHAT(-R/F) With Follow-up vs Initial Only (Initial n = 22, Follow-up n = 27)

eFigure 8. SROC Plot of M-CHAT(-R/F) by Language (English/Primarily English n = 26, Other Language n = 23)

Supplement 2.

Data Sharing Statement

References

  • 1. Ozonoff S, Young GS, Landa RJ, et al. Diagnostic stability in young children at risk for autism spectrum disorder: a Baby Siblings Research Consortium study. J Child Psychol Psychiatry. 2015;56(9):988-998. doi: 10.1111/jcpp.12421
  • 2. Pierce K, Gazestani VH, Bacon E, et al. Evaluation of the diagnostic stability of the early autism spectrum disorder phenotype in the general population starting at 12 months. JAMA Pediatr. 2019;173(6):578-587. doi: 10.1001/jamapediatrics.2019.0624
  • 3. Shaw KA, Maenner MJ, Baio J, et al. Early identification of autism spectrum disorder among children aged 4 years—Early Autism and Developmental Disabilities Monitoring Network, six sites, United States, 2016. MMWR Surveill Summ. 2020;69(3):1-11. doi: 10.15585/mmwr.ss6903a1
  • 4. Elder JH, Kreider CM, Brasher SN, Ansell M. Clinical impact of early diagnosis of autism on the prognosis and parent-child relationships. Psychol Res Behav Manag. 2017;10:283-292. doi: 10.2147/PRBM.S117499
  • 5. Siu AL, Bibbins-Domingo K, Grossman DC, et al.; US Preventive Services Task Force (USPSTF). Screening for depression in adults: US Preventive Services Task Force recommendation statement. JAMA. 2016;315(4):380-387. doi: 10.1001/jama.2015.18392
  • 6. Hyman SL, Levy SE, Myers SM, et al. Identification, evaluation, and management of children with autism spectrum disorder. Pediatrics. 2020;145(1):e20193447. doi: 10.1542/peds.2019-3447
  • 7. Lipkin PH, Macias MM, Baer Chen B, et al. Trends in pediatricians’ developmental screening: 2002–2016. Pediatrics. 2020;145(4):e20190851. doi: 10.1542/peds.2019-0851
  • 8. Carbone PS, Campbell K, Wilkes J, et al. Primary care autism screening and later autism diagnosis. Pediatrics. 2020;146(2):e20192314. doi: 10.1542/peds.2019-2314
  • 9. Guthrie W, Wallis K, Bennett A, et al. Accuracy of autism screening in a large pediatric network. Pediatrics. 2019;144(4):e20183963. doi: 10.1542/peds.2018-3963
  • 10. Robins DL, Fein D, Barton ML, Green JA. The Modified Checklist for Autism in Toddlers: an initial study investigating the early detection of autism and pervasive developmental disorders. J Autism Dev Disord. 2001;31(2):131-144. doi: 10.1023/A:1010738829569
  • 11. Robins DL, Fein D, Barton M. Modified Checklist for Autism in Toddlers–Revised With Follow-up (M-CHAT-R/F). Lineagen; 2009.
  • 12. Levy SE, Wolfe A, Coury D, et al. Screening tools for autism spectrum disorder in primary care: a systematic evidence review. Pediatrics. 2020;145(suppl 1):S47-S59. doi: 10.1542/peds.2019-1895H
  • 13. Petrocchi S, Levante A, Lecciso F. Systematic review of level 1 and level 2 screening tools for autism spectrum disorders in toddlers. Brain Sci. 2020;10(3):180. doi: 10.3390/brainsci10030180
  • 14. Sánchez-García AB, Galindo-Villardón P, Nieto-Librero AB, Martín-Rodero H, Robins DL. Toddler screening for autism spectrum disorder: a meta-analysis of diagnostic accuracy. J Autism Dev Disord. 2019;49(5):1837-1852. doi: 10.1007/s10803-018-03865-2
  • 15. Yuen T, Penner M, Carter MT, Szatmari P, Ungar WJ. Assessing the accuracy of the Modified Checklist for Autism in Toddlers: a systematic review and meta-analysis. Dev Med Child Neurol. 2018;60(11):1093-1100. doi: 10.1111/dmcn.13964
  • 16. Moher D, Liberati A, Tetzlaff J, Altman DG; PRISMA Group. Preferred Reporting Items for Systematic Reviews and Meta-analyses: the PRISMA statement. Int J Surg. 2010;8(5):336-341. doi: 10.1016/j.ijsu.2010.02.007
  • 17. Baduel S, Guillon Q, Afzali MH, Foudon N, Kruck J, Rogé B. The French version of the Modified-Checklist for Autism in Toddlers (M-CHAT): a validation study on a French sample of 24 month-old children. J Autism Dev Disord. 2017;47(2):297-304. doi: 10.1007/s10803-016-2950-y
  • 18. Beacham C, Reid M, Bradshaw J, et al. Screening for autism spectrum disorder: profiles of children who are missed. J Dev Behav Pediatr. 2018;39(9):673-682. doi: 10.1097/DBP.0000000000000607
  • 19. Canal-Bedia R, García-Primo P, Martín-Cilleros MV, et al. Modified Checklist for Autism in Toddlers: cross-cultural adaptation and validation in Spain. J Autism Dev Disord. 2011;41(10):1342-1351. doi: 10.1007/s10803-010-1163-z
  • 20. Chang Z, Di Martino JM, Aiello R, et al. Computational methods to measure patterns of gaze in toddlers with autism spectrum disorder. JAMA Pediatr. 2021;175(8):827-836. doi: 10.1001/jamapediatrics.2021.0530
  • 21. Charman T, Baird G, Simonoff E, et al. Testing two screening instruments for autism spectrum disorder in UK community child health services. Dev Med Child Neurol. 2016;58(4):369-375. doi: 10.1111/dmcn.12874
  • 22. Chlebowski C, Robins DL, Barton ML, Fein D. Large-scale use of the modified checklist for autism in low-risk toddlers. Pediatrics. 2013;131(4):e1121-e1127. doi: 10.1542/peds.2012-1525
  • 23. Choueiri R, Lindenbaum A, Ravi M, Robsky W, Flahive J, Garrison W. Improving early identification and access to diagnosis of autism spectrum disorder in toddlers in a culturally diverse community with the Rapid Interactive Screening Test for Autism in Toddlers. J Autism Dev Disord. 2021;51(11):3937-3945. doi: 10.1007/s10803-020-04851-3
  • 24. Christopher K, Bishop S, Carpenter LA, Warren Z, Kanne S. The implications of parent-reported emotional and behavioral problems on the Modified Checklist for Autism in Toddlers. J Autism Dev Disord. 2021;51(3):884-891. doi: 10.1007/s10803-020-04469-5
  • 25. Coelho-Medeiros ME, Bronstein J, Aedo K, et al. M-CHAT-R/F validation as a screening tool for early detection in children with autism spectrum disorder. Article in Spanish. Rev Chil Pediatr. 2019;90(5):492-499. doi: 10.32641/rchped.v90i5.703
  • 26. Dereu M, Raymaekers R, Warreyn P, Schietecatte I, Meirsschaut M, Roeyers H. Can child care workers contribute to the early detection of autism spectrum disorders? a comparison between screening instruments with child care workers versus parents as informants. J Autism Dev Disord. 2012;42(5):781-796. doi: 10.1007/s10803-011-1307-9
  • 27. DiGuiseppi C, Hepburn S, Davis JM, et al. Screening for autism spectrum disorders in children with Down syndrome: population prevalence and screening test characteristics. J Dev Behav Pediatr. 2010;31(3):181-191. doi: 10.1097/DBP.0b013e3181d5aa6d
  • 28. Dudova I, Markova D, Kasparova M, et al. Comparison of three screening tests for autism in preterm children with birth weights less than 1,500 grams. Neuropsychiatr Dis Treat. 2014;10:2201-2208. doi: 10.2147/NDT.S72921
  • 29. Eaves LC, Wingert H, Ho HH. Screening for autism: agreement with diagnosis. Autism. 2006;10(3):229-242. doi: 10.1177/1362361306063288
  • 30. Guo C, Luo M, Wang X, et al. Reliability and validity of the Chinese version of Modified Checklist for Autism in Toddlers, Revised, With Follow-up (M-CHAT-R/F). J Autism Dev Disord. 2019;49(1):185-196. doi: 10.1007/s10803-018-3682-y
  • 31. Harris JF, Coffield CN, Janvier YM, Mandell D, Cidav Z. Validation of the developmental check-in tool for low-literacy autism screening. Pediatrics. 2021;147(1):e20193659. doi: 10.1542/peds.2019-3659
  • 32. Hoang VM, Le TV, Chu TTQ, et al. Prevalence of autism spectrum disorders and their relation to selected socio-demographic factors among children aged 18-30 months in northern Vietnam, 2017. Int J Ment Health Syst. 2019;13(1):29. doi: 10.1186/s13033-019-0285-8
  • 33. Inada N, Koyama T, Inokuchi E, Kuroda M, Kamio Y. Reliability and validity of the Japanese version of the Modified Checklist for Autism in Toddlers (M-CHAT). Res Autism Spectr Disord. 2011;5(1):330-336. doi: 10.1016/j.rasd.2010.04.016
  • 34. Jonsdottir SL, Saemundsen E, Jonsson BG, Rafnsson V. Validation of the Modified Checklist for Autism in Toddlers, Revised With Follow-up in a population sample of 30-month-old children in Iceland: a prospective approach. J Autism Dev Disord. 2022;52(4):1507-1522. doi: 10.1007/s10803-021-05053-1
  • 35. Kamio Y, Inada N, Koyama T, Inokuchi E, Tsuchiya K, Kuroda M. Effectiveness of using the Modified Checklist for Autism in Toddlers in two-stage screening of autism spectrum disorder at the 18-month health check-up in Japan. J Autism Dev Disord. 2014;44(1):194-203. doi: 10.1007/s10803-013-1864-1
  • 36. Kanne SM, Carpenter LA, Warren Z. Screening in toddlers and preschoolers at risk for autism spectrum disorder: evaluating a novel mobile-health screening tool. Autism Res. 2018;11(7):1038-1049. doi: 10.1002/aur.1959
  • 37. Kara B, Mukaddes NM, Altınkaya I, Güntepe D, Gökçay G, Özmen M. Using the Modified Checklist for Autism in Toddlers in a well-child clinic in Turkey: adapting the screening method based on culture and setting. Autism. 2014;18(3):331-338. doi: 10.1177/1362361312467864
  • 38. Keehn RM, Tang Q, Swigonski N, Ciccarelli M. Associations among referral concerns, screening results, and diagnostic outcomes of young children assessed in a statewide early autism evaluation network. J Pediatr. 2021;233:74-81. doi: 10.1016/j.jpeds.2021.02.063
  • 39. Kerub O, Haas EJ, Meiri G, Davidovitch N, Menashe I. A comparison between two screening approaches for ASD among toddlers in Israel. J Autism Dev Disord. 2020;50(5):1553-1560. doi: 10.1007/s10803-018-3711-x
  • 40. Kim SH, Joseph RM, Frazier JA, et al. Predictive validity of the Modified Checklist for Autism in Toddlers (M-CHAT) born very preterm. J Pediatr. 2016;178:101-107. doi: 10.1016/j.jpeds.2016.07.052
  • 41. Kleinman JM, Robins DL, Ventola PE, et al. The Modified Checklist for Autism in Toddlers: a follow-up study investigating the early detection of autism spectrum disorders. J Autism Dev Disord. 2008;38(5):827-839. doi: 10.1007/s10803-007-0450-9
  • 42. Koh HC, Lim SH, Chan GJ, et al. The clinical utility of the Modified Checklist for Autism in Toddlers with high risk 18–48 month old children in Singapore. J Autism Dev Disord. 2014;44(2):405-416. doi: 10.1007/s10803-013-1880-1
  • 43. Magán-Maganto M, Canal-Bedia R, Hernández-Fabián A, et al. Spanish cultural validation of the Modified Checklist for Autism in Toddlers, Revised. J Autism Dev Disord. 2020;50(7):2412-2423. doi: 10.1007/s10803-018-3777-5
  • 44. Matson JL, Kozlowski AM, Fitzgerald ME, Sipes M. True versus false positives and negatives on the Modified Checklist for Autism in Toddlers. Res Autism Spectr Disord. 2013;7(1):17-22. doi: 10.1016/j.rasd.2012.02.011
  • 45. Oner O, Munir KM. Modified Checklist for Autism in Toddlers Revised (MCHAT-R/F) in an urban metropolitan sample of young children in Turkey. J Autism Dev Disord. 2020;50(9):3312-3319. doi: 10.1007/s10803-019-04160-4
  • 46. Robins DL, Casagrande K, Barton M, Chen CMA, Dumont-Mathieu T, Fein D. Validation of the Modified Checklist for Autism in Toddlers, Revised With Follow-up (M-CHAT-R/F). Pediatrics. 2014;133(1):37-45. doi: 10.1542/peds.2013-1813
  • 47. Salim H, Soetjiningsih S, Windiani IGAT, Widiana IGR. Validation of the Indonesian version of Modified Checklist for Autism in Toddlers: a diagnostic study. Paediatr Indones. 2020;60(3):160-166. doi: 10.14238/pi60.3.2020.160-6
  • 48. Salisbury LA, Nyce JD, Hannum CD, Sheldrick RC, Perrin EC. Sensitivity and specificity of 2 autism screeners among referred children between 16 and 48 months of age. J Dev Behav Pediatr. 2018;39(3):254-258. doi: 10.1097/DBP.0000000000000537
  • 49. Samadi SA, McConkey R. Screening for autism in Iranian preschoolers: contrasting M-CHAT and a scale developed in Iran. J Autism Dev Disord. 2015;45(9):2908-2916. doi: 10.1007/s10803-015-2454-1
  • 50. Schjølberg S, Shic F, Volkmar FR, et al. What are we optimizing for in autism screening? examination of algorithmic changes in the M-CHAT. Autism Res. 2022;15(2):296-304. doi: 10.1002/aur.2643
  • 51. Smith NJ, Sheldrick RC, Perrin EC. An abbreviated screening instrument for autism spectrum disorders. Infant Ment Health J. 2013;34(2):149-155. doi: 10.1002/imhj.21356
  • 52. Snow AV, Lecavalier L. Sensitivity and specificity of the Modified Checklist for Autism in Toddlers and the Social Communication Questionnaire in preschoolers suspected of having pervasive developmental disorders. Autism. 2008;12(6):627-644. doi: 10.1177/1362361308097116
  • 53. Srisinghasongkram P, Pruksananonda C, Chonchaiya W. Two-step screening of the Modified Checklist for Autism in Toddlers in Thai children with language delay and typically developing children. J Autism Dev Disord. 2016;46(10):3317-3329. doi: 10.1007/s10803-016-2876-4
  • 54. Sturner R, Howard B, Bergmann P, et al. Autism screening with online decision support by primary care pediatricians aided by M-CHAT/F. Pediatrics. 2016;138(3):e20153036. doi: 10.1542/peds.2015-3036
  • 55. Sturner R, Howard B, Bergmann P, et al. Autism screening at 18 months of age: a comparison of the Q-CHAT-10 and M-CHAT screeners. Mol Autism. 2022;13(1):2. doi: 10.1186/s13229-021-00480-4
  • 56. Taylor CM, Vehorn A, Noble H, Weitlauf AS, Warren ZE. Brief report: can metrics of reporting bias enhance early autism screening measures? J Autism Dev Disord. 2014;44(9):2375-2380. doi: 10.1007/s10803-014-2099-5
  • 57. Toh TH, Tan VWY, Lau PST, Kiyu A. Accuracy of Modified Checklist for Autism in Toddlers (M-CHAT) in detecting autism and other developmental disorders in community clinics. J Autism Dev Disord. 2018;48(1):28-35. doi: 10.1007/s10803-017-3287-x
  • 58. Tsai JM, Lu L, Jeng SF, et al. Validation of the Modified Checklist for Autism in Toddlers, Revised With Follow-up in Taiwanese toddlers. Res Dev Disabil. 2019;85:205-216. doi: 10.1016/j.ridd.2018.11.011
  • 59. Thi Vui L, Duc DM, Thuy Quynh N, et al. Early screening and diagnosis of autism spectrum disorders in Vietnam: a population-based cross-sectional survey. J Public Health Res. 2022;11(2):2460. doi: 10.4081/jphr.2021.2460
  • 60. Weitlauf AS, Vehorn AC, Stone WL, Fein D, Warren ZE. Using the M-CHAT-R/F to identify developmental concerns in a high-risk 18-month-old sibling sample. J Dev Behav Pediatr. 2015;36(7):497-502. doi: 10.1097/DBP.0000000000000194
  • 61. Wieckowski AT, Hamner T, Nanovic S, et al. Early and repeated screening detects autism spectrum disorder. J Pediatr. 2021;234:227-235. doi: 10.1016/j.jpeds.2021.03.009
  • 62. Windiani I, Soetjiningsih S, Adnyana I, Lestari KA. Indonesian Modified Checklist for Autism in Toddler, Revised With Follow-up (M-CHAT-R/F) for autism screening in children at Sanglah General Hospital, Bali-Indonesia. Bali Med J. 2016;5(2):133. doi: 10.15562/bmj.v5i2.240
  • 63. Wong YS, Yang CC, Stewart L, Chiang CH, Wu CC, Iao LS. Use of the Chinese version Modified Checklist for Autism in Toddlers in a high-risk sample in Taiwan. Res Autism Spectr Disord. 2018;49:56-64. doi: 10.1016/j.rasd.2018.01.010
  • 64. Zhang Y, Zhou Z, Xu Q, et al. Screening for autism spectrum disorder in toddlers during the 18- and 24-month well-child visits. Front Psychiatry. 2022;13:879625. doi: 10.3389/fpsyt.2022.879625
  • 65. Whiting PF, Rutjes AW, Westwood ME, et al.; QUADAS-2 Group. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011;155(8):529-536. doi: 10.7326/0003-4819-155-8-201110180-00009
  • 66. Rutter CM, Gatsonis CA. Regression methods for meta-analysis of diagnostic test data. Acad Radiol. 1995;2(suppl 1):S48-S56.
  • 67. Rutter CM, Gatsonis CA. A hierarchical regression approach to meta-analysis of diagnostic test accuracy evaluations. Stat Med. 2001;20(19):2865-2884. doi: 10.1002/sim.942
  • 68. Takwoingi Y, Deeks J. MetaDAS: a SAS macro for meta-analysis of diagnostic accuracy studies: quick reference and worked example. Accessed October 1, 2022. https://methods.cochrane.org/sdt/sites/methods.cochrane.org.sdt/files/uploads/MetaDAS%20Quick%20Reference%20v1.3%20May%202012.pdf
  • 69. Council on Children With Disabilities; Section on Developmental Behavioral Pediatrics; Bright Futures Steering Committee; Medical Home Initiatives for Children With Special Needs Project Advisory Committee. Identifying infants and young children with developmental disorders in the medical home: an algorithm for developmental surveillance and screening. Pediatrics. 2006;118(1):405-420. doi: 10.1542/peds.2006-1231
  • 70. Campbell K, Carbone PS, Liu D, Stipelman CH. Improving autism screening and referrals with electronic support and evaluations in primary care. Pediatrics. 2021;147(3):e20201609. doi: 10.1542/peds.2020-1609
  • 71. Robins DL. How do we determine the utility of screening tools? Autism. 2020;24(2):271-273. doi: 10.1177/1362361319894170
  • 72. Ozonoff S, Young GS, Brian J, et al. Diagnosis of autism spectrum disorder after age 5 in children evaluated longitudinally since infancy. J Am Acad Child Adolesc Psychiatry. 2018;57(11):849-857. doi: 10.1016/j.jaac.2018.06.022
  • 73. Dennis M, Spiegler BJ, Simic N, et al. Functional plasticity in childhood brain disorders: when, what, how, and whom to assess. Neuropsychol Rev. 2014;24(4):389-408. doi: 10.1007/s11065-014-9261-x
  • 74. Hausman-Kedem M, Kosofsky BE, Ross G, et al. Accuracy of reported community diagnosis of autism spectrum disorder. J Psychopathol Behav Assess. 2018;40(3):367-375. doi: 10.1007/s10862-018-9642-1
  • 75. Barton ML, Dumont-Mathieu T, Fein D. Screening young children for autism spectrum disorders in primary practice. J Autism Dev Disord. 2012;42(6):1165-1174. doi: 10.1007/s10803-011-1343-5
  • 76. Crais ER, Watson LR. Challenges and opportunities in early identification and intervention for children at-risk for autism spectrum disorders. Int J Speech Lang Pathol. 2014;16(1):23-29. doi: 10.3109/17549507.2013.862860
  • 77. Dai YG, Miller LE, Ramsey RK, Robins DL, Fein DA, Dumont-Mathieu T. Incremental utility of 24-month autism spectrum disorder screening after negative 18-month screening. J Autism Dev Disord. 2020;50(6):2030-2040. doi: 10.1007/s10803-019-03959-5
  • 78. Øien RA, Schjølberg S, Volkmar FR, et al. Clinical features of children with autism who passed 18-month screening. Pediatrics. 2018;141(6):e20173596. doi: 10.1542/peds.2017-3596

Articles from JAMA Pediatrics are provided here courtesy of American Medical Association
