Overall prognosis of preschool autism spectrum disorder diagnoses

Amanda Brignell; Rachael C Harwood; Tamara May; Susan Woolfenden; Alicia Montgomery; Alfonso Iorio; Katrina Williams

doi:10.1002/14651858.CD012749.pub2

2022 Sep 28;2022(9):CD012749. doi: 10.1002/14651858.CD012749.pub2


Editor: Cochrane Developmental, Psychosocial and Learning Problems Group

PMCID: PMC9516883 PMID: 36169177

Abstract

Background

Autism spectrum disorder is a neurodevelopmental disorder characterised by social communication difficulties, restricted interests and repetitive behaviours. The clinical pathway for children with a diagnosis of autism spectrum disorder is varied, and current research suggests some children may not continue to meet diagnostic criteria over time.

Objectives

The primary objective of this review was to synthesise the available evidence on the proportion of preschool children who have a diagnosis of autism spectrum disorder at baseline (diagnosed before six years of age) who continue to meet diagnostic criteria at follow‐up one or more years later (up to 19 years of age).

Search methods

We searched MEDLINE, Embase, PsycINFO, and eight other databases in October 2017 and ran top‐up searches up to July 2021. We also searched reference lists of relevant systematic reviews.

Selection criteria

Two review authors independently assessed prospective and retrospective follow‐up studies that used the same measure and process within studies to diagnose autism spectrum disorder at baseline and follow‐up. Studies were required to have at least one year of follow‐up and contain at least 10 participants. Participants were all aged less than six years at baseline assessment and followed up before 19 years of age.

Data collection and analysis

We extracted data on study characteristics and the proportion of children diagnosed with autism spectrum disorder at baseline and follow‐up. We also collected information on change in scores on measures that assess the dimensions of autism spectrum disorder (i.e. social communication and restricted interests and repetitive behaviours). Two review authors independently extracted data on study characteristics and assessed risk of bias using a modified quality in prognosis studies (QUIPS) tool. We conducted a random‐effects meta‐analysis or narrative synthesis, depending on the type of data available. We also conducted prognostic factor analyses to explore factors that may predict diagnostic outcome.

Main results

In total, 49 studies met our inclusion criteria and 42 of these (11,740 participants) had data that could be extracted. Of the 42 studies, 25 (60%) were conducted in North America, 13 (31%) were conducted in Europe and the UK, and four (10%) in Asia. Most (52%) studies were published before 2014. The mean age of the participants was 3.19 years (range 1.13 to 5.0 years) at baseline and 6.12 years (range 3.0 to 12.14 years) at follow‐up. The mean length of follow‐up was 2.86 years (range 1.0 to 12.41 years). The majority of the children were boys (81%), and just over half (60%) of the studies primarily included participants with intellectual disability (intelligence quotient < 70). The mean sample size was 272 (range 10 to 8564). Sixty‐nine per cent of studies used one diagnostic assessment tool, 24% used two tools and 7% used three or more tools. Diagnosis was decided by a multidisciplinary team in 41% of studies. No data were available for the outcomes of social communication and restricted and repetitive behaviours and interests.

Of the 42 studies with available data, we were able to synthesise data from 34 studies (69% of all included studies; n = 11,129) in a meta‐analysis. In summary, 92% (95% confidence interval 89% to 95%) of participants continued to meet diagnostic criteria for autism spectrum disorder from baseline to follow‐up one or more years later; however, the quality of the evidence was judged as low due to study limitations and inconsistency. The majority of the included studies (95%) were rated at high risk of bias. We were unable to explore the outcomes of change in social communication and restricted and repetitive behaviour and interests between baseline and follow‐up as none of the included studies provided separate domain scores at baseline and follow‐up. Details on conflict of interest were reported in 24 studies. Funding support was reported by 30 studies, 12 studies omitted details on funding sources and two studies reported no funding support. Declared funding sources were categorised as government, university or non‐government organisation or charity groups. We considered it unlikely funding sources would have significantly influenced the outcomes, given the nature of prognosis studies.

Authors' conclusions

Overall, we found that nine out of 10 children who were diagnosed with autism spectrum disorder before six years of age continued to meet diagnostic criteria for autism spectrum disorder a year or more later; however, the evidence was uncertain. Confidence in the evidence was rated low using GRADE, due to heterogeneity and risk of bias, and there were few studies that included children diagnosed using a current classification system, such as the fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM‐5) or the eleventh revision of the International Classification of Diseases (ICD‐11). Future studies that are well‐designed, prospective and specifically assess prognosis of autism spectrum disorder diagnoses are needed. These studies should also include contemporary diagnostic assessment methods across a broad range of participants and investigate a range of relevant prognostic factors.

Keywords: Adult; Child; Child, Preschool; Female; Humans; Infant; Male; Young Adult; Autism Spectrum Disorder; Autism Spectrum Disorder/diagnosis; Prognosis; Prospective Studies; Retrospective Studies; Schools

Plain language summary

What proportion of preschool aged children diagnosed with autism spectrum disorder retain their diagnosis one or more years later?

Key messages

‐ Nine out of 10 preschool aged children diagnosed with autism in a research setting may continue to meet diagnostic criteria one or more years later.

‐ Due to a lack of robust evidence, this finding may not generalise to children outside a research setting, and we were not able to identify any child or research study factors that influenced whether a child retained their diagnosis.

‐ Future research should focus on designing a robust study exploring whether a child retains their autism diagnosis over time in clinical practice and what other factors, if any, may change how likely a child is to retain their diagnosis.

What is autism?

Autism (autism spectrum disorder) is a common neurodevelopmental condition that is generally considered to be lifelong. It is characterised by difficulties in social communication, and restricted interests and repetitive behaviours. How much of a challenge these areas present for each individual is highly variable.

How is autism diagnosed?

Autism is diagnosed by assessing whether an individual meets a set of standardised diagnostic criteria.

In children, an autism diagnostic assessment may involve a paediatrician, child psychiatrist, speech pathologist, occupational therapist and psychologist. One or more of these health professionals may observe and ask questions about a child’s social and communication skills, any difficulties in restricted interests and repetitive behaviours, and how they process and respond to sensory information from the world around them. There are diagnostic assessment tools that these professionals can use, alone or in combination, to help make the diagnosis.

What is diagnostic stability, and why is it important?

Diagnostic stability refers to whether an individual retains their diagnosis over time. The diagnostic stability of autism is important to help health professionals, autistic individuals and their families understand how likely it is for a diagnosis of autism spectrum disorder to be lifelong. Additionally, it helps government and community groups to plan what health, education and employment resources are required to support autistic children and their families. Diagnostic stability also helps us to understand whether the characteristics of autistic children and the way that autism spectrum disorder is currently diagnosed influence whether a child continues to meet the criteria for an autism diagnosis over time.

What did we want to find out?

We wanted to find out whether a preschool child who was given a diagnosis of autism spectrum disorder before the age of six years retained their diagnosis at repeat diagnostic assessment one or more years later.

We also wanted to learn more about whether any factors relating to the individual child, the way the child was diagnosed with autism, or the research methods used in the studies, made it more or less likely for the child to continue to meet diagnostic criteria for autism spectrum disorder over time. The factors relating to the individual child included the children's age at the initial and follow‐up diagnostic assessments, their intelligence quotient (IQ) score, their ability to complete daily living tasks for a child of their age (adaptive behaviour score), and their ability to communicate with those around them (language score). Factors relating to the way children were diagnosed included the type of tool or criteria used to make the diagnosis, the length of time between diagnostic assessments, and whether the diagnosis was made by a multidisciplinary team. The factors related to the research methods included the year the study was published and the robustness of the evidence.

What did we do?

We searched for studies looking at preschool aged children diagnosed with autism. We then summarised the results, evaluated the evidence and rated our confidence in the evidence based on factors such as study methods and participation.

What did we find?

In total, 49 studies met our inclusion criteria and 42 of these (11,740 children) had data that could be used. The biggest study had 8564 children and the smallest had 11. These studies were from 13 countries, with 16 from the USA. The average age of the children was three years at their first diagnosis and six years at follow‐up. The average length of follow‐up was 2.86 years.

We found that, in a research setting, nine out of 10 preschool children diagnosed with autism spectrum disorder may keep their diagnosis one or more years later.

What are the limitations of the evidence?

We have little confidence in the evidence because not all the studies provided data about everything that we were interested in, and the studies were done with different types of people and diagnostic assessments.

For the one in 10 children who no longer met diagnostic criteria for an autism diagnosis at follow‐up, we were not able to tell whether they had 'grown out' of their autism because they became more mature over time, or because they had received intervention, or whether the original diagnosis was inaccurate.

How up to date is this evidence?

The evidence is up to date to July 2021.

Summary of findings

Summary of findings 1. Summary of findings.

Proportion of individuals who have a diagnosis of autism spectrum disorder at baseline and continue to meet diagnostic criteria at follow‐up one or more years later
Patient or population: children diagnosed with autism spectrum disorder
Settings: range of settings
Outcomes	Relative effect (95% CI)	Number of participants (studies)	Quality of the evidence (GRADE)	Comments
Proportion with an autism spectrum disorder diagnosis at baseline and follow‐up Follow‐up: > 12 months	0.92 (0.89 to 0.95)	11,105 (34 studies: 1 intervention trial with 1 arm; 1 RCT^a; 2 non‐RCTs^a; 30 TAU or in the community)	⊕⊕⊝⊝ Low^b,c	Limitations (ROB): serious^b; Inconsistency: serious^c; Indirectness: not serious; Imprecision: not serious; Publication/reporting bias: not serious; Effect size: N/A; Dose response gradient: N/A; Confirmatory evidence: N/A. See footnotes below.
Social communication at baseline and follow‐up (mean score) Follow‐up: > 12 months	See comments			None of the included studies provided separate domain scores at baseline and follow‐up
Restricted and repetitive behaviours and interests at baseline and follow‐up (mean score) Follow‐up: > 12 months	See comments			None of the included studies provided separate domain scores at baseline and follow‐up
Definitions of levels of evidence
High: We are very confident that the true prognosis (probability of future events) lies close to that of the estimate
Moderate: We are moderately confident that the true prognosis (probability of future events) is likely to be close to the estimate, but there is a possibility that it is substantially different
Low: Our confidence in the estimate is limited: the true prognosis (probability of future events) may be substantially different from the estimate
Very low: We have very little confidence in the estimate: the true prognosis (probability of future events) is likely to be substantially different from the estimate
CI: Confidence intervals; N/A: Not applicable; RCT(s): Randomised controlled trial(s); ROB: Risk of bias; TAU: Treatment as usual.

Year published	Classification system	Subgroups (as specified in the classification system)
1975	International Classification of Diseases, Ninth Revision, Clinical Modification (ICD‐9‐CM)	Autistic disorder
1980	Diagnostic and Statistical Manual of Mental Disorders, Third Edition (DSM‐III)	PDD: infantile autism, childhood onset PDD and atypical PDD
1987	Diagnostic and Statistical Manual of Mental Disorders, Third Edition, Revised (DSM‐III‐R)	PDD: autistic disorder, PDD‐not otherwise specified (PDD‐NOS)
1994 to 2000	Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM‐IV)	Asperger’s disorder, autistic disorder, PDD‐NOS
1996	International Classification of Diseases, Tenth Revision (ICD‐10)	Childhood autism, Asperger's syndrome, atypical autism, pervasive developmental disorder (PDD) ‐ unspecified
2000 to 2013	Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, Text Revision (DSM‐IV‐TR)	Asperger’s disorder, autistic disorder, PDD‐NOS
2013 to current	Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM‐5)	Autism spectrum disorder
2018	International Classification of Diseases, Eleventh Revision (ICD‐11)	Autism spectrum disorder

Primary publication	Additional publications from the same study
Anderson 2009^a	Anderson 2007, Bedford 2016, Gotham 2012, Gotham 2011, Hus 2011, Lord 1995, Lord 2004, Lord 2012, Luyster 2007, Pickles 2014, Richler 2010, Thurm 2007
Baghdadli 2012	Baghdadli 2018, Baghdadli 2008, Baghdadli 2007, Darrou 2010, Pry 2011, Pry 2012
Bopp 2006	Bopp 2009; Smith 2007
Flanagan 2010	Flanagan 2012
Giserman‐Kiss 2020	Giserman‐Kiss 2018
Moss 2008	Magiati 2007, Magiati 2011a, Magiati 2011b
Qian 2018	Ke 2017, Li 2019
Rivard 2019	Mello 2018
Solomon 2014	Mahoney 2016
Solomon 2016	Solomon 2018, Waizbard‐Bartov 2021
Szatmari 2021	Baribeau 2020, Baribeau 2021, Courchesne 2021, Bennett 2014, Bennett 2015, Georgiades 2014, Georgiades 2021, Szatmari 2015
Venker 2014	Ellis‐Weismer 2015, Davidson 2017, Ray‐Subramanian 2012, Venker 2016

Study	Diagnosis type	N at baseline (% male)	IQ (mean standard score)^a	Adaptive behavior (mean standard score)^a	Language (mean standard score)^a	Age at baseline (years)	Follow‐up duration (years)	Diagnostic tool used at baseline (multidisciplinary or not)	Proportion who met diagnostic criteria at follow‐up
Baghdadli 2012	ASD	152 (82)	< 70	< 70	NR	4.90	3.00	ICD‐10 & CARS (Y)	1.0
Benedetto 2021	ASD	147 (80)	NR	NR	NR	2.3	1	DSM‐5 and ADOS (Y)	0.73
Brian 2016	ASD	18 (72)	> 70	NR	> 70	3.15	6.36	DSM‐IV‐TR & ADOS (N)	0.94
Demb 1989	ASD	12 (75)	< 70	NR	NR	4.50	5.00	DSM‐III & DSM‐III R (N)	0.83
Eaves 2004	ASD	43 (80)	< 70	< 70	NR	2.75	2.25	DSM‐IV, CARS, MDT (Y)	0.93
Elmose 2014	ASD	23 (78)	NR	NR	NR	3.10	8.30	ICD‐10, ADOS (Y)	1.00
Flanagan 2010	ASD	67 (82)	NR	< 70	NR	3.59	1.38	CARS (N)	0.81
Freeman 2004	ASD	59 (81)	< 70	NR	NR	4.00	2.2	DSM IV, CARS (N)	0.97
Gillberg 1990	ASD	25 (68)	< 70	NR	NR	1.13	4.04	DSM‐III‐R (N)	0.92
Giserman‐Kiss 2020	ASD	60 (87)	< 70	NR	< 70	2.31	1.98	ADOS	0.883
Gonzalez 1993	ASD	30 (73)	< 70	NR	NR	4.50	1.00	DSM‐III, DSM‐III‐R, DSM‐IV and ICD 10 (N)	0.97
Hinnebusch 2017	ASD	219 (81)	Both	NR	< 70	2.13	2.16	DSM‐IV, ADOS, CARS (N)	0.83
Kim 2016	ASD	100 (84)	> 70	Both	NR	1.80	1.30	ADOS (Y)	0.93
Klintwall 2015	ASD	70 (89)	> 70	> 70	< 70	1.83	1.36	ADOS G, ADOS T (U)	0.93
Malhi 2011	ASD	77 (83)	< 70	NR	NR	2.48	1.65	CARS (Y)	0.95
Moore 2003	ASD	19 (80)	> 70	NR	< 70	2.83	1.59	ADI‐R (Y)	1.00
Moss 2008	ASD	35 (91)	< 70	< 70	< 70	3.5	7.00	ADI‐R (N)	0.80
Ozonoff 2015	ASD	79 (NR)	NR	NR	NR	2	1	ADI‐R, DSM IV, best clinical estimate (N)	0.82
Paul 2008	ASD	37 (NR)	> 70	> 70	< 70	1.82	1.09	ADOS (Y)	1.00
Qian 2018	ASD	37 (86)	< 70	NR	NR	2.57	2	DSM IV TR; CARS ADI‐R (N)	1.00
Robain 2020	ASD	60 (100)	> 70	NR	NR	3	1	DSM 5 ADOS (N)	1.00
Santocchi 2012	ASD	98 (NR)	NR	NR	NR	3.25	1.75	ADOS, CARS	0.86
Sheinkopf 1998	ASD	11 (NR)	< 70	NR	NR	2.94	1.51	DSM‐III (Y)	1.00
Soke 2011	AD	28 (79)	< 70	NR	NR	2.75	2.08	ADI‐R (Y)	0.89
Solomon 2014	ASD	55 (84)	< 70	NR	NR	4.21	1.00	ADOS (U)	0.78
Solomon 2016	ASD	102 (80)	> 70	Both	Both	2.86	2.76	ADOS (N)	0.95
Spjut Jansson 2016	ASD	71 (79)	Both	Both	NR	3.03	2.00	ADOS, DISCO, ADI‐R (Y)	0.93
Sullivan 2010	ASD	75 (83)	< 70	< 70	NR	3.94	2.18	CARS (N)	0.53
Takeda 2007	ASD	126 (81)	< 70	NR	NR	2.62	2.90	ICD‐10, CARS (N)	1.00
Venker 2014	ASD	129 (87)	> 70	> 70	Both	2.80	5.85	DSM‐IV, ADOS (Y)	1.00
Wu 2016	ASD	8564 (83)	NR	NR	NR	3.67	1.43	DSM‐IV‐TR file record review (N)	0.91
Zappella 1990	AD	15 (87)	> 70	Both	NR	4.50	1.83	DSM‐III (N)	0.60
Zappella 2010	ASD	534 (84)	NR	NR	NR	5.00	2.67	DSM‐IV‐TR (U)	0.93
Zwaigenbaum 2015	ASD	23 (69)	NR	> 70	NR	1.50	1.50	DSM‐IV‐TR (N)	0.83
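
The pooled proportions reported in this review come from a random‐effects meta‐analysis. As a rough illustration of how such pooling works, the sketch below applies a DerSimonian‐Laird random‐effects model to arcsine‐square‐root transformed proportions for a handful of rows from the table above. The choice of transformation, the use of N at baseline as the denominator (the review's denominator is the number assessed at follow‐up, which this table does not show), and the function name `pool_proportions` are all assumptions made for illustration. This is not the review authors' analysis and will not reproduce their estimate.

```python
import math

def _back_transform(t):
    """Inverse of the arcsine-square-root transformation."""
    return math.sin(t) ** 2

def pool_proportions(studies):
    """DerSimonian-Laird random-effects pooling on the arcsine-square-root scale.

    studies: list of (proportion, n) tuples.
    Returns (pooled_proportion, ci_low, ci_high, i_squared_percent).
    """
    # Transform each proportion; the variance 1/(4n) does not depend on p,
    # so proportions of exactly 1.0 need no continuity correction.
    y = [math.asin(math.sqrt(p)) for p, n in studies]
    v = [1.0 / (4.0 * n) for p, n in studies]

    # Fixed-effect (inverse-variance) estimate and Cochran's Q.
    w = [1.0 / vi for vi in v]
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    df = len(studies) - 1

    # Between-study variance (tau^2), truncated at zero.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights, pooled estimate and 95% CI on the transformed scale.
    w_re = [1.0 / (vi + tau2) for vi in v]
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    lo = max(0.0, y_re - 1.96 * se)
    hi = min(math.pi / 2, y_re + 1.96 * se)

    # I^2: share of total variability attributed to between-study heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    return _back_transform(y_re), _back_transform(lo), _back_transform(hi), i2

# (proportion at follow-up, N at baseline) for Benedetto 2021, Brian 2016,
# Eaves 2004, Sullivan 2010 and Takeda 2007; baseline N is an assumed denominator.
rows = [(0.73, 147), (0.94, 18), (0.93, 43), (0.53, 75), (1.00, 126)]
pooled, lo, hi, i2 = pool_proportions(rows)
print(f"pooled = {pooled:.2f} (95% CI {lo:.2f} to {hi:.2f}), I^2 = {i2:.1f}%")
```

Working on the transformed scale keeps the confidence limits inside 0 to 1 after back‐transformation, which matters here because several studies report a proportion of 1.00.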

Domain		Relative effect (95% CIs)	No. of participants (studies)	I²
Age at baseline	0 to 2 years	0.94 (0.88 to 0.98)	251 (5 studies)	52.64%, P = 0.08
	2 to 3 years	0.92 (0.88 to 0.95)	9989 (22 studies)	90.17%, P < 0.01
	4 to 5 years	0.91 (0.76 to 0.99)	152 (5 studies)	90.48%, P < 0.01
	5 to 6 years	0.93 (0.90 to 0.95)	534 (1 study)	‐
Age at follow‐up	< 4 years	0.89 (0.79 to 0.96)	443 (6 studies)	86.80%, P < 0.01
	4 to 6 years	0.92 (0.88 to 0.95)	9794 (21 studies)	87.88%, P < 0.01
	7 to 12 years	0.96 (0.89 to 1.00)	868 (7 studies)	88.18%, P < 0.01
Duration of follow‐up	1 to 2 years	0.91 (0.88 to 0.94)	10,745 (27 studies)	87.86%, P < 0.01
	2 to 5 years	0.99 (0.92 to 1.00)	293 (4 studies)	78.16%, P < 0.01
	6 to 17 years	0.92 (0.77 to 1.00)	67 (3 studies)	‐
Decade of publication	1980 to 1989	0.83 (0.55 to 0.95)	12 (1 study)	‐
	1990 to 1999	0.91 (0.74 to 1.00)	82 (4 studies)	73.16%, P = 0.01
	2000 to 2009	0.98 (0.93 to 1.00)	479 (7 studies)	80.57%, P < 0.01
	2010 to 2019	0.90 (0.87 to 0.93)	10,273 (19 studies)	86.84%, P < 0.01
	2020 to 2029	0.90 (0.68 to 1.00)	259 (3 studies)	‐
Intelligence^a	< 70	0.93 (0.85 to 0.98)	793 (15 studies)	90.88%, P < 0.01
	> 70	0.97 (0.92 to 1.00)	502 (9 studies)	77.54%, P < 0.01
	Both < 70 and > 70	0.86 (0.81 to 0.89)	289 (2 studies)	‐
Language^a	< 70	0.92 (0.84 to 0.98)	382 (6 studies)	79.65%, P < 0.01
	> 70	0.94 (0.74 to 0.99)	18 (1 study)	‐
	Both	0.98 (0.96 to 1.00)	205 (2 studies)	‐
Adaptive behaviour^a	< 70	0.85 (0.60 to 0.99)	300 (5 studies)	96.33%, P < 0.01
	> 70	0.97 (0.80 to 1.00)	233 (4 studies)	83.28%, P < 0.01
	Both	0.91 (0.82 to 0.97)	283 (4 studies)	73.88%, P = 0.01
Multidisciplinary assessment	Yes	0.97 (0.91 to 1.00)	767 (13 studies)	87.97%, P < 0.01
Multidisciplinary assessment	No	0.88 (0.83 to 0.93)	9468 (16 studies)	89.46%, P < 0.01

Column heading	Definition
Study number	‐
Author	First author (surname and first initial)
Country of publication	‐
Year of publication	‐
Description of study	Study description, prospective cohort, retrospective cohort, assessment of outcome, controlled, with/without intervention, aim of the study
Study population/group	Clinic versus population versus clinical drawn from a broad population base
Sampling frame	Description of where sample was collected from
Study sample	Description of baseline study sample
Inclusion/exclusion criteria	Participants that were eligible for study are described
Adequacy of participation	Adequacy of participation in the study by all who were eligible
Size of population/group	Number (N) at baseline, denominator for proportion analyses; proportion (%) male
Diagnostic criteria	DSM; ICD; or Kanner and edition number
Diagnostic tool/measure at baseline and follow‐up	ADI‐R; ADOS; CARS; GARS; 3di; or DISCO
Consistency of tool	Same diagnostic tool for all; same method and setting of outcome for all participants; whether valid reliable tool; completeness of outcome measure
Timing of diagnosis	Prior to study, at baseline, etc.
Multidisciplinary assessment	Diagnosis was completed by two or more professionals
Diagnosis	AD; ASD; AD + PDD‐NOS; as defined by diagnostic criteria
Age at baseline in years	‐
Age at follow‐up in years	‐
Period of follow‐up in years	Length of follow‐up for the study
Cognitive ability/IQ	Outcome; measure used
Language ability	Outcome; measure used
Adaptive behaviour ability	Outcome; measure used
Study approach and outcomes	When outcomes were measured
Numerator for primary outcome	Number diagnosed with ASD at follow‐up
Denominator for primary outcome	Number assessed for ASD at follow‐up
Proportion continuing to meet diagnostic criteria	Numerator divided by denominator
Autistic symptoms ‐ core	Outcome: social communication/repetitive, restricted behaviours, and interests; measure used
Autistic symptoms ‐ other	Outcome: what symptoms or measure used
Study attrition	Number of participants lost to follow‐up; participants that did not complete all parts of follow‐up or tools; reasons for loss to follow‐up; whether reasons have been linked to outcome
Interventions	Type and amount of interventions
Groups	Control group versus intervention group
Notes	‐
Footnotes AD: autistic disorder; ADI‐R: Autism Diagnostic Interview ‐ Revised; ADOS: Autism Diagnostic Observation Schedule; ASD: autism spectrum disorder; CARS: Childhood Autism Rating Scale; DISCO: Diagnostic Interview for Social and Communication Disorders; DSM: Diagnostic and Statistical Manual of Mental Disorders; GARS: Gilliam Autism Rating Scale; ICD: International Classification of Diseases; IQ: intelligence quotient; PDD‐NOS: pervasive developmental disorder‐not otherwise specified; 3di: developmental, dimensional and diagnostic interview.
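
The primary outcome defined in the table above is a simple ratio: the number diagnosed with ASD at follow‐up divided by the number assessed for ASD at follow‐up. A minimal sketch of that calculation, with a Wilson score interval added for illustration, is shown below. The function name and the example counts are hypothetical, and the review does not say which interval method, if any, was applied to individual studies.

```python
import math

def proportion_with_wilson_ci(numerator, denominator, z=1.96):
    """Proportion continuing to meet criteria, with a Wilson score interval."""
    if denominator <= 0 or not 0 <= numerator <= denominator:
        raise ValueError("need 0 <= numerator <= denominator and denominator > 0")
    p = numerator / denominator
    z2 = z * z
    scale = 1 + z2 / denominator
    centre = (p + z2 / (2 * denominator)) / scale
    half = (z / scale) * math.sqrt(
        p * (1 - p) / denominator + z2 / (4 * denominator ** 2)
    )
    return p, max(0.0, centre - half), min(1.0, centre + half)

# Hypothetical counts: 46 of 50 children assessed at follow-up still met criteria.
p, low, high = proportion_with_wilson_ci(46, 50)
print(f"{p:.2f} (95% CI {low:.2f} to {high:.2f})")
```

The Wilson interval is used here because it stays within 0 to 1 and behaves sensibly for small samples and proportions near 1, both of which are common among the included studies.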

1. Study participation: the study sample adequately represents population of interest
Criteria	Unclear	High	Moderate	Low
Sample (described)	‐	Clinical (not community based)	Clinical but drawn from broad community base	Population based
Description of sampling frame	‐	Not described	Some description but not adequate or complete	Well described
Description of baseline study sample	‐	Not described	Some description but not adequate or complete	Well described
Description of inclusion or exclusion criteria	‐	Not described	Some description but not adequate or complete	Well described
Adequacy of participation in study by all eligible	‐	No	‐	Yes
2. Study attrition: the study data available (those not lost to follow‐up) adequately represent the study sample
Criteria	Unclear	High	Moderate	Low
Recruitment	‐	Retrospective	Retrospective with whole cohort considered	Prospective
LFU (%)	‐	< 80% remain	≥ 80% remain	≥ 85% remain
Description of attempts to collect information on those LFU	‐	No	Some information provided but not adequate	Yes
Reasons for LFU provided?	‐	No	Some information provided but not adequate	Yes
Reasons for LFU linked to outcome?	‐	No	Some information provided but not adequate	Yes
Adequate description of LFU participants?	‐	No	Some information provided but not adequate	Yes
Analysis: important differences between LFU and non‐LFU in study?	‐	Important differences	‐	No important differences
3. Outcome measurement: the outcomes of interest are measured in a similar way for all participants
Criteria	Unclear	High	Moderate	Low
Blinding	‐	Not blinded	Blinding inadequate	Blinding adequate
Clear definition of outcome provided?	‐	No	‐	Yes
Same outcome tool for all?	‐	Not same tool for all	‐	Same for all
Valid and reliable tool?	‐	Not valid, reliable tool used	Valid or reliable tool, but parent rating	Standardised, reliable, valid tool used
Method and setting of outcome measurement same for all participants?	‐	No	‐	Yes
Completeness of outcome measure	‐	Not all tools completed (> 90% missing)	Not all tools completed but not > 90% missing	All tools completed
Footnotes LFU: Loss to follow‐up.
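
The loss‐to‐follow‐up row in the attrition table above sets numeric cut‐offs: fewer than 80% remaining is high risk, at least 80% is moderate, and at least 85% is low. The sketch below encodes those cut‐offs. The function name is hypothetical, and it takes the percentage lost rather than the percentage remaining because that is how the per‐study judgements are reported.

```python
def lfu_risk_of_bias(percent_lost):
    """Map percentage of the sample lost to follow-up to a risk-of-bias level."""
    if not 0 <= percent_lost <= 100:
        raise ValueError("percent_lost must be between 0 and 100")
    remaining = 100 - percent_lost
    if remaining >= 85:
        return "Low"
    if remaining >= 80:
        return "Moderate"
    return "High"

# Percentages lost as reported in the per-study risk-of-bias tables.
for study, lost in [("Baghdadli 2012", 51), ("Benedetto 2021", 13), ("Brian 2016", 29)]:
    print(study, lfu_risk_of_bias(lost))
```

For these three studies the function returns the same levels as the review's own judgements (High, Low and High respectively).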

	Domain	Description
Rate down if:	Risk of bias	The overall quality is driven by the study with lowest quality (if only low risk of bias studies are used, then the quality is rated as high); individual studies are rated down one or two levels for serious or critical risk of bias.
	Inconsistency	Unexplained heterogeneity or variability in results (point estimates) across studies with differences in estimates exceeding decisional thresholds. Large I² value (significant heterogeneity) and visual inspection of the forest plot (effect sizes on either side of the lines of no effect and with confidence intervals showing little to no overlap) usually prompt concerns around heterogeneity
	Indirectness	The study sample or the outcomes in the study, or both, do not accurately reflect the population of interest or the measured outcome does not capture what is believed to be important
	Imprecision	This is based primarily on the position of the confidence interval relative to a clinical decision threshold
	Publication bias	Forest plot or statistical testing suggesting that small negative studies are underrepresented
Rate up if:	Large effect	Moderate or large effect reported by most studies or in pooled findings in the meta‐analysis
Rate up if:	Dose‐response gradient	Gradient exists between studies for factors measured at different doses or an increase or decrease in events over time, which follows a well‐defined pattern (e.g. linear)
Footnotes Table modified from Guyatt 2011, Hayden 2014 and Iorio 2015.

Quality level	Definition
High	We are very confident that the true prognosis (probability of future events) lies close to that of the estimate
Moderate	We are moderately confident that the true prognosis (probability of future events) is likely to be close to the estimate, but there is a possibility that it is substantially different
Low	Our confidence in the estimate is limited: the true prognosis (probability of future events) may be substantially different from the estimate
Very low	We have very little confidence in the estimate: the true prognosis (probability of future events) is likely to be substantially different from the estimate
Footnotes This table has been reproduced from Iorio 2015, with permission from the first author.
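
As a rough illustration of how the rating in the summary of findings table follows from the domain judgements, the sketch below starts at High and moves down one level for each domain judged serious. The starting level, the size of each step, the "very serious" label and the function name are assumptions for illustration; GRADE ratings involve judgement and are not purely mechanical.

```python
LEVELS = ["Very low", "Low", "Moderate", "High"]
# Levels to rate down per judgement; the two-level step is an assumed convention.
STEPS = {"not serious": 0, "serious": 1, "very serious": 2}

def grade_rating(judgements, start="High"):
    """Rate down from `start` according to each domain judgement."""
    index = LEVELS.index(start)
    for judgement in judgements.values():
        index -= STEPS[judgement]
    # The rating cannot fall below the lowest level.
    return LEVELS[max(index, 0)]

# Domain judgements as listed for the primary outcome in the summary of findings table.
primary_outcome = {
    "risk of bias": "serious",
    "inconsistency": "serious",
    "indirectness": "not serious",
    "imprecision": "not serious",
    "publication bias": "not serious",
}
print(grade_rating(primary_outcome))  # prints: Low
```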

Variable		Not included in meta‐analysis (n = 8)		Included in meta‐analysis (n = 34)
Variable		n	%	n	%
Year published	Older (< 2013)	3	38	18	53
Year published	Recent (2013‐2021)	5	62	16	47
Tools used to diagnose autism spectrum disorder	One tool	8	100	21	62
	Two tools	0	0	10	29
	Three tools +	0	0	3	9
Multidisciplinary approach		0^a	0	13^b	45
Autism subgroup	Autism spectrum disorder	6	100	32	94
	Autistic disorder	0	0	2	6
	Childhood autism	0	0	0	0
IQ	< 70	4^c	80	15^d	58
	> 70	0	0	9	35
	Mixed	1	20	2	7
Male		433	80	9139	82
Sample size mean (range)		67 (13‐272)		329^e (11‐8564)
Age at baseline in years mean (range)		3.81 (2.5‐4.9)		3.04 (1.13‐5)
Length of follow‐up in years mean (range)		4.24 (1‐7.36)		2.53 (1‐8.3)
Risk of bias (rated low)	Sample (clinical, clinical from broad base, population)	0	0	4	12
	Description of sampling frame	2	25	9	26
	Description of baseline study sample	4	50	15	44
	Description of inclusion or exclusion criteria	3	38	14	41
	Adequacy of participation in study by all eligible	3	38	15	44
	Recruitment (prospective)	7	88	22	65
	Loss to follow‐up (LFU; low= >85% retained)	5	63	13	38
	Description of attempts to collect info on those LFU	0	0	3	9
	Reasons for LFU provided	0	0	3	9
	Reasons for LFU linked to outcome	1	13	1	3
	Description of LFU participants	0	0	1	3
	Analysis: important differences LFU vs non‐LFU in study	2	25	6	18
	Blinding	1	13	5	15
	Clear definition of diagnosis	6	75	33	97
	Same diagnosis outcome tool for all	8	100	34	100
	Valid and reliable tool	8	100	34	100
	Method and setting of outcome measurements same for all participants	4	50	21	62
	Completeness of outcome measure	7	88	33	97
IQ: intelligence quotient; n: number.
Footnotes LFU: Loss to follow‐up. ^a n = 3; ^b n = 29; ^c n = 3; ^d n = 26; ^e If we remove the outlier study with n = 8564, mean n = 79.
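
Footnote e can be checked with simple arithmetic from the figures in the table: a mean of 329 across 34 studies implies a total of roughly 11,186 participants, and removing the 8564‐participant study leaves about 2622 across 33 studies. A small sketch of that check follows. Because the reported mean is rounded, the result is approximate, and the function name is hypothetical.

```python
def mean_without_outlier(mean_all, n_studies, outlier):
    """Recompute a mean after dropping one known value."""
    total = mean_all * n_studies
    return (total - outlier) / (n_studies - 1)

print(round(mean_without_outlier(329, 34, 8564)))  # 79
```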

Study ID: Baghdadli 2012
Domain	Risk of bias level	Support for judgement
Sample (described)	High	Clinical sample
Description of sampling frame	Moderate	Some description
Description of baseline study sample	Low	Well described
Description of inclusion or exclusion criteria	Low	Well described
Adequacy of participation in study by all eligible	High	Participation in study by all eligible not adequate
Recruitment	Low	Prospective
Loss to follow‐up (LFU)	High	51% of sample lost to follow‐up
Description of attempts to collect information on those LFU	Moderate	Some information provided but inadequate
Reasons for LFU provided	Low	Yes
Reasons for LFU linked to outcome	Moderate	Some information provided but inadequate
Description of LFU participants	High	Not described
Analysis: important differences LFU vs non‐LFU in study	Low	No
Blinding	Moderate	Inadequately blinded
Clear definition of diagnosis provided at follow‐up	Low	Yes
Same diagnosis outcome tool for all	Low	Yes
Valid and reliable tool	Low	Standardised, reliable valid tool used
Method and setting of outcome measurements same for all participants	High	No
Completeness of outcome measure	Low	Diagnostic tools completed by all study participants

Study ID: Benedetto 2021
Domain	Risk of bias level	Support for judgement
Sample (described)	Moderate	Clinical sample from a broad community base
Description of sampling frame	High	Not described
Description of baseline study sample	Moderate	Some description
Description of inclusion or exclusion criteria	Low	Well described
Adequacy of participation in study by all eligible	Low	Adequate participation by all eligible
Recruitment	Low	Prospective
Loss to follow‐up (LFU)	Low	13% of sample lost to follow‐up
Description of attempts to collect information on those LFU	High	Not described
Reasons for LFU provided	High	No
Reasons for LFU linked to outcome	High	No
Description of LFU participants	High	Not described
Analysis: important differences LFU vs non‐LFU in study	Unclear	Not described
Blinding	High	Not blinded
Clear definition of diagnosis provided at follow‐up	Low	Yes
Same diagnosis outcome tool for all	Low	Yes
Valid and reliable tool	Low	Standardised, reliable valid tool used
Method and setting of outcome measurements same for all participants	Low	Yes
Completeness of outcome measure	Low	Diagnostic tools completed by all study participants

Study ID: Bopp 2006
Domain	Risk of bias level	Support for judgement
Sample (described)	High	Clinical sample
Description of sampling frame	Low	Well described
Description of baseline study sample	Low	Well described
Description of inclusion or exclusion criteria	High	Not described
Adequacy of participation in study by all eligible	Low	Adequate participation by all eligible
Recruitment	Low	Prospective
Loss to follow‐up (LFU)	Low	None of sample lost to follow‐up
Description of attempts to collect info on those LFU	‐	Not applicable as no LFU
Reasons for LFU provided	‐	Not applicable as no LFU
Reasons for LFU linked to outcome	‐	Not applicable as no LFU
Description of LFU participants	‐	Not applicable as no LFU
Analysis: important differences LFU vs non‐LFU in study	‐	Not applicable as no LFU
Blinding	Unclear	Not described
Clear definition of diagnosis provided at follow‐up	Low	Yes
Same diagnosis outcome tool for all	Low	Yes
Valid and reliable tool	Low	Standardised, reliable valid tool used
Method and setting of outcome measurements same for all participants	High	No
Completeness of outcome measure	Low	Diagnostic tools completed by all study participants

Study ID: Brian 2016
Domain	Risk of bias level	Support for judgement
Sample (described)	High	Clinical sample
Description of sampling frame	Low	Well described
Description of baseline study sample	Moderate	Some description
Description of inclusion or exclusion criteria	High	Not described
Adequacy of participation in study by all eligible	High	Not described
Recruitment	Low	Prospective
Loss to follow‐up (LFU)	High	29% of sample lost to follow‐up
Description of attempts to collect information on those LFU	High	Not described
Reasons for LFU provided	High	No
Reasons for LFU linked to outcome	High	No
Description of LFU participants	High	Not described
Analysis: important differences LFU vs non‐LFU in study	Low	No
Blinding	High	Not blinded
Clear definition of diagnosis provided at follow‐up	Low	Yes
Same diagnosis outcome tool for all	Low	Yes
Valid and reliable tool	Low	Standardised, reliable valid tool used
Method and setting of outcome measurements same for all participants	Low	Yes
Completeness of outcome measure	Low	Diagnostic tools completed by all study participants

Study ID: Chu 2017
Domain	Risk of bias level	Support for judgement
Sample (described)	High	Clinical sample
Description of sampling frame	Low	Well described
Description of baseline study sample	Low	Well described
Description of inclusion or exclusion criteria	Low	Well described
Adequacy of participation in study by all eligible	High	Participation in study by all eligible not adequate
Recruitment	Low	Prospective
Loss to follow‐up (LFU)	Low	13% of sample lost to follow‐up
Description of attempts to collect information on those LFU	High	Not described
Reasons for LFU provided	High	Not described
Reasons for LFU linked to outcome	High	Not described
Description of LFU participants	High	Not described
Analysis: important differences LFU vs non‐LFU in study	Unclear	Not described
Blinding	High	Not blinded
Clear definition of diagnosis provided at follow‐up	Low	Yes
Same diagnosis outcome tool for all	Low	Yes
Valid and reliable tool	Low	Standardised, reliable valid tool used
Method and setting of outcome measurements same for all participants	Low	Yes
Completeness of outcome measure	Low	Diagnostic tools completed by all study participants

Study ID: Demb 1989
Domain	Risk of bias level	Support for judgement
Sample (described)	Moderate	Clinical sample from a broad community base
Description of sampling frame	Moderate	Some description
Description of baseline study sample	High	Not described
Description of inclusion or exclusion criteria	High	Not described
Adequacy of participation in study by all eligible	High	Participation in study by all eligible not adequate
Recruitment	Low	Prospective
Loss to follow‐up (LFU)	High	33.3% of sample lost to follow‐up
Description of attempts to collect information on those LFU	Moderate	Some information but information inadequate
Reasons for LFU provided	Moderate	Some information but information inadequate
Reasons for LFU linked to outcome	High	No
Description of LFU participants	High	No
Analysis: important differences LFU vs non‐LFU in study	Unclear	Not described
Blinding	High	Not blinded
Clear definition of diagnosis provided at follow‐up	Low	Yes
Same diagnosis outcome tool for all	Low	Yes
Valid and reliable tool	Low	Standardised, reliable valid tool used
Method and setting of outcome measurements same for all participants	High	No, 11 were done in person and one via phone
Completeness of outcome measure	Low	Diagnostic tools completed by all study participants

Study ID: DeWaay 2012
Domain	Risk of bias level	Support for judgement
Sample (described)	High	Clinical sample
Description of sampling frame	Moderate	Some description
Description of baseline study sample	Moderate	Some description
Description of inclusion or exclusion criteria	Moderate	Some description
Adequacy of participation in study by all eligible	Low	Adequate participation by all eligible
Recruitment	Low	Prospective
Loss to follow‐up (LFU)	Low	None of sample lost to follow‐up but participants selected retrospectively from a prospective cohort
Description of attempts to collect information on those LFU	Unclear	Not described
Reasons for LFU provided	Unclear	Not described
Reasons for LFU linked to outcome	Unclear	Not described
Description of LFU participants	Unclear	Not described
Analysis: important differences LFU vs non‐LFU in study	Unclear	Not described
Blinding	High	Not blinded
Clear definition of diagnosis provided at follow‐up	High	No
Same diagnosis outcome tool for all	Low	Yes
Valid and reliable tool	Low	Standardised, reliable valid tool used
Method and setting of outcome measurements same for all participants	High	No
Completeness of outcome measure	Low	Diagnostic tools completed by all study participants

Study ID: Eaves 2004
Domain	Risk of bias level	Support for judgement
Sample (described)	Low	Population‐based sample
Description of sampling frame	Low	Well described
Description of baseline study sample	Low	Well described
Description of inclusion or exclusion criteria	Moderate	Some description
Adequacy of participation in study by all eligible	Low	Adequate participation by all eligible
Recruitment	Low	Prospective
Loss to follow‐up (LFU)	Low	None of sample lost to follow‐up
Description of attempts to collect information on those LFU	Not applicable as no LFU
Reasons for LFU provided	Not applicable as no LFU
Reasons for LFU linked to outcome	Not applicable as no LFU
Description of LFU participants	Not applicable as no LFU
Analysis: important differences LFU vs non‐LFU in study	Not applicable as no LFU
Blinding	Unclear	Not discussed
Clear definition of diagnosis provided at follow‐up	Low	Yes
Same diagnosis outcome tool for all	Low	Yes
Valid and reliable tool	Low	Standardised, reliable valid tool used
Method and setting of outcome measurements same for all participants	Low	Yes
Completeness of outcome measure	Low	Diagnostic tools completed by all study participants

Study ID: Elmose 2014
Domain	Risk of bias level	Support for judgement
Sample (described)	High	Clinical sample
Description of sampling frame	Moderate	Some description
Description of baseline study sample	Moderate	Some description
Description of inclusion or exclusion criteria	High	Not described
Adequacy of participation in study by all eligible	Low	Adequate participation by all eligible
Recruitment	Moderate	Retrospective, with the whole cohort considered
Loss to follow‐up (LFU)	Not applicable as the study is retrospective
Description of attempts to collect information on those LFU	Low	Yes
Reasons for LFU provided	Moderate	Some information but information is inadequate
Reasons for LFU linked to outcome	High	Not described
Description of LFU participants	High	Not described
Analysis: important differences LFU vs non‐LFU in study	High	Not described
Blinding	Low	Blinding adequate
Clear definition of diagnosis provided at follow‐up	Low	Yes
Same diagnosis outcome tool for all	Low	Yes
Valid and reliable tool	Low	Standardised, reliable valid tool used
Method and setting of outcome measurements same for all participants	Low	Yes
Completeness of outcome measure	Low	Diagnostic tools completed by all study participants

Study ID: Flanagan 2011
Domain	Risk of bias level	Support for judgement
Sample (described)	High	Clinical sample
Description of sampling frame	Moderate	Some description
Description of baseline study sample	Moderate	Some description
Description of inclusion or exclusion criteria	Low	Well described
Adequacy of participation in study by all eligible	High	Participation in study by all eligible not adequate
Recruitment	High	Retrospective
Loss to follow‐up (LFU)	Not applicable as the study is retrospective
Description of attempts to collect information on those LFU	Not applicable as the study is retrospective
Reasons for LFU provided	Not applicable as the study is retrospective
Reasons for LFU linked to outcome	Not applicable as the study is retrospective
Description of LFU participants	Not applicable as the study is retrospective
Analysis: important differences LFU vs non‐LFU in study	Not applicable as the study is retrospective
Blinding	High	Not blinded
Clear definition of diagnosis provided at follow‐up	Low	Yes
Same diagnosis outcome tool for all	Low	Yes
Valid and reliable tool	Low	Standardised, reliable valid tool used
Method and setting of outcome measurements same for all participants	Low	Yes
Completeness of outcome measure	Low	Diagnostic tools completed by all study participants

Study ID: Freeman 2004
Domain	Risk of bias level	Support for judgement
Sample (described)	High	Clinical sample
Description of sampling frame	Low	Well described
Description of baseline study sample	Moderate	Some description
Description of inclusion or exclusion criteria	High	Not described
Adequacy of participation in study by all eligible	Low	Adequate participation by all eligible
Recruitment	High	Retrospective
Loss to follow‐up (LFU)	Not applicable as the study is retrospective
Description of attempts to collect information on those LFU	Not applicable as the study is retrospective
Reasons for LFU provided	Not applicable as the study is retrospective
Reasons for LFU linked to outcome	Not applicable as the study is retrospective
Description of LFU participants	Not applicable as the study is retrospective
Analysis: important differences LFU vs non‐LFU in study	Not applicable as the study is retrospective
Blinding	Moderate	Blinding inadequate
Clear definition of diagnosis provided at follow‐up	Low	Yes
Same diagnosis outcome tool for all	Low	Yes
Valid and reliable tool	Low	Standardised, reliable valid tool used
Method and setting of outcome measurements same for all participants	Unclear	Not described
Completeness of outcome measure	Low	Diagnostic tools completed by all study participants

Study ID: Gillberg 1990
Domain	Risk of bias level	Support for judgement
Sample (described)	High	Clinical sample
Description of sampling frame	Moderate	Some description
Description of baseline study sample	Moderate	Some description
Description of inclusion or exclusion criteria	High	Not described
Adequacy of participation in study by all eligible	Low	Adequate participation by all eligible
Recruitment	Low	Prospective
Loss to follow‐up (LFU)	Low	None of sample lost to follow‐up
Description of attempts to collect information on those LFU	Not applicable as no LFU
Reasons for LFU provided	Not applicable as no LFU
Reasons for LFU linked to outcome	Not applicable as no LFU
Description of LFU participants	Not applicable as no LFU
Analysis: important differences LFU vs non‐LFU in study	Not applicable as no LFU
Blinding	High	Not blinded
Clear definition of diagnosis provided at follow‐up	Low	Yes
Same diagnosis outcome tool for all	Low	Yes
Valid and reliable tool	Moderate	Valid or reliable but parent‐rated tool
Method and setting of outcome measurements same for all participants	Low	Yes
Completeness of outcome measure	Low	Diagnostic tools completed by all study participants

Study ID: Giserman‐Kiss
Domain	Risk of bias level	Support for judgement
Sample (described)	High	Clinical sample
Description of sampling frame	Low	Well described
Description of baseline study sample	Low	Well described
Description of inclusion or exclusion criteria	Moderate	Some description
Adequacy of participation in study by all eligible	Low	Adequate participation in study by all eligible
Recruitment	Low	Prospective
Loss to follow‐up (LFU)	Low	None of sample lost to follow‐up
Description of attempts to collect information on those LFU	Not applicable as no LFU	Not described
Reasons for LFU provided	Not applicable as no LFU	Not described
Reasons for LFU linked to outcome	Not applicable as no LFU	Not described
Description of LFU participants	Not applicable as no LFU	Not described
Analysis: important differences LFU vs non‐LFU in study	Not applicable as no LFU	Not described
Blinding	Unclear	Not described
Clear definition of diagnosis provided at follow‐up	Low	Yes
Same diagnosis outcome tool for all	Low	Yes
Valid and reliable tool	Low	Standardised, reliable valid tool used
Method and setting of outcome measurements same for all participants	Unclear	Not described
Completeness of outcome measure	Low	Diagnostic tools completed by all study participants

Study ID: Gonzalez 1993
Domain	Risk of bias level	Support for judgement
Sample (described)	Low	Population‐based sample
Description of sampling frame	Low	Well described
Description of baseline study sample	Low	Well described
Description of inclusion or exclusion criteria	High	Not described
Adequacy of participation in study by all eligible	High	Participation in study by all eligible not adequate
Recruitment	Moderate	Retrospective with the whole cohort considered
Loss to follow‐up (LFU)	Not applicable as the study is retrospective
Description of attempts to collect information on those LFU	Not applicable as the study is retrospective
Reasons for LFU provided	Not applicable as the study is retrospective
Reasons for LFU linked to outcome	Not applicable as the study is retrospective
Description of LFU participants	Not applicable as the study is retrospective
Analysis: important differences LFU vs non‐LFU in study	Not applicable as the study is retrospective
Blinding	High	Not blinded
Clear definition of diagnosis provided at follow‐up	Low	Yes
Same diagnosis outcome tool for all	Low	Yes
Valid and reliable tool	Low	Standardised, reliable valid tool used
Method and setting of outcome measurements same for all participants	Low	Yes
Completeness of outcome measure	Low	Diagnostic tools completed by all study participants

Study ID: Haglund 2020
Domain	Risk of bias level	Support for judgement
Sample (described)	Moderate	Clinical sample from a broad community base
Description of sampling frame	Moderate	Some description
Description of baseline study sample	Low	Well described
Description of inclusion or exclusion criteria	Moderate	Some description
Adequacy of participation in study by all eligible	High	Participation in study by all eligible not adequate
Recruitment	Low	Prospective
Loss to follow‐up (LFU)	High	2% of sample lost to follow‐up
Description of attempts to collect information on those LFU	High	Not described
Reasons for LFU provided	High	No
Reasons for LFU linked to outcome	High	No
Description of LFU participants	High	Not described
Analysis: important differences LFU vs non‐LFU in study	Unclear	Not described
Blinding	Low	Blinding adequate
Clear definition of diagnosis provided at follow‐up	Low	Yes
Same diagnosis outcome tool for all	Low	Yes
Valid and reliable tool	Low	Standardised, reliable valid tool used
Method and setting of outcome measurements same for all participants	Low	Yes
Completeness of outcome measure	Low	Diagnostic tools completed by all study participants

Study ID: Hinnebusch 2017
Domain	Risk of bias level	Support for judgement
Sample (described)	Low	Population‐based sample
Description of sampling frame	Low	Well described
Description of baseline study sample	Low	Well described
Description of inclusion or exclusion criteria	Low	Well described
Adequacy of participation in study by all eligible	High	Participation in study by all eligible not adequate
Recruitment	Low	Prospective
Loss to follow‐up (LFU)	High	44% of sample lost to follow‐up
Description of attempts to collect information on those LFU	Low	Yes
Reasons for LFU provided	High	No
Reasons for LFU linked to outcome	Unclear	Not described
Description of LFU participants	Low	Yes
Analysis: important differences LFU vs non‐LFU in study	Low	Some differences but these would not impact outcome
Blinding	Unclear	Not described
Clear definition of diagnosis provided at follow‐up	Low	Yes
Same diagnosis outcome tool for all	Low	Yes
Valid and reliable tool	Low	Standardised, reliable valid tool used
Method and setting of outcome measurements same for all participants	Low	Yes
Completeness of outcome measure	High	More than 90% of diagnostic tools not completed

Study ID: Kim 2015
Domain	Risk of bias level	Support for judgement
Sample (described)	High	Clinical sample
Description of sampling frame	Moderate	Some description
Description of baseline study sample	Low	Not described
Description of inclusion or exclusion criteria	Low	Not described
Adequacy of participation in study by all eligible	Low	Adequate participation by all eligible
Recruitment	Low	Prospective
Loss to follow‐up (LFU)	Low	None of sample lost to follow‐up
Description of attempts to collect information on those LFU	Not applicable as no LFU
Reasons for LFU provided	Not applicable as no LFU
Reasons for LFU linked to outcome	Not applicable as no LFU
Description of LFU participants	Not applicable as no LFU
Analysis: important differences LFU vs non‐LFU in study	Not applicable as no LFU
Blinding	High	Not blinded
Clear definition of diagnosis provided at follow‐up	Low	Yes
Same diagnosis outcome tool for all	Low	Yes
Valid and reliable tool	Low	Standardised, reliable valid tool used
Method and setting of outcome measurements same for all participants	Low	Yes
Completeness of outcome measure	Low	Diagnostic tools completed by all study participants

Study ID: Klintwall 2015
Domain	Risk of bias level	Support for judgement
Sample (described)	Unclear	Not described
Description of sampling frame	Moderate	Some description
Description of baseline study sample	High	Not described
Description of inclusion or exclusion criteria	Moderate	Some description
Adequacy of participation in study by all eligible	High	Participation in study by all eligible not adequate
Recruitment	Low	Prospective
Loss to follow‐up (LFU)	Low	None of sample lost to follow‐up
Description of attempts to collect information on those LFU	Not applicable as no LFU
Reasons for LFU provided	Not applicable as no LFU
Reasons for LFU linked to outcome	Not applicable as no LFU
Description of LFU participants	Not applicable as no LFU
Analysis: important differences LFU vs non‐LFU in study	Not applicable as no LFU
Blinding	High	Not blinded
Clear definition of diagnosis provided at follow‐up	Low	Yes
Same diagnosis outcome tool for all	Low	Yes
Valid and reliable tool	Low	Standardised, reliable valid tool used
Method and setting of outcome measurements same for all participants	Low	Yes
Completeness of outcome measure	Low	Diagnostic tools completed by all study participants

Study ID: Malhi 2011
Domain	Risk of bias level	Support for judgement
Sample (described)	High	Clinical sample
Description of sampling frame	Moderate	Some description
Description of baseline study sample	Moderate	Some description
Description of inclusion or exclusion criteria	Low	Well described
Adequacy of participation in study by all eligible	Unclear	Not described
Recruitment	Low	Prospective
Loss to follow‐up (LFU)	High	54.2% of sample lost to follow‐up
Description of attempts to collect information on those LFU	High	Not described
Reasons for LFU provided	High	No
Reasons for LFU linked to outcome	Unclear	Not described
Description of LFU participants	Moderate	Some information but inadequate
Analysis: important differences LFU vs non‐LFU in study	Low	Yes
Blinding	High	Not blinded
Clear definition of diagnosis provided at follow‐up	Low	Yes
Same diagnosis outcome tool for all	Low	Yes
Valid and reliable tool	Low	Standardised, reliable valid tool used
Method and setting of outcome measurements same for all participants	Low	Yes
Completeness of outcome measure	Low	Diagnostic tools completed by all study participants

Study ID: Moore 2003
Domain	Risk of bias level	Support for judgement
Sample (described)	High	Clinical sample
Description of sampling frame	Moderate	Some description
Description of baseline study sample	Moderate	Some description
Description of inclusion or exclusion criteria	High	Not described
Adequacy of participation in study by all eligible	Unclear	Not described
Recruitment	Low	Prospective
Loss to follow‐up (LFU)	Low	None of sample lost to follow‐up
Description of attempts to collect information on those LFU	Not applicable as no LFU
Reasons for LFU provided	Not applicable as no LFU
Reasons for LFU linked to outcome	Not applicable as no LFU
Description of LFU participants	Not applicable as no LFU
Analysis: important differences LFU vs non‐LFU in study	Not applicable as no LFU
Blinding	Low	Blinded
Clear definition of diagnosis provided at follow‐up	Low	Yes
Same diagnosis outcome tool for all	Low	Yes
Valid and reliable tool	Low	Standardised, reliable valid tool used
Method and setting of outcome measurements same for all participants	Low	Yes
Completeness of outcome measure	Low	Diagnostic tools completed by all study participants

Study ID: Moss 2008
Domain	Risk of bias level	Support for judgement
Sample (described)	Moderate	Clinical sample from a broad community base
Description of sampling frame	Moderate	Some description
Description of baseline study sample	Low	Well described
Description of inclusion or exclusion criteria	Low	Well described
Adequacy of participation in study by all eligible	Low	Participation in study by all eligible was adequate
Recruitment	Low	Prospective
Loss to follow‐up (LFU)	Moderate	20% of sample lost to follow‐up
Description of attempts to collect information on those LFU	Low	Well described
Reasons for LFU provided	Low	Yes
Reasons for LFU linked to outcome	Low	Yes
Description of LFU participants	High	Not described
Analysis: important differences LFU vs non‐LFU in study	Low	Yes
Blinding	Moderate	Blinding inadequate
Clear definition of diagnosis provided at follow‐up	Low	Yes
Same diagnosis outcome tool for all	Low	Yes
Valid and reliable tool	Low	Standardised, reliable valid tool used
Method and setting of outcome measurements same for all participants	Low	Yes
Completeness of outcome measure	Low	Diagnostic tools completed by all study participants