Abstract
Transitive inference (TI) has a long history in the study of human development. However, few pediatric studies that report clinical diagnoses have tested trial-and-error transitive inference learning, in which participants infer item relations rather than evaluating them explicitly from verbal descriptions. Children aged 8 to 10 underwent a battery of clinical assessments and received a range of diagnoses, potentially including autism spectrum disorder (ASD), attention-deficit/hyperactivity disorder (ADHD), anxiety disorders (AD), specific learning disorders (SLD), and/or communication disorders (CD). Participants also performed a trial-and-error learning task that tested for transitive inference. Response accuracy and reaction time were assessed using a statistical model that controlled for diagnostic comorbidity at the group level. Participants in all diagnostic categories showed evidence of transitive inference. However, a model comparison analysis suggested that those diagnosed with ASD succeeded in a qualitatively different way, responding more slowly to each choice and improving faster across trials than their non-ASD counterparts. Additionally, transitive inference performance was not associated with IQ. Overall, our data suggest that superficially similar performance levels between ASD and non-ASD participants may have resulted from a difference in the speed-accuracy tradeoff made by each group. Our work provides a preliminary profile of the impact of various clinical diagnoses on TI performance in young children. Of these, an ASD diagnosis was associated with the largest difference in task strategy.
Keywords: serial learning, learning, autism spectrum disorder, cognition, humans
Lay Summary:
Children in a clinical sample with various diagnoses successfully learned how to order a set of pictures. However, some diagnostic categories appeared to be associated with different strategies for success than others.
Autism spectrum disorder (ASD) has been associated with impairments to attention, sensory perception, and memory (Chowdhury et al., 2017; Cooper et al., 2017). However, many of those diagnosed with ASD possess psychological faculties that remain intact. A challenge in characterizing ASD is that these intact faculties may nonetheless give rise to atypical schemas, because of the interaction of various cognitive processes. For example, most children diagnosed with ASD classify objects into learned perceptual categories, but these categories may differ substantially from those learned by most other children (Mercado et al., 2020). Thus, when children with ASD were asked to create categories for abstract geometric shapes, they were often able to do so, but they relied on different shape properties than children without ASD (Mercado et al., 2015; Church et al., 2015). The underpinnings of learning for children with ASD, with respect to these perceptual differences, remain in contention. Experimental tasks of learning and memory should therefore not focus solely on performance deficits, but should also help to identify cognitive processes and strategies that are more specific to those diagnosed with ASD.
Currently, the accepted symptomatology of ASD in clinical settings focuses on (1) social and communicative difficulties and (2) patterns of perseverative or restricted behavior or interest, neither of which necessarily reflects distinguishing characteristics in other cognitive domains, such as spatial or analogical reasoning. Nevertheless, a wide range of research studies have compared samples of ASD participants to neurotypical controls on cognitive tasks other than those that act as diagnostic criteria (Velikonja et al., 2019, provide a systematic review). When subjected to meta-analysis, ASD samples display a deficit, on average, in several non-social, non-verbal domains when compared to controls. As a spectrum disorder, however, autism is poorly characterized by averages, and the aggregation of studies into meta-analyses further collapses much of the context needed to interpret those differences. Given the wide variety of characteristics exhibited by those on the autism spectrum, it is important to preserve this context and avoid sweeping generalizations. A deeper understanding of the underlying cognitive processes may both facilitate diagnosis and help to personalize treatment and therapy options.
A further complication in interpreting large-scale ASD results is the high rate of diagnostic comorbidity. Khachadourian et al. (2023) reviewed the medical records of 40,582 individuals with ASD, compared to 11,389 of their non-ASD siblings as a control group. ASD was associated with at least one comorbid diagnosis over 70% of the time, more than twice the rate for controls, and more than half of ASD records indicated two or more comorbidities. This makes reported patterns of cognitive deficits attributed specifically to ASD much more difficult to interpret. Furthermore, symptoms of other diagnoses may present differently when comorbid with ASD (Belardinelli et al., 2016). Given these demographics, the study of cognitive processing in the ASD population must take comorbidities into consideration, since an isolated ASD diagnosis is the exception rather than the rule.
Transitive Inference
One topic that has not been widely studied in ASD populations is the ability to perform transitive inference (TI) (Jensen et al., 2017; Kao et al., 2020). Provided someone has already learned that A > B and B > C, TI is the cognitive ability that lets that person infer that A > C. Since this transitive property applies to any ordered set, TI is a crucial means of generalizing from past experience to new comparisons. A participant’s ability to perform TI tasks is used to measure relational memory in both clinical contexts (e.g., Onwuameze et al., 2016) and cognitive neuroscience (e.g., Wing et al., 2021). While this ability is basic enough that it can be studied in animal models, and in humans with minimal verbal instruction, it cannot be explained solely in terms of associative learning (Jensen et al., 2019) and appears instead to reflect an underlying implicit cognitive process. The range of species that have demonstrated at least some capacity for TI includes monkeys (McGonigle and Chalmers, 1977; Jensen et al., 2017), fish (Hotta et al., 2020), and even wasps (Tibbetts et al., 2019). In many species, TI is thought to play a role in evaluating social dominance hierarchies (Gazes et al., 2017; Grosenick et al., 2007; Bond et al., 2010). As such, it is unsurprising that typically developing children are capable of successful transitive inferential learning, whether the task consists of verbal word problems (Wright and Smailes, 2015) or of non-verbal trial-and-error learning (Holcomb et al., 1997; Bryant and Trabasso, 1971).
By contrast, very few studies have assessed TI in children with ASD, and those that have are generally based on small sample sizes (e.g., Gorham et al., 2009). When the scope is widened to adults with ASD, differences in performance are generally small and/or non-significant (Solomon et al., 2011; 2015), and again, these effects were not evaluated with comorbidities in mind. Nevertheless, reviews report cognitive deficits in “relational” and “analogical” reasoning associated with ASD (e.g., Banker et al., 2021).
Despite this paucity of clear evidence, there are theoretical reasons to think TI may reveal something interesting about symptoms associated with ASD. The trial-and-error TI task given to animal models is, by design, entirely non-verbal, so it is reasonable to hypothesize that individuals with ASD should be able to succeed at such tasks even if their symptoms include considerable difficulties with verbal communication. On the other hand, the comparative literature has argued that TI is an important tool for evaluating social group dynamics. Since TI is not currently considered a diagnostic criterion for ASD, further understanding of this form of learning could help tease apart the cognitive processes that contribute to specific symptoms of ASD.
To do so, however, poses a considerable statistical challenge. Often citing statistical concerns, studies of TI in other clinical populations routinely exclude participants with other diagnoses. For example, Brunamonti et al. (2017) reported that children with an ADHD diagnosis displayed poorer TI performance, but did so having excluded those with “evidence of neurological disorders, pervasive developmental disorders, and receptive language disorders” (p. 201). Such exclusion criteria necessarily exclude children with ASD, and given the high rates of comorbidity between ASD and ADHD, they exclude by extension a substantial minority of all children with ADHD.
In light of the above considerations, we felt that overly stringent exclusion criteria risked excluding the majority of individuals for whom ASD is one of multiple diagnoses. Accordingly, we administered a TI task to a cohort of children with various clinical diagnoses, rather than only those diagnosed with ASD. First, we wanted to evaluate whether children with various clinical diagnoses, including autism spectrum disorder (ASD), attention-deficit/hyperactivity disorder (ADHD), anxiety disorder (AD), specific learning disorder (SLD), and communication disorder (CD), were able to perform experimental TI successfully. Second, we wanted to determine whether children with ASD performed the task in a manner that is distinct from those without that diagnosis. Our data suggest that children with ASD approached the tradeoff between reaction time and response accuracy using a different strategy than other children within this sample.
Methods
Participants
Participants were a clinical sample of 40 children (15F, 25M, aged 8 to 10 years) who had already been enrolled with the Child Mind Institute (CMI, New York, NY), an independent nonprofit that aims to match children with resources and support. The first step for a child working with CMI is a comprehensive clinical assessment, conducted over a series of visits (some in person, some remote), which included psychiatric inventories, cognitive tests, and examination by clinicians. The guardians of potential participants who sought assessment from CMI were invited to enroll them into the study in exchange for a $25 Amazon gift card, provided their IQ was at least 70. Once enrolled, the experimental session was added to their schedule of tests to be performed throughout the assessment period. Additionally, children who had previously completed CMI assessments in 2018/2019 were recruited via their guardians to join the study, provided they fulfilled our inclusion/exclusion criteria. Guardians were reassured that participation would not impact other diagnostic determinations. Because CMI’s clinical assessment process was staggered over a period of weeks to avoid overloading the child, most participants performed the task before receiving diagnoses from CMI, with diagnostic information provided once the full assessment was complete.
Within our sample, 21 were diagnosed with autism spectrum disorder, 16 were diagnosed with ADHD, 12 were diagnosed with an anxiety disorder, 18 were diagnosed with a specific learning disorder, and 13 were diagnosed with a communication disorder. Only one participant satisfied none of these diagnostic categories, and only 15 satisfied a single diagnostic category. All others received multiple diagnoses. Table 1 provides an overview of all 40 participants with respect to their comorbidities. More details about these categories are provided in the Clinical Assessments section below.
Table 1:
Overview of Participant Diagnostic Categories
| Diagnosis | Participants diagnosed (of 40) |
|---|---|
| ASD | 21 |
| ADHD | 16 |
| AD | 12 |
| SLD | 18 |
| CD | 13 |

Note: most participants received multiple diagnoses; each participant’s specific combination of diagnostic categories is depicted in Figure 3 (bottom row).
ASD: autism spectrum disorder; ADHD: attention deficit/hyperactivity disorder; AD: anxiety disorder;
SLD: specific learning disorder; CD: communication disorder
The experiment was approved by the Institutional Review Boards of Columbia University (protocol AAAR7039) and New York City College of Technology CUNY (protocol 2019–0601-NYCCT), conforming to guidelines for human research set forth by the American Psychological Association.
Procedure
Transitive Inference Task
Setup:
Our task was hosted using the Gorilla Experiment Builder for online experiments (Anwyl-Irvine et al., 2020) and administered remotely by staff of the Child Mind Institute. The task was presented in a web browser, and participants used a computer mouse to click on their responses under guardian supervision, while a staff member was present via video conferencing to provide instructions and encouragement, and to troubleshoot difficulties. All parties were naïve to the underlying objective of the task.
Prior to the task, experimenters defined an ordered list of 7 pictures of cartoon characters. These were rank-ordered in advance by designating each image as A, B, C, D, E, F, or G, with image A being the “highest ranked” item, and G being the “lowest ranked” item. Figure 1A presents an example pair of images from the ordered list. Item ranks varied from one participant to the next. Neither the item labels nor their ordering was shared with participants or study staff.
Figure 1.

Transitive inference task. (A) Example of a 7-item list, using visually distinct cartoon characters. Stimuli were only ever presented in pairs, and the correct item in every pair was determined by a fixed hierarchy set in advance. (B) Instructions given to participants. Beyond encouragement to continue trying, neither participants nor their guardians were given any additional information about the task until debriefing. (C) Sequence of events in a sample trial. Each trial began with a blue start stimulus. Clicking on it with the mouse caused the start stimulus to disappear and two list stimuli to appear. Clicking on the item of superior rank yielded positive feedback, whereas clicking on the other stimulus yielded negative feedback. The mouse cursor depicted in these example screens is drawn at the scale a participant would see while performing the task.
Task Structure:
Participants were presented with pairs of images and instructed to click on whichever they thought was the “correct” image. Figure 1B displays the instructions given to participants. Figure 1C depicts an example pair of images from the ordered list, corresponding to the pair CE. In this pairing, choosing stimulus C would be the correct response (and E would be incorrect). However, if a participant was instead shown pair BC, choosing stimulus C would have been the incorrect response (and B would be correct). On each trial, participants were required to choose the item they thought was correct.
Figure 1C outlines the trial sequence. Each trial began with a blue square at the center of the screen, which the participant needed to click with the mouse to proceed. As soon as they did, the blue square disappeared and was replaced with two images on either side of the cursor. When either image was chosen, feedback appeared in the center of the screen; a green check mark signaled a correct response, and a red X signaled an incorrect response. One second after the presentation of the feedback, the screen went blank and another trial began.
Training Phase:
Participants first completed a training phase of 180 trials, divided into 15 blocks. During training trials, participants saw only the pairs of stimuli with adjacent ranks in the ordered list: AB, BC, CD, DE, EF, and FG. Each of these pairs was presented twice during a block, once with the correct item on the left side of the screen and once with the correct item on the right. The order of these 12 trials was permuted randomly within each block, ensuring that the adjacent pairs were trained uniformly throughout the training phase.
Testing Phase:
After training, and without any signal of a phase change, participants completed an 84-trial testing phase consisting of two testing blocks. Each testing block presented all 21 possible pairings from the 7-item ordered list (e.g., AB, AC, AD, … EF, EG, FG), doing so twice to counterbalance stimulus position. The order of pair presentations within each testing block was randomly permuted, with one constraint: the first 12 trials of the testing phase presented only “critical test pairs,” that is, pairs that were neither adjacent (and thus trained) nor included the end items A or G. These six pairs, BD, BE, BF, CE, CF, and DF, constitute the critical tests of whether a participant’s choices showed evidence of transitive inference.
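To illustrate the pair structure just described, here is a minimal Python sketch (with hypothetical helper names, not the study’s actual task code) that enumerates the training pairs, the 21 test pairs, and the six critical pairs:

```python
import itertools
import random

ITEMS = list("ABCDEFG")  # rank order; A is the highest-ranked item

def training_block(rng):
    """One 12-trial training block: each adjacent pair appears twice,
    counterbalancing which side of the screen holds the correct item."""
    trials = []
    for hi, lo in zip(ITEMS, ITEMS[1:]):                # AB, BC, ..., FG
        trials.append({"left": hi, "right": lo, "correct": hi})
        trials.append({"left": lo, "right": hi, "correct": hi})
    rng.shuffle(trials)
    return trials

all_pairs = list(itertools.combinations(ITEMS, 2))      # all 21 pairings
critical_pairs = [(a, b) for a, b in all_pairs
                  if ord(b) - ord(a) > 1                # not adjacent (trained)
                  and a != "A" and b != "G"]            # no end items
# critical_pairs -> [('B','D'), ('B','E'), ('B','F'),
#                    ('C','E'), ('C','F'), ('D','F')]

rng = random.Random(2024)
training_trials = [t for _ in range(15) for t in training_block(rng)]  # 180
```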
Over the course of the 264 trials of the entire task (training and testing phases combined), participants received three on-screen “encouragement signals” that marked their overall progress: “You’ve made it one quarter of the way” after trial 66, “You’ve made it half way” after trial 132, and “You’ve made it three quarters of the way” after trial 198.
Clinical Assessment
As noted above, participants could potentially belong to any of five diagnostic categories: Autism Spectrum Disorder (ASD), Attention Deficit/Hyperactivity Disorder (ADHD), Anxiety Disorder (AD), Specific Learning Disorder (SLD), and Communication Disorder (CD). With the exception of ASD, these are umbrella classifications that cover multiple specific diagnoses. ADHD covered all categories of symptom presentation (e.g., inattentive, hyperactive, etc.). AD potentially included any diagnosis of Separation Anxiety Disorder, Generalized Anxiety Disorder, Social Anxiety Disorder, Selective Mutism, Panic Disorder, Agoraphobia, and/or any other specific phobia or DSM-specified anxiety disorder. SLD potentially included measurable impairments in mathematics, reading, or written expression. CD potentially included Language Disorder, Social Communication Disorder, and/or Speech-Sound Disorder. Participants were grouped according to these categories partly for statistical power and partly to reduce participant identifiability. Participants’ guardians also completed an initial clinical intake interview, which informed all diagnostic categories. Subsequently, they completed a series of other assessments over the course of multiple visits, including the Wechsler Intelligence Scale for Children (WISC-V; Wechsler, 2014).
Mental health diagnoses were determined by a consensus process, informed primarily by clinician interviews and secondarily by a battery of tests. ASD, ADHD, and AD diagnoses were informed by the Schedule for Affective Disorders and Schizophrenia for School-Age Children (K-SADS-PL; Kaufman et al., 1997), conducted with both participants and their guardians, as well as the Child Behavior Checklist (CBCL; Achenbach, 1991), completed by both guardians and teachers. ASD diagnoses were additionally informed by guardian reports on the Autism Spectrum Screening Questionnaire (ASSQ; Ehlers et al., 1999), the Social Communication Questionnaire (SCQ; Chandler et al., 2007), the Social Responsiveness Scale (SRS-2; Constantino et al., 2003), and the Gilliam Autism Rating Scale (GARS-3; Gilliam, 2014). ADHD diagnoses were additionally informed by guardians completing both the Strengths and Weaknesses Assessment of Normal Behavior (SWAN; Swanson et al., 2001) and its “extended” version (E-SWAN; Alexander et al., 2020), as well as participant completion of the Conners ADHD Rating Scale (C3SR; Conners, 2001) and measures taken using the NIH Toolbox (Gershon et al., 2013). AD diagnoses were additionally informed by participants and guardians both completing the Screen for Child Anxiety Related Emotional Disorders (SCARED; Birmaher et al., 1999).
In the case of SLD and CD, standardized tests were the primary determinants (in consideration with WISC-V norms), with clinical interviews providing secondary evidence. SLD evaluation considered performance on the Wechsler Individual Achievement Test (WIAT-III; Wechsler, 2009), the Comprehensive Test of Phonological Processing (CTOPP-2; Wagner et al., 2013), and the Test of Word Reading Efficiency (TOWRE-2; Torgesen et al., 2012). CD evaluation considered the WIAT-III, the CTOPP-2, the Clinical Evaluation of Language Fundamentals (CELF; Wiig et al., 2013), the Expressive Vocabulary Test (EVT-2; Williams, 2007), the Peabody Picture Vocabulary Test (PPVT-4; Dunn & Dunn, 2007), and the Goldman-Fristoe Test of Articulation (GFTA-3; Goldman & Fristoe, 2015). Participants who were at least 9 years old also completed the CELF extension for metalinguistics (Wiig & Secord, 2014).
In all cases, diagnosis ultimately depended on a clinical judgment of the full scope of collected assessments, with no single deciding criterion. A diagnosis that was received by fewer than 10 participants and did not fall into the above categories was not included as a variable in this study. As such, each participant may have been given additional diagnoses that are not reported here, both for reasons of statistical power and to reduce participant identifiability.
Analysis
Participant performance at the start of the testing phase was evaluated both in terms of the proportion of correct responses and in terms of reaction times. In both cases, hierarchical regression models were used, permitting both participant-level estimates of performance and a description of the overall patterns observed across groups of participants. Proportion of correct responses was modeled using logistic regression, whereas reaction times were modeled using linear regression on the natural log of the reaction times in milliseconds. A detailed account of our analytic strategy is provided in the appendix.
To briefly summarize, each regression model describes participant performance in terms of four participant-level parameters (the intercept $\beta_{0,i}$, the learning rate $\beta_{1,i}$, the symbolic distance effect $\beta_{2,i}$, and a distance/learning interaction $\beta_{3,i}$), as well as four population constants associated with clinical diagnoses that were shared among all participants ($\delta_{ADHD}$, $\delta_{AD}$, $\delta_{SLD}$, and $\delta_{CD}$). ASD was treated differently from the other diagnostic categories: all ASD participants were assessed using one hierarchical model, while all non-ASD participants were assessed using a second, separate hierarchical model. The only link between these two models was the four diagnostic constants. Partitioning the data into two distinct populations helped to make any differences between ASD and non-ASD performance as clear as possible; while modeling the effects of other diagnostic categories using a single constant is not ideal, the sample size was not sufficiently large to justify a more complex approach. Table 2 provides a summary of each parameter and how to interpret its effects.
Table 2:
Parameter Overview
| Parameter | Name | Interpretation |
|---|---|---|
| $\beta_{0,i}$ | intercept | Describes a participant’s overall performance, averaged across all stimulus pairs, on the first trial of an experimental phase. Larger values correspond to higher accuracy (for log-odds proportion correct) or slower speed (for log reaction time). |
| $\beta_{1,i}$ | learning rate | Describes how overall performance changes as a function of trials. Because all trials provided informative feedback, participants might continue to improve their performance as the testing phase unfolds. Positive values correspond to performance improving (for proportion correct) or slowing down (for log reaction times) as the experiment continues. |
| $\beta_{2,i}$ | symbolic distance effect | Describes how strong the transitive inference effect is at the start of the testing phase. Positive values indicate that participants found items that were far apart in the list to be easier (for log-odds proportion correct) or else responded to them more slowly (for log reaction times). During an analysis of the training phase, this parameter would be omitted. |
| $\beta_{3,i}$ | learning/distance interaction | Describes whether the strength of the symbolic distance effect changes during the testing phase. A participant with no distance effect at the start of testing could reveal one as testing proceeds. A value of zero, by contrast, reflects the distance effect remaining constant over time. During an analysis of the training phase, this parameter would be omitted. |
| $\bar{\beta}_{k}$ | population-level parameter | For each of the participant-level parameters above, there was a population parameter that describes the value of an average participant in that group. For example, every participant receives their own learning rate $\beta_{1,i}$, whereas the overall average participant in a group has a population-level learning rate $\bar{\beta}_{1}$. ASD participants and non-ASD participants had separate population-level parameters. |
| $\delta_{d}$ | diagnosis constant | For each diagnostic category $d$ other than ASD, a participant with that diagnosis added a constant $\delta_{d}$ to their intercept $\beta_{0,i}$. These population-level effects were shared by ASD and non-ASD participants. The value of each $\delta_{d}$ is best interpreted as how large a difference a particular diagnosis makes to a participant’s intercept. |
It is natural to ask whether partitioning according to ASD is the most appropriate way to model the data. After all, some other diagnostic category might be more deserving of this partition. Perhaps, for example, an ADHD diagnosis might better explain some difference in strategy. To validate our partitioning of the data, we performed a model comparison analysis among models that partitioned participants according to each diagnostic category (including a model without any partition at all), as described in the appendix. By an overwhelming margin, the ASD vs. non-ASD split was best supported by the evidence for both measures of performance.
Results
Participants completed the task in an average of 1303.4 s (SD = 382.9 s), with most participants taking between 1000 s and 1550 s to finish. Response accuracy steadily improved during the training phase, accompanied by a gradual speeding of reaction times. No systematic differences between ASD and non-ASD participants were evident during training. These data are plotted in Supplemental Figure 1.
Figure 2 plots the group-level mean parameters (intercepts $\bar{\beta}_0$, distance effects $\bar{\beta}_2$, learning rates $\bar{\beta}_1$, and interaction terms $\bar{\beta}_3$) for a logistic regression of response accuracy (top row, in log-odds units) and for a linear regression of log reaction time (bottom row, in log milliseconds). Broadly, average performance was similar across diagnostic conditions in most cases, as evidenced by the high degree of overlap between estimated population parameter values for ASD vs. non-ASD participants. However, ASD participants differed from non-ASD participants in two respects. First, ASD participants displayed a higher learning rate for response accuracy (implying a change from 61.3% accuracy to 78.7% accuracy for an average ASD participant over the first 100 trials of testing) than did non-ASD participants (implying essentially no change, from 63.8% accuracy to 63.1% accuracy, over the same span). This difference implies that the response accuracy of ASD participants improved more rapidly during the testing phase, as compared with effectively no improvement among non-ASD participants. Second, ASD participants had longer mean reaction times (corresponding to roughly 1.86 seconds) than non-ASD participants (roughly 0.91 seconds). Other diagnostic categories did not show effects substantially different from zero for either behavioral measure. Trends also suggest that participants performed at above-chance levels at the start of all-pairs testing, with a mild distance effect for response accuracy; a larger sample is needed to determine whether an effect size of zero can be ruled out.
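Because accuracy is modeled in log-odds units, the learning rate implied by these percentages can be back-calculated; the following conversion is illustrative (derived from the accuracies reported above, not a separately reported estimate): $\operatorname{logit}(0.787) - \operatorname{logit}(0.613) \approx 1.306 - 0.460 = 0.846$, so $\bar{\beta}_1 \approx 0.846 / 100 \approx 0.0085$ log-odds per trial for the average ASD participant.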
Figure 2.

Group-level posterior estimates for parameters in Equations 3 and 5, with ASD (white boxes) and non-ASD (gray boxes) treated as separate populations sharing common covariate effects for other diagnostic categories. Boxes represent the 80% credible interval for the mean, and whiskers represent the 95% credible interval. (Top Row) Parameters describing response accuracy in log-odds units. (Bottom Row) Parameters describing reaction time in log milliseconds.
Figure 3 plots estimated overall response accuracy and reaction time of all participants, with diagnostic indicators provided. Although the individual estimates of accuracy (top row) are noisy, most participants had posterior means above chance. Additionally, most participants had mean reaction times under 2s (middle row), although a few participants responded more slowly.
Figure 3.

Mean participant performance at the start of the testing phase, based on the regression model. Gray boxes correspond to non-ASD participants and white boxes correspond to ASD participants. Boxes represent the 80% credible interval for the mean, and whiskers represent the 95% credible interval. (Top Row) Mean response accuracy across all pairs at the start of testing. Chance responding is depicted as a dotted line. (Middle Row) Mean reaction time across all pairs at the start of testing. (Bottom Row) Indicator of each participant’s diagnostic categorizations, based on the consensus diagnostic process described in the Methods.
The diagnostic breakdown of participants in Table 1 (reproduced in Figure 3, bottom row) reveals an important limitation of working with a self-selected clinical sample. Only a single participant received no diagnoses within our framework, and the manner in which diagnoses were correlated was nonrandom. For example, ASD overlapped with ADHD more than expected by chance alone, according to a permutation test (Observed = 12, Expected = 8.4 ± 1.56, p < .05), a result consistent with reported population norms (Rong et al., 2021). However, ASD also overlapped with SLD less than would be expected by chance alone (p < .05). ASD’s overlap with AD (Observed = 7, Expected = 6.3 ± 1.47, p = .89) and CD (Observed = 4, Expected = 6.8 ± 1.5, p = .11) did not differ significantly from expectation. Overall, of the 32 possible combinations of these diagnoses, only 17 appeared even once. As such, one should be cautious interpreting model parameters. For example, since nearly all non-ASD participants received either an SLD or a CD diagnosis in this sample, their reaction time intercept is, in practice, almost always combined with the trending-above-zero parameters $\delta_{SLD}$ or $\delta_{CD}$. A larger sample would make these results easier to interpret, particularly one with more participants without diagnoses in any of our five categories. Larger samples are also required to make representative projections to population performance for each diagnosis.
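The overlap tests above can be reproduced with a simple permutation scheme; the sketch below (with hypothetical variable names) shuffles one diagnosis vector while holding the other fixed, yielding the expected overlap, its spread, and a two-sided p value:

```python
import numpy as np

def overlap_permutation_test(diag_a, diag_b, n_perm=10_000, seed=0):
    """Two-sided permutation test for the co-occurrence of two diagnoses.

    diag_a, diag_b: boolean arrays with one entry per participant."""
    rng = np.random.default_rng(seed)
    observed = int(np.sum(diag_a & diag_b))
    perm = np.array([np.sum(diag_a & rng.permutation(diag_b))
                     for _ in range(n_perm)])
    expected, spread = perm.mean(), perm.std()
    # how often a shuffled overlap is at least as extreme as the observed one
    p = np.mean(np.abs(perm - expected) >= abs(observed - expected))
    return observed, expected, spread, p

# Example usage: asd and adhd would be length-40 boolean diagnosis vectors
# observed, expected, spread, p = overlap_permutation_test(asd, adhd)
```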
Because statistical modeling can obscure the raw values measured in a study, we also computed simple averages of response accuracy and reaction times across participants in the non-ASD and ASD groups during the first block of testing (Figure 4). This averaging treated participants as independent of one another, with confidence intervals computed using bootstrapping. These means should be interpreted cautiously, because they cannot address the underlying covariation between diagnostic categories. Nevertheless, some of the main takeaways from the regression analysis remain evident, particularly the slower reaction times of ASD participants at the start of testing.
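The bootstrap intervals in Figure 4 can be computed with a standard percentile scheme; a minimal sketch (hypothetical names) follows:

```python
import numpy as np

def bootstrap_mean_ci(values, level=0.95, n_boot=10_000, seed=0):
    """Percentile bootstrap CI for an across-participant mean,
    treating participants as independent (as in Figure 4)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values)
    means = np.array([rng.choice(values, size=values.size).mean()
                      for _ in range(n_boot)])  # resample with replacement
    alpha = 1.0 - level
    lo, hi = np.quantile(means, [alpha / 2, 1.0 - alpha / 2])
    return values.mean(), (lo, hi)
```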
Figure 4.

Simple means across groups for all critical pairs, estimated via bootstrapping. Gray boxes correspond to non-ASD participants and white boxes correspond to ASD participants. Boxes represent the 80% confidence interval for the mean, and whiskers represent the 95% confidence interval. (Top Row) Mean response accuracy across critical pairs at the start of testing. Chance responding is depicted as a dotted line. (Bottom Row) Mean reaction time across critical pairs at the start of testing.
Although IQ was measured as part of the consensus diagnostic process and served as an inclusion criterion for the study, IQ was not used as a parameter in our regression models. As such, it is reasonable to ask whether IQ was predictive of estimated TI performance. Figure 5 plots these relationships for both response accuracy and reaction time. Spearman’s correlation tests yielded no meaningful association with IQ for either response accuracy (rs = −0.02, p = 0.88) or reaction time (rs = −0.01, p = 0.94). This is an important finding, because it suggests that the TI task is neither trivially easy nor prohibitively difficult for a wide range of child-aged participants.
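These rank correlations correspond to a standard Spearman test; a minimal sketch (assuming hypothetical per-participant arrays `iq`, `mean_accuracy`, and `mean_rt`):

```python
from scipy.stats import spearmanr

# reported: rs = -0.02, p = 0.88 (accuracy); rs = -0.01, p = 0.94 (RT)
rho_acc, p_acc = spearmanr(iq, mean_accuracy)
rho_rt, p_rt = spearmanr(iq, mean_rt)
```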
Figure 5.

Task performance at the start of the testing phase as compared to participant IQ (reported in z units). Black points correspond to non-ASD participants and white points correspond to ASD participants. Whiskers represent the 95% credible interval of the mean. (Left) Mean response accuracy across all pairs compared to IQ. (Right) Mean reaction time across all pairs compared to IQ.
Discussion
Participants, on average, learned the order of items in a 7-item list in under 200 trials, responding above chance with the suggestion of a symbolic distance effect for accuracy. These results, considered both in terms of response accuracy and reaction time, are evidence of relational learning (Dusek & Eichenbaum, 1997) across all diagnostic categories.
Diagnostic Categories
Participants with an ASD diagnosis took longer on average to respond during testing, and displayed a higher learning rate, compared to non-ASD participants. Model comparison suggested a substantive contrast between ASD and non-ASD participants. This is noteworthy because both groups showed almost identical levels of response accuracy at the start of the task, as characterized by nearly identical population intercepts ($\bar{\beta}_0$) and distance effects ($\bar{\beta}_2$) for response accuracy. Informally, while both groups obtained similar outcomes when testing began, they may nevertheless have differed in the processes by which those outcomes were achieved. A qualitative difference may exist despite there being no evidence of an overall performance deficit.
Because our task involved the consistent application of a rule with no contradictory outcomes, our result is consistent with the hypothesis posed by Van de Cruys et al. (2014) that many deficits associated with ASD stem from uncertainty and seemingly inconsistent feedback. A larger difference might have been observed in our study had reward delivery been probabilistic. The reaction time and learning rate effects may be driven by differences in the speed-accuracy tradeoff, and may reflect attention divided between the presented stimuli (see also Todd et al., 2009). Differences in strategy may also be present; Qian & Lipkin (2011) proposed a bias away from “interpolation” in ASD, in favor of “lookup-table” learning, so slower reaction times may be compensatory, serving to maintain a given level of accuracy. Participants with AD or CD diagnoses did less well on average, with mean effects large enough to cancel out the model’s intercept term (Figure 2). As such, most (but not all) participants who received either of these diagnoses began the testing phase with response accuracies close to chance. By contrast, ADHD had almost no differential role, particularly for response accuracy at the start of testing.
It would be a mistake to uncritically compare our work with other reported effects in the literature, given that a majority of our ASD participants had an ADHD diagnosis, and vice versa. Our hope is that future work will reflect the full covariance of diagnostic categories appearing in the general population, rather than relying heavily on exclusion criteria. Reducing strict commitment to diagnostic categories as predictors was central to the Research Domain Criteria (RDoC) initiative launched in 2009, and ASD and ADHD research has benefited from this broader focus (Pacheco et al., 2022).
That said, inclusive recruitment in clinical populations is a complicated endeavor. Statistically, estimating covariation among diagnostic categories requires a larger sample. Until the mechanisms underlying symptoms are better understood, such studies also require complex models with a large number of covarying parameters. The present study should be understood as strictly preliminary, and the large uncertainties in its estimates underscore that any claims we make are provisional.
Recruitment also presents sampling bias challenges. The present paper also does not have a well-defined control group, as only one participant did not qualify for any of the five diagnostic categories. As such, while we routinely invoke the category of “non-ASD participants,” this category should not be interpreted as a neurotypical control group; rather, our non-ASD participants belong to the narrower population of children whose guardians had them evaluated by clinicians, almost all of whom have some other diagnosis. Our results should be viewed as strictly exploratory, and we do not recommend extrapolating these results to the general population (Emerson, 2015).
Such extrapolation could, in a sufficiently large sample, be accomplished with techniques like Multilevel Regression with Poststratification (Downes et al., 2018). Small convenience samples can be extrapolated somewhat reliably when adjusted using weights derived from large scale datasets such as the National Longitudinal Transition Study 2 (NLTS2); however, this introduces two additional problems. First, such a dataset must exist – the youngest participants in NLTS2 were 13 years old, so it could not be used for the current study. Second, computing the relevant weights requires that participants disclose much more information about themselves, and even if raw participant information is not disclosed, reporting individual participant weights may be sufficient to “decode” identifying information about participants.
Given these difficulties, it remains notable that partitioning our sample by ASD diagnosis not only best explains our data, but does so overwhelmingly for both TI response accuracy and reaction time (Appendix, Table 3). This does not appear to reflect impairment, but rather a different strategy or approach to the task. ASD participants responded more slowly than non-ASD participants during the testing phase, and this slower responding may help to explain their higher learning rate, possibly due to a strategy that emphasized accuracy over speed. We take from this that the current literature’s usual emphasis on detecting impairment and performance deficits may be misguided in the case of TI and other tasks pertaining to analogical reasoning. Unambiguously, participants with ASD were successful at the task, and a larger sample will help to determine what distinctive characteristics they demonstrate on their road to success.
Table 3:
Model weights under a Bayesian stacking comparison
| Model | Sample (Pool 1) | Sample (Pool 2) | Response Accuracy Model Weight | Reaction Time Model Weight |
|---|---|---|---|---|
| Single Pool (no split) | 40 | N/A | 0.15% | 1.82% |
| Split by ASD | 19 | 21 | 93.12% | 94.99% |
| Split by ADHD | 24 | 16 | 0.00% | 1.26% |
| Split by AD | 28 | 12 | 6.71% | 1.76% |
| Split by SLD | 22 | 18 | 0.02% | 0.16% |
| Split by CD | 27 | 13 | 0.00% | 0.00% |
IQ, ASD, and TI
Though all participants had IQ scores of 70 or above according to the WISC-V, there was no meaningful association between IQ scores and successful performance of our TI task.
There is little debate that IQ is an effective early-detection tool for identifying long-term intellectual disability (ID). This connection has been made especially strongly in cases of ASD diagnosis, as population studies suggest elevated rates of ID for those with an ASD diagnosis (CDC, 2012). As such, early-life IQ measures reportedly provide important predictors of adult ID outcomes in ASD cases (Howlin et al., 2004; Bishop et al., 2015; Denisova & Lin, 2023). However, IQ as a more general framework for intelligence has come under increasing scrutiny. Some argue that IQ is conceptually narrow and excludes important aspects of decision making (Ganuthula & Sinha, 2019). Others raise concerns that it measures structural variables different from those intended by theory (Weiss & Saklofske, 2020) and that distortions introduced by misinterpretations of IQ are subsequently magnified by bad policy (Ford et al., 2016).
Because we excluded participants with IQ measures below 70, we make no claims regarding any relationship between TI and ID. However, among our participants, IQ did not appear related to performance, despite sampling from over four standard deviations of the measure (Figure 5). This is consistent with prior work suggesting that, in children under 11, even fairly high IQ scores do not consistently translate to transitive reasoning. Wolf & Shigaki (1983) report that a cohort of children aged 8–10 with IQs exceeding 130 nevertheless had high error rates for verbal transitive reasoning questions. Future studies should address whether TI relies more on implicit learning systems that do not show strong correlations with IQ (e.g., as described by Maybery et al., 1995; Kalra et al., 2019).
Conclusion
Cognitive processing differences in autism spectrum disorder (ASD) have not been sufficiently disentangled from other comorbid diagnoses in the literature. This complicates claims that ASD tends to be associated with performance deficits on various tasks. Children in the current sample with an ASD diagnosis performed transitive inferences at a rate comparable to those without one, after controlling for comorbid diagnoses. However, participants with ASD also had slower reaction times during testing, suggesting a different preferred speed-accuracy tradeoff. Furthermore, the response accuracy of participants with ASD improved more per trial during the testing phase, possibly as a benefit of their slower reaction times. Partitioning the sample into ASD/non-ASD groups was overwhelmingly preferred to any other diagnostic split according to a model comparison analysis. This suggests a different response pattern chiefly explained by ASD, and not by other comorbidities. Future research seeking to characterize how cognitive faculties vary across psychiatric diagnoses must do more to acknowledge and statistically manage diagnostic comorbidities.
Supplementary Material
Acknowledgments
The authors would like to thank the Healthy Brain Network (HBN) of Child Mind Institute (CMI) for their invaluable assistance. Specifically, we would like to acknowledge Drs. Michael Milham, Jasmine Escalera, and Rebecca Neuhaus for their contributions to our experimental design. We would also like to acknowledge the research staff of HBN: Lindsay Alexander, Camille Gregory, Carolyn Chadwick, and all the research technicians that helped implement our study.
Funding:
This work was supported by US National Institute of Mental Health, grant number NIH-MH081153 and NIH-MH111703 awarded to Vincent Ferrera and Herbert Terrace, and by PSC CUNY Research 60583–00 48, awarded to Tina Kao.
Appendix: Statistical Analysis
Proportion of correct responses during the all-pairs testing phase of the experiment was analyzed using logistic regression. At the participant level, this was accomplished using four parameters, as given in Equations 1 and 2:
$\theta_{i,t} = \beta_{0,i} + \beta_{1,i}\,t + \beta_{2,i}\,D_t + \beta_{3,i}\,(t \cdot D_t)$  [Eq. 1]

$p_{i,t} = \mathrm{logistic}(\theta_{i,t}) = \dfrac{1}{1 + \exp(-\theta_{i,t})}$  [Eq. 2]
Here, $\theta_{i,t}$ constitutes an overall effect for participant $i$ on trial $t$, derived from four parameters: an intercept $\beta_{0,i}$, which governs overall response accuracy at the moment of transfer from training to testing; a learning rate $\beta_{1,i}$, which determines how accuracy changes as a function of trial $t$; a distance effect $\beta_{2,i}$, which governs the effect of symbolic distance at transfer; and an interaction term $\beta_{3,i}$, which determines the rate at which the distance effect changes as a function of trial. Here, $D_t$ is defined as the centered symbolic distance with a mean of zero, accomplished by subtracting the mean presented distance from the actual symbolic distance (with all 21 pairs presented uniformly, the mean distance is 8/3). For example, for each adjacent pair, $D_t = 1 - 8/3 = -5/3$, whereas for a symbolic distance of 6, $D_t = 6 - 8/3 = 10/3$. Equation 2 translates the log-odds value $\theta_{i,t}$ into a proportion of correct responses using the logistic function. If an analysis were performed on a single participant in isolation, Equations 1 and 2 would be sufficient.
Measuring the impact of diagnostic categories required a hierarchical analysis, with diagnoses coded as group-level effects. Thus, the parameters $\beta_{0,i}$ through $\beta_{3,i}$ were estimated for each participant, but additional global parameters were also estimated for the diagnostic categories. In practice, this might take the following form:
$\theta_{i,t} = \beta_{0,i} + \sum_{d} \delta_{d}\,X_{d,i} + \beta_{1,i}\,t + \beta_{2,i}\,D_t + \beta_{3,i}\,(t \cdot D_t)$  [Eq. 3]

$p_{i,t} = \mathrm{logistic}(\theta_{i,t})$  [Eq. 4]
Here, each $\delta_{d}$ estimates the effect of a given diagnosis $d$ (ADHD, AD, SLD, or CD), while $X_{d,i}$ is the centered indicator function; that is, if $I_{d,i}$ acts as a dummy code in which $I_{d,i} = 0$ when participant $i$ does not have diagnosis $d$, and $I_{d,i} = 1$ when a participant has diagnosis $d$, then $X_{d,i} = I_{d,i} - \bar{I}_{d}$. For example, with 16 of 40 participants diagnosed with ADHD, $X_{ADHD,i}$ equals either $-0.4$ or $0.6$. This centering prevents the group effects from distorting the participant-level intercepts. Under such a model, we may assume that an individual’s participant-level estimates are drawn from a population distribution, as follows:
$\beta_{k,i} \sim \mathrm{Normal}(\bar{\beta}_{k}, \tau_{k}), \quad k \in \{0, 1, 2, 3\}$  [Eq. 5]
An identical construction to Equation 5 can be used to study reaction times, which are generally close to Gaussian on a log scale. All that is required is that the participant-level parameters in Equation 1 be used to calculate the mean of a Gaussian likelihood, with the addition of a participant-level residual error term $\sigma_{i}$. Analyses of reaction times were conducted using the natural log of reaction time in milliseconds.
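For concreteness, the following is a minimal sketch of the accuracy model (Eqs. 3–5) in Python using PyMC; it is an illustrative reconstruction under the assumptions above, not the study’s actual code, and all variable names are hypothetical. It shows the single-pool case; the split designs described below would instantiate this separately for each participant pool, with the $\delta$ constants shared across pools.

```python
import pymc as pm

def build_accuracy_model(correct, trial, dist, pid, X, n_participants):
    """Hierarchical logistic model for Equations 3-5.

    correct : 0/1 response per trial
    trial   : trial number within the testing phase
    dist    : centered symbolic distance D per trial
    pid     : participant index per trial
    X       : centered diagnosis indicators, shape (n_participants, 4),
              one column each for ADHD, AD, SLD, and CD
    """
    with pm.Model() as model:
        beta_bar = pm.Normal("beta_bar", 0.0, 1.0, shape=4)   # population means
        tau = pm.HalfNormal("tau", 1.0, shape=4)              # population SDs
        # participant-level parameters, partially pooled (Eq. 5)
        beta = pm.Normal("beta", beta_bar, tau, shape=(n_participants, 4))
        delta = pm.Normal("delta", 0.0, 1.0, shape=4)         # diagnosis constants
        theta = (beta[pid, 0]                                 # Eq. 3
                 + (delta * X[pid]).sum(axis=-1)
                 + beta[pid, 1] * trial
                 + beta[pid, 2] * dist
                 + beta[pid, 3] * trial * dist)
        pm.Bernoulli("y", logit_p=theta, observed=correct)    # Eq. 4
    return model
```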
A complication in setting up a hierarchical model, especially one in which each participant has many parameters that describe only a modest amount of data, is that the group-level estimates have a strong regularizing effect on participant-level estimates. This ‘shrinkage’ is desirable when participants resemble one another, but it isn’t necessarily appropriate if subgroups of participants come from distinct populations with different population distributions. Indeed, such shrinkage can conceal important differences if not taken into consideration. Another way to set up the model is to designate distinct groups according to some indicator (such as the presence or absence of a diagnosis), allowing shrinkage to occur in each group without impacting estimates in the other. This gives rise to two potential designs. The first assumes that all participants belong to a single pool, with all participant-level parameters drawn from a single population distribution per Equation 5 and with diagnostic effects handled exclusively by the $\delta_{d}$ parameters in Equation 3. The second creates two copies of Equation 5 (one for participants who have a particular diagnosis and one for those who do not) and estimates the population-level effects separately, while also omitting that diagnosis’s $\delta_{d}$ from Equation 3 (since it becomes redundant with the difference between the two groups’ population-level intercepts). Given five diagnostic categories along which to split the data, there are a total of six potential models that could be run, making markedly different assumptions about how best to group the participants to maximize predictive strength.
We used the stacking approach to Bayesian model comparison with leave-one-out (LOO) estimation (Yao et al., 2018) to compare models in terms of their out-of-sample predictive strength (based on variation in the estimated pointwise likelihood) and, when appropriate, to combine models using a weighted average. If multiple models appear compelling, stacking those models is likely to be a better option than trying to split a small sample into smaller and smaller subsamples. If a single model dominates, then its grouping design was most effective, and that model becomes the focus of subsequent analyses.
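A minimal sketch of such a comparison using the ArviZ library, assuming each fitted model has been converted to an InferenceData object containing pointwise log-likelihoods (the variable names are hypothetical):

```python
import arviz as az

# one InferenceData object per candidate grouping design
candidates = {
    "single_pool": idata_single,
    "split_asd": idata_asd,
    "split_adhd": idata_adhd,
    "split_ad": idata_ad,
    "split_sld": idata_sld,
    "split_cd": idata_cd,
}
comparison = az.compare(candidates, ic="loo", method="stacking")
print(comparison["weight"])  # stacking weights, as reported in Table 3
```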
We compared six models: One that pooled all participants in a single multi-level model, and five that split the participant pool according to each of the diagnostic categories, connecting the two pools only by the constant group-level effects for each model’s remaining four diagnostic categories. Table 3 displays the estimated model weights for each of these models.
Overwhelmingly, in both analyses, splitting the participant pool with respect to ASD resulted in the best estimated predictive strength. This is partly because splitting by ASD results in the most even split of the participants. Even so, because the Single Pool model has fewer free parameters, and the ADHD and SLD splits are also nearly even, we feel the high model weights received by the ASD partition most likely reflect meaningful differences between our ASD and non-ASD participants. We proceeded with an analysis that treated the Split by ASD model as providing an adequate description of this sample, at least within our measurement tolerances.
Footnotes
Conflict of Interest: The authors declare no conflicts of interest.
Ethics Approval: The experiment was approved by the Institutional Review Boards of Columbia University (protocol AAAR7039) and New York City College of Technology CUNY (protocol 2019–0601-NYCCT), conforming to guidelines for human research set forth by the American Psychological Association.
Consent: Since the participants in this study were minors, informed consent was provided by their legal guardians to collect experimental data and to make that data available in a strictly de-identified form as part of a scientific study. The study was framed as basic research, and guardians understood that the resulting experimental data would not impact participants’ clinical or diagnostic outcomes.
Data Availability:
Fully anonymized experimental data and related materials are available in an Open Science Framework repository < https://osf.io/dqztb/ >.
References
- Achenbach TH (1991). Integrative Guide to the 1991 CBCL/4–18, YSR, and TRF Profiles. Burlington, VT, USA: University of Vermont, Department of Psychology. [Google Scholar]
- Alexander LM, Salum GA, Swanson JM, Milham MP (2020). Measuring strengths and weaknesses in dimensional psychiatry. Journal of Child Psychology and Psychiatry, 61(1), 40–50. 10.1111/jcpp.13104 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Anwyl-Irvine AL, Massonnié J, Flitton A, Kirkham N, Evershed JK (2020). Gorilla in our midst: An online behavioral experiment builder. Behavior Research Methods, 52(1), 388–407. 10.3758/s13428-019-01237-x [DOI] [PMC free article] [PubMed] [Google Scholar]
- Banker SM, Gu X, Schiller D, Foss-Feig JH (2021). Hippocampal contributions to social and cognitive deficits in autism spectrum disorder. Trends in Neuroscience, 44(10), 793–807. 10.1016/j.tins.2021.08.005 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Belardinelli C, Raza M, & Taneli T (2016). Comorbid behavioral problems and psychiatric disorders in autism spectrum disorders. Journal of Childhood & Developmental Disorders, 2(2), Article 11. 10.4172/2472-1786.100019 [DOI] [Google Scholar]
- Birmaher B, Brent DA, Chiappetta L, Bridge J, Monga S, Baugher M (1999). Psychometric properties of the Screen for Child Anxiety Related Emotional Disorders. Journal of the American Academy of Child & Adolescent Psychiatry, 38(10), 1230–1236. 10.1097/00004583-199910000-00011 [DOI] [PubMed] [Google Scholar]
- Bishop SL, Farmer C, Thurm A (2015). Measurement of nonverbal IQ in autism spectrum disorder: Scores in young adulthood compared to early childhood. Journal of Autism and Developmental Disorders, 45(4), 966–974. 10.1007/s10803-014-2250-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bond AB, Wei CA, Kamil AC (2010). Cognitive representation in transitive inference: A comparison of four corvid species. Behavioural Processes, 85(3), 283–292. 10.1016/j.beproc.2010.08.003 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Brunamonti E, Costanzo F, Mammi A, Rufini C, Veneziani D, Pani P, Vicari S, Ferraina S, Menghini D (2017). Evaluation of relational reasoning by a transitive inference task in attention-deficit/hyperactivity disorder. Neuropsychology, 31(2), 200–208. 10.1037/neu0000332 [DOI] [PubMed] [Google Scholar]
- Bryant PE, Trabasso T (1971). Transitive inferences and memory in young children. Nature, 232(5311), 456–458. 10.1038/232456a0 [DOI] [PubMed] [Google Scholar]
- CDC. (2012). Prevalence of autism spectrum disorders – Autism and developmental disabilities monitoring network, 14 sites, United States, 2008. MMWR CDC Surveillance Summaries, 61(3), 1–19. [PubMed] [Google Scholar]
- Chandler S, Charman T, Baird G, Simonoff E, Loucas T, Meldrum D, Scott M, Pickles A (2007). Validation of the Social Communication Questionnaire in a population cohort of children with autism spectrum disorders. Journal of the American Academy of Child & Adolescent Psychiatry, 46(10), 1324–1332. 10.1097/chi.0b013e31812f7d8d [DOI] [PubMed] [Google Scholar]
- Chowdhury R, Sharda M, Foster NEV, Germain E, Tryfon A, Doyle-Thomas K, Anagnostou E, Hyde KL (2017). Auditory pitch perception in autism spectrum disorder is associated with nonverbal abilities. Perception, 46(11), 1298–1320. 10.1177/0301006617718715 [DOI] [PubMed] [Google Scholar]
- Church BA, Rice C,L, Dovgopoly A, Lopata CJ, Thomeer ML, Nelson A, Mercado III,E (2015). Learning, plasticity, and atypical generalization in children with autism. Psychonomic Bulletin & Review, 22, 1342–1348. 10.3758/s13423-014-0797-9 [DOI] [PubMed] [Google Scholar]
- Conners CK (2001). Conners’ Rating Scales-Revised. North Tonawanda, NY, USA: Multi-Health Systems, Inc. [Google Scholar]
- Constantino JN, Davis SA, Todd RD, Schindler MK, Gross MM, Brophy SL, Metzger LM, Shoushtari CS, Splinter R, Reich W (2003). Validation of a brief quantitative measure of autistic traits: Comparison of the Social Responsiveness Scale with the Autism Diagnostic Interview-Revised. Journal of Autism and Developmental Disorders, 33(4), 427–433. 10.1023/a:1025014929212 [DOI] [PubMed] [Google Scholar]
- Cooper RA, Plaisted-Grant KC, Baron-Cohen S, Simons JS (2017). Eye movements reveal a dissociation between memory encoding and retrieval in adults with autism. Cognition, 159, 127–138. 10.1016/j.cognition.2016.11.013 [DOI] [PubMed] [Google Scholar]
- Dunn LM, Dunn DM (2007). Peabody Picture Vocabulary Test – Fourth Edition. Circle Pines, MN, USA: American Guidance Service. [Google Scholar]
- Denisova K, Lin Z (2023). The importance of low IQ to early diagnosis of autism. Autism Research, 16(1), 122–142. 10.1002/aur.2842 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Downs M, Gurrin LC, English DR, Pirkis J, Currier D, Spittal MJ, Carlin JB (2018). Multilevel regression and poststratification: A modeling approach to estimating population quantities from highly selected survey samples. American Journal of Epidemiology, 187(8), 1780–1790. 10.1093/aje/kwy070 [DOI] [PubMed] [Google Scholar]
- Dusek JA, Eichenbaum H (1997). The hippocampus and memory for orderly stimulus relations. Proceedings of the National Academy of the Sciences USA, 94(13), 7109–7114. 10.1073/pnas.94.13.7109 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ebenholtz SM Serial learning and dimensional organization. (1972). Psychology of Learning and Motivation, 5, 267–314. [Google Scholar]
- Ehlers S, Gillberg C, Wing L (1999). A screening questionnaire for Asperger syndrome and other high-functioning autism spectrum disorders in school age children. Journal of Autism and Developmental Disorders, 29(2), 129–141. 10.1023/a:1023040610384 [DOI] [PubMed] [Google Scholar]
- Emerson RW (2015). Convenience sampling, random sampling, and snowball sample: How does sampling affect the validity of research? Journal of Visual Impairment & Blindness, 109(2), 164–168. 10.1177/0145482X1510900215 [DOI] [Google Scholar]
- Ford DY, Wright BL, Washington A, Henfield MS (2016). Access and equity denied: Key theories for school psychologists to consider when assessing Black and Hispanic students for gifted education. School Psychology Forum, 10(3), 265–277.
- Ganuthula VRR, Sinha S (2019). The looking glass for intelligence quotient tests: The interplay of motivation, cognitive functioning, and affect. Frontiers in Psychology, 10, Article 2857. 10.3389/fpsyg.2019.02857
- Gazes RP, Hampton RR, Lourenco SF (2015). Transitive inference of social dominance by human infants. Developmental Science, 20, e12367. 10.1111/desc.12367
- Gershon RC, Wagster MV, Hendrie HC, Fox NA, Cook KF, Nowinski CJ (2013). NIH Toolbox for assessment of neurological and behavioral function. Neurology, 80(11 Suppl 3), S2–S6. 10.1212/wnl.0b013e3182872e5f
- Gilliam JE (2014). Gilliam Autism Rating Scale – Third Edition (GARS-3). Austin, TX, USA: ProEd.
- Goldman R, Fristoe M (2015). Goldman-Fristoe Test of Articulation – Third Edition (GFTA-3). Circle Pines, MN, USA: American Guidance Service.
- Gorham M, Barnes-Holmes Y, Barnes-Holmes D (2009). Derived comparative and transitive relations in young children with and without autism. The Psychological Record, 59, 221–246. 10.1007/BF03395660
- Grosenick L, Clement TS, Fernald RD (2007). Fish can infer social rank by observation alone. Nature, 445(7126), 429–432. 10.1038/nature05646
- Holcomb WL, Stromer R, Mackay HA (1997). Transitivity and emergent sequence performances in young children. Journal of Experimental Child Psychology, 65(1), 96–124. 10.1006/jecp.1996.2360
- Hotta T, Ueno K, Hataji Y, Kuroshima H, Fujita K, Kohda M (2020). Transitive inference in cleaner wrasses (Labroides dimidiatus). PLOS ONE, 15(1), e0237817. 10.1371/journal.pone.0237817
- Howlin P, Goode S, Hutton J, Rutter M (2004). Adult outcome for children with autism. Journal of Child Psychology and Psychiatry, 45(2), 212–229. 10.1111/j.1469-7610.2004.00215.x
- Jensen G, Alkan Y, Muñoz F, Ferrera VP, Terrace HS (2017). Transitive inference in humans (Homo sapiens) and rhesus macaques (Macaca mulatta) after massed training of the last two list items. Journal of Comparative Psychology, 131(3), 231–245. 10.1037/com0000065
- Jensen G, Alkan Y, Ferrera VP, Terrace HS (2019). Reward associations do not explain transitive inference performance in monkeys. Science Advances, 5(7), eaaw2089. 10.1126/sciadv.aaw2089
- Jensen G, Ferrera VP, Terrace HS (2022). Positional inference in rhesus macaques. Animal Cognition, 25(1), 73–93. 10.1007/s10071-021-01536-x
- Kalra PB, Gabrieli JDE, Finn AS (2019). Evidence of stable individual differences in implicit learning. Cognition, 190, 199–211. 10.1016/j.cognition.2019.05.007
- Kao T, Jensen G, Michaelcheck C, Ferrera VP, Terrace HS (2020). Absolute and relative knowledge of ordinal position on implied lists. Journal of Experimental Psychology: Learning, Memory, and Cognition, 46(12), 2227–2243. 10.1037/xlm0000783
- Kaufman J, Birmaher B, Brent D, Rao U, Flynn C, Moreci P, Williamson D, Ryan N (1997). Schedule for Affective Disorders and Schizophrenia for School-Age Children-Present and Lifetime Version (K-SADS-PL): Initial reliability and validity data. Journal of the American Academy of Child & Adolescent Psychiatry, 36(7), 980–988. 10.1097/00004583-199707000-00021
- Khachadourian V, Mahjani B, Sandin S, Kolevzon A, Buxbaum JD, Reichenberg A, Janecka M (2023). Comorbidities in autism spectrum disorder and their etiologies. Translational Psychiatry, 13, Article 71. 10.1038/s41398-023-02374-w
- Maybery M, Taylor M, O’Brien-Malone A (1995). Implicit learning: Sensitive to age but not to IQ. Australian Journal of Psychology, 47(1), 8–17. 10.1080/00049539508258763
- McGonigle BO, Chalmers M (1977). Are monkeys logical? Nature, 267(5613), 694–696. 10.1038/267694a0
- Mercado III E, Chow K, Church BA, Lopata C (2020). Perceptual category learning in autism spectrum disorder: Truth and consequences. Neuroscience & Biobehavioral Reviews, 118, 689–703. 10.1016/j.neubiorev.2020.08.016
- Mercado III E, Church BA, Coutinho MVC, Dovgopoly A, Lopata CJ, Toomey JA, Thomeer ML (2015). Heterogeneity in perceptual category learning by high functioning children with autism spectrum disorder. Frontiers in Integrative Neuroscience, 9, Article 42. 10.3389/fnint.2015.00042
- Merritt DJ, Terrace HS (2011). Mechanisms of inferential order judgments in humans (Homo sapiens) and rhesus monkeys (Macaca mulatta). Journal of Comparative Psychology, 125(2), 227–238. 10.1037/a0021572
- Onwuameze OE, Titone D, Ho B-C (2016). Transitive inference deficits in unaffected biological relatives of schizophrenia patients. Schizophrenia Research, 175, 64–71. 10.1016/j.schres.2016.02.015
- Pacheco J, Garvey MA, Sarampote CS, Cohen ED, Murphy ER, Friedman-Hill SR (2022). Annual research review: The contributions of the RDoC research framework on understanding neurodevelopmental origins, progression and treatment of mental illness. Journal of Child Psychology and Psychiatry, 63(4), 360–376. 10.1111/jcpp.13543
- Qian M, Lipkin RM (2011). A learning-style theory for understanding autistic behaviors. Frontiers in Human Neuroscience, 5, Article 77. 10.3389/fnhum.2011.00077
- Rong Y, Yang C-J, Jin Y, Wang Y (2021). Prevalence of attention-deficit/hyperactivity disorder in individuals with autism spectrum disorder: A meta-analysis. Research in Autism Spectrum Disorders, 83, 101759. 10.1016/j.rasd.2021.101759
- Rosenberg A, Patterson JS, Angelaki DE (2015). A computational perspective on autism. Proceedings of the National Academy of Sciences USA, 112(30), 9158–9165. 10.1073/pnas.1510583112
- Solomon M, Frank MJ, Smith AC, Ly S, Carter CS (2011). Transitive inference in adults with autism spectrum disorders. Cognitive, Affective, & Behavioral Neuroscience, 11(3), 437–449. 10.3758/s13415-011-0040-3
- Solomon M, Ragland JD, Niendam TA, Lesh TA, Beck JS, Matter JC, Frank MJ, Carter CS (2015). Atypical learning in autism spectrum disorders: A functional magnetic resonance imaging study of transitive inference. Journal of the American Academy of Child and Adolescent Psychiatry, 54(11), 947–955. 10.1016/j.jaac.2015.08.010
- Swanson J, Deutsch C, Cantwell D, Posner M, Kennedy JL, Barr CL, Moyzis R, Schuck S, Flodman P, Spence MA, Wasdell M (2001). Genes and attention-deficit hyperactivity disorder. Clinical Neuroscience Research, 1(3), 207–216. 10.1016/S1566-2772(01)00007-X
- Terrace HS (2012). The comparative psychology of ordinal behavior. In Zentall TR & Wasserman EA (Eds.), Oxford Handbook of Comparative Cognition (pp. 615–651). Oxford, UK: Oxford University Press. 10.1093/oxfordhb/9780195392661.013.0032
- Tibbetts EA, Agudelo J, Pandit S, Riojas J (2019). Transitive inference in Polistes paper wasps. Biology Letters, 15(5), 20190015. 10.1098/rsbl.2019.0015
- Todd J, Mills C, Wilson AD, Plumb MS, Mon-Williams MA (2009). Slow motor responses to visual stimuli of low salience in autism. Journal of Motor Behavior, 41(5), 419–426. 10.3200/35-08-042
- Torgesen J, Wagner R, Rashotte C (2012). TOWRE-2: Test of Word Reading Efficiency – Second Edition. Austin, TX, USA: ProEd.
- Van de Cruys S, Evers K, Van der Hallen R, Van Eylen L, Boets B, de-Wit L, Wagemans J (2014). Precise minds in uncertain worlds: Predictive coding in autism. Psychological Review, 121(4), 649–675. 10.1037/a0037665
- Velikonja T, Fett A-K, Velthorst E (2019). Patterns of nonsocial and social cognitive functioning in adults with autism spectrum disorder. JAMA Psychiatry, 76(2), 135–151. 10.1001/jamapsychiatry.2018.3645
- Wagner R, Torgesen J, Rashotte C, Pearson NA (2013). CTOPP-2: Comprehensive Test of Phonological Processing – Second Edition. Austin, TX, USA: ProEd.
- Wechsler D (2009). Wechsler Individual Achievement Test – Third Edition. San Antonio, TX, USA: Pearson.
- Wechsler D (2014). Wechsler Intelligence Scale for Children – Fifth Edition. San Antonio, TX, USA: Pearson.
- Weiss LG, Saklofske DH (2020). Mediators of IQ test score differences across racial and ethnic groups: The case for environmental and social justice. Personality and Individual Differences, 161, 109962. 10.1016/j.paid.2020.109962
- Wiig EH, Secord W, Semel E (2013). Clinical Evaluation of Language Fundamentals – Fifth Edition. San Antonio, TX, USA: Pearson.
- Wiig EH, Secord W (2014). Clinical Evaluation of Language Fundamentals – Metalinguistics. Bloomington, MN, USA: Pearson.
- Williams KT (2007). Expressive Vocabulary Test – Second Edition (EVT-2). Minneapolis, MN, USA: Pearson.
- Wing EA, D’Angelo MC, Gilboa A, Ryan JD (2021). The role of ventromedial prefrontal cortex and basal forebrain in relational memory and inference. Journal of Cognitive Neuroscience, 33(9), 1976–1989. 10.1162/jocn_a_01722
- Wolf W, Shigaki I (1983). A developmental study of gifted children’s conditional reasoning ability. Gifted Child Quarterly, 27(4), 173–179. 10.1177/001698628302700406
- Wright BC, Smailes J (2015). Factors and processes in children’s transitive deductions. Journal of Cognitive Psychology, 27(8), 967–978. 10.1080/20445911.2015.1063641
- Yao Y, Vehtari A, Simpson D, Gelman A (2018). Using stacking to average Bayesian predictive distributions (with discussion). Bayesian Analysis, 13(3), 917–1007. 10.1214/17-BA1091
Data Availability Statement
Fully anonymized experimental data and related materials are available in an Open Science Framework repository (https://osf.io/dqztb/).
