Abstract
The Social Responsiveness Scale-2 (SRS-2) is a quantitative measure used to characterize symptoms of autism spectrum disorder (ASD). However, research suggests that SRS-2 scores are significantly influenced by language ability and intellectual disability (ID). Efforts to refine the SRS-2 by Sturm, Kuhfeld, Kasari, and McCracken [Journal of Child Psychology and Psychiatry, 58(9), 1053–1061] yielded a shortened form, yet its psychometric properties in populations with severe ID remain unknown. This study aims to examine the psychometric properties of the SRS-2 in Phelan–McDermid syndrome (PMS), a genetic condition associated with ASD and ID, thereby guiding score interpretation in this population and future development of targeted scales. Analyses, including Item Response Theory (IRT), were conducted on a sample of individuals with PMS (n = 91) recruited at six sites nationally. Psychometric properties evaluated include measures of reliability (internal consistency, test–retest reliability) and validity (structural, construct, content). While both SRS-2 forms are reliable, the shortened SRS-2 shows superior validity to the full SRS-2 for measuring ASD symptoms in PMS. On IRT analysis, the shortened SRS-2 shows excellent discrimination and precisely evaluates respondents across a wide range of ASD symptomatology, but interpretation is limited by uncertain content validity and small sample size. The shortened SRS-2 shows some promise for use in PMS, but future refinements and additions are needed to develop items that are tailored to identify ASD in children with severe ID and specifically PMS.
Keywords: autism spectrum disorder, Phelan–McDermid syndrome, behavioral measures, Item Response Theory, intellectual disability
Lay Summary:
This study determined that a shortened form of the Social Responsiveness Scale, Second Edition (SRS-2) shows both promise and limitations for the characterization of autism symptomatology in individuals with Phelan–McDermid syndrome (PMS), a population characterized by intellectual disability (ID). Caution should be used when interpreting SRS-2 scores in individuals with ID and future research should modify existing items and develop new items to improve the SRS-2’s ability to accurately characterize autism symptomatology in PMS.
Introduction
Phelan–McDermid syndrome (PMS), also known as 22q13 deletion syndrome, is a rare genetic condition associated with autism spectrum disorder (ASD) and intellectual disability (ID) caused by mutation or deletion of the SHANK3 gene on terminal chromosome 22. The phenotypic presentation of PMS is variable and includes ID, ASD, hypotonia, absent or delayed speech, and dysmorphic features [De Rubeis et al., 2018; Egger, Zwanenburg, van Ravenswaaij-Arts, Kleefstra, & Verhoeven, 2016; Kolevzon et al., 2014; Phelan et al., 2001; Soorya et al., 2013; Zwanenburg, Ruiter, van den Heuvel, Flapper, & Van Ravenswaaij-Arts, 2016]. PMS and other rare genetic disorders associated with ASD provide opportunities for research because knowledge about pathophysiology allows for the development of targeted, biologically based treatments. The National Institutes of Health have recognized the potential of rare genetic syndromes to inform ASD broadly and funded the Developmental Synaptopathies Consortium (DSC), a multisite study with the goal of mapping the natural history of PMS and related genetic conditions (i.e., tuberous sclerosis complex and PTEN hamartoma tumor syndrome). To date, the DSC has collected a battery of patient-reported outcomes (PROs) on 97 participants with PMS, 91 of whom completed the Social Responsiveness Scale, Second Edition (SRS-2; Constantino, 2012).
The DSC selected the SRS-2 to characterize ASD symptoms in several genetic conditions. The SRS-2 is a caregiver report questionnaire comprised of 65 items which create a total severity score and is one of the most widely used measures within the ASD literature. It has been used in clinical practice to screen for ASD risk, as a measure of quantitative trait in studies of ASD genetics [Duvall et al., 2007; Lowe et al., 2015], and as a treatment outcome measure [Anagnostou et al., 2014]. The psychometric properties of the SRS-2 have been validated in samples of children without cognitive impairment aged 2.5–18 years. However, due to lack of validation in specific groups, the authors and other researchers have indicated that the measure should be used with caution in individuals with ID [Constantino & Gruber, 2005], children younger than 4 years old (who are administered a separate preschool version), and those with significant expressive language delays [Charman et al., 2007; Duku et al., 2013; Frazier et al., 2014; Havdahl et al., 2016; Hus, Bishop, Gotham, Huerta, & Lord, 2013; Moul, Cauchi, Hawes, Brennan, & Dadds, 2015]. Impairments in these factors have been shown to influence SRS-2 scores; SRS-2 scores correlate negatively with IQ in populations with cognitive impairment [Hus et al., 2013], implying measurement bias within the questionnaire that could potentially impact performance in PMS. Indeed, the SRS-2 manual recommends that when used in the context of significant cognitive impairment, clinicians should consider whether the magnitude of impairment in reciprocal social behavior would be expected or substantively greater than expected given the child’s developmental delay [Constantino & Gruber, 2005].
The use of measures with poor psychometric properties significantly hampers the progression of research in neurodevelopmental disorders, including PMS [Toland, 2014]. Using scales that are inappropriately matched to the intended population as outcome variables leads to reduced statistical validity [Embretson, 1996; Kang & Waller, 2005; Toland, 2014]. To address this issue, guidelines were created to foster the development of valid PROs; specifically, an international Delphi study produced the COSMIN (COnsensus-based Standards for the selection of health Measurement INstruments) checklist [Mokkink et al., 2010]. The evaluation and improvement of existing assessment tools are particularly important in populations with severe ID such as PMS, but are also challenging due to verbal communication challenges, comorbid medical conditions, and atypical skill progression [Mohr & Gray, 2005; Redin et al., 2014; Soorya, Leon, Trelles, & Thurm, 2017; Wright et al., 2015]. Diagnostic criteria mandate that ASD should not be diagnosed if symptoms are better explained by ID or developmental delay; however, diagnostic boundaries are often blurred and few operationalized criteria exist to identify clinical distinctions [Thurm, Farmer, Salzman, Lord, & Bishop, 2019]. Moreover, genetic syndromes are often accompanied by comorbid medical conditions, which may change the clinical presentation of ASD [Thurm et al., 2019]. The Characteristics of Assessment Instruments for Psychiatric Disorders in Persons with Intellectual Developmental Disorders (CAPS-IDD) was developed as a set of Intellectual and Developmental Disability-specific guidelines to aid in the evaluation and adaptation of instruments for use in these populations [Zeilinger, Nader, Brehmer-Rinderer, Koller, & Weber, 2013].
These guidelines delineate a methodology for the evaluation of existing PROs by compiling the most informative measurement properties and evaluating the quality of the various statistical tests used to evaluate the properties.
To address the limitations of the SRS-2, recent efforts have attempted to empirically derive an adapted version that is less vulnerable to bias while retaining a unidimensional factor structure [Duku et al., 2013; Sturm, Kuhfeld, Kasari, & McCracken, 2017]. Sturm et al.’s [2017] 16-item version of the SRS-2 was developed through systematic exploration of bias in the SRS-2 using Item Response Theory (IRT) and Differential Item Functioning (DIF) in a large sample (n = 21,426) of ASD youth. In Sturm et al. [2017], the shortened SRS-2 demonstrated unidimensional structure, was highly reliable (α = 0.96), and had strong correlations to Autism Diagnostic Observation Schedule-2 (ADOS-2; Lord et al., 2012), Autism Diagnostic Interview-Revised (ADI-R; Rutter, Le Couteur, & Lord, 2003), and Vineland Adaptive Behavior Scale, Second Edition (VABS-II; Sparrow, 2011) scores. Sturm’s shortened SRS-2 may hold promise for use in individuals with PMS and ID, but has yet to be tested in populations with severe ID. We utilized data derived from a multi-center, natural history study to evaluate the psychometric properties of both the full and shortened SRS-2 using the CAPS-IDD checklist guidelines. The goal of this study was to generate information to guide the use and interpretation of SRS-2 scores in PMS and other populations with severe ID. In addition, we aimed to guide the development of modified scales specifically designed for use in these populations.
Methods
Participants
Participants were recruited from 2015 until 2018 from an ongoing multi-site study as part of the DSC, a Rare Diseases Clinical Research Network (1 U54 NS092090-01) [Groft & Gopal-Srivastava, 2013]. Human subject approval was obtained through a centralized Institutional Review Board at Boston Children’s Hospital and all participants or their caregivers provided informed consent. Ninety-seven individuals aged 3–21 participated in this study. All participants had a molecular diagnosis of PMS confirmed by chromosomal microarray or gene sequencing and defined by deletion or pathogenic mutation of the SHANK3 gene.
Study Procedures
Participants were followed longitudinally and completed assessments at baseline, 6, 12, 18, and 24 months. Parent-reported assessments were administered according to study protocol to ensure standardization. Staff members were available to help parents complete questionnaires accurately. Of 97 individuals, 91 participants completed the SRS-2 on at least one occasion and were included in analyses. For analyses, each participant’s baseline SRS-2 was evaluated. If no baseline SRS-2 data were available, the SRS-2 data from the earliest available time point were used (n = 6). If >5 items were missing from the baseline SRS-2, the SRS-2 from the next time point was used (n = 4). Missing data analysis investigated whether missing cases were systematically different from participants with SRS-2 data.
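The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data layout (a dictionary keyed by visit month, with `None` marking an unanswered item) and all function names are hypothetical, and the text does not say what was done if the next available form also had more than five missing items, so the sketch simply continues to the following visit.

```python
from typing import Dict, List, Optional

# Protocol visit schedule in months (baseline = 0), as listed in the text.
VISIT_ORDER = [0, 6, 12, 18, 24]
MAX_MISSING = 5  # a form with more than 5 missing items is passed over


def count_missing(responses: List[Optional[int]]) -> int:
    """Number of unanswered items (None) on one SRS-2 form."""
    return sum(1 for r in responses if r is None)


def select_analysis_visit(forms: Dict[int, List[Optional[int]]]) -> Optional[int]:
    """Return the month of the earliest visit whose SRS-2 form exists and
    has at most MAX_MISSING missing items, or None if no form qualifies.

    Assumption: skipping past repeated sparse forms is our extension of
    the stated rule, not something the study describes.
    """
    for month in VISIT_ORDER:
        responses = forms.get(month)
        if responses is not None and count_missing(responses) <= MAX_MISSING:
            return month
    return None
```

A participant with a complete baseline form is analyzed at month 0; one with no baseline form, or a baseline form missing six or more items, falls through to the next usable visit.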
Measures
Parent-reported assessments included the SRS-2, the Aberrant Behavior Checklist (ABC; Aman & Singh, 1986), and the Repetitive Behavior Scale-Revised (RBS-R; Bodfish et al., 2000).
Social Responsiveness Scale-2.
Both the 65-item SRS-2 and Sturm et al.’s [2017] 16-item shortened SRS-2 were evaluated in this study. The SRS-2 is a Likert-style questionnaire that uses a scale of 1–4 (1 = not true, 4 = almost always true) to evaluate ASD symptoms, with higher scores indicating greater impairment [Constantino et al., 2004; Constantino, Przybeck, Friesen, & Todd, 2000]. A designated raw score cutoff value of 70 is considered to have a sensitivity of 0.78 and specificity of 0.94 for ASD [Constantino & Gruber, 2005]. Raw total scores are converted to gender-normed T scores, with a T-score of ≤59 designated as normal range, 60–75 considered mild-to-moderate, and a T-score >75 indicating severe impairment. Item-level scores and full-scale scores (both raw and T scores) were included in analyses.
Other measures.
The Vineland-II [Sparrow, 2011] is a measure of adaptive behavior and includes four subscales: communication, socialization, daily living skills, and motor skills (age 6 and below). An Adaptive Behavior Composite is derived based on responses across subscales. The Survey Interview Form was used and is administered as a caregiver interview. The ABC is a PRO of aberrant behavior and consists of five subscales: Irritability, lethargy, hyperactivity, stereotypy, and inappropriate speech [Aman & Singh, 1986]. The RBS-R is a PRO that assesses restricted and repetitive behaviors (Bodfish et al., 2000).
ASD assessments.
The diagnosis of ASD was established using clinical consensus based on Diagnostic and Statistical Manual for Mental Disorders, Fifth Edition criteria (DSM-5; American Psychiatric Association, 2013), informed by psychiatric evaluation, the ADOS-2 (Lord et al., 2012) and the ADI-R (Rutter, Le Couteur, & Lord, 2003). The ADI-R was used only in cases with estimated mental age >18 months, and this measure was not developed to assess severity of ASD scores, just whether an individual meets criteria based on an algorithm. The ADOS-2 has five modules that are designed to assess children with different ages and levels of language abilities. ADOS-2 scores must be interpreted with caution in PMS, as studies suggest decreased accuracy in individuals with mental ages below 18 months and in those with profound intellectual disability (Lord et al., 2012). In this study, ADOS-2 modules 1–4 were also used to group participants by language ability, and Calibrated Severity Scores (CSS) were used to quantify severity [Gotham et al., 2010].
Nonverbal intelligence quotient.
Cognitive evaluations were performed with one of three measures: Mullen Scales for Early Learning (MSEL) [Mullen, 1995], Stanford-Binet, Fifth Edition (SB-5) [Thorndike, Hagen, & Sattler, 1986], or the Differential Abilities Scale, Second Edition (DAS-II) [Elliot, 1990], depending on which was the most appropriate for the child’s level of ability. The SB-5 and DAS-II yield nonverbal intelligence quotient (NVIQ; or equivalent) standard scores. The MSEL consists of five subscales (gross motor, visual reception, fine motor, receptive language, and expressive language) and generates age-equivalent scores that can be used to calculate a nonverbal developmental quotient (NVDQ) in individuals administered this measure out of age range, or for whom standard scores were at the floor (lowest value). In these cases, the age-equivalent scores on the visual reception and fine motor sections are averaged and divided by chronological age [Bishop, Guthrie, Coffing, & Lord, 2011].
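As a worked illustration of the ratio described above, the following Python sketch averages the two MSEL age-equivalent scores and divides by chronological age. The function and parameter names are hypothetical, and the multiplication by 100 is an assumption (the conventional scaling for ratio quotients); the text itself states only the averaging and division steps.

```python
def nonverbal_dq(visual_reception_ae: float,
                 fine_motor_ae: float,
                 chronological_age_months: float) -> float:
    """Ratio nonverbal developmental quotient (NVDQ).

    Inputs are MSEL age-equivalent scores and chronological age, all in
    months. The factor of 100 is an assumed conventional scaling.
    """
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    mean_age_equivalent = (visual_reception_ae + fine_motor_ae) / 2.0
    return 100.0 * mean_age_equivalent / chronological_age_months
```

For example, age equivalents of 12 and 18 months in a 60-month-old child give a mean age equivalent of 15 months and an NVDQ of 25.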
Statistical Analysis
The CAPS-IDD guidelines were reviewed to identify the psychometric characteristics necessary for rigorous investigation of the SRS-2 and to guide the choice of statistical tests. Following CAPS-IDD recommendations, psychometric evaluation was conducted in three main stages. First, we evaluated descriptive measurement characteristics of the full SRS-2 at an item level. We also utilized correlations to evaluate the influence of external factors on both the full and shortened version of the SRS-2. Second, we evaluated and compared psychometric properties including reliability and validity measures of Sturm’s SRS-2 versus the full SRS-2. Third, we conducted IRT analysis on Sturm’s SRS-2 solution.
Stage 1: To obtain early information about how the SRS-2 performs in PMS, we conducted a full description of all items, subscales, and the full scale. Descriptive statistics included mean, median, standard deviation, range, missing data, and frequency distributions. The pattern of response category frequencies was used to clarify floor and ceiling effects. Experts in the field were queried about the quality of item content, with particular emphasis on items with a large amount of missing data, to evaluate the likelihood that responses were missing because item content was inappropriate for PMS patients specifically [Reeve et al., 2007]. To evaluate the influence of external factors, scores on the 65 items were correlated with NVIQ and language ability scores. Given the ordinal nature of the SRS-2 items, Spearman correlations were evaluated.
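For readers unfamiliar with the rank-based correlation used for the ordinal items, a minimal standard-library Python sketch follows. It is not the study's code (the software used for this step is not stated); the function names are illustrative. Tied values, which are unavoidable on a four-point scale, receive the average of the ranks they span, and the Spearman coefficient is the Pearson correlation of those ranks.

```python
import math
from statistics import mean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for t in range(i, j + 1):
            ranks[order[t]] = shared_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Called with one item's 1–4 responses and the matching NVIQ values, `spearman` returns a coefficient between −1 and 1; a negative value would indicate that higher item scores accompany lower NVIQ.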
Stage 2: Internal consistency reliability was determined using Cronbach’s alpha, with a value of 0.70 or greater considered adequate. Values below 0.70 are considered undesirable and indicate a lack of correlation between items and total raw score. To be conservative, case-wise deletion was used. Directionality of items was considered, and items were reversed as necessary. Test–retest reliability was evaluated by calculating intraclass correlation coefficients (ICC) for total raw SRS-2 scores and for each subscale total score across all time points (6 months apart), using the Shrout–Fleiss ICC(3,k) and Winer reliability (mean of k scores), with ICC > 0.70 considered acceptable (Koo & Li, 2016). Administrations were considered independent, as reporters did not have access to their previous responses. To evaluate construct validity (convergent validity; concurrent validity; divergent validity), a priori hypotheses were generated. Pearson product–moment correlations were calculated with no correction given the overall goal of evaluating the relationships between measures [Sturm et al., 2017]. For convergent validity, we hypothesized that correlation between SRS-2 and instruments measuring similar constructs would be ≥0.50. Instruments considered to measure similar constructs were measures of autism symptoms, specifically those of social affect (ADI-R social, ADOS-2 social affect, ADOS-2 comparison score), as well as measures of restricted and repetitive behaviors (ADI-R repetitive behavior, ADOS-2 repetitive behavior, and RBS-R). For concurrent validity, we hypothesized that correlations of SRS-2 with instruments measuring related but dissimilar constructs (specifically the ABC and VABS) would be smaller by a minimum of 0.10 than correlations with instruments measuring similar constructs. Finally, we evaluated the structural validity of the SRS-2 through Confirmatory Factor Analysis.
Statistical software Mplus version 8 was utilized for all factor analyses. CFA was conducted using the mean and variance-adjusted weighted least squares estimator (WLSMV) for categorical ordinal responses. Global fit was evaluated through tests of goodness-of-fit using suggested criteria: Comparative Fit Index (CFI > 0.90), root mean square error of approximation (RMSEA < 0.06 good fit, <0.08 fair fit), Tucker–Lewis Index (TLI > 0.90), standardized root mean square residual (SRMR < 0.08) [Hu & Bentler, 1999]. Given the small sample size, criteria were slightly relaxed. Chi-square results are presented but considered less reliable given sample size. Multiple hypothesized models including unidimensional, two-factor, and five-factor models were evaluated to improve fit. All analyses in this stage were performed on both the full SRS-2 as well as Sturm’s solution, and results were compared between the two scales.
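The two reliability statistics named in Stage 2 can be written out compactly. The sketch below is illustrative rather than a reproduction of the study's analysis: it assumes complete cases (matching the case-wise deletion described), uses sample variances, and computes ICC(3,k) from the mean squares of a two-way subjects-by-occasions layout. Function names and the demonstration data are hypothetical.

```python
from statistics import mean, variance
from typing import List, Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for complete cases.

    scores: one row per respondent, one column per item (already
    reverse-scored where needed).
    """
    k = len(scores[0])
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_variances) / total_variance)


def icc_3k(ratings: Sequence[Sequence[float]]) -> float:
    """Shrout-Fleiss ICC(3,k): two-way mixed, consistency, mean of k scores.

    ratings: one row per subject, one column per measurement occasion.
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_occasions = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_subjects - ss_occasions
    ms_subjects = ss_subjects / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_subjects - ms_error) / ms_subjects


# Hypothetical total scores for four subjects at two occasions.
demo: List[List[float]] = [[1, 2], [2, 3], [3, 5], [4, 4]]
```

On the same subjects-by-columns matrix the two functions return the same value, because ICC(3,k) is algebraically identical to Cronbach's alpha with occasions treated as items; this is why the mean-of-k-scores reliability and ICC(3,k) agree closely in practice.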
Stage 3: Item Response Theory (IRT): After confirming the shortened SRS scale’s compliance with assumptions of unidimensionality through CFA, IRT analysis was performed on Sturm’s SRS-2. Items were calibrated using Samejima’s Graded Response Model with Maximum Likelihood, which allowed for evaluation of scale properties at an item response category level. Two IRT models were estimated: (a) parsimonious Graded Response Model (GRM) specifying a single slope, and (b) a full GRM specifying unique slopes for each of the 16 items. We compared the suitability of these two nested models by evaluating the change in fit using the DIFFTEST procedure in Mplus. On an item level, parameters estimated included a slope estimate and a difficulty parameter. An item slope estimate, also called a discrimination parameter, is an indicator of how well an item differentiates between respondents above and below a specific level of latent construct (ASD symptomatology level). Items with high slopes are considered strongly related to the underlying construct and therefore highly discriminating for respondents from the target population. The difficulty (also called threshold or location) parameter identifies the amount of latent construct at which an individual has a 50% probability of endorsing a particular response category or higher. Location parameters provide information about where each item functions on the latent trait continuum, facilitating the identification of candidate items whose location parameters approximately match the construct levels of the target population [Reeve et al., 2007; Toland, 2014]. These values are then consolidated into response curves, which provide deeper information about the scale. First, to investigate the appropriateness of the four response category structure of the scale, Item Characteristic Curves (ICCs) are evaluated. 
ICCs combine estimated slope and threshold parameters to graphically demonstrate the probability that a participant will endorse each response category at varying levels of latent trait [Toland, 2014]. In an ideal ICC, each response category has distinct peaks indicating all categories are being used by participants as expected [Toland, 2014]. Then, to gain an overall idea of the precision/reliability with which the latent trait has been measured, a variable known as information is examined [Thissen & Wainer, 2001; Toland, 2014]. Information identifies the range of latent trait continuum where each item and the scale as a whole discriminates among examinees with certainty [Reeve & Fayers, 2005; Toland, 2014]. Item information curves (IICs) are excellent metrics to determine how well an item is performing, as low information items may be poorly written, measuring a different construct, or be inappropriate for a particular population of respondents [Reeve & Fayers, 2005]. The test information curve (TIC) is an indicator of the overall scale’s level of reliability along the entire latent trait continuum [Reeve & Fayers, 2005]. Finally, item parameters are used to estimate each participant’s latent trait score (θ and s.e.) on a standard normal metric [Toland, 2014; Reeve et al., 2007]. Ideally, theta scores (the measure of amount of latent trait) have a mean of 0 and variance of 1. These scores are then compared to the TIC to guide scale revisions [Toland, 2014].
Results
Participants
Data from 91 participants aged 3–21 years (mean = 9.10, SD = 4.59) were included in the analyses. The final sample consisted of 47 males and 44 females and was primarily Caucasian (n = 76), in addition to African American (n = 3), Native American (n = 1), Asian (n = 8), or Other (n = 3). The sample had a mean NVIQ of 30.20 (SD = 17.94), and participants were characterized as having profound ID (NVIQ < 25, n = 44), severe ID (NVIQ 25–39, n = 14), moderate ID (NVIQ 40–54, n = 25), mild ID (NVIQ 55–69, n = 5) or borderline cognitive functioning (NVIQ > 70, n = 3). Language was assessed through MSEL age equivalent scores for receptive language (n = 62; mean = 12.50 months; range = 1–27; SD = 6.43) and expressive language (n = 62; mean = 8.50 months; range = 1–31; SD = 6.06), and by use of ADOS-2 module as a proxy for language ability. Sixty participants (65.93%) received the ADOS-2 Module 1; 14 (15.38%) received Module 2; 11 (12.09%) received Module 3, and 3 (3.30%) received Module 4. Three children (3.30%) did not receive the ADOS-2. In the 63 participants with mental ages >18 months, the median ADOS-2 CSS was 6 (interquartile range = 5); participants were characterized as non-spectrum (CSS 1–3, n = 16), autism spectrum (CSS 4–5, n = 9), or autism (CSS 6–10, n = 35), and three were missing CSS scores. ASD diagnosis was made based on clinical consensus using the DSM-5, ADOS-2, and ADI-R. The sample consisted of 52 individuals (57.78%) diagnosed with ASD and 38 classified as non-ASD (42.22%). Demographic characteristics are presented in Table 1.
Table 1.
Demographic Characteristics of n = 91 Subjects with SRS-2 Data Included in Study
Demographic characteristics | ||
---|---|---|
Age in years (M/SD) | 9.10 | 4.59 |
Sex (n/%) | ||
Male | 47 | 51.65% |
Female | 44 | 48.35% |
Race/ethnicity (n/%) | ||
Caucasian | 76 | 83.52% |
Asian | 8 | 8.79% |
American Indian | 1 | 1.10% |
African American/Black | 3 | 3.30% |
Native Hawaiian or other Pacific Islander | 0 | 0% |
Other-Hispanic Latino | 3 | 3.30% |
Clinical certainty diagnosis (n/%) (n = 90) | ||
Autism | 52 | 57.78% |
Non-spectrum | 38 | 42.22% |
Verbal abilities by ADOS module (n/%) (n = 88) | ||
Module 1 | 60 | 65.93% |
Module 2 | 14 | 15.38% |
Module 3 | 11 | 12.09% |
Module 4 | 3 | 3.30% |
ADOS-2 (Median/IQR) (n = 63) | ||
Comparison score | 6.00 | 5 |
ADI-R (M/SD) (n = 38) | ||
Social | 19.32 | 7.84 |
Restricted repetitive behavior | 3.92 | 2.06 |
Development | 4.55 | 1.06 |
Note. Demographic characteristics of study population.
Missing Data Analysis
Six subjects were excluded from the present analysis due to absent SRS-2 data. Four of those participants were not administered the SRS-2 due to investigator decision. In all four cases, investigators indicated participant functional level as the reason for the study decision. One participant dropped out of the research study before completion of baseline measures. The sixth participant failed to return the SRS-2 questionnaire at all study points and described difficulties with the SRS-2 form due to its limited relevance to the participant’s presentation. Missing data were analyzed at the item level across all time points by ordering items by their frequency of missing responses. Items most frequently left blank were 4, 13, 21, 25, 40, 44, 46, 47, 52, and 61; each item was omitted by at least nine caregivers throughout all time points. Of note, all of these items except item 25 were eliminated in Sturm’s empiric shortening of the SRS-2.
Item Analysis
On an item level, eight items correlated with age, 44 items correlated with NVIQ, and 11 items correlated with verbal ability. Although individuals with ASD demonstrated higher mean scores than their non-ASD counterparts (t = 2.52, P = 0.0135), both groups had mean overall T scores significantly greater than the established SRS-2 cutoff of 60 indicating clinically significant ASD symptoms (ASD M = 77.63, SD = 10.62; non-ASD M = 71.63, SD = 11.85). In fact, of the entire sample, only eight had scores that fell below a T-score of 60. Individuals with ASD demonstrated higher mean scores than their non-ASD counterparts on only 28 of 65 items. As seen in Table 2, on the shortened SRS-2, 9 of 16 items were significantly higher in the ASD group than in the non-ASD group. Table 2 also displays abbreviated item content and item response frequencies for the shortened SRS-2; the majority of respondents in this sample endorsed the response categories associated with the most severe level of impairment for most items. For items 30, 41, and 51, a majority of respondents endorsed the response category associated with the least severe level of impairment.
Table 2.
Frequencies of Responses to 16 Items Included in Sturm’s Adapted Form
Item # | Abbreviated item content | Subscale | ASD, M (SD) (n = 52) | Non-ASD, M (SD) (n = 38) | t-Value | P-value |
---|---|---|---|---|---|---|
Item 6 | Aware of others’ thinking/feeling | SAw | 3.44 (0.85) | 3.13 (0.78) | 1.78 | 0.079 |
Item 7 | Behaves strange/bizarre | RRB | 2.59 (0.91) | 2.11 (0.76) | 2.70 | 0.008 |
Item 11 | Avoid eye contact | SCom | 2.38 (1.11) | 1.89 (0.89) | 2.25 | 0.027 |
Item 15 | Play appropriately | SCom | 3.77 (0.43) | 3.18 (0.83) | 3.97 | 0.0002 |
Item 16 | Does not join group activities | SMot | 3.29 (1.04) | 2.24 (1.02) | 4.78 | <0.0001 |
Item 22 | Regarded by other children as odd | RRB | 3.21 (0.99) | 2.65 (1.01) | 2.62 | 0.011 |
Item 23 | Becomes upset in a situation with lots of things going on | SCog | 2.42 (1.05) | 2.21 (0.93) | 0.99 | 0.325 |
Item 25 | Socially awkward | SCom | 2.96 (1.14) | 2.39 (1.03) | 2.43 | 0.017 |
Item 27 | Responds appropriately to mood changes in others | SCom | 3.44 (0.89) | 2.89 (0.89) | 2.87 | 0.005 |
Item 30 | Overly sensitive to sounds | SCog | 1.77 (1.00) | 2.03 (1.05) | −1.18 | 0.242 |
Item 41 | Seems to react to people as if they are objects | SAw | 1.52 (0.70) | 1.29 (0.57) | 1.66 | 0.099 |
Item 51 | Under stress shows rigid or inflexible patterns of behavior | RRB | 2.29 (1.13) | 1.89 (0.86) | 1.80 | 0.075 |
Item 54 | Awkward in turn taking interactions with peers | SCom | 3.15 (1.11) | 2.79 (0.99) | 1.61 | 0.111 |
Item 56 | Has difficulty making friends | SCom | 3.04 (1.12) | 2.29 (1.01) | 3.26 | 0.002 |
Item 60 | Has an unusually narrow range of interests | RRB | 3.06 (1.13) | 2.24 (1.13) | 3.41 | 0.001 |
Item 65 | Has difficulty relating to peers | SCom | 3.21 (1.02) | 2.79 (1.04) | 1.92 | 0.058 |
Note. Adapted SRS-2 item response frequencies. Bold values denote statistical significance to the p<0.05 level.
Abbreviations: RRB, restricted repetitive behaviors; SAw, social awareness; SCog, social cognition; SCom, social communication; SMot, social motivation.
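The group comparisons in Table 2 can be approximately reproduced from the reported means, standard deviations, and group sizes. The Python sketch below assumes a pooled-variance (Student) two-sample t statistic; the text does not state whether pooled or unequal-variance tests were used, and per-item group sizes may be smaller than the nominal n because of missing responses, so small discrepancies from the tabled values are expected.

```python
import math


def pooled_t(m1: float, sd1: float, n1: int,
             m2: float, sd2: float, n2: int) -> float:
    """Two-sample t statistic from summary statistics (pooled variance)."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (m1 - m2) / standard_error


# Item 16 from Table 2: ASD 3.29 (1.04), n = 52; non-ASD 2.24 (1.02), n = 38.
t_item16 = pooled_t(3.29, 1.04, 52, 2.24, 1.02, 38)
```

For item 16 this gives a value close to the tabled t of 4.78, with the remaining difference attributable to rounding of the reported means and standard deviations.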
Internal Consistency
Internal consistency was excellent in both the full SRS-2 (Cronbach’s α = 0.935) and Sturm’s solution (Cronbach’s α = 0.878).
Test–Retest Reliability
Test–retest reliability was excellent in both the full SRS-2 (Shrout–Fleiss = 0.944, Winer reliability = 0.944) and the shortened SRS-2 (Shrout–Fleiss = 0.945, Winer reliability = 0.944).
Structural Validity
The full SRS-2 showed inadequate model fit in all models, with the best performing model (graded response model, unidimensional, estimated with WLSMV) demonstrating: CFI = 0.747, TLI = 0.739, RMSEA = 0.067, P = 0.000, SRMR = 0.155, chi-square = 2841.003, P = 0.000. As shown in Table 3, the shortened 16-item SRS-2 scale demonstrated fit (WLSMV probit link, graded response model, unidimensional) at or close to thresholds of acceptability for our sample data (CFI = 0.922, TLI = 0.910, RMSEA = 0.093, P = 0.001, SRMR = 0.089, chi-square = 186.565, P = 0.000). Alternative solutions did not yield superior fit relative to the original model. Therefore, we concluded that the 16-item SRS-2 scale was sufficiently unidimensional for IRT analysis.
Table 3.
Global Fit Indices of SRS Full and Sturm
Model | Items # | Chi-square value | Chi-square df | Chi-square P-value | CFI | TLI | RMSEA estimate | RMSEA lower CI | RMSEA upper CI | RMSEA P-value | SRMR | Internal consistency (Cronbach α) | Test–retest (Shrout–Fleiss) | Test–retest (Winer reliability) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Full SRS-2 | 65 | 2841.003 | 2015 | 0.000 | 0.747 | 0.739 | 0.067 | 0.061 | 0.073 | 0.000 | 0.155 | 0.935 | 0.944 | 0.944 |
Sturm 16 variable | 16 | 186.565 | 104 | 0.000 | 0.922 | 0.910 | 0.093 | 0.071 | 0.115 | 0.001 | 0.089 | 0.878 | 0.945 | 0.944 |
Note. Comparison of Structural validity of the SRS full and Sturm form. Convergent validity of SRS-2 full and Sturm with ASD Symptom Measures.
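To make the comparison against the stated cutoffs explicit, the following Python sketch checks a set of fit indices against the criteria listed in the Statistical Analysis section. The function name and output labels are illustrative; the extent to which criteria were relaxed for the small sample is not quantified in the text, so the sketch applies the unrelaxed cutoffs only.

```python
from typing import Dict


def check_fit(cfi: float, tli: float, rmsea: float, srmr: float) -> Dict[str, bool]:
    """Flag which conventional global-fit cutoffs a model meets."""
    return {
        "CFI > 0.90": cfi > 0.90,
        "TLI > 0.90": tli > 0.90,
        "RMSEA < 0.06 (good)": rmsea < 0.06,
        "RMSEA < 0.08 (fair)": rmsea < 0.08,
        "SRMR < 0.08": srmr < 0.08,
    }


# Values reported in Table 3.
full_form = check_fit(cfi=0.747, tli=0.739, rmsea=0.067, srmr=0.155)
short_form = check_fit(cfi=0.922, tli=0.910, rmsea=0.093, srmr=0.089)
```

Under the unrelaxed cutoffs, the shortened form meets the CFI and TLI criteria while its RMSEA and SRMR fall just outside them, and the full form meets only the fair-fit RMSEA criterion.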
Construct, Concurrent, and Convergent Validity
Total raw scores on Sturm’s SRS-2 demonstrated excellent construct validity, given its strong correlation with total raw scores on the full SRS-2 (r = 0.940, P < 0.0001). All correlations with established ASD diagnostic instruments were positive and statistically significant in both the full and shortened versions of the SRS-2 (Ps < 0.05; Table 4), indicating that convergent validity of both SRS-2 forms was overall moderate or strong. Correlations with measures of autism symptoms on the ADOS-2 were mildly improved or similar in Sturm’s SRS-2 as compared to the full SRS-2. Specifically, correlations with ADOS-2 comparison score were increased in Sturm’s SRS-2 (r = 0.520) as compared to the full SRS-2 (r = 0.442). Sturm’s SRS-2 showed mildly reduced association with RBS scores. As shown in Table 5, with regard to measures of behavior, Sturm’s SRS-2 showed decreased association with behavioral measures including ABC and VABS as compared to the full SRS-2, with the exception of ABC Stereotypy and Hyperactivity subscales where the association increased.
Table 4.
Pearson Product Correlations of SRS Full and Sturm with ASD Symptom Measures ADOS, ADIR, and RBS
Scale and subscale | SRS-2 full r | SRS-2 full P | SRS-2 full n | Sturm SRS-2 r | Sturm SRS-2 P | Sturm SRS-2 n |
---|---|---|---|---|---|---|
ADOS social awareness | 0.451 | <0.0001 | 85 | 0.502 | <0.0001 | 84 |
ADOS repetitive restricted behaviors | 0.258 | 0.017 | 85 | 0.325 | 0.0026 | 84 |
ADOS overall total | 0.436 | <0.0001 | 85 | 0.499 | <0.0001 | 84 |
ADOS-2 comparison score | 0.435 | <0.0001 | 81 | 0.516 | <0.0001 | 80 |
RBS stereotyped | 0.608 | <0.0001 | 90 | 0.584 | <0.0001 | 89 |
RBS self-injurious | 0.461 | <0.0001 | 90 | 0.411 | <0.0001 | 89 |
RBS compulsive | 0.333 | 0.001 | 90 | 0.242 | 0.022 | 89 |
RBS ritualistic | 0.414 | <0.0001 | 90 | 0.334 | 0.001 | 89 |
RBS restricted | 0.589 | <0.0001 | 90 | 0.512 | <0.0001 | 89 |
RBS total score | 0.599 | <0.0001 | 90 | 0.506 | <0.0001 | 89 |
RBS sameness subscale | 0.489 | <0.0001 | 79 | 0.390 | 0.0004 | 78 |
Abbreviations: ADIR, Autism Diagnostic Interview-Revised; ADOS, Autism Diagnostic Observation Schedule; RBS, Repetitive Behavior Scale.
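The P-values in Table 4 depend on both the size of r and the number of paired observations n, which is why similar correlations can carry different significance levels. As an illustration only, the sketch below approximates a two-sided P-value with Fisher's z-transformation. This normal approximation is an assumption made here for simplicity (the original analysis presumably used the exact t-based test from its statistical software), and the function name is hypothetical.

```python
import math

def pearson_p_approx(r: float, n: int) -> float:
    """Approximate two-sided P-value for H0: rho = 0 using Fisher's z."""
    z = math.atanh(r) * math.sqrt(n - 3)
    # erfc(|z| / sqrt(2)) equals the two-sided tail area of a standard normal.
    return math.erfc(abs(z) / math.sqrt(2.0))

# ADOS repetitive restricted behaviors vs. full SRS-2 (r and n from Table 4).
p_full = pearson_p_approx(0.258, 85)
# RBS compulsive vs. Sturm SRS-2 (r and n from Table 4).
p_sturm = pearson_p_approx(0.242, 89)
print(f"{p_full:.3f} {p_sturm:.3f}")
```

Because this is an approximation, small differences from the tabulated P-values are expected, particularly for the smaller subsamples.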
Table 5.
Pearson Product Correlations of SRS Full and Sturm with Behavioral Measures ABC and VABS
Scale and subscale | SRS-2 full, r | SRS-2 full, P | SRS-2 full, n | Sturm SRS-2, r | Sturm SRS-2, P | Sturm SRS-2, n |
---|---|---|---|---|---|---|
ABC irritability | 0.597 | <0.0001 | 91 | 0.506 | <0.0001 | 90 |
ABC lethargy | 0.660 | <0.0001 | 91 | 0.611 | <0.0001 | 90 |
ABC stereotypy | 0.573 | <0.0001 | 91 | 0.597 | <0.0001 | 90 |
ABC hyperactivity | 0.651 | <0.0001 | 90 | 0.658 | <0.0001 | 89 |
ABC speech | 0.194 | 0.066 | 91 | 0.151 | 0.155 | 90 |
VABS communication | −0.521 | <0.0001 | 91 | −0.491 | <0.0001 | 90 |
VABS daily living skills | −0.460 | <0.0001 | 91 | −0.430 | <0.0001 | 90 |
VABS motor | −0.302 | 0.011 | 70 | −0.317 | 0.008 | 69 |
VABS socialization | −0.583 | <0.0001 | 91 | −0.511 | <0.0001 | 90 |
VABS aberrant behavior | −0.548 | <0.0001 | 91 | −0.505 | <0.0001 | 90 |
Note. Concurrent validity of SRS-2 full and Sturm with behavioral measures.
Abbreviations: ABC, Aberrant Behavior Checklist; CBC-L, Child Behavior Checklist; SRS, Social Responsiveness Scale; VABS, Vineland Adaptive Behavior Scales.
Item Response Theory
Table 6 shows the parameter estimates and standard errors from IRT analysis. Slope estimates range from 0.531 (item 30) to 2.586 (item 22), suggesting that the 16 items vary in the strength of their association with the underlying latent trait of ASD symptoms, within a range considered reasonably good [Baker, 2002; De Ayala, 2009; Hambleton et al., 1991; Toland, 2014]. The large slope for item 22 indicates that it has the strongest relationship with the underlying latent trait. Item difficulty location parameters range from −3.592 (item 6, b1) to 4.122 (item 30, b3). Given the four-category ordinal nature of the scale, each item has three difficulty location parameters indicating the level of ASD symptomatology at which respondents are likely to endorse a particular response category. For example, while item 15 (b1 = −2.643) is endorsed at very low levels of ASD traits, item 41 (b1 = 0.552) requires above-average levels of ASD traits (average θ = 0) to be endorsed. As shown graphically in Figure 1, the wide range in location parameters of the 16 items indicates that the shortened SRS-2 measures a large range of the ASD symptomatology continuum. Figure 2B demonstrates this through item information curves (IICs), with each line showing the ASD trait levels for which an individual item is informative. On further evaluation, it appears that only three items (23, 30, and 41) have response categories endorsed by respondents with severe levels of ASD symptomatology (i.e., θ > 2.25) [Toland, 2014]. Interpretation of this finding is limited because item 41 is significantly skewed, with only one respondent selecting the most extreme response category, and items 23 and 30 have the weakest relationship with the ASD construct based on their slopes.
Table 6.
GRM item parameter estimates and fit statistics for Sturm’s SRS-2 (n = 91)
Item | a | b1 | b2 | b3 |
---|---|---|---|---|
Item 6r | 0.963 | −3.592 | −2.200 | −0.066 |
Item 7 | 2.035 | −1.396 | 0.303 | 1.557 |
Item 11 | 1.343 | −0.810 | 0.692 | 1.619 |
Item 15r | 1.958 | −2.643 | −1.916 | −0.378 |
Item 16 | 1.828 | −1.421 | −0.331 | 0.148 |
Item 22 | 2.586 | −1.532 | −0.514 | 0.267 |
Item 23 | 0.748 | −1.816 | 0.452 | 2.485 |
Item 25 | 2.140 | −1.291 | −0.200 | 0.443 |
Item 27r | 1.423 | −2.240 | −1.380 | 0.090 |
Item 30 | 0.531 | −0.245 | 2.247 | 4.122 |
Item 41 | 1.678 | 0.552 | 2.182 | 3.591 |
Item 51 | 1.190 | −0.791 | 0.777 | 1.813 |
Item 54 | 0.907 | −2.544 | −0.871 | 0.215 |
Item 56 | 1.794 | −1.389 | −0.192 | 0.384 |
Item 60 | 1.653 | −1.103 | −0.359 | 0.420 |
Item 65 | 1.735 | −1.927 | −0.653 | 0.041 |
Note. Item response theory parameter estimates.
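To make the roles of the slope (a) and location (b1 to b3) parameters concrete, the following minimal sketch evaluates graded response model category probabilities for two items using the estimates in Table 6. It assumes the standard logistic form of Samejima's model without a scaling constant; whether the original software used this exact metric is not stated, and the function and variable names are illustrative.

```python
import math

def grm_category_probs(theta: float, a: float, thresholds: list[float]) -> list[float]:
    """Graded response model: probability of each response category, lowest first."""
    # Cumulative curves P(X >= k), padded with 1 (lowest) and 0 (beyond the highest).
    cumulative = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]

# Parameters copied from Table 6.
item_7 = (2.035, [-1.396, 0.303, 1.557])
item_41 = (1.678, [0.552, 2.182, 3.591])

for theta in (-2.0, 0.0, 2.0):
    p7 = grm_category_probs(theta, *item_7)
    p41 = grm_category_probs(theta, *item_41)
    print(theta, [round(p, 3) for p in p7], [round(p, 3) for p in p41])
```

At θ equal to an item's b1, the probability of staying in the lowest category is exactly 0.5 in this formulation, which is why item 41 (b1 = 0.552) is only endorsed at above-average trait levels.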
Figure 1.
Graphical representation of item discrimination and difficulty parameters.
Figure 2.
Information curves of Sturm’s SRS-2 as a function of latent trait (autism symptomatology) level. (A) Total information curve and (B) item information curves for items 7, 11, 15, 22, 23, 30, 41, 51, 56, and 65.
Item characteristic curve (ICC) analysis indicated that the current four-category response structure of the SRS-2 may be inappropriate in our sample, though this finding is tentative given our small sample size. Comparing Figure 3A with Figure 3B demonstrates that while item 7 functions as a four-category item, only three categories may be useful in item 16, as category 3 is never more likely to be chosen than the other categories at any point on the continuum, indicating possible redundancy. Overall, 12 items demonstrated ICCs indicating that only three of four response categories were used appropriately.
Figure 3.
Item characteristic curves for selected items. (A) Item 7 and (B) Item 16 (all response categories) as a function of the latent trait (F1).
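One way to screen for an under-used response category is to ask whether each category is the single most likely response anywhere along the latent trait. The sketch below applies that check to items 7 and 16 using the Table 6 estimates. Treating "most likely somewhere on a grid from −6 to 6" as the criterion is an assumption made for illustration, as are the function names and the logistic metric without a scaling constant; the original analysis relied on visual inspection of the curves.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Graded response model category probabilities, lowest category first."""
    cumulative = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]

def modal_categories(a, thresholds, lo=-6.0, hi=6.0, steps=1201):
    """Categories (numbered from 1) that are the most likely response somewhere on the grid."""
    modal = set()
    for i in range(steps):
        theta = lo + (hi - lo) * i / (steps - 1)
        probs = grm_category_probs(theta, a, thresholds)
        modal.add(probs.index(max(probs)) + 1)
    return modal

# Parameters copied from Table 6.
item_7 = modal_categories(2.035, [-1.396, 0.303, 1.557])
item_16 = modal_categories(1.828, [-1.421, -0.331, 0.148])
print("Item 7 modal categories:", sorted(item_7))
print("Item 16 modal categories:", sorted(item_16))
```

With these estimates, all four categories of item 7 are modal somewhere on the grid, whereas category 3 of item 16 never is, which agrees with the pattern described for Figure 3. The narrow gap between b2 and b3 for item 16 is what squeezes that category out.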
Figure 2A shows the test information curve calculated for the shortened 16-item scale, which demonstrates that Sturm’s SRS-2 is highly reliable (80% or greater) across a broad range of the latent trait continuum (−3.25 to 2.25). The maximum amount of information given by the scale is 12.25, around latent trait estimates of −0.25 to 0.25. Precision around score estimates worsens significantly at latent trait levels above 2.25. Figure 2B shows item information curves (IICs) for 10 randomly selected items from the overall scale. Items 7, 15, 16, 22, and 25 provide the largest empirical information (>1). Two of the three items that provide the most information are most precise around the mean level of the latent trait, but provide less information for individuals with extreme trait levels. For this reason, items such as 15 are important, as they provide accurate information at the extremes of the latent trait. All items except for 23 and 30 provide information >0.2, with item 30 providing the least information. Items 11 and 51 appear to provide nearly identical information, given that their respective IICs are extremely similar. This concordance is also true of items 56 and 65, indicating that only one of the items in each pair may be necessary.
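The link between information and reliability can be illustrated with a short calculation. The sketch below computes graded response model item information from the Table 6 estimates, sums it into a scale information function, and converts information to a conditional reliability and a standard error. The conversion 1 − 1/I assumes a latent trait variance of 1, and the logistic metric without a scaling constant is also an assumption; published curves may differ (for example, if the software adds information from a prior), so the sketch is not expected to reproduce Figure 2 exactly. All names are illustrative.

```python
import math

# (a, [b1, b2, b3]) copied from Table 6.
ITEMS = {
    "6r": (0.963, [-3.592, -2.200, -0.066]), "7": (2.035, [-1.396, 0.303, 1.557]),
    "11": (1.343, [-0.810, 0.692, 1.619]), "15r": (1.958, [-2.643, -1.916, -0.378]),
    "16": (1.828, [-1.421, -0.331, 0.148]), "22": (2.586, [-1.532, -0.514, 0.267]),
    "23": (0.748, [-1.816, 0.452, 2.485]), "25": (2.140, [-1.291, -0.200, 0.443]),
    "27r": (1.423, [-2.240, -1.380, 0.090]), "30": (0.531, [-0.245, 2.247, 4.122]),
    "41": (1.678, [0.552, 2.182, 3.591]), "51": (1.190, [-0.791, 0.777, 1.813]),
    "54": (0.907, [-2.544, -0.871, 0.215]), "56": (1.794, [-1.389, -0.192, 0.384]),
    "60": (1.653, [-1.103, -0.359, 0.420]), "65": (1.735, [-1.927, -0.653, 0.041]),
}

def grm_item_information(theta, a, thresholds):
    """Item information: sum over categories of (dP_k/dtheta)^2 / P_k."""
    cumulative = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    info = 0.0
    for k in range(len(cumulative) - 1):
        p_k = cumulative[k] - cumulative[k + 1]
        # Derivative of a logistic curve P with slope a is a * P * (1 - P).
        slope = a * (cumulative[k] * (1 - cumulative[k])
                     - cumulative[k + 1] * (1 - cumulative[k + 1]))
        if p_k > 0.0:
            info += slope ** 2 / p_k
    return info

def scale_information(theta):
    """Sum of item information across all 16 items."""
    return sum(grm_item_information(theta, a, b) for a, b in ITEMS.values())

def reliability(information):
    """Conditional reliability, assuming a latent trait variance of 1."""
    return 1.0 - 1.0 / information

def standard_error(information):
    return 1.0 / math.sqrt(information)

print(reliability(5.0))        # information of 5 corresponds to reliability 0.80
print(standard_error(12.25))   # standard error at the reported peak information
for theta in (-3.0, 0.0, 3.0):
    print(theta, round(scale_information(theta), 2))
```

Under the unit-variance assumption, the 80% reliability threshold corresponds to an information value of 5, and the reported peak of 12.25 corresponds to a standard error of roughly 0.29 on the θ scale.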
IRT theta score estimates calculated for our 91 participants ranged from −1.905 to 2.908 (mean = 0.005, SD = 0.956). Theta calculated in this sample demonstrated variance of 0.904, implying a ceiling effect of uncertain significance. The precision of our score estimates worsens outside of the −3.25 to 2.25 range, with greater uncertainty in the score estimates for respondents whose IRT scores fall outside this range. However, in our sample, only two of the 91 respondents had IRT scores above 2.25.
Discussion
This is the first study to assess the psychometric properties of the SRS-2 in individuals with PMS, a population characterized by ID, ASD, and minimal verbal ability. While this investigation confirmed prior findings that the full SRS-2 has limited utility in populations with significant ID [Constantino & Gruber, 2005; Hall, Lightbody, Hirt, Rezvani, & Reiss, 2010; Skuse et al., 2009], the psychometric properties of Sturm’s shortened SRS-2 are more encouraging, although interpretation of findings is limited by the small sample size and a lack of empirical data about behaviors that differentiate ASD in the context of severe ID [Thurm et al., 2019]. Consistent with previous studies, our sample of individuals with PMS demonstrated a mean overall NVIQ in the severe ID range and had significantly limited language abilities [Egger et al., 2016; Oberman, Boccuto, Cascio, Sarasua, & Kaufmann, 2015; Philippe et al., 2008; Soorya et al., 2013; Richards et al., 2017; Zwanenburg et al., 2016]. The severity of ID significantly impacts the ability to make clinical distinctions between ID with and without ASD.
Baseline evaluation of the original SRS-2 in PMS demonstrated significant limitations in the relevance and comprehensibility of many items for a PMS population, pointing to the necessity of a shortened scale. Analysis of missing data at an item level highlighted the utility of Sturm’s shortened SRS-2, as the items with the most missing data were predominantly also those removed by the shortened version. Item-level analysis also demonstrated significant ceiling effects of many SRS-2 items in our sample, with the majority of respondents endorsing the response category associated with the greatest severity of ASD symptoms. This ceiling effect is also evident in the large percentage of individuals (93.41%) within the sample who had SRS-2 T scores >60, despite only 57.78% meeting ASD criteria by clinical consensus. It is unclear whether this ceiling effect is due to high ASD symptom severity or indicative of poor item content validity. Expert evaluation of item content indicated concerns about the relevance of items that specifically reference verbal abilities in a primarily nonverbal population, or describe behaviors that require intact cognitive abilities in a population where the majority of individuals present with moderate to profound ID.
Evaluation of the shortened 16-item version of the SRS-2 in comparison to the original SRS-2 demonstrated significant improvement in the majority of reliability and validity metrics. While only 43% of the full scale’s items discriminated between ASD and non-ASD groups, 56% of Sturm’s items did. Sturm’s shortened version of the SRS-2 maintained strong correlations with the longer form, overall improved correlations with measures of ASD symptoms, and reduced correlations with associated behavioral measures. Both forms showed excellent test–retest reliability and internal consistency. Sturm’s shortened SRS-2 showed significant improvement in structural validity, and all items loaded significantly onto one factor, as compared to the original SRS-2. Given that the full SRS-2 lacked structural validity, had stronger associations with external confounding factors and worse performance on validity measures, IRT evaluation was conducted only on Sturm’s shortened SRS-2.
Item Response Theory analysis yielded mixed results. At an item level, the discrimination parameters (slopes) indicated that all 16 questions are strongly associated with and relevant for measuring ASD symptomatology. The two items with the weakest association with ASD symptoms both belonged to the Social Cognition domain, which may be less relevant for the PMS population given the severity of ID. Of the four items that displayed the strongest association, two evaluated social communication and two evaluated restricted, repetitive behaviors. As a scale, Sturm’s SRS-2 is highly reliable with good precision (80% or more) across a broad range of the latent trait continuum, but loses reliability at latent trait levels of around two standard deviations above the mean. In an ideal outcome measure, item difficulty should be matched with the particular population’s latent trait range. In the shortened SRS-2 scale, the majority of participants endorsed the response category associated with the most severe symptoms regardless of ASD diagnosis, which could indicate limited discriminatory ability in our population. However, this interpretation must also be considered in the context of the study’s small sample size and the potentially different presentation of ASD in individuals with PMS versus the population without ID for which the SRS-2 was initially developed [Baker, 2002]. Two pairs of items had nearly identical item information curves, implying redundancy in item content, even within the shortened version. Together, these findings suggest that revisions to existing items are necessary to eliminate redundancy and ensure that the wording of items is relevant to individuals with ID. Evaluation of item characteristic curves (ICCs) indicated that in 12 items, at least one response category was never more likely to be chosen than the other options, a nonspecific indicator of aberrant item performance or redundancy in item response categories [Reeve & Fayers, 2005]. It is unclear whether this finding is due to actual redundancy in item response category structure or to the small sample size of our study.
Evaluation of item slopes, performance in ASD versus non-ASD cohorts, and content validity offers insight to guide future attempts to develop assessments tailored to PMS. For example, item 30, “Seems overly sensitive to sounds, textures, or smells,” had the lowest discrimination parameter of all 16 items. No significant difference was found between the mean scores of individuals with PMS and ASD and those of individuals with PMS without ASD. Indeed, research has shown that individuals with PMS have less sensory sensitivity than their idiopathic ASD counterparts [Mieses et al., 2016], so this item may be less relevant for PMS. Item 54, “Is awkward in turn taking interactions…doesn’t seem to understand the give and take of conversations,” also fails to distinguish between ASD and non-ASD cohorts and has a lower discrimination parameter. This may reflect limited content validity in a population that is largely nonverbal. Item 15, “plays appropriately with children his or her age,” had significantly skewed responses, possibly due to the use of the word “age” in a group of children with severe ID whose play skills are unlikely to match those of their agemates. Refinements of the SRS-2 could improve content validity of these questions by using different language or targeting more developmentally relevant precursor skills. The addition of a “not applicable” category to distinguish when a child’s level of ID (and level of language impairment) makes it impossible to determine if a symptom is present could also be an effective strategy [Thurm et al., 2019].
In this study, investigation of content validity was limited to expert opinion. Content validity is widely considered the most important measurement property of a patient-reported outcome (PRO), but is extremely challenging to assess due to the lack of standardized methodology and the complexity of the constructs evaluated [Terwee et al., 2018]. The construct of ASD is additionally difficult to conceptualize in our PMS population due to the dearth of objective data clinically distinguishing ASD in the setting of severe ID. Questionable content validity is one of the biggest barriers limiting result interpretation in our study and delaying tailored scale development. Therefore, extensive evaluation of this measurement property is a crucial next step. Future studies should rigorously investigate content validity according to COSMIN standards and include both patient and expert perspectives [Terwee et al., 2018]. Moreover, additional research is required to thoroughly characterize ASD in ID, expanding the body of knowledge and informing expert evaluation of item content.
The most significant limitation of this study is the relatively small sample size, which is one of the greatest challenges in studying rare diseases like PMS. The small sample size causes some items to have small numbers in one or more response categories, which may skew parameter estimates and lead to inaccurate ICCs. Larger samples improve one’s ability to evaluate response category utility, but no specific guideline indicates what sample size is adequate. Instead, parameters can be calculated as long as every response category has at least one response [Toland, 2014]. To ameliorate the impact of sample size limitations on our results, we used WLSMV with theta parameterization for CFA estimation, a method that is less sensitive to sample size compared to other forms of estimation. Still, the graded response model is a highly parameterized IRT model, which requires a larger sample size for accurate parameter estimates [Edelen & Reeve, 2007]. Therefore, interpretation of IRT findings such as parameter estimates and response category usage remains limited. Although our sample appears characteristic of the larger population of PMS, it is important to note the difference between PMS populations, who are largely nonverbal with severe ID, and the sample of children (aged 2.5–18 without cognitive impairment) in whom the measure was initially validated. Future studies should aim to develop items that can distinguish impairments that are greater than expected based on level of ID and that take into account the unique behavioral phenotype in PMS [Oberman et al., 2015]. Another possible source of bias could be the use of SRS-2 forms from different time points in our IRT model. Given the excellent test–retest reliability indicating stability of test scores, however, it was considered reasonable to include the first complete SRS-2 form for each participant to avoid skewing results with missing data.
This study is the first psychometric evaluation of both original and shortened versions of the SRS-2 in PMS. Our findings are consistent with a growing body of research demonstrating the impact of ID on SRS-2 scores and expand on this research by exploring psychometric measurement properties in a sample characterized by severe ID. These results therefore provide valuable information that may guide the utilization and interpretation of the SRS-2 in PMS and other ID populations associated with ASD. We believe that the shortened version of the SRS-2 developed by Sturm et al. [2017] is a relevant scale whose items are strongly related to the underlying ASD latent trait and that reliably measures ASD symptoms across a large range of the latent trait continuum. In its current form, it is nevertheless limited by the diagnostic challenges of differentiating ASD in those with severe ID. With the addition of items specifically delineating levels of behavior that are beyond those expected from ID, and targeting individuals with PMS specifically, a further-modified SRS-2 short form could hold promise as a clinical outcome assessment. This refinement would be the latest in a growing body of literature attempting to improve psychometric properties of the SRS-2 in minimally verbal and intellectually disabled populations [Duku et al., 2013; Sturm et al., 2017]. A major strength of this study is the use of CAPs-IDD and COSMIN guidelines to identify informative measurement characteristics and assure quality standards. Future research is required to further interrogate content validity of Sturm’s SRS-2 and to identify and develop items that can reliably differentiate social reciprocity symptoms across the full range of intellectual functioning, particularly in subjects with a higher burden of ASD features. Future studies will apply similar methods to develop a novel PRO specific to PMS based on item-level analysis across multiple established measures. An independent sample of participants with PMS will be used to cross-validate the properties of a novel scale in addition to adapted versions of existing ones, such as the short SRS-2. Developing improved clinical outcome measures in neurodevelopmental disorders is critical to advance novel therapeutics.
Acknowledgments
Research reported in this publication was supported by the National Institute of Neurological Disorders and Stroke (NINDS), Eunice Kennedy Shriver National Institute of Child Health & Human Development (NICHD), National Institute of Mental Health (NIMH) and National Center For Advancing Translational Sciences (NCATS) under Award Number U54NS092090. This research was also supported (in part) by the Phelan-McDermid Syndrome Foundation and the Intramural Research Program of the NIMH ZICMH002961. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. We are sincerely indebted to the generosity of the families and patients in PMS clinics across the United States who contributed their time and effort to this study. We would also like to thank the Phelan-McDermid Syndrome Foundation for their continued support in PMS research.
Conflict of Interest
Dr Alexander Kolevzon receives research support from AMO Pharma and consults to Ovid Therapeutics, Sema4, Coronis, Takeda, 5AM Ventures, and LabCorp.
References
- Aman MG, & Singh NN (1986). Aberrant behavior checklist. East Aurora, NY: Slosson.
- American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders (DSM-5®). Philadelphia, PA: American Psychiatric Publishing.
- Anagnostou E, Soorya L, Brian J, Dupuis A, Mankad D, Smile S, & Jacob S (2014). Intranasal oxytocin in the treatment of autism spectrum disorders: A review of literature and early safety and efficacy data in youth. Brain Research, 1580, 188–198.
- Baker FB (2002). The basics of item response theory (Vol. 2). College Park, MD: ERIC Clearinghouse on Assessment and Evaluation.
- Bishop SL, Guthrie W, Coffing M, & Lord C (2011). Convergent validity of the Mullen Scales of Early Learning and the differential ability scales in children with autism spectrum disorders. American Journal on Intellectual and Developmental Disabilities, 116(5), 331–343.
- Bodfish JW, Symons FJ, Parker DE, & Lewis MH (2000). Varieties of repetitive behavior in autism: Comparisons to mental retardation. Journal of autism and developmental disorders, 30(3), 237–243.
- Charman T, Baird G, Simonoff E, Loucas T, Chandler S, Meldrum D, & Pickles A (2007). Efficacy of three screening instruments in the identification of autistic-spectrum disorders. The British Journal of Psychiatry, 191(6), 554–559.
- Constantino J (2012). Social Responsiveness Scale (2nd ed.). Los Angeles, CA: Western Psychological Services.
- Constantino JN, & Gruber CP (2005). Social responsive scale (SRS) manual. Western Psychological Services: Los Angeles, CA.
- Constantino JN, Gruber CP, Davis S, Hayes S, Passanante N, & Przybeck T (2004). The factor structure of autistic traits. Journal of Child Psychology and Psychiatry, 45(4), 719–726.
- Constantino JN, Przybeck T, Friesen D, & Todd RD (2000). Reciprocal social behavior in children with and without pervasive developmental disorders. Journal of Developmental and Behavioral Pediatrics, 21(1), 2–11.
- De Ayala RJ (2009). The Theory and Practice of Item Response Theory. New York, NY: The Guilford Press.
- De Rubeis S, Siper PM, Durkin A, Weissman J, Muratet F, Halpern D, … Holder JL (2018). Delineation of the genetic and clinical spectrum of Phelan-McDermid syndrome caused by SHANK3 point mutations. Molecular Autism, 9(1), 31.
- Duku E, Vaillancourt T, Szatmari P, Georgiades S, Zwaigenbaum L, Smith IM, … Volden J (2013). Investigating the measurement properties of the social responsiveness scale in preschool children with autism spectrum disorders. Journal of Autism and Developmental Disorders, 43(4), 860–868.
- Edelen MO, & Reeve BB (2007). Applying item response theory (IRT) modeling to questionnaire development, evaluation, and refinement. Quality of Life Research, 16(1), 5.
- Egger JIM, Zwanenburg RJ, van Ravenswaaij-Arts CMA, Kleefstra T, & Verhoeven WMA (2016). Neuropsychological phenotype and psychopathology in seven adult patients with Phelan-McDermid syndrome: Implications for treatment strategy. Genes, Brain and Behavior, 15(4), 395–404.
- Elliot CD (1990). Differential abilities scale. San Antonio, TX: Psychological Corporation.
- Embretson SE (1996). The new rules of measurement. Psychological assessment, 8(4), 341.
- Frazier TW, Ratliff KR, Gruber C, Zhang Y, Law PA, & Constantino JN (2014). Confirmatory factor analytic structure and measurement invariance of quantitative autistic traits measured by the Social Responsiveness Scale-2. Autism, 18(1), 31–44.
- Gotham K, Pickles A, & Lord C (2010). Standardizing ADOS scores for a measure of severity in Autism Spectrum Disorders. Journal of Autism and Developmental Disorders, 39(5), 693–705.
- Groft SC, & Gopal-Srivastava R (2013). A model for collaborative clinical research in rare diseases: Experience from the Rare Disease Clinical Research Network Program. Clinical Investigation, 3(11), 1015–1021.
- Hall SS, Lightbody AA, Hirt M, Rezvani A, & Reiss AL (2010). Autism in fragile X syndrome: A category mistake? Journal of the American Academy of Child & Adolescent Psychiatry, 49(9), 921–933.
- Hambleton RK, Swaminathan H, & Rogers HJ (1991). Fundamentals of item response theory. New York, NY: SAGE Publishing.
- Havdahl KA, Bal VH, Huerta M, Pickles A, Øyen AS, Stoltenberg C, … Bishop SL (2016). Multidimensional influences on autism symptom measures: Implications for use in etiological research. Journal of the American Academy of Child & Adolescent Psychiatry, 55(12), 1054–1063.
- Hu LT, & Bentler PM (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6(1), 1–55.
- Hus V, Bishop S, Gotham K, Huerta M, & Lord C (2013). Factors influencing scores on the social responsiveness scale. Journal of Child Psychology and Psychiatry, 54(2), 216–224.
- Kang SM, & Waller NG (2005). Moderated multiple regression, spurious interaction effects, and IRT. Applied Psychological Measurement, 29(2), 87–105.
- Kolevzon A, Angarita B, Bush L, Wang AT, Frank Y, Yang A, … Edelmann LJ (2014). Phelan-McDermid syndrome: A review of the literature and practice parameters for medical assessment and monitoring. Journal of Neurodevelopmental Disorders, 6(1), 39.
- Koo TK, & Li MY (2016). A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine, 15(2), 155–163.
- Lord C, Rutter ML, Dilavore PC, Risi S, Gotham K, & Bishop SL (2012). Autism Diagnostic Observation Schedule (2nd ed.). Torrance, CA: Western Psychological Services.
- Lowe JK, Werling DM, Constantino JN, Cantor RM, & Geschwind DH (2015). Social responsiveness, an autism endophenotype: Genome wide significant linkage to two regions on chromosome 8. American Journal of Psychiatry, 172(3), 266–275.
- Mieses AM, Tavassoli T, Li E, Soorya L, Lurie S, Wang AT, … Kolevzon A (2016). Brief report: Sensory reactivity in children with Phelan–McDermid Syndrome. Journal of Autism and Developmental Disorders, 46(7), 2508–2513.
- Mohr C, & Gray KM (2005). Assessment in intellectual disability. Current Opinion in Psychiatry, 18(5), 476–483.
- Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, … De Vet HC (2010). The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: An international Delphi study. Quality of Life Research, 19(4), 539–549.
- Moul C, Cauchi A, Hawes DJ, Brennan J, & Dadds MR (2015). Differentiating autism spectrum disorder and overlapping psychopathology with a brief version of the social responsiveness scale. Child Psychiatry & Human Development, 46(1), 108–117.
- Mullen EM (1995). Mullen scales of early learning (pp. 58–64). Circle Pines, MN: AGS.
- Oberman LM, Boccuto L, Cascio L, Sarasua S, & Kaufmann WE (2015). Autism spectrum disorder in Phelan-McDermid syndrome: Initial characterization and genotype-phenotype correlations. Orphanet Journal of Rare Diseases, 10(1), 105.
- Phelan MC, Rogers RC, Saul RA, Stapleton GA, Sweet K, McDermid H, … Kelly DP (2001). 22q13 deletion syndrome. American Journal of Medical Genetics, 101(2), 91–99.
- Philippe A, Boddaert N, Vaivre-Douret L, Robel L, Danon-Boileau L, Malan V, … Zilbovicius M (2008). Neurobehavioral profile and brain imaging study of the 22q13.3 deletion syndrome in childhood. Pediatrics, 122(2), e376–e382.
- Redin C, Gérard B, Lauer J, Herenger Y, Muller J, Quartier A, … Le Gras S (2014). Efficient strategy for the molecular diagnosis of intellectual disability using targeted high-throughput sequencing. Journal of Medical Genetics, 51(11), 724–736.
- Reeve BB, & Fayers P (2005). Applying item response theory modeling for evaluating questionnaire item and scale properties. Assessing Quality of Life in Clinical Trials: Methods of Practice, 2, 55–73.
- Reeve BB, Hays RD, Bjorner JB, Cook KF, Crane PK, Teresi JA, … Liu H (2007). Psychometric evaluation and calibration of health-related quality of life item banks: Plans for the Patient-Reported Outcomes Measurement Information System (PROMIS). Medical Care, 45, S22–S31.
- Richards C, Powis L, Moss J, Stinton C, Nelson L, & Oliver C (2017). Prospective study of autism phenomenology and the behavioural phenotype of Phelan–McDermid syndrome: Comparison to fragile X syndrome, Down syndrome and idiopathic autism spectrum disorder. Journal of Neurodevelopmental Disorders, 9(1), 37.
- Rutter M, Le Couteur A, & Lord C (2003). Autism diagnostic interview [revised] (Vol. 29, p. 30). Los Angeles, CA: Western Psychological Services.
- Skuse DH, Mandy W, Steer C, Miller LL, Goodman R, Lawrence K, … Golding J (2009). Social communication competence and functional adaptation in a general population of children: Preliminary evidence for sex-by-verbal IQ differential risk. Journal of the American Academy of Child & Adolescent Psychiatry, 48(2), 128–137.
- Soorya L, Kolevzon A, Zweifach J, Lim T, Dobry Y, Schwartz L, … Halpern D (2013). Prospective investigation of autism and genotype-phenotype correlations in 22q13 deletion syndrome and SHANK3 deficiency. Molecular Autism, 4(1), 18.
- Soorya L, Leon J, Trelles MP, & Thurm A (2017). Framework for assessing individuals with rare genetic disorders associated with profound intellectual and multiple disabilities (PIMD): The example of Phelan McDermid Syndrome. The Clinical Neuropsychologist, 32(7), 1226–1255.
- Sparrow SS (2011). Vineland adaptive behavior scales. In Kreutzer JS, DeLuca J, & Caplan B (Eds.), Encyclopedia of clinical neuropsychology (pp. 2618–2621). New York, NY: Springer.
- Sturm A, Kuhfeld M, Kasari C, & Mccracken JT (2017). Development and validation of an item response theory-based Social Responsiveness Scale short form. Journal of Child Psychology and Psychiatry, 58(9), 1053–1061.
- Terwee CB, Prinsen CA, Chiarotto A, de Vet HC, Bouter LM, Alonso J, … & Mokkink LB (2018). COSMIN methodology for assessing the content validity of PROMs: User manual.
- Thissen DE, & Wainer HE (2001). Test scoring. New Jersey, NJ: Lawrence Erlbaum Associates Publishers.
- Thorndike RL, Hagen EP, & Sattler JM (1986). Stanford-Binet intelligence scale. Rolling Meadows, IL: Riverside Publishing Company.
- Thurm A, Farmer C, Salzman E, Lord C, & Bishop S (2019). State of the Field: Differentiating Intellectual Disability From Autism Spectrum Disorder. Frontiers in Psychiatry, 10, 526.
- Toland MD (2014). Practical guide to conducting an item response theory analysis. The Journal of Early Adolescence, 34(1), 120–151.
- Wright CF, Fitzgerald TW, Jones WD, Clayton S, McRae JF, Van Kogelenberg M, … Bevan AP (2015). Genetic diagnosis of developmental disorders in the DDD study: A scalable analysis of genome-wide research data. The Lancet, 385(9975), 1305–1314.
- Zeilinger EL, Nader IW, Brehmer-Rinderer B, Koller I, & Weber G (2013). CAPs-IDD: characteristics of assessment instruments for psychiatric disorders in persons with intellectual developmental disorders. Journal of Intellectual Disability Research, 57(8), 737–746.
- Zwanenburg RJ, Ruiter SA, van den Heuvel ER, Flapper BC, & Van Ravenswaaij-Arts CM (2016). Developmental phenotype in Phelan-McDermid (22q13.3 deletion) syndrome: A systematic and prospective study in 34 children. Journal of Neurodevelopmental Disorders, 8(1), 16.