Abstract
Because of recent concerns about the replication of published results in the behavioral and biomedical sciences (Ioannidis, 2005; Pashler & Wagenmakers, 2012; Open Science Collaboration, 2015), we have conducted a replication of our recently published analyses of longitudinal reading performance and attention deficit-hyperactivity disorder data from twin pairs selected for reading difficulties (Wadsworth et al., 2015). Results obtained from univariate and bivariate DeFries-Fulker (1985, 1988) analyses of data from a subset of twin pairs tested in the International Longitudinal Twin Study of Early Reading Development at post-4th grade, and its continuation into high school at post-9th grade, were compared to those from our previous report. Similar measures of reading performance, the same measures of inattention and hyperactivity/impulsivity, and similar selection criteria were used in the two studies. In general, the patterns of results obtained from these two independent studies were highly similar. Thus, these results clearly illustrate the principle that findings from studies in quantitative behavioral genetics often replicate (Plomin et al., 2016).
We recently reported results of the first longitudinal twin study of reading difficulties (RD) and attention deficit-hyperactivity disorder (ADHD) symptom dimensions in the Colorado Learning Disabilities Research Center (CLDRC; DeFries et al., 1997), a sample of twin pairs selected for reading difficulties (Wadsworth et al., 2015). The purpose of that study was to assess the etiology of the stability of RD as well as the etiology of comorbidity between RD and ADHD symptom dimensions both contemporaneously and longitudinally, using univariate and bivariate DeFries-Fulker (DF) analyses (DeFries & Fulker, 1985, 1988). Reading composite data based on the Reading Recognition, Reading Comprehension and Spelling subtests of the Peabody Individual Achievement Test (PIAT; Dunn & Markwardt, 1970) and ADHD symptom dimensions (inattention and hyperactivity/impulsivity) from the Disruptive Behavior Rating Scale (DBRS; Barkley & Murphy, 1998), were analyzed from twin pairs in which at least one member met proband criteria for RD at initial assessment, and in which both members of the pair had data from a follow-up assessment approximately five years later. In order for an individual to be classified as RD, he or she was required to have a positive history for reading problems and be classified as affected by scores on the reading composite. Additional diagnostic criteria included a verbal or performance IQ score of at least 85 on the Wechsler Intelligence Scale for Children – Revised (WISC-R; Wechsler, 1974) or Wechsler Adult Intelligence Scale–Revised (WAIS-R; Wechsler, 1981); no evidence of neurological problems; and no uncorrected visual or auditory acuity deficits. The subjects ranged in age from 7.7 to 20.5 years (average age of 11.6 years) at initial assessment, and from 12.6 to 26.6 years (average age of 16.2 years) at follow-up.
The genetic etiologies of RD and of the comorbidity between RD and ADHD at the initial measurement occasion were assessed by DF analyses of data from 767 twin pairs for the univariate analysis of RD and 345 pairs for the bivariate analyses of RD and ADHD. In addition, data were analyzed from 94 twin pairs in which at least one member of each pair met proband criteria for RD and for whom reading data were available at both measurement occasions, as well as from 88 twin pairs that also had ADHD data at follow-up. Results of these analyses indicated that more than 60% of the proband deficit in reading at initial assessment was due to genetic influences, and that reading deficits at follow-up were substantially due to these same genetic influences (Biv h2g = .79 ± .22). Results of bivariate DF analyses of initial reading and both initial and follow-up symptoms of inattention (IN) indicated that genetic influences accounted for 60% of the contemporaneous relationship and approximately two-thirds of the longitudinal relationship (Biv h2g = −.68 ± .33). In contrast, bivariate h2g estimates for the comorbidity between initial reading and both contemporaneous and follow-up hyperactivity/impulsivity (H/I) symptoms were small and nonsignificant (Wadsworth et al., 2015). In summary, our previous findings based on analyses of data from the CLDRC selected sample indicated strong genetic influences on reading difficulties at initial assessment, as well as on comorbidity between reading difficulties and IN at initial and follow-up assessments (see Table 1).
Table 1.
Results of Univariate and Bivariate DF Analyses of Two Studies¹
| Analysis | CLDRC Results (C.I.) | ILTS Results (C.I.) |
|---|---|---|
| Univariate DF-Initial Reading | h2g = .63 ± .06, p < 1.1 × 10⁻¹⁶ (.50, .75) | h2g = .69 ± .23, p ≤ .0015 (.23, 1.14) |
| Bivariate DF-Initial and Follow-up Reading | Biv h2g = .79 ± .22, p < .0003 (.35, 1.23) | Biv h2g = .72 ± .31, p ≤ .013 (.11, 1.34) |
| Bivariate DF-Initial Reading and Initial IN | Biv h2g = −.60 ± .15, p < 6.9 × 10⁻⁵ (−.90, −.20) | Biv h2g = −.40 ± .30, p ≥ .09 (−1.00, .20) |
| Bivariate DF-Initial Reading and Follow-up IN | Biv h2g = −.68 ± .33, p < .02 (−1.33, −.04) | Biv h2g = −.33 ± .31, p ≥ .14 (−.95, .28) |
| Bivariate DF-Initial Reading and Initial H/I | Biv h2g = −.20 ± .15, p > .097 (−.50, .10) | Biv h2g = −.29 ± .32, p ≥ .18 (−.92, .35) |
| Bivariate DF-Initial Reading and Follow-up H/I | Biv h2g = .11 ± .37, p > .38 (−.62, .84) | Biv h2g = −.27 ± .34, p ≥ .21 (−.95, .40) |
¹ All p-values are one-tailed.
Recently, it has been noted that many statistically significant findings in the behavioral sciences have not replicated (Pashler & Wagenmakers, 2012; Plomin et al., 2016). A recent attempt to replicate findings of 100 such studies found that 64% failed to replicate (Open Science Collaboration, 2015). In an attempt to replicate 17 brain-behavior studies, Boekel et al. (2015) found that none replicated. Results of attempts to replicate medical findings have been similarly discouraging, with five of six nonrandomized designs failing to replicate (Ioannidis, 2005). These and similar results have led to claims that 85% of research resources are wasted (Macleod et al., 2014).
The International Longitudinal Twin Study of Early Reading Development (ILTS; Byrne et al., 2006) and its continuation into high school provide an exceptional opportunity to conduct a replication of our previous study using similar measures and selection criteria, and exactly the same analyses. To accomplish this, we chose those measurement occasions (post-4th grade and post-9th grade) that corresponded most closely in age to the mean ages at initial and follow-up assessments in the CLDRC (11.6 and 16.2 years, respectively) and selected those twin pairs in which at least one member of the pair had reading difficulties at the post-4th grade assessment (average age 10.5 years) and follow-up data at post-9th grade (average age 15.5 years).
Methods
Participants
Subjects in the current study are participants in the ongoing ILTS (Byrne et al., 2006), which includes twins from Australia, the U.S., and Scandinavia. However, the subset of twins whose data were used in the current study includes only those participating in the U.S. (Colorado) study. Twins were recruited from birth records, and zygosity was determined from DNA extracted from cheek swabs or, in a minority of cases (28%, most of whom were clearly fraternal), from selected items from the Nichols and Bilbro (1966) questionnaire. All twins were learning to read English at entrance into the study. Twin pairs were selected for analyses if at least one member of the pair had a composite reading score at least one standard deviation below the full sample mean at post-4th grade and a score on either the Wechsler Preschool and Primary Scale of Intelligence (WPPSI) Vocabulary or Block Design subtest at entry into the study that was no more than one standard deviation below the sample mean. The subsample selected for reading difficulties at the end of 4th grade consisted of 86 twin pairs, 38 monozygotic (MZ; i.e., identical) and 48 same-sex dizygotic (DZ; i.e., fraternal). By post-9th grade the sample consisted of 34 MZ and 46 DZ pairs.
Procedure and Measures
The measures included in the present analyses are from larger test batteries that were administered in the ILTS in the summer after each school year. Testing at each time point was conducted in a single session in the twins’ homes or schools. Two testers separately assessed each twin at the same time. The following measures were included in the current analyses:
Reading
The Sight Word Efficiency subtest of the Test of Word Reading Efficiency (TOWRE; Torgesen et al., 1999) and the Woodcock-Johnson Word ID and Passage Comprehension subtests (Woodcock, McGrew, & Mather, 2001) were administered at both post-4th grade and post-9th grade.
ADHD
Inattention (IN) and hyperactivity/impulsivity (H/I) were measured using 9 items relating to IN and 9 relating to H/I from the parent and teacher versions of the Disruptive Behavior Rating Scale (DBRS; Barkley & Murphy, 1998). These items have been shown to be a valid and reliable measure of ADHD symptoms in children (Lahey et al., 2004; Willcutt et al., 2007).
Verbal and Performance IQ
WPPSI Vocabulary, assessed at entry into the study at pre-Kindergarten, was used as a proxy for verbal IQ, and Block Design was used as a proxy for performance IQ.
Analyses
Multiple regression analysis of twin data
Although qualitative analysis such as a comparison of concordance rates is appropriate as a test for genetic etiology of a dichotomous variable, such as diagnosis of an illness or behavioral disorder, reading difficulties and ADHD symptoms occur on a continuum, with somewhat arbitrary cutoff points designating an individual as “affected” or “unaffected.” Therefore, DeFries and Fulker (1985) proposed a multiple regression analysis of twin data to assess the etiology of extreme scores on a continuous measure. A basic model was proposed in which a cotwin’s score is predicted from the proband’s score on the selected trait and the coefficient of relationship (1.0 and 0.5 for identical and fraternal twin pairs, respectively) such that
$$C = B_1 P + B_2 R + A \tag{1}$$
where C symbolizes the cotwin’s score, P is the proband’s score, R is the coefficient of relationship, and A is the regression constant. B1 is the partial regression of the cotwin’s score on the proband’s score, a measure of average MZ and DZ twin resemblance. B2 is the partial regression of the cotwin’s score on the coefficient of relationship and equals twice the difference between the MZ and DZ cotwin means after covariance adjustment for any difference between MZ and DZ proband means. As a result, B2 provides a direct test for genetic etiology. Further, when the data are appropriately transformed prior to multiple-regression analysis (i.e., each score is expressed as a deviation from the mean of the unselected population and then divided by the difference between the proband and population means), B2 = h2g, an index of the extent to which the average deficit of the probands is due to genetic influences (DeFries & Fulker, 1988). For the current analyses, the unselected population is represented by the full population sample of twin pairs at each assessment.
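The transformation and regression described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the published analyses: the function names `df_transform` and `df_regression`, the toy scores, and the use of a single pooled proband mean are all assumptions made for the example. It fits the basic model, C = B1·P + B2·R + A, by ordinary least squares.

```python
import numpy as np

def df_transform(scores, population_mean, proband_mean):
    """Express scores as deviations from the unselected-population mean,
    divided by the difference between the proband and population means."""
    scores = np.asarray(scores, dtype=float)
    return (scores - population_mean) / (proband_mean - population_mean)

def df_regression(proband, cotwin, relationship):
    """Fit C = B1*P + B2*R + A by ordinary least squares; return (B1, B2, A)."""
    proband = np.asarray(proband, dtype=float)
    cotwin = np.asarray(cotwin, dtype=float)
    relationship = np.asarray(relationship, dtype=float)
    # Design matrix columns: proband score, coefficient of relationship, constant.
    design = np.column_stack([proband, relationship, np.ones_like(proband)])
    coef, *_ = np.linalg.lstsq(design, cotwin, rcond=None)
    b1, b2, a = coef
    return float(b1), float(b2), float(a)

# Hypothetical toy data (not from either study): standardized reading scores.
population_mean = 0.0
raw_proband = np.array([-1.8, -1.5, -2.1, -1.6, -1.9, -1.7])
raw_cotwin = np.array([-1.6, -1.2, -1.9, -0.7, -1.0, -0.6])
relationship = np.array([1.0, 1.0, 1.0, 0.5, 0.5, 0.5])  # MZ = 1.0, DZ = 0.5

# Assumption: one pooled proband mean is used for the transformation.
proband_mean = raw_proband.mean()
P = df_transform(raw_proband, population_mean, proband_mean)
C = df_transform(raw_cotwin, population_mean, proband_mean)

b1, b2, a = df_regression(P, C, relationship)
print(f"B1 = {b1:.2f}, B2 (estimate of h2g) = {b2:.2f}, A = {a:.2f}")
```

After the transformation the proband mean equals 1, so B2 can be read directly as the proportion of the proband deficit attributed to genetic influences. The sketch does not handle practical details such as the treatment of concordant pairs, which any real analysis would need to address.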
Etiologies of stability and comorbidity
The DeFries-Fulker multiple regression model may be extended to assess the relationship between two different phenotypes or the same phenotype at two different time points. For example, to assess the etiology of stability between deficits in reading performance at the two time points, the following bivariate extension of the basic regression model was fitted to proband reading scores at initial assessment and cotwins’ scores at follow-up:
$$C_y = B_1 P_x + B_2 R + A \tag{2}$$
where Cy is the cotwin’s score at follow-up (Y) and Px is the proband’s score at initial assessment. In the bivariate case, B1 is the partial regression of the cotwin’s reading score at follow-up (Y) on the proband’s initial reading score (X), a measure of the average MZ–DZ cross-variable twin resemblance, or the extent to which cotwin scores on Y are related to proband scores on X (in this case, reading) across zygosity. B2 is the partial regression of the cotwin’s Y score on the coefficient of relationship. When the data are appropriately transformed, B2 = hx hy rG(xy), an index of the extent to which the proband deficit on X is due to genetic factors that also influence scores on Y, i.e. “bivariate heritability” (Light & DeFries, 1995). rG(xy) is the genetic correlation, an index of the degree to which individual differences in two variables are due to the same genetic influences. Thus, Equation 2 can also be applied to assess the genetic etiologies of both contemporaneous and longitudinal comorbidities between reading difficulties and ADHD symptom dimensions.
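As an illustration of the bivariate model, the sketch below regresses cotwin scores on Y on proband scores on X and the coefficient of relationship, and also returns naive ordinary-least-squares standard errors of the kind that accompany estimates such as "Biv h2g ± SE". The function name `bivariate_df` and the toy values are hypothetical, the inputs are assumed to be already transformed, and the standard errors carry no adjustment for how pairs are entered, so they should not be read as the procedure used in either study.

```python
import numpy as np

def bivariate_df(proband_x, cotwin_y, relationship):
    """Fit C_y = B1*P_x + B2*R + A by ordinary least squares.

    Returns (coef, se), each ordered as (B1, B2, A). Standard errors are
    the naive OLS ones and assume independent observations.
    """
    px = np.asarray(proband_x, dtype=float)
    cy = np.asarray(cotwin_y, dtype=float)
    r = np.asarray(relationship, dtype=float)
    design = np.column_stack([px, r, np.ones_like(px)])
    coef, *_ = np.linalg.lstsq(design, cy, rcond=None)
    residuals = cy - design @ coef
    dof = len(cy) - design.shape[1]           # observations minus parameters
    sigma2 = float(residuals @ residuals) / dof
    cov = sigma2 * np.linalg.inv(design.T @ design)
    se = np.sqrt(np.diag(cov))
    return coef, se

# Hypothetical, already-transformed toy data (not from either study):
# proband reading scores (X) and cotwin inattention scores (Y).
px = np.array([0.9, 1.1, 1.0, 1.2, 0.8, 1.0])
cy = np.array([-0.5, -0.7, -0.6, -0.3, -0.2, -0.4])
r = np.array([1.0, 1.0, 1.0, 0.5, 0.5, 0.5])   # MZ = 1.0, DZ = 0.5

coef, se = bivariate_df(px, cy, r)
print(f"Biv h2g estimate (B2) = {coef[1]:.2f} +/- {se[1]:.2f}")
```

The same function covers both uses described in the text: stability (X and Y are the same measure at two occasions) and comorbidity (X and Y are different measures).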
In the current study, the etiology of reading deficits at grade 4 was assessed, as well as their longitudinal stability between grade 4 and grade 9. In addition, both the contemporaneous relations between grade 4 reading and grade 4 IN and H/I, and the longitudinal relations between grade 4 reading and grade 9 IN and H/I were assessed. In order to provide strictly parallel analyses to the CLDRC analyses, subjects were not reselected at grade 9.
Results
Table 1 presents results of both the previously published analyses of data from the CLDRC and those of the current study. Although there are some relatively minor differences between the results, the overall pattern of results, and indeed most estimates, are highly similar. In both studies, the heritability of the group deficit in reading at initial assessment is greater than 60%. Also, in both studies, genetic influences on stability of the reading deficit are greater than 70%. Although the bivariate heritability for initial reading and IN in the CLDRC (−.60) is larger than the corresponding estimate for the ILTS (−.40), and the difference is even greater for the bivariate heritability of initial reading and follow-up IN (−.68 vs. −.33), their confidence intervals overlap substantially. In addition, bivariate heritabilities for initial reading and both initial and follow-up H/I are somewhat lower than the corresponding estimates for IN in both studies.
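One informal way to gauge whether two independent estimates differ by more than chance is to divide their difference by the standard error of that difference. The sketch below applies this to the contemporaneous reading-IN estimates from Table 1. It is an illustration only, not an analysis reported by the authors: it assumes the "±" values in Table 1 are standard errors, that the two samples are independent, and that a normal approximation is adequate. The function names are hypothetical.

```python
import math

def ci95(estimate, se):
    """Approximate 95% confidence interval under a normal approximation."""
    return estimate - 1.96 * se, estimate + 1.96 * se

def compare_estimates(est1, se1, est2, se2):
    """z statistic and two-tailed p-value for the difference between two
    independent estimates, using a normal approximation."""
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    z = (est1 - est2) / se_diff
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_tailed

# Values taken from Table 1 (initial reading and initial IN).
cldrc_est, cldrc_se = -0.60, 0.15
ilts_est, ilts_se = -0.40, 0.30

z, p = compare_estimates(cldrc_est, cldrc_se, ilts_est, ilts_se)
print(f"z = {z:.2f}, two-tailed p = {p:.2f}")
print("ILTS approximate 95% CI:", ci95(ilts_est, ilts_se))
```

Under these assumptions the difference between the two estimates is well within what sampling error alone could produce, which is consistent with the overlap of the confidence intervals noted above.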
Discussion
The failure of many findings in the behavioral and biomedical sciences to replicate may have many possible causes, including differences in populations, ages of subjects, measures, diagnostic criteria, etc. Thus, the current study, based on analyses of data from a selected subset of a population sample, has attempted to replicate our previous findings from a selected sample using identical analyses, as well as highly similar measures and diagnostic criteria. Sample sizes differed depending on the measures analyzed and samples from which subjects were drawn, but were similar for the bivariate analyses in the two studies. Results obtained from DF analyses indicated that reading deficits at initial assessment and their stability are due substantially to genetic influences in both studies. Also, results of both studies suggested that genetic influences on the comorbidity between initial reading and IN were greater than those on the comorbidity between initial reading and H/I, both contemporaneously and longitudinally.
As indicated by their relatively large confidence intervals, the differences between the CLDRC and ILTS bivariate heritabilities for initial reading and IN may be due only to chance. However, these differences could also be due in part to some minor differences in sample and procedure. First, the CLDRC sample is a selected sample. Although a subset of subjects was selected for these analyses in the ILTS, the selection criteria were not exactly the same, and indeed could not be the same due to differences in the measures administered. Second, although the mean ages of subjects at each measurement occasion were similar, ages varied widely at both measurement occasions in the CLDRC (from 7.7 to 20.5 years at initial assessment and from 12.6 to 26.6 years at follow-up), whereas in the ILTS all subjects were assessed at post-4th grade and post-9th grade, with little range in age at each assessment. Further, the measures of reading also differed somewhat for the two samples.
Conclusions
Our previous findings of substantial genetic influences on reading deficits and their longitudinal stability are clearly replicated in this independent analysis of twin data. Although the bivariate h2g estimates between reading difficulties and IN are somewhat lower in this replication study, the bivariate heritability estimates between reading deficits and H/I are relatively low in both studies. Nevertheless, the minor differences between these results clearly illustrate the need for standardization of procedures and measures, as well as the importance of replication.
Acknowledgments
The continued cooperation of the many families and schools participating in the CLDRC and ILTS, as well as the work of the staff members of these projects, is gratefully acknowledged.
Financial Support
The Colorado Learning Disabilities Research Center is supported by grant HD027802; the U.S. Sample of the International Longitudinal Twin Study of Early Reading Development is supported by grant HD038526, and its continuation into high school, Etiology and Neuropsychology of Math, Reading, ADHD, and Their Covariation by grant HD068728, all from the Eunice Kennedy Shriver Center of the National Institute of Child Health and Human Development (NICHD).
Footnotes
Conflict of Interest
None
Ethical Standards
The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008.
Literature Cited
- Barkley RA, Murphy K. Attention-deficit hyperactivity disorder: A clinical workbook. 2nd ed. Guilford Press; New York: 1998.
- Boekel W, Wagenmakers EJ, Belay L, Verhagen J, Brown S, Forstmann BU. A purely confirmatory replication study of structural brain-behavior correlations. Cortex. 2015;66:115–133. doi: 10.1016/j.cortex.2014.11.019.
- Byrne B, Olson RK, Samuelsson S, Wadsworth S, Corley R, DeFries JC, Willcutt E. Genetic and environmental influences on early literacy. Journal of Research in Reading. 2006;29(1):33–49.
- DeFries JC, Fulker DW. Multiple regression analysis of twin data. Behavior Genetics. 1985;15:467–473. doi: 10.1007/BF01066239.
- DeFries JC, Fulker DW. Multiple regression analysis of twin data: Etiology of deviant scores versus individual differences. Acta Geneticae Medicae et Gemellologiae: Twin Research. 1988;37:205–216. doi: 10.1017/s0001566000003810.
- Dunn LM, Markwardt FC. Peabody Individual Achievement Test. American Guidance Service; Circle Pines, MN: 1970.
- Ioannidis JPA. Why most published research findings are false. PLoS Medicine. 2005;2(8):e124. doi: 10.1371/journal.pmed.0020124.
- Lahey BB, Pelham WE, Loney J, Kipp H, Ehrhardt A, Lee SS, Willcutt EG, Hartung CM, Chronis A, Massetti G. Three-year predictive validity of DSM-IV attention deficit hyperactivity disorder in children diagnosed at 4-6 years of age. American Journal of Psychiatry. 2004;161(11):2014–2020. doi: 10.1176/appi.ajp.161.11.2014.
- Light JG, DeFries JC. Comorbidity of reading and mathematics disabilities: Genetic and environmental etiologies. Journal of Learning Disabilities. 1995;28:96–106. doi: 10.1177/002221949502800204.
- Macleod M, Michie S, Roberts I, Dirnagl U, Chalmers I, Ioannidis J, Al-Shahi S, Chan A, Glasziou P. Biomedical research: Increasing value, reducing waste. Lancet. 2014;383:101–104. doi: 10.1016/S0140-6736(13)62329-6.
- Nichols RC, Bilbro WC. The diagnosis of twin zygosity. Acta Genetica et Statistica Medica. 1966;16:265–275. doi: 10.1159/000151973.
- Open Science Collaboration. Estimating the reproducibility of psychological science. Science. 2015;349(6251):aac4716. doi: 10.1126/science.aac4716.
- Pashler H, Wagenmakers E-J. Editors’ introduction to the special section on replicability in psychological science: A crisis of confidence? Perspectives on Psychological Science. 2012;7:528–530. doi: 10.1177/1745691612465253.
- Plomin R, DeFries JC, Knopik VS, Neiderhiser JM. Top 10 Replicated Findings From Behavioral Genetics. Perspectives on Psychological Science. 2016;11(1):3–23. doi: 10.1177/1745691615617439.
- Torgesen J, Wagner R, Rashotte CA. A Test of Word Reading Efficiency (TOWRE). PRO-ED; Austin, TX: 1999.
- Wadsworth SJ, DeFries JC, Willcutt EG, Pennington BF, Olson RK. The Colorado Longitudinal Twin Study of Reading Difficulties and ADHD: Etiologies of Comorbidity and Stability. Twin Research and Human Genetics. 2015;18(6):755–761. doi: 10.1017/thg.2015.66.
- Wechsler D. Examiner's manual: Wechsler Intelligence Scale for Children-Revised. The Psychological Corporation; New York: 1974.
- Wechsler D. Examiner's manual: Wechsler Adult Intelligence Scale-Revised. The Psychological Corporation; New York: 1981.
- Willcutt EG, Betjemann RS, Pennington BF, Olson RK, DeFries JC, Wadsworth SJ. Longitudinal study of reading disability and attention-deficit/hyperactivity disorder: Implications for education. Mind, Brain, and Education. 2007;4:181–192.
