Abstract
Although twin, family, and adoption studies have shown that general cognitive ability (GCA) is substantially heritable, genome-wide association studies (GWAS) have not uncovered a genetic polymorphism replicably associated with this phenotype. However, most polymorphisms used in GWAS are common SNPs. The present study explores the use of a different class of genetic variant, the copy-number variant (CNV), to predict GCA in a sample of 6,199 participants combined from two longitudinal family studies. We aggregated low-frequency (<5%) CNV calls into eight different mutational burden scores, each reflecting a different operationalization of mutational burden. We further conducted three genome-wide association scans, each of which used a different subset of the identified low-frequency CNVs. Association signals from the burden analyses were generally small in effect size, and none was statistically significant after a careful Type I error correction was applied. No signal from the genome-wide scans differed significantly from zero at the adjusted Type I error rate. Thus, the present study provides no evidence that CNVs underlie heritable variance in GCA, though we cannot rule out very rare or small-effect CNVs for this trait, which would require even larger samples to detect. We interpret these null results in light of recent breakthroughs that aggregate SNP effects to explain much, but not all, of the heritable variance in some quantitative traits.
Keywords: Molecular genetics, copy-number variants, CNVs, general cognitive ability, IQ
General cognitive ability (GCA) is the theoretical construct involved to some extent in every cognitively demanding task. Frequently identified with Spearman’s (1904) general factor g, it is posited to contribute some part of the variance in scores on all mental-ability tests. GCA correlates appreciably with other variables in a striking variety of domains (Gottfredson, 2003; Herrnstein & Murray, 1996; Jensen, 1998; Deary, 2012). Decades of twin, adoption, and family studies have firmly established via biometric methods that general cognitive ability is substantially heritable (Bouchard & McGue, 2003; Deary, Johnson, & Houlihan, 2009), but genome-wide association studies (GWAS) for this trait (Butcher, Davis, Craig, & Plomin, 2008; Davis et al., 2010; Davies et al., 2011; Benyamin et al., 2013) have not uncovered a single-nucleotide polymorphism (SNP) replicably associated with it at genome-wide significance levels.
However, classes of genetic variation other than SNPs might account more powerfully for heritable variance in complex traits. One of these is the copy-number variant (CNV). According to Scherer et al.’s (2007) taxonomy of genomic variation, a CNV is a submicroscopic structural variant, at least 1,000 base pairs long, whose DNA sequence is present in different numbers of copies in the genomes of different individuals. CNVs include deletions, in which an individual has fewer than the typical two copies of the DNA sequence, and duplications, in which more than two copies are present.
Previous research has implicated CNVs in psychiatric diseases, including autism (Sebat et al., 2007; Pinto et al., 2010) and schizophrenia (The International Schizophrenia Consortium, 2008). Autism is highly comorbid with mental retardation (American Psychiatric Association, 1994), and the connection between schizophrenia and premorbid cognitive deficits is well documented (Woodberry, Giuliano, & Seidman, 2008). This suggests that CNVs might underlie part of the heritable variance in GCA. Indeed, large-scale, cytogenetically visible duplications and deletions of chromosomal material can cause syndromal forms of mental retardation, the textbook example being trisomy 21 (Down syndrome), caused by an extra copy of an entire chromosome. CNVs (as defined here) are smaller and submicroscopic, but recent technological advances have enabled identification of a growing list of specific deletions and duplications associated with syndromal intellectual disability (Mefford, Batshaw, & Hoffman, 2012), as well as with neurodevelopmental disorders more generally (Glessner, Connolly, & Hakonarson, 2012; Coe, Girirajan, & Eichler, 2012). Thus, one research strategy would be to search for specific CNVs contributing to normal-range variation in cognitive functioning. However, if the lessons learned from SNPs for quantitative phenotypes also hold for CNVs, the effect sizes of relatively common deletions and duplications will be quite small. In fact, the effects of common CNVs have already been investigated indirectly by SNP GWAS, since most common CNVs are well tagged by common SNPs (Wellcome Trust Case-Control Consortium, 2010). This is not to say that large-effect CNVs for GCA are unlikely to exist, but rather that they are unlikely to be common.
The reliability of calls for low-base-rate CNVs can be limited (Cooper & Mefford, 2011), and analyses of particularly rare variants will generally be underpowered unless the sample size is quite large. Therefore, another strategy is to aggregate CNVs from across the genome into one or more mutational burden scores. Both the International Schizophrenia Consortium (2008) and Pinto et al.’s (2010) autism study used this strategy and reported significant case-control differences in burden. Further, a recent review of the role of CNVs in neurodevelopmental disorders (Coe et al., 2012) concluded that a greater burden of larger and/or more numerous small CNVs typically corresponds to greater phenotypic severity.
Contemporary neurobiological theory of general intelligence recognizes the key role of a distributed network of frontal and parietal brain structures (Jung & Haier, 2007; Gläscher et al., 2010), and much evidence supports the hypothesis that the brains of more intelligent individuals function more efficiently, in terms of energy consumption during a cognitively demanding task (Neubauer & Fink, 2009). GCA seemingly depends upon the efficient, coordinated functioning of a distributed array of neural structures, and such a system is far easier to disturb than to enhance. Approximately 84% of human genes are expressed in the brain (Hawrylycz et al., 2012), and in light of CNVs’ role in pathological cognitive deficits, it seems reasonable to hypothesize that CNVs could contribute to normal-range variation as well. Specifically, individuals whose genomes deviate more from “typical” reference copy-number states would be more likely to harbor detrimental trait-relevant mutations, and would have correspondingly lower cognitive ability scores.
An organism’s total load of mutations relative to the “typical” reference genome could reflect developmental instability, particularly if the mutations in question are rare. In turn, developmental instability is frequently operationalized as fluctuating asymmetry (deviation from morphological bilateral symmetry; Gangestad & Thornhill, 1999), which correlates negatively with GCA (Banks, Batchelor, & McDaniel, 2010). Therefore, one might hypothesize that mutational burden would also associate negatively with GCA. This evolutionary-biological hypothesis motivated a small-sample study of CNVs and IQ, by Yeo, Gangestad, Liu, Calhoun, and Hutchinson (2011). The study participants entered into analysis were N = 74 adults diagnosed with alcohol dependence, who had been assessed with the Wechsler Abbreviated Scale of Intelligence. From DNA samples, Yeo et al. detected a total of 13,557 low-frequency (<5%) CNVs, 7,249 of which were deletions, and 6,308 of which were duplications. Detected copy-number deletions in the sample ranged from ~8kb to ~626kb in length; the average copy-number deletion was ~210kb long (SD ≈ 14kb). The length (in kilobases) of the copy-number deletions participants carried correlated negatively with their Full-scale IQ (FSIQ) scores (r = −0.30, p = 0.01). In contrast, participants’ counts of deletions carried correlated positively with FSIQ, though not significantly so (r = 0.21, p = 0.08). The number of deletions carried ranged from 1 to 25 in the sample, with an average of 10.95 (SD = 5.48). Neither the length nor count of copy-number duplications correlated significantly with FSIQ, and both correlations were less than 0.10 in absolute magnitude. Yeo et al. (2011) interpret their result for copy-deletion length as consistent with the hypothesis that individual differences in cognitive ability result partly from individual variation in the total burden of detrimental mutations carried.
Three recent studies with larger samples have attempted to replicate Yeo et al. (2011). One of these (MacLeod et al., 2012) was a study of both fluid and crystallized intelligence in a sample of over 3,000 older British adults genotyped on the Illumina 610-Quadv1 chip. MacLeod et al. called CNVs using both PennCNV (Wang et al., 2007) and QuantiSNP (Colella et al., 2007), retaining only those calls produced by both. No association was observed with rare-CNV (<1%) burden. Suggestive evidence of association with fluid intelligence was observed for a specific CNV overlapping SHANK3 (permutation-corrected p = 0.01). However, of the three carriers of this CNV in the sample, two had duplications and one had a deletion, a pattern MacLeod et al. regard as counterintuitive.
Bagshaw et al. (2013) reported a study of IQ and academic achievement conducted in a sample of 717 participants from the longitudinal Christchurch Health and Development Study in New Zealand. These participants were genotyped on the Illumina 660W-Quad chip. Bagshaw et al. called CNVs with PennCNV, and conducted a rare-CNV burden analysis and a genome-wide scan of common CNVs. They observed no strong evidence of association, and the only suggestive association signals were for academic achievement, and not IQ, which we regard as a superior measure of the GCA construct.
The third recent study of interest is McRae et al. (2013), which was conducted in a sample of 800 Australian adolescents, who were IQ-tested around age 16, and genotyped on the Illumina 610K Quad array. McRae et al. called CNVs using QuantiSNP, and performed a mutational burden analysis and a genome-wide scan with low-frequency (<5%) CNVs, obtaining only null results.
Below, we report the results from our study of CNVs and IQ, which was conducted in an ethnically homogeneous community-based sample, larger (N > 6000 participants) than that of Yeo et al. (2011) or the three recent studies combined. Our participants were genotyped on the Illumina 660W Quad array, and we called CNVs with PennCNV. We hypothesize that mutational burden will associate negatively with IQ. We report the results of mutational burden analyses, and—since we have reasonable power to detect a specific CNV of moderate-to-large effect—we supplement the burden analyses with genome-wide association scans of single CNVs.
Methods
GWAS Sample
Participants
Both studies involved—the Sibling Interaction and Behavior Study (SIBS; McGue, Keyes, Sharma, Elkins, Legrand, et al., 2007) and the Minnesota Twin Family Study (MTFS; Iacono, Carlson, Taylor, Elkins, & McGue, 1999; Iacono & McGue, 2002; Keyes et al., 2009)—and the collection, genotyping, and analysis of DNA samples, were approved by the University of Minnesota Institutional Review Board's Human Subjects Committee. Written informed assent or consent was obtained from all participants; parents provided written consent for their minor children.
Our participants came from two longitudinal family studies conducted by the Minnesota Center for Twin and Family Research (MCTFR): SIBS and MTFS. The two age cohorts of the MTFS, the 11-year-old and 17-year-old cohorts, are named for the target ages of their constituent twins at the intake assessment. MTFS is a longitudinal study of a community-based sample of same-sex twins born between 1972 and 1994 in the State of Minnesota, and their parents. SIBS is an adoption study of sibling pairs and their parents; its community-based sample contains families in which both siblings are adopted, both are biologically related to the parents, or one is adopted and one is biologically related. As required by the SIBS inclusion criteria, any sibling in the sample who was adopted into the family is not biologically related to his or her co-sibling, which has been verified genomically (Miller et al., 2012). For the purposes of our analysis, the sample comprises six distinct family types:
Monozygotic- (MZ) twin families (N = 3452, in 1086 families),
Dizygotic- (DZ) twin families (N = 1820, in 618 families),
SIBS families with two adopted offspring (N = 260, in 209 families),
SIBS families with two biological offspring (N = 416, in 178 families),
“Mixed” SIBS families with 1 biological and 1 adopted offspring (N = 176, in 100 families),
Step-parents (N = 75).
Table 1 provides descriptive characteristics of our sample. As explained below, our method of analysis accounted for the clustering of individual participants within families. However, step-parents (family type 6) do not fit neatly into a four-member family unit, so we treated them as independent observations in our analysis.
Table 1.
Descriptive characteristics of sample
Characteristic | Parents | Twins (17yo) | Twins (11yo) | Non-twin Biological Offspring | Adoptees | Step-parents |
---|---|---|---|---|---|---|
N | 2879 | 1015 | 1777 | 358 | 95 | 75 |
Female(%) | 60.7% | 55.5% | 50.5% | 50.6% | 44.2% | 9.3% |
Mean Age at Intake (SD) | 43.3 (5.51) | 17.5 (0.46) | 11.8 (0.44) | 14.9 (1.86) | 15.3 (2.15) | 40.7 (7.49) |
Mean FSIQ (SD) | 105.9 (14.2) | 100.3 (14.1) | 103.8 (13.6) | 108.9 (12.9) | 105.9 (13.9) | 102.5 (15.6) |
Note. Total N = 6199, in 2196 families. FSIQ = Full-Scale IQ; 17yo = 17-year-old cohort; 11yo = 11-year-old cohort. For a minority of twins (39%), FSIQ represents a within-person average of FSIQ scores from more than one assessment (see text). FSIQ range: 151 − 59 = 92. Parental / step-parental intake age range: 65 − 28 = 37. Offspring intake age range: 20 − 11 = 9.
Genotyping
Participants who provided DNA samples were typed on a genome-wide set of markers with the Illumina Human660W-Quad array. The array comprises 657,366 markers, including 95,876 intensity-only markers that were designed to increase coverage between SNPs and to tag previously identified regions known to contain CNVs. Both DNA samples and markers were subject to thorough quality-control screens (Miller et al., 2012). The present analyses were restricted to Caucasian participants of European ancestry (i.e., “White” participants), who were identified based on self-reported ancestry as well as principal components from EIGENSTRAT (Price et al., 2006). Excluding DNA samples that failed quality-control screening yielded a GWAS sample of 7,702 White participants. Details concerning the studies (MTFS and SIBS), genotyping, quality control, and ancestry determination can be found in Miller et al. (2012).
Phenotypic measurement
Measurement of GCA was included in the design of the intake assessment for most participants, by way of an abbreviated form of the Wechsler Intelligence Scale for Children-Revised (WISC-R) or Wechsler Adult Intelligence Scale-Revised (WAIS-R), as age-appropriate (that is, for participants aged 16 or younger and older than 16, respectively). The short forms consisted of two Performance subtests (Block Design and Picture Arrangement) and two Verbal subtests (Information and Vocabulary), the scaled scores on which were prorated to determine Full-Scale IQ (FSIQ). FSIQ estimates from this short form have been shown to correlate 0.94 with FSIQ from the complete test (Sattler, 1974). Parents in the SIBS sample are an exception, in that they were not tested with this short form of the WAIS-R until the first SIBS follow-up assessment. By design, only one parent per SIBS family, usually the mother, returned for this follow-up. As a result, IQ data for SIBS fathers are very limited.
IQ testing was also included in the design of the second follow-up for both cohorts of MTFS twins, and of the fourth follow-up for the 11-year-old cohort. At these assessments, twins received a further abbreviated form of the WAIS-R, consisting only of the Vocabulary and Block Design subtests, the scaled scores on which were again prorated to determine FSIQ. Of the 2,792 twins entered into our analysis, 785 were tested twice, and 306 were tested three times. Testing occasions were spaced approximately seven years apart. To achieve a more reliable assessment of the phenotype, we averaged all available FSIQ measures for each participant and used these within-person averages in analysis. FSIQ among participants entered into analysis ranged from 59 to 151 (also see Table 1). Ten participants with FSIQ of 70 or below were included in analyses. Despite their low scores, these participants were not obviously impaired and were capable of completing the multifaceted MTFS/SIBS assessment during their visit. They are therefore unlikely to meet DSM-IV diagnostic criteria for mental retardation, and instead merely represent the low end of the normal-range distribution of GCA.
CNV calling
We called CNVs using PennCNV (Wang et al., 2007), which implements a hidden Markov model to resolve CNVs from genome-wide normalized intensity data. Briefly, within a given participant, PennCNV’s algorithm treats each SNP’s combined probe intensity (the log R ratio) and relative B-allele intensity (the B-allele frequency) as observed realizations of a random vector whose distribution is determined by the unknown (“hidden”) copy-number state of the SNP. Conditional on the copy-number state, the vector’s two elements are independent, and their marginal distributions have closed-form expressions. The inferential problem is thus to determine which copy-number state in this mixture model has the highest posterior probability. The copy-number state of a given SNP i depends on the states of other SNPs only by way of the immediately preceding SNP, i − 1 (hence the “Markov” in “hidden Markov model”). The probability that the copy-number state changes from one SNP to the next is a function of the two states and the distance separating the SNPs. CNVs within a participant are identified as runs of SNPs with the same mutant state. PennCNV further provides an index of the overall quality of a participant’s DNA sample for CNV-calling purposes: the standard deviation (across SNPs, within person) of the log R ratio, corrected for guanine-cytosine (GC) content (hereinafter, LRRSD). Before calling CNVs, we re-clustered the intensities from the Illumina array using only data from the 6,110 participants with raw (i.e., not GC-corrected) LRRSD below 0.30.
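The decoding idea described above (hidden copy-number states, distance-dependent switching, runs of mutant states) can be illustrated with a toy Viterbi decoder. This is a minimal sketch under loudly stated assumptions, not PennCNV’s model: it uses only three states, ignores the B-allele frequency, and the emission means, emission SD, and the length scale `DECAY_BP` are hypothetical values chosen for illustration.

```python
import math

# Simplified three-state sketch; PennCNV itself uses more states and also models B-allele frequency.
STATES = ("deletion", "normal", "duplication")
LRR_MEAN = {"deletion": -0.45, "normal": 0.0, "duplication": 0.3}  # hypothetical means
LRR_SD = 0.2          # hypothetical emission SD shared by all states
DECAY_BP = 100_000    # hypothetical length scale controlling state persistence


def log_emission(state, lrr):
    """Gaussian log-density of one log R ratio value given the hidden copy-number state."""
    z = (lrr - LRR_MEAN[state]) / LRR_SD
    return -0.5 * z * z - math.log(LRR_SD * math.sqrt(2.0 * math.pi))


def log_transition(prev, curr, distance_bp):
    """Probability of switching state grows with the distance (bp > 0) between adjacent SNPs."""
    p_change = 1.0 - math.exp(-distance_bp / DECAY_BP)
    if prev == curr:
        return math.log(1.0 - p_change)
    return math.log(p_change / (len(STATES) - 1))


def viterbi(positions, lrr):
    """Most probable state path; positions must be strictly increasing."""
    # score[s]: best log-probability of any state path ending in state s at the current SNP
    score = {s: math.log(1.0 / len(STATES)) + log_emission(s, lrr[0]) for s in STATES}
    backpointers = []
    for i in range(1, len(lrr)):
        d = positions[i] - positions[i - 1]
        new_score, pointer = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + log_transition(p, s, d))
            new_score[s] = score[best_prev] + log_transition(best_prev, s, d) + log_emission(s, lrr[i])
            pointer[s] = best_prev
        score = new_score
        backpointers.append(pointer)
    # Trace the best path backwards from the highest-scoring final state.
    state = max(STATES, key=score.get)
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    return path[::-1]


def runs_of_mutant_state(positions, path):
    """Collapse consecutive SNPs sharing a non-normal state into (state, start, end, n_snps)."""
    calls, i = [], 0
    while i < len(path):
        j = i
        while j + 1 < len(path) and path[j + 1] == path[i]:
            j += 1
        if path[i] != "normal":
            calls.append((path[i], positions[i], positions[j], j - i + 1))
        i = j + 1
    return calls


positions = [k * 5_000 for k in range(30)]
lrr = [0.0] * 10 + [-0.5] * 10 + [0.0] * 10
print(runs_of_mutant_state(positions, viterbi(positions, lrr)))  # [('deletion', 50000, 95000, 10)]
```

In this toy example, a run of ten SNPs with depressed log R ratio outweighs the two state-switch penalties, so it is returned as a single deletion call spanning those SNPs.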
We retained only CNVs that spanned 15 or more markers and had confidence scores greater than 10.0, which, assuming 1:1 prior odds, corresponds to posterior odds in excess of 20,000 to 1 favoring the called copy-number state over the second-most-likely state. These somewhat conservative thresholds are intended to reduce the risk of false-positive CNV calls (footnote 1). We further excluded CNVs near the centromeres and telomeres, as well as those overlapping immunoglobulin genes, because CNVs identified in these regions have a high probability of being artifactual (Need et al., 2009). In the case of immunoglobulin regions, previous research has found that CNV calls are not representative of germline DNA and depend substantially on DNA source, e.g., saliva versus blood (Need et al., 2009) or lymphoblastoid cell lines versus blood (Wang et al., 2007; Sebat et al., 2004).
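The two call-level thresholds and the posterior-odds arithmetic can be sketched as follows. We assume, as the 20,000-to-1 figure implies, that the confidence score behaves like a natural-log likelihood ratio, so a score of 10 gives odds of e^10 ≈ 22,026 under 1:1 prior odds. The helpers `overlaps_any` and `keep_call` are hypothetical, and the excluded-region list is a placeholder: the actual centromere, telomere, and immunoglobulin coordinates are not given here.

```python
import math

MIN_MARKERS = 15        # markers spanned (threshold from the text)
MIN_CONFIDENCE = 10.0   # confidence-score threshold (from the text)


def posterior_odds(confidence, prior_odds=1.0):
    """Assumes the confidence score is a natural-log likelihood ratio of the called
    copy-number state versus the runner-up state."""
    return prior_odds * math.exp(confidence)


def overlaps_any(chrom, start, end, excluded_regions):
    """excluded_regions: iterable of (chrom, start, end) tuples taken from an external
    annotation of centromeres, telomeres, and immunoglobulin genes (not supplied here)."""
    return any(c == chrom and start <= e and s <= end for c, s, e in excluded_regions)


def keep_call(chrom, start, end, n_markers, confidence, excluded_regions):
    """Apply the marker-count, confidence, and excluded-region filters to one call."""
    return (n_markers >= MIN_MARKERS
            and confidence > MIN_CONFIDENCE
            and not overlaps_any(chrom, start, end, excluded_regions))


print(round(posterior_odds(10.0)))  # 22026, i.e. "in excess of 20,000 to 1"
```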
Common CNVs are well tagged by common SNPs: in a major study by the Wellcome Trust Case-Control Consortium (2010), a majority (79%) of diallelic and triallelic CNVs with MAF > 10% had r² > 0.8 with at least one SNP. Thus, the phenotypic associations of such variants are assessed indirectly by SNP GWAS, and we therefore restricted our analysis to low-frequency CNVs. We operationalized “low-frequency CNV” as a deletion or duplication for which at least 50% of the SNP markers spanned are each resolved as being in a mutant state in fewer than 5% of participants. Roughly, this restricts analysis to CNVs harbored by 5% or fewer of participants.
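The low-frequency rule above is a per-marker majority test, which a short function can make concrete. This is a sketch assuming per-marker mutant counts have already been tabulated across participants; the function name and the marker IDs in the example are hypothetical.

```python
def is_low_frequency(markers, mutant_counts, n_participants, max_freq=0.05, min_share=0.5):
    """Return True if at least `min_share` of the CNV's markers are mutant in fewer than
    `max_freq` of participants.

    markers: IDs of the SNP markers the CNV spans.
    mutant_counts: marker ID -> number of participants called mutant at that marker
                   (hypothetical input; markers absent from the dict count as 0).
    """
    rare = sum(1 for m in markers if mutant_counts.get(m, 0) / n_participants < max_freq)
    return rare >= min_share * len(markers)


counts = {"rs_a": 10, "rs_b": 500, "rs_c": 20, "rs_d": 600}
print(is_low_frequency(["rs_a", "rs_b", "rs_c", "rs_d"], counts, 6199))  # True: 2 of 4 markers are rare
print(is_low_frequency(["rs_a", "rs_b", "rs_d"], counts, 6199))          # False: only 1 of 3
```

Because the rule looks at the share of rare markers rather than the CNV as a whole, a call that partly overlaps a common CNV can still qualify, which is why the text describes the restriction as only rough.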
To be included in the analysis, a participant had to have an LRRSD below 0.20 (approximately the 96th percentile; see Figure 1) and to have provided a blood sample for DNA extraction. During CNV calling, 7 participants were identified with apparent somatic mosaicism for partial trisomy or monosomy of more than 80% of one chromosome arm. They were identified by plotting the standard deviation and the interquartile range of the B-allele frequency for each chromosome arm of each participant, and then visually inspecting the intensity data (see Figure S1). These participants were excluded from analysis.
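The participant-level rules and the per-arm B-allele-frequency summaries used to screen for mosaicism might look like the sketch below. Several pieces are assumptions, not the authors’ procedure: restricting to a heterozygous band (0.1 < BAF < 0.9), the IQR cutoff that flags arms for review (the text relied on visual inspection), and the `"blood"` string encoding of DNA source. Mosaicism for a gain or loss tends to split the heterozygous BAF band around 0.5, which widens its spread.

```python
import statistics

MAX_LRRSD = 0.20  # participant-level threshold from the text


def eligible(lrrsd, dna_source, mosaic_flag):
    """Participant-level inclusion rule described in the text."""
    return lrrsd < MAX_LRRSD and dna_source == "blood" and not mosaic_flag


def baf_spread_by_arm(baf_by_arm, low=0.1, high=0.9):
    """{arm: (sd, iqr)} of B-allele frequencies inside an assumed heterozygous band."""
    spread = {}
    for arm, values in baf_by_arm.items():
        het = [v for v in values if low < v < high]
        if len(het) < 4:
            continue  # too few heterozygous-range markers to summarize
        q1, _, q3 = statistics.quantiles(het, n=4)
        spread[arm] = (statistics.stdev(het), q3 - q1)
    return spread


def arms_to_inspect(spread, iqr_cutoff=0.15):
    """Arms whose BAF IQR exceeds a hypothetical cutoff, for visual review of the intensity data."""
    return sorted(arm for arm, (_, iqr) in spread.items() if iqr > iqr_cutoff)


bafs = {
    "7p": [0.0, 0.48, 0.50, 0.52, 1.0, 0.49, 0.51, 0.50],
    "7q": [0.0, 0.35, 0.65, 0.36, 1.0, 0.64, 0.34, 0.66],  # split heterozygous band
}
print(arms_to_inspect(baf_spread_by_arm(bafs)))  # ['7q']
```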
Figure 1.
Participants’ total CNV count (A) and length (B) as a function of LRRSD. The mean level and degree of variation for both of these CNV variables become markedly higher above LRRSD = 0.20 (vertical lines). Therefore, participants with LRRSD greater than 0.20 were not included in analyses. Participants excluded from analyses due to any other filters, such as mosaicism or providing a non-blood sample, are not represented in these figures.
Upon imposing these filters on CNVs and participants, PennCNV made 91,666 total calls across 5,287 DNA samples. Since only one MZ twin per pair was genotyped, this number of DNA samples is smaller than N = 6,199. Some of these calls represent the same CNV in different people, whereas others represent singletons: PennCNV returned 12,616 unique pairs of CNV start/stop points, a majority of which occurred in only one DNA sample.
We hypothesized that genome-wide mutational burden would negatively associate with FSIQ. To operationalize “mutational burden,” we scored participants on eight different variables (hereinafter, the “CNV variables”) from the low-frequency CNV calls:
Total (i.e., both deletions and duplications) CNV count,
Total CNV length,
Deletion count,
Deletion length,
Duplication count,
Duplication length,
Count of homozygous deletions,
Length of homozygous deletions.
Here, “count” refers to the number of distinct identified CNVs of the specified type, whereas “length” refers to the sum of the lengths (in kilobases) of CNVs of the specified type. Table 2 presents descriptive statistics for the CNV variables, including zero-order correlations with FSIQ. All participants harbored at least one deletion CNV (modal count = 12), but not all harbored a duplication (modal count = 2). As expected, homozygous deletions (deletions present on both chromosome copies) were less common: nearly half the sample (47%) had the modal count of zero such mutations.
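Scoring the eight CNV variables from one participant’s calls is straightforward bookkeeping, sketched below. We assume each call carries an integer copy number (0 = homozygous deletion, 1 = deletion, 3 or more = duplication, 2 = reference) and a length in kilobases; the `CnvCall` type and `burden_scores` function are hypothetical. Consistent with the text, homozygous deletions are counted both as deletions and in their own category.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CnvCall:
    copy_number: int   # 0 = homozygous deletion, 1 = deletion, 3+ = duplication; 2 is the reference state
    length_kb: float


def burden_scores(calls):
    """The eight CNV variables (count and length for four groups) for one participant's calls."""
    groups = {
        "all": [c for c in calls if c.copy_number != 2],
        "deletions": [c for c in calls if c.copy_number < 2],
        "duplications": [c for c in calls if c.copy_number > 2],
        # Homozygous deletions are a subset of deletions.
        "homozygous_deletions": [c for c in calls if c.copy_number == 0],
    }
    scores = {}
    for label, group in groups.items():
        scores[f"count_{label}"] = len(group)
        scores[f"length_{label}"] = sum(c.length_kb for c in group)
    return scores


calls = [CnvCall(1, 40.0), CnvCall(0, 12.5), CnvCall(3, 100.0)]
print(burden_scores(calls))
```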
Table 2.
Descriptive statistics for CNV variables and results of mutational burden analyses
CNV Variable | Mean (SD) | Min – Max | Correlation with FSIQ | Regression Coefficient | 95% CI | P-value | ΔR² |
---|---|---|---|---|---|---|---|
Count, All | 17.4 (7.6) | 3 – 172 | −0.002 | 0.0034 | −0.0400, 0.0468 | 0.878 | 0 |
Length, All | 420.8 (431.1) | 10 – 8083 | −0.019 | −0.0007 | −0.0016, 0.0001 | 0.088 | 0.0005 |
Count, Deletions | 13.4 (6.2) | 2 – 169 | 0.010 | 0.0430 | −0.0106, 0.0965 | 0.116 | 0.0004 |
Length, Deletions | 210.9 (240.9) | 3 – 7387 | −0.012 | 0.0003 | −0.0012, 0.0017 | 0.670 | 0 |
Count, Duplications | 4.0 (4.4) | 0 – 100 | −0.017 | −0.0735 | −0.1480, 0.0009 | 0.053 | 0.0006 |
Length, Duplications | 210.0 (353.3) | 0 – 6746 | −0.014 | −0.0013 | −0.0023, −0.0002 | 0.016 | 0.0010 |
Count, Homozygous Deletions | 0.8 (0.9) | 0 – 11 | −0.002 | 0.1129 | −0.2516, 0.4774 | 0.544 | 0 |
Length, Homozygous Deletions | 3.4 (11.6) | 0 – 533 | 0.006 | 0.0006 | −0.0268, 0.0280 | 0.964 | 0 |
Note. “Count” refers to the number of identified CNVs of the given type; “length” refers to their combined size in kilobases; “all” refers to both duplications and deletions. The descriptive statistics describe CNVs per individual: for example, the average number of CNVs detected within a participant was 17.4, with at least three CNVs detected in every participant. Likewise, the average participant gained 210 kilobases due to duplications, but some participants had no duplications. Correlations are zero-order Pearson correlations with Full-Scale IQ. Regression covariates are sex, birth year, and the first two principal components from EIGENSTRAT (Price et al., 2006). Confidence intervals and p-values are from Student’s t distribution on 6,193 df. The Cheverud-Nyholt adjusted significance level (Nyholt, 2004) is p < 0.0073. The rightmost column reports the increase in Buse’s (1973) R² over a covariates-only model, for which R² = 0.0231. Increases in R² reported as zero were zero to four decimal places.
In our genome-wide scans, the appropriate focus of analysis is copy-number state at genome-wide SNP loci, rather than CNV calls per se. To see why, consider two SNPs on the same chromosome. In one participant, the copy-number state at the two SNPs is (say) “duplication” because they bracket one such CNV, whereas in another participant, the copy-number state at the two SNPs is also “duplication” because a different, larger CNV completely spans both SNPs. If either CNV is related to a phenotype, it is because it affects overlapping or nearby genes in some way, and all duplications affecting the same genes would be expected to have similar effects, even if they arose from different mutational events. What matters, therefore, is which loci in the genome have the mutant copy-number state under investigation.
To conduct genome-wide association scans for specific CNVs, we first coded the copy-number states for the 6,199 participants, at SNP loci across the 22 autosomes, under three coding schemes: code “0” for reference copy-number state, otherwise, code “1” for (a) deletion or duplication; (b) deletion only; (c) homozygous deletion. There were runs on the genome-wide array across which no participant’s copy-number state changed from one adjacent SNP to the next. These runs were each collapsed into a single locus, identified by the first SNP in the run. “Monomorphic” loci, with no copy-number variation among participants, were dropped from analysis.
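The coding and run-collapsing steps can be sketched for one chromosome as follows. This is a sketch under the assumption that runs are collapsed on the 0/1 codes within each coding scheme (which would explain why the three scans retain different numbers of loci); the function name, SNP IDs, and copy numbers are hypothetical.

```python
# Coding schemes from the text: 1 = mutant state under the scheme, 0 = reference.
SCHEMES = {
    "del_or_dup": lambda cn: cn != 2,
    "del_only": lambda cn: cn < 2,
    "hom_del": lambda cn: cn == 0,
}


def code_loci(snp_ids, copy_numbers, scheme):
    """Return {first SNP of each run: per-participant 0/1 codes} for one chromosome.

    snp_ids: SNP identifiers in genomic order.
    copy_numbers: copy_numbers[p][j] is participant p's copy number at SNP j.
    """
    is_mutant = SCHEMES[scheme]
    loci, previous = {}, None
    for j, snp in enumerate(snp_ids):
        column = tuple(int(is_mutant(row[j])) for row in copy_numbers)
        if column != previous:      # some participant's coded state changed: a new run starts here
            if any(column):         # drop monomorphic (all-reference) runs
                loci[snp] = column
            previous = column
    return loci


snps = ["s1", "s2", "s3", "s4", "s5"]
cn = [[2, 1, 1, 2, 2],
      [2, 2, 3, 3, 2],
      [2, 2, 2, 2, 2]]
print(code_loci(snps, cn, "del_or_dup"))  # {'s2': (1, 0, 0), 's3': (1, 1, 0), 's4': (0, 1, 0)}
print(code_loci(snps, cn, "del_only"))    # {'s2': (1, 0, 0)}
```

Under the deletion-only scheme, SNPs s2 and s3 carry identical codes for every participant, so they collapse into a single locus identified by s2.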
The genome-wide scans were conducted across 13,806 loci counting deletions and/or duplications, 7,904 loci counting deletions only, and 304 loci counting homozygous deletions only. Under all three coding schemes, the modal mutation frequency across loci was 1/6199 ≈ 0.0002 (that is, the most common count of mutant-state participants per locus was one). Mutant states at loci were rare overall: median frequencies were 7, 8, and 3 out of 6,199 when respectively counting deletions and/or duplications, deletions only, and homozygous deletions only.
Type I error correction
We report nine burden analyses in all: one for each of the eight CNV variables, plus a base model with covariates only. Thus, we tested the significance of eight regression coefficients. A Bonferroni or Šidák correction of the per-comparison Type I error rate would be straightforward but overly conservative, since the CNV variables are correlated with one another. We therefore applied the Cheverud-Nyholt correction (Nyholt, 2004), which is a Šidák correction for the “effective” number of independent statistical tests, meff, calculated from the eigendecomposition of the correlation matrix of the m variables. Let λ1, …, λm denote the eigenvalues of that correlation matrix, ordered from greatest to least. Then,
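$$m_{\mathrm{eff}} = 1 + (m - 1)\left(1 - \frac{\operatorname{Var}(\lambda)}{m}\right), \tag{1}$$
where
$$\operatorname{Var}(\lambda) = \frac{1}{m - 1}\sum_{i=1}^{m}\left(\lambda_i - \bar{\lambda}\right)^2, \qquad \bar{\lambda} = \frac{1}{m}\sum_{i=1}^{m}\lambda_i = 1. \tag{2}$$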
For m = 8 CNV variables, we calculate meff = 7.0045. At the conventional familywise Type I error rate of αFW = 0.05, we therefore have a per-comparison Type I error rate (significance threshold) of αPC = 1 − (1 − αFW)^(1/meff) = 0.0073.
We similarly applied this correction to the three genome-wide scans by calculating the correlation matrix for all loci under the relevant coding scheme on a given chromosome, and obtaining a meff for that chromosome. Summing the 22 meff values then provided the effective number of statistical tests for each scan, from which corrected values for αPC were calculated; these were 3.95×10−6, 6.55×10−6, and 1.72×10−4, counting deletions and/or duplications, deletions only, and homozygous deletions only, respectively. These corrected significance levels are perhaps slightly conservative, since our correction ignores the dependence among the three scans due to their overlapping sets of mutations (e.g., homozygous deletions are a subset of deletions).
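Equations (1) and (2), the Šidák step, and the per-chromosome summation for the scans can be sketched with NumPy. This is an illustration of the Cheverud-Nyholt formula as stated, not the authors’ code; the function names are hypothetical, and the toy identity and all-ones matrices stand in for real correlation matrices of CNV variables or coded loci.

```python
import numpy as np


def effective_number_of_tests(corr):
    """Cheverud-Nyholt m_eff = 1 + (m - 1) * (1 - Var(lambda) / m), Var with denominator m - 1."""
    corr = np.asarray(corr, dtype=float)
    m = corr.shape[0]
    if m == 1:
        return 1.0
    eigenvalues = np.linalg.eigvalsh(corr)   # symmetric matrix; its eigenvalues average to 1
    var_lambda = np.sum((eigenvalues - 1.0) ** 2) / (m - 1)
    return 1.0 + (m - 1) * (1.0 - var_lambda / m)


def sidak_threshold(m_eff, alpha_fw=0.05):
    """Per-comparison alpha that holds the familywise rate at alpha_fw over m_eff independent tests."""
    return 1.0 - (1.0 - alpha_fw) ** (1.0 / m_eff)


def scan_threshold(per_chromosome_corrs, alpha_fw=0.05):
    """Sum per-chromosome m_eff values, then apply the Sidak correction to the total."""
    m_eff_total = sum(effective_number_of_tests(c) for c in per_chromosome_corrs)
    return m_eff_total, sidak_threshold(m_eff_total, alpha_fw)


print(round(sidak_threshold(7.0045), 4))                     # 0.0073, the burden-analysis threshold
print(effective_number_of_tests(np.eye(8)))                  # 8.0: uncorrelated variables
print(round(effective_number_of_tests(np.ones((4, 4))), 6))  # 1.0: perfectly correlated variables
```

The two limiting cases show the intended behavior: uncorrelated variables keep meff = m, while perfectly correlated variables collapse to a single effective test.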
Analyses
Statistical Power
Because our participants are clustered within families, the effective number of independent observations (Neff) in our sample was less than 6,199. We conducted two sets of power calculations in Quanto (Gauderman & Morrison, 2006), one that assumed 6,199 independent participants (surely an overestimate of our Neff), and one that assumed 2,196 independent participants, which is equal to the number of families in the analysis, and represents a very conservative estimate of our Neff. In the burden analyses, if Neff = 6,199, we would have at least 80% power for a CNV variable that accounts for 0.2% or more of phenotypic variance, and if Neff = 2,196, for a CNV variable that accounts for 0.57% or more of variance. In the genome-wide scans, if Neff = 6,199, we would have at least 80% power to detect a mutation accounting for at least 0.49%, 0.47%, or 0.35% of variance, when respectively counting deletions and/or duplications, deletions only, or homozygous deletions only. If instead Neff = 2,196, we would have at least 80% power to detect a mutation accounting for at least 1.36%, 1.31%, or 0.96% of variance, when respectively counting deletions and/or duplications, deletions only, or homozygous deletions only. For further details about the power of the genome-wide scans, see Figures S2, S3, and S4.
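As a rough cross-check on these figures, a normal approximation to the power of a two-sided test of one regression coefficient reproduces values near 80% for the burden-analysis settings. This is not Quanto’s algorithm, which may differ in detail; it assumes a noncentrality of n·r²/(1 − r²) for a predictor explaining a proportion r² of phenotypic variance, and `approx_power` is a hypothetical helper.

```python
from statistics import NormalDist


def approx_power(n_eff, r2, alpha):
    """Normal approximation to the power of a two-sided test of one regression coefficient
    explaining a proportion r2 of phenotypic variance (noncentrality n * r2 / (1 - r2))."""
    std_normal = NormalDist()
    shift = (n_eff * r2 / (1.0 - r2)) ** 0.5
    crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    return std_normal.cdf(shift - crit) + std_normal.cdf(-shift - crit)


# Burden-analysis settings from the text, at the Cheverud-Nyholt threshold of 0.0073.
for n_eff, r2 in [(6199, 0.002), (2196, 0.0057)]:
    print(n_eff, r2, round(approx_power(n_eff, r2, alpha=0.0073), 3))
```

Both settings come out close to 0.80, and the same function applied at the genome-wide thresholds gives similar agreement with the scan figures reported above.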
Mutational Burden
We hypothesized that genomic deviation from reference copy-number state due to rare CNVs, i.e. mutational burden, negatively associates with FSIQ. A total of 6,199 Caucasian participants, from 2,196 families, provided valid data both for FSIQ and CNVs, and were entered into analysis. Since our participants are clustered within families, observations were not sampled independently of one another. Moreover, the expected covariance structure among family members depends upon family type. We therefore conducted our multiple-regression analysis with RFGLS, a package for the R statistical computing environment, which conducts feasible generalized least-squares (FGLS) regression in datasets with complicated family structures (Li, Basu, Miller, Iacono, & McGue, 2011).
Our analyses included four covariates: sex, birth year (footnote 2), and the first two principal components from EIGENSTRAT (Price et al., 2006), the latter to control for population stratification. The correlations among these covariates were uniformly small, in no case exceeding 0.036 in magnitude (see Table S1). Similarly, the correlations between the covariates and the CNV variables were small, the largest in magnitude being that between deletion count and a dummy variable for female sex (r = −0.048). Since these analyses simultaneously estimated the fixed-effect regression coefficients and the residual covariance matrix, the model contained 18 free parameters altogether: the regression intercept, the regression coefficient for the CNV variable, regression coefficients for the four covariates, four residual variance parameters (one each for offspring, mothers, fathers, and step-parents), and residual correlation parameters for eight relationships between family members (MZ twin, DZ twin and full sibling, adoptive siblings, spouses, biological mother-offspring, biological father-offspring, adoptive mother-offspring, adoptive father-offspring).
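The generalized least-squares estimator underlying these analyses, and Buse’s R² reported in Table 2, can be sketched for a known residual covariance matrix. This is not RFGLS: the “feasible” step, estimating family-type-specific variances and correlations, is omitted, and the toy data assume hypothetical four-member families with a single exchangeable residual correlation of 0.4.

```python
import numpy as np


def block_diagonal(blocks):
    """Assemble per-family covariance blocks into one block-diagonal matrix."""
    n = sum(b.shape[0] for b in blocks)
    out = np.zeros((n, n))
    start = 0
    for b in blocks:
        k = b.shape[0]
        out[start:start + k, start:start + k] = b
        start += k
    return out


def gls_fit(X, y, V):
    """GLS estimates, their covariance, and residuals for a known residual covariance matrix V."""
    V_inv = np.linalg.inv(V)
    XtVi = X.T @ V_inv
    cov_beta = np.linalg.inv(XtVi @ X)
    beta = cov_beta @ (XtVi @ y)
    return beta, cov_beta, y - X @ beta


def buse_r2(y, resid, V):
    """Buse's R^2: 1 - e'V^-1 e / (y - ybar)'V^-1 (y - ybar), with ybar the GLS-weighted mean."""
    V_inv = np.linalg.inv(V)
    ones = np.ones_like(y)
    y_bar = (ones @ V_inv @ y) / (ones @ V_inv @ ones)
    centered = y - y_bar
    return 1.0 - (resid @ V_inv @ resid) / (centered @ V_inv @ centered)


# Toy data: 300 hypothetical four-member families with exchangeable residual correlation 0.4.
rng = np.random.default_rng(0)
family = 0.4 * np.ones((4, 4)) + 0.6 * np.eye(4)
V = block_diagonal([family] * 300)
x = rng.normal(size=1200)
X = np.column_stack([np.ones(1200), x])
y = 100.0 + 2.0 * x + np.linalg.cholesky(V) @ rng.normal(size=1200)
beta, cov_beta, resid = gls_fit(X, y, V)
print(beta, np.sqrt(np.diag(cov_beta)), buse_r2(y, resid, V))
```

Weighting by V⁻¹ is what accounts for the non-independence of family members; with V equal to the identity, both functions reduce to ordinary least squares and the usual R².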
Genome-wide association scans
We conducted the association scans with RFGLS, using its “rapid-FGLS” approximation, which works as follows. We first estimated the residual covariance matrix from an FGLS regression of FSIQ onto the covariates only (sex, birth year, and the two principal components to control for population stratification). We then “plugged in” this matrix for use in each subsequent single-locus FGLS regression of FSIQ onto copy-number state, with covariates. Because this approximation requires only a single residual covariance matrix to be estimated, it greatly reduces computation time, and it produces negligible bias in the resulting p-values as long as no site accounts for more than 1% of phenotypic variance (Li et al., 2011). We resolved to follow up any suggestive signals from the scans with either a non-parametric test or a slower but more precise FGLS regression analysis of the implicated locus that would simultaneously estimate the regression coefficients and residual covariance matrix (as was done in the burden analyses), depending on how rare the mutation was.
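The plug-in idea can be sketched in Python with NumPy. Because V is estimated once, its Cholesky factor L (with V = LLᵀ) can also be computed once and used to whiten the phenotype, covariates, and copy-number codes. Each locus then needs only an ordinary least-squares fit on whitened data, which is algebraically identical to GLS with that fixed V. This illustrates the general technique and is not RFGLS's code. In particular, it treats the plugged-in V as exactly known and reports normal-theory z statistics, which may differ from the test statistics RFGLS reports.

```python
import numpy as np
from statistics import NormalDist

STD_NORMAL = NormalDist()


def scan_loci(y, covariates, copy_states, V):
    """Single-locus GLS scan that reuses one plugged-in residual covariance V.

    y:           phenotype vector, shape (n,)
    covariates:  covariate design matrix including an intercept, shape (n, p)
    copy_states: coded copy-number state at each locus, shape (n, m)
    V:           residual covariance from the covariates-only model, shape (n, n)
    Returns arrays of the copy-number coefficient, its z statistic, and p-value.
    """
    L = np.linalg.cholesky(V)                # V = L L', computed only once
    y_w = np.linalg.solve(L, y)              # whitened phenotype
    C_w = np.linalg.solve(L, covariates)     # whitened covariates
    G_w = np.linalg.solve(L, copy_states)    # whitened copy-number codes
    betas, zs = [], []
    for k in range(G_w.shape[1]):
        X_w = np.column_stack([C_w, G_w[:, k]])
        xtx_inv = np.linalg.inv(X_w.T @ X_w)  # equals (X' V^-1 X)^-1
        beta = xtx_inv @ (X_w.T @ y_w)        # equals the GLS estimate
        se = np.sqrt(xtx_inv[-1, -1])         # V treated as exactly known
        betas.append(beta[-1])
        zs.append(beta[-1] / se)
    z = np.array(zs)
    p = np.array([2 * STD_NORMAL.cdf(-abs(v)) for v in z])
    return np.array(betas), z, p


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    n = 200
    V = np.kron(np.eye(n // 2), np.array([[1.0, 0.5], [0.5, 1.0]]))  # sibling pairs
    covariates = np.column_stack([np.ones(n), rng.normal(size=n)])
    copy_states = rng.integers(0, 3, size=(n, 5)).astype(float)
    y = covariates @ np.array([100.0, 1.0]) + np.linalg.cholesky(V) @ rng.normal(size=n)
    beta, z, p = scan_loci(y, covariates, copy_states, V)
    print(np.round(p, 3))
```

The expensive factorization happens once, outside the loop. A triangular solver would make the whitening step faster still.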
Results
Table 2 presents the results of our RFGLS regression analyses. The rightmost column gives the increase, relative to the covariates-only model, in Buse’s (1973) R² for generalized least squares. Adding this increase to the covariates-only model’s R² of 0.023 gives the proportion of variance in FSIQ attributable to the CNV variable and covariates together. Parameter estimates from the covariates-only model are provided in Table S2.
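A commonly cited form of Buse's R² is R² = 1 − êᵀV⁻¹ê / (y − ỹ1)ᵀV⁻¹(y − ỹ1). Here ê are the GLS residuals and ỹ = 1ᵀV⁻¹y / 1ᵀV⁻¹1 is the GLS-weighted mean of y. With V = I and an intercept in the model, it reduces to the ordinary R². The Python sketch below implements that form under the simplifying assumption that the two compared models share one V. In FGLS each model estimates its own V, so this shows how an increment like those in Table 2 can be computed in principle; it is not the authors' code.

```python
import numpy as np


def buse_r2(y, X, V):
    """Buse's R^2 for a GLS fit of y on X (X includes an intercept column)."""
    Vinv = np.linalg.inv(V)
    beta = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
    resid = y - X @ beta
    ones = np.ones_like(y)
    weighted_mean = (ones @ Vinv @ y) / (ones @ Vinv @ ones)  # GLS mean of y
    dev = y - weighted_mean
    return 1.0 - (resid @ Vinv @ resid) / (dev @ Vinv @ dev)


def r2_increment(y, X_covariates, x_cnv, V):
    """Increase in Buse's R^2 from adding one CNV variable to the covariates."""
    X_full = np.column_stack([X_covariates, x_cnv])
    return buse_r2(y, X_full, V) - buse_r2(y, X_covariates, V)
```

With V held fixed, adding a predictor can only lower the weighted residual sum of squares while leaving the denominator unchanged. The increment is therefore never negative.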
In the mutational burden analyses, only one of the eight CNV variables, duplication length, showed a nominally significant association with FSIQ (p = 0.016). Its effect size estimate (b̂ = −0.001) corresponds approximately to a predicted loss of 1 IQ point for every million bases of duplicated DNA in a person’s genome. Two other CNV variables strongly correlated with duplication length, duplication count and total length, showed suggestive association with the phenotype (both p < 0.10). The other five regression coefficients were nonsignificant and positive, and therefore not in the expected direction. Each of the eight CNV variables provided a negligible increase in the proportion of phenotypic variance explained by the regression model, and none of their associations remained significant after correction for multiple testing.
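The conversion from b̂ to "1 IQ point per million bases" is simple arithmetic once the unit of the length variable is fixed. That unit is not stated here. The sketch below assumes duplication length was coded in kilobases, which is what makes b̂ = −0.001 equal to one point per megabase.

```python
B_HAT = -0.001   # FSIQ points per unit of duplication length (estimate reported above)
UNIT_BP = 1_000  # assumption: duplication length coded in kilobases


def predicted_fsiq_change(duplicated_bp: float) -> float:
    """Predicted FSIQ change for a given total length of duplicated DNA (in bp)."""
    return B_HAT * duplicated_bp / UNIT_BP


print(predicted_fsiq_change(1_000_000))  # one megabase: about one FSIQ point lower
```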
Further, we obtained no evidence that any specific CNV is associated with FSIQ (see Figures S5, S6, and S7). In the scan counting homozygous deletions, only one p-value was more extreme than 0.01. In the other two scans, even the most extreme p-values fell short of the corrected significance level by at least an order of magnitude. We do not regard any of these association signals as suggestive enough to warrant follow-up analyses.
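Corrected significance levels are easiest to compare with plotted p-values on the −log10 scale. The Python snippet below takes the per-scan Type I error rates from the power-figure captions in the Supplementary Material and converts them to −log10 thresholds. These match the values of about 5.40, 5.18, and 3.76 noted in the association-scan captions. The mapping of each rate to a scan is an assumption based on the order of the figures. The `orders_short` helper is a hypothetical convenience for expressing "an order of magnitude short" and is not part of the original analysis.

```python
import math

# Per-scan Type I error rates from the Supplementary Material captions
# (assumed order: deletions and/or duplications, deletions only, homozygous deletions only)
SCAN_ALPHA = {
    "deletions and/or duplications": 3.95e-6,
    "deletions only": 6.55e-6,
    "homozygous deletions only": 1.72e-4,
}


def neg_log10(p: float) -> float:
    return -math.log10(p)


def orders_short(p: float, alpha: float) -> float:
    """Orders of magnitude by which p falls short of alpha (<= 0 if significant)."""
    return math.log10(p / alpha)


for scan, alpha in SCAN_ALPHA.items():
    print(f"{scan}: threshold = {neg_log10(alpha):.2f}")  # 5.40, 5.18, 3.76
```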
Discussion
The present study obtained no evidence for our hypothesis that low-frequency copy-number variants contribute to the heritable variance of GCA. Contrary to our prediction, mutational burden was not negatively associated with FSIQ, and genome-wide scans failed to identify any CNV significantly related to the phenotype. Thus, we did not replicate the mutational-burden results of Yeo et al. (2011). Instead, our results were much like those of recent studies (MacLeod et al., 2012; Bagshaw et al., 2013; McRae et al., 2013), all of which had larger samples than Yeo et al. (2011). Both our data and those of these recent studies suggest that CNVs are unlikely to provide a missing piece in the puzzle of what has become known as “missing heritability” (Maher, 2008) for this particular quantitative trait.
In particular, the results of our aggregate mutational burden analyses with CNVs appear especially unimpressive compared to recently developed methods that leverage genome-wide SNP data by predicting the phenotype from the combined effect of many SNPs at once. One such approach is to select SNPs that show suggestive signal in a GWAS and to weight each participant’s reference-allele count at each selected SNP by that SNP’s GWAS regression coefficient. Each participant’s sum of weighted allele counts constitutes his or her “polygenic score.” Polygenic scores are then used as predictors of the phenotype and cross-validated in (ideally) an independent sample not used in the GWAS. This genetic scoring approach has been used to predict, under cross-validation, as much as 3% of disease risk for schizophrenia (International Schizophrenia Consortium, 2009) and as much as 16.8% of the variance in height (Lango Allen et al., 2010). More impressive results can be achieved by a method of analysis that treats the influence of every SNP in the GWAS as a latent random effect. Some (e.g., Benjamin et al., 2012) refer to this method as GREML, for “genomic-relatedness-matrix restricted maximum likelihood,” though it is widely identified with the software written to implement it, GCTA (Genome-wide Complex Trait Analysis; Yang, Lee, Goddard, & Visscher, 2011). Very briefly summarized, it works by calculating a genetic relationship matrix among classically unrelated individuals and then estimating the variance component attributable to the SNPs in a mixed linear model fit by restricted maximum likelihood. Via GREML, common SNPs can account for 22% to 51% of the variance in intelligence (Davies et al., 2011; Benyamin et al., 2013) and 45% of the variance in height (Yang et al., 2010).
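The two SNP-aggregation ideas described above can be sketched in Python with NumPy. A polygenic score is a matrix-vector product of allele counts and GWAS weights. GREML starts from a standardized genetic relationship matrix (GRM) in which each SNP is centred by 2p and scaled by √(2p(1 − p)). This is an illustrative sketch on simulated data, not GCTA. It applies the off-diagonal formula to the diagonal as well (GCTA, as we understand it, uses a slightly different diagonal estimator), and it stops short of the REML step that estimates the SNP variance component.

```python
import numpy as np


def polygenic_score(allele_counts, weights):
    """Sum of reference-allele counts (0/1/2) weighted by GWAS coefficients.

    allele_counts: (n_people, n_snps) counts for the selected SNPs
    weights:       (n_snps,) GWAS regression coefficients for the same SNPs
    """
    return allele_counts @ weights


def genetic_relationship_matrix(allele_counts):
    """Standardized GRM: average over SNPs of z_i * z_j, with
    z = (count - 2p) / sqrt(2p(1 - p)) and p the sample allele frequency."""
    p = allele_counts.mean(axis=0) / 2.0
    keep = (p > 0) & (p < 1)                  # drop monomorphic SNPs
    x, p = allele_counts[:, keep], p[keep]
    z = (x - 2 * p) / np.sqrt(2 * p * (1 - p))
    return z @ z.T / z.shape[1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    counts = rng.binomial(2, 0.3, size=(100, 500)).astype(float)  # simulated genotypes
    gwas_betas = rng.normal(scale=0.05, size=500)                # hypothetical weights
    scores = polygenic_score(counts, gwas_betas)
    grm = genetic_relationship_matrix(counts)
    print(scores[:5], grm.shape, grm.diagonal().mean())
```

For simulated unrelated individuals the GRM diagonal averages close to 1 and the off-diagonal entries scatter around 0. In GREML, that matrix serves as the covariance structure of the SNP random effect.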
Biometric heritability estimates for adult IQ, in the normal range of variation in Western countries, typically range from 50% to 70% (Bouchard & McGue, 2003; Deary, Johnson, & Houlihan, 2009), somewhat above these GREML estimates. However, it appears from our results that CNVs—a different class of genetic polymorphism from SNPs—show little potential to help “fill in the gap” between classical heritability estimates and the somewhat lower GREML variance components.
On the other hand, while no SNP has been consistently associated with GCA at genome-wide significance levels (Benyamin et al., 2013), researchers have identified specific CNVs associated with syndromal intellectual disability and other neurodevelopmental pathology (reviewed by Mefford et al., 2012). This raises the hypothesis that specific CNVs might also contribute to normal-range variation in cognitive ability. However, our three genome-wide association scans, each using a different coding scheme for copy-number state, failed to discover any association signal exceeding our carefully adjusted significance threshold, and so provide no support for this hypothesis.
However, we readily acknowledge that these conclusions are qualified by several limitations of our study. First, our sample of over 6,000 participants may have afforded insufficient power to reliably detect true CNV effects of quite small magnitude. For comparison, the GIANT Consortium (Lango Allen et al., 2010) required about 180,000 participants, a sample size of truly Brobdingnagian proportions, to confirm 180 SNPs associated with stature. We are also bound by the inherent limitations of our Illumina 660W-Quad array; a SNP array with denser coverage of the genome would allow higher-fidelity CNV calls. Despite the very modest sample size in Yeo et al.’s (2011) study, one of its strengths, as both MacLeod et al. (2012) and McRae et al. (2013) point out, was its use of the Illumina Human 1M Duo BeadChip array. This chip allowed Yeo et al. to genotype their participants on over 1 million SNPs and achieve denser coverage of the genome than we or any of the three recent studies did.
We also face the limitation inherent in any analysis of low-frequency variants: statistical power to detect association is limited for a mutation called in only a handful of participants, unless it has an especially large effect on mean-level phenotype (see Figures S2, S3, and S4). Whereas this limitation pertains to detecting the effects of called low-frequency variants, another limitation pertains to detecting such variants in the first place: we may have “missed,” and not called, some low-base-rate CNVs. To reduce the risk of spurious calls, we took a conservative approach: we retained only CNVs with a confidence score greater than 10.0 and spanning 15 or more markers, and dropped CNVs occurring in artifact-prone regions of the genome. McRae et al. (2013) also imposed a minimum confidence score, and remarked that, based on manual inspection of data, it served to reduce false positives. A previous study by Pankratz et al. (2011) reported that analysis of PennCNV calls subjected to similar filters had the highest power to detect a known true positive, and had the lowest rate of significant association signals for loci that subsequently failed molecular validation. Importantly, had Pankratz et al. retained only those CNVs called by more than one software algorithm, they would have missed the signal for PARK2. Guided by their experience, we elected not to use “consensus calls,” which would entail retaining only those CNVs in the intersection of the set of PennCNV calls and the set(s) of calls from other software. In addition, we restricted analysis to participants who had provided adequate-quality DNA from a blood sample and who passed a screen for somatic mosaicism. We regard our filters on CNV calls and DNA samples as adequate safeguards against spurious CNV calls. Nonetheless, they commensurately diminish the sensitivity of CNV detection for short and/or extremely rare variants.
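The call-level filters can be expressed as a simple predicate. In the Python sketch below, the thresholds (a confidence score greater than 10.0 and at least 15 markers) come from the text. The `CnvCall` record, the any-overlap rule used to decide whether a call "occurs in" an artifact-prone region, and the example coordinates are all hypothetical, because the exact rule and region list are not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CnvCall:
    sample_id: str
    chrom: str
    start: int         # first base of the call
    end: int           # last base of the call
    n_markers: int     # array markers spanned by the call
    confidence: float  # caller-assigned confidence score


def overlaps(call, region):
    """True if the call shares at least one base with (chrom, start, end)."""
    chrom, start, end = region
    return call.chrom == chrom and call.start <= end and start <= call.end


def filter_calls(calls, artifact_regions, min_confidence=10.0, min_markers=15):
    """Keep calls scoring above min_confidence, spanning at least min_markers
    markers, and not overlapping any artifact-prone region."""
    return [
        c for c in calls
        if c.confidence > min_confidence
        and c.n_markers >= min_markers
        and not any(overlaps(c, r) for r in artifact_regions)
    ]


if __name__ == "__main__":
    artifact_regions = [("6", 29_000_000, 33_000_000)]  # hypothetical region list
    calls = [
        CnvCall("s1", "1", 1_000_000, 1_200_000, 40, 25.3),   # passes all filters
        CnvCall("s2", "1", 5_000, 9_000, 8, 12.0),            # too few markers
        CnvCall("s3", "2", 7_000_000, 7_050_000, 20, 9.5),    # confidence too low
        CnvCall("s4", "6", 30_000_000, 30_400_000, 60, 40.0), # artifact region
    ]
    print([c.sample_id for c in filter_calls(calls, artifact_regions)])  # ['s1']
```

Note the strict inequality for the confidence score ("greater than 10.0") versus the inclusive bound for markers ("15 or more"), following the wording of the text.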
Therefore, from our data, we cannot rule out the possibility that such CNVs with substantial phenotypic effects exist. Such mutations are difficult to call reliably, and attempting to discover them would require acceptance of a higher error rate. If only small-effect or extremely rare CNVs contribute to variation in GCA, further studies will require still larger samples to make headway along this line of research.
As with SNP-association studies, artifacts from population stratification are a potential concern in CNV-association studies (McCarroll & Altshuler, 2007), but it is doubtful that our results are due to stratification suppressing true association signals. The analysis was restricted to participants who had been identified as White based on both self-report and principal components extracted from genome-wide SNP data (Miller et al., 2012). We covaried out the first two of these components in our analyses, but the conclusions from the burden analyses and genome-wide scans did not change when covarying out all 10 components that had been computed. Visual inspection of graphs of these components did not reveal any apparent heterogeneity, except for a cluster of fifty relative outliers on the first component. Excluding these fifty participants did not change the conclusions of the burden analyses.
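Principal components of genome-wide SNP data, used here as stratification covariates, can be sketched in Python with NumPy as a singular value decomposition of a standardized genotype matrix. The normalization below centres each SNP by 2p and scales it by √(p(1 − p)). It is similar in spirit to EIGENSTRAT's, but this is not a reimplementation of that software. The two-population example is simulated only to show that the first component separates groups with different allele frequencies.

```python
import numpy as np


def genotype_pcs(allele_counts, n_components=2):
    """Participant scores on the top principal components of a genotype matrix.

    allele_counts: (n_people, n_snps) array of 0/1/2 counts.
    """
    p = allele_counts.mean(axis=0) / 2.0
    keep = (p > 0) & (p < 1)                  # drop monomorphic SNPs
    x, p = allele_counts[:, keep], p[keep]
    z = (x - 2 * p) / np.sqrt(p * (1 - p))    # centre and scale each SNP
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    freq_a = rng.uniform(0.1, 0.9, size=1000)
    freq_b = np.clip(freq_a + rng.normal(0.0, 0.25, size=1000), 0.05, 0.95)
    counts = np.vstack([
        rng.binomial(2, freq_a, size=(50, 1000)),  # simulated population A
        rng.binomial(2, freq_b, size=(50, 1000)),  # simulated population B
    ]).astype(float)
    pcs = genotype_pcs(counts)
    print(pcs[:50, 0].mean(), pcs[50:, 0].mean())  # group means on PC1
```

The component scores are mutually orthogonal, and their signs are arbitrary, so only relative positions along each component are meaningful.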
Perhaps our most surprising (non-)finding is the lack of association between rare homozygous deletions and general cognitive ability. Because GCA was likely under selection pressure in humans’ ancestral environment of adaptation (Jensen, 1998), one would anticipate that most mutations with the potential to cause loss of function would detrimentally affect the trait (Gangestad, 2010), and that such mutations would not be common (unless of small effect). Yet the present study, which used a large, ethnically homogeneous sample assessed on an individually administered IQ test, found no evidence that mutational burden due to low-frequency homozygous deletions, or any specific such deletion, is significantly related to GCA. The effect sizes from the burden analyses were not even in the expected (negative) direction. Although CNVs can cause intellectual disability and other forms of neurodevelopmental psychopathology, normal-range variation in general cognitive ability is “highly heritable and polygenic” (Davies et al., 2011), and its genetic architecture may also be more redundant and more robust to mutational disruption than might have been guessed.
Supplementary Material
Red = chromosomal arm contains severe mosaicism, with many CNV calls in the region; these samples were therefore excluded from analyses (n = 7 samples, 8 chromosomal arms); Green = contains mild mosaicism, but not severe enough to lead to an intensity change called by PennCNV or to warrant exclusion from analyses; Purple = contains large tracts of homozygosity that bias the B-allele frequency (BAF) statistics; Blue = contains large duplications (0.5–6 Mb in size); Black = normal chromosomal arm BAF distributions.
The solid curve assumes a sample of 6,199 independent participants, whereas the dotted curve assumes 2,196 independent participants. “Mutation effect size” refers to the mean difference in FSIQ between the reference-state and mutant-state groups. Each point was plotted by first calculating what the sizes of the reference-state and mutant-state groups would be, given the Neff and the proportion of participants having the mutant copy-number state (the “observed mutation frequency”). Then, the mutation effect size (the ordinate position of the point) was calculated from the power function for a two-tailed, independent-samples t-test, conditional on 80% power, a Type I error rate of 3.95 × 10⁻⁶, and the two group sizes. The lowest plotted mutation frequency is 0.0015, corresponding to about 3 of 2,196 participants harboring the mutation, and to a mutation effect size near or outside the bounds of the plot (31.61 if Neff = 6,199, and 81.45 if Neff = 2,196). At the other extreme, given a mutation frequency of 0.05, we would have 80% power to detect an effect of 8.12 if Neff = 2,196, or 4.79 if Neff = 6,199.
The solid curve assumes a sample of 6,199 independent participants, whereas the dotted curve assumes 2,196 independent participants. “Mutation effect size” refers to the mean difference in FSIQ between the reference-state and mutant-state groups. Each point was plotted by first calculating what the sizes of the reference-state and mutant-state groups would be, given the Neff and the proportion of participants having the mutant copy-number state (the “observed mutation frequency”). Then, the mutation effect size (the ordinate position of the point) was calculated from the power function for a two-tailed, independent-samples t-test, conditional on 80% power, a Type I error rate of 6.55 × 10⁻⁶, and the two group sizes. The lowest plotted mutation frequency is 0.0015, corresponding to about 3 of 2,196 participants harboring the mutation, and to a mutation effect size near or outside the bounds of the plot (30.75 if Neff = 6,199, and 77.48 if Neff = 2,196). At the other extreme, given a mutation frequency of 0.05, we would have 80% power to detect an effect of 7.95 if Neff = 2,196, or 4.70 if Neff = 6,199.
The solid curve assumes a sample of 6,199 independent participants, whereas the dotted curve assumes 2,196 independent participants. “Mutation effect size” refers to the mean difference in FSIQ between the reference-state and mutant-state groups. Each point was plotted by first calculating what the sizes of the reference-state and mutant-state groups would be, given the Neff and the proportion of participants having the mutant copy-number state (the “observed mutation frequency”). Then, the mutation effect size (the ordinate position of the point) was calculated from the power function for a two-tailed, independent-samples t-test, conditional on 80% power, a Type I error rate of 1.72 × 10⁻⁴, and the two group sizes. The lowest plotted mutation frequency is 0.0015, corresponding to about 3 of 2,196 participants harboring the mutation, and to a mutation effect size of 25.17 if Neff = 6,199, and 55.22 (beyond the bounds of the plot) if Neff = 2,196. At the other extreme, given a mutation frequency of 0.05, we would have 80% power to detect an effect of 6.81 if Neff = 2,196, or 4.03 if Neff = 6,199.
The threshold for genome-wide significance is about 5.40, outside the bounds of the plot.
The threshold for genome-wide significance is about 5.18, outside the bounds of the plot.
The threshold for genome-wide significance is about 3.76, outside the bounds of the plot.
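The power-curve computation described in the captions can be approximated in a few lines of Python. The sketch below replaces the t-test power function with a normal approximation. It assumes a common FSIQ standard deviation of 15 (the conventional IQ-scale SD) and leaves group sizes unrounded. These are assumptions of this sketch, not details stated in the captions.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def min_detectable_difference(freq, n_eff, alpha, power=0.80, sd=15.0):
    """Smallest mean FSIQ difference (mutant minus reference state) detectable
    with the given power by a two-tailed two-sample test at level alpha.

    Normal approximation to the t-test; `sd` is the assumed common FSIQ SD.
    """
    n_mut = freq * n_eff            # expected number of mutation carriers
    n_ref = n_eff - n_mut           # everyone else is in the reference state
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return (z_alpha + z_power) * sd * (1 / n_mut + 1 / n_ref) ** 0.5


if __name__ == "__main__":
    for n_eff in (6199, 2196):
        for freq in (0.0015, 0.05):
            d = min_detectable_difference(freq, n_eff, alpha=3.95e-6)
            print(f"N_eff={n_eff}, freq={freq}: {d:.2f} IQ points")
```

With α = 3.95 × 10⁻⁶ and Neff = 6,199, the sketch gives about 4.77 IQ points at a mutation frequency of 0.05, close to the caption's 4.79. At a frequency of 0.0015 with Neff = 2,196, however, it gives roughly 45 points versus the caption's 81.45. The plotted curves evidently rest on assumptions this approximation does not capture, so it should be read only as an illustration of the calculation's general shape.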
Acknowledgements
This research was supported in part by USPHS Grants from the National Institute on Alcohol Abuse and Alcoholism (AA09367 and AA11886), the National Institute on Drug Abuse (DA05147, DA13240, and DA024417), and the National Institute of Mental Health (MH066140). The first author (RMK) was supported by a Doctoral Dissertation Fellowship from the University of Minnesota Graduate School.
The authors give their special thanks to William S. Oetting and Niels G. Waller for their helpful comments on a draft of this paper.
Footnotes
In a recent association study of CNVs and familial Parkinson disease (Pankratz et al., 2011), PARK2, a known true positive, evidenced a genome-wide significant signal only when analysis was restricted to PennCNV calls spanning 20 or more markers, and not when using a less stringent threshold nor when using different software (and therefore, not when using a “consensus-call” approach, which only retains CNVs called by more than one software algorithm). Furthermore, analysis of calls made with the conservative threshold did not yield genome-wide significant signal at two loci, implicated in other analyses, that ultimately failed molecular validation.
We covaried out birth year, rather than age, for three reasons. First, IQ tests are age-normed to begin with. Second, the FSIQ scores of a minority of our twins are within-person averages of FSIQs from more than one testing occasion. In a sense, these twins have multiple ages at testing, but of course they have only one birth year. Third, the nuisance confound for which we want to correct is the Flynn Effect (first reported by Flynn, 1984, 1987), the secular trend of increasing IQ scores with each generation, which is directly related to birth year and not to age per se. Surprisingly, at a glance our data seem to contradict the Flynn Effect. In the covariates-only FGLS regression, the estimated coefficient for birth year was −0.087 (Table S2), indicating that later birth year predicted lower IQ. The zero-order Pearson correlation of birth year and FSIQ was also negative (Table S1).
References
- American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders. 4th ed. Washington DC: Author; 1994.
- Bagshaw ATM, Horwood LJ, Liu Y, Fergusson DM, Sullivan PF, Kennedy MA. No effect of genome-wide copy number variation on measures of intelligence in a New Zealand birth cohort. PLoS ONE. 2013;8(1):e55208. doi: 10.1371/journal.pone.0055208.
- Banks GC, Batchelor JH, McDaniel MA. Smarter people are (a bit) more symmetrical: A meta-analysis of the relationship between intelligence and fluctuating asymmetry. Intelligence. 2010;38:393–401.
- Benjamin DJ, Cesarini D, van der Loos MJHM, Dawes CT, Koellinger PD, Magnusson PKE, … Visscher PM. The genetic architecture of economic and political preferences. PNAS. 2012;109(21):8026–8031. doi: 10.1073/pnas.1120666109.
- Benyamin B, St Pourcain B, Davis OS, Davies G, Hansell NK, Brion M-JA, Visscher PM. Childhood intelligence is heritable, highly polygenic and associated with FNBP1L. Molecular Psychiatry. 2013. doi: 10.1038/mp.2012.184.
- Bouchard TJ, McGue M. Genetic and environmental influences on human psychological differences. Journal of Neurobiology. 2003;54:4–45. doi: 10.1002/neu.10160.
- Butcher LM, Davis OSP, Craig IW, Plomin R. Genome-wide quantitative trait locus association scan of general cognitive ability using pooled DNA and 500K single nucleotide polymorphism microarrays. Genes, Brain and Behavior. 2008;7:435–446. doi: 10.1111/j.1601-183X.2007.00368.x.
- Coe BP, Girirajan S, Eichler EE. The genetic variability and commonality of neurodevelopmental disease. American Journal of Medical Genetics Part C (Seminars in Medical Genetics). 2012;160C:118–129. doi: 10.1002/ajmg.c.31327.
- Colella S, Yau C, Taylor JM, Mirza G, Butler H, Clouston P, … Ragoussis J. QuantiSNP: An Objective Bayes Hidden-Markov Model to detect and accurately map copy number variation using SNP genotyping data. Nucleic Acids Research. 2007;35(6):2013–2025. doi: 10.1093/nar/gkm076.
- Cooper GM, Mefford HC. Detection of copy number variation using SNP genotyping. In: Schwarz PH, Wesselschmidt RL, editors. Human Pluripotent Stem Cells: Methods and Protocols. New York: Springer Science+Business Media, LLC; 2011. pp. 243–252.
- Davies G, Tenesa A, Payton A, Yang J, Harris SE, Liewald D, … Deary IJ. Genome-wide association studies establish that human intelligence is highly heritable and polygenic. Molecular Psychiatry. 2011;16(10):996–1005. doi: 10.1038/mp.2011.85.
- Davis OSP, Butcher LM, Docherty SJ, Meaburn EL, Curtis CJC, Plomin R. A three-stage genome-wide association study of general cognitive ability: Hunting the small effects. Behavior Genetics. 2010;40:759–767. doi: 10.1007/s10519-010-9350-4.
- Deary IJ. Intelligence. Annual Review of Psychology. 2012;63:453–482. doi: 10.1146/annurev-psych-120710-100353.
- Deary IJ, Johnson W, Houlihan LM. Genetic foundations of human intelligence. Human Genetics. 2009;126:215–232. doi: 10.1007/s00439-009-0655-4.
- Flynn JR. The mean IQ of Americans: Massive gains 1932 to 1978. Psychological Bulletin. 1984;95(1):29–51.
- Flynn JR. Massive IQ gains in 14 nations: What IQ tests really measure. Psychological Bulletin. 1987;101(2):171–191.
- Gangestad SW. Evolutionary biology looks at behavior genetics. Personality and Individual Differences. 2010;49:289–295.
- Gangestad SW, Thornhill R. Individual differences in developmental precision and fluctuating asymmetry: a model and its implications. Journal of Evolutionary Biology. 1999;12:402–416.
- Gauderman WJ, Morrison JM. QUANTO 1.1: A computer program for power and sample size calculations for genetic-epidemiology studies [software and manual]. 2006. Available at http://hydra.usc.edu/gxe/
- Gläscher J, Rudrauf D, Colom R, Paul LK, Tranel D, Damasio H, Adolphs R. Distributed neural system for general intelligence revealed by lesion mapping. PNAS. 2010;107(10):4705–4709. doi: 10.1073/pnas.0910397107.
- Glessner JT, Connolly JJM, Hakonarson H. Rare genomic deletions and duplications and their role in neurodevelopmental disorders. Current Topics in Behavioral Neuroscience. 2012;12:345–360. doi: 10.1007/7854_2011_179.
- Gottfredson LS. g, Jobs, and Life. In: Nyborg H, editor. The Scientific Study of General Intelligence: Tribute to Arthur R. Jensen. New York: Pergamon; 2003. pp. 293–342.
- Hawrylycz MJ, Lein ES, Guillozet-Bongaarts AL, Shen EH, Ng L, Miller JA, Jones AR. An anatomically comprehensive atlas of the adult human brain transcriptome. Nature. 2012;489:391–399. doi: 10.1038/nature11405.
- Herrnstein RJ, Murray C. The Bell Curve: Intelligence and Class Structure in American Life. New York: Simon & Schuster, Inc.; 1994.
- Iacono WG, Carlson SR, Taylor J, Elkins IJ, McGue M. Behavioral disinhibition and the development of substance-use disorders: Findings from the Minnesota Twin Family Study. Development and Psychopathology. 1999;11:869–900. doi: 10.1017/s0954579499002369.
- Iacono WG, McGue M. Minnesota Twin Family Study. Twin Research. 2002;5(5):482–487. doi: 10.1375/136905202320906327.
- The International Schizophrenia Consortium. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature. 2009;460:748–752. doi: 10.1038/nature08185.
- Jensen AR. The g Factor: The Science of Mental Ability. London: Praeger; 1998.
- Jung RE, Haier RJ. The parieto-frontal integration theory of intelligence: Converging neuroimaging evidence. Behavioral & Brain Sciences. 2007;30:135–187. doi: 10.1017/S0140525X07001185.
- Keyes MA, Malone SM, Elkins IJ, Legrand LN, McGue M, Iacono WG. The Enrichment Study of the Minnesota Twin Family Study: Increasing the yield of twin families at high risk for externalizing psychopathology. Twin Research and Human Genetics. 2009;12(5):489–501. doi: 10.1375/twin.12.5.489.
- Lango Allen H, Estrada K, Lettre G, Berndt SI, Weedon MN, Rivadeneira F, et al. on behalf of the GIANT Consortium. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature. 2010;467:832–838. doi: 10.1038/nature09410.
- Li X, Basu S, Miller MB, Iacono WG, McGue M. A rapid generalized least squares model for genome-wide quantitative trait association analysis. Human Heredity. 2011;71:67–82. doi: 10.1159/000324839. Package and manual available at http://www.cran.r-project.org/web/packages/RFGLS/
- MacLeod AK, Davies G, Payton A, Tenesa A, Harris SE, Liewald D, Deary IJ. Genetic copy number variation and general cognitive ability. PLoS One. 2012;7(12):e37385. doi: 10.1371/journal.pone.0037385.
- Maher B. The case of the missing heritability. Nature. 2008;456(6):18–21. doi: 10.1038/456018a.
- McCarroll SA, Altshuler DM. Copy-number variation and association studies of human disease. Nature Genetics. 2007;39(supplement):S37–S42. doi: 10.1038/ng2080.
- McGue M, Keyes M, Sharma A, Elkins I, Legrand L, Johnson W, Iacono WG. The environments of adopted and non-adopted youth: Evidence on range restriction from the Sibling Interaction and Behavior Study (SIBS). Behavior Genetics. 2007;37:449–462. doi: 10.1007/s10519-007-9142-7.
- McRae AF, Wright MJ, Hansell NK, Montgomery GW, Martin NG. No association between general cognitive ability and rare copy number variation. Behavior Genetics. 2013. doi: 10.1007/s10519-013-9587-9. Advance online publication.
- Mefford HC, Batshaw ML, Hoffman EP. Genomics, intellectual disability, and autism. New England Journal of Medicine. 2012;366:733–743. doi: 10.1056/NEJMra1114194.
- Miller MB, Basu S, Cunningham J, Eskin E, Malone SM, Oetting WS, McGue M. The Minnesota Center for Twin and Family Research Genome-Wide Association Study. Twin Research & Human Genetics. 2012;15(6):767–774. doi: 10.1017/thg.2012.62.
- Nagelkerke NJD. A note on a general definition of the coefficient of determination. Biometrika. 1991;78(3):691–692.
- Need AC, Ge D, Weale ME, Maia J, Feng S, Heinzen EL, Goldstein DB. A genome-wide investigation of SNPs and CNVs in schizophrenia. PLoS Genetics. 2009;5(2):e1000373. doi: 10.1371/journal.pgen.1000373.
- Neubauer AC, Fink A. Intelligence and neural efficiency. Neuroscience and Biobehavioral Reviews. 2009;33:1004–1023. doi: 10.1016/j.neubiorev.2009.04.001.
- Nyholt DR. A simple correction for multiple testing for single-nucleotide polymorphisms in linkage disequilibrium with each other. American Journal of Human Genetics. 2004;74:765–769. doi: 10.1086/383251.
- Pankratz N, Dumitriu A, Hetrick KN, Sun M, Latourelle JC, Wilk JB, the PSG-PROGENI and GenePD Investigators, Coordinators and Molecular Genetic Laboratories. Copy number variation in familial Parkinson disease. PLoS ONE. 2011;6(8):e20988. doi: 10.1371/journal.pone.0020988.
- Pinto D, Pagnamenta AT, Klei L, Anney R, Merico D, Regan R, Betancur C. Functional impact of global rare copy number variation in autism spectrum disorders. Nature. 2010;466:368–372. doi: 10.1038/nature09146.
- Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics. 2006;38(8):904–909. doi: 10.1038/ng1847.
- Sattler JM. Assessment of Children (Revised). Philadelphia: W. B. Saunders Company; 1974.
- Scherer SW, Lee C, Birney E, Altshuler DM, Eichler EE, Feuk L. Challenges and standards in integrating surveys of structural variation. Nature Genetics. 2007;39:S7–S15. doi: 10.1038/ng2093.
- Sebat J, Lakshmi B, Malhotra D, Troge J, Lese-Martin C, Walsh T, Wigler M. Strong association of de novo copy number mutations with autism. Science. 2007;316:445–449. doi: 10.1126/science.1138659.
- Sebat J, Lakshmi B, Troge J, Alexander J, Young J, Lundin P, Wigler M. Large-scale copy number polymorphism in the human genome. Science. 2004;305:525–528. doi: 10.1126/science.1098918. Supporting online material: http://www.sciencemag.org/content/suppl/2004/07/22/305.5683.525.DC1.html.
- Wang K, Li M, Hadley D, Liu R, Glessner J, et al. PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Research. 2007;17:1665–1674. doi: 10.1101/gr.6861907. Software and manual available at: http://www.openbioinformatics.org/penncnv/
- Wellcome Trust Case Control Consortium. Genome-wide association study of CNVs in 16,000 cases of eight common diseases and 3,000 shared controls. Nature. 2010;464:713–720. doi: 10.1038/nature08979. Supporting online material: http://www.nature.com/nature/journal/v464/n7289/suppinfo/nature08979.html.
- Woodberry KA, Giuliano AJ, Seidman LJ. Premorbid IQ in schizophrenia: A meta-analytic review. American Journal of Psychiatry. 2008;165:579–587. doi: 10.1176/appi.ajp.2008.07081242.
- Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR, Visscher PM. Common SNPs explain a large proportion of the heritability for human height. Nature Genetics. 2010;42(7):565–569. doi: 10.1038/ng.608.
- Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: A tool for genome-wide complex trait analysis. The American Journal of Human Genetics. 2011;88:76–82. doi: 10.1016/j.ajhg.2010.11.011.
- Yeo RA, Gangestad SW, Liu J, Calhoun VD, Hutchinson KE. Rare copy number deletions predict individual variation in intelligence. PLoS ONE. 2011;6(1):e16339. doi: 10.1371/journal.pone.0016339.
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Red = contains severe mosaicism with lots of CNV calls in the region and were therefore excluded from analyses (n=7 samples, 8 chromosomal arms); Green = contains mild mosaicism, but not severe enough to lead to an intensity change called by PennCNV or to be excluded from analyses; Purple = contains large tracts of homozygosity that are biasing the BAF statistics; Blue = contains large duplications (0.5–6 Mb in size); Black = normal chromosomal arm BAF distributions.
The solid curve assumes a sample of 6,199 independent participants, whereas the dotted curve assumes 2,196 independent participants. “Mutation effect size” refers to the mean difference in full-scale IQ (FSIQ) between the reference-state and mutant-state groups. Each point was obtained by first computing the expected sizes of the reference-state and mutant-state groups, given Neff (the assumed number of independent participants) and the proportion of participants carrying the mutant copy-number state (the “observed mutation frequency”). The mutation effect size (the ordinate of the point) was then obtained from the power function for a two-tailed, independent-samples t-test, conditional on 80% power, a Type I error rate of 3.95 × 10−6, and the two group sizes. The lowest plotted mutation frequency, 0.0015, corresponds to about 3 of 2,196 participants carrying the mutation and to a mutation effect size near or beyond the bounds of the plot (31.61 if Neff = 6,199; 81.45 if Neff = 2,196). At the other extreme, at a mutation frequency of 0.05, we would have 80% power to detect an effect of 8.12 FSIQ points if Neff = 2,196, or 4.79 if Neff = 6,199.
The solid curve assumes a sample of 6,199 independent participants, whereas the dotted curve assumes 2,196 independent participants. “Mutation effect size” refers to the mean difference in full-scale IQ (FSIQ) between the reference-state and mutant-state groups. Each point was obtained by first computing the expected sizes of the reference-state and mutant-state groups, given Neff (the assumed number of independent participants) and the proportion of participants carrying the mutant copy-number state (the “observed mutation frequency”). The mutation effect size (the ordinate of the point) was then obtained from the power function for a two-tailed, independent-samples t-test, conditional on 80% power, a Type I error rate of 6.55 × 10−6, and the two group sizes. The lowest plotted mutation frequency, 0.0015, corresponds to about 3 of 2,196 participants carrying the mutation and to a mutation effect size near or beyond the bounds of the plot (30.75 if Neff = 6,199; 77.48 if Neff = 2,196). At the other extreme, at a mutation frequency of 0.05, we would have 80% power to detect an effect of 7.95 FSIQ points if Neff = 2,196, or 4.70 if Neff = 6,199.
The solid curve assumes a sample of 6,199 independent participants, whereas the dotted curve assumes 2,196 independent participants. “Mutation effect size” refers to the mean difference in full-scale IQ (FSIQ) between the reference-state and mutant-state groups. Each point was obtained by first computing the expected sizes of the reference-state and mutant-state groups, given Neff (the assumed number of independent participants) and the proportion of participants carrying the mutant copy-number state (the “observed mutation frequency”). The mutation effect size (the ordinate of the point) was then obtained from the power function for a two-tailed, independent-samples t-test, conditional on 80% power, a Type I error rate of 1.72 × 10−4, and the two group sizes. The lowest plotted mutation frequency, 0.0015, corresponds to about 3 of 2,196 participants carrying the mutation and to a mutation effect size of 25.17 if Neff = 6,199, or 55.22 (beyond the bounds of the plot) if Neff = 2,196. At the other extreme, at a mutation frequency of 0.05, we would have 80% power to detect an effect of 6.81 FSIQ points if Neff = 2,196, or 4.03 if Neff = 6,199.
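The power calculation described in the three power-curve captions can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code: it replaces the exact t-test power function with a normal approximation and assumes a within-group FSIQ standard deviation of 15 points, and the function name `min_detectable_difference` is hypothetical. Because of these assumptions it is not expected to reproduce the plotted values exactly, particularly at the lowest mutation frequencies, where the mutant-state group holds only a handful of participants.

```python
import math
from statistics import NormalDist


def min_detectable_difference(n_eff, freq, alpha, power=0.80, sd=15.0):
    """Smallest mean FSIQ difference (in IQ points) detectable by a two-tailed
    two-group comparison, using a normal approximation to the t-test.

    n_eff: assumed number of independent participants (e.g. 6199 or 2196)
    freq:  observed mutation frequency (proportion in the mutant-state group)
    alpha: Type I error rate (e.g. 3.95e-6)
    sd:    assumed within-group FSIQ standard deviation (hypothetical default)
    """
    n_mut = freq * n_eff        # expected size of the mutant-state group
    n_ref = n_eff - n_mut       # expected size of the reference-state group
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # two-tailed critical value
    z_power = std_normal.inv_cdf(power)          # quantile for the desired power
    # Required difference = (z_crit + z_power) * SD * sqrt(1/n_ref + 1/n_mut)
    return (z_crit + z_power) * sd * math.sqrt(1 / n_ref + 1 / n_mut)


if __name__ == "__main__":
    for n_eff in (6199, 2196):
        for freq in (0.0015, 0.01, 0.05):
            d = min_detectable_difference(n_eff, freq, alpha=3.95e-6)
            print(f"Neff={n_eff:5d} freq={freq:<6} min difference={d:6.2f}")
```

For example, `min_detectable_difference(6199, 0.05, 3.95e-6)` gives a value close to the 4.79 FSIQ points reported in the first power-curve caption; the curves rise steeply as the mutation frequency falls because the 1/n_mut term dominates the standard error.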
The threshold for genome-wide significance is about 5.40, outside the bounds of the plot.
The threshold for genome-wide significance is about 5.18, outside the bounds of the plot.
The threshold for genome-wide significance is about 3.76, outside the bounds of the plot.
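The three thresholds coincide with the Type I error rates quoted in the power-curve captions (3.95 × 10−6, 6.55 × 10−6, and 1.72 × 10−4) once those rates are expressed as −log10(p). The short sketch below checks that correspondence; it assumes, without the captions saying so explicitly, that the plots display −log10(p), and the helper name `genome_wide_threshold` is hypothetical.

```python
import math


def genome_wide_threshold(alpha):
    """Express a Type I error rate on the -log10(p) scale (assumed plot axis)."""
    return -math.log10(alpha)


# Type I error rates quoted in the three power-curve captions
ALPHAS = (3.95e-6, 6.55e-6, 1.72e-4)

if __name__ == "__main__":
    for alpha in ALPHAS:
        print(f"alpha={alpha:.2e}  threshold={genome_wide_threshold(alpha):.2f}")
    # prints thresholds 5.40, 5.18 and 3.76
```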