Author manuscript; available in PMC: 2016 Apr 1.
Published in final edited form as: Neuroimage. 2015 Jan 16;109:505–514. doi: 10.1016/j.neuroimage.2015.01.029

A Kernel Machine Method for Detecting Effects of Interaction Between Multidimensional Variable Sets: An Imaging Genetics Application

Tian Ge a,b,, Thomas E Nichols c, Debashis Ghosh d, Elizabeth C Mormino e, Jordan W Smoller b,f,#, Mert R Sabuncu a,g,†,#; the Alzheimer's Disease Neuroimaging Initiative*
PMCID: PMC4339421  NIHMSID: NIHMS660823  PMID: 25600633

Abstract

Measurements derived from neuroimaging data can serve as markers of disease and/or healthy development, are largely heritable, and have been increasingly utilized as (intermediate) phenotypes in genetic association studies. To date, imaging genetic studies have mostly focused on discovering isolated genetic effects, typically ignoring potential interactions with non-genetic variables such as disease risk factors, environmental exposures, and epigenetic markers. However, identifying significant interaction effects is critical for revealing the true relationship between genetic and phenotypic variables, and shedding light on disease mechanisms. In this paper, we present a general kernel machine based method for detecting effects of interaction between multidimensional variable sets. This method can model the joint and epistatic effect of a collection of single nucleotide polymorphisms (SNPs), accommodate multiple factors that potentially moderate genetic influences, and test for nonlinear interactions between sets of variables in a flexible framework. As a demonstration of application, we applied the method to data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) to detect the effects of the interactions between candidate Alzheimer's disease (AD) risk genes and a collection of cardiovascular disease (CVD) risk factors, on hippocampal volume measurements derived from structural brain magnetic resonance imaging (MRI) scans. Our method identified that two genes, CR1 and EPHA1, demonstrate significant interactions with CVD risk factors on hippocampal volume, suggesting that CR1 and EPHA1 may play a role in influencing AD-related neurodegeneration in the presence of CVD risks.

Keywords: interaction, kernel machines, Alzheimer's disease, cardiovascular disease, imaging genetics

1 Introduction

Genetic components play a significant role in most brain-related illnesses. The discovery of genetic effects can elucidate the biological pathways and processes underlying neurological disorders, and ultimately yield prevention and treatment strategies. In the field of imaging genetics, this goal is approached by using quantitative brain image derived measurements as intermediate or endophenotypes [Biffi et al., 2010; Ge et al., 2014; Gottesman and Shields, 1972; Gottesman and Gould, 2003; Meyer-Lindenberg and Weinberger, 2006; Sabuncu et al., 2012], which are biomarkers of disease, and are believed to be closer to the disease process and have a simpler genetic architecture than clinical diagnoses.

However, heritability analyses and genome-wide association studies (GWAS) [Visscher et al., 2012] of complex genetic phenotypes ranging from human height [Yang et al., 2010], body mass index, von Willebrand factor [Yang et al., 2011], schizophrenia [Lee et al., 2012b], to various volume-, surface- or connection-based brain measurements computed from structural, functional or diffusion images [Thompson et al., 2013], indicate that phenotypic variation cannot be solely explained by genetics. The interactions between genetic and non-genetic variables such as disease risk factors, environmental exposures and epigenetic markers may play an important role in the variation of complex phenotypes [Sullivan et al., 2012], and the influence of genetic variants on the likelihood, development, and progression of a brain illness may be indirect and interactive. The presence of interactions implies that genetics can modulate the effects of various risk factors on the disease, producing variations across subjects even exposed to the same environment. Alternatively, the effect of the genotype on outcomes can depend on one or more risk factors or environmental exposures. For example, Caspi et al. [2002] reported that the effect of maltreatment of children from birth to adulthood on the development of antisocial behavior is moderated by a functional polymorphism in the MAOA gene. The genotype of a locus known as 5-HTTLPR located in the promoter region of the serotonin transporter gene was found to moderate the influence of stressful life events on depression [Caspi et al., 2003]. Therefore, identifying potential genetic interactions with non-genetic variables can be critical in understanding the true relationship between genotype and phenotype.

Thanks to recent advances in genotyping technology, it is now possible to investigate genetic interaction effects involving specific genetic risk factors, candidate genes, or even the entire genome, in unrelated individuals. Current statistical methods to test for interactions largely utilize multiple linear regression models with quantitative phenotypes, or logistic regression models with binary outcomes, in both the genetics community [Aschard et al., 2011; Kraft et al., 2007; Paré et al., 2010], and the imaging community (e.g., psychophysiological interactions analysis [Friston et al., 1997]). In these analyses, both main effects are typically univariate variables, and the interaction is modeled by their product. Although a number of recent papers have tried to improve the power of the classical univariate interaction test [Hsu et al., 2012; Mukherjee and Chatterjee, 2008; Murcray et al., 2011], they suffer from two main drawbacks when detecting interactions between genetic variants and non-genetic variables. First, converging evidence has shown that many complex brain disorders are polygenic and influenced by up to thousands of genetic variants with small effects [Purcell et al., 2009; Sullivan et al., 2012]. Analyzing each individual locus may not identify any reliable results with a small to moderate sample size, which is typical in imaging genetic studies. And second, it is now not uncommon to collect a large number of disease risk factors, environmental variables, or epigenetic markers in a single study. The product of all possible pairs of genetic variants and non-genetic variables may be dauntingly large, which dramatically increases the burden of computation and multiple testing correction. More critically, Lin et al. [2013] showed that if the main effects of a set of genetic variants are associated with the phenotype, testing each single genetic variant for interactions can be biased.

In this paper, inspired by Li and Cui [2012], we present a semiparametric kernel machine based method to detect interactions between multidimensional variable sets. Kernel machine based methods have been previously used in association studies between single nucleotide polymorphism (SNP) sets and complex diseases or imaging phenotypes [Kwee et al., 2008; Liu et al., 2007; Wu et al., 2010, 2011], and have been applied to voxel-wise genome-wide association studies to obtain boosted statistical power [Ge et al., 2012; Stein et al., 2010]. Here, to jointly model the genetic and non-genetic variables, and their interactions, we extend the original kernel machine based method, and include three appropriately selected kernels in the model; one for genetic variants, one for non-genetic variables, and a third one, which is the Hadamard product of the genetic and non-genetic kernel, for the interaction effect. The genetic kernel provides a biologically-informed way to capture epistasis in a set of SNPs and model their joint effect on the phenotype. SNP sets can be formed by SNPs located in or near a gene, within a gene pathway or a haplotype structure; risk SNPs identified by previous studies or other a priori biological information [Wu et al., 2010]. Examining the collective contribution of SNPs further opens possibilities to investigate cumulative effects of rare variants [Wu et al., 2011], and often provides improved reproducibility, biologically informed insights, and increased power relative to univariate methods. The non-genetic kernel allows for modeling the joint effect of multiple variables. By using a connection to linear mixed effects models, the interaction effect can be tested by a variance component score test [Lin, 1997; Liu et al., 2007]. The proposed method thus offers a flexible framework to account for epistatic effects, multiple non-genetic factors, and test for the overall interaction effect between sets of multidimensional variables.

As a demonstration of application, we applied the proposed method to detect the interaction effects between candidate late-onset Alzheimer's disease (AD) risk genes and cardiovascular disease (CVD) risk factors including age, gender, body mass index (BMI), hypertension, current smoking status and diabetes, on hippocampal volume derived from structural brain magnetic resonance imaging (MRI) scans, which is associated with AD risk and future AD progression [Sperling et al., 2011].

AD, the most common form of dementia, is characterized by memory loss, cognitive decline, and other symptoms. The cause and progression of AD are not well understood. As a disease that often co-occurs with AD in the elderly population, vascular pathology is among the potential factors to increase the risk of AD. In particular, increasing evidence shows that many CVD risk factors including hypertension, smoking and diabetes are associated with cognitive decline and neurodegeneration, and may increase the risk and accelerate the progression of AD [Helzner et al., 2009; Kivipelto et al., 2001; Lo et al., 2012; Luchsinger et al., 2005; Purnell et al., 2009]. For example, the neurovascular hypothesis of AD suggests that neurovascular dysfunction reduces the clearance of amyloid beta (Aβ) peptide across the blood-brain barrier, which could initiate a series of pathological processes and ultimately lead to neuronal injury and loss [Zlokovic, 2005]. Moreover, recent studies have identified that the interaction within multiple CVD risk factors, and the interaction between CVD risk factors and the apolipoprotein E (APOE) polymorphism, the largest genetic determinant of late-onset AD susceptibility, may significantly influence the risk and progression of AD [Borenstein et al., 2005; Irie et al., 2008; Purnell et al., 2009; Qiu et al., 2003]. We therefore hypothesized that genetic components play a role in the development and progression of AD in the presence of CVD risk factors and events. Testing for the interactions between AD risk genes and CVD risk factors on hippocampal volume may shed light on the underlying mechanisms of AD-related neurodegeneration, and suggest potential therapeutic treatment as many CVD risk factors are largely modifiable.

The remainder of the paper is organized as follows. In the Materials and Methods section, we present the kernel machine based method and the statistical test for detecting interactions between multidimensional variable sets, and then introduce the simulation studies used to evaluate the method. In the Results section, we report the simulation results and our findings on the real data, and compare them to alternative interaction detection methods. The advantages and weaknesses of the method, and the implications of the findings, are summarized in the Discussion section. Some theoretical aspects of the kernel method and supplementary analyses are provided in the Appendix.

2 Materials and Methods

2.1 Kernel methods for interaction detection

2.1.1 The model

We assume that there are $N$ unrelated subjects under investigation. $y_i$, $i = 1, \cdots, N$, is a quantitative phenotype for the $i$-th subject, such as an image derived disease marker. We are interested in detecting the interaction between a collection of genetic variants and a set of non-genetic variables such as disease risk factors, environmental exposures, or epigenetic markers. In particular, let $G_i = [G_{i,1}, \cdots, G_{i,L}]^T$ denote the $L$ SNP markers, where $G_{i,s}$, $s = 1, \cdots, L$, is the genotype coded to be the number of copies of the minor allele that the $i$-th subject possesses for the $s$-th SNP, and takes the values of 0 (homozygotic major alleles), 1 (heterozygote), and 2 (homozygotic minor alleles). Let $W_i = [W_{i,1}, \cdots, W_{i,R}]^T$ denote the $R$ non-genetic variables for the $i$-th subject. We associate the phenotype with the genetic and non-genetic variables via the following semiparametric model:

$$y_i = x_i^T\beta + f(G_i, W_i) + \epsilon_i, \quad i = 1, \cdots, N, \tag{1}$$

where $x_i$ is a $p \times 1$ vector of covariates (e.g., age, sex) for the $i$-th subject, $\beta$ is a $p \times 1$ vector of fixed effects, $\epsilon_i$ is a random residual with zero mean and homogeneous variance $\sigma^2$, and $f$ is an unknown function on the product domain $\mathcal{X} = \mathcal{X}_G \times \mathcal{X}_W$, with $G_i \in \mathcal{X}_G$ and $W_i \in \mathcal{X}_W$. According to the ANOVA decomposition of functions [Gu, 2002], $f$ can be expanded as

$$f(G_i, W_i) = h_G(G_i) + h_W(W_i) + h_{G\times W}(G_i, W_i), \tag{2}$$

where $h_G(G_i)$ and $h_W(W_i)$ are the main effects of genetics and non-genetic factors, respectively, and $h_{G\times W}(G_i, W_i)$ captures interactions. The overall mean of $f$ can be absorbed into the intercept contained in $x_i$, and is therefore omitted here. A reproducing kernel Hilbert space (RKHS) $\mathcal{H}$ of smooth real-valued functions on $\mathcal{X}$ can be constructed [Gu and Wahba, 1993; Wahba et al., 1995]. In particular, the functional space $\mathcal{H}$ has an orthogonal decomposition:

$$\mathcal{H} = \mathcal{H}_G \oplus \mathcal{H}_W \oplus \mathcal{H}_{G\times W}, \tag{3}$$

where $\mathcal{H}_G$ and $\mathcal{H}_W$ are RKHSs of functions on $\mathcal{X}_G$ and $\mathcal{X}_W$, respectively, $\mathcal{H}_{G\times W}$ is an RKHS of functions on $\mathcal{X}$, and $\oplus$ denotes the direct sum. Each component in Eq. (2) lies in the corresponding subspace in Eq. (3). Therefore, $\mathcal{H}$ is an RKHS whose reproducing kernel is the sum of the reproducing kernels of the three component subspaces. We assume that $\mathcal{H}$ is equipped with an inner product $\langle \cdot, \cdot \rangle$ and a norm $\|\cdot\|_{\mathcal{H}}$.

2.1.2 Model estimation

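The function $f \in \mathcal{H}$ can be estimated by minimizing the penalized squared-error loss function of model (1):

$$L(y, \beta, f) = \frac{1}{2}\sum_{i=1}^{N}\left[y_i - x_i^T\beta - f(G_i, W_i)\right]^2 + \frac{\lambda}{2}J(f), \tag{4}$$

where $J(\cdot) = \|\cdot\|_{\mathcal{H}}^2$ is a roughness penalty, and $\lambda$ is a tuning parameter. Since the entire functional space $\mathcal{H}$ has the orthogonal decomposition (3), the penalty function $J(\cdot)$ can be decomposed accordingly, and Eq. (4) can be more explicitly written as

$$\begin{aligned}
L(y, \beta, f) &= \frac{1}{2}\sum_{i=1}^{N}\left[y_i - x_i^T\beta - h_G(G_i) - h_W(W_i) - h_{G\times W}(G_i, W_i)\right]^2 + \frac{\lambda_G}{2}\|h_G\|_{\mathcal{H}_G}^2 + \frac{\lambda_W}{2}\|h_W\|_{\mathcal{H}_W}^2 + \frac{\lambda_{G\times W}}{2}\|h_{G\times W}\|_{\mathcal{H}_{G\times W}}^2 \\
&= \frac{1}{2}(y - X\beta - h_G - h_W - h_{G\times W})^T(y - X\beta - h_G - h_W - h_{G\times W}) + \frac{\lambda_G}{2}\|h_G\|_{\mathcal{H}_G}^2 + \frac{\lambda_W}{2}\|h_W\|_{\mathcal{H}_W}^2 + \frac{\lambda_{G\times W}}{2}\|h_{G\times W}\|_{\mathcal{H}_{G\times W}}^2,
\end{aligned} \tag{5}$$

where $y = [y_1, \cdots, y_N]^T$, $X = [x_1, \cdots, x_N]^T$, $h_G = [h_G(G_1), \cdots, h_G(G_N)]^T$, $h_W = [h_W(W_1), \cdots, h_W(W_N)]^T$, $h_{G\times W} = [h_{G\times W}(G_1, W_1), \cdots, h_{G\times W}(G_N, W_N)]^T$, and $\lambda_G$, $\lambda_W$, and $\lambda_{G\times W}$ are positive smoothing parameters that balance the goodness of fit and complexity of the model.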

By the representer theorem [Kimeldorf and Wahba, 1971; Wahba, 1990], the functions $h_G$, $h_W$ and $h_{G\times W}$ that minimize the functional (5) take the forms

$$h_G(G^*) = \sum_{j=1}^{N}\alpha_{G,j}\,k_G(G^*, G_j), \quad h_W(W^*) = \sum_{j=1}^{N}\alpha_{W,j}\,k_W(W^*, W_j), \quad h_{G\times W}(G^*, W^*) = \sum_{j=1}^{N}\alpha_{G\times W,j}\,k_{G\times W}\big((G^*, W^*), (G_j, W_j)\big), \tag{6}$$

for arbitrary $G^*$ and $W^*$, where $\alpha_{G,j}$, $\alpha_{W,j}$ and $\alpha_{G\times W,j}$, $j = 1, 2, \cdots, N$, are unknown coefficients, and $k_G$, $k_W$ and $k_{G\times W}$ are reproducing kernel functions of the Hilbert spaces $\mathcal{H}_G$, $\mathcal{H}_W$ and $\mathcal{H}_{G\times W}$, respectively. Since the reproducing kernel of a tensor product of two RKHSs is the product of the two reproducing kernels [Aronszajn, 1950], the kernel function $k_{G\times W}$ is connected to the kernel functions $k_G$ and $k_W$ by

$$k_{G\times W}\big((G^*, W^*), (G_j, W_j)\big) = k_G(G^*, G_j)\,k_W(W^*, W_j). \tag{7}$$

Define the $N \times N$ symmetric kernel matrices $K_G = \{k_G(G_i, G_j)\}$, $K_W = \{k_W(W_i, W_j)\}$ and $K_{G\times W} = \{k_{G\times W}((G_i, W_i), (G_j, W_j))\} = K_G \circ K_W$, where $\circ$ is the Hadamard product (element-wise product) of two matrices. Then

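$$h_G = K_G\alpha_G, \quad h_W = K_W\alpha_W, \quad h_{G\times W} = K_{G\times W}\alpha_{G\times W}, \tag{8}$$

where $\alpha_G = [\alpha_{G,1}, \cdots, \alpha_{G,N}]^T$, $\alpha_W = [\alpha_{W,1}, \cdots, \alpha_{W,N}]^T$ and $\alpha_{G\times W} = [\alpha_{G\times W,1}, \cdots, \alpha_{G\times W,N}]^T$. Substituting $h_G$, $h_W$ and $h_{G\times W}$ into Eq. (5), and making use of the reproducing kernel property, we obtain

$$L(y, \beta, \alpha_G, \alpha_W, \alpha_{G\times W}) = \frac{1}{2}\epsilon^T\epsilon + \frac{\lambda_G}{2}\alpha_G^T K_G\alpha_G + \frac{\lambda_W}{2}\alpha_W^T K_W\alpha_W + \frac{\lambda_{G\times W}}{2}\alpha_{G\times W}^T K_{G\times W}\alpha_{G\times W}, \tag{9}$$

where $\epsilon = y - X\beta - K_G\alpha_G - K_W\alpha_W - K_{G\times W}\alpha_{G\times W}$.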

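The gradients of $L$ with respect to the parametric coefficients $\beta$ and nonparametric coefficients $\alpha_G$, $\alpha_W$ and $\alpha_{G\times W}$ are

$$\frac{\partial L}{\partial\beta} = -X^T\epsilon, \quad \frac{\partial L}{\partial\alpha_G} = -K_G^T\epsilon + \lambda_G K_G\alpha_G, \quad \frac{\partial L}{\partial\alpha_W} = -K_W^T\epsilon + \lambda_W K_W\alpha_W, \quad \frac{\partial L}{\partial\alpha_{G\times W}} = -K_{G\times W}^T\epsilon + \lambda_{G\times W}K_{G\times W}\alpha_{G\times W}. \tag{10}$$

Setting the gradients to zero gives the first-order condition, which is the linear system:

$$\begin{bmatrix}
X^TX & X^TK_G & X^TK_W & X^TK_{G\times W} \\
K_G^TX & K_G^TK_G + \lambda_G K_G & K_G^TK_W & K_G^TK_{G\times W} \\
K_W^TX & K_W^TK_G & K_W^TK_W + \lambda_W K_W & K_W^TK_{G\times W} \\
K_{G\times W}^TX & K_{G\times W}^TK_G & K_{G\times W}^TK_W & K_{G\times W}^TK_{G\times W} + \lambda_{G\times W}K_{G\times W}
\end{bmatrix}
\begin{bmatrix} \beta \\ \alpha_G \\ \alpha_W \\ \alpha_{G\times W} \end{bmatrix}
=
\begin{bmatrix} X^Ty \\ K_G^Ty \\ K_W^Ty \\ K_{G\times W}^Ty \end{bmatrix}. \tag{11}$$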

Liu et al. [2007] showed that this first-order linear system is equivalent to the normal equation of the linear mixed effects model

$$y = X\beta + h_G + h_W + h_{G\times W} + \epsilon, \tag{12}$$

where $\beta$ is a coefficient vector of fixed effects, and $h_G$, $h_W$ and $h_{G\times W}$ are independent random effects distributed as $h_G \sim N(0, \tau_G^2 K_G)$ with $\tau_G^2 = \lambda_G^{-1}\sigma^2$, $h_W \sim N(0, \tau_W^2 K_W)$ with $\tau_W^2 = \lambda_W^{-1}\sigma^2$, and $h_{G\times W} \sim N(0, \tau_{G\times W}^2 K_{G\times W})$ with $\tau_{G\times W}^2 = \lambda_{G\times W}^{-1}\sigma^2$; $\epsilon$ is independent of the random effects and follows $\epsilon \sim N(0, \sigma^2 I)$, and $I$ is an identity matrix. This connection indicates that the fixed effects $\beta$, and the random effects $h_G$, $h_W$ and $h_{G\times W}$, obtained by minimizing the loss function in Eq. (4), are equivalent to the best linear unbiased predictors (BLUPs) of the linear mixed effects model (12). The variance components $\tau_G^2$, $\tau_W^2$, $\tau_{G\times W}^2$ and $\sigma^2$ can be estimated via the restricted maximum likelihood (ReML) approach [Harville, 1977; Lindstrom and Bates, 1988] (see Appendix A for details), and the estimates of random effects $\hat{h}_G$, $\hat{h}_W$, $\hat{h}_{G\times W}$ can be obtained by solving the linear system (11) and inserting the $\hat{\alpha}$ estimates into Eq. (8).
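The equivalence between the penalized problem and the mixed model can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes the smoothing parameters are already fixed (ReML estimation is not shown), sets $\sigma^2 = 1$ so that $\tau_k^2 = 1/\lambda_k$, uses purely synthetic data, and introduces the hypothetical helper names `fit_kernel_machine` and `gradients`. It computes the mixed-model solution in closed form and confirms that the gradients of Eq. (10) vanish there.

```python
import numpy as np

def fit_kernel_machine(y, X, kernels, lambdas):
    """Minimize Eq. (9) for fixed smoothing parameters via the mixed-model form.

    kernels: list of N x N symmetric PSD matrices [K_G, K_W, K_GxW]
    lambdas: matching list of positive smoothing parameters
    """
    n = len(y)
    # Marginal covariance of y in units of sigma^2 (tau_k^2 / sigma^2 = 1 / lambda_k).
    V = np.eye(n) + sum(K / lam for K, lam in zip(kernels, lambdas))
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y)
    # Generalized least squares estimate of the fixed effects.
    beta = np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)
    r = Vinv_y - Vinv_X @ beta            # V^{-1} (y - X beta)
    alphas = [r / lam for lam in lambdas]
    return beta, alphas

def gradients(y, X, kernels, lambdas, beta, alphas):
    """Gradients of Eq. (9), written as in Eq. (10)."""
    eps = y - X @ beta - sum(K @ a for K, a in zip(kernels, alphas))
    g_beta = -X.T @ eps
    g_alphas = [-K.T @ eps + lam * (K @ a)
                for K, lam, a in zip(kernels, lambdas, alphas)]
    return g_beta, g_alphas

# Synthetic toy data, for illustration only.
rng = np.random.default_rng(0)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n)])
A, B = rng.normal(size=(n, 4)), rng.normal(size=(n, 3))
K_G, K_W = A @ A.T / 4, B @ B.T / 3
kernels = [K_G, K_W, K_G * K_W]           # '*' is the element-wise (Hadamard) product
lambdas = [2.0, 1.0, 5.0]                 # hypothetical smoothing parameters
y = rng.normal(size=n)

beta_hat, alphas_hat = fit_kernel_machine(y, X, kernels, lambdas)
g_beta, g_alphas = gradients(y, X, kernels, lambdas, beta_hat, alphas_hat)
max_grad = max(np.abs(g_beta).max(), *(np.abs(g).max() for g in g_alphas))
h_hat = [K @ a for K, a in zip(kernels, alphas_hat)]   # Eq. (8)
```

When a kernel matrix is rank deficient the coefficient vectors are not unique, but the fitted values in `h_hat` are; working with the $N \times N$ matrix `V` also avoids forming the larger block system directly.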

2.1.3 Selection of kernels

There are a variety of choices for the kernel functions to characterize the similarity between subjects with respect to the genetic variants and non-genetic factors, as long as they are nonnegative definite [Schaid, 2010a,b]. Possible candidates are the linear kernel, the polynomial kernel, the Euclidean distance (ED) kernel, the Gaussian kernel, and the identity-by-state (IBS) kernel [Kwee et al., 2008].

Here we use the IBS kernel for the genetic effect. The IBS kernel measures the similarity of the genotypes between the i-th and j-th subject by

$$k_G(G_i, G_j) = \frac{1}{2L}\sum_{s=1}^{L}\left(2 - |G_{i,s} - G_{j,s}|\right), \tag{13}$$

where L is the number of SNP markers to be combined. The IBS kernel is a nonparametric function of the genotypes, as it does not depend on the selection of basis or any assumption on the types of genetic interaction. Therefore, in principle, it can capture any epistatic effect between genetic variants and their nonlinear influences on the phenotypes.
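As an illustration of Eq. (13), the sketch below computes the IBS kernel matrix for a genotype matrix coded as minor-allele counts. The function name `ibs_kernel` and the toy genotypes are assumptions made for this example; missing genotypes are assumed to have been imputed beforehand.

```python
import numpy as np

def ibs_kernel(G):
    """IBS kernel of Eq. (13) for an N x L genotype matrix coded 0/1/2."""
    G = np.asarray(G, dtype=float)
    n, L = G.shape
    K = np.zeros((n, n))
    # Accumulate one SNP at a time to avoid building an N x N x L array.
    for s in range(L):
        g = G[:, s]
        K += 2.0 - np.abs(g[:, None] - g[None, :])
    return K / (2.0 * L)

# Toy genotypes: subjects 0 and 2 are identical, subject 1 is the opposite homozygote
# at the first and third SNP.
G_toy = np.array([[0, 1, 2],
                  [2, 1, 0],
                  [0, 1, 2]])
K_G = ibs_kernel(G_toy)
```

Each entry lies in $[0, 1]$, with 1 on the diagonal; in the toy data the first two subjects share alleles only at the heterozygous SNP, giving a similarity of 1/3.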

We propose the linear kernel to combine multiple non-genetic factors. The linear kernel can be represented as

$$k_W(W_i, W_j) = \frac{1}{R}\langle W_i, W_j\rangle = \frac{1}{R}\sum_{s=1}^{R}W_{i,s}W_{j,s}, \tag{14}$$

where R is the number of non-genetic factors under investigation. We evaluate the performance of the two kernels by simulation studies.
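A minimal sketch of Eq. (14), together with the interaction kernel formed as the Hadamard product of the genetic and non-genetic kernel matrices, is given below. The names `linear_kernel`, `W_toy` and `K_G_toy` are hypothetical, and the small genetic kernel matrix is made up for illustration. The final line checks that the element-wise product of two nonnegative definite matrices is again nonnegative definite (the Schur product theorem), so the interaction kernel is a valid kernel.

```python
import numpy as np

def linear_kernel(W):
    """Linear kernel of Eq. (14) for an N x R matrix of non-genetic variables."""
    W = np.asarray(W, dtype=float)
    return W @ W.T / W.shape[1]

# Toy non-genetic variables (assumed already standardized).
W_toy = np.array([[1.0, -1.0],
                  [0.5, 2.0],
                  [-1.5, 0.0]])
K_W = linear_kernel(W_toy)

# Hypothetical 3 x 3 genetic kernel matrix (symmetric, nonnegative definite).
K_G_toy = np.array([[1.0, 0.5, 0.0],
                    [0.5, 1.0, 0.5],
                    [0.0, 0.5, 1.0]])

# Interaction kernel: element-wise (Hadamard) product of the two kernel matrices.
K_GxW = K_G_toy * K_W
min_eig = np.linalg.eigvalsh(K_GxW).min()
```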

2.1.4 Score test

We note, from the linear mixed effects model representation (12), that testing an overall genetic and non-genetic effect $H_0: h_G(\cdot) = h_W(\cdot) = h_{G\times W}(\cdot) = 0$ is equivalent to testing the variance components: $H_0: \tau_G^2 = \tau_W^2 = \tau_{G\times W}^2 = 0$. To address the issue that, under the null hypothesis, the parameters $\tau_G^2$, $\tau_W^2$, and $\tau_{G\times W}^2$ are on the boundary of the parameter space, Liu et al. [2007] proposed a score test based on the ReML. In particular, let $K = K_G + K_W + K_{G\times W}$, and the score test statistic is defined as:

$$S(\sigma_0^2) = \frac{1}{2\sigma_0^2}(y - X\hat{\beta}_0)^T K (y - X\hat{\beta}_0) = \frac{1}{2\sigma_0^2}y^T P_0 K P_0 y, \tag{15}$$

where $\hat{\beta}_0$ is the maximum likelihood estimate (MLE) of the regression coefficients under the null model $y = X\beta_0 + \epsilon_0$, $\sigma_0^2$ is the variance of $\epsilon_0$, and $P_0 = I - X(X^TX)^{-1}X^T$ is the projection matrix under the null. $S(\sigma_0^2)$ is a quadratic function of $y$ and follows a mixture of chi-squares under the null. We use the Satterthwaite method to approximate the distribution of $S(\sigma_0^2)$ by a scaled chi-square distribution $\kappa\chi_\nu^2$. In practice, the unknown value of the model parameter $\sigma_0^2$ in $S$ is replaced by its ReML estimate $\hat{\sigma}_0^2$ under the null model. To account for this substitution, the fitted scale parameter $\kappa$ and the degrees of freedom $\nu$ are adjusted, giving $\hat{\kappa}$ and $\hat{\nu}$ (see Appendix B for details). The p-value of an observed score statistic $S(\hat{\sigma}_0^2)$ is then computed using the scaled chi-square distribution $\hat{\kappa}\chi_{\hat{\nu}}^2$.
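The statistic in Eq. (15) and the moment matching behind the Satterthwaite approximation can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's procedure: it matches the first two null moments of the quadratic form while treating $\sigma_0^2$ as known, so the returned `kappa` and `nu` are the unadjusted parameters, and the Appendix B adjustment for estimating $\sigma_0^2$ is not reproduced. The function name `overall_score_test` and the toy data are hypothetical.

```python
import numpy as np

def overall_score_test(y, X, K):
    """Score statistic of Eq. (15) with unadjusted Satterthwaite parameters."""
    n, p = X.shape
    # Projection onto the orthogonal complement of the column space of X.
    P0 = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
    resid = P0 @ y                            # y - X beta_hat_0
    sigma2_hat = resid @ resid / (n - p)      # ReML estimate of sigma_0^2 in the null model
    stat = resid @ K @ resid / (2.0 * sigma2_hat)
    # Null mean and variance of the quadratic form when sigma_0^2 is treated as known.
    P0K = P0 @ K
    mean = np.trace(P0K) / 2.0
    var = np.trace(P0K @ P0K) / 2.0
    # Match the mean (kappa * nu) and variance (2 * kappa^2 * nu) of kappa * chi2_nu.
    kappa = var / (2.0 * mean)
    nu = 2.0 * mean ** 2 / var
    return stat, kappa, nu

# Synthetic toy data, for illustration only.
rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = rng.normal(size=(n, 5))
K = Z @ Z.T / 5                               # any symmetric nonnegative definite kernel
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)
stat, kappa, nu = overall_score_test(y, X, K)
```

A p-value would then be the upper tail probability of a chi-square distribution with `nu` degrees of freedom evaluated at `stat / kappa`.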

To test the interaction effect, we notice that testing the null hypothesis $H_0^I: h_{G\times W}(\cdot) = 0$ is equivalent to testing the variance component: $H_0^I: \tau_{G\times W}^2 = 0$. Let $\Sigma = \tau_G^2 K_G + \tau_W^2 K_W + \sigma^2 I$, where $\tau_G^2$, $\tau_W^2$, and $\sigma^2$ are model parameters under the null model $y = X\beta + h_G + h_W + \epsilon$. We follow Li and Cui [2012] and design a score test statistic

$$S_I(\tau_G^2, \tau_W^2, \sigma^2) = \frac{1}{2}y^T P_I K_{G\times W} P_I y, \tag{16}$$

where $P_I = \Sigma^{-1} - \Sigma^{-1}X(X^T\Sigma^{-1}X)^{-1}X^T\Sigma^{-1}$ is the projection matrix under the null hypothesis $H_0^I$. Analogously, the Satterthwaite method is used to approximate the distribution of $S_I$ by a scaled chi-square distribution $\kappa_I\chi_{\nu_I}^2$. In practice, the unknown model parameters $\tau_G^2$, $\tau_W^2$ and $\sigma^2$ in $S_I$ are replaced by their ReML estimates $\hat{\tau}_G^2$, $\hat{\tau}_W^2$ and $\hat{\sigma}^2$ under the null model. The fitted scale parameter $\kappa_I$ and the degrees of freedom $\nu_I$ are adjusted to account for this substitution, giving $\hat{\kappa}_I$ and $\hat{\nu}_I$ (see Appendix B for details). The p-value of an observed score statistic $S_I(\hat{\tau}_G^2, \hat{\tau}_W^2, \hat{\sigma}^2)$ is then computed using the scaled chi-square distribution $\hat{\kappa}_I\chi_{\hat{\nu}_I}^2$.
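The interaction test of Eq. (16) might be sketched as below. Several assumptions apply: the variance components are passed in as given numbers (in the paper they are ReML estimates under the null model, and that estimation step is not shown), the Satterthwaite parameters are the unadjusted moment-matched values rather than the Appendix B versions, and the function name `interaction_score_test` and the toy data are hypothetical. The p-value uses the chi-square survival function from SciPy.

```python
import numpy as np
from scipy.stats import chi2

def interaction_score_test(y, X, K_G, K_W, tau2_G, tau2_W, sigma2):
    """Score statistic of Eq. (16) with an unadjusted Satterthwaite p-value."""
    n = len(y)
    K_GxW = K_G * K_W                          # Hadamard product
    Sigma = tau2_G * K_G + tau2_W * K_W + sigma2 * np.eye(n)
    Sinv = np.linalg.inv(Sigma)
    SinvX = Sinv @ X
    # P_I = Sigma^{-1} - Sigma^{-1} X (X^T Sigma^{-1} X)^{-1} X^T Sigma^{-1}
    P_I = Sinv - SinvX @ np.linalg.solve(X.T @ SinvX, SinvX.T)
    Py = P_I @ y
    stat = 0.5 * Py @ K_GxW @ Py
    # Null moments of the quadratic form, treating the variance components as known.
    PK = P_I @ K_GxW
    mean = 0.5 * np.trace(PK)
    var = 0.5 * np.trace(PK @ PK)
    kappa = var / (2.0 * mean)
    nu = 2.0 * mean ** 2 / var
    pval = chi2.sf(stat / kappa, df=nu)
    return stat, kappa, nu, pval

# Synthetic toy data with main effects only, for illustration.
rng = np.random.default_rng(3)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=n)])
A, B = rng.normal(size=(n, 5)), rng.normal(size=(n, 3))
K_G, K_W = A @ A.T / 5, B @ B.T / 3
y = X @ np.array([1.0, 1.0]) + rng.normal(size=n)
stat_I, kappa_I, nu_I, p_I = interaction_score_test(
    y, X, K_G, K_W, tau2_G=0.5, tau2_W=0.5, sigma2=1.0)  # hypothetical variance components
```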

2.2 The ADNI data

Data used in the preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). The ADNI was launched in 2003 by the National Institute on Aging (NIA), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), the Food and Drug Administration (FDA), private pharmaceutical companies and non-profit organizations, as a $60 million, 5-year public-private partnership. The primary goal of ADNI has been to test whether serial MRI, PET, other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer's disease (AD). Determination of sensitive and specific markers of very early AD progression is intended to aid researchers and clinicians to develop new treatments and monitor their effectiveness, as well as lessen the time and cost of clinical trials.

The Principal Investigator of this initiative is Michael W. Weiner, MD, VA Medical Center and University of California – San Francisco. ADNI is the result of efforts of many co-investigators from a broad range of academic institutions and private corporations, and subjects have been recruited from over 50 sites across the U.S. and Canada. The initial goal of ADNI was to recruit 800 subjects but ADNI has been followed by ADNI-GO and ADNI-2. To date these three protocols have recruited over 1500 adults, ages 55 to 90, to participate in the research, consisting of cognitively normal older subjects, people with early or late MCI, and people with early AD. The follow up duration of each group is specified in the protocols for ADNI-1, ADNI-2 and ADNI-GO. Subjects originally recruited for ADNI-1 and ADNI-GO had the option to be followed in ADNI-2. For up-to-date information, see www.adni-info.org.

2.2.1 Data preprocessing and SNP grouping

All ADNI-1 1.5T structural brain MRI scans were processed using FreeSurfer (freesurfer.nmr.mgh.harvard.edu) [Dale et al., 1999; Fischl, 2012; Fischl et al., 1999], version 4.3. Subject specific intra-cranial volume (ICV) and bilateral hippocampal volumes were automatically computed by FreeSurfer, after skull stripping, B1 bias field correction, segmentation and labeling [Fischl et al., 2002, 2004], and passed rigorous visual quality control checks. For more details regarding the imaging processing and quality control, we refer the reader to the official website of ADNI (http://adni.loni.usc.edu).

CVD risk factors considered in the present study included age, gender, body mass index (BMI), systolic blood pressure, current smoking status and diabetes. A CVD risk score summarizing these six risk factors can be calculated using the non-laboratory, office-based cardiovascular risk profile prediction function from the Framingham Heart Study (FHS) [D'Agostino et al., 2008]. The score can be treated as a continuous variable, and higher values indicate higher risks of developing individual CVD events. We use the FHS risk score as a benchmark variable to compare the results obtained with the proposed multivariate method.

We followed the ENIGMA2 1KGP cookbook (v3) (The Enhancing Neuroimaging Genetics through Meta-Analysis (ENIGMA) consortium, http://enigma.loni.ucla.edu/wp-content/uploads/2012/07/ENIGMA2_1KGP_cookbook_v3.doc, version July 27, 2012), developed by the ENIGMA2 Genetics support team, to preprocess and impute the ADNI genome-wide SNP data. In brief, we used PLINK [Purcell et al., 2007] for preprocessing and quality control, which included a sex discrepancy check, removal of subjects with a low genotype call rate (< 95%), and removal of individual markers that had an ambiguous strand assignment or that did not satisfy the following quality control criteria: genotype call rate ≥ 95%, minor allele frequency (MAF) ≥ 1%, and Hardy-Weinberg equilibrium $p \geq 1 \times 10^{-6}$. We then used the MaCH software [Li et al., 2010] to impute ungenotyped SNPs based on the 1,000 genomes reference [1000 Genomes Project Consortium, 2012]. 697 subjects (cognitively normal controls N = 203, subjects with mild cognitive impairment N = 334, and AD patients N = 160) with complete imaging data, genetic data, and CVD risk factors were included in the following analyses. Among the 334 subjects with mild cognitive impairment (MCI), 183 subjects were stable and did not convert to AD throughout the follow-up, and 151 subjects progressed to AD in at least one of the follow-up visits.

In addition to APOE, the major genetic risk factor for late-onset AD, a recent two-stage meta-analysis of GWAS with 74,046 individuals identified 20 susceptibility loci for late-onset AD [Lambert et al., 2013]. A very recent article suggested that the REST gene may play a critical role in normal aging in human cortical and hippocampal neurons, and may distinguish neuroprotection from neurodegeneration [Lu et al., 2014]. We therefore used these 21 genes as our candidate gene set and extracted all the SNPs on the coding regions as well as 20kb up/downstream of each of these genes in the ADNI data set. Some of these genes, e.g., BIN1, CR1 and PICALM, have been associated with quantitative imaging phenotypes, such as hippocampal volume, amygdala volume and entorhinal cortical thickness, in ADNI [Biffi et al., 2010; Bralten et al., 2011; Furney et al., 2010; Weiner et al., 2013]. Table 1 lists the 21 genes and the final number of SNPs located on them after preprocessing and quality control.

Table 1. A list of 21 candidate risk genes for late-onset Alzheimer's disease and the final number of SNPs located on and near them.

| Chr | Gene | SNP num | Chr | Gene | SNP num |
|---|---|---|---|---|---|
| 19 | ABCA7 | 240 | 6 | HLA-DRB5 | 62 |
| 2 | BIN1 | 301 | 2 | INPP5D | 495 |
| 20 | CASS4 | 165 | 5 | MEF2C | 272 |
| 6 | CD2AP | 421 | 11 | MS4A6A | 63 |
| 19 | CD33 | 85 | 11 | PICALM | 360 |
| 11 | CELF1 | 97 | 8 | PTK2B | 419 |
| 8 | CLU | 116 | 4 | REST | 146 |
| 1 | CR1 | 264 | 14 | SLC24A4 | 716 |
| 18 | DSG2 | 219 | 11 | SORL1 | 233 |
| 7 | EPHA1 | 115 | 7 | ZCWPW1 | 74 |
| 14 | FERMT2 | 242 | | | |

2.3 Alternative methods

No standard method exists in the literature that can detect interactions between a collection of SNPs and a set of non-genetic variables such as CVD risk factors. Below in both simulation studies and real data analysis, we consider alternative methods based on burden tests and principal component analysis (PCA) that can summarize multiple variables into a single regressor and convert the problem into standard multiple regression analyses.

Burden tests collapse a set of variants in a genetic region into a single burden variable. They can be powerful when most variants in a region are causal and the effects are in the same direction, but suffer from dramatic power loss when these assumptions are violated [Lee et al., 2012a]. Different variants of burden tests have been proposed and are mainly aimed at rare variant association tests. Here we adapt two methods to our context: (1) the rare variant test (RVT), proposed by Morris and Zeggini [2010], which calculates the proportion of minor alleles in the set of genetic variants for each subject as the burden regressor, and (2) the weighted sum test (WST) [Madsen and Browning, 2009], which calculates a genetic score as the burden variable. The genetic score is a weighted average of the count of minor alleles for each subject. Specifically, if $G_{i,s} \in \{0, 1, 2\}$ is the count of minor alleles in genetic variant $s$ for the $i$-th subject, then the genetic score is $\gamma_i = \sum_{s=1}^{L} G_{i,s}/w_s$, where $L$ is the number of SNP markers, $w_s = \sqrt{N_s q_s(1 - q_s)}$ is the weight, in which $q_s = (m_s + 1)/(2N_s + 2)$, $N_s$ is the total number of subjects genotyped for variant $s$, and $m_s$ is the number of minor alleles observed for variant $s$. Many other burden tests are similar to these two methods. We note that the underlying assumptions of these collapsing methods are that the interactions have similar effect sizes and the same direction for all the genetic variants being collapsed. The tests can be biased or have inflated type I error if these assumptions are violated.
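The two burden regressors can be sketched in a few lines. The function names `rvt_burden` and `wst_score` are hypothetical, the toy genotype matrix is made up, and the sketch assumes no missing genotypes, so that every variant has the same number of genotyped subjects. The WST weights follow the form given by Madsen and Browning [2009], with the allele frequency estimated from all subjects as described above.

```python
import numpy as np

def rvt_burden(G):
    """Proportion of minor alleles carried by each subject across the SNP set."""
    G = np.asarray(G, dtype=float)
    return G.sum(axis=1) / (2.0 * G.shape[1])

def wst_score(G):
    """Weighted sum score: gamma_i = sum_s G[i, s] / w_s."""
    G = np.asarray(G, dtype=float)
    N = G.shape[0]                      # assumes N_s = N for every variant (no missing data)
    m = G.sum(axis=0)                   # minor allele count per variant
    q = (m + 1.0) / (2.0 * N + 2.0)     # smoothed minor allele frequency
    w = np.sqrt(N * q * (1.0 - q))      # rarer variants receive smaller w, hence larger weight 1/w
    return (G / w).sum(axis=1)

# Toy genotypes: 3 subjects, 2 SNPs.
G_toy = np.array([[0, 1],
                  [2, 1],
                  [0, 0]])
rvt = rvt_burden(G_toy)
wst = wst_score(G_toy)
```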

For the second alternative method, we perform PCA on the set of SNP regressors or CVD risk factors to extract the first principal component that explains the largest possible variance of the original regressors.

After reducing the dimension of the SNP set and the CVD risk factors, we can carry out a standard multiple regression analysis, in which the interaction effect between the derived univariate SNP regressor and the CVD risk factor is modeled by their product.
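A minimal sketch of the PCA-based alternative follows: each variable set is reduced to its first principal component, and the interaction is tested through the product term of an ordinary least squares fit. The helper names `first_pc` and `interaction_t_stat`, the synthetic data, and the choice to report a t statistic with its degrees of freedom (rather than a p-value) are assumptions of this example.

```python
import numpy as np

def first_pc(M):
    """Scores of the first principal component of the column-centered matrix M."""
    Mc = M - M.mean(axis=0)
    _, _, Vt = np.linalg.svd(Mc, full_matrices=False)
    return Mc @ Vt[0]

def interaction_t_stat(y, g, w, covariates):
    """t statistic of the product term g * w in a multiple regression."""
    n = len(y)
    D = np.column_stack([np.ones(n), covariates, g, w, g * w])
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ coef
    dof = n - D.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(D.T @ D)
    return coef[-1] / np.sqrt(cov[-1, -1]), dof

# Synthetic data with a built-in interaction, for illustration only.
rng = np.random.default_rng(2)
n = 200
G = rng.integers(0, 3, size=(n, 6)).astype(float)   # synthetic genotypes
W = rng.normal(size=(n, 4))                         # synthetic risk factors
g, w = first_pc(G), first_pc(W)
covariates = rng.normal(size=(n, 1))
y = 1.0 + g + w + 2.0 * g * w + 0.5 * rng.normal(size=n)
t_int, dof = interaction_t_stat(y, g, w, covariates)
```

The same regression applies unchanged when the genetic regressor is a burden score or when the non-genetic regressor is a single summary risk score.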

2.4 Simulation studies

We conducted simulation studies to evaluate the performance of the ReML algorithm and the accuracy of the score tests. The simulation was based on real ADNI demographic information, genetic data and CVD risk factors with N = 697 subjects, in order to best mimic the situation of our real data application. To synthesize quantitative phenotypes, we employed the following model:

$$y_i = x_i^T\beta + \alpha_M\left[h_G(G_i) + h_W(W_i)\right] + \alpha_I h_{G\times W}(G_i, W_i) + \sigma\epsilon_i, \tag{17}$$

where $x_i$ is a vector comprising an intercept, the ICV, and the education (in years) of the $i$-th subject, $\beta$ is a vector of all ones, $\epsilon_i$ is a Gaussian distributed random error with zero mean and unit variance, and $\sigma$ is the standard deviation of the error, which was set to 5 in our simulation studies. $\alpha_M$ and $\alpha_I$ are two free parameters. We followed Liu et al. [2007] and designed the function $h_G$ to have the following complex form:

$$h_G(G_i) = 2\cos(G_{i,1}) - 3G_{i,2}^2 + 2e^{-G_{i,3}}G_{i,4} - 1.6\sin(G_{i,5})\cos(G_{i,3}) + 4G_{i,1}G_{i,5}. \tag{18}$$

The main non-genetic effect was designed as $h_W(W_i) = W_{i,1} + W_{i,2}$. Finally, we introduced a linear interaction effect between the genetic variants and CVD risk factors: $h_{G\times W}(G_i, W_i) = 3h_G(G_i)h_W(W_i)$.
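The generating model of Eqs. (17) and (18) can be written out as a short sketch. The function names and the tiny input arrays are hypothetical; in the study the inputs were real ADNI covariates, genotypes and risk factors, which are not reproduced here.

```python
import numpy as np

def h_G(G):
    """Genetic main effect of Eq. (18); G is N x 5 (five selected SNPs)."""
    g1, g2, g3, g4, g5 = (G[:, k] for k in range(5))
    return (2.0 * np.cos(g1) - 3.0 * g2 ** 2 + 2.0 * np.exp(-g3) * g4
            - 1.6 * np.sin(g5) * np.cos(g3) + 4.0 * g1 * g5)

def h_W(W):
    """Non-genetic main effect; W is N x 2 (two selected risk factors)."""
    return W[:, 0] + W[:, 1]

def simulate_phenotype(X, G, W, alpha_M, alpha_I, sigma=5.0, rng=None):
    """Draw phenotypes from Eq. (17) with beta set to a vector of ones."""
    rng = np.random.default_rng() if rng is None else rng
    beta = np.ones(X.shape[1])
    hg, hw = h_G(G), h_W(W)
    h_int = 3.0 * hg * hw                       # interaction effect
    eps = rng.standard_normal(X.shape[0])       # zero mean, unit variance
    return X @ beta + alpha_M * (hg + hw) + alpha_I * h_int + sigma * eps

# Tiny made-up inputs; sigma = 0 makes the output deterministic for inspection.
X_toy = np.ones((2, 3))
G_toy = np.array([[0.0, 0.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 1.0, 0.0]])
W_toy = np.array([[1.0, 2.0],
                  [0.0, 0.0]])
y_main = simulate_phenotype(X_toy, G_toy, W_toy, alpha_M=1.0, alpha_I=0.0, sigma=0.0)
y_full = simulate_phenotype(X_toy, G_toy, W_toy, alpha_M=1.0, alpha_I=1.0, sigma=0.0)
```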

Since previous work has performed extensive simulations to characterize the overall score test for the semiparametric model [Hua and Ghosh, 2014; Liu et al., 2007], we focused our simulations on testing for the interaction effect. Our major concern is to assess whether the main effects “bleed” into the interaction, yielding false positives, or “cloud” the interaction, reducing sensitivity.

In the first simulation study, we generated data under different values of $\alpha_M$ and $\alpha_I$ to evaluate the performance of the score tests. Specifically, when $\alpha_M = \alpha_I = 0$, both main and interaction effects vanish, and we studied the false positive rate of the score test for overall effect. When $\alpha_M > 0$ and $\alpha_I = 0$, there are main effects but no interaction, and we therefore assessed the power of the overall score test, and the false positive control of the score test for interaction effect. We also set $\alpha_M$ and $\alpha_I$ at a number of different values to test the power of both score tests in different situations. 1,000 simulations were performed for each setting. For each run, we randomly picked a gene from Table 1 and randomly selected five adjacent SNPs on the gene, reflecting the linkage disequilibrium (LD) between genetic markers, and randomly selected two variables from the six CVD risk factors (age, gender, BMI, systolic blood pressure, smoking and diabetes). The phenotypic data were then generated using the five SNPs and two CVD risk variables following Eq. (17). We note that for all the genes the signal only comes from a very small proportion of the SNPs. Likewise, only part of the CVD risk factors were used in producing the phenotypic data.

We then evaluated the performance of the kernel method. As a comparison, we summarized genetic variants and CVD risk factors into a single regressor respectively using different collapsing methods (for genetic data: PCA, RVT and WST; for CVD risk factors: PCA and the FHS risk score [footnote 1]), and conducted standard univariate interaction tests between all possible combinations of these univariate genetic and CVD risk variables, which amounted to six multiple regression analyses.

In the second simulation study, we fixed αM = 1, and for each run αI was assigned a random number uniformly distributed on [0, 1] with a probability of 0.5, and was fixed at 0 otherwise. We then generated data following the same approach described above, and compared the Receiver Operating Characteristic (ROC) curves of the kernel method and alternative methods for interaction detection.

2.5 Real data application

As a sanity check, we started with some standard regression analyses of real data from ADNI. Specifically, we tested the association between hippocampal volume (averaged between two hemispheres) and APOE-ε4 status (carriers vs. non-carriers), after controlling for ICV, age, gender and education. We conducted multiple regression analyses to assess the main effects of the FHS CVD risk score and each CVD risk factor, and their interaction effects with APOE-ε4 status on hippocampal volume, after properly controlling for covariates. Using logistic regression, we also analyzed the association between diagnosis (AD patients vs. cognitive normal controls) and the FHS CVD risk score and each CVD risk factor.

We then applied the kernel method to detect interaction effects between each of the 21 candidate AD risk genes listed in Table 1, and the collection of six CVD risk factors, on hippocampal volume. ICV and education were included in the model as covariates. The IBS kernel was used to combine SNPs located on and near each gene, and a linear kernel was used for the CVD risk factors. All CVD risk factors were standardized (subtracting the mean and dividing by the standard deviation) to bring variables measured in different units onto the same scale. Bonferroni correction was used to control the family-wise error (FWE) rate, and a gene was identified to have a significant interaction with the CVD risk factors if the p-value was smaller than 0.05/21 ≈ 2.38 × $10^{-3}$. Analogous to the simulation studies, we compared the proposed kernel machine based method to six univariate interaction tests based on different dimension reduction methods.
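A minimal sketch of how the kernel matrices for this analysis might be assembled is given below. It uses the standard identity-by-state (IBS) definition for minor-allele counts coded 0/1/2. Two points are assumptions rather than statements from this section: the linear kernel is scaled by the number of variables, and the interaction kernel is taken as the element-wise product of the two main-effect kernels. All function names are hypothetical.

```python
import numpy as np

def ibs_kernel(G):
    """IBS kernel for allele counts G (N x S, entries 0/1/2):
    K[i, j] = sum_s (2 - |G[i, s] - G[j, s]|) / (2 S)."""
    G = np.asarray(G, dtype=float)
    S = G.shape[1]
    diff = np.abs(G[:, None, :] - G[None, :, :]).sum(axis=2)
    return (2 * S - diff) / (2 * S)

def standardize(W):
    """Subtract the column mean and divide by the column standard deviation."""
    W = np.asarray(W, dtype=float)
    return (W - W.mean(axis=0)) / W.std(axis=0, ddof=1)

def linear_kernel(W):
    """Linear kernel, scaled by the number of variables (assumed scaling)."""
    return W @ W.T / W.shape[1]

def interaction_kernel(K_G, K_W):
    """Element-wise (Hadamard) product of the main-effect kernels (assumption)."""
    return K_G * K_W
```

Both kernels are parameter free, so no tuning is needed before the variance components are estimated.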

In order to reveal the direction of a significant interaction effect, we collapsed the genetic variables and CVD risk factors into scalar variables (the RVT burden variable and the FHS risk score, respectively) and defined four regimes by splitting the data at the medians of the RVT burden variable and the FHS risk score: low genetic risk and low CVD risk, high genetic risk and low CVD risk, low genetic risk and high CVD risk, and high genetic risk and high CVD risk. We then averaged the estimated interaction effect ĥG×W within each of the four regimes. A smaller average indicates a higher risk of the interaction effect (smaller hippocampal volume). Jackknife resampling was used to obtain standard error estimates of these averages. We also compared ĥG×W between the 183 stable MCI subjects (who remained MCI throughout the follow-up) and the 151 MCI subjects who progressed to AD to investigate the predictive power of the interaction effect on disease progression.
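The median split and jackknife step can be illustrated with the short sketch below. It is only an illustration under stated assumptions: subjects exactly at a median are assigned to the "low" group, each regime needs at least two subjects, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, median

def jackknife_se_of_mean(values):
    """Leave-one-out jackknife standard error of the sample mean."""
    n = len(values)
    total = sum(values)
    loo = [(total - v) / (n - 1) for v in values]   # leave-one-out means
    loo_bar = mean(loo)
    return sqrt((n - 1) / n * sum((m - loo_bar) ** 2 for m in loo))

def regime_summary(h_int, genetic_burden, cvd_score):
    """Mean and jackknife SE of the interaction effect in each of four regimes."""
    g_med, w_med = median(genetic_burden), median(cvd_score)
    groups = {}
    for h, g, w in zip(h_int, genetic_burden, cvd_score):
        key = ("high" if g > g_med else "low",   # genetic risk
               "high" if w > w_med else "low")   # CVD risk
        groups.setdefault(key, []).append(h)
    return {k: (mean(v), jackknife_se_of_mean(v)) for k, v in groups.items()}
```

For a simple average, the jackknife standard error coincides with the usual standard error of the mean, which gives a quick sanity check on the implementation.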

3 Results

3.1 Simulation results

Table 2 shows the simulation results for the overall and interaction score tests, using a nominal p-value threshold of 0.05. In more than 99% of the simulations, the ReML algorithm converged within 50 iterations, the maximum number of iterations we allowed in this simulation study (convergence was declared when the difference between successive log ReML likelihoods was smaller than $10^{-4}$). In most cases it converged within 10 iterations, taking a few seconds with a MATLAB implementation on a MacBook Pro with 8GB of memory and a 2.4 GHz Intel Core i7 processor.

Table 2.

Simulation results of the overall and interaction score tests, and the alternative methods for interaction detection based on dimension reduction and multiple regression. Nominal p-value threshold was set to 0.05. The first row corresponds to simulating the null hypothesis for both the overall and interaction effects. The second and third rows correspond to the null hypothesis of the interaction effect only. Thus, corresponding detection rates in the first three rows are desired to be below the p-value threshold of 0.05. PCg: first principal component of the genetic data; PCw: first principal component of the cardiovascular disease risk factors; RVT: rare variant test burden variable; WST: weighted sum test burden variable; FHS: the Framingham Heart Study vascular disease risk score.

| (αM, αI) | Kernel method: Overall | Kernel method: Interaction | Alternative: PCg × FHS | Alternative: PCg × PCw | Alternative: RVT × FHS | Alternative: RVT × PCw | Alternative: WST × FHS | Alternative: WST × PCw |
|---|---|---|---|---|---|---|---|---|
| (0, 0) | 0.048 |  | 0.051 | 0.043 | 0.051 | 0.040 | 0.049 | 0.038 |
| (0.5, 0) | 0.908 | 0.046 | 0.061 | 0.046 | 0.063 | 0.043 | 0.062 | 0.054 |
| (1, 0) | 1.000 | 0.051 | 0.052 | 0.052 | 0.068 | 0.049 | 0.061 | 0.052 |
| (0, 0.5) | 0.961 | 0.918 | 0.622 | 0.499 | 0.578 | 0.444 | 0.572 | 0.455 |
| (0, 1) | 0.983 | 0.950 | 0.681 | 0.546 | 0.620 | 0.508 | 0.631 | 0.505 |
| (1, 0.1) | 0.999 | 0.585 | 0.292 | 0.242 | 0.229 | 0.204 | 0.226 | 0.216 |
| (1, 0.25) | 0.995 | 0.865 | 0.506 | 0.405 | 0.432 | 0.324 | 0.433 | 0.325 |
| (1, 0.5) | 0.997 | 0.926 | 0.591 | 0.481 | 0.542 | 0.413 | 0.536 | 0.425 |
| (0.1, 1) | 0.984 | 0.951 | 0.681 | 0.529 | 0.622 | 0.478 | 0.610 | 0.476 |
| (0.25, 1) | 0.986 | 0.944 | 0.665 | 0.521 | 0.600 | 0.459 | 0.601 | 0.469 |
| (0.5, 1) | 0.983 | 0.951 | 0.654 | 0.517 | 0.612 | 0.472 | 0.587 | 0.446 |
| (0.5, 0.5) | 0.984 | 0.918 | 0.625 | 0.488 | 0.587 | 0.423 | 0.575 | 0.431 |
| (1, 1) | 0.994 | 0.958 | 0.660 | 0.527 | 0.629 | 0.488 | 0.620 | 0.489 |

It can be seen that when αM = αI = 0, the size of the overall score test is close to the nominal p-value threshold of 0.05. When αM > 0 and αI = 0, the false positive rate of the score test for interaction effect is also well controlled. When αI > 0, the power of the interaction test quickly exceeds 0.90. In contrast, we observe that dimension reduction methods can have slightly inflated false positive rates and are dramatically under-powered when compared to the kernel machine based method.

Fig. 1 shows the ROC curves of the kernel method and alternative methods for interaction detection, obtained with the second simulation data. The power gain of the kernel method relative to the alternative methods is evident.

Figure 1.


Receiver operating characteristic (ROC) curves of the kernel method and alternative methods for interaction detection in the simulated data. False positive rates are plotted against true positive rates with the p-value threshold varying between 0 and 1 with a step size of 0.01.
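The construction of such a curve from simulation output can be sketched as follows: each run contributes a p-value and a flag for whether an interaction was actually simulated, and the threshold sweeps from 0 to 1 in steps of 0.01. The detection rule `p <= threshold` and the function names are assumptions made for illustration, and both classes must be present in the input.

```python
def roc_points(p_values, has_effect, step=0.01):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    n_pos = sum(has_effect)
    n_neg = len(has_effect) - n_pos
    points = []
    for k in range(round(1 / step) + 1):
        t = k * step                                  # current p-value threshold
        tp = sum(1 for p, e in zip(p_values, has_effect) if e and p <= t)
        fp = sum(1 for p, e in zip(p_values, has_effect) if not e and p <= t)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
```

A method with higher power at a given false positive rate traces a curve closer to the top-left corner and has a larger area under the curve.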

3.2 Application to ADNI data

APOE-ε4 status is significantly associated with hippocampal volume (p = 3.97 × $10^{-16}$), after controlling for ICV, age, gender and education. Table 3 shows the main effects of the CVD risk factors, and their interaction effects with APOE-ε4 status on hippocampal volume obtained by conventional interaction analyses, as well as the association between diagnosis (AD patients vs. cognitive normal controls) and each CVD risk factor obtained by logistic regression analyses. As expected, the association between age and hippocampal volume is highly significant, indicating the reduction in the size of the hippocampus over time. The FHS CVD risk score is also significantly associated with hippocampal volume: higher CVD risk scores are associated with smaller hippocampal volumes. Age also shows a suggestive interaction with APOE-ε4 status (p = 4.03 × $10^{-3}$), but this did not survive a Bonferroni correction for the total number of statistical tests performed here.

Table 3.

Results of standard regression analyses with two different outcomes: hippocampal volume and Alzheimer's disease (AD) diagnosis (AD vs. control). The p-values for the main effects of the Framingham Heart Study (FHS) cardiovascular disease (CVD) risk score and each CVD risk factor, and their interaction effects with APOE-ε4 status on hippocampal volume obtained by conventional interaction analyses are presented. The p-values for the association between diagnosis (AD patients vs. cognitive normal controls) and each CVD risk factor obtained by logistic regression analyses are shown. Significant associations, with Bonferroni corrected p-values smaller than 0.05, are highlighted in bold.

| Risk factor | Covariates adjusted | Hippocampal volume (linear regression): main effect | Hippocampal volume (linear regression): interaction with APOE-ε4 | AD vs. control (logistic regression) |
|---|---|---|---|---|
| FHS risk score | ICV, Edu a | 5.12 × $10^{-4}$ | 0.132 | 0.761 |
| Age | ICV, Edu, Gender | 3.99 × $10^{-18}$ | 4.03 × $10^{-3}$ | 0.301 |
| Gender | ICV, Edu, Age | 0.103 | 0.982 | 0.467 |
| Body mass index (BMI) | ICV, Edu, Age, Gender | 2.37 × $10^{-3}$ | 0.227 | 0.011 |
| Systolic blood pressure | ICV, Edu, Age, Gender | 0.832 | 0.591 | 0.077 |
| Smoking status | ICV, Edu, Age, Gender | 0.062 | 0.112 | 0.974 |
| Diabetes | ICV, Edu, Age, Gender | 0.609 | 0.541 | 0.247 |

Table 4 lists the ReML estimates of $\tau_G^2$, $\tau_W^2$, $\tau_{G\times W}^2$ and $\sigma^2$, and the p-values for the interaction effects between each of the 21 candidate AD risk genes and the CVD risk factors on hippocampal volume. Two genes, CR1 (p = 4.85 × $10^{-4}$) and EPHA1 (p = 5.64 × $10^{-4}$), are identified to have significant interaction with the CVD risk factors.

Table 4.

Results of the multivariate interaction analyses. The restricted maximum likelihood (ReML) estimates of $\tau_G^2$, $\tau_W^2$, $\tau_{G\times W}^2$ and $\sigma^2$, and the p-values for the interaction effects between each of the 21 candidate Alzheimer's disease (AD) risk genes and the cardiovascular disease (CVD) risk factors on hippocampal volume, using the kernel method and the alternative methods based on dimension reduction and multiple regression, are shown. p-values that survive multiple testing corrections are highlighted in bold. PCg: first principal component of the genetic data; PCw: first principal component of the CVD risk factors; RVT: rare variant test burden variable; WST: weighted sum test burden variable; FHS: the Framingham Heart Study vascular disease risk score.

| Gene | Kernel: $\tau_G^2$ | Kernel: $\tau_W^2$ | Kernel: $\tau_{G\times W}^2$ | Kernel: $\sigma^2$ | Kernel: p-value | Alternative: PCg × FHS | Alternative: PCg × PCw | Alternative: RVT × FHS | Alternative: RVT × PCw | Alternative: WST × FHS | Alternative: WST × PCw |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ABCA7 | 3.44E-7 | 3.24E-2 | 9.03E-4 | 0.286 | 0.479 | 0.138 | 0.253 | 0.215 | 0.183 | 0.291 | 0.258 |
| BIN1 | 6.41E-4 | 1.66E-2 | 2.03E-2 | 0.279 | 0.167 | 0.861 | 0.615 | 0.165 | 0.296 | 0.049 | 0.441 |
| CASS4 | 2.17E-3 | 3.31E-2 | 3.44E-7 | 0.284 | 0.492 | 0.522 | 0.889 | 0.971 | 0.565 | 0.742 | 0.838 |
| CD2AP | 3.44E-7 | 4.31E-2 | 3.44E-7 | 0.290 | 0.954 | 0.150 | 0.737 | 0.145 | 0.618 | 0.229 | 0.728 |
| CD33 | 7.76E-5 | 4.39E-2 | 3.44E-7 | 0.289 | 0.845 | 0.196 | 0.245 | 0.807 | 0.543 | 0.734 | 0.807 |
| CELF1 | 3.44E-7 | 2.74E-2 | 6.52E-3 | 0.284 | 0.284 | 0.591 | 0.626 | 0.102 | 0.115 | 0.230 | 0.216 |
| CLU | 1.62E-3 | 3.60E-2 | 3.44E-7 | 0.286 | 0.602 | 0.920 | 0.998 | 0.140 | 0.479 | 0.154 | 0.563 |
| CR1 | 3.44E-7 | 7.21E-3 | 4.68E-2 | 0.273 | **4.85E-4** | 0.559 | **4.08E-4** | 0.268 | 0.023 | 0.354 | 0.109 |
| DSG2 | 3.44E-7 | 3.51E-2 | 3.44E-7 | 0.286 | 0.566 | 0.232 | 0.135 | 0.823 | 0.367 | 0.617 | 0.202 |
| EPHA1 | 3.44E-7 | 3.44E-7 | 5.12E-2 | 0.273 | **5.64E-4** | 0.323 | 0.133 | 0.182 | 0.270 | 0.939 | 0.979 |
| FERMT2 | 2.05E-2 | 3.17E-2 | 5.93E-3 | 0.278 | 0.342 | 0.556 | 0.817 | 0.767 | 0.794 | 0.852 | 0.876 |
| HLA-DRB5 | 3.44E-7 | 3.80E-2 | 3.44E-7 | 0.286 | 0.759 | 0.764 | 0.388 | 0.763 | 0.388 | 0.763 | 0.388 |
| INPP5D | 6.18E-4 | 3.44E-2 | 3.44E-7 | 0.285 | 0.527 | 0.910 | 0.395 | 0.267 | 0.022 | 0.375 | 0.156 |
| MEF2C | 1.74E-3 | 2.08E-2 | 1.77E-2 | 0.279 | 0.058 | 0.125 | 0.408 | 0.085 | 0.436 | 0.067 | 0.418 |
| MS4A6A | 9.32E-4 | 3.77E-2 | 3.44E-7 | 0.287 | 0.902 | 0.769 | 0.969 | 0.779 | 0.945 | 0.781 | 0.926 |
| PICALM | 1.89E-2 | 3.91E-2 | 3.44E-7 | 0.281 | 0.679 | 0.473 | 0.295 | 0.580 | 0.390 | 0.922 | 0.294 |
| PTK2B | 1.68E-3 | 3.25E-2 | 9.00E-4 | 0.284 | 0.460 | 0.779 | 0.615 | 0.630 | 0.406 | 0.043 | 0.048 |
| REST | 3.66E-2 | 2.11E-2 | 1.95E-2 | 0.273 | 0.189 | 0.732 | 0.497 | 0.800 | 0.578 | 0.694 | 0.671 |
| SLC24A4 | 8.22E-3 | 1.64E-2 | 2.35E-2 | 0.277 | 0.211 | 0.345 | 0.029 | 0.634 | 0.222 | 0.942 | 0.268 |
| SORL1 | 2.30E-3 | 4.77E-2 | 3.44E-7 | 0.290 | 0.919 | 0.663 | 0.521 | 0.129 | 0.481 | 0.126 | 0.455 |
| ZCWPW1 | 3.44E-7 | 2.85E-2 | 3.80E-3 | 0.284 | 0.265 | 0.634 | 0.373 | 0.600 | 0.422 | 0.574 | 0.437 |
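As a quick check of the multiple-testing step, the kernel-method p-values from Table 4 can be compared against the Bonferroni threshold of 0.05/21. The helper below is a hypothetical illustration, not part of the original analysis pipeline.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return the Bonferroni threshold and the sorted names passing it."""
    threshold = alpha / len(p_values)
    hits = sorted(name for name, p in p_values.items() if p < threshold)
    return threshold, hits

# Kernel-method interaction p-values copied from Table 4.
kernel_p = {
    "ABCA7": 0.479, "BIN1": 0.167, "CASS4": 0.492, "CD2AP": 0.954,
    "CD33": 0.845, "CELF1": 0.284, "CLU": 0.602, "CR1": 4.85e-4,
    "DSG2": 0.566, "EPHA1": 5.64e-4, "FERMT2": 0.342, "HLA-DRB5": 0.759,
    "INPP5D": 0.527, "MEF2C": 0.058, "MS4A6A": 0.902, "PICALM": 0.679,
    "PTK2B": 0.460, "REST": 0.189, "SLC24A4": 0.211, "SORL1": 0.919,
    "ZCWPW1": 0.265,
}
threshold, hits = bonferroni_significant(kernel_p)
```

With these inputs, only CR1 and EPHA1 fall below the threshold, in line with the text.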

Fig. 2 shows the average of the estimated interaction effect ĥG×W within each of the four regimes (low genetic risk and low CVD risk, high genetic risk and low CVD risk, low genetic risk and high CVD risk, high genetic risk and high CVD risk) for the two genes CR1 and EPHA1 that show a significant interaction effect. For both genes, CVD risks largely dominate the interaction effect with higher CVD risk associated with higher risk of interaction and vice versa. The genetic risk appears to have an opposite effect of its marginal effect on interaction under high CVD risk, i.e., high genetic risk reduces the interaction effect in the presence of high CVD risk. One interpretation of this interaction pattern is that under low genetic risk, CVD risk factors have a more detrimental effect.

Figure 2.


Direction of significant interaction effects. For genes CR1 and EPHA1 that show significant interaction effect, genetic variables and cardiovascular disease (CVD) risk factors were collapsed into scalar variables; the RVT burden variable and the FHS risk score, respectively. The average of the estimated interaction effect ĥG×W within each of the four regimes (low genetic risk and low CVD risk, high genetic risk and low CVD risk, low genetic risk and high CVD risk, high genetic risk and high CVD risk) is shown with a standard error estimate obtained by Jackknife resampling. A smaller average indicates a higher risk of the interaction effect (smaller hippocampal volume).

Two-sample t-tests showed that subjects with stable MCI have significantly larger ĥG×W (lower risk) than subjects that progressed to AD for both genes (CR1, p = 0.049; EPHA1, p = 0.044), suggesting that disease progression is predicted by the interaction effect.

3.3 Comparison to alternative methods

Table 4 also shows the p-values for the alternative methods to test the interaction effect. PCA on both the genetic data and CVD risk factors, followed by multiple linear regression analyses, also identified CR1 with a FWE-corrected significant p-value, but failed to find EPHA1. Other alternative methods did not identify any significant interaction effect.

4 Discussion

In this paper, we have proposed a kernel machine based method to test for interactions between multidimensional variable sets. Compared to traditional collapsing and PCA-based methods, the proposed method provides a more flexible and biologically plausible way to model epistasis between genetic variants, accommodates multiple factors that potentially moderate genetic effects, and can test for complex interaction effects between multidimensional variable sets. Although multivariate methods typically produce more powerful and reproducible results, which can also be biologically more insightful, the interpretation of model parameters is often challenging. In this paper, we made some preliminary attempts to reveal the direction of interaction between multidimensional variable sets and investigate the prediction of disease progression by interaction effects. Further improvement of model interpretation would be facilitated by incorporating more biological information when a better understanding of the underlying mechanisms is achieved.

One particular case where model interpretation might be straightforward is when we use a linear kernel, as we did to model non-genetic effects. In our analyses, the non-genetic effect $h_W$ can be represented as a linear combination of the CVD risk factors: $h_W = W\beta_W$, where $W = [W_1, \cdots, W_N]^T$ collects the individual CVD risk factors. The linear coefficients $\beta_W$ reflect the influence of each variable on the phenotype. The covariance matrix of the coefficient estimates can be computed as

$$\mathrm{cov}(\hat{\beta}_W - \beta_W) = \left(\frac{\tau_W^2}{R}\right) I - \left(\frac{\tau_W^2}{R}\right)^2 W^T P W, \tag{19}$$

where $R$ is the number of non-genetic variables, $P = V^{-1} - V^{-1}X(X^TV^{-1}X)^{-1}X^TV^{-1}$, and $V = \tau_G^2K_G + \tau_W^2K_W + \tau_{G\times W}^2K_{G\times W} + \sigma^2I$. An estimate of this covariance structure can be obtained by inserting the ReML estimates of the variance component parameters $\tau_G^2$, $\tau_W^2$, $\tau_{G\times W}^2$ and $\sigma^2$ into Eq. (19), assuming that the error of the ReML estimation can be ignored. Supplementary Table S1 presents the point estimates and standard errors for each element of $\beta_W$, for the ADNI analyses corresponding to each one of the 21 candidate AD risk genes. The above strategy does not apply to nonlinear kernels, but individual subjects can be examined by inspecting the estimated main and interaction effects $\hat{h}_G$, $\hat{h}_W$, $\hat{h}_{G\times W}$, and their variabilities. More specifically, $\mathrm{cov}(\hat{h}_{G\times W} - h_{G\times W}) = \tau_{G\times W}^2K_{G\times W} - (\tau_{G\times W}^2K_{G\times W})P(\tau_{G\times W}^2K_{G\times W})$, and the variability of $\hat{h}_G$ and $\hat{h}_W$ can be quantified analogously. Analyses of individual subjects may provide additional information about the model, but we consider this beyond the scope of the present paper.
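A numerical sketch of Eq. (19) is given below, plugging variance component estimates into the formula. It assumes the linear kernel is $K_W = WW^T/R$, which is what the factor $\tau_W^2/R$ suggests. The function and argument names are hypothetical, and the kernel matrices are supplied by the caller.

```python
import numpy as np

def projection_matrix(V, X):
    """P = V^-1 - V^-1 X (X^T V^-1 X)^-1 X^T V^-1 for symmetric V."""
    Vinv = np.linalg.inv(V)
    VinvX = Vinv @ X
    return Vinv - VinvX @ np.linalg.solve(X.T @ VinvX, VinvX.T)

def cov_beta_W(W, X, K_G, K_GW, tau_G2, tau_W2, tau_GW2, sigma2):
    """Covariance of (beta_W_hat - beta_W) as in Eq. (19)."""
    N, R = W.shape
    K_W = W @ W.T / R                       # assumed linear-kernel scaling
    V = tau_G2 * K_G + tau_W2 * K_W + tau_GW2 * K_GW + sigma2 * np.eye(N)
    P = projection_matrix(V, X)
    c = tau_W2 / R
    return c * np.eye(R) - c ** 2 * (W.T @ P @ W)
```

Standard errors for the individual coefficients would then be the square roots of the diagonal entries of the returned matrix.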

Due to the moderate sample size in the present study, we constrained our analysis to a list of candidate late-onset AD risk genes. However, the proposed method can be applied to genome-wide interaction studies. In particular, we note that when testing for the overall genetic and non-genetic effect, the variance component parameters $\tau_G^2$, $\tau_W^2$, and $\tau_{G\times W}^2$ need not be estimated. Therefore, the overall score test offers an efficient and non-iterative approach to screen the whole genome for genetic variants that might show significant contribution to the phenotypic variation. Fitting the full model, estimating the variance components, and testing for interactions can then focus on genetic components with significant overall effect, which will dramatically reduce computational burden. A similar argument applies to voxel-/vertex-wise interaction studies.

We would like to note that most of the CVD risk factors we employed in our ADNI analyses are largely endogenous and thus are, to some extent, under genetic control. Although this might make the interpretation of the results difficult, this challenge, we believe, exists in many interaction effects probed and detected in the genetics literature. Furthermore, even though the non-genetic variables we used are collectively associated with cardiovascular risk, and thus our interpretation of the detected interaction effects as genetic influences modified by cardiovascular risk is highly likely, alternative explanations that do not involve cardiovascular mechanisms are also possible. Finally, while hippocampal volume is a sensitive biomarker of AD, it is not solely related to this condition. In fact, we conducted additional analyses with entorhinal cortex thickness and volume (also MRI markers of late-onset AD) as alternative outcome variables. These analyses (not included here) did not reveal any statistically significant interaction effect. Although our presented results demonstrated that the detected interaction effects with hippocampal volume predict future MCI-to-AD conversion, one possibility is that these associations might not be specific to AD. Elucidating these issues is beyond the scope of this paper and will require careful follow-up studies that will consider all alternative possibilities.

Another potential concern in the present study is that we took the coding regions and 20kb up/downstream of the 21 candidate genes as units for interaction detection. Although Lambert et al. [2013] examined all SNPs that have strong associations with the top SNPs to confirm the relevance of these genes, we are aware that they are likely not the causative genes. Also, the size of the regulatory region of different genes may vary substantially. Therefore, an alternative strategy is to group SNPs in high LD with the most associated SNP, whether or not they are in or close to the nearest gene.

The choice of kernels may have an impact on the validity and power of the method too. In the present study, we employed an IBS kernel for the genetic data and a linear kernel for the CVD risk factors, as both kernels are parameter free and can in principle capture complex epistatic effects between genetic variants and model the joint effect of multiple non-genetic variables. We found, through simulation studies, that the proposed selection of kernels appears to work well in our setting, both in terms of false positive rate control and statistical power. Using other kernel functions, e.g., the Gaussian kernel for combining non-genetic factors, is certainly possible, but might require preselecting or estimating additional parameters. Our preliminary implementation (results not shown) suggests that incorporating the estimation of the spread parameter in the Gaussian kernel into the ReML algorithm might lead to unstable estimates and failure of convergence. The performance of various kernel functions in different data structures, and the optimal selection of kernels, deserve future investigation.

Although we illustrated the proposed method using a univariate quantitative image derived phenotype, genes as units to group SNPs, and CVD risk factors as non-genetic variables, the modeling framework is general and can be applied to other types of phenotypic and genetic data, and to detecting other types of interactions such as genotype-by-environment interactions. Our method can also be extended to accommodate binary outcomes, and thus has potential wide applications to case-control studies. Recently, a series of papers have been published on the proper modeling of longitudinal and time-to-event data in neuroimaging studies [Bernal-Rusiel et al., 2013a,b; Sabuncu et al., 2014]. Incorporating genetic components and interactions in longitudinal and survival models, and investigating the genetic contributions to the progression of a brain-related illness and the timing of a clinical event of interest, seem promising directions for future research.

Two genes, CR1 and EPHA1, were identified to have significant interaction effects with the CVD risk factors in the present study. The associations between the two genes and AD have been identified and replicated by a number of independent studies [see e.g., Harold et al., 2009; Hollingworth et al., 2011; Lambert et al., 2009; Naj et al., 2011], in addition to Lambert et al. [2013], and their potential contributions to the mechanism of AD have been under active investigation [Biffi et al., 2012; Chibnik et al., 2011; Thambisetty et al., 2013]. Moreover, recent studies show that many of the AD risk genes have potential roles in relationship with CVD risk factors, such as hypertension, hypercholesterolemia, and obesity [Guerreiro et al., 2012; Liu et al., 2014]. In particular, excess adiposity may act as an enhanced substrate for CR1-related inflammatory events [Guerreiro et al., 2012]. Our findings indicate that genetic components may contribute to the etiology of late-onset AD in the presence of CVD risks, and warrant further investigations.


Highlights.

  • Novel kernel machine based method for detecting interaction effects

  • Method can model epistatic effects

  • Method can accommodate multiple environmental variables

  • We show novel gene-cardiovascular risk interaction relevant to Alzheimer's

Acknowledgements

Data collection and sharing for this project was funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: Alzheimer's Association; Alzheimer's Drug Discovery Foundation; BioClinica, Inc.; Biogen Idec Inc.; Bristol-Myers Squibb Company; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; GE Healthcare; Innogenetics, N.V.; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Medpace, Inc.; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Synarc Inc.; and Takeda Pharmaceutical Company. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer's Disease Cooperative Study at the University of California, San Diego. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.

This research was carried out in whole or in part at Athinoula A. Martinos Center for Biomedical Imaging at Massachusetts General Hospital, using resources provided by Center for Functional Neuroimaging Technologies, P41EB015896, a P41 Biotechnology Resource Grant supported by National Institute of Biomedical Imaging and Bioengineering (NIBIB), National Institutes of Health (NIH).

This research was also funded in part by NIH grants R01 EB015611-01 and U54 MH091657-03 (TEN), R01 NS083534, R01 NS070963, and NIH NIBIB 1K25EB013649-01 (MRS), K24MH094614 and R01 MH101486 (JWS), Wellcome Trust grants 100309/Z/12/Z and 098369/Z/12/Z (TEN), and a BrightFocus grant AHAF-A2012333 (MRS). Two additional R01 grants from the National Institute of Aging (R01 AG008122 and R01 AG016495) provided partial support for this research.

Appendix

A The ReML estimation of the linear mixed effects model

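The variance component parameters $\tau_G^2$, $\tau_W^2$, $\tau_{G\times W}^2$ and $\sigma^2$ in the linear mixed effects model (12) can be estimated via the restricted maximum likelihood (ReML) approach [Harville, 1977; Lindstrom and Bates, 1988].

Specifically, let $\theta = (\tau_G^2, \tau_W^2, \tau_{G\times W}^2, \sigma^2)^T$, and $V(\theta) = \tau_G^2K_G + \tau_W^2K_W + \tau_{G\times W}^2K_{G\times W} + \sigma^2I$. The log ReML likelihood can be written as

$$\ell(\tau_G^2, \tau_W^2, \tau_{G\times W}^2, \sigma^2) = -\frac{1}{2}\log|V(\theta)| - \frac{1}{2}\log|X^TV^{-1}(\theta)X| - \frac{1}{2}(y - X\hat{\beta})^TV^{-1}(\theta)(y - X\hat{\beta}), \tag{20}$$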

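where $\hat{\beta}$ is the best linear unbiased estimator (BLUE) of the regression coefficients $\beta$. The score equation of each unknown component of $\theta$ is

$$\frac{\partial \ell}{\partial \theta_i} = -\frac{1}{2}\mathrm{tr}\left\{P\frac{\partial V}{\partial \theta_i}\right\} + \frac{1}{2}(y - X\hat{\beta})^TV^{-1}\frac{\partial V}{\partial \theta_i}V^{-1}(y - X\hat{\beta}) = -\frac{1}{2}\mathrm{tr}\left\{P\frac{\partial V}{\partial \theta_i}\right\} + \frac{1}{2}y^TP\frac{\partial V}{\partial \theta_i}Py = 0, \tag{21}$$

where $P = V^{-1} - V^{-1}X(X^TV^{-1}X)^{-1}X^TV^{-1}$, $\partial V/\partial \tau_G^2 = K_G$, $\partial V/\partial \tau_W^2 = K_W$, $\partial V/\partial \tau_{G\times W}^2 = K_{G\times W}$, and $\partial V/\partial \sigma^2 = I$. The element of the observed information matrix is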

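$$-\frac{\partial^2 \ell}{\partial \theta_i \partial \theta_j} = -\frac{1}{2}\mathrm{tr}\left\{P\frac{\partial V}{\partial \theta_i}P\frac{\partial V}{\partial \theta_j}\right\} + y^TP\frac{\partial V}{\partial \theta_i}P\frac{\partial V}{\partial \theta_j}Py. \tag{22}$$

The element of the Fisher information (expected information) matrix is

$$E\left[-\frac{\partial^2 \ell}{\partial \theta_i \partial \theta_j}\right] = \frac{1}{2}\mathrm{tr}\left\{P\frac{\partial V}{\partial \theta_i}P\frac{\partial V}{\partial \theta_j}\right\}. \tag{23}$$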
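The step from the observed information in Eq. (22) to its expectation in Eq. (23) is not spelled out. One way to see it, sketched here under the model assumptions $E[y] = X\beta$ and $\mathrm{cov}(y) = V$ and writing $V_i = \partial V/\partial \theta_i$, uses the identities $PX = 0$ and $PVP = P$ together with the expectation of a quadratic form:

$$\begin{aligned}
E\left[y^T P V_i P V_j P y\right] &= \mathrm{tr}\left\{P V_i P V_j P V\right\} + \beta^T X^T P V_i P V_j P X \beta \\
&= \mathrm{tr}\left\{V_i P V_j P V P\right\} + 0 \\
&= \mathrm{tr}\left\{P V_i P V_j\right\}, \\
E\left[-\frac{\partial^2 \ell}{\partial \theta_i \partial \theta_j}\right] &= -\frac{1}{2}\mathrm{tr}\left\{P V_i P V_j\right\} + \mathrm{tr}\left\{P V_i P V_j\right\} = \frac{1}{2}\mathrm{tr}\left\{P V_i P V_j\right\}.
\end{aligned}$$

The expected information therefore involves only traces of matrix products and does not depend on $y$, which is what makes Fisher scoring convenient here.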

Here we employ the Newton-Raphson method, and in particular, the Fisher scoring algorithm, to solve Eq. (21) [Kenward and Roger, 1997]. Given an estimate of the unknown variance component parameters at the k-th iteration, $\theta^{(k)}$, the parameters are updated by

$$\theta^{(k+1)} = \theta^{(k)} + \left[\mathcal{I}^{(k)}\right]^{-1}\left.\frac{\partial \ell}{\partial \theta}\right|_{\theta^{(k)}}, \tag{24}$$

where $\mathcal{I}$ is the Fisher information matrix. At the beginning of the iteration process, all the variance components are initialized to var[y]/4. We use the following expectation maximization (EM) algorithm [Laird et al., 1987] as an initial step to determine the direction of the updates, because the EM algorithm is robust to poor starting values:

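$$\sigma^2 = \frac{y^TPy}{(N-p)}, \quad \theta_i^{(1)} = \frac{1}{N}\left[\left[\theta_i^{(0)}\right]^2 y^TP\frac{\partial V}{\partial \theta_i}Py + \mathrm{tr}\left\{\theta_i^{(0)}I - \left[\theta_i^{(0)}\right]^2 P\frac{\partial V}{\partial \theta_i}\right\}\right], \quad i = 1, 2, 3. \tag{25}$$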

After one EM update, we then switch to the Fisher scoring algorithm for the remaining iterations until the difference between successive log ReML likelihoods, $\ell^{(k+1)} - \ell^{(k)}$, is smaller than $10^{-4}$. In the iteration process, any component that escapes from the parameter space, i.e., a negative estimate, is set to $10^{-6}$ × var[y]. The approach to fit the null model, when $\tau_{G\times W}^2$ is fixed at zero, is analogous.
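The estimation procedure described in this appendix can be sketched in a few lines of NumPy. This is a simplified illustration rather than the authors' MATLAB implementation: it applies the EM-type update to every component, including $\sigma^2$, uses the sample variance for var[y], and relies on the identity $(y - X\hat{\beta})^TV^{-1}(y - X\hat{\beta}) = y^TPy$. The function names are hypothetical.

```python
import numpy as np

def reml_loglik(theta, y, X, dV):
    """Log ReML likelihood (up to a constant), P and Py for V = sum_i theta_i dV_i."""
    V = sum(t * D for t, D in zip(theta, dV))
    Vinv = np.linalg.inv(V)
    VinvX = Vinv @ X
    XtVinvX = X.T @ VinvX
    P = Vinv - VinvX @ np.linalg.solve(XtVinvX, VinvX.T)
    Py = P @ y
    ll = -0.5 * (np.linalg.slogdet(V)[1] + np.linalg.slogdet(XtVinvX)[1] + y @ Py)
    return ll, P, Py

def reml_fisher_scoring(y, X, kernels, max_iter=50, tol=1e-4):
    """Estimate variance components; the last entry of theta is sigma^2."""
    N = len(y)
    dV = list(kernels) + [np.eye(N)]            # derivatives of V w.r.t. theta_i
    var_y = np.var(y, ddof=1)
    theta = np.full(len(dV), var_y / len(dV))   # var[y]/4 with three kernels
    # One EM-type update (applied to all components in this sketch).
    _, P, Py = reml_loglik(theta, y, X, dV)
    theta = np.array([(t ** 2 * (Py @ D @ Py)
                       + np.trace(t * np.eye(N) - t ** 2 * (P @ D))) / N
                      for t, D in zip(theta, dV)])
    ll, P, Py = reml_loglik(theta, y, X, dV)
    for _ in range(max_iter):
        PD = [P @ D for D in dV]
        # Score vector, Eq. (21), and Fisher information, Eq. (23).
        score = np.array([-0.5 * np.trace(A) + 0.5 * (Py @ D @ Py)
                          for A, D in zip(PD, dV)])
        info = 0.5 * np.array([[np.sum(A * B.T) for B in PD] for A in PD])
        theta = theta + np.linalg.solve(info, score)          # Eq. (24)
        theta = np.where(theta <= 0, 1e-6 * var_y, theta)     # back into parameter space
        new_ll, P, Py = reml_loglik(theta, y, X, dV)
        converged = abs(new_ll - ll) < tol
        ll = new_ll
        if converged:
            break
    return theta, ll
```

Fitting the null model for the interaction test would amount to calling the same routine with only the two main-effect kernels.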

B Satterthwaite approximation to the score test

The score test statistic $S(\sigma_0^2)$ defined in Eq. (15) is approximated by a scaled chi-square distribution $\kappa\chi_\nu^2$. The scale parameter $\kappa$ and the degrees of freedom $\nu$ are calculated by matching the mean and variance of $S(\sigma_0^2)$ with those of $\kappa\chi_\nu^2$. Specifically,

$$\begin{cases} \delta = E[S(\sigma_0^2)] = \frac{1}{2}\mathrm{tr}\{P_0K\} = E[\kappa\chi_\nu^2] = \kappa\nu, \\ \rho = \mathrm{var}[S(\sigma_0^2)] = \frac{1}{2}\mathrm{tr}\{P_0KP_0K\} = \mathrm{var}[\kappa\chi_\nu^2] = 2\kappa^2\nu. \end{cases} \tag{26}$$

Solving the two equations leads to $\kappa = \rho/(2\delta)$ and $\nu = 2\delta^2/\rho$. In practice, the unknown model parameter $\sigma_0^2$ in $S$ is replaced by its ReML estimate $\hat{\sigma}_0^2$ under the null model $y = X\beta_0 + \varepsilon_0$. To account for this substitution, we follow Liu et al. [2007] and Li and Cui [2012], and replace $\rho$ by $\hat{\rho}$ based on the efficient information [Zhang and Lin, 2003]. It can be seen from Eq. (23) that the elements of the Fisher information matrix of $\theta$ are

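$$\mathcal{I}_{\tau\tau} = \frac{1}{2}\begin{bmatrix} \mathrm{tr}(P_0K_GP_0K_G) & \mathrm{tr}(P_0K_GP_0K_W) & \mathrm{tr}(P_0K_GP_0K_{G\times W}) \\ \mathrm{tr}(P_0K_WP_0K_G) & \mathrm{tr}(P_0K_WP_0K_W) & \mathrm{tr}(P_0K_WP_0K_{G\times W}) \\ \mathrm{tr}(P_0K_{G\times W}P_0K_G) & \mathrm{tr}(P_0K_{G\times W}P_0K_W) & \mathrm{tr}(P_0K_{G\times W}P_0K_{G\times W}) \end{bmatrix},$$

$$\mathcal{I}_{\tau\sigma^2} = \frac{1}{2}\left[\mathrm{tr}(P_0K_G), \mathrm{tr}(P_0K_W), \mathrm{tr}(P_0K_{G\times W})\right]^T, \quad \mathcal{I}_{\sigma^2\sigma^2} = \frac{1}{2}\mathrm{tr}(P_0P_0).$$

The efficient information is defined as $\hat{\mathcal{I}}_{\tau\tau} = \mathcal{I}_{\tau\tau} - \mathcal{I}_{\tau\sigma^2}\mathcal{I}_{\sigma^2\sigma^2}^{-1}\mathcal{I}_{\tau\sigma^2}^T$, and $\hat{\rho} = \mathrm{var}[S(\hat{\sigma}_0^2)]$ is estimated by the sum of all the elements in the matrix $\hat{\mathcal{I}}_{\tau\tau}$. With the adjusted parameters $\hat{\kappa} = \hat{\rho}/(2\delta)$ and $\hat{\nu} = 2\delta^2/\hat{\rho}$, the p-value of an observed score statistic $S(\hat{\sigma}_0^2)$ is then computed using the scaled chi-square distribution $\hat{\kappa}\chi_{\hat{\nu}}^2$.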
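The moment matching for the overall test can be sketched numerically as follows. Two points are assumptions inferred from the formulas above rather than stated in this section: $P_0$ is taken to be the projection $I - X(X^TX)^{-1}X^T$, and $K$ in Eq. (26) is taken to be the sum of the three kernels, which is consistent with $\hat{\rho}$ being the sum of all elements of the efficient information. The function name is hypothetical.

```python
import numpy as np

def satterthwaite_overall(X, kernels):
    """Return (kappa_hat, nu_hat) for the scaled chi-square approximation."""
    N = X.shape[0]
    P0 = np.eye(N) - X @ np.linalg.solve(X.T @ X, X.T)   # assumed form of P_0
    PK = [P0 @ K for K in kernels]
    delta = 0.5 * sum(np.trace(A) for A in PK)           # (1/2) tr{P0 K}, K = sum of kernels
    I_tt = 0.5 * np.array([[np.sum(A * B.T) for B in PK] for A in PK])
    I_ts = 0.5 * np.array([np.trace(A) for A in PK])
    I_ss = 0.5 * np.trace(P0 @ P0)
    I_eff = I_tt - np.outer(I_ts, I_ts) / I_ss           # efficient information
    rho_hat = I_eff.sum()                                # sum of all elements
    return rho_hat / (2 * delta), 2 * delta ** 2 / rho_hat
```

Given an observed statistic `S`, the p-value would be the upper tail probability of a chi-square variable with `nu_hat` degrees of freedom evaluated at `S / kappa_hat`, for example via `scipy.stats.chi2.sf(S / kappa_hat, nu_hat)` if SciPy is available.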

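Analogously, the score test statistic $S_I(\tau_G^2, \tau_W^2, \sigma^2)$ defined in Eq. (16) for the interaction effect is approximated by a scaled chi-square distribution $\kappa_I\chi_{\nu_I}^2$. The scale parameter $\kappa_I$ and the degrees of freedom $\nu_I$ are calculated by matching the mean and variance of $S_I$ with those of $\kappa_I\chi_{\nu_I}^2$. Specifically,

$$\begin{cases} \delta_I = E[S_I] = \frac{1}{2}\mathrm{tr}\{P_IK_{G\times W}\} = E[\kappa_I\chi_{\nu_I}^2] = \kappa_I\nu_I, \\ \rho_I = \mathrm{var}[S_I] = \frac{1}{2}\mathrm{tr}\{P_IK_{G\times W}P_IK_{G\times W}\} = \mathrm{var}[\kappa_I\chi_{\nu_I}^2] = 2\kappa_I^2\nu_I. \end{cases} \tag{27}$$

Solving the two equations gives $\kappa_I = \rho_I/(2\delta_I)$ and $\nu_I = 2\delta_I^2/\rho_I$. In practice, the unknown model parameters $\tau_G^2$, $\tau_W^2$ and $\sigma^2$ in $S_I$ are replaced by their ReML estimates $\hat{\tau}_G^2$, $\hat{\tau}_W^2$ and $\hat{\sigma}^2$ under the null model $y = X\beta + h_G + h_W + \varepsilon$. To account for this substitution, $\rho_I$ is replaced by $\hat{\rho}_I$ based on the efficient information [Zhang and Lin, 2003]. Specifically, $\hat{\kappa}_I = \hat{\rho}_I/(2\hat{\delta}_I)$ and $\hat{\nu}_I = 2\hat{\delta}_I^2/\hat{\rho}_I$, where $\hat{\delta}_I = \mathrm{tr}\{P_IK_{G\times W}\}/2$, and $\hat{\rho}_I = \mathrm{tr}\{P_IK_{G\times W}P_IK_{G\times W}\}/2 - \Psi\Lambda^{-1}\Psi^T/2$, in which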

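$$\Psi = \left[\mathrm{tr}(P_IK_{G\times W}P_I), \mathrm{tr}(P_IK_{G\times W}P_IK_G), \mathrm{tr}(P_IK_{G\times W}P_IK_W)\right],$$

and

$$\Lambda = \begin{bmatrix} \mathrm{tr}(P_I^2) & \mathrm{tr}(P_I^2K_G) & \mathrm{tr}(P_I^2K_W) \\ \mathrm{tr}(P_I^2K_G) & \mathrm{tr}(P_IK_GP_IK_G) & \mathrm{tr}(P_IK_GP_IK_W) \\ \mathrm{tr}(P_I^2K_W) & \mathrm{tr}(P_IK_WP_IK_G) & \mathrm{tr}(P_IK_WP_IK_W) \end{bmatrix}.$$

The p-value of an observed score statistic $S_I(\hat{\tau}_G^2, \hat{\tau}_W^2, \hat{\sigma}^2)$ is then computed using the scaled chi-square distribution $\hat{\kappa}_I\chi_{\hat{\nu}_I}^2$.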

Footnotes


1. We note that the FHS risk score was derived from real biological data. Thus the FHS risk score is likely suboptimal for detecting the simulated effects of the CVD risk factors on the phenotype.

