Abstract
Pleiotropy is a phenomenon in which a single gene exerts multiple correlated phenotypic effects, often characterized as traits, involving multiple biological systems. We propose a two-stage method to identify pleiotropic effects on multiple longitudinal traits from a family-based data set. The first stage analyzes each longitudinal trait via a three-level mixed-effects model. Random effects at the subject level and at the family level measure the subject-specific genetic effects and the between-subject intraclass correlations within families, respectively. The second stage performs a simultaneous association test between a single nucleotide polymorphism and all subject-specific effects for multiple longitudinal traits. This test uses a quasi-likelihood scoring method that adjusts for the correlation structure among related subjects. Two simulation studies are undertaken to assess both the type I error control and the power of the proposed method. Furthermore, we demonstrate the utility of the two-stage method in identifying pleiotropic genes or loci by analyzing the Genetic Analysis Workshop 16 Problem 2 cohort data drawn from the Framingham Heart Study, which illustrates the kind of data complexity that the proposed approach can handle. We establish that our two-stage method can identify pleiotropic effects whilst accommodating varying data types in the model.
Keywords: genetic association study, longitudinal data, mixed effects model, multiple traits, pleiotropy, quasi-likelihood, single nucleotide polymorphisms
1. Introduction
Studies of the genetic aetiology of complex diseases, such as type 2 diabetes and cardiovascular disease (CVD), have identified genetic elements as common contributing factors to these diseases. However, identification of specific genes that predispose humans to these complex diseases has been difficult (Newman et al., 2011). It is suspected that complex combinations of genetic components and non-genetic elements contribute to the occurrence of these diseases. In genome-wide association studies (GWAS), hundreds of thousands of genetic variants are tested for their individual association with a phenotypic trait of interest. GWAS are considered a practical approach to screening the entire human genome for disease-associated loci via common genetic variants such as single nucleotide polymorphisms (SNPs) (O'Reilly et al., 2012; Solovieff et al., 2013). GWAS have become practical as the cost of acquiring a dense panel of SNPs has become more affordable. Complex phenotypic traits may be governed by multiple genes and environmental factors, and are subjective and ad hoc in nature. In contrast, genotypes are, in general, definitive entities. Therefore, the core objective of GWAS is to use genotypes to characterize those phenotypic traits that are well defined in their biological associations with complex diseases.
Pleiotropy is a genetic phenomenon in which a single gene or genetic variant imposes two or more correlated phenotypic effects, often characterized as traits, involving two or more biological systems. A study of pleiotropic genes or loci may provide new knowledge about the evolution of genes and gene families as they relate to the aetiology of complex diseases (Hodgkin, 1998). The recent emergence of multiple-trait analysis in GWAS was not unforeseen, as clinical and epidemiological studies in humans capture multiple phenotype information (Shriner, 2012). For example, the Framingham Heart Study (FHS) includes multiple phenotypic measures, such as measurements of systolic blood pressure (SBP), total and high-density lipoprotein (HDL) cholesterol, and fasting glucose, to identify common characteristics that contribute to CVD; note that the aforementioned quantitative traits are now known to be some of the major risk factors of CVD. Shriner (2012) states that the statistical advantages of joint analysis of correlated traits include increased power to detect loci and increased precision of parameter estimation. Furthermore, performing joint analysis of correlated traits provides a means to (1) address the issue of varying types of pleiotropy and (2) investigate endophenotypes of complex traits, and thereby to better our understanding of the aetiology of complex diseases. A simple, traditional method for investigating pleiotropy involves multiple univariate analyses, in which a hypothesis test for an association between a genetic variant (e.g., SNP genotypes as the covariate) and a single trait (as the response variable) is performed for all complex traits in question over hundreds of thousands of genetic variants. This requires a subsequent step to determine whether or not the genetic variant is significantly associated with more than one trait. 
Inflation of the family-wise error rate (FWER) is of concern when performing multiple hypothesis tests, especially with an increasing number of phenotypic traits (Feng, 2014; Wang et al., 2014). Other methods have been proposed to analyze multiple correlated traits, such as methods based on multivariate linear mixed models and principal-component analysis (Zhou & Stephens, 2014; Stephens, 2013; Aschard et al., 2014).
Longitudinal studies provide well-documented advantages over cross-sectional studies, but they also have their challenges (Hedeker & Gibbons, 2006). To gain power in detecting strongly associated SNPs or genes, we attempt to take full advantage of the available longitudinal data to lay the foundations for more reliable causal inference. One particular advantage of longitudinal studies is the ability to model a dynamical system within subjects and to state statistical propositions about that system through statistical inference. Furthermore, the inclusion of repeated measurements of time-varying covariates in the model permits much stronger statistical inferences about this dynamical system. However, the presence of missing data and the dependency in the data impart significant complexity to the statistical modelling of longitudinal data (Hedeker & Gibbons, 2006). We overcome some of these challenges, and retain the positive features of a longitudinal study, via generalized linear mixed models (GLMMs). Application of GLMMs in longitudinal studies relaxes restrictive assumptions about the variance-covariance structure of the repeated measurements and about missing data across time. GLMMs are quite robust to missing data and to repeated measurements taken at unequal time points, thereby allowing analysis of unbalanced longitudinal data according to large sample theory (Hedeker & Gibbons, 2006). Furthermore, GLMMs conveniently accommodate both time-invariant and time-varying covariates. A particular feature of longitudinal studies that we aim to exploit is their multi-level data structure. The use of all available data from each subject in a longitudinal study via GLMMs enables us to predict both subject-specific and family-specific effects, leading to increased statistical power and decreased bias due to attrition (Hedeker & Gibbons, 2006). Methods for genetic association analysis with longitudinal traits can also be found in Sikorska et al. (2013) and Furlotte et al. (2012). However, Sikorska et al. (2013) state that fitting linear mixed models to longitudinal data can be computationally demanding, and they propose a computationally feasible solution for investigating the association between a given SNP and the longitudinal trait of interest. Their method is an approximate method based on a conditional two-step approach.
Family-based genome-wide SNP data with rare genetic variants and a complex pedigree structure pose problems of high dimensionality. While performing population-based GWAS is a simpler approach, it is susceptible to population stratification (Feng et al., 2011). Hence, effectively incorporating family-based designs in GWAS can provide robustness to the effect of population stratification in allele frequencies (Naylor et al., 2010). Newman et al. (2011) emphasize that failure to account for pedigree relationships affects statistical tests of association. In GWAS of multiplex families, affected subjects (e.g., subjects with type 2 diabetes) who have affected, biologically related subjects have a higher expected frequency of the allele that predisposes them to exhibit closely associated genetic conditions than do affected subjects with no affected relatives. As a result, the power to detect genetic association is expected to increase when affected subjects with affected relatives are included in the study. When related subjects are used in association studies, it is critical to account for the fact that subjects who are biologically related have correlated genotypes (Thornton & McPeek, 2007). The generalized quasi-likelihood scoring method (GQLSM), proposed by Feng (2014), is an extension of the generalized linear model framework that was designed to accommodate variables other than those of binary type. Furthermore, its capacity to integrate the correlation structure among related individuals was inherited from the quasi-likelihood scoring framework introduced by Bourgain et al. (2003) and Thornton & McPeek (2007). In studies of complex diseases, it is inevitable that different types of data are used to express the phenotypic traits and that multiple data types (e.g., binary, ordinal, count, quantitative) are collected.
Feng (2014) emphasizes that a model that can not only accommodate a variety of data types but also analyze these varying data types simultaneously is desirable and can provide a powerful tool in the field of statistical genetics. Here, we also address the confounding effects of population stratification by proposing a method that is robust to the effects of population structure; for example, the confounding effect of ethnicity is well recognized in the genetics literature as an effect of population heterogeneity (Feng et al., 2011).
We extend the two-step strategy introduced by Wang et al. (2014) and design an alternative statistical method to accommodate cases when the assumption of independent subjects is violated. We propose a two-stage method to identify pleiotropy on multiple longitudinal traits from family-based data. First, we analyze each longitudinal trait via a three-level mixed-effect model in which the repeated measurements are nested within subjects and the subjects are nested within families. Random effects predicted at the subject-level and the family-level, via GLMMs, represent the subject-specific genetic effects and between-subject intraclass correlations within families, respectively. Second, we perform a simultaneous association test between an SNP and all subject-specific effects for multiple longitudinal traits. The genetic association test is based on the GQLSM in which the correlation structure among related subjects is adjusted.
Our manuscript is organized as follows. In Section 2, we provide an overview of the proposed statistical method. The details about the simulation studies and their results are shared in Section 3. We applied the proposed method to analyze the Genetic Analysis Workshop 16 (GAW16) Problem 2 data drawn from the FHS. Section 4 provides descriptions of the original GAW16 Problem 2 data set, pre-processing steps taken for our analysis, and key findings from the analysis of GAW16 Problem 2 data. Discussion about the results and recommendations for future research are described in Section 5.
2. Statistical Methods
2.1. Generalized Linear Mixed Models for Longitudinal Traits
We model longitudinal data via a generalized linear mixed model (GLMM). In particular, each longitudinal phenotypic trait is modelled using a three-level mixed-effect model. Here, random effects at the subject level and at the family level measure the subject-specific genetic effects and the between-subject intraclass correlations within families, respectively. The random effects allow the correlation between the repeated measurements to be incorporated into the estimates of parameters, standard errors, and tests of hypotheses. We can conceptualize the random effects at the subject level as representing subject-specific differences in the propensity to respond over time, conditional on the values of the fixed effects included in the model (Hedeker & Gibbons, 2006).
Suppose we have a sample consisting of F independent families in an outbred population. Among n subjects, let ni be the number of subjects that are from the ith family. Then, we have the sample size of n = n1 + … + ni + … + nF. Let Xijk = (Xijk1, …, Xijkt, …, XijkTijk)′ be the vector of Tijk measurements of the kth trait of the jth subject from the ith family. A general form of a generalized linear mixed model (GLMM) is given by
$$g_k(\mu_{ijkt}) = \eta_{ijkt} = Z_{ijkt}'a_k + \Gamma_{ik} + \gamma_{ijk} \tag{1}$$
where gk(·) is the link function for the kth trait; μijkt, ηijkt, and Zijkt are the conditional mean of Xijkt given Zijkt, the linear predictor, and the vector of covariates associated with the kth trait, respectively, for the jth subject from the ith family at time t; ak is a vector of fixed effects for the covariates Zijkt; Γik is the ith family-specific effect (random effect) on the kth trait; and γijk is the jth subject-specific effect (random effect) on the kth trait. Note that Zijkt can include both time-varying and time-invariant covariates. For example, body-mass index (BMI) is a time-varying covariate and a well-known risk factor for CVD. A covariate such as a subject's sex is time-invariant and is assumed to be constant over the course of a longitudinal study. We provide the flexibility to choose different sets of covariates to be included in the GLMMs for different longitudinal traits, as denoted by the subscript k in Zijkt, where k = 1, 2, …, K. Moreover, the number of repeated measurements can vary from subject to subject, as denoted by the subscripts j and t, where j = 1, 2, …, ni and t = 1, 2, …, Tijk. The family-specific effect Γik can be defined as the effect of shared environmental factors, not accounted for by the confounding covariates, for the ith family on their repeated measurements of the kth trait. Similarly, we define the subject-specific effect γijk as the influence of the jth subject on his or her repeated measurements of the kth trait, which captures the unobservable effects of major genes and polygenes. For the kth trait, for simplicity, we assume that the Γik's follow a normal distribution with a mean of 0 and a trait-specific variance, and that the γijk's follow a normal distribution with a mean of 0 and a separate trait-specific variance. Note that it is not necessary for the γijk and Γik to follow a normal distribution.
For a continuous kth trait, a GLMM becomes a linear mixed-effect model such that $X_{ijkt} = Z_{ijkt}'a_k + \Gamma_{ik} + \gamma_{ijk} + \varepsilon_{ijkt}$, where the random error εijkt is assumed to follow a normal distribution with a mean of 0 and a trait-specific variance. In this example, gk(·) is an identity link such that gk(μijkt) = μijkt. We fit the GLMM, using the ‘lme4’ package in R, to obtain the predicted subject-specific effect γ̂ijk of the jth subject from the ith family on the kth trait (Bates et al., 2015a,b; Bates, 2014a,b). We denote these predicted subject-specific effects for the kth trait as a vector γ̂k = (γ̂11k, …, γ̂1n1k, …, γ̂F1k, …, γ̂FnFk). We then use the predicted subject-specific effects γ̂k on the kth trait as covariates in the second stage, which performs a simultaneous association test between an SNP and the subject-specific effects for the longitudinal traits. Furthermore, it is worth noting here that the fixed effects associated with confounding covariates are estimated using the GLMMs. As in the simulation studies of Section 3, if we have a longitudinal, binary trait Xij3t (e.g., hypertensive status), we can interpret the association between the subject-specific effect γij3 and the hypertensive status accordingly. We may state that the subject-specific effect γij3 represents the underlying genetic risk factors for the jth subject, from the ith family, that affect the log-odds of experiencing hypertension. This is the case because, for a binary trait, a logistic link can be used with gk(μijkt) = log[μijkt/(1 − μijkt)] (Wang et al., 2014).
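To make the idea of a predicted subject-specific effect concrete, the following minimal sketch computes shrinkage (BLUP-style) predictions for a deliberately simplified two-level random-intercept model, not the three-level model fitted with ‘lme4’ in this paper. The function name, the toy data, and the variance components (treated as known) are illustrative assumptions; the grand mean stands in for the fixed-effect part, which is only an approximation when the data are unbalanced.

```python
from statistics import mean


def predict_subject_effects(measurements, var_subject, var_error):
    """Shrinkage predictions of subject-specific random intercepts.

    measurements: dict mapping a subject id to its repeated measurements.
    var_subject, var_error: variance components, assumed known in this sketch.
    """
    # The grand mean stands in for the fixed-effect part of the model.
    all_values = [x for xs in measurements.values() for x in xs]
    grand_mean = mean(all_values)

    effects = {}
    for subject, xs in measurements.items():
        # Subjects with more repeated measurements are shrunk less toward zero.
        weight = var_subject / (var_subject + var_error / len(xs))
        effects[subject] = weight * (mean(xs) - grand_mean)
    return effects


# Hypothetical toy data: three subjects with unequal numbers of visits.
data = {"s1": [5.1, 4.9, 5.3], "s2": [6.8, 7.1], "s3": [4.0, 4.2, 3.9, 4.1]}
effects = predict_subject_effects(data, var_subject=1.0, var_error=0.5)
```

In the two-stage method, predictions of this kind (obtained from the full three-level fit) become the phenotypes that are tested against each SNP in the second stage.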
2.2. Genetic Association Study with Multiple Longitudinal Traits
From Section 2.1, we have acquired the predicted subject-specific effects on K traits, γ̂1, γ̂2, …, γ̂K, for a sample of n subjects from F independent families. For a given SNP, let Yi = (Yi1, …, Yini)′ represent the observed genotypes of subjects from the ith family, with Yij defined as the proportion of allele 1 in the observed genotype of the jth subject from the ith family, i.e., $Y_{ij} = \tfrac{1}{2} \times (\text{number of copies of allele 1})$, where $Y_{ij} = 0, \tfrac{1}{2}$, or 1 for all i = 1, 2, …, F, and j = 1, 2, …, ni. Under the Hardy-Weinberg equilibrium, 2Yij follows Binomial(2, πij), where πij is the expected frequency of allele 1 for the given SNP for the jth subject in the ith family. Then, we arrange the response vector such that $Y = (Y_1', \ldots, Y_F')'$, and the overall design matrix is of the form $\gamma = (\gamma_1', \ldots, \gamma_F')'$, where γi is an ni by (K + 1) design matrix with its first column consisting of 1's. In γi, the (k + 1)th column represents the subject-specific effects corresponding to the kth longitudinal trait for all subjects in the ith family. The jth row of the design matrix contains a 1 for the intercept and the K subject-specific effects for the jth subject from the ith family.
Feng (2014) proposes a logistic regression model to link the expected allele frequency of allele 1 with multiple traits. Here, we treat K subject-specific effects as K phenotypic traits, so
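$$\pi_{ij} = E(Y_{ij} \mid \gamma_{ij}) = \frac{\exp(\beta_0 + \beta_1\hat\gamma_{ij1} + \cdots + \beta_K\hat\gamma_{ijK})}{1 + \exp(\beta_0 + \beta_1\hat\gamma_{ij1} + \cdots + \beta_K\hat\gamma_{ijK})} \tag{2}$$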
If an SNP is associated with a longitudinal trait, it should be associated with the corresponding subject-specific effect, which includes the contribution of the SNP to the variation of the trait. Conversely, if the SNP is not associated with the longitudinal trait, it would not be associated with the subject-specific effect, and so the corresponding coefficient, say βk, should be 0. Then, an overall test of the association between an SNP and a set of longitudinal traits can be formulated as

$$H_0: \beta_1 = \beta_2 = \cdots = \beta_K = 0 \quad \text{versus} \quad H_1: \beta_k \neq 0 \text{ for at least one } k \in \{1, \ldots, K\}.$$
Here, the null hypothesis corresponds to the situation in which the SNP is not associated with any of the K longitudinal traits. Moreover, a logistic regression model provides the natural constraint that πij ∈ (0, 1) for all i and j. Under the null hypothesis, the mean of the response Yij given the subject-specific effects on K traits γij simplifies to $\pi_{ij} = \exp(\beta_0)/[1 + \exp(\beta_0)] = \pi$. Thus, the mean response vector becomes a constant vector of the form π = E(Y|γ) = E(Y) = π1. Under the null hypothesis, the overall covariance matrix of Y has the form $\Sigma_0 = \tfrac{1}{2}\pi(1 - \pi)\rho$, where ρ is the overall correlation matrix. The matrix ρ is block diagonal, with diagonal blocks ρi for i = 1, …, F. Each ρi represents the correlation among subjects from the ith family, and the zero matrices in the off-diagonal blocks reflect the absence of correlation between independent families. Within a family, the correlation matrix ρi can be calculated from the kinship and inbreeding coefficients based on the known relationships. For example, the correlation matrix of Yi is given by

$$\rho_i = \begin{pmatrix} 1 + \phi_1 & 2\phi_{12} & \cdots & 2\phi_{1n_i} \\ 2\phi_{21} & 1 + \phi_2 & \cdots & 2\phi_{2n_i} \\ \vdots & \vdots & \ddots & \vdots \\ 2\phi_{n_i1} & 2\phi_{n_i2} & \cdots & 1 + \phi_{n_i} \end{pmatrix},$$
where ϕj is the inbreeding coefficient of the jth subject from the ith family and ϕjj′ is the kinship coefficient between the jth subject and the j′th subject in the ith family. In an outbred population, ϕj = 0 for all j. Note that the requirement of known relationships can be relaxed if genome-wide genetic data are available from which the relationships can be inferred. The quasi-likelihood score functions form a (K + 1)-vector that has the form
$$U(\beta) = D'\Sigma^{-1}(Y - \pi) \tag{3}$$
where D is an n × (K + 1) derivative matrix of the form $D = \partial\pi/\partial\beta'$, and Σ is the covariance matrix of Y. Under the null hypothesis that β−0 = (β1, β2, …, βk, …, βK)′ = 0, the mean response vector is π = π1, the covariance matrix is Σ = Σ0, D = π(1 − π)γ, and $U(\beta) = 2\gamma'\rho^{-1}(Y - \pi 1)$. Under the same null hypothesis, the estimate of π is given by $\hat\pi = (1'\rho^{-1}1)^{-1}1'\rho^{-1}Y$, which can also be written as $\hat\pi = \left(\sum_{i=1}^{F} 1_i'\rho_i^{-1}1_i\right)^{-1}\sum_{i=1}^{F} 1_i'\rho_i^{-1}Y_i$, where 1i is the ni-vector of 1's. According to Cox & Hinkley (1974) and Heyde (1997), the quasi-likelihood score statistic is given by
$$W = U_{\beta_{-0}}(\hat\beta_0, 0)'\, I^{\beta_{-0}}(\hat\beta_0, 0)\, U_{\beta_{-0}}(\hat\beta_0, 0) \tag{4}$$

where $U_{\beta_{-0}}(\hat\beta_0, 0)$ is the vector of score functions given by Equation 3 in which the score function for β0 is omitted, and $I^{\beta_{-0}}(\hat\beta_0, 0)$ is the K × K matrix obtained from the inverse of the information matrix I(β) by omitting its first row and first column; both are computed under the null hypothesis that β−0 = 0. From Feng (2014), the W statistic can be derived explicitly and is given as
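$$W = \frac{2}{\hat\pi(1 - \hat\pi)}\,(Y - \hat\pi 1)'\rho^{-1}\gamma_{-0}\left[\gamma_{-0}'\rho^{-1}\gamma_{-0} - \gamma_{-0}'\rho^{-1}1\,(1'\rho^{-1}1)^{-1}\,1'\rho^{-1}\gamma_{-0}\right]^{-1}\gamma_{-0}'\rho^{-1}(Y - \hat\pi 1) \tag{5}$$

where γ−0 denotes the design matrix γ with its first column of 1's removed, or in an alternative form of

$$W = \frac{2}{\hat\pi(1 - \hat\pi)}\left[\sum_{i=1}^{F}\gamma_{i,-0}'\rho_i^{-1}(Y_i - \hat\pi 1_i)\right]'\left[\sum_{i=1}^{F}\gamma_{i,-0}'\rho_i^{-1}\gamma_{i,-0} - \left(\sum_{i=1}^{F}\gamma_{i,-0}'\rho_i^{-1}1_i\right)\left(\sum_{i=1}^{F}1_i'\rho_i^{-1}1_i\right)^{-1}\left(\sum_{i=1}^{F}1_i'\rho_i^{-1}\gamma_{i,-0}\right)\right]^{-1}\left[\sum_{i=1}^{F}\gamma_{i,-0}'\rho_i^{-1}(Y_i - \hat\pi 1_i)\right].$$

Under the null hypothesis, the W statistic follows a χ2-distribution with the degrees of freedom determined by the rank of the matrix $\gamma_{-0}'\rho^{-1}\gamma_{-0} - \gamma_{-0}'\rho^{-1}1\,(1'\rho^{-1}1)^{-1}\,1'\rho^{-1}\gamma_{-0}$. Thus, if the K subject-specific effects being tested are linearly independent, then $W \sim \chi^2_K$ asymptotically. The latter form of the W statistic breaks down a large sample of size n into F independent families. As a result, it achieves computational feasibility by circumventing the manipulation of high-dimensional matrices. As mentioned in Feng (2014), when the kth trait is tested individually, the W statistic for testing the association between an SNP and the kth trait can be rewritten as

$$W_k = \frac{2}{\hat\pi(1 - \hat\pi)} \cdot \frac{\left[(Y - \hat\pi 1)'\rho^{-1}\hat\gamma_k\right]^2}{\hat\gamma_k'\rho^{-1}\hat\gamma_k - (1'\rho^{-1}\hat\gamma_k)^2\,(1'\rho^{-1}1)^{-1}} \tag{6}$$

where γ̂k is the n-vector of predicted subject-specific effects on the kth trait and $W_k \sim \chi^2_1$ asymptotically.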
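The family-wise computation described above can be sketched in a few lines of plain Python. This is an illustration only, not the authors' implementation: it assumes the statistic takes the form W = 2 u′S⁻¹u / [π̂(1 − π̂)], with u the score for the trait coefficients and S the intercept-adjusted information, which is what results from substituting D = π(1 − π)γ and a null covariance of ½π(1 − π)ρ into the score statistic. The helper names `solve` and `w_statistic`, and the input layout, are hypothetical.

```python
def solve(a, b):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting.

    a: square matrix as a list of rows; b: right-hand sides as a list of rows.
    Assumes A is non-singular (true for a valid correlation matrix).
    """
    n = len(a)
    m = [list(ra) + list(rb) for ra, rb in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def w_statistic(families):
    """Return (W, pi_hat) from per-family blocks.

    families: list of (rho_i, y_i, g_i), where rho_i is the family correlation
    matrix, y_i the allele-1 proportions (0, 0.5, or 1), and g_i an n_i x K
    list of rows holding the predicted subject-specific effects.
    """
    k = len(families[0][2][0])
    s11 = s1y = 0.0                      # 1'R^-1 1 and 1'R^-1 Y
    s1g = [0.0] * k                      # 1'R^-1 G
    sgy = [0.0] * k                      # G'R^-1 Y
    sgg = [[0.0] * k for _ in range(k)]  # G'R^-1 G
    for rho, y, g in families:
        # One solve per family yields R_i^-1 applied to [1, Y_i, G_i].
        sol = solve(rho, [[1.0, yj] + list(gj) for yj, gj in zip(y, g)])
        for j, row in enumerate(sol):
            s11 += row[0]
            s1y += row[1]
            for a in range(k):
                s1g[a] += row[2 + a]
                sgy[a] += g[j][a] * row[1]
                for b in range(k):
                    sgg[a][b] += g[j][a] * row[2 + b]
    pi_hat = s1y / s11
    u = [sgy[a] - pi_hat * s1g[a] for a in range(k)]
    s = [[sgg[a][b] - s1g[a] * s1g[b] / s11 for b in range(k)] for a in range(k)]
    s_inv_u = [r[0] for r in solve(s, [[v] for v in u])]
    quad = sum(ua * va for ua, va in zip(u, s_inv_u))
    return 2.0 * quad / (pi_hat * (1.0 - pi_hat)), pi_hat


# Hypothetical toy input: one sib pair (correlation 0.5) and one unrelated pair.
toy = [
    ([[1.0, 0.5], [0.5, 1.0]], [0.0, 0.5], [[-1.0, 0.3], [0.1, -0.2]]),
    ([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.5], [[1.0, 0.4], [0.2, -0.6]]),
]
w_toy, pi_toy = w_statistic(toy)
```

Only matrices of size ni × ni and K × K are ever inverted, which is the computational advantage of the family-wise form noted above.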
3. Simulation Studies
3.1. Simulation Models and Methods
To assess the performance of the proposed two-stage method, we conducted simulation studies evaluating the type I error rate and the power of the association tests. The assessment of power compares the power obtained by testing multiple traits simultaneously with the power achieved by testing each trait individually.
To generate a family data set, we grow a family starting from two unrelated subjects as a couple. We define unrelated subjects in a family whose parental information is unknown as founders. For each couple, the number of offspring is generated according to a Poisson distribution with a mean of 3. Each offspring is then assigned an unrelated subject as a spouse with a probability of 0.8 to form an offspring couple. Then, the number of grand-offspring of this offspring couple is generated from a Poisson distribution with a mean of 3. Note that the unrelated spouse is defined as a founder as well. We grow a family for up to three generations. A family may stop growing before the completion of three generations, for example when a couple has no offspring. As a result, we generate families that are made up of 2 to 36 subjects with a mean size of about 9 subjects per family. The genealogy of each family is retained for calculating the correlation matrix ρ.
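The family-growing procedure and the resulting correlation matrix can be sketched as follows. This is a minimal illustration under stated assumptions rather than the simulation code used in the study: the function names are hypothetical, the Poisson sampler uses Knuth's multiplication method, and founders are assumed to be unrelated and non-inbred, so that ρ equals twice the kinship matrix.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson variate with Knuth's multiplication method."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def grow_family(rng, mean_offspring=3.0, p_spouse=0.8):
    """Return a parent list: None for founders, (father, mother) otherwise.

    Subjects are indexed so that parents always precede their children.
    """
    parents = [None, None]                 # founding couple: indices 0 and 1
    children = []
    for _ in range(poisson(rng, mean_offspring)):
        parents.append((0, 1))
        children.append(len(parents) - 1)
    for child in children:
        if rng.random() < p_spouse:
            parents.append(None)           # the spouse is an unrelated founder
            spouse = len(parents) - 1
            for _ in range(poisson(rng, mean_offspring)):
                parents.append((child, spouse))
    return parents


def correlation_matrix(parents):
    """Build rho = 2 * kinship matrix via the standard kinship recursion."""
    n = len(parents)
    phi = [[0.0] * n for _ in range(n)]
    for j in range(n):
        if parents[j] is None:
            phi[j][j] = 0.5                # non-inbred founder
            continue
        f, m = parents[j]
        for i in range(j):
            phi[i][j] = phi[j][i] = 0.5 * (phi[i][f] + phi[i][m])
        phi[j][j] = 0.5 * (1.0 + phi[f][m])
    return [[2.0 * v for v in row] for row in phi]


rng = random.Random(2024)
family = grow_family(rng)
rho = correlation_matrix(family)
```

With this coding, parent-offspring and full-sibling pairs receive a correlation of 0.5, grandparent-grandchild pairs 0.25, and spouses 0, which are the standard values for an outbred pedigree.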
Two simulation studies with 1 000 simulation replicates per study are implemented, each with sample sizes of n = 300, 500, and 1000. In both studies, we consider two quantitative traits X1 and X2, and one binary trait X3. These traits can be affected by five causal SNPs, denoted by G1, G2, …, G5, at different levels, i.e., each SNP affects at least one of the three traits. The effects of SNPs on the three traits are shown in Table 1. In Study 1, all five SNPs have genetic effects on all three traits at different levels. In Study 2, each of the five SNPs affects a different number of the three traits. For example, G3 has a genetic effect on X2 only, which is defined by setting the coefficients b13 = 0, b23 = 0.16, and b33 = 0 as shown in Table 1.
Table 1.
Effects of SNPs on longitudinal traits for Studies 1 and 2.
| SNP | Study 1: X1 | Study 1: X2 | Study 1: X3 | Study 2: X1 | Study 2: X2 | Study 2: X3 |
|---|---|---|---|---|---|---|
| G1 | b11 = 0.25 | b21 = 0.25 | b31 = 0.45 | b11 = 0.25 | b21 = 0.2 | b31 = 0.45 |
| G2 | b12 = 0.5 | b22 = 0.55 | b32 = 0.65 | b12 = 0.58 | b22 = 0 | b32 = 0.66 |
| G3 | b13 = 0.2 | b23 = 0.15 | b33 = 0.3 | b13 = 0 | b23 = 0.16 | b33 = 0 |
| G4 | b14 = 0.2 | b24 = 0.2 | b34 = 0.2 | b14 = 0.24 | b24 = 0.21 | b34 = 0 |
| G5 | b15 = 0.25 | b25 = 0.25 | b35 = 0.25 | b15 = 0 | b25 = 0 | b35 = 0.36 |
In practical situations, causal SNPs might not be genotyped. Instead, SNPs that are proximal to or in linkage disequilibrium (LD) with the causal SNPs are genotyped and available for the association analysis. To take this situation into account, we generate genotypes of both the causal SNPs and SNPs that are in LD with the causal ones. We denote the SNPs that are in LD with the causal ones by M1, M2, …, M5. To generate the SNP genotypes, we generate a pair of haplotypes for each subject. A haplotype refers to the combination of marker alleles on a single chromosome that are inherited as a unit from a single parent. We denote the haplotype for two SNPs Gr and Mr as Hr = (HGr, HMr) for r = 1, 2, …, 5. HGr and HMr take a value of 1 for having allele 1 and 0 for not having allele 1. Given a family, haplotypes of founders are generated from a bivariate Bernoulli distribution with a mean vector πr = (πGr, πMr) and a covariance matrix $\Sigma_r = \begin{pmatrix} \sigma_{G_r}^2 & \rho_r\sigma_{G_r}\sigma_{M_r} \\ \rho_r\sigma_{G_r}\sigma_{M_r} & \sigma_{M_r}^2 \end{pmatrix}$, where $\sigma_{G_r}^2 = \pi_{G_r}(1 - \pi_{G_r})$, $\sigma_{M_r}^2 = \pi_{M_r}(1 - \pi_{M_r})$, and the correlation for the rth pair of SNPs, ρr, is set at a fixed value between 0.7 and 0.9. Under random mating, a pair of haplotypes Hr is generated to make up the genotypes Gr and Mr of a founder. Haplotypes of non-founders (i.e., offspring) are generated according to the Mendelian Law of Segregation from each parent. Similarly, the pair of haplotypes Hr of an offspring makes up the genotypes of the two SNPs Gr and Mr for this offspring. Furthermore, for the assessment of the type I error rate, ten independent SNPs that are not associated with any of the three traits are generated. The results from the type I error rate assessments for these ten SNPs are accumulated in both studies, resulting in 10 000 simulation replicates per study.
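The haplotype-generation step can be sketched as below. The sketch assumes that each two-SNP haplotype is transmitted as a unit (no recombination between the causal SNP and its LD partner) and codes a genotype as the count of allele 1; the allele frequencies of 0.3 and the function names are hypothetical choices for illustration, since the frequencies used in the study are not restated here.

```python
import math
import random


def joint_probs(pi_g, pi_m, rho):
    """Joint distribution of (H_G, H_M) for a bivariate Bernoulli pair.

    The covariance rho * sqrt(pi_g(1 - pi_g) pi_m(1 - pi_m)) fixes P(1, 1);
    the remaining cells follow from the two marginal frequencies.
    """
    cov = rho * math.sqrt(pi_g * (1 - pi_g) * pi_m * (1 - pi_m))
    p11 = pi_g * pi_m + cov
    probs = {(1, 1): p11, (1, 0): pi_g - p11, (0, 1): pi_m - p11}
    probs[(0, 0)] = 1.0 - sum(probs.values())
    if min(probs.values()) < 0:
        raise ValueError("correlation not attainable for these frequencies")
    return probs


def draw_haplotype(rng, probs):
    """Sample one haplotype (H_G, H_M) from the joint distribution."""
    u, acc = rng.random(), 0.0
    for hap, p in probs.items():
        acc += p
        if u < acc:
            return hap
    return hap  # guard against floating-point rounding at the upper edge


def founder(rng, probs):
    """A founder carries two independently drawn haplotypes (random mating)."""
    return (draw_haplotype(rng, probs), draw_haplotype(rng, probs))


def offspring(rng, father, mother):
    """Mendelian segregation: one haplotype from each parent, chosen at random."""
    return (rng.choice(father), rng.choice(mother))


def genotype(pair):
    """Allele-1 counts at the causal SNP G and the LD SNP M."""
    return (pair[0][0] + pair[1][0], pair[0][1] + pair[1][1])


rng = random.Random(7)
probs = joint_probs(0.3, 0.3, 0.8)       # hypothetical frequencies, rho_r = 0.8
dad, mom = founder(rng, probs), founder(rng, probs)
child = offspring(rng, dad, mom)
g_count, m_count = genotype(child)
```

Dividing the allele-1 count by 2 gives the proportion coding Yij used in Section 2.2.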
Then, we generate two covariates Zijt1 and Zijt2 and a family-level random effect Γik, where Zijt1 is a binary covariate generated from Bernoulli(0.3), Zijt2 is a continuous covariate generated from Gamma(ψg, θg), and Γik is a family-specific effect generated from a normal distribution with a mean of 0 (Section 2.1). Here, Zijt1 is a time-varying, binary covariate that mimics a treatment status that may change over time. The second time-varying covariate Zijt2 is generated to mimic the age of a subject, which changes over time. Because the family data include members from three generations, the parameters of Gamma(ψg, θg) are estimated empirically using the GAW16 Problem 2 data set in order to generate more realistic age data. For example, when g = 1, the empirical mean and variance of subject age in the grandparent generation are used to estimate ψ1 and θ1. For each jth subject in the ith family who is a grandparent (g = 1), Tij measurements of age are generated from Gamma(ψ1, θ1) and are sorted in ascending order. We repeat this process to generate the ages of subjects in the second and third generations, i.e., for g = 2 and 3.
Given the generated covariates, genotypes of the causal SNPs, and the family-specific effect, we compute the linear predictor ηijkt for each kth longitudinal trait such that
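$$\eta_{ijkt} = a_{k0} + a_{k1}Z_{ijt1} + a_{k2}Z_{ijt2} + \sum_{r=1}^{5} b_{kr}G_{ijr} + \Gamma_{ik} \tag{7}$$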
for i = 1, …, F, j = 1, …, ni, k = 1, …, K, and t = 1, …, Tij. Then, two quantitative traits Xij1t and Xij2t are generated from N(μij1t, 1) and N(μij2t, 1) with identity links ηij1t = μij1t and ηij2t = μij2t, respectively. In addition, a binary trait Xij3t is generated from Bernoulli(μij3t), where μij3t = exp(ηij3t)/(1 + exp(ηij3t)). Table 1 summarizes the effects of SNPs on the longitudinal traits for Studies 1 and 2. For the fixed effects of covariates Zijt1 and Zijt2, we set a1 = (a10, a11, a12)′ = (0, 0.3, 0.5)′ for the first trait Xij1t and a2 = (a20, a21, a22)′ = (0, 0.2, −0.3)′ for the second trait Xij2t. For the binary trait Xij3t, we set a3 = (a30, a31, a32)′ = (−2.4, −1.6, 0.06)′. We use identical sets of fixed effects of covariates in Study 1 and Study 2.
For each simulated data set, we first fit the GLMM, based on Equation 1, to obtain a predicted subject-specific effect γijk, denoted as γ̂ijk, on each trait for the jth subject in the ith family. In the GLMM, both covariates Zijt1 and Zijt2 are included. Note that the family-specific effects, Γik's, on each trait for the F families are also predicted. However, the family-specific effects are not the focus of this manuscript, and so the results related to the predicted family-specific effects, Γ̂ik's, are not shown. Again, the fitting of the GLMM is implemented using the ‘lme4’ package in R (Bates et al., 2015a,b; Bates, 2014a,b). The predicted subject-specific effects γ̂ijk's are treated as phenotypes for the analysis in the second stage. For each SNP, we perform a simultaneous test on all three predicted subject-specific effects, γ̂ij1, γ̂ij2, and γ̂ij3, as in the overall hypothesis test. We compute the W statistic as given in Equation 5 and take the (1 − αF)th quantile of the $\chi^2_3$-distribution to be the rejection threshold. To perform the association test on each subject-specific effect, we compute the Wk statistic for k = 1, 2, 3, as given in Equation 6. The rejection threshold for the individual test is set at the (1 − α)th quantile of the $\chi^2_1$-distribution, where αF is the FWER that we try to control for multiple tests and α is obtained from the Bonferroni correction α = αF/3, a first-order approximation to the solution of αF = 1 − (1 − α)^3. We set αF = 0.05, 0.01, and 0.001 so that the corresponding α-levels for individual tests are set at 0.01667, 0.00333, and 0.00033, respectively.
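The per-test levels and rejection thresholds described above can be checked numerically with the standard library alone. In this sketch the function names are illustrative; the χ² quantile with 1 degree of freedom is obtained from the normal quantile, and the χ² quantile with 3 degrees of freedom by bisection on its closed-form CDF. The Šidák level, which solves αF = 1 − (1 − α)³ exactly, is included for comparison with the Bonferroni level αF/3.

```python
import math
from statistics import NormalDist

N_TRAITS = 3  # number of individual tests per SNP in the simulation


def chi2_1_quantile(p):
    """Quantile of the chi-square distribution with 1 degree of freedom."""
    # If Z is standard normal, Z**2 is chi-square(1): P(Z**2 <= q) = 2*Phi(sqrt(q)) - 1.
    z = NormalDist().inv_cdf((1.0 + p) / 2.0)
    return z * z


def chi2_3_cdf(x):
    """Closed-form CDF of the chi-square distribution with 3 degrees of freedom."""
    return math.erf(math.sqrt(x / 2.0)) - math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


def chi2_3_quantile(p, lo=0.0, hi=100.0):
    """Invert the chi-square(3) CDF by bisection on [lo, hi]."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_3_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def thresholds(alpha_f):
    """Per-test levels and rejection thresholds for a target FWER alpha_f."""
    alpha_bonferroni = alpha_f / N_TRAITS
    alpha_sidak = 1.0 - (1.0 - alpha_f) ** (1.0 / N_TRAITS)
    return {
        "alpha_bonferroni": alpha_bonferroni,
        "alpha_sidak": alpha_sidak,
        "individual_threshold": chi2_1_quantile(1.0 - alpha_bonferroni),
        "simultaneous_threshold": chi2_3_quantile(1.0 - alpha_f),
    }


results = {a: thresholds(a) for a in (0.05, 0.01, 0.001)}
```

Rounded to five decimals, the Bonferroni levels reproduce the values 0.01667, 0.00333, and 0.00033 quoted above, and the Šidák levels differ from them only slightly.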
3.2. Simulation Study Results
This section provides the results obtained from the simulation studies that evaluated the ability of the proposed two-stage method to control the type I error at the desired level of significance αF, and to attain the statistical power for the given significance level over the multiple hypothesis testing procedure with the Bonferroni correction. Table 2 lists the mean and standard errors of the fixed effect coefficient estimates attained from fitting GLMMs for the three longitudinal traits in both studies. The fitting of the GLMMs under different combinations of settings generally yields estimates of the coefficients with small bias and standard errors.
Table 2.
Mean and standard error of fixed-effect estimates using GLMMs and based on over 1 000 simulation replicates for sample sizes of n = 300, 500, and 1 000.
| n | Trait k | Fixed effect akℓ | Study 1: âkℓ† | Study 1: SE(âkℓ)‡ | Study 2: âkℓ† | Study 2: SE(âkℓ)‡ |
|---|---|---|---|---|---|---|
| 300 | 1 | a11 = 0.3 | 0.300 | 0.058 | 0.301 | 0.057 |
| 300 | 1 | a12 = 0.5 | 0.500 | 0.003 | 0.500 | 0.003 |
| 300 | 2 | a21 = 0.2 | 0.198 | 0.058 | 0.201 | 0.057 |
| 300 | 2 | a22 = −0.3 | −0.300 | 0.003 | −0.300 | 0.003 |
| 300 | 3 | a31 = −1.6 | −1.608 | 0.150 | −1.608 | 0.153 |
| 300 | 3 | a32 = 0.06 | 0.060 | 0.007 | 0.060 | 0.007 |
| 500 | 1 | a11 = 0.3 | 0.303 | 0.045 | 0.300 | 0.044 |
| 500 | 1 | a12 = 0.5 | 0.500 | 0.002 | 0.500 | 0.002 |
| 500 | 2 | a21 = 0.2 | 0.201 | 0.045 | 0.200 | 0.044 |
| 500 | 2 | a22 = −0.3 | −0.300 | 0.002 | −0.300 | 0.002 |
| 500 | 3 | a31 = −1.6 | −1.609 | 0.115 | −1.605 | 0.118 |
| 500 | 3 | a32 = 0.06 | 0.060 | 0.006 | 0.060 | 0.006 |
| 1000 | 1 | a11 = 0.3 | 0.299 | 0.032 | 0.299 | 0.031 |
| 1000 | 1 | a12 = 0.5 | 0.500 | 0.002 | 0.500 | 0.002 |
| 1000 | 2 | a21 = 0.2 | 0.201 | 0.032 | 0.200 | 0.031 |
| 1000 | 2 | a22 = −0.3 | −0.300 | 0.002 | −0.300 | 0.002 |
| 1000 | 3 | a31 = −1.6 | −1.601 | 0.082 | −1.606 | 0.083 |
| 1000 | 3 | a32 = 0.06 | 0.060 | 0.004 | 0.060 | 0.004 |

† and ‡ are mean values of âkℓ and SE(âkℓ) over 1000 simulation replicates, respectively, where k = 1, 2, 3 and ℓ = 1, 2.
3.2.1. Type I Error Rate Assessment
The empirical null rejection rates in Table 3 summarize the accumulated null rejection rates from the ten SNPs that are not associated with any of the traits in both Study 1 and Study 2. Under the null hypothesis, each SNP has 1000 simulation replicates, so we have 10 000 simulation replicates in total with which to assess the type I error rate. For the simultaneous tests, the empirical null rejection rates are very close to their corresponding nominal levels. For example, in Study 1, the empirical null rejection rates at αF = 0.05 are 0.0526, 0.0498, and 0.0506 for sample sizes of n = 300, 500, and 1000, respectively. The union of the individual rejection rates reports the empirical FWER among the three traits. We observe that the empirical FWERs are very close to their corresponding nominal levels of αF over all combinations of settings. For example, in Study 1, the empirical unions of null rejection rates over the three individual tests at αF = 0.05, or equivalently at α = 0.01667, are 0.0516, 0.0478, and 0.0499 for sample sizes of n = 300, 500, and 1000, respectively.
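One way to judge how close an empirical rejection rate is to its nominal level is to compare the difference with the Monte Carlo standard error of a proportion based on 10 000 replicates. The sketch below is an illustrative check, not part of the original analysis; it uses the three Study 1 simultaneous-test rates quoted in the paragraph above, and the function names are hypothetical.

```python
import math


def mc_standard_error(alpha, n_rep):
    """Standard error of an empirical rejection rate when the true rate is alpha."""
    return math.sqrt(alpha * (1.0 - alpha) / n_rep)


def z_score(rate, alpha, n_rep):
    """Number of Monte Carlo standard errors separating the rate from alpha."""
    return (rate - alpha) / mc_standard_error(alpha, n_rep)


N_REP = 10_000
ALPHA_F = 0.05
# Study 1, simultaneous test, as quoted in the text for n = 300, 500, 1000.
quoted_rates = {300: 0.0526, 500: 0.0498, 1000: 0.0506}

se = mc_standard_error(ALPHA_F, N_REP)
z_scores = {n: z_score(rate, ALPHA_F, N_REP) for n, rate in quoted_rates.items()}
```

At αF = 0.05 the standard error is about 0.0022, so each of the three quoted rates lies within two standard errors of the nominal level.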
Table 3.
Type I error rate assessments based on 10000 simulation replicates in each of the two studies for sample sizes of n = 300, 500, and 1000.
| n | Study | αF† | Individual: X1 | Individual: X2 | Individual: X3 | Individual: Union | Simultaneous Test |
|---|---|---|---|---|---|---|---|
| 300 | Study 1 | 0.05 | 0.0172 | 0.0187 | 0.0171 | 0.0516 | 0.0526 |
| | | 0.01 | 0.0032 | 0.0044 | 0.0039 | 0.0112 | 0.0117 |
| | | 0.001 | 0.0004 | 0.0006 | 0.0007 | 0.0016 | 0.0021 |
| | Study 2 | 0.05 | 0.0195 | 0.0166 | 0.0179 | 0.0526 | 0.0553 |
| | | 0.01 | 0.0045 | 0.0038 | 0.0034 | 0.0116 | 0.0129 |
| | | 0.001 | 0.0005 | 0.0002 | 0.0005 | 0.0012 | 0.0018 |
| 500 | Study 1 | 0.05 | 0.0182 | 0.0149 | 0.017 | 0.0487 | 0.0483 |
| | | 0.01 | 0.0029 | 0.002 | 0.0027 | 0.0076 | 0.0092 |
| | | 0.001 | 0.0002 | 0.0002 | 0.0005 | 0.0009 | 0.0014 |
| | Study 2 | 0.05 | 0.0175 | 0.0159 | 0.0157 | 0.0481 | 0.0498 |
| | | 0.01 | 0.0042 | 0.004 | 0.003 | 0.0112 | 0.0112 |
| | | 0.001 | 0.0006 | 0.0005 | 0.0003 | 0.0014 | 0.0014 |
| 1000 | Study 1 | 0.05 | 0.0165 | 0.018 | 0.0161 | 0.0499 | 0.0506 |
| | | 0.01 | 0.0031 | 0.0045 | 0.003 | 0.0106 | 0.0102 |
| | | 0.001 | 0.0004 | 0.001 | 0.0003 | 0.0014 | 0.001 |
| | Study 2 | 0.05 | 0.0179 | 0.0128 | 0.0153 | 0.0455 | 0.0467 |
| | | 0.01 | 0.0036 | 0.0027 | 0.0035 | 0.0097 | 0.0094 |
| | | 0.001 | 0.0004 | 0.0002 | 0.0008 | 0.0014 | 0.0011 |

† For αF = 0.05, 0.01, 0.001, α = 0.01667, 0.00333, 0.00033, respectively.
3.2.2. Power Assessment
We compared the power achieved by the simultaneous association test with the power achieved by the individual tests. Tables 4–6 summarize the results for sample sizes of n = 300, 500, and 1000 in Study 1. Similarly, Tables 7–9 summarize the results for sample sizes of n = 300, 500, and 1000 in Study 2. Note that the test with the higher power is indicated in boldface.
Table 4.
Power comparisons based on 1000 simulation replicates in Study 1 for a sample size of n = 300.
| SNP | αF† | Individual: X1 | Individual: X2 | Individual: X3 | Individual: Union | Simultaneous Test |
|---|---|---|---|---|---|---|
| G1 | 0.05 | 0.299 | 0.264 | 0.19 | 0.542 | 0.606 |
| | 0.01 | 0.129 | 0.127 | 0.085 | 0.291 | 0.394 |
| | 0.001 | 0.043 | 0.03 | 0.031 | 0.099 | 0.187 |
| G2 | 0.05 | 0.224 | 0.262 | 0.098 | 0.403 | 0.435 |
| | 0.01 | 0.132 | 0.155 | 0.052 | 0.248 | 0.302 |
| | 0.001 | 0.072 | 0.072 | 0.018 | 0.139 | 0.176 |
| G3 | 0.05 | 0.574 | 0.334 | 0.275 | 0.78 | 0.839 |
| | 0.01 | 0.379 | 0.164 | 0.137 | 0.526 | 0.666 |
| | 0.001 | 0.145 | 0.061 | 0.041 | 0.223 | 0.399 |
| G4 | 0.05 | 0.449 | 0.476 | 0.096 | 0.721 | 0.782 |
| | 0.01 | 0.279 | 0.282 | 0.025 | 0.482 | 0.582 |
| | 0.001 | 0.096 | 0.111 | 0.005 | 0.196 | 0.312 |
| G5 | 0.05 | 0.512 | 0.529 | 0.103 | 0.774 | 0.812 |
| | 0.01 | 0.307 | 0.33 | 0.047 | 0.53 | 0.63 |
| | 0.001 | 0.132 | 0.129 | 0.014 | 0.239 | 0.368 |
| M1 | 0.05 | 0.063 | 0.067 | 0.071 | 0.179 | 0.192 |
| | 0.01 | 0.016 | 0.018 | 0.021 | 0.054 | 0.079 |
| | 0.001 | 0.003 | 0.004 | 0.004 | 0.011 | 0.017 |
| M2 | 0.05 | 0.053 | 0.054 | 0.027 | 0.122 | 0.128 |
| | 0.01 | 0.017 | 0.02 | 0.016 | 0.05 | 0.054 |
| | 0.001 | 0.005 | 0.005 | 0.003 | 0.012 | 0.012 |
| M3 | 0.05 | 0.283 | 0.167 | 0.132 | 0.468 | 0.525 |
| | 0.01 | 0.134 | 0.078 | 0.045 | 0.228 | 0.293 |
| | 0.001 | 0.034 | 0.016 | 0.008 | 0.056 | 0.102 |
| M4 | 0.05 | 0.279 | 0.288 | 0.049 | 0.498 | 0.555 |
| | 0.01 | 0.141 | 0.15 | 0.012 | 0.278 | 0.33 |
| | 0.001 | 0.047 | 0.052 | 0.001 | 0.094 | 0.132 |
| M5 | 0.05 | 0.229 | 0.206 | 0.047 | 0.407 | 0.435 |
| | 0.01 | 0.098 | 0.092 | 0.011 | 0.184 | 0.23 |
| | 0.001 | 0.034 | 0.029 | 0.003 | 0.06 | 0.084 |

† For αF = 0.05, 0.01, 0.001, α = 0.01667, 0.00333, 0.00033, respectively.
Highest powers are noted in boldface.
Table 6.
Power comparisons based on 1000 simulation replicates in Study 1 for a sample size of n = 1000.
| SNP | αF† | Individual: X1 | Individual: X2 | Individual: X3 | Individual: Union | Simultaneous Test |
|---|---|---|---|---|---|---|
| G1 | 0.05 | 0.763 | 0.802 | 0.606 | 0.969 | 0.987 |
| | 0.01 | 0.609 | 0.604 | 0.403 | 0.881 | 0.951 |
| | 0.001 | 0.368 | 0.355 | 0.193 | 0.624 | 0.85 |
| G2 | 0.05 | 0.613 | 0.702 | 0.276 | 0.827 | 0.872 |
| | 0.01 | 0.455 | 0.539 | 0.139 | 0.678 | 0.767 |
| | 0.001 | 0.275 | 0.338 | 0.057 | 0.456 | 0.604 |
| G3 | 0.05 | 0.99 | 0.874 | 0.793 | 1 | 1 |
| | 0.01 | 0.955 | 0.725 | 0.616 | 0.991 | 0.999 |
| | 0.001 | 0.847 | 0.479 | 0.358 | 0.929 | 0.994 |
| G4 | 0.05 | 0.961 | 0.959 | 0.313 | 0.997 | 1 |
| | 0.01 | 0.875 | 0.898 | 0.161 | 0.982 | 0.999 |
| | 0.001 | 0.716 | 0.742 | 0.056 | 0.904 | 0.976 |
| G5 | 0.05 | 0.98 | 0.975 | 0.37 | 0.995 | 0.999 |
| | 0.01 | 0.915 | 0.925 | 0.21 | 0.985 | 0.994 |
| | 0.001 | 0.775 | 0.806 | 0.078 | 0.94 | 0.983 |
| M1 | 0.05 | 0.231 | 0.195 | 0.146 | 0.45 | 0.531 |
| | 0.01 | 0.109 | 0.087 | 0.055 | 0.224 | 0.317 |
| | 0.001 | 0.043 | 0.027 | 0.016 | 0.083 | 0.134 |
| M2 | 0.05 | 0.104 | 0.135 | 0.037 | 0.239 | 0.247 |
| | 0.01 | 0.048 | 0.054 | 0.012 | 0.105 | 0.124 |
| | 0.001 | 0.01 | 0.019 | 0.004 | 0.032 | 0.04 |
| M3 | 0.05 | 0.809 | 0.519 | 0.477 | 0.933 | 0.968 |
| | 0.01 | 0.645 | 0.317 | 0.272 | 0.793 | 0.908 |
| | 0.001 | 0.389 | 0.137 | 0.104 | 0.504 | 0.748 |
| M4 | 0.05 | 0.806 | 0.807 | 0.204 | 0.963 | 0.984 |
| | 0.01 | 0.614 | 0.653 | 0.092 | 0.85 | 0.93 |
| | 0.001 | 0.358 | 0.393 | 0.019 | 0.599 | 0.773 |
| M5 | 0.05 | 0.665 | 0.646 | 0.145 | 0.879 | 0.915 |
| | 0.01 | 0.447 | 0.437 | 0.056 | 0.671 | 0.785 |
| | 0.001 | 0.23 | 0.222 | 0.013 | 0.384 | 0.549 |

† For αF = 0.05, 0.01, 0.001, α = 0.01667, 0.00333, 0.00033, respectively.
Highest powers are noted in boldface.
Table 7.
Power comparisons based on 1000 simulation replicates in Study 2 for a sample size of n = 300.
| SNP | αF† | Individual: X1 | Individual: X2 | Individual: X3 | Individual: Union | Simultaneous Test |
|---|---|---|---|---|---|---|
| G1 | 0.05 | 0.261 | 0.171 | 0.198 | 0.484 | 0.578 |
| | 0.01 | 0.124 | 0.057 | 0.089 | 0.236 | 0.357 |
| | 0.001 | 0.039 | 0.011 | 0.032 | 0.077 | 0.165 |
| G2 | 0.05 | 0.318 | 0.026 | 0.104 | 0.359 | 0.359 |
| | 0.01 | 0.193 | 0.014 | 0.058 | 0.222 | 0.248 |
| | 0.001 | 0.115 | 0.006 | 0.02 | 0.125 | 0.151 |
| G3 | 0.05 | 0.019 | 0.386 | 0.012 | 0.386 | 0.384 |
| | 0.01 | 0.004 | 0.195 | 0.004 | 0.195 | 0.191 |
| | 0.001 | 0 | 0.067 | 0.001 | 0.067 | 0.058 |
| G4 | 0.05 | 0.665 | 0.52 | 0.015 | 0.839 | 0.881 |
| | 0.01 | 0.446 | 0.305 | 0.001 | 0.598 | 0.716 |
| | 0.001 | 0.22 | 0.127 | 0 | 0.313 | 0.453 |
| G5 | 0.05 | 0.017 | 0.018 | 0.211 | 0.211 | 0.228 |
| | 0.01 | 0.005 | 0.005 | 0.093 | 0.093 | 0.1 |
| | 0.001 | 0.001 | 0.001 | 0.026 | 0.026 | 0.03 |
| M1 | 0.05 | 0.065 | 0.05 | 0.061 | 0.163 | 0.184 |
| | 0.01 | 0.02 | 0.016 | 0.023 | 0.057 | 0.079 |
| | 0.001 | 0.003 | 0.002 | 0.005 | 0.01 | 0.015 |
| M2 | 0.05 | 0.067 | 0.014 | 0.032 | 0.096 | 0.117 |
| | 0.01 | 0.026 | 0.005 | 0.01 | 0.036 | 0.044 |
| | 0.001 | 0.01 | 0 | 0 | 0.01 | 0.012 |
| M3 | 0.05 | 0.02 | 0.191 | 0.02 | 0.191 | 0.229 |
| | 0.01 | 0.003 | 0.078 | 0.004 | 0.078 | 0.081 |
| | 0.001 | 0 | 0.023 | 0 | 0.023 | 0.018 |
| M4 | 0.05 | 0.411 | 0.327 | 0.014 | 0.591 | 0.659 |
| | 0.01 | 0.238 | 0.169 | 0.001 | 0.363 | 0.432 |
| | 0.001 | 0.089 | 0.064 | 0 | 0.138 | 0.203 |
| M5 | 0.05 | 0.018 | 0.019 | 0.08 | 0.08 | 0.121 |
| | 0.01 | 0.006 | 0.004 | 0.023 | 0.023 | 0.031 |
| | 0.001 | 0.001 | 0.001 | 0.003 | 0.003 | 0.007 |

† For αF = 0.05, 0.01, 0.001, α = 0.01667, 0.00333, 0.00033, respectively.
Highest powers are noted in boldface.
Type 1 error rates are highlighted in grey.
Table 9.
Power comparisons based on 1000 simulation replicates in Study 2 for a sample size of n = 1000.
| SNP | αF† | Individual: X1 | Individual: X2 | Individual: X3 | Individual: Union | Simultaneous Test |
|---|---|---|---|---|---|---|
| G1 | 0.05 | 0.795 | 0.537 | 0.549 | 0.949 | 0.979 |
| | 0.01 | 0.605 | 0.336 | 0.354 | 0.812 | 0.936 |
| | 0.001 | 0.371 | 0.143 | 0.17 | 0.522 | 0.803 |
| G2 | 0.05 | 0.782 | 0.023 | 0.289 | 0.811 | 0.83 |
| | 0.01 | 0.652 | 0.008 | 0.164 | 0.681 | 0.708 |
| | 0.001 | 0.464 | 0.001 | 0.068 | 0.486 | 0.533 |
| G3 | 0.05 | 0.02 | 0.921 | 0.02 | 0.921 | 0.908 |
| | 0.01 | 0.004 | 0.819 | 0.006 | 0.819 | 0.789 |
| | 0.001 | 0.001 | 0.602 | 0 | 0.602 | 0.522 |
| G4 | 0.05 | 1 | 0.982 | 0.021 | 1 | 1 |
| | 0.01 | 0.986 | 0.938 | 0.005 | 1 | 1 |
| | 0.001 | 0.943 | 0.818 | 0 | 0.985 | 0.998 |
| G5 | 0.05 | 0.018 | 0.014 | 0.683 | 0.683 | 0.659 |
| | 0.01 | 0.001 | 0.006 | 0.478 | 0.478 | 0.437 |
| | 0.001 | 0 | 0.001 | 0.251 | 0.251 | 0.216 |
| M1 | 0.05 | 0.213 | 0.129 | 0.127 | 0.389 | 0.474 |
| | 0.01 | 0.097 | 0.043 | 0.054 | 0.181 | 0.26 |
| | 0.001 | 0.022 | 0.01 | 0.011 | 0.042 | 0.1 |
| M2 | 0.05 | 0.152 | 0.017 | 0.058 | 0.191 | 0.229 |
| | 0.01 | 0.076 | 0.004 | 0.017 | 0.09 | 0.088 |
| | 0.001 | 0.022 | 0.001 | 0.001 | 0.023 | 0.035 |
| M3 | 0.05 | 0.021 | 0.621 | 0.017 | 0.621 | 0.607 |
| | 0.01 | 0.005 | 0.403 | 0.003 | 0.403 | 0.372 |
| | 0.001 | 0 | 0.169 | 0 | 0.169 | 0.147 |
| M4 | 0.05 | 0.954 | 0.869 | 0.018 | 0.995 | 0.999 |
| | 0.01 | 0.879 | 0.736 | 0.006 | 0.966 | 0.987 |
| | 0.001 | 0.693 | 0.509 | 0.001 | 0.83 | 0.937 |
| M5 | 0.05 | 0.015 | 0.018 | 0.311 | 0.311 | 0.332 |
| | 0.01 | 0.003 | 0.005 | 0.154 | 0.154 | 0.149 |
| | 0.001 | 0.001 | 0.001 | 0.054 | 0.054 | 0.048 |

† For αF = 0.05, 0.01, 0.001, α = 0.01667, 0.00333, 0.00033, respectively.
Highest powers are noted in boldface.
Type 1 error rates are highlighted in grey.
In Study 1, all causal SNPs, G1, …, G5, influence all three longitudinal traits. For each causal SNP, the power of the simultaneous association test on all traits is consistently higher than the power obtained from the union of individual tests on each trait. The same holds when testing the SNPs M1, …, M5, which are in LD with the causal SNPs: the power obtained from the simultaneous test is consistently higher than the power obtained from the union of individual tests on each trait. Moreover, as expected, the power for testing M1, …, M5 is lower than the power for testing the corresponding causal SNPs G1, …, G5. The dilution of the power depends on the LD levels between Gr and Mr for r = 1, 2, …, 5, and on their allele frequencies.
In Study 2, we designed the causal SNPs to be associated with different numbers of traits: G1 affects all three traits, G2 and G4 affect two traits, and G3 and G5 affect only one trait. The results are summarized in Tables 7–9. When the causal SNPs influence more than one trait, as with G1, G2, and G4, the simultaneous association tests are consistently more powerful than the union of individual tests on each trait across different sample sizes. The power gain is more obvious if an SNP has effects on more traits. Note that when an SNP is not associated with a trait, for example G2 with the second trait X2, the rejection rate of the corresponding individual test is a type I error rate rather than a power. These empirical type I error rates are highlighted in grey in Tables 7–9 to distinguish them from the empirical power. When the causal SNPs affect only one trait, as with G3 and G5, the power obtained from the simultaneous test is similar to the power obtained from the individual tests. Again, when the SNPs M1, …, M5 that are in LD with the causal SNPs are tested, the power is generally lower, but the patterns in power are similar to those obtained from the tests of the causal SNPs.
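The "Union" and "Simultaneous Test" columns of Tables 4–9 can be read as empirical rejection rates over simulation replicates. The sketch below shows one way such rates could be computed, assuming the union rule rejects whenever any individual test falls below the Bonferroni-corrected level α = αF/3. The p-values are hypothetical numbers chosen only to make the arithmetic visible; they are not simulation output from the paper.

```python
def union_rejects(individual_p_values, alpha_f):
    """Union rule: reject if any individual test passes the corrected level."""
    alpha = alpha_f / len(individual_p_values)
    return any(p < alpha for p in individual_p_values)

def empirical_rate(flags):
    """Proportion of replicates in which the null hypothesis was rejected."""
    return sum(flags) / len(flags)

# Hypothetical replicates: ((p for X1, X2, X3), p for the simultaneous test).
replicates = [
    ((0.020, 0.030, 0.200), 0.004),
    ((0.001, 0.400, 0.600), 0.010),
    ((0.030, 0.025, 0.040), 0.008),
    ((0.500, 0.700, 0.900), 0.800),
    ((0.010, 0.300, 0.020), 0.030),
]

alpha_f = 0.05
union_power = empirical_rate([union_rejects(ind, alpha_f) for ind, _ in replicates])
simultaneous_power = empirical_rate([sim < alpha_f for _, sim in replicates])
print(union_power, simultaneous_power)  # 0.4 0.8
```

In the first and third hypothetical replicates, every trait shows a modest signal that misses the corrected level of 0.01667, while the joint test still rejects. This is the situation in which a simultaneous test is expected to gain power over the union of individual tests.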
4. Real Data Analysis
4.1. Data Description
We apply our proposed method to analyze the GAW16 Problem 2 data drawn from the FHS. The FHS is an ongoing, observational, prospective study for identifying CVD risk factors. It is conducted under the supervision of the National Heart, Lung, and Blood Institute in collaboration with Boston University. The GAW16 Problem 2 data set includes pedigree and phenotype data from three generations: the Original Cohort, the Offspring Cohort, and the Third Generation Cohort, recruited from Framingham, Massachusetts, in 1948, 1971, and 2002, respectively, with four examinations of phenotypic traits collected repeatedly for the first two generations. The phenotype data set contains information on demographics (e.g., sex and age) and clinical measurements (e.g., height, weight, blood pressure, hypertensive status, and diabetic status). Furthermore, it includes genotype data from the three generations with over 900 known familial relationships, for which Affymetrix performed dense SNP genotyping using approximately 550 000 SNPs (GeneChip® Human Mapping 500K Array Set and 50K Human Gene Focused Panel) in the three generations of subjects (Cupples et al., 2009). We considered 6 979 subjects with known pedigrees. Among them, 2 050 subjects in 460 pedigrees are included in our two-stage analysis, in which the number of subjects per family ranges from 2 to 101. A total of 467 773 SNPs on 22 autosomes are tested for association.
In this study, we are interested in the genetic association with respect to four CVD-related longitudinal traits: systolic blood pressure (SBP), high-density lipoprotein (HDL) cholesterol level, approximated low-density lipoprotein (LDL) cholesterol level, and triglyceride (TG) level. In stage 1, the four longitudinal traits were adjusted for confounding factors (listed in Table 10 under the column named ‘Covariate’) and were transformed if necessary so that the residuals were approximately normally distributed. To select relevant covariates for each longitudinal trait, a function called ‘bfFixefLMER_F.fnc’ found in the ‘LMERConvenienceFunctions’ package in R (Tremblay et al., 2015) is used. With the inclusion of only the selected significant covariates, a linear mixed-effect model is fit to each longitudinal trait to obtain a predicted subject-specific effect for each subject. In stage 2, we simultaneously test the association between each SNP and four predicted subject-specific effects corresponding to the four longitudinal traits. Individual tests between each SNP and each predicted subject-specific effect for each longitudinal trait are also performed for comparison.
Table 10.
Fixed effect estimates and their associated standard errors (SEs) of covariates for each longitudinal trait using GLMM.
| Covariate | Fixed Effect akℓ | log(SBP) | log(HDL) | LDL | log(TG) |
|---|---|---|---|---|---|
| Sex | Estimate | −0.0269 | 0.2551 | −5.9117 | −0.1054 |
| | SE | 0.0041 | 0.0089 | 1.2544 | 0.0182 |
| | p–value† | ≈ 0 | ≈ 0 | 1.66 × 10−10 | ≈ 0 |
| Diabetes | Estimate | 0.0441 | −0.0690 | 1.5096 | 0.0787 |
| | SE | 0.0070 | 0.0152 | 2.1588 | 0.0308 |
| | p–value† | ≈ 0 | ≈ 0 | 8.98 × 10−3 | ≈ 0 |
| Age | Estimate | 0.0019 | 0.0024 | — | 0.0167 |
| | SE | 0.0001 | 0.0002 | — | 0.0005 |
| | p–value† | ≈ 0 | 9.54 × 10−10 | — | ≈ 0 |
| BMI | Estimate | 0.0071 | −0.0147 | 1.4312 | 0.0415 |
| | SE | 0.0004 | 0.0007 | 0.0956 | 0.0016 |
| | p–value† | ≈ 0 | ≈ 0 | ≈ 0 | ≈ 0 |
| Smoke‡ | Estimate (former smoker) | −0.0162 | −0.0036 | 3.2727 | 0.0083 |
| | SE (former smoker) | 0.0043 | 0.0086 | 1.2086 | 0.0184 |
| | Estimate (current smoker) | −0.0183 | −0.0652 | 0.0323 | 0.0600 |
| | SE (current smoker) | 0.0046 | 0.0115 | 1.6540 | 0.0260 |
| | p–value† | 3.53 × 10−3 | ≈ 0 | 6.34 × 10−11 | 6.04 × 10−14 |
| Alcohol | Estimate | 0.0030 | 0.0119 | — | 0.0050 |
| | SE | 0.0004 | 0.0006 | — | 0.0015 |
| | p–value† | 9.00 × 10−16 | ≈ 0 | — | 6.32 × 10−4 |
| Cigarettes | Estimate | — | −0.0009 | 0.2592 | 0.0028 |
| | SE | — | 0.0004 | 0.0545 | 0.0009 |
| | p–value† | — | 1.98 × 10−2 | 4.69 × 10−9 | 1.62 × 10−3 |
| Cholesterol RX | Estimate | −0.0151 | — | −39.7373 | −0.1403 |
| | SE | 0.0051 | — | 1.1720 | 0.0206 |
| | p–value† | 3.10 × 10−3 | — | ≈ 0 | 2.52 × 10−10 |
| Hypertension RX | Estimate | — | −0.0214 | — | 0.0483 |
| | SE | — | 0.0066 | — | 0.0161 |
| | p–value† | — | 1.20 × 10−3 | — | 2.77 × 10−3 |

† Approximate upper-bound p–values for the analysis of variance used in the backward selection of GLMMs.
‡ The covariate Smoke is a categorical variable with three levels: non-smoker (reference level), former smoker, and current smoker. So, there are two coefficients, one for the former smoker level and one for the current smoker level.
4.2. Results
In the first stage, nine potential covariates, as listed in the first column of Table 10, are considered for each of the longitudinal traits. The backward selection function in the ‘LMERConvenienceFunctions’ package in R (Tremblay et al., 2015) selected different sets of covariates that have significant effects on different longitudinal traits. The selected covariates for each longitudinal trait are then included in the GLMM so that the fixed effects of the selected covariates are estimated, and the subject-specific genetic effect (γijk) and the family-specific random effect (Γik) are predicted.
Table 10 summarizes the selected covariates for each longitudinal trait. For each selected covariate for a given longitudinal trait, the approximated upper-bound p–value is reported. The estimate of its fixed effect âkℓ and the associated standard error from fitting the GLMM are also reported in Table 10. The excluded covariates in the fitted GLMMs are indicated by a dash ‘—’ in Table 10. The time-invariant covariates, sex (Sex) and diabetes status (Diabetes), are strongly significant for all traits, with p–values being nearly zero for all traits (except for the diabetes status associated with LDL, for which p–value = 8.98 × 10−3). Here, diabetes status is defined as the occurrence of diabetes at any time during the study. The time-varying covariates, BMI (BMI) and smoking status (Smoke), are strongly significant for all traits. Note that smoking status is a categorical variable with three levels: non-smoker as the reference level, former smoker, and current smoker. So, two coefficients are estimated, one for the former smoker level and one for the current smoker level. Age (Age) and number of ounces of equivalent alcohol consumed per week (Alcohol) are found to have significant effects on log(SBP), log(HDL), and log(TG). Number of cigarettes smoked per day (Cigarettes) is found to be significant for log(HDL), LDL, and log(TG). Cholesterol treatment (Cholesterol RX) is significant for log(SBP), LDL, and log(TG), and hypertensive treatment (Hypertension RX) is significant for log(HDL) and log(TG).
In the second stage, we simultaneously test the association between each SNP and all predicted subject-specific effects, where γ̂1, γ̂2, γ̂3, and γ̂4 correspond to the longitudinal traits log(SBP), log(HDL), LDL, and log(TG), respectively. We also test the association between each SNP and the predicted subject-specific effect for each trait individually. SNPs with p–values less than 1 × 10−5 from the simultaneous test are listed in Table 11. In addition, Table 11 reports the significance level of association for these SNPs when they are tested with each trait individually. The SNP rs3776779 is an intron variant within the FAM174A gene. The FAM174A gene has recently been recognized as one of six new candidate genes for its regulatory role in cholesterol homeostasis. Re-localization of the protein, FAM174A, to alternative organelles under reduced cholesterol levels resembled a key feature of other known regulators of cellular cholesterol homeostasis (Blattmann et al., 2013). This SNP is found to be strongly significant in the simultaneous test with the smallest p–value of 2.22 × 10−16 and is also confirmed to be significantly associated with the LDL trait with a p–value of 4.44 × 10−15. The ten SNPs on chromosome 8 located around the region of 19.9 Mb are found by the simultaneous association test to have strong significant associations with at least one of the four traits. These SNPs are located within or close to the LPL gene, which encodes the lipoprotein lipase enzyme (Andreotti et al., 2009) and has dual functions as a triglyceride hydrolase and a ligand factor for receptor-mediated lipoprotein uptake. Results from individual tests show that all of these SNPs are significantly associated with the TG and HDL traits. Note that their p–values based on the union of individual tests are consistently larger (that is, less significant) than the p–values obtained from the simultaneous tests. Table 11 also indicates where the associated SNPs are confirmed by other literature.
It is important to note that the p–values for the individual tests listed in Table 11 have been adjusted via the Bonferroni procedure for multiple testing.
Table 11.
Most significant SNPs (p–values < 10−5) based on simultaneous tests.
| SNP† [refs] | Ch | Location (Mb) | Gene(s) | ‡ Trait [refs] | Individual Test p–value | Union p–value | Simultaneous Test p–value |
|---|---|---|---|---|---|---|---|
| rs599839 [7,10,11,12,17] | 1 | 109.624 | CELSR2, PSRC1, SORT1 | LDL [5,7,12-14,16-19] | 4.98 × 10−11 | 4.98 × 10−11 | 1.82 × 10−9 |
| rs780094 [12,15] | 2 | 27.595 | GCKR | TG [5,8,12,15,18,19] | 4.93 × 10−7 | 4.93 × 10−7 | 1.65 × 10−12 |
| rs3776779 | 5 | 99.925 | FAM174A | LDL [2] | 4.44 × 10−15 | 4.44 × 10−15 | 2.22 × 10−16 |
| rs263 [1] | 8 | 19.857 | LPL | HDL | 3.28 × 10−4 | 3.28 × 10−4 | 2.69 × 10−6 |
| | | | | TG [1] | 3.02 × 10−3 | | |
| rs17410962 | 8 | 19.892 | LPL | HDL [11,14,19] | 2.47 × 10−7 | 2.47 × 10−7 | 1.40 × 10−7 |
| | | | | TG [14,19] | 4.87 × 10−5 | | |
| rs17489268 | 8 | 19.896 | LPL | TG [14,19] | 9.22 × 10−7 | 9.22 × 10−7 | 2.32 × 10−7 |
| | | | | HDL [11,14,19] | 1.25 × 10−6 | | |
| rs17411031 [18] | 8 | 19.897 | LPL | TG [14,19] | 1.69 × 10−7 | 1.69 × 10−7 | 5.54 × 10−8 |
| | | | | HDL [11,14,18,19] | 7.03 × 10−7 | | |
| rs17489282 | 8 | 19.897 | LPL | HDL [11,19] | 5.96 × 10−6 | 5.96 × 10−6 | 1.93 × 10−6 |
| | | | | TG [19] | 5.96 × 10−6 | | |
| rs17411126 | 8 | 19.900 | LPL | TG [14,19] | 1.24 × 10−7 | 1.24 × 10−7 | 6.69 × 10−8 |
| | | | | HDL [11,14,19] | 1.57 × 10−6 | | |
| rs765547 [10] | 8 | 19.911 | LPL | TG [14,19] | 9.85 × 10−8 | 9.85 × 10−8 | 3.36 × 10−8 |
| | | | | HDL [11,14,19] | 5.49 × 10−7 | | |
| rs11986942 | 8 | 19.912 | LPL | TG [19] | 2.06 × 10−7 | 2.06 × 10−7 | 3.07 × 10−8 |
| | | | | HDL [11,14,19] | 1.77 × 10−6 | | |
| rs1837842 | 8 | 19.913 | LPL | TG [14,19] | 1.56 × 10−7 | 1.56 × 10−7 | 7.84 × 10−8 |
| | | | | HDL [11,14,19] | 1.43 × 10−6 | | |
| rs1919484 | 8 | 19.914 | LPL | TG [14,19] | 6.40 × 10−7 | 6.40 × 10−7 | 1.90 × 10−7 |
| | | | | HDL [4,11,14,19] | 1.23 × 10−6 | | |
| rs7067794 [9] | 10 | 21.464 | NEBL | TG [9,15] | 6.90 × 10−4 | 6.90 × 10−4 | 3.40 × 10−6 |
| rs4775041 [5,12] | 15 | 56.462 | LIPC | HDL [6,12,15,19] | 1.79 × 10−3 | 1.79 × 10−3 | 2.77 × 10−6 |
| rs4939883 [3,17] | 18 | 45.421 | ACAA2, LIPG | HDL [11,14] | 7.47 × 10−4 | 7.47 × 10−4 | 6.95 × 10−6 |
| rs41377151 [17] | 19 | 50.115 | APOC1 | LDL [17] | 1.18 × 10−5 | 1.18 × 10−5 | 4.13 × 10−7 |
rs3776779, rs7067794, and rs41377151 are rare variants where their minor allele frequencies adjusted for the relationship among subjects are less than 0.01.
Reference(s) for significant association between a given SNP and/or candidate gene(s) and correlated trait(s) either not listed under the column ‘Trait’ or not investigated in this study.
Reference(s) for confirmatory findings of significant association between a given SNP and/or candidate gene(s) and the given trait(s).
5. Discussion
In this manuscript, we propose a two-stage method for analyzing the association with multiple longitudinal traits, particularly for data from samples of related subjects. In the first stage, a three-level nested mixed effects model is used to analyze each longitudinal trait. In the three-level mixed effects model, repeated measurements (level 1) are nested within subjects (level 2) and subjects are nested within families (level 3). The model includes fixed effects for confounding covariates as well as random effects at the subject and family levels. The subject-level random effect is interpreted as the unobserved subject-specific genetic effect that contributes to the variation of the longitudinal trait. The family-level random effect is interpreted as the unobserved common environmental factors shared among family members that explain part of the variation of the longitudinal trait. In most situations when a mixed effects model is used to analyze longitudinal data, random effects are treated as nuisance parameters that account for the intra-correlation among repeated measurements, while the estimation of the fixed effects of the covariates of interest is of main concern. In our method, the focus is also on the prediction of the subject-specific random effects. In the second stage, a generalized quasi-likelihood scoring approach is proposed to simultaneously test the association between a given SNP and multiple subject-specific random effects arising from multiple longitudinal traits.
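To make the nesting concrete, the following sketch computes the correlations implied by a three-level random-intercept structure. It assumes, as a simplification that may differ from the model fitted in the paper, that the family effect, the subject effect, and the residual are mutually independent with constant variances. The function name and the variance values are hypothetical.

```python
def intraclass_correlations(var_family, var_subject, var_residual):
    """Correlations between two measurements under independent random intercepts.

    same_subject: two repeated measurements on one subject share both the
                  family effect and the subject effect.
    same_family:  measurements on two different subjects of one family share
                  only the family effect.
    """
    total = var_family + var_subject + var_residual
    return {
        "same_subject": (var_family + var_subject) / total,
        "same_family": var_family / total,
    }

# Hypothetical variance components, chosen only for easy arithmetic.
icc = intraclass_correlations(var_family=1.0, var_subject=2.0, var_residual=1.0)
print(icc)  # {'same_subject': 0.75, 'same_family': 0.25}
```

With these illustrative values, repeated measurements on one subject are more strongly correlated than measurements on relatives, because they share the subject-level effect in addition to the family-level effect.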
In the second stage, treating the observed allele frequency of each subject as the response and the subject-specific random effects as the covariates allows us to test more than one trait simultaneously. Results from both the simulation studies and the real data analysis show that the proposed method is more powerful for detecting genetic pleiotropy than the conventional approach of taking the consensus of individual tests. In addition, when analyzing samples of related subjects, such as family data, the GQLSM allows us to adjust for the correlation of observed genotypes among subjects due to relatedness within the same family.
In many situations, GWAS are conducted to screen the genome for a set of the most relevant or important SNPs that are associated with the traits of interest. When multiple traits are studied, the simultaneous test provides an overall significance level for the association with all traits. This makes ranking the SNPs by significance much easier than with the consensus of individual tests, as individual tests only provide the significance level of an SNP for each individual trait. For example, the results from the real data analysis of the GAW16 FHS data show that SNP rs780094 on chromosome 2 is strongly significant for TG (p–value = 4.93 × 10−7) and SNP rs263 on chromosome 8 is moderately significant for HDL (p–value = 3.28 × 10−4) and TG (p–value = 3.02 × 10−3) by the individual tests. In this case, it is difficult to discern from the consensus of individual tests which SNP is of more concern and should have higher priority for further investigation. The simultaneous test answers this question more directly: SNP rs780094 is more significant, and thus more worthy of further investigation, than SNP rs263 because rs780094 has a smaller p–value (1.65 × 10−12) than rs263 (p–value = 2.69 × 10−6) in the simultaneous test.
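The ranking argument can be illustrated with a few rows of Table 11. The sketch below sorts four SNPs by their simultaneous-test p–value; the p–values are copied from Table 11, while the data layout and variable names are assumptions made for this example.

```python
# (SNP, union of individual tests p-value, simultaneous test p-value), from Table 11.
table11_rows = [
    ("rs780094", 4.93e-7, 1.65e-12),
    ("rs263", 3.28e-4, 2.69e-6),
    ("rs3776779", 4.44e-15, 2.22e-16),
    ("rs17410962", 2.47e-7, 1.40e-7),
]

# A single overall p-value per SNP gives an unambiguous ordering.
ranked = sorted(table11_rows, key=lambda row: row[2])
for rank, (snp, union_p, simultaneous_p) in enumerate(ranked, start=1):
    print(rank, snp, simultaneous_p)
```

The resulting order is rs3776779, rs780094, rs17410962, rs263, which places rs780094 ahead of rs263 as described above.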
Our proposed method opens several avenues for future research. Missing data are common in longitudinal studies. For example, in a randomized clinical trial a participant may miss a particular examination for various reasons, such that all measurements at that time point are missing. Moreover, some participants might drop out of the clinical trial altogether. In general, three mechanisms generate missing data: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). Under the assumption of the MCAR mechanism, the mixed effects model in stage 1 is still applicable. However, when this assumption is violated, typically when the data are MNAR, using the mixed effects model to analyze such data is no longer appropriate. Therefore, methods of handling missing data under different missing mechanism assumptions are worth investigating to obtain more precise predictions of the subject-specific effects as well as better estimates of the fixed effects of covariates.
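The difference between MCAR and MNAR can be seen in a toy simulation. The sketch below is a deliberately simplified illustration that uses a sample mean rather than the mixed effects model of stage 1; the dropout probabilities, the seed, and the function name are all assumptions made for this example.

```python
import random

def simulate_observed_mean(mechanism, n=20_000, seed=1):
    """Mean of the observed values of a standard normal trait after dropout."""
    rng = random.Random(seed)
    observed = []
    for _ in range(n):
        value = rng.gauss(0.0, 1.0)  # complete-data value, true mean is 0
        u = rng.random()
        if mechanism == "MCAR":
            # Dropout does not depend on the value at all.
            missing = u < 0.3
        elif mechanism == "MNAR":
            # Dropout depends on the value that would have been observed.
            missing = value > 0.0 and u < 0.6
        else:
            raise ValueError("mechanism must be 'MCAR' or 'MNAR'")
        if not missing:
            observed.append(value)
    return sum(observed) / len(observed)

mcar_mean = simulate_observed_mean("MCAR")
mnar_mean = simulate_observed_mean("MNAR")
print(round(mcar_mean, 3), round(mnar_mean, 3))
```

Under MCAR the observed mean stays close to the true value of 0, whereas under this MNAR rule large values are preferentially lost and the observed mean is pulled well below 0. Analyses that ignore the missingness mechanism inherit this kind of bias.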
Table 5.
Power comparisons based on 1000 simulation replicates in Study 1 for a sample size of n = 500.
| SNP | αF† | Individual: X1 | Individual: X2 | Individual: X3 | Individual: Union | Simultaneous Test |
|---|---|---|---|---|---|---|
| G1 | 0.05 | 0.435 | 0.432 | 0.286 | 0.714 | 0.798 |
| | 0.01 | 0.25 | 0.255 | 0.124 | 0.47 | 0.619 |
| | 0.001 | 0.114 | 0.111 | 0.041 | 0.226 | 0.373 |
| G2 | 0.05 | 0.37 | 0.416 | 0.142 | 0.575 | 0.626 |
| | 0.01 | 0.219 | 0.272 | 0.064 | 0.396 | 0.469 |
| | 0.001 | 0.103 | 0.147 | 0.023 | 0.219 | 0.324 |
| G3 | 0.05 | 0.821 | 0.551 | 0.423 | 0.939 | 0.976 |
| | 0.01 | 0.648 | 0.331 | 0.243 | 0.794 | 0.9 |
| | 0.001 | 0.401 | 0.132 | 0.092 | 0.5 | 0.748 |
| G4 | 0.05 | 0.71 | 0.726 | 0.143 | 0.914 | 0.951 |
| | 0.01 | 0.513 | 0.514 | 0.047 | 0.753 | 0.863 |
| | 0.001 | 0.262 | 0.269 | 0.007 | 0.44 | 0.645 |
| G5 | 0.05 | 0.759 | 0.752 | 0.149 | 0.938 | 0.958 |
| | 0.01 | 0.564 | 0.562 | 0.063 | 0.805 | 0.891 |
| | 0.001 | 0.326 | 0.32 | 0.018 | 0.527 | 0.692 |
| M1 | 0.05 | 0.112 | 0.091 | 0.087 | 0.25 | 0.299 |
| | 0.01 | 0.054 | 0.035 | 0.03 | 0.116 | 0.143 |
| | 0.001 | 0.007 | 0.008 | 0.005 | 0.019 | 0.037 |
| M2 | 0.05 | 0.068 | 0.08 | 0.029 | 0.161 | 0.163 |
| | 0.01 | 0.021 | 0.028 | 0.01 | 0.057 | 0.061 |
| | 0.001 | 0.004 | 0.002 | 0.002 | 0.008 | 0.016 |
| M3 | 0.05 | 0.484 | 0.273 | 0.189 | 0.685 | 0.744 |
| | 0.01 | 0.311 | 0.116 | 0.091 | 0.445 | 0.57 |
| | 0.001 | 0.13 | 0.035 | 0.022 | 0.171 | 0.297 |
| M4 | 0.05 | 0.484 | 0.464 | 0.082 | 0.726 | 0.792 |
| | 0.01 | 0.289 | 0.269 | 0.017 | 0.476 | 0.59 |
| | 0.001 | 0.107 | 0.101 | 0.001 | 0.192 | 0.31 |
| M5 | 0.05 | 0.328 | 0.333 | 0.056 | 0.567 | 0.62 |
| | 0.01 | 0.178 | 0.164 | 0.014 | 0.302 | 0.372 |
| | 0.001 | 0.054 | 0.059 | 0.001 | 0.106 | 0.157 |

† For αF = 0.05, 0.01, 0.001, α = 0.01667, 0.00333, 0.00033, respectively.
Highest powers are noted in boldface.
Table 8.
Power comparisons based on 1000 simulation replicates in Study 2 for a sample size of n = 500.
| | αF† | Individual Tests: X1 | Individual Tests: X2 | Individual Tests: X3 | Individual Tests: Union | Simultaneous Test |
|---|---|---|---|---|---|---|
| G1 | 0.05 | 0.471 | 0.302 | 0.323 | 0.699 | 0.804 |
| | 0.01 | 0.284 | 0.142 | 0.166 | 0.449 | 0.599 |
| | 0.001 | 0.129 | 0.047 | 0.06 | 0.201 | 0.388 |
| G2 | 0.05 | 0.446 | 0.028 | 0.162 | 0.491 | 0.512 |
| | 0.01 | 0.31 | 0.008 | 0.078 | 0.341 | 0.37 |
| | 0.001 | 0.185 | 0.003 | 0.032 | 0.196 | 0.228 |
| G3 | 0.05 | 0.02 | 0.629 | 0.017 | 0.629 | 0.614 |
| | 0.01 | 0.005 | 0.424 | 0.004 | 0.424 | 0.386 |
| | 0.001 | 0 | 0.2 | 0.001 | 0.2 | 0.168 |
| G4 | 0.05 | 0.894 | 0.802 | 0.02 | 0.984 | 0.992 |
| | 0.01 | 0.757 | 0.595 | 0.004 | 0.889 | 0.951 |
| | 0.001 | 0.521 | 0.331 | 0 | 0.663 | 0.82 |
| G5 | 0.05 | 0.015 | 0.02 | 0.381 | 0.381 | 0.389 |
| | 0.01 | 0.003 | 0.003 | 0.195 | 0.195 | 0.191 |
| | 0.001 | 0 | 0 | 0.054 | 0.054 | 0.052 |
| M1 | 0.05 | 0.123 | 0.082 | 0.079 | 0.249 | 0.295 |
| | 0.01 | 0.041 | 0.028 | 0.02 | 0.085 | 0.132 |
| | 0.001 | 0.009 | 0.006 | 0.001 | 0.016 | 0.039 |
| M2 | 0.05 | 0.092 | 0.022 | 0.025 | 0.111 | 0.151 |
| | 0.01 | 0.034 | 0.002 | 0.009 | 0.042 | 0.051 |
| | 0.001 | 0.014 | 0 | 0.002 | 0.016 | 0.017 |
| M3 | 0.05 | 0.019 | 0.306 | 0.02 | 0.306 | 0.331 |
| | 0.01 | 0.005 | 0.151 | 0.005 | 0.151 | 0.147 |
| | 0.001 | 0 | 0.041 | 0.001 | 0.041 | 0.041 |
| M4 | 0.05 | 0.69 | 0.542 | 0.015 | 0.848 | 0.897 |
| | 0.01 | 0.48 | 0.344 | 0.002 | 0.641 | 0.736 |
| | 0.001 | 0.253 | 0.151 | 0.001 | 0.358 | 0.512 |
| M5 | 0.05 | 0.016 | 0.02 | 0.142 | 0.142 | 0.169 |
| | 0.01 | 0.001 | 0.003 | 0.049 | 0.049 | 0.059 |
| | 0.001 | 0 | 0.002 | 0.011 | 0.011 | 0.01 |
†For αF = 0.05, 0.01, 0.001, α = 0.01667, 0.00333, 0.00033, respectively.
Highest powers are noted in boldface.
Type I error rates are highlighted in grey.
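The per-test levels in the table footnote are consistent with dividing the family-wise level αF equally among the three individual tests (a Bonferroni-style split). The short Python sketch below checks that arithmetic. It is illustrative only: the function names and the example p-values are hypothetical, and reading "Union" as "at least one of the three individual tests rejects at level α" is an assumption rather than a definition taken from the tables.

```python
# Sketch only: checks the footnote arithmetic, assuming a Bonferroni split
# of the family-wise level alpha_F across the three individual tests.

N_TESTS = 3  # X1, X2, X3 (assumed to be the tests sharing alpha_F)


def per_test_alpha(alpha_f, n_tests=N_TESTS):
    """Per-test significance level under a Bonferroni correction."""
    return alpha_f / n_tests


def union_rejects(p_values, alpha_f):
    """Assumed reading of 'Union': reject if any individual test rejects."""
    alpha = per_test_alpha(alpha_f, len(p_values))
    return any(p <= alpha for p in p_values)


family_levels = [0.05, 0.01, 0.001]
per_test_levels = [round(per_test_alpha(a), 5) for a in family_levels]
print(per_test_levels)

# Hypothetical p-values for X1, X2, X3 (not taken from the paper).
example_p = [0.020, 0.004, 0.300]
print(union_rejects(example_p, 0.05))
print(union_rejects(example_p, 0.01))
```

Rounded to five decimals, the computed levels match the footnote values 0.01667, 0.00333, and 0.00033. With the hypothetical p-values, the union rule rejects at αF = 0.05 but not at αF = 0.01.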
Acknowledgments
The Framingham Heart Study project is conducted and supported by the National Heart, Lung, and Blood Institute (NHLBI) in collaboration with Boston University (N01 HC25195). The Genetic Analysis Workshop is supported by NIH grant R01 GM31575. The GAW16 Framingham data used for the analyses described in this manuscript were obtained through dbGaP (phs000128.v3.p3). The authors acknowledge the investigators of the Framingham Heart Study. This manuscript does not necessarily reflect the opinions or views of the Framingham Heart Study, Boston University, or the NHLBI.
This work is partially supported by a Natural Sciences and Engineering Research Council of Canada Individual Discovery Grant.
References
- Andreotti G, Menashe I, Chen J, Chang SC, Rashid A, Gao YT, Han TQ, Sakoda LC, Chanock S, Rosenberg PS, Hsing AW. Genetic determinants of serum lipid levels in Chinese subjects: a population-based study in Shanghai, China. European Journal of Epidemiology. 2009;24(12):763–774. doi: 10.1007/s10654-009-9402-3.
- Aschard H, Vilhjalmsson BJ, Greliche N, Morange PE, Tregouet D, Kraft P. Maximizing the power of principal-component analysis of correlated phenotypes in genome-wide association studies. American Journal of Human Genetics. 2014;94(5):662–676. doi: 10.1016/j.ajhg.2014.03.016.
- Bates D. Computational methods for mixed models. R package version 1.1-8. 2014a.
- Bates D. Penalized least squares versus generalized least squares representations of linear mixed models. R package version 1.1-8. 2014b.
- Bates D, Maechler M, Bolker BM, Walker S. Fitting linear mixed-effects models using lme4. ArXiv e-print; Journal of Statistical Software. 2015a; in press.
- Bates D, Maechler M, Bolker BM, Walker S. lme4: Linear mixed-effects models using Eigen and S4. R package version 1.1-8. 2015b.
- Blattmann P, Schuberth C, Pepperkok R, Runz H. RNAi-based functional profiling of loci from blood lipid genome-wide association studies identifies genes with cholesterol-regulatory function. PLoS Genetics. 2013;9(2):e1003338. doi: 10.1371/journal.pgen.1003338.
- Bourgain C, Hoffjan S, Nicolae R, Newman D, Steiner L, Walker K, et al. McPeek MS. Novel case-control test in a founder population identifies P-selectin as an atopy-susceptibility locus. American Journal of Human Genetics. 2003;73(3):612–626. doi: 10.1086/378208.
- Browne RW, Weinstock-Guttman B, Zivadinov R, Horakova D, Bodziak ML, Tamaño-Blanco M, et al. Ramanathan M. Serum lipoprotein composition and vitamin D metabolite levels in clinically isolated syndromes: Results from a multi-center study. The Journal of Steroid Biochemistry and Molecular Biology. 2014;143(1):424–433. doi: 10.1016/j.jsbmb.2014.06.007.
- Chen MH, Huang J, Chen WM, Larson MG, Fox CS, Vasan RS, et al. Yang Q. Using family-based imputation in genome-wide association studies with large complex pedigrees: the Framingham Heart Study. PLoS ONE. 2012;7(12):e51589. doi: 10.1371/journal.pone.0051589.
- Cox DR, Hinkley DV. Theoretical Statistics. Chapman and Hall; London: 1974.
- Cupples LA, Heard-Costa N, Lee M, Atwood LD, Framingham Heart Study Investigators. Genetics Analysis Workshop 16 Problem 2: the Framingham Heart Study data. BMC Proceedings. 2009;3(Suppl 7):S3. doi: 10.1186/1753-6561-3-S7-S3.
- Feng Z. A generalized quasi-likelihood scoring approach for simultaneously testing the genetic association of multiple traits. Journal of the Royal Statistical Society: Series C (Applied Statistics). 2014;63(3):483–498. doi: 10.1111/rssc.12038.
- Feng Z, Wong WWL, Gao X, Schenkel F. Generalized genetic association study with samples of related individuals. The Annals of Applied Statistics. 2011;5(3):2109–2130. doi: 10.1214/11-AOAS465.
- Furlotte NA, Eskin E, Eyheramendy S. Genome-wide association mapping with longitudinal data. Genetic Epidemiology. 2012;36(5):463–471. doi: 10.1002/gepi.21640.
- Hedeker D, Gibbons RD. Longitudinal Data Analysis. John Wiley & Sons, Inc; Hoboken: 2006.
- Hegele RA, Ban MR, Hsueh N, Kennedy BA, Cao H, Zou GY, et al. Wang J. A polygenic basis for four classical Fredrickson hyperlipoproteinemia phenotypes that are characterized by hypertriglyceridemia. Human Molecular Genetics. 2009;18(21):4189–4194. doi: 10.1093/hmg/ddp361.
- Heyde C. Quasi-likelihood and Its Application: a General Approach to Optimal Parameter Estimation. Springer; New York: 1997.
- Hodgkin J. Seven types of pleiotropy. The International Journal of Developmental Biology. 1998;42(3):501–505.
- Hodoglugil U, Williamson DW, Mahley RW. Polymorphisms in the hepatic lipase gene affect plasma HDL-cholesterol levels in a Turkish population. Journal of Lipid Research. 2010;51(2):422–430. doi: 10.1194/jlr.P001578.
- Kleber ME, Renner W, Grammer TB, Linsel-Nitschke P, Boehm BO, Winkelmann BR, et al. März W. Association of the single nucleotide polymorphism rs599839 in the vicinity of the sortilin 1 gene with LDL and triglyceride metabolism, coronary heart disease and myocardial infarction. The Ludwigshafen Risk and Cardiovascular Health Study. Atherosclerosis. 2010;209(2):492–497. doi: 10.1016/j.atherosclerosis.2009.09.068.
- Kozian DH, Barthel A, Cousin E, Brunnhöfer R, Anderka O, März W, et al. Schmoll D. Glucokinase-activating GCKR polymorphisms increase plasma levels of triglycerides and free fatty acids, but do not elevate cardiovascular risk in the Ludwigshafen Risk and Cardiovascular Health Study. Hormone and Metabolic Research. 2010;42(7):502–506. doi: 10.1055/s-0030-1249637.
- Lieb W, Chen MH, Teumer A, de Boer RA, Lin H, Fox ER, et al. EchoGen Consortium. Genome-wide meta-analyses of plasma renin activity and concentration reveal association with the kininogen 1 and prekallikrein genes. Circulation: Cardiovascular Genetics. 2015;8(11):131–140. doi: 10.1161/CIRCGENETICS.114.000613.
- Ma L, Han D, Yang J, Da Y. Multi-locus test conditional on confirmed effects leads to increased power in genome-wide association studies. PLoS One. 2010a;5(11):e15006. doi: 10.1371/journal.pone.0015006.
- Ma L, Yang J, Runesha HB, Tanaka T, Ferrucci L, Bandinelli S, Da Y. Genome-wide association analysis of total cholesterol and high-density lipoprotein cholesterol levels using the Framingham Heart Study data. BMC Medical Genetics. 2010b;11:55. doi: 10.1186/1471-2350-11-55.
- Mohlke KL, Boehnke M, Abecasis GR. Metabolic and cardiovascular traits: an abundance of recently identified common genetic variants. Human Molecular Genetics. 2008;17(R2):102–108. doi: 10.1093/hmg/ddn275.
- Muendlein A, Geller-Rhomberg S, Saely CH, Winder T, Sonderegger G, Rein P, et al. Drexel H. Significant impact of chromosomal locus 1p13.3 on serum LDL cholesterol and on angiographically characterized coronary atherosclerosis. Atherosclerosis. 2009;206(2):494–499. doi: 10.1016/j.atherosclerosis.2009.02.040.
- Naylor MG, Weiss ST, Lange C. A Bayesian approach to genetic association studies with family-based designs. Genetic Epidemiology. 2010;34(6):569–574. doi: 10.1002/gepi.20513.
- Newman DL, Abney M, McPeek MS, Ober C, Cox NJ. The importance of genealogy in determining genetic associations with complex traits. American Journal of Human Genetics. 2011;69(5):1146–1148. doi: 10.1086/323659.
- O'Reilly PF, Hoggart CJ, Pomyen Y, Calboli FCF, Elliott P, Jarvelin MR, Coin LJM. MultiPhen: Joint model of multiple phenotypes can increase discovery in GWAS. PLoS One. 2012;7(5):e34861. doi: 10.1371/journal.pone.0034861.
- Piccolo SR, Abo RP, Allen-Brady K, Camp NJ, Knight S, Anderson JL, Horne BD. Evaluation of genetic risk scores for lipid levels using genome-wide markers in the Framingham Heart Study. BMC Proceedings. 2009;3(Suppl 7):S46. doi: 10.1186/1753-6561-3-S7-S46.
- Rafiq S, Venkata KK, Gupta V, Vinay DG, Spurgeon CJ, Parameshwaran S, et al. Indian Migration Study Group. Evaluation of seven common lipid associated loci in a large Indian sib pair study. Lipids in Health and Disease. 2012;11(1):155. doi: 10.1186/1476-511X-11-155.
- Roslin NM, Hamid JS, Paterson AD, Beyene J. Genome-wide association analysis of cardiovascular-related quantitative traits in the Framingham Heart Study. BMC Proceedings. 2009;3(Suppl 7):S117. doi: 10.1186/1753-6561-3-S7-S117.
- Shriner D. Moving toward system genetics through multiple trait analysis in genome-wide association studies. Frontiers in Genetics. 2012;3(1):1–7. doi: 10.3389/fgene.2012.00001.
- Sikorska K, Rivadeneira F, Groenen PJ, Hofman A, Uitterlinden AG, Eilers PH, Lesaffre E. Fast linear mixed model computations for genome-wide association studies with longitudinal data. Statistics in Medicine. 2013;32(1):165–180. doi: 10.1002/sim.5517.
- Solovieff N, Cotsapas C, Lee PH, Purcell SM, Smoller JW. Pleiotropy in complex traits: Challenges and strategies. Nature Reviews Genetics. 2013;14(7):483–495. doi: 10.1038/nrg3461.
- Stephens M. A unified framework for association analysis with multiple related phenotypes. PLoS One. 2013;8(7):e65245. doi: 10.1371/journal.pone.0065245.
- Suchindran S, Rivedal D, Guyton JR, Milledge T, Gao X, Benjamin A, et al. McCarthy JJ. Genome-wide association study of Lp-PLA(2) activity and mass in the Framingham Heart Study. PLoS Genetics. 2010;6(4):e1000928. doi: 10.1371/journal.pgen.1000928.
- Thornton T, McPeek MS. Case-control association testing with related individuals: A more powerful quasi-likelihood score test. American Journal of Human Genetics. 2007;81(2):321–337. doi: 10.1086/519497.
- Tremblay A, Ransijn J. LMERConvenienceFunctions: Model selection and post-hoc analysis for (G)LMER models. R package version 2.10. 2015.
- Wallace C, Newhouse SJ, Braund P, Zhang F, Tobin M, Falchi M, et al. Munroe PB. Genome-wide association study identifies genes for biomarkers of cardiovascular disease: serum urate and dyslipidemia. American Journal of Human Genetics. 2008;82(1):139–149. doi: 10.1016/j.ajhg.2007.11.001.
- Wang W, Feng Z, Bull SB, Wang Z. A 2-step strategy for detecting pleiotropic effects on multiple longitudinal traits. Frontiers in Genetics. 2014;5(357):1–14. doi: 10.3389/fgene.2014.00357.
- Zhou X, Stephens M. Efficient algorithms for multivariate linear mixed models in genome-wide association studies. Nature Methods. 2014;11(4):407–409. doi: 10.1038/nmeth.2848.
