SUMMARY
It is of great interest to quantify the contributions of genetic variation to brain structure and function, which are usually measured by high-dimensional imaging data (e.g., magnetic resonance imaging). In addition to the variance, the covariance patterns in the genetic effects of a functional phenotype are of biological importance, and covariance patterns have been linked to psychiatric disorders. The aim of this article is to develop a scalable method to estimate heritability and the nonstationary covariance components in high-dimensional imaging data from twin studies. Our motivating example is from the Human Connectome Project (HCP). Several major big-data challenges arise from estimating the genetic and environmental covariance functions of functional phenotypes extracted from imaging data, such as cortical thickness with 60 000 vertices. Notably, truncating to positive eigenvalues and their eigenfunctions from unconstrained estimators can result in large bias. This motivated our development of a novel estimator ensuring positive semidefiniteness. Simulation studies demonstrate large improvements over existing approaches, both with respect to heritability estimates and covariance estimation. We applied the proposed method to cortical thickness data from the HCP. Our analysis suggests fine-scale differences in covariance patterns, identifying locations in which genetic control is correlated with large areas of the brain and locations where it is highly localized.
Keywords: Covariance estimation, Functional data analysis, Heritability, Neuroimaging, Twin studies
1. Introduction
It is of great interest to quantify the contribution of genetic effects to brain structure and function, but scientific understanding of such effects is in its infancy (Chen and others, 2013). Measuring the relative size of genetic and environmental effects on brain traits may provide insight into the etiology of neurological and neurodegenerative disorders (Kendler, 2001). One method to estimate genetic variation in brain phenotypes is to use twin studies. A major goal of the young-adult Human Connectome Project (HCP) is to map the heritability and genetic underpinnings of brain traits (Van Essen and others, 2013). This dataset includes imaging data on approximately 1200 subjects with over 200 twin pairs.
The traditional heritability model uses monozygotic and dizygotic twins to decompose genetic and environmental components of univariate phenotypes, and it provides guidance for molecular genetic studies (Van Dongen and others, 2012). Brain structure and function, however, are often measured by high-dimensional imaging data and commonly represented as functional phenotypes. For instance, various shape analysis methods have been developed to characterize brain cortical and subcortical structures in humans. Functional phenotypes extracted from shape analysis may be effective for the identification of causal genes and a mechanistic understanding of the pathophysiological processes of neurological disorders (Zhao and Castellanos, 2016).
As an illustration, we consider the cortical thickness dataset obtained from the HCP, where each cortical thickness functional phenotype is measured at approximately 60 000 locations on the cortical surface. The cerebral cortex is the outer layer of the brain and consists of a highly folded sheet of gray matter varying in thickness from approximately 2 to 4 mm. Cortical thickness is important to cognition and intelligence, and cortical thinning may be associated with dementia (Dickerson and others, 2009).
Correlations in cortical thickness between subregions measured across a population are biologically important, but the extent to which these correlations result from genetic influences is poorly understood (Evans, 2013). These correlations, also called cortical networks, may reflect coordinated developmental pathways and recapitulate certain white-matter tracts and functional networks (Alexander-Bloch and others, 2013). Cortical correlations have been associated with psychiatric and neurological disorders including depression and Alzheimer’s (Wang and others, 2016; He and others, 2008). Cortical correlations are typically estimated between different regions using a parcellation, but additional insight may be gained by examining higher resolution spatial covariance functions. In this article, we will use the HCP dataset to measure the heritability and various covariance structures (e.g., genetic) of cortical thickness in the healthy human brain. In particular, we focus on the additive genetic covariance function,
$\Sigma_a(v, v')$, described in Section 2.1. The clinical meaning of $\Sigma_a(v, v')$ is the covariance in the genetic component of cortical thickness between two locations, $v$ and $v'$. For example, a positive value indicates that an individual with a thicker cortex in location $v$ tends to have a thicker cortex in location $v'$ due to genetic factors.
The aim of this article is to develop a scalable method that improves estimates of heritability and three nonstationary covariance functions—genetic, shared (i.e., common) environmental, and unique environmental—of high-dimensional functional phenotypes. Luo and others (2019) proposed a heritability model for twin functional data based on Fisher’s Additive genetic, Common environmental, and unique Environmental (ACE) model (see Section 2.1). However, high-dimensional functional data with thousands or more grid points results in several major big-data challenges. The first big-data challenge is the dimensionality of the covariance matrices, each of which consists of
$V(V+1)/2 \gg n$ unknown parameters, where $V$ is the number of grid points for a given functional phenotype and $n$ is the number of twin pairs. It is computationally intractable to use joint maximum likelihood estimation (MLE) to estimate these $V \times V$ covariance matrices. Alternatively, one may resort to a pairwise analysis by estimating the genetic and environmental correlations of each grid pair, e.g., using bivariate MLE separately for all possible grid pairs. However, the computational costs are still very high, no software is available to perform such a large-scale implementation, and there are additional statistical issues described below.
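To give a sense of scale, the following sketch counts the free parameters of one symmetric covariance matrix and compares the storage for a dense matrix with that for a low-rank factor. The value of 60 000 locations and the rank of 229 are taken from elsewhere in this article; the function names and the assumption of 8-byte floating-point storage are illustrative only.

```python
def symmetric_parameter_count(n_locations):
    """Number of free entries in a symmetric n_locations x n_locations matrix."""
    return n_locations * (n_locations + 1) // 2


def dense_matrix_gigabytes(n_locations, bytes_per_value=8):
    """Storage for a dense square matrix (assumes 8-byte floats by default)."""
    return n_locations ** 2 * bytes_per_value / 1e9


def low_rank_factor_gigabytes(n_locations, rank, bytes_per_value=8):
    """Storage for an n_locations x rank factor of a low-rank PSD matrix."""
    return n_locations * rank * bytes_per_value / 1e9


V = 60_000  # approximate number of cortical locations quoted in the text
print(symmetric_parameter_count(V))             # 1800030000
print(dense_matrix_gigabytes(V))                # roughly 29 GB per matrix
print(low_rank_factor_gigabytes(V, rank=229))   # roughly 0.11 GB per factor
```

Under these assumptions, each covariance matrix has about 1.8 billion free parameters, which is one reason a factorized representation is attractive.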
The second big-data challenge is to develop a method in which the covariance functions are positive semidefinite (PSD), where estimators lacking this property may be less accurate. Pairwise approaches do not ensure the PSD properties of the three nonstationary covariance functions. In contrast, joint estimation approaches can result in PSD estimates, and as we will show, large improvements in accuracy. However, joint approaches become computationally more difficult as
$V$ increases. Methods to estimate covariance matrices and functions in the high-dimensional, low-sample size setting have been recently proposed (Xiao and others, 2016) but are not applicable to multiple covariance functions from twin functional data. Our major contributions are as follows:
- We propose novel estimators of nonstationary genetic and environmental effects in twin studies with high-dimensional, low-sample size data.
- We propose estimates of measurement-error corrected heritability, where measurement error can be estimated based on the smoothness of the underlying biological processes.
- We automate smoothing using kernels that incorporate local information based on geodesic distance and use generalized cross-validation (GCV) to select the bandwidth, which in our application leads to a higher effective resolution.
- We estimate the covariance patterns in genetic effects in cortical thickness in the HCP, which provides detailed insight into cortical networks.
The remainder of this article is organized as follows. In Section 2, we present Fisher’s model for heritability and a recent functional extension. We then present our estimators and algorithm. In Section 3, we conduct a simulation study. In Section 4, we analyze the HCP cortical thickness data and conclude with a discussion in Section 5.
2. Methods
2.1. The ACE of space model
Fisher’s ACDE model proposes additive genetic (A), dominant genetic (D), common environmental (C), and unique environmental (E) components of variation in a phenotype. Additive genetic effects can be estimated by assuming the correlation between genetic traits is 1 for monozygotic (MZ) twins and 0.5 for dizygotic (DZ) twins, which is based on genetic theory. The correlation for dominant effects is 1 for MZ and 0.25 for DZ, but dominant and additive effects are not simultaneously identifiable in the basic twin study design. The ACE model controls for effects due to the shared environment and is most appropriate for polygenic phenotypes. Heritability estimates from the ACE model are called narrow-sense heritability, denoted by
$h^2$, which contrasts with broad-sense heritability, $H^2$, which includes dominant effects.
Let $i \in \{1, \dots, N\}$, where $i$ indexes the family and $N$ the total number of families. Let $N_1$ denote the number of MZ pairs and $\mathcal{M}$ the set of families with MZ pairs; $N_2$ and $\mathcal{D}$ denote the number and set, respectively, of DZ pairs; and $N_3$ and $\mathcal{S}$ denote the number and set of singletons with no relatives in the dataset. Let $n_i = 2$ if the $i$th family contains twins and $n_i = 1$ if the $i$th family comprises a singleton, and let $j \in \{1, \dots, n_i\}$ index the individual in the $i$th family. For clarity, we here assume that all observations $y_{ij}$
belong to one of these three classes, but inclusion of nontwin siblings is discussed in Section 2.2. We define the standard ACE model using the mixed model formulation (Rabe-Hesketh and others, 2008) as follows:
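$$y_{ij} = x_{ij}^\top \beta + a_i \{1(i \in \mathcal{M}) + \sqrt{0.5}\, 1(i \in \mathcal{D})\} + a_{ij} \{\sqrt{0.5}\, 1(i \in \mathcal{D}) + 1(i \in \mathcal{S})\} + c_i + e_{ij}, \tag{2.1}$$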
where $1(\cdot)$ is an indicator function; $x_{ij}$ are fixed covariates, which are typically effects we want to control for that are not of primary interest; $\beta$ is a vector of coefficients; $a_i$ and $a_{ij}$ are the additive genetic effects; $c_i$ is the common environmental effect; $e_{ij}$ is the random effect for the total unique variance, which is equal to the sum of the unique environmental effects plus measurement error; and $a_i$, $a_{ij}$, $c_i$, and $e_{ij}$ are mutually independent. In the next section, we will decompose $e_{ij}$
into the unique environmental effect and measurement error. The ACE model can also be formulated as a structural equation model. The formulation in (2.1) assumes that there are no dominant effects (no nonadditive genetic effects), gene–gene interactions (epistasis), gene–environment interactions, and no assortative mating. The standard approach for heritability estimates in neuroimaging is to pre-smooth the data and then estimate a separate model at each location, where the amount of smoothing is fixed a priori.
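As a concrete illustration of the twin correlations assumed by the ACE model (1 for MZ and 0.5 for DZ additive genetic effects), the sketch below simulates univariate twin data and recovers the variance components from the MZ and DZ covariances. This is a simple method-of-moments calculation (in the spirit of Falconer's formula), not the maximum likelihood approach used in this article; the function names and the variance values are hypothetical.

```python
import numpy as np


def simulate_twin_pairs(n_pairs, var_a, var_c, var_e, monozygotic, rng):
    """Simulate an (n_pairs, 2) array of phenotypes under ACE, no covariates."""
    shared_a = rng.normal(0.0, np.sqrt(var_a), size=(n_pairs, 1))
    own_a = rng.normal(0.0, np.sqrt(var_a), size=(n_pairs, 2))
    common = rng.normal(0.0, np.sqrt(var_c), size=(n_pairs, 1))
    unique = rng.normal(0.0, np.sqrt(var_e), size=(n_pairs, 2))
    if monozygotic:
        genetic = np.repeat(shared_a, 2, axis=1)  # genetic correlation of 1
    else:
        # Equal mix of shared and individual parts: genetic correlation of 0.5.
        genetic = np.sqrt(0.5) * shared_a + np.sqrt(0.5) * own_a
    return genetic + common + unique


def moment_estimates(y_mz, y_dz):
    """Method-of-moments ACE estimates from within-pair covariances."""
    cov_mz = np.cov(y_mz[:, 0], y_mz[:, 1])[0, 1]   # estimates var_a + var_c
    cov_dz = np.cov(y_dz[:, 0], y_dz[:, 1])[0, 1]   # estimates var_a / 2 + var_c
    total = np.var(np.concatenate([y_mz.ravel(), y_dz.ravel()]), ddof=1)
    var_a = 2.0 * (cov_mz - cov_dz)
    var_c = 2.0 * cov_dz - cov_mz
    var_e = total - cov_mz
    return {"var_a": var_a, "var_c": var_c, "var_e": var_e, "h2": var_a / total}


rng = np.random.default_rng(0)
y_mz = simulate_twin_pairs(50_000, 0.4, 0.2, 0.4, True, rng)
y_dz = simulate_twin_pairs(50_000, 0.4, 0.2, 0.4, False, rng)
print(moment_estimates(y_mz, y_dz))
```

Nothing in this calculation prevents a negative variance estimate in small samples, which is the same difficulty that motivates constrained estimation later in this section.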
Next, we apply the functional model in Luo and others (2019) to a spatial domain, which we call the ACE of space. Let
$\mathcal{V}$ denote a spatial domain and $v \in \mathcal{V}$ an arbitrary location. In our application, $\mathcal{V}$
is the cortical surface, in which the subjects are aligned in a common domain consisting of two 2D manifolds (one for each hemisphere) embedded in 3D space, and data (e.g., cortical thickness) is measured at approximately 60 000 locations (vertices). We use the spherical representation of each manifold in which each cerebral “hemisphere” is represented by its own sphere, with null values in the noncortical areas corresponding to the connection between the two hemispheres (Figure S8 in the supplementary materials available at Biostatistics online). This is the common template in the processed data in which cortical thickness at a given vertex represents the same location across subjects relative to the aligned cortical folding patterns of the sulci and gyri. The location of each vertex is denoted by a triple
, where the last coordinate denotes the cerebral hemisphere. The locations with data are denoted as
$\{v_1, \dots, v_V\}$. For conciseness, we hereafter denote these locations with single indices $v$ or $v'$
. Modeling will incorporate the local spatial correlation using kernel regression with geodesic distance, as measured using the great circle distance, and the kernel is equal to zero when vertices are in different hemispheres. See Appendix C in the supplementary materials available at Biostatistics online. More generally, long-distance correlations will be estimated from the data (see Section 2.2). For display, the vertices are mapped to the Conte69 atlas, which is an average of cortical folding patterns from subjects in an independent dataset.
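The distance used for local smoothing can be sketched as follows. The example computes the great-circle distance between two points on a sphere and returns an infinite distance for vertices in different hemispheres, so that any kernel with finite support assigns them zero weight. The function names, the Cartesian coordinate input, and the hemisphere labels are assumptions made for illustration.

```python
import numpy as np


def great_circle_distance(p, q, radius=1.0):
    """Geodesic distance between two points on a sphere centred at the origin."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    cos_angle = np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q))
    # Clip guards against values slightly outside [-1, 1] from rounding.
    return radius * np.arccos(np.clip(cos_angle, -1.0, 1.0))


def vertex_distance(p, hemi_p, q, hemi_q, radius=1.0):
    """Distance between vertices; infinite across hemispheres (assumption)."""
    if hemi_p != hemi_q:
        return np.inf
    return great_circle_distance(p, q, radius)


print(vertex_distance([1, 0, 0], "L", [0, 1, 0], "L"))  # a quarter circle
print(vertex_distance([1, 0, 0], "L", [0, 1, 0], "R"))  # inf
```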
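Let $e_{ij,g}(v)$ denote the unique environmental effect generated from a Gaussian process, and let $e_{ij,l}(v)$ denote the measurement error. Then, the functional ACE model is

$$y_{ij}(v) = x_{ij}^\top \beta(v) + a_i(v) \{1(i \in \mathcal{M}) + \sqrt{0.5}\, 1(i \in \mathcal{D})\} + a_{ij}(v) \{\sqrt{0.5}\, 1(i \in \mathcal{D}) + 1(i \in \mathcal{S})\} + c_i(v) + e_{ij,g}(v) + e_{ij,l}(v). \tag{2.2}$$

It is assumed that $a_i(v)$, $a_{ij}(v)$, $c_i(v)$, $e_{ij,g}(v)$, and $e_{ij,l}(v)$ are mutually independent mean-zero Gaussian processes with covariance functions $\Sigma_a(v, v')$, $\Sigma_a(v, v')$, $\Sigma_c(v, v')$, $\Sigma_{e,g}(v, v')$, and $\Sigma_{e,l}(v, v')$, respectively. We assume $\Sigma_{e,l}(v, v') = 0$ for $v \neq v'$. We have

$$\operatorname{Cov}\{y_{ij}(v), y_{ij'}(v')\} = \begin{cases} \Sigma_a(v, v') + \Sigma_c(v, v') + \Sigma_{e,g}(v, v') + \Sigma_{e,l}(v, v'), & j = j', \\ \Sigma_a(v, v') + \Sigma_c(v, v'), & j \neq j',\ i \in \mathcal{M}, \\ \tfrac{1}{2} \Sigma_a(v, v') + \Sigma_c(v, v'), & j \neq j',\ i \in \mathcal{D}. \end{cases}$$

To simplify notation, we let $\sigma^2_a(v) = \Sigma_a(v, v)$, $\sigma^2_c(v) = \Sigma_c(v, v)$, $\sigma^2_{e,g}(v) = \Sigma_{e,g}(v, v)$, and $\sigma^2_{e,l}(v) = \Sigma_{e,l}(v, v)$. Then narrow-sense heritability is defined:

$$h^2(v) = \frac{\sigma^2_a(v)}{\sigma^2_a(v) + \sigma^2_c(v) + \sigma^2_{e,g}(v)}, \tag{2.3}$$

which is corrected for the measurement error due to $e_{ij,l}(v)$.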
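The effect of the measurement-error correction can be seen in a small numerical sketch. The variance values below are the averages quoted for the simulation design in Section 3.1 (genetic 0.015, common environmental 0.010, unique environmental 0.12, measurement error 0.03); the function name and its arguments are hypothetical. Note that a ratio of average variances is not the same as the average heritability across locations.

```python
def narrow_sense_heritability(var_a, var_c, var_e, var_noise=0.0):
    """Genetic variance over total variance; var_noise=0 gives the corrected form."""
    return var_a / (var_a + var_c + var_e + var_noise)


# Measurement error excluded from the denominator (corrected heritability).
corrected = narrow_sense_heritability(0.015, 0.010, 0.12)
# Measurement error left in the denominator, as in a point-wise analysis.
uncorrected = narrow_sense_heritability(0.015, 0.010, 0.12, var_noise=0.03)

print(round(corrected, 4), round(uncorrected, 4))
```

With these inputs, leaving measurement error in the denominator lowers the ratio from about 0.103 to about 0.086, which illustrates the direction of the downward bias discussed in Section 3.2.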
Pairwise covariance estimators were proposed in Luo and others (2019) based on kernel regression and applied to 150 locations on the corpus callosum for 129 twin pairs. We examine these estimators, called S-FSEM (Symmetric estimators from the Functional Structural Equation model), in simulations (Section 3). This approach can result in negative estimates of variance parameters, particularly for
. Here, we develop a method that jointly estimates the covariance function (under PSD constraints) for thousands of locations, which can improve both estimates of heritability and correlation patterns.
2.2. Covariance estimation
When the estimate of a covariance function is not PSD, it is common to truncate to the positive eigenvalues and their associated eigenfunctions. This approach generally results in lower MISE than the original symmetric functions (Hall and others, 2008). Truncation was used to estimate a genetic covariance function in a pedigree model for cow growth in Lei and others (2015). In the ACE of space model, we initially truncated to positive eigenvalues of the FSEM, called PSD-FSEM, but this resulted in large biases as examined in Section 3. This may be more problematic in our
$V \gg N$ application. This motivated estimation of the ACE model under PSD constraints (PSD-ACE), which we summarize in six steps.
Step 1. Calculate point-wise MLEs of unknown parameters in model (2.1).
Step 2. Smooth the parameter estimates using bandwidths selected using GCV and calculate the fixed effect residuals.
Step 3. Estimate the measurement error.
Step 4. Use the fixed-effect residuals and estimates of measurement error as input to an initial estimator of the covariance functions, which has a convenient closed-form solution, and estimate the bandwidths using GCV.
Step 5. Estimate the rank of the covariance functions from the number of positive eigenvalues (aided by scree plots) and truncate to the corresponding positive eigenvalues/vectors.
Step 6. Estimate the covariance functions under PSD constraints initialized from Step 5.
Step 1 is straightforward, e.g., Rabe-Hesketh and others (2008). In our implementation, we numerically optimize the log of the variance parameters, which constrains their estimates to be positive. Step 2 aims to decrease the mean square error (MSE) of the point-wise MLEs since the MLEs in one location tend to be similar to those in adjacent locations. The smoothed MLEs (SMLEs) are calculated using a biweight (quartic) kernel with bandwidth
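$h$, denoted $K_h(v, v')$, based on the geodesic distance $d(v, v')$ between locations $v$ and $v'$. The biweight kernel is defined as

$$K_h(v, v') = \frac{15}{16} \left\{1 - \left(\frac{d(v, v')}{h}\right)^2\right\}^2 1\{d(v, v') \le h\}, \tag{2.4}$$

and is used throughout. We use the convention that $v$ is the focal vertex, which here we restrict to the observed locations $\{v_1, \dots, v_V\}$, and $v'$ are locations that can contribute to the estimate at the focal vertex $v$. A sparse $V \times V$ smoothing matrix is formed, $\mathbf{S}_h$, with normalized entries $s_{vv'} = K_h(v, v') / \sum_{v''} K_h(v, v'')$ with $v, v' \in \{v_1, \dots, v_V\}$. Let $\hat{\sigma}^2_e$ be a vector of length $V$ of the MLE estimates of the unique environmental plus measurement error variance. Then define $\tilde{\sigma}^2_e = \mathbf{S}_h \hat{\sigma}^2_e$. GCV is an approximation to leave-one-location-out cross-validation and is calculated as

$$\mathrm{GCV}(h) = \frac{V^{-1} \lVert \hat{\sigma}^2_e - \mathbf{S}_h \hat{\sigma}^2_e \rVert^2}{\{1 - \operatorname{tr}(\mathbf{S}_h)/V\}^2}, \tag{2.5}$$

where $\operatorname{tr}$ is the trace operator. Then the GCV is calculated for a grid of values of $h$ and the value minimizing the GCV is chosen. Note that this approach allows a separate bandwidth to be chosen for each parameter, which contrasts with maximum weighted likelihood. This procedure is repeated for $\hat{\sigma}^2_a$, $\hat{\sigma}^2_c$, and $\hat{\beta}$, and the residuals from the smoothed estimates are calculated.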
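The smoothing and bandwidth selection in Step 2 can be sketched as follows. The example builds row-normalized biweight weights from a distance matrix, smooths a vector of point-wise estimates, and picks the bandwidth minimizing a standard GCV score. It uses a dense matrix on a toy circular domain rather than the sparse matrix and cortical geometry of the application, and all function names are hypothetical.

```python
import numpy as np


def biweight(dist, h):
    """Unnormalised biweight kernel: (1 - (d/h)^2)^2 for d < h, zero otherwise."""
    u = np.asarray(dist, dtype=float) / h
    return np.where(u < 1.0, (1.0 - u ** 2) ** 2, 0.0)


def smoothing_matrix(dist, h):
    """Row-normalised weights; row v holds the weights for focal location v."""
    w = biweight(dist, h)
    return w / w.sum(axis=1, keepdims=True)


def gcv_score(values, dist, h):
    """Mean squared residual divided by (1 - tr(S)/V)^2."""
    s = smoothing_matrix(dist, h)
    resid = values - s @ values
    denom = 1.0 - np.trace(s) / len(values)
    if denom <= 1e-12:  # bandwidth so small that no neighbours contribute
        return np.inf
    return float(np.mean(resid ** 2) / denom ** 2)


def select_bandwidth(values, dist, grid):
    """Return the bandwidth in `grid` with the smallest GCV score."""
    scores = [gcv_score(values, dist, h) for h in grid]
    return grid[int(np.argmin(scores))], scores


# Toy example: noisy estimates at 200 equally spaced locations on a circle.
theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
gap = np.abs(theta[:, None] - theta[None, :])
dist = np.minimum(gap, 2.0 * np.pi - gap)  # geodesic distance on the circle
rng = np.random.default_rng(0)
noisy = np.sin(theta) + rng.normal(0.0, 0.3, size=theta.size)
h_best, _ = select_bandwidth(noisy, dist, [0.1, 0.2, 0.4, 0.8, 1.6])
smoothed = smoothing_matrix(dist, h_best) @ noisy
print(h_best)
```

Because the weights are normalized within each row, any constant multiplying the kernel cancels, which is why the sketch omits it.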
Step 3 uses the difference between the estimates of the total variance from the SMLEs and estimates of the variance due to genetic and environmental effects derived based on smoothness assumptions. We use kernel regression with the biweight kernel to estimate the smooth sum of covariance functions in which diagonal elements are excluded as described in Appendix A1 of the supplementary materials available at Biostatistics online, which has similarities to estimating the nugget effect in spatial statistics. Step 4 is related to the closed-form solutions presented in the FSEM in Luo and others (2019), whose method is described in Appendix A2. Our modification lends itself to GCV for bandwidth selection and is described in Appendix A3. We call this modification the sandwich estimator in the spirit of Xiao and others (2016), abbreviated as SW hereafter. Step 5 chooses the rank based on the scree plot. When
$V$ is greater than the number of subjects, as in our data application, the rank is clearly determined by the number of twins and subjects due to constraints on the maximum possible rank (e.g., Figure S11 of the supplementary materials available at Biostatistics online). Hence, the approach is arguably less ad hoc in our application than its use in standard PCA.
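The truncation in Step 5 can be illustrated with a short sketch that keeps the largest positive eigenvalues of a symmetric matrix and returns a factor whose outer product is PSD. In the procedure described here this serves only as an initial value, since truncation alone was found to introduce bias; the function name and the toy matrix are hypothetical.

```python
import numpy as np


def truncate_to_psd(sym, rank=None):
    """Return a factor F such that F @ F.T keeps the top positive eigenpairs."""
    sym = 0.5 * (sym + sym.T)                 # enforce exact symmetry
    evals, evecs = np.linalg.eigh(sym)        # eigenvalues in ascending order
    order = np.argsort(evals)[::-1]           # reorder to descending
    evals, evecs = evals[order], evecs[:, order]
    n_positive = int(np.sum(evals > 0))
    r = n_positive if rank is None else min(rank, n_positive)
    return evecs[:, :r] * np.sqrt(evals[:r])  # scale columns by sqrt(eigenvalue)


# Toy symmetric matrix with eigenvalues 3, 1, and -2.
rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
sym = q @ np.diag([3.0, 1.0, -2.0]) @ q.T
factor = truncate_to_psd(sym)
print(factor.shape)                            # (3, 2)
print(np.linalg.eigvalsh(factor @ factor.T))   # no negative eigenvalue beyond rounding
```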
Step 6 greatly improves upon the initial estimates. Let
denote the number of individuals in a family. We order the families such that
,
, and
, and order the individuals such that the twins are indexed by
and
. Let
. For some observed location
, let
, the fixed effect residuals obtained from Step 2, and
be the estimate of the measurement error obtained from Step 3. For
, define
[Equation (2.6)]
[Equation (2.7)]
We consider the discrete problem restricting the covariance functions
and
to observed locations
. Let
denote the
matrix. Define
,
, and
. Consider the class of
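$V \times V$ (symmetric) PSD matrices $\Sigma_a$ of rank $r_a$. We can define this class as $\{\mathbf{L}_a \mathbf{L}_a^\top : \mathbf{L}_a \in \mathbb{R}^{V \times r_a}\}$. Let $\ell_{a,v}$ denote the $v$th row of $\mathbf{L}_a$. Similarly define $\mathbf{L}_c$ and $\mathbf{L}_e$ for $\Sigma_c$ and $\Sigma_{e,g}$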
. For focal locations
and contributing locations
(i.e., have nonzero weight when nearby), define the PSD–ACE objective function
[Equation (2.8)]
We note that this decomposition is not identifiable because the minimum is not unique. However, we are not interested in the decompositions beyond their usefulness as a memory efficient representation of the covariance matrices. The key observation is that this re-parameterization allows the calculation of an analytic gradient that is scalable. Then we can optimize (2.8) by initializing from the closed-form solution in Step 5. Extensive simulations support the estimation steps outlined here. The steps and iterations of the gradient descent algorithm can be seen as progressively decreasing the MSE: truncating the eigenvalues/vectors decreases the MSE relative to the symmetric (closed form) estimator, and then the gradient steps result in additional improvements. This will be seen in the simulations. See also the discussion in Section 3.2. In our applications, we did not have issues with convergence, but an approach using diagonalization after each iteration could also be pursued.
Also note this objective function uses the product of two bivariate functions as the kernel for covariance estimation, i.e.,
, which results in computational simplifications that enable estimation for large datasets. We again use the biweight kernel, and note its finite support decreases computational expense.
The computational complexity of (2.8) is
, which makes it impracticable for modestly sized
, much less
. However, we can derive a gradient descent algorithm in which there is a one time cost of
and updates are
; the formulas for the analytic gradients appear in Appendix A4 of the supplementary materials available at Biostatistics online. We initially use a modestly sized learning rate. In each iteration, the algorithm checks if the norm of the gradient increased relative to the previous iteration, and if so, halves the learning rate. Convergence is assessed relative to the initial size of the gradient; see Algorithm 1.
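Algorithm 1: Covariance estimation
Inputs: The data matrix $\mathbf{Y}$ and design matrix $\mathbf{X}$; tolerance $\tau$, e.g., 0.0001; learning rate $\lambda$, e.g., 0.1.
Result: $\hat{\Sigma}_a$, $\hat{\Sigma}_c$, and $\hat{\Sigma}_{e,g}$.
- (1) Estimate measurement error, $\hat{\sigma}^2_{e,l}$, and residuals, $\mathbf{R}$, using SMLE with input $\mathbf{Y}$ and $\mathbf{X}$. (Steps 1-3)
- (2) Calculate $\tilde{\Sigma}_a$, $\tilde{\Sigma}_c$, and $\tilde{\Sigma}_{e,g}$ in which the bandwidths are chosen using GCV. These bandwidths will be used in subsequent estimators. (Step 4)
- (3) Choose the rank $r_a$ based on the scree plot for $\tilde{\Sigma}_a$. Use the selected eigenvalue/eigenvector pairs to generate an initial value $\mathbf{L}_a^{(0)}$. Repeat this process for $\tilde{\Sigma}_c$ and $\tilde{\Sigma}_{e,g}$. (Step 5)
- (4) Calculate gradients $\nabla_a^{(0)}$, $\nabla_c^{(0)}$, and $\nabla_e^{(0)}$ using the initial values and calculate the gradient norm $g^{(0)}$, and let $t = 0$.
- (5) While $g^{(t)} / g^{(0)} > \tau$, increment $t$ and calculate $\mathbf{L}_a^{(t)} = \mathbf{L}_a^{(t-1)} - \lambda \nabla_a^{(t-1)}$, and similarly for $\mathbf{L}_c^{(t)}$ and $\mathbf{L}_e^{(t)}$.
- (6) If $g^{(t)} > g^{(t-1)}$, then return to the previous step and decrease the learning rate, $\lambda \leftarrow \lambda / 2$. Re-calculate $\mathbf{L}_a^{(t)}$, and similarly for $\mathbf{L}_c^{(t)}$ and $\mathbf{L}_e^{(t)}$.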
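The gradient steps and learning-rate rule of Algorithm 1 can be sketched on a simplified problem. The example below fits a single low-rank factor by minimizing a weighted squared error between a target matrix and the factor's outer product, halving the learning rate whenever the gradient norm increases and stopping when the gradient norm is small relative to its initial value. This is a stand-in for the objective in (2.8), which involves three covariance functions and kernel weights; the toy objective, the zero weight on the diagonal, and all names are assumptions made for illustration.

```python
import numpy as np


def weighted_loss_and_grad(fac, target, weights):
    """Loss sum_ij w_ij (target_ij - (fac fac^T)_ij)^2 and its gradient in fac.

    Assumes `target` and `weights` are symmetric.
    """
    resid = target - fac @ fac.T
    weighted = weights * resid
    loss = float(np.sum(weighted * resid))
    grad = -4.0 * weighted @ fac
    return loss, grad


def fit_psd_factor(target, weights, fac0, lr=0.1, tol=1e-4, max_iter=5000):
    """Gradient descent with the learning rate halved when the gradient norm grows."""
    fac = fac0.copy()
    _, grad = weighted_loss_and_grad(fac, target, weights)
    norm0 = prev_norm = np.linalg.norm(grad)
    for _ in range(max_iter):
        if norm0 == 0.0 or prev_norm / norm0 <= tol or lr < 1e-12:
            break
        cand = fac - lr * grad
        _, cand_grad = weighted_loss_and_grad(cand, target, weights)
        cand_norm = np.linalg.norm(cand_grad)
        if cand_norm > prev_norm:
            lr *= 0.5  # reject the step and retry from the same point
            continue
        fac, grad, prev_norm = cand, cand_grad, cand_norm
    return fac, lr


# Toy data: a rank-2 PSD matrix whose diagonal is inflated by noise variance.
rng = np.random.default_rng(1)
n_loc, rank = 30, 2
b = rng.normal(size=(n_loc, rank)) / np.sqrt(n_loc)
truth = b @ b.T
target = truth + np.diag(rng.uniform(0.05, 0.3, size=n_loc))
weights = 1.0 - np.eye(n_loc)          # ignore the contaminated diagonal

# Initial value from the top eigenpairs of the unconstrained target.
evals, evecs = np.linalg.eigh(target)
fac0 = evecs[:, -rank:] * np.sqrt(evals[-rank:])

fac_hat, final_lr = fit_psd_factor(target, weights, fac0)
loss0, _ = weighted_loss_and_grad(fac0, target, weights)
loss1, _ = weighted_loss_and_grad(fac_hat, target, weights)
print(loss0, loss1)
```

Because the estimate is always a factor times its transpose, it stays PSD at every iteration without any explicit projection.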
For clarity, we described (2.8) for twins and unrelated singletons; however, we can also use information from nontwin siblings. By looking at the expected values of the products of residuals in (2.6), it can be seen that the nontwin siblings can be treated in the same way as singletons, and thus are included in the first summand in (2.8). In a similar manner, we can treat nontwin siblings as singletons in steps 3–5. Note we do not make the assumption that nontwin siblings have the same common environment as their twin siblings.
This procedure results in three covariance matrices, which are the covariance functions evaluated at
$V$ locations. However, it may be desirable to obtain their corresponding covariance functions, which can be used to evaluate the heritabilities and covariances at unobserved grid locations; see Appendix A5 of the supplementary materials available at Biostatistics online. The approach can also be used to estimate the covariance functions from a sample of points from the spatial domain to decrease computational expense, particularly with respect to memory usage. We can partition the locations, estimate the covariance functions for each subset, interpolate to all locations, and subsequently combine the estimates. Code to implement this low-memory approach is provided in the supplementary materials available at Biostatistics online.
3. Simulations
3.1. Simulation design
We simulated functional spatial data for 100 MZ pairs, 100 DZ pairs, and 200 singletons with 1002 grid points on the unit sphere with the sizes of the variances motivated from the HCP cortical thickness data and the basis functions chosen to result in a realistic range of variances and covariances. We defined covariance functions using sixth-order even spherical harmonics, which comprise 28 basis functions,
. This order was chosen to result in a mixture of high- and low-frequency fluctuations. Higher frequencies capture quick-changes in correlation structure, which is motivated by the patterns observed in Section 4. We then defined
using five basis functions as
where
was chosen such that
. The value of 0.015 is approximately equal to the average of the MLE of the genetic variance (in mm$^2$) in cortical thickness across all locations, as calculated in Section 4. Similarly, we define
where
was chosen such that
. The value of 0.010 is approximately equal to the average common environmental variance in Section 4. Next,
was defined with basis functions
and scaled so that the average variance was 0.12. This resulted in heritability that ranged from 0.016 to 0.498 with mean 0.126. Finally,
was defined from the diagonal of the matrix formed from the basis functions
and scaled to have average equal to 0.03. Estimation included a design matrix with a column of ones and a continuous covariate, while their true coefficients were equal to zero.
Here, we define the quantities
and
as measures of error, where
denotes the
th simulation. We also present normalized versions in the supplementary materials Section B available at Biostatistics online.
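A minimal sketch of how such error measures are commonly computed is given below. It assumes the integrated squared error (ISE) of one simulation is the squared error averaged over the grid and the mean integrated squared error (MISE) is the average ISE over simulations; the exact normalization used in this article is not recoverable from the text here, so the definitions and function names should be read as assumptions. The sketch also shows the decomposition of MISE into squared bias and variance that underlies the discussion of the results.

```python
import numpy as np


def ise(estimate, truth):
    """Squared error averaged over all grid entries for one simulation."""
    diff = np.asarray(estimate, dtype=float) - np.asarray(truth, dtype=float)
    return float(np.mean(diff ** 2))


def mise_decomposition(estimates, truth):
    """MISE over simulations, split into integrated squared bias and variance."""
    est = np.asarray(estimates, dtype=float)   # shape (n_simulations, ...)
    truth = np.asarray(truth, dtype=float)
    mean_est = est.mean(axis=0)
    sq_bias = float(np.mean((mean_est - truth) ** 2))
    variance = float(np.mean(est.var(axis=0)))  # population variance (ddof=0)
    mise = float(np.mean([ise(e, truth) for e in est]))
    return {"mise": mise, "sq_bias": sq_bias, "variance": variance}


# Toy check: 50 noisy, slightly biased estimates of a 4 x 4 covariance matrix.
rng = np.random.default_rng(0)
truth = np.eye(4)
estimates = truth + 0.1 + rng.normal(0.0, 0.2, size=(50, 4, 4))
parts = mise_decomposition(estimates, truth)
print(parts)
```

With these definitions the MISE equals the sum of the squared-bias and variance terms, so a method can achieve the lowest MISE while carrying more bias than a competitor.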
We compare three estimators of the covariance functions: (i) the symmetric FSEM proposed in Luo and others (2019), S-FSEM, defined in Appendix A2 in the supplementary materials available at Biostatistics online; (ii) the PSD analog of S-FSEM based on truncating to positive eigenvalues (PSD-FSEM); and (iii) the PSD-ACE estimator. We also compared three additional estimators of the covariance functions: (iv) the symmetric sandwich estimator (S-SW) defined in Appendix A3 of the supplementary materials available at Biostatistics online and used in initialization in Step 4 of the PSD-ACE; (v) the PSD-SW based on truncating to positive eigenvalues; and (vi) PSD-ACE oracle estimator (PSD-ACE-O), based on Algorithm 1 but using the true ranks. The S-SW results are very similar to S-FSEM, and the PSD-SW results are very similar to PSD-FSEM; additionally, PSD-ACE-O results are very similar to PSD-ACE (Tables S1 and S2 and Figures S1–S4 of the supplementary materials available at Biostatistics online). We include the primary estimators S-FSEM, PSD-FSEM, and PSD-ACE in the main manuscript. Based on an inspection of the scree plots from a few hundred simulations, we chose eight, eight, and six eigenvalues for
,
, and
, respectively (where the true ranks were 5, 5, and 6) for the PSD-ACE. We also compared the variances and heritabilities with the point-wise MLE and the maximum weighted likelihood estimator (MWLE) with bandwidth selected using 5-fold CV as in Luo and others (2019) (defined in Appendix A6 of the supplementary materials available at Biostatistics online).
3.2. Simulation results
Overall, the MISEs for
and
are much lower for PSD-ACE than the other methods (Figure 1, Figure S1 of the supplementary materials available at Biostatistics online). In all individual simulations, PSD-ACE has a lower ISE than S-FSEM and PSD-FSEM for all covariance matrices. For
and
, S-FSEM has the lowest bias but largest variance, whereas PSD-FSEM has high bias but smaller variance, while PSD-ACE has some bias but less than PSD-FSEM and dramatically lower variance (Figure 1(a) and (b)). A different pattern emerges for
. The S-FSEM version of this estimator is PSD in most simulations. Consequently, the S-FSEM and PSD-FSEM versions are very similar. For
, PSD-ACE again has the best MISE driven by lower variance, but now has more bias relative to PSD-FSEM. To visualize the bias, we can examine plots of the covariance between a focal vertex, i.e., a seed, and the 1002 vertices of the discretized spatial domain, which corresponds to a row of the covariance matrix of the $V$
locations. PSD-FSEM tends to inflate differences between locations, in particular having higher values near the seed, whereas the constrained estimates tend to shrink the differences towards zero (Figure 2; Figure S2 of the supplementary materials available at Biostatistics online).
Fig. 1.
MISE of covariance functions for the S-FSEM, PSD-FSEM (truncated eigenvalues of S-FSEM), and PSD-ACE. Panels d, e, and f depict boxplots of the ISE from 1000 simulations. See Figure S1 in the supplementary materials available at Biostatistics online for a normalized version of this figure.
Fig. 2.
Visualizing bias in covariance estimation. Plotted is the average across 1000 simulations of
, where
denotes the simulation, and the covariance between a randomly selected seed location (
) and 1002 locations is evaluated.
When we restrict attention to the MISE of the variance functions (diagonals of the previous matrices), we again observe that PSD-ACE has the best MISE, and in particular outperforms the point-wise likelihood methods (Figure 3; Figure S3 of the supplementary materials available at Biostatistics online). Many of the estimates of variance and heritability for S-FSEM are negative, contributing to large variances, which results in large MISEs. The average genetic variance across all locations is biased upwards for PSD-ACE (for
, the average is 0.020 whereas the true average is 0.015), but this represents a dramatic improvement relative to PSD-FSEM (0.091), while MLE and MWLE are the least biased (0.015, 0.015) (see also Figures S4–S7 of the supplementary materials available at Biostatistics online). Note that in Figure 3(c),
for MLE and MWLE depict estimates of measurement error plus unique environmental variance,
, since measurement error is not identifiable in the point-wise approach. Consequently, MLE and MWLE have large bias.
Fig. 3.
MISE of
,
,
, and heritability across
$V = 1002$ locations. (See Figures S4–S7 for a visualization of the bias across space.) In (c), point-wise MLE and point-wise MWLE estimates of
are presented because they do not separate
, which leads to bias. (e) Boxplots of the ISE from 1000 simulations, where the y-axis is on the log10 scale due to large differences between methods. See Figure S3 in the supplementary materials available at Biostatistics online for a normalized version of this figure.
For heritability, PSD-ACE is less biased than MLE and MWLE due to the ability to disentangle measurement error and unique environmental variance. Variance and bias accumulate in the heritability estimates such that the relative improvements of PSD-ACE over other approaches are even greater (Figure 3(d) and (e), Table S2 of the supplementary materials available at Biostatistics online), and the unidentifiability of measurement error in the point-wise MLE and MWLE results in downwardly biased estimates of heritability (Figures S6 and S7 of the supplementary materials available at Biostatistics online).
Note that the PSD-ACE improved upon the initial estimators in all simulations. Previous literature has noted that when imposing positive definite constraints with the parameterization
$\Sigma = \mathbf{L}\mathbf{L}^\top$, nonuniqueness can lead to issues with convergence when optima are close to each other (Pinheiro and Bates, 1996). Here, this does not appear to detrimentally affect parameter estimation, where we used an adaptive learning rate. We used a strict convergence criterion (ratio of the norm of the current gradient to the initial gradient less than 0.0001), and found that when the algorithm did not converge (in the sense that the size of the gradient failed to get smaller for vanishingly small learning rates), the estimate appeared to have adequately minimized the objective function (e.g., Figure 1). In practice, we found that convergence can be improved by increasing the ranks in PSD-ACE, but this does not necessarily improve the estimate.
In summary, PSD-ACE has the lowest MISE albeit with more bias than the S-FSEM. Restricting the analysis to heritability, the bias in PSD-ACE was less than the MLE and MWLE, while also having the lowest overall MISE. Truncating covariance functions (PSD-FSEM) from symmetric estimates led to a large amount of bias in
and
.
4. Application to HCP
We used the 32k (per hemisphere) preprocessed cortical thickness data from the 1200-subject HCP data release. We controlled for age, gender, and total intracranial volume (Appendix C of the supplementary materials available at Biostatistics online). The HCP dataset contains cortical thickness for 1094 subjects, which includes twins, nontwin siblings, and unrelated individuals. In this sample, the age (mean
± SD) was 28.8 ± 3.7 years. There were 595 females versus 499 males with 75% White, 15% African-American, 6% Asian/Native Hawaiian/Pacific Islander, and 4% “other”. Of these, 452 were genotyped, which revealed that 31 of 109 twin pairs that self-reported DZ twin status were in fact MZs. In contrast, 151 out of 151 genotyped twin pairs that self-reported MZ were in fact MZ. Thus, the set of subjects that self-reported MZ was 100% accurate, while the set of subjects that self-reported DZ was only 72% accurate. Consequently, we included all self-reported MZs but excluded self-reported DZs without genotype data. This resulted in 151 MZ and 78 DZ pairs.
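The pair counts quoted above can be checked with a few lines of arithmetic. The variable names are hypothetical; the numbers are those stated in the text.

```python
genotyped_self_reported_dz = 109   # genotyped pairs that self-reported DZ
found_to_be_mz = 31                # of those, pairs that genotyping showed to be MZ
mz_pairs = 151                     # MZ pairs retained in the analysis

confirmed_dz_pairs = genotyped_self_reported_dz - found_to_be_mz
dz_self_report_accuracy = confirmed_dz_pairs / genotyped_self_reported_dz
twin_families = mz_pairs + confirmed_dz_pairs

print(confirmed_dz_pairs)                      # 78
print(round(100 * dz_self_report_accuracy))    # 72
print(twin_families)                           # 229
```

The total of 229 matches the number of twin families reported later in this section as the rank of the genetic and common environmental covariance functions.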
For S-FSEM and PSD-ACE, we included all 1094 subjects, where nontwin siblings were treated as singletons as discussed in Section 2.2. We excluded nontwin siblings from the MLEs and MWLEs, where the likelihoods assume independence or require the assumption that siblings can be treated in the same manner as DZ pairs (i.e., the common environmental variance effects,
, are the same for siblings of different ages and twins). Similarly, we elected to include one randomly selected member of a family for the families with siblings and no twins, which resulted in 676 subjects in the MLE and MWLE analyses.
We estimated heritabilities using the point-wise MLE, point-wise MWLE with 5-fold leave-one-family out CV, S-FSEM, and PSD-ACE. Overall, the selected bandwidths were notably small; for details, see supplementary materials C4 available at Biostatistics online. An inspection of the scree plots of the eigenvalues clearly indicates the ranks of the covariance functions are determined by the number of twin pairs and individuals: for
and
, the rank equals the number of twin families (229); for
, the rank equals
(943) (Figure S10 of the supplementary materials available at Biostatistics online).
Step 6 is the most computationally intensive step. In the supplementary materials available at Biostatistics online, we provide scripts with the option to partition the data to decrease memory overhead, where the number of partitions can be tuned to meet a user’s memory limits. Here, we used the full data on a high-memory server (approximately 1.8 TB of memory required) with 24 CPUs. The PSD-ACE took approximately 26 h to fit, while the S-FSEM took approximately 0.5 h, and the PSD-FSEM took approximately 1 h.
We assessed the sensitivity of the PSD-ACE to the selected ranks. We re-ran Step 6 with the ranks reduced by 10 for each covariance, and found negligible changes; see supplementary materials C4 available at Biostatistics online. This is expected because the PSD-ACE is initiated from the ordered positive eigenvalues/vectors of the symmetric estimates, such that excluding the smallest eigenvalues/vectors should have negligible impacts. We also assessed convergence of the PSD-ACE by running an additional 600 iterations and found negligible changes; see supplementary materials C5 available at Biostatistics online.
In general, heritability was higher near the central sulci and medial areas near the corpus callosum (Figure 4; also see the annotated PSD-ACE in Figure S13 of the supplementary materials available at Biostatistics online). The heritability was higher in PSD-ACE than MLE and MWLE, and there were many zeros in the MLE and MWLE estimates but not PSD-ACE. S-FSEM had some negative heritabilities due to negative estimates of
and tended to have estimates that were higher than MLE and MWLE but lower than PSD-ACE. The mean ± SD across all vertices was
for PSD-ACE and
for S-FSEM, while MLE and MWLE were
and
, respectively. Using
from Step 3 in the PSD-ACE estimation, we can also construct estimates of measurement-error corrected heritability for MLE and MWLE:
and
, respectively. Here, the measurement error only accounts for a small proportion of the differences.
Fig. 4.
Heritability estimated using the point-wise MLE (top left), point-wise MWLE (top right), PSD-ACE (bottom right), and S-FSEM (bottom left).
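To make the idea of a measurement-error corrected heritability concrete, the sketch below contrasts a naive estimate with a corrected one at a single vertex. It assumes, as one plausible form that is not confirmed by the text, that the likelihood-based unique environmental variance absorbs the measurement error, so that the correction subtracts a separately estimated measurement-error variance from the denominator. The function names and the numerical values are hypothetical.

```python
def heritability(var_a, var_c, var_e):
    """Proportion of total variance attributed to additive genetic effects."""
    total = var_a + var_c + var_e
    return var_a / total if total > 0.0 else float("nan")


def corrected_heritability(var_a, var_c, var_e_total, var_me):
    """Heritability after removing measurement error from the E component.

    Assumption: var_e_total conflates unique environmental variance and
    measurement error, and var_me is an external estimate of the latter.
    The difference is floored at zero to keep the variance nonnegative.
    """
    var_e = max(var_e_total - var_me, 0.0)
    return heritability(var_a, var_c, var_e)


# Hypothetical variance components at one vertex.
naive = heritability(0.3, 0.1, 0.6)
corrected = corrected_heritability(0.3, 0.1, 0.6, var_me=0.2)
```

Under this assumption the corrected estimate is never smaller than the naive one, because the correction can only shrink the denominator.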
The patterns of higher heritability in the central sulci and medial areas near the corpus callosum are similar to Shen and others (2016), which was based on young adults (22.8 ± 2.3 years) and used 10 mm Laplace-Beltrami pre-smoothing. Their vertex-wise heritability estimates averaged across ROIs ranged from 0.026 to 0.523. Since our data were not pre-smoothed and the GCV-selected bandwidth was small, we suggest our Figure 4 has a higher effective resolution than Figure 2 of Shen and others (2016). There are some notable differences. Shen and others (2016) found strong heritability in medial frontal areas in the left hemisphere (labeled as “right” using the radiological convention in their paper), whereas we did not find high heritability in these areas in either hemisphere. We found higher heritability in the parahippocampal gyrus and entorhinal cortex (ventral to the medial wall; see annotated Figure S13 in the supplementary materials available at Biostatistics online), whereas this pattern was less evident in Shen and others (2016). The entorhinal cortex is involved in memory, and interestingly, thinning of the entorhinal cortex may interact with the APOE-ε4 gene in Alzheimer’s (Thompson and others, 2011).
The genetic covariance function can be efficiently explored by creating an animation that progresses through different seeds (supplementary materials available at Biostatistics online). We have created figures from selected seeds that highlight findings of scientific interest (Figure 5). The covariances are normalized to define correlations. First, the seed map for vertex 1577 in the right cortex (top left), located in the parahippocampal gyrus near the boundary with the isthmus cingulate, suggests that this location is a potential hub, as it is relatively highly correlated with many areas of the cerebral cortex. Thus the genetic control of cortical thickness for this location is related to the genetic control over cortical thickness across broad areas of the brain. In contrast, the seed map for vertex 161 (top right), also located in the parahippocampal gyrus, exhibits high correlations along a narrow ridge, with substantially lower overall correlations and much more localized genetic control. The parahippocampal gyrus is associated with memory encoding and retrieval, and our analysis indicates heterogeneous genetic patterns in this region. Next, the bottom row of Figure 5 illustrates that the local patterns of correlation can differ greatly between nearby locations. Vertex 180 (bottom right) and vertex 239 (bottom left), both located in the isthmus cingulate, have very different local correlation patterns.
Fig. 5.
Genetic correlation function evaluated at selected seeds, as estimated using PSD-ACE. Surface vertex indices from right cortex of the fs_LR template, clockwise from top left: 1577, 161, 180, and 239. Animation depicting hundreds of seeds is available in the supplementary materials available at Biostatistics online.
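The normalization of covariances to correlations used for the seed maps in Figure 5 can be sketched as follows. The sketch assumes that the covariance is available in a low-rank factored form, with the covariance equal to the factor times its transpose, so that one seed map can be computed without forming the full vertex-by-vertex matrix. The factored representation, the function name, and the toy factor are assumptions for illustration.

```python
import math


def seed_correlation(factor, seed):
    """Correlation between vertex `seed` and every vertex.

    `factor` is a list of rows (one per vertex); the implied covariance
    between vertices i and j is the dot product of rows i and j.
    """
    variances = [sum(x * x for x in row) for row in factor]
    seed_row = factor[seed]
    corr = []
    for row, var in zip(factor, variances):
        cov = sum(x * y for x, y in zip(row, seed_row))
        denom = math.sqrt(var * variances[seed])
        # A vertex with zero variance has an undefined correlation.
        corr.append(cov / denom if denom > 0.0 else float("nan"))
    return corr


# Toy factor with three vertices and rank two (hypothetical values).
factor = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
seed_map = seed_correlation(factor, seed=0)
```

Each seed map costs one pass over the factor, which is what makes an animation over many seeds practical in this representation.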
5. Discussion
We present a method to estimate the genetic covariance function and heritability of brain traits from neuroimaging data. Our main contribution is the development of an estimation method that can handle a large number of grid points. In simulations, our approach improves estimates of heritability, genetic covariances, and environmental covariances. We apply our method to gain novel insights into the heritability and genetic correlations from the HCP. Our approach reveals fine-scale differences in covariance patterns, identifying locations in which genetic control is correlated with large areas of the brain and locations where it is highly localized. This enables insight into the genetic underpinnings of structural networks of cortical thickness.
Our analysis reveals tradeoffs between bias and variance, which is an important consideration for scientific interpretation. Here, we discuss three bias-variance tradeoffs in estimates of heritability in neuroimaging: (i) bias from smoothing; (ii) bias from constraining the covariance matrix to be PSD; and (iii) bias from measurement error. With respect to smoothing, we use GCV for a data-based selection of the bias-variance tradeoff, which in general will reduce the mean squared error relative to using an a priori determined degree of smoothing. The impacts of smoothing on bias in twin studies are discussed in Li and others (2012). At higher resolutions, there is greater measurement error, which has historically motivated the use of a large amount of smoothing, e.g., a Gaussian kernel with full-width at half-maximum equal to 25–30 mm, or the use of brain traits averaged across a smaller number of regions. These approaches decrease the variance of estimators, which can increase power, but can also decrease the spatial precision, which can increase false positives. The small amount of smoothing selected via GCV in our estimates will generally result in more variable estimates than approaches using larger amounts of smoothing, but it also preserves fine-scale changes in correlation. Different approaches have different costs and benefits, and in this respect can complement one another.
A second form of bias arises when imposing PSD constraints, as revealed by large differences between the unconstrained, truncated, and constrained estimators. Truncating the covariance functions to positive eigenvalues/eigenvectors results in lower MISE but large bias, and this large bias motivated our development of the PSD-ACE estimator. We view PSD-ACE as a compromise that results in dramatically lower variance relative to the symmetric estimators at the cost of some bias (Figures 1 and 3) and dramatically lower bias relative to truncated estimators at additional computational cost. In practice, one approach is to estimate both PSD-ACE and S-FSEM to compare the estimators with better variance properties versus better bias properties.
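A toy example illustrates one mechanism by which truncation can distort an estimate. The sketch below truncates a symmetric but indefinite 2 x 2 matrix to its positive eigenvalues using a closed-form eigendecomposition. The matrix values and function names are hypothetical, and the example is far smaller than the covariance functions considered in this article.

```python
import math


def eig_sym_2x2(a, b, d):
    """Eigenpairs of [[a, b], [b, d]] as (eigenvalue, unit eigenvector)."""
    if b == 0.0:
        pairs = [(a, (1.0, 0.0)), (d, (0.0, 1.0))]
        return sorted(pairs, key=lambda p: p[0], reverse=True)
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    pairs = []
    for lam in (mean + radius, mean - radius):
        v0, v1 = lam - d, b  # solves (A - lam * I) v = 0 when b != 0
        n = math.hypot(v0, v1)
        pairs.append((lam, (v0 / n, v1 / n)))
    return pairs


def truncate_to_psd(a, b, d):
    """Rebuild the matrix from its positive eigenvalues only."""
    out = [[0.0, 0.0], [0.0, 0.0]]
    for lam, v in eig_sym_2x2(a, b, d):
        if lam > 0.0:
            for i in range(2):
                for j in range(2):
                    out[i][j] += lam * v[i] * v[j]
    return out


# Symmetric, indefinite toy "estimate" with eigenvalues 3 and -1.
eigenvalues = [lam for lam, _ in eig_sym_2x2(1.0, 2.0, 1.0)]
psd = truncate_to_psd(1.0, 2.0, 1.0)
```

Here the diagonal entries increase from 1 to approximately 1.5 after truncation. Discarding negative eigenvalues is equivalent to adding a positive semidefinite matrix, so the diagonal (variance) entries of a truncated estimate can never decrease.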
A third form of bias, measurement error, is common in heritability studies. When repeated scans on the same subject are available, Ge and others (2017) proposed the use of linear mixed effects models with repeated measures, leading to large improvements in estimates of heritability. A functional ACE model for repeated measurements is an important avenue for future research. In the absence of repeated measurements, we utilize the assumption that the underlying genetic and environmental functions are smooth Gaussian processes, which allows the estimation of measurement error in a manner similar to the nugget effect in spatial statistics. In our simulations, likelihood-based approaches were more biased than PSD-ACE in heritability estimates because they conflate measurement error and unique environmental variance.
In this study, we have not addressed inference. In particular, many of the correlations may not be statistically significant, including the large changes in genetic correlation observed over short distances. For datasets with fewer locations, future research could develop permutation tests to calculate FWER-corrected p-values for genetic correlations. Note that for variance components, Luo and others (2019) proposed a significance test using MWLE. Another important avenue for future research is the extension of the PSD-ACE to more general pedigree models, which could allow the estimation of genetic components from large datasets like UK Biobank. This would also be useful in evaluating the replicability of cortical thickness heritability.
We applied our method to tens of thousands of points and produced a detailed atlas of the covariance in cortical thickness related to genetic factors. By determining the degree of smoothing from the data, our approach allows a more detailed spatial resolution. We used the same kernel and bandwidth for all locations across the cortical surface. However, the large changes in correlation patterns over small distances, e.g., the bottom row of Figure 5, together with the small bandwidth selected by GCV and 5-fold CV, suggest that future research could explore additional modeling approaches. Locally adaptive procedures have been developed for image smoothing, regression, and maximum weighted likelihood (e.g., Li and others, 2012). A recent method for functional PCA based on the Laplace-Beltrami operator for the cortical surface may better characterize local features than a fixed kernel (Lila and others, 2017). The most flexible approach would be to allow jump discontinuities, e.g., by extending Zhu and others (2014). Developing these approaches to estimate multiple covariance functions in big neuroimaging twin studies is challenging.
Supplementary Material
Acknowledgments
We thank Sandeep Sarangi at the University of North Carolina Research Computing, Dr. Richard Smith in the Department of Statistics and Operations Research at UNC, and the Statistical and Applied Mathematical Sciences Institute (SAMSI). Data were provided by the Human Connectome Project, WU-Minn Consortium (Principal Investigators: David Van Essen and Kamil Ugurbil; 1U54MH091657) funded by the 16 NIH Institutes and Centers that support the NIH Blueprint for Neuroscience Research; and by the McDonnell Center for Systems Neuroscience at Washington University.
Conflict of Interest: None declared.
Funding
The NSF (DMS-1127914 to the Statistical and Applied Mathematical Sciences Institute); U.S. NIH (MH086633, MH092335, and MH116527), NSF (SES-1357666 and DMS-1407655), the Cancer Prevention Research Institute of Texas, and the endowed Bao-Shan Jing Professorship in Diagnostic Imaging to H.Z.
References
- Alexander-Bloch, A., Giedd, J. N. and Bullmore, E. (2013). Imaging structural co-variance between human brain regions. Nature Reviews Neuroscience 14, 322.
- Chen, C.-H., Fiecas, M., Gutierrez, E., Panizzon, M. S., Eyler, L. T., Vuoksimaa, E., Thompson, W. K., Fennema-Notestine, C., Hagler, D. J., Jernigan, T. L. and others (2013). Genetic topography of brain morphology. Proceedings of the National Academy of Sciences of the USA 110, 17089–17094.
- Dickerson, B. C., Bakkour, A., Salat, D. H., Feczko, E., Pacheco, J., Greve, D. N., Grodstein, F., Wright, C. I., Blacker, D., Rosas, H. D. and others (2009). The cortical signature of Alzheimer’s disease: regionally specific cortical thinning relates to symptom severity in very mild to mild AD dementia and is detectable in asymptomatic amyloid-positive individuals. Cerebral Cortex 19, 497–510.
- Evans, A. C. (2013). Networks of anatomical covariance. Neuroimage 80, 489–504.
- Ge, T., Holmes, A. J., Buckner, R. L., Smoller, J. W. and Sabuncu, M. R. (2017). Heritability analysis with repeat measurements and its application to resting-state functional connectivity. Proceedings of the National Academy of Sciences of the USA 114, 5521–5526.
- Hall, P., Müller, H.-G. and Yao, F. (2008). Modelling sparse generalized longitudinal observations with latent Gaussian processes. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 70, 703–723.
- He, Y., Chen, Z. and Evans, A. (2008). Structural insights into aberrant topological patterns of large-scale cortical networks in Alzheimer’s disease. Journal of Neuroscience 28, 4756–4766.
- Kendler, K. S. (2001). Twin studies of psychiatric illness: an update. Archives of General Psychiatry 58, 1005–1014.
- Lei, E., Yao, F., Heckman, N. and Meyer, K. (2015). Functional data model for genetically related individuals with application to cow growth. Journal of Computational and Graphical Statistics 24, 756–770.
- Li, Y., Gilmore, J. H., Wang, J., Styner, M., Lin, W. and Zhu, H. (2012). Twinmarm: two-stage multiscale adaptive regression methods for twin neuroimaging data. IEEE Transactions on Medical Imaging 31, 1100–1112.
- Lila, E., Aston, J. A. D. and Sangalli, L. M. (2017). Smooth principal component analysis over two-dimensional manifolds with an application to neuroimaging. The Annals of Applied Statistics 10, 1854–1879.
- Luo, S., Song, R., Styner, M., Gilmore, J. H. and Zhu, H. (2019). FSEM: functional structural equation models for twin functional data. Journal of the American Statistical Association 114, 344–357.
- Pinheiro, J. C. and Bates, D. M. (1996). Unconstrained parametrizations for variance-covariance matrices. Statistics and Computing 6, 289–296.
- Rabe-Hesketh, S., Skrondal, A. and Gjessing, H. K. (2008). Biometrical modeling of twin and family data using standard mixed model software. Biometrics 64, 280–288.
- Shen, K.-K., Dore, V., Rose, S., Fripp, J., McMahon, K. L., de Zubicaray, G. I., Martin, N. G., Thompson, P. M., Wright, M. J. and Salvado, O. (2016). Heritability and genetic correlation between the cerebral cortex and associated white matter connections. Human Brain Mapping 37, 2331–2347.
- Thompson, W. K., Hallmayer, J. and O’Hara, R.; Alzheimer’s Disease Neuroimaging Initiative. (2011). Design considerations for characterizing psychiatric trajectories across the lifespan: application to effects of APOE-ε4 on cerebral cortical thickness in Alzheimer’s disease. American Journal of Psychiatry 168, 894–903.
- Van Dongen, J., Slagboom, P. E., Draisma, H. H. M., Martin, N. G. and Boomsma, D. I. (2012). The continuing value of twin studies in the omics era. Nature Reviews Genetics 13, 640–653.
- Van Essen, D. C., Smith, S. M., Barch, D. M., Behrens, T. E., Yacoub, E. and Ugurbil, K.; WU-Minn HCP Consortium. (2013). The WU-Minn Human Connectome Project: an overview. Neuroimage 80, 62–79.
- Wang, T., Wang, K., Qu, H., Zhou, J., Li, Q., Deng, Z., Du, X., Lv, F., Ren, G., Guo, J. and others (2016). Disorganized cortical thickness covariance network in major depressive disorder implicated by aberrant hubs in large-scale networks. Scientific Reports 6:27964.
- Xiao, L., Zipunnikov, V., Ruppert, D. and Crainiceanu, C. (2016). Fast covariance estimation for high-dimensional functional data. Statistics and Computing 26, 409–421.
- Zhao, Y. and Castellanos, F. X. (2016). Annual research review: discovery science strategies in studies of the pathophysiology of child and adolescent psychiatric disorders-promises and limitations. Journal of Child Psychology and Psychiatry 57, 421–439.
- Zhu, H., Fan, J. and Kong, L. (2014). Spatially varying coefficient model for neuroimaging data with jump discontinuities. Journal of the American Statistical Association 109, 1084–1098.