Abstract
Background
Linear mixed-effects models (LMM) are a leading method in conducting genome-wide association studies (GWAS) but require residual maximum likelihood (REML) estimation of variance components, which is computationally demanding. Previous work has reduced the computational burden of variance component estimation by replacing direct matrix operations with iterative and stochastic methods and by employing loose tolerances to limit the number of iterations in the REML optimization procedure. Here, we introduce two novel algorithms, stochastic Lanczos derivative-free REML (SLDF_REML) and Lanczos first-order Monte Carlo REML (L_FOMC_REML), that exploit problem structure via the principle of Krylov subspace shift-invariance to speed computation beyond existing methods. Both novel algorithms only require a single round of computation involving iterative matrix operations, after which their respective objectives can be repeatedly evaluated using vector operations. Further, in contrast to existing stochastic methods, SLDF_REML can exploit precomputed genomic relatedness matrices (GRMs), when available, to further speed computation.
Results
Results of numerical experiments are congruent with theory and demonstrate that interpreted-language implementations of both algorithms match or exceed existing compiled-language software packages in speed, accuracy, and flexibility.
Conclusions
Both the SLDF_REML and L_FOMC_REML algorithms outperform existing methods for REML estimation of variance components for LMM and are suitable for incorporation into existing GWAS LMM software implementations.
Electronic supplementary material
The online version of this article (10.1186/s12859-019-2978-z) contains supplementary material, which is available to authorized users.
Keywords: GWAS, Linear mixed-effects models, Variance components, REML, Conjugate gradients, Stochastic trace estimation, Stochastic Lanczos quadrature
Background
Linear mixed-effects modeling (LMM) is a leading methodology employed in genome-wide association studies (GWAS) of complex traits in humans, offering the dual benefits of controlling for population stratification while permitting the inclusion of data from related individuals [1]. However, the implementation of LMM comes at the cost of increased computational burden relative to ordinary least-squares regression, particularly in performing residual maximum likelihood (REML) estimation of genomic variance components. Conventional REML algorithms require multiple or matrix operations, where m and n are the numbers of markers and individuals, respectively, rendering them infeasible for large biobank scale data sets. Further, common numerical methods for REML estimation rely on sparse matrix methods suitable for traditional LMM applications (e.g., pedigree data or experiments with repeated measures [2]) that are inapplicable to genomics variance components models since these models involve dense relatedness matrices. As a result, the problem of increasing the computational efficiency of REML estimation of genomic variance components has generated considerable research activity [3–8].
In the case of the standard two variance component model (1), the estimation of which is the focus of the current research, previous efforts toward increasing computational efficiency fit into two primary categories: 1., reducing the number of cubic time complexity matrix operations needed to achieve convergence; and 2., substituting stochastic and iterative matrix operations for deterministic, direct methods to obtain procedures with quadratic time complexity. The first approach is embodied by the methods implemented in the FaST-LMM and GEMMA packages [3, 5, 6], which take advantage of the fact that the genetic relatedness matrix (GRM) and identity matrix comprising the covariance structure are simultaneously diagonalizable. As a result, after performing a single spectral decomposition of the GRM and a small number of matrix-vector multiplications, the REML criterion (3) and its gradient and Hessian can be repeatedly evaluated using only vector operations. The second approach is exemplified by the popular BOLT-LMM software [7, 8], which avoids all cubic operations by solving linear systems via the method of conjugate gradients (CG) and employing stochastic trace estimators in place of deterministic computations.
In the current research, we propose two algorithms, stochastic Lanczos derivative-free residual maximum likelihood (SLDF_REML; Algorithm 3) and Lanczos first-order Monte Carlo residual maximum likelihood (L_FOMC_REML; Algorithm 4), that combine features of both approaches (Fig. 1). Here, we translate the simultaneous diagonalizability of the heritable and non-heritable components of the covariance structure to stochastic and iterative methods via the principle of Krylov subspace shift-invariance. As a result, we only need to compute the costliest portions of the objective function once (via stochastic/iterative methods), computing all subsequent iterations of the REML optimization problem only using vector operations. We develop the theory underlying these methods and demonstrate their performance relative to previous methods via numerical experiment.
Results
Across 20 replications per condition for random subsamples of n=16,000 to 256,000 unrelated European-ancestry individuals, both SLDF_REML and L_FOMC_REML produced heritability estimates for height consistent with those generated by the GCTA software package (Figs. 4 and 5). For large samples, the novel algorithms achieved greater accuracy than either version of BOLT-LMM (e.g., for n=250,000, mean-squared error was 1.74 ×10−6 for BOLT-LMM v2.3.2 versus 1.24 ×10−7 for L_FOMC_REML). Particularly, the time required per additional iteration after initial overhead computations was low for the novel algorithms (e.g., =20.07 min for BOLT-LMM v2.3.2 versus 2.06 min for L_FOMC_REML; Table 2), enabling increased precision at minor cost. With respect to total timings, SLDF_REML dramatically outperformed all other methods when the precomputed GRM was available (Table 2 and Fig. 3), which we expect whenever the number of markers exceeds the sample size. Examining methods taking genotype matrices as inputs, SLDF_REML and L_FOMC_REML performed similarly, whereas BOLT-LMM v2.3.2 converged more quickly than either in smaller samples (Fig. 3), though the differences for n =256,000 were relatively minor (e.g., 91.09 min for BOLT-LMM v2.3.2 versus 102.21 min for L_FOMC_REML; Table 2). The older version of BOLT-LMM, v2.1, performed significantly more slowly than any of the other implementations examined (e.g., average wall clock time was 177.95 min at n =256,000), demonstrating the importance of implementation optimization.
Table 2.
Method | Overhead | Per evaluation | Evaluation count | Total | |
---|---|---|---|---|---|
BOLT-LMM v2.1 | 34.63 | 35.83 | 4 | 177.95 | |
BOLT-LMM v2.3.2 | 10.82 | 20.07 | 4 | 91.09 | |
L_FOMC_REML | 89.87 | 2.06 | 6 | 102.21 | |
SLDF_REML { | with genotype matrix | 90.22 | 1.06 | 9 | 99.73 |
with precomputed GRM | 28.95 | 1.07 | 9 | 38.60 |
Data reflect trimmed mean wall clock time in minutes over 20 iterations per condition
As the computations needed to compute the Lanczos decompositions in L_FOMC_REML and BOLT-LMM v2.3.2 are equivalent in time and memory complexity, we expect that an optimized compiled-language implementation of L_FOMC_REML would reduce the overhead computation time by a significant linear factor (≈3 for n =256,000, comparing the sum of the overhead time and single objective function evaluation time for BOLT-LMM v2.3.2 to its total running time; Table 2). Consistent with theory, the wall clock times per objective function evaluation for the novel algorithms were trivial given the Lanczos decompositions (e.g., for n =256,000, = 2.06 versus 20.07 min for L_FOMC_REML and BOLT-LMM v2.3.2, respectively; Table 2 and Fig. 2).
Discussion
We have proposed stochastic algorithms for estimating the two variance component model (1), both of which theoretically offer substantial time savings relative to existing methods. Our methods capitalize on the principle of Krylov subspace shift invariance to reduce the number of steps involving or computations to one, whereas existing methods perform equivalent computations at each iteration of the REML optimization procedure. For large samples, when taking genotype matrices as inputs, our interpreted-language implementations of L_FOMC_REML and SLDF_REML [9] produced more accurate variance component estimates than the highly-optimized, compiled BOLT-LMM implementations, while taking similar amounts of time. Thus, we expect comparably-optimized implementations of the novel algorithms to compute high accuracy REML estimates in close to the time required by BOLT-LMM v2.3.2 for a single objective function evaluation. Further, in contrast to the BOLT_LMM algorithm, which requires the genotype matrix, SLDF_REML can exploit precomputed GRMs to reduce operation count by an factor (Table 1), which yields dramatic time savings when the number of markers greatly exceeds the number of individuals (Fig. 3). While GRM precomputation is itself , it can be effectively and asynchronously parallelized across multiple compute nodes, substantially mitigating computational burden (though we note that serial input/output constraints can interfere with efficient parallelization). However, as the L_FOMC_REML algorithm involves the computation of BLUPs of SNP effects, L_FOMC_REML is preferable to SLDF_REML when BLUP estimates are desired for prediction (as in [10]).
Table 1.
Method | Overhead | Objective function evaluation |
---|---|---|
SLDF_REML | ||
L_FOMC_REML | ||
BOLT_LMM |
n denotes the number of individuals, m the number of markers, and c the number of covariates. nrand indicates the number of random probing vectors and is fixed at 15 in all numerical experiments. nκ reflects the number of conjugate gradient iterations required to achieve convergence at a specified tolerance and can be bounded in terms of the spectral condition number of H0. As noted in [8], implicit preconditioning of H0 can be achieved by including the first few right singular vectors of the genotype matrix (or eigenvectors of the GRM) as covariates
There are several limitations to the proposed approaches. First, SLDF_REML, which benefits from the ability to take GRMs as input, depends linearly on the number of included covariates, which might grow prohibitive in samples spanning numerous genotyping batches and ascertainment locations. However, as in BOLT_LMM, L_FOMC_REML requires matrix multiplications for BLUP computation at each step of the REML optimization procedure, whereas SLDF_REML requires only vector operations. Thus, though the options provided by the two novel algorithms increase researchers’ flexibility overall, the choice of whether to employ SLDF_REML versus L_FOMC_REML is problem-specific and necessitates greater researcher attention to resource allocation. For example, even when a precomputed GRM is available, it might be preferable to use L_FOMC_REML if BLUPs of latent SNP effects are desired. On the other hand, if a researcher intends to sequentially analyze a large number of phenotypes in a relatively small sample of individuals, it might prove most efficient to compute a GRM, despite the involved computational burden, in order to speed subsequent computations by supplying the GRM to the SLDF_REML algorithm. Second, neither algorithm mitigates the substantial or memory complexity common to all algorithms for REML estimation of genomic variance components, requiring that researchers have access to high-memory compute nodes to work with large samples (though we note that neither of the novel algorithms substantial increases this burden either). Finally, for the same reasons that the spectral decomposition-based direct methods implemented in the FaST-LMM and GEMMA packages [3, 5, 6] are restricted to the simple two component model (1) (i.e., whereas the GRM and identity matrix are simultaneously diagonalizable, the same doesn’t hold for arbitrary collections of three or more symmetric positive semidefinite matrices), the shift-invariance property exploited by the proposed methods does not extend to multiple genomic variance components. Given that the two component model is insufficient for precise heritability estimation for many complex traits [11], our novel algorithms apply to the particular, though common, tasks of variance component and BLUP estimation for LMM in association studies.
Despite these limitations, the proposed algorithms have clear advantages over existing methods in terms of flexibility, accuracy, and speed of computation. We provide both pseudocode and heavily annotated Python 3 implementations [9] to facilitate their incorporation into existing software packages. Further, though our algorithms are restricted to the two variance component model, they can be used to generate the inputs necessary for estimation of more complex models, such as the mixture model estimated via variational approximation implemented in [7], and thus have applications to non-infinitesimal models. Finally, we suggest that the methods presented in our theoretical development, in particular stochastic trace estimation and stochastic Lanczos quadrature, are likely to find uses in REML estimation of other models of interest to researchers in genomics. In particular, we suggest the development of models that exploit Krylov subspace shift-invariance to speed up variance/covariance component estimation for the case of multivariate phenotypes as a target for future research. Such models necessarily involve the computation or approximation of Hessian matrices, thereby introducing additional complexity in comparison to the univariate case considered above. However, the extension of fast cubic complexity methods based on the spectral decomposition of the covariance matrix [3, 5] to the multivariate case [6] suggests the potential for multivariate analogues of the algorithms presented here.
Conclusions
The proposed algorithms, SLDF_REML and L_FOMC_REML, unify previous approaches to estimating the two variance component model (1) by exploiting the simultaneous diagonalizability of the covariance structure components while avoiding matrix operations with cubic time complexity. As a result, the most expensive operations only need to be performed once, as with the spectral decomposition performed in the FaST-LMM and GEMMA software packages [3, 5, 6], but these operations consist only of matrix-vector products, as in the BOLT-LMM software package [7, 8]. All but one iteration of the REML optimization procedure requires only vector operations, yielding increased speed and numerical precision relative to existing methods. Furthermore, the unique strengths of the two methods lead to a flexible approach depending on researcher goals: SLDF_REML is capable of operating on precomputed GRMs when available, whereas L_FOMC_REML can generate BLUPs of latent SNP effects without added computational burden. We recommend these algorithms for incorporation into GWAS LMM implementations.
Method
We consider the two component genomic variance components model commonly employed in LMM association studies [1], which is of the form
1 |
where y is a measured phenotype, the c≪n columns of are covariates (including an intercept term) with corresponding fixed effects β, and is a matrix of n individuals’ standardized genotypes at m loci. Without loss of generality, we assume that X has full column rank; in the case of numerical rank deficiency we can simply replace X by the optimal full rank approximation generated by its economy singular value decomposition or rank revealing QR decomposition. The latent genetic effects and residuals are random variables with distributions parametrized by the heritable and non-heritable variance components, and , respectively. The REML criterion corresponds to the marginal likelihood of , where KT projects to an (n−c)-dimensional subspace orthogonal to the covariate vectors such that the null space of KT is exactly the column space of X [12]. In other words such that . The transformed random variable KTy has the marginal distribution , which we reparametrize as , where
2 |
Here, , which indicates the average covariance between individuals’ standardized genotypes, is often referred to as the genomic relatedness matrix (GRM). The REML criterion, or marginal log likelihood, can be expressed as a function of τ:
3 |
where Pτ=K(KTHτK)−1KT, and, as implied by the REML first-order (stationarity) conditions, is the expected residual variance component given τ and [12, 13]. In practice, K is never explicitly formed.
Naïve procedures for maximizing the REML criterion require evaluating (3) or its derivatives at each iteration of the optimization procedure. Previous methods either reduce the number of necessary cubic time complexity operations to one by exploiting problem structure, or substitute quadratic time complexity iterative and stochastic matrix operations for direct computations (Fig. 1). Here, we unify these approaches via the principle of Krylov subspace shift invariance to achieve methods that only require a single iteration of quadratic time complexity operations.
In what follows, we first present a brief survey of the Lanczos process, its applications to families of shifted linear systems, and its use in constructing Gaussian quadratures for spectral matrix functions. We assume familiarity with the method of conjugate gradients, an iterative procedure for approximating solutions to symmetric positive definite linear systems, and Gaussian quadrature, a method for approximating the integral of a given function by a well chosen weighted sum of its values; if not, see [14] and [15], respectively. We present these methods toward the goal of efficiently evaluating the quadratic form and log-determinant terms appearing in the REML criterion (3). We then present the details of the SLDF_REML and L_FOMC_REML algorithms, both of which exploit problem structure via Lanczos process-based methods in order to speed computation. Finally, we derive expressions for the computational complexity of the present algorithms, which we confirm via numerical experiment.
Preliminaries
The notation in this section is self-contained. Our presentation borrows from the literature extensively; further details on the (block) Lanczos procedure [14, 16], conjugate gradients for shifted linear systems [17, 18], stochastic trace estimation [19, 20], and stochastic Lanczos quadrature [21–23] are suggested in the bibliography.
Krylov subspaces
Consider a symmetric positive-definite matrix A and nonzero vector b. Define the mth Krylov subspace by the span of the first m−1 monomials in A applied to b; that is, . Krylov subspaces are shift invariant—i.e., for real numbers σ, we have .
The Lanczos procedure
The Lanczos procedure generates the decomposition AUm=UmTm, where the columns u1,…,um of Um form an orthonormal basis for and the Jacobi matrices are symmetric tridiagonal. Choosing u1=b/∥b∥, successive columns are uniquely determined by the sequence of Lanczos polynomials such that each uk=pk−1(A)u1 and each pk is the characteristic polynomial of Jacobi matrix Tk consisting of the first k rows and columns of Tm. The Lanczos procedure is equivalent to the well-known method of conjugate gradients (CG) for solving the linear system Ax=b in that the mth step CG approximate solution x(m) is obtained from the above decomposition using only vector operations (see Algorithm 1). The number of steps m prior to termination corresponds to the number of CG iterations need to bound the norm of the residual below a specified tolerance: ∥Ax(m)−b∥<ε. The rate of convergence depends on the spectral properties of A and can be controlled in terms of the spectral condition number κ(A). In the present application, the fact that all complex traits of interest generally have a non-trivial non-heritable component results in well-conditioned systems [7, 9].
Solving families of shifted linear systems
Having applied the Lanczos process to the seed system Ax=b, shift-invariance can be exploited to obtain the mth step CG approximate solution to the shifted linear system Aσxσ=(A+σI)xσ=b, only using vector operations [17]. It can be shown that any positive shift by σ≥0 improves the rate of convergence such that , where δm>0 is the mth diagonal element of the Lanczos Jacobi matrix corresponding to .
Lanczos polynomials and Gaussian quadrature
Additionally, the Lanczos polynomials comprise a sequence of orthogonal polynomials with respect to the spectral measure
where A=QΛQT is the spectral decomposition [21, 22]. Quadratic forms vTf(A)v involving spectral functions f(A)=Qf(Λ)QT, e.g., for the matrix logarithm, , can be written as Riemann–Stieltjes integrals of the form
4 |
The Lanczos decomposition AUm=UmTm generates the weights and nodes for an m-point Gaussian quadrature approximating the above integral. Denoting the spectral decomposition of the jth Jacobi matrix for j=1,…,m, we approximate (4) as
where θj,ℓ={Dj}ℓ,ℓ and . As m here corresponds to the number of CG iterations needed to ensure that ∥Ax(m)−v∥ is smaller than a specified tolerance, the tridiagonal Jacobi matrices are small and calculating their spectral decompositions is computationally trivial.
Stochastic Lanczos quadrature
Stochastic Lanczos quadrature (SLQ) combines the above quadrature formulation with Hutchinson-type stochastic trace estimators [21]. Such estimators approximate the trace of a matrix by a weighted sum of quadratic forms for normalized, suitably distributed i.i.d. random probing vectors [19]. The SLQ approximate trace of a spectral function of a matrix, tr(f(A)), is then
5 |
Whereas the number of probing vectors nrand is chosen a priori, the number quadrature nodes mκ corresponds to the number of conjugate gradient iterations needed to ensure is less than a specified tolerance for each j=1,…,nrand.
SLQ and shift invariance
For a fixed probing vector vi, we can exploit the shift invariance of to efficiently update Gaussian quadrature generated by the corresponding Lanczos decomposition AUm=UmTm. Again denoting the spectral decomposition of the Jacobi matrix , the Lanczos decomposition of the shifted system is simply . Thus, given the approximation (5) for tr(f(A)), we can efficiently compute an approximation of tr(f(Aσ)) for any σ>0. In Algorithm 2 we implement a method for estimating tr(log(Aσ)) in operations given the spectral decompositions of the Jacobi matrices corresponding to for probing vectors .
Block methods
For multiple right hand sides B=[b1|⋯|bc], the Lanczos procedure can be generalized to the block Krylov subspace, resulting in a collection of Lanczos decompositions AUj=UjTj such that {Uj}1=bj/∥bj∥ for j=1,…,c. This process is equivalent to block CG methods in that the Jacobi matrices can again be used to generate an approximate solution X(m) to the matrix equation AX(m)=B. We provide an implementation of the block Lanczos procedure in L_Seed [9], employing a conservative convergence criterion defined in terms of the magnitude of the (1,2) operator norm of the residual . Compared to performing c separate Lanczos procedures with respect to , block Lanczos with respect to , with B=[b1|⋯|bc], produces the same result (for a fixed number of steps). However, block Lanczos employs BLAS-3 operations and is thus more performant, especially when implemented on top of parallelized linear algebra subroutines.
A derivative-free REML algorithm
We propose the stochastic Lanczos derivative-free residual maximum likelihood algorithm (SLDF_REML; Algorithm 3), a method for efficiently and repeatedly evaluating the REML criterion, which is then subject to a zeroth-order optimization scheme. To achieve this goal, we first identify the parameter space of interest with a family of shifted linear systems. We then develop a scheme for evaluating the quadratic form yTPτy and log determinant ln(det(KTHτK)) terms in the REML criterion (3) that use the previously discussed Lanczos methods to exploit this shifted structure. Specifically, after obtaining a collection of Lanczos decompositions, we can repeatedly solve the linear systems involved in the quadratic form term via Lanczos conjugate gradients and approximate the log determinant term via stochastic Lanczos quadrature.
The parameter space as shifted linear systems
Given a range of possible values of the standardized genetic variance component, or heritability,
6 |
we set and define , noting that for all , the spectral condition number of Hτ will be less than that of H0 as the identity component of Hτ will only increase. Further, we have now identified elements of our parameter space τ∈Θ with the family of shifted linear systems
For any vector v for which we have computed the Lanczos decomposition H0U=UT with the first column of U equal to v/∥v∥, we can use Algorithm 1 to obtain the CG approximate solution for all σ≥0 in O(n) operations.
The quadratic form
Directly evaluating the quadratic form
7 |
is computationally demanding and is typically avoided in direct estimation methods [12, 13]. Writing the complete QR decomposition of the covariate matrix allows us to define , noting that substituting for KT preserves the value of (7). is equivalent to the orthogonal projection operator , which admits an efficient implicit construction and is computed in O(nc2) operations via the economy QR decomposition X=QXRX. Then, reexpressing (7) as yTS(SHτS)†Sy, we can use the Lanczos process to construct an orthonormal basis and corresponding Jacobi matrix for the Krylov subspace . We can then obtain the CG approximation of yTS(SHσS)−1Sy using vector operations as, for any shift σ, we have yTS(SHσS)†Sy=yTS(SH0S+σIn)−1Sy (see Lemma 1 in Additional file 1 for proof).
The log determinant
We use an equivalent formulation [12, 24] of the term ln(det(KTHτK)), rewriting it as
The det(XTX) term is constant with respect to τ and can be disregarded. For c≪n, is computationally trivial via direct methods given , which we can compute for all parameter values of interest in O(n) operations having first applied the block Lanczos process with respect to . Computing the block Lanczos decomposition corresponding to , which is only performed once, unfortunately scales with the number of covariates c, a disadvantage not shared by our second algorithm (Algorithm 4). The remaining term, ln(det(Hτ)), is approximated by applying SLQ (Algorithm 2) to a special case of (5): We rewrite the log determinant as the trace of the matrix logarithm
where we have spectrally decomposed H0=QΛQT for some τ0≤τ with σ=τ−τ0. We draw nrand i.i.d. normalized Rademacher random vectors , where each element of each vector vi takes values of either 1/∥vi∥ or −1/∥vi∥ with equal probability. The SLQ approximate of the log determinant for the seed system is
where the weights wi,ℓ and nodes θi,ℓ are respectively derived by using the Lanczos process to construct orthonormal bases for (in practice, we apply block Lanczos to ) [21, 22].
The SLDF_REML algorithm
Stochastic Lanczos derivative-free residual maximum likelihood (SLDF_REML; Algorithm 3), conceptually similar to the derivative-free algorithm of Graser and colleagues [13], applies the previously introduced Lanczos methods to approximate the above reparametrization of the REML criterion. Shift-invariance is then exploited such that, with the exception of the initial Lanczos decompositions, the REML log likelihood can be repeatedly evaluated using only vector operations. SLDF_REML takes a phenotype vector , a covariate matrix , either the genetic relatedness matrix or the standardized genotype matrix (in which case the action of the GRM as a linear operator is coded implicitly as v↦Z(ZTv)), and a range of possible standardized genomic variance component values as arguments and generates a function REML_criterion that efficiently computes the log-likelihood of τ|KTy. This function is then subject to scalar optimization via Brent’s method, which is feasible given the low cost of evaluation and low dimension of Θ. Hyperparameters include the number of probing vectors to be used for the SLQ approximation of the log determinant nrand, as well as tolerances corresponding to the REML criterion, parameter estimates, and the Lanczos residual norms. Convergence to a given tolerance on a sensible scale is ensured by optimizing with respect to the heritability h2∈Θ⊆ [ 0,1] and evaluating the REML criterion at τ=(1−h2)/h2. The REML criterion can be repeatedly evaluated in operations, making high accuracy computationally feasible.
A first-order Monte Carlo REML algorithm
We additionally propose the Lanczos first-order Monte Carlo residual maximum likelihood algorithm (L_FOMC_REML; Algorithm 4), which also takes advantage of the shifted structure of the standard genomic variance components model to speed computation. We first present the related first-order algorithm implemented in the efficient and widely-used BOLT-LMM software [7, 8], which we refer to as BOLT_LMM and of which the proposed L_FOMC_REML algorithm is a straightforward extension.
BOLT_LMM (First-order Monte Carlo REML)
The BOLT_LMM algorithm is based on the observation that at stationary points of the REML criterion (3), the first-order REML conditions (i.e., ∇ℓ=0) imply that
8 |
where and are the best linear unbiased predictions (BLUPs) of the latent genetic effects and residuals, respectively [25]. The BLUPs are functions of τ given by
9 |
where we have defined . The expectations (8) are approximated via the following stochastic procedure: Monte Carlo samples of the latent variables, , are used to generate samples of the projected phenotype vector
BLUPs are then computed as in (9), yielding the approximations
Using the above expressions, Loh et al. [7, 8] apply a zeroth-order root-finding algorithm to the quantity
noting that fr=0 is a necessary condition (and, in practice, a sufficient condition) for (8). Using CG to approximate solutions to the linear systems involved in BLUP computations results in an efficient REML estimation procedure involving O(n·m·nrand) operations for well-conditioned covariance structures (i.e., for nontrivial non-heritable variance component values). As noted in [8], implicit preconditioning of H0 can be achieved by including the first few right singular vectors of the genotype matrix (or eigenvectors of the GRM) as columns of the covariate matrix X.
The L_FOMC_REML algorithm
The BOLT_LMM algorithm described above involves solving nrand+1 linear systems
at each iteration of the optimization scheme in order to compute BLUPs of the latent variables for the observed phenotype vector and each of the Monte Carlo samples. However, each iteration involves spectral shifts of the left hand side of the form
As in the SLDF_REML algorithm, the underlying block Krylov subspace is invariant to these shifts (i.e., , where ). Thus, having performed the Lanczos process for an initial parameter value τ0, we can use L_Solve (Algorithm 1) to obtain the block CG approximate solution in O(n·nrand) operations. We are thus able to avoid solving linear systems in all subsequent iterations, though the relatively small number of matrix-vector products involved in computing BLUPs for the latent genetic effects at each step are unavoidable. The requirement of the genotype matrix for computing (9) prevents both L_FOMC_REML and BOLT_LMM from efficiently exploiting precomputed GRMs.
Comparison of methods
We compare theoretical and empirical properties of our proposed algorithms, SLDF_REML and L_FOMC_REML, to those of BOLT_LMM.
Computational complexity
In contrast to BOLT_LMM, the Lanczos-decomposition based algorithms we have proposed only need to perform the computationally demanding operations necessary to evaluate the REML criterion once. As such, we differentiate between overhead computations, which occur once and do not depend on the number of iterations needed to achieve convergence, and per-iteration computations, which are repeated until convergence of the optimization process (Table 1 and Fig. 2).
The overhead computations of SLDF_REML are dominated by the need to construct bases for the nrand+c+1 subspaces , and are thus when a precomputed GRM is available and otherwise. Here, nκ denotes the number of Lanczos iterations needed to achieve convergence at a pre-specified tolerance and increases with . Subsequent iterations are dominated by the cost of solving c+1 shifted linear systems via L_Solve and are thus . The overhead computations in L_FOMC_REML are dominated by the Lanczos decompositions corresponding to the 2nrand+1 seed systems, where the GRM is implicitly represented in terms of the standardized genotype matrix, and is thus . Operations of equivalent complexity are needed at every iteration of BOLT_LMM.
Numerical experiments
We compared wall clock times for genomic variance component estimation for height in nested random subsets of 16,000, 32,000, 64,000, 128,000, and 256,000 unrelated European ancestry individuals from the widely used UK Biobank data set [26]. All analyses included 24 covariates consisting of age, sex, and testing center and used hard-called genotypes from 330,723 array SNPs remaining after enforcing a 1% minor allele frequency cutoff. We compared SLDF_REML, with and without a precomputed GRM, to L_FOMC_REML which requires the genotype matrix. For the novel algorithms, absolute tolerances for the Lanczos iterations and the REML optimization procedure were set to 5e-5 and 1e-5, respectively. Additionally, we compared our interpreted Python 3.6 code to BOLT-LMM versions 2.1 and 2.3.3 (C++ code compiled against the Intel MKL and Boost libraries) [7, 8, 27, 28]. We ran each algorithm twenty times per condition, trimming away the two most extreme timings in each condition. Mirroring the default settings of the BOLT-LMM software packages, we set nrand=15 across both of our proposed methods.
Novel algorithms were implemented in the Python v3.6.5 computing environment [9], using NumPy v1.14.3 and SciPy v1.1.0 compiled against the Intel Math Kernel Library v2018.0.2 [28–30]. Optimization was performed using SciPy’s implementation of Brent’s method, with convergence determined via absolute tolerance of the standardized genomic variance component . Timing results (Table 2 and Figs. 3 and 5) do not include time required to read genotypes into memory, or, when applicable, to compute GRMs, and reflect total running time on an Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz with 32 physical cores and 1 terabyte of RAM. Timing experiments excluded methods with cubic time complexity, including GCTA, FaST-LMM, and GEMMA. Accuracy was assessed by comparing heritability estimates generated by the stochastic algorithms to those estimated via the direct, deterministic average-information Newton–Raphson algorithm as implemented in the GCTA software package v1.92.0b2 [4] (Figs. 4 and 5).
Additional file
Acknowledgements
The authors wish to thank UK Biobank participants. Additionally, the authors thank Matthew C. Keller and Luke M. Evans for their thoughtful comments and provision of computational resources. Publication of this article was funded by the University of Colorado Boulder Libraries Open Access Fund.
Abbreviations
- BLAS
Basic linear algebra subprogram
- BLUP
Basic linear unbiased prediction
- CG
Conjugate gradients method
- GCTA
Genome-wide complex trait analysis [4]
- GRM
Genomic relatedness matrix
- GWAS
Genome-wide association study
- LMM
Linear mixed-effects model
- REML
Residual maximum likelihood
- SLQ
Stochastic Lanczos quadrature
Authors’ contributions
RB wrote the manuscript, developed the algorithms, wrote the code used in numerical experiments, and analyzed the data. SB supervised the project and contributed to the development of the algorithms and the writing of the manuscript. Both authors read and approved the final manuscript.
Funding
Richard Border was supported by a training grant from the National Institute of Mental Health (T32 MH016880) and by the Institute for Behavioral Genetics. Stephen Becker acknowledges funding by NSF grant DMS-1819251.
Availability of data and materials
The UK Biobank data are available to qualified researchers via the UK Biobank Access Management System (https://bbams.ndph.ox.ac.uk/ams). The code used in the numerical experiments is available on Github (https://github.com/rborder/SL_REML).
Ethics approval and consent to participate
UK Biobank data collection procedures were approved by the UK Biobank Research Ethics Committee (reference 11/NW/0382).
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Footnotes
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Richard Border, Email: richard.border@colorado.edu.
Stephen Becker, Email: stephen.becker@colorado.edu.
References
- 1.Yang J, Zaitlen NA, Goddard ME, Visscher PM, Price AL. Advantages and Pitfalls in the Application of Mixed Model Association Methods. Nat Genet. 2014;46(2):100–6. doi: 10.1038/ng.2876. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Bates D, Mächler M, Bolker B, Walker S. Fitting Linear Mixed-Effects Models Using Lme4. 2014. arXiv preprint arXiv:14065823.
- 3.Zhou X, Stephens M. Genome-Wide Efficient Mixed Model Analysis for Association Studies. Nat Genet. 2012;44(7):821–4. doi: 10.1038/ng.2310. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: A Tool for Genome-Wide Complex Trait Analysis. Am J Hum Genet. 2011;88(1):76–82. doi: 10.1016/j.ajhg.2010.11.011. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Lippert C, Listgarten J, Liu Y, Kadie CM, Davidson RI, Heckerman D. FaST Linear Mixed Models for Genome-Wide Association Studies. Nat Methods. 2011;8(10):833–5. doi: 10.1038/nmeth.1681. [DOI] [PubMed] [Google Scholar]
- 6.Zhou X, Stephens M. Efficient Multivariate Linear Mixed Model Algorithms for Genome-Wide Association Studies. Nat Methods. 2014;11(4):407. doi: 10.1038/nmeth.2848. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Loh PR, Tucker G, Bulik-Sullivan BK, Vilhjálmsson BJ, Finucane HK, Salem RM, et al. Efficient Bayesian Mixed-Model Analysis Increases Association Power in Large Cohorts. Nat Genet. 2015;47(3):284–90. doi: 10.1038/ng.3190. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Loh PR, Kichaev G, Gazal S, Schoech AP, Price AL. Mixed-Model Association for Biobank-Scale Datasets. Nat Genet. 2018;50:906–8. doi: 10.1038/s41588-018-0144-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Border R. Stochastic Lanczos Likelihood Estimation of Genomic Variance Components. Appl Math Grad Theses Dissertations. 2018;120.
- 10.de los Campos, Vazquez AI, Fernando R, Klimentidis YC, Sorensen D. Prediction of Complex Human Traits Using the Genomic Best Linear Unbiased Predictor. PLoS Genet. 2013;9(7):e1003608. doi: 10.1371/journal.pgen.1003608. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Evans LM, Tahmasbi R, Vrieze SI, Abecasis GR, Das S, Gazal S, et al. Comparison of Methods That Use Whole Genome Data to Estimate the Heritability and Genetic Architecture of Complex Traits. Nat Genet. 2018;50(5):737–45. doi: 10.1038/s41588-018-0108-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Searle SR, Casella G, McCulloch CE . Variance Components, vol. 391. United States: Wiley; 2009. [Google Scholar]
- 13.Graser HU, Smith SP, Tier B. A Derivative-Free Approach for Estimating Variance Components in Animal Models by Restricted Maximum Likelihood. J Anim Sci. 1987;64(5):1362–70. doi: 10.2527/jas1987.6451362x. [DOI] [Google Scholar]
- 14.Björck A. Numerical Methods in Matrix Computations, vol. 59. Switzerland: Springer; 2015. [Google Scholar]
- 15.Atkinson KE. An Introduction to Numerical Analysis. United Kingdom: Wiley; 2008. [Google Scholar]
- 16.O’Leary DP. The Block Conjugate Gradient Algorithm and Related Methods. Linear Algebra Appl. 1980;29:293–322. doi: 10.1016/0024-3795(80)90247-5. [DOI] [Google Scholar]
- 17.Frommer A, Maass P. Fast CG-Based Methods for Tikhonov-Phillips Regularization. SIAM J Sci Comput. 1999;20(5):1831–50. doi: 10.1137/S1064827596313310. [DOI] [Google Scholar]
- 18.Sogabe T. A Fast Numerical Method for Generalized Shifted Linear Systems with Complex Symmetric Matrices. Recent Dev Num Anal Num Comput Algoritm. 2010;:13.
- 19.Hutchinson MF. A Stochastic Estimator of the Trace of the Influence Matrix for Laplacian Smoothing Splines. Commun Stat Simul Comput. 1990;19(2):433–50. doi: 10.1080/03610919008812866. [DOI] [Google Scholar]
- 20.Avron H, Toledo S. Randomized Algorithms for Estimating the Trace of an Implicit Symmetric Positive Semi-Definite Matrix. J ACM. 2011;58(2):8:1–8:34. doi: 10.1145/1944345.1944349. [DOI] [Google Scholar]
- 21.Golub GH, Matrices MG. Moments and Quadrature with Applications. Princeton: Princeton University Press; 2009. [Google Scholar]
- 22.Ubaru S, Chen J, Saad Y. Fast Estimation of Tr(f(A)) via Stochastic Lanczos Quadrature. SIAM J Matrix Anal Appl. 2017;38(4):1075–99. doi: 10.1137/16M1104974. [DOI] [Google Scholar]
- 23.Chen J, Saad Y. A Posteriori Error Estimate for Computing Tr(f(A)) by Using the Lanczos Method. 2018. arXiv:180204928 [math].
- 24.Zhu S, Wathen AJ. Essential Formulae for Restricted Maximum Likelihood and Its Derivatives Associated with the Linear Mixed Models. 2018. arXiv:180505188 [stat].
- 25.McCulloch C, Searle SR, Neuhaus JM . Generalized, Linear, and Mixed Models. Hoboken: Wiley; 2008. [Google Scholar]
- 26.Sudlow C, Gallacher J, Allen N, Beral V, Burton P, Danesh J, et al. UK Biobank: An Open Access Resource for Identifying the Causes of a Wide Range of Complex Diseases of Middle and Old Age. PLoS Med. 2015;12(3):e1001779. doi: 10.1371/journal.pmed.1001779. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Schling B. The Boost C++ Libraries. USA: XML Press; 2011. [Google Scholar]
- 28.Wang E, Zhang Q, Shen B, Zhang G, Lu X, Wu Q, et al. Intel Math Kernel Library. In: High-Performance Computing on the Intel®Xeon Phi™. New York: 2014. p. 167–88.
- 29.Oliphant T. NumPy: A Guide to NumPy. 2006.
- 30.Jones E, Oliphant T, Peterson P, et al. SciPy: Open Source Scientific Tools for Python. 2001.
- 31.Loh PR, Bhatia G, Gusev A, Finucane HK, Bulik-Sullivan BK, Pollack SJ, et al. Contrasting Genetic Architectures of Schizophrenia and Other Complex Diseases Using Fast Variance Components Analysis. Nat Genet. 2015;47(12):1385–92. doi: 10.1038/ng.3431. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Data Availability Statement
The UK Biobank data are available to qualified researchers via the UK Biobank Access Management System (https://bbams.ndph.ox.ac.uk/ams). The code used in the numerical experiments is available on Github (https://github.com/rborder/SL_REML).