Connectivity-informed adaptive regularization for generalized outcomes

Damian BRZYSKI; Marta KARAS; Beau M ANCES; Mario DZEMIDZIC; Joaquín GOÑI; Timothy W RANDOLPH; Jaroslaw HAREZLAK

doi:10.1002/cjs.11606

Author manuscript; available in PMC: 2022 Mar 1.

Published in final edited form as: Can J Stat. 2021 Feb 15;49(1):203–227. doi: 10.1002/cjs.11606

PMCID: PMC8730330 NIHMSID: NIHMS1763726 PMID: 35002039

Abstract

One of the challenging problems in neuroimaging is the principled incorporation of information from different imaging modalities. Data from each modality are frequently analyzed separately using, for instance, dimensionality reduction techniques, which result in a loss of mutual information. We propose a novel regularization method, generalized ridgified Partially Empirical Eigenvectors for Regression (griPEER), to estimate associations between the brain structure features and a scalar outcome within the generalized linear regression framework. griPEER improves the regression coefficient estimation by providing a principled approach to use external information from the structural brain connectivity. Specifically, we incorporate a penalty term, derived from the structural connectivity Laplacian matrix, in the penalized generalized linear regression. In this work, we address both theoretical and computational issues and demonstrate the robustness of our method despite incomplete information about the structural brain connectivity. In addition, we also provide a significance testing procedure for performing inference on the estimated coefficients. Finally, griPEER is evaluated both in extensive simulation studies and using clinical data to classify HIV+ and HIV− individuals.

Keywords: Brain connectivity, brain structure, generalized linear regression, Laplacian matrix, penalized regression, structured penalties

1. INTRODUCTION

Neuroimaging data usually include multiple data types, but most commonly, the analysis is performed separately for each data type. Implicit in the work of Randolph, Harezlak, & Feng (2012) is a framework for the simultaneous use of multiple data types. For instance, structural and/or functional connectivity measures can be viewed as prior knowledge about the structure of dependencies between brain regions in a linear model that estimates the association between brain region properties (e.g., cortical thickness) and a scalar outcome. Karas et al. (2019) demonstrated that using correct prior information significantly increases estimation accuracy. The statistical methodology, ridgified Partially Empirical Eigenvectors for Regression (riPEER) (Karas et al., 2019), incorporates such a predefined structure into a regression model, which minimizes the use of incorrect information. However, the estimation procedure in Karas et al. (2019) assumes that the response variable is normally distributed, which excludes, for instance, a binary response that indicates the presence/absence of a condition such as a disease or phenotype.

To fill this gap, we developed a variant of riPEER, named generalized ridgified Partially Empirical Eigenvectors for Regression (griPEER), which handles the outcomes coming from the exponential family of distributions. In the context of brain imaging, our approach can incorporate information from either a structural or functional connectivity matrix. Similar to the riPEER precursor, griPEER uses the predefined information across a whole range of scenarios—from full inclusion to complete exclusion. To achieve this, griPEER employs a penalized optimization approach with a flexible, parametrized penalty term with parameters chosen in a fully automated, data-driven manner.

Here, we work with a generalized linear regression model where the ith scalar outcome, y_i, belongs to the exponential family of distributions with the parameter θ_i. We consider only the canonical link functions and assume that θ = Xβ + Zb. Here, X denotes a matrix of covariates (e.g., demographic data) for which the prior information is not used, and Z is a matrix whose columns correspond to variables with at least a partially known structure. In this work, β includes the intercept and demographic data, and b represents the coefficients of the average cortical thickness of 66 brain regions. The associations between these cortical thickness measures are represented by a connectivity matrix that encodes a density of connections or the average fractional anisotropy (FA).

The use of structural information in image reconstruction and estimation has been implemented for over 50 years (Phillips, 1962; Bertero & Boccacci, 1998; Engl, Hanke, & Neubauer, 2000). If the object of interest is a function belonging to a class of, for example, differentiable functions, a differential operator-based penalty may be used to “regularize” or impose smoothness on the estimates (Huang, Shen, & Buja, 2008). This may improve the prediction and interpretability and is “efficient and sometimes essential” when many highly correlated predictors (Hastie, Buja, & Tibshirani, 1995) are present. When the object of estimation is a vector, the penalties are very often constructed based on ℓ₁ and ℓ₂ norms as implemented in LASSO (Tibshirani, 1996), adaptive LASSO (Zou, 2006), ridge regression (Tikhonov, 1963) and elastic net (Zou & Hastie, 2005), to name just a few.

There is no unique method or approach to regularize a particular model in different applications, and the final construction depends strongly on the context. If the coefficients are sparse or they occur in blocks, using the ℓ₁ norm to constrain them (as in the LASSO) or constrain the difference of adjacent coefficients (as in the fused LASSO) is useful (Tibshirani et al., 2005). A more generalized fused LASSO with two ℓ₁ norms could also be applied: one constraining the coefficients and one their pairwise differences (Xin et al., 2016).

When structure among the variables is more intricate, and some (possibly imprecise) knowledge is available, less generic penalization schemes are more appropriate (Slawski, Castell, & Tutz, 2010; Tibshirani & Taylor, 2011), for example, a p × p adjacency matrix representing known connections, or “edges,” between p nodes in a graph. This matrix can inform a model that aims to estimate the relationship between an outcome and a vector of p values at the graph nodes. More specifically, the adjacency matrix is used to define the graph Laplacian matrix, which represents differences between nodes (Chung, 2005) and may be used to penalize the process of estimating regression coefficients, b.

For any p × p matrix Q, defining a penalty of the form λb^⊤Qb, where λ is a nonnegative regularization parameter, constitutes the essence of the methods of Li & Li (2008) and Karas et al. (2019). Using a penalty of this form also links the considered optimization problem with the theory of mixed-effects models in which b is assumed to be a random-effects vector with distribution $N (0, σ_{b}^{2} Q^{- 1})$ , for some $σ_{b}^{2} > 0$ . This, in turn, reveals a connection with the Bayesian approach, where the distribution is treated as a prior on b, as in Maldonado (2009).

However, Q may not be invertible as is the case when Q is defined as Laplacian or normalized Laplacian (Chung, 2005). In addition, a single multiplicative parameter, λ, adjusts the trade-off between model fit and penalty terms but cannot change the regularization pattern, that is, the shape of the set {b: penalty(b) = const} is preserved. When Q is misspecified (i.e., is not informative), this lack of adaptivity may significantly degrade performance below that offered by an uninformed penalty such as ridge regression or LASSO (Karas et al., 2019).

Both of these issues were considered in Karas et al. (2019), who do not assume that Q is exactly the true precision matrix of the signal, only that it is “close” to it. It is important to notice that brain connectivities are rather rough approximations to the truly complex brain network structure, and it is unrealistic to assume that all the entries reflect some universal “truth” in any given sample. Moreover, the multistep procedure of quantifying the diffusion tensor imaging-based structural brain connectivity itself can produce a significant number of false positives. Increasing the flexibility of the regularization process is therefore a natural step, and Karas et al. (2019) achieve that by extending the space of possible precision matrices and assuming $b \sim N (0, σ_{b}^{2} (Q + a I_{p})^{- 1})$. For a > 0, such a modification of Q is invertible and can therefore be directly used in the estimation procedure. The resulting penalty, λb^⊤(Q + aI_p)b, has the equivalent form $λ_{Q} b^{T} Q b + λ_{R} ‖ b ‖_{2}^{2}$, with λ_Q = λ and λ_R = λa, and the connection with a specific linear mixed model enables the selection of λ_Q and λ_R.

The approach by Karas et al. (2019) assumes the response variable to be normally distributed and hence not suitable for categorical outcomes. Here, we extend the concept of riPEER’s penalty function to the case when the distribution of the response variable is a member of a one-parameter exponential family of distributions. The proposed estimation method, griPEER, is of the form:

[\begin{matrix} {\hat{β}}^{gP} \\ {\hat{b}}^{gP} \end{matrix}] ≔ \underset{β, b}{argmin} {- 2 loglik (β, b ∣ y) + λ_{Q} b^{T} Q b + λ_{R} ‖ b ‖_{2}^{2}},

(1)

where loglik(β, b ∣ y) is a log-likelihood. Here, the term −2loglik(β, b ∣ y) is used to fit the model to the response distribution, while the parameters λ_Q and λ_R are chosen based on the connection between the optimization problem and the generalized linear mixed model; this is formulated explicitly in Section 2. It is important to emphasize that these parameters not only determine the trade-off between the model fit and the penalty term but also the form of the penalty. More precisely, if λ_Q is large relative to λ_R, then the connectivity information has a large role in the estimation process. Conversely, when λ_Q is small relative to λ_R, the penalty is equal in all coordinates, as with ridge regression.

We illustrate this in a simple example with p = 2 variables and prior information stating that these variables are connected. The structure of the solution depends on the shape of the level curve of the penalty function, as illustrated by the three scenarios shown in Figure 1. If the relationship between variables, as represented in Q, is reflected in the data and related to the outcome y, then griPEER will tend to choose a relatively large λ_Q, which links the coefficients in b (see Figure 1c). The other extreme is when the structure in Q is not informative for the relationship between y and Z. In such cases, griPEER will select a relatively large λ_R, inducing a ridge-like penalty that ignores Q (see Figure 1a).

Figure 1: Shapes of the set ${b : λ_{Q} b^{T} Q b + λ_{R} ‖ b ‖_{2}^{2} = 1}$ (level curve) for various pairs of regularization parameters: (a) the assumed strong connection between the variables is neglected, (b) moderate tendency for the coefficients of the solution to be similar to each other and (c) strong tendency for the coefficients of the solution to be similar to each other.
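The three regimes in Figure 1 can be reproduced numerically for p = 2. The sketch below is only an illustration under stated assumptions: the two nodes are joined by a single unit-weight edge, the helper name `level_curve_axes` is hypothetical, and the (λ_Q, λ_R) pairs are chosen for illustration rather than taken from the figure. The level set is an ellipse whose semi-axes are the reciprocal square roots of the eigenvalues of λ_Q Q + λ_R I.

```python
import numpy as np

def level_curve_axes(lam_Q, lam_R):
    """Semi-axes and directions of {b : lam_Q b'Qb + lam_R ||b||^2 = 1} for p = 2."""
    A = np.array([[0.0, 1.0], [1.0, 0.0]])        # assumed: one unit-weight edge
    d = A.sum(axis=0)                             # node degrees
    Q = np.eye(2) - A / np.sqrt(np.outer(d, d))   # normalized Laplacian
    evals, evecs = np.linalg.eigh(lam_Q * Q + lam_R * np.eye(2))
    return 1.0 / np.sqrt(evals), evecs            # semi-axis = 1 / sqrt(eigenvalue)

# Illustrative parameter pairs (not the ones used in Figure 1).
for lam_Q, lam_R in [(0.0, 1.0), (1.0, 1.0), (10.0, 1.0)]:
    axes, _ = level_curve_axes(lam_Q, lam_R)
    print(f"lam_Q={lam_Q:5.1f}, lam_R={lam_R:3.1f} -> semi-axes {axes}")
```

With λ_Q = 0 the level set is a circle (pure ridge). As λ_Q grows, the axis along the direction b₁ = b₂ keeps its length while the orthogonal axis shrinks, so solutions with similar coefficients are favoured.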

The remainder of this work is organized as follows. In Section 2, we formulate our statistical model, investigate the special case of the binomial distribution and discuss the equivalence between generalized linear mixed models (GLMMs) and penalized optimization problems. We also describe the penalty term construction from the graph theory point of view. The estimation procedure used to select the optimal regularization parameters is introduced in Section 3, while Section 4 addresses the selection of response-relevant variables. Extensive simulations showing very good performance of griPEER in terms of estimation accuracy and variable selection (under various scenarios illustrating the impact of inaccurate prior information) are reported in Section 5. Finally, in Section 6, we apply our methodology to study associations between cortical thickness and HIV disease. The conclusions and a discussion are summarized in Section 7.

2. STATISTICAL MODEL

We address the problem of estimation in a penalized generalized linear model where the penalty term is derived using structural connectivity information. This information is represented by a p × p symmetric matrix whose nondiagonal elements are nonnegative and whose diagonal elements are set to zero. This adjacency matrix or connectivity matrix is denoted by $A$ . The corresponding graph Laplacian matrix, Q, which defines the penalty term of (1) is explained next, followed by specific details about the considered statistical model.

2.1. The Graph Laplacian, Q

We are interested in modelling the association between a scalar outcome, y, and a set of p predictor variables that are measured at each graph node. We assume that information about connections between these variables, that is, the strengths of the connections between the nodes, can be summarized by a (symmetric) p × p adjacency matrix $A = [a_{i j}]$, 1 ≤ i, j ≤ p, that has nonnegative entries and zeros on the diagonal. We denote the jth node degree as $d_{j} ≔ \sum_{i} a_{i j}$ and define the degree matrix as D := diag(d₁, … , d_p).

Following Chung (2005), we define the unnormalized Laplacian, Q_u, corresponding to $A$ simply as $Q_{u} ≔ D - A$. This matrix is always positive semidefinite. It is also singular because, for the vector of ones, 1 := [1, …, 1]^⊤, we have $1^{T} Q_{u} 1 = 1^{T} D 1 - 1^{T} A 1 = \sum_{i} d_{i} - \sum_{i} d_{i} = 0$, which for a positive semidefinite matrix implies Q_u1 = 0.

A penalty of the form b^TQ_ub, as in (1), can be intuitively understood by the following simple formula: for any adjacency matrix, $A$ , and its unnormalized Laplacian, Q_u,

b^{T} Q_{u} b = \sum_{i < j} a_{i j} {(b_{i} - b_{j})}^{2} .

(2)

That is, if the term b^TQ_ub is used in the optimization problem (1), the squared differences of coefficients are penalized in a manner that is proportional to the strengths of connections between them. Consequently, coefficients corresponding to nodes having many stronger connections (i.e., higher degree nodes) are constrained more than others.

In order to allow a smaller number of nodes with larger d_i to have more extreme values, we employed the normalized Laplacian, Q, which is obtained by dividing each column and row of Q_u by the square root of the corresponding node’s degree. As a result, the property (2), with Q instead of Q_u, takes the form $b^{T} Q b = \sum_{i < j : a_{i j} \neq 0} a_{i j} {(\frac{b_{i}}{\sqrt{d_{i}}} - \frac{b_{j}}{\sqrt{d_{j}}})}^{2}$, where each connected pair of nodes is counted once. Q has ones on the diagonal and, just like the unnormalized Laplacian, is a symmetric, positive semidefinite, and singular matrix.
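The construction of both Laplacians, and the pairwise-difference interpretation of the penalty, can be checked on a toy graph. The following is a minimal sketch: the 3-node adjacency matrix, the vector b and the helper name `laplacians` are illustrative assumptions, and every node is assumed to have positive degree. The pairwise sums count each connected pair of nodes once.

```python
import numpy as np

def laplacians(A):
    """Return the unnormalized Laplacian, the normalized Laplacian and the degrees."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=0)                             # node degrees d_j = sum_i a_ij
    Q_u = np.diag(d) - A                          # unnormalized Laplacian D - A
    s = 1.0 / np.sqrt(d)                          # assumes all degrees are positive
    Q = Q_u * np.outer(s, s)                      # scale row i and column j by 1/sqrt(d)
    return Q_u, Q, d

# Toy adjacency matrix (hypothetical): node 0 is linked to nodes 1 and 2.
A = np.array([[0.0, 2.0, 1.0],
              [2.0, 0.0, 0.0],
              [1.0, 0.0, 0.0]])
Q_u, Q, d = laplacians(A)
b = np.array([0.5, -1.0, 2.0])

i, j = np.triu_indices(3, k=1)                    # each unordered pair once
pair_u = np.sum(A[i, j] * (b[i] - b[j]) ** 2)
pair_n = np.sum(A[i, j] * (b[i] / np.sqrt(d[i]) - b[j] / np.sqrt(d[j])) ** 2)

print(b @ Q_u @ b, pair_u)                        # the two numbers agree
print(b @ Q @ b, pair_n)                          # the two numbers agree
```

The same check confirms that Q has a unit diagonal and that both matrices are singular (Q_u annihilates the vector of ones, Q annihilates the vector of square-rooted degrees).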

2.2. Statistical Model in General Form

Consider the general setting where y is an n × 1 vector of observations, and the design matrices, X and Z, are n × m and n × p matrices, respectively. The columns of X represent the m covariates, and the rows are denoted by X_i. Similarly, the columns of Z correspond to p variables, or graph nodes, for which some connectivity information may be available; the rows are denoted by Z_i. We assume that unknown vectors b and β exist such that, for each i∈{1, … , n}, y_i is a member of a one-parameter exponential family of distributions described by

f (y_{i}) = exp {y_{i} θ_{i} - ψ (θ_{i}) + c (y_{i}, φ)},

(3)

where θ_i := X_iβ + Z_ib is a subject-specific parameter. The expression in (3) includes exponential, binomial, Poisson and Laplace densities.

It can be shown that, for the exponential family of distributions, the mean of y_i is simply given by the first derivative of ψ at the point θ_i, while the variance could be expressed as the second derivative of ψ, that is,

E (y_{i}) = ψ^{'} (θ_{i}), var (y_{i}) = ψ^{″} (θ_{i}) .

(4)

Moreover, the log-likelihood function is

{loglik}_{ψ, c} (β, b ∣ y) = \sum_{i = 1}^{n} {y_{i} (X_{i} β + Z_{i} b) - ψ (X_{i} β + Z_{i} b) + c (y_{i})}

(5)

and it provides a core for the methodology presented in this work. Indeed, we define griPEER as a solution to the following optimization problem

[\begin{matrix} {\hat{β}}^{gP} \\ {\hat{b}}^{gP} \end{matrix}] ≔ \underset{β, b}{argmin} {- 2 l_{ψ} (β, b ∣ y) + λ_{Q} b^{T} Q b + λ_{R} ‖ b ‖_{2}^{2}},

(6)

where $l_{ψ} (β, b ∣ y) ≔ \sum_{i = 1}^{n} {y_{i} (X_{i} β + Z_{i} b) - ψ (X_{i} β + Z_{i} b)}$ consists of the terms of log-likelihood function (5) depending on b and β. Here, λ_Q and λ_R are regularization parameters, which are selected automatically, as described in Section 3.
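For fixed λ_Q and λ_R, problem (6) is a smooth convex problem and can be solved by Newton iterations. The sketch below is one possible way to do this, not the authors' implementation: the function name `gripeer_fit`, the simulated data and the use of the logistic ψ′ and ψ″ (Section 2.3) are assumptions made for illustration, and no line search or other safeguard is included.

```python
import numpy as np

def gripeer_fit(X, Z, y, Q, lam_Q, lam_R, psi1, psi2, n_iter=50, tol=1e-10):
    """Newton iterations for (6); psi1 and psi2 are the first two derivatives of psi."""
    m, p = X.shape[1], Z.shape[1]
    M = np.hstack([X, Z])                          # joint design [X, Z]
    P = np.zeros((m + p, m + p))                   # penalty acts on b only
    P[m:, m:] = lam_Q * Q + lam_R * np.eye(p)
    B = np.zeros(m + p)
    for _ in range(n_iter):
        eta = M @ B
        grad = M.T @ (y - psi1(eta)) - P @ B       # -(1/2) x gradient of the objective
        hess = M.T @ (psi2(eta)[:, None] * M) + P  # (1/2) x Hessian of the objective
        step = np.linalg.solve(hess, grad)
        B = B + step
        if np.linalg.norm(step) < tol:
            break
    return B[:m], B[m:]

# Logistic case used as an example (assumption).
def psi1(t):
    return 1.0 / (1.0 + np.exp(-t))

def psi2(t):
    mu = psi1(t)
    return mu * (1.0 - mu)

# Simulated data (hypothetical sizes and coefficients).
rng = np.random.default_rng(0)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = rng.normal(size=(n, p))
A = np.array([[0.0, 1.0, 1.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
d = A.sum(axis=0)
Q = np.eye(p) - A / np.sqrt(np.outer(d, d))        # normalized Laplacian
theta = X @ np.array([-0.5, 1.0]) + Z @ np.array([1.0, 0.6, 0.6])
y = rng.binomial(1, psi1(theta)).astype(float)

lam_Q, lam_R = 1.0, 1.0
beta_hat, b_hat = gripeer_fit(X, Z, y, Q, lam_Q, lam_R, psi1, psi2)
print(beta_hat, b_hat)
```

At the returned point the gradient of the objective vanishes, which is the first-order optimality condition for (6).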

2.3. The Special Case: Binomial Distribution

This subsection focuses on one special choice of a density function, the binomial distribution. This case is further outlined in Sections 5 (binomial outcome in the simulations) and 6 (binomial distribution application to neuroimaging data).

Similar to the classical logistic regression theory, we assume that the response, y_i, takes the value 1 with probability $e^{θ_{i}} / (e^{θ_{i}} + 1)$ and 0 with probability $1 / (e^{θ_{i}} + 1)$. Consequently, the density function, f(y_i), is given by

f (y_{i}) = exp {y_{i} θ_{i} - ln (1 + e^{θ_{i}})},

(7)

which is a member of the exponential family of distributions (3) with $ψ (θ_{i}) = ln (1 + e^{θ_{i}})$ and c(y_i) = 0. We also have

\left\{ \begin{matrix} E (y_{i}) = ψ^{'} (θ_{i}) = e^{θ_{i}} ∕ (e^{θ_{i}} + 1) \\ var (y_{i}) = ψ^{″} (θ_{i}) = e^{θ_{i}} ∕ (e^{θ_{i}} + 1)^{2} \end{matrix} \right.

(8)

From this, $θ_{i} = ln (\frac{E (y_{i})}{1 - E (y_{i})})$, which, together with the assumption θ = Xβ + Zb adopted earlier, yields the canonical link for logistic regression, the logit function.
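The relations in (8) and the logit link can be verified numerically. The sketch below is an illustration only: the function names `psi`, `psi1` and `psi2` are hypothetical, and the derivatives are checked by central finite differences on an arbitrary grid of θ values.

```python
import numpy as np

def psi(theta):
    """psi(theta) = ln(1 + e^theta), evaluated in a numerically stable way."""
    return np.logaddexp(0.0, theta)

def psi1(theta):
    """First derivative: the mean e^theta / (1 + e^theta)."""
    return 1.0 / (1.0 + np.exp(-theta))

def psi2(theta):
    """Second derivative: the variance e^theta / (1 + e^theta)^2."""
    mu = psi1(theta)
    return mu * (1.0 - mu)

theta = np.linspace(-3.0, 3.0, 7)
h = 1e-5
fd1 = (psi(theta + h) - psi(theta - h)) / (2.0 * h)    # approximates psi'
fd2 = (psi1(theta + h) - psi1(theta - h)) / (2.0 * h)  # approximates psi''
logit = np.log(psi1(theta) / (1.0 - psi1(theta)))      # recovers theta

print(np.max(np.abs(fd1 - psi1(theta))))
print(np.max(np.abs(fd2 - psi2(theta))))
print(np.max(np.abs(logit - theta)))
```

All three printed discrepancies are at the level of numerical error, in line with (4) and (8).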

2.4. Equivalence Between GLMM and Two Optimization Problems

The optimization problem in (6) is closely connected with a specific GLMM formulation. Indeed, suppose that β and b are vectors of fixed and random effects, respectively, and that the y_i ∣ b are independent, so that $f (y ∣ b) = \prod_{i = 1}^{n} f (y_{i} ∣ b)$, with f(y_i∣b) = exp {y_i(X_iβ + Z_ib) − ψ(X_iβ + Z_ib) + c(y_i)}, for some (known) functions ψ and c and for i = 1, … , n. Moreover, $b \sim N (0, {\tilde{Q}}_{λ}^{- 1})$, where ${\tilde{Q}}_{λ} ≔ λ_{Q} Q + λ_{R} I_{p}$ for some unknown, positive parameters λ_Q and λ_R.

To see this correspondence, assume that the parameters λ_Q and λ_R have been estimated, say as $\hat{λ} ≔ [{\hat{λ}}_{Q}, {\hat{λ}}_{R}]^{T}$ , and these values are used to obtain β and b. One can proceed by treating both fixed and random effects as parameters and finding maximum likelihood estimates by maximizing (with respect to β, b) the density function

f (y, b) = f (y ∣ b) f (b) = \prod_{i = 1}^{n} {f (y_{i} ∣ b)} f (b) \propto exp {\sum_{i = 1}^{n} [y_{i} θ_{i} - ψ (θ_{i})] - \frac{1}{2} b^{T} {\tilde{Q}}_{\hat{λ}} b},

(9)

where θ_i = X_iβ + Z_ib, for i = 1, … , n. Taking the logarithm of the above and multiplying by −2 leads directly to the objective of the optimization problem (6), up to terms that do not depend on β and b.

We now derive a constrained optimization problem that is equivalent to (6) and reveals the impact of the regularization parameters on the solution from a slightly different perspective. For this, suppose that $[\begin{matrix} \hat{β} \\ \hat{b} \end{matrix}]$ is the solution to (6) for given parameters λ_Q and λ_R. Then, we can define $c ≔ λ_{Q} {\hat{b}}^{T} Q \hat{b} + λ_{R} ‖ \hat{b} ‖_{2}^{2} \geq 0$ . One can check that $[\begin{matrix} \hat{β} \\ \hat{b} \end{matrix}]$ also solves the problem

\underset{β, b}{argmin} {- 2 l_{ψ} (β, b ∣ y) + λ_{Q} b^{T} Q b + λ_{R} ‖ b ‖_{2}^{2}} subject to λ_{Q} b^{T} Q b + λ_{R} ‖ b ‖_{2}^{2} = c .

(10)

The multiplicative factor may be neglected, as well as the term $λ_{Q} b^{T} Q b + λ_{R} ‖ b ‖_{2}^{2}$ , which is constant in the feasible set. This yields

\underset{β, b}{argmax} l_{ψ} (β, b ∣ y) subject to λ_{Q} b^{T} Q b + λ_{R} ‖ b ‖_{2}^{2} = c .

(11)

This formulation quantifies our intuition presented in the introduction and the corresponding Figure 1, where griPEER selects the estimates by taking the maximal likelihood value on a set whose shape is explicitly regularized by the parameters λ_Q and λ_R.

3. A NEW ESTIMATION ALGORITHM

To select the optimal values of λ_Q and λ_R, we employ the corresponding GLMM. The likelihood function, $L (β, λ ∣ y)$ , is given by

L (β, λ ∣ y) = \int_{R^{p}} f_{β, λ} (y ∣ b) f_{β, λ} (b) d b = \int_{R^{p}} {∣ 2 π {\tilde{Q}}_{λ}^{- 1} ∣}^{- \frac{1}{2}} exp {\sum_{i = 1}^{n} [y_{i} (X_{i} β + Z_{i} b) - ψ (X_{i} β + Z_{i} b) + c (y_{i})] - \frac{1}{2} b^{T} {\tilde{Q}}_{λ} b} d b .

(12)

Unfortunately, obtaining the maximum of $L$ with respect to β and λ is complicated by the fact that there is no closed-form solution to the multidimensional integral in (12). However, several approaches to find a solution have been proposed. Breslow & Clayton (1993) proposed a general method based on penalized quasilikelihood (PQL) for the estimation of fixed effects and the prediction of random effects. Wolfinger & O’Connell (1993) investigated the pseudolikelihood (PL) approach, which is closely related to the Laplace approximation of $L$. Other proposals include adaptive Gaussian quadrature to approximate integrals with respect to a given kernel (Pinheiro & Chao, 2006) and a Markov chain Monte Carlo-based procedure (Zeger & Karim, 1991).

In this work, we focus on the Wolfinger PL approach, which is recognized as being fast and computationally efficient. It relies on the first-order Taylor series approximation and uses the linear mixed model (LMM) proxy in the iterative process: At each iteration, the updates of β and b are based on the variance–covariance parameters of random effects. The steps are repeated until the convergence criteria are met.

The procedure used here differs from Wolfinger & O’Connell (1993) in how the updates of β and b are obtained. Unlike the Wolfinger PL approach, which uses the solution to the mixed-model equations to update β and b, we employ the correspondence between GLMM and the griPEER optimization problem, as described in Section 2.4. Specifically, the (k – 1)-step estimates of λ_Q and λ_R (i.e., $\overset{[k - 1]}{λ_{Q}}$ and $\overset{[k - 1]}{λ_{R}}$ ) are used to obtain the (k – 1)-step estimates of β and b (i.e., $\overset{[k - 1]}{β}$ and $\overset{[k - 1]}{b}$ ) via the solution to (6). Consequently, we can define $\overset{[k - 1]}{θ_{i}} ≔ X_{i} \overset{[k - 1]}{β} + Z_{i} \overset{[k - 1]}{b}$ .

Details of our estimation procedure are as follows. Using the Taylor approximation of function ψ′ at point $\overset{[k - 1]}{θ_{i}}$ , we get

ψ^{'} (θ_{i}) \approx ψ^{'} (\overset{[k - 1]}{θ_{i}}) + ψ^{″} (\overset{[k - 1]}{θ_{i}}) \cdot (θ_{i} - \overset{[k - 1]}{θ_{i}})

(13)

and therefore, from (4)

[ψ^{″} (\overset{[k - 1]}{θ_{i}})]^{- 1} \cdot (E (y_{i} ∣ β, b) - ψ^{'} (\overset{[k - 1]}{θ_{i}})) + \overset{[k - 1]}{θ_{i}} \approx θ_{i} .

(14)

We now define a random variable $\overset{[k]}{y_{i}} ≔ [ψ^{″} (\overset{[k - 1]}{θ_{i}})]^{- 1} \cdot (y_{i} - ψ^{'} (\overset{[k - 1]}{θ_{i}})) + \overset{[k - 1]}{θ_{i}}$ . The main step now is the assumption that the distribution of $\overset{[k]}{y_{i}}$ can be well approximated by a normal density. Computation of mean and variance of $\overset{[k]}{y_{i}}$ immediately yields

E (\overset{[k]}{y_{i}} ∣ b) \approx θ_{i} = X_{i} β + Z_{i} b, and var (\overset{[k]}{y_{i}} ∣ b) = [ψ^{″} (\overset{[k - 1]}{θ_{i}})]^{- 2} ψ^{″} (θ_{i}) \approx [ψ^{″} (\overset{[k - 1]}{θ_{i}})]^{- 1} .

(15)

The assumption that $\overset{[k]}{y_{i}}$ is approximately normally distributed allows us to replace the GLMM formulation in the kth step by an LMM where β is a vector of fixed effects and b is a vector of random effects, $\overset{[k]}{y} = [X Z] [\begin{matrix} β \\ b \end{matrix}] + \overset{[k]}{ε}$, $\overset{[k]}{ε} \sim N (0, \overset{[k]}{W})$, where $\overset{[k]}{W} ≔ diag ([ψ^{″} (\overset{[k - 1]}{θ_{1}})]^{- 1}, \dots, [ψ^{″} (\overset{[k - 1]}{θ_{n}})]^{- 1})$, and $b \sim N (0, {\tilde{Q}}_{λ}^{- 1})$, where ${\tilde{Q}}_{λ}$ is as defined in Section 2.4.
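The working response and its weight matrix are simple to compute once the previous linear predictor is available. The sketch below illustrates this for the logistic case; the function name `working_response` and the toy inputs are assumptions made for the example.

```python
import numpy as np

def psi1(t):
    return 1.0 / (1.0 + np.exp(-t))       # logistic mean function psi'

def psi2(t):
    mu = psi1(t)
    return mu * (1.0 - mu)                # logistic variance function psi''

def working_response(y, theta_prev):
    """Linearized response y^[k] and covariance W^[k] = diag(1 / psi''(theta^[k-1]))."""
    v = psi2(theta_prev)
    y_work = (y - psi1(theta_prev)) / v + theta_prev
    W = np.diag(1.0 / v)
    return y_work, W

# Toy input (hypothetical): previous linear predictor equal to zero.
y = np.array([1.0, 0.0])
theta_prev = np.zeros(2)
y_work, W = working_response(y, theta_prev)
print(y_work)        # [ 2. -2.]
print(np.diag(W))    # [4. 4.]
```

At θ = 0 the logistic mean is 0.5 and the variance is 0.25, so an observed 1 is mapped to 0.5 / 0.25 = 2 and an observed 0 to −2, each with working variance 4.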

We denote by $\overset{[k]}{P} ≔ I - X (X^{T} \overset{[k]}{W^{- 1}} X)^{- 1} X^{T} \overset{[k]}{W^{- 1}}$ the $\overset{[k]}{W}$ -weighted projection onto the orthogonal complement of the columns of X. Now, defining $\overset{[k]}{\tilde{y}} ≔ \overset{[k]}{P} \overset{[k]}{y}$ , $\overset{[k]}{X} ≔ \overset{[k]}{P} X$ and $\overset{[k]}{Z} ≔ \overset{[k]}{P} Z$ we assume that

\overset{[k]}{\tilde{y}} \sim N (0, \overset{[k]}{V}_{λ}), for \overset{[k]}{V}_{λ} ≔ \overset{[k]}{Z} {\tilde{Q}}_{λ}^{- 1} \overset{[k] T}{Z} + \overset{[k]}{W} .

(16)

Maximizing the log-likelihood for $\overset{[k]}{\tilde{y}}$ , that is, the function $l (\overset{[k]}{\tilde{y}}; λ) ≔ - \frac{n}{2} ln 2 π - \frac{1}{2} ln ∣ \overset{[k]}{V}_{λ} ∣ - \frac{1}{2} \overset{[k]}{{\tilde{y}}^{T}} \overset{[k]}{V_{λ}^{- 1}} \overset{[k]}{\tilde{y}}$ , leads directly to the optimization problem

[\begin{matrix} \overset{[k]}{λ_{Q}} \\ \overset{[k]}{λ_{R}} \end{matrix}] ≔ \underset{λ ⪰ 0}{argmin} {ln ∣ \overset{[k]}{V}_{λ} ∣ + \overset{[k]}{{\tilde{y}}^{T}} \overset{[k]}{V_{λ}^{- 1}} \overset{[k]}{\tilde{y}}},

(17)

where λ ⪰ 0 refers to {(λ_Q, λ_R) : λ_Q ≥ 0, λ_R ≥ 0}. The following proposition helps us to rewrite the objective of (17). We provide the proof in Section 8 of the Supplementary Material.

Proposition 1. Let $\overset{[k]}{Ω} ≔ \overset{[k] T}{Z} \overset{[k] - 1}{W} \overset{[k]}{Z}$ and $\overset{[k]}{q} ≔ \overset{[k] T}{Z} \overset{[k] - 1}{W} \overset{[k]}{\tilde{y}}$ . Then

$ln det \overset{[k]}{V_{λ}} = ln det ({\tilde{Q}}_{λ} + \overset{[k]}{Ω}) - ln det {\tilde{Q}}_{λ} + ln det (\overset{[k]}{W})$ ,
$\overset{[k] T}{\tilde{y}} \overset{[k]}{V_{λ}^{- 1}} \overset{[k]}{\tilde{y}} = - \overset{[k] T}{q} ({\tilde{Q}}_{λ} + \overset{[k]}{Ω})^{- 1} \overset{[k]}{q} + \overset{[k]}{{\tilde{y}}^{T}} \overset{[k]}{W^{- 1}} \overset{[k]}{\tilde{y}}$ .
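Both identities in Proposition 1 follow from the matrix determinant lemma and the Woodbury formula, and they can be checked numerically. The sketch below uses randomly generated stand-ins for the step-k quantities; all names and values are hypothetical, and the logarithm is applied to each determinant.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 8, 3
Zk = rng.normal(size=(n, p))                       # stand-in for the projected Z
W = np.diag(rng.uniform(0.5, 2.0, size=n))         # positive diagonal W
y_t = rng.normal(size=n)                           # stand-in for the projected response

A = np.array([[0.0, 1.0, 1.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
d = A.sum(axis=0)
Q = np.eye(p) - A / np.sqrt(np.outer(d, d))        # normalized Laplacian
Q_t = 0.7 * Q + 0.3 * np.eye(p)                    # lam_Q Q + lam_R I (positive definite)

V = Zk @ np.linalg.inv(Q_t) @ Zk.T + W
W_inv = np.linalg.inv(W)
Omega = Zk.T @ W_inv @ Zk
q = Zk.T @ W_inv @ y_t

logdet = lambda M: np.linalg.slogdet(M)[1]

lhs_logdet = logdet(V)
rhs_logdet = logdet(Q_t + Omega) - logdet(Q_t) + logdet(W)

lhs_quad = y_t @ np.linalg.solve(V, y_t)
rhs_quad = -q @ np.linalg.solve(Q_t + Omega, q) + y_t @ W_inv @ y_t

print(lhs_logdet, rhs_logdet)
print(lhs_quad, rhs_quad)
```

The practical benefit is that the right-hand sides involve only p × p matrices that depend on λ, whereas the left-hand sides require an n × n determinant and inverse at every evaluation.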

This proposition makes it possible to reformulate (17) and define the kth step update, $\overset{[k]}{λ_{Q}}$ and $\overset{[k]}{λ_{R}}$ , as

\underset{λ ⪰ 0}{argmin} {ln det {({\tilde{Q}}_{λ} + \overset{[k]}{Ω}) {\tilde{Q}}_{λ}^{- 1}} - \overset{[k] T}{q} ({\tilde{Q}}_{λ} + \overset{[k]}{Ω})^{- 1} \overset{[k]}{q}} .

(18)

It is important to use an efficient and accurate method to solve (18) as this problem appears in every step k and determines when the entire algorithm is terminated (i.e., when $‖ \overset{[k]}{λ} - \overset{[k - 1]}{λ} ‖$ is sufficiently small). To achieve this, we have analytically derived the gradient and the Hessian of the objective function as detailed in the Supplementary Material. The final algorithm for selecting the regularization parameters is outlined here:

Algorithm 1. Finding regularization parameters in griPEER

Input: matrices Z, X and Q; vector y; initial point $\overset{[0]}{λ} ≔ [\overset{[0]}{λ_{Q}}, \overset{[0]}{λ_{R}}]^{T}$; stop criterion δ > 0; function that defines the density, ψ; k ≔ 1.

do

1. Define $\overset{[k - 1]}{β}$ and $\overset{[k - 1]}{b}$ by solving $\underset{β, b}{argmin} {- 2 \sum_{i = 1}^{n} [y_{i} (X_{i} β + Z_{i} b) - ψ (X_{i} β + Z_{i} b)] + \overset{[k - 1]}{λ_{Q}} b^{T} Q b + \overset{[k - 1]}{λ_{R}} ‖ b ‖_{2}^{2}}$;
2. $\overset{[k - 1]}{θ} ≔ X \overset{[k - 1]}{β} + Z \overset{[k - 1]}{b}$, $\overset{[k]}{W} ≔ diag ([ψ^{″} (\overset{[k - 1]}{θ_{1}})]^{- 1}, \dots, [ψ^{″} (\overset{[k - 1]}{θ_{n}})]^{- 1})$;
3. Define $\overset{[k]}{y}$ by putting $\overset{[k]}{y_{i}} ≔ [ψ^{″} (\overset{[k - 1]}{θ_{i}})]^{- 1} \cdot (y_{i} - ψ^{'} (\overset{[k - 1]}{θ_{i}})) + \overset{[k - 1]}{θ_{i}}$, for i = 1, … , n;
4. $\overset{[k]}{P} ≔ I - X (X^{T} \overset{[k]}{W}^{- 1} X)^{- 1} X^{T} \overset{[k]}{W}^{- 1}$;
5. $\overset{[k]}{\tilde{y}} ≔ \overset{[k]}{P} \overset{[k]}{y}$, $\overset{[k]}{X} ≔ \overset{[k]}{P} X$, $\overset{[k]}{Z} ≔ \overset{[k]}{P} Z$;
6. $\overset{[k]}{Ω} ≔ \overset{[k]}{Z}^{T} \overset{[k]}{W}^{- 1} \overset{[k]}{Z}$, $\overset{[k]}{q} ≔ \overset{[k]}{Z}^{T} \overset{[k]}{W}^{- 1} \overset{[k]}{\tilde{y}}$;
7. $\overset{[k]}{λ} ≔ \underset{λ ⪰ 0}{argmin} {ln ∣ (λ_{Q} Q + λ_{R} I_{p} + \overset{[k]}{Ω}) {(λ_{Q} Q + λ_{R} I_{p})}^{- 1} ∣ - \overset{[k]}{q}^{T} {(λ_{Q} Q + λ_{R} I_{p} + \overset{[k]}{Ω})}^{- 1} \overset{[k]}{q}}$;
8. k ← k + 1;

while $‖ \overset{[k]}{λ} - \overset{[k - 1]}{λ} ‖ ∕ ‖ \overset{[k - 1]}{λ} ‖ > δ$

To find the numerical solutions to the problems in steps 1 and 7, we employed, respectively, the penalized toolbox for MATLAB (McIlhagga, 2016) and the fsolve function from the MATLAB Optimization Toolbox.
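The flow of Algorithm 1 can be sketched in Python for the logistic case. This is a simplified illustration, not the implementation described above: step 1 uses plain Newton iterations, and step 7 is replaced by a coarse search over a logarithmic grid instead of a gradient- and Hessian-based solver. All function names, the grid and the simulated data are assumptions.

```python
import numpy as np

def psi1(t):
    return 1.0 / (1.0 + np.exp(-t))

def psi2(t):
    mu = psi1(t)
    return mu * (1.0 - mu)

def fit_penalized(X, Z, y, P, n_iter=100):
    """Step 1: Newton iterations for (6) with penalty matrix P acting on b."""
    m = X.shape[1]
    M = np.hstack([X, Z])
    Pf = np.zeros((M.shape[1], M.shape[1]))
    Pf[m:, m:] = P
    B = np.zeros(M.shape[1])
    for _ in range(n_iter):
        eta = M @ B
        step = np.linalg.solve(M.T @ (psi2(eta)[:, None] * M) + Pf,
                               M.T @ (y - psi1(eta)) - Pf @ B)
        B += step
        if np.linalg.norm(step) < 1e-10:
            break
    return B[:m], B[m:]

def step7_objective(lam, Q, Omega, q):
    """Objective of (18) for lam = (lam_Q, lam_R)."""
    Qt = lam[0] * Q + lam[1] * np.eye(Q.shape[0])
    S = Qt + Omega
    return (np.linalg.slogdet(S)[1] - np.linalg.slogdet(Qt)[1]
            - q @ np.linalg.solve(S, q))

def select_lambdas(X, Z, y, Q, lam0=(1.0, 1.0), delta=1e-3, max_iter=30,
                   grid=np.logspace(-3, 3, 25)):
    lam = np.array(lam0, dtype=float)
    p = Q.shape[0]
    for _ in range(max_iter):
        beta, b = fit_penalized(X, Z, y, lam[0] * Q + lam[1] * np.eye(p))  # step 1
        theta = X @ beta + Z @ b                                # step 2
        w_inv = psi2(theta)                                     # diagonal of W^{-1}
        y_work = (y - psi1(theta)) / w_inv + theta              # step 3
        XtWi = X.T * w_inv                                      # X^T W^{-1}
        Pk = np.eye(len(y)) - X @ np.linalg.solve(XtWi @ X, XtWi)  # step 4
        y_t, Z_k = Pk @ y_work, Pk @ Z                          # step 5
        Omega = (Z_k.T * w_inv) @ Z_k                           # step 6
        q = (Z_k.T * w_inv) @ y_t
        cands = [(lq, lr) for lq in grid for lr in grid]        # step 7: grid stand-in
        new = np.array(min(cands, key=lambda c: step7_objective(c, Q, Omega, q)))
        done = np.linalg.norm(new - lam) / np.linalg.norm(lam) <= delta
        lam = new
        if done:
            break
    return lam

# Simulated example (hypothetical sizes and coefficients).
rng = np.random.default_rng(0)
n = 150
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = rng.normal(size=(n, 3))
A = np.array([[0.0, 1.0, 1.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
d = A.sum(axis=0)
Q = np.eye(3) - A / np.sqrt(np.outer(d, d))
eta_true = X @ np.array([-0.5, 1.0]) + Z @ np.array([1.0, 0.6, 0.6])
y = rng.binomial(1, psi1(eta_true)).astype(float)

lam_hat = select_lambdas(X, Z, y, Q)
print(lam_hat)
```

The grid excludes zero so that λ_Q Q + λ_R I stays invertible. With a grid search the loop stops as soon as the same grid point is selected twice in a row; a continuous optimizer would follow the relative-change criterion more closely.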

4. PROCEDURES FOR SIGNIFICANCE TESTING

The LASSO estimation procedure produces a sparse set of regression coefficients but does not, without additional theory (e.g., Bühlmann, 2013; Zhao & Shojaie, 2016), provide statistical significance testing. In contrast, we employ two methods to identify variables that are significantly related to the response, and we describe both in this section. Both use the knowledge about the optimal regularization parameters described in the previous section. The first approach takes advantage of asymptotic properties of generalized linear model (GLM) estimates and constructs the asymptotic variance–covariance matrix in a similar fashion as proposed by Cessie & Houwelingen (1992) in the context of ridge-penalized logistic regression. The second approach applies the bootstrap method. Subsequently, we will refer to these two approaches as griPEER_asmp (the asymptotic-based approach) and griPEER_boot (the bootstrap-based approach). The numerical experiments presented in Section 5 suggest that griPEER_boot can achieve significantly greater power than griPEER_asmp when applied to structural brain imaging metrics. As the false discovery rates among variables labelled as relevant were similar in this brain imaging application, we applied only griPEER_boot in the analysis of cortical thickness in HIV-positive and HIV-negative participants (Section 6).

4.1. Asymptotic Variance–Covariance Matrix

We start by introducing some notation. Denote by $B$ the (m + p)-dimensional vector of true coefficients and by $\hat{B}$ its estimate given by the solution to (6). We will also need the matrix $\mathcal{X} ≔ [X, Z]$ and the vector $θ ≔ \mathcal{X} B$. Moreover, we define an (m + p) × (m + p) penalty matrix as $\mathcal{Q} ≔ [\begin{matrix} 0_{m \times m} & 0_{m \times p} \\ 0_{p \times m} & λ_{Q} Q + λ_{R} I_{p} \end{matrix}]$, where the nonnegative parameters λ_Q and λ_R are adjusted by Algorithm 1. To summarize, $\hat{B}$ is the solution to ${argmin}_{B \in R^{p + m}} {2 \sum_{i} ψ (\mathcal{X}_{i} B) - 2 y^{T} \mathcal{X} B + B^{T} \mathcal{Q} B}$, with ψ being the function from the definition of the exponential family of distributions (3). Furthermore, the expressions we derive in this section include the diagonal matrix Ψ, defined as Ψ := diag{ψ″(θ₁), … , ψ″(θ_n)}.

Using the first-order Taylor approximation, as well as asymptotic properties of a GLM estimate, one can find that the estimated asymptotic variance of $\hat{B}$ has the form ${var}_{a} \hat{B} = {(\mathcal{X}^{T} Ψ \mathcal{X} + \mathcal{Q})}^{- 1} \mathcal{X}^{T} Ψ \mathcal{X} {(\mathcal{X}^{T} Ψ \mathcal{X} + \mathcal{Q})}^{- 1}$. The derivation is based on Cessie & Houwelingen (1992), and it is described in Section 8.3 of the Supplementary Material. Based on the above formula, we propose a simple decision-making strategy where the ith covariate is labelled as statistically relevant if 0 is excluded from the confidence interval for its respective regression coefficient, that is, $0 \notin ({\hat{B}}_{i} - 1.96 \cdot \sqrt{({var}_{a} \hat{B})_{i i}}, {\hat{B}}_{i} + 1.96 \cdot \sqrt{({var}_{a} \hat{B})_{i i}})$. The entire procedure is presented as Algorithm 2.

Algorithm 2. Asymptotic confidence interval

Input: matrices X, Z and Q; estimate $\hat{B} = [\begin{matrix} {\hat{β}}^{gP} \\ {\hat{b}}^{gP} \end{matrix}]$; optimal parameters λ_Q and λ_R; function ψ.

1. Define $\mathcal{X} ≔ [X, Z]$, $θ ≔ \mathcal{X} \hat{B}$ and $Ψ ≔ diag {ψ^{″} (θ_{1}), \dots, ψ^{″} (θ_{n})}$.
2. Construct the (m + p) × (m + p) matrix $\mathcal{Q} ≔ [\begin{matrix} 0_{m \times m} & 0_{m \times p} \\ 0_{p \times m} & λ_{Q} Q + λ_{R} I_{p} \end{matrix}]$.
3. Calculate the variance of $\hat{B}$: ${var}_{a} \hat{B} ≔ {(\mathcal{X}^{T} Ψ \mathcal{X} + \mathcal{Q})}^{- 1} \mathcal{X}^{T} Ψ \mathcal{X} {(\mathcal{X}^{T} Ψ \mathcal{X} + \mathcal{Q})}^{- 1}$.
4. Define the asymptotic confidence interval (CI) for B_i as $C I_{asmp} (B_{i}) ≔ ({\hat{B}}_{i} - 1.96 \cdot \sqrt{({var}_{a} \hat{B})_{i i}}, {\hat{B}}_{i} + 1.96 \cdot \sqrt{({var}_{a} \hat{B})_{i i}})$.
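The steps of Algorithm 2 translate directly into a few lines of linear algebra. The sketch below is an illustration under stated assumptions: the function name `asymptotic_ci` is hypothetical, the logistic ψ″ is used as an example, and the vector `B_hat` is a placeholder standing in for an actual griPEER estimate.

```python
import numpy as np

def asymptotic_ci(X, Z, Q, B_hat, lam_Q, lam_R, psi2, z=1.96):
    """Sandwich-type asymptotic confidence intervals for a penalized GLM estimate."""
    m, p = X.shape[1], Z.shape[1]
    M = np.hstack([X, Z])                          # step 1: joint design [X, Z]
    Psi = psi2(M @ B_hat)                          # diagonal of Psi
    Pf = np.zeros((m + p, m + p))                  # step 2: block penalty matrix
    Pf[m:, m:] = lam_Q * Q + lam_R * np.eye(p)
    F = M.T @ (Psi[:, None] * M)                   # [X, Z]^T Psi [X, Z]
    S = np.linalg.inv(F + Pf)
    se = np.sqrt(np.diag(S @ F @ S))               # step 3: asymptotic standard errors
    return B_hat - z * se, B_hat + z * se          # step 4: interval endpoints

def psi2_logistic(t):
    mu = 1.0 / (1.0 + np.exp(-t))
    return mu * (1.0 - mu)

# Hypothetical inputs for illustration only.
rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = rng.normal(size=(n, 3))
A = np.array([[0.0, 1.0, 1.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
d = A.sum(axis=0)
Q = np.eye(3) - A / np.sqrt(np.outer(d, d))
B_hat = np.array([-0.2, 0.5, 0.8, 0.1, -0.3])     # placeholder, not a fitted estimate

lo, hi = asymptotic_ci(X, Z, Q, B_hat, 2.0, 1.0, psi2_logistic)
relevant = (lo > 0) | (hi < 0)                     # intervals that exclude zero
print(np.column_stack([lo, hi]), relevant)
```

With both regularization parameters set to zero the formula reduces to the usual inverse Fisher information, and adding the penalty can only narrow the intervals.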


4.2. Bootstrap-Based Approach

Here, the variances of the $\hat{B}$ coefficients were estimated based on bootstrap samples. Each such sample was created from n elements of y and the n corresponding rows of Z and X, where indices were selected randomly by sampling with replacement. The dataset obtained in the jth repetition, X^[j], Z^[j] and y^[j], was then substituted into (6), with λ_Q and λ_R selected by Algorithm 1 applied to the original dataset (i.e., λ_Q and λ_R were estimated only once). The percentile bootstrap confidence intervals, with the significance level α = 0.05, were defined based on all s estimates, ${\hat{B}}^{[1]}, \dots, {\hat{B}}^{[s]}$. The default value of s was set to 500 in this analysis and used in the simulations performed in Section 5.3. Coefficients from the griPEER estimate whose confidence intervals do not contain zero are then considered response-related discoveries. We summarize this procedure as Algorithm 3.

Algorithm 3. Bootstrap confidence interval

Input: matrices X, Z and Q; column matrix of responses y; optimal parameters λ_Q and λ_R; the number of bootstrap samples s (by default, s = 500).
For j ∈ {1, …, s} do:
1. Generate the jth bootstrap sample, $X^{[j]}$, $Z^{[j]}$ and $y^{[j]}$, by sampling rows with replacement from [X, Z, y].
2. Obtain the jth griPEER estimate, ${\hat{B}}^{[j]}$, for Q, $X^{[j]}$, $Z^{[j]}$, $y^{[j]}$ and tuning parameters λ_Q and λ_R.
End
Define $CI_{boot} (B_{i})$, the bootstrap CI for $B_{i}$, as the percentile bootstrap confidence interval for ${\hat{B}}_{i}^{[1]}, \dots, {\hat{B}}_{i}^{[s]}$ with the significance level α = 0.05.
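
The resampling loop of the bootstrap procedure can be sketched in plain Python as follows. The estimator is passed in as a hypothetical callable `fit`, standing in for a griPEER fit with λ_Q and λ_R held fixed; the helper names `percentile` and `bootstrap_ci` and the linear-interpolation percentile rule are assumptions made for this illustration only.

```python
import random

def percentile(sorted_vals, q):
    """Linear-interpolation percentile of an already sorted list, q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def bootstrap_ci(rows, fit, s=500, alpha=0.05, seed=0):
    """Percentile bootstrap CIs for each coefficient returned by `fit`.

    rows : list of observations (each bundles one row of X, Z and y)
    fit  : hypothetical callable mapping a list of rows to a list of
           coefficient estimates, with the tuning parameters held fixed
    """
    rng = random.Random(seed)
    n = len(rows)
    estimates = []
    for _ in range(s):
        # draw n row indices with replacement and refit on the resampled data
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        estimates.append(fit(sample))
    cis = []
    for i in range(len(estimates[0])):
        vals = sorted(est[i] for est in estimates)
        cis.append((percentile(vals, alpha / 2), percentile(vals, 1 - alpha / 2)))
    return cis
```

A coefficient would then be flagged as a response-related discovery when its interval `(lo, hi)` satisfies `lo > 0` or `hi < 0`.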


5. NUMERICAL EXPERIMENTS

We conducted a simulation study to investigate the performance of griPEER when responses are modelled by a binomial distribution and compared the results with the logistic ridge estimates.

5.1. Definitions

5.1.1. Matrix density

For a p × q matrix A, density is defined as a proportion of nonzero entries,

$$dens (A) ≔ \frac{1}{p q} \sum_{i, j} \mathbb{1} \{|A (i, j)| > 0\}. \tag{19}$$

5.1.2. Matrix dissimilarity

To quantify a dissimilarity between two p × q matrices, A and B, with dens(A) = dens(B), we defined

$$diss (A, B) ≔ \frac{\sum_{i, j} \mathbb{1} \{|A (i, j) - B (i, j)| > 0\}}{2 \sum_{i, j} \mathbb{1} \{B (i, j) > 0\}}, \tag{20}$$

with values in the interval [0, 1]. If diss(A, B) = 0, then A = B, while diss(A, B) = 1 indicates that the positions of nonzero entries do not overlap.
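
A small plain-Python sketch of definitions (19) and (20) may help make the two extreme cases concrete. Matrices are represented as lists of rows; the function names mirror the notation in the text, and the toy matrices are invented for illustration.

```python
def dens(A):
    """Proportion of nonzero entries of A, as in (19)."""
    total = sum(len(row) for row in A)
    return sum(1 for row in A for a in row if abs(a) > 0) / total

def diss(A, B):
    """Dissimilarity (20): number of differing entries divided by
    twice the number of positive entries of B."""
    differing = sum(1 for ra, rb in zip(A, B)
                    for a, b in zip(ra, rb) if abs(a - b) > 0)
    positive_b = sum(1 for rb in B for b in rb if b > 0)
    return differing / (2 * positive_b)

# Toy 3 x 3 adjacency matrices with equal density (2 of 9 entries nonzero).
A = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
B = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]
print(diss(A, A))  # 0.0: identical matrices
print(diss(A, B))  # 1.0: nonzero positions do not overlap
```

In the second call, four entries differ and B has two positive entries, so the ratio is 4 / (2 · 2) = 1.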

5.2. Model Coefficient Estimation

5.2.1. Settings

5.2.1.1. “Informativeness” of the penalty term—

The simulation settings were designed to evaluate performance in a variety of situations ranging from an “observed” (available in estimation) connectivity matrix, $A^{o b s}$ , that is fully informative to one that is completely noninformative. Here, “informativeness” refers to the amount of true dependencies among the variables that are represented in the matrix.

We denote by $A^{t r u e}$ a matrix representing the true connections between variables and by $A^{o b s}$ a matrix that is observed and used in estimation via griPEER. To express the “informativeness” of $A^{o b s}$ with respect to $A^{t r u e}$ , we used the measure of dissimilarity, $diss (A^{o b s}, A^{t r u e})$ , defined in (20). In particular:

$diss (A^{o b s}, A^{t r u e}) = 0$ indicates that $A^{o b s}$ is fully informative;
$diss (A^{o b s}, A^{t r u e}) = 1$ indicates that $A^{o b s}$ is noninformative;
$diss (A^{o b s}, A^{t r u e}) \in (0, 1)$ indicates that $A^{o b s}$ is partially informative.

5.2.1.2. Brain region connectivity context—

One may view $A^{t r u e}$ as an adjacency matrix of a graph representing the connections between brain regions, and our simulation scenarios are based on the following four interpretations of this structure.

$A_{1}$ : “homologous regions.” $A_{1}$ represents a case when brain regions, i and j, are connected (i.e., $A_{1} (i, j) = 1$ ) if and only if i and j are homologous brain regions from different hemispheres (Figure 2, first panel).
$A_{2}$ : “modularity.” $A_{2}$ represents a case when brain regions i and j are connected if and only if they belong to the same module with $A_{2} (i, j) = 1$ within the module and 0 otherwise (Figure 2, second panel).
$A_{3}$ : “density of connections, masked.” $A_{3}$ is defined based on a brain-imaging measure, the density of connections between brain regions (see Section 6), and is then “masked” by modularity information. Here, $A_{3} (i, j)$ equals the median of the density of connections between regions i and j if they belong to the same module. Otherwise, $A_{3} (i, j) ≔ 0$ (Figure 2, third panel).
$A_{4}$ : “neighbouring regions.” $A_{4}$ represents a case when brain regions i and j are connected if they are in close spatial proximity $(A_{4} (i, j) > 0)$ . Otherwise, they are not connected $(A_{4} (i, j) ≔ 0)$ (Figure 2, last panel).

Figure 2: — Matrices used in the simulation study to construct $A^{t r u e}$ . Presented are variants for p = 66 (based on the Desikan–Killiany atlas (Desikan et al., 2006)). $A_{1}$ *“homologous regions,”* $A_{2}$ *“modularity”*, $A_{3}$ *“density of connections, masked”* and $A_{4}$ *“neighbouring regions.”*

The homologous regions matrix $A_{1}$ reflects the situation where only homologous regions from the two hemispheres are assumed to be connected. The modularity matrix $A_{2}$ , in turn, represents an adjacency-defining division of the brain cortical regions into five modules (Sporns, 2013; Cole et al., 2014; Sporns & Betzel, 2016). Next, the “density of connections, masked” matrix $A_{3}$ is based on the estimated density of connections between brain cortical regions, as described in Section 6. Finally, the “neighbouring regions” matrix $A_{4}$ models the situation where adjacent brain regions are connected, that is, the strength of connection between brain regions depends on the physical distance between them.

5.2.1.3. Simulation scenarios—

We evaluated three simulation scenarios to express different sources of “uninformativeness” of $A^{o b s}$ that loosely reflect real-life scenarios. For each scenario, we tested all four types of matrices, $A_{1}, \dots, A_{4}$ . In addition, we consider the number of observations that is commonly encountered in the imaging studies, ranging from n = 100 to n = 400, and the number of predictors ranging from p = 66 to p = 528 corresponding to common parcellations of the cortex (Eickhoff et al., 2015; Eickhoff, Yeo, & Genon, 2018).

Scenario 1. The observed connectivity matrix, $A^{o b s}$ , represents connections (partially) permuted with respect to connections represented by $A^{t r u e}$ . Based on one of the four considered matrices, the corresponding $A^{o b s}$ matrix is constructed by randomizing edges of the graph given by $A^{t r u e}$ until a desired dissimilarity, $diss (A^{o b s}, A^{t r u e})$ , is achieved. The randomization technique preserves graph size, density, strength and graph degree sequence (and hence degree distribution). Figure 13 in Section 8.5 of the Supplementary Material shows a visualization of the permutation effect for all four matrices considered, $A_{1}^{t r u e}, \dots, A_{4}^{t r u e}$ ; Figure 3 below shows a visualization of the permutation effect for one selected matrix, $A_{3}^{t r u e}$ .
Scenario 2. We investigate the impact of incorrect information of strong similarity between variables (and consequently an incorrect assumption about closeness of their regression coefficients) while, in fact, their influence on the response variable is very distinct. To model such situations, the true signal was generated by also taking into account the dissimilarity between some coefficients, which was accomplished by setting some entries in $A^{t r u e}$ to negative values. Specifically, for i ∈ {1, … , 4}, matrix $A_{i}^{t r u e}$ was defined by changing entries of k columns and the corresponding k rows (with k ∈ {1, 4, 7, 10}) of $A_{i}$ into their negative values. Such a structure of $A^{t r u e}$ yields the tendency that k coefficients of the true signal will be separated from the others. Here, $A^{o b s} (i, j) = |A^{t r u e} (i, j)|$ , and hence, $A^{o b s}$ contains only nonnegative values. Figure 14 in Section 8.5 of the Supplementary Material shows a visualization of the sign alteration effect for all $A^{t r u e}$ matrices considered, $A_{1}^{t r u e}, \dots, A_{4}^{t r u e}$ ; Figure 4 below shows a visualization of the sign alteration effect for one selected matrix, $A_{3}^{t r u e}$ .
Scenario 3. The observed connectivity matrix $A^{o b s}$ has lower or higher matrix density than $A^{t r u e}$ . For $A^{t r u e}$ defined based on one of the four considered matrices, the corresponding $A^{o b s}$ is then constructed by randomly removing edges from, or adding edges to, the graph of connections represented by $A^{t r u e}$ until the desired ratio of matrix densities, $dens (A^{o b s}) ∕ dens (A^{t r u e})$ , is reached. Figure 15 in Section 8.5 of the Supplementary Material shows a visualization of the effect of removing and adding edges for all $A^{o b s}$ matrices considered, $A_{1}^{o b s}, \dots, A_{4}^{o b s}$ ; Figure 5 below shows a visualization of the same effect for one selected matrix, $A_{3}^{o b s}$ .

Figure 3: — $A_{3}^{t r u e}$ connectivity graph adjacency matrix (first column) and $A_{3}^{o b s}$ connectivity graph adjacency matrices (second to fourth columns) used in Scenario 1.

Figure 4: — Visualization of sign alteration effect for $A_{3}^{t r u e}$ connectivity graph adjacency matrix used in Scenario 2.

Figure 5: — Visualization of removing and adding effects of edges for matrix $A_{3}^{o b s}$ used in Scenario 3.

5.2.1.4. Simulation procedure—

In each numerical experiment, we performed the following procedure.

1. For the graph adjacency matrix $A^{t r u e}$ , computed its normalized Laplacian, $Q^{t r u e}$ (in Scenario 2, the node’s degree is defined as $d_{i} ≔ \sum_{j} |a_{i j}|$ ; see Section 2.1).
2. Replaced the zero singular values of $Q^{t r u e}$ by 0.01 · s, where s is the smallest nonzero singular value of $Q^{t r u e}$ (to obtain the invertible matrix required in Step 6(a)).
3. For the graph adjacency matrix $A^{o b s}$ , computed its normalized Laplacian, $Q^{o b s}$ .
4. Generated $Z \in R^{n \times p}$ with rows independently distributed as $N_{p} (0, Σ)$ , where Σ is the variance–covariance matrix estimated from the real data study (see Section 6); standardized the columns of Z to have zero mean and unit ℓ₂ norm.
5. Generated X as an n-dimensional column of ones.
6. Performed the following steps 100 times:
   (a) generated $b \in R^{p}$ as $b \sim N (0, σ_{b}^{2} (Q^{t r u e})^{- 1})$ and set β = 0,
   (b) defined $θ ≔ X β + Z b$ ,
   (c) defined $pr^{Binom} ≔ [e^{θ_{1}} / (1 + e^{θ_{1}}), \dots, e^{θ_{n}} / (1 + e^{θ_{n}})]^{T}$ ,
   (d) generated $y \sim Binom (pr^{Binom})$ , $y \in R^{n \times 1}$ ,
   (e) estimated model coefficients b, β using (i) griPEER, assuming the binomial distribution of y and using $Q^{o b s}$ in the penalty term, and (ii) the logistic ridge estimator,
   (f) computed the b estimation error, $M S E r ≔ ‖ \hat{b} - b ‖_{2}^{2} ∕ ‖ b ‖_{2}^{2}$ , for the two b estimates, (i) ${\hat{b}}^{g r i P E E R}$ and (ii) ${\hat{b}}^{l . r i d g e}$ .
7. Computed the mean of the relative mean squared error (MSEr) over the 100 runs from Step 6, for the two estimation methods.

Importantly, a “true” coefficient vector b obtained as $b \sim N (0, σ_{b}^{2} (Q^{t r u e})^{- 1})$ reflects the connectivity structure represented by $A^{t r u e}$ . Example vectors b generated using $A_{1}, \dots, A_{4}$ are presented in Figure 12 in Section 8.4 of the Supplementary Material.
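
The data-generating part of the procedure above can be sketched in NumPy as follows. The function names `simulate_run` and `mser` are hypothetical, and the default `sigma_b=1.0` is a placeholder, since this section does not state the value of σ_b. For a symmetric positive semidefinite Laplacian the singular values coincide with the eigenvalues, so the sketch uses an eigendecomposition to replace the zero singular values; the griPEER and logistic ridge fits themselves are not reproduced here.

```python
import numpy as np

def simulate_run(Q_true, Sigma, n, sigma_b=1.0, seed=0):
    """Draw one (X, Z, b, y) data set following the simulation procedure (sketch)."""
    rng = np.random.default_rng(seed)
    p = Q_true.shape[0]
    # Replace zero eigenvalues by 0.01 * (smallest nonzero eigenvalue), then invert.
    w, V = np.linalg.eigh(Q_true)
    positive = w > 1e-10 * w.max()
    w_fixed = np.where(positive, w, 0.01 * w[positive].min())
    Q_inv = (V / w_fixed) @ V.T
    # Rows of Z ~ N_p(0, Sigma); columns centered and scaled to unit l2 norm.
    Z = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    Z -= Z.mean(axis=0)
    Z /= np.linalg.norm(Z, axis=0)
    X = np.ones((n, 1))                              # intercept column
    # True signal b ~ N(0, sigma_b^2 * Q_true^{-1}) and beta = 0.
    b = rng.multivariate_normal(np.zeros(p), sigma_b ** 2 * Q_inv)
    beta = np.zeros(1)
    theta = X @ beta + Z @ b
    pr = 1.0 / (1.0 + np.exp(-theta))                # equals e^theta / (1 + e^theta)
    y = rng.binomial(1, pr)                          # one binary response per row
    return X, Z, b, y

def mser(b_hat, b):
    """Relative estimation error ||b_hat - b||^2 / ||b||^2."""
    return np.sum((b_hat - b) ** 2) / np.sum(b ** 2)
```

Any estimator returning a length-p vector `b_hat` can then be scored with `mser(b_hat, b)` and averaged over repeated runs.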

5.2.1.5. Simulation parameters—

We consider the following experiment settings:

number of predictors: p ∈ {66, 198, 528},
number of observations: n ∈ {100, 200, 400},
$A^{t r u e}$ matrix constructed based on $A_{i} \in \{A_{1}, \dots, A_{4}\}$ ,
(Scenario 1.) dissimilarity between $A^{o b s}$ and $A^{t r u e}$ : $diss (A^{o b s}, A^{t r u e}) \in [0, 1]$ ,
(Scenario 2.) number of columns (and corresponding rows) of $A^{t r u e}$ that have signs switched: k ∈ {0, 1, 4, 7, 10},
(Scenario 3.) density ratio: $dens (A^{o b s}) ∕ dens (A^{t r u e}) \in [0.5, 1.5]$ .

The number of predictors, p = 66, was selected to match the imaging analysis of the 66 brain regions described in Section 6. To investigate situations with a larger number of predictors for the ith type of connectivity pattern, we created block-diagonal adjacency matrices with $A_{i}$ as blocks. Specifically, adjacency matrices in the cases with p = 198 and p = 528 were defined as $diag \{A_{i}, \dots, A_{i}\}$ , with the number of repetitions selected so that the resulting dimension matches the target p.
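
Since 198 = 3 · 66 and 528 = 8 · 66, the larger designs correspond to three and eight copies of the 66 × 66 block, respectively. A minimal plain-Python sketch of the block-diagonal construction is given below; the helper name `block_diag_repeat` is hypothetical.

```python
def block_diag_repeat(A, target_p):
    """Place copies of the square matrix A (list of rows) along the diagonal
    of a target_p x target_p matrix; off-diagonal blocks stay zero."""
    k = len(A)
    if target_p % k != 0:
        raise ValueError("target_p must be a multiple of the block size")
    out = [[0] * target_p for _ in range(target_p)]
    for r in range(target_p // k):           # r-th copy of the block
        for i in range(k):
            for j in range(k):
                out[r * k + i][r * k + j] = A[i][j]
    return out
```

Because no edges link different blocks, regions in different copies are unconnected in the resulting graph.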

5.2.2. Results

5.2.2.1. Scenario 1—

In Scenario 1, we compare griPEER and logistic ridge estimation methods when an observed connectivity matrix $A^{o b s}$ contains connections that are permuted with respect to connections represented by $A^{t r u e}$ . We consider the following simulation parameter values: number of predictors p ∈ {66, 198, 528}; number of observations n ∈ {100, 200, 400}; $A^{t r u e}$ base matrix $A_{1}, \dots, A_{4}$ ; and dissimilarity between $A^{o b s}$ and $A^{t r u e}$ , $diss (A^{o b s}, A^{t r u e}) \in [0, 1]$ . Figure 16 in Section 8.6 of the Supplementary Material displays the aggregated (mean) values of the relative estimation error based on 100 simulation runs for all experiment settings considered; Figure 6 below shows a selected subset of experiment settings for n = 100 and p = 66.

Figure 6: — MSEr for estimation of b as a function of dissimilarity between $A^{o b s}$ and $A^{t r u e}$ as measured by $diss (A^{o b s}, A^{t r u e})$ (Scenario 1) for griPEER (blue line) and logistic ridge (red line). Standard errors of the mean from 100 experiment runs are shown.

The MSEr of griPEER is lower than or equal to the MSEr of the logistic ridge regression in all cases. The utility of griPEER is particularly apparent when $A^{o b s}$ is fully or largely informative, which corresponds to low values of the dissimilarity $diss (A^{o b s}, A^{t r u e})$ . As $A^{o b s}$ becomes less informative about the true connections between the model coefficients, that is, as the values of $diss (A^{o b s}, A^{t r u e})$ become high, the MSEr of griPEER approaches the MSEr of logistic ridge. The result illustrates the adaptiveness of griPEER to the amount of true information in an observed $A^{o b s}$ matrix. When $A^{o b s}$ is largely informative, incorporating $A^{o b s}$ into the estimation is clearly beneficial, but even when $A^{o b s}$ carries little or no information, griPEER still yields an MSEr no larger than the MSEr of the logistic ridge estimator.

The performance of griPEER and logistic ridge regression depends on the structure of connections imposed by $A^{t r u e}$ on the true b. We can observe that the difference between the MSErs for griPEER and logistic ridge is smaller when $A^{t r u e}$ is defined using $A_{1}$ , the homologous regions matrix (Figure 16, left column panel). Indeed, $A_{1}$ has a smaller density than $A_{2}$ , $A_{3}$ and $A_{4}$ and imposes fewer connections between the true coefficients in a model. Therefore, utilizing (full or partial) connectivity information $A^{o b s}$ in estimation for $A_{1}$ -based signals is less beneficial compared to the other considered patterns of coefficient dependencies. Furthermore, when each node is connected to every other by a path consisting of strong connections, as in the case when $A^{t r u e}$ is created based on $A_{4}$ (fourth column panel in Figure 16), it is expected that all “true” model coefficients in a generated vector b will be strongly dependent on each other; see Figure 12 in Section 8.4 of the Supplementary Material. In such cases, even inaccurate information about the connections (high $diss (A^{o b s}, A^{t r u e})$ values) may still be beneficial. Finally, if we compare the results within each column panel of Figure 16, we observe, as expected, that the estimation error becomes smaller as the number of predictors p becomes smaller and as the number of observations n becomes larger.

5.2.2.2. Scenario 2—

In Scenario 2, we compare griPEER and logistic ridge estimation methods in a situation when some coefficients of the true signal are separated from the others much more than suggested by the observed connectivity matrix, $A^{o b s}$ . To generate such a design, $A^{t r u e}$ has some columns and rows of negative values, which “pushes” the corresponding coefficients away, while the signal is randomized. We ran the simulation for number of observations, n = 100; number of variables, p = 66; and for $A^{t r u e}$ based on the four connectivity-pattern-inducing matrices, $\{A_{1}, \dots, A_{4}\}$ . Matrix $A^{t r u e}$ was generated from $A_{i}$ by switching signs in k columns (and corresponding rows), where k ∈ {1, 4, 7, 10}. Figure 7 displays the aggregated (mean) values of the relative estimation error based on 100 simulation runs.

Figure 7: — MSEr for estimation of b for four true connectivity patterns described by matrices, $A_{1}, \dots, A_{4}$ (Scenario 2). Results for griPEER and logistic ridge (blue and red lines, respectively). Presented are the average values of MSEr from 100 runs for n = 100 and p = 66. The number of columns (and corresponding rows) of $A_{i}$ , for which signs of entries were switched in $A^{t r u e}$ construction, is represented by the x-axis values. Error bars indicate standard errors of the mean.

With increasing k, $A^{o b s}$ differs more from the connectivity pattern used in the true signal, and so, the relative difference between MSEr for logistic ridge regression and griPEER decreases for nearly all settings. Notably, MSEr values for griPEER remain below or equal to MSEr values for logistic ridge. These results suggest that even some incorrect information (accomplished by introducing negative dependencies between variables) is not detrimental.

5.2.2.3. Scenario 3—

In this scenario, we compare griPEER and logistic ridge estimation methods when $A^{o b s}$ is of lower or higher matrix density than $A^{t r u e}$ . We again consider n = 100 and p = 66. This time, we generated $A^{o b s}$ by adding connections to, or removing connections from, $A^{t r u e}$ , which changes the density of the resulting matrix. Here, we consider $dens (A^{o b s}) ∕ dens (A^{t r u e}) \in [0.5, 1.5]$ as the range of density ratios. Figure 8 illustrates the mean values of the relative estimation error based on 100 simulation runs.

Figure 8: — MSEr for estimation of b as a function of the matrix density ratio (Scenario 3). Results for griPEER and logistic ridge (blue and red lines, respectively). Presented are the average values of MSEr from 100 runs for n = 100, p = 66 and four true connectivity patterns inducing matrices, $A_{1}, \dots, A_{4}$ . Ratio of densities, $dens (A^{o b s}) ∕ dens (A^{t r u e})$ was varied from 0.5 to 1.5. Standard errors of the mean are shown. Green dashed vertical lines denote ratio of matrix densities of 1, when $A^{o b s}$ is identical to $A^{t r u e}$ .

Similar to Scenario 1, incorporating information on only a few connections (the $A_{1}$ case) yields the smallest gain in estimation accuracy, as measured by MSEr, among all considered connectivity patterns. If $A^{t r u e}$ is set to $A_{4}$ , then (again, analogously to Scenario 1) the information about the strong dependence among coefficients is provided through $A^{o b s}$ . This results in substantially lower MSEr for griPEER across the full considered range of density ratios. When $A^{t r u e}$ is one of the module-based matrices, $A_{2}$ or $A_{3}$ , we still benefit from using $A^{o b s}$ of lower density than $A^{t r u e}$ (values lower than 1 on the x-axis), as $A^{o b s}$ contains unaffected information about the five separated modules in the connectivity structure. Including false connections in $A^{o b s}$ (values greater than 1 on the x-axis) provides the incorrect message of some dependencies between modules. A loss in griPEER’s estimation accuracy is apparent at the transition point x = 1. It remains, however, significantly better than the estimation accuracy of logistic ridge regression over the entire range of considered density ratios.

Table 3 in Section 8.6 of the Supplementary Material summarizes the obtained regularization parameter values and the griPEER estimation execution times in numerical experiments across the three simulation scenarios. The median execution time varied from 3.8 to 1,277 seconds per run depending on the simulation scenario; the longest execution times were attained for the p = 528 settings. It is worth noting that the computational time depends on a few factors, with longer computation times for (a) larger sample sizes, (b) smaller ratios of the number of observations to the number of predictors and (c) less sparse connectivity matrices.

5.3. Model Coefficient Significance Testing

5.3.1. Settings

We designed a simulation study to evaluate the performance of the two procedures for coefficient significance testing for griPEER introduced in Section 4: the asymptotic variance–covariance matrix-based approach, griPEER_asmp, and the bootstrap-based approach, griPEER_boot.

5.3.1.1. Simulation scenario—

We followed the simulation setting from Scenario 1, described in Section 5.2. Specifically, we assumed that $A^{o b s}$ represented connections (partially) permuted with respect to connections represented by $A^{t r u e}$ , that is, the corresponding $A^{o b s}$ was constructed by randomizing entries in $A^{t r u e}$ until a desired dissimilarity, $diss (A^{o b s}, A^{t r u e})$ , was achieved (see Figure 13 in Section 8.5 of the Supplementary Material). The randomization technique preserves graph size, density, strength and graph degree sequence (and hence degree distribution). Here, we confined our evaluation to p = 66 and the specific $A^{t r u e}$ based on $A_{3}$ . The median of the density of connections was masked by modularity information, which corresponds to an adjacency matrix used in the brain imaging analysis in Section 6.

The adopted simulation scheme starts by generating the true signal and responses as in Section 5.2. We generated a large number of observations, n = 1,000, but in the estimation, only 150 records were used to emulate a real data setting. The large sample size was used only to label the variables that were “truly relevant” so that the performance of griPEER_asmp and griPEER_boot in the context of variables selection could be assessed. Defining “truly relevant” variables was accomplished by using the asymptotic confidence interval for the logistic model estimate (nonregularized estimation), which is unbiased and asymptotically normal (Fahrmeir & Kaufmann, 1985). The details are described below.

5.3.1.2. Simulation study procedure—

We performed the following steps.

1. Applied Steps 1–5 from the simulation study procedure described in Section 5.2 with n = 1,000.
2. Ran the following steps 100 times:
   (a) generated the p-dimensional vector of true coefficients, b, as well as the n-dimensional vectors, θ and y, by following Steps 6(a)–6(d) described in Section 5.2,
   (b) calculated the asymptotic standard deviations, $δ_{i} ≔ \sqrt{{[{(Z^{T} Ψ Z)}^{- 1}]}_{i i}}$ , for i = 1, … , p, where $Ψ ≔ diag \{\frac{e^{θ_{1}}}{(e^{θ_{1}} + 1)^{2}}, \dots, \frac{e^{θ_{n}}}{(e^{θ_{n}} + 1)^{2}}\}$ (see Section 8.3 of the Supplementary Material),
   (c) divided the set of indices, {1, … , p}, into two separate groups: I_T, corresponding to the variables defined as relevant, and I_F, corresponding to the variables defined as irrelevant, by using the criterion i ∈ I_T ⇔ 0 ∉ [b_i − 1.96 δ_i, b_i + 1.96 δ_i],
   (d) generated the data for estimation, y*, X* and Z*, by taking the first 150 rows of y, X and Z; centered and normalized the columns of Z* to zero means and unit ℓ₂ norms,
   (e) applied griPEER_asmp and griPEER_boot to y*, X* and Z* to indicate the response-related variables identified by each method,
   (f) using the known division into “truly relevant” (I_T) and “truly irrelevant” (I_F) variables, determined for each method the number of true discoveries, S, and the number of false discoveries, V,
   (g) for each method, collected the measures $pow^{*} ≔ \frac{S}{|I_{T}|}$ and $fdr^{*} ≔ \frac{V}{V + S}$ .
3. Defined the estimates of power and false discovery rate (FDR) as the averages of pow* and fdr* (across the 100 repetitions of Step 2), respectively.
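
The per-run measures pow* and fdr* reduce to simple set operations, as the plain-Python sketch below illustrates. The helper name `power_and_fdr` is hypothetical, and returning fdr* = 0 when a method makes no discoveries (V + S = 0) is a convention assumed here, since the text does not specify that case.

```python
def power_and_fdr(selected, relevant):
    """Return (pow*, fdr*) for one simulation run.

    selected : indices flagged as response related by a method
    relevant : indices in I_T (the "truly relevant" variables)
    """
    selected, relevant = set(selected), set(relevant)
    S = len(selected & relevant)          # true discoveries
    V = len(selected - relevant)          # false discoveries
    pow_star = S / len(relevant) if relevant else float("nan")
    fdr_star = V / (V + S) if (V + S) > 0 else 0.0   # assumed convention for V + S = 0
    return pow_star, fdr_star

# Toy run: four relevant variables, three selected, one of them falsely.
print(power_and_fdr({1, 2, 5}, {1, 2, 3, 4}))  # (0.5, 0.333...)
```

Averaging the two returned values over the 100 repetitions gives the power and FDR estimates reported for each method.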

5.3.2. Results

The values of power and false discovery rate (FDR) (Figure 9, left and right, respectively) were estimated based on the simulation procedure described in Section 5.3.1. As expected, for both methods, power decreases as $A^{o b s}$ becomes less informative regarding the true connections between model coefficients. We observe, however, that griPEER_boot can reach substantially higher power than griPEER_asmp. The estimated FDRs are fairly similar for both methods, particularly when connectivity information is less accurate.

These results suggest that utilizing griPEER_boot for coefficient significance testing produced greater power compared to the griPEER_asmp approach without a substantial increase of FDR. Consequently, we employ griPEER_boot in the real data application in Section 6.

5.4. The Software Used in Simulations

Our analyses were performed using software built in MATLAB, available on GitHub at https://github.com/martakarass/gripeer-numerical-experiments. This repository also contains all scripts used in the numerical simulations and to generate the figures.

6. BRAIN IMAGING DATA APPLICATION

We model the association between the presence or absence of HIV and the properties of the cortical structural brain imaging data. More specifically, we employ cortical thickness measurements obtained using FreeSurfer software (Fischl, 2012) to classify the binary response indicating the status of HIV infection, where 0 indicates an HIV-uninfected (HIV−) individual and 1 an HIV-infected (HIV+) individual.

6.1. Data and Preprocessing

6.1.1. Study sample

The analyzed sample consisted of 162 men aged 18-42 years, where 108 were HIV+ and 54 were HIV−. The demographic and clinical characteristics are summarized in Table 1.

Table 1:

Characteristics for 162 men included in the study. Table includes the information about subjects’ age (Age), recent CD4 cell count (Recent CD4), nadir CD4 cell count (Nadir CD4) and recent viral load (Recent VL).

HIV status	Variable	Min	Median	Max	Mean	StdDev
HIV+ (108)	Age	18	24	41	26.31	6.45
	Recent CD4	20	446	1179	461.83	243.81
	Nadir CD4	15	293	690	289.13	158.31
	Recent VL	20	50	555495	30232.22	78827.35
HIV− (54)	Age	18	23	41	24.8	6.45


6.1.2. Cortical measurements

The FreeSurfer software package (version 5.1) was used to analyze the structural magnetic resonance imaging data, including gray–white matter segmentation, reconstruction of cortical surface models, labelling of regions on the cortical surface and analysis of group morphometry differences. The resulting dataset has cortical measurements for 68 cortical regions with parcellation based on the Desikan-Killiany atlas (Desikan et al., 2006). However, in this analysis, we used 66 variables describing average cortical gray matter thickness (in millimeters); the left and right insula were not included because they are excluded from the structural connectivity matrix.

6.1.3. Structural connectivity information

We used two adjacency matrices, which were incorporated in the griPEER estimation through the normalized Laplacian matrix. The adjacency matrices were based on two structural connectivity metrics: density of connections (DC) and fractional anisotropy (FA). For each of these metrics, we performed two steps to obtain the final adjacency matrix, $A$ . In the first step, we computed the entry-wise median (across participants) of the DC or FA connectivity matrices. The second step relied on “masking by modularity partition”, that is, limiting the information obtained in the first step to only the connections between brain regions that were in the same module (i.e., we set $A_{i j} ≔ 0$ if regions i and j were not in the same module). For this purpose, we used the modularity connectivity matrix (Sporns, 2013; Cole et al., 2014; Sporns & Betzel, 2016), which defines the division of the brain into five separated communities. The modularity matrix was obtained using the Louvain method (Blondel et al., 2008) and was based on the model proposed by Hagmann et al. (2008). More details on this procedure can be found in Karas et al. (2019).
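
The two preprocessing steps, followed by the conversion to a normalized Laplacian, can be sketched in plain Python as follows. The function names and the toy data are invented for illustration. The Laplacian uses the standard normalized form (1 on the diagonal for nodes with nonzero degree, −a_ij/√(d_i d_j) off the diagonal, with d_i = Σ_j |a_ij|), which is assumed here to match the definition in Section 2.1; leaving the diagonal of the adjacency matrix at zero is a further assumption.

```python
from statistics import median

def masked_median_adjacency(subject_matrices, module_of):
    """Entry-wise median across participants, kept only within modules."""
    p = len(module_of)
    A = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            # keep an entry only if i and j share a module (no self-loops assumed)
            if i != j and module_of[i] == module_of[j]:
                A[i][j] = median(M[i][j] for M in subject_matrices)
    return A

def normalized_laplacian(A):
    """Standard normalized Laplacian with degrees d_i = sum_j |a_ij|."""
    p = len(A)
    d = [sum(abs(a) for a in row) for row in A]
    Q = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            if i == j:
                Q[i][j] = 1.0 if d[i] > 0 else 0.0
            elif d[i] > 0 and d[j] > 0:
                Q[i][j] = -A[i][j] / (d[i] * d[j]) ** 0.5
    return Q

# Toy example: two participants, three regions, regions 0 and 1 in one module.
M1 = [[0, 2, 5], [2, 0, 1], [5, 1, 0]]
M2 = [[0, 4, 7], [4, 0, 3], [7, 3, 0]]
A = masked_median_adjacency([M1, M2], module_of=[0, 0, 1])
Q = normalized_laplacian(A)
```

In the toy example only the connection between regions 0 and 1 survives the masking, with median weight 3, so region 2 becomes an isolated node with a zero row in the Laplacian.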

6.2. Estimation Methods

We employed griPEER_boot and logistic ridge to classify the HIV+ and HIV− individuals based on the estimated cortical thickness measurements. All analyses were adjusted for Age with its respective coefficient nonpenalized. Consequently, X was an n × 2 matrix containing the column of ones (representing the intercept) and the column corresponding to participants’ age. Columns of design matrices (other than the intercept) were zero mean-centered and normalized to unit standard deviation before the estimation. The selection of regularization parameter in logistic ridge regression was performed within the GLMM framework. For all presented results, we used a bootstrap-based approach with 50,000 samples to define the subset of statistically significant variables.

6.3. Results

The estimates obtained from the griPEER_boot and logistic ridge regression for the sample in Table 1 are presented in Figure 10. Brain regions labelled as response related are indicated by solid red vertical lines. In Table 2, we summarize the estimated values corresponding to brain regions tagged as response related by at least one considered approach. Note that all significant associations are negative, indicating that thinner cortical areas are indicative of HIV-positive status. Significant estimates obtained from the griPEER for both (FA and DC) connectivity matrices agree for seven of eight cortical brain regions. The significant logistic ridge findings disagree with the FA-based griPEER estimates in four regions and with the DC-based griPEER estimates in three regions. The corresponding tuning parameters are (λ_Q = 192.4, λ_R = 482.7) for FA and (λ_Q = 168.9, λ_R = 133.5) for the DC matrix. For the logistic ridge, the selected parameter is λ_R = 277.

Table 2:

Estimates of the cortical brain region coefficients obtained by the logistic ridge regression and griPEER_boot with two different connectivity matrices—fractional anisotropy (Masked FA) and density of connections (Masked DC). Both matrices were masked by the modularity matrix before the analysis. Values corresponding to regions tagged to be response related are shown in bold font. Regions are listed in the table if they were found to be significant by at least one of the three considered methods.

Connectivity type	CaudMF [L]	PostCen [L]	PreCen [L]	PreCun [L]	SupPar [L]	SupraMar [L]	Entor [R]	PostCen [R]	SupPar [R]
Empty	−0.016	−0.025	−0.018	−0.019	−0.015	−0.029	−0.021	−0.020	−0.013
Masked FA	−0.019	−0.031	−0.030	−0.011	−0.020	−0.029	−0.023	−0.019	−0.017
Masked DC	−0.016	−0.026	−0.022	−0.016	−0.018	−0.028	−0.018	−0.018	−0.015


The discovered regions are primarily located in the left hemisphere’s frontal and parietal lobes. Two regions, the postcentral gyrus and superior parietal lobule, were detected in both hemispheres. All considered approaches identified the right entorhinal cortex as being significantly thinner in HIV-infected individuals.

Brain regions identified by griPEER_boot using an FA matrix restricted to modules (Masked FA) are shown in Figure 11.

7. DISCUSSION

In this work, we presented a rigorous, computationally feasible method to incorporate additional information in the estimation of regression parameters in the GLM setting. The method presented here, griPEER, extends our previous work performed in the linear model setting of Karas et al. (2019). We utilized known structural connectivity information (assessed by different diffusion weighted imaging (DWI) metrics) to inform the association between the cortical thickness covariates and a generalized outcome (e.g., binary indicator of HIV infection). The structural connectivity information was used to create a Laplacian matrix, which in turn allowed us to specify the regularization penalty. The simulation study showed that, in each of the presented scenarios, the griPEER method outperformed logistic ridge in binomial model coefficient estimation: griPEER yielded a smaller or similar relative estimation error, $M S E r = ‖ \hat{b} - b ‖_{2}^{2} ∕ ‖ b ‖_{2}^{2}$ , when compared to the logistic ridge. The performance of griPEER was significantly better when the observed connectivity information was either fully or largely informative about the true connectivity structure between model coefficients. Notably, even in the cases when the observed connectivity information was only partially informative or completely noninformative, griPEER yielded an MSEr that was no larger than that of the logistic ridge estimator. Our method therefore has the same desirable properties as its precursor, riPEER, in the continuous outcomes case. Moreover, the performed simulations showed that our implementation produces stable regression coefficient estimates. In addition, we did not encounter convergence problems even when the binary response variables were highly unbalanced.

griPEER simultaneously estimates the associations between the predictors (region-specific cortical thickness) and binary outcome variables in the GLMM framework. This constitutes a substantial advantage over widely used approaches that test variables individually. Such procedures must be followed by a multiple testing correction, which increases the number of false negatives and, consequently, reduces power. For instance, after multiple testing correction was applied in MacDuffie et al. (2018), no brain regions were found with cortical thickness significantly related to the HIV diagnosis, although smaller global cortical area and volume were observed in HIV-positive participants with lower nadir CD4 count.

Application of griPEER to classify individuals as HIV+ and HIV− resulted in the discovery of eight and seven cortical regions when FA and DC matrices were used, respectively. Applying logistic ridge regression (i.e., performing the analysis with no external information arising from brain connectivity) to the same dataset provided six discoveries. Interestingly, all regression coefficients identified as statistically significant were estimated as negative by all three considered approaches. Our results therefore confirm the belief, strongly supported by the literature (Thompson et al., 2005; Sanford et al., 2018), that HIV infection is associated with reduced cortical thickness.

All considered approaches discovered regions located mostly in the left parietal and frontal lobes. The cortical thickness of these brain regions is often reported as being significantly reduced by HIV infection (MacDuffie et al., 2018; Sanford et al., 2018). In particular, changes in mean cortical thickness in the left precentral and the supramarginal gyrus were found to be significantly associated with HIV infection (Kallianpur et al., 2012), while significantly lower cortical thickness in the bilateral postcentral region in HIV patients was observed by Yadav et al. (2017). In addition, all considered methods revealed regions in the left primary sensory and motor cortex, where cortical thickness differences between HIV-positive and HIV-negative subjects were detected by Sanford et al. (2018).

Employing griPEER with the FA matrix revealed three additional cortical regions compared to logistic ridge regression, namely, the left frontal gyrus and the bilateral superior parietal lobes, that were thinner in the HIV+ individuals. These additional findings are consistent with the literature cited above and suggest that using DWI-based information may indeed increase the power of statistical techniques applied to investigate the association between cortical thickness and HIV-related outcome variables.

By providing a fully data-adaptive tool, we extend the existing approaches and show how the external information can be employed in the estimation when binary responses are considered. In future work, we plan to extend our methodology in two directions. First, we will utilize other cortical structural metrics, such as the cortical area and its curvature, to create structural connectivity matrices. Second, we will incorporate multiple sources of information in the regularization procedures. This will enable simultaneous inclusion of both structural and functional brain connectivity information, as well as allow us to divide the connectivity information into parts corresponding to different brain modules and let the algorithm automatically determine their impact on the regression coefficient estimates.

Supplementary Material

NIHMS1763726-supplement-Supplementary_Material.pdf (4 MB, PDF)

ACKNOWLEDGEMENTS

This research was partially supported by the NIMH grant R01MH108467. D.B. was funded by Wrocław University of Science and Technology resources specified by the number 0108\W13\K1\11\2018.

Footnotes

CONFLICT OF INTEREST

The authors confirm that there are no known conflicts of interest associated with this publication, and there has been no significant financial support for this work that could have influenced its outcome.

Additional Supporting Information may be found in the online version of this article at the publisher’s website.

BIBLIOGRAPHY

Bertero M & Boccacci P (1998). Introduction to Inverse Problems in Imaging. Institute of Physics, Bristol, UK.
Blondel VD, Guillaume JL, Lambiotte R, & Lefebvre E (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 10, P10008.
Breslow NE & Clayton DG (1993). Approximate inference in generalized linear mixed models. Journal of the American Statistical Association, 88(421), 9–25.
Buhlmann P (2013). Statistical significance in high-dimensional linear models. Bernoulli, 19(4), 1212–1242.
Cessie SL & Houwelingen JCV (1992). Ridge estimators in logistic regression. Journal of the Royal Statistical Society: Series C: Applied Statistics, 41(1), 191–201.
Chung F (2005). Laplacians and the Cheeger inequality for directed graphs. Annals of Combinatorics, 9(1), 1–19.
Cole MW, Bassett DS, Power JD, Braver TS, & Petersen SE (2014). Intrinsic and task-evoked network architectures of the human brain. Neuron, 83(1), 238–251.
Desikan RS, Segonne F, Fischl B, Quinn BT, Dickerson BC, Blacker D, Buckner RL, Dale AM, Maguire RP, Hyman BT, Albert MS, & Killiany RJ (2006). An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. NeuroImage, 31(3), 968–980.
Eickhoff S, Thirion B, Varoquaux G, & Bzdok D (2015). Connectivity-based parcellation: Critique and implications. Human Brain Mapping, 36(12), 4771–4792.
Eickhoff S, Yeo T, & Genon S (2018). Imaging-based parcellations of the human brain. Nature Reviews Neuroscience, 19(11), 672–686.
Engl HW, Hanke M, & Neubauer A (2000). Regularization of Inverse Problems. Kluwer, Dordrecht, The Netherlands.
Fahrmeir L & Kaufmann H (1985). Consistency and asymptotic normality of the maximum likelihood estimator in generalized linear models. Annals of Statistics, 13(1), 342–368.
Fischl B (2012). FreeSurfer. NeuroImage, 62(2), 774–781.
Hagmann P, Cammoun L, Gigandet X, Meuli R, Honey CJ, Wedeen VJ, & Sporns O (2008). Mapping the structural core of human cerebral cortex. PLoS Biology, 6(7), e159.
Hastie T, Buja A, & Tibshirani R (1995). Penalized discriminant analysis. The Annals of Statistics, 23(1), 73–102.
Huang J, Shen H, & Buja A (2008). Functional principal components analysis via penalized rank one approximation. Electronic Journal of Statistics, 2, 678–695.
Kallianpur KJ, Kirk GR, Sailasuta N, Valcour V, Shiramizu B, Nakamoto BK, & Shikuma C (2012). Regional cortical thinning associated with detectable levels of HIV DNA. Cerebral Cortex, 22(9), 2065–2075.
Karas M, Brzyski D, Dzemidzic M, Goni J, Kareken DA, Randolph TW, & Harezlak J (2019). Brain connectivity–informed regularization methods for regression. Statistics in Biosciences, 11, 47–90.
Li C & Li H (2008). Network–constrained regularization and variable selection for analysis of genomic data. Bioinformatics, 24(9), 1175–1182.
MacDuffie KE, Brown GG, McKenna BS, Liu TT, Meloy MJ, Tawa B, Archibald S, Fennema-Notestine C, Atkinson JH, Ellis RJ, Letendre SL, Hesselink JR, Cherner M, Grant I, & TMARC Group (2018). Effects of HIV infection, methamphetamine dependence and age on cortical thickness, area and volume. NeuroImage: Clinical, 20, 1044–1052.
Maldonado YM (2009). Mixed models, posterior means and penalized least-squares. Optimality, 57, 216–236.
McIlhagga W (2016). Penalized: A MATLAB toolbox for fitting generalized linear models with penalties. Journal of Statistical Software, 72(6).
Phillips D (1962). A technique for the numerical solution of certain integral equations of the first kind. Journal of the ACM, 9(1), 84–97.
Pinheiro JC & Chao EC (2006). Efficient Laplacian and adaptive Gaussian quadrature algorithms for multilevel generalized linear mixed models. Journal of Computational and Graphical Statistics, 15(1), 58–81.
Randolph TW, Harezlak J, & Feng Z (2012). Structured penalties for functional linear models: Partially empirical eigenvectors for regression. Electronic Journal of Statistics, 6, 323–353.
Sanford R, Fellows LK, Ances BM, & Collins DL (2018). Association of brain structure changes and cognitive function with combination antiretroviral therapy in HIV-positive individuals. JAMA Neurology, 75(1), 72–79.
Slawski M, Castell WZ, & Tutz G (2010). Feature selection guided by structural information. Annals of Applied Statistics, 4(2), 1056–1080.
Sporns O (2013). Network attributes for segregation and integration in the human brain. Current Opinion in Neurobiology, 23(2), 162–171.
Sporns O & Betzel RF (2016). Modular brain networks. Annual Review of Psychology, 67, 613.
Thompson PM, Dutton RA, Hayashi KM, Toga AW, Lopez OL, Aizenstein HJ, & Becker JT (2005). Thinning of the cerebral cortex visualized in HIV/AIDS reflects CD4+ T lymphocyte decline. Proceedings of the National Academy of Sciences of the United States of America, 102(43), 15647–15652.
Tibshirani R (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B, 58(1), 267–288.
Tibshirani R, Saunders M, Rosset S, Zhu J, & Knight K (2005). Sparsity and smoothness via the fused lasso. Journal of the Royal Statistical Society: Series B, 67(1), 91–108.
Tibshirani R & Taylor J (2011). The solution path of the generalized lasso. The Annals of Statistics, 39(3), 1335–1371.
Tikhonov A (1963). Solution of incorrectly formulated problems and the regularization method. Soviet Mathematics, 4(4), 1035–1038.
Wolfinger R & O’Connell M (1993). Generalized linear mixed models: A pseudo-likelihood approach. Journal of Statistical Computation and Simulation, 48(3), 233–243.
Xin B, Kawahara Y, Wang Y, & Gao W (2016). Efficient generalized fused lasso and its applications. ACM Transactions on Intelligent Systems and Technology, 7(4), Article 60.
Yadav SK, Gupta RK, Garg RK, Venkatesh V, Gupta PK, Singh AK, Hashem S, Al-Sulaiti A, Kaura D, Wang E, Marincola FM, & Haris M (2017). Altered structural brain changes and neurocognitive performance in pediatric HIV. NeuroImage: Clinical, 14, 316–322.
Zeger S & Karim MR (1991). Generalized linear models with random effects: A Gibbs sampling approach. Journal of the American Statistical Association, 86(413), 79–86.
Zhao S & Shojaie A (2016). A significance test for graph-constrained estimations. Biometrics, 72(2), 484–493.
Zou H (2006). The adaptive Lasso and its oracle properties. Journal of the American Statistical Association, 101(476), 1418–1429.
Zou H & Hastie T (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B, 67(2), 301–320.
