Author manuscript; available in PMC: 2025 Apr 11.
Published in final edited form as: Stat Med. 2024 Jun 24;43(20):3862–3880. doi: 10.1002/sim.10101

A multivariate to multivariate approach for voxel-wise genome-wide association analysis

Qiong Wu 1, Yuan Zhang 2, Xiaoqi Huang 3, Tianzhou Ma 4,5, L Elliot Hong 6, Peter Kochunov 6, Shuo Chen 5,6,7,8
PMCID: PMC11986643  NIHMSID: NIHMS2067460  PMID: 38922949

Abstract

The joint analysis of imaging-genetics data facilitates the systematic investigation of genetic effects on brain structures and functions with spatial specificity. We focus on voxel-wise genome-wide association analysis, which may involve trillions of single nucleotide polymorphism (SNP)-voxel pairs. We attempt to identify underlying organized association patterns of SNP-voxel pairs and understand the polygenic and pleiotropic networks on brain imaging traits. We propose a bi-clique graph structure (ie, a set of SNPs highly correlated with a cluster of voxels) for the systematic association pattern. Next, we develop computational strategies to detect latent SNP-voxel bi-cliques and an inference model for statistical testing. We further provide theoretical results to guarantee the accuracy of our computational algorithms and statistical inference. We validate our method by extensive simulation studies, and then apply it to the whole genome genetic and voxel-level white matter integrity data collected from 1052 participants of the human connectome project. The results demonstrate multiple genetic loci influencing white matter integrity measures on splenium and genu of the corpus callosum.

Keywords: bi-clique, imaging-genetics, ultra-high dimensionality, voxel-wise GWAS, white matter integrity

1 |. INTRODUCTION

Imaging-genetics has garnered increased interest in neuropsychiatric research because it provides a viable pathway to understanding brain diseases by integrating genetic, brain imaging, and environmental factors. Compared with clinical descriptions of symptoms in psychiatry, brain imaging measurements assess brain structures and functions quantitatively and reproducibly, and these measurements are reported to be associated with psychiatric disorders including schizophrenia,1 Alzheimer’s disease,2 and major depressive disorder.3 More importantly, neuroimaging signals can serve as intermediate phenotypes, resulting in increased power to detect genetic loci. Recent studies have focused on the joint analysis of imaging-genetics data, which reveals the genetic effects on spatially specific brain functions and structures.4–10 Identifying genetic effects on objectively measured high-resolution imaging traits can not only enhance the understanding of the complex genetic and neurological mechanisms of neuropsychiatric disorders, but also inform early diagnosis and treatment of psychiatric disorders.

In imaging-genetics studies, both brain imaging data and genome sequence are measured for each participant. The genetic measurements can characterize genetic variations using single nucleotide polymorphism (SNP) and copy number variants (CNVs). The non-invasive brain imaging techniques assess the brain structures by magnetic resonance imaging (MRI), diffusion tensor imaging (DTI), and brain functions by functional magnetic resonance imaging (fMRI). The recent development of neuroimaging technology provides high-resolution imaging data with improved spatial specificity and thus can better assess the genetic effects on brain structures and functions.

The statistical analysis of imaging-genetics data is computationally intensive and methodologically challenging. These challenges mainly arise from the combination of two sets of high-dimensional features: multivariate imaging traits and multivariate genetic variants (Figure 1). Moreover, both imaging traits and genetic variants exhibit complex and organized dependence structures reflecting the underlying neurophysiological mechanisms and linkage disequilibrium patterns.6 For example, a typical imaging-genetics study collects up to $10^7$ SNPs and $10^5$ voxels, jointly contributing trillions ($10^{12}$) of SNP-voxel pairs.11,12 The direct application of classic voxel-wise genome-wide association analysis (vGWAS) could require an enormous sample size (eg, multiple millions of participants) to control the false positive error rate while maintaining adequate statistical power.13–16

FIGURE 1.


Data structure for vGWAS. For imaging-genetics data, we can perform GWAS analysis on each voxel of 3D brain imaging data for the study cohort. The vGWAS analyses generate billions of association results, which raises challenges of result interpretation and comprehension.

Furthermore, advanced methods have been developed to leverage group sparsity through techniques including regularization, low-rank approximation, and projection of high-dimensional features.10,17–25 However, while these methods can gain statistical power by jointly modeling genetic variants and imaging traits in a multivariate regression model, the high dimensionality of imaging-genetics data remains challenging because of the computational burden and/or over-fitting. For instance, the analysis can only be applied to imaging data at the regional level or to genetic data filtered down to thousands of SNP loci. Moreover, results summarized as a few latent variables or at a coarser scale are less interpretable or lack spatial specificity.5

In this study, we propose a new multivariate to multivariate method to systematically investigate the SNP-(imaging)voxel association patterns with four aims: (i) identify voxel clusters as genetically correlated imaging traits, (ii) detect functionally related SNP sets, (iii) understand the SNP-voxel association patterns as polygenic and pleiotropic relationships, and (iv) test the association patterns while controlling multiplicity. In our study, a polygenic trait refers to a voxel influenced by multiple SNPs, while pleiotropy indicates that one gene can affect multiple voxel traits. Specifically, we consider genetic variants and imaging voxels as two disjoint sets of nodes, and associations between all SNP-voxel pairs as edges in a bipartite graph. We model the polygenic and pleiotropic SNP-voxel association structure as an imaging-genetics dense bi-clique (IGDB). An IGDB is a node-induced subgraph consisting of a subset of SNPs and a subset of voxels, in which the probability that a SNP is associated with a voxel is much higher than in the rest of the graph. Within an IGDB, each voxel can be considered a polygenic imaging trait, and each SNP a pleiotropic genetic variant. Therefore, our method contributes a new GWAS tool for voxel-level neuroimaging traits that alleviates the burden of an ultra-stringent threshold (eg, $p < 5 \times 10^{-12}$ in vGWAS) and uncovers the systematic SNP-trait association patterns.

With the specified IGDB structure of polygenic and pleiotropic association pattern, the current study makes several contributions. First, we develop computationally efficient algorithms to identify the IGDB structure with the scalability for analyzing the whole genome-whole brain data. Second, the proposed greedy algorithm is presented with the approximation bounds for the true optimal as well as its asymptotically full recovery of IGDB-based network structure. Last, we formulate the existence of a polygenic and pleiotropic SNP-voxel association structure against a random bipartite graph, which can be evaluated through likelihood-based statistics.

2 |. MOTIVATING DATA EXAMPLE

The human connectome project (HCP) sponsored by National Institutes of Health (NIH) aims to construct the underlying neural pathways of healthy human brain functions. It is an important public resource for structural and functional brain connectivity data, accompanied by demographic, behavioral, genetic and other data. In this study, we focus on the brain imaging and genetics data in the HCP surveyed from 1052 participants (F/M 483/569; age 28.1 ± 3.7), for whom the scans and data were released in June 2014 (https://humanconnectome.org) that passed the HCP and ENIGMA quality control and assurance standards.26 The participants in the HCP study were recruited from a large population-based study named “the Missouri Family and Twin Registry.”27

The fractional anisotropy (FA) measure, derived from diffusion tensor imaging (DTI), is a widely used metric characterizing localized white matter microstructural integrity.28 Previous studies have investigated its heritability using variance components methods on pedigrees.29 They find that 70% to 80% of the total phenotypic variance of trait-wise FA measures can be explained by additive genetic factors.30 The significantly and reliably heritable FA measurements qualify as a set of endophenotypes, which suggests further exploration of the associated genetic variants. Hence, a genetic analysis is desirable to detect the genetic effects of specific loci on imaging traits with statistical inference. Moreover, it is reported that FA measurements at multiple brain locations can be affected by a common set of genetic variants.9 FA is a complex trait determined by multiple alleles, which motivates the identification of functionally related genetic variants. This investigation naturally invokes the search for polygenic and pleiotropic networks, which is the focus of this study. Voxel-level association analysis between imaging traits and genetic variants can provide the maximal spatial resolution. Nevertheless, the implementation is challenging because it requires a multivariate to multivariate association analysis to extract SNP-voxel subnetworks with polygenic and pleiotropic structures and, further, to provide sound statistical inference. To close this gap, we develop an IGDB-based framework to perform voxel-wise GWAS and systematically identify polygenic and pleiotropic structures.

3 |. METHODS

3.1 |. Background and notations

We consider an imaging-genetics data set collected from $L$ independent subjects. We let $V$ be the set of brain imaging voxels with $|V|=n$ and $U$ be the set of genetic variants (ie, SNPs) with $|U|=m$. For each participant $l\in\{1,\ldots,L\}$, define $x_l=(x_{1,l},\ldots,x_{m,l})^T$ to be the genetic variants for participant $l$ and $y_l=(y_{1,l},\ldots,y_{n,l})^T$ to be the vector of multivariate imaging traits. Let $z_l$ denote a $p$-dimensional vector of individual-level profiling covariates. We model the associations between multivariate imaging traits and multivariate genetic variants using a generalized linear regression model:

$$E(y_l \mid x_l)=g^{-1}\left(B^T x_l+\alpha^T z_l\right),$$

where $g(\cdot)$ is a known link function with inverse $g^{-1}(\cdot)$. The coefficient $B=(\beta_{uv})_{u\in U,v\in V}\in\mathbb{R}^{m\times n}$ is called the SNP-voxel association matrix. Without loss of generality, we consider the association matrix based on GWAS analysis (eg, using an open-source whole genome association analysis toolset).31 The goal of our statistical inference is to accurately identify the subset of significant associations $\{(u,v):\beta_{uv}\neq 0\}$ from billions of entries of $B$ by multivariate to multivariate hypothesis testing32,33:

$$H_0^{(u,v)}:\beta_{uv}=0 \quad\text{vs}\quad H_1^{(u,v)}:\beta_{uv}\neq 0, \quad\text{for all } u\in U,\ v\in V.$$

Conventional statistical inference methods (eg, multiple testing correction or regression shrinkage) work by regularizing the vectorized $B$. However, this strategy may only capture individual association pairs $\beta_{uv}$ without recognizing systematic patterns (eg, the pleiotropic and polygenic structure). A prominent example is that a cluster of SNPs may jointly influence the observations through a cluster of neighboring voxels. To address this challenge, we propose a new multivariate to multivariate inference framework that extracts the joint structure in $B$, which we call the imaging-genetics dense bi-clique (IGDB). Next, we introduce the IGDB structure, based on which we then formally propose a novel estimation and inference procedure.
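To make the role of the association matrix concrete, the following is a minimal sketch, not the authors' pipeline (which uses PLINK with covariate adjustment). It assumes an identity link and no covariates, and it fills an $m\times n$ matrix with the absolute $t$ statistic of each marginal SNP-voxel regression. The function names and the toy data are hypothetical.

```python
import math
import random


def t_statistic(x, y):
    """t statistic for the slope in a simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # no variation: treat as no evidence of association
    r = sxy / math.sqrt(sxx * syy)
    r = max(min(r, 1.0 - 1e-12), -1.0 + 1e-12)  # guard against |r| = 1
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def association_matrix(X, Y):
    """X: m lists of length L (SNPs); Y: n lists of length L (voxels).
    Returns the m x n matrix of |t| statistics."""
    return [[abs(t_statistic(x, y)) for y in Y] for x in X]


# Toy data (hypothetical): 4 SNPs, 3 voxels, 50 subjects.
rng = random.Random(0)
L, m, n = 50, 4, 3
X = [[rng.choice([0, 1, 2]) for _ in range(L)] for _ in range(m)]
Y = [[rng.gauss(0.0, 1.0) for _ in range(L)] for _ in range(n)]
# Plant one association: voxel 0 depends strongly on SNP 0.
Y[0] = [2.0 * X[0][l] + rng.gauss(0.0, 0.1) for l in range(L)]
W = association_matrix(X, Y)
```

In this toy example the planted pair (SNP 0, voxel 0) receives by far the largest entry of `W`; real analyses would additionally adjust for covariates such as age, sex, and ancestry components.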

3.2 |. IGDB in a multivariate to multivariate graph structure

We characterize the vGWAS association as a bipartite graph $G=(U,V,E)$, where $U$ and $V$ are distinct node sets representing SNPs and voxels, respectively. The set of binary edges $E$ describes the locations of significant SNP-voxel associations: $e_{uv}\in E$ if and only if $\beta_{uv}\neq 0$ in the association matrix $B=(\beta_{uv})_{u\in U,v\in V}$. In contrast to conventional approaches that treat edges $e_{uv}$ individually, our proposal provides a succinct description of pleiotropic (one SNP to multiple image voxels) and polygenic (multiple SNPs to one voxel) relationships. To this end, we now formally propose the IGDB as a subgraph structure of $G$. Denote an arbitrary subgraph of $G$ by $G[S,T]=(S,T,E[S,T])$, where $S\subseteq U$, $T\subseteq V$ and $E[S,T]=\{e_{uv}\in E \mid u\in S, v\in T\}$. Our proposed IGDB will be defined based on some particular subgraph $G[S_0,T_0]$ such that most $\beta_{uv}$'s are nonzero for $e_{uv}\in G[S_0,T_0]$, while most $\beta_{uv}$'s elsewhere are zero. We illustrate the IGDB structure of a bipartite graph in Figure 2.

FIGURE 2.


Illustration of a bipartite graph with IGDB structure $G[S_0,T_0]$ that reveals underlying patterns of massive SNP-voxel association. In the top-left bipartite graph, each node (square) on the left side represents an SNP, while each node (circle) represents a location-specific voxel. The edges connecting the SNPs and voxels illustrate the associations, with red edges indicating pairs of associated SNPs and voxels in an IGDB structure. The bottom-left 2D figure provides an alternative representation of SNP-voxel associations, where associated pairs are depicted as black dots. The SNP-voxel association patterns in the left figures appear to be random. The bottom-right figure showcases the patterns that can be unveiled through the proposed IGDB method, suggesting systematic associations between imaging features and genetic variants. Note that traditional statistical methods, such as bi-clustering, face limitations in accurately identifying these patterns (see Figure A1 in the Appendix).

Our core intuition can be quantified into the following formulation:

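$$\frac{\sum_{u,v} I(\beta_{uv}\neq 0 \mid \delta_{uv}=1)}{\sum_{u,v} I(\delta_{uv}=1)} > \frac{\sum_{u,v} I(\beta_{uv}\neq 0 \mid \delta_{uv}=0)}{\sum_{u,v} I(\delta_{uv}=0)}, \tag{1}$$

where $\delta_{uv}$ is a binary variable indicating the IGDB-based network structure, that is,

$$\delta_{uv}=\delta_{uv}(S_0,T_0)=I\left(e_{uv}\in G[S_0,T_0]\right).$$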

This reflects that the imaging features $T_0$ are polygenic traits and the genetic variants $S_0$ are pleiotropic alleles. The genetically correlated imaging features and functionally related SNPs jointly compose a functional biclique $G[S_0,T_0]$. In neuroimaging studies, findings are often reported for spatially contiguous brain areas (ie, connected voxels) because of the biological interpretability and inference advantages.34 This is reflected in our proposed IGDB structure by further formulating $S_0$ and $T_0$ as disjoint vertex neighborhoods, as follows:

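$$S_0=\mathcal{N}_1^{S_0}\cup\cdots\cup\mathcal{N}_{K_1}^{S_0}, \quad\text{and}\quad T_0=\mathcal{N}_1^{T_0}\cup\cdots\cup\mathcal{N}_{K_2}^{T_0},$$

where each $\mathcal{N}_k^{T_0}$, $k\in\{1,\ldots,K_2\}$, is a spatially contiguous voxel cluster, and accordingly each $\mathcal{N}_k^{S_0}$, $k\in\{1,\ldots,K_1\}$, is a set of functionally related SNPs associated with one or multiple spatially contiguous voxel clusters (eg, $\mathcal{N}_k^{T_0}$). In the next subsection, we articulate that the IGDB enjoys several statistical advantages supported by graph and combinatorics theory.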
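The density comparison in (1) can be checked directly on a small example. The sketch below is illustrative only: the helper name `edge_densities` and the toy coefficient matrix are hypothetical, and both the candidate block and its complement are assumed to be non-empty.

```python
def edge_densities(B, S0, T0):
    """Proportion of nonzero coefficients inside and outside S0 x T0.

    B is an m x n list of lists; S0 and T0 are index collections.
    Both the block and its complement are assumed to be non-empty.
    """
    S0, T0 = set(S0), set(T0)
    n_in = n_out = nz_in = nz_out = 0
    for u, row in enumerate(B):
        for v, beta in enumerate(row):
            if u in S0 and v in T0:  # delta_uv = 1
                n_in += 1
                nz_in += beta != 0
            else:                    # delta_uv = 0
                n_out += 1
                nz_out += beta != 0
    return nz_in / n_in, nz_out / n_out


# Toy 4 x 4 coefficient matrix with a dense block on SNPs {0, 1} and voxels {0, 1}.
B = [
    [0.5, 0.3, 0.0, 0.0],
    [0.4, 0.0, 0.0, 0.2],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.0],
]
dens_in, dens_out = edge_densities(B, S0=[0, 1], T0=[0, 1])
```

Here three of the four entries inside the block are nonzero against two of the twelve outside, so inequality (1) holds for this candidate $(S_0,T_0)$.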

3.3 |. Graph properties of IGDB

Without loss of generality, we consider the following two cases regarding the underlying network structure of G:

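$$\begin{aligned}
&\text{Case 0: } G \text{ is observed from a random bipartite graph } G(m,n,\mu_0),\\
&\text{Case 1: There exists at least one nontrivial IGDB } G[S_0,T_0] \text{ such that } G \text{ is observed from}\\
&\qquad e_{uv}=I(\beta_{uv}\neq 0)\sim\begin{cases}\text{Bernoulli}(\mu_1), & \text{if } u\in S_0 \ \&\ v\in T_0\\ \text{Bernoulli}(\mu_0), & \text{otherwise}\end{cases}\quad\text{with } \mu_1>\mu_0.
\end{aligned} \tag{2}$$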

In Case 0 (ie, no polygenic and pleiotropic patterns), we can directly implement the conventional multiple testing corrections and regression shrinkage methods to determine individual associations between genetic variants and imaging traits. If Case 1 holds, our primary goal becomes to extract and test the underlying IGDB subgraphs as polygenic and pleiotropic subnetworks.

In practice, the estimated IGDB from a sample can be used to distinguish Case 0 from Case 1 because the observed network behaves differently under the two cases with respect to the size of the maximal “dense” subgraph. For convenience, we call a subgraph $G[S,T]$ a $\gamma$-quasi biclique if it contains at least $\gamma|S||T|$ edges. Then, asymptotically, if $|S_0|,|T_0|\to\infty$ as $m,n\to\infty$, with high probability, the true IGDB subgraph $G[S_0,T_0]$ would be a $\gamma$-quasi biclique for any fixed $\gamma\in(\mu_0,\mu_1)$. In contrast, under Case 0, there would rarely exist a $\gamma$-quasi biclique of decent size with high density, as stated in the following lemma.

Lemma 1.

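Suppose $G$ is observed from a random bipartite graph $G(m,n,\mu_0)$ as in Case 0. Let $G[S,T]$ be any subgraph with edge density $\frac{|E[S,T]|}{|S||T|}\ge\gamma\in(\mu_0,1)$ (ie, a $\gamma$-quasi biclique). Let $m_0,n_0=\Omega(\max\{m^{\epsilon},n^{\epsilon}\})$ for some $0<\epsilon<1$. Then for sufficiently large $m,n$ with $c(\gamma,\mu_0)\,m_0\ge 8\log n$ and $c(\gamma,\mu_0)\,n_0\ge 8\log m$, we have

$$P\left(|S|\ge m_0,\ |T|\ge n_0\right)\le 2mn\cdot\exp\left(-\tfrac{1}{4}\,c(\gamma,\mu_0)\,m_0 n_0\right),$$

where $c(a,b)=\left\{\frac{1}{(a-b)^2}+\frac{1}{3(a-b)}\right\}^{-1}$.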
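As a rough numerical illustration of how quickly the bound in Lemma 1 decays, the sketch below evaluates its logarithm, since the bound itself underflows in floating point for realistic sizes. It assumes the constants exactly as stated in the lemma above; the function names and the example values of $m$, $n$, $m_0$, $n_0$, $\gamma$, and $\mu_0$ are hypothetical.

```python
import math


def c(a, b):
    """c(a, b) = {1 / (a - b)^2 + 1 / (3 (a - b))}^(-1), as in Lemma 1."""
    return 1.0 / (1.0 / (a - b) ** 2 + 1.0 / (3.0 * (a - b)))


def lemma1_log_bound(m, n, m0, n0, gamma, mu0):
    """Return (log of the Lemma 1 bound, whether the size conditions hold)."""
    const = c(gamma, mu0)
    conditions_hold = (const * m0 >= 8 * math.log(n)
                       and const * n0 >= 8 * math.log(m))
    log_bound = math.log(2 * m * n) - 0.25 * const * m0 * n0
    return log_bound, conditions_hold


# Hypothetical setting: 1000 SNPs, 1000 voxels, null edge rate 0.1,
# and a 400 x 400 subgraph with density at least 0.5.
log_bound, ok = lemma1_log_bound(m=1000, n=1000, m0=400, n0=400,
                                 gamma=0.5, mu0=0.1)
```

In this setting both size conditions hold and the log-bound is a large negative number, which is the sense in which a sizable dense subgraph is very unlikely under a purely random bipartite graph.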

4 |. ESTIMATION AND INFERENCE

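Let $W_{m\times n}$ denote the inference result matrix (eg, test statistics $w_{uv}=t_{uv}$ or $-\log p_{uv}$) for the regression coefficients $\hat{B}_{m\times n}$. Then, our goal becomes to extract and test the IGDB structure from a weighted bipartite graph $G=(U,V,W)$. Similar to Reference 33, as a natural consequence of our model setup in Section 3.2, edge weights in $W$ follow a mixture marginal distribution:

$$w_{uv}\sim\begin{cases}f_1(\cdot\,;\theta_1), & \text{if } \beta_{uv}\neq 0\\ f_0(\cdot\,;\theta_0), & \text{if } \beta_{uv}=0,\end{cases} \tag{3}$$

where $w_{uv}\mid\delta_{uv}=1\sim\mu_1 f_1+(1-\mu_1)f_0$, while $w_{uv}\mid\delta_{uv}=0\sim\mu_0 f_1+(1-\mu_0)f_0$. Empirically, the central tendency of $f_1(\cdot\,;\theta_1)$ is greater than that of $f_0(\cdot\,;\theta_0)$, in the sense that $E_{\theta_1}(w_{uv}\mid\beta_{uv}\neq 0)>E_{\theta_0}(w_{uv}\mid\beta_{uv}=0)$.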

4.1 |. IGDB estimation

Motivated by the nature of the IGDB as a subgraph of elevated mean edge weights, we estimate it by looking for the maximal subgraph of $G$ with a density constraint. Inspired by Lemma 1, we estimate the IGDB $G[S_0,T_0]$ based on the edge weight matrix $W$ by optimizing:

$$\max_{S\subseteq U,\,T\subseteq V}|S||T| \quad\text{subject to}\quad \frac{\|W[S,T]\|_{1,1}}{|S||T|}\ge\gamma \tag{4}$$

or the Lagrangian form after taking the logarithm of both terms:

$$\max_{S\subseteq U,\,T\subseteq V}\ \log(|S||T|)+\lambda\log\frac{\|W[S,T]\|_{1,1}}{|S||T|}, \tag{5}$$

where $\|\cdot\|_{1,1}$ refers to the entry-wise $\ell_1$ norm such that $\|W[S,T]\|_{1,1}=\sum_{u\in S,v\in T}|w_{uv}|$, $\gamma$ is the density constraint, and the tuning parameter $\lambda\in(1,\infty)$.

Algorithm 1.

Direct optimization of objective function (5)

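Input: $G=(U,V,W)$, $\lambda$, pre-specified ratios $h\in\{h_1,h_2,\ldots,h_{\max}\}$; Output: $G[\tilde{S}_\lambda,\tilde{T}_\lambda]$
procedure Algorithm
  for $h\in\{h_1,h_2,\ldots,h_{\max}\}$ do
    $S_1\leftarrow U$, $T_1\leftarrow V$
    for $k=1$ to $n+m-1$ do
      Let $i\in S_k$ be the node with the smallest degree: $i=\arg\min_{i\in S_k}\deg(X_i;S_k,T_k)$;
      Let $j\in T_k$ be the node with the smallest degree: $j=\arg\min_{j\in T_k}\deg(Y_j;S_k,T_k)$;
      if $\sqrt{h}\,\deg(X_i;S_k,T_k)\le\frac{1}{\sqrt{h}}\deg(Y_j;S_k,T_k)$ then
        $S_{k+1}\leftarrow S_k\setminus\{i\}$ and $T_{k+1}\leftarrow T_k$;
      else
        $S_{k+1}\leftarrow S_k$ and $T_{k+1}\leftarrow T_k\setminus\{j\}$;
      end if
    end for
    Output $G[S_h,T_h]$ with the largest objective function among $G[S_1,T_1],\ldots,G[S_{n+m-1},T_{n+m-1}]$;
  end for
  Output $G[\tilde{S}_\lambda,\tilde{T}_\lambda]$ with the largest objective function among $G[S_{h_1},T_{h_1}],\ldots,G[S_{h_{\max}},T_{h_{\max}}]$;
end procedure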

The direct optimization of the objective function (5) is challenging because it is a nondeterministic polynomial-time (NP) hard problem.35,36 We therefore propose a computationally efficient greedy algorithm, given as Algorithm 1, to carry out the optimization of (5) approximately. In designing it, we extended the greedy algorithms for dense subgraph discovery36 in an adjacency matrix to a large bipartite matrix to extract dense bi-cliques. Algorithm 1 iteratively removes the nodes with the smallest degrees; it is deterministic and does not depend on initial values. The computational complexity of Algorithm 1 is $O(C_1 mn)$, where $C_1$ is determined by the grid search over $h$, with $h=|S|/|T|$ representing the aspect ratio of a dense subgraph.
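The peeling idea behind Algorithm 1 can be sketched in a few lines. This is a simplified illustration, not the authors' implementation. It assumes that node degrees are weighted degrees within the current subgraph and that the removal rule compares $\sqrt{h}\,\deg(i)$ with $\deg(j)/\sqrt{h}$. It also recomputes degrees from scratch at every step, so it does not attain the stated complexity. The names `objective` and `greedy_peel` are hypothetical.

```python
import math


def objective(W, S, T, lam):
    """d_lambda(S, T) = log(|S||T|) + lam * log(||W[S,T]||_{1,1} / (|S||T|))."""
    size = len(S) * len(T)
    total = sum(abs(W[u][v]) for u in S for v in T)
    if size == 0 or total <= 0.0:
        return -math.inf
    return math.log(size) + lam * math.log(total / size)


def greedy_peel(W, lam, h_grid):
    """Peel minimum-degree nodes for each ratio h; keep the best subgraph seen."""
    m, n = len(W), len(W[0])
    best = (-math.inf, [], [])
    for h in h_grid:
        S, T = set(range(m)), set(range(n))
        while S and T:
            score = objective(W, S, T, lam)
            if score > best[0]:
                best = (score, sorted(S), sorted(T))
            # Weighted degrees within the current subgraph (naive recomputation).
            deg_s = {u: sum(abs(W[u][v]) for v in T) for u in S}
            deg_t = {v: sum(abs(W[u][v]) for u in S) for v in T}
            i = min(sorted(S), key=deg_s.get)
            j = min(sorted(T), key=deg_t.get)
            # Assumed removal rule: sqrt(h) * deg(i) <= deg(j) / sqrt(h).
            if math.sqrt(h) * deg_s[i] <= deg_t[j] / math.sqrt(h):
                S.remove(i)
            else:
                T.remove(j)
    return best


# Toy weights: a strong 3 x 2 block on a weak background.
W = [[1.0 if (u < 3 and v < 2) else 0.1 for v in range(5)] for u in range(6)]
best_score, S_hat, T_hat = greedy_peel(W, lam=2.0, h_grid=[0.25, 1.0, 4.0])
```

On this toy matrix the peeling path passes through the planted block for every ratio in the grid, and the block attains the largest objective value, so `S_hat` and `T_hat` recover rows 0 to 2 and columns 0 to 1.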

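Now we establish approximation accuracy results for Algorithm 1 and its estimation of the IGDB. Let $S_\lambda^*$ and $T_\lambda^*$ be the true optimal solution to (5):

$$(S_\lambda^*,T_\lambda^*)=\arg\max_{S\subseteq U,\,T\subseteq V}d_\lambda(S,T),$$

and let $(\tilde{S}_\lambda,\tilde{T}_\lambda)$ be the output of Algorithm 1 with

$$(\tilde{S}_\lambda,\tilde{T}_\lambda)=\arg\max_{h}\ \arg\max_{(S_1,T_1),\ldots,(S_{m+n-1},T_{m+n-1})}d_\lambda(S,T),$$

where $d_\lambda(S,T):=\log(|S||T|)+\lambda\log\frac{\|W[S,T]\|_{1,1}}{|S||T|}$.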

The greedy algorithm with average-degree based density (or equivalently $\lambda=2$) is said to have a 2-approximation guarantee for the true optimum,35 namely, $2d_2(\tilde{S}_2,\tilde{T}_2)>d_2(S_2^*,T_2^*)$. In this article, we present the approximation bounds for the proposed objective function (5) in terms of the parameter $\lambda$ in the following Theorem 1.

Algorithm 2.

Determine tuning parameter λ by likelihood function

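Input: $G=(U,V,W)$, a grid of tuning parameters $\lambda_1,\lambda_2,\ldots,\lambda_J$, a sequence of cutoffs $r_1,r_2,\ldots,r_R$ and its mass $q(r_1),\ldots,q(r_R)$; Output: $G[\tilde{S}_{\hat\lambda},\tilde{T}_{\hat\lambda}]$ and $\hat\lambda$
procedure Algorithm
  for $\lambda\in\{\lambda_1,\ldots,\lambda_J\}$ do
    Return the IGDB $(\tilde{S}_\lambda,\tilde{T}_\lambda)$ of $W$ from Algorithm 1
    for $r=r_1$ to $r_R$ do
      calculate the likelihood $\mathcal{L}_\lambda(\hat\pi;\tilde{S}_\lambda,\tilde{T}_\lambda,W(r))$ (see Section 4.2 for the detailed definition of the likelihood function)
    end for
    integrate with respect to $r$: $\mathcal{L}_\lambda(W)=\sum_{i=1}^{R}\mathcal{L}_\lambda(\hat\pi;\tilde{S}_\lambda,\tilde{T}_\lambda,W(r_i))\,q(r_i)$
  end for
  Output $\hat\lambda$ and $(\tilde{S}_{\hat\lambda},\tilde{T}_{\hat\lambda})$ with maximized $\mathcal{L}_\lambda(W)$
end procedure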

Theorem 1.

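For a given bipartite graph $G=(U,V,E)$, with $(S_\lambda^*,T_\lambda^*)$ and $(\tilde{S}_\lambda,\tilde{T}_\lambda)$ as defined above, the greedy Algorithm 1 has a $\rho(\lambda,m,n)$-approximation, that is, $d_\lambda(S_\lambda^*,T_\lambda^*)\le\rho(\lambda,m,n)\,d_\lambda(\tilde{S}_\lambda,\tilde{T}_\lambda)$ with

$$\rho(\lambda,m,n)=\begin{cases}2(mn)^{\frac{1}{\lambda}\left(1-\frac{2}{\lambda}\right)} & \text{if } \lambda\ge 2\\ 2(mn)^{\frac{1}{\lambda}-\frac{1}{2}} & \text{if } \frac{4}{3}<\lambda<2\\ (mn)^{1-\frac{1}{\lambda}} & \text{if } 1<\lambda\le\frac{4}{3}.\end{cases}$$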

In Theorem 2, we state that the optimization of the proposed objective function (5) asymptotically leads to almost full recovery of the IGDB-based network structure.

Theorem 2.

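We assume the graph $G=(U,V,E)$ with an IGDB $G[S_0,T_0]=(S_0,T_0,E[S_0,T_0])$ is generated from a mixture of Bernoulli distributions: $e_{uv}\sim\delta_{uv}\,\text{Bernoulli}(\pi_1)+(1-\delta_{uv})\,\text{Bernoulli}(\pi_0)$, $\delta_{uv}=I(e_{uv}\in G[S_0,T_0])$ and $\pi_1>\pi_0$. For simplicity, we let $m=\Theta(n)$. Assume $|S_0|=O(m^{1/2+\epsilon})$ and $|T_0|=O(n^{1/2+\epsilon})$ as $n\to\infty$ for some $\epsilon>0$. Denote

$$e_S=1-\frac{|\tilde{S}_\lambda\cap S_0|}{|S_0|}+1-\frac{|\tilde{S}_\lambda^c\cap S_0^c|}{|S_0^c|}$$

and

$$e_T=1-\frac{|\tilde{T}_\lambda\cap T_0|}{|T_0|}+1-\frac{|\tilde{T}_\lambda^c\cap T_0^c|}{|T_0^c|}$$

to be the error rates of node memberships based on $(\tilde{S}_\lambda,\tilde{T}_\lambda)$ from Algorithm 1. Then, there exists some $\lambda$ such that we will get almost full recovery in Algorithm 1, that is, for any fixed $a\in(0,1)$, as $n\to\infty$, we have

$$P\left(e_S+e_T\le a\right)\to 1.$$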

In practice, we select the tuning parameter $\lambda$ by a grid search based on the likelihood criterion,37 and describe the details in Algorithm 2. Based on each dense subgraph $G[S,T]$, we further identify spatially contiguous voxel clusters (ie, $\tilde{\mathcal{N}}_k^{T}$, $k=1,\ldots,\tilde{K}_2$), and a corresponding set of SNPs (ie, $\tilde{\mathcal{N}}_k^{S}$, $k=1,\ldots,\tilde{K}_1$) that are functionally associated with the voxel clusters (see Supplement A). Last, multiple IGDBs can be extracted by running the algorithms repeatedly with the previously detected IGDBs masked.38

4.2 |. Statistical inference of the IGDB

Recall that the purpose of this study is to perform statistical inference on the pleiotropic and polygenic association pattern or the IGDB. We investigate the significance of the presence of an IGDB against a random bipartite graph (Case 1 vs Case 0) as illustrated in Section 3.3.

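Let $r$ be a sound cutoff that dichotomizes the weighted graph $G$ into a binary graph $G_r=(U,V,A)$ using $a_{uv}=I(w_{uv}>r)$. Then, under the IGDB structure indexed by node sets $S_0,T_0$, the edges in $G_r$ follow a mixture of two Bernoulli distributions:

$$a_{uv}\mid S_0,T_0\sim\text{Bernoulli}(\pi_{uv}), \tag{6}$$

where $\pi_{uv}=\delta_{uv}\pi_1+(1-\delta_{uv})\pi_0$, $\pi_1=\mu_1\int_r^{\infty}f_1(w;\theta_1)\,dw+(1-\mu_1)\int_r^{\infty}f_0(w;\theta_0)\,dw$, $\pi_0=\mu_0\int_r^{\infty}f_1(w;\theta_1)\,dw+(1-\mu_0)\int_r^{\infty}f_0(w;\theta_0)\,dw$, and $\pi_1>\pi_0$.39 Then, a hypothesis test to distinguish Cases 0 and 1 can be proposed:

$$H_0:\pi_1=\pi_0=\pi \quad\text{vs}\quad H_1:\pi_1>\pi_0,$$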

based on our mixture distribution model (6).

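We propose a likelihood-based statistic for the IGDB test. For a binarized graph $G_r$, let

$$t_G=\log\frac{\sup_{H_0\cup H_1}\mathcal{L}(\pi;S,T,A)}{\sup_{H_0}\mathcal{L}(\pi;A)},$$

with the likelihood given by the Bernoulli distributions in (6). Specifically,

$$\mathcal{L}(\pi;S,T,A)=\prod_{u\in S\text{ and }v\in T}\pi_1^{a_{uv}}(1-\pi_1)^{1-a_{uv}}\times\prod_{u\in U\setminus S\text{ or }v\in V\setminus T}\pi_0^{a_{uv}}(1-\pi_0)^{1-a_{uv}} \quad\text{and}\quad \mathcal{L}(\pi;A)=\prod_{u\in U\text{ and }v\in V}\pi^{a_{uv}}(1-\pi)^{1-a_{uv}}.$$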

Then, the asymptotic power is ensured using the likelihood-based statistic through the following Theorem 3.

Theorem 3 (Under IGDB alternative hypothesis H1).

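Assume $m=\Theta(n)$ and the underlying IGDB $G[S_0,T_0]$ with generating probabilities $\pi_1>\pi_0$ satisfies $|S_0|=m_0$, $|T_0|=n_0$ and $m_0,n_0=\Omega(n^{\epsilon})$ for some $\epsilon>0$. Then for any $\eta>1$, as $n\to\infty$, we have

$$\Pr\left(t_G>\eta\right)\to 1.$$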
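For a fixed candidate pair $(S,T)$, the statistic $t_G$ has a closed form because the Bernoulli likelihoods are maximized by sample proportions. The sketch below is an illustration under that assumption. It treats the constraint $\pi_1\ge\pi_0$ by returning zero when the inside proportion does not exceed the outside proportion. The function names and the toy matrix are hypothetical.

```python
import math


def bern_loglik(k, n):
    """Maximized Bernoulli log-likelihood for k successes out of n trials."""
    if n == 0 or k == 0 or k == n:
        return 0.0
    p = k / n
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def igdb_statistic(A, S, T):
    """Log-likelihood ratio t_G for a binary matrix A and candidate block S x T."""
    S, T = set(S), set(T)
    k_in = n_in = k_out = n_out = 0
    for u, row in enumerate(A):
        for v, a in enumerate(row):
            if u in S and v in T:
                n_in += 1
                k_in += a
            else:
                n_out += 1
                k_out += a
    # If the block is not denser than the rest, the constrained
    # maximum over H0 and H1 coincides with the null fit.
    if n_in == 0 or n_out == 0 or k_in / n_in <= k_out / n_out:
        return 0.0
    null = bern_loglik(k_in + k_out, n_in + n_out)
    return bern_loglik(k_in, n_in) + bern_loglik(k_out, n_out) - null


# Toy binary graph: a fully connected 3 x 2 block and no edges elsewhere.
A = [[1 if (u < 3 and v < 2) else 0 for v in range(5)] for u in range(6)]
t_true = igdb_statistic(A, S=[0, 1, 2], T=[0, 1])
t_wrong = igdb_statistic(A, S=[3, 4, 5], T=[2, 3])
```

The correctly specified block yields a large positive statistic, whereas a block placed on an empty region yields zero.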

In determining the significance of IGDBs, simultaneous testing across all potential IGDBs needs to be accounted for. In addition, a rejection region (ie, the threshold $\eta$) should be determined based on the distribution of $t_G$ under the null model. Hence, we employ the permutation test procedure commonly used in the field of neuroimaging40,41 to empirically approximate the distribution of the likelihood-based statistic $t_G$ under the IGDB null and to control the family-wise error rate (FWER).

Let $\phi(\cdot)$ be the vectorization of a matrix, such that $\phi(A)$ is a vector of length $mn$ for the adjacency matrix $A$. Denote by $\tau$ a permutation of $mn$ elements, and by $P_\tau$ the corresponding permutation matrix. Let $G_\tau=(U,V,E_\tau)$ be an edge-permuted graph from $G$. Then, under the random bipartite graph (Case 0), the edge-permuted graph $G_\tau$ would be a realization from the same null model. We let $\tau^{(1)},\ldots,\tau^{(M)}$ be $M$ random permutations, with corresponding edge-permuted adjacency matrices $A_{\tau^{(1)}},\ldots,A_{\tau^{(M)}}$. The test statistics associated with these edge-permuted adjacency matrices form a random sample of $t_G$ under the null hypothesis, which can be used to obtain the empirical distribution of $t_G$ under the null hypothesis. We illustrate the whole procedure of the permutation test in Algorithm 3, while the P-values of multiple IGDBs can be obtained by considering each IGDB individually.

To dichotomize the weighted graph $G$, rather than setting $r$ to a fixed value, which could lead to an arbitrary selection, we consider $r$ as a random variable with a distribution $q(r)$. This allows us to integrate the likelihood function over $r$, using the prior distribution $q(r)$, thereby making our optimization process robust to the specific choice of $r$. We implement a discrete distribution for $q(r)$, defined by a set of possible values $r_1,\ldots,r_R$ and their corresponding probabilities $q(r_1),\ldots,q(r_R)$. In practice, our algorithm is robust to the choice of the prior distribution, provided that a reasonable range for the support of $r$ is selected.
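The integration over cutoffs can be sketched as a weighted sum over the support of $q(r)$. The code below is illustrative only: it averages maximized Bernoulli log-likelihoods rather than likelihoods, which is an assumption made here for numerical stability, and the names `block_loglik` and `integrated_loglik` as well as the cutoff grid are hypothetical.

```python
import math


def bern_loglik(k, n):
    """Maximized Bernoulli log-likelihood for k successes out of n trials."""
    if n == 0 or k == 0 or k == n:
        return 0.0
    p = k / n
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def block_loglik(A, S, T):
    """Two-rate Bernoulli log-likelihood: one rate inside S x T, one outside."""
    S, T = set(S), set(T)
    k_in = n_in = k_out = n_out = 0
    for u, row in enumerate(A):
        for v, a in enumerate(row):
            if u in S and v in T:
                n_in += 1
                k_in += a
            else:
                n_out += 1
                k_out += a
    return bern_loglik(k_in, n_in) + bern_loglik(k_out, n_out)


def integrated_loglik(W, S, T, cutoffs, probs):
    """Average the block log-likelihood over cutoffs r with weights q(r)."""
    total = 0.0
    for r, q in zip(cutoffs, probs):
        A = [[1 if w > r else 0 for w in row] for row in W]  # a_uv = I(w_uv > r)
        total += q * block_loglik(A, S, T)
    return total


# Toy weights: a strong 3 x 2 block (3.0) on a weak background (0.5).
W = [[3.0 if (u < 3 and v < 2) else 0.5 for v in range(5)] for u in range(6)]
cutoffs, probs = [1.0, 2.0], [0.5, 0.5]
ll_true = integrated_loglik(W, [0, 1, 2], [0, 1], cutoffs, probs)
ll_wrong = integrated_loglik(W, [3, 4, 5], [2, 3], cutoffs, probs)
```

A candidate block that matches the planted structure attains a higher integrated value than a misplaced one, which is the criterion a grid search over $\lambda$ would compare across candidate solutions.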

Algorithm 3.

Implementation of likelihood ratio statistic via permutation tests

Input: $G=(U,V,A)$, $\hat{S}$, $\hat{T}$; Output: p-value
procedure Algorithm
  calculate the test statistic on $G$ with subgraph $G[\hat{S},\hat{T}]$ and denote it as $t_0$
  for $b=1$ to $M$ do
    generate a permutation matrix $P_b$ on $mn$ elements
    obtain the adjacency matrix of the edge-permuted graph $G_b$: $A_b=\phi^{-1}(P_b\,\phi(A))$
    calculate the test statistic on $G_b$ and denote it as $t_b$
  end for
end procedure
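A compact sketch of the permutation procedure in Algorithm 3 follows. Two simplifications are assumed here and are not taken from the paper: the statistic on each permuted graph is evaluated at the fixed pair $(\hat{S},\hat{T})$ rather than after re-running the detection step, and the p-value is computed with the common convention $(1+\#\{t_b\ge t_0\})/(1+M)$. All function names are hypothetical.

```python
import math
import random


def bern_loglik(k, n):
    """Maximized Bernoulli log-likelihood for k successes out of n trials."""
    if n == 0 or k == 0 or k == n:
        return 0.0
    p = k / n
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def igdb_statistic(A, S, T):
    """Log-likelihood ratio of a two-rate block model against a single rate."""
    S, T = set(S), set(T)
    k_in = sum(A[u][v] for u in S for v in T)
    n_in = len(S) * len(T)
    k_all = sum(sum(row) for row in A)
    n_all = len(A) * len(A[0])
    k_out, n_out = k_all - k_in, n_all - n_in
    if n_in == 0 or n_out == 0 or k_in / n_in <= k_out / n_out:
        return 0.0
    return (bern_loglik(k_in, n_in) + bern_loglik(k_out, n_out)
            - bern_loglik(k_all, n_all))


def permutation_p_value(A, S, T, n_perm=200, seed=0):
    """Edge-permutation p-value for the block S x T (fixed-block simplification)."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    t0 = igdb_statistic(A, S, T)
    flat = [a for row in A for a in row]          # phi(A)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(flat)                         # apply a random permutation
        A_b = [flat[u * n:(u + 1) * n] for u in range(m)]  # phi^{-1}
        if igdb_statistic(A_b, S, T) >= t0:
            exceed += 1
    return (1 + exceed) / (1 + n_perm)


# Toy binary graph with a fully connected 3 x 2 block.
A = [[1 if (u < 3 and v < 2) else 0 for v in range(5)] for u in range(6)]
p_value = permutation_p_value(A, S=[0, 1, 2], T=[0, 1])
```

Because each permutation is independent of the others, the loop can be distributed across workers in a straightforward way.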

5 |. RESULTS

We applied the IGDB approach to the motivating data set. The FA measures of DTI at 117 139 voxels were used in this study to characterize the white matter integrity.30,42 The image acquisition parameters are described in Supplement A. Regarding genetic variants, 10 595 779 SNPs passed the quality control filters in the HCP data set (MAF < 0.01; HWE < 1e-6; r-squared > 0.03; call rate > 0.95) after imputation on the Michigan Imputation Server Minimac3 (https://imputationserver.sph.umich.edu) using the 1000 Genomes Project (phase 1 v3) reference set.43

We preprocessed the diffusion weighted images following the ENIGMA-DTI workflow (http://enigma.ini.usc.edu/protocols/dti-protocols/). We further applied the Sequential Oligogenic Linkage Analysis Routines (SOLAR)-Eclipse software (https://www.nitrc.org/projects/se_linux) for the heritability analysis, and retained the imaging voxels with significant heritability according to the Fast and Powerful Heritability Inference (FPHI) function of SOLAR-Eclipse (P<0.05) in both the HCP and the Amish Connectome Project (ACP). For these voxels, we performed vGWAS using PLINK while adjusting for covariates including sex, age, BMI, and population characteristics using the first 10 principal components.31 We then performed sure independence screening on SNPs with multiple imaging responses through a direct extension of the univariate screening procedure.44 A total of 13 498 SNPs across 22 chromosomes were retained for further analysis. The details are described in Supplement A.

We tested the imaging-genetic associations between SNPs across 22 chromosomes and voxel-level imaging traits using our proposed method. Based on the procedures described in Sections 4.1 and 4.2, we extracted IGDBs and performed permutation tests to determine their statistical significance while controlling the family-wise error rate (q<0.05). We observe different brain areas being influenced by distinct genetic loci. A Manhattan plot for all SNPs across 22 chromosomes with selected imaging-genetic associations highlighted, and tables of SNPs and voxels across all 22 chromosomes, are included in Supplement A.

In this section, we focus on SNPs on chromosome 1 to demonstrate their systematic association patterns with voxel traits, and then annotate the genes in the detected IGDB. Based on the matrix of association strength $W_{1178\times 29627}$ (ie, Figure 3A), we detected an IGDB with 384 SNPs and 3803 voxels, shown in Figure 3B, by maximizing the objective function (5). This was achieved by implementing Algorithm 2 with a grid search for $h$ over $\{1/20,1/19,\ldots,1,2,\ldots,19,20\}$ and for $\lambda$ within the interval 0.5 to 1.2 in increments of 0.02. We further calculated the p-value for the IGDB statistical inference via the permutation test, which indicates a significant IGDB with P < 0.001. Although the IGDB is an irreducible subgraph, it can be further refined based on data-driven algorithms and spatial information of the imaging data. We applied existing community detection algorithms45 to similarity matrices obtained from the detected IGDB. The refined pattern in Figure 3C displays 6 distinct SNP-voxel association clusters. Note that the refined structure cannot be identified without first revealing the IGDB by the proposed algorithm.

FIGURE 3.


IGDB procedure on chromosome 1: (A) is the input matrix $W$, derived from vGWAS using PLINK while adjusting for covariates including sex, age, BMI, and population characteristics using the first 10 principal components.31 Each entry in the matrix is $-\log p_{ij}$ of the association between an SNP and imaging voxel pair (ie, a hotter entry indicates a higher level of SNP-voxel association). Although $W$ is obtained after screening (eg, by voxel-level heritability analysis), it remains challenging to directly recognize the patterns of imaging-genetics associations; (B) demonstrates the detected IGDB, which reveals dense blocks of imaging-genetics associations; (C) displays the refined pattern of the IGDB. In panels (B) and (C), we have reordered the SNPs and voxels to better illustrate their patterns of association.

As a greedy algorithm, Algorithm 1 has computational complexity that is linear in the size of the original graph. With the tuning parameters determined through the likelihood function, as outlined in Algorithm 2, the computation remains efficient: detecting the IGDB of the SNP-voxel association graph for chromosome 1 took 20 minutes on a PC with a 3.60 GHz i7 CPU and 64 GB of memory. The computation of the p-value depends on the number of permutations and can be easily parallelized.

We illustrate the voxel clusters and corresponding SNP sets in Figure 4. For example, voxel cluster 2 (colored cyan) includes voxels mainly from the splenium of the corpus callosum (SCC), part of one of the largest white matter tracts that connects many parts of the brain, and lesions to which often result in varied neurological issues.46 To annotate the SNPs in the identified clusters, we queried the SNPs in QTLbase (http://mulinlab.org/qtlbase/index.html)47 for potential expression quantitative trait loci (eQTL) and examined the genes regulated by these variants in a tissue-specific pattern. The summary of associated genes related to brain tissues is displayed in Supplement A as supporting information. In cluster 1, multiple SNPs are linked with the LEPR gene, a protein-coding gene for the leptin receptor that has been shown to be associated with obesity. White matter integrity is known to be highly associated with obesity and body mass index.48 Therefore, this cluster reveals the marginal association of the (obesity-related) LEPR gene and white matter integrity. In clusters 2 to 5, the associated genes, for example, S100A1, TAF1A, CFH, CFHR3, and DPH5, are associated with immune system functions (http://immunet.princeton.edu/, https://www.innatedb.com/moleculeSearch.do). White matter integrity can be influenced by immune system functions and systemic inflammation. In cluster 6, the NOS1AP gene has been found to be associated with white matter microstructure in previous studies.8 In addition, the NOS1AP gene has been identified as a risk factor for schizophrenia,49 while alterations of white matter integrity in patients with schizophrenia were studied in Kubicki et al.50 In summary, our findings provide insights into the complex neurogenetic mechanisms by which genetic variants influence imaging traits in a systematic fashion, potentially via regulating gene expression, and generate hypotheses to be confirmed in future multi-omics studies.

FIGURE 4.

An illustration of the association patterns between SNP and voxel clusters on chromosome 1. We demonstrate the systematic imaging-genetics associations in an integrated Manhattan plot based on the results of our IGDB analysis. The highlighted subsets of SNPs are systematically associated with the corresponding areas of the white matter tracts. The dual localized association patterns provide a straightforward interpretation of the genetic effects on location-specific brain areas.

6 |. SIMULATION

6.1 |. Synthetic data

We evaluate the finite-sample performance of our proposed method based on simulation studies. We generate the input matrix $W_{m\times n}$ based on the two sets of multivariate variables representing genetic variants $X_{m\times L}$ and imaging voxels $Y_{n\times L}$. We let the pattern of $W_{m\times n}$ be determined by a graph $G=(U,V,E)$. Specifically, we assume there exists an IGDB $G[S_0,T_0]=(S_0,T_0,E[S_0,T_0])$ with a higher proportion of edges that are significant imaging-genetics associations (ie, $\mu_1$) than the rest of the graph (ie, $\mu_0$). Then, we let the entries of $W_{m\times n}$ follow mixture distributions according to $G$:

$$w_{uv}\mid\delta_{uv}=1\sim\mu_1 t_{df}(\nu)+(1-\mu_1)\,t_{df}(0),\qquad w_{uv}\mid\delta_{uv}=0\sim\mu_0 t_{df}(\nu)+(1-\mu_0)\,t_{df}(0),$$

where $\delta_{uv}$ is an indicator variable with $\delta_{uv}=1$ for edges in the IGDB and 0 otherwise. $t_{df}(\nu)$ and $t_{df}(0)$ are the non-null and null distributions of imaging-genetics associations, respectively. $t_{df}(\nu)$ is a t distribution with $L-p$ degrees of freedom ($p$ covariates) and non-centrality parameter $\nu=\theta/\sqrt{4/L}$, where $\theta$ is the standardized effect size (eg, Cohen's d). $\mu_1$ and $\mu_0$ are the proportions of the non-null distribution within the IGDB and outside it. We use $m=200$, $n=100$, and $L=60$. We simulate data sets under multiple settings by varying the size of the IGDB (ie, $(|S_0|,|T_0|)=(50,40)$ and $(30,20)$), the standard effect size (ie, $\theta=0.8$, 1, and 1.2), and the proportions of noisy edges (ie, $(\mu_1,\mu_0)=(0.8,0.2)$ and $(0.9,0.1)$). Additional simulation settings with larger graph and sample sizes are included in the Appendix.
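The data-generating scheme above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' Matlab implementation: the function name `simulate_w`, the placement of the planted IGDB in the first rows and columns, the number of covariates `p=2`, and the reading of the non-centrality parameter as $\theta/\sqrt{4/L}$ are all assumptions made for the example.

```python
import numpy as np

def simulate_w(m=200, n=100, L=60, s0=50, t0=40, theta=0.8,
               mu1=0.8, mu0=0.2, p=2, seed=0):
    """Draw one synthetic W (m x n) together with the edge indicator delta."""
    rng = np.random.default_rng(seed)
    df = L - p                       # degrees of freedom L - p (p is assumed here)
    nu = theta / np.sqrt(4.0 / L)    # assumed form of the non-centrality parameter

    # delta[u, v] = 1 for SNP-voxel pairs inside the planted IGDB, 0 elsewhere.
    # Placing the IGDB in the top-left corner is only a convenience.
    delta = np.zeros((m, n), dtype=int)
    delta[:s0, :t0] = 1

    # A pair is non-null with probability mu1 inside the IGDB and mu0 outside.
    prob = np.where(delta == 1, mu1, mu0)
    non_null = rng.random((m, n)) < prob

    # Non-central t draw: (Z + nc) / sqrt(chi2_df / df); nc = 0 gives the null t.
    z = rng.standard_normal((m, n))
    chi2 = rng.chisquare(df, size=(m, n))
    w = (z + nu * non_null) / np.sqrt(chi2 / df)
    return w, delta, non_null

w, delta, non_null = simulate_w()
```

Changing `s0`, `t0`, `theta`, `mu1`, and `mu0` reproduces the kinds of settings listed above; the real covariate count and any row or column shuffling used in the study are not stated in this section.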

6.2 |. Performance metrics and results

We evaluate the performance of the proposed method at several levels. At the subgraph level, we assess the accuracy of IGDB inference by examining whether we can reject the null (ie, no systematic imaging-genetics association). At the edge level, we evaluate the accuracy of the detected IGDB by comparing it with the ground truth in terms of edge differences. We also evaluated the node-assignment accuracy of the proposed method using synthetic data (see Section 1.5 of Supplement A in the Supporting information for details). We compared performance only with Charikar's algorithm35 for dense component extraction and not with bi-clustering algorithms: because bi-clustering algorithms tend to assign all SNPs and voxels to clusters, they are not well suited to IGDB structure extraction (see the demonstration in the Appendix).
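For readers unfamiliar with the comparator, the following is a minimal sketch of Charikar-style greedy peeling for dense component extraction, written for a bipartite graph stored as a 0/1 matrix. It is not the IGDB algorithm and not the authors' code: the function name `charikar_peel`, the use of a binary adjacency matrix (for example, a thresholded W), and the density definition (number of edges divided by number of remaining nodes) are assumptions of this illustration.

```python
import numpy as np

def charikar_peel(adj):
    """Greedy peeling on a bipartite 0/1 matrix (rows = SNPs, columns = voxels).

    Repeatedly removes the remaining node of smallest degree and returns the
    intermediate subgraph with the largest edges-per-node density.
    """
    adj = np.asarray(adj, dtype=int)
    m, n = adj.shape
    rows = np.ones(m, dtype=bool)
    cols = np.ones(n, dtype=bool)
    row_deg = adj.sum(axis=1).astype(float)   # degree of each row node
    col_deg = adj.sum(axis=0).astype(float)   # degree of each column node
    edges = int(adj.sum())
    best_density, best_rows, best_cols = -1.0, rows.copy(), cols.copy()

    for _ in range(m + n):
        density = edges / (rows.sum() + cols.sum())
        if density > best_density:
            best_density, best_rows, best_cols = density, rows.copy(), cols.copy()
        # Degrees of already-removed nodes are masked with infinity.
        r_deg = np.where(rows, row_deg, np.inf)
        c_deg = np.where(cols, col_deg, np.inf)
        i, j = int(r_deg.argmin()), int(c_deg.argmin())
        if r_deg[i] <= c_deg[j]:
            rows[i] = False
            edges -= int(row_deg[i])
            col_deg -= adj[i]          # neighbours of row i lose one edge
        else:
            cols[j] = False
            edges -= int(col_deg[j])
            row_deg -= adj[:, j]       # neighbours of column j lose one edge
    return best_rows, best_cols, best_density

# Toy example: a fully connected 6 x 5 block planted in an otherwise empty graph.
toy = np.zeros((20, 15), dtype=int)
toy[:6, :5] = 1
toy_rows, toy_cols, toy_density = charikar_peel(toy)
```

This version recomputes the minimum by scanning all nodes, so it is quadratic in the number of nodes; it is meant to show the peeling idea rather than an efficient implementation.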

For IGDB inference, we consider a detected IGDB $G[\hat S,\hat T]$ to be a recovery of the underlying IGDB $G[S_0,T_0]$ if the IGDB null hypothesis is rejected in the proposed likelihood-ratio test and $G[\hat S,\hat T]$ has high similarity with $G[S_0,T_0]$. Specifically, we consider $G[\hat S,\hat T]$ a true positive detection of $G[S_0,T_0]$ if $J_X J_Y$ is no less than the cutoff, with

$$J_X=\frac{|S_0\cap\hat S|}{|S_0\cup\hat S|}\quad\text{and}\quad J_Y=\frac{|T_0\cap\hat T|}{|T_0\cup\hat T|},$$

and we reject the IGDB null hypothesis in the permutation test. We display the results with cutoffs of 0.8 and 0.9 on $J_X J_Y$. The detected IGDB leads to a false negative finding if the P-value in the permutation test is not lower than the significance level (ie, 0.05). In addition, we observe a false positive error if $G[\hat S,\hat T]$ has low similarity to $G[S_0,T_0]$ even though we reject the IGDB null hypothesis. We report the accuracy of inference by the false positive rate (FPR) and false negative rate (FNR) across replications.
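The recovery criterion can be illustrated with a short Python sketch. The helper names `jaccard` and `is_recovery` are hypothetical, the similarity score is taken to be the product of the two Jaccard indices (an assumed reading of the combined score), and the P-value is supplied by the caller because the permutation test itself is not reproduced here.

```python
def jaccard(true_set, est_set):
    """Jaccard index |A ∩ B| / |A ∪ B| between two node sets."""
    true_set, est_set = set(true_set), set(est_set)
    union = true_set | est_set
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(true_set & est_set) / len(union)

def is_recovery(s0, t0, s_hat, t_hat, p_value, cutoff=0.8, alpha=0.05):
    """True if the detected bi-clique is similar enough and the null is rejected."""
    jx = jaccard(s0, s_hat)   # similarity on the SNP side
    jy = jaccard(t0, t_hat)   # similarity on the voxel side
    return (jx * jy >= cutoff) and (p_value < alpha)

# Example: 2 of 50 true SNPs are missed, all 40 voxels are found.
jx_example = jaccard(range(50), range(2, 50))
hit = is_recovery(range(50), range(40), range(2, 50), range(40),
                  p_value=0.01, cutoff=0.9)
```

In this example the SNP-side index is 48/50 and the voxel-side index is 1, so the product clears the 0.9 cutoff.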

Furthermore, we compare IGDB with commonly used multiple testing correction methods at the edge level: the positive false discovery rate (pFDR) of Storey51 and Bonferroni correction. These correction methods are commonly used in GWAS and vGWAS analyses in practice. We compare the true $\Delta=\{\delta_{uv}\}_{u\in U,v\in V}$ with the estimate $\hat\Delta=\{\hat\delta_{uv}\}_{u\in U,v\in V}$ from each method. For the proposed method, we obtain $\hat\Delta$ from the extracted IGDB $G[\hat S,\hat T]$ and the hypothesis test. In particular, if we reject the IGDB null hypothesis with a detected IGDB $G[\hat S,\hat T]$, we let $\hat\Delta=\{\hat\delta_{uv}\}=\{I(e_{uv}\in G[\hat S,\hat T])\}$. If we fail to reject, we treat $\hat S$ and $\hat T$ as empty sets, so that $\hat\Delta=0_{m\times n}$. An FDR threshold of 0.2 and a corrected $\alpha$ level of 0.05 are used for the pFDR and Bonferroni correction, respectively.
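As a rough illustration of the two edge-level comparators, the sketch below applies a Bonferroni cutoff and a simplified q-value estimate in the spirit of Storey's approach to a vector of P-values. The function names, the fixed tuning value `lam=0.5`, and the omission of any finite-sample correction are assumptions of this example; the exact pFDR procedure and tuning used in the study are not given in this section.

```python
import numpy as np

def bonferroni_reject(pvals, alpha=0.05):
    """Reject when p <= alpha / (number of tests)."""
    pvals = np.asarray(pvals, dtype=float)
    return pvals <= alpha / pvals.size

def storey_qvalues(pvals, lam=0.5):
    """Simplified q-values: pi0 estimated from the share of p-values above lam."""
    p = np.asarray(pvals, dtype=float).ravel()
    m = p.size
    pi0 = min(1.0, np.mean(p > lam) / (1.0 - lam))   # estimated null proportion
    order = np.argsort(p)
    ranked = pi0 * m * p[order] / np.arange(1, m + 1)
    # Enforce monotonicity, working from the largest p-value downward.
    q_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q.reshape(np.shape(pvals))

pvals = np.array([1e-8, 0.001, 0.2, 0.6, 0.9])
bonf_hits = bonferroni_reject(pvals, alpha=0.05)   # corrected alpha level of 0.05
qvals = storey_qvalues(pvals)
fdr_hits = qvals <= 0.2                            # FDR threshold of 0.2
```

Either boolean array can play the role of $\hat\Delta$ for the corresponding method once it is reshaped to $m\times n$.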

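Subsequently, based on the $\hat\delta_{uv}$ observed from different methods and the true parameters $\delta_{uv}$, we calculate the true positive rate (TPR) and true negative rate (TNR) as:

$$\mathrm{TPR}=\frac{\sum_{u,v} I(\delta_{uv}=\hat\delta_{uv}=1)}{\sum_{u,v} I(\delta_{uv}=1)},\qquad \mathrm{TNR}=\frac{\sum_{u,v} I(\delta_{uv}=\hat\delta_{uv}=0)}{\sum_{u,v} I(\delta_{uv}=0)}.$$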
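These two rates translate directly into array operations. The following is a small sketch; the function name `tpr_tnr` and the toy matrices are illustrative only.

```python
import numpy as np

def tpr_tnr(delta, delta_hat):
    """Edge-level true positive and true negative rates for 0/1 matrices."""
    delta = np.asarray(delta).astype(bool)
    delta_hat = np.asarray(delta_hat).astype(bool)
    tpr = np.sum(delta & delta_hat) / np.sum(delta)      # correctly flagged true edges
    tnr = np.sum(~delta & ~delta_hat) / np.sum(~delta)   # correctly unflagged non-edges
    return tpr, tnr

# Toy case: the truth is a 2 x 2 block; the estimate covers a 2 x 3 block.
truth = np.zeros((4, 5), dtype=int)
truth[:2, :2] = 1
estimate = np.zeros((4, 5), dtype=int)
estimate[:2, :3] = 1
tpr, tnr = tpr_tnr(truth, estimate)
```

Here all 4 true edges are recovered and 2 of the 16 non-edges are wrongly flagged, giving a TPR of 1 and a TNR of 14/16.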

The associated means and standard deviations are reported based on 100 replications for each simulation scenario.

The results from the IGDB inference are summarized in Table 1. The power of the IGDB inference depends on the size and SNR (determined by the standard effect size) of the underlying IGDB $G[S_0,T_0]$, which concurs with our theoretical results. We fail to reject the IGDB null hypothesis for one simulated data set with a smaller size (30, 20), effect size 0.8, and higher noise (0.8, 0.2).

TABLE 1.

IGDB inference results under varied SNRs and noises.

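| $(\lvert S_0\rvert,\lvert T_0\rvert)$ | $(q_1,q_2)$ | Metric | θ = 0.8 | θ = 1.0 | θ = 1.2 |
|---|---|---|---|---|---|
| (50, 40) | (0.9, 0.1) | FPR (0.8) | 0 (0) | 0 (0) | 0 (0) |
| | | FPR (0.9) | 0 (0) | 0 (0) | 0 (0) |
| | | FNR | 0 (0) | 0 (0) | 0 (0) |
| | (0.8, 0.2) | FPR (0.8) | 0 (0) | 0 (0) | 0 (0) |
| | | FPR (0.9) | 0 (0) | 0 (0) | 0 (0) |
| | | FNR | 0 (0) | 0 (0) | 0 (0) |
| (30, 20) | (0.9, 0.1) | FPR (0.8) | 0 (0) | 0 (0) | 0 (0) |
| | | FPR (0.9) | 0 (0) | 0 (0) | 0 (0) |
| | | FNR | 0 (0) | 0 (0) | 0 (0) |
| | (0.8, 0.2) | FPR (0.8) | 0 (0) | 0 (0) | 0 (0) |
| | | FPR (0.9) | 0.2100 (0.4073) | 0.0400 (0.1960) | 0 (0) |
| | | FNR | 0.0600 (0.2375) | 0 (0) | 0 (0) |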

Note: We summarize the FPR (with cutoff of 0.8 and 0.9 on the JXJY) and FNR to evaluate the estimated IGDB. The results suggest robust and accurate performance of our method at a bi-clique level (ie, revealing patterns).

The comparative edge-level results from the proposed method and competing methods are displayed in Table 2 for different sizes of IGDB. All three methods perform better with higher SNRs and lower noise levels. The proposed method outperforms the pFDR and Bonferroni correction methods in both TPR and TNR under all scenarios considered. Both the pFDR and Bonferroni methods have high TNR but low TPR, indicating a stringent cutoff, while the proposed method achieves a higher TPR while maintaining a similar or even higher TNR. The Bonferroni method is the most stringent: its TPR falls below 10% at low SNR (eg, 0.8) in all cases.

TABLE 2.

Edge-wise accuracy under varied IGDB sizes, SNRs and noises.

| $(\lvert S_0\rvert,\lvert T_0\rvert)$ | $(q_1,q_2)$ | Method | Metric | θ = 0.8 | θ = 1.0 | θ = 1.2 |
|---|---|---|---|---|---|---|
| (50, 40) | (0.9, 0.1) | IGDB | TPR | 0.9879 (0.0184) | 0.9942 (0.0124) | 0.9968 (0.0097) |
| | | | TNR | 1 (0) | 1 (0) | 1 (0) |
| | | pFDR | TPR | 0.7453 (0.0090) | 0.8686 (0.0045) | 0.8995 (0.0023) |
| | | | TNR | 0.8858 (0.0020) | 0.8667 (0.0018) | 0.8619 (0.0018) |
| | | Bonferroni | TPR | 0.0520 (0.0048) | 0.1739 (0.0092) | 0.3941 (0.0096) |
| | | | TNR | 0.9942 (0.0005) | 0.9806 (0.0008) | 0.9562 (0.0012) |
| | (0.8, 0.2) | IGDB | TPR | 0.9938 (0.0126) | 0.9982 (0.0064) | 0.9984 (0.0061) |
| | | | TNR | 0.9998 (0.0006) | 1.0000 (0.0003) | 1.0000 (0.0004) |
| | | pFDR | TPR | 0.7032 (0.0067) | 0.7903 (0.0039) | 0.8095 (0.0027) |
| | | | TNR | 0.7842 (0.0021) | 0.7577 (0.0019) | 0.7517 (0.0018) |
| | | Bonferroni | TPR | 0.0458 (0.0043) | 0.1557 (0.0084) | 0.3506 (0.0097) |
| | | | TNR | 0.9884 (0.0007) | 0.9612 (0.0014) | 0.9125 (0.0020) |
| (30, 20) | (0.9, 0.1) | IGDB | TPR | 0.9987 (0.0081) | 0.9992 (0.0060) | 1 (0) |
| | | | TNR | 1.0000 (0.0001) | 1 (0) | 1 (0) |
| | | pFDR | TPR | 0.7043 (0.0176) | 0.8537 (0.0085) | 0.8954 (0.0042) |
| | | | TNR | 0.9017 (0.0019) | 0.8799 (0.0015) | 0.8741 (0.0014) |
| | | Bonferroni | TPR | 0.0517 (0.0082) | 0.1741 (0.0163) | 0.3946 (0.0175) |
| | | | TNR | 0.9942 (0.0005) | 0.9807 (0.0009) | 0.9561 (0.0012) |
| | (0.8, 0.2) | IGDB | TPR | 0.8527 (0.2248) | 0.9645 (0.0398) | 0.9778 (0.0287) |
| | | | TNR | 0.9996 (0.0009) | 0.9995 (0.0009) | 0.9997 (0.0005) |
| | | pFDR | TPR | 0.6891 (0.0114) | 0.7857 (0.0075) | 0.8069 (0.0045) |
| | | | TNR | 0.7952 (0.0022) | 0.7661 (0.0017) | 0.7596 (0.0019) |
| | | Bonferroni | TPR | 0.0473 (0.0095) | 0.1563 (0.0144) | 0.3525 (0.0173) |
| | | | TNR | 0.9884 (0.0008) | 0.9610 (0.0013) | 0.9123 (0.0017) |

Note: We compare the performance of IGDB with multiple testing correction methods in terms of the accuracy of individual SNP-voxel pairs. The extracted IGDB patterns dramatically improve the SNP-voxel pair level inference accuracy by allowing pairs to borrow strengths from each other.

7 |. DISCUSSION

Imaging-genetics studies aim to model the predictive mechanism of genetic variants on quantitative imaging measures. However, high dimensionality and complex association patterns between genetic variants and imaging traits raise a considerable challenge for statistical estimation and inference. For example, purely region-level inference erases local voxel heterogeneity, thus may be ineffective in learning spatial specificity of imaging voxels. In this article, we have developed an IGDB multivariate to multivariate analysis tool to identify systematic associations between multivariate voxel-level imaging features and multivariate genetic variants. Our method focuses on the systematic polygenic and pleiotropic patterns rather than individual pairwise associations, and thus mitigates the challenges of ultra-high dimensionality due to multivariate to multivariate association analysis. Besides, our high-resolution voxel-level genome wide association analysis is not constrained by pre-specified regions of interest, hence fully accounts for the variability between voxels, and yields data-driven brain regions associated with functionally related genetic loci. Therefore, our findings are more biologically interpretable and meaningful.

We develop a new optimization solution to extract the IGDB by leveraging graph properties established in our theoretical study. Our IGDB extraction algorithm is computationally efficient and scalable. The input data for our method can be either individual-level data or GWAS summary statistics. The IGDB inference method controls the family-wise error rate for IGDB-level findings. We provide theoretical results to guarantee the numerical performance of IGDB extraction and the accuracy of the inference model. Although initially proposed for analyzing systematic association patterns between SNPs and voxels, this approach is also well suited to region-level imaging data, where spatial constraints are not necessary.

In real data applications, we applied our method to the HCP data set to study genetic effects on white matter microstructure integrity. The results revealed a variety of functionally related genetic loci that are associated with sub-regions of white matter tracts on the posterior corpus callosum. These novel findings are consistent with previous findings.30 Our annotation analysis further provides evidence that the selected SNPs are associated with white matter microstructure through gene expression. The overall computational load of imaging-genetics analysis remains heavy despite improved algorithms and computational facilities. Since our initial vGWAS is performed using GWAS analysis tools (eg, PLINK), the analysis is limited to individual SNPs. Nevertheless, the input of our method consists of vGWAS results, so it is compatible with any vGWAS analysis method. Our IGDB algorithm can also be extended to further constrain the IGDB structure by leveraging the functional annotation of genetic variants.52

In summary, we have developed a new neuroimaging-GWAS tool to identify systematic associations between multivariate imaging features and multivariate genetic variants. Our IGDB method is computationally efficient and improves the accuracy and power through revealing systematic polygenic and pleiotropic patterns.

Supplementary Material

Appendix S2 Positions of SNPs from imaging-genetic association clusters detected in all chromosomes.
Appendix S3 Coordinates of voxels from imaging-genetic association clusters detected in all chromosomes.
Appendix S4 Summary of genes related to brain tissues from annotation analysis.
Appendix S5 Supplemental Material.
Appendix S1 Supplemental Material.

ACKNOWLEDGEMENTS

This work was partially supported by the National Institute on Drug Abuse of the National Institutes of Health under Award Numbers 1DP1DA048968-01, R01EB015611, and R01MH094520. The second author, Dr. Yuan Zhang, was supported by NSF Grant DMS-2311109.

APPENDIX A. ADDITIONAL NUMERICAL RESULTS

Comparisons with bi-clustering algorithms.

In our simulation analysis, we compared our method only with Charikar's algorithm and not with bi-clustering algorithms, because bi-clustering algorithms are not well suited to dense bi-clique extraction. To demonstrate this, we applied the classic spectral bi-clustering algorithm53 to a simulated data set. Specifically, we generated a bipartite graph with $m=200$, $n=100$, $L=60$, IGDB size $(|S_0|,|T_0|)=(50,40)$, standard effect size $\theta=0.8$, and proportions of noisy edges $(\mu_1,\mu_0)=(0.8,0.2)$. The true structure of the simulated bipartite graph and the subnetworks detected by the competing methods are displayed in Figure A1. Conventional bi-clustering algorithms can miss the dense bi-cliques.

FIGURE A1.

Comparison with another bi-clustering algorithm on a simulated data set. True and detected subnetworks are highlighted in red. (A) displays the true bipartite graph with an IGDB; (B) shows the IGDB structure extracted by Algorithms 1 and 2; (C) shows subnetworks detected by the spectral co-clustering algorithm with K=2; (D) highlights several subnetworks detected by the spectral co-clustering algorithm with K=10.

Simulation results from large graphs.

We extended our simulation studies to larger graphs by setting $m=800$, $n=500$, and $L=200$. The synthetic data were generated with an IGDB of size $(|S_0|,|T_0|)=(100,80)$. The results are displayed in Table A1 with the same settings of standard effect size (ie, $\theta=0.8$, 1, and 1.2) and proportions of noisy edges (ie, $(\mu_1,\mu_0)=(0.8,0.2)$ and $(0.9,0.1)$) as in the main analysis.

TABLE A1.

Edge-wise accuracy under varied SNRs and noises with $(|S_0|,|T_0|)=(100,80)$.

| $(q_1,q_2)$ | Method | Metric | θ = 0.8 | θ = 1.0 | θ = 1.2 |
|---|---|---|---|---|---|
| (0.9, 0.1) | IGDB | TPR | 0.9600 (0.0000) | 0.9600 (0.0000) | 0.9600 (0.0000) |
| | | TNR | 0.9998 (0.0000) | 0.9998 (0.0000) | 0.9998 (0.0000) |
| | pFDR | TPR | 0.9025 (0.0005) | 0.9029 (0.0006) | 0.9029 (0.0006) |
| | | TNR | 0.8747 (0.0003) | 0.8746 (0.0003) | 0.8747 (0.0003) |
| | Bonferroni | TPR | 0.6060 (0.0048) | 0.8692 (0.0021) | 0.8994 (0.0003) |
| | | TNR | 0.9326 (0.0002) | 0.9035 (0.0001) | 0.9001 (0.0000) |
| (0.8, 0.2) | IGDB | TPR | 0.9545 (0.0101) | 0.9598 (0.0024) | 0.9598 (0.0024) |
| | | TNR | 0.9998 (0.0001) | 0.9998 (0.0000) | 0.9998 (0.0000) |
| | pFDR | TPR | 0.8100 (0.0011) | 0.8100 (0.0011) | 0.8101 (0.0011) |
| | | TNR | 0.7598 (0.0004) | 0.7597 (0.0004) | 0.7597 (0.0004) |
| | Bonferroni | TPR | 0.5385 (0.0044) | 0.7724 (0.0018) | 0.7994 (0.0001) |
| | | TNR | 0.8652 (0.0003) | 0.8069 (0.0001) | 0.8001 (0.0001) |

Footnotes

CONFLICT OF INTEREST STATEMENT

The authors declare no potential conflict of interest.

SUPPORTING INFORMATION

Additional supporting information can be found online in the Supporting Information section at the end of this article.

DATA AVAILABILITY STATEMENT

The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions. Software in the form of Matlab code, together with a sample input data set and complete documentation, is available on GitHub at https://github.com/qwu1221/multi2multi.

REFERENCES

  • 1. Meisenzahl E, Koutsouleris N, Bottlender R, et al. Structural brain alterations at different stages of schizophrenia: a voxel-based morphometric study. Schizophr Res. 2008;104(1–3):44–60.
  • 2. Lee S, Viqar F, Zimmerman ME, et al. White matter hyperintensities are a core feature of Alzheimer’s disease: evidence from the dominantly inherited Alzheimer network. Ann Neurol. 2016;79(6):929–939.
  • 3. Savitz JB, Drevets WC. Imaging phenotypes of major depressive disorder: genetic correlates. Neuroscience. 2009;164(1):300–330.
  • 4. Ge T, Schumann G, Feng J. Imaging genetics-towards discovery neuroscience. Quant Biol. 2013;1(4):227–245.
  • 5. Liu J, Calhoun VD. A review of multivariate analyses in imaging genetics. Front Neuroinform. 2014;8:29.
  • 6. Nathoo FS, Kong L, Zhu H, Initiative ADN. A review of statistical methods in imaging genetics. Can J Stat. 2019;47(1):108–131.
  • 7. Smith SM, Douaud G, Chen W, et al. An expanded set of genome-wide association studies of brain imaging phenotypes in UK biobank. Nat Neurosci. 2021;24(5):737–745.
  • 8. Zhao B, Zhang J, Ibrahim JG, et al. Large-scale GWAS reveals genetic architecture of brain white matter microstructure and genetic overlap with cognitive and mental health traits (n= 17,706). Mol Psychiatry. 2019;26:3943–3955.
  • 9. Zhao B, Li T, Yang Y, et al. Common genetic variation influencing human white matter microstructure. Science. 2021;372(6548).
  • 10. Zhu H, Khondker Z, Lu Z, Ibrahim JG. Bayesian generalized low rank regression models for neuroimaging phenotypes and genetic markers. J Am Stat Assoc. 2014;109(507):977–990.
  • 11. Huang M, Nichols T, Huang C, et al. FVGWAS: fast voxelwise genome wide association analysis of large-scale imaging genetic data. Neuroimage. 2015;118:613–627.
  • 12. Huang C, Thompson P, Wang Y, et al. FGWAS: functional genome wide association analysis. Neuroimage. 2017;159:107–121.
  • 13. Ge T, Feng J, Hibar DP, Thompson PM, Nichols TE. Increasing power for voxel-wise genome-wide association studies: the random field theory, least square kernel machines and fast permutation procedures. Neuroimage. 2012;63(2):858–873.
  • 14. Ge T, Nichols TE, Ghosh D, et al. A kernel machine method for detecting effects of interaction between multidimensional variable sets: an imaging genetics application. Neuroimage. 2015;109:505–514.
  • 15. Hibar DP, Stein JL, Kohannim O, et al. Voxelwise gene-wide association study (vGeneWAS): multivariate gene-based association testing in 731 elderly subjects. Neuroimage. 2011;56(4):1875–1891.
  • 16. Stein JL, Hua X, Lee S, et al. Voxelwise genome-wide association study (vGWAS). Neuroimage. 2010;53(3):1160–1174.
  • 17. Chi EC, Allen GI, Zhou H, Kohannim O, Lange K, Thompson PM. Imaging genetics via sparse canonical correlation analysis. 2013 IEEE 10th International Symposium on Biomedical Imaging. New York: IEEE; 2013:740–743.
  • 18. Greenlaw K, Szefer E, Graham J, Lesperance M, Nathoo FS, Initiative ADN. A Bayesian group sparse multi-task regression model for imaging genetics. Bioinformatics. 2017;33(16):2513–2522.
  • 19. Hardoon DR, Ettinger U, Mourão-Miranda J, et al. Correlation-based multivariate analysis of genetic influence on brain volume. Neurosci Lett. 2009;450(3):281–286.
  • 20. Kong D, An B, Zhang J, Zhu H. L2RM: low-rank linear regression models for high-dimensional matrix responses. J Am Stat Assoc. 2020;115(529):403–424.
  • 21. Le Floch É, Guillemot V, Frouin V, et al. Significant correlation between a set of genetic polymorphisms and a functional brain network revealed by feature selection and sparse partial least squares. Neuroimage. 2012;63(1):11–24.
  • 22. Liu J, Pearlson G, Windemuth A, Ruano G, Perrone-Bizzozero NI, Calhoun V. Combining fMRI and SNP data to investigate connections between brain function and genetics using parallel ICA. Hum Brain Mapp. 2009;30(1):241–255.
  • 23. Wang H, Nie F, Huang H, et al. Identifying quantitative trait loci via group-sparse multitask regression and feature selection: an imaging genetics study of the ADNI cohort. Bioinformatics. 2012;28(2):229–237.
  • 24. Vounou M, Nichols TE, Montana G. Discovering genetic associations with high-dimensional neuroimaging phenotypes: a sparse reduced-rank regression approach. Neuroimage. 2010;53(3):1147–1159.
  • 25. Vounou M, Janousova E, Wolz R, et al. Sparse reduced-rank regression detects genetic associations with voxel-wise longitudinal phenotypes in Alzheimer’s disease. Neuroimage. 2012;60(1):700–716.
  • 26. Marcus DS, Harms MP, Snyder AZ, et al. Human connectome project informatics: quality control, database services, and data visualization. Neuroimage. 2013;80:202–219.
  • 27. Van Essen DC, Smith SM, Barch DM, et al. The WU-Minn human connectome project: an overview. Neuroimage. 2013;80:62–79.
  • 28. Jahanshad N, Kochunov PV, Sprooten E, et al. Multi-site genetic analysis of diffusion images and voxelwise heritability analysis: a pilot project of the ENIGMA–DTI working group. Neuroimage. 2013;81:455–469.
  • 29. Kochunov P, Jahanshad N, Sprooten E, et al. Multi-site study of additive genetic effects on fractional anisotropy of cerebral white matter: comparing meta and megaanalytical approaches for data pooling. Neuroimage. 2014;95:136–150.
  • 30. Kochunov P, Kochunov P. Heritability of fractional anisotropy in human white matter: a comparison of human connectome project and ENIGMA-DTI data. Neuroimage. 2015;111:300–311.
  • 31. Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4(1):7.
  • 32. Benjamini Y, Hochberg Y. On the adaptive control of the false discovery rate in multiple testing with independent statistics. J Educ Behav Stat. 2000;25(1):60–83.
  • 33. Efron B. Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction. Vol 1. Cambridge, UK: Cambridge University Press; 2012.
  • 34. Woo CW, Krishnan A, Wager TD. Cluster-extent based thresholding in fMRI analyses: pitfalls and recommendations. Neuroimage. 2014;91:412–419.
  • 35. Charikar M. Greedy approximation algorithms for finding dense components in a graph. International Workshop on Approximation Algorithms for Combinatorial Optimization. Berlin: Springer; 2000:84–95.
  • 36. Khuller S, Saha B. On finding dense subgraphs. International Colloquium on Automata, Languages, and Programming. Berlin: Springer; 2009:597–608.
  • 37. Amini AA, Chen A, Bickel PJ, Levina E. Pseudo-likelihood methods for community detection in large sparse networks. Ann Stat. 2013;41(4):2097–2122.
  • 38. Cheng Y, Church GM. Biclustering of expression data. Proc Int Conf Intell Syst Mol Biol. 2000;8:93–103.
  • 39. Xu M, Jog V, Loh PL. Optimal rates for community estimation in the weighted stochastic block model. Ann Stat. 2020;48(1):183–204.
  • 40. Zalesky A, Fornito A, Bullmore ET. Network-based statistic: identifying differences in brain networks. Neuroimage. 2010;53(4):1197–1207.
  • 41. Nichols TE. Multiple testing corrections, nonparametric methods, and random field theory. Neuroimage. 2012;62(2):811–815.
  • 42. Kochunov P, Rowland LM, Fieremans E, et al. Diffusion-weighted imaging uncovers likely sources of processing-speed deficits in schizophrenia. Proc Natl Acad Sci. 2016;113(47):13504–13509.
  • 43. Das S, Forer L, Schönherr S, et al. Next-generation genotype imputation service and methods. Nat Genet. 2016;48(10):1284–1287.
  • 44. Zou H, He D, Zhou Y. On sure screening with multiple responses. Stat Sin. 2021;31:1749–1777. doi: 10.5705/ss.202018.0462
  • 45. Chen S, Kang J, Xing Y, Zhao Y, Milton DK. Estimating large covariance matrix with network topology for high-dimensional biomedical data. Comput Stat Data Anal. 2018;127:82–95.
  • 46. Park MK, Hwang SH, Jung S, Hong SS, Kwon SB. Lesions in the splenium of the corpus callosum: clinical and radiological implications. Neurol Asia. 2014;19(1):79–88.
  • 47. Zheng Z, Huang D, Wang J, et al. QTLbase: an integrative resource for quantitative trait loci across multiple human molecular phenotypes. Nucleic Acids Res. 2020;48(D1):D983–D991.
  • 48. Verstynen TD, Weinstein AM, Schneider WW, Jakicic JM, Rofey DL, Erickson KI. Increased body mass index is associated with a global and distributed decrease in white matter microstructural integrity. Psychosom Med. 2012;74(7):682.
  • 49. Brzustowicz LM, Simone J, Mohseni P, et al. Linkage disequilibrium mapping of schizophrenia susceptibility to the CAPON region of chromosome 1q22. Am J Hum Genet. 2004;74(5):1057–1063.
  • 50. Kubicki M, Park H, Westin CF, et al. DTI and MTR abnormalities in schizophrenia: analysis of white matter integrity. Neuroimage. 2005;26(4):1109–1118.
  • 51. Storey JD. A direct approach to false discovery rates. J R Stat Soc Series B Stat Methodology. 2002;64(3):479–498.
  • 52. Li X, Li Z, Zhou H, et al. Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale. Nat Genet. 2020;52(9):969–983.
  • 53. Dhillon IS. Co-clustering documents and words using bipartite spectral graph partitioning. Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM; 2001:269–274.
