Integrative analysis of multi-omics and imaging data with incorporation of biological information via structural Bayesian factor analysis

Jingxuan Bao; Changgee Chang; Qiyiwen Zhang; Andrew J Saykin; Li Shen; Qi Long; for the Alzheimer’s Disease Neuroimaging Initiative

doi:10.1093/bib/bbad073

. 2023 Mar 7;24(2):bbad073. doi: 10.1093/bib/bbad073

Integrative analysis of multi-omics and imaging data with incorporation of biological information via structural Bayesian factor analysis

Jingxuan Bao ^1,^✉,², Changgee Chang ^2,^✉,², Qiyiwen Zhang ³, Andrew J Saykin ⁴, Li Shen ⁵, Qi Long ⁶; for the Alzheimer’s Disease Neuroimaging Initiative

PMCID: PMC10387302 PMID: 36882008

Abstract

Motivation

With the rapid development of modern technologies, massive data are available for the systematic study of Alzheimer’s disease (AD). Though many existing AD studies mainly focus on single-modality omics data, multi-omics datasets can provide a more comprehensive understanding of AD. To bridge this gap, we proposed a novel structural Bayesian factor analysis framework (SBFA) to extract the information shared by multi-omics data through the aggregation of genotyping data, gene expression data, neuroimaging phenotypes and prior biological network knowledge. Our approach can extract common information shared by different modalities and encourage biologically related features to be selected, guiding future AD research in a biologically meaningful way.

Method

Our SBFA model decomposes the mean parameters of the data into a sparse factor loading matrix and a factor matrix, where the factor matrix represents the common information extracted from multi-omics and imaging data. Our framework is designed to incorporate prior biological network information. Our simulation study demonstrated that our proposed SBFA framework could achieve the best performance compared with the other state-of-the-art factor-analysis-based integrative analysis methods.

Results

We apply our proposed SBFA model together with several state-of-the-art factor analysis models to extract the latent common information from genotyping, gene expression and brain imaging data simultaneously from the ADNI biobank database. The latent information is then used to predict the functional activities questionnaire score, an important measurement for diagnosis of AD quantifying subjects’ abilities in daily life. Our SBFA model shows the best prediction performance compared with the other factor analysis models.

Availability

Code are publicly available at https://github.com/JingxuanBao/SBFA.

Contact

qlong@upenn.edu

Keywords: structural Bayesian factor analysis, multi-omics, Alzheimer’s disease, biological network

Introduction

With the rapid advances in modern biotechnologies such as high-throughput genotyping and sequencing, and multimodal neuroimaging, researchers can investigate disease mechanisms from multiple perspectives simultaneously. Multi-omics data integration enables us to discover new insights into biological mechanism of complex diseases such as cancer and Alzheimer’s disease (AD) [1].

AD diagnosis prediction or AD-related cognitive score prediction can help uncover novel therapeutic targets for AD. Many studies seek to predict the AD diagnosis or cognitive scores using only imaging data [2–8] or only genetics data [9]. However, due to the pathological heterogeneity of AD [10, 11], overly relying on one type of data may undermine the power of analyses. To boost the power of the existing AD research, studying AD from both imaging and genetics/genomics perspectives may improve prediction accuracy and reduce false negative signals [12–17]. In our study, we proposed a novel structural Bayesian factor analysis (SBFA) framework to extract the common information shared by both imaging and genetics/genomics information, which can be used to guide future AD research in a biologically meaningful way.

Factor analysis is a popular statistical method for integrative analysis of multi-modal data. The factor analysis aims to describe variability among high-dimensional variables in terms of a substantially lower number of unobserved variables called latent factors. Many state-of-the-art factor-analysis-based methods have been proposed and applied to multi-omics data [18], for example to integratively define the subtypes of diseases such as breast cancer [19–23] and lung cancer [19, 20]. The iCluster [19] is a widely used factor analysis framework based on the joint latent variable model. It can flexibly model different types of genomics data in a single framework. It successfully models the tumor subtypes as latent variables integrating the information from multiple datasets. Later, Mo et al. [24] extended the iCluster to the iCluster+ framework by allowing the simultaneous integration of continuous and discrete datasets. Another highly cited integrative factor analysis method based on the non-negative matrix factorization is joint and individual variation explained (JIVE) factor analysis [25], which extends iCluster by adding a data-specific term [21]. A recently published integrative factor analysis method, structural learning and integrative decomposition of multi-view data (SLIDE) [26], represents each data view via shared and individual components. SLIDE has been successfully applied to multiple applications including exploratory dimension reduction, association analysis and consensus clustering.

There has been limited work on integrative analysis of multi-omics and imaging data in a principled modeling framework in AD studies. Some existing AD studies applied factor analysis models trying to define AD subtypes solely in terms of neuropathology [27, 28], molecular signatures [29, 30] or solely through imaging phenotype [31, 32]. Additionally, none of these studies takes prior biological network information into consideration. Network information encourages the selection of related features (e.g. genes belonging to the same pathway) and produces a more biologically interpretable result.

In AD studies, the interactions or associations within or across molecular pathways can be treated as network information. Many databases can provide pathway-level interactions and associations such as Integrative Multi-species Prediction [33] and Kyoto Encyclopedia of Genes and Genomes [34–36]. Some databases can provide tissue-specific interactions and associations such as the HumanBase [37–40]. Several existing studies achieved the incorporation of the biological network information using network-based penalties [41, 42], or proper priors in the Bayesian statistics framework [43–46]. One factor-analysis-based Bayesian integrative study of multi-omics data proposed a generalized Bayesian factor analysis (GBFA) framework by extending the iCluster+ framework [47]. They incorporate the prior biological network information by employing the Markov random field prior. However, one limitation of the GBFA framework is the phase transition problem [48, 49], which may require a massive hyperparameter tuning process.

In this paper, we propose an SBFA model that assigns the Laplace prior to the loading matrix for feature selection. It also assigns a log-normal prior to the shrinkage parameter in the Laplace prior and assigns a graph Laplacian prior [50, 51] to the precision matrix in the log-normal prior. Our model formulation enables smoothing shrinkage parameters for variables that are connected in the biological graph. Our major contributions are summarized as follows:

We propose a novel structural Bayesian factor analysis framework as a dimension reduction technique that yields latent factors by integrating the genotyping data, gene expression data, region-level brain imaging amyloid deposition data and biological network information. Our SBFA framework is biologically interpretable. As such, the common information shared among multi-omics data learned from our model can be used for various subsequent analyses.
Our model can handle both continuous (e.g. gene expression and imaging phenotypic traits) and discrete [e.g. genotyping values for single nucleotide polymorphisms (SNPs)] data simultaneously. We successfully incorporate prior biological information without suffering the phase transition problem of the existing GBFA framework [47].
Our work bridges the gap that there are few multi-omics studies applied to AD. We extracted more biologically meaningful information from multiple modalities, and our framework achieves the best prediction accuracy for the subsequent machine learning prediction problems compared with the other state-of-the-art factor-analysis-based methods.

The rest of the paper is organized as follows. We introduce our proposed method including model formulation and computation in Section 2. We present and discuss the analyses of real data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) in Section 3. We present our simulation experiment results in Section 4 and conclude the paper with some discussion remarks in Section 5.

Methodology

We propose a hierarchical Structural Bayesian Factor Analysis model named SBFA (Figure 1). Our model serves as a dimension reduction technique to extract common information from multi-modal data.

**Structural Bayesian factor analysis framework.** represents the number of multi-modal data sets with each dataset denoted as , . The concatenation of all data sets is denoted by . We assume both continuous and discrete underlying distribution for each dataset dominated by mean parameter , where is number of samples. represents the th column of factor matrix where . and represent the th row and th column of loading matrix and shrinkage parameter for Laplacian prior where and is the total number of features. represents the log-normal distribution. and denote the mean and precision matrix for log-normal prior, respectively.

Inline graphic — **Structural Bayesian factor analysis framework.** represents the number of multi-modal data sets with each dataset denoted as , . The concatenation of all data sets is denoted by . We assume both continuous and discrete underlying distribution for each dataset dominated by mean parameter , where is number of samples. represents the th column of factor matrix where . and represent the th row and th column of loading matrix and shrinkage parameter for Laplacian prior where and is the total number of features. represents the log-normal distribution. and denote the mean and precision matrix for log-normal prior, respectively.

Notation

All vectors and matrices in this paper are denoted by bold lowercase letters or bold Greek letters. We use unbold lowercase letters or Greek letters to denote the corresponding elements of the vector/matrix. The columns and rows of the matrix are indicated by the subscription. For example, Inline graphic where denotes the th element of vector of length ; where and denotes the element at the th row and th column of the matrix with dimension ; represents the th column of matrix and the represents the th row of the matrix .

Structural Bayesian factor analysis model

Suppose we have multi-modal data with Inline graphic modalities generated from different technologies. We denote each data modality by for , where can be any omics data such as genotyping data, gene expression data, DNA methylation data and metabolomics data. Additionally, it can be imaging phenotypic traits such as brain gray matter density, brain volume, protein aggregates and metabolism level in a specific area of the brain. We denote the vertical concatenation of the multi-omics data to be

where Inline graphic and is number of features in dataset , representing the th feature, and representing the th subject.

We assume that each feature of the dataset Inline graphic has the same underlying distribution family governed by the mean parameter for each sample . Under the assumption that every subject is independent of each other and every omic dataset is independent of each other conditioning on , we can write the following likelihood function:

Our SBFA framework can integrate multi-modal datasets with both continuous and discrete data types simultaneously, including Gaussian data, binomial data and negative binomial data.

Gaussian distribution:

Continuous is assumed to be normally distributed with mean and precision ,
where a gamma prior will be assigned to .
Binomial distribution:

If the random variable follows a binomial distribution with parameters trials and success probability
we will use the logit model with probability mass function (pmf) shown as follows:
(1)
The parameter is determined by data. Bernoulli distribution is a special case for the binomial distribution with to be 1.
Negative binomial distribution:

To account for over-dispersion in counting data, we accommodate the negative binomial distribution
(2)
where parameter is provided by data. The geometric distribution is a special case for the negative binomial distribution with to be 1.

The mean parameter Inline graphic are modeled by the generalized factor analysis framework as

which, in matrix form, can be written as follows:

where Inline graphic is the latent factor shared by all modalities; is the factor loading matrix; is the th row of and is the th column of ; , and . Our goal is to estimate a biologically meaningful low-rank representation () of to reduce the dimension of original multi-modal data while preserving the common information shared by multiple modalities. Such dimension reduction can reduce computational expenses and prevent the potential overfitting issue.

Inline graphic represents a low-dimensional latent driver for all omics and imaging modalities and can be used for subsequent analysis in the applications on AD. For example, it can be used to cluster the subjects to define different AD subtypes. On the other hand, it can also serve as a predictor to predict the diagnosis-related cognitive traits such as the functional assessment questionnaire (FAQ) in the ADNI database, which will be described in Section 3.

Priors

Priors for W and Z

To obtain sparse estimates for Inline graphic , we employ the Laplace prior for

where Inline graphic is the shrinkage parameter. The prior distributions for the entries in are the standard Gaussian distribution

Prior for : incorporating graphical knowledge

One key contribution of our work is to incorporate biological knowledge Inline graphic via a structured prior for the shrinkage parameter . In our setting, the graph information for each omics dataset is represented by , where denotes the nodes and denotes the edges. The presence of an edge indicates that the corresponding pair of nodes are correlated. We denote the Inline graphic as the adjacency matrix representation of . We combine the to get the graph information for the multiomics data by diagonally concatenating the adjacency matrix for each omics data . Now, the graph information of multiomics data can be represented by where with and with referring to the index of the Inline graphic th feature of modality in the concatenated graph . Such a setup can help us to achieve the goal that given independent latent factors, the only way a pair of variables can be correlated conditionally on the rest is that they must load at least one common factor. Mathematically, if and Inline graphic are connected in and for some , then is encouraged to be nonzero. To this end, we employed a log-normal prior to with graph Laplacian on the precision matrix

(3)

where scalar Inline graphic , the mean of , is a tuning parameter that controls the overall shrinkage level of . Scalar , variance of , controls the extent of individual shrinkage parameters adapted to and . If , we have and the prior reduces to the Bayesian lasso. As increases, are more adapted to and . In our study, we fix the relative variance, known as the adaptivity, to one to allow sufficient variability for Inline graphic for all shrinkage levels [47], i.e.

or equivalently,

The intuition behind the choice of log-normal prior is as follows: (i) we employ the log-normal distribution for Inline graphic to incorporate the graph information via correlation and (ii) the shrinkage parameter needs to be positive.

Prior for : incorporating biological knowledge

We use the graph Laplacian prior [45] for Inline graphic to incorporate the graph information , which is a symmetric and diagonally dominant matrix

where the elements in matrix Inline graphic , , are assigned by the following prior:

where Inline graphic is the Dirac delta function concentrated at and is the indicator function. represents all connected edges in prior biological graph information. In real data application, the graph information can be summarized from gene–gene interaction networks for genetics data and from brain functional pathway information for neuroimaging modality.

There are several mathematical intuitions behind the choice of Inline graphic . First of all, the diagonal dominance of the precision matrix ensures its positive definiteness. Second, to ensure the positiveness of on the connected pairs, we incorporate a gamma-like distribution on all the connected pairs and on those pairs without an edge. The overall tendency of positiveness of the variance-covariance matrix ensures that increasing Inline graphic for some encourages an increase of the corresponding correlation.

Model for heterogeneous data types

Our model can handle heterogeneous data types simultaneously. In our framework, we use the exponential family for modeling heterogeneous data types in Inline graphic . For example, gene expression and neuroimaging data can be modeled as continuous data with normal distribution. SNP data can be modeled as two Bernoulli random variables where each SNP feature can be represented by the presence of homozygous major allele and the presence of homozygous minor allele, respectively.

With three different underlying data distributions (Gaussian, binomial, negative binomial), we can integrate the discrete exponential family random variables by unifying the likelihood function via the following identity [52]:

where Inline graphic and ( Pólya-Gamma distribution). By applying the identity to the binomial (Eq. (1)) and negative binomial (Eq. (2)) likelihood functions, we obtain the unified likelihood function:

where the unknown parameters are shown in Table 1. Now, the full posterior density can be written as

Table 1.

Parameters for unified likelihood.

Data Type
Gaussian		0	NA
Binomial	0
Negative Binomial	0

Open in a new tab

where Inline graphic .

Feature representation

In our model, new feature representation is learned from genotyping, gene expression and imaging datasets with biological graph information. The high dimensional information is summarized into the low dimensional latent variable Inline graphic for subsequent analyses.

For example, our SBFA framework estimated low dimensional latent features, which can serve as a predictor for machine learning algorithms. For unsupervised learning such as clustering, the low dimensional features derived from our algorithm are considered to be less noisy. We can apply the existing clustering algorithm to cluster the subjects based on the new feature representation. Supervised learning such as prediction often suffers from overfitting with high dimensional input data. With the low dimension feature representation derived from our algorithm, we can reduce the risk of overfitting. Compared with the other dimension reduction techniques, our SBFA is more biologically interpretable.

Parameter estimation and computation

We estimate the factor loading matrix Inline graphic and the location parameter by finding the mode of their marginal posterior density, i.e. the maximum a posteriori (MAP) estimator. Specifically, denote the parameters to be estimated by and the nuisance parameters by . Then, the MAP estimator for and is defined as

where Inline graphic is the (joint) posterior density. To find the MAP, we could utilize the expectation maximization (EM) algorithm [53], which iteratively maximizes conditional expectations as follows:

where Inline graphic denotes the expectation with respect to the conditional distribution .

Unfortunately, the conditional expectation of Inline graphic in our model does not have a closed-form solution. Therefore, we choose to use the variational EM algorithm [54, 55], where the conditional distribution is approximated by a more tractable distribution family parameterized by . The variational parameter can be determined by minimizing the Kullback–Leibler (KL) divergence

where Inline graphic denotes the expectation under . Thus, the variational EM is equivalent to maximizing

which is called the evidence lower bound (ELBO) [55] as it always satisfies Inline graphic , with respect to and .

In our model, we use a product measure on Inline graphic as the variational (approximating) measure, i.e. where

where Inline graphic ’s, ’s, ’s ’s, ’s, ’s and ’s are variational parameters. Note that is called tilted Pólya-Gamma distribution.

E-step for : We update the variational parameters Inline graphic and by matching the form of the equations between the expectation for log variational measure and the expectation for log posterior measure

where Inline graphic and ; is the mean of the variational measure for the th element of ; is the th row and th column of the covariance matrix of the variational measure for .

We then update the expectation of the Inline graphic using the mean formula for gamma distribution by

for Inline graphic and . Now, we have a graph Laplacian matrix

E-Step for : Since there exists an Inline graphic term in the log posterior distribution but not in the log variational measure, we cannot update the variational parameters by matching the form of equations as we did when updating the (E-step for ). Instead, we need to maximize the ELBO function and the results give us that

where Inline graphic , , represents the Lambert W function [56] and

and Inline graphic is an operator taking the diagonal elements of a square matrix as a column vector; represents the th row and th column element of the matrix ; represents the mean matrix for ; represents the covariance matrix for th column of ; represents the th element of the vector .

The update rule for the covariance matrix of Inline graphic , i.e., , is separated into two parts. In the first part, we update the off-diagonal element of the by

(4)

for each Inline graphic , where

The Inline graphic operator defines the trace of the matrix, and the minus sign in the subscript represents the matrix with the corresponding row or column removed, or vector with the corresponding element removed.

In the second part, we update the diagonal element of the Inline graphic as follows:

(5)

for each Inline graphic , where the r-Lambert function is defined as the inverse of the function [57] and

where Inline graphic is the inverse operator for a square matrix; is the th row and th column of the loading matrix . We iteratively update for each and .

E-step for : We update the precision matrix of the Inline graphic th column of factor matrix , i.e. , by matching the form of equations between log posterior distribution and the log variational measure. For each , we have

where Inline graphic is identity matrix; is the mean for parameter and is the th row of loading matrix . For each , we update the mean parameter for by

E-step for : By matching the form of expectation of log posterior and expectation of log variational measure, the variational parameter Inline graphic for each and for can be updated by

where Inline graphic is the th row of loading matrix and is the variance-covariance matrix for variational measure of .

If Inline graphic is a discrete random variable, we have

whereas if Inline graphic follows Gaussian distribution, we have

M-step for : We update the loading matrix row by row via maximizing the ELBO function with respect to Inline graphic for each . The optimization problem is solved by the dynamic weighted lasso algorithm [58].

M-step for : The location parameter Inline graphic is updated by maximizing the ELBO function with respect to for each

Initialization

For the initialization, we define the matrix Inline graphic as an approximation of the mean matrix as follows [47]:

The location vector Inline graphic can be initialized by a fixed pre-specified constant vector. If not, for each , can be initialized by the trimmed mean, median or mean of .

In our setting, we handle the problems by choosing Inline graphic and matrices close to orthogonal matrices via singular value decomposition. Specifically, we have

where Inline graphic and are orthonormal matrices and is diagonal matrix.

Hyperparameter tunning

We used the Bayesian information criterion (BIC) to tune the hyperparameters. We select the hyperparameters with the lowest BIC criteria

where Inline graphic is the estimated mean matrix of ; is the number of subjects and denotes the degrees of freedom, which is the number of non-zero elements in estimated factor loading matrix. We try different sets of hyperparameters () in both simulation and real data analysis.

Analysis of ADNI Data

To demonstrate the usefulness of our proposed method in real data analysis, we aimed to predict the FAQ cognitive score using the latent factors learned through integrative analysis of the genotyping, gene expression, brain regional level amyloid deposition, gene-gene interaction network and brain functional network data from the ADNI biobank database. We also performed the same analysis using existing factor analysis models (JIVE [25], SLIDE [26], iCluster+ [24], GBFA [47]) as the benchmark.

Data description and data preprocessing

The genotyping data, gene expression data, demographic data and imaging data used in the preparation of this article were obtained from the ADNI database (adni.loni.usc.edu) [59–61]. The ADNI was launched in 2003 as a public–private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging, positron emission tomography (PET), other biological markers and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early AD. Up-to-date information about the ADNI is available at www.adni-info.org.

After data preprocessing steps and the common subjects matching in each omics data, we have 604 subjects left with main characteristics summarized in Table 2.

Table 2.

Participant characteristics. Total number of subjects, age and sex are shown in this table. The mean Inline graphic sd for the age of all subjects within each diagnosis group is reported. The number of male/female subjects within each diagnosis group is also shown.

Diagnosis	CN	EMCI	LMCI	AD	Overall
Number	199	198	165	42	604
Age ()
Sex (M/F)

Open in a new tab

Demographic data preprocessing and cognitive score

Our demographic data and the cognitive score are extracted from Quantitative Templates for the Progression of Alzheimer’s Disease project in the Alzheimer’s Disease Modeling Challenge (www.pi4cs.org/qt-pad-challenge). We use baseline age, gender, education and the presence of APOE4 as our covariates. A test of the subjects’ daily living activities, the FAQ [62], is extracted as our dependent variable of real data analysis. All subjects with missing values are excluded from our study.

Genotyping data preprocessing

For the genotyping data, we performed quality control (QC) using the following criteria: genotyping call rate > 95%, minor allele frequency > 5% and Hardy–Weinberg Equilibrium > Inline graphic . After the QC, rs429358 (APOE) SNP was added to the genotyping data. We prioritized the SNP markers based on the existing findings from the major AD GWAS articles [63–67]. Our raw data were obtained by additive recoding the number of minor alleles (–recode A in PLINK1.9 [68]). Given that we want to model our genotyping data as Bernoulli distribution, we further recoded the genotyping data by the presence of homozygous major allele and the presence of homozygous minor allele, which doubled the number of input features for the genotyping dataset.

Gene expression data preprocessing

Gene expression data were produced using Affymetrix Human Genome U219 Array (Affymetrix, Santa Clara, CA) for expression profiling where the blood sample extraction, hybridization and QC process could be found in [59]. In the raw data, one gene can correspond to multiple probes. In our real data analysis, we averaged all the probes within each gene across the subjects to obtain the corresponding gene expression. We prioritize the AD-related genes according to Disease Gene Network (DisGeNet) [69]. We extract all the genes that have gene–disease association scores [69] greater than 0.15.

Imaging data preprocessing

For the neuroimaging data, participants with complete [ Inline graphic F]florbetapir (AV45) PET data (measuring amyloid burden) were included in our analysis. The AV45 PET scans were registered to the Montreal Neurological Institute space, and the standard uptake value ratio was computed by intensity normalization using the cerebellar curs reference region. Region-of-interest-level (ROI-level) AV45 measures were extracted based on the Glasser atlas [70], where 360 ROI-level quantitative traits (QTs) were obtained by averaging all the voxel-level measures within each ROI.

Biological network

We download the tissue-specific gene–gene interaction network from GIANT [40] in HumanBase database (hb.flatironinstitute.org). The imaging network is obtained from Glasser et al. [70], which utilizes a supervised machine learning classifier to automatically delineate and identify each cortical area from its neighbors across a large majority of individual subjects based on multi-modal information.

Real data analysis

In our real data analysis, we evaluate our SBFA framework by predicting the FAQ score using our extracted latent factors, which can be considered as a low-dimensional representation of the multi-omics data sets. Our real data analysis pipeline is shown in Figure 2.

Our framework starts from the dimention reduction of multi-omics data including indicator variables for homozygous dominant genotyping data Inline graphic , indicator variables for homozygous recessive genotyping data , AV45 imaging measurements and gene expression . Specifically, and are binary matrices with dimension , the entries of which are assumed to follow the Bernoulli distribution. and are normalized QTs with dimension and Inline graphic , which are assumed to be normally distributed.

We estimate the latent factors from the multi-omics and imaging dataset using our proposed SBFA method, in comparison with several existing methods including iCluster+ [19, 71], JIVE [25], SLIDE [26] and GBFA [47]. For the hyperparameters, we fix Inline graphic and tune . The best tuning results are obtained by minimizing the BIC criteria, which is presented in Figure 2. The best-tuned is indicated by , which is the number of columns of the corresponding latent factors. For our proposed SBFA model, the best-tuned hyperparameter set is . We then apply linear regression with lasso, ridge and elastic net regularizations models to predict the FAQ score using the latent factors learned from the aforementioned methods as predictors. For comparison purposes, we also include three other approaches that use (i) the demographic covariates, (ii) the multi-omics and imaging data and (iii) the multi-omics and imaging data together with the demographic covariates as the input of machine learning models, respectively; see Table 3.

Table 3.

Real data performance. The averaged MSE between the predicted FAQ score and the true FAQ score is reported across 100 random splittings. All standard errors are rounded to 0 after preserving 3 digits. For comparison purposes, we also include three other approaches that use (A) the demographic covariates, (B) the multi-omics and imaging data and (C) the multi-omics and imaging data together with the demographic covariates as the input of machine learning models.

FeatureMethod	Lasso			Ridge			ElasticNet
	Train	Validation	Test	Train	Validation	Test	Train	Validation	Test
Raw data and clinical variables
Demographic covariates	0.626	0.955	0.953	0.592	0.934	0.949	0.596	0.938	0.945
Multi-omics and imaging	0.626	0.958	0.959	0.334	0.883	0.921	0.516	0.864	0.940
Multi-omics, imaging and covariates	0.622	0.957	0.956	0.333	0.886	0.933	0.517	0.864	0.940
Latent factors
JIVE	0.638	0.964	0.963	0.638	0.964	0.963	0.638	0.964	0.963
SLIDE	0.638	0.964	0.963	0.478	0.920	0.908	0.638	0.963	0.963
iCluster+	0.638	0.964	0.963	0.637	0.974	0.961	0.638	0.964	0.963
GBFA	0.528	0.894	0.918	0.452	0.803	0.911	0.483	0.842	0.918
SBFA (proposed)	0.538	0.896	0.893	0.454	0.850	0.881	0.466	0.853	0.895

Open in a new tab

For our training process, we first evenly split the subjects into five folds where 4-folds are treated as the training-validation set and 1-fold is treated as the testing set. Next, we apply 4-fold cross-validation in our training-validation set to select the best tuning hyperparameters according to the mean squared error (MSE). The best tuning models are obtained from the validation set and applied to the testing set to get the testing performance. We repeat the splitting and tuning process 100 times by randomly splitting the data 100 times with 100 different random seeds. We summarized our real data analysis results in Table 3. The real data analysis performance is reported using the mean of the training, validation and testing MSE across 100 random splits. The standard errors for the training and testing MSE are rounded to zero after preserving three digits.

As shown in Table 3, compared with the results using original multi-omics and imaging data and clinical variables, our SBFA method does a better job in mitigating the overfitting problem. In particular, the three prediction models trained using estimated factors from our SBFA method outperform the models trained using the original multi-omics, imaging and demographic variables, in spite of the fact that our factor analysis used only multi-omics and imaging data, not demographic variables. Compared with other existing factor analysis methods, our SBFA method yields the best performance (smallest MSE) across all comparisons (i.e. linear regression model with lasso, ridge and elastic net regularizations).

Simulation Study

We compared the performance of our proposed SBFA framework with other factor analysis frameworks including JIVE [25], SLIDE [26] and GBFA model with spike and slab prior and Markov random field prior [47] in the simulation study.

Simulation design

We evaluated our proposed SBFA framework in two different simulation scenarios—low dimensional scenario and high dimensional scenario with the number of features Inline graphic and , respectively. We used the same number of samples in both simulation scenarios with . To simulate multi-omics data, we used the following model:

(6)

where Inline graphic , , and . We choose the ground truth latent variable and the number of datasets .

We design our ground-truth loading matrix Inline graphic as shown in Figure 3, where each row represents a pathway (). The shaded areas are generated by continuous uniform distribution with and every non-zero element of the loading matrix has a probability of to flip its sign.

Groundtruth setting for factor loading matrix . Numbers labeled on each column represent the latent variable. Numbers labeled on rows represent the number of rows from the first row to the corresponding row. All shaded area represents elements generated by with probability to be multiplied by . All the other area contains only zero elements.

The ground truth factor matrix Inline graphic is simply generated by

where Inline graphic represents the zero vector, and represents the identity matrix.

The ground truth mean parameter for data Inline graphic , i.e. , is obtained by

where we assume location parameter Inline graphic . For the first setting in each simulation scenario, we assume only Gaussian distribution with mean and standard deviation for all . For the second setting in each simulation scenario, we assume the first two omic datasets with features following Bernoulli distribution with each element having probability Inline graphic to be . It mimics the data structure for the allele homozygosity encoded genotyping dataset. For the second two omics datasets with features, we assume that they are generated by a binomial distribution with trail parameter independently sampled from the set . For each feature and each subject, we assume the success probability parameter Inline graphic . For the third two omics datasets with features, we assume that they are generated to mimic the gene expression and imaging data, which follows the Gaussian distribution with mean and standard deviation .

For the graph information, we evaluate our algorithm with Inline graphic different settings:

Within-pathway edges only: For the first graph information setting, we randomly generate edges within each pathway and do not add across pathway edges (i.e. in total, we have within pathway edges for features). This is a simulation of an ideal case.
Within-pathway edges and cross pathway edges: For the second graph information setting, besides the within pathway edges generated in the first setting, we also randomly generate cross-pathway edges within each modality (i.e. in total, we have within pathway edges and cross-pathway edges for features).

The graph with within-pathway edges only represents the ideal case where the graphical structure coincides with the sparsity structure in Inline graphic , while the cross-pathway edges mimic the real data.

Results

In our simulation study, we seek to reconstruct the mean matrix from the extracted low-rank representation. The performance is evaluated by the relative reconstruction error (RRE) between estimated Inline graphic and true using Frobenius norm

(7)

where Inline graphic denotes the ground truth mean parameter; represents the estimated mean parameter.

To reduce the randomness of the performance, we generate Inline graphic different datasets and apply our proposed algorithm to obtain the new feature representation for each dataset. We fixed and tuned the parameters using BIC. We compare our SBFA framework with four different baseline factor analysis algorithms including JIVE, SLIDE, iCluster+ and GBFA.

We first evaluate our results by calculating the mean and the standard error of the RRE across Inline graphic simulation datasets. It is worth noting that the best-tuned based on BIC is always equal to or larger than the ground truth. For those greater than the ground truth, compared with the other baseline methods, the estimated loadings by SBFA for the extra latent dimension(s) are exactly 0 except for a small number of nonzero entries (supplementary information Figure 1). The best-tuned results for each method and each simulation setting are summarized in Tables 4 and 5. Our proposed SBFA model always outperforms the other baseline methods in both the Gaussian distribution setting (Table 4) and the mixture distribution setting (Table 5).

Figure e1 — Estimated factor loadings for the simulated dataset. The heatmap shows the factor loadings from iCluster, JIVE, SLIDE and GBFA without graph information (GBFA(NE)), GBFA with graph information (GBFA(E)), SBFA without graph information (SBFA(NE)), SBFA with graph information (SBFA(E)) and the ground truth. The results are derived from one simulated dataset in the low-dimensional scenario (). The optimal latent dimension () for each model is indicated by the number of circles in the corresponding method track. Numbers beyond the range of () have the same color as the boundaries ( and 1).

Table 4.

Simulation performance results for Gaussian only setting. ‘Dimension’ denotes our simulation scenario where ‘Low’ indicates the simulation scenario with Inline graphic and ‘High’ indicates the simulation scenario with . ‘Edge’ represents different graph information where ‘Within’ denotes the ideal case that we use the graph information which only connects the features within each pathway, and ‘Cross’ denotes the case that we use the graph information that connects the features not only within each pathway but also some randomly selected nodes between pathways within each modality. ‘RRE (mean)’ is the average of the reconstruction error between estimated Inline graphic and the ground truth ratio to the true across 100 simulations. ‘RRE (se)’ is the standard error of the reconstruction error between estimated and the ground truth across 100 simulations.

Method	Dimension	Edge	RRE (mean)	RRE (se)
JIVE	low		0.236552	0.006583
SLIDE	low		0.167271	0.000722
iCluster	low		0.221568	0.000935
GBFA ()	low	Within	0.198160	0.000848
GBFA ()	low	Within	0.190166	0.001088
SBFA (No )	low	Within	0.182882	0.000743
SBFA (With )	low	Within	0.180844	0.000699
GBFA ()	low	Cross	0.198160	0.000848
GBFA ()	low	Cross	0.185953	0.001015
SBFA (No )	low	Cross	0.182882	0.000743
SBFA (With )	low	Cross	0.181908	0.000704
JIVE	high		0.180514	0.001741
SLIDE	high		0.154596	0.000727
iCluster	high		0.188864	0.000806
GBFA ()	high	Within	0.166760	0.000663
GBFA ()	high	Within	0.166246	0.000772
SBFA (No )	high	Within	0.153240	0.000686
SBFA (With )	high	Within	0.151277	0.000665
GBFA ()	high	Cross	0.166760	0.000663
GBFA ()	high	Cross	0.158246	0.000882
SBFA (No )	high	Cross	0.153240	0.000686
SBFA (With )	high	Cross	0.152455	0.000672

Open in a new tab

Table 5.

Simulation performance results for mixture distribution setting. ‘Dimension’ denotes our simulation scenario where ‘Low’ indicates the simulation scenario with Inline graphic and ‘High’ indicates the simulation scenario with . ‘Edge’ represents different graph information where ‘Within’ denotes the ideal case that we use the graph information which only connects the features within each pathway, and ‘Cross’ denotes the case that we use the graph information that connects the features not only within each pathway but also some randomly selected nodes between pathways within each modality. ‘RRE (mean)’ is the average of the reconstruction error between estimated Inline graphic and the ground truth ratio to the true across 100 simulations. ‘RRE (se)’ is the standard error of the reconstruction error between estimated and the ground truth across 100 simulations.

Method	Dataset	Edge	RRE (mean)	RRE (se)
JIVE	low		1.539543	0.014274
SLIDE	low		1.519234	0.010254
iCluster	low		1.407917	0.011129
GBFA ()	low	Within	0.220563	0.000961
GBFA ()	low	Within	0.220918	0.000993
SBFA (No )	low	Within	0.197850	0.000877
SBFA (With )	low	Within	0.196208	0.000864
GBFA ()	low	Cross	0.220563	0.000961
GBFA ()	low	Cross	0.217769	0.001006
SBFA (No )	low	Cross	0.197850	0.000877
SBFA (With )	low	Cross	0.199828	0.000874
JIVE	high		1.551844	0.007424
SLIDE	high		1.508521	0.006791
iCluster	high		1.397968	0.007480
GBFA ()	high	Within	0.190666	0.000886
GBFA ()	high	Within	0.188583	0.000733
SBFA (No )	high	Within	0.175308	0.000773
SBFA (With )	high	Within	0.173823	0.000733
GBFA ()	high	Cross	0.190666	0.000886
GBFA ()	high	Cross	0.189722	0.000890
SBFA (No )	high	Cross	0.175308	0.000773
SBFA (With )	high	Cross	0.177781	0.000764

Open in a new tab

In the Gaussian distribution setting (Table 4), without incorporating the graph information, our proposed model (SBFA No Inline graphic ) shows significant improvement compared with JIVE, SLIDE, iCluster+ and GBFA without graph information (GBFA ). By taking the graph information into consideration, our proposed model (SBFA With ) can further achieve a smaller RRE and significantly outperforms the GBFA model (GBFA ) both with and without cross pathway noise.

In the mixture distribution setting (Table 5), JIVE, SLIDE and iCluster yield a large RRE compared with GBFA and SBFA models. This is expected because those methods handle the discrete data by treating them as continuous random variables. Although the GBFA model and our proposed SBFA model can both handle discrete and continuous data simultaneously, our SBFA model outperforms the GBFA model in both low-dimensional simulation and high-dimensional simulation scenarios with or without incorporating the graph information.

In summary, our proposed SBFA model can handle discrete and continuous data simultaneously. Our simulation study demonstrates that our proposed SBFA model can outperform the most state-of-the-art factor analysis models. Moreover, our SBFA model can incorporate the biological graph information as prior knowledge so that our model can produce biologically interpretable results.

Conclusion

In this article, we proposed a structural Bayesian factor analysis model which decomposes the underlying mean parameters of the data into a sparse factor loading matrix and factor matrix. Our model employed the Laplace prior to achieve the sparse estimation of the loading matrix and used a graph Laplacian prior to incorporate the biological network information. Our method can handle both discrete and continuous data simultaneously. We derived an efficient variational EM algorithm to obtain the MAP estimators of the parameters of interest.

Our SBFA framework is a useful dimension reduction technique that yields latent factors that are biologically interpretable. As such, the common information shared among multi-modal data learned from our model can be used for various downstream machine learning tasks including, but not limited to, prediction. For example, another potential application is to use learned factors for clustering which can be used to define disease subtypes based on genotyping data, imaging modalities and biological pathways.

Our simulation study demonstrates that our proposed SBFA framework achieves the best performance compared with the other state-of-the-art factor-analysis-based integrative methods. Our work bridges the gap that there are few multi-omics studies applied to AD. We used the multi-omics data from the ADNI dataset to predict the FAQ score, which is an important evaluation for the diagnosis of AD. The real data analysis results show the advantages of our framework by comparing the prediction performance between our SBFA framework and other state-of-the-art factor-analysis-based integrative methods. Our method exhibited the best prediction performance.

Key Points

We propose a novel structural Bayesian factor analysis framework as a dimension reduction technique that yields latent factors by integrating the genotyping data, gene expression data, region-level brain imaging amyloid deposition data and biological network information. Our SBFA framework is biologically interpretable. The common information shared among multi-omics data learned from our model can be used for various subsequent analyses such as supervised and unsupervised machine learning.
Our structural Bayesian factor analysis model can handle both continuous (e.g. gene expression and imaging phenotypic traits) and discrete [e.g. genotyping values for single nucleotide polymorphisms (SNPs)] datatypes simultaneously.
Our work bridges the gap that there are few multi-omics studies applied to Alzheimer’s disease. We applied our framework to extract biologically meaningful information shared by multiple modalities in the ADNI database. We performed the subsequent machine learning analyses to predict the Alzheimer’s disease-related cognitive score using the extracted latent factors. Our structural Bayesian factor analysis model achieves the best prediction accuracy for the subsequent machine learning prediction problems compared with the other state-of-the-art factor-analysis-based methods.

Acknowledgements

This study was partly supported by NIH grants RF1AG063481, R01AG071174, R01 AG071470, U01 AG066833 and U01 AG068057. Data used in this study were obtained from the ADNI database (adni.loni.usc.edu), which was funded by NIH U01AG024904.

Jingxuan Bao is a Ph.D. student co-advised by Dr. Li Shen and Dr. Qi Long at the University of Pennsylvania Perelman School of Medicine.

Changgee Chang is an assistant professor at Indiana University School of Medicine.

Qiyiwen Zhang is a postdoctoral fellow in Dr. Qi Long’s lab at the University of Pennsylvania Perelman School of Medicine.

Andrew J. Saykin is the director of the Center for Neuroimaging and Indiana Alzheimer's Disease Research Center. He is a professor at Indiana University School of Medicine.

Li Shen is a Professor of Informatics at the University of Pennsylvania Perelman School of Medicine.

Qi Long is a Professor of Biostatistics at the University of Pennsylvania.

Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.ucla.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: {http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf}.

Contributor Information

Jingxuan Bao, Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, 19104, PA, USA.

Changgee Chang, Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, 19104, PA, USA.

Qiyiwen Zhang, Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, 19104, PA, USA.

Andrew J Saykin, Department of Radiology and Imaging Sciences, Indiana University, Indianapolis, 46202, IN, USA.

Li Shen, Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, 19104, PA, USA.

Qi Long, Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, 19104, PA, USA.

Data availability

The source code is available through GitHub (https://github.com/JingxuanBao/SBFA). The genotyping data, gene expression data, demographic data, and imaging data used in the preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. The data are available after application through the ADNI website (http://adni.loni.usc.edu).

Author contributions statement

Jingxuan Bao and Dr. Changgee Chang take full responsibility for the integrity of the data and the accuracy of the data analysis.

Study concept and design: Jingxuan Bao, Dr. Changgee Chang and Dr. Qi Long.

Acquisition, analysis or interpretation of data: Jingxuan Bao, Dr. Changgee Chang, Dr. Andrew J. Saykin, Dr. Li Shen and Dr. Qi Long.

Drafting of the manuscript: Jingxuan Bao, Dr. Changgee Chang and Dr. Qi Long.

Critical revision of the manuscript for important intellectual content: All authors.

Study supervision: Dr. Changgee Chang and Dr. Qi Long.

References

1. Subramanian I, Verma S, Kumar S, et al. Multi-omics data integration, interpretation, and its application. Bioinform Biol Insights 2020; 14:1177932219899051. [DOI] [PMC free article] [PubMed] [Google Scholar]
2. Trambaiolli LR, Lorena AC, Fraga FJ, et al. Improving alzheimer’s disease diagnosis with machine learning techniques. Clin EEG Neurosci 2011; 42(3): 160–5. [DOI] [PubMed] [Google Scholar]
3. Kim M, Bao J, Liu K, et al. Structural connectivity enriched functional brain network using simplex regression with graphnet. In: Machine Learning in Medical Imaging. Cham: Springer International Publishing, 2020, 292–302. [DOI] [PMC free article] [PubMed] [Google Scholar]
4. Kim M, Bao J, Liu K, et al. A structural enriched functional network: an application to predict brain cognitive performance. Med Image Anal 2021; 71:102026. [DOI] [PMC free article] [PubMed] [Google Scholar]
5. Kochunov P, Zavaliangos-Petropulu A, Jahanshad N, et al. A white matter connection of schizophrenia and Alzheimer’s disease. Schizophr Bull 2020; 47(1): 197–206. [DOI] [PMC free article] [PubMed] [Google Scholar]
6. Sendi MSE, Zendehrouh E, Miller RL, et al. Alzheimer’s disease projection from normal to mild dementia reflected in functional network connectivity: a longitudinal study. Front Neural Circuits 2021; 14: 593263. [DOI] [PMC free article] [PubMed] [Google Scholar]
7. Sendi MSE, Zendehrouh E, Fu Z, et al. Disrupted dynamic functional network connectivity among cognitive control networks in the progression of alzheimer’s disease. Brain Connect 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
8. Wan J, Zhang Z, Rao BD, et al. Identifying the neuroanatomical basis of cognitive impairment in alzheimer’s disease by correlation- and nonlinearity-aware sparse bayesian learning. IEEE Trans Med Imaging 2014; 33(7): 1475–87. [DOI] [PMC free article] [PubMed] [Google Scholar]
9. Madar IH, Sultan G, Tayubi IA, et al. Identification of marker genes in alzheimer’s disease using a machine-learning model. Bioinformation 2021; 17(2): 348–55. [DOI] [PMC free article] [PubMed] [Google Scholar]
10. Ferreira D, Nordberg A, Westman E. Biological subtypes of alzheimer disease. Neurology 2020; 94(10): 436–48. [DOI] [PMC free article] [PubMed] [Google Scholar]
11. Jellinger KA. Pathobiological subtypes of alzheimer disease. Dement Geriatr Cogn Disord 2021; 49:321–33. [DOI] [PubMed] [Google Scholar]
12. Lin D, Cao H, Calhoun VD, et al. Sparse models for correlative and integrative analysis of imaging and genetic data. J Neurosci Methods 2014; 237:69–78. [DOI] [PMC free article] [PubMed] [Google Scholar]
13. Zhu X, Yao J, Xiao G, et al. Imaging-genetic data mapping for clinical outcome prediction via supervised conditional gaussian graphical model. In: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). New York City, USA: Institute of electrical and electronics engineers (IEEE), pp. 455–9, 2016.
14. Batmanghelich NK, Dalca A, Quon G, et al. Probabilistic modeling of imaging, genetics and diagnosis. IEEE Trans Med Imaging 2016; 35(7): 1765–79. [DOI] [PMC free article] [PubMed] [Google Scholar]
15. Shen L, Thompson PM. Brain imaging genomics: integrated analysis and machine learning. Proc IEEE 2020; 108(1): 125–62. [DOI] [PMC free article] [PubMed] [Google Scholar]
16. Shen L, Kim S, Qi Y, et al. Identifying neuroimaging and proteomic biomarkers for mci and ad via the elastic net. Multimodal Brain Image Analysis 2011; 7012:27–34. [DOI] [PMC free article] [PubMed] [Google Scholar]
17. Wang H, Nie F, Huang H, et al. Identifying disease sensitive and quantitative trait-relevant biomarkers from multidimensional heterogeneous imaging genetics data via sparse multimodal multitask learning. Bioinformatics 2012; 28(12): i127–36. [DOI] [PMC free article] [PubMed] [Google Scholar]
18. Wang D, Gu J. Integrative clustering methods of multi-omics data for molecule-based cancer classifications. Quant Biol 2016; 4(1): 58–67. [Google Scholar]
19. Shen R, Olshen AB, Ladanyi M. Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis. Bioinformatics 2009; 25(22): 2906–12. [DOI] [PMC free article] [PubMed] [Google Scholar]
20. Shen R, Wang S, Mo Q. Sparse integrative clustering of multiple omics data sets. Ann Appl Stat 2013; 7(1): 269–94. [DOI] [PMC free article] [PubMed] [Google Scholar]
21. Chauvel C, Novoloaca A, Veyre P, et al. Evaluation of integrative clustering methods for the analysis of multi-omics data. Brief Bioinform 2019; 21(2): 541–52. [DOI] [PubMed] [Google Scholar]
22. Chalise P, Fridley BL. Integrative clustering of multi-level ‘omic data based on non-negative matrix factorization algorithm. PLoS One 2017; 12(5): 1–18. [DOI] [PMC free article] [PubMed] [Google Scholar]
23. Ahmed KT, Sun J, Cheng S, et al. Multi-omics data integration by generative adversarial network. Bioinformatics 2021; 38(1): 179–86. [DOI] [PMC free article] [PubMed] [Google Scholar]
24. Mo Q, Wang S, Seshan VE, et al. Pattern discovery and cancer gene identification in integrated cancer genomic data. Proc Natl Acad Sci U S A 2013; 110(11): 4245–50. [DOI] [PMC free article] [PubMed] [Google Scholar]
25. Lock EF, Hoadley KA, Marron JS, et al. Joint and individual variation explained (JIVE) for integrated analysis of multiple data types. Ann Appl Stat 2013; 7(1): 523–42. [DOI] [PMC free article] [PubMed] [Google Scholar]
26. Gaynanova I, Li G. Structural learning and integrative decomposition of multi-view data. Biometrics 2019; 75(4): 1121–32. [DOI] [PubMed] [Google Scholar]
27. Murray ME, Graff-Radford NR, Ross OA, et al. Neuropathologically defined subtypes of alzheimer’s disease with distinct clinical characteristics: a retrospective study. Lancet Neurol 2011; 10(9): 785–96. [DOI] [PMC free article] [PubMed] [Google Scholar]
28. Whitwell JL, Dickson DW, Murray ME, et al. Neuroimaging correlates of pathologically defined subtypes of alzheimer’s disease: a case-control study. Lancet Neurol 2012; 11(10): 868–77. [DOI] [PMC free article] [PubMed] [Google Scholar]
29. Neff RA, Wang M, Vatansever S, et al. Molecular subtyping of alzheimer’s disease using rna sequencing data reveals novel mechanisms and targets. Sci Adv 2021; 7(2): eabb5398. [DOI] [PMC free article] [PubMed] [Google Scholar]
30. Di Fede G, Catania M, Maderna E, et al. Molecular subtypes of alzheimer’s disease. Sci Rep 2018; 8(1): 3269. [DOI] [PMC free article] [PubMed] [Google Scholar]
31. Levin F, Ferreira D, Lange C, et al. Data-driven fdg-pet subtypes of alzheimer’s disease-related neurodegeneration. Alzheimer’s Res Ther 2021; 13(1): 49. [DOI] [PMC free article] [PubMed] [Google Scholar]
32. Qiang W, Yau WM, Jun-Xia L, et al. Structural variation in amyloid- fibrils from alzheimer’s disease clinical subtypes. Nature 2017; 541(7636): 217–21. [DOI] [PMC free article] [PubMed] [Google Scholar]
33. Wong AK, Krishnan A, Yao V, et al. IMP 2.0: a multi-species functional genomics portal for integration, visualization and prediction of protein functions and networks. Nucleic Acids Res 2015; 43(W1): W128–33. [DOI] [PMC free article] [PubMed] [Google Scholar]
34. Kanehisa M, Goto S. KEGG: Kyoto Encyclopedia of genes and genomes. Nucleic Acids Res 2000; 28(1): 27–30. [DOI] [PMC free article] [PubMed] [Google Scholar]
35. Kanehisa M. Toward understanding the origin and evolution of cellular organisms. Protein Sci 2019; 28(11): 1947–51. [DOI] [PMC free article] [PubMed] [Google Scholar]
36. Kanehisa M, Furumichi M, Sato Y, et al. KEGG: integrating viruses and cellular organisms. Nucleic Acids Res 2020; 49(D1): D545–51. [DOI] [PMC free article] [PubMed] [Google Scholar]
37. Zhou J, Troyanskaya OG. Predicting effects of noncoding variants with deep learning–based sequence model. Nat Methods 2015; 12(10): 931–4. [DOI] [PMC free article] [PubMed] [Google Scholar]
38. Zhou J, Theesfeld CL, Yao K, et al. Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk. Nat Genet 2018; 50(8): 1171–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
39. Krishnan A, Zhang R, Yao V, et al. Genome-wide prediction and functional characterization of the genetic basis of autism spectrum disorder. Nat Neurosci 2016; 19(11): 1454–62. [DOI] [PMC free article] [PubMed] [Google Scholar]
40. Greene CS, Krishnan A, Wong AK, et al. Understanding multicellular function and disease with human tissue-specific networks. Nat Genet 2015; 47(6): 569–76. [DOI] [PMC free article] [PubMed] [Google Scholar]
41. Li C, Li H. Network-constrained regularization and variable selection for analysis of genomic data. Bioinformatics 2008; 24(9): 1175–82. [DOI] [PubMed] [Google Scholar]
42. Pan W, Xie B, Shen X. Incorporating predictor network in penalized regression with application to microarray data. Biometrics 2010; 66(2): 474–84. [DOI] [PMC free article] [PubMed] [Google Scholar]
43. Li F, Zhang NR. Bayesian variable selection in structured high-dimensional covariate spaces with applications in genomics. J Am Stat Assoc 2010; 105(491): 1202–14. [Google Scholar]
44. Stingo FC, Chen YA, Tadesse MG, et al. Incorporating biological information into linear models: a bayesian approach to the selection of pathways and genes. Ann Appl Stat 2011; 5(3): 1978–2002. [DOI] [PMC free article] [PubMed] [Google Scholar]
45. Chang C, Kundu S, Long Q. Scalable bayesian variable selection for structured high-dimensional data. Biometrics 2018; 74(4): 1372–82. [DOI] [PMC free article] [PubMed] [Google Scholar]
46. Rockova V, Lesaffre E. Incorporating grouping information in Bayesian variable selection with applications in genomics. Bayesian Anal 2014; 9(1): 221–58. [Google Scholar]
47. Min EJ, Chang C, Long Qi. Generalized bayesian factor analysis for integrative clustering with applications to multi-omics data. In: 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA). New York City, USA: Institute of electrical and electronics engineers (IEEE), pp. 109–19, 2018. [DOI] [PMC free article] [PubMed]
48. Coniglio A, Nappi CR, Peruggi F, et al. Percolation and phase transitions in the ising model. Commun Math Phys 1976; 51(3): 315–23. [Google Scholar]
49. Dubes RC, Jain AK, Nadabar SG, Chen CC. Mrf model-based algorithms for image segmentation. In: [1990] Proceedings. 10th International Conference on Pattern Recognition. New York City, USA: Institute of electrical and electronics engineers (IEEE), pp. 808–14, vol. 1, 1990. [Google Scholar]
50. Chung FR. Spectral graph theory. Rhode Island, US: American Mathematical Soc, 1997. [Google Scholar]
51. Ng AY, Jordan MI, Weiss Y. On spectral clustering: Analysis and an algorithm. In: Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic, NIPS’01. Cambridge, Massachusetts, USA: Massachusetts Institute of Technology (MIT), pp. 849–56. MIT Press, 2001. [Google Scholar]
52. Polson NG, Scott JG, Windle J. Bayesian inference for logistic models using pólya–gamma latent variables. J Am Stat Assoc 2013; 108(504): 1339–49. [Google Scholar]
53. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the em algorithm. J R Stat Soc B Methodol 1977; 39(1): 1–38. [Google Scholar]
54. Girolami M. A Variational method for learning sparse and overcomplete representations. Neural Comput 2001; 13(11, 11): 2517–32. [DOI] [PubMed] [Google Scholar]
55. Blei DM, Kucukelbir A, McAuliffe JD. Variational inference: a review for statisticians. J Am Stat Assoc 2017; 112(518): 859–77. [Google Scholar]
56. Corless RM, Gonnet GH, Hare DEG, et al. On the lambertw function. Adv Comput Math 1996; 5(1): 329–59. [Google Scholar]
57. Mezo I, Baricz A. On the generalization of the lambert w function. Trans Am Math Soc 2017; 7(369): 7917–34. [Google Scholar]
58. Chang C, Tsay RS. Estimation of covariance matrix via the sparse cholesky factor with lasso. J Stat Plan Inference 2010; 140(12): 3858–73. [Google Scholar]
59. Saykin AJ, Shen L, Yao X, et al. Genetic studies of quantitative MCI and AD phenotypes in ADNI: progress, opportunities, and plans. Alzheimers Dement 2015; 11(7): 792–814. [DOI] [PMC free article] [PubMed] [Google Scholar]
60. Shen L, Thompson PM, Potkin SG, et al. Genetic analysis of quantitative phenotypes in AD and MCI: imaging, cognition and biomarkers. Brain Imaging Behav 2014; 8(2): 183–207. [DOI] [PMC free article] [PubMed] [Google Scholar]
61. Weiner MW, Veitch DP, Aisen PS, et al. Recent publications from the Alzheimer’s Disease Neuroimaging Initiative: reviewing progress toward improved AD clinical trials. Alzheimers Dement 2017; 13(4): e1–85. [DOI] [PMC free article] [PubMed] [Google Scholar]
62. Pfeffer RI, Kurosaki TT, Harrah CH, Jr, et al. Measurement of functional activities in older adults in the community. J Gerontol 1982; 37(3): 323–9. [DOI] [PubMed] [Google Scholar]
63. Seshadri S, Fitzpatrick AL, Ikram MA, et al. Genome-wide analysis of genetic loci associated with Alzheimer disease. JAMA 2010; 303(18): 1832–40. [DOI] [PMC free article] [PubMed] [Google Scholar]
64. Lambert JC, Ibrahim-Verbaas CA, Harold D, et al. Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer’s disease. Nat Genet 2013; 45(12): 1452–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
65. Jansen IE, Savage JE, Watanabe K, et al. Genome-wide meta-analysis identifies new loci and functional pathways influencing Alzheimer’s disease risk. Nat Genet 2019; 51(3): 404–13. [DOI] [PMC free article] [PubMed] [Google Scholar]
66. Kunkle B, Benjamin G-B, Rebecca S, et al. Genetic meta-analysis of diagnosed Alzheimer’s disease identifies new risk loci and implicates A, tau, immune and lipid processing. Nat Genet 2019; 51:414–30. [DOI] [PMC free article] [PubMed] [Google Scholar]
67. Bellenguez C, Kucukali F, Jansen I, et al. New insights into the genetic etiology of Alzheimer’s disease and related dementias. Nat Genet 2020; 54: 412–36. [DOI] [PMC free article] [PubMed] [Google Scholar]
68. Chang CC, Chow CC, Tellier LCAM, et al. Second-generation plink: rising to the challenge of larger and richer datasets. GigaScience 2015; 4:7. [DOI] [PMC free article] [PubMed] [Google Scholar]
69. Piñero J, Ramírez-Anguita JM, Saüch-Pitarch J, et al. The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic Acids Res 2019; 48(D1): D845–55. [DOI] [PMC free article] [PubMed] [Google Scholar]
70. Glasser MF, Coalson TS, Robinson EC, et al. A multi-modal parcellation of human cerebral cortex. Nature 2016; 536(7615): 171–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
71. Mo Q, Shen R. iClusterPlus: integrative clustering of multi-type genomic data. R package version 2021; 1.28.0. http://52.71.54.154/packages/release/bioc/html/iClusterPlus.html. [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

[ref1] 1. Subramanian I, Verma S, Kumar S, et al. Multi-omics data integration, interpretation, and its application. Bioinform Biol Insights 2020; 14:1177932219899051. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref2] 2. Trambaiolli LR, Lorena AC, Fraga FJ, et al. Improving alzheimer’s disease diagnosis with machine learning techniques. Clin EEG Neurosci 2011; 42(3): 160–5. [DOI] [PubMed] [Google Scholar]

[ref3] 3. Kim M, Bao J, Liu K, et al. Structural connectivity enriched functional brain network using simplex regression with graphnet. In: Machine Learning in Medical Imaging. Cham: Springer International Publishing, 2020, 292–302. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref4] 4. Kim M, Bao J, Liu K, et al. A structural enriched functional network: an application to predict brain cognitive performance. Med Image Anal 2021; 71:102026. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref5] 5. Kochunov P, Zavaliangos-Petropulu A, Jahanshad N, et al. A white matter connection of schizophrenia and Alzheimer’s disease. Schizophr Bull 2020; 47(1): 197–206. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref6] 6. Sendi MSE, Zendehrouh E, Miller RL, et al. Alzheimer’s disease projection from normal to mild dementia reflected in functional network connectivity: a longitudinal study. Front Neural Circuits 2021; 14: 593263. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref7] 7. Sendi MSE, Zendehrouh E, Fu Z, et al. Disrupted dynamic functional network connectivity among cognitive control networks in the progression of alzheimer’s disease. Brain Connect 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref8] 8. Wan J, Zhang Z, Rao BD, et al. Identifying the neuroanatomical basis of cognitive impairment in alzheimer’s disease by correlation- and nonlinearity-aware sparse bayesian learning. IEEE Trans Med Imaging 2014; 33(7): 1475–87. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref9] 9. Madar IH, Sultan G, Tayubi IA, et al. Identification of marker genes in alzheimer’s disease using a machine-learning model. Bioinformation 2021; 17(2): 348–55. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref10] 10. Ferreira D, Nordberg A, Westman E. Biological subtypes of alzheimer disease. Neurology 2020; 94(10): 436–48. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref11] 11. Jellinger KA. Pathobiological subtypes of alzheimer disease. Dement Geriatr Cogn Disord 2021; 49:321–33. [DOI] [PubMed] [Google Scholar]

[ref12] 12. Lin D, Cao H, Calhoun VD, et al. Sparse models for correlative and integrative analysis of imaging and genetic data. J Neurosci Methods 2014; 237:69–78. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref13] 13. Zhu X, Yao J, Xiao G, et al. Imaging-genetic data mapping for clinical outcome prediction via supervised conditional gaussian graphical model. In: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). New York City, USA: Institute of electrical and electronics engineers (IEEE), pp. 455–9, 2016.

[ref14] 14. Batmanghelich NK, Dalca A, Quon G, et al. Probabilistic modeling of imaging, genetics and diagnosis. IEEE Trans Med Imaging 2016; 35(7): 1765–79. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref15] 15. Shen L, Thompson PM. Brain imaging genomics: integrated analysis and machine learning. Proc IEEE 2020; 108(1): 125–62. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref16] 16. Shen L, Kim S, Qi Y, et al. Identifying neuroimaging and proteomic biomarkers for mci and ad via the elastic net. Multimodal Brain Image Analysis 2011; 7012:27–34. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref17] 17. Wang H, Nie F, Huang H, et al. Identifying disease sensitive and quantitative trait-relevant biomarkers from multidimensional heterogeneous imaging genetics data via sparse multimodal multitask learning. Bioinformatics 2012; 28(12): i127–36. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref18] 18. Wang D, Gu J. Integrative clustering methods of multi-omics data for molecule-based cancer classifications. Quant Biol 2016; 4(1): 58–67. [Google Scholar]

[ref19] 19. Shen R, Olshen AB, Ladanyi M. Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis. Bioinformatics 2009; 25(22): 2906–12. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref20] 20. Shen R, Wang S, Mo Q. Sparse integrative clustering of multiple omics data sets. Ann Appl Stat 2013; 7(1): 269–94. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref21] 21. Chauvel C, Novoloaca A, Veyre P, et al. Evaluation of integrative clustering methods for the analysis of multi-omics data. Brief Bioinform 2019; 21(2): 541–52. [DOI] [PubMed] [Google Scholar]

[ref22] 22. Chalise P, Fridley BL. Integrative clustering of multi-level ‘omic data based on non-negative matrix factorization algorithm. PLoS One 2017; 12(5): 1–18. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref23] 23. Ahmed KT, Sun J, Cheng S, et al. Multi-omics data integration by generative adversarial network. Bioinformatics 2021; 38(1): 179–86. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref24] 24. Mo Q, Wang S, Seshan VE, et al. Pattern discovery and cancer gene identification in integrated cancer genomic data. Proc Natl Acad Sci U S A 2013; 110(11): 4245–50. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref25] 25. Lock EF, Hoadley KA, Marron JS, et al. Joint and individual variation explained (JIVE) for integrated analysis of multiple data types. Ann Appl Stat 2013; 7(1): 523–42. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref26] 26. Gaynanova I, Li G. Structural learning and integrative decomposition of multi-view data. Biometrics 2019; 75(4): 1121–32. [DOI] [PubMed] [Google Scholar]

[ref27] 27. Murray ME, Graff-Radford NR, Ross OA, et al. Neuropathologically defined subtypes of alzheimer’s disease with distinct clinical characteristics: a retrospective study. Lancet Neurol 2011; 10(9): 785–96. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref28] 28. Whitwell JL, Dickson DW, Murray ME, et al. Neuroimaging correlates of pathologically defined subtypes of alzheimer’s disease: a case-control study. Lancet Neurol 2012; 11(10): 868–77. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref29] 29. Neff RA, Wang M, Vatansever S, et al. Molecular subtyping of alzheimer’s disease using rna sequencing data reveals novel mechanisms and targets. Sci Adv 2021; 7(2): eabb5398. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref30] 30. Di Fede G, Catania M, Maderna E, et al. Molecular subtypes of alzheimer’s disease. Sci Rep 2018; 8(1): 3269. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref31] 31. Levin F, Ferreira D, Lange C, et al. Data-driven fdg-pet subtypes of alzheimer’s disease-related neurodegeneration. Alzheimer’s Res Ther 2021; 13(1): 49. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref32] 32. Qiang W, Yau WM, Jun-Xia L, et al. Structural variation in amyloid- fibrils from alzheimer’s disease clinical subtypes. Nature 2017; 541(7636): 217–21. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref33] 33. Wong AK, Krishnan A, Yao V, et al. IMP 2.0: a multi-species functional genomics portal for integration, visualization and prediction of protein functions and networks. Nucleic Acids Res 2015; 43(W1): W128–33. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref34] 34. Kanehisa M, Goto S. KEGG: Kyoto Encyclopedia of genes and genomes. Nucleic Acids Res 2000; 28(1): 27–30. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref35] 35. Kanehisa M. Toward understanding the origin and evolution of cellular organisms. Protein Sci 2019; 28(11): 1947–51. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref36] 36. Kanehisa M, Furumichi M, Sato Y, et al. KEGG: integrating viruses and cellular organisms. Nucleic Acids Res 2020; 49(D1): D545–51. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref37] 37. Zhou J, Troyanskaya OG. Predicting effects of noncoding variants with deep learning–based sequence model. Nat Methods 2015; 12(10): 931–4. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref38] 38. Zhou J, Theesfeld CL, Yao K, et al. Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk. Nat Genet 2018; 50(8): 1171–9. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref39] 39. Krishnan A, Zhang R, Yao V, et al. Genome-wide prediction and functional characterization of the genetic basis of autism spectrum disorder. Nat Neurosci 2016; 19(11): 1454–62. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref40] 40. Greene CS, Krishnan A, Wong AK, et al. Understanding multicellular function and disease with human tissue-specific networks. Nat Genet 2015; 47(6): 569–76. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref41] 41. Li C, Li H. Network-constrained regularization and variable selection for analysis of genomic data. Bioinformatics 2008; 24(9): 1175–82. [DOI] [PubMed] [Google Scholar]

[ref42] 42. Pan W, Xie B, Shen X. Incorporating predictor network in penalized regression with application to microarray data. Biometrics 2010; 66(2): 474–84. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref43] 43. Li F, Zhang NR. Bayesian variable selection in structured high-dimensional covariate spaces with applications in genomics. J Am Stat Assoc 2010; 105(491): 1202–14. [Google Scholar]

[ref44] 44. Stingo FC, Chen YA, Tadesse MG, et al. Incorporating biological information into linear models: a bayesian approach to the selection of pathways and genes. Ann Appl Stat 2011; 5(3): 1978–2002. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref45] 45. Chang C, Kundu S, Long Q. Scalable bayesian variable selection for structured high-dimensional data. Biometrics 2018; 74(4): 1372–82. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref46] 46. Rockova V, Lesaffre E. Incorporating grouping information in Bayesian variable selection with applications in genomics. Bayesian Anal 2014; 9(1): 221–58. [Google Scholar]

[ref47] 47. Min EJ, Chang C, Long Qi. Generalized bayesian factor analysis for integrative clustering with applications to multi-omics data. In: 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA). New York City, USA: Institute of electrical and electronics engineers (IEEE), pp. 109–19, 2018. [DOI] [PMC free article] [PubMed]

[ref48] 48. Coniglio A, Nappi CR, Peruggi F, et al. Percolation and phase transitions in the ising model. Commun Math Phys 1976; 51(3): 315–23. [Google Scholar]

[ref49] 49. Dubes RC, Jain AK, Nadabar SG, Chen CC. Mrf model-based algorithms for image segmentation. In: [1990] Proceedings. 10th International Conference on Pattern Recognition. New York City, USA: Institute of electrical and electronics engineers (IEEE), pp. 808–14, vol. 1, 1990. [Google Scholar]

[ref50] 50. Chung FR. Spectral graph theory. Rhode Island, US: American Mathematical Soc, 1997. [Google Scholar]

[ref51] 51. Ng AY, Jordan MI, Weiss Y. On spectral clustering: Analysis and an algorithm. In: Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic, NIPS’01. Cambridge, Massachusetts, USA: Massachusetts Institute of Technology (MIT), pp. 849–56. MIT Press, 2001. [Google Scholar]

[ref52] 52. Polson NG, Scott JG, Windle J. Bayesian inference for logistic models using pólya–gamma latent variables. J Am Stat Assoc 2013; 108(504): 1339–49. [Google Scholar]

[ref53] 53. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the em algorithm. J R Stat Soc B Methodol 1977; 39(1): 1–38. [Google Scholar]

[ref54] 54. Girolami M. A Variational method for learning sparse and overcomplete representations. Neural Comput 2001; 13(11, 11): 2517–32. [DOI] [PubMed] [Google Scholar]

[ref55] 55. Blei DM, Kucukelbir A, McAuliffe JD. Variational inference: a review for statisticians. J Am Stat Assoc 2017; 112(518): 859–77. [Google Scholar]

[ref56] 56. Corless RM, Gonnet GH, Hare DEG, et al. On the lambertw function. Adv Comput Math 1996; 5(1): 329–59. [Google Scholar]

[ref57] 57. Mezo I, Baricz A. On the generalization of the lambert w function. Trans Am Math Soc 2017; 7(369): 7917–34. [Google Scholar]

[ref58] 58. Chang C, Tsay RS. Estimation of covariance matrix via the sparse cholesky factor with lasso. J Stat Plan Inference 2010; 140(12): 3858–73. [Google Scholar]

[ref59] 59. Saykin AJ, Shen L, Yao X, et al. Genetic studies of quantitative MCI and AD phenotypes in ADNI: progress, opportunities, and plans. Alzheimers Dement 2015; 11(7): 792–814. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref60] 60. Shen L, Thompson PM, Potkin SG, et al. Genetic analysis of quantitative phenotypes in AD and MCI: imaging, cognition and biomarkers. Brain Imaging Behav 2014; 8(2): 183–207. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref61] 61. Weiner MW, Veitch DP, Aisen PS, et al. Recent publications from the Alzheimer’s Disease Neuroimaging Initiative: reviewing progress toward improved AD clinical trials. Alzheimers Dement 2017; 13(4): e1–85. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref62] 62. Pfeffer RI, Kurosaki TT, Harrah CH, Jr, et al. Measurement of functional activities in older adults in the community. J Gerontol 1982; 37(3): 323–9. [DOI] [PubMed] [Google Scholar]

[ref63] 63. Seshadri S, Fitzpatrick AL, Ikram MA, et al. Genome-wide analysis of genetic loci associated with Alzheimer disease. JAMA 2010; 303(18): 1832–40. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref64] 64. Lambert JC, Ibrahim-Verbaas CA, Harold D, et al. Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer’s disease. Nat Genet 2013; 45(12): 1452–8. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref65] 65. Jansen IE, Savage JE, Watanabe K, et al. Genome-wide meta-analysis identifies new loci and functional pathways influencing Alzheimer’s disease risk. Nat Genet 2019; 51(3): 404–13. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref66] 66. Kunkle B, Benjamin G-B, Rebecca S, et al. Genetic meta-analysis of diagnosed Alzheimer’s disease identifies new risk loci and implicates A, tau, immune and lipid processing. Nat Genet 2019; 51:414–30. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref67] 67. Bellenguez C, Kucukali F, Jansen I, et al. New insights into the genetic etiology of Alzheimer’s disease and related dementias. Nat Genet 2020; 54: 412–36. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref68] 68. Chang CC, Chow CC, Tellier LCAM, et al. Second-generation plink: rising to the challenge of larger and richer datasets. GigaScience 2015; 4:7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref69] 69. Piñero J, Ramírez-Anguita JM, Saüch-Pitarch J, et al. The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic Acids Res 2019; 48(D1): D845–55. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref70] 70. Glasser MF, Coalson TS, Robinson EC, et al. A multi-modal parcellation of human cerebral cortex. Nature 2016; 536(7615): 171–8. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref71] 71. Mo Q, Shen R. iClusterPlus: integrative clustering of multi-type genomic data. R package version 2021; 1.28.0. http://52.71.54.154/packages/release/bioc/html/iClusterPlus.html. [Google Scholar]

PERMALINK

Integrative analysis of multi-omics and imaging data with incorporation of biological information via structural Bayesian factor analysis

Jingxuan Bao

Changgee Chang

Qiyiwen Zhang

Andrew J Saykin

Li Shen

Qi Long

Abstract

Motivation

Method

Results

Availability

Contact

Introduction

Methodology

Figure 1.

Notation

Structural Bayesian factor analysis model

Priors

Priors for W and Z

Prior for : incorporating graphical knowledge

Prior for : incorporating biological knowledge

Model for heterogeneous data types

Table 1.

Feature representation

Parameter estimation and computation

Initialization

Hyperparameter tunning

Analysis of ADNI Data

Data description and data preprocessing

Table 2.

Demographic data preprocessing and cognitive score

Genotyping data preprocessing

Gene expression data preprocessing

Imaging data preprocessing

Biological network

Real data analysis

Figure 2.

Table 3.

Simulation Study

Simulation design

Figure 3.

Results

Figure e1.

Table 4.

Table 5.

Conclusion

Key Points

Acknowledgements

Contributor Information

Data availability

Author contributions statement

References

Associated Data

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases