Abstract
Identifying potential biomarkers of schizophrenia (SCZ) is increasingly important but difficult, owing to the complex pathophysiology of the disease. In this study, we propose a network-fusion-based framework to identify genetic biomarkers of complex diseases. Genomic, epigenomic and neuroimaging data were integrated by network fusion. A three-step feature selection was applied to single nucleotide polymorphism (SNP), DNA methylation and functional magnetic resonance imaging (fMRI) data to select important features, which were then used to construct gene networks in two states (healthy and disease) for the SNP and DNA methylation data, respectively. The two healthy-state networks (one from SNP data, the other from DNA methylation data) were combined into one fused healthy network, from which minimum spanning trees (MSTs) were extracted; the two disease-state networks were processed in the same way. Genes whose connectivity changed significantly between the MSTs of the two states were taken as SCZ biomarkers and were then validated from five aspects. The effectiveness of the proposed framework was also demonstrated by comparison with other network-based discovery methods. In summary, our approach provides a general framework for discovering gene biomarkers of complex diseases, which may be applied to the diagnosis of such diseases in the future.
Keywords: biomarker discovery, data integration, imaging genomics, minimum spanning tree, schizophrenia
1. Introduction
Biomarkers are biological measures that correlate with a disease state. Their identification can facilitate disease prognosis, diagnosis and management. Once validated, they can be used to improve clinical decision-making and the early detection of disease. Additionally, they can help decipher disease mechanisms and serve as surrogate markers of drug efficacy, reducing costs in the drug discovery and development pipeline. Consequently, it is important but challenging to discover the biomarkers of complex diseases such as schizophrenia (SCZ), whose mechanisms are complicated and largely unknown. SCZ is much more complicated than many other diseases [1] and ranks among the top 10 causes of disability in developed countries [2]. At present, SCZ diagnosis depends on patients' subjective self-reports, and so-called prodromal patients cannot yet be decisively diagnosed [3]–[5]. Biomarker detection for SCZ can therefore assist both diagnosis and drug development.
Neuroimaging techniques provide efficient and noninvasive ways to study abnormal brain activity, which is helpful for research on complex mental disorders such as SCZ [6]–[12]. On the other hand, genetic factors are believed to influence SCZ because it is highly heritable (concordance rates in monozygotic twins reach approximately 50% [9]). Therefore, imaging genomic studies, which combine functional neuroimaging and genomic data, have become essential for discovering genetic biomarkers underlying the biological mechanisms relevant to abnormal brain activity or development. A large number of imaging genomic studies have tried to discover risk genes, brain regions of interest, or the correspondence between them in SCZ [13]–[22]. All these studies employed data integration methods to exploit the complementary information from multiple data sources, such as genomic, transcriptomic, and imaging data.
It is well known that biological networks are an important tool for studying biological mechanisms from a systems perspective [23]–[34]. Deng et al. identified some cancer-associated genes by building and analyzing differential co-expression networks related to cancer [35], [36]. In [37], cellular networks, gene expression, and genomic data were combined to discover causal genetic drivers of human disease by analyzing regulatory networks at the system level. Guelzim et al. [38] discovered the causal structure of the yeast transcriptional regulatory network. Kim et al. [39] developed a computational method to simultaneously identify causal genes and dysregulated pathways of complex diseases. Vanunu et al. [40] proposed a global, network-based method, called PRINCE, to prioritize causal genes of a disease and infer protein complex associations. Wen et al. [41] proposed a network-based approach to investigate putative causal module biomarkers of complex diseases through the integration of heterogeneous information, e.g., gene expression data, epigenomic data, and protein-protein interaction networks. Tu et al. [42] combined protein phosphorylation, genotype information, protein-protein interaction, gene expression, and transcription factor (TF)-DNA binding information and developed a network-based stochastic algorithm to discover causal genes and underlying regulatory pathways. All these examples demonstrate that biological networks are effective tools for integrative analysis.
As noted above, SCZ is a genetic disease with abnormal brain activity, and it is also related to environmental factors [43], [44]. Therefore, our lab collected three types of measurement data for SCZ: functional magnetic resonance imaging (fMRI), single nucleotide polymorphisms (SNPs), and DNA methylation. fMRI is a neuroimaging procedure that uses MRI technology to measure brain activity by detecting changes associated with blood flow. In research on language abnormalities, fMRI can directly test the hypothesis that the normal lateralization of language is reversed in SCZ [45]. SNPs are the most common type of genetic variation. They can act as biomarkers to help researchers detect genes associated with genetic diseases [46], [47]. DNA methylation is a process in which methyl groups are added to particular DNA segments without changing the DNA sequence. Over the course of life, aging processes, environmental influences and lifestyle factors such as diet or smoking induce biochemical alterations to the DNA, which frequently lead to DNA methylation. DNA methylation is one of several epigenetic mechanisms that cells use to control gene expression. Although there are many mechanisms for controlling gene expression in eukaryotes, DNA methylation is a commonly used epigenetic signaling tool that can fix genes in the “off” position [48].
Thus, in this study SNP, DNA methylation and fMRI data were jointly analyzed with network-based methods to discover gene biomarkers associated with schizophrenia. After SNPs were preselected according to Kyoto Encyclopedia of Genes and Genomes (KEGG) [49] pathways, multiple sparse CCA (msCCA) [50] was applied to the three data sets to retain the SNP loci and methylation sites that correlate strongly with fMRI voxels. For the selected SNP loci, the corresponding genes were found, and each gene profile was represented by the average of the profiles of its SNP loci. We then generated two similarity matrices under two different conditions (i.e., SCZ patients and healthy controls). Specifically, for each condition, we built a gene-gene network with nodes representing genes and edge strength representing the similarity between a gene pair. We performed the same procedure on the selected methylation sites and obtained two gene-gene networks under the two conditions from DNA methylation data. Each gene network from SNP data was then fused with the network from methylation data under the same condition. In the end, a fused SCZ gene network and a fused healthy gene network were constructed and compared to find significantly differential genes, which were taken as biomarkers for schizophrenia. These gene biomarkers were further validated with disease association analysis, functional enrichment analysis, pathway enrichment analysis, a schizophrenia-associated database, and literature studies. Moreover, the performance of our framework was demonstrated through comparison with other network-based discovery methods.
The remainder of this paper is structured as follows. Section 2 introduces the main methods used in our proposed framework for disease gene biomarker discovery. Section 3 describes the data collection and then presents results and discussion, including the identified gene biomarkers, their validation from five aspects, and comparisons with other network-based methods. Section 4 summarizes this study, including the advantages and disadvantages of our framework.
2. Methods
2.1. Multiple sparse CCA for feature selection
Canonical correlation analysis (CCA) [51] seeks the pair of linear projections of two datasets that are most correlated with each other. As a multivariate method, CCA can provide better statistical power. It has been widely used to analyze correlation patterns between two datasets collected from the same group of samples. Sparse CCA [52], [53] was developed to solve the over-fitting problem of conventional CCA when dealing with high-dimensional data of small sample size, e.g., genetic data. Because its output is sparse, sparse CCA can also be used for feature selection, and the sparsity facilitates interpretation of the results. Its objective function can be expressed as
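$$\max_{u,v}\; u^{T}X^{T}Yv \quad \text{subject to}\quad \|u\|_2^2 \le 1,\ \|v\|_2^2 \le 1,\ \|u\|_1 \le \lambda_1,\ \|v\|_1 \le \lambda_2 \tag{1}$$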
where $X \in \mathbb{R}^{n \times p}$ and $Y \in \mathbb{R}^{n \times q}$ are two datasets with the same sample size $n$; $p$ and $q$ are the numbers of features; $u$ and $v$ are the canonical loading vectors of $X$ and $Y$, respectively; and $\lambda_1$ and $\lambda_2$ control the sparsity penalty on $u$ and $v$, respectively. Since we want to analyze the relationship among SNP, DNA methylation, and fMRI data, a sparse CCA model that can handle three or more datasets is needed. D. M. Witten [54] extended two-way sparse CCA to multiple sparse CCA in order to integrate multiple data modalities. The objective function of multiple sparse CCA is
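$$\max_{u_1,\dots,u_m}\; \sum_{i<j} u_i^{T}X_i^{T}X_j u_j \quad \text{subject to}\quad \|u_i\|_2^2 \le 1,\ \|u_i\|_1 \le \lambda_i,\ i = 1,\dots,m \tag{2}$$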
where $m$ is the number of modalities to be integrated; $X_i$ and $X_j$ are the $i$th and $j$th datasets, respectively; and $\lambda_i$ controls the sparsity level of the $i$th canonical vector $u_i$.
Cross-validation is a popular tool for selecting tuning parameters, i.e., choosing $\lambda_i$. However, tuning parameter selection is very unstable, partly due to the small sample size. Moreover, because of the L2 norm constraint on $u_i$ and the heterogeneity of the datasets, parameter selection is very sensitive: a tiny change of $\lambda_i$ can lead to a huge change in the sparsity of $u_i$, and sometimes this change is even non-monotonic, making it difficult to get an interpretable result. To address this problem, we introduced an iterative parameter selection procedure that is easier to handle and worked well in real data analysis; details can be found in [50]. In this paper, we used this modified multiple sparse CCA (msCCA) to reduce the dimensionality of the feature space for SNP and DNA methylation data. In other words, we used msCCA to select the SNP loci and methylation sites that are strongly correlated with endophenotypes (e.g., fMRI voxels).
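To make the role of the sparsity bound concrete, the following is a minimal sketch of two-view sparse CCA solved by alternating soft-thresholding (the penalized matrix decomposition idea), assuming the L1-constrained, unit-L2 form of the problem. It is not the modified msCCA of [50]: the iterative parameter selection is not reproduced, only two views are handled, and all function names, the synthetic data and the bounds `c1` and `c2` are illustrative assumptions. Plain Python is used for portability; a real analysis would use a numerical library.

```python
import math
import random


def center(M):
    """Subtract the column mean from every column of a row-major matrix."""
    means = [sum(col) / len(M) for col in zip(*M)]
    return [[x - m for x, m in zip(row, means)] for row in M]


def soft_threshold(a, delta):
    """Shrink every entry of a toward zero by delta."""
    return [math.copysign(max(abs(x) - delta, 0.0), x) for x in a]


def unit_l2(a):
    """Scale a to unit L2 norm (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in a))
    return [x / norm for x in a] if norm > 0 else list(a)


def l1_bounded_unit(a, bound, n_iter=60):
    """Unit-L2 vector proportional to a soft-thresholded copy of a, with the
    threshold found by bisection so that the L1 norm is <= bound (bound >= 1)."""
    u = unit_l2(a)
    if sum(abs(x) for x in u) <= bound:
        return u
    lo, hi = 0.0, max(abs(x) for x in a)  # lo violates the bound, hi satisfies it
    for _ in range(n_iter):
        mid = (lo + hi) / 2.0
        u = unit_l2(soft_threshold(a, mid))
        if sum(abs(x) for x in u) > bound:
            lo = mid
        else:
            hi = mid
    return unit_l2(soft_threshold(a, hi))


def sparse_cca(X, Y, c1, c2, n_iter=50):
    """Rank-1 sparse CCA on column-centred X (n x p) and Y (n x q)."""
    n, p, q = len(X), len(X[0]), len(Y[0])
    # Cross-product matrix C = X^T Y (p x q).
    C = [[sum(X[s][i] * Y[s][j] for s in range(n)) for j in range(q)]
         for i in range(p)]
    v = unit_l2([1.0] * q)
    for _ in range(n_iter):
        u = l1_bounded_unit(
            [sum(C[i][j] * v[j] for j in range(q)) for i in range(p)], c1)
        v = l1_bounded_unit(
            [sum(C[i][j] * u[i] for i in range(p)) for j in range(q)], c2)
    return u, v


# Synthetic example: a shared latent signal z drives X columns 0-1 and Y column 0.
random.seed(0)
n = 40
z = [random.gauss(0, 3) for _ in range(n)]
X = [[z[s], z[s]] + [random.gauss(0, 1) for _ in range(4)] for s in range(n)]
Y = [[z[s]] + [random.gauss(0, 1) for _ in range(3)] for s in range(n)]
u, v = sparse_cca(center(X), center(Y), c1=1.5, c2=1.2)
selected_x = [i for i, w in enumerate(u) if abs(w) > 1e-8]  # retained features
```

Features whose loading is shrunk exactly to zero are discarded, which is how a sparse CCA step doubles as feature selection. Tightening `c1` toward 1 leaves fewer nonzero loadings.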
2.2. Fusion of heterogeneous genomic data
This procedure includes the following two steps: graph construction and graph consolidation.
2.2.1. Graph construction
Before constructing a gene-gene graph, we constructed a gene-based matrix from the SNP and DNA methylation data, respectively. Each gene's profile is the average of the profiles of its corresponding SNP loci or methylation sites. Similarity matrices for the two states (SCZ and healthy control) were computed separately for the SNP and DNA methylation data. Each similarity matrix is represented as a graph (network) whose nodes represent genes and whose edges indicate the relationships between them. In the gene-gene interaction network built in this study, for genes $i, j = 1, 2, \ldots, n$, $\rho^2(i,j)$ denotes the squared Euclidean distance between the profiles of gene $i$ and gene $j$, and the weight $W_{ij}$ in the similarity matrix is the edge strength between gene $i$ and gene $j$. A Gaussian function of the Euclidean distance between genes was used to calculate edge strength:
$$W_{ij} = \exp\!\left(-\frac{\rho^{2}(i,j)}{2\sigma^{2}}\right) \tag{3}$$
If $i$ is in $j$'s $k$-nearest neighborhood or vice versa, nodes $i$ and $j$ are connected by an edge. The number of neighbors $k$ of a gene is about one-tenth of the number of genes. The hyperparameter $\sigma$ was set experimentally such that $\sigma^2 = 510$.
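The graph construction step can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the kernel is assumed to be exp(-ρ²/(2σ²)), ties among neighbours are broken by index order, and the function names and toy profiles are hypothetical.

```python
import math


def gene_profile(site_profiles):
    """Average the profiles of all SNP loci (or methylation sites) of one gene."""
    n_sites = len(site_profiles)
    return [sum(values) / n_sites for values in zip(*site_profiles)]


def knn_gaussian_graph(profiles, k, sigma2):
    """Symmetric weight matrix W from gene profiles.

    Assumed kernel: W[i][j] = exp(-rho2(i, j) / (2 * sigma2)), kept only when
    i is among j's k nearest neighbours or vice versa; other entries are 0.
    """
    n = len(profiles)
    # Squared Euclidean distances between all pairs of gene profiles.
    d2 = [[sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
           for j in range(n)] for i in range(n)]
    neighbours = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: d2[i][j])
        neighbours.append(set(order[:k]))
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if j in neighbours[i] or i in neighbours[j]:
                W[i][j] = W[j][i] = math.exp(-d2[i][j] / (2.0 * sigma2))
    return W


# One gene covered by two loci, measured in three subjects (toy values).
profile = gene_profile([[0, 2, 1], [2, 2, 1]])

# Four toy genes; the paper sets k to about one-tenth of the gene count,
# here k = 1 because the example is tiny.
profiles = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]]
W = knn_gaussian_graph(profiles, k=1, sigma2=510.0)
```

With `k=1`, gene 3 is linked only to its single nearest neighbour, so the entry `W[0][3]` stays at zero even though the Gaussian kernel itself would give a positive value.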
2.2.2. Graph fusion
Multiple graphs, denoted by $G_m = (V_m, E_m, w_m)$, are constructed from different sources of data. They can be fused by adding new nodes or consolidating the edge weights of existing nodes in the resulting graph
$$G = (V, E, w),\quad V = \bigcup_m V_m,\quad E = \bigcup_m E_m,\quad w(i,j) = \sum_m w_m(i,j) \tag{4}$$
where $w_m(i,j) = 0$ for $(i,j) \notin E_m$. With this graph fusion method, the individual graphs (SNP and DNA methylation) for healthy controls can be combined into a fused graph. A similar procedure is used to construct a fused graph for the SCZ state. Thus we build two fused networks, one for the healthy control state and one for the SCZ state. The nodes in each fused graph consist of SNP-specific genes, methylation-specific genes and the common genes shared by the SNP and methylation data. In each fused matrix, the similarity values between SNP-specific genes or between methylation-specific genes are derived from the SNP or methylation data alone, and the similarity values between common genes are the sum of the similarity values from the SNP and methylation data.
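The fusion rule described above (union of nodes, summed weights on shared edges, zero for missing edges) can be sketched with edge dictionaries. The gene labels and weights below are made up for illustration, and `fuse_graphs` is a hypothetical helper name.

```python
def fuse_graphs(*graphs):
    """Fuse weighted graphs given as {(gene_a, gene_b): weight} dictionaries.

    The fused node set is the union of all node sets. The weight of an edge is
    the sum of its weights over the input graphs; a missing edge contributes 0.
    """
    fused = {}
    for graph in graphs:
        for (a, b), weight in graph.items():
            key = tuple(sorted((a, b)))  # treat edges as undirected
            fused[key] = fused.get(key, 0.0) + weight
    return fused


# Toy healthy-state graphs: gene_A and gene_B are common to both data types,
# gene_C is SNP-specific and gene_D is methylation-specific.
snp_graph = {("gene_A", "gene_B"): 0.8, ("gene_A", "gene_C"): 0.5}
methylation_graph = {("gene_B", "gene_A"): 0.3, ("gene_A", "gene_D"): 0.6}

fused = fuse_graphs(snp_graph, methylation_graph)
nodes = sorted({gene for edge in fused for gene in edge})
```

The edge between the two common genes receives the sum of both weights, while edges touching a data-type-specific gene keep the single weight from their own source.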
2.3. Minimum spanning trees (MSTs)
A spanning tree of a connected undirected graph is a subgraph that is a tree and contains all the vertices of the graph together with some (or perhaps all) of its edges. A single graph may have many different spanning trees. Each edge is assigned a weight, and the weight of a spanning tree is defined as the sum of the weights of its edges. A minimum spanning tree (MST) is a spanning tree with the minimal total edge weight.
In the original context of connection networks, the graph from which a shortest spanning subtree is extracted is a complete graph, that is, a graph with an edge between every pair of vertices. It is natural to generalize the original problem by seeking shortest spanning subtrees of arbitrary connected labelled graphs. More generally, any undirected graph (not necessarily connected) has a minimum spanning forest, which is the union of the minimum spanning trees of its connected components. The MST problem is motivated by finding a low-cost network connecting a set of nodes. If an edge weight is the distance between two nodes in a graph, then the MST is the subgraph that highlights the most highly correlated nodes. MSTs are useful in a number of seemingly disparate applications [55], [56].
The first algorithm for finding an MST was developed by Otakar Borůvka [57]. Another classical method is Prim's algorithm [58], which provides simple and practical procedures for constructing shortest connection networks; it is implemented in the R package “igraph” [59], which was used in this study. Several more computationally efficient algorithms for finding the minimum spanning trees of a graph also exist [60], [61].
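For readers unfamiliar with Prim's algorithm, a compact heap-based sketch is given below. The study itself used the R package igraph; this Python version is only an illustration, with a hypothetical function name and a toy graph, and it restarts from every unvisited node so that a disconnected graph yields a minimum spanning forest.

```python
import heapq


def prim_mst(n_nodes, edges):
    """Return the edges of a minimum spanning tree (or forest).

    edges: iterable of (u, v, weight) for an undirected graph whose nodes are
    numbered 0 .. n_nodes - 1.
    """
    adjacency = {u: [] for u in range(n_nodes)}
    for u, v, w in edges:
        adjacency[u].append((w, u, v))
        adjacency[v].append((w, v, u))
    visited, tree = set(), []
    for start in range(n_nodes):  # one pass per connected component
        if start in visited:
            continue
        visited.add(start)
        heap = list(adjacency[start])
        heapq.heapify(heap)
        while heap:
            w, u, v = heapq.heappop(heap)  # lightest edge leaving the tree
            if v in visited:
                continue
            visited.add(v)
            tree.append((u, v, w))
            for item in adjacency[v]:
                if item[2] not in visited:
                    heapq.heappush(heap, item)
    return tree


# Toy distance graph on four nodes.
toy_edges = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 2.5), (2, 3, 0.5)]
mst = prim_mst(4, toy_edges)
total_weight = sum(w for _, _, w in mst)
```

On the toy graph the edge (0, 2) with weight 2.5 is left out, because nodes 0 and 2 are already joined through node 1 at lower cost.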
3. Experiments and Discussions
3.1. Overview of the approach
An overview of our framework is shown in Fig. 1. The proposed framework mainly consists of the following procedures: feature selection, fused graph construction, disease gene discovery and disease gene validation. To reduce the feature dimensionality, a three-step feature selection was conducted sequentially based on KEGG [49] pathway information, phenotype (healthy or disease) and endophenotype (e.g., fMRI), yielding 744 SNP loci and 69 DNA methylation sites. The selected SNP loci and methylation sites correspond to 370 and 67 genes, respectively. Next, two fused gene networks under the two conditions were built as follows. For the SNP data, a gene profile was generated by averaging the values of its corresponding SNP loci, and a similarity matrix was calculated for each of the two conditions (healthy and disease). The same procedure was applied to the selected DNA methylation data. The two healthy-state similarity matrices from the SNP and methylation data were then combined into a fused similarity matrix for healthy controls. Similarly, a fused similarity matrix was constructed for the disease state. The two fused similarity matrices were converted to distance matrices, from which two sets of MSTs were extracted. After that, we compared each node's connectivity to all other nodes in the MSTs between the healthy controls and the patients. Genes with significant changes were considered susceptible to SCZ. Finally, the genes were validated by disease enrichment analysis, GO and pathway enrichment analysis, an SCZ-associated gene database and a literature search.
Fig. 1:
The schematic of our method for identifying gene biomarkers for schizophrenia.
3.2. Datasets used
In this study, participant recruitment and data collection were conducted at the Mind Research Network. Three types of data (SNP, fMRI, and DNA methylation) were collected from 208 subjects, including 112 healthy controls (age: 32 ± 11, 44 females) and 96 SCZ patients (age: 34 ± 11, 22 females). All participants provided written informed consent. Healthy participants had no medical, neurological or psychiatric illnesses and no history of substance abuse. Based on the clinical interview for DSM-IV-TR Disorders or the Comprehensive Assessment of Symptoms and History, patients met the criteria for DSM-IV-TR schizophrenia. Antipsychotic history was also collected as part of the psychiatric assessment. After a series of quality controls, we retained 183 subjects, including 103 healthy controls (age: 32 ± 11, 37 females and 66 males) and 80 SCZ cases (age: 34 ± 11, 20 females and 60 males). After pre-processing, 27,508 DNA methylation sites, 41,236 fMRI voxels and 722,177 SNP loci were obtained for the subsequent biomarker selection.
3.2.1. SNPs data collection
First, a blood sample was obtained from each participant and DNA was extracted. Genotyping for all participants was performed at the Mind Research Network using the Illumina Infinium HumanOmni1-Quad assay covering 1,140,419 SNP loci. Bead Studio was used to make the final genotype calls. Next, the PLINK software package (http://pngu.mgh.harvard.edu/~purcell/plink) was used to conduct a series of standard quality control procedures, resulting in a final dataset of 722,177 SNP loci. Each SNP was categorized into three clusters based on genotype and represented with a discrete number: 0 for ‘BB’ (no minor allele), 1 for ‘AB’ (one minor allele) and 2 for ‘AA’ (two minor alleles).
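The genotype coding described above amounts to counting minor alleles. A minimal sketch follows; the handling of unrecognized calls (mapped to `None`) is an assumption, since the text does not say how missing calls were coded.

```python
# 'A' denotes the minor allele, following the coding stated in the text.
GENOTYPE_CODE = {"BB": 0, "AB": 1, "AA": 2}


def encode_genotypes(calls):
    """Map genotype calls to minor-allele counts; unknown calls become None."""
    return [GENOTYPE_CODE.get(call) for call in calls]


encoded = encode_genotypes(["AA", "AB", "BB", "--"])
```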
3.2.2. fMRI data collection
The fMRI data were collected during a sensorimotor task, a block-design motor response to auditory stimulation. During the on-block, 200 ms tones were presented with a 500 ms stimulus onset asynchrony (SOA). A total of 16 different tones were presented in each on-block, with frequencies ranging from 236 Hz to 1318 Hz. The fMRI images were acquired on Siemens 3T Trio scanners and a 1.5T Sonata with echo-planar imaging (EPI) sequences using the following parameters: TR = 2000 ms, TE = 30 ms (3.0T)/40 ms (1.5T), field of view = 22 cm, slice thickness = 4 mm, 1 mm skip, 27 slices, acquisition matrix = 64 × 64, flip angle = 90°. Data were pre-processed in SPM5 (http://www.fil.ion.ucl.ac.uk/spm): they were realigned, spatially normalized and resliced to 3 × 3 × 3 mm³, smoothed with a 10 × 10 × 10 mm³ Gaussian kernel to reduce spatial noise, and analyzed by multiple regression with the stimulus and its temporal derivative plus an intercept term as regressors. Finally, the stimulus-on versus stimulus-off contrast images were extracted with 53 × 63 × 46 voxels, and all voxels with missing measurements were excluded.
3.2.3. DNA methylation data collection
DNA from the blood samples was assessed with the Illumina Infinium Methylation 27 Assay. A methylation value, beta (β), represents the ratio of the methylated probe intensity to the total probe intensity. A series of quality control (QC) steps was applied to the beta values to remove bad samples and probes: 1) beta value QC: any beta value with p > 0.05 was set to NaN; 2) bad sample and bad marker removal: samples with more than 5% missing (NaN) values and markers with more than 5% missing (NaN) values were removed. This yielded good methylation data for 224 subjects and 27,508 markers (some with fewer than 5% missing values). After QC, we used the K-nearest-neighbor (KNN) [62] method to impute the missing values.
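The two QC steps can be sketched as follows. The order of filtering (samples first, then markers), the function name and the toy matrices are assumptions made for illustration; the KNN imputation that follows QC is not shown.

```python
import math


def methylation_qc(beta, p_values, p_cut=0.05, max_missing=0.05):
    """Apply the two QC steps to a samples x markers matrix of beta values.

    beta, p_values: lists of rows (one row per sample, one column per marker).
    Returns the cleaned matrix and the indices of retained samples and markers.
    """
    # Step 1: mask beta values whose p-value exceeds the cutoff.
    masked = [[b if p <= p_cut else math.nan for b, p in zip(b_row, p_row)]
              for b_row, p_row in zip(beta, p_values)]

    def missing_fraction(values):
        values = list(values)
        return sum(math.isnan(x) for x in values) / len(values)

    # Step 2a: drop samples with too many missing values.
    keep_samples = [i for i, row in enumerate(masked)
                    if missing_fraction(row) <= max_missing]
    kept = [masked[i] for i in keep_samples]
    # Step 2b: drop markers with too many missing values among kept samples.
    keep_markers = [j for j, col in enumerate(zip(*kept))
                    if missing_fraction(col) <= max_missing]
    cleaned = [[row[j] for j in keep_markers] for row in kept]
    return cleaned, keep_samples, keep_markers


# Toy data: 3 samples x 4 markers. The 5% limit is relaxed to 30% here only
# because the example matrix is so small.
toy_beta = [[0.5] * 4 for _ in range(3)]
toy_p = [[0.01, 0.01, 0.01, 0.01],
         [0.01, 0.20, 0.30, 0.40],
         [0.01, 0.01, 0.01, 0.90]]
cleaned, kept_samples, kept_markers = methylation_qc(
    toy_beta, toy_p, max_missing=0.30)
```

In the toy run the second sample is removed (three of four values masked), after which the last marker is removed because half of its remaining values are masked.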
3.3. Feature selection
A three-step feature selection was performed to reduce the feature dimensionality. In the first step, KEGG [49] pathway information (Figure 1) was used to preselect SNPs and DNA methylation sites, yielding 147,825 SNP loci and 6,935 methylation sites, respectively. In the second step, the preselected SNP loci and methylation sites were screened for significant differences between the healthy and disease states using a t-test (p-value threshold 0.05). In the third step, msCCA was applied to the output of the second step to further select the SNP loci and methylation sites strongly correlated with fMRI voxels, yielding 744 SNP loci and 69 methylation sites. The selected SNP loci and methylation sites correspond to 370 and 67 genes, respectively. Among them, 12 genes are shared by the SNP and methylation data, leaving 358 SNP-specific genes and 55 methylation-specific genes, as shown in the Venn diagram (Figure 2).
Fig. 2:
The Venn diagram of SNP-Genes and Methylation-Genes. SNP-Genes and Methy-Genes represent the corresponding genes of selected SNPs data and DNA methylation data, respectively.
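The gene counts in the Venn diagram follow from simple set arithmetic, spelled out below. The final line assumes that every selected gene becomes a node of the fused network, which the text implies but does not state as a number.

```python
snp_genes = 370          # genes mapped from the 744 selected SNP loci
methylation_genes = 67   # genes mapped from the 69 selected methylation sites
shared_genes = 12        # genes present in both sets

snp_specific = snp_genes - shared_genes
methylation_specific = methylation_genes - shared_genes
# Size of the union, i.e. the assumed node count of each fused network.
union_size = snp_specific + methylation_specific + shared_genes
```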
3.4. Identification of SCZ genes
We want to find subgraphs highlighting the most highly correlated genes, so the two fused similarity matrices under two different conditions were converted to distance matrices.
The element $D_{ij}$ of a distance matrix can be defined as
| (5) |
MSTs were extracted from each of the two state-specific distance matrices; they represent the strongest correlation paths in the similarity network.
By comparing the MSTs for the two states, i.e., healthy and disease, we identified significant SCZ gene biomarkers. In detail, a gene's connection values with all the other nodes were compared between the healthy and disease states using a paired t-test with multiple testing correction (Benjamini-Hochberg, BH [63]). The 342 nodes (genes) with a significant difference (adjusted p-value ≤ 0.01; Figure 5) were identified as gene biomarkers for SCZ.
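The multiple testing correction step can be illustrated with a short Benjamini-Hochberg routine. The raw p-values below are invented for the example; in the study they would come from the per-gene paired t-tests, which are not reproduced here.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw_p = [0.001, 0.008, 0.039, 0.041, 0.60]  # hypothetical per-gene p-values
adj_p = benjamini_hochberg(raw_p)
significant = [i for i, q in enumerate(adj_p) if q <= 0.01]
```

Only the first hypothetical gene passes the 0.01 cutoff after adjustment, although two raw p-values were below 0.01, which shows why the adjusted threshold is stricter than the raw one.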
Fig. 5:
Significant genes identified by CMSTD (yellow ellipses) in the normal network.
3.5. Validation of SCZ genes
The identified SCZ genes were validated using the following approaches: disease association analysis, gene functional enrichment analysis, pathway enrichment analysis, an SCZ database and a literature search in PubMed. Two widely used online tools were employed: WebGestalt [64] and CPDB (ConsensusPathDB) [65]. WebGestalt [64] is a “WEB-based GEne SeT AnaLysis Toolkit” that incorporates information from different public resources, gives biologists an easy way to make sense of gene lists, and can be used to perform disease association analysis. CPDB-human integrates interaction networks in Homo sapiens, including binary and complex protein-protein, genetic, metabolic, signaling, gene regulatory and drug-target interactions, as well as biochemical pathways. Its data currently originate from 32 public resources for interactions, together with interactions curated from the literature.
3.5.1. Disease association analysis
We performed disease association analysis using the online tool WebGestalt (http://www.webgestalt.org/) [64]. For “disease association analysis”, the disease terms were downloaded from PharmGKB (1/26/2013), and genes associated with individual disease terms were inferred using GLAD4U (1/26/2013).
Among the 342 identified genes, 34 are significantly enriched in mental disorders (adjP=6.84e-18, Benjamini-Hochberg), 23 in schizophrenia (adjP=1.17e-12, Benjamini-Hochberg; Figure 6) and 17 in depression (adjP=7.09e-12, Benjamini-Hochberg) (Table 1).
Fig. 6:
The significantly enriched genes (yellow circles) identified by disease association analysis in the normal network.
TABLE 1:
The results of disease association studies for the genes identified by CMSTD
| Disease | #Gene | Gene symbol | Statistics |
|---|---|---|---|
| Mental Disorders | 34 | PLA2G4A, GRIK2, CRH, GRIN3A, CNTNAP2, NRXN1, ERBB4, GRM5, HTR1A, CACNA1C, GRIN2A, NTNG1, ADRA2A, SLC1A2, GABRG3, NR3C1, AMHR2, NLGN1, GRIK1, GRIN2B, NRXN3, DGKH, GAD2, GABRB3, NRG3, PAK3, CR1, PTGDS, CHAT, CHRM5, MAOA, HTR2A, RPS6KA3, FGD1 | rawP=4.19e-20;adjP=6.84e-18 |
| Schizophrenia | 23 | PLA2G4A, GRIN3A, CNTNAP2, NRXN1, ERBB4, GRIN2B, GRM1, HTR1A, GAD2, NRG3, GABRB3, CACNA1C, PTGDS, CHRM5, MAOA, CACNA1B, GRIN2A, NTNG1, HTR2A, ADRA2A, GABRG3, SPTAN1, SLC1A2 | rawP=1.67e-14;adjP=1.17e-12 |
| Depression | 17 | NR3C1, GRIK2, CACNA1A, NLGN1, CRH, HTR1F, GRM5, GRIN2B, PDE11A, HTR1A, GABRB3, CACNA1C, MAOA, GRIN2A, VIPR2, HTR2A, ADRA2A | rawP=1.23e-13;adjP=7.09e-12 |
| Psychotic Disorders | 10 | HTR1A, GABRB3, MAOA, NLGN1, CRH, NRXN1, MAGI2, HTR2A, CHRM3, GABRG3 | rawP=7.81e-08;adjP=8.23e-07 |
| Nervous System Diseases | 24 | PDHA1, PARK2, BCKDHB, VHL, CACNA1A, CNTNAP2, CLDN14, MTR, TK2, GRIN2B, PRKCG, FGF14, GABRB3, PAK3, CR1, PTGDS, CHAT, PPP2R2B, PDE4D, SPTLC1, HEXA, HTR2A, RPS6KA3, SLC1A2 | rawP=1.92e-09;adjP=3.14e-08 |
| Central Nervous System Diseases | 16 | PDHA1, PARK2, BCKDHB, CACNA1A, GRIN2B, FGF14, GABRB3, PRKCG, CR1, CHAT, PDE4D, PPP2R2B, GRIN2A, HEXA, HTR2A, SLC1A2 | rawP=4.45e-07;adjP=3.41e-06 |
3.5.2. GO enrichment analysis
We employed CPDB [65] to conduct Gene Ontology (GO) [66] enrichment analysis. For each of the three ontologies, biological process (BP), cellular component (CC), and molecular function (MF), we selected the three most significant terms together with the neuro-associated terms (Table 2). A large portion of the identified SCZ genes are involved in single-organism cellular process (295 genes), plasma membrane (182 genes) and cell periphery (183 genes). Moreover, some SCZ genes are associated with neurogenesis (75 genes), neuron development (57 genes), neuron differentiation (65 genes), and neuron part (54 genes).
TABLE 2:
The results of GO enrichment analysis for the genes identified by CMSTD
| gene ontology term | category, level | set size | candidates contained | p-value | q-value |
|---|---|---|---|---|---|
| GO:0044763 single-organism cellular process | BP 2 | 10315 | 295 (2.9%) | 2.81E-42 | 2.61E-40 |
| GO:0007166 cell surface receptor signaling pathway | BP 3 | 2697 | 142 (5.3%) | 3.61E-37 | 1.62E-34 |
| GO:0006793 phosphorus metabolic process | BP 3 | 3174 | 149 (4.7%) | 1.44E-33 | 3.23E-31 |
| GO:0044459 plasma membrane part | CC 2 | 2583 | 129 (5.0%) | 9.44E-31 | 7.08E-29 |
| GO:0005886 plasma membrane | CC 2 | 5069 | 182 (3.6%) | 1.49E-27 | 5.57E-26 |
| GO:0071944 cell periphery | CC 2 | 5176 | 183 (3.5%) | 6.77E-27 | 1.69E-25 |
| GO:0005102 receptor binding | MF 3 | 1512 | 76 (5.0%) | 2.73E-17 | 3.00E-15 |
| GO:0016772 transferase activity, transferring phosphorus-containing groups | MF 3 | 1025 | 56 (5.5%) | 2.27E-14 | 1.25E-12 |
| GO:0016301 kinase activity | MF 4 | 869 | 51 (5.9%) | 2.63E-14 | 2.26E-12 |
| GO:0022008 neurogenesis | BP 5 | 1433 | 75 (5.2%) | 4.89E-18 | 1.19E-15 |
| GO:0048666 neuron development | BP 4 | 970 | 57 (5.9%) | 6.94E-16 | 4.46E-14 |
| GO:0030182 neuron differentiation | BP 5 | 1231 | 65 (5.3%) | 8.43E-16 | 8.57E-14 |
| GO:0031175 neuron projection development | BP 4 | 830 | 49 (5.9%) | 8.22E-14 | 2.86E-12 |
| GO:0043005 neuron projection | CC 3 | 957 | 48 (5.0%) | 5.56E-11 | 1.49E-09 |
| GO:0097485 neuron projection guidance | BP 4 | 224 | 21 (9.4%) | 5.19E-10 | 7.60E-09 |
| GO:0097458 neuron part | CC 2 | 1288 | 54 (4.2%) | 2.11E-09 | 1.76E-08 |
| GO:0043025 neuronal cell body | CC 3 | 440 | 23 (5.2%) | 3.77E-06 | 4.49E-05 |
| GO:1902284 neuron projection extension involved in neuron projection guidance | BP 5 | 45 | 7 (15.6%) | 1.27E-05 | 0.000101 |
| GO:0030594 neurotransmitter receptor activity | MF 3 | 69 | 8 (11.6%) | 2.81E-05 | 0.000193 |
| GO:0007272 ensheathment of neurons | BP 3 | 112 | 10 (8.9%) | 2.91E-05 | 0.000127 |
| GO:0032589 neuron projection membrane | CC 4 | 37 | 6 (16.2%) | 4.20E-05 | 0.000541 |
| GO:0070997 neuron death | BP 4 | 288 | 16 (5.6%) | 5.61E-05 | 0.000285 |
| GO:0007269 neurotransmitter secretion | BP 2 | 149 | 11 (7.5%) | 6.06E-05 | 0.00012 |
| GO:1990138 neuron projection extension | BP 4 | 151 | 11 (7.3%) | 7.73E-05 | 0.00038 |
| GO:0051402 neuron apoptotic process | BP 5 | 216 | 13 (6.0%) | 0.000126 | 0.000695 |
| GO:0007158 neuron cell-cell adhesion | BP 4 | 16 | 4 (25.0%) | 0.000146 | 0.000669 |
| GO:0001505 regulation of neurotransmitter levels | BP 3 | 194 | 12 (6.2%) | 0.00016 | 0.000577 |
| GO:0001764 neuron migration | BP 4 | 139 | 10 (7.2%) | 0.000181 | 0.000802 |
| GO:1901214 regulation of neuron death | BP 5 | 255 | 14 (5.5%) | 0.000183 | 0.000921 |
| GO:0006836 neurotransmitter transport | BP 4 | 198 | 12 (6.1%) | 0.000194 | 0.000853 |
| GO:0021954 central nervous system neuron development | BP 5 | 71 | 7 (9.9%) | 0.000251 | 0.0012 |
| GO:0008038 neuron recognition | BP 4 | 33 | 5 (15.2%) | 0.000261 | 0.00108 |
| GO:0098984 neuron to neuron synapse | CC 2 | 150 | 10 (6.7%) | 0.000317 | 0.000881 |
| GO:0097150 neuronal stem cell population maintenance | BP 4 | 20 | 4 (20.0%) | 0.000367 | 0.0014 |
| GO:0021834 chemorepulsion involved in embryonic olfactory bulb interneuron precursor migration | BP 5 | 3 | 2 (66.7%) | 0.000918 | 0.00356 |
| GO:0042133 neurotransmitter metabolic process | BP 3 | 27 | 4 (14.8%) | 0.00121 | 0.00295 |
| GO:0098878 neurotransmitter receptor complex | CC 3 | 46 | 5 (10.9%) | 0.00125 | 0.00535 |
| GO:0038179 neurotrophin signaling pathway | BP 4 | 30 | 4 (13.3%) | 0.00181 | 0.00514 |
| GO:0098908 regulation of neuronal action potential | BP 4 | 4 | 2 (50.0%) | 0.00181 | 0.00514 |
| GO:0099601 regulation of neurotransmitter receptor activity | BP 4 | 31 | 4 (12.9%) | 0.00205 | 0.00564 |
| GO:0050877 neurological system process | BP 3 | 1300 | 37 (2.9%) | 0.00257 | 0.00554 |
| GO:0097109 neuroligin family protein binding | MF 4 | 5 | 2 (40.0%) | 0.00299 | 0.0119 |
| GO:0023041 neuronal signal transduction | BP 3 | 7 | 2 (28.6%) | 0.00613 | 0.0112 |
| GO:0099528 G-protein coupled neurotransmitter receptor activity | MF 4 | 7 | 2 (28.6%) | 0.00613 | 0.0205 |
| GO:0051386 regulation of neurotrophin TRK receptor signaling pathway | BP 5 | 7 | 2 (28.6%) | 0.00613 | 0.016 |
| GO:0048011 neurotrophin TRK receptor signaling pathway | BP 5 | 22 | 3 (13.6%) | 0.00652 | 0.0168 |
| GO:0031644 regulation of neurological system process | BP 5 | 69 | 5 (7.2%) | 0.00738 | 0.0184 |
| GO:0070050 neuron cellular homeostasis | BP 4 | 8 | 2 (25.0%) | 0.00808 | 0.0174 |
| GO:0044306 neuron projection terminus | CC 3 | 131 | 7 (5.3%) | 0.00861 | 0.0263 |
| GO:0050905 neuromuscular process | BP 4 | 101 | 6 (5.9%) | 0.00895 | 0.019 |
3.5.3. Pathway enrichment analysis
We performed pathway enrichment analysis on the identified genes using CPDB [65] over-representation analysis. The three most significantly enriched pathways are “Pathways in cancer” (43 genes, q-value=1.83e-12, KEGG), “Axon guidance” (27 genes, q-value=4.03e-11, KEGG) and “Regulation of actin cytoskeleton” (28 genes, q-value=5.26e-10, KEGG). Some genes are also significantly enriched in neuro-associated pathways (Table 3). For example, 25 genes are significantly enriched in the pathway “Neuroactive ligand-receptor interaction” (q-value=3e-06, KEGG), and 27 genes are significantly enriched in the pathway “Neuronal System” (q-value=1.5e-05, Reactome).
TABLE 3:
The results of pathway enrichment analysis for the genes identified by CMSTD
| Pathway name | Set size | Candidates contained | p-value | q-value | Pathway source |
|---|---|---|---|---|---|
| Pathways in cancer - Homo sapiens (human) | 397 | 43 (10.8%) | 1.21E-15 | 1.83E-12 | KEGG |
| Axon guidance - Homo sapiens (human) | 177 | 27 (15.3%) | 8.00E-14 | 4.03E-11 | KEGG |
| Regulation of actin cytoskeleton - Homo sapiens (human) | 214 | 28 (13.1%) | 1.39E-12 | 5.26E-10 | KEGG |
| Neuroactive ligand-receptor interaction - Homo sapiens (human) | 278 | 25 (9.1%) | 5.74E-08 | 3.00E-06 | KEGG |
| Neuronal System | 351 | 27 (7.7%) | 4.48E-07 | 1.50E-05 | Reactome |
| Interactions of neurexins and neuroligins at synapses | 59 | 9 (15.3%) | 1.94E-05 | 0.000283 | Reactome |
| Brain-Derived Neurotrophic Factor (BDNF) signaling pathway | 144 | 14 (9.7%) | 2.29E-05 | 0.00032 | Wikipathways |
| Neurotransmitter Receptor Binding And Downstream Transmission In The Postsynaptic Cell | 147 | 13 (8.9%) | 0.000111 | 0.00111 | Reactome |
| Sympathetic Nerve Pathway (Neuroeffector Junction) | 23 | 4 (17.4%) | 0.00269 | 0.0118 | PharmGKB |
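
The p-values and q-values in Table 3 come from an over-representation analysis. As a minimal sketch, assuming a hypergeometric test for each pathway and Benjamini-Hochberg adjustment [63] across pathways, the calculation can be illustrated as follows. The function names and the toy numbers are hypothetical; the background gene universe used by CPDB is not stated here, so the values in Table 3 are not reproduced.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, K of which belong to the pathway."""
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Toy example (assumed numbers): 20 background genes, a pathway of 5 genes,
# 4 selected genes, 2 of which fall in the pathway.
p_toy = hypergeom_upper_tail(k=2, K=5, n=4, N=20)
q_toy = benjamini_hochberg([p_toy, 0.5, 0.9])
```

With real data, `N` would be the number of annotated background genes, `n` the number of identified genes, `K` the pathway size and `k` the number of candidates contained in the pathway.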
3.5.4. SCZ database
We also validated the identified genes against an SCZ database, SZGene [67]. Among the 342 identified significant genes, 65 were found to be associated with the SCZ disease (Figure 7).
Fig. 7:
The significant genes validated by the SZGene database (yellow circles) in the normal network.
3.5.5. Literature search
We further searched the literature in PubMed and found that, besides the genes validated by the SZGene [67] database, 75 genes have been reported to be associated with the SCZ disease [68]–[85]. Among these genes, “GRIN2B”, “MSN”, “CRH” and “CNTNAP2” have been reported to be related to schizophrenia in 56, 46, 35, and 29 publications, respectively. LARS2 was upregulated in transmitochondrial cybrids carrying 3243A>G, and 3243A>G was detected in the postmortem brain of one patient with schizophrenia [86]. Jungerius et al. [87] found a weak but significant association between the PIK3C2G gene and schizophrenia. Functional analysis of rare variants found in schizophrenia implicates a critical role of GIT1-PAK3 signaling in neuroplasticity [88]. Another study provided the first line of direct evidence suggesting that the CHRM5 gene, together with the CHRNA7 gene, may be linked to schizophrenia [89]. Elevated ErbB4 mRNA is related to the interneuron deficit in the prefrontal cortex in schizophrenia [90]. The gene expression state of the DRD2-PI3K-AKT signaling cascade differed significantly between acute schizophrenia patients and healthy controls [91].
Additionally, there are some other interesting findings. The normal network (Figure 3) contains three modules: one is composed entirely of SNP-specific genes, another entirely of methylation-specific genes, and the remaining one of the common genes. The disease network (Figure 4) keeps only the methylation-specific module and discards the other two, i.e., the common gene module and the SNP-specific gene module. Among the 342 identified genes, 340 are SNP-specific and two are methylation-specific (Figure 5). Among the 23 genes significantly enriched in the SCZ disease, 22 are SNP-specific and only one is methylation-specific (Figure 6). Among the 65 genes validated by the SZGene database, 64 are SNP-specific and only one is methylation-specific (Figure 7). This indicates that the SNP data vary significantly during the onset of schizophrenia.
Fig. 3:
Normal network
Fig. 4:
Disease network
We also investigated which node parameters play an important role in the SCZ disease. We used the Cytoscape [92] plugin NetworkAnalyzer [93] to calculate 10 node attributes (topological coefficient, neighborhood connectivity, degree, betweenness centrality, stress, average shortest path length, clustering coefficient, closeness centrality, radiality, and eccentricity) for all genes, and then used the t-test to compare the node attributes of the genes identified in the SZGene [67] database with those of the genes not identified in it. We found that the topological coefficient, neighborhood connectivity, and degree are significantly different between healthy control groups and SCZ groups. In other words, these three node parameters change significantly from the health network to the schizophrenia disease network.
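
The attribute comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the variant of the t-test is not specified in the text, so Welch's unequal-variance form is assumed, and the function name and toy degree values are hypothetical. The sketch returns the test statistic and the degrees of freedom; a p-value would then be read from the t distribution.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two samples (assumed variant)."""
    va = variance(a) / len(a)  # squared standard error of the mean of a
    vb = variance(b) / len(b)  # squared standard error of the mean of b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Toy node degrees (assumed values) for two groups of genes.
degree_group_a = [1, 2, 3, 4, 5]
degree_group_b = [2, 3, 4, 5, 6]
t_stat, dof = welch_t(degree_group_a, degree_group_b)
```

The same function would be applied once per node attribute, followed by a multiple-testing adjustment over the 10 attributes.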
Our framework identifies disease genes by Comparing two MSTs extracted from two Distance matrices (CMSTD). To assess its performance, we compared it with three other approaches. The first identifies disease genes by Comparing two Whole Networks (similarity matrices) (CWN). The other two compare node features. Generally speaking, node attributes such as degree centrality [94] should change significantly from the healthy network to the disease network. We therefore calculated 10 features for each node in the network (betweenness centrality, closeness centrality, strength, Bonacich power centrality scores, eigenvector centrality scores, (Harary) graph centrality scores, information centrality scores, load centrality scores, vertex prestige scores, and stress centrality scores [94]), compared them between the health network and the disease network using a paired t-test, and ranked the genes by the significance of the changes to identify the SCZ genes according to a p-value threshold. We applied this comparison of node features to the two state Whole Networks (CWNf) and to the two state MSTs extracted from two Distance matrices (CMSTDf). However, neither CWNf nor CMSTDf yielded genes with significant feature changes after multiple-testing adjustment at p ≤ 0.05.
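
To make the idea of comparing two MSTs concrete, the following minimal Python sketch extracts an MST from each of two toy distance matrices with Prim's algorithm [58] and reports how much the degree of each node changes between the two trees. The matrices, the function names and the use of degree change as the comparison statistic are illustrative assumptions and are not necessarily the exact statistic used by CMSTD.

```python
def prim_mst(dist):
    """MST of a dense symmetric distance matrix, as a set of edges (i, j) with i < j."""
    n = len(dist)
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known distance from the tree to each node
    parent = [-1] * n
    best[0] = 0.0
    edges = set()
    for _ in range(n):
        # Pick the node outside the tree that is closest to it.
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.add((min(u, parent[u]), max(u, parent[u])))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges


def degrees(edges, n):
    """Degree of every node in a tree given by its edge set."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg


def degree_changes(dist_health, dist_disease):
    """Absolute change in MST degree of each node between the two states."""
    n = len(dist_health)
    d_h = degrees(prim_mst(dist_health), n)
    d_d = degrees(prim_mst(dist_disease), n)
    return [abs(a - b) for a, b in zip(d_h, d_d)]


# Toy distance matrices (assumed): a chain in the health state, a star in the disease state.
toy_health = [[0, 1, 4, 4], [1, 0, 1, 4], [4, 1, 0, 1], [4, 4, 1, 0]]
toy_disease = [[0, 1, 1, 1], [1, 0, 4, 4], [1, 4, 0, 4], [1, 4, 4, 0]]
print(degree_changes(toy_health, toy_disease))  # prints [2, 1, 1, 0]
```

In this toy case node 0 changes most, from a leaf of the chain to the hub of the star, so it would be ranked first.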
Moreover, we compared three network-based discovery methods: CWN, CMSTD (our framework), and DiffCorr [95]. DiffCorr [95] is a simple method for identifying pattern changes between two experimental conditions in correlation networks; it builds on a commonly used association measure, such as Pearson’s correlation coefficient. Among the genes discovered by DiffCorr, none are significantly enriched in SCZ or mental disorders.
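
The kind of test that underlies a differential correlation analysis can be sketched as follows. This is a simplified illustration, assuming Pearson correlations compared with Fisher's z-transformation; it is not the DiffCorr package itself, and the function names and example values are hypothetical.

```python
from math import atanh, erfc, sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def diff_corr_pvalue(r1, n1, r2, n2):
    """Two-sided p-value for H0: the two correlations are equal (Fisher's z-test).

    Requires |r1| < 1, |r2| < 1 and more than 3 samples in each condition.
    """
    z = (atanh(r1) - atanh(r2)) / sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal CDF Phi.
    return erfc(abs(z) / sqrt(2.0))


# Toy example (assumed values): a gene pair correlated at 0.9 in one condition
# and at -0.2 in the other, with 50 samples per condition.
p_change = diff_corr_pvalue(0.9, 50, -0.2, 50)
```

Gene pairs with a small p-value after multiple-testing adjustment would be reported as differentially correlated between the two conditions.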
For CMSTD and CWN, when the threshold of the adjusted p-value is set to 0.01, both approaches identify more than 300 genes (CMSTD: 342, CWN: 364). Among the genes identified by CMSTD, 23 are enriched in SCZ and 34 in mental disorders. Among the genes identified by CWN, 24 are enriched in SCZ and 35 in mental disorders. Among the top 58 genes identified by CWN, 8 are enriched in SCZ and 9 in mental disorders; among the top 58 genes identified by CMSTD, 9 are enriched in SCZ and 10 in mental disorders. The two approaches thus give different results in the disease association studies when the most significant genes are selected (Table 4). These comparisons show that CMSTD performs slightly better than CWN.
TABLE 4:
The results of disease association studies for different numbers of top-ranked genes identified by CMSTD and CWN
| #top genes | CWN: #Genes for SCZ | CWN: #Genes for mental disorders | CMSTD: #Genes for SCZ | CMSTD: #Genes for mental disorders |
|---|---|---|---|---|
| 20 | 2(adjP=0.0147) | 4(adjP=0.0011) | 2(adjP=0.0159) | 4(adjP=0.0007) |
| 58 | 8(adjP=1.65e-06) | 9(adjP=2.83e-06) | 9(adjP=1.18e-07) | 10(adjP=2.15e-07) |
| 60 | 9(adjP=1.65e-07) | 10(adjP=3.11e-07) | 9(adjP=1.66e-07) | 10(adjP=3.12e-07) |
| 100 | 10(adjP=4.72e-07) | 13(adjP=5.26e-08) | 10(adjP=4.26e-07) | 13(adjP=4.41e-08) |
| 140 | 11(adjP=7.41e-07) | 14(adjP=1.58e-07) | 11(adjP=7.36e-07) | 14(adjP=1.46e-07) |
| 180 | 11(adjP=6.17e-06) | 19(adjP=2.24e-10) | 13(adjP=1.23e-07) | 20(adjP=5.86e-11) |
| 220 | 16(adjP=2.68e-09) | 24(adjP=6.23e-13) | 19(adjP=2.54e-09) | 24(adjP=6.31e-13) |
| 260 | 18(adjP=4.01e-10) | 26(adjP=1.29e-13) | 17(adjP=3.41e-09) | 27(adjP=2.86e-14) |
| 300 | 21(adjP=7.59e-12) | 31(adjP=2.00e-16) | 19(adjP=4.15e-10) | 28(adjP=6.60e-14) |
| 340 | 23(adjP=1.30e-12) | 34(adjP=5.62e-18) | 23(adjP=1.11e-12) | 34(adjP=5.67e-18) |
| ALL (CWN: 364, CMSTD: 342) | 25(adjP=6.36e-14) | 36(adjP=8.28e-19) | 23(adjP=1.17e-12) | 34(adjP=6.84e-18) |
Furthermore, the gene subset selected by our framework is very stable with respect to sample size. The identified gene list does not change when a subset (one-fifth, two-fifths, three-fifths, or four-fifths) of the original samples is used for network construction. In addition, we investigated the influence of incorporating fMRI data on network construction and feature selection. To our surprise, without the fMRI data, none of the identified genes are significantly enriched. This demonstrates the value of incorporating brain imaging data in gene network construction, especially for the study of mental disorders.
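
A stability check of this kind can be sketched as follows. The sketch is a hedged illustration only: `select_genes` is a hypothetical placeholder for the whole selection pipeline, the Jaccard index is an assumed way of quantifying agreement between gene lists, and the random subsampling scheme is likewise an assumption.

```python
import random


def jaccard(a, b):
    """Jaccard index of two gene collections (1.0 means identical sets)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


def subsample_stability(samples, select_genes, fractions=(0.2, 0.4, 0.6, 0.8), seed=0):
    """Compare the genes selected on random subsets with those selected on all samples.

    `select_genes` is a hypothetical callable mapping a list of samples to a gene set.
    """
    rng = random.Random(seed)
    full = select_genes(samples)
    result = {}
    for frac in fractions:
        k = max(1, round(frac * len(samples)))
        subset = rng.sample(samples, k)
        result[frac] = jaccard(full, select_genes(subset))
    return result


# Toy usage with a dummy selector that always returns the same two genes.
toy_stability = subsample_stability(list(range(50)), lambda s: {"GRIN2B", "CRH"})
```

A value of 1.0 at every fraction corresponds to a gene list that does not change with the sample size.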
4. Conclusion
Disease biomarkers are molecular readouts that correlate with a disease state and may help in understanding pathogenesis and disease mechanisms at the cellular and molecular level. Once validated, these biomarkers can be used to develop diagnostic procedures and to improve clinical decision making. In this study, we integrated SNP, DNA methylation and fMRI data and exploited their complementary information to identify important genes for SCZ. For each of the selected SNP and methylation data sets, we separately constructed a pair of state networks: a healthy network and a disease network. We then fused the two health networks into one health network and the two disease networks into one disease network. We extracted the MSTs from the two distance matrices corresponding to the healthy network and the disease network. By comparing the nodes’ features between the two MSTs, we identified genes with significant differences as SCZ disease biomarkers. The gene biomarkers were finally validated from five aspects: disease association study, GO enrichment analysis, pathway enrichment analysis, the SZGene [67] database and a literature search in PubMed. In addition, we compared our framework with other discovery approaches. The comparison showed that the proposed network-based approach can effectively identify SCZ genes, which can be used as drug targets for the SCZ disease. The proposed framework for integrating multiple imaging and genomic datasets in the quest for gene biomarkers can also be applied to the study of other complex diseases.
Acknowledgments
This work was partly supported by NIH grants R01 GM109068, R01 MH104680, and R01 MH107354, and by NSF grant #1539067. It was also supported by the National Science Foundation of China, grant Nos. 61520106006, 61532008 and U1611265, and by the China Postdoctoral Science Foundation, grant No. 2015M580352.
Biographies

Su-Ping Deng received the B.Sc. degree from Henan University, China, in 2003 and the M.Sc. degree from Central South University in 2006. From July 2006 to March 2012, she worked as a teacher at West Anhui University. She received the Ph.D. degree from Tongji University, China, in 2012. From June 2012 to Nov. 2014, Dr. Deng was a postdoc in the College of Electronics and Information Engineering, Tongji University, China. From Feb. 2015 to Feb. 2017, Dr. Deng was a postdoc in the Department of Biomedical Engineering, Tulane University, USA. Currently, she is a postdoc in the College of Electrical and Computer Engineering, Texas A&M University, USA. She is mainly interested in computational biology and bioinformatics.

Wenxing Hu received the B.Sc. degree in Applied Mathematics from Xi’an Jiaotong University, China, in 2011. He is now a Ph.D. student in Biomedical Engineering at Tulane University, USA. His research interests include dimension reduction, correlation analysis, and multi-omics data integration.

Vince D. Calhoun (S’88-M’02-SM’05) received the Bachelor’s degree in electrical engineering from the University of Kansas, Lawrence, in 1991, the Master’s degrees in biomedical engineering and information systems from Johns Hopkins University, Baltimore, MD, in 1993 and 1996, respectively, and the Ph.D. degree in electrical engineering from the University of Maryland Baltimore County, Baltimore, in 2002.
He was a Senior Research Engineer at the Psychiatric Neuro-Imaging Laboratory, Johns Hopkins, from 1993 until 2002. He then became the Director of Medical Image Analysis at the Olin Neuropsychiatry Research Center and an Associate Professor at Yale University. He is currently Director of Image Analysis and MR Research at the Mind Research Network and is an Associate Professor in the Department of Electrical and Computer Engineering, Neurosciences, and Computer Science at the University of New Mexico, Albuquerque. He is the author of more than 80 full journal articles and over 200 technical reports, abstracts, and conference proceedings. Much of his career has been spent on the development of data-driven approaches for the analysis of functional magnetic resonance imaging (fMRI) data. He has multiple NSF and NIH grants on the incorporation of prior information into independent component analysis (ICA) for fMRI, data fusion of multimodal imaging and genetics data, and the identification of biomarkers for disease. He has participated in multiple NIH study sections.
Dr. Calhoun is a Senior Member of the Organization for Human Brain Mapping and the International Society for Magnetic Resonance in Medicine. He has worked in the organization of workshops at conferences including the Society of Biological Psychiatry (SOBP) and the International Conference on Independent Component Analysis and Blind Source Separation (ICA). He is currently serving on the IEEE Machine Learning for Signal Processing (MLSP) Technical Committee and has previously served as the General Chair of the 2005 meeting. He is a reviewer for a number of international journals, is on the Editorial Board of the Human Brain Mapping and Neuroimage journals, and is an Associate Editor for the IEEE Signal Processing Letters and the International Journal of Computational Intelligence and Neuroscience.

Yu-Ping Wang received the BS degree in applied mathematics from Tianjin University, China, in 1990, and the MS degree in computational mathematics and the PhD degree in communications and electronic systems from Xi’an Jiaotong University, China, in 1993 and 1996, respectively. After his graduation, he had visiting positions at the Center for Wavelets, Approximation and Information Processing of the National University of Singapore and Washington University Medical School in St. Louis. From 2000 to 2003, he worked as a senior research engineer at Perceptive Scientific Instruments, Inc., and then Advanced Digital Imaging Research, LLC, Houston, Texas. In the fall of 2003, he returned to academia as an assistant professor of computer science and electrical engineering at the University of Missouri-Kansas City. He is currently a full professor of Biomedical Engineering and Biostatistics & Bioinformatics at Tulane University School of Science and Engineering & School of Public Health and Tropical Medicine. He is also a member of the Tulane Center of Bioinformatics and Genomics, the Tulane Cancer Center and the Tulane Neuroscience Program. His research interests have been computer vision, signal processing and machine learning with applications to biomedical imaging and bioinformatics, where he has about 180 peer-reviewed publications. He has served on numerous program committees and NSF/NIH review panels, and served as an editor for several journals such as Neuroscience Methods.
Contributor Information
Su-Ping Deng, Department of Biomedical Engineering, School of Science and Engineering, Tulane University, New Orleans, LA 70118, USA, sdeng2@tulane.edu.
Wenxing Hu, Department of Biomedical Engineering, School of Science and Engineering, Tulane University, New Orleans, LA 70118, USA, whu@tulane.edu.
Vince D. Calhoun, Mind Research Network, Albuquerque, NM 87106, USA, vcalhoun@mrn.org.
Yu-Ping Wang, Department of Biomedical Engineering, School of Science and Engineering, Tulane University, New Orleans, LA 70118, USA, wyp@tulane.edu, Telephone: (504)865-5867, Fax: (504)862-8779.
References
- [1]. Turck CW, Biomarkers for psychiatric disorders. Springer, 2008.
- [2]. Murray CJ and Lopez AD, “The global burden of disease and injury series, volume 1: a comprehensive assessment of mortality and disability from diseases, injuries, and risk factors in 1990 and projected to 2020,” Cambridge, MA, 1996.
- [3]. Picchioni MM and Murray RM, “Schizophrenia,” BMJ, vol. 335, no. 7610, 2007.
- [4]. Spitzer RL and Williams JB, “Diagnostic and statistical manual of mental disorders,” in American Psychiatric Association. Citeseer, 1980.
- [5]. Pies R, “How “objective” are psychiatric diagnoses,” Psychiatry, vol. 4, no. 10, 2007.
- [6]. Ardekani BA, Nierenberg J, Hoptman MJ, Javitt DC, and Lim KO, “MRI study of white matter diffusion anisotropy in schizophrenia,” Neuroreport, vol. 14, no. 16, pp. 2025–2029, 2003.
- [7]. Kubicki M, McCarley R, Westin C-F, Park H-J, Maier S, Kikinis R, Jolesz FA, and Shenton ME, “A review of diffusion tensor imaging studies in schizophrenia,” Journal of psychiatric research, vol. 41, no. 1, pp. 15–30, 2007.
- [8]. Huang D-S, “Systematic theory of neural networks for pattern recognition,” Publishing House of Electronic Industry of China, Beijing, vol. 201, 1996.
- [9]. Shergill SS, Brammer MJ, Williams SC, Murray RM, and McGuire PK, “Mapping auditory hallucinations in schizophrenia using functional magnetic resonance imaging,” Archives of general psychiatry, vol. 57, no. 11, pp. 1033–1038, 2000.
- [10]. Pantelis C, Yücel M, Wood SJ, Velakoulis D, Sun D, Berger G, Stuart GW, Yung A, Phillips L, and McGorry PD, “Structural brain imaging evidence for multiple pathological processes at different stages of brain development in schizophrenia,” Schizophrenia bulletin, vol. 31, no. 3, pp. 672–696, 2005.
- [11]. Ho B-C, Andreasen NC, Nopoulos P, Arndt S, Magnotta V, and Flaum M, “Progressive structural brain abnormalities and their relationship to clinical outcome: a longitudinal magnetic resonance imaging study early in schizophrenia,” Archives of general psychiatry, vol. 60, no. 6, pp. 585–594, 2003.
- [12]. Zhu L, You Z-H, and Huang D-S, “Increasing the reliability of protein-protein interaction networks via non-convex semantic embedding,” Neurocomputing, vol. 121, pp. 99–107, 2013.
- [13]. Cao H, Duan J, Lin D, Calhoun V, and Wang Y-P, “Integrating fMRI and SNP data for biomarker identification for schizophrenia with a sparse representation based variable selection method,” BMC medical genomics, vol. 6, no. Suppl 3, p. S2, 2013.
- [14]. Lin D, Cao H, Calhoun VD, and Wang Y-P, “Sparse models for correlative and integrative analysis of imaging and genetic data,” Journal of neuroscience methods, vol. 237, pp. 69–78, 2014.
- [15]. Lin D, Zhang J, Li J, He H, Deng H-W, and Wang Y-P, “Integrative analysis of multiple diverse omics datasets by sparse group multitask regression,” Multi-omic Data Integration, p. 126, 2015.
- [16]. Zhu L, Guo W-L, Deng S-P, and Huang D-S, “ChIP-PIT: enhancing the analysis of ChIP-seq data using convex-relaxed pairwise interaction tensor decomposition,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 13, no. 1, pp. 55–63, 2016.
- [17]. Hariri AR and Weinberger DR, “Imaging genomics,” British medical bulletin, vol. 65, no. 1, pp. 259–270, 2003.
- [18]. Thompson PM, Martin NG, and Wright MJ, “Imaging genomics,” Current opinion in neurology, vol. 23, no. 4, p. 368, 2010.
- [19]. Glahn DC, Paus T, and Thompson PM, “Imaging genomics: mapping the influence of genetics on brain structure and function,” Human brain mapping, vol. 28, no. 6, pp. 461–463, 2007.
- [20]. van Nuenen BF, van Eimeren T, van der Vegt JP, Buhmann C, Klein C, Bloem BR, and Siebner HR, “Mapping preclinical compensation in Parkinson’s disease: an imaging genomics approach,” Movement Disorders, vol. 24, no. S2, pp. S703–S710, 2009.
- [21]. Blasi G and Bertolino A, “Imaging genomics and response to treatment with antipsychotics in schizophrenia,” NeuroRx, vol. 3, no. 1, pp. 117–130, 2006.
- [22]. Deng S-P and Huang D-S, “SFAPS: an R package for structure/function analysis of protein sequences based on informational spectrum method,” Methods, vol. 69, no. 3, pp. 207–212, 2014.
- [23]. Barabasi A-L and Oltvai ZN, “Network biology: understanding the cell’s functional organization,” Nature reviews genetics, vol. 5, no. 2, pp. 101–113, 2004.
- [24]. Wei P-J, Zhang D, Xia J, and Zheng C-H, “LNDriver: identifying driver genes by integrating mutation and expression data based on gene-gene interaction network,” BMC Bioinformatics, vol. 17, no. 17, p. 221, 2016.
- [25]. Huang D-S, “Radial basis probabilistic neural networks: Model and application,” International Journal of Pattern Recognition and Artificial Intelligence, vol. 13, no. 07, pp. 1083–1101, 1999.
- [26]. Li C, Liakata M, and Rebholz-Schuhmann D, “Biological network extraction from scientific literature: state of the art and challenges,” Briefings in bioinformatics, vol. 15, no. 5, pp. 856–877, 2014.
- [27]. Zhang D, Chen P, Zheng C-H, and Xia J, “Identification of ovarian cancer subtype-specific network modules and candidate drivers through an integrative genomics approach,” Oncotarget, vol. 7, no. 4, p. 4298, 2016.
- [28]. Warde-Farley D, Donaldson SL, Comes O, Zuberi K, Badrawi R, Chao P, Franz M, Grouios C, Kazi F, Lopes CT et al., “The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function,” Nucleic acids research, vol. 38, no. suppl 2, pp. W214–W220, 2010.
- [29]. Huang D-S and Du J-X, “A constructive hybrid structure optimization methodology for radial basis probabilistic neural networks,” IEEE Transactions on Neural Networks, vol. 19, no. 12, pp. 2099–2115, 2008.
- [30]. Sharan R and Ideker T, “Modeling cellular machinery through biological network comparison,” Nature biotechnology, vol. 24, no. 4, pp. 427–433, 2006.
- [31]. Li H, Zhang J, Xia J, and Zheng C, “Identification of driver pathways in cancer based on combinatorial patterns of somatic gene mutations,” Neoplasma, vol. 63, no. 1, pp. 57–63, 2015.
- [32]. Pržulj N, “Biological network comparison using graphlet degree distribution,” Bioinformatics, vol. 23, no. 2, pp. e177–e183, 2007.
- [33]. Junker BH, Koschutzki D, and Schreiber F, “Exploration of biological network centralities with CentiBiN,” BMC bioinformatics, vol. 7, no. 1, p. 1, 2006.
- [34]. Huang D-S and Jiang W, “A general CPL-AdS methodology for fixing dynamic parameters in dual environments,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 42, no. 5, pp. 1489–1500, 2012.
- [35]. Deng S-P, Zhu L, and Huang D-S, “Predicting hub genes associated with cervical cancer through gene co-expression networks,” IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), vol. 13, no. 1, pp. 27–35, 2016.
- [36]. Deng S-P and Zhu L, “Mining the bladder cancer-associated genes by an integrated strategy for the construction and analysis of differential co-expression networks,” BMC genomics, vol. 16, no. 3, pp. 1–10, 2015.
- [37]. Kogelman LJ, Zhernakova DV, Westra H-J, Cirera S, Fredholm M, Franke L, and Kadarmideen HN, “An integrative systems genetics approach reveals potential causal genes and pathways related to obesity,” Genome medicine, vol. 7, no. 1, p. 1, 2015.
- [38]. Guelzim N, Bottani S, Bourgine P, and Kepes F, “Topological and causal structure of the yeast transcriptional regulatory network,” Nature genetics, vol. 31, no. 1, pp. 60–63, 2002.
- [39]. Kim Y-A, Wuchty S, and Przytycka TM, “Identifying causal genes and dysregulated pathways in complex diseases,” PLoS Comput Biol, vol. 7, no. 3, p. e1001095, 2011.
- [40]. Vanunu O, Magger O, Ruppin E, Shlomi T, and Sharan R, “Associating genes and protein complexes with disease via network propagation,” PLoS Comput Biol, vol. 6, no. 1, p. e1000641, 2010.
- [41]. Wen Z, Liu Z-P, Liu Z, Zhang Y, and Chen L, “An integrated approach to identify causal network modules of complex diseases with application to colorectal cancer,” Journal of the American Medical Informatics Association, vol. 20, no. 4, pp. 659–667, 2013.
- [42]. Tu Z, Wang L, Arbeitman MN, Chen T, and Sun F, “An integrative approach for causal gene identification and gene regulatory pathway inference,” Bioinformatics, vol. 22, no. 14, pp. e489–e496, 2006.
- [43]. van Os J, Kenis G, and Rutten BP, “The environment and schizophrenia,” Nature, vol. 468, no. 7321, pp. 203–212, 2010.
- [44]. Tsuang MT, Stone WS, and Faraone SV, “Genes, environment and schizophrenia,” The British Journal of Psychiatry, vol. 178, no. 40, pp. s18–s24, 2001.
- [45]. Scott SK and Wise RJ, “Functional imaging and language: A critical guide to methodology and analysis,” Speech Communication, vol. 41, no. 1, pp. 7–21, 2003.
- [46]. Wallace C, Newhouse SJ, Braund P, Zhang F, Tobin M, Falchi M, Ahmadi K, Dobson RJ, Marcano ACB, Hajat C et al., “Genome-wide association study identifies genes for biomarkers of cardiovascular disease: serum urate and dyslipidemia,” The American Journal of Human Genetics, vol. 82, no. 1, pp. 139–149, 2008.
- [47]. Negm RS, Verma M, and Srivastava S, “The promise of biomarkers in cancer screening and detection,” Trends in molecular medicine, vol. 8, no. 6, pp. 288–293, 2002.
- [48]. Robertson S, “What is DNA methylation?” News Medical. [Online]. Available: http://www.news-medical.net/life-sciences/What-is-DNA-Methylation.aspx
- [49]. Kanehisa M and Goto S, “KEGG: Kyoto encyclopedia of genes and genomes,” Nucleic acids research, vol. 28, no. 1, pp. 27–30, 2000.
- [50]. Hu W, Lin D, Calhoun VD, and Wang Y-P, “Integration of SNPs-fMRI-methylation data with sparse multi-CCA for schizophrenia study,” in 2016 IEEE 38th Annual International Conference of the Engineering in Medicine and Biology Society (EMBC). IEEE, 2016, pp. 3310–3313.
- [51]. Hotelling H, “Relations between two sets of variates,” Biometrika, vol. 28, no. 3/4, pp. 321–377, 1936.
- [52]. Witten DM, Tibshirani R, and Hastie T, “A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis,” Biostatistics, p. kxp008, 2009.
- [53]. Parkhomenko E, Tritchler D, and Beyene J, “Sparse canonical correlation analysis with application to genomic data integration,” Statistical Applications in Genetics and Molecular Biology, vol. 8, no. 1, pp. 1–34, 2009.
- [54]. Witten DM and Tibshirani RJ, “Extensions of sparse canonical correlation analysis with applications to genomic data,” Statistical Applications in Genetics and Molecular Biology, vol. 8, no. 1, pp. 1–27, 2009.
- [55]. Xu Y, Olman V, and Xu D, “Clustering gene expression data using a graph-theoretic approach: an application of minimum spanning trees,” Bioinformatics, vol. 18, no. 4, pp. 536–545, 2002.
- [56]. Frederickson GN, “Data structures for on-line updating of minimum spanning trees, with applications,” SIAM Journal on Computing, vol. 14, no. 4, pp. 781–798, 1985.
- [57]. Borůvka O, “O jistém problému minimálním,” 1926.
- [58]. Prim RC, “Shortest connection networks and some generalizations,” Bell Labs Technical Journal, vol. 36, no. 6, pp. 1389–1401, 1957.
- [59]. Csardi G and Nepusz T, “The igraph software package for complex network research,” InterJournal, vol. Complex Systems, p. 1695, 2006. [Online]. Available: http://igraph.org
- [60]. Karger DR, Klein PN, and Tarjan RE, “A randomized linear-time algorithm to find minimum spanning trees,” Journal of the ACM (JACM), vol. 42, no. 2, pp. 321–328, 1995.
- [61]. Pettie S and Ramachandran V, “Minimizing randomness in minimum spanning tree, parallel connectivity, and set maxima algorithms,” in Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms. Society for Industrial and Applied Mathematics, 2002, pp. 713–722.
- [62].Peterson LE, “K-nearest neighbor,” Scholarpedia, vol. 4, no. 2, p. 1883, 2009. [Google Scholar]
- [63].Benjamini Y and Hochberg Y, “Controlling the false discovery rate: a practical and powerful approach to multiple testing,” Journal of the royal statistical society. Series B (Methodological), pp. 289–300, 1995. [Google Scholar]
- [64].Zhang B, Kirov S, and Snoddy J, “Webgestalt: an integrated system for exploring gene sets in various biological contexts,” Nucleic acids research, vol. 33, no. suppl 2, pp. W741–W748, 2005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [65].Kamburov A, Stelzl U, Lehrach H, and Herwig R, “The consensuspathdb interaction database: 2013 update,” Nucleic acids research, vol. 41, no. D1, pp. D793–D800, 2013. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [66].Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT et al. , “Gene ontology: tool for the unification of biology,” Nature genetics, vol. 25, no. 1, pp. 25–29, 2000. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [67].Allen NC, Bagade S, McQueen MB, Ioannidis JP, Kavvoura FK, Khoury MJ, Tanzi RE, and Bertram L, “Systematic meta-analyses and field synopsis of genetic association studies in schizophrenia: the szgene database,” Nature genetics, vol. 40, no. 7, pp. 827–834, 2008. [DOI] [PubMed] [Google Scholar]
- [68].Kim M, Biag J, Fass D, Lewis M, Zhang Q, Fleishman M, Gangwar S, Machius M, Fromer M, Purcell S et al. , “Functional analysis of rare variants found in schizophrenia implicates a critical role for git1-pak3 signaling in neuroplasticity,” Molecular Psychiatry, 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [69].Hu C, Chen W, Myers SJ, Yuan H, and Traynelis SF, “Human grin2b variants in neurodevelopmental disorders,” Journal of pharmacological sciences, vol. 132, no. 2, pp. 115–121, 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [70].Kang WS, Kim SK, Park JK, Cho AR, Park HJ, Chung J-H, and Jim J, “Association between promoter polymorphisms of the lifr gene and schizophrenia with persecutory delusions in a korean population,” Mol Med Report, vol. 5, pp. 270–274, 2012.
- [71].Tóth K, Csukly G, Sirok D, Belic Á, Kiss A, Háfra E, Déri M, Menus Á, Bitter I, and Monostory K, “Optimization of clonazepam therapy adjusted to patient’s cyp3a status and nat2 genotype,” International Journal of Neuropsychopharmacology, p. pyw083, 2016.
- [72].Huang C-C, Cheng M-C, Tsai H-M, Lai C-H, and Chen C-H, “Genetic analysis of gabrb3 at 15q12 as a candidate gene of schizophrenia,” Psychiatric genetics, vol. 24, no. 4, pp. 151–157, 2014.
- [73].Hansen T, Ingason A, Djurovic S, Melle I, Fenger M, Gustafsson O, Jakobsen KD, Rasmussen HB, Tosato S, Rietschel M et al., “At-risk variant in tcf7l2 for type ii diabetes increases risk of schizophrenia,” Biological psychiatry, vol. 70, no. 1, pp. 59–63, 2011.
- [74].Syu A, Ishiguro H, Inada T, Horiuchi Y, Tanaka S, Ishikawa M, Arai M, Itokawa M, Niizato K, Iritani S et al., “Association of the hspg2 gene with neuroleptic-induced tardive dyskinesia,” Neuropsychopharmacology, vol. 35, no. 5, pp. 1155–1164, 2010.
- [75].González-Peñas J, Amigo J, Santomé L, Sobrino B, Brenlla J, Agra S, Paz E, Páramo M, Carracedo Á, Arrojo M et al., “Targeted resequencing of regulatory regions at schizophrenia risk loci: Role of rare functional variants at chromatin repressive states,” Schizophrenia research, 2016.
- [76].Zhang Z, Yu H, Jiang S, Liao J, Lu T, Wang L, Zhang D, and Yue W, “Evidence for association of cell adhesion molecules pathway and nlgn1 polymorphisms with schizophrenia in chinese han population,” PloS one, vol. 10, no. 12, p. e0144719, 2015.
- [77].Ohi K, Shimada T, Nitta Y, Kihara H, Okubo H, Uehara T, and Kawasaki Y, “Specific gene expression patterns of 108 schizophrenia-associated loci in cortex,” Schizophrenia research, 2016.
- [78].Karlsen AS, Kaalund SS, Moller M, Plath N, and Pakkenberg B, “Expression of presynaptic markers in a neurodevelopmental animal model with relevance to schizophrenia,” Neuroreport, vol. 24, no. 16, pp. 928–933, 2013.
- [79].Mamdani F, Rollins B, Morgan L, Myers R, Barchas J, Schatzberg A, Watson S, Akil H, Potkin S, Bunney W et al., “Variable telomere length across post-mortem human brain regions and specific reduction in the hippocampus of major depressive disorder,” Translational psychiatry, vol. 5, no. 9, p. e636, 2015.
- [80].Koeda M, Watanabe A, Tsuda K, Matsumoto M, Ikeda Y, Kim W, Tateno A, Naing BT, Karibe H, Shimada T et al., “Interaction effect between handedness and cntnap2 polymorphism (rs7794745 genotype) on voice-specific frontotemporal activity in healthy individuals: an fmri study,” Frontiers in behavioral neuroscience, vol. 9, 2015.
- [81].Kang SH, Han HR, Lee J-I, Karmacharya R, Jeon HJ, and Roh S, “Effects of lep, lepr, adipoq, mc4r and fto polymorphisms on dyslipidemia in korean patients with schizophrenia who are taking clozapine,” Psychiatry research, vol. 228, no. 1, pp. 177–178, 2015.
- [82].García-Bueno B, Gassó P, MacDowell KS, Callado LF, Mas S, Bernardo M, Lafuente A, Meana JJ, and Leza JC, “Evidence of activation of the toll-like receptor-4 proinflammatory pathway in patients with schizophrenia,” Journal of psychiatry & neuroscience: JPN, vol. 41, no. 3, p. E46, 2016.
- [83].Tsukada T, Simamura E, Shimada H, Arai T, Higashi N, Akai T, Iizuka H, and Hatta T, “The suppression of maternal-fetal leukemia inhibitory factor signal relay pathway by maternal immune activation impairs brain development in mice,” PloS one, vol. 10, no. 6, p. e0129011, 2015.
- [84].Liu S, Zhang F, Shugart Y, Yang L, Li X, Liu Z, Sun N, Yang C, Guo, Shi J et al., “The early growth response protein 1-mir-30a-5p-neurogenic differentiation factor 1 axis as a novel biomarker for schizophrenia diagnosis and treatment monitoring,” Translational Psychiatry, vol. 7, no. 1, p. e998, 2017.
- [85].Nishi A and Shuto T, “Potential for targeting dopamine/darpp32 signaling in neuropsychiatric and neurodegenerative disorders,” Expert Opinion on Therapeutic Targets, no. just-accepted, 2017.
- [86].Supriyanto I, Watanabe Y, Mouri K, Shiroiwa K, RattaApha W, Yoshida M, Tamiya G, Sasada T, Eguchi N, Okazaki K et al., “A missense mutation in the itga8 gene, a cell adhesion molecule gene, is associated with schizophrenia in japanese female patients,” Progress in Neuro-Psychopharmacology and Biological Psychiatry, vol. 40, pp. 347–352, 2013.
- [87].Shah B, Khunt D, Misra M, and Padh H, “Non-invasive intranasal delivery of quetiapine fumarate loaded microemulsion for brain targeting: Formulation, physicochemical and pharmacokinetic consideration,” European Journal of Pharmaceutical Sciences, 2016.
- [88].Bernstein H-G, Hildebrandt J, Dobrowolny H, Steiner J, Bogerts B, and Pahnke J, “Morphometric analysis of the cerebral expression of atp-binding cassette transporter protein abcb1 in chronic schizophrenia: Circumscribed deficits in the habenula,” Schizophrenia research, 2016.
- [89].De Luca V, Wang H, Squassina A, Wong GW, Yeomans J, and Kennedy JL, “Linkage of m5 muscarinic and α7-nicotinic receptor genes on 15q13 to schizophrenia,” Neuropsychobiology, vol. 50, no. 2, pp. 124–127, 2004.
- [90].Joshi D, Fullerton JM, and Weickert CS, “Elevated erbb4 mrna is related to interneuron deficit in prefrontal cortex in schizophrenia,” Journal of psychiatric research, vol. 53, pp. 125–132, 2014.
- [91].Liu L, Luo Y, Zhang G, Jin C, Zhou Z, Cheng Z, and Yuan G, “The mrna expression of drd2, pi3kcb, and akt1 in the blood of acute schizophrenia patients,” Psychiatry Research, vol. 243, pp. 397–402, 2016.
- [92].Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, and Ideker T, “Cytoscape: a software environment for integrated models of biomolecular interaction networks,” Genome research, vol. 13, no. 11, pp. 2498–2504, 2003.
- [93].Assenov Y, Ramírez F, Schelhorn S-E, Lengauer T, and Albrecht M, “Computing topological parameters of biological networks,” Bioinformatics, vol. 24, no. 2, pp. 282–284, 2008.
- [94].Borgatti SP and Everett MG, “A graph-theoretic perspective on centrality,” Social networks, vol. 28, no. 4, pp. 466–484, 2006.
- [95].Fukushima A, “Diffcorr: an r package to analyze and visualize differential correlations in biological networks,” Gene, vol. 518, no. 1, pp. 209–214, 2013.







