2017 Aug 1;7:6993. doi: 10.1038/s41598-017-07481-5

© The Author(s) 2017

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Overview of the Machine Learning-Assisted Network Inference (MALANI) algorithm for inferring a cancer network for a given cancer type. The overall MALANI procedure consists of three major stages.

(a) Stage 1: Construction of gene-wise models. Step 1.1: Split the gene expression data matrix for 10-fold cross-validation. The original gene expression matrix is represented as M_{G×S}, where G is the number of genes (rows in the data matrix) and S is the number of samples (columns in the data matrix). The gene expression matrix was split into 10 portions separately for cancer and normal samples. Step 1.2: Construct gene-wise dot product matrices. For the first training fold, gene-wise dot product matrices are constructed from the split data from Step 1.1. Step 1.3: Model training and testing. The gene-wise matrices generated in Step 1.2 were used for training, and the one remaining split portion was used for testing. This step was then repeated nine more times to complete the 10-fold cross-validation (see Materials and Methods for a more detailed description). Step 1.4: Evaluation of gene-wise model performance. The gene-wise dot product models were assessed for their capability to classify cancer vs. normal samples. The top 5% of gene-wise models by classification performance were selected; their genes constitute the gene set Θ.

(b) Stage 2: Construction of gene-pair models. Step 2.1: Construction of gene-pair vectors from the Θ set. Using data from Step 1.1, vectors were constructed that pair each gene in the set Θ obtained in Step 1.4 (denoted as red “g”) with the remaining genes in the expression array (denoted as black “g”). Step 2.2: Model training and testing. Each vector corresponding to a gene in set Θ was used to train SVM models for each training fold, and the 10-fold cross-validation procedure described in Step 1.3 was performed. Step 2.3: Evaluation of gene-pair model performance. Gene-pair model performance was assessed based on the model’s capability to classify cancer vs. normal samples. Genes not in the Θ set (black “g”) that were paired with a gene in the Θ set (red “g”) and ranked among the top 10 by classification performance were deemed functionally associated with that gene in set Θ.

(c) Stage 3: Inferring cancer networks. Step 3.1: Construction of the finalized gene-pair dot product matrix. Step 3.2: Assessment of gene-pair performance in classifying cancer samples. Five feature selection methods were used to assess the performance of each gene-pair dot product in classifying cancer vs. normal samples. Step 3.3: Construction of the MALANI-Inferred Network (MIN) for a cancer. Gene pairs selected by at least three different feature selection methods in Step 3.2 were then agglomerated to reconstruct an inferred cancer network.
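
The agreement rule in Step 3.3 can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `consensus_network`, the method labels, and the gene identifiers `gA` to `gF` are hypothetical placeholders rather than data from the article, and treating gene pairs as undirected edges is an assumption made here for simplicity.

```python
from collections import Counter, defaultdict


def consensus_network(selections, min_votes=3):
    """Keep gene pairs chosen by at least `min_votes` methods.

    `selections` maps a (hypothetical) feature selection method name to the
    collection of gene pairs that method selected. Returns an adjacency map
    {gene: set of partner genes}.
    """
    votes = Counter()
    for pairs in selections.values():
        # Assumption: pairs are undirected, so (a, b) and (b, a) count as
        # the same pair. The set also removes duplicates within one method.
        votes.update({frozenset(pair) for pair in pairs})

    network = defaultdict(set)
    for pair, n_methods in votes.items():
        # len(pair) == 2 skips degenerate self-pairs such as (a, a).
        if n_methods >= min_votes and len(pair) == 2:
            a, b = sorted(pair)
            network[a].add(b)
            network[b].add(a)
    return dict(network)


# Placeholder selections from five hypothetical methods (not real results).
selections = {
    "method_1": {("gA", "gB"), ("gC", "gD")},
    "method_2": {("gB", "gA"), ("gC", "gD")},
    "method_3": {("gA", "gB")},
    "method_4": {("gE", "gF")},
    "method_5": {("gC", "gD"), ("gE", "gF")},
}

network = consensus_network(selections, min_votes=3)
for gene in sorted(network):
    print(gene, sorted(network[gene]))
```

In this toy input the pairs gA/gB and gC/gD are each selected by three methods and become edges, while gE/gF is selected by only two and is dropped.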