Author manuscript; available in PMC: 2018 May 14.
Published in final edited form as: Med Image Comput Comput Assist Interv. 2017 Sep 4;10434:450–458. doi: 10.1007/978-3-319-66185-8_51

Multi-label Inductive Matrix Completion for Joint MGMT and IDH1 Status Prediction for Glioma Patients

Lei Chen 1,2, Han Zhang 2, Kim-Han Thung 2, Luyan Liu 3, Junfeng Lu 4,5, Jinsong Wu 4,5, Qian Wang 3, Dinggang Shen 2
PMCID: PMC5951635  NIHMSID: NIHMS939432  PMID: 29770368

Abstract

MGMT promoter methylation and IDH1 mutation in high-grade gliomas (HGG) have been shown to be two important molecular indicators associated with better prognosis. Traditionally, the MGMT and IDH1 statuses are obtained via surgical biopsy, which is laborious, invasive and time-consuming. Accurate presurgical prediction of these statuses from preoperative imaging data is therefore of great clinical value for treatment planning. In this paper, we propose a novel Multi-label Inductive Matrix Completion (MIMC) model, highlighted by an online inductive learning strategy, to jointly predict both MGMT and IDH1 statuses. Our MIMC model not only uses the training subjects with possibly missing MGMT/IDH1 labels, but also leverages the unlabeled testing subjects as a supplement to the limited training dataset. More importantly, we use inductive labels, instead of transductive labels, as the prediction results for the testing subjects, to alleviate overfitting in small-sample-size studies. Furthermore, we design an optimization algorithm with guaranteed convergence, based on the block coordinate descent method, to solve the multivariate non-smooth MIMC model. Finally, using a valuable single-center multi-modality presurgical brain imaging and genetic dataset of primary HGG, we demonstrate that our method produces accurate prediction results, outperforming widely-used single- and multi-task machine learning methods. This study shows the promise of using imaging-derived brain connectome phenotypes for non-invasive prognosis of HGG.

Keywords: High-grade gliomas, Molecular biomarker, Matrix completion

1 Introduction

Gliomas account for approximately 45% of primary brain tumors. The most lethal gliomas are classified by the World Health Organization (WHO) as grade III or IV and are referred to as high-grade gliomas (HGG). Previous studies have shown that O6-methylguanine-DNA methyltransferase promoter methylation (MGMT-m) and isocitrate dehydrogenase 1 mutation (IDH1-m) are two strong molecular indicators that may be associated with better prognosis (i.e., better response to treatment and longer survival) compared to their counterparts, i.e., MGMT promoter unmethylation (MGMT-u) and IDH1 wild type (IDH1-w) [1, 2]. Determining the MGMT and IDH1 statuses is becoming clinical routine, but it requires invasive biopsy, which limits its wider clinical use. For better treatment planning, non-invasive preoperative prediction of MGMT and IDH1 statuses is highly desirable.

A few studies have predicted MGMT or IDH1 status from preoperative neuroimages. For example, Korfiatis et al. extracted tumor texture features from a single MRI modality (T2-weighted) and trained a support vector machine (SVM) to predict MGMT status [1]. Yamashita et al. extracted both functional information (i.e., tumor blood flow) from perfusion MRI and structural features from T1-weighted MRI, and employed a nonparametric approach to predict IDH1 status [2]. Zhang et al. extracted a larger set of voxel- and histogram-based features from T1-weighted, T2-weighted, and diffusion-weighted images (DWI), and employed a random forest (RF) classifier to predict IDH1 status [3].

However, all these studies predict either MGMT or IDH1 status alone using single-task machine learning techniques, ignoring the potential relationship between these two molecular markers, which could help each other achieve more accurate predictions [4]. It is therefore desirable to use a multi-task learning approach to jointly predict the MGMT and IDH1 statuses. Meanwhile, in clinical practice, complete molecular pathological testing is not always conducted; in several cases only one biopsy-proven status (MGMT or IDH1) is available, which leads to incomplete training labels, i.e., a missing data problem. Traditional methods usually discard the subjects with incomplete labels, which further reduces the number of training samples. The recently proposed Multi-label Transductive Matrix Completion (MTMC) model is an important multi-task classification method that can make full use of samples with missing labels [5], and it has performed well in previous studies [5, 6]. However, MTMC is inherently prone to overfitting and thus generalizes poorly to studies with limited sample sizes, a problem that many phenotype-genotype studies inevitably face.

To address these limitations, we propose a novel Multi-label Inductive Matrix Completion (MIMC) model by introducing an online inductive learning strategy into the MTMC model. Solving MIMC is not trivial, since it contains both the non-smooth nuclear-norm and L21-norm penalties. We therefore design an optimization algorithm for the MIMC model based on the block coordinate descent method. Note that, in this paper, we do not adopt the commonly used radiomics information derived from T1- or T2-weighted structural MRI; instead, we use connectomics information derived from both resting-state functional MRI (RS-fMRI) and diffusion tensor imaging (DTI). The motivation is that structural MRI-based radiomics features are highly affected by tumor characteristics (e.g., location and size) and thus vary significantly across subjects, which is undesirable for both group studies and individual-based classification. In contrast, brain connectome features extracted from RS-fMRI and DTI reflect the inherent brain connectivity architecture and its alterations caused by the diffusely infiltrating HGG, and thus could be more consistent and reliable imaging biomarkers.

2 Materials, Preprocessing, and Feature Extraction

Our dataset includes 63 HGG patients who were recruited between 2010 and 2015. Each subject has at least one biopsy-proven MGMT or IDH1 status. We exclude the subjects without complete RS-fMRI or DTI data, or with significant imaging artifacts or excessive head motion, leaving 47 HGG subjects for this study. Their demographic and clinical information is summarized in Table 1. For simplicity, MGMT-m and IDH1-m are labeled as “positive”, and MGMT-u and IDH1-w as “negative”. This study has been approved by the ethics committee of the local hospital.

Table 1.

Demographic and clinical information of the subjects involved in this study.

Status                    MGMT   IDH1
Positive (labeled)        26     13
Negative (labeled)        20     33
Unlabeled                 1      1
Age, years (mean/range)   48.13/23–68
Gender (M/F)              26/21
WHO grade (III/IV)        23/24

In this study, all RS-fMRI and DTI data were acquired preoperatively with the following parameters. RS-fMRI: TR (repetition time) = 2 s, number of acquisitions = 240 (8 min), and voxel size = 3.4 × 3.4 × 4 mm³. DTI: 20 directions, voxel size = 2 × 2 × 2 mm³, and number of repeated acquisitions = 2. SPM8 and DPARSF [7] are used to preprocess the RS-fMRI data and construct brain functional networks. FSL and PANDA [8] are used to process the DTI data and construct brain structural networks. Multi-modality images are first co-registered within each subject and then registered to the atlas space. All processing follows the commonly accepted pipeline in [9]. Specifically, we parcellate each brain into 90 regions of interest (ROIs) using the Automated Anatomical Labeling (AAL) atlas. The parcellated ROIs of each subject are regarded as nodes in a graph, and the Pearson correlation coefficient between the blood oxygenation level dependent (BOLD) time series of each pair of ROIs is used as the functional connectivity strength of the corresponding edge. Similarly, the structural network is constructed from whole-brain DTI tractography, using the normalized number of tracked streamlines between each pair of AAL ROIs as the structural connectivity strength.
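As a minimal sketch of the functional-network step (not the DPARSF implementation), the snippet below computes a Pearson correlation matrix from ROI-averaged BOLD signals with NumPy. The random input, the function name `functional_connectivity`, and zeroing the diagonal (self-connections) are illustrative assumptions.

```python
import numpy as np


def functional_connectivity(bold: np.ndarray) -> np.ndarray:
    """Pearson functional connectivity between ROI time series.

    bold: array of shape (n_rois, n_timepoints), one ROI-averaged BOLD signal per row.
    Returns an (n_rois, n_rois) symmetric matrix of correlation coefficients.
    """
    fc = np.corrcoef(bold)      # np.corrcoef treats each row as one variable
    np.fill_diagonal(fc, 0.0)   # assumption: self-connections are not used as edges
    return fc


# Hypothetical input: 90 AAL ROIs x 240 volumes of random noise.
rng = np.random.default_rng(0)
bold = rng.standard_normal((90, 240))
fc = functional_connectivity(bold)
print(fc.shape)  # (90, 90)
```

Each edge weight is simply the correlation of two ROI time series, so any ROI ordering works as long as it is the same across subjects.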

After network construction, we use GRETNA [10] to extract various network properties based on graph theoretic analysis, including degree, shortest path length, clustering coefficient, global efficiency, local efficiency, and nodal centrality. These properties are computed for each node of each network and used as connectomics features. We also use 12 clinical features for each subject, such as age, gender, tumor size, tumor WHO grade, and tumor location. Therefore, each subject has 1092 features (6 metrics × 2 networks × 90 regions + 12 clinical features = 1080 + 12).
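The sketch below illustrates the feature layout that yields 1092 features per subject, together with one of the six metrics (a binary nodal clustering coefficient). It is not GRETNA's implementation: the thresholding rule, the function names, and the ordering of the concatenated blocks are hypothetical.

```python
import numpy as np

N_ROIS, N_METRICS, N_CLINICAL = 90, 6, 12


def binary_clustering(adj: np.ndarray, threshold: float) -> np.ndarray:
    """Nodal clustering coefficient of the graph obtained by thresholding adj.

    C_i = triangles through node i / (k_i * (k_i - 1) / 2); nodes with k_i < 2 get 0.
    """
    b = (adj > threshold).astype(float)
    np.fill_diagonal(b, 0.0)                 # no self-loops
    k = b.sum(axis=1)                        # node degree
    triangles = np.diag(b @ b @ b) / 2.0     # closed 3-walks from i count each triangle twice
    pairs = k * (k - 1) / 2.0
    return np.divide(triangles, pairs, out=np.zeros_like(triangles), where=pairs > 0)


def assemble_features(fc_metrics, sc_metrics, clinical) -> np.ndarray:
    """Concatenate nodal metrics of both networks and the clinical variables.

    fc_metrics, sc_metrics: arrays of shape (N_METRICS, N_ROIS); clinical: shape (N_CLINICAL,).
    """
    return np.concatenate([np.ravel(fc_metrics), np.ravel(sc_metrics), np.ravel(clinical)])


x = assemble_features(np.zeros((N_METRICS, N_ROIS)),
                      np.zeros((N_METRICS, N_ROIS)),
                      np.zeros(N_CLINICAL))
print(x.shape)  # (1092,)
```

In practice all six metrics would come from the network toolbox; only the concatenation order must be kept fixed across subjects.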

3 MIMC-Based MGMT and IDH1 Status Prediction

We first introduce the notations used in this paper. $X^{(i)}$ denotes the i-th column of matrix $X$, $x_{ij}$ denotes the element in the i-th row and j-th column of $X$, $\mathbf{1}$ denotes the all-one column vector, and $X^T$ denotes the transpose of $X$. $X_{train} = [x_1, \cdots, x_m]^T \in \mathbb{R}^{m\times d}$ and $X_{test} = [x_{m+1}, \cdots, x_{m+n}]^T \in \mathbb{R}^{n\times d}$ denote the feature matrices of the m training subjects and n testing subjects, respectively. Assume there are t binary classification tasks, and let $Y_{train} = [y_1, \cdots, y_m]^T \in \{-1, 1, ?\}^{m\times t}$ and $Y_{test} = [y_{m+1}, \cdots, y_{m+n}]^T \in \{?\}^{n\times t}$ denote the label matrices of the training and testing subjects, where ‘?’ denotes an unknown label. For convenience, let $X_{obs} = [X_{train}; X_{test}]$, $Y_{obs} = [Y_{train}; Y_{test}]$ and $Z_{obs} = [X_{obs}, \mathbf{1}, Y_{obs}]$ denote the observed feature matrix, label matrix, and stacked matrix, respectively, where ‘;’ stacks matrices vertically and ‘,’ concatenates them horizontally. Let $X_0 \in \mathbb{R}^{(m+n)\times d}$ denote the underlying noise-free feature matrix corresponding to $X_{obs}$, let $Y_0 \in \mathbb{R}^{(m+n)\times t}$ denote the underlying soft label matrix, and let $\mathrm{sign}(Y_0)$ denote the underlying label matrix corresponding to $Y_{obs}$, where $\mathrm{sign}(\cdot)$ is the element-wise sign function.
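To make the stacked layout concrete, the following hypothetical helper builds $Z_{obs} = [X_{obs}, \mathbf{1}, Y_{obs}]$ together with a boolean mask of the observed label entries (the set $\Omega_Y$ used in Eq. (1)). Encoding ‘?’ as `np.nan` is an assumption of this sketch; note that NumPy's 0-based column index d corresponds to the (d+1)-th column.

```python
import numpy as np


def build_stacked_matrix(X_train, Y_train, X_test):
    """Return Z_obs = [X_obs, 1, Y_obs] and the boolean mask of observed labels.

    Unknown labels ('?') are encoded as np.nan (an assumption of this sketch).
    """
    X_obs = np.vstack([X_train, X_test])                          # (m+n) x d
    n_test, t = X_test.shape[0], Y_train.shape[1]
    Y_obs = np.vstack([Y_train, np.full((n_test, t), np.nan)])    # (m+n) x t
    ones = np.ones((X_obs.shape[0], 1))                           # the all-one column
    Z_obs = np.hstack([X_obs, ones, Y_obs])                       # (m+n) x (d+1+t)
    omega_y = ~np.isnan(Y_obs)                                    # observed label entries
    return Z_obs, omega_y


# Hypothetical toy data: m = 3 training subjects, n = 2 testing subjects, d = 4, t = 2.
X_train = np.arange(12, dtype=float).reshape(3, 4)
Y_train = np.array([[1.0, -1.0], [np.nan, 1.0], [-1.0, np.nan]])
X_test = np.zeros((2, 4))
Z_obs, omega_y = build_stacked_matrix(X_train, Y_train, X_test)
print(Z_obs.shape, int(omega_y.sum()))  # (5, 7) 4
```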

3.1 Multi-label Transductive Matrix Completion (MTMC)

MTMC is a well-known multi-label matrix completion model built on two assumptions. First, a linear relationship is assumed between $X_0$ and $Y_0$, i.e., $Y_0 = [X_0, \mathbf{1}]W$, where $W \in \mathbb{R}^{(d+1)\times t}$ is the implicit weight matrix. Second, $X_0$ is assumed to be low-rank. Let $Z_0 = [X_0, \mathbf{1}, Y_0]$ denote the underlying stacked matrix corresponding to $Z_{obs}$. Since every column of $Y_0$ is a linear combination of the columns of $[X_0, \mathbf{1}]$, we have $\mathrm{rank}(Z_0) = \mathrm{rank}([X_0, \mathbf{1}]) \le \mathrm{rank}(X_0) + 1$, so $Z_0$ is also low-rank. The goal of MTMC is to estimate $Z_0$ given $Z_{obs}$. In practice, $Z_{obs}$ is contaminated by noise, and MTMC is formulated as:

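$$\min_{Z:\,Z^{(d+1)}=\mathbf{1}}\; \mu\|Z\|_* + \frac{1}{2}\|Z_{\Delta X}-X_{obs}\|_F^2 + \gamma\sum_{(i,j)\in\Omega_Y}\mathcal{C}_y\big(z_{i(d+1+j)},\, y^{obs}_{ij}\big) \qquad (1)$$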

where $Z = [Z_{\Delta X}, Z^{(d+1)}, Z_{\Delta Y}]$ denotes the matrix to be optimized, $Z_{\Delta X}$ denotes the noise-free feature submatrix, $Z_{\Delta Y}$ denotes the soft label submatrix, $\Omega_Y$ denotes the index set of the observed entries in $Y_{obs}$, $\|\cdot\|_*$ denotes the nuclear norm, $\|\cdot\|_F$ denotes the Frobenius norm, and $\mathcal{C}_y(\cdot,\cdot)$ denotes the logistic loss function, i.e., $\mathcal{C}_y(u, y) = \log(1 + \exp(-yu))$. Once the optimal $Z^{opt}$ is found, the labels $Y_{test}$ of the testing subjects can be estimated by $\mathrm{sign}(Z^{opt}_{\Delta Y_{test}})$, where $Z^{opt}_{\Delta Y_{test}}$ denotes the optimal soft labels of the testing subjects. From the formulation of MTMC, $Z^{opt}_{\Delta Y_{test}}$ is implicitly given by $Z^{opt}_{\Delta Y_{test}} = [X^{opt}_{test}, \mathbf{1}]W^{opt}$, where $X^{opt}_{test}$ is the optimal noise-free counterpart of $X_{test}$ and $W^{opt}$ is the optimal estimate of $W$. Although $W^{opt}$ is never computed explicitly, it is implicitly determined only by the training subjects and their known labels (i.e., through the third term of Eq. (1)). Therefore, for multi-label classification tasks with insufficient training subjects, as in our case, MTMC remains prone to overfitting.
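The objective in Eq. (1) can be evaluated directly, which is handy for checking a solver. A minimal sketch, assuming unknown labels are stored as `np.nan` and masked by a boolean array `omega_y` (playing the role of $\Omega_Y$) and taking $\mathcal{C}_y(u, y) = \log(1 + e^{-yu})$; any additional weighting of the loss terms used in [5, 6] is not reproduced, and the function names are hypothetical.

```python
import numpy as np


def logistic_loss(u, y):
    """C_y(u, y) = log(1 + exp(-y * u)), evaluated stably with logaddexp."""
    return np.logaddexp(0.0, -y * u)


def mtmc_objective(Z, X_obs, Y_obs, omega_y, mu, gamma):
    """Value of Eq. (1) for a candidate Z = [Z_dX, 1, Z_dY]."""
    d = X_obs.shape[1]
    if not np.allclose(Z[:, d], 1.0):
        raise ValueError("column d+1 of Z must be the all-one vector")
    Z_dx, Z_dy = Z[:, :d], Z[:, d + 1:]
    nuclear = np.linalg.norm(Z, "nuc")                        # ||Z||_*
    fit = 0.5 * np.linalg.norm(Z_dx - X_obs, "fro") ** 2      # feature denoising term
    label_loss = logistic_loss(Z_dy[omega_y], Y_obs[omega_y]).sum()  # observed labels only
    return mu * nuclear + fit + gamma * label_loss


# Hypothetical toy problem: 3 subjects, d = 2 features, t = 1 task, one unknown label.
X_obs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Y_obs = np.array([[1.0], [-1.0], [np.nan]])
omega_y = ~np.isnan(Y_obs)
Z = np.hstack([X_obs, np.ones((3, 1)), np.zeros((3, 1))])   # soft labels start at 0
print(mtmc_objective(Z, X_obs, Y_obs, omega_y, mu=0.1, gamma=1.0))
```

Note that only the training rows with known labels enter the loss term; the test rows influence the objective solely through the nuclear norm and the feature-fit term.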

3.2 Multi-label Inductive Matrix Completion (MIMC)

To alleviate overfitting, we modify the MTMC model with an online inductive learning strategy and name the modified model the Multi-label Inductive Matrix Completion (MIMC) model. Specifically, we introduce an explicit predictor matrix $W \in \mathbb{R}^{(d+1)\times t}$ into MTMC by adding the following terms to Eq. (1):

$$\min_W\; \lambda\|W\|_{2,1} + \frac{\beta}{2}\big\|Z_{\Delta Y} - [X_{obs},\mathbf{1}]\,W\big\|_F^2 \qquad (2)$$

where $\|\cdot\|_{2,1}$ denotes the L21-norm, i.e., the sum of the Euclidean norms of the rows of W. It imposes row sparsity on W, so that all related classification tasks share a common set of selected discriminative features. Note also that the second term of Eq. (2) uses all the subjects (including the testing subjects) to learn the sparse predictor matrix W from the transductive soft labels $Z_{\Delta Y}$. In other words, we leverage the testing subjects as an effective supplement to the limited training subjects, alleviating the small-sample-size issue that often causes the classifier to overfit. The final MIMC model is given as:

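$$\min_{Z,W:\;Z^{(d+1)}=\mathbf{1}} \Big\{\mu\|Z\|_* + \frac{1}{2}\|Z_{\Delta X}-X_{obs}\|_F^2 + \gamma\sum_{(i,j)\in\Omega_Y}\mathcal{C}_y\big(z_{i(d+1+j)},\, y^{obs}_{ij}\big) + \lambda\|W\|_{2,1} + \frac{\beta}{2}\big\|Z_{\Delta Y}-[X_{obs},\mathbf{1}]\,W\big\|_F^2\Big\} \qquad (3)$$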
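For monitoring convergence, the MIMC objective in Eq. (3) can be evaluated as the MTMC terms plus the two inductive terms of Eq. (2). A minimal sketch, assuming the same `np.nan`/mask encoding of unknown labels and the logistic loss $\log(1+e^{-yu})$; the function names are hypothetical.

```python
import numpy as np


def l21_norm(W):
    """||W||_{2,1}: sum of the Euclidean norms of the rows of W."""
    return np.linalg.norm(W, axis=1).sum()


def mimc_objective(Z, W, X_obs, Y_obs, omega_y, mu, gamma, lam, beta):
    """Value of Eq. (3): the MTMC terms of Eq. (1) plus the two terms of Eq. (2)."""
    d = X_obs.shape[1]
    A = np.hstack([X_obs, np.ones((X_obs.shape[0], 1))])     # [X_obs, 1]
    Z_dx, Z_dy = Z[:, :d], Z[:, d + 1:]
    mtmc_terms = (mu * np.linalg.norm(Z, "nuc")
                  + 0.5 * np.linalg.norm(Z_dx - X_obs, "fro") ** 2
                  + gamma * np.logaddexp(0.0, -Y_obs[omega_y] * Z_dy[omega_y]).sum())
    inductive_terms = (lam * l21_norm(W)
                       + 0.5 * beta * np.linalg.norm(Z_dy - A @ W, "fro") ** 2)
    return mtmc_terms + inductive_terms


# Row sparsity: only the first row of W is non-zero, so only feature 1 is "selected".
W = np.array([[3.0, 4.0], [0.0, 0.0], [0.0, 0.0]])
print(l21_norm(W))  # 5.0
```

Because the L21-norm sums row norms, zeroing a whole row removes that feature from every task at once, which is what couples the MGMT and IDH1 predictors.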

In this way, we obtain the optimal sparse predictor matrix $W^{opt}$ using the optimization algorithm proposed in Sect. 3.3 below, and estimate the labels $Y_{test}$ of the testing subjects $X_{test}$ by induction:

$$Y_{test} = \mathrm{sign}\big([X_{test}, \mathbf{1}]\,W^{opt}\big) \qquad (4)$$

Compared with the overfitting-prone transductive labels $\mathrm{sign}(Z^{opt}_{\Delta Y_{test}})$, the inductive labels in Eq. (4) are learned from more subjects (including the testing subjects) and benefit from joint feature selection via the L21-norm; they should therefore give more robust predictions and suffer less from the small-sample-size issue.
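The difference between the two prediction rules is small in code but matters for generalization. In the sketch below (hypothetical helper names), inductive labels come from the explicit predictor $W^{opt}$ as in Eq. (4), whereas transductive labels are read off the soft-label block of the completed matrix, with the test subjects assumed to occupy the last rows.

```python
import numpy as np


def predict_inductive(X_test, W_opt):
    """Eq. (4): labels from the explicit predictor, sign([X_test, 1] W_opt)."""
    A_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
    return np.sign(A_test @ W_opt)


def predict_transductive(Z_opt, d, m):
    """MTMC-style labels: sign of the soft-label block of the test rows of Z_opt."""
    return np.sign(Z_opt[m:, d + 1:])


# Hypothetical predictor for d = 2 features and t = 1 task (last row is the bias).
W_opt = np.array([[1.0], [0.0], [-0.5]])
X_test = np.array([[1.0, 2.0], [-1.0, 0.0]])
print(predict_inductive(X_test, W_opt))  # labels +1 and -1
```

`np.sign` returns 0 for a score of exactly 0, so a real pipeline would need a tie-breaking rule.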

3.3 Optimization Algorithm for MIMC

Solving MIMC is not trivial: Eq. (3) contains the all-one-column constraint $Z^{(d+1)} = \mathbf{1}$, and both the L21-norm and the nuclear norm are non-smooth penalties. Here, we employ the block coordinate descent method to design an optimization algorithm for MIMC, which iteratively optimizes the following two subproblems:

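$$Z^k = \arg\min_{Z:\,Z^{(d+1)}=\mathbf{1}} \Big\{\frac{1}{2}\|Z_{\Delta X}-X_{obs}\|_F^2 + \gamma\sum_{(i,j)\in\Omega_Y}\mathcal{C}_y\big(z_{i(d+1+j)},\, y^{obs}_{ij}\big) + \mu\|Z\|_* + \frac{\beta}{2}\big\|Z_{\Delta Y}-[X_{obs},\mathbf{1}]\,W^{k-1}\big\|_F^2\Big\} \qquad (5)$$
$$W^k = \arg\min_W\; \lambda\|W\|_{2,1} + \frac{\beta}{2}\big\|(Z_{\Delta Y})^k - [X_{obs},\mathbf{1}]\,W\big\|_F^2 \qquad (6)$$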

We solve Subproblem 1 in Eq. (5) using the Fixed Point Continuation (FPC) method combined with a projection step, whose convergence was proven by Cabral et al. [6]. Specifically, each iteration t consists of two steps:

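$$\begin{cases} (Z^k)_t = \mathcal{D}_{\mu\tau_Z}\big((Z^k)_{t-1} - \tau_Z\nabla G((Z^k)_{t-1})\big), \\ \big((Z^k)_t\big)^{(d+1)} = \mathbf{1}, \end{cases} \qquad (7)$$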

where $\tau_Z = \min\{1, 4/\sqrt{32\beta^2 + 2\gamma^2}\}$ denotes the gradient step size, $\mathcal{D}_{\mu\tau_Z}(\cdot)$ denotes the proximal operator of the nuclear norm [6], and $\nabla G(Z)$ is the gradient of $G(Z)$:

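$$G(Z) = \frac{1}{2}\|Z_{\Delta X}-X_{obs}\|_F^2 + \frac{\beta}{2}\big\|Z_{\Delta Y}-[X_{obs},\mathbf{1}]\,W^{k-1}\big\|_F^2 + \gamma\sum_{(i,j)\in\Omega_Y}\mathcal{C}_y\big(z_{i(d+1+j)},\, y^{obs}_{ij}\big) \qquad (8)$$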
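The two steps of Eq. (7) and the gradient of Eq. (8) can be sketched in NumPy as below. This is a minimal illustration, not the authors' code: unknown labels are stored as `np.nan` and masked by `omega_y`, the logistic loss is taken as $\log(1+e^{-yu})$, the nuclear-norm proximal operator is implemented as singular value soft-thresholding, and the step size `tau_z` is passed in by the caller rather than computed.

```python
import numpy as np


def svt(M, threshold):
    """Proximal operator of threshold * ||.||_*: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - threshold, 0.0)) @ Vt


def grad_G(Z, X_obs, Y_obs, omega_y, W_prev, beta, gamma):
    """Gradient of G(Z) in Eq. (8), with W fixed at W^{k-1}."""
    d = X_obs.shape[1]
    A = np.hstack([X_obs, np.ones((X_obs.shape[0], 1))])      # [X_obs, 1]
    grad = np.zeros_like(Z)                                    # all-one column: zero gradient
    grad[:, :d] = Z[:, :d] - X_obs                             # feature-fit term
    Z_dy = Z[:, d + 1:]
    y = np.where(omega_y, Y_obs, 0.0)                          # unknown labels -> 0
    # d/du log(1 + exp(-y u)) = -y / (1 + exp(y u)), written stably via logaddexp
    dloss = np.where(omega_y, -y * np.exp(-np.logaddexp(0.0, y * Z_dy)), 0.0)
    grad[:, d + 1:] = beta * (Z_dy - A @ W_prev) + gamma * dloss
    return grad


def fpc_step(Z_prev, X_obs, Y_obs, omega_y, W_prev, mu, beta, gamma, tau_z):
    """One iteration of Eq. (7): gradient step, shrinkage, then reset column d+1 to ones."""
    d = X_obs.shape[1]
    G = grad_G(Z_prev, X_obs, Y_obs, omega_y, W_prev, beta, gamma)
    Z = svt(Z_prev - tau_z * G, mu * tau_z)
    Z[:, d] = 1.0            # 0-based column d is the all-one column Z^(d+1)
    return Z


# Hypothetical toy problem: 2 subjects, d = 1 feature, t = 1 task, one unknown label.
X_obs = np.array([[1.0], [2.0]])
Y_obs = np.array([[1.0], [np.nan]])
omega_y = ~np.isnan(Y_obs)
Z = np.array([[0.0, 1.0, 0.0], [2.0, 1.0, 0.0]])
g = grad_G(Z, X_obs, Y_obs, omega_y, np.zeros((2, 1)), beta=0.0, gamma=1.0)
Z_next = fpc_step(Z, X_obs, Y_obs, omega_y, np.zeros((2, 1)), mu=0.1, beta=1.0, gamma=1.0, tau_z=0.5)
print(g)
```

The projection simply overwrites the all-one column after the shrinkage step, mirroring the second line of Eq. (7).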

Subproblem 2 in Eq. (6) is a standard L21-norm regularized problem, which can be solved via the accelerated Nesterov's method with the convergence proof in [11]. Specifically, each iteration t includes the following step:

$$(W^k)_t = \mathcal{J}_{\lambda\tau_W}\big((W^k)_{t-1} - \tau_W\nabla F((W^k)_{t-1})\big) \qquad (9)$$

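where $\tau_W = 1/\sigma_{\max}\big(\beta[X_{obs},\mathbf{1}]^T[X_{obs},\mathbf{1}]\big)$ denotes the gradient step size, $\sigma_{\max}(\cdot)$ denotes the maximal singular value of a matrix, $\mathcal{J}_{\lambda\tau_W}(\cdot)$ denotes the proximal operator of the L21-norm [11], and $\nabla F(W)$ is the gradient of $F(W)$: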

$$F(W) = \frac{\beta}{2}\big\|(Z_{\Delta Y})^k - [X_{obs},\mathbf{1}]\,W\big\|_F^2 \qquad (10)$$
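Subproblem 2 can be sketched as proximal-gradient iterations, where the proximal operator of the L21-norm shrinks the Euclidean norm of each row of W and zeroes rows whose norm falls below the threshold. For brevity, this sketch uses plain (non-accelerated) proximal-gradient steps rather than the accelerated Nesterov scheme of [11]; the function names and the fixed iteration count are hypothetical.

```python
import numpy as np


def prox_l21(V, threshold):
    """Proximal operator of threshold * ||.||_{2,1}: shrink each row's Euclidean norm."""
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    scale = np.maximum(1.0 - threshold / np.maximum(norms, 1e-12), 0.0)
    return scale * V


def update_W(Z_dy, X_obs, W_init, lam, beta, n_iter=500):
    """Approximately solve Eq. (6) with proximal-gradient steps of the form of Eq. (9).

    Plain (non-accelerated) iterations are used for brevity; beta must be > 0.
    """
    A = np.hstack([X_obs, np.ones((X_obs.shape[0], 1))])      # [X_obs, 1]
    tau_w = 1.0 / np.linalg.norm(beta * A.T @ A, 2)            # 1 / sigma_max(beta A^T A)
    W = W_init.copy()
    for _ in range(n_iter):
        grad_F = -beta * A.T @ (Z_dy - A @ W)                  # gradient of Eq. (10)
        W = prox_l21(W - tau_w * grad_F, lam * tau_w)
    return W


# Hypothetical check: with lam = 0 the iterations recover a least-squares fit.
X_obs = np.array([[0.0], [1.0], [2.0], [3.0]])
W_true = np.array([[2.0], [-1.0]])                              # slope and bias
Z_dy = np.hstack([X_obs, np.ones((4, 1))]) @ W_true
W_hat = update_W(Z_dy, X_obs, np.zeros((2, 1)), lam=0.0, beta=1.0)
print(np.round(W_hat, 4))   # close to W_true when lam = 0
```

Swapping in Nesterov acceleration changes only the point at which the gradient is evaluated; the proximal operator stays the same.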

Theoretically, for a jointly convex problem whose non-smooth terms are separable across blocks, Tseng [12] has shown that the block coordinate descent method converges to a global optimum, as long as each subproblem is solvable. In our MIMC model, the objective function in Eq. (3) is jointly convex in Z and W (every term is convex, and the coupling term is a squared Frobenius norm of an expression that is affine in (Z, W)), and its non-smooth parts, $\mu\|Z\|_*$ and $\lambda\|W\|_{2,1}$, are separable, since each depends on only one block. Therefore, our optimization algorithm also has provable convergence.
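Putting the pieces together, a compact (and unoptimized) sketch of the alternating scheme might look as follows. It runs a fixed number of inner iterations per subproblem instead of checking convergence, uses non-accelerated steps for W, and uses the simple step size $\tau_Z = 1/\max(1, \beta + \gamma/4)$ (from the Lipschitz constant of $\nabla G$, since the logistic loss has curvature at most 1/4) in place of the paper's step size. All of these choices, the default parameter values, and the function names are assumptions of the sketch.

```python
import numpy as np


def svt(M, thr):
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - thr, 0.0)) @ Vt                 # prox of thr * ||.||_*


def prox_l21(V, thr):
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    return np.maximum(1.0 - thr / np.maximum(norms, 1e-12), 0.0) * V   # prox of thr * ||.||_{2,1}


def mimc_fit(X_obs, Y_obs, mu=0.1, gamma=1.0, lam=0.1, beta=1.0, n_outer=20, n_inner=50):
    """Block coordinate descent for Eq. (3), alternating Eq. (5) and Eq. (6).

    Y_obs holds +1/-1 for known labels and np.nan for unknown ones; beta must be > 0.
    """
    n, d = X_obs.shape
    A = np.hstack([X_obs, np.ones((n, 1))])                    # [X_obs, 1]
    omega = ~np.isnan(Y_obs)
    y = np.where(omega, Y_obs, 0.0)
    Z = np.hstack([X_obs, np.ones((n, 1)), y])                 # unknown soft labels start at 0
    W = np.zeros((d + 1, Y_obs.shape[1]))
    tau_z = min(1.0, 1.0 / (beta + gamma / 4.0))               # assumption: 1 / Lipschitz bound of grad G
    tau_w = 1.0 / (beta * np.linalg.norm(A.T @ A, 2))          # step size from Eq. (9)
    for _ in range(n_outer):
        for _ in range(n_inner):                               # Subproblem 1, W fixed
            Z_dy = Z[:, d + 1:]
            grad = np.zeros_like(Z)
            grad[:, :d] = Z[:, :d] - X_obs
            dloss = np.where(omega, -y * np.exp(-np.logaddexp(0.0, y * Z_dy)), 0.0)
            grad[:, d + 1:] = beta * (Z_dy - A @ W) + gamma * dloss
            Z = svt(Z - tau_z * grad, mu * tau_z)
            Z[:, d] = 1.0                                      # projection step of Eq. (7)
        Z_dy = Z[:, d + 1:]
        for _ in range(n_inner):                               # Subproblem 2, Z fixed
            W = prox_l21(W + tau_w * beta * A.T @ (Z_dy - A @ W), lam * tau_w)
    return Z, W


# Hypothetical toy data: 12 subjects, 5 features, 2 tasks, two missing training labels,
# and the last 3 subjects treated as unlabeled testing subjects.
rng = np.random.default_rng(0)
X_obs = rng.standard_normal((12, 5))
Y_obs = np.sign(X_obs[:, :2] + 0.1)
Y_obs[[1, 4], [0, 1]] = np.nan
Y_obs[9:] = np.nan
Z_opt, W_opt = mimc_fit(X_obs, Y_obs)
A_test = np.hstack([X_obs[9:], np.ones((3, 1))])
print(np.sign(A_test @ W_opt))   # inductive labels of Eq. (4)
```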

4 Results and Discussions

We evaluate the proposed MIMC by jointly predicting MGMT and IDH1 statuses on our HGG dataset. Given the limited number of subjects (47), we use 10-fold cross-validation to obtain a relatively unbiased estimate of prediction performance on new testing subjects. We compare MIMC with widely-used single-task machine learning methods (SVM with RBF kernel [13] and RF [14]) and state-of-the-art multi-task machine learning methods (Lest_L21 [11] and MTMC [5]). All parameters of these methods are optimized by nested 10-fold cross-validation.
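The evaluation protocol can be sketched with a simple index generator for repeated random 10-fold splits. Whether the original folds were stratified by label is not stated, so plain random folds here are an assumption, as are the function name and seed. For the matrix completion models, the test-fold features (but not their labels) would enter $X_{obs}$; for nested cross-validation, the same generator could be applied within `train_idx` to tune parameters.

```python
import numpy as np


def repeated_kfold(n_samples, n_folds=10, n_repeats=20, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated random k-fold CV."""
    rng = np.random.default_rng(seed)
    for r in range(n_repeats):
        perm = rng.permutation(n_samples)
        for f, test_idx in enumerate(np.array_split(perm, n_folds)):
            train_idx = np.setdiff1d(perm, test_idx)          # sorted remaining indices
            yield r, f, train_idx, np.sort(test_idx)


splits = list(repeated_kfold(47))
print(len(splits))                                  # 200 = 20 repeats x 10 folds
print(sorted({len(s[3]) for s in splits}))          # [4, 5]: 47 subjects over 10 folds
```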

We measure prediction performance in terms of accuracy (ACC), sensitivity (SEN), specificity (SPE), and area under the receiver operating characteristic curve (AUC). To avoid any bias introduced by randomly partitioning the dataset, the 10-fold cross-validation is independently repeated 20 times. The average results for MGMT and IDH1 status prediction are reported in Tables 2 and 3, respectively. The best results, and those not significantly worse than the best at the 95% confidence level, are highlighted in bold. MIMC outperforms SVM, RF and MTMC in all performance metrics, and outperforms Lest_L21 in all metrics except specificity for MGMT status prediction, where Lest_L21 is slightly but not significantly better (70.75% vs. 70.00%). This indicates that the proposed online inductive learning strategy helps improve prediction performance. In addition, in terms of ACC, all multi-task methods outperform the single-task RF, whereas the existing multi-task methods (Lest_L21 and MTMC) do not consistently outperform the single-task SVM. We speculate that this is mainly due to the kernel trick of SVM, which implicitly performs a nonlinear feature mapping. In future work, we will extend the MIMC model to a nonlinear version by employing the kernel trick to further improve MGMT and IDH1 status prediction.
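The four metrics can be computed from real-valued prediction scores as in the sketch below (hypothetical helper). AUC uses the Mann-Whitney formulation, i.e., the probability that a random positive subject scores higher than a random negative one, with ties counted as one half. Whether the reported AUCs were computed per fold or on pooled predictions is not stated, so this is only an illustration.

```python
import numpy as np


def binary_metrics(y_true, scores):
    """Return (ACC, SEN, SPE, AUC) for labels in {+1, -1} and real-valued scores."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    y_pred = np.where(scores > 0, 1, -1)          # assumption: a score of exactly 0 -> negative
    pos, neg = y_true == 1, y_true == -1
    acc = np.mean(y_pred == y_true)
    sen = np.mean(y_pred[pos] == 1)                # true positive rate
    spe = np.mean(y_pred[neg] == -1)               # true negative rate
    # AUC as the Mann-Whitney statistic: P(score_pos > score_neg) + 0.5 * P(tie)
    diff = scores[pos][:, None] - scores[neg][None, :]
    auc = np.mean(diff > 0) + 0.5 * np.mean(diff == 0)
    return acc, sen, spe, auc


print(binary_metrics([1, 1, -1, -1], [0.9, -0.2, -0.5, 0.3]))
# values: ACC = SEN = SPE = 0.5, AUC = 0.75
```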

Table 2.

Performance comparison of different methods for MGMT status prediction.

Type          Method     ACC (%)  AUC (%)  SEN (%)  SPE (%)
Single-task   SVM        68.04    72.43    71.35    63.75
Single-task   RF         65.54    70.62    62.50    69.50
Multi-task    Lest_L21   67.28    72.26    64.62    70.75
Multi-task    MTMC       68.26    72.58    70.77    65.00
Multi-task    MIMC       71.74    77.21    73.08    70.00

Table 3.

Performance comparison of different methods for IDH1 status prediction.

Type          Method     ACC (%)  AUC (%)  SEN (%)  SPE (%)
Single-task   SVM        75.54    82.47    61.92    80.91
Single-task   RF         72.50    77.94    68.46    74.09
Multi-task    Lest_L21   75.57    79.98    66.92    77.58
Multi-task    MTMC       75.33    82.14    78.46    74.09
Multi-task    MIMC       83.26    88.47    84.62    82.73

5 Conclusion

In this paper, we address the task of predicting MGMT and IDH1 statuses for HGG patients. Considering the strong correlation between MGMT promoter methylation and IDH1 mutation, we formulate their prediction as a Multi-label Inductive Matrix Completion (MIMC) model and design an optimization algorithm with provable convergence to solve it. Experimental results demonstrate the advantages of the proposed MIMC model over widely-used single- and multi-task classifiers. Also, for the first time, we show the feasibility of predicting molecular biomarkers from preoperative multi-modality neuroimaging and connectomics analysis.

Acknowledgments

This work was supported in part by NIH grants (EB006733, EB008374, MH100217, MH108914, AG041721, AG049371, AG042599, AG053867, EB022880), Natural Science Foundation of Jiangsu Province (BK20161516, BK20151511), China Postdoctoral Science Foundation (2015M581794), Natural Science Research Project of Jiangsu University (15KJB520027), and Postdoctoral Science Foundation of Jiangsu Province (1501023C).

References

1. Korfiatis P, Kline T, et al. MRI texture features as biomarkers to predict MGMT methylation status in glioblastomas. Med Phys. 2016;43(6):2835–2844. doi: 10.1118/1.4948668.
2. Yamashita K, Hiwatashi A, et al. MR imaging-based analysis of glioblastoma multiforme: estimation of IDH1 mutation status. AJNR Am J Neuroradiol. 2016;37(1):58–65. doi: 10.3174/ajnr.A4491.
3. Zhang B, Chang K, et al. Multimodal MRI features predict isocitrate dehydrogenase genotype in high-grade gliomas. Neuro Oncol. 2017;19(1):109–117. doi: 10.1093/neuonc/now121.
4. Noushmehr H, Weisenberger D, et al. Identification of a CpG island methylator phenotype that defines a distinct subgroup of glioma. Cancer Cell. 2010;17(5):510–522. doi: 10.1016/j.ccr.2010.03.017.
5. Goldberg A, Zhu X, et al. Transduction with matrix completion: three birds with one stone. Proceedings of NIPS. 2010:757–765.
6. Cabral R, et al. Matrix completion for weakly-supervised multi-label image classification. IEEE Trans Pattern Anal Mach Intell. 2015;37(1):121–135. doi: 10.1109/TPAMI.2014.2343234.
7. Yan C, Zang Y. DPARSF: a MATLAB toolbox for “pipeline” data analysis of resting-state fMRI. Front Syst Neurosci. 2010;4:13. doi: 10.3389/fnsys.2010.00013.
8. Cui Z, Zhong S, et al. PANDA: a pipeline toolbox for analyzing brain diffusion images. Front Hum Neurosci. 2013;7:42. doi: 10.3389/fnhum.2013.00042.
9. Liu L, Zhang H, Rekik I, Chen X, Wang Q, Shen D. Outcome prediction for patient with high-grade gliomas from brain functional and structural networks. In: Ourselin S, Joskowicz L, Sabuncu MR, Unal G, Wells W, editors. MICCAI 2016, LNCS. Vol. 9901. Springer, Cham; 2016. pp. 26–34.
10. Wang J, Wang X, et al. GRETNA: a graph theoretical network analysis toolbox for imaging connectomics. Front Hum Neurosci. 2015;9:386. doi: 10.3389/fnhum.2015.00386.
11. Liu J, Ji S, Ye J. Multi-task feature learning via efficient L2,1-norm minimization. Proceedings of UAI. 2009:339–348.
12. Tseng P. Convergence of a block coordinate descent method for non-differentiable minimization. J Optim Theory Appl. 2001;109(3):475–494.
13. Cortes C, Vapnik V. Support-vector networks. Mach Learn. 1995;20(3):273–297.
14. Breiman L. Random forests. Mach Learn. 2001;45(1):5–32.
