Abstract
Feature dimensionality reduction plays an important role in radiomic studies with a large number of features. However, conventional radiomic approaches may suffer from noise, and feature dimensionality reduction techniques are not equipped to utilize latent supervision information of the patient data under study, such as differences between patients, to learn discriminative low dimensional representations. To achieve robustness to noise and feature dimensionality reduction with improved discriminative power, we develop a robust collaborative clustering method to simultaneously cluster patients and radiomic features into distinct groups under adaptive sparse regularization. Our method is built upon matrix tri-factorization enhanced by adaptive sparsity regularization for simultaneous feature dimensionality reduction and denoising. Particularly, latent grouping information of patients with distinct radiomic features is learned and utilized as supervision information to guide the feature dimensionality reduction, and noise in radiomic features is adaptively isolated in a Bayesian framework under a general assumption of Laplacian distributions of transform-domain coefficients. Experiments on synthetic data have demonstrated the effectiveness of the proposed approach in data clustering, and evaluation results on an FDG-PET/CT dataset of rectal cancer patients have demonstrated that the proposed method outperforms alternative methods in terms of both patient stratification and prediction of patient clinical outcomes.
Keywords: Sparsity, collaborative clustering, unsupervised learning, nonnegative matrix tri-factorization, radiomics
I. Introduction
IN THE United States, rectal cancer is one of the most common causes of cancer deaths. Although surgery and chemoradiation therapy (CRT) have been widely used to treat rectal cancer, most patients achieve varying degrees of partial response and have a high risk of developing local recurrences and distant metastases [1]. Early prediction of treatment response would enable individualized patient management and treatment planning.
Radiomic approaches have obtained promising performance for rectal cancer staging and prognosis [2]-[6]. The typical components of radiomic approaches include radiomic feature extraction, feature selection, feature dimension reduction, and prediction modeling [7]-[15]. Particularly, feature selection and dimensionality reduction are often used to identify a compact set of informative radiomic features for building robust prediction models in radiomic studies where a small number of samples are characterized by high dimensional features.
Typical feature selection methods identify informative features in a supervised learning framework [16]-[18], and therefore they may overfit the data, especially when the sample size is small. In contrast, feature dimensionality reduction techniques, such as principal component analysis (PCA), learn low dimensional feature representations in an unsupervised setting [19]. However, conventional unsupervised learning approaches do not consider latent discriminative information of the data, such as differences in radiomic features of patients with different clinical outcomes. Such differences may provide weak supervision information for the feature dimensionality reduction. Particularly, a recent study has demonstrated that capturing differences of feature patterns between sub-clusters of samples could improve both patient stratification and dimensionality reduction by simultaneously clustering patients and their high dimensional radiomic features, compared with conventional feature dimensionality reduction and clustering techniques [20], [21]. Nevertheless, noise in radiomic features remains underexplored in radiomic studies, although noise is inevitably introduced during image acquisition and tumor segmentation. It is nontrivial to learn robust and informative feature representations from noisy radiomic features. Therefore, it is desired to suppress the influence of noise while keeping the fidelity of radiomic features. The simultaneous clustering of patients and their radiomic features could be achieved by non-negative matrix tri-factorization methods [22]-[25]. However, these methods do not directly handle feature noise.
In this paper, we propose to integrate unsupervised collaborative clustering and data denoising into a single framework, aiming to build a robust and efficient radiomic prediction system. Specifically, a robust collaborative clustering method based on non-negative matrix tri-factorization is developed to learn discriminative low dimensional features, aiming to utilize latent supervision information derived by simultaneously grouping patients with distinct radiomic features in a sparse learning framework. The noise in radiomic features is suppressed by applying adaptive sparsity regularization to the collaborative clustering of patients and radiomic features in a Bayesian framework, which facilitates robust feature dimensionality reduction. The method has been validated on simulated data and a dataset of pretreatment PET/CT scans of rectal cancer patients, and compared with conventional techniques in terms of both patient stratification and prognosis of clinical outcomes.
II. Related Machine Learning Methods
Clustering is one of the most widely-used unsupervised machine learning methods in various applications [26]-[32]. K-means [26] and k-medoids [27] are two of the best known clustering techniques that divide data points into non-overlapping clusters, while hierarchical clustering methods [28] group data points into different clusters with a hierarchical structure. In recent years, subspace clustering methods [29]-[31] have been proposed to cluster data points in their underlying subspaces, while deep embedded clustering [32] learns clustering memberships through deep neural networks. Although these approaches have achieved success in a variety of applications, they are not tailored to radiomic studies where a limited number of patients are characterized by a very large number of features. Moreover, they neglect latent discriminative information of the data that may provide weak supervision for the feature dimensionality reduction. Such information might be captured by differences in features of patients with heterogeneous clinical profiles, including different clinical outcomes.
Nonnegative matrix tri-factorization (NMTF) technique [22]-[25] provides an effective tool to capture latent differences of samples and reduce feature dimensionality by simultaneously grouping samples and features into subgroups. Different from the popular nonnegative matrix factorization methods [33], the NMTF approach [22] decomposes a nonnegative matrix into three matrices, two of which contain clustering information regarding the row vectors and the column vectors of the original matrix respectively. The method has been improved by employing graph regularization [23], symmetric regularization [24], and manifold regularization [25] for specific applications. However, none of them is equipped to handle noisy features that are ubiquitous in radiomic studies due to complicated feature generation procedures, including both imaging and tumor segmentation.
To overcome limitations of existing feature dimensionality reduction methods, this study introduces band-wise adaptive sparsity regularization to enhance NMTF for improved collaborative clustering of radiomic features. Our method has the following advantages: 1) Heterogeneity of patients with distinct outcomes is utilized as latent supervision information to learn a low-dimensional discriminative data representation; 2) An adaptive sparsity regularization is adopted to eliminate noise while learning the low-dimensional discriminative data representation; and 3) A unified unsupervised learning framework is used to learn the low-dimensional discriminative data representation and eliminate noise simultaneously.
III. Material and Methods
A. Collaborative Clustering With Adaptive Sparsity Regularization for Patient Stratification and Prognosis
This study focuses on the feature dimensionality reduction to enhance discriminative power of radiomic features as illustrated by Fig. 1. Particularly, we develop a sparsity regularized matrix tri-factorization method for achieving robust collaborative clustering of patients and radiomic features. The method is built upon matrix tri-factorization [22] to achieve collaborative clustering (CC), enhanced by sparsity regularization for simultaneous feature dimensionality reduction and denoising. Specifically, differences between patients with distinct radiomic features are utilized as weak supervision information to obtain discriminative low dimensional features via simultaneously clustering the radiomic features and the samples, which is achieved by orthogonal nonnegative matrix tri-factorization. Meanwhile, adaptive sparsity regularization is used to suppress the influence of noise while keeping informative features, so that improved low-dimensional representation of features as well as patient stratification can be obtained.
Fig. 1.
Illustration of the proposed robust collaborative clustering for radiomic analysis of rectal cancer patients.
Given radiomic features of all the patients under study in a matrix $Y \in \mathbb{R}^{p \times f}$, where p is the number of patients and f denotes the number of features, the regularized tri-factorization procedure decomposes matrix Y into three matrices Φ, X, and Θ via minimizing both the data fitting error and sparsity penalty:
$$\min_{\Phi, X, \Theta} \|Y - \Phi X \Theta\|_F^2 + \|\Lambda \cdot X\|_1, \quad \text{s.t.}\;\; \Phi^T\Phi = I_{k_P},\; \Theta\Theta^T = I_{k_F},\; \Phi \ge 0,\; \Theta \ge 0 \tag{1}$$
where $I_{k_P}$ and $I_{k_F}$ are identity matrices, $\Phi \in \mathbb{R}_{+}^{p \times k_P}$ encodes a mapping between the patients and $k_P$ patient clusters, $\Theta \in \mathbb{R}_{+}^{k_F \times f}$ encodes a mapping between the radiomic features and $k_F$ feature clusters, $X \in \mathbb{R}^{k_P \times k_F}$ reflects the magnitude of the mappings and the interactions between Φ and Θ, “·” is element-wise multiplication, and $\Lambda \in \mathbb{R}_{+}^{k_P \times k_F}$ is a matrix that contains coefficient-adaptive regularization parameters controlling the relative contribution of the two terms.
Given the noisy observation Y, Eq. (1) aims to find a sparse representation X that is free from noise. In this objective function, $\|Y - \Phi X \Theta\|_F^2$ is the data fidelity term, and $\|\Lambda \cdot X\|_1$ is an ℓ1-norm sparsity regularization term, i.e.,
$$\|\Lambda \cdot X\|_1 = \sum_{i,j} \lambda_{ij} |x_{ij}| \tag{2}$$
where λij is a regularization parameter assigned to xij ∈ X. Particularly, Φ and Θ can be seen as a pair of orthonormal dictionaries used to decompose the feature matrix, and X contains the corresponding coefficients.
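To make the interplay of the two terms concrete, a minimal NumPy sketch of the objective in Eq. (1) is given below; the function and variable names are ours for illustration and do not reflect the implementation used in the experiments.

```python
import numpy as np

def rcc_objective(Y, Phi, X, Theta, Lam):
    """Evaluate the regularized tri-factorization objective of Eq. (1).

    Y     : (p, f)   radiomic feature matrix (patients x features)
    Phi   : (p, kP)  patient-cluster mapping
    X     : (kP, kF) coefficient matrix
    Theta : (kF, f)  feature-cluster mapping
    Lam   : (kP, kF) coefficient-adaptive regularization parameters
    """
    fidelity = np.linalg.norm(Y - Phi @ X @ Theta, "fro") ** 2  # data-fitting error
    sparsity = np.sum(Lam * np.abs(X))                          # ||Lambda . X||_1 of Eq. (2)
    return fidelity + sparsity
```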
Once Φ, X, and Θ are obtained, it is straightforward to compute a low-dimensional representation of the radiomic features, referred to as meta-features, $M \in \mathbb{R}^{p \times k_F}$:
$$M = Y\Theta^T \tag{3}$$
The meta-features in M are weighted combinations of the radiomic features in the same feature clusters. The meta features could be used in prediction modeling to predict clinical outcomes. The patient grouping results could be used to stratify patients into groups with distinct radiomic features.
Since many radiomic features are correlated with each other, especially when a large number of features are extracted in both the original image domain and wavelet transform domains [34], the coefficients in X are relatively sparse and discriminative information is contained in only a few of them. In contrast, since noise is randomly introduced to radiomic features and roughly uncorrelated, it is approximately uniformly distributed among different coefficients and cannot be sparsely represented by the learned dictionaries. Hence the noise can be isolated from the feature components [35], [36]. As a result, the sparsity regularization helps suppress the noise and thus facilitates the learning of more effective dictionaries that could be used to simultaneously cluster the subjects and radiomic features.
B. A Robust Collaborative Clustering Algorithm
A divide-and-conquer strategy is adopted to solve the optimization problem of Eq. (1) by decomposing it into 3 sub-problems, i.e., preliminary dictionary learning, adaptive soft-thresholding, and collaborative clustering. We first learn a pair of dictionaries to represent the noisy radiomic features so that the features could be represented by corresponding coefficients of the learned dictionaries, then we apply adaptive soft-thresholding to the coefficients to isolate noise from the radiomic features, and finally perform collaborative clustering based on the denoised radiomic features. Each of these sub-problems can be tackled efficiently. Similar alternating direction methods have been used in many existing studies [37]-[39]. Since the proposed collaborative clustering method is robust to noise, we dub it robust collaborative clustering (RCC).
1). Preliminary Dictionary Learning:
Ideally, the pair of dictionaries Φ and Θ would be learned by applying tri-factorization to the noise-free feature matrix, which is unavailable in practice. As an alternative, we learn a preliminary pair of dictionaries Φ and Θ from the noisy feature matrix Y by optimizing
$$\min_{\Phi, X, \Theta} \|Y - \Phi X \Theta\|_F^2, \quad \text{s.t.}\;\; \Phi^T\Phi = I_{k_P},\; \Theta\Theta^T = I_{k_F},\; \Phi \ge 0,\; \Theta \ge 0 \tag{4}$$
via an alternating optimization method [22].
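For reference, a minimal NumPy sketch of multiplicative updates in the style of Ding et al. [22] is shown below, written in the notation of this paper; it is an illustrative rendering under the stated orthogonality and nonnegativity constraints (and assumes a nonnegative Y), not the code used in our experiments.

```python
import numpy as np

def onmtf(Y, kP, kF, n_iter=200, eps=1e-10, seed=0):
    """Approximate Y ~ Phi @ X @ Theta (Eq. (4)) with nonnegative factors and
    (approximately) orthogonal cluster indicators, via multiplicative updates
    in the style of Ding et al. [22]. Y is assumed to be nonnegative."""
    rng = np.random.default_rng(seed)
    p, f = Y.shape
    Phi = rng.random((p, kP))   # patient-cluster indicators
    G = rng.random((f, kF))     # feature-cluster indicators; Theta = G.T
    X = rng.random((kP, kF))    # block values / coefficients
    for _ in range(n_iter):
        Phi *= np.sqrt((Y @ G @ X.T) / (Phi @ Phi.T @ Y @ G @ X.T + eps))
        G   *= np.sqrt((Y.T @ Phi @ X) / (G @ G.T @ Y.T @ Phi @ X + eps))
        X   *= np.sqrt((Phi.T @ Y @ G) / (Phi.T @ Phi @ X @ G.T @ G + eps))
    return Phi, X, G.T          # Theta has shape (kF, f)
```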
To solve the optimization problem of Eq. (4), we need to decide the hyper-parameters $k_F$ and $k_P$. The optimal number of feature clusters $k_F$ can be chosen based on the gap criterion [40], which is formulated as:
$$\text{Gap}(k_F) = E^{*}\left[\log W_{k_F}\right] - \log W_{k_F} \tag{5}$$
where $E^{*}[\log W_{k_F}]$ is an expectation estimated using Monte Carlo sampling from a reference distribution (typically a uniform distribution), and $W_{k_F}$ is a measure of pooled within-cluster dispersion:
$$W_{k_F} = \sum_{r=1}^{k_F} \frac{1}{2 n_r} D_r \tag{6}$$
where r is an index of feature clusters, $n_r$ is the number of radiomic features in cluster r, and $D_r$ is the sum of the pairwise distances between all radiomic feature samples in cluster r. We use the correlation between the feature samples as the distance metric for computing $D_r$. The number of patient clusters $k_P$ could be empirically chosen to stratify the patients into high-, intermediate-, or low-risk groups.
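As a concrete illustration of this model selection step, the sketch below computes the gap value of Eq. (5)-(6) for a candidate number of clusters using the correlation distance mentioned above, with the feature vectors as rows of F (i.e., the transpose of Y); the use of k-means and of a uniform reference distribution are simplifying choices made for this sketch, not necessarily those of the actual pipeline.

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.cluster import KMeans

def pooled_dispersion(F, labels):
    """W_k of Eq. (6): sum over clusters r of D_r / (2 n_r), where D_r sums the
    pairwise correlation distances (over ordered pairs) within cluster r."""
    W = 0.0
    for r in np.unique(labels):
        members = F[labels == r]
        if len(members) > 1:
            D_r = 2.0 * pdist(members, metric="correlation").sum()
            W += D_r / (2.0 * len(members))
    return W

def gap_value(F, k, n_ref=20, seed=0):
    """Gap(k) of Eq. (5): E*[log W_k] - log W_k, with E* estimated by Monte Carlo
    sampling from a uniform reference distribution over the range of F."""
    rng = np.random.default_rng(seed)
    labels = KMeans(k, n_init=10, random_state=seed).fit_predict(F)
    log_W = np.log(pooled_dispersion(F, labels))
    lo, hi = F.min(axis=0), F.max(axis=0)
    ref = []
    for _ in range(n_ref):
        R = rng.uniform(lo, hi, size=F.shape)
        ref_labels = KMeans(k, n_init=10, random_state=seed).fit_predict(R)
        ref.append(np.log(pooled_dispersion(R, ref_labels)))
    return np.mean(ref) - log_W
```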
2). Adaptive Soft-Thresholding in Bayesian Framework:
With Φ and Θ fixed, the optimization problem of Eq. (1) reduces to
$$\min_{X} \|Y - \Phi X \Theta\|_F^2 + \|\Lambda \cdot X\|_1 \tag{7}$$
Due to the non-differentiability of the ℓ1-norm based regularization, a straightforward solution to Eq. (7) is not available. Since Φ and Θ are orthonormal, i.e., $\Phi^T\Phi = I$ and $\Theta\Theta^T = I$, we have
$$\|Y - \Phi X \Theta\|_F^2 = \|\Phi^T Y \Theta^T - X\|_F^2 \tag{8}$$
Letting $V = \Phi^T Y \Theta^T$ be the coefficient matrix of Y decomposed by $\Phi^T$ and $\Theta^T$, we can transform Eq. (7) into
$$\min_{X} \|V - X\|_F^2 + \|\Lambda \cdot X\|_1 \tag{9}$$
A closed form solution to Eq. (9) is a simple soft-thresholding operation [41]:
$$\hat{X} = \operatorname{sign}(V) \cdot \max\left(|V| - \frac{\Lambda}{2},\; 0\right) \tag{10}$$
where “·” is an element-wise multiplication operator and the maximum is taken element-wise. Since there is a one-to-one correspondence between the entries of Λ and the coefficients in X, such a soft-thresholding operation leads to an adaptive denoising of the coefficients.
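The closed-form update of Eq. (10) is a one-line operation; the NumPy sketch below makes its element-wise nature explicit (the threshold Λ/2 follows from the squared Frobenius fidelity term).

```python
import numpy as np

def adaptive_soft_threshold(V, Lam):
    """Element-wise soft-thresholding of Eq. (10), the minimizer of Eq. (9).

    V   : coefficient matrix  V = Phi.T @ Y @ Theta.T
    Lam : matrix of per-coefficient regularization parameters (same shape as V)
    """
    return np.sign(V) * np.maximum(np.abs(V) - Lam / 2.0, 0.0)
```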
We pursue the optimal parameters $\lambda_{ij} \in \Lambda$ in a Bayesian framework. From the Bayesian point of view, Eq. (7) is derived from maximum a posteriori (MAP) estimation, formulated as:
$$\hat{X} = \arg\max_{X} P(X \mid Y) = \arg\max_{X} P(Y \mid X)\, P(X) \tag{11}$$
or equivalently as
$$\hat{X} = \arg\max_{X} \left\{ \log P(Y \mid X) + \log P(X) \right\} \tag{12}$$
where log P(Y∣X) is the log-likelihood of X, and log P(X) reflects prior knowledge of the coefficients. We use Gaussian white noise to approximate the errors and distortions [39], [42] introduced in the observed radiomic features Y, and assume that the coefficients in X follow Laplace distributions, which are robust to outliers, i.e.,
$$P(Y \mid X) \propto \exp\left(-\frac{\|Y - \Phi X \Theta\|_F^2}{2\sigma_n^2}\right) \tag{13}$$
$$P(X) \propto \exp\left(-\sqrt{2}\,\left\| \frac{X}{\Sigma} \right\|_1\right) \tag{14}$$
where $\sigma_n^2$ is the variance of the noise, $\Sigma$ contains the standard deviations $\sigma_{ij}$ of the corresponding coefficients in X, and the division in Eq. (14) is element-wise. It is worth noting that such a formulation allows the coefficients in X to have different variances so as to better model the radiomic features than an assumption of independent and identical distributions, because the statistics of the coefficients can be significantly different, which is analogous to the difference between high-frequency and low-frequency coefficients in classic orthonormal transforms [43].
Then, Eq. (9) evolves into
$$\min_{X} \|V - X\|_F^2 + \sum_{i,j} \frac{2\sqrt{2}\,\sigma_n^2}{\sigma_{ij}} |x_{ij}| \tag{15}$$
Comparing Eq. (7) and Eq. (15), we can see that each parameter in Λ is calculated as
$$\lambda_{ij} = \frac{2\sqrt{2}\,\sigma_n^2}{\sigma_{ij}} \tag{16}$$
where σij is the standard deviation of xij ∈ X.
It is noteworthy that the strength of sparsity regularization is proportional to the noise level and inversely proportional to the uncertainty of the feature coefficients. Specifically, Eq. (10) and Eq. (16) indicate that the strength of the soft-thresholding is inversely proportional to the standard deviation of the coefficient under consideration. This is because coefficients with larger variance contain more information, which should be preserved rather than shrunk. In contrast, coefficients with small variance typically contain little information beyond noise, and hence should be shrunk more to remove the noise. Furthermore, the equations also indicate that the regularization strength is proportional to the noise variance, because heavier noise requires stronger shrinkage for denoising.
Another problem is that the standard deviations of the noise-free coefficients in Σ are unknown because the radiomic features are noisy. Since the additive noise can be considered to be evenly distributed over the subspaces spanned by the pair of dictionaries Φ and Θ, we can estimate the variances of coefficients via the following element-wise calculation:
$$\sigma_{ij}^2 = \max\left(v_{ij}^2 - c\,\sigma_n^2,\; 0\right) \tag{17}$$
where c is a constant, $v_{ij} \in V$, and the noise variance $\sigma_n^2$ can be estimated from the observations by existing techniques (see Sec. IV-B).
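Combining Eq. (16) and Eq. (17), the regularization matrix Λ can be filled in element-wise as in the sketch below; the flooring of the estimated variances at a small positive value (to avoid division by zero) and the default value of c are our own choices for illustration.

```python
import numpy as np

def adaptive_regularization(V, sigma_n2, c=1.0, eps=1e-12):
    """Estimate coefficient variances via Eq. (17) and the per-coefficient
    regularization parameters of Eq. (16).

    V        : coefficient matrix Phi.T @ Y @ Theta.T of the noisy features
    sigma_n2 : estimated noise variance sigma_n^2
    c        : constant of Eq. (17)
    """
    sigma2 = np.maximum(V * V - c * sigma_n2, eps)           # Eq. (17), floored at eps
    Lam = 2.0 * np.sqrt(2.0) * sigma_n2 / np.sqrt(sigma2)    # Eq. (16)
    return Lam, sigma2
```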
3). Final Collaborative Clustering:
After the soft-thresholding operations are applied to the coefficients of radiomic features, the denoised version of Y can be computed as
$$\hat{Y} = \Phi \hat{X} \Theta \tag{18}$$
Then, more effective collaborative clustering can be performed by solving the optimization problem of Eq. (4) with $\hat{Y}$ in place of Y, and better meta-features M can be obtained via Eq. (3).
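The three sub-problems chain together as in the following pipeline sketch, which reuses the onmtf, adaptive_regularization and adaptive_soft_threshold functions sketched above; the hard cluster assignments by argmax, the clipping of the denoised features at zero, and the use of the denoised features in Eq. (3) are simplifications we make for illustration.

```python
import numpy as np

def robust_collaborative_clustering(Y, kP, kF, sigma_n2, c=1.0):
    """End-to-end sketch of the RCC algorithm of Sec. III-B."""
    # 1) preliminary dictionary learning on the noisy features (Eq. (4))
    Phi, X, Theta = onmtf(Y, kP, kF)
    # 2) adaptive soft-thresholding of the transform-domain coefficients
    V = Phi.T @ Y @ Theta.T                           # coefficients of Y under (Phi, Theta)
    Lam, _ = adaptive_regularization(V, sigma_n2, c)  # Eq. (16)-(17)
    X_hat = adaptive_soft_threshold(V, Lam)           # Eq. (10)
    Y_hat = np.maximum(Phi @ X_hat @ Theta, 0.0)      # denoised features (Eq. (18)), clipped at 0
    # 3) final collaborative clustering on the denoised features
    Phi, X, Theta = onmtf(Y_hat, kP, kF)
    patient_groups = Phi.argmax(axis=1)               # hard patient-cluster labels
    feature_groups = Theta.argmax(axis=0)             # hard feature-cluster labels
    M = Y_hat @ Theta.T                               # meta-features, Eq. (3)
    return patient_groups, feature_groups, M
```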
IV. Evaluations and Results
A. Evaluation Based on Synthetic Data
Before applying the proposed approach to real-world clinical data, we tested it on synthetic data for clustering to provide a straightforward view of the improvement over the conventional collaborative clustering. In order to better visualize the clustering results, we obtained a synthetic 2-dimensional (2D) data set with 500 vectors distributed in 5 Gaussian clusters from [44] to evaluate the clustering performance of the methods under comparison. Each dimension of the 2D data points was treated as one feature. In order to determine the number of clusters of the synthetic 2D data set, we adopted the gap criterion [40].
We further evaluated the RCC based on high-dimensional synthetic data with relatively heavy noise. We replicated and stacked the 2D vectors into high-dimensional samples and added Gaussian white noise to them before performing clustering. Specifically, denoting the original synthetic data by $S = [s_x, s_y] \in \mathbb{R}^{500 \times 2}$, where all the elements of $s_x$ were in the interval [4.12, 5.76] and those of $s_y$ in [2.72, 4.41], we obtained an extended matrix $S_E$ with high-dimensional samples by horizontally stacking S, i.e., $S_E = [S, S, \ldots, S] \in \mathbb{R}^{500 \times 2k}$, and then added noise to get the noisy input synthetic data $S_N = S_E + Z$, where Z is additive Gaussian white noise with variance $\sigma_n^2$. In this study we set k = 66.
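The construction of the noisy high-dimensional synthetic data can be written compactly as below; since the actual 2D point set comes from [44], the uniform sampling used here is only a stand-in with the same value ranges.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the 500 x 2 point set of [44] (same value ranges as stated above).
S = rng.uniform(low=[4.12, 2.72], high=[5.76, 4.41], size=(500, 2))
k, sigma_n = 66, 0.3
S_E = np.tile(S, (1, k))                              # 500 x 132 extended samples
S_N = S_E + sigma_n * rng.standard_normal(S_E.shape)  # noisy input S_N = S_E + Z
```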
Table I summarizes the gap criterion values calculated for different numbers of clusters. Evidently, the gap criterion value reached its maximum when cluster number was 5, which coincided with the ground truth. Then, we applied CC and RCC to the synthetic data set to obtain 5 clusters of data points and 2 clusters of features.
TABLE I.
Gap Criterion Values Obtained From Input Synthetic Data
| Number of Clusters | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|
| Gap Criterion Values | 0.10 | 0.12 | 0.50 | 1.56 | 1.46 | 1.36 | 1.28 | 1.21 | 1.19 |
As explained in Section III-A, the dictionary Φ contained the membership information of the samples, thus individual atoms (i.e., rows in Φ) encoded the cluster memberships of the data points. In order to visualize differences among the atoms, we projected the 5D vectors onto the 2D plane by multiplying a projection matrix to the right, as shown in Fig. 2-b. Atoms associated with the same cluster (ground-truth) are labeled with the same color, which is in accordance with the color labeling of the ground-truth clustering in Fig. 2-a. Obviously, the atoms associated with the cluster in cyan and those associated with the cluster in black tended to mix with each other for the CC, while they were well separated by the RCC. Such an observation was consistent with the final clustering results. We labeled the data points in the same cluster with the same color, and matched the color labeling with that of the ground-truth clustering in Fig. 2-a based on majority matching, which was achieved by assigning the color label of a cluster in the ground truth to the cluster obtained by CC or RCC with maximum overlap. As shown in Fig. 2-c, some elements of the cluster in black were wrongly assigned to the cluster in cyan by the CC, while the clustering results obtained by the RCC were largely consistent with the ground truth.
Fig. 2.
Top row: 2D synthetic data points from 5 distinct clusters (left) and corresponding ground-truth clustering labeled by colors (right); Middle row: 2D projection of atoms in Φ. Points from the same cluster (ground-truth) are shown in the same color; Bottom row: Clustering results of the conventional CC and the proposed RCC on synthetic data. Points in the same color belongs to the same cluster obtained by the clustering algorithm.
We tested clustering performance of CC and RCC on SN with σn ranging from 0.03 to 3 to obtain 5 clusters of the samples and 2 clusters of the features. Fig. 3 shows that, although for both the CC and RCC methods the clustering performance degraded with the increasing noise in the data, the RCC obtained better performance than the CC when σn was set to 0.03 and 0.3. Fig. 4 shows the corresponding clustering results, indicating that both the CC and RCC methods failed to obtain correct clustering results when σn was set to 3. However, the RCC obtained better clustering results than the CC when the noise was moderate, namely σn was set to 0.03 or 0.3.
Fig. 3.
2D projection of atoms generated by the clustering of high-dimensional synthetic data with different levels of noise using the CC and RCC methods. Points in the same color belong to the same ground-truth cluster.
Fig. 4.
Clustering results of the CC and RCC methods on noisy high dimensional synthetic data. Points in the same color belong to the same cluster obtained by the clustering algorithms.
The observation is supported by the quantitative clustering evaluation results summarized in Table II. Particularly, we adopted the adjusted rand index [45] to evaluate the performance of sample clustering. The adjusted rand index is a corrected-for-chance version of the Rand index, calculated from a contingency table that summarizes the overlap between the two clusterings under consideration [45]. Table II summarizes the adjusted rand index between the obtained sample clustering and the ground-truth clustering at each noise level. These results demonstrated that the RCC performed better than the CC when the data were contaminated by moderate noise (σn < 2). For data with heavy noise, both methods failed to obtain reliable clustering results (adjusted rand index < 0.5).
TABLE II.
Adjusted Rand Index of Noisy Sample Clustering At Different Noise Levels
| σn | CC | RCC |
|---|---|---|
| 0.03 | 0.8462 | 0.9705 |
| 0.3 | 0.6919 | 0.9123 |
| 3 | 0.0276 | 0.0221 |
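For reference, the adjusted rand index used in Table II is readily available in scikit-learn; the toy example below (unrelated to the actual experiments) shows how two partitions of the same samples are compared, yielding 1 for identical partitions and values near 0 for chance-level agreement.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score

labels_true = np.array([0, 0, 1, 1, 2, 2])   # ground-truth partition
labels_pred = np.array([0, 0, 1, 2, 2, 2])   # predicted partition with one mis-assignment
print(adjusted_rand_score(labels_true, labels_pred))
```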
B. Evaluation for Patient Stratification and Survival Prediction in a Radiomic Study of Rectal Cancer
The proposed method was further evaluated in terms of patient stratification and survival prediction in a radiomic study of rectal cancer based on a dataset of 83 patients treated by CRT for locally advanced rectal cancer. All the patients had pre-treatment FDG-PET/CT scans, and 8 patients died within a median follow-up of 3 years. The tumors were manually contoured by professional radiologists. Standardized uptake value (SUV) maps were computed from the PET scans.
We computed both non-texture features and texture features from the CT scans and SUV maps respectively using a radiomic method that is capable of extracting multimodal imaging features [34]. Particularly, the non-texture features contained volumetric metrics, including volume, size, solidity and eccentricity of the tumor region, and SUV metrics, including SUVmax, SUVpeak, SUVmean, AUC-CSH, and percent inactive for the SUV maps. The texture features contained global features extracted from the intensity histogram of the tumor region, and matrix-based features, including GLCM (gray-level co-occurrence matrix), GLRLM (gray-level run-length matrix), GLSZM (gray-level size zone matrix), and NGTDM (neighborhood gray-tone difference matrix) based features. Detailed definitions of all these features are provided in [34].
For the texture features, different extraction parameters were adopted, including wavelet band-pass filtering (weight ratios of band-pass sub-bands to low- and high-frequency sub-bands in the wavelet domain were set to 1/2, 2/3, 1, 3/2, and 2 respectively) and quantization of gray levels (numbers of gray levels were set to 16, 32, and 64 respectively). Moreover, different gray level quantization algorithms, including uniform and equal-probability strategies, were adopted for the computation of the matrix-based texture features.
The number of CT radiomic features was 1249, and the number of PET radiomic features was 1254. All these features were pooled together for the patient stratification and prediction modeling.
The number of feature clusters kF was set to be 11 according to correlation based gap criterion [40], and the number of patient clusters kP was empirically set to 3, aiming to stratify the patients into a low-risk group, a medium-risk group and a high-risk group.
To estimate the noise variance in the input feature data, we measured the reproducibility of clustering under different estimates η of the noise variance $\sigma_n^2$. Particularly, for each estimate η, we randomly selected 80% of the patients N times independently to obtain clustering results, and summed up the adjusted rand indexes [45] between each pair of the resulting feature clusterings (N(N−1)/2 pairs of clustering results in total) to obtain the total rand index:
$$\text{TRI}(\eta) = \sum_{i=1}^{N} \sum_{j=i+1}^{N} R\big(c_i(\eta),\, c_j(\eta)\big) \tag{19}$$
where $c_i(\eta)$ is the feature-cluster membership vector obtained from the i-th subsample with the noise variance set to η, and R(·) is the function for calculating the adjusted rand index [45]. Then the η with the highest total rand index was chosen as the estimate of the noise variance:
$$\hat{\sigma}_n^2 = \arg\max_{\eta} \text{TRI}(\eta) \tag{20}$$
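A sketch of this reproducibility-based selection of the noise variance is given below; it reuses the robust_collaborative_clustering function sketched in Sec. III-B, and the candidate grid, subsample fraction and number of subsamples are placeholders.

```python
import numpy as np
from itertools import combinations
from sklearn.metrics import adjusted_rand_score

def estimate_noise_variance(Y, kP, kF, etas, N=10, frac=0.8, seed=0):
    """Select sigma_n^2 by maximizing the total rand index of Eq. (19)-(20)."""
    rng = np.random.default_rng(seed)
    p = Y.shape[0]
    tri = []
    for eta in etas:
        clusterings = []
        for _ in range(N):  # N independent random subsets of 80% of the patients
            idx = rng.choice(p, size=int(frac * p), replace=False)
            _, feature_labels, _ = robust_collaborative_clustering(Y[idx], kP, kF, eta)
            clusterings.append(feature_labels)
        tri.append(sum(adjusted_rand_score(a, b)
                       for a, b in combinations(clusterings, 2)))  # Eq. (19)
    return etas[int(np.argmax(tri))]                               # Eq. (20)

# Example candidate grid of powers of two, in line with Fig. 5:
# sigma_n2_hat = estimate_noise_variance(Y, kP=3, kF=11,
#                                        etas=[2.0 ** e for e in range(-12, -5)])
```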
As shown in Fig. 5, when the feature matrix was normalized, the optimal estimate of $\sigma_n^2$ was around $2^{-9}$.
Fig. 5.
Total rand index with different η for noise estimation.
With the patients grouped and the meta-features calculated, we can stratify the patients and predict their survival. This study used Kaplan-Meier estimation [46] to generate a survival function for each group of patients, and the differences between the patient groups were measured by the log-rank test [47]. To predict the risk of mortality for each patient, we built prediction models on the meta-features using three survival modeling methods, including Cox proportional hazard regression (Cox) [48], Cox with LASSO (CoxL) [49], and random survival forests (RSF) [50]. We trained and tested the prediction models with the same 5-fold cross-validation, and used the concordance index (C-index) [51] to examine the effectiveness of the models.
We repeated the cross-validation procedure 100 times and report the average performance scores. The R packages survival, glmnet and randomForestSRC were employed to build the prediction models. The sparsity parameter in CoxL was decided by a nested 3-fold cross-validation, and 500 decision trees were used in the RSF model with the minimum leaf size of trees set to 5.
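The R-based modeling pipeline above has a close Python analogue; the sketch below uses the lifelines package as a stand-in for the R survival package and shows one repetition of the 5-fold cross-validated C-index computation for a Cox model built on meta-features (the column names and the small ridge penalizer are our own choices).

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index
from sklearn.model_selection import KFold

def cv_cindex(M, time, event, n_splits=5, seed=0):
    """5-fold cross-validated C-index of a Cox model built on meta-features M."""
    df = pd.DataFrame(M, columns=[f"m{i}" for i in range(M.shape[1])])
    df["time"], df["event"] = time, event
    scores = []
    for train_idx, test_idx in KFold(n_splits, shuffle=True, random_state=seed).split(df):
        cph = CoxPHFitter(penalizer=0.1)   # small ridge penalty for numerical stability
        cph.fit(df.iloc[train_idx], duration_col="time", event_col="event")
        risk = cph.predict_partial_hazard(df.iloc[test_idx])
        # higher predicted risk should correspond to shorter survival, hence the negation
        scores.append(concordance_index(df["time"].iloc[test_idx], -risk,
                                        df["event"].iloc[test_idx]))
    return float(np.mean(scores))
```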
It is intuitive that noise would affect the data-driven dictionary learning, and data denoising is expected to improve the efficiency of this process. Since dictionary learning in this study was attained by iterative matrix tri-factorization [22], we checked the empirical convergence of this iterative procedure. Particularly, we obtained the residual of approximation in Eq. (4) by calculating the relative difference between the input matrix Y and the corresponding decomposition results:
$$r_k = \frac{\|Y - \Phi_k X_k \Theta_k\|_F}{\|Y\|_F} \tag{21}$$
where $\Phi_k X_k \Theta_k$ is the factorization result at the k-th iteration.
Fig. 6 shows plots of the residual before adaptive soft-thresholding (AST, see Sec. III-B) and after AST. The former showed empirical convergence of the conventional CC, while the latter depicted the convergence of the proposed RCC. Obviously, the residual generated after AST decayed much faster and was closer to zero, showing that the AST operation remarkably improved the convergence.
Fig. 6.
Comparison of empirical convergence before and after adaptive soft-thresholding (AST).
Fig. 7 shows the clustering result of 3 patient groups and 11 feature groups obtained by the RCC method, as well as SUV maps and CT images of representative patients of the different patient groups. The representative patients were chosen as those whose features were close to the mean of the features of all patients within each specific patient group. The patients of different groups had SUV maps with visually distinctive spatial patterns, and the tumors of these patients shown in the CT images had different shapes and distinctive appearances. Such differences were also characterized by their quantitative radiomic features visualized in the top panel of Fig. 7. These visualization results indicated that the RCC method could capture distinctive radiomic patterns of rectal cancer patients.
Fig. 7.
Illustration of clustering results and images of representative patients of different groups. Top: Visualization of the clustering results of patients and features with different groups separated by red lines. The features are shown using the same scale. Bottom: SUV maps of PET images and CT images of representative patients of different patient groups in coronal (C), sagittal (S), and axial (A) views respectively. The white arrows point to the tumor region.
Fig. 8 visualizes the correlation between the clustered subjects and features. The correlation within the same cluster (diagonal blocks) of the patients and the features obtained by the RCC was higher than that obtained by the CC, indicating that the RCC obtained better clustering results in terms of both patient stratification and the clustering of the features, since highly-correlated radiomic features are expected to be grouped into the same cluster, and meanwhile patients with similar features should be grouped together.
Fig. 8.
Visualization of the correlation between clustered patients and features. Top: The Pearson correlation matrix between patients obtained by CC (left) and RCC (right); Bottom: The Pearson correlation matrix between features obtained by CC (left) and RCC (right). Different sub-clusters are separated by red lines.
Since k-means [26] and k-medoids [27] are among the most widely-used approaches for clustering in medical image analysis, we used them as baseline methods to evaluate the patient stratification performance. Each of them partitioned the patients into 3 clusters. The clustering procedure was repeated 30 times before obtaining the final clusters, and each run was randomly initialized with a new set of centroids. We also compared the proposed method with orthogonal nonnegative matrix factorization (ONMF) [33] to empirically show the advantage of mapping radiomic features to three matrices via regularized tri-factorization. We used the MATLAB codes released by the authors without changing the parameter settings to partition the subjects into 3 groups. For each baseline, Kaplan-Meier estimation [46] was adopted to generate a survival function for each group of patients, and the differences between the 3 patient groups were measured by the log-rank test [47]. All the approaches mentioned above were implemented in MATLAB R2014b.
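For completeness, the k-means baseline configuration described above corresponds to a one-line call in scikit-learn; the placeholder feature matrix below is only for illustration, and k-medoids is configured analogously with its own implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

features = np.random.default_rng(0).random((83, 20))   # placeholder patient-by-feature matrix
# 3 patient clusters, 30 randomly initialized runs (the best of 30 is kept), as described above.
labels = KMeans(n_clusters=3, n_init=30).fit_predict(features)
```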
As summarized in Table III, the stratification results produced by the proposed RCC achieved better log-rank results, and the differences between Group 3 and Group 2 as well as Group 1 and Group 2 were statistically significant, as demonstrated by log-rank with p-values of 0.0016 and 0.0429 respectively.
TABLE III.
Comparison of p-Values Between Different Groups Stratified by k-Means, k-Medoids, ONMF, CC and RCC
| | K-Means | K-Medoids | ONMF | CC | RCC |
|---|---|---|---|---|---|
| Group 1 vs. Group 2 | 0.0701 | 0.0491 | 0.0711 | 0.0711 | 0.0429 |
| Group 3 vs. Group 2 | 0.0032 | 0.0049 | 0.0026 | 0.0032 | 0.0016 |
| Group 1 vs. Group 3 | 0.8624 | 0.9415 | 0.8462 | 0.8624 | 0.8533 |
To evaluate survival prediction models built on the meta features, we used PCA [19] based feature extraction as a baseline method to evaluate the effectiveness of low-dimensional representation obtained by the proposed RCC, in addition to the CC approach [20].
Table IV summarizes the prediction performance measured by C-index. The first column of C-indices contains results of prediction models built by the Cox, CoxL, and RSF methods without feature extraction. For the PCA based prediction, we used the MATLAB function pca to obtain the best results with the number of extracted features ranging from 5 to 13. For the ONMF based prediction, the features were divided into 11 clusters to obtain the meta-feature matrix. For the CC and RCC based prediction, the results were obtained with kP = 3 and kF = 11. It is evident that prediction models based on meta-features produced by the proposed RCC remarkably outperformed those built upon the PCA generated features and the meta-features obtained by the conventional CC. The best performance was achieved by the Cox with LASSO prediction model built on meta-features extracted by the proposed approach, with a gain of 15.6% over the best score achieved by PCA, 14.7% over the best score attained by ONMF, and 8.1% over the best score achieved by CC.
TABLE IV.
Prediction Performance (C-Index) Comparison Between PCA, ONMF, CC and RCC
| | - | PCA | ONMF | CC | RCC |
|---|---|---|---|---|---|
| Cox | 0.63±0.02 | 0.63±0.06 | 0.63±0.07 | 0.67±0.05 | 0.71±0.04 |
| CoxL | 0.61±0.05 | 0.58±0.08 | 0.59±0.08 | 0.67±0.05 | 0.73±0.06 |
| RSF | 0.54±0.04 | 0.50±0.06 | 0.56±0.06 | 0.55±0.04 | 0.63±0.03 |
There are two major reasons why RCC outperformed ONMF, PCA, k-means and k-medoids: 1) Besides differences between features, RCC also utilized the differences between heterogeneous groups of samples to learn a discriminative low dimensional representation, while the others ignored such discriminative information; 2) RCC was designed to be robust against noise, while the others might be sensitive to noisy feature data. The improvement of RCC over CC was brought by the adaptive sparsity regularization that effectively reduced the noise.
V. Discussion and Conclusions
In order to predict clinical outcomes based on radiomic features, we have developed a robust collaborative clustering approach based on adaptive sparsity regularization to compute low-dimensional, informative meta-features. Different from conventional feature dimensionality reduction methods that ignore inherent differences in radiomic features between heterogeneous patient groups, the robust collaborative clustering method simultaneously clusters patients and radiomic features into distinct groups under adaptive sparse regularization. Our method considers the underlying gap in radiomic features between heterogeneous groups of patients with different clinical outcomes, and utilizes an orthogonal matrix tri-factorization technique to cluster highly-correlated features and patients simultaneously, so that the grouping information of patients with distinct radiomic features provides latent supervision information to guide the feature dimensionality reduction, while the noise is adaptively isolated from the radiomic features. The strong correlation between features within the same clusters could lead to a sparse representation that separates feature components from noise, facilitating the proposed algorithm to remove noise via adaptive sparsity regularization based on distribution modeling of the coefficients.
The collaborative clustering approach improves both the clustering of subjects and the features. On the one hand, since the radiomic feature data from heterogeneous groups of patients are treated separately, features within the same sub-cluster are more correlated, thus could capture the connections between high-dimensional features more effectively to generate better low-dimensional representations. On the other hand, more effective and informative low-dimensional representations in return help differentiate feature patterns of patients from heterogeneous groups, and lead to better patient stratification and prediction of clinical outcomes.
Different from existing matrix tri-factorization techniques that typically adopt graph regularization [23], symmetric regularization [24] or manifold regularization [25] to improve the matrix tri-factorization performance, we propose to use adaptive sparsity regularization to make the matrix tri-factorization robust to noisy features. Furthermore, the adaptive sparsity regularization and the collaborative clustering also benefit from each other. Since the proposed sparsity regularization method is based on the collaborative clustering that effectively utilizes the discriminative information of the feature data, the obtained sparse representation preserves such discriminative information and thus is more informative after the noise components are adaptively suppressed. In contrast, due to a lack of relevant guidance, such discriminative information is prone to being removed along with noise in denoising approaches based on conventional dimensionality reduction techniques. Meanwhile, after denoising via sparsity regularization, the collaborative clustering procedure is more robust, leading to more accurate grouping of both samples and features and better meta-features for clinical analysis.
Compared with supervised learning approaches [4], [52], [53], the proposed method does not rely on any external training dataset and thus can circumvent small sample size problems, such as overfitting. This merit is desirable for clinical analysis since there are many circumstances where only a small number of data samples are available. Although prediction models built using supervised learning methods, particularly deep learning algorithms, could achieve promising performance, their training usually relies on large data sets, which are not available in many clinical studies. Similar to most traditional clustering algorithms, deep clustering methods [54]-[56] also ignore the underlying discriminative information provided by the heterogeneous subject clusters. Transfer learning is a promising technique to reuse features learned from other studies so that the small sample size problem could be alleviated [57]-[61]. However, well-established pre-trained models for 3D medical imaging data are not available. Therefore, robust machine learning methods, such as the proposed method with adaptive sparsity regularization, are needed in studies with moderate sample sizes. Compared with conventional unsupervised learning approaches like ONMF [33], PCA [19], k-means [26] and k-medoids [27] that ignore the differences of features between heterogeneous groups of samples, the proposed method is able to effectively capture the discriminative information, leading to better clinical analysis performance.
Besides prognosis of survival [62]-[64], radiomics has also been applied to cancer patient staging [65], [66] and prediction of metastasis [65], [67], [68]. Accordingly, the meta-features generated by the proposed method could also be utilized in place of the original radiomic features in the studies of patient staging and prediction of metastasis. Furthermore, incorporating the learned meta-features into the traditional staging systems is a promising way to improve the performance. For instance, it has been demonstrated that a combination of clinical staging with CT/PET radiomic features attained higher stratification power than standard clinical staging [66]. In such applications, the meta-features produced by the proposed approach could also be utilized to pursue better performance.
The main limitation of this study is a lack of knowledge of the actual statistical characteristics of the noise and the coefficients, although it is quite common to assume that the noise follows an independent zero-mean Gaussian distribution, and it is standard to use the Frobenius norm for the data fidelity term [36], [42]. From a Bayesian point of view, this study assumes the mean of the coefficients to be zero, which may be inaccurate because the actual mean can be positive. As a result, the proposed method shrinks all coefficients towards zero while the optimal way would be to regularize them towards their actual expectations. How to better estimate such statistical characteristics from insufficient data samples merits further investigation.
The experiments on the synthetic data demonstrated the superiority of the proposed approach over conventional collaborative clustering method in data clustering, and empirical results on clinical analysis based on radiomic features showed that the method could group highly-correlated features in the same groups, leading to more effective low-dimensional representation and contributing to better patient stratification. Quantitative evaluation measures of both patient stratification and survival prediction further demonstrated the effectiveness of the proposed approach over existing methods.
In conclusion, the present study provides evidence that integrating feature denoising and unsupervised collaborative clustering into a robust system could improve cancer patient stratification and prognosis.
Acknowledgment
The authors would like to thank the anonymous reviewers whose comments have greatly improved this manuscript.
This work was supported by the National Institutes of Health under Grants CA223358, CA189523, and EB022573.
Contributor Information
Hangfan Liu, Center for Biomedical Image Computing and Analytics, University of Pennsylvania.
Hongming Li, Center for Biomedical Image Computing and Analytics, University of Pennsylvania.
Mohamad Habes, Center for Biomedical Image Computing and Analytics, University of Pennsylvania.
Yuemeng Li, Center for Biomedical Image Computing and Analytics, University of Pennsylvania.
Pamela Boimel, Department of Radiation Oncology, University of Pennsylvania.
James Janopaul-Naylor, Department of Radiation Oncology, University of Pennsylvania.
Ying Xiao, Department of Radiation Oncology, University of Pennsylvania.
Edgar Ben-Josef, Department of Radiation Oncology, University of Pennsylvania.
Yong Fan, Center for Biomedical Image Computing and Analytics, University of Pennsylvania, Philadelphia, PA 19104 USA.
References
- [1] Maas M et al., “Long-term outcome in patients with a pathological complete response after chemoradiation for rectal cancer: A pooled analysis of individual patient data,” Lancet Oncol., vol. 11, no. 9, pp. 835–844, 2010.
- [2] Nie K et al., “Rectal cancer: Assessment of neoadjuvant chemoradiation outcome based on radiomics of multi-parametric MRI,” Clinical Cancer Res., vol. 22, pp. 5256–5264, 2016.
- [3] Dinapoli N et al., “Radiomics for rectal cancer,” Translational Cancer Res., vol. 5, no. 4, pp. 424–431, 2016.
- [4] Li H et al., “Deep convolutional neural networks for imaging based survival analysis of rectal cancer patients,” Int. J. Radiat. Oncol. Biol. Phys., vol. 99, no. 2, 2017, Art. no. S183.
- [5] Joye I et al., “Can clinical factors be used as a selection tool for an organ-preserving strategy in rectal cancer?,” Acta Oncologica, vol. 55, no. 8, pp. 1047–1052, 2016.
- [6] Li H et al., “Deep convolutional neural networks for imaging data based survival analysis of rectal cancer,” in Proc. IEEE 16th Int. Symp. Biomed. Imag., 2019, pp. 846–849.
- [7] Gillies RJ, Kinahan PE, and Hricak H, “Radiomics: Images are more than pictures, they are data,” Radiology, vol. 278, no. 2, pp. 563–577, 2015.
- [8] Kumar V et al., “Radiomics: The process and the challenges,” Magnetic Reson. Imag., vol. 30, no. 9, pp. 1234–1248, 2012.
- [9] Cameron A et al., “MAPS: A quantitative radiomics approach for prostate cancer detection,” IEEE Trans. Biomed. Eng., vol. 63, no. 6, pp. 1145–1156, Jun. 2016.
- [10] Emaminejad N et al., “Fusion of quantitative image and genomic biomarkers to improve prognosis assessment of early stage lung cancer patients,” IEEE Trans. Biomed. Eng., vol. 63, no. 5, pp. 1034–1043, May 2016.
- [11] Ren Y et al., “High-performance CAD-CTC scheme using shape index, multiscale enhancement filters, and radiomic features,” IEEE Trans. Biomed. Eng., vol. 64, no. 8, pp. 1924–1934, Aug. 2017.
- [12] Martin-Carreras T et al., “Radiomic features from MRI distinguish myxomas from myxofibrosarcomas,” BMC Med. Imag., vol. 19, no. 67, pp. 1–9, Aug. 2019.
- [13] Men K et al., “A deep learning model for predicting xerostomia due to radiation therapy for head and neck squamous cell carcinoma in the RTOG 0522 clinical trial,” Int. J. Radiat. Oncol. Biol. Phys., vol. 105, no. 2, pp. 440–447, 2019.
- [14] Davatzikos C et al., “Precision diagnostics based on machine learning-derived imaging signatures,” Magnetic Reson. Imag., vol. 64, pp. 49–61, 2019.
- [15] Liu H et al., “Adaptive sparsity regularization based collaborative clustering for cancer prognosis,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Berlin, Germany: Springer, 2019, pp. 583–592.
- [16] Peng H and Fan Y, “Feature selection by optimizing a lower bound of conditional mutual information,” Inf. Sciences, vol. 418, pp. 652–667, 2017.
- [17] Peng H and Fan Y, “A general framework for sparsity regularized feature selection via iteratively reweighted least square minimization,” in Proc. AAAI Conf. Artificial Intelligence, 2017, pp. 2471–2477.
- [18] Peng H and Fan Y, “Direct sparsity optimization based feature selection for multi-class classification,” in Proc. 25th Int. Joint Conf. Artif. Intell., 2016, pp. 1918–1924.
- [19] Jolliffe IT and Cadima J, “Principal component analysis: A review and recent developments,” Philos. Trans. Royal Soc. A: Math., Physical Eng. Sciences, vol. 374, no. 2065, 2016.
- [20] Liu H et al., “Collaborative clustering of subjects and radiomic features for predicting clinical outcomes of rectal cancer patients,” in Proc. IEEE Int. Symp. Biomed. Imag., 2019, pp. 1303–1306.
- [21] Li H et al., “Unsupervised machine learning of radiomic features for predicting treatment response and overall survival of early stage non-small cell lung cancer patients treated with stereotactic body radiation therapy,” Radiotherapy Oncol., vol. 129, no. 2, pp. 218–226, 2018.
- [22] Ding C et al., “Orthogonal nonnegative matrix tri-factorizations for clustering,” in Proc. ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2006, pp. 126–135.
- [23] Pei Y, Chakraborty N, and Sycara K, “Nonnegative matrix tri-factorization with graph regularization for community detection in social networks,” in Proc. Int. Joint Conf. Artif. Intell., 2015, pp. 2083–2089.
- [24] Gligorijević V, Panagakis Y, and Zafeiriou S, “Non-negative matrix factorizations for multiplex network analysis,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 41, no. 4, pp. 928–940, Apr. 2019.
- [25] Xu X et al., “Matrix tri-factorization with manifold regularizations for zero-shot learning,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit., 2017, pp. 3798–3807.
- [26] Coates A and Ng AY, “Learning feature representations with k-means,” in Neural Networks: Tricks of the Trade. Berlin, Germany: Springer, 2012, pp. 561–580.
- [27] Park H-S and Jun C-H, “A simple and fast algorithm for K-medoids clustering,” Expert Syst. With Appl., vol. 36, no. 2, pp. 3336–3341, 2009.
- [28] Szekely GJ and Rizzo ML, “Hierarchical clustering via joint between-within distances: Extending Ward’s minimum variance method,” J. Classification, vol. 22, no. 2, pp. 151–183, 2005.
- [29] Zhang J, Li C-G, Zhang H, and Guo J, “Low-rank and structured sparse subspace clustering,” in Proc. IEEE Visual Commun. Image Process., 2016, pp. 1–4.
- [30] Peng B et al., “Unsupervised video action clustering via motion-scene interaction constraint,” IEEE Trans. Circuits Systems Video Technol., vol. 30, no. 1, pp. 131–144, Jan. 2020.
- [31] Li B et al., “Subspace clustering under complex noise,” IEEE Trans. Circuits Systems Video Technol., vol. 29, no. 4, pp. 930–940, Apr. 2019.
- [32] Xie J, Girshick R, and Farhadi A, “Unsupervised deep embedding for clustering analysis,” in Proc. Int. Conf. Mach. Learn., 2016, pp. 478–487.
- [33] Pompili F et al., “Two algorithms for orthogonal nonnegative matrix factorization with application to clustering,” Neurocomputing, vol. 141, pp. 15–25, 2014.
- [34] Vallières M et al., “A radiomics model from joint FDG-PET and MRI texture features for the prediction of lung metastases in soft-tissue sarcomas of the extremities,” Phys. Medicine Biol., vol. 60, no. 14, 2015, Art. no. 5471.
- [35] Liu H, Zhang X, and Xiong R, “Content-adaptive low rank regularization for image denoising,” in Proc. IEEE Int. Conf. Image Process., 2016, pp. 3091–3095.
- [36] Liu H et al., “Image denoising via low rank regularization exploiting intra and inter patch correlation,” IEEE Trans. Circuits Syst. Video Technol., vol. 28, no. 12, pp. 3321–3332, Dec. 2018.
- [37] Goldstein T and Osher S, “The split Bregman method for L1-regularized problems,” SIAM J. Imag. Sci., vol. 2, no. 2, pp. 323–343, 2009.
- [38] Liu H et al., “Nonlocal gradient sparsity regularization for image restoration,” IEEE Trans. Circuits Syst. Video Technol., vol. 27, no. 9, pp. 1909–1921, Sep. 2017.
- [39] Liu H et al., “CG-Cast: Scalable wireless image SoftCast using compressive gradient,” IEEE Trans. Circuits Syst. Video Technol., vol. 29, no. 6, pp. 1832–1843, Jun. 2019.
- [40] Tibshirani R, Walther G, and Hastie T, “Estimating the number of clusters in a data set via the gap statistic,” J. Royal Statistical Soc.: Series B (Statistical Methodology), vol. 63, no. 2, pp. 411–423, 2001.
- [41] Selesnick I, A Derivation of the Soft-Thresholding Function. New York, NY, USA: Polytechnic Institute of New York University, 2009.
- [42] Liu H et al., “Image super-resolution based on adaptive joint distribution modeling,” in Proc. IEEE Visual Commun. Image Process., 2017, pp. 1–4.
- [43] Liu H et al., “Image denoising via adaptive soft-thresholding based on non-local samples,” in Proc. IEEE Conf. Comput. Vision Pattern Recognit., 2015, pp. 484–492.
- [44] Rezaei M and Fränti P, “Set matching measures for external cluster validity,” IEEE Trans. Knowl. Data Eng., vol. 28, no. 8, pp. 2173–2186, Aug. 2016.
- [45] Hubert L and Arabie P, “Comparing partitions,” J. Classification, vol. 2, no. 1, pp. 193–218, 1985.
- [46] Kaplan EL and Meier P, “Nonparametric estimation from incomplete observations,” J. Amer. Statistical Association, vol. 53, no. 282, pp. 457–481, 1958.
- [47] Mantel N, “Evaluation of survival data and two new rank order statistics arising in its consideration,” Cancer Chemother Rep., vol. 50, pp. 163–170, 1966.
- [48] Fox J, “Cox proportional-hazards regression for survival data,” An R and S-PLUS Companion to Applied Regression, pp. 1–18, 2002.
- [49] Tibshirani R, “The lasso method for variable selection in the Cox model,” Statistics Medicine, vol. 16, no. 4, pp. 385–395, 1997.
- [50] Ishwaran H et al., “Random survival forests,” Ann. Appl. Statistics, vol. 2, no. 3, pp. 841–860, 2008.
- [51] Harrell FE Jr. et al., “Regression modelling strategies for improved prognostic prediction,” Statistics Medicine, vol. 3, no. 2, pp. 143–152, 1984.
- [52] Guo Y et al., “Breast cancer histology image classification based on deep neural networks,” in International Conference Image Analysis and Recognition. Berlin, Germany: Springer, 2018, pp. 827–836.
- [53] Yin S et al., “Multi-instance deep learning with graph convolutional neural networks for diagnosis of kidney diseases using ultrasound imaging,” in Uncertainty for Safe Utilization of Machine Learning in Medical Imaging and Clinical Image-Based Procedures. Berlin, Germany: Springer, 2019, pp. 146–154.
- [54] Hershey JR et al., “Deep clustering: Discriminative embeddings for segmentation and separation,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., 2016, pp. 31–35.
- [55] Yang B et al., “Towards k-means-friendly spaces: Simultaneous deep learning and clustering,” in Proc. Int. Conf. Mach. Learn., 2017, pp. 3861–3870.
- [56] Ghasedi Dizaji K et al., “Deep clustering via joint convolutional autoencoder embedding and relative entropy minimization,” in Proc. IEEE Int. Conf. Comput. Vision, 2017, pp. 5736–5745.
- [57] Weiss K, Khoshgoftaar TM, and Wang D, “A survey of transfer learning,” J. Big Data, vol. 3, no. 9, pp. 1–40, 2016.
- [58] He H and Wu D, “Transfer learning for brain-computer interfaces: A Euclidean space data alignment approach,” IEEE Trans. Biomed. Eng., vol. 67, no. 2, pp. 399–410, Feb. 2020.
- [59] Zheng Q et al., “Computer-aided diagnosis of congenital abnormalities of the kidney and urinary tract in children based on ultrasound imaging data by integrating texture image features and deep transfer learning image features,” J. Pediatric Urol., vol. 15, no. 1, pp. 75.e1–75.e7, 2019.
- [60] Yin S et al., “Automatic kidney segmentation in ultrasound images using subsequent boundary distance regression and pixelwise classification networks,” Medical Image Anal., vol. 60, 2020, Art. no. 101602.
- [61] Zheng Q, Tastan G, and Fan Y, “Transfer learning for diagnosis of congenital abnormalities of the kidney and urinary tract in children based on ultrasound imaging data,” in Proc. IEEE 15th Int. Symp. Biomed. Imag., 2018, pp. 1487–1490.
- [62] Parmar C et al., “Radiomic feature clusters and prognostic signatures specific for lung and head & neck cancer,” Scientific Reports, vol. 5, 2015, Art. no. 11044.
- [63] Zhang Y et al., “Radiomics-based prognosis analysis for non-small cell lung cancer,” Scientific Reports, vol. 7, 2017, Art. no. 46349.
- [64] Li H et al., “MR imaging radiomics signatures for predicting the risk of breast cancer recurrence as given by research versions of MammaPrint, Oncotype DX, and PAM50 gene assays,” Radiology, vol. 281, no. 2, pp. 382–391, 2016.
- [65] Vicente AMG et al., “Heterogeneity in [18F] Fluorodeoxyglucose positron emission tomography/computed tomography of non–small cell lung carcinoma and its relationship to metabolic parameters and pathologic staging,” Mol. Imag., vol. 13, no. 9, 2014, Art. no. 7290.2014.00032.
- [66] Desseroit M-C et al., “Development of a nomogram combining clinical staging with 18F-FDG PET/CT image features in non-small-cell lung cancer stage I-III,” Eur. J. Nuclear Medicine Mol. Imag., vol. 43, no. 8, pp. 1477–1485, 2016.
- [67] Coroller TP et al., “CT-based radiomic signature predicts distant metastasis in lung adenocarcinoma,” Radiother. Oncol., vol. 114, no. 3, pp. 345–350, 2015.
- [68] Zhou H et al., “Diagnosis of distant metastasis of lung cancer: Based on clinical and radiomic features,” Translational Oncol., vol. 11, no. 1, pp. 31–36, 2018.