Published in final edited form as: J Multivar Anal. 2021 Oct 29;189:104874. doi: 10.1016/j.jmva.2021.104874

Biclustering analysis of functionals via penalized fusion

Kuangnan Fang a, Yuanxing Chen a, Shuangge Ma b, Qingzhao Zhang c,*

Abstract

In biomedical data analysis, clustering is commonly conducted. Biclustering analysis conducts clustering in both the sample and covariate dimensions and can more comprehensively describe data heterogeneity. Most of the existing biclustering analyses consider scalar measurements. In this study, motivated by time-course gene expression data and other examples, we take the "natural next step" and consider the biclustering analysis of functionals, under which, for each covariate of each sample, a function (more precisely, its values at discrete measurement points) is present. We develop a doubly penalized fusion approach, which includes a smoothness penalty for estimating functionals and, more importantly, a fusion penalty for clustering. Statistical properties are rigorously established, providing the proposed approach a solid ground. We also develop an effective ADMM algorithm and accompanying R code. Numerical analysis, including simulations, comparisons, and the analysis of two time-course gene expression datasets, demonstrates the practical effectiveness of the proposed approach.

Keywords: Biclustering, Functional data, Penalized fusion, primary 62H30, 62R10

1. Introduction

In biomedical data analysis, clustering has been routinely conducted. The clustering of samples can assist in better understanding sample heterogeneity, and the clustering of covariates can identify those that behave similarly across samples and then, for example, improve our understanding of covariate functionalities. Clustering can also serve as the basis of other analyses, for example, regression. Biclustering analysis has also been developed, identifying clustering structures in both sample and covariate dimensions. It includes sample- and covariate-clustering as special cases and, in a sense, can be more comprehensive. For generic reviews of techniques, theories, and applications of clustering, we refer to [19,46].

This study has been partly motivated by the analysis of gene expression data, for which sample- and covariate-clustering as well as biclustering have been extensively conducted [21,45]. Most gene expression studies generate "snapshot" values. Unlike some types of omics measurements, gene expression values can be time-dependent, and the temporal trends of gene expressions can have important biological implications [16]. Accordingly, time-course gene expression studies have been conducted, generating multiple measurements at different time points for each gene of each sample. In the analysis of time-course gene expression data, besides simple statistics, functional data analysis (FDA) techniques have been adopted and shown to be powerful [12].

FDA deals with data samples that consist of curves or other infinite-dimensional data objects. Over the last two decades, we have witnessed significant developments in its theory, method, computation, and application. For systematic reviews, we refer to [2,15,23,40]. In FDA, clustering analysis has been of particular interest. A popular approach projects functional data into a finite-dimensional space and then applies existing clustering methods. For example, Abraham et al. [1] conduct B-spline expansions and cluster the estimated coefficients using a k-means algorithm. Peng and Müller [30] develop a distance for sparse functional data and apply a k-means algorithm to functional principal component analysis (PCA) scores. Other approaches, such as Bayesian [37], subspace [3,9,10], and model-based [18,20], have also been developed. We refer to [17,40] for surveys on functional data clustering. Most works in this area, however, have focused on either sample- or covariate-clustering.

For biclustering analysis (of gene expression and other types of data), in this article, we take the “natural next step” and consider the scenario where for each covariate of each sample, a function or its realizations at discrete time points are available. We note that, although this study has been partly motivated by gene expression data and some of the discussions are focused on such data, the considered data scenario and proposed technique can have applications far beyond such data. For example, in biomedical studies, many biomarkers measured in blood tests vary across time, and their values can be obtained from medical records. In financial studies, many measures of a company, for example size and stock price, vary across time. As such, our investigation can have broad applications.

There is a vast literature on biclustering analysis with scalar measurements. Directly applying such techniques to the present problem will involve either treating functional measurements as scalars and then computing distances (between covariates and samples) – which may be ineffective by not sufficiently accounting for the functional nature of data – or first estimating functionals and then computing distances between the estimates – which may also encounter challenges when a large number of functionals need to be jointly estimated. Our literature review suggests that there are also a handful of recent biclustering methods designed for functional (especially including longitudinal) data. For example, Slimen et al. [35] propose a biclustering method for multivariate functional data based on the Gaussian latent block model (LBM) using the first functional PCA scores. Bouveyron et al. [4] develop an extension of the Gaussian LBM by modeling the whole set of functional PCA scores. In another work [28], a biclustering method with a plaid model is extended to three-dimensional data arrays, of which multivariate longitudinal data is a special case.

For the biclustering analysis of functionals, in this article, we develop a penalized fusion based approach. More specifically, a nonparametric model is assumed for each covariate of each sample, allowing for sufficient flexibility in modeling. A double penalization technique is adopted, which includes a smoothness penalty to regulate nonparametric estimation. The most significant advancement is the second, fusion penalty, which "transforms" clustering in both the sample and covariate dimensions into a penalized estimation problem. Statistical and numerical investigations are conducted, providing the proposed approach a solid ground. This study may complement and advance the existing work in multiple aspects. Compared to direct applications of biclustering methods for scalars (which either directly compute distances without functional estimation or estimate functionals separately), the proposed approach can more effectively accommodate the functional nature of data and generate more effective estimation. This is because it "combines" clustering and estimation, and as such, estimation only needs to be conducted for clusters as opposed to individual covariates, potentially leading to a smaller number of parameters and hence more effective estimation. Compared to some of the existing biclustering methods for functionals, such as [4,35], the proposed approach has a much easier way of determining the number of clusters. In addition, unlike [4,35], it does not make stringent distributional assumptions (for example, normality). Meanwhile, rigorous theoretical investigations are conducted beyond methodological developments, granting the proposed approach a stronger statistical basis. It also advances beyond the clustering of functional covariate effects (which assumes homogeneous samples) by simultaneously examining sample heterogeneity, thus being more comprehensive. Additionally, this study may also advance and enrich the penalized fusion technique. Clustering via penalized fusion has been pioneered in [8] and other studies. Compared to alternative clustering techniques, it is more recent and has notable statistical and numerical advantages [44]. Compared to the existing penalized fusion based clustering, this study differs by conducting biclustering and by having unknown parameters generated from the basis expansion of functionals. Last but not least, this study also provides a practically useful and new way of analyzing time-course gene expression data (and other data with similar characteristics).

The remainder of this article is organized as follows: Section 2 introduces the new biclustering approach via penalized fusion and develops an effective computational algorithm. Statistical properties are established to provide our method a strong theoretical support. Simulation studies and the analysis of two time-course expression datasets are conducted in Sections 3 and 4, respectively. Section 5 concludes with a brief discussion. The proofs of the main results are presented in Appendix A.

2. Methods

2.1. Data and model settings

For the $j \in \{1, \dots, q\}$th covariate of sample $i \in \{1, \dots, N\}$, denote $Y_{i,j} = (Y_{i,j,1}, \dots, Y_{i,j,n_{i,j}})^\top$ as the ordered measurements (ordered by time for time-course gene expression data), which are the discrete realizations of an unknown underlying functional. Further denote $Y_i = (Y_{i,1}^\top, \dots, Y_{i,q}^\top)^\top$, $Y = (Y_1^\top, \dots, Y_N^\top)^\top$, and $n = \sum_{i=1}^N \sum_{j=1}^q n_{i,j}$. Under the biclustering analysis framework, assume that data can be "decomposed" into $K_r$ sample (row) groups and $K_c$ covariate (column) groups. Note that, advancing from many existing approaches, the numbers of groups in the two dimensions are not pre-specified. Denote $t_{i,j,m} \in \mathcal{T} = [0, 1]$ as the observed time points. If (sample $i$, covariate $j$) belongs to the $k_r$th sample group and the $k_c$th covariate group, then

$Y_{i,j,m} = g_{(k_r,k_c)}(t_{i,j,m}) + \epsilon_{i,j,m}, \quad (1)$

where $g_{(k_r,k_c)}(t)$ is the unknown mean function, and the $\epsilon_{i,j,m}$'s are random errors with mean zero.

For estimation, we adopt the basis expansion technique. Specifically, denote $U_p(t) = (U_{1,p}(t), \dots, U_{p,p}(t))^\top$ as the collection of $p$ rescaled basis functions. In the literature, there are extensive studies on choosing the form and number of basis functions [32], which will not be reiterated here. In our numerical study, we adopt B-spline basis functions of order $d = 3$. Let $g_{i,j}(t)$ be the unknown mean function for the $j$th covariate of the $i$th sample; then we have

$g_{i,j}(t) \approx U_p(t)^\top \beta_{i,j}$,

where $\beta_{i,j} = (\beta_{i,j,1}, \dots, \beta_{i,j,p})^\top$ is the vector of unknown coefficients. Further denote $U_{i,j} = (U_p(t_{i,j,1}), \dots, U_p(t_{i,j,n_{i,j}}))^\top$. For estimation (without clustering), consider the objective function

$Q(\beta) = \frac{1}{2}\|Y - U\beta\|_2^2 + \frac{1}{2}\gamma_1\beta^\top M\beta = \frac{1}{2}\sum_{i=1}^N\sum_{j=1}^q\left(\|Y_{i,j} - U_{i,j}\beta_{i,j}\|_2^2 + \gamma_1\beta_{i,j}^\top D\beta_{i,j}\right), \quad (2)$

where $U = \mathrm{diag}(U_{1,1}, \dots, U_{1,q}, \dots, U_{N,q})$, $\beta = (\beta_{1,1}^\top, \dots, \beta_{1,q}^\top, \dots, \beta_{N,q}^\top)^\top$, $M = \mathrm{diag}(D, \dots, D)$, $D = \delta^\top\delta$, $\delta$ is a $(p-2)\times p$ matrix representing the second-order difference operator, and $\gamma_1$ is a non-negative tuning parameter. In this objective function, the first term measures the lack of fit, and the penalty term controls the smoothness of estimation.
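To make the estimation step concrete, the following is a minimal R sketch of the penalized fit in (2) for a single (i, j) curve. The helper name `fit_smooth`, the cubic-spline choice via `splines::bs`, and the default values are our illustrative assumptions, not the authors' released code.

```r
# Minimal sketch: smoothness-penalized B-spline fit for one curve,
# i.e., minimize ||y - U beta||^2 / 2 + gamma1 * beta' D beta / 2.
library(splines)

fit_smooth <- function(y, t, p = 6, gamma1 = 1) {
  # U: n x p design matrix of B-spline basis functions evaluated at t
  U <- bs(t, df = p, degree = 3, intercept = TRUE)
  # delta: (p - 2) x p second-order difference operator; D = delta' delta
  delta <- diff(diag(p), differences = 2)
  D <- crossprod(delta)
  # Normal equations of (2): (U'U + gamma1 * D) beta = U'y
  beta <- solve(crossprod(U) + gamma1 * D, crossprod(U, y))
  list(beta = beta, fitted = U %*% beta)
}

# Toy usage on one noisy curve
set.seed(1)
t <- seq(0, 1, length.out = 10)
y <- cos(2 * pi * t) + rnorm(10, sd = 0.6)
fit <- fit_smooth(y, t)
```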

2.2. Biclustering via penalized fusion

Under the clustering via penalized fusion framework, two samples (covariates) belong to the same cluster if and only if they have the same regression coefficients. As such, clustering amounts to determining whether two samples (covariates) have the same estimated coefficients. For samples $i_1, i_2 \in \{1, \dots, N\}$, denote $\beta_{i_1}^{(r)}, \beta_{i_2}^{(r)}$ as the length-$pq$ vectors of coefficients. For covariates $j_1, j_2 \in \{1, \dots, q\}$, denote $\beta_{j_1}^{(c)}, \beta_{j_2}^{(c)}$ as the length-$pN$ vectors of coefficients. For estimating $\beta$ and hence determining the clustering structure, we propose minimizing the objective function:

$L(\beta) = Q(\beta) + \sum_{1\le i_1 < i_2\le N}p_\tau(\|\beta_{i_1}^{(r)} - \beta_{i_2}^{(r)}\|_2, \gamma_2) + \sum_{1\le j_1 < j_2\le q}p_\tau(\|\beta_{j_1}^{(c)} - \beta_{j_2}^{(c)}\|_2, (N/q)^{1/2}\gamma_2). \quad (3)$

Here $p_\tau(\cdot, \cdot)$ is a penalty function, $\tau$ is a regularization parameter, $\|\cdot\|_2$ is the $\ell_2$ norm, and $\gamma_2$ is a data-dependent tuning parameter. The factor $(N/q)^{1/2}$ is added to make the two penalties comparable. In our numerical study, we adopt MCP [47], that is, $p_\tau(t, \gamma) = \gamma\int_0^t(1 - x/(\tau\gamma))_+\,dx$ with $\tau > 1$, where $(x)_+ = x$ if $x > 0$ and $(x)_+ = 0$ otherwise. Note that SCAD [14] and some other penalties are also applicable. Denote the estimator as $\hat\beta$. Let $\{\hat\alpha_1^{(r)}, \dots, \hat\alpha_{\hat K_r}^{(r)}\}$ be the distinct values of the $\hat\beta_i^{(r)}$'s. Similarly, let $\{\hat\alpha_1^{(c)}, \dots, \hat\alpha_{\hat K_c}^{(c)}\}$ be the distinct values of the $\hat\beta_j^{(c)}$'s. We can then obtain the block structure of $\hat\beta$ by $\{\hat\alpha_{1,1}^{(r,c)}, \dots, \hat\alpha_{\hat K_r,\hat K_c}^{(r,c)}\}$, which are the distinct values of the $\hat\beta_{i,j}$'s, and set $\hat K_b = \hat K_r \times \hat K_c$.
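For reference, the MCP penalty and its derivative have closed forms obtained by integrating the definition above; the R sketch below is ours for illustration (`tau = 3` matches the value fixed later in the simulations).

```r
# MCP of [47]: p_tau(t, gamma) = gamma * integral_0^t (1 - x/(tau*gamma))_+ dx,
# which equals gamma*t - t^2/(2*tau) for t <= tau*gamma and the constant
# tau*gamma^2/2 beyond that point.
mcp <- function(t, gamma, tau = 3) {
  ifelse(t <= tau * gamma,
         gamma * t - t^2 / (2 * tau),
         tau * gamma^2 / 2)
}
# Derivative gamma * (1 - t/(tau*gamma))_+ = (gamma - t/tau)_+
mcp_deriv <- function(t, gamma, tau = 3) pmax(gamma - t / tau, 0)
```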

In (3), penalties are imposed on the norms of all pairwise differences to promote equality, as in "standard" penalized fusion [8]. As in [8], since there is no information on the order of samples/covariates, all pairwise differences are taken, which differs from, for example, fused Lasso and other fused penalizations. Different from [8], as clustering needs to be conducted in both the sample and covariate dimensions, two fusion penalties are imposed, promoting equality in two directions. It is also noted that each specific coefficient shows up in three different penalties. As shown below, with properly chosen tunings, there is no over-penalization problem. In addition, it is not rare to have a parameter involved in two or more penalties [7].

The proposed approach involves two tuning parameters, which have "ordinary" implications, with one controlling smoothness and the other determining the structure of clustering. One possibility is to conduct a two-dimensional grid search. Here we adopt the alternative proposed in [48], which has two steps and a lower computational cost. In particular, in the first step, we set $\gamma_2 = 0$ and select the optimal $\gamma_1$ by minimizing:

$\mathrm{BIC}(\gamma_1) = \sum_{i=1}^N\sum_{j=1}^q\left\{\log\left(\frac{\|Y_{i,j} - \hat g_{i,j}\|_2^2}{n_{i,j}}\right) + \frac{\log(n_{i,j})}{n_{i,j}}\mathrm{df}_{i,j}\right\}$,

where $\mathrm{df}_{i,j} = \mathrm{trace}\{U_{i,j}(U_{i,j}^\top U_{i,j} + \gamma_1D)^{-1}U_{i,j}^\top\}$ and $\hat g_{i,j} = (\hat g_{i,j}(t_{i,j,1}), \dots, \hat g_{i,j}(t_{i,j,n_{i,j}}))^\top$ with $\hat g_{i,j}(t) = U_p(t)^\top\hat\beta_{i,j}$.

In the second step, we fix $\gamma_1$ at its optimal value and select $\gamma_2$ by minimizing

$\mathrm{BIC}(\gamma_2) = \log\left(\frac{\|Y - \hat g\|_2^2}{Nq}\right) + \frac{\log(Nq)}{Nq}\mathrm{df}$,

where $\mathrm{df} = (\hat K_r\hat K_c/(Nq))\sum_{i=1}^N\sum_{j=1}^q\mathrm{df}_{i,j}$ and $\hat g = (\hat g_{1,1}^\top, \dots, \hat g_{N,q}^\top)^\top$.
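In code, the first tuning step amounts to a grid search over γ1; the R sketch below computes BIC(γ1) from the curve-wise smoother matrices, with `bic_gamma1` and the list-based bookkeeping as our illustrative assumptions rather than the authors' implementation.

```r
# BIC(gamma1) with gamma2 = 0: each curve is fitted separately, and
# df_{i,j} = trace{U (U'U + gamma1 * D)^{-1} U'} is the effective df.
bic_gamma1 <- function(Y_list, U_list, D, gamma1) {
  total <- 0
  for (k in seq_along(Y_list)) {
    U <- U_list[[k]]; y <- Y_list[[k]]; n <- length(y)
    S <- U %*% solve(crossprod(U) + gamma1 * D, t(U))  # smoother matrix
    df <- sum(diag(S))
    rss <- sum((y - S %*% y)^2)
    total <- total + log(rss / n) + log(n) / n * df
  }
  total
}
# gamma1 is chosen by minimizing bic_gamma1 over a grid; gamma2 is then
# selected by the analogous BIC(gamma2), with
# df = (Khat_r * Khat_c / (N * q)) * sum of the df_{i,j}'s.
```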

2.3. Computation

We develop an effective algorithm based on the ADMM technique. Specifically, we first reformulate (3) as

$\arg\min_\beta\; Q(\beta) + \sum_{\delta\in\Delta^{(r)}}p_\tau(\|\eta_\delta^{(r)}\|_2, \gamma_2) + \sum_{\delta\in\Delta^{(c)}}p_\tau(\|\eta_\delta^{(c)}\|_2, (N/q)^{1/2}\gamma_2)$,
$\text{subject to } \beta_{i_1}^{(r)} - \beta_{i_2}^{(r)} = \eta_\delta^{(r)}, \quad \beta_{j_1}^{(c)} - \beta_{j_2}^{(c)} = \eta_\delta^{(c)}$,

where $\Delta^{(r)} = \{\delta = (i_1, i_2): 1\le i_1 < i_2\le N\}$ and $\Delta^{(c)} = \{\delta = (j_1, j_2): 1\le j_1 < j_2\le q\}$. Optimizing the constrained objective function is equivalent to optimizing the augmented Lagrangian function:

$L_\theta(\beta, H_r, H_c, \Lambda_r, \Lambda_c) = \frac{1}{2}\|Y - U\beta\|_2^2 + \frac{1}{2}\gamma_1\beta^\top M\beta + \sum_{\delta\in\Delta^{(r)}}p_\tau(\|\eta_\delta^{(r)}\|_2, \gamma_2) + \sum_{\delta\in\Delta^{(r)}}\lambda_\delta^{(r)\top}(\eta_\delta^{(r)} - \beta_{i_1}^{(r)} + \beta_{i_2}^{(r)}) + \frac{\theta}{2}\sum_{\delta\in\Delta^{(r)}}\|\eta_\delta^{(r)} - \beta_{i_1}^{(r)} + \beta_{i_2}^{(r)}\|_2^2 + \sum_{\delta\in\Delta^{(c)}}p_\tau(\|\eta_\delta^{(c)}\|_2, (N/q)^{1/2}\gamma_2) + \sum_{\delta\in\Delta^{(c)}}\lambda_\delta^{(c)\top}(\eta_\delta^{(c)} - \beta_{j_1}^{(c)} + \beta_{j_2}^{(c)}) + \frac{\theta}{2}\sum_{\delta\in\Delta^{(c)}}\|\eta_\delta^{(c)} - \beta_{j_1}^{(c)} + \beta_{j_2}^{(c)}\|_2^2, \quad (4)$

where $\theta$ is a small positive constant, $H_r = (\eta_{(1,2)}^{(r)}, \dots, \eta_{(N-1,N)}^{(r)})$, $H_c = (\eta_{(1,2)}^{(c)}, \dots, \eta_{(q-1,q)}^{(c)})$, $\Lambda_r = (\lambda_{(1,2)}^{(r)}, \dots, \lambda_{(N-1,N)}^{(r)})$, and $\Lambda_c = (\lambda_{(1,2)}^{(c)}, \dots, \lambda_{(q-1,q)}^{(c)})$. Here we introduce the dual variables $\lambda_\delta^{(r)}$ and $\lambda_\delta^{(c)}$ corresponding to the pairs $\delta$ in $\Delta^{(r)}$ and $\Delta^{(c)}$, and the cardinalities of $\Delta^{(r)}$ and $\Delta^{(c)}$ are denoted by $|\Delta^{(r)}|$ and $|\Delta^{(c)}|$.

We consider an iterative algorithm, where the updates in step m + 1 are:

$\beta^{(m+1)} = \arg\min_\beta L_\theta(\beta, H_r^{(m)}, H_c^{(m)}, \Lambda_r^{(m)}, \Lambda_c^{(m)})$,
$H_r^{(m+1)} = \arg\min_{H_r} L_\theta(\beta^{(m+1)}, H_r, \Lambda_r^{(m)})$,
$H_c^{(m+1)} = \arg\min_{H_c} L_\theta(\beta^{(m+1)}, H_c, \Lambda_c^{(m)})$,
$\lambda_\delta^{(r)(m+1)} = \lambda_\delta^{(r)(m)} + \theta(\eta_\delta^{(r)(m+1)} - \beta_{i_1}^{(r)(m+1)} + \beta_{i_2}^{(r)(m+1)}), \quad \delta\in\Delta^{(r)}$,
$\lambda_\delta^{(c)(m+1)} = \lambda_\delta^{(c)(m)} + \theta(\eta_\delta^{(c)(m+1)} - \beta_{j_1}^{(c)(m+1)} + \beta_{j_2}^{(c)(m+1)}), \quad \delta\in\Delta^{(c)}. \quad (5)$

More specifically, when optimizing over β, we consider

$f(\beta) = \frac{1}{2}\|Y - U\beta\|_2^2 + \frac{1}{2}\gamma_1\beta^\top M\beta + \frac{\theta}{2}\left(\sum_{\delta\in\Delta^{(r)}}\|\tilde\eta_\delta^{(r)(m)} - B_\delta^{(r)}\beta\|_2^2 + \sum_{\delta\in\Delta^{(c)}}\|\tilde\eta_\delta^{(c)(m)} - B_\delta^{(c)}\beta\|_2^2\right), \quad (6)$

where $\tilde\eta_\delta^{(r)} = \eta_\delta^{(r)} + \frac{1}{\theta}\lambda_\delta^{(r)}$, $\tilde\eta_\delta^{(c)} = \eta_\delta^{(c)} + \frac{1}{\theta}\lambda_\delta^{(c)}$, $B_\delta^{(r)} = (e_{i_1}^{(r)} - e_{i_2}^{(r)})^\top\otimes I_{qp}$, $B_\delta^{(c)} = I_N\otimes[(e_{j_1}^{(c)} - e_{j_2}^{(c)})^\top\otimes I_p]$, $e_i^{(r)}$ is an $N\times1$ zero vector except that its $i$th element is 1, $e_j^{(c)}$ is a $q\times1$ zero vector except that its $j$th element is 1, $\otimes$ is the Kronecker product, and $I_p$ is a $p\times p$ identity matrix. Denote $B_r = (B_{(1,2)}^{(r)\top}, \dots, B_{(N-1,N)}^{(r)\top})^\top$, $B_c = (B_{(1,2)}^{(c)\top}, \dots, B_{(q-1,q)}^{(c)\top})^\top$, $\tilde H_r = (\tilde\eta_{(1,2)}^{(r)}, \dots, \tilde\eta_{(N-1,N)}^{(r)})$, and $\tilde H_c = (\tilde\eta_{(1,2)}^{(c)}, \dots, \tilde\eta_{(q-1,q)}^{(c)})$. Then the update for $\beta$ is

$\beta^{(m+1)} = (U^\top U + \gamma_1M + \theta B_r^\top B_r + \theta B_c^\top B_c)^{-1}(U^\top Y + \theta B_r^\top\mathrm{vec}(\tilde H_r^{(m)}) + \theta B_c^\top\mathrm{vec}(\tilde H_c^{(m)})), \quad (7)$

where vec(Z) is the vectorization of matrix Z by columns.
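In code, the update (7) is a single ridge-type linear solve. The R sketch below assumes the stacked matrices $B_r$, $B_c$ and the shifted variables $\tilde H_r$, $\tilde H_c$ defined above have already been constructed; the function name is ours.

```r
# beta update (7): (U'U + gamma1*M + theta*Br'Br + theta*Bc'Bc) beta = rhs
update_beta <- function(U, Y, M, Br, Bc, Htil_r, Htil_c, gamma1, theta) {
  A <- crossprod(U) + gamma1 * M +
       theta * crossprod(Br) + theta * crossprod(Bc)
  rhs <- crossprod(U, Y) +
         theta * crossprod(Br, c(Htil_r)) +  # c(.) vectorizes by columns
         theta * crossprod(Bc, c(Htil_c))
  # A is fixed across iterations, so its Cholesky factor can be cached
  solve(A, rhs)
}
```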

For $H_r$, we consider

$f(\eta_\delta^{(r)}) = p_\tau(\|\eta_\delta^{(r)}\|_2, \gamma_2) + \frac{\theta}{2}\|\eta_\delta^{(r)} - \beta_{i_1}^{(r)(m+1)} + \beta_{i_2}^{(r)(m+1)} + \lambda_\delta^{(r)(m)}/\theta\|_2^2. \quad (8)$

Denote $z_\delta^{(r)(m+1)} = \beta_{i_1}^{(r)(m+1)} - \beta_{i_2}^{(r)(m+1)} - \lambda_\delta^{(r)(m)}/\theta$. With the KKT conditions of (8), we can get a closed-form solution for $H_r$:

$\eta_\delta^{(r)(m+1)} = \begin{cases} z_\delta^{(r)(m+1)}, & \text{if } \|z_\delta^{(r)(m+1)}\|_2 \ge \tau\gamma_2, \\ \frac{\tau\theta}{\tau\theta-1}\left(1 - \frac{\gamma_2/\theta}{\|z_\delta^{(r)(m+1)}\|_2}\right)_+ z_\delta^{(r)(m+1)}, & \text{if } \|z_\delta^{(r)(m+1)}\|_2 < \tau\gamma_2. \end{cases} \quad (9)$

Similarly, denote $z_\delta^{(c)(m+1)} = \beta_{j_1}^{(c)(m+1)} - \beta_{j_2}^{(c)(m+1)} - \lambda_\delta^{(c)(m)}/\theta$, and we can get a closed-form solution for $H_c$:

$\eta_\delta^{(c)(m+1)} = \begin{cases} z_\delta^{(c)(m+1)}, & \text{if } \|z_\delta^{(c)(m+1)}\|_2 \ge (N/q)^{1/2}\tau\gamma_2, \\ \frac{\tau\theta}{\tau\theta-1}\left(1 - \frac{(N/q)^{1/2}\gamma_2/\theta}{\|z_\delta^{(c)(m+1)}\|_2}\right)_+ z_\delta^{(c)(m+1)}, & \text{if } \|z_\delta^{(c)(m+1)}\|_2 < (N/q)^{1/2}\tau\gamma_2. \end{cases} \quad (10)$
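Both (9) and (10) are group-thresholding rules and take only a few lines; the R sketch below (with the illustrative name `update_eta`) implements the row-direction update (9), and the column-direction update (10) follows by replacing γ2 with (N/q)^{1/2}γ2.

```r
# Closed-form eta update under MCP: identity for large ||z||_2, scaled
# group soft-thresholding otherwise; assumes tau * theta > 1.
update_eta <- function(z, gamma, tau, theta) {
  nz <- sqrt(sum(z^2))
  if (nz >= tau * gamma) {
    z
  } else {
    tau * theta / (tau * theta - 1) *
      max(1 - (gamma / theta) / nz, 0) * z
  }
}
```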

Consider the initial values $\beta^{(0)} = (U^\top U + \gamma_1M)^{-1}U^\top Y$, $\eta_\delta^{(r)(0)} = \beta_{i_1}^{(r)(0)} - \beta_{i_2}^{(r)(0)}$, and $\eta_\delta^{(c)(0)} = \beta_{j_1}^{(c)(0)} - \beta_{j_2}^{(c)(0)}$, with $\Lambda_r^{(0)}$ and $\Lambda_c^{(0)}$ set to zero. The ADMM based algorithm is summarized in Algorithm 1.

Algorithm 1.

Input:
 Response vector Y, basis expansion design matrix U, and difference matrix M;
 Tuning parameters γ1 and γ2. Specific to MCP, regularization parameter τ;
Output:
 Coefficient vector β, splitting variables Hr and Hc, and dual variables Λr and Λc;
1: repeat
2: for m = 0, 1, 2 … do
3: Update β by (7).
4: Update Hr by (9).
5: Update Hc by (10).
6: Update Λr and Λc by (5).
7: end for
8: until the stopping criteria are met, which are set as $\|r_r^{(m+1)}\|_2 \le \epsilon_1^{\mathrm{pri}}$, $\|r_c^{(m+1)}\|_2 \le \epsilon_2^{\mathrm{pri}}$, $\|s_r^{(m+1)}\|_2 \le \epsilon_1^{\mathrm{dual}}$, and $\|s_c^{(m+1)}\|_2 \le \epsilon_2^{\mathrm{dual}}$ in our numerical study.

Proposition 1.

Denote the two primal residuals as $r_r^{(m+1)} = B_r\beta^{(m+1)} - \mathrm{vec}(H_r^{(m+1)})$ and $r_c^{(m+1)} = B_c\beta^{(m+1)} - \mathrm{vec}(H_c^{(m+1)})$, and the two dual residuals as $s_r^{(m+1)} = \theta B_r^\top[\mathrm{vec}(H_r^{(m+1)}) - \mathrm{vec}(H_r^{(m)})]$ and $s_c^{(m+1)} = \theta B_c^\top[\mathrm{vec}(H_c^{(m+1)}) - \mathrm{vec}(H_c^{(m)})]$. Then

$\lim_{m\to\infty}\|r_r^{(m+1)}\|_2^2 = 0, \quad \lim_{m\to\infty}\|r_c^{(m+1)}\|_2^2 = 0, \quad \lim_{m\to\infty}\|s_r^{(m+1)} + s_c^{(m+1)}\|_2^2 = 0$.

This result establishes convergence of the proposed algorithm. In numerical analysis, we stop the algorithm and conclude convergence when $\|r_r^{(m+1)}\|_2 \le \epsilon_1^{\mathrm{pri}}$, $\|r_c^{(m+1)}\|_2 \le \epsilon_2^{\mathrm{pri}}$, $\|s_r^{(m+1)}\|_2 \le \epsilon_1^{\mathrm{dual}}$, and $\|s_c^{(m+1)}\|_2 \le \epsilon_2^{\mathrm{dual}}$. Following [5], we set the tolerance parameters as follows:

$\epsilon_1^{\mathrm{pri}} = \sqrt{|\Delta^{(r)}|pq}\,\epsilon^{\mathrm{abs}} + \epsilon^{\mathrm{rel}}\max\{\|B_r\beta^{(m+1)}\|_2, \|\mathrm{vec}(H_r^{(m+1)})\|_2\}$,
$\epsilon_2^{\mathrm{pri}} = \sqrt{|\Delta^{(c)}|pN}\,\epsilon^{\mathrm{abs}} + \epsilon^{\mathrm{rel}}\max\{\|B_c\beta^{(m+1)}\|_2, \|\mathrm{vec}(H_c^{(m+1)})\|_2\}$,
$\epsilon_1^{\mathrm{dual}} = \sqrt{Nqp}\,\epsilon^{\mathrm{abs}} + \epsilon^{\mathrm{rel}}\|B_r^\top\mathrm{vec}(\Lambda_r^{(m+1)})\|_2$,
$\epsilon_2^{\mathrm{dual}} = \sqrt{Nqp}\,\epsilon^{\mathrm{abs}} + \epsilon^{\mathrm{rel}}\|B_c^\top\mathrm{vec}(\Lambda_c^{(m+1)})\|_2. \quad (11)$

Here $\epsilon^{\mathrm{abs}}$ and $\epsilon^{\mathrm{rel}}$ are predetermined small values, for example $10^{-3}$. In all of our numerical analysis, convergence is satisfactorily achieved within a small to moderate number of iterations. The code and example are publicly available at https://github.com/ruiqwy/Biclustering.
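A possible sketch of checking the stopping rule with the tolerances in (11), following the ADMM recommendations of [5]; all argument names are ours, and the residuals are those defined in Proposition 1 below.

```r
# Convergence check for Algorithm 1 based on primal/dual residuals.
converged <- function(Br, Bc, beta, Hr, Hc, Hr_old, Hc_old, Lr, Lc,
                      theta, eps_abs = 1e-3, eps_rel = 1e-3) {
  rr <- Br %*% beta - c(Hr)                        # primal residuals
  rc <- Bc %*% beta - c(Hc)
  sr <- theta * crossprod(Br, c(Hr) - c(Hr_old))   # dual residuals
  sc <- theta * crossprod(Bc, c(Hc) - c(Hc_old))
  e1p <- sqrt(length(rr)) * eps_abs +
    eps_rel * max(sqrt(sum((Br %*% beta)^2)), sqrt(sum(Hr^2)))
  e2p <- sqrt(length(rc)) * eps_abs +
    eps_rel * max(sqrt(sum((Bc %*% beta)^2)), sqrt(sum(Hc^2)))
  e1d <- sqrt(length(beta)) * eps_abs +
    eps_rel * sqrt(sum(crossprod(Br, c(Lr))^2))
  e2d <- sqrt(length(beta)) * eps_abs +
    eps_rel * sqrt(sum(crossprod(Bc, c(Lc))^2))
  sqrt(sum(rr^2)) <= e1p && sqrt(sum(rc^2)) <= e2p &&
    sqrt(sum(sr^2)) <= e1d && sqrt(sum(sc^2)) <= e2d
}
```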

2.4. Statistical properties

For a vector $z = (z_1, \dots, z_s)^\top\in\mathbb{R}^s$, let $\|z\|_\infty = \max_{1\le l\le s}|z_l|$. For a matrix $Z\in\mathbb{R}^{s\times h}$, let $\|Z\|_2 = \max_{v\in\mathbb{R}^h, \|v\|_2 = 1}\|Zv\|_2$ and $\|Z\|_\infty = \max_{1\le i\le s}\sum_{j=1}^h|Z_{i,j}|$. For any two sequences of real numbers $\{a_n\}_{n\ge1}$ and $\{b_n\}_{n\ge1}$, denote $b_n\ll a_n$ if $b_n/a_n = o(1)$. Let $r$ be a positive integer, $v\in(0, 1]$, and $\kappa = r + v > 1.5$. Let $\mathcal{H}$ be the collection of functions $g$ on $\mathcal{T} = [0, 1]$ whose $r$th derivative $g^{(r)}$ exists and satisfies the Lipschitz condition of order $v$:

$|g^{(r)}(z_1) - g^{(r)}(z_2)| \le C|z_1 - z_2|^v, \quad 0\le z_1, z_2\le 1$,

and C is a positive constant.

Define the following collections of index sets for clustering memberships: $G^{(r)} = (G_1^{(r)}, \dots, G_{K_r}^{(r)})$ for samples, $G^{(c)} = (G_1^{(c)}, \dots, G_{K_c}^{(c)})$ for covariates, and $G^{(r,c)} = (G_{1,1}^{(r,c)}, \dots, G_{k_r,k_c}^{(r,c)}, \dots, G_{K_r,K_c}^{(r,c)})$ for both samples and covariates. Define $\mathcal{M}_G = \{\beta\in\mathbb{R}^{Nqp}: \beta_{i_1,j_1} = \beta_{i_2,j_2}, \text{ for any } (i_1,j_1), (i_2,j_2)\in G_{k_r,k_c}^{(r,c)}, 1\le k_r\le K_r, 1\le k_c\le K_c\}$. Let $|G_{k_r}^{(r)}|$, $|G_{k_c}^{(c)}|$, and $|G_{k_r,k_c}^{(r,c)}|$ be the sizes of $G_{k_r}^{(r)}$, $G_{k_c}^{(c)}$, and $G_{k_r,k_c}^{(r,c)}$, respectively. Further define $|G_{\min}^{(r)}| = \min_{1\le k_r\le K_r}|G_{k_r}^{(r)}|$, $|G_{\min}^{(c)}| = \min_{1\le k_c\le K_c}|G_{k_c}^{(c)}|$, and $|G_{\min}^{(r,c)}| = |G_{\min}^{(r)}|\times|G_{\min}^{(c)}|$; $|G_{\max}^{(r,c)}|$ can be defined accordingly. Let $\rho(t) = \gamma^{-1}p_\tau(t, \gamma)$. Assume the following conditions.

(C1) $g_{k_r,k_c}\in\mathcal{H}$ for all $k_r\in\{1, \dots, K_r\}$, $k_c\in\{1, \dots, K_c\}$, and $|G_{\max}^{(r,c)}|^{1/(2\kappa)}\ll p\ll|G_{\min}^{(r,c)}|^{1/3}$.

(C2) The distribution of the $t_{i,j,m}$'s, $i\in\{1, \dots, N\}$, $j\in\{1, \dots, q\}$, $m\in\{1, \dots, n_{i,j}\}$, follows a density function $f_T$, which is absolutely continuous. There exist constants $c_1$ and $C_1$ such that $0 < c_1\le\min_{t\in\mathcal{T}}f_T(t)\le\max_{t\in\mathcal{T}}f_T(t)\le C_1 < \infty$.

(C3) The $n_{i,j}$'s are uniformly bounded for all $i\in\{1, \dots, N\}$, $j\in\{1, \dots, q\}$.

(C4) $p_\tau(t, \gamma)$ is symmetric, non-decreasing, and concave in $t$ for $t\in[0, \infty)$. There exists a constant $0 < a < \infty$ such that $\rho(t)$ is constant for all $t\ge a\gamma$, and $\rho(0) = 0$. $\rho'(t)$ exists and is continuous except for a finite number of $t$, and $\rho'(0+) = 1$.

(C5) Let $\epsilon_{i,j} = (\epsilon_{i,j,1}, \dots, \epsilon_{i,j,n_{i,j}})^\top$, where the $\epsilon_{i,j,m}$'s are independent across $(i, j)$ (among different individual observational vectors) and correlated across $m$ (within the same $(i, j)$). Furthermore, there exist $F > 0$ and $c_2 > 0$ such that for all $i\in\{1, \dots, N\}$ and $j\in\{1, \dots, q\}$,

$E(\exp\{F|n_{i,j}^{-1}\epsilon_{i,j}^\top\epsilon_{i,j}|^{1/2}\})\le c_2$.

Similar conditions have been assumed in the literature. The first condition in (C1) ensures that Hölder's condition is satisfied [36]. The second condition in (C1) pertains to the growth rate of the number of internal knots, in a way similar to [25] and [24]. Condition (C2) assumes the boundedness of the density function, similarly to [48] and others. Conditions similar to (C3) have been commonly made. In the analysis of high-dimensional data, conditions similar to (C4) have been common, and it is easy to verify that MCP and SCAD satisfy (C4). Condition (C5) gives the boundedness condition for the error terms, and a similar condition can be found in [11].

When the true clustering structure is known, the oracle estimator for β can be defined as

$\hat\beta^{or} = \arg\min_{\beta\in\mathcal{M}_G}\frac{1}{2}\sum_{k_r=1}^{K_r}\sum_{k_c=1}^{K_c}\sum_{(i,j)\in G_{k_r,k_c}^{(r,c)}}\left\{\|Y_{i,j} - U_{i,j}\beta_{i,j}\|_2^2 + \gamma_1\beta_{i,j}^\top D\beta_{i,j}\right\}$,

and $\hat g_{(k_r,k_c)}^{or}$ is defined as the oracle estimator of $g_{(k_r,k_c)}$ based on $\hat\beta^{or}$. Let $\beta^*$ be the underlying true coefficient vector and $g_{(k_r,k_c)}^*$ be the true value of $g_{(k_r,k_c)}$. For any $L_2$-integrable function $g$, denote $\|g\| = \left(\int_{t\in\mathcal{T}}g^2(t)f_T(t)\,dt\right)^{1/2}$.

Theorem 1.

Assume that (C1)–(C5) hold. If $\gamma_1 = o(|G_{\min}^{(r,c)}|^{-1/2})$ and $p\log(Nq) \ll |G_{\min}^{(r,c)}|$, then with probability at least $1 - 3K_rK_cp/(Nq)$,

$\sup_{1\le i\le N, 1\le j\le q}\|\hat\beta_{i,j}^{or} - \beta_{i,j}^*\|_2 \le \psi, \quad \sup_{1\le k_r\le K_r, 1\le k_c\le K_c}\|\hat g_{(k_r,k_c)}^{or} - g_{(k_r,k_c)}^*\| \le \psi$,

where $\psi = C^*(p\log(Nq)/|G_{\min}^{(r,c)}|)^{1/2}$, and $C^*$ is a large constant.

This theorem establishes consistency of the oracle estimators with high probability. Denote $b = \min_{(k_r,k_c)\ne(k_r',k_c')}\|g_{(k_r,k_c)}^* - g_{(k_r',k_c')}^*\|$. We can further establish the following result.

Theorem 2.

Assume that (C1)–(C5) and the conditions in Theorem 1 hold. If $b \gg \gamma_2|G_{\min}^{(c)}|^{-1/2}$, $b \gg (N/q)^{1/2}\gamma_2|G_{\min}^{(r)}|^{-1/2}$, and $\gamma_2 \gg (pq)^{1/2}\log(Nq)/\min\{|G_{\min}^{(r)}|, |G_{\min}^{(c)}|\}$, then there exists a local minimizer $\hat\beta$ of $L(\beta)$ satisfying

$P(\hat\beta = \hat\beta^{or}) \to 1 \quad \text{as } N, q \to \infty$.

This theorem establishes that the oracle estimator is, with high probability, a local minimizer of the objective function. The estimation consistency, along with the separateness of the true functions, leads to clustering consistency.

3. Simulation

We conduct simulation to assess performance of the proposed approach and gauge it against the following alternatives: (a) the bKmeans method [1], which first fits each curve using B-splines and then clusters the estimated coefficients using the k-means technique by rows and columns; (b) the funHDDC method [33], which has been developed for multivariate functional data clustering based on latent mixture models and has been applied to longitudinal data; and (c) the funLBM method [4], which has been developed for functional data biclustering based on latent block models. Here we note that the proposed and funLBM methods conduct biclustering directly, whereas the bKmeans and funHDDC methods were originally designed for one-way clustering; hence they are applied twice to achieve both row and column clustering. In addition, the funHDDC and funLBM methods are not directly applicable to functional data with unequal measurements. We apply imputation [26] to tackle this problem. As discussed in Section 1, biclustering methods for functional data are very limited. It is possible to modify other existing one-way functional clustering methods to achieve biclustering; however, this demands additional methodological developments. The three alternatives considered here have been chosen because of their closely related frameworks and competitive performance.

In evaluation, we examine both clustering and estimation accuracy. Specifically, when examining clustering accuracy, we consider the estimated numbers of row clusters K^r, column clusters K^c, and biclusters K^b. In addition, we use the Rand index and adjusted Rand index to assess the accuracy of clustering, including RIr and ARIr for row clustering, RIc and ARIc for column clustering, and RIb and ARIb for biclustering. The Rand index is defined by RI = (TP + TN)/(TP + FP + FN + TN), where for example TP is the true positive count, defined as the number of sample pairs from the same cluster and assigned to the same cluster, and the other counts can be defined accordingly. As the Rand index tends to be large even under random clusterings, we also examine the adjusted Rand index defined as ARI = (RI − E(RI))/(max(RI) − E(RI)), which can partly correct this problem. To evaluate estimation accuracy, we examine the integrated squared error (ISE) defined as

$\mathrm{ISE} = \frac{1}{n}\sum_{k_r=1}^{K_r}\sum_{k_c=1}^{K_c}\sum_{(i,j)\in G_{k_r,k_c}^{(r,c)}}\sum_{m=1}^{n_{i,j}}\left\{g_{(k_r,k_c)}(t_{i,j,m}) - \hat g_{i,j}(t_{i,j,m})\right\}^2$.
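For concreteness, the Rand index can be computed directly from pairwise co-membership agreement, as in the R sketch below; the adjusted version is available in existing software, for example `mclust::adjustedRandIndex`. The label vectors are illustrative inputs.

```r
# Rand index: fraction of pairs on which two clusterings agree,
# i.e., (TP + TN) / (TP + FP + FN + TN).
rand_index <- function(labels_true, labels_est) {
  same_true <- outer(labels_true, labels_true, "==")
  same_est  <- outer(labels_est, labels_est, "==")
  pairs <- upper.tri(same_true)  # each unordered pair counted once
  mean(same_true[pairs] == same_est[pairs])
}
# Adjusted Rand index, e.g.:
# mclust::adjustedRandIndex(labels_true, labels_est)
```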

We consider a total of $K_b = 9$ biclusters, which are formed by $K_r = 3$ sample (row) clusters and $K_c = 3$ covariate (column) clusters. $Y_{i,j,m} = g_{(k_r,k_c)}(t_{i,j,m}) + \epsilon_{i,j,m}$ with the $t_{i,j,m}$'s, $m\in\{1, \dots, 10\}$, equally spaced on [0, 1]. The nine true functional forms are $g_{(1,1)}(t) = \cos(2\pi t)$, $g_{(2,1)}(t) = 1 - 2\exp(-6t)$, $g_{(3,1)}(t) = -1.5t$, $g_{(1,2)}(t) = 1 + \sin(2\pi t)$, $g_{(2,2)}(t) = 2t^2$, $g_{(3,2)}(t) = t + 1$, $g_{(1,3)}(t) = 2(\sin(2\pi t) + \cos(2\pi t))$, $g_{(2,3)}(t) = 1 + t^3$, and $g_{(3,3)}(t) = 2t + 1$. They are also graphically presented in Fig. 1. To better mimic real data, we allow a certain proportion ($\zeta$) of the curves from each bicluster to have 20% missing measurements. When implementing the proposed approach, we choose smoothing splines with the number of internal knots $J = 3$. We also fix $\theta = 1$ and $\tau = 3$. In what follows, under Examples 1 and 2, $N > q$, whereas under Example 3, $N = q$. Under Examples 1–3, the random errors are independent, whereas under Example 4, they are correlated. Note that under Examples 1–4, simulation results are calculated based on automatic cluster selection. Example 5 is designed to investigate the performance of these methods when the numbers of clusters are correctly prespecified. A total of 100 replicates are simulated under each setting.

Fig. 1. Example 1: Curves of observed data (black dotted), estimated (blue solid) by the proposed method, and true (red solid) functions with (a) N = 30 and (b) N = 90 for one replicate.

Example 1.

N = 30, 60, and 90. q = 9. The clusters are balanced, with each row cluster containing N/3 samples and each column cluster containing q/3 covariates. ζ = 0.3. The random errors are iid $N(0, 0.6^2)$.

Example 2.

The settings are the same as in Example 1, except that the clusters are unbalanced. The row clusters have sizes 1:2:3, and the column clusters have sizes 2:3:4.

Example 3.

Set (N, q) = (30, 30), (39, 39), (45, 45), ζ = 0.3 and 0.4. The rest are the same as in Example 1.

Example 4.

The settings are similar to those under Example 1. The random errors are correlated with an AR(1) correlation structure, where AR stands for autoregressive. Consider AR coefficients ϕ = 0.2 and 0.8, representing weak and strong correlations.

Example 5.

The settings are the same as those in Example 1. The difference is that the numbers of clusters are correctly prespecified instead of being selected by the BIC criterion.
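As an illustration of the simulation design, a minimal R sketch generating Example 1-style data (without the missingness mechanism) might look as follows; `gen_example1` is our illustrative name.

```r
# Example 1-style data: 3 x 3 balanced biclusters, 10 equally spaced
# time points on [0, 1], iid N(0, 0.6^2) errors.
gen_example1 <- function(N = 30, q = 9, sigma = 0.6) {
  g <- list(function(t) cos(2 * pi * t), function(t) 1 - 2 * exp(-6 * t),
            function(t) -1.5 * t, function(t) 1 + sin(2 * pi * t),
            function(t) 2 * t^2, function(t) t + 1,
            function(t) 2 * (sin(2 * pi * t) + cos(2 * pi * t)),
            function(t) 1 + t^3, function(t) 2 * t + 1)
  tm <- seq(0, 1, length.out = 10)
  row_grp <- rep(1:3, each = N / 3)  # balanced row clusters
  col_grp <- rep(1:3, each = q / 3)  # balanced column clusters
  Y <- array(NA, dim = c(N, q, 10))
  for (i in 1:N) for (j in 1:q) {
    k <- (col_grp[j] - 1) * 3 + row_grp[i]  # index of g_{(k_r, k_c)}
    Y[i, j, ] <- g[[k]](tm) + rnorm(10, sd = sigma)
  }
  Y
}
```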

Results for Example 1 are presented in Figs. 1 and 2 as well as Tables 1 and 2. More specifically, in Fig. 1, we show the true functions for all clusters as well as sample observed data and estimated functions. In Table 1, we summarize the numbers of identified row and column clusters as well as biclusters. In Table 2, we summarize the Rand and adjusted Rand index values. In Fig. 2, we present the boxplots of ISE (note that different panels have different ranges for the Y-axis). Results for Examples 2–5 are presented in the Supplementary Material. Although different examples have different numerical results, overall, the advantage of the proposed approach is clearly observed. Consider for example Table 1 with N = 30. The proposed approach has a mean number of row clusters of 2.83, compared to 2.76, 2.63, and 4.66 for the three alternatives. When N = 90, the proposed approach has a mean number of biclusters of 8.71, compared to 3.51, 6.33, and 10.68 for the three alternatives. The improved clustering accuracy is further supported by the Rand index values in Table 2. For example, with N = 90, the adjusted Rand index value for biclustering with the proposed approach is 0.964, compared to 0.358, 0.686, and 0.764 with the three alternatives. Fig. 2 shows that as N increases, estimation accuracy of the proposed approach (and two alternatives) increases. Under all three N values, the proposed approach has significantly smaller ISE values. Moreover, comparing the results of Example 5 with Example 1, we observe similar performance, and the proposed approach still performs better when the numbers of clusters are correctly prespecified.

Fig. 2. Example 1: Boxplots of ISE with (a) the proposed method, (b) bKmeans, (c) funHDDC, and (d) funLBM.

Table 2.

Example 1: Mean and standard error (shown in parentheses) of RIr, ARIr, RIc, ARIc, RIb, and ARIb based on 100 replicates.

N Method RIr ARIr RIc ARIc RIb ARIb

30 Proposed 0.940 (0.189) 0.911 (0.278) 0.936 (0.203) 0.910 (0.279) 0.927 (0.238) 0.909 (0.281)
bKmeans 0.860 (0.173) 0.740 (0.290) 0.296 (0.163) 0.052 (0.194) 0.673 (0.174) 0.307 (0.167)
funHDDC 0.744 (0.031) 0.493 (0.074) 0.940 (0.107) 0.880 (0.215) 0.889 (0.051) 0.598 (0.120)
funLBM 0.913 (0.053) 0.786 (0.109) 0.913 (0.064) 0.746 (0.153) 0.951 (0.029) 0.708 (0.113)
60 Proposed 0.966 (0.138) 0.947 (0.208) 0.963 (0.152) 0.945 (0.212) 0.959 (0.177) 0.943 (0.216)
bKmeans 0.887 (0.132) 0.780 (0.248) 0.316 (0.195) 0.077 (0.239) 0.704 (0.142) 0.339 (0.191)
funHDDC 0.767 (0.021) 0.546 (0.049) 0.998 (0.025) 0.995 (0.050) 0.922 (0.014) 0.692 (0.044)
funLBM 0.918 (0.110) 0.828 (0.221) 0.929 (0.119) 0.840 (0.257) 0.953 (0.052) 0.796 (0.198)
90 Proposed 0.978 (0.117) 0.966 (0.176) 0.975 (0.131) 0.965 (0.178) 0.971 (0.154) 0.964 (0.180)
bKmeans 0.886 (0.134) 0.778 (0.251) 0.342 (0.226) 0.109 (0.279) 0.709 (0.152) 0.358 (0.227)
funHDDC 0.769 (0.017) 0.551 (0.040) 0.990 (0.049) 0.980 (0.098) 0.919 (0.025) 0.686 (0.061)
funLBM 0.909 (0.121) 0.813 (0.241) 0.908 (0.130) 0.793 (0.276) 0.944 (0.056) 0.764 (0.210)

4. Applications

Here we analyze two time-course gene expression datasets. Although the data characteristics are in a sense similar, the two analyses may serve different purposes. In particular, the first dataset is "older", has been analyzed multiple times in the literature, and has a clearer sample clustering structure. In contrast, the second dataset is more recent, and its analysis may lead to a higher practical impact.

4.1. T-cell data

This dataset was generated in a study of T-cell activation [31]. It is publicly available in the R package longitudinal (http://www.strimmerlab.org/software/longitudinal/) and contains two subsets: tcell.10 and tcell.34. The first subset contains measurements for 10 samples and 58 genes at 10 unequally spaced time points, t ∈ {0, 2, 4, 6, 8, 18, 24, 32, 48, 72}, whereas the second contains measurements for 34 samples and the same genes at the same time points. In [31], the distinctions between the two subsets have been noted, and they have been combined for analysis. Prior to analysis, we conduct data processing, including gene expression normalization using the method developed in [29] and linear transformation of the observed times to [0, 1], and set the knots at 0.06, 0.2, and 0.4 and the order to 3.

The proposed approach identifies two sample clusters, with sizes 10 and 34, which exactly match the original subset structure. The distinctions between the samples in the two subsets have been noted in [31]. As such, they are supposed to belong to different clusters. In this sense, our "finding", although expected, is reassuring. In addition, eight gene clusters are identified, among which there are four trivial clusters of size one. The four non-trivial clusters have sizes 27, 18, 5, and 4. Detailed information on the gene clusters is available from the authors. The eight non-trivial biclusters are presented in Fig. 3. Biclusters 1–4 correspond to tcell.10, and the rest correspond to tcell.34. It is observed that the estimated functions clearly differ across biclusters. The observed temporal trends are highly similar to those reported in [28], which provides support for the validity of our approach.

Fig. 3. Analysis of T-cell data: Curves of observed data (black dotted) and estimated functions (blue solid) for the eight non-trivial biclusters, as well as yellow points indicating the estimated values at t ∈ {0, 2, 4, 6, 8, 18, 24, 32, 48, 72} by the proposed method.

The three alternatives are also applied. The bKmeans approach identifies three sample clusters (with sizes 10, 17, and 17) and four gene clusters (with sizes 9, 15, 19, and 15). Compared to the proposed approach, the adjusted Rand index values are 0.441 (sample), 0.619 (gene), and 0.430 (bicluster). The funHDDC approach identifies two sample clusters (with sizes 10 and 34) and two gene clusters (with sizes 9 and 49). Compared to the proposed approach, the adjusted Rand index values are 1.000 (sample), 0.286 (gene), and 0.452 (bicluster). The funLBM approach identifies two sample clusters (with sizes 10 and 34) and six gene clusters (with sizes 9, 4, 12, 5, 18, and 10). Compared to the proposed approach, the adjusted Rand index values are 1.000 (sample), 0.586 (gene), and 0.646 (bicluster). Unlike for the simulated data, it is difficult to objectively evaluate the accuracy of clustering. However, for the proposed approach, the matching with the original sample distinction and published findings can provide a strong support, which is not shared by the alternatives.

4.2. Vaccine data

This dataset was generated in a relatively recent study [43] and is available at GEO with identifier GSE124533. The study settings have been described in detail in [43]. Briefly, it concerns the time course of whole blood gene expressions, and the samples are healthy adults residing in an inpatient unit. The samples have been randomized into three protocols (305A, 305B, and 305C). Within each protocol, samples have been randomized to receive immunization via either vaccine or saline placebo. The treatments have been referred to as YFV and VZV (under 305A), HBV1 and HBV3 (under 305B), and TIV and ATIV (under 305C). In this experiment, gene expression levels are measured at t ∈ {1, 2, 3, 4, 5, 7, 14, 21, 28} days after immunization. A total of 43 genes have been studied, which are selected from two gene modules defined in the published literature [6,22]. Prior to analysis, gene expression normalization, rescaling of the time points (to the unit interval), and the selection of knots and order are conducted in a similar way as in the previous data analysis.

Two sets of analysis are conducted. In the first set, we focus on the samples under 305A, which contain 20 samples treated with VZV and 20 with YFV. In the second set, we pool all 122 samples from the three protocols. We note that although the gene time courses have been analyzed in [43], there has been insufficient attention to clustering. Complementary to the existing literature, our clustering analysis can potentially reveal sample heterogeneity as well as coordination among genes.

Results for the first set of analysis are presented in Fig. 5, where we observe two sample clusters and two gene clusters, leading to four biclusters. Here the two sample clusters match the VZV and YFV experimental conditions, which provides support for the validity of our analysis. The two gene clusters contain 27 and 16 members, respectively, which is very close to the module structure. Fig. 5 shows that the temporal trends of the four biclusters differ significantly, with the level of variation and the position of the "peak" varying notably. The observed trends are similar to those reported in [43]. We also refer to [43] for pharmacodynamic interpretations of the findings.

In the second set of analysis, we identify four sample clusters, with sizes 96, 5, 20, and 1. In what follows, we focus on the non-trivial clusters. Clusters 1 and 2 contain samples treated with VZV, HBV1, HBV3, ATIV, and TIV, and cluster 3 contains samples treated with YFV. In the original publication, there has been little attention to sample similarity/difference across protocols. Our analysis may suggest the significant difference between YFV and the other treatments as well as the relative similarity of the five treatments (YFV excluded). Our analysis leads to two gene clusters, with sizes 25 and 18. This structure is again very similar to the module structure. The six non-trivial biclusters are shown in Fig. 4, where we observe significant across-cluster differences. Among the six patterns, biclusters 5 and 6 are similar to those observed in the first set of analysis, whereas biclusters 1–4 are relatively different.

Fig. 4. Analysis of vaccine data with samples under all three protocols: Curves of observed data (black dotted) and estimated functions (blue solid) for non-trivial clusters, as well as yellow points indicating the estimated values at t ∈ {1, 2, 3, 4, 5, 7, 14, 21, 28} by the proposed method.

The three alternatives are also applied. The bKmeans approach identifies three sample clusters (with sizes 20, 27, and 75) and two gene clusters (with sizes 26 and 17). Compared to the proposed approach, the adjusted Rand index values are 0.551 (sample), 0.907 (gene), and 0.666 (bicluster). The funHDDC approach identifies two sample clusters (with sizes 20 and 102) and three gene clusters (with sizes 26, 12, and 5). Compared to the proposed approach, the adjusted Rand index values are 0.819 (sample), 0.774 (gene), and 0.758 (bicluster). The funLBM approach identifies four sample clusters (with sizes 20, 39, 24, and 39) and two gene clusters (with sizes 20 and 23). Compared to the proposed approach, the adjusted Rand index values are 0.276 (sample), 0.818 (gene), and 0.386 (bicluster).

5. Discussion

In this article, we have conducted biclustering analysis when functions (more precisely, their realizations at discrete time points), as opposed to scalars, are present. The data structure fits time-course gene expression and other experiments. The analysis objective is considerably more complex than the biclustering analysis of scalars and the one-way clustering of functions. We have developed a novel approach based on the penalized fusion technique. Methodologically, it differs significantly from the existing biclustering and fusion approaches. Theoretically, it has the much-desired consistency property, making it advantageous over some of the existing alternatives that do not have theoretical support. Numerically, it has generated more accurate clustering and estimation in simulation and led to different findings in data analysis.

In our estimation, we have adopted the penalized smoothing technique. An alternative, which may be computationally simpler, is to take fewer basis functions, with which we can eliminate the smoothness penalty. Theoretically and numerically, we expect similar performance. The fusion technique involves pairwise differences/penalties, which may incur a higher computational cost when N and/or q are large. In our simulation, we have considered moderate values, which match our data analysis. It will be of interest to develop computationally more scalable approaches/algorithms, for example via model averaging. This is beyond our scope and is left for future research. In data analysis, findings with certain support have been made. In the literature, most existing studies are on the "static" functionalities of genes. It will be important to further understand the dynamics of gene expressions and more solidly interpret the findings.

Supplementary Material

Supplemental Material

Table 1.

Example 1: Mean, median, and standard error (SE) of K^r, K^c, and K^b as defined in Section 2, as well as the percentage (Per) of identifying the corresponding true numbers, based on 100 replicates.

N Method K^r (Mean Median SE Per) K^c (Mean Median SE Per) K^b (Mean Median SE Per)

30 Proposed 2.83 3.00 0.53 0.90 2.83 3.00 0.53 0.90 8.29 9.00 2.18 0.90
bKmeans 2.76 3.00 0.64 0.66 1.13 1.00 0.46 0.05 3.09 3.00 1.34 0.03
funHDDC 2.63 2.00 0.86 0.28 2.76 3.00 0.43 0.76 7.27 6.00 2.70 0.21
funLBM 4.66 5.00 0.64 0.09 4.43 5.00 0.83 0.22 20.88 25.00 5.31 0.09
60 Proposed 2.91 3.00 0.43 0.93 2.90 3.00 0.41 0.94 8.61 9.00 1.74 0.93
bKmeans 2.86 3.00 0.57 0.66 1.18 1.00 0.54 0.07 3.43 3.00 1.97 0.05
funHDDC 2.20 2.00 0.64 0.04 2.99 3.00 0.10 0.99 6.58 6.00 1.92 0.04
funLBM 3.42 3.00 0.64 0.66 3.24 3.00 0.55 0.82 11.15 9.00 3.31 0.55
90 Proposed 2.93 3.00 0.36 0.96 2.93 3.00 0.36 0.96 8.71 9.00 1.45 0.96
bKmeans 2.83 3.00 0.51 0.74 1.23 1.00 0.58 0.08 3.51 3.00 1.87 0.08
funHDDC 2.14 2.00 0.38 0.12 2.96 3.00 0.20 0.96 6.33 6.00 1.17 0.11
funLBM 3.25 3.00 0.46 0.76 3.30 3.00 0.54 0.74 10.68 9.00 2.03 0.52

Acknowledgments

We thank the Editor-in-Chief, Managing Editor, and two reviewers for insightful comments and suggestions. This work was supported by the National Natural Science Foundation of China (11971404, 72071169), Humanity and Social Science Youth Foundation of Ministry of Education of China (19YJC910010), Basic Scientific Project 71988101 of National Science Foundation of China, 111 Project (B13028), National Institutes of Health (CA204120), and National Science Foundation (1916251).

Appendix A. Proofs

Proof of Proposition 1.

By the definitions of $H_r^{(m+1)}$ and $H_c^{(m+1)}$, for any $H_r$ and $H_c$, we have

$L_\theta(\beta^{(m+1)}, H_r^{(m+1)}, H_c^{(m+1)}, \Lambda_r^{(m)}, \Lambda_c^{(m)}) \le L_\theta(\beta^{(m+1)}, H_r, H_c, \Lambda_r^{(m)}, \Lambda_c^{(m)})$.

Fig. 5. Analysis of vaccine data with samples under 305A: Curves of observed data (black dotted) and estimated functions (blue solid), as well as yellow points indicating the estimated values at t ∈ {1, 2, 3, 4, 5, 7, 14, 21, 28} by the proposed method.

Let $\Xi(\beta^{(m+1)}) = \{(H_r, H_c): B_r\beta^{(m+1)} - \mathrm{vec}(H_r) = 0,\ B_c\beta^{(m+1)} - \mathrm{vec}(H_c) = 0\}$ and $P = \sum_{\delta\in\Delta^{(r)}}p_\tau(\|\eta_\delta^{(r)}\|_2, \gamma_2) + \sum_{\delta\in\Delta^{(c)}}p_\tau(\|\eta_\delta^{(c)}\|_2, (N/q)^{1/2}\gamma_2)$. We can define

$f^{(m+1)} = \inf_{\Xi(\beta^{(m+1)})}\left\{\frac{1}{2}\|Y - U\beta^{(m+1)}\|_2^2 + \frac{1}{2}\gamma_1\beta^{(m+1)\top}M\beta^{(m+1)} + P\right\} = \inf_{\Xi(\beta^{(m+1)})}L_\theta(\beta^{(m+1)}, H_r, H_c, \Lambda_r^{(m)}, \Lambda_c^{(m)})$,

and then $L_\theta(\beta^{(m+1)}, H_r^{(m+1)}, H_c^{(m+1)}, \Lambda_r^{(m)}, \Lambda_c^{(m)}) \le f^{(m+1)}$.

For any integer $n$, we have $\mathrm{vec}(\Lambda_r^{(m+n-1)}) = \mathrm{vec}(\Lambda_r^{(m)}) + \theta\sum_{i=1}^{n-1}[\mathrm{vec}(H_r^{(m+i)}) - B_r\beta^{(m+i)}]$ and $\mathrm{vec}(\Lambda_c^{(m+n-1)}) = \mathrm{vec}(\Lambda_c^{(m)}) + \theta\sum_{i=1}^{n-1}[\mathrm{vec}(H_c^{(m+i)}) - B_c\beta^{(m+i)}]$, and then

$L_\theta(\beta^{(m+n)}, H_r^{(m+n)}, H_c^{(m+n)}, \Lambda_r^{(m+n-1)}, \Lambda_c^{(m+n-1)}) = \frac{1}{2}\|Y - U\beta^{(m+n)}\|_2^2 + \frac{1}{2}\gamma_1\beta^{(m+n)\top}M\beta^{(m+n)} + P + \{\mathrm{vec}(\Lambda_r^{(m)}) + \theta\sum_{i=1}^{n-1}[\mathrm{vec}(H_r^{(m+i)}) - B_r\beta^{(m+i)}]\}^\top[\mathrm{vec}(H_r^{(m+n)}) - B_r\beta^{(m+n)}] + \{\mathrm{vec}(\Lambda_c^{(m)}) + \theta\sum_{i=1}^{n-1}[\mathrm{vec}(H_c^{(m+i)}) - B_c\beta^{(m+i)}]\}^\top[\mathrm{vec}(H_c^{(m+n)}) - B_c\beta^{(m+n)}] + \frac{\theta}{2}\|\mathrm{vec}(H_r^{(m+n)}) - B_r\beta^{(m+n)}\|_2^2 + \frac{\theta}{2}\|\mathrm{vec}(H_c^{(m+n)}) - B_c\beta^{(m+n)}\|_2^2 \le f^{(m+n)}$.

Since the augmented Lagrangian function $L_\theta(\beta, H_r, H_c, \Lambda_r, \Lambda_c)$ is differentiable with respect to $\beta$ and convex with respect to each $\eta_\delta^{(r)}$ and $\eta_\delta^{(c)}$, by Theorem 4.1 of [38] there exists a limit point of $(\beta^{(m)}, H_r^{(m)}, H_c^{(m)})$, denoted by $(\beta^*, H_r^*, H_c^*)$. Then we have

$f^* = \lim_{m\to\infty}f^{(m+1)} = \lim_{m\to\infty}f^{(m+n)} = \inf_{\Xi(\beta^*)}\left\{\frac{1}{2}\|Y - U\beta^*\|_2^2 + \frac{1}{2}\gamma_1\beta^{*\top}M\beta^* + P\right\}$.

For any integer $n$, we have

$\lim_{m\to\infty}L_\theta(\beta^{(m+n)}, H_r^{(m+n)}, H_c^{(m+n)}, \Lambda_r^{(m+n-1)}, \Lambda_c^{(m+n-1)}) = \frac{1}{2}\|Y - U\beta^*\|_2^2 + \frac{1}{2}\gamma_1\beta^{*\top}M\beta^* + P + \lim_{m\to\infty}\mathrm{vec}(\Lambda_r^{(m)})^\top[\mathrm{vec}(H_r^*) - B_r\beta^*] + (n - \tfrac{1}{2})\theta\|\mathrm{vec}(H_r^*) - B_r\beta^*\|_2^2 + \lim_{m\to\infty}\mathrm{vec}(\Lambda_c^{(m)})^\top[\mathrm{vec}(H_c^*) - B_c\beta^*] + (n - \tfrac{1}{2})\theta\|\mathrm{vec}(H_c^*) - B_c\beta^*\|_2^2 \le f^*$.

Thus

$\lim_{m\to\infty}\|r_r^{(m+1)}\|_2^2 = \|B_r\beta^* - \mathrm{vec}(H_r^*)\|_2^2 = 0, \quad \lim_{m\to\infty}\|r_c^{(m+1)}\|_2^2 = \|B_c\beta^* - \mathrm{vec}(H_c^*)\|_2^2 = 0$.

Besides, by the definition of $\beta^{(m+1)}$, we have that

$\partial L_\theta(\beta^{(m+1)}, H_r^{(m)}, H_c^{(m)}, \Lambda_r^{(m)}, \Lambda_c^{(m)})/\partial\beta = -U^\top(Y - U\beta^{(m+1)}) + \gamma_1M\beta^{(m+1)} - \theta B_r^\top[\mathrm{vec}(H_r^{(m)}) + \mathrm{vec}(\Lambda_r^{(m)})/\theta - B_r\beta^{(m+1)}] - \theta B_c^\top[\mathrm{vec}(H_c^{(m)}) + \mathrm{vec}(\Lambda_c^{(m)})/\theta - B_c\beta^{(m+1)}] = -U^\top(Y - U\beta^{(m+1)}) + \gamma_1M\beta^{(m+1)} - B_r^\top\mathrm{vec}(\Lambda_r^{(m)}) - \theta B_r^\top[\mathrm{vec}(H_r^{(m)}) - B_r\beta^{(m+1)}] - B_c^\top\mathrm{vec}(\Lambda_c^{(m)}) - \theta B_c^\top[\mathrm{vec}(H_c^{(m)}) - B_c\beta^{(m+1)}] = -U^\top(Y - U\beta^{(m+1)}) + \gamma_1M\beta^{(m+1)} - B_r^\top\mathrm{vec}(\Lambda_r^{(m+1)}) + \theta B_r^\top[\mathrm{vec}(H_r^{(m+1)}) - \mathrm{vec}(H_r^{(m)})] - B_c^\top\mathrm{vec}(\Lambda_c^{(m+1)}) + \theta B_c^\top[\mathrm{vec}(H_c^{(m+1)}) - \mathrm{vec}(H_c^{(m)})] = 0$.

Then we can obtain

$s_r^{(m+1)} + s_c^{(m+1)} = U^\top(Y - U\beta^{(m+1)}) - \gamma_1M\beta^{(m+1)} + B_r^\top\mathrm{vec}(\Lambda_r^{(m+1)}) + B_c^\top\mathrm{vec}(\Lambda_c^{(m+1)})$.

By $\|B_r\beta^* - \mathrm{vec}(H_r^*)\|_2^2 = 0$ and $\|B_c\beta^* - \mathrm{vec}(H_c^*)\|_2^2 = 0$, we have

$\lim_{m\to\infty}\partial L_\theta(\beta^{(m+1)}, H_r^{(m+1)}, H_c^{(m+1)}, \Lambda_r^{(m)}, \Lambda_c^{(m)})/\partial\beta = -U^\top(Y - U\beta^{(m+1)}) + \gamma_1M\beta^{(m+1)} - B_r^\top\mathrm{vec}(\Lambda_r^{(m+1)}) - B_c^\top\mathrm{vec}(\Lambda_c^{(m+1)}) = 0$.

Therefore $\lim_{m\to\infty}(s_r^{(m+1)} + s_c^{(m+1)}) = 0$. □

Let $|G_{k_r,k_c}^{(r,c)*}| = \sum_{(i,j)\in G_{k_r,k_c}^{(r,c)}}n_{i,j}$ and $n_m = \max_{i\in\{1,\dots,N\}, j\in\{1,\dots,q\}}n_{i,j} < \infty$. Then $|G_{k_r,k_c}^{(r,c)}| \le |G_{k_r,k_c}^{(r,c)*}| \le n_m|G_{k_r,k_c}^{(r,c)}|$. Denote the number of internal knots as $J$, so that $J = p - d$. Recall that $b = \min_{(k_r,k_c)\ne(k_r',k_c')}\|g_{(k_r,k_c)}^* - g_{(k_r',k_c')}^*\|$.

Lemma 1.

Under Condition (C1), there exists a spline approximation $\alpha_{k_r,k_c}^{*\top}U_p(t)$ of the true function $g_{(k_r,k_c)}^*(t)$ for $k_r\in\{1, \dots, K_r\}$ and $k_c\in\{1, \dots, K_c\}$, such that

$\sup_{t\in\mathcal{T}}|g_{(k_r,k_c)}^*(t) - \alpha_{k_r,k_c}^{*\top}U_p(t)| = O(J^{-\kappa})$.

Proof.

Lemma 1 follows from Corollary 6.21 of [34]. This lemma has been used in a number of studies that involve spline expansion [25,42]. We omit the proof here. □

Lemma 2. Under Conditions (C1)–(C3) and $b \gg J^{-\kappa}$, there exists a constant $C_2 > 0$ such that for all $(k_r,k_c)\ne(k_r',k_c')$,

$\|\alpha_{k_r,k_c}^* - \alpha_{k_r',k_c'}^*\|_2 \ge \frac{1}{2}C_2^{-1/2}b$,

when $N$ and $q$ are sufficiently large.

when N and q are sufficiently large.

Proof.

By the triangle inequality, we have

$\|(\alpha_{k_r,k_c}^* - \alpha_{k_r',k_c'}^*)^\top U_p\| \ge \|g_{(k_r,k_c)}^* - g_{(k_r',k_c')}^*\| - \|g_{(k_r,k_c)}^* - \alpha_{k_r,k_c}^{*\top}U_p\| - \|g_{(k_r',k_c')}^* - \alpha_{k_r',k_c'}^{*\top}U_p\|. \quad (A.1)$

Besides, by Theorem 5.4.2 of [13], Condition (C2), and the definition of the rescaled B-spline basis, for any vector $\alpha\in\mathbb{R}^p$, there exists a constant $C_2 > 0$ such that

$\|\alpha^\top U_p\|^2 \le C_2\|\alpha\|_2^2. \quad (A.2)$

Combining (A.1), (A.2), and Lemma 1, we have

$\|\alpha_{k_r,k_c}^* - \alpha_{k_r',k_c'}^*\|_2 \ge C_2^{-1/2}\left\{\|g_{(k_r,k_c)}^* - g_{(k_r',k_c')}^*\| - \|g_{(k_r,k_c)}^* - \alpha_{k_r,k_c}^{*\top}U_p\| - \|g_{(k_r',k_c')}^* - \alpha_{k_r',k_c'}^{*\top}U_p\|\right\} \ge C_2^{-1/2}(b - 2M_2J^{-\kappa}) > C_2^{-1/2}(b - 2\times\tfrac{1}{4}b) = \tfrac{1}{2}C_2^{-1/2}b$,

where the third inequality is obtained when $N$ and $q$ are sufficiently large since $b \gg J^{-\kappa}$. □

Lemma 3

(Bernstein’s Inequality, Lemma 2.2.11 in [39]). For independent random variables Y1, …, Yn with means 0 and E|Yi|mm!Mm2vi/2 for some constants M, vi, and every m ≥ 2,

P(|Y1++Yn|>x)2exp{12x2v+Mx},

where v = v1 + · · · + vn.

Proof of Theorem 1.

Given $\hat\beta^{or}\in\mathcal{M}_G$, when the true block memberships $G_{1,1}^{(r,c)}, \dots, G_{K_r,K_c}^{(r,c)}$ are known, the oracle estimators for all $\beta_{i,j}$'s are the same if $(i,j)\in G_{k_r,k_c}^{(r,c)}$. Thus we can explore the properties of $\hat\beta^{or}$ by examining the properties of the oracle common coefficient vector $\hat\alpha^{or} = (\hat\alpha_{1,1}^{or\top}, \dots, \hat\alpha_{k_r,k_c}^{or\top}, \dots, \hat\alpha_{K_r,K_c}^{or\top})^\top$, which is defined as

$\hat\alpha^{or} = \arg\min_\alpha\sum_{k_r=1}^{K_r}\sum_{k_c=1}^{K_c}\hat L^{or}(\alpha_{k_r,k_c})$,

and

$\hat L^{or}(\alpha_{k_r,k_c}) = \frac{1}{2}\|Y_{(k_r,k_c)} - U_{(k_r,k_c)}\alpha_{k_r,k_c}\|_2^2 + \gamma_1|G_{k_r,k_c}^{(r,c)}|\alpha_{k_r,k_c}^\top D\alpha_{k_r,k_c}$,

where $Y_{(k_r,k_c)} = \mathrm{vec}\{Y_{i,j}, (i,j)\in G_{k_r,k_c}\}$ and $U_{(k_r,k_c)} = (U_{i,j}^\top, (i,j)\in G_{k_r,k_c})^\top$. The corresponding true B-spline coefficient vector is denoted by $\alpha^* = (\alpha_{1,1}^{*\top}, \dots, \alpha_{k_r,k_c}^{*\top}, \dots, \alpha_{K_r,K_c}^{*\top})^\top$. Note that

$\frac{\partial\hat L^{or}(\alpha_{k_r,k_c})}{\partial\alpha_{k_r,k_c}}\Big|_{\alpha_{k_r,k_c}=\hat\alpha_{k_r,k_c}^{or}} - \frac{\partial\hat L^{or}(\alpha_{k_r,k_c})}{\partial\alpha_{k_r,k_c}}\Big|_{\alpha_{k_r,k_c}=\alpha_{k_r,k_c}^*} = \frac{\partial^2\hat L^{or}(\alpha_{k_r,k_c})}{\partial\alpha_{k_r,k_c}\partial\alpha_{k_r,k_c}^\top}\Big|_{\alpha_{k_r,k_c}=\bar\alpha_{k_r,k_c}}(\hat\alpha_{k_r,k_c}^{or} - \alpha_{k_r,k_c}^*)$,

where $\bar\alpha_{k_r,k_c}$ lies between $\hat\alpha_{k_r,k_c}^{or}$ and $\alpha_{k_r,k_c}^*$. Then we have

$\hat\alpha_{k_r,k_c}^{or} - \alpha_{k_r,k_c}^* = -\left(\frac{\partial^2\hat L^{or}(\alpha_{k_r,k_c})}{\partial\alpha_{k_r,k_c}\partial\alpha_{k_r,k_c}^\top}\Big|_{\alpha_{k_r,k_c}=\bar\alpha_{k_r,k_c}}\right)^{-1}\frac{\partial\hat L^{or}(\alpha_{k_r,k_c})}{\partial\alpha_{k_r,k_c}}\Big|_{\alpha_{k_r,k_c}=\alpha_{k_r,k_c}^*}$.

Hence

$\|\hat\alpha_{k_r,k_c}^{or} - \alpha_{k_r,k_c}^*\|_2 \le |G_{k_r,k_c}^{(r,c)*}|\left\|\left(\frac{\partial^2\hat L^{or}(\alpha_{k_r,k_c})}{\partial\alpha_{k_r,k_c}\partial\alpha_{k_r,k_c}^\top}\Big|_{\alpha_{k_r,k_c}=\bar\alpha_{k_r,k_c}}\right)^{-1}\right\|_2 \cdot |G_{k_r,k_c}^{(r,c)*}|^{-1}\left\|\frac{\partial\hat L^{or}(\alpha_{k_r,k_c})}{\partial\alpha_{k_r,k_c}}\Big|_{\alpha_{k_r,k_c}=\alpha_{k_r,k_c}^*}\right\|_2 := A_{k_r,k_c}^{(1)}\times A_{k_r,k_c}^{(2)}. \quad (A.3)$

By Lemma A.8 of [41] and Conditions (C1) and (C2), we can derive that there exists a constant $C_3 > 0$ such that for any $1\le k_r\le K_r$, $1\le k_c\le K_c$,

$P(A_{k_r,k_c}^{(1)} \le C_3) = P\left(\left\|\left(\frac{U_{(k_r,k_c)}^\top U_{(k_r,k_c)}}{|G_{k_r,k_c}^{(r,c)*}|} + \frac{\gamma_1|G_{k_r,k_c}^{(r,c)}|D}{|G_{k_r,k_c}^{(r,c)*}|}\right)^{-1}\right\|_2 \le C_3\right) \ge 1 - p/(Nq). \quad (A.4)$

Besides, note that

$A_{k_r,k_c}^{(2)} = \left\|-\frac{U_{(k_r,k_c)}^\top}{|G_{k_r,k_c}^{(r,c)*}|}\left(Y_{(k_r,k_c)} - g_{(k_r,k_c)}^* + g_{(k_r,k_c)}^* - U_{(k_r,k_c)}\alpha_{k_r,k_c}^*\right) + \frac{\gamma_1|G_{k_r,k_c}^{(r,c)}|}{|G_{k_r,k_c}^{(r,c)*}|}D\alpha_{k_r,k_c}^*\right\|_2 \le \left\|\frac{U_{(k_r,k_c)}^\top}{|G_{k_r,k_c}^{(r,c)*}|}\epsilon_{k_r,k_c}\right\|_2 + \left\|\frac{U_{(k_r,k_c)}^\top}{|G_{k_r,k_c}^{(r,c)*}|}(g_{(k_r,k_c)}^* - U_{(k_r,k_c)}\alpha_{k_r,k_c}^*)\right\|_2 + \frac{\gamma_1|G_{k_r,k_c}^{(r,c)}|}{|G_{k_r,k_c}^{(r,c)*}|}\|D\alpha_{k_r,k_c}^*\|_2 := B_{k_r,k_c}^{(1)} + B_{k_r,k_c}^{(2)} + B_{k_r,k_c}^{(3)}. \quad (A.5)$

Since the rescaled B-spline values are finite, there exists a constant $M_1 > 0$ such that $U_{l,p}(t) \le M_1$ for $l\in\{1, \dots, p\}$. Let $U_{(i,j)l}$ denote the $l$th column of $U_{(i,j)}$, and we verify the condition of Lemma 3 by Condition (C5):

$E|U_{(i,j)l}^\top\epsilon_{i,j}|^m \le E(|U_{(i,j)l}^\top U_{(i,j)l}|^{m/2}|\epsilon_{i,j}^\top\epsilon_{i,j}|^{m/2}) \le (F^{-1}M_1)^mm!\,E(\exp\{F|n_{i,j}^{-1}\epsilon_{i,j}^\top\epsilon_{i,j}|^{1/2}\}) \le (F^{-1}M_1)^mm!\,c_2$.

Applying Lemma 3, we have

$P\left(\Big|\sum_{(i,j)\in G_{k_r,k_c}^{(r,c)}}U_{(i,j)l}^\top\epsilon_{i,j}\Big| > x\right) \le 2\exp\left\{-\frac{1}{2}\frac{x^2}{v + F^{-1}M_1x}\right\}, \quad (A.6)$

where $v = \sum_{(i,j)\in G_{k_r,k_c}^{(r,c)}}v_{i,j}$ and $v_{i,j} = 2F^{-2}M_1^2c_2$.

Let $U_{(k_r,k_c)l}$ denote the $l$th column of $U_{(k_r,k_c)}$. For some constant $0 < C < \infty$, combining Condition (C5) and (A.6), we have

$P\left(|G_{k_r,k_c}^{(r,c)*}|^{-1}\|U_{(k_r,k_c)}^\top\epsilon_{k_r,k_c}\|_\infty > CF^{-1}M_1(\log(Nq)/|G_{k_r,k_c}^{(r,c)*}|)^{1/2}\right) \le \sum_{l=1}^pP\left(|U_{(k_r,k_c)l}^\top\epsilon_{k_r,k_c}| > CF^{-1}M_1(\log(Nq)|G_{k_r,k_c}^{(r,c)*}|)^{1/2}\right) = \sum_{l=1}^pP\left(\Big|\sum_{(i,j)\in G_{k_r,k_c}^{(r,c)}}U_{(i,j)l}^\top\epsilon_{i,j}\Big| > CF^{-1}M_1(\log(Nq)|G_{k_r,k_c}^{(r,c)*}|)^{1/2}\right) \le 2p\exp\left\{-\frac{1}{2}\frac{C^2F^{-2}M_1^2\log(Nq)|G_{k_r,k_c}^{(r,c)*}|}{2F^{-2}M_1^2c_2|G_{k_r,k_c}^{(r,c)}| + CF^{-2}M_1^2(\log(Nq)|G_{k_r,k_c}^{(r,c)*}|)^{1/2}}\right\} \le 2p\exp\{-\log(Nq)\} \le 2p/(Nq)$.

Hence, we have that with probability at least $1 - 2p/(Nq)$,

$B_{k_r,k_c}^{(1)} \le CF^{-1}M_1(p\log(Nq)/|G_{k_r,k_c}^{(r,c)}|)^{1/2}. \quad (A.7)$

By Lemma 1, there exists a constant $M_2 > 0$ such that

$B_{k_r,k_c}^{(2)} \le p^{1/2}\left\|\frac{U_{(k_r,k_c)}^\top}{|G_{k_r,k_c}^{(r,c)*}|}(g_{(k_r,k_c)}^* - U_{(k_r,k_c)}\alpha_{k_r,k_c}^*)\right\|_\infty \le M_1M_2p^{1/2}J^{-\kappa}. \quad (A.8)$

In addition,

$B_{k_r,k_c}^{(3)} \le \frac{\gamma_1|G_{k_r,k_c}^{(r,c)}|}{|G_{k_r,k_c}^{(r,c)*}|}\|\alpha_{k_r,k_c}^*\|_2\|D\|_2 \le p^{1/2}\gamma_1\|\alpha_{k_r,k_c}^*\|_\infty\|D\|_2. \quad (A.9)$

Thus by (A.5), (A.7), (A.8), and (A.9), for any $1\le k_r\le K_r$, $1\le k_c\le K_c$, with probability at least $1 - 2p/(Nq)$,

$A_{k_r,k_c}^{(2)} \le CF^{-1}M_1(p\log(Nq)/|G_{\min}^{(r,c)}|)^{1/2} + M_1M_2p^{1/2}J^{-\kappa} + \max_{k_r,k_c}\|\alpha_{k_r,k_c}^*\|_\infty\|D\|_2\gamma_1p^{1/2}$.

By Condition (C1) and $\gamma_1 = o(|G_{\min}^{(r,c)}|^{-1/2})$, when $N$ and $q$ are sufficiently large, we have

$p^{1/2}J^{-\kappa} \ll (p\log(Nq)/|G_{\min}^{(r,c)}|)^{1/2}, \quad p^{1/2}\gamma_1 \ll (p\log(Nq)/|G_{\min}^{(r,c)}|)^{1/2}$.

Hence, for any $1\le k_r\le K_r$, $1\le k_c\le K_c$, with probability at least $1 - 2p/(Nq)$,

$A_{k_r,k_c}^{(2)} \le C_4(p\log(Nq)/|G_{\min}^{(r,c)}|)^{1/2}$,

where $C_4$ is a large constant. Together with (A.3) and (A.4), for any $1\le k_r\le K_r$, $1\le k_c\le K_c$,

$P\left(\|\hat\alpha_{k_r,k_c}^{or} - \alpha_{k_r,k_c}^*\|_2 \le C_3C_4(p\log(Nq)/|G_{\min}^{(r,c)}|)^{1/2}\right) \ge 1 - P(A_{k_r,k_c}^{(1)} > C_3) - P\left(A_{k_r,k_c}^{(2)} > C_4(p\log(Nq)/|G_{\min}^{(r,c)}|)^{1/2}\right) \ge 1 - 3p/(Nq)$.

By the Bonferroni inequality, we have

$P\left(\sup_{1\le k_r\le K_r, 1\le k_c\le K_c}\|\hat\alpha_{k_r,k_c}^{or} - \alpha_{k_r,k_c}^*\|_2 \le C_3C_4(p\log(Nq)/|G_{\min}^{(r,c)}|)^{1/2}\right) \ge 1 - \sum_{k_r=1}^{K_r}\sum_{k_c=1}^{K_c}P\left(\|\hat\alpha_{k_r,k_c}^{or} - \alpha_{k_r,k_c}^*\|_2 > C_3C_4(p\log(Nq)/|G_{\min}^{(r,c)}|)^{1/2}\right) \ge 1 - 3K_rK_cp/(Nq)$.

By Lemma 1 and (A.2), we have

$\|\hat g_{(k_r,k_c)}^{or} - g_{(k_r,k_c)}^*\| = \|\hat\alpha_{k_r,k_c}^{or\top}U_p - \alpha_{k_r,k_c}^{*\top}U_p + \alpha_{k_r,k_c}^{*\top}U_p - g_{(k_r,k_c)}^*\| \le \|(\hat\alpha_{k_r,k_c}^{or} - \alpha_{k_r,k_c}^*)^\top U_p\| + \|\alpha_{k_r,k_c}^{*\top}U_p - g_{(k_r,k_c)}^*\| \le C_2^{1/2}C_3C_4(p\log(Nq)/|G_{\min}^{(r,c)}|)^{1/2} + M_2J^{-\kappa} \le (C_2^{1/2}C_3C_4 + M_2/2)(p\log(Nq)/|G_{\min}^{(r,c)}|)^{1/2} = C^*(p\log(Nq)/|G_{\min}^{(r,c)}|)^{1/2}$,

where $C^* = \max\{C_3C_4, C_2^{1/2}C_3C_4 + M_2/2\}$. That is,

$P\left(\sup_{1\le k_r\le K_r, 1\le k_c\le K_c}\|\hat g_{(k_r,k_c)}^{or} - g_{(k_r,k_c)}^*\| \le \psi\right) \ge 1 - 3K_rK_cp/(Nq)$,

where $\psi = C^*(p\log(Nq)/|G_{\min}^{(r,c)}|)^{1/2}$. □

Proof of Theorem 2.

Let $\rho_1(t) = \gamma_2^{-1}p_\tau(t, \gamma_2)$ and $\rho_2(t) = ((N/q)^{1/2}\gamma_2)^{-1}p_\tau(t, (N/q)^{1/2}\gamma_2)$. Define

$Q(\beta) = \frac{1}{2}\sum_{i=1}^N\sum_{j=1}^q\left(\|Y_{i,j} - U_{i,j}\beta_{i,j}\|_2^2 + \gamma_1\beta_{i,j}^\top D\beta_{i,j}\right)$,
$\mathrm{Pen}(\beta) = \gamma_2\sum_{(i_1,i_2)\in\Delta^{(r)}}\rho_1(\|\beta_{i_1}^{(r)} - \beta_{i_2}^{(r)}\|_2) + (N/q)^{1/2}\gamma_2\sum_{(j_1,j_2)\in\Delta^{(c)}}\rho_2(\|\beta_{j_1}^{(c)} - \beta_{j_2}^{(c)}\|_2)$,
$Q^G(\alpha) = \frac{1}{2}\sum_{k_r=1}^{K_r}\sum_{k_c=1}^{K_c}\left(\|Y_{(k_r,k_c)} - U_{(k_r,k_c)}\alpha_{k_r,k_c}\|_2^2 + \gamma_1|G_{k_r,k_c}^{(r,c)}|\alpha_{k_r,k_c}^\top D\alpha_{k_r,k_c}\right)$,
$\mathrm{Pen}^G(\alpha) = \gamma_2\sum_{k_r<k_r'}|G_{k_r}^{(r)}||G_{k_r'}^{(r)}|\rho_1(\|\alpha_{k_r}^{(r)} - \alpha_{k_r'}^{(r)}\|_2) + (N/q)^{1/2}\gamma_2\sum_{k_c<k_c'}|G_{k_c}^{(c)}||G_{k_c'}^{(c)}|\rho_2(\|\alpha_{k_c}^{(c)} - \alpha_{k_c'}^{(c)}\|_2)$,

where $\alpha_{k_r}^{(r)} = (\alpha_{k_r,1}^{(r)\top}, \dots, \alpha_{k_r,q}^{(r)\top})^\top$ with $\alpha_{k_r,j}^{(r)} = \alpha_{k_r,k}$ if $j\in G_k^{(c)}$, and $\alpha_{k_c}^{(c)} = (\alpha_{1,k_c}^{(c)\top}, \dots, \alpha_{N,k_c}^{(c)\top})^\top$ with $\alpha_{i,k_c}^{(c)} = \alpha_{k,k_c}$ if $i\in G_k^{(r)}$. Let $L(\beta) = Q(\beta) + \mathrm{Pen}(\beta)$ and $L^G(\alpha) = Q^G(\alpha) + \mathrm{Pen}^G(\alpha)$.

We define two mappings, $\tilde T: \mathcal{M}_G\to\tilde{\mathcal{M}}_G$ and $\hat T: \mathbb{R}^{Nqp}\to\hat{\mathcal{M}}_G$, where the two subspaces are defined by

$\tilde{\mathcal{M}}_G = \{\alpha\in\mathbb{R}^{K_rK_cp}: \alpha_{k_r,k_c} = \beta_{i,j}, \text{ for any } (i,j)\in G_{k_r,k_c}^{(r,c)}, 1\le k_r\le K_r, 1\le k_c\le K_c\}$,
$\hat{\mathcal{M}}_G = \{\alpha\in\mathbb{R}^{K_rK_cp}: \alpha_{k_r,k_c} = |G_{k_r,k_c}^{(r,c)}|^{-1}\sum_{(i,j)\in G_{k_r,k_c}^{(r,c)}}\beta_{i,j}, 1\le k_r\le K_r, 1\le k_c\le K_c\}$.

For every $\beta\in\mathcal{M}_G$, we have $\mathrm{Pen}(\beta) = \mathrm{Pen}^G(\tilde T(\beta))$, and for every $\alpha\in\tilde{\mathcal{M}}_G$, we have $\mathrm{Pen}(\tilde T^{-1}(\alpha)) = \mathrm{Pen}^G(\alpha)$. Hence

$L(\beta) = L^G(\tilde T(\beta)), \quad L^G(\alpha) = L(\tilde T^{-1}(\alpha)). \quad (A.10)$

Consider the neighborhood of $\beta^*$:

$\Theta = \{\beta\in\mathbb{R}^{Nqp}: \sup_{1\le i\le N, 1\le j\le q}\|\beta_{i,j} - \beta_{i,j}^*\|_2 \le \psi\}$.

By the result in Theorem 1, there is an event $E_1$ such that on $E_1$,

$\sup_{1\le i\le N, 1\le j\le q}\|\hat\beta_{i,j}^{or} - \beta_{i,j}^*\|_2 \le \psi$,

and $P(E_1^C) \le 3K_rK_cp/(Nq)$. Hence $\hat\beta^{or}\in\Theta$ on $E_1$. For any $\beta\in\mathbb{R}^{Nqp}$, let $\tilde\beta = \tilde T^{-1}(\hat T(\beta))$. Inspired by [27], we show that $\hat\beta^{or}$ is a strictly local minimizer of objective function (3) with probability tending to 1 through the following two steps:

(i) On $E_1$, $L(\tilde\beta) > L(\hat\beta^{or})$ for any $\beta\in\Theta$ and $\tilde\beta\ne\hat\beta^{or}$.

(ii) There is an event $E_2$ such that $P(E_2^C) \le c_2/(Nq)$. On $E_1\cap E_2$, there is a neighborhood of $\hat\beta^{or}$, denoted by $\Theta^*$, such that $L(\beta) \ge L(\tilde\beta)$ for any $\beta\in\Theta\cap\Theta^*$ for sufficiently large $N$ and $q$.

Therefore, by the results in (i) and (ii), we have $L(\beta) > L(\hat\beta^{or})$ for any $\beta\in\Theta\cap\Theta^*$ with $\tilde\beta\ne\hat\beta^{or}$, so that $\hat\beta^{or}$ is a strictly local minimizer of $L(\beta)$ on $E_1\cap E_2$ with $P(E_1\cap E_2) \ge 1 - 3K_rK_cp/(Nq) - c_2/(Nq)$ for sufficiently large $N$ and $q$.

Firstly, we prove the result in (i). Let $\hat T(\beta) = \alpha = (\alpha_{1,1}^\top, \dots, \alpha_{K_r,K_c}^\top)^\top$ and $\alpha_{k_r}^{(r)*} = (\beta_{i,1}^{(r)*\top}, \dots, \beta_{i,q}^{(r)*\top})^\top$ for $i\in G_{k_r}^{(r)}$. Since

$\|\alpha_{k_r}^{(r)} - \alpha_{k_r'}^{(r)}\|_2 \ge \|\alpha_{k_r}^{(r)*} - \alpha_{k_r'}^{(r)*}\|_2 - 2\sup_{1\le k_r\le K_r}\|\alpha_{k_r}^{(r)} - \alpha_{k_r}^{(r)*}\|_2$,

and

$\sup_{1\le k_r\le K_r}\|\alpha_{k_r}^{(r)} - \alpha_{k_r}^{(r)*}\|_2^2 = \sup_{1\le k_r\le K_r}\left\{\sum_{k_c=1}^{K_c}|G_{k_c}^{(c)}|\Big\|\sum_{i\in G_{k_r}^{(r)}}\sum_{j\in G_{k_c}^{(c)}}\beta_{i,j}/(|G_{k_r}^{(r)}||G_{k_c}^{(c)}|) - \alpha_{k_r,k_c}^*\Big\|_2^2\right\} \le \sup_{1\le k_r\le K_r}|G_{k_r}^{(r)}|^{-1}\sum_{k_c=1}^{K_c}\sum_{i\in G_{k_r}^{(r)}}\sum_{j\in G_{k_c}^{(c)}}\|\beta_{i,j} - \beta_{i,j}^*\|_2^2 \le q\sup_{1\le i\le N, 1\le j\le q}\|\beta_{i,j} - \beta_{i,j}^*\|_2^2, \quad (A.11)$

by Lemma 2, for any $k_r\ne k_r'$,

$\|\alpha_{k_r}^{(r)} - \alpha_{k_r'}^{(r)}\|_2 \ge \frac{1}{2}|G_{\min}^{(c)}|^{1/2}C_2^{-1/2}b - 2q^{1/2}\sup_{1\le i\le N, 1\le j\le q}\|\beta_{i,j} - \beta_{i,j}^*\|_2 \ge \frac{1}{2}|G_{\min}^{(c)}|^{1/2}C_2^{-1/2}b - 2q^{1/2}\psi > a\gamma_2$.

The last inequality follows from the assumption that $|G_{\min}^{(c)}|^{1/2}b \gg \gamma_2 \gg (pq)^{1/2}\log(Nq)/\min\{|G_{\min}^{(r)}|, |G_{\min}^{(c)}|\} \gg q^{1/2}\psi$. Similarly, for any $k_c\ne k_c'$, we have

$\|\alpha_{k_c}^{(c)} - \alpha_{k_c'}^{(c)}\|_2 \ge \frac{1}{2}|G_{\min}^{(r)}|^{1/2}C_2^{-1/2}b - 2N^{1/2}\sup_{1\le i\le N, 1\le j\le q}\|\beta_{i,j} - \beta_{i,j}^*\|_2 \ge \frac{1}{2}|G_{\min}^{(r)}|^{1/2}C_2^{-1/2}b - 2N^{1/2}\psi > a(N/q)^{1/2}\gamma_2$.

Hence by Condition (C4), $\mathrm{Pen}^G(\hat T(\beta)) = C_{pen}$, a constant, and hence $L^G(\hat T(\beta)) = Q^G(\hat T(\beta)) + C_{pen}$ for all $\beta\in\Theta$. Since $\hat\alpha^{or}$ is the unique global minimizer of $Q^G(\alpha)$, $Q^G(\hat T(\beta)) > Q^G(\hat\alpha^{or})$ for all $\hat T(\beta)\ne\hat\alpha^{or}$, and thus $L^G(\hat T(\beta)) > L^G(\hat\alpha^{or})$ for all $\hat T(\beta)\ne\hat\alpha^{or}$. By (A.10), we have $L^G(\hat T(\beta)) = L(\tilde\beta)$ and $L^G(\hat\alpha^{or}) = L(\hat\beta^{or})$. Therefore $L(\tilde\beta) > L(\hat\beta^{or})$ for all $\tilde\beta\ne\hat\beta^{or}$, and the result in (i) is proved.

Next we prove result (ii). For a positive sequence $\nu_n$, let

$\Theta^* = \{\beta\in\mathbb{R}^{Nqp}: \sup_{1\le i\le N}\|\beta_i^{(r)} - \hat\beta_i^{(r)or}\|_2 \le \nu_n, \sup_{1\le j\le q}\|\beta_j^{(c)} - \hat\beta_j^{(c)or}\|_2 \le \nu_n\}$,
$\mathrm{Pen}_r(\beta) = \gamma_2\sum_{(i_1,i_2)\in\Delta^{(r)}}\rho_1(\|\beta_{i_1}^{(r)} - \beta_{i_2}^{(r)}\|_2), \quad \mathrm{Pen}_c(\beta) = (N/q)^{1/2}\gamma_2\sum_{(j_1,j_2)\in\Delta^{(c)}}\rho_2(\|\beta_{j_1}^{(c)} - \beta_{j_2}^{(c)}\|_2)$,

and $\mathrm{Pen}(\beta) = \mathrm{Pen}_r(\beta) + \mathrm{Pen}_c(\beta)$. For $\beta\in\Theta\cap\Theta^*$, by Taylor's expansion, we have

$L(\beta) - L(\tilde\beta) = \Omega_1 + \Omega_2 + \Omega_3, \quad (A.12)$

where

$\Omega_1 = \sum_{i=1}^N\sum_{j=1}^q\left[-U_{i,j}^\top(Y_{i,j} - U_{i,j}\bar\beta_{i,j}) + \gamma_1D\bar\beta_{i,j}\right]^\top(\beta_{i,j} - \tilde\beta_{i,j})$,
$\Omega_2 = \sum_{i=1}^N\left(\frac{\partial\mathrm{Pen}_r(\bar\beta)}{\partial\beta_i^{(r)}}\Big|_{\beta_i^{(r)}=\bar\beta_i^{(r)}}\right)^\top(\beta_i^{(r)} - \tilde\beta_i^{(r)}), \quad \Omega_3 = \sum_{j=1}^q\left(\frac{\partial\mathrm{Pen}_c(\bar\beta)}{\partial\beta_j^{(c)}}\Big|_{\beta_j^{(c)}=\bar\beta_j^{(c)}}\right)^\top(\beta_j^{(c)} - \tilde\beta_j^{(c)})$,

with $\bar\beta = (\bar\beta_{1,1}^\top, \dots, \bar\beta_{N,q}^\top)^\top$ and $\bar\beta_{i,j} = s\beta_{i,j} + (1-s)\tilde\beta_{i,j}$ for some $s\in(0, 1)$.

Firstly, we have

$\Omega_2 = \gamma_2\sum_{i_1<i_2}\rho_1'(\|\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)}\|_2)\|\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)}\|_2^{-1}(\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)})^\top(\beta_{i_1}^{(r)} - \tilde\beta_{i_1}^{(r)}) + \gamma_2\sum_{i_1>i_2}\rho_1'(\|\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)}\|_2)\|\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)}\|_2^{-1}(\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)})^\top(\beta_{i_1}^{(r)} - \tilde\beta_{i_1}^{(r)}) = \gamma_2\sum_{i_1<i_2}\rho_1'(\|\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)}\|_2)\|\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)}\|_2^{-1}(\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)})^\top\left[(\beta_{i_1}^{(r)} - \tilde\beta_{i_1}^{(r)}) - (\beta_{i_2}^{(r)} - \tilde\beta_{i_2}^{(r)})\right]$.

When $i_1, i_2\in G_{k_r}^{(r)}$, $\tilde\beta_{i_1} = \tilde\beta_{i_2}$. Thus

$\Omega_2 = \gamma_2\sum_{k_r=1}^{K_r}\sum_{i_1,i_2\in G_{k_r}^{(r)}, i_1<i_2}\rho_1'(\|\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)}\|_2)\|\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)}\|_2^{-1}(\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)})^\top(\beta_{i_1}^{(r)} - \beta_{i_2}^{(r)}) + \gamma_2\sum_{k_r<k_r'}\sum_{i_1\in G_{k_r}^{(r)}}\sum_{i_2\in G_{k_r'}^{(r)}}\rho_1'(\|\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)}\|_2)\|\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)}\|_2^{-1}(\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)})^\top\left[(\beta_{i_1}^{(r)} - \tilde\beta_{i_1}^{(r)}) - (\beta_{i_2}^{(r)} - \tilde\beta_{i_2}^{(r)})\right]$.

As shown in Theorem 1, $\sup_i\|\tilde\beta_i^{(r)} - \beta_i^{(r)*}\|_2^2 = \sup_{k_r}\|\alpha_{k_r}^{(r)} - \alpha_{k_r}^{(r)*}\|_2^2 \le q\psi^2$. Since $\bar\beta_i^{(r)} = s\beta_i^{(r)} + (1-s)\tilde\beta_i^{(r)}$, $\sup_i\|\bar\beta_i^{(r)} - \beta_i^{(r)*}\|_2 \le sq^{1/2}\psi + (1-s)q^{1/2}\psi = q^{1/2}\psi$. For $k_r\ne k_r'$, $i_1\in G_{k_r}^{(r)}$, $i_2\in G_{k_r'}^{(r)}$, we have

$\|\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)}\|_2 \ge \min_{i_1\in G_{k_r}^{(r)}, i_2\in G_{k_r'}^{(r)}}\|\beta_{i_1}^{(r)*} - \beta_{i_2}^{(r)*}\|_2 - 2\max_i\|\bar\beta_i^{(r)} - \beta_i^{(r)*}\|_2 \ge \frac{1}{2}|G_{\min}^{(c)}|^{1/2}C_2^{-1/2}b - 2q^{1/2}\psi > a\gamma_2$,

and thus $\rho_1'(\|\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)}\|_2) = 0$. Therefore,

$\Omega_2 = \gamma_2\sum_{k_r=1}^{K_r}\sum_{i_1,i_2\in G_{k_r}^{(r)}, i_1<i_2}\rho_1'(\|\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)}\|_2)\|\beta_{i_1}^{(r)} - \beta_{i_2}^{(r)}\|_2 \ge \gamma_2\sum_{k_r=1}^{K_r}\sum_{i_1,i_2\in G_{k_r}^{(r)}, i_1<i_2}\rho_1'(\|\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)}\|_2)q^{-1/2}\sum_{j=1}^q\|\beta_{i_1,j} - \beta_{i_2,j}\|_2 = \gamma_2q^{-1/2}\sum_{k_r=1}^{K_r}\sum_{k_c=1}^{K_c}\sum_{i_1,i_2\in G_{k_r}^{(r)}, i_1<i_2}\sum_{j\in G_{k_c}^{(c)}}\rho_1'(\|\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)}\|_2)\|\beta_{i_1,j} - \beta_{i_2,j}\|_2$.

Similarly to (A.11), $\sup_i\|\tilde\beta_i^{(r)} - \hat\beta_i^{(r)or}\|_2 \le \nu_n$ and $\sup_i\|\beta_i^{(r)} - \hat\beta_i^{(r)or}\|_2 \le \nu_n$. Then we have

$\sup_{i_1<i_2}\|\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)}\|_2 \le 2\sup_i\|\bar\beta_i^{(r)} - \tilde\beta_i^{(r)}\|_2 \le 2\sup_i\|\beta_i^{(r)} - \tilde\beta_i^{(r)}\|_2 \le 2\left(\sup_i\|\beta_i^{(r)} - \hat\beta_i^{(r)or}\|_2 + \sup_i\|\tilde\beta_i^{(r)} - \hat\beta_i^{(r)or}\|_2\right) \le 4\nu_n$.

Hence $\rho_1'(\|\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)}\|_2) \ge \rho_1'(4\nu_n)$ by the concavity of $\rho(\cdot)$. As a result,

$\Omega_2 \ge \gamma_2q^{-1/2}\sum_{k_r=1}^{K_r}\sum_{k_c=1}^{K_c}\sum_{i_1,i_2\in G_{k_r}^{(r)}, i_1<i_2}\sum_{j\in G_{k_c}^{(c)}}\rho_1'(4\nu_n)\|\beta_{i_1,j} - \beta_{i_2,j}\|_2. \quad (A.13)$

Next we consider $\Omega_3$. Similarly to the derivation of (A.13), we can derive

$\Omega_3 \ge \gamma_2q^{-1/2}\sum_{k_r=1}^{K_r}\sum_{k_c=1}^{K_c}\sum_{j_1,j_2\in G_{k_c}^{(c)}, j_1<j_2}\sum_{i\in G_{k_r}^{(r)}}\rho_2'(4\nu_n)\|\beta_{i,j_1} - \beta_{i,j_2}\|_2. \quad (A.14)$

Lastly, for $\Omega_1$, we have

$\Omega_1 = \sum_{i=1}^N\sum_{j=1}^qw_{i,j}^\top(\beta_{i,j} - \tilde\beta_{i,j}) = \sum_{k_r=1}^{K_r}\sum_{k_c=1}^{K_c}\sum_{i_1,i_2\in G_{k_r}^{(r)}}\sum_{j_1,j_2\in G_{k_c}^{(c)}}\frac{w_{i_1,j_1}^\top(\beta_{i_1,j_1} - \beta_{i_2,j_2})}{|G_{k_r,k_c}^{(r,c)}|}$,

and

$\sum_{k_r=1}^{K_r}\sum_{k_c=1}^{K_c}\sum_{i_1,i_2\in G_{k_r}^{(r)}}\sum_{j_1,j_2\in G_{k_c}^{(c)}}\frac{|w_{i_1,j_1}^\top(\beta_{i_1,j_1} - \beta_{i_2,j_2})|}{|G_{k_r,k_c}^{(r,c)}|} \le \sup_{i,j}\|w_{i,j}\|_2\sum_{k_r=1}^{K_r}\sum_{k_c=1}^{K_c}\sum_{i_1,i_2\in G_{k_r}^{(r)}}\sum_{j_1,j_2\in G_{k_c}^{(c)}}\frac{\|\beta_{i_1,j_1} - \beta_{i_2,j_2}\|_2}{|G_{k_r,k_c}^{(r,c)}|} \le 2\sup_{i,j}\|w_{i,j}\|_2\sum_{k_r=1}^{K_r}\sum_{k_c=1}^{K_c}\sum_{i_1,i_2\in G_{k_r}^{(r)}, i_1<i_2}\sum_{j\in G_{k_c}^{(c)}}\frac{\|\beta_{i_1,j} - \beta_{i_2,j}\|_2}{|G_{k_r}^{(r)}|} + 2\sup_{i,j}\|w_{i,j}\|_2\sum_{k_r=1}^{K_r}\sum_{k_c=1}^{K_c}\sum_{j_1,j_2\in G_{k_c}^{(c)}, j_1<j_2}\sum_{i\in G_{k_r}^{(r)}}\frac{\|\beta_{i,j_1} - \beta_{i,j_2}\|_2}{|G_{k_c}^{(c)}|}$,

where $w_{i,j} = -U_{i,j}^\top(Y_{i,j} - U_{i,j}\bar\beta_{i,j}) + \gamma_1D\bar\beta_{i,j}$. Note that

$\sup_{i,j}\|w_{i,j}\|_2 \le \sup_{i,j}\|U_{i,j}^\top(g_{i,j}^* - U_{i,j}\beta_{i,j}^*)\|_2 + \sup_{i,j}\|(U_{i,j}^\top U_{i,j} + \gamma_1D)(\beta_{i,j}^* - \bar\beta_{i,j})\|_2 + \sup_{i,j}\gamma_1\|D\beta_{i,j}^*\|_2 + \sup_{i,j}\|U_{i,j}^\top\epsilon_{i,j}\|_2$.

By Lemma 1, $\sup_{i,j}\|U_{i,j}^\top(g_{i,j}^* - U_{i,j}\beta_{i,j}^*)\|_2 \le n_mM_1M_2p^{1/2}J^{-\kappa}$. Moreover, $\sup_{i,j}\|(U_{i,j}^\top U_{i,j} + \gamma_1D)(\beta_{i,j}^* - \bar\beta_{i,j})\|_2 \le (n_m^{1/2}p^{1/2}M_1 + \gamma_1\|D\|_2)\psi$ and $\sup_{i,j}\gamma_1\|D\beta_{i,j}^*\|_2 \le p^{1/2}\gamma_1\|D\|_2\|\beta^*\|_\infty$. With the Bonferroni inequality, Markov's inequality, and Condition (C5), we have

$P\left(\sup_{i,j}\|U_{(i,j)}^\top\epsilon_{i,j}\|_2 > 2n_{i,j}F^{-1}M_1p^{1/2}\log(Nq)\right) \le \sum_{i=1}^N\sum_{j=1}^qP\left(\|U_{(i,j)}^\top\epsilon_{i,j}\|_2 > 2n_{i,j}F^{-1}M_1p^{1/2}\log(Nq)\right) \le \sum_{i=1}^N\sum_{j=1}^qP\left(F\|n_{i,j}^{-1/2}\epsilon_{i,j}\|_2 > 2\log(Nq)\right) \le c_2/(Nq)$.

Together with Conditions (C1) and (C3), we have

$\sup_{i,j}\|w_{i,j}\|_2 = O(p^{1/2}\log(Nq)) \quad (A.15)$

holds with probability at least $1 - c_2/(Nq)$. Let $\nu_n = o(1)$; then $\rho_1'(4\nu_n)\to1$ and $\rho_2'(4\nu_n)\to1$. Since $\gamma_2 \gg (pq)^{1/2}\log(Nq)/\min\{|G_{\min}^{(r)}|, |G_{\min}^{(c)}|\}$, by (A.12)–(A.15),

$L(\beta) - L(\tilde\beta) = \Omega_1 + \Omega_2 + \Omega_3 \ge \sum_{k_r=1}^{K_r}\sum_{k_c=1}^{K_c}\sum_{i_1,i_2\in G_{k_r}^{(r)}, i_1<i_2}\sum_{j\in G_{k_c}^{(c)}}\left[\gamma_2q^{-1/2}\rho_1'(4\nu_n) - \frac{2\sup_{i,j}\|w_{i,j}\|_2}{|G_{k_r}^{(r)}|}\right]\|\beta_{i_1,j} - \beta_{i_2,j}\|_2 + \sum_{k_r=1}^{K_r}\sum_{k_c=1}^{K_c}\sum_{j_1,j_2\in G_{k_c}^{(c)}, j_1<j_2}\sum_{i\in G_{k_r}^{(r)}}\left[\gamma_2q^{-1/2}\rho_2'(4\nu_n) - \frac{2\sup_{i,j}\|w_{i,j}\|_2}{|G_{k_c}^{(c)}|}\right]\|\beta_{i,j_1} - \beta_{i,j_2}\|_2 \ge 0$

holds with probability at least 1 − c2/(Nq), which completes the proof of result (ii). □

Footnotes

CRediT authorship contribution statement

Kuangnan Fang: Methodology, Formal analysis, Writing – original draft. Yuanxing Chen: Data curation, Investigation, Software, Writing – original draft. Shuangge Ma: Conceptualization, Methodology, Writing – review & editing. Qingzhao Zhang: Conceptualization, Methodology, Validation, Writing – review & editing, Supervision.

Appendix B. Supplementary data

Supplementary material related to this article can be found online at https://doi.org/10.1016/j.jmva.2021.104874. The Supplementary Material contains additional tables and figures for Examples 2–5.

References

• [1] Abraham C, Cornillon PA, Matzner-Løber E, Molinari N, Unsupervised curve clustering using B-splines, Scand. J. Stat. 30 (3) (2003) 581–595.
• [2] Aneiros G, Cao R, Fraiman R, Genest C, Vieu P, Recent advances in functional data analysis and high-dimensional statistics, J. Multivariate Anal. 170 (2019) 3–9.
• [3] Biau G, Devroye L, Lugosi G, On the performance of clustering in Hilbert spaces, IEEE Trans. Inform. Theory 54 (2) (2008) 781–790.
• [4] Bouveyron C, Bozzi L, Jacques J, Jollois F-X, The functional latent block model for the co-clustering of electricity consumption curves, J. R. Stat. Soc. Ser. C. Appl. Stat. 67 (4) (2018) 897–915.
• [5] Boyd S, Parikh N, Chu E, Peleato B, Eckstein J, Distributed optimization and statistical learning via the alternating direction method of multipliers, Found. Trends Mach. Learn. 3 (1) (2011) 1–122.
• [6] Chaussabel D, Quinn C, Shen J, Patel P, Glaser C, Baldwin N, Stichweh D, Blankenship D, Li L, A modular analysis framework for blood genomics studies: application to systemic lupus erythematosus, Immunity 29 (1) (2008) 150–164.
• [7] Chen J, Zhang S, Integrative analysis for identifying joint modular patterns of gene-expression and drug-response data, Bioinformatics 32 (11) (2016) 1724–1732.
• [8] Chi EC, Lange K, Splitting methods for convex clustering, J. Comput. Graph. Statist. 24 (4) (2015) 994–1013.
• [9] Chiou J-M, Li P-L, Functional clustering and identifying substructures of longitudinal data, J. R. Stat. Soc. Ser. B Stat. Methodol. 69 (4) (2007) 679–699.
• [10] Chiou J-M, Li P-L, Correlation-based functional clustering via subspace projection, J. Amer. Statist. Assoc. 103 (484) (2008) 1684–1692.
• [11] Chu W, Li R, Reimherr M, Feature screening for time-varying coefficient models with ultrahigh dimensional longitudinal data, Ann. Appl. Stat. 10 (2) (2016) 596–617.
• [12] Coffey N, Hinde J, Holian E, Clustering longitudinal profiles using P-splines and mixed effects models applied to time-course gene expression data, Comput. Statist. Data Anal. 71 (2014) 14–29.
• [13] DeVore RA, Lorentz GG, Constructive Approximation: Polynomials and Splines Approximation, Springer-Verlag, Berlin, 1993.
• [14] Fan J, Li R, Variable selection via nonconcave penalized likelihood and its oracle properties, J. Amer. Statist. Assoc. 96 (456) (2001) 1348–1360.
• [15] Goia A, Vieu P, An introduction to recent advances in high/infinite dimensional statistics, J. Multivariate Anal. 146 (2016) 1–6.
• [16] Hejblum BP, Skinner J, Thiébaut R, Time-course gene set analysis for longitudinal gene expression data, PLoS Comput. Biol. 11 (6) (2015).
• [17] Jacques J, Preda C, Functional data clustering: a survey, Adv. Data Anal. Classif. 8 (3) (2014) 231–255.
• [18] Jacques J, Preda C, Model-based clustering for multivariate functional data, Comput. Statist. Data Anal. 71 (2014) 92–106.
• [19] Jain AK, Data clustering: 50 years beyond K-means, Pattern Recognit. Lett. 31 (8) (2010) 651–666.
• [20] James GM, Sugar CA, Clustering for sparsely sampled functional data, J. Amer. Statist. Assoc. 98 (462) (2003) 397–408.
• [21] Kerr G, Ruskin HJ, Crane M, Doolan P, Techniques for clustering gene expression data, Comput. Biol. Med. 38 (3) (2008) 283–293.
• [22] Li S, Rouphael N, Duraisingham S, Romero-Steiner S, Presnell S, Davis CW, Schmidt DS, Johnson SE, Molecular signatures of antibody responses derived from a systems biology study of five human vaccines, Nat. Immunol. 15 (2) (2014) 195–204.
• [23] Ling N, Vieu P, Nonparametric modelling for functional data: selected survey and tracks for future, Statistics 52 (4) (2018) 934–949.
• [24] Liu L, Lin L, Subgroup analysis for heterogeneous additive partially linear models and its application to car sales data, Comput. Statist. Data Anal. 138 (2019) 239–259.
• [25] Liu X, Wang L, Liang H, Estimation and variable selection for semiparametric additive partial linear models, Statist. Sinica 21 (3) (2011) 1225–1248.
• [26] Ma P, Castillo-Davis CI, Zhong W, Liu JS, A data-driven clustering method for time course gene expression data, Nucleic Acids Res. 34 (4) (2006) 1261–1269.
• [27] Ma S, Huang J, A concave pairwise fusion approach to subgroup analysis, J. Amer. Statist. Assoc. 112 (517) (2017) 410–423.
• [28] Mankad S, Michailidis G, Biclustering three-dimensional data arrays with plaid models, J. Comput. Graph. Statist. 23 (4) (2014) 943–965.
• [29] Opgen-Rhein R, Strimmer K, Inferring gene dependency networks from genomic longitudinal data: a functional data approach, REVSTAT 4 (2006) 53–65.
• [30] Peng J, Müller H-G, Distance-based clustering of sparsely observed stochastic processes, with applications to online auctions, Ann. Appl. Stat. 2 (3) (2008) 1056–1077.
• [31] Rangel C, Angus J, Ghahramani Z, Lioumi M, Sotheran E, Gaiba A, Wild DL, Falciani F, Modeling T-cell activation using gene expression profiling and state-space models, Bioinformatics 20 (9) (2004) 1361–1372.
• [32] Ruppert D, Selecting the number of knots for penalized splines, J. Comput. Graph. Statist. 11 (4) (2002) 735–757.
• [33] Schmutz A, Jacques J, Bouveyron C, Cheze L, Martin P, Clustering multivariate functional data in group-specific functional subspaces, Comput. Statist. (2020) 1–31.
• [34] Schumaker LL, Spline Functions: Basic Theory, Wiley, New York, 2007.
• [35] Slimen YB, Allio S, Jacques J, Model-based co-clustering for functional data, Neurocomputing 291 (2018) 97–108.
• [36] Stone CJ, The dimensionality reduction principle for generalized additive models, Ann. Statist. 14 (2) (1986) 590–606.
• [37] Suarez AJ, Ghosal S, Bayesian clustering of functional data using local features, Bayesian Anal. 11 (1) (2016) 71–98.
• [38] Tseng P, Convergence of a block coordinate descent method for nondifferentiable minimization, J. Optim. Theory Appl. 109 (3) (2001) 475–494.
• [39] van der Vaart AW, Wellner JA, Weak Convergence and Empirical Processes, Springer, New York, 1996.
• [40] Wang J-L, Chiou J-M, Müller H-G, Functional data analysis, Annu. Rev. Stat. Appl. 3 (1) (2016) 257–295.
• [41] Wang L, Yang L, Spline-backfitted kernel smoothing of nonlinear additive autoregression model, Ann. Statist. 35 (6) (2007) 2474–2503.
• [42] Wang HJ, Zhu Z, Zhou J, Quantile regression in partially linear varying coefficient models, Ann. Statist. 37 (6B) (2009) 3841–3866.
• [43] Weiner J, Lewis DJM, Maertzdorf J, Mollenkopf H-J, Characterization of potential biomarkers of reactogenicity of licensed antiviral vaccines: randomized controlled clinical trials conducted by the BIOVACSAFE consortium, Sci. Rep. 9 (1) (2019) 20362.
• [44] Wu C, Kwon S, Shen X, Pan W, A new algorithm and theory for penalized regression-based clustering, J. Mach. Learn. Res. 17 (1) (2015) 6479–6503.
• [45] Xie J, Ma A, Fennell A, Ma Q, Zhao J, It is time to apply biclustering: a comprehensive review of biclustering applications in biological and biomedical data, Brief. Bioinform. 20 (4) (2019) 1450–1465.
• [46] Xu R, Wunsch D, Survey of clustering algorithms, IEEE Trans. Neural Netw. 16 (3) (2005) 645–678.
• [47] Zhang C-H, Nearly unbiased variable selection under minimax concave penalty, Ann. Statist. 38 (2) (2010) 894–942.
• [48] Zhu X, Qu A, Cluster analysis of longitudinal profiles with subgroups, Electron. J. Stat. 12 (2018) 171–193.
