Published in final edited form as: J Multivar Anal. 2021 Oct 29;189:104874. doi: 10.1016/j.jmva.2021.104874

Biclustering analysis of functionals via penalized fusion

Kuangnan Fang a, Yuanxing Chen a, Shuangge Ma b, Qingzhao Zhang c,*

Abstract

In biomedical data analysis, clustering is commonly conducted. Biclustering analysis conducts clustering in both the sample and covariate dimensions and can more comprehensively describe data heterogeneity. Most of the existing biclustering analyses consider scalar measurements. In this study, motivated by time-course gene expression data and other examples, we take the "natural next step" and consider the biclustering analysis of functionals, under which, for each covariate of each sample, a function (more precisely, its values at discrete measurement points) is present. We develop a doubly penalized fusion approach, which includes a smoothness penalty for estimating functionals and, more importantly, a fusion penalty for clustering. Statistical properties are rigorously established, providing the proposed approach a solid ground. We also develop an effective ADMM algorithm and accompanying R code. Numerical analysis, including simulations, comparisons, and the analysis of two time-course gene expression datasets, demonstrates the practical effectiveness of the proposed approach.

Keywords: Biclustering, Functional data, Penalized fusion, primary 62H30, 62R10

1. Introduction

In biomedical data analysis, clustering has been routinely conducted. The clustering of samples can assist in better understanding sample heterogeneity, and the clustering of covariates can identify those that behave similarly across samples and then, for example, improve our understanding of covariate functionalities. Clustering can also serve as the basis of other analyses, for example, regression. Biclustering analysis has also been developed, identifying clustering structures in both sample and covariate dimensions. It includes sample- and covariate-clustering as special cases and, in a sense, can be more comprehensive. For generic reviews of techniques, theories, and applications of clustering, we refer to [19,46].

This study has been partly motivated by the analysis of gene expression data, for which sample- and covariate-clustering as well as biclustering have been extensively conducted [21,45]. Most gene expression studies generate "snapshot" values. Unlike some types of omics measurements, gene expression values can be time-dependent, and the temporal trends of gene expressions can have important biological implications [16]. Accordingly, time-course gene expression studies have been conducted, generating multiple measurements at different time points for each gene of each sample. In the analysis of time-course gene expression data, besides simple statistics, functional data analysis (FDA) techniques have been adopted and shown to be powerful [12].

FDA deals with data samples that consist of curves or other infinite-dimensional data objects. Over the last two decades, we have witnessed significant developments in its theory, method, computation, and application. For systematic reviews, we refer to [2,15,23,40]. In FDA, clustering analysis has been of particular interest. A popular approach projects functional data into a finite-dimensional space and then applies existing clustering methods. For example, Abraham et al. [1] conduct B-spline expansions and cluster the estimated coefficients using a k-means algorithm. Peng and Müller [30] develop a distance for sparse functional data and apply a k-means algorithm to functional principal component analysis (PCA) scores. Other approaches, such as Bayesian [37], subspace [3,9,10], and model-based [18,20], have also been developed. We refer to [17,40] for surveys on functional data clustering. Most works in this area, however, have focused on either sample- or covariate-clustering.

For biclustering analysis (of gene expression and other types of data), in this article, we take the “natural next step” and consider the scenario where for each covariate of each sample, a function or its realizations at discrete time points are available. We note that, although this study has been partly motivated by gene expression data and some of the discussions are focused on such data, the considered data scenario and proposed technique can have applications far beyond such data. For example, in biomedical studies, many biomarkers measured in blood tests vary across time, and their values can be obtained from medical records. In financial studies, many measures of a company, for example size and stock price, vary across time. As such, our investigation can have broad applications.

There is a vast literature on biclustering analysis with scalar measurements. Directly applying such techniques to the present problem will involve either treating functional measurements as scalars and then computing distances (between covariates and samples) – which may be ineffective by not sufficiently accounting for the functional nature of data – or first estimating functionals and then computing distances between the estimates – which may also encounter challenges when a large number of functionals need to be jointly estimated. Our literature review suggests that there are also a handful of recent biclustering methods designed for functional (especially including longitudinal) data. For example, Slimen et al. [35] propose a biclustering method for multivariate functional data based on the Gaussian latent block model (LBM) using the first functional PCA scores. Bouveyron et al. [4] develop an extension of the Gaussian LBM by modeling the whole set of functional PCA scores. In another work [28], a biclustering method with a plaid model is extended to three-dimensional data arrays, of which multivariate longitudinal data is a special case.

For the biclustering analysis of functionals, in this article, we develop a penalized fusion based approach. More specifically, a nonparametric model is assumed for each covariate of each sample, allowing for sufficient flexibility in modeling. A double penalization technique is adopted, which includes a smoothness penalty to regulate nonparametric estimation. The most significant advancement is the second, fusion penalty, which "transforms" clustering in both the sample and covariate dimensions into a penalized estimation problem. Statistical and numerical investigations are conducted, providing the proposed approach a solid ground. This study may complement and advance the existing work in multiple aspects. Compared to direct applications of biclustering methods for scalars (which either directly compute distances without functional estimation or estimate functionals separately), the proposed approach can more effectively accommodate the functional nature of data and generate more effective estimation. This is because it "combines" clustering and estimation, and as such, estimation only needs to be conducted for clusters as opposed to individual covariates, potentially leading to a smaller number of parameters and hence more effective estimation. Compared to some of the existing biclustering methods for functionals, such as [4,35], the proposed approach has a much easier way of determining the number of clusters. In addition, unlike [4,35], it does not make stringent distributional assumptions (for example, normality). Meanwhile, rigorous theoretical investigations are conducted beyond methodological developments, granting the proposed approach a stronger statistical basis. It also advances beyond the clustering of functional covariate effects (which assumes homogeneous samples) by simultaneously examining sample heterogeneity, thus being more comprehensive. Additionally, this study may also advance and enrich the penalized fusion technique. Clustering via penalized fusion has been pioneered in [8] and other studies. Compared to alternative clustering techniques, it is more recent and has notable statistical and numerical advantages [44]. Compared to the existing penalized fusion based clustering, this study differs by conducting biclustering and by having unknown parameters generated from the basis expansion of functionals. Last but not least, this study also provides a practically useful and new way of analyzing time-course gene expression data (and other data with similar characteristics).

The remainder of this article is organized as follows: Section 2 introduces the new biclustering approach via penalized fusion and develops an effective computational algorithm. Statistical properties are established to provide our method a strong theoretical support. Simulation studies and the analysis of two time-course expression datasets are conducted in Sections 3 and 4, respectively. Section 5 concludes with a brief discussion. The proofs of the main results are presented in Appendix A.

2. Methods

2.1. Data and model settings

For the $j \in \{1, \dots, q\}$th covariate of sample $i \in \{1, \dots, N\}$, denote $Y_{i,j} = (Y_{i,j,1}, \dots, Y_{i,j,n_{i,j}})^\top$ as the ordered measurements (ordered by time for time-course gene expression data), which are the discrete realizations of an unknown underlying functional. Further denote $Y_i = (Y_{i,1}^\top, \dots, Y_{i,q}^\top)^\top$, $Y = (Y_1^\top, \dots, Y_N^\top)^\top$, and $n = \sum_{i=1}^N \sum_{j=1}^q n_{i,j}$. Under the biclustering analysis framework, assume that data can be "decomposed" into $K_r$ sample (row) groups and $K_c$ covariate (column) groups. Note that, advancing from many existing approaches, the numbers of groups in the two dimensions are not pre-specified. Denote $t_{i,j,m} \in \mathcal{T} = [0, 1]$ as the observed time points. If (sample $i$, covariate $j$) belongs to the $k_r$th sample group and the $k_c$th covariate group, then

$Y_{i,j,m} = g_{(k_r,k_c)}(t_{i,j,m}) + \epsilon_{i,j,m}, \quad (1)$

where $g_{(k_r,k_c)}(t)$ is the unknown mean function, and the $\epsilon_{i,j,m}$'s are random errors with mean zero.

For estimation, we adopt the basis expansion technique. Specifically, denote $U_p(t) = (U_{1,p}(t), \dots, U_{p,p}(t))^\top$ as the collection of $p$ rescaled basis functions. In the literature, there are extensive studies on choosing the form and number of basis functions [32], which will not be reiterated here. In our numerical study, we adopt B-spline basis functions of order $d = 3$. Let $g_{i,j}(t)$ be the unknown mean function for the $j$th covariate of the $i$th sample; then we have

$g_{i,j}(t) \approx U_p(t)^\top \beta_{i,j}$,

where $\beta_{i,j} = (\beta_{i,j,1}, \dots, \beta_{i,j,p})^\top$ is the vector of unknown coefficients. Further denote $U_{i,j} = (U_p(t_{i,j,1}), \dots, U_p(t_{i,j,n_{i,j}}))^\top$. For estimation (without clustering), consider the objective function

$Q(\beta) = \frac{1}{2}\|Y - U\beta\|_2^2 + \frac{1}{2}\gamma_1\beta^\top M\beta = \frac{1}{2}\sum_{i=1}^N\sum_{j=1}^q\left(\|Y_{i,j} - U_{i,j}\beta_{i,j}\|_2^2 + \gamma_1\beta_{i,j}^\top D\beta_{i,j}\right), \quad (2)$

where $U = \mathrm{diag}(U_{1,1}, \dots, U_{1,q}, \dots, U_{N,q})$, $\beta = (\beta_{1,1}^\top, \dots, \beta_{1,q}^\top, \dots, \beta_{N,q}^\top)^\top$, $M = \mathrm{diag}(D, \dots, D)$, $D = \delta^\top\delta$, $\delta$ is a $(p-2)\times p$ matrix representing the second-order difference operator, and $\gamma_1$ is a non-negative tuning parameter. In this objective function, the first term measures the lack of fit, and the penalty term controls the smoothness of estimation.
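To make the estimation step concrete, the following is a minimal R sketch of the penalized fit in (2) for a single (i, j) curve. The helper name `fit_smooth`, the cubic-spline choice via `splines::bs`, and the default values are our illustrative assumptions, not the authors' released code.

```r
# Minimal sketch: smoothness-penalized B-spline fit for one curve,
# i.e., minimize ||y - U beta||^2 / 2 + gamma1 * beta' D beta / 2.
library(splines)

fit_smooth <- function(y, t, p = 6, gamma1 = 1) {
  # U: n x p design matrix of B-spline basis functions evaluated at t
  U <- bs(t, df = p, degree = 3, intercept = TRUE)
  # delta: (p - 2) x p second-order difference operator; D = delta' delta
  delta <- diff(diag(p), differences = 2)
  D <- crossprod(delta)
  # Normal equations of (2): (U'U + gamma1 * D) beta = U'y
  beta <- solve(crossprod(U) + gamma1 * D, crossprod(U, y))
  list(beta = beta, fitted = U %*% beta)
}

# Toy usage on one noisy curve
set.seed(1)
t <- seq(0, 1, length.out = 10)
y <- cos(2 * pi * t) + rnorm(10, sd = 0.6)
fit <- fit_smooth(y, t)
```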

2.2. Biclustering via penalized fusion

Under the clustering via penalized fusion framework, two samples (covariates) belong to the same cluster if and only if they have the same regression coefficients. As such, clustering amounts to determining whether two samples (covariates) have the same estimated coefficients. For samples $i_1, i_2 \in \{1, \dots, N\}$, denote $\beta_{i_1}^{(r)}, \beta_{i_2}^{(r)}$ as the length-$pq$ vectors of coefficients. For covariates $j_1, j_2 \in \{1, \dots, q\}$, denote $\beta_{j_1}^{(c)}, \beta_{j_2}^{(c)}$ as the length-$pN$ vectors of coefficients. For estimating $\beta$ and hence determining the clustering structure, we propose minimizing the objective function:

$L(\beta) = Q(\beta) + \sum_{1\le i_1 < i_2\le N}p_\tau(\|\beta_{i_1}^{(r)} - \beta_{i_2}^{(r)}\|_2, \gamma_2) + \sum_{1\le j_1 < j_2\le q}p_\tau(\|\beta_{j_1}^{(c)} - \beta_{j_2}^{(c)}\|_2, (N/q)^{1/2}\gamma_2). \quad (3)$

Here $p_\tau(\cdot, \cdot)$ is a penalty function, $\tau$ is a regularization parameter, $\|\cdot\|_2$ is the $\ell_2$ norm, and $\gamma_2$ is a data-dependent tuning parameter. The factor $(N/q)^{1/2}$ is added to make the two penalties comparable. In our numerical study, we adopt MCP [47], that is, $p_\tau(t, \gamma) = \gamma\int_0^t(1 - x/(\tau\gamma))_+\,dx$ with $\tau > 1$, where $(x)_+ = x$ if $x > 0$ and $(x)_+ = 0$ otherwise. Note that SCAD [14] and some other penalties are also applicable. Denote the estimator as $\hat\beta$. Let $\{\hat\alpha_1^{(r)}, \dots, \hat\alpha_{\hat K_r}^{(r)}\}$ be the distinct values of the $\hat\beta_i^{(r)}$'s. Similarly, let $\{\hat\alpha_1^{(c)}, \dots, \hat\alpha_{\hat K_c}^{(c)}\}$ be the distinct values of the $\hat\beta_j^{(c)}$'s. We can then obtain the block structure of $\hat\beta$ by $\{\hat\alpha_{1,1}^{(r,c)}, \dots, \hat\alpha_{\hat K_r,\hat K_c}^{(r,c)}\}$, which are the distinct values of the $\hat\beta_{i,j}$'s, and set $\hat K_b = \hat K_r \times \hat K_c$.
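For reference, the MCP penalty and its derivative have closed forms obtained by integrating the definition above; the R sketch below is ours for illustration (`tau = 3` matches the value fixed later in the simulations).

```r
# MCP of [47]: p_tau(t, gamma) = gamma * integral_0^t (1 - x/(tau*gamma))_+ dx,
# which equals gamma*t - t^2/(2*tau) for t <= tau*gamma and the constant
# tau*gamma^2/2 beyond that point.
mcp <- function(t, gamma, tau = 3) {
  ifelse(t <= tau * gamma,
         gamma * t - t^2 / (2 * tau),
         tau * gamma^2 / 2)
}
# Derivative gamma * (1 - t/(tau*gamma))_+ = (gamma - t/tau)_+
mcp_deriv <- function(t, gamma, tau = 3) pmax(gamma - t / tau, 0)
```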

In (3), penalties are imposed on the norms of all pairwise differences to promote equality, as in "standard" penalized fusion [8]. As in [8], since there is no information on the order of samples/covariates, all pairwise differences are taken, which differs from, for example, fused Lasso and other fused penalizations. Different from [8], as clustering needs to be conducted in both the sample and covariate dimensions, two fusion penalties are imposed, promoting equality in two directions. It is also noted that each specific coefficient shows up in three different penalties. As shown below, with properly chosen tunings, there is no over-penalization problem. In addition, it is not rare to have a parameter involved in two or more penalties [7].

The proposed approach involves two tuning parameters, which have "ordinary" implications, with one controlling smoothness and the other determining the structure of clustering. One possibility is to conduct a two-dimensional grid search. Here we adopt the alternative proposed in [48], which has two steps and a lower computational cost. In particular, in the first step, we set $\gamma_2 = 0$ and select the optimal $\gamma_1$ by minimizing:

$\mathrm{BIC}(\gamma_1) = \sum_{i=1}^N\sum_{j=1}^q\left\{\log\left(\frac{\|Y_{i,j} - \hat g_{i,j}\|_2^2}{n_{i,j}}\right) + \frac{\log(n_{i,j})}{n_{i,j}}\mathrm{df}_{i,j}\right\}$,

where $\mathrm{df}_{i,j} = \mathrm{trace}\{U_{i,j}(U_{i,j}^\top U_{i,j} + \gamma_1D)^{-1}U_{i,j}^\top\}$ and $\hat g_{i,j} = (\hat g_{i,j}(t_{i,j,1}), \dots, \hat g_{i,j}(t_{i,j,n_{i,j}}))^\top$ with $\hat g_{i,j}(t) = U_p(t)^\top\hat\beta_{i,j}$.

In the second step, we fix $\gamma_1$ at its optimal value and select $\gamma_2$ by minimizing

$\mathrm{BIC}(\gamma_2) = \log\left(\frac{\|Y - \hat g\|_2^2}{Nq}\right) + \frac{\log(Nq)}{Nq}\mathrm{df}$,

where $\mathrm{df} = (\hat K_r\hat K_c/(Nq))\sum_{i=1}^N\sum_{j=1}^q\mathrm{df}_{i,j}$ and $\hat g = (\hat g_{1,1}^\top, \dots, \hat g_{N,q}^\top)^\top$.
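In code, the first tuning step amounts to a grid search over γ1; the R sketch below computes BIC(γ1) from the curve-wise smoother matrices, with `bic_gamma1` and the list-based bookkeeping as our illustrative assumptions rather than the authors' implementation.

```r
# BIC(gamma1) with gamma2 = 0: each curve is fitted separately, and
# df_{i,j} = trace{U (U'U + gamma1 * D)^{-1} U'} is the effective df.
bic_gamma1 <- function(Y_list, U_list, D, gamma1) {
  total <- 0
  for (k in seq_along(Y_list)) {
    U <- U_list[[k]]; y <- Y_list[[k]]; n <- length(y)
    S <- U %*% solve(crossprod(U) + gamma1 * D, t(U))  # smoother matrix
    df <- sum(diag(S))
    rss <- sum((y - S %*% y)^2)
    total <- total + log(rss / n) + log(n) / n * df
  }
  total
}
# gamma1 is chosen by minimizing bic_gamma1 over a grid; gamma2 is then
# selected by the analogous BIC(gamma2), with
# df = (Khat_r * Khat_c / (N * q)) * sum of the df_{i,j}'s.
```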

2.3. Computation

We develop an effective algorithm based on the ADMM technique. Specifically, we first reformulate (3) as

$\arg\min_\beta\; Q(\beta) + \sum_{\delta\in\Delta^{(r)}}p_\tau(\|\eta_\delta^{(r)}\|_2, \gamma_2) + \sum_{\delta\in\Delta^{(c)}}p_\tau(\|\eta_\delta^{(c)}\|_2, (N/q)^{1/2}\gamma_2)$,
$\text{subject to } \beta_{i_1}^{(r)} - \beta_{i_2}^{(r)} = \eta_\delta^{(r)}, \quad \beta_{j_1}^{(c)} - \beta_{j_2}^{(c)} = \eta_\delta^{(c)}$,

where $\Delta^{(r)} = \{\delta = (i_1, i_2): 1\le i_1 < i_2\le N\}$ and $\Delta^{(c)} = \{\delta = (j_1, j_2): 1\le j_1 < j_2\le q\}$. Optimizing the constrained objective function is equivalent to optimizing the augmented Lagrangian function:

$L_\theta(\beta, H_r, H_c, \Lambda_r, \Lambda_c) = \frac{1}{2}\|Y - U\beta\|_2^2 + \frac{1}{2}\gamma_1\beta^\top M\beta + \sum_{\delta\in\Delta^{(r)}}p_\tau(\|\eta_\delta^{(r)}\|_2, \gamma_2) + \sum_{\delta\in\Delta^{(r)}}\lambda_\delta^{(r)\top}(\eta_\delta^{(r)} - \beta_{i_1}^{(r)} + \beta_{i_2}^{(r)}) + \frac{\theta}{2}\sum_{\delta\in\Delta^{(r)}}\|\eta_\delta^{(r)} - \beta_{i_1}^{(r)} + \beta_{i_2}^{(r)}\|_2^2 + \sum_{\delta\in\Delta^{(c)}}p_\tau(\|\eta_\delta^{(c)}\|_2, (N/q)^{1/2}\gamma_2) + \sum_{\delta\in\Delta^{(c)}}\lambda_\delta^{(c)\top}(\eta_\delta^{(c)} - \beta_{j_1}^{(c)} + \beta_{j_2}^{(c)}) + \frac{\theta}{2}\sum_{\delta\in\Delta^{(c)}}\|\eta_\delta^{(c)} - \beta_{j_1}^{(c)} + \beta_{j_2}^{(c)}\|_2^2, \quad (4)$

where $\theta$ is a small positive constant, $H_r = (\eta_{(1,2)}^{(r)}, \dots, \eta_{(N-1,N)}^{(r)})$, $H_c = (\eta_{(1,2)}^{(c)}, \dots, \eta_{(q-1,q)}^{(c)})$, $\Lambda_r = (\lambda_{(1,2)}^{(r)}, \dots, \lambda_{(N-1,N)}^{(r)})$, and $\Lambda_c = (\lambda_{(1,2)}^{(c)}, \dots, \lambda_{(q-1,q)}^{(c)})$. Here we introduce the dual variables $\lambda_\delta^{(r)}$ and $\lambda_\delta^{(c)}$ corresponding to the pairs $\delta$ in $\Delta^{(r)}$ and $\Delta^{(c)}$, and the cardinalities of $\Delta^{(r)}$ and $\Delta^{(c)}$ are denoted by $|\Delta^{(r)}|$ and $|\Delta^{(c)}|$.

We consider an iterative algorithm, where the updates in step m + 1 are:

$\beta^{(m+1)} = \arg\min_\beta L_\theta(\beta, H_r^{(m)}, H_c^{(m)}, \Lambda_r^{(m)}, \Lambda_c^{(m)})$,
$H_r^{(m+1)} = \arg\min_{H_r} L_\theta(\beta^{(m+1)}, H_r, \Lambda_r^{(m)})$,
$H_c^{(m+1)} = \arg\min_{H_c} L_\theta(\beta^{(m+1)}, H_c, \Lambda_c^{(m)})$,
$\lambda_\delta^{(r)(m+1)} = \lambda_\delta^{(r)(m)} + \theta(\eta_\delta^{(r)(m+1)} - \beta_{i_1}^{(r)(m+1)} + \beta_{i_2}^{(r)(m+1)}), \quad \delta\in\Delta^{(r)}$,
$\lambda_\delta^{(c)(m+1)} = \lambda_\delta^{(c)(m)} + \theta(\eta_\delta^{(c)(m+1)} - \beta_{j_1}^{(c)(m+1)} + \beta_{j_2}^{(c)(m+1)}), \quad \delta\in\Delta^{(c)}. \quad (5)$

More specifically, when optimizing over β, we consider

$f(\beta) = \frac{1}{2}\|Y - U\beta\|_2^2 + \frac{1}{2}\gamma_1\beta^\top M\beta + \frac{\theta}{2}\left(\sum_{\delta\in\Delta^{(r)}}\|\tilde\eta_\delta^{(r)(m)} - B_\delta^{(r)}\beta\|_2^2 + \sum_{\delta\in\Delta^{(c)}}\|\tilde\eta_\delta^{(c)(m)} - B_\delta^{(c)}\beta\|_2^2\right), \quad (6)$

where $\tilde\eta_\delta^{(r)} = \eta_\delta^{(r)} + \frac{1}{\theta}\lambda_\delta^{(r)}$, $\tilde\eta_\delta^{(c)} = \eta_\delta^{(c)} + \frac{1}{\theta}\lambda_\delta^{(c)}$, $B_\delta^{(r)} = (e_{i_1}^{(r)} - e_{i_2}^{(r)})^\top\otimes I_{qp}$, $B_\delta^{(c)} = I_N\otimes[(e_{j_1}^{(c)} - e_{j_2}^{(c)})^\top\otimes I_p]$, $e_i^{(r)}$ is an $N\times1$ zero vector except that its $i$th element is 1, $e_j^{(c)}$ is a $q\times1$ zero vector except that its $j$th element is 1, $\otimes$ is the Kronecker product, and $I_p$ is a $p\times p$ identity matrix. Denote $B_r = (B_{(1,2)}^{(r)\top}, \dots, B_{(N-1,N)}^{(r)\top})^\top$, $B_c = (B_{(1,2)}^{(c)\top}, \dots, B_{(q-1,q)}^{(c)\top})^\top$, $\tilde H_r = (\tilde\eta_{(1,2)}^{(r)}, \dots, \tilde\eta_{(N-1,N)}^{(r)})$, and $\tilde H_c = (\tilde\eta_{(1,2)}^{(c)}, \dots, \tilde\eta_{(q-1,q)}^{(c)})$. Then the update for $\beta$ is

$\beta^{(m+1)} = (U^\top U + \gamma_1M + \theta B_r^\top B_r + \theta B_c^\top B_c)^{-1}(U^\top Y + \theta B_r^\top\mathrm{vec}(\tilde H_r^{(m)}) + \theta B_c^\top\mathrm{vec}(\tilde H_c^{(m)})), \quad (7)$

where vec(Z) is the vectorization of matrix Z by columns.
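In code, the update (7) is a single ridge-type linear solve. The R sketch below assumes the stacked matrices $B_r$, $B_c$ and the shifted variables $\tilde H_r$, $\tilde H_c$ defined above have already been constructed; the function name is ours.

```r
# beta update (7): (U'U + gamma1*M + theta*Br'Br + theta*Bc'Bc) beta = rhs
update_beta <- function(U, Y, M, Br, Bc, Htil_r, Htil_c, gamma1, theta) {
  A <- crossprod(U) + gamma1 * M +
       theta * crossprod(Br) + theta * crossprod(Bc)
  rhs <- crossprod(U, Y) +
         theta * crossprod(Br, c(Htil_r)) +  # c(.) vectorizes by columns
         theta * crossprod(Bc, c(Htil_c))
  # A is fixed across iterations, so its Cholesky factor can be cached
  solve(A, rhs)
}
```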

For $H_r$, we consider

$f(\eta_\delta^{(r)}) = p_\tau(\|\eta_\delta^{(r)}\|_2, \gamma_2) + \frac{\theta}{2}\|\eta_\delta^{(r)} - \beta_{i_1}^{(r)(m+1)} + \beta_{i_2}^{(r)(m+1)} + \lambda_\delta^{(r)(m)}/\theta\|_2^2. \quad (8)$

Denote $z_\delta^{(r)(m+1)} = \beta_{i_1}^{(r)(m+1)} - \beta_{i_2}^{(r)(m+1)} - \lambda_\delta^{(r)(m)}/\theta$. With the KKT conditions of (8), we can get a closed-form solution for $H_r$:

$\eta_\delta^{(r)(m+1)} = \begin{cases} z_\delta^{(r)(m+1)}, & \text{if } \|z_\delta^{(r)(m+1)}\|_2 \ge \tau\gamma_2, \\ \frac{\tau\theta}{\tau\theta-1}\left(1 - \frac{\gamma_2/\theta}{\|z_\delta^{(r)(m+1)}\|_2}\right)_+ z_\delta^{(r)(m+1)}, & \text{if } \|z_\delta^{(r)(m+1)}\|_2 < \tau\gamma_2. \end{cases} \quad (9)$

Similarly, denote $z_\delta^{(c)(m+1)} = \beta_{j_1}^{(c)(m+1)} - \beta_{j_2}^{(c)(m+1)} - \lambda_\delta^{(c)(m)}/\theta$, and we can get a closed-form solution for $H_c$:

$\eta_\delta^{(c)(m+1)} = \begin{cases} z_\delta^{(c)(m+1)}, & \text{if } \|z_\delta^{(c)(m+1)}\|_2 \ge (N/q)^{1/2}\tau\gamma_2, \\ \frac{\tau\theta}{\tau\theta-1}\left(1 - \frac{(N/q)^{1/2}\gamma_2/\theta}{\|z_\delta^{(c)(m+1)}\|_2}\right)_+ z_\delta^{(c)(m+1)}, & \text{if } \|z_\delta^{(c)(m+1)}\|_2 < (N/q)^{1/2}\tau\gamma_2. \end{cases} \quad (10)$
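Both (9) and (10) are group-thresholding rules and take only a few lines; the R sketch below (with the illustrative name `update_eta`) implements the row-direction update (9), and the column-direction update (10) follows by replacing γ2 with (N/q)^{1/2}γ2.

```r
# Closed-form eta update under MCP: identity for large ||z||_2, scaled
# group soft-thresholding otherwise; assumes tau * theta > 1.
update_eta <- function(z, gamma, tau, theta) {
  nz <- sqrt(sum(z^2))
  if (nz >= tau * gamma) {
    z
  } else {
    tau * theta / (tau * theta - 1) *
      max(1 - (gamma / theta) / nz, 0) * z
  }
}
```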

Consider the initial values $\beta^{(0)} = (U^\top U + \gamma_1M)^{-1}U^\top Y$, $\eta_\delta^{(r)(0)} = \beta_{i_1}^{(r)(0)} - \beta_{i_2}^{(r)(0)}$, and $\eta_\delta^{(c)(0)} = \beta_{j_1}^{(c)(0)} - \beta_{j_2}^{(c)(0)}$, with $\Lambda_r^{(0)}$ and $\Lambda_c^{(0)}$ set to zero. The ADMM based algorithm is summarized in Algorithm 1.

Algorithm 1.

Input:
 Response vector Y, basis expansion design matrix U, and difference matrix M;
 Tuning parameters γ1 and γ2. Specific to MCP, regularization parameter τ;
Output:
 Coefficient vector β, splitting variables Hr and Hc, and dual variables Λr and Λc;
1: repeat
2: for m = 0, 1, 2 … do
3: Update β by (7).
4: Update Hr by (9).
5: Update Hc by (10).
6: Update Λr and Λc by (5).
7: end for
8: until the stopping criteria are met, which are set as $\|r_r^{(m+1)}\|_2 \le \epsilon_1^{\mathrm{pri}}$, $\|r_c^{(m+1)}\|_2 \le \epsilon_2^{\mathrm{pri}}$, $\|s_r^{(m+1)}\|_2 \le \epsilon_1^{\mathrm{dual}}$, and $\|s_c^{(m+1)}\|_2 \le \epsilon_2^{\mathrm{dual}}$ in our numerical study.

Proposition 1.

Denote the two primal residuals as $r_r^{(m+1)} = B_r\beta^{(m+1)} - \mathrm{vec}(H_r^{(m+1)})$ and $r_c^{(m+1)} = B_c\beta^{(m+1)} - \mathrm{vec}(H_c^{(m+1)})$, and the two dual residuals as $s_r^{(m+1)} = \theta B_r^\top[\mathrm{vec}(H_r^{(m+1)}) - \mathrm{vec}(H_r^{(m)})]$ and $s_c^{(m+1)} = \theta B_c^\top[\mathrm{vec}(H_c^{(m+1)}) - \mathrm{vec}(H_c^{(m)})]$. Then

$\lim_{m\to\infty}\|r_r^{(m+1)}\|_2^2 = 0, \quad \lim_{m\to\infty}\|r_c^{(m+1)}\|_2^2 = 0, \quad \lim_{m\to\infty}\|s_r^{(m+1)} + s_c^{(m+1)}\|_2^2 = 0$.

This result establishes convergence of the proposed algorithm. In numerical analysis, we stop the algorithm and conclude convergence when $\|r_r^{(m+1)}\|_2 \le \epsilon_1^{\mathrm{pri}}$, $\|r_c^{(m+1)}\|_2 \le \epsilon_2^{\mathrm{pri}}$, $\|s_r^{(m+1)}\|_2 \le \epsilon_1^{\mathrm{dual}}$, and $\|s_c^{(m+1)}\|_2 \le \epsilon_2^{\mathrm{dual}}$. Following [5], we set the tolerance parameters as follows:

$\epsilon_1^{\mathrm{pri}} = \sqrt{|\Delta^{(r)}|pq}\,\epsilon^{\mathrm{abs}} + \epsilon^{\mathrm{rel}}\max\{\|B_r\beta^{(m+1)}\|_2, \|\mathrm{vec}(H_r^{(m+1)})\|_2\}$,
$\epsilon_2^{\mathrm{pri}} = \sqrt{|\Delta^{(c)}|pN}\,\epsilon^{\mathrm{abs}} + \epsilon^{\mathrm{rel}}\max\{\|B_c\beta^{(m+1)}\|_2, \|\mathrm{vec}(H_c^{(m+1)})\|_2\}$,
$\epsilon_1^{\mathrm{dual}} = \sqrt{Nqp}\,\epsilon^{\mathrm{abs}} + \epsilon^{\mathrm{rel}}\|B_r^\top\mathrm{vec}(\Lambda_r^{(m+1)})\|_2$,
$\epsilon_2^{\mathrm{dual}} = \sqrt{Nqp}\,\epsilon^{\mathrm{abs}} + \epsilon^{\mathrm{rel}}\|B_c^\top\mathrm{vec}(\Lambda_c^{(m+1)})\|_2. \quad (11)$

Here $\epsilon^{\mathrm{abs}}$ and $\epsilon^{\mathrm{rel}}$ are predetermined small values, for example $10^{-3}$. In all of our numerical analysis, convergence is satisfactorily achieved within a small to moderate number of iterations. The code and example are publicly available at https://github.com/ruiqwy/Biclustering.
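A possible sketch of checking the stopping rule with the tolerances in (11), following the ADMM recommendations of [5]; all argument names are ours, and the residuals are those defined in Proposition 1 below.

```r
# Convergence check for Algorithm 1 based on primal/dual residuals.
converged <- function(Br, Bc, beta, Hr, Hc, Hr_old, Hc_old, Lr, Lc,
                      theta, eps_abs = 1e-3, eps_rel = 1e-3) {
  rr <- Br %*% beta - c(Hr)                        # primal residuals
  rc <- Bc %*% beta - c(Hc)
  sr <- theta * crossprod(Br, c(Hr) - c(Hr_old))   # dual residuals
  sc <- theta * crossprod(Bc, c(Hc) - c(Hc_old))
  e1p <- sqrt(length(rr)) * eps_abs +
    eps_rel * max(sqrt(sum((Br %*% beta)^2)), sqrt(sum(Hr^2)))
  e2p <- sqrt(length(rc)) * eps_abs +
    eps_rel * max(sqrt(sum((Bc %*% beta)^2)), sqrt(sum(Hc^2)))
  e1d <- sqrt(length(beta)) * eps_abs +
    eps_rel * sqrt(sum(crossprod(Br, c(Lr))^2))
  e2d <- sqrt(length(beta)) * eps_abs +
    eps_rel * sqrt(sum(crossprod(Bc, c(Lc))^2))
  sqrt(sum(rr^2)) <= e1p && sqrt(sum(rc^2)) <= e2p &&
    sqrt(sum(sr^2)) <= e1d && sqrt(sum(sc^2)) <= e2d
}
```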

2.4. Statistical properties

For a vector $z = (z_1, \dots, z_s)^\top\in\mathbb{R}^s$, let $\|z\|_\infty = \max_{1\le l\le s}|z_l|$. For a matrix $Z\in\mathbb{R}^{s\times h}$, let $\|Z\|_2 = \max_{v\in\mathbb{R}^h, \|v\|_2 = 1}\|Zv\|_2$ and $\|Z\|_\infty = \max_{1\le i\le s}\sum_{j=1}^h|Z_{i,j}|$. For any two sequences of real numbers $\{a_n\}_{n\ge1}$ and $\{b_n\}_{n\ge1}$, denote $b_n\ll a_n$ if $b_n/a_n = o(1)$. Let $r$ be a positive integer, $v\in(0, 1]$, and $\kappa = r + v > 1.5$. Let $\mathcal{H}$ be the collection of functions $g$ on $\mathcal{T} = [0, 1]$ whose $r$th derivative $g^{(r)}$ exists and satisfies the Lipschitz condition of order $v$:

$|g^{(r)}(z_1) - g^{(r)}(z_2)| \le C|z_1 - z_2|^v, \quad 0\le z_1, z_2\le 1$,

and C is a positive constant.

Define the following collections of index sets for clustering memberships: $G^{(r)} = (G_1^{(r)}, \dots, G_{K_r}^{(r)})$ for samples, $G^{(c)} = (G_1^{(c)}, \dots, G_{K_c}^{(c)})$ for covariates, and $G^{(r,c)} = (G_{1,1}^{(r,c)}, \dots, G_{k_r,k_c}^{(r,c)}, \dots, G_{K_r,K_c}^{(r,c)})$ for both samples and covariates. Define $\mathcal{M}_G = \{\beta\in\mathbb{R}^{Nqp}: \beta_{i_1,j_1} = \beta_{i_2,j_2}, \text{ for any } (i_1,j_1), (i_2,j_2)\in G_{k_r,k_c}^{(r,c)}, 1\le k_r\le K_r, 1\le k_c\le K_c\}$. Let $|G_{k_r}^{(r)}|$, $|G_{k_c}^{(c)}|$, and $|G_{k_r,k_c}^{(r,c)}|$ be the sizes of $G_{k_r}^{(r)}$, $G_{k_c}^{(c)}$, and $G_{k_r,k_c}^{(r,c)}$, respectively. Further define $|G_{\min}^{(r)}| = \min_{1\le k_r\le K_r}|G_{k_r}^{(r)}|$, $|G_{\min}^{(c)}| = \min_{1\le k_c\le K_c}|G_{k_c}^{(c)}|$, and $|G_{\min}^{(r,c)}| = |G_{\min}^{(r)}|\times|G_{\min}^{(c)}|$; $|G_{\max}^{(r,c)}|$ can be defined accordingly. Let $\rho(t) = \gamma^{-1}p_\tau(t, \gamma)$. Assume the following conditions.

(C1) $g_{k_r,k_c}\in\mathcal{H}$ for all $k_r\in\{1, \dots, K_r\}$, $k_c\in\{1, \dots, K_c\}$, and $|G_{\max}^{(r,c)}|^{1/(2\kappa)}\ll p\ll|G_{\min}^{(r,c)}|^{1/3}$.

(C2) The distribution of the $t_{i,j,m}$'s, $i\in\{1, \dots, N\}$, $j\in\{1, \dots, q\}$, $m\in\{1, \dots, n_{i,j}\}$, follows a density function $f_T$, which is absolutely continuous. There exist constants $c_1$ and $C_1$ such that $0 < c_1\le\min_{t\in\mathcal{T}}f_T(t)\le\max_{t\in\mathcal{T}}f_T(t)\le C_1 < \infty$.

(C3) The $n_{i,j}$'s are uniformly bounded for all $i\in\{1, \dots, N\}$, $j\in\{1, \dots, q\}$.

(C4) $p_\tau(t, \gamma)$ is symmetric, non-decreasing, and concave in $t$ for $t\in[0, \infty)$. There exists a constant $0 < a < \infty$ such that $\rho(t)$ is constant for all $t\ge a\gamma$, and $\rho(0) = 0$. $\rho'(t)$ exists and is continuous except for a finite number of $t$, and $\rho'(0+) = 1$.

(C5) Let $\epsilon_{i,j} = (\epsilon_{i,j,1}, \dots, \epsilon_{i,j,n_{i,j}})^\top$, where the $\epsilon_{i,j,m}$'s are independent across $(i, j)$ (among different individual observational vectors) and correlated across $m$ (within the same $(i, j)$). Furthermore, there exist $F > 0$ and $c_2 > 0$ such that for all $i\in\{1, \dots, N\}$ and $j\in\{1, \dots, q\}$,

$E(\exp\{F|n_{i,j}^{-1}\epsilon_{i,j}^\top\epsilon_{i,j}|^{1/2}\})\le c_2$.

Similar conditions have been assumed in the literature. The first condition in (C1) ensures that Hölder's condition is satisfied [36]. The second condition in (C1) pertains to the growth rate of the number of internal knots, in a way similar to [25] and [24]. Condition (C2) assumes the boundedness of the density function, similarly to [48] and others. Conditions similar to (C3) have been commonly made. In the analysis of high-dimensional data, conditions similar to (C4) have been common, and it is easy to verify that MCP and SCAD satisfy (C4). Condition (C5) gives the boundedness condition for the error terms, and a similar condition can be found in [11].

When the true clustering structure is known, the oracle estimator for β can be defined as

$\hat\beta^{or} = \arg\min_{\beta\in\mathcal{M}_G}\frac{1}{2}\sum_{k_r=1}^{K_r}\sum_{k_c=1}^{K_c}\sum_{(i,j)\in G_{k_r,k_c}^{(r,c)}}\left\{\|Y_{i,j} - U_{i,j}\beta_{i,j}\|_2^2 + \gamma_1\beta_{i,j}^\top D\beta_{i,j}\right\}$,

and $\hat g_{(k_r,k_c)}^{or}$ is defined as the oracle estimator of $g_{(k_r,k_c)}$ based on $\hat\beta^{or}$. Let $\beta^*$ be the underlying true coefficient vector and $g_{(k_r,k_c)}^*$ be the true value of $g_{(k_r,k_c)}$. For any $L_2$-integrable function $g$, denote $\|g\| = \left(\int_{t\in\mathcal{T}}g^2(t)f_T(t)\,dt\right)^{1/2}$.

Theorem 1.

Assume that (C1)–(C5) hold. If $\gamma_1 = o(|G_{\min}^{(r,c)}|^{-1/2})$ and $p\log(Nq) \ll |G_{\min}^{(r,c)}|$, then with probability at least $1 - 3K_rK_cp/(Nq)$,

$\sup_{1\le i\le N, 1\le j\le q}\|\hat\beta_{i,j}^{or} - \beta_{i,j}^*\|_2 \le \psi, \quad \sup_{1\le k_r\le K_r, 1\le k_c\le K_c}\|\hat g_{(k_r,k_c)}^{or} - g_{(k_r,k_c)}^*\| \le \psi$,

where $\psi = C^*(p\log(Nq)/|G_{\min}^{(r,c)}|)^{1/2}$, and $C^*$ is a large constant.

This theorem establishes consistency of the oracle estimators with high probability. Denote $b = \min_{(k_r,k_c)\ne(k_r',k_c')}\|g_{(k_r,k_c)}^* - g_{(k_r',k_c')}^*\|$. We can further establish the following result.

Theorem 2.

Assume that (C1)–(C5) and the conditions in Theorem 1 hold. If $b \gg \gamma_2|G_{\min}^{(c)}|^{-1/2}$, $b \gg (N/q)^{1/2}\gamma_2|G_{\min}^{(r)}|^{-1/2}$, and $\gamma_2 \gg (pq)^{1/2}\log(Nq)/\min\{|G_{\min}^{(r)}|, |G_{\min}^{(c)}|\}$, then there exists a local minimizer $\hat\beta$ of $L(\beta)$ satisfying

$P(\hat\beta = \hat\beta^{or}) \to 1 \quad \text{as } N, q \to \infty$.

This theorem establishes that the oracle estimator is, with high probability, a local minimizer of the objective function. The estimation consistency, along with the separateness of the true functions, leads to clustering consistency.

3. Simulation

We conduct simulation to assess performance of the proposed approach and gauge it against the following alternatives: (a) the bKmeans method [1], which first fits each curve using B-splines and then clusters the estimated coefficients using the k-means technique by rows and columns; (b) the funHDDC method [33], which has been developed for multivariate functional data clustering based on latent mixture models and has been applied to longitudinal data; and (c) the funLBM method [4], which has been developed for functional data biclustering based on latent block models. Here we note that the proposed and funLBM methods conduct biclustering directly, whereas the bKmeans and funHDDC methods were originally designed for one-way clustering; hence they are applied twice to achieve both row and column clustering. In addition, the funHDDC and funLBM methods are not directly applicable to functional data with unequal measurements. We apply imputation [26] to tackle this problem. As discussed in Section 1, biclustering methods for functional data are very limited. It is possible to modify other existing one-way functional clustering methods to achieve biclustering; however, this demands additional methodological developments. The three alternatives considered here have been chosen because of their closely related frameworks and competitive performance.

In evaluation, we examine both clustering and estimation accuracy. Specifically, when examining clustering accuracy, we consider the estimated numbers of row clusters K^r, column clusters K^c, and biclusters K^b. In addition, we use the Rand index and adjusted Rand index to assess the accuracy of clustering, including RIr and ARIr for row clustering, RIc and ARIc for column clustering, and RIb and ARIb for biclustering. The Rand index is defined by RI = (TP + TN)/(TP + FP + FN + TN), where for example TP is the true positive count, defined as the number of sample pairs from the same cluster and assigned to the same cluster, and the other counts can be defined accordingly. As the Rand index tends to be large even under random clusterings, we also examine the adjusted Rand index defined as ARI = (RI − E(RI))/(max(RI) − E(RI)), which can partly correct this problem. To evaluate estimation accuracy, we examine the integrated squared error (ISE) defined as

$\mathrm{ISE} = \frac{1}{n}\sum_{k_r=1}^{K_r}\sum_{k_c=1}^{K_c}\sum_{(i,j)\in G_{k_r,k_c}^{(r,c)}}\sum_{m=1}^{n_{i,j}}\left\{g_{(k_r,k_c)}(t_{i,j,m}) - \hat g_{i,j}(t_{i,j,m})\right\}^2$.
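For concreteness, the Rand index can be computed directly from pairwise co-membership agreement, as in the R sketch below; the adjusted version is available in existing software, for example `mclust::adjustedRandIndex`. The label vectors are illustrative inputs.

```r
# Rand index: fraction of pairs on which two clusterings agree,
# i.e., (TP + TN) / (TP + FP + FN + TN).
rand_index <- function(labels_true, labels_est) {
  same_true <- outer(labels_true, labels_true, "==")
  same_est  <- outer(labels_est, labels_est, "==")
  pairs <- upper.tri(same_true)  # each unordered pair counted once
  mean(same_true[pairs] == same_est[pairs])
}
# Adjusted Rand index, e.g.:
# mclust::adjustedRandIndex(labels_true, labels_est)
```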

We consider a total of $K_b = 9$ biclusters, which are formed by $K_r = 3$ sample (row) clusters and $K_c = 3$ covariate (column) clusters. $Y_{i,j,m} = g_{(k_r,k_c)}(t_{i,j,m}) + \epsilon_{i,j,m}$ with the $t_{i,j,m}$'s, $m\in\{1, \dots, 10\}$, equally spaced on [0, 1]. The nine true functional forms are $g_{(1,1)}(t) = \cos(2\pi t)$, $g_{(2,1)}(t) = 1 - 2\exp(-6t)$, $g_{(3,1)}(t) = -1.5t$, $g_{(1,2)}(t) = 1 + \sin(2\pi t)$, $g_{(2,2)}(t) = 2t^2$, $g_{(3,2)}(t) = t + 1$, $g_{(1,3)}(t) = 2(\sin(2\pi t) + \cos(2\pi t))$, $g_{(2,3)}(t) = 1 + t^3$, and $g_{(3,3)}(t) = 2t + 1$. They are also graphically presented in Fig. 1. To better mimic real data, we allow a certain proportion ($\zeta$) of the curves from each bicluster to have 20% missing measurements. When implementing the proposed approach, we choose smoothing splines with the number of internal knots $J = 3$. We also fix $\theta = 1$ and $\tau = 3$. In what follows, under Examples 1 and 2, $N > q$, whereas under Example 3, $N = q$. Under Examples 1–3, the random errors are independent, whereas under Example 4, they are correlated. Note that under Examples 1–4, simulation results are calculated based on automatic cluster selection. Example 5 is designed to investigate the performance of these methods when the numbers of clusters are correctly prespecified. A total of 100 replicates are simulated under each setting.

Fig. 1. Example 1: Curves of observed data (black dotted), estimated (blue solid) by the proposed method, and true (red solid) functions with (a) N = 30 and (b) N = 90 for one replicate.

Example 1.

N = 30, 60, and 90. q = 9. The clusters are balanced, with each row cluster containing N/3 samples and each column cluster containing q/3 covariates. ζ = 0.3. The random errors are iid $N(0, 0.6^2)$.

Example 2.

The settings are the same as in Example 1, except that the clusters are unbalanced. The row clusters have sizes 1:2:3, and the column clusters have sizes 2:3:4.

Example 3.

Set (N, q) = (30, 30), (39, 39), (45, 45), ζ = 0.3 and 0.4. The rest are the same as in Example 1.

Example 4.

The settings are similar to those under Example 1. The random errors are correlated with an AR(1) correlation structure, where AR stands for autoregressive. Consider AR coefficients ϕ = 0.2 and 0.8, representing weak and strong correlations.

Example 5.

The settings are the same as those in Example 1. The difference is that the numbers of clusters are correctly prespecified instead of being selected by the BIC criterion.
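As an illustration of the simulation design, a minimal R sketch generating Example 1-style data (without the missingness mechanism) might look as follows; `gen_example1` is our illustrative name.

```r
# Example 1-style data: 3 x 3 balanced biclusters, 10 equally spaced
# time points on [0, 1], iid N(0, 0.6^2) errors.
gen_example1 <- function(N = 30, q = 9, sigma = 0.6) {
  g <- list(function(t) cos(2 * pi * t), function(t) 1 - 2 * exp(-6 * t),
            function(t) -1.5 * t, function(t) 1 + sin(2 * pi * t),
            function(t) 2 * t^2, function(t) t + 1,
            function(t) 2 * (sin(2 * pi * t) + cos(2 * pi * t)),
            function(t) 1 + t^3, function(t) 2 * t + 1)
  tm <- seq(0, 1, length.out = 10)
  row_grp <- rep(1:3, each = N / 3)  # balanced row clusters
  col_grp <- rep(1:3, each = q / 3)  # balanced column clusters
  Y <- array(NA, dim = c(N, q, 10))
  for (i in 1:N) for (j in 1:q) {
    k <- (col_grp[j] - 1) * 3 + row_grp[i]  # index of g_{(k_r, k_c)}
    Y[i, j, ] <- g[[k]](tm) + rnorm(10, sd = sigma)
  }
  Y
}
```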

Results for Example 1 are presented in Figs. 1 and 2 as well as Tables 1 and 2. More specifically, in Fig. 1, we show the true functions for all clusters as well as sample observed data and estimated functions. In Table 1, we summarize the numbers of identified row and column clusters as well as biclusters. In Table 2, we summarize the Rand and adjusted Rand index values. In Fig. 2, we present the boxplots of ISE (note that different panels have different ranges for the Y-axis). Results for Examples 2–5 are presented in the Supplementary Material. Although different examples have different numerical results, overall, the advantage of the proposed approach is clearly observed. Consider for example Table 1 with N = 30. The proposed approach has a mean number of row clusters of 2.83, compared to 2.76, 2.63, and 4.66 for the three alternatives. When N = 90, the proposed approach has a mean number of biclusters of 8.71, compared to 3.51, 6.33, and 10.68 for the three alternatives. The improved clustering accuracy is further supported by the Rand index values in Table 2. For example, with N = 90, the adjusted Rand index value for biclustering with the proposed approach is 0.964, compared to 0.358, 0.686, and 0.764 with the three alternatives. Fig. 2 shows that as N increases, estimation accuracy of the proposed approach (and two alternatives) increases. Under all three N values, the proposed approach has significantly smaller ISE values. Moreover, comparing the results of Example 5 with Example 1, we observe similar performance, and the proposed approach still performs better when the numbers of clusters are correctly prespecified.

Fig. 2. Example 1: Boxplots of ISE with (a) the proposed method, (b) bKmeans, (c) funHDDC, and (d) funLBM.

Table 2.

Example 1: Mean and standard error (shown in parentheses) of RIr, ARIr, RIc, ARIc, RIb, and ARIb based on 100 replicates.

N Method RIr ARIr RIc ARIc RIb ARIb

30 Proposed 0.940 (0.189) 0.911 (0.278) 0.936 (0.203) 0.910 (0.279) 0.927 (0.238) 0.909 (0.281)
bKmeans 0.860 (0.173) 0.740 (0.290) 0.296 (0.163) 0.052 (0.194) 0.673 (0.174) 0.307 (0.167)
funHDDC 0.744 (0.031) 0.493 (0.074) 0.940 (0.107) 0.880 (0.215) 0.889 (0.051) 0.598 (0.120)
funLBM 0.913 (0.053) 0.786 (0.109) 0.913 (0.064) 0.746 (0.153) 0.951 (0.029) 0.708 (0.113)
60 Proposed 0.966 (0.138) 0.947 (0.208) 0.963 (0.152) 0.945 (0.212) 0.959 (0.177) 0.943 (0.216)
bKmeans 0.887 (0.132) 0.780 (0.248) 0.316 (0.195) 0.077 (0.239) 0.704 (0.142) 0.339 (0.191)
funHDDC 0.767 (0.021) 0.546 (0.049) 0.998 (0.025) 0.995 (0.050) 0.922 (0.014) 0.692 (0.044)
funLBM 0.918 (0.110) 0.828 (0.221) 0.929 (0.119) 0.840 (0.257) 0.953 (0.052) 0.796 (0.198)
90 Proposed 0.978 (0.117) 0.966 (0.176) 0.975 (0.131) 0.965 (0.178) 0.971 (0.154) 0.964 (0.180)
bKmeans 0.886 (0.134) 0.778 (0.251) 0.342 (0.226) 0.109 (0.279) 0.709 (0.152) 0.358 (0.227)
funHDDC 0.769 (0.017) 0.551 (0.040) 0.990 (0.049) 0.980 (0.098) 0.919 (0.025) 0.686 (0.061)
funLBM 0.909 (0.121) 0.813 (0.241) 0.908 (0.130) 0.793 (0.276) 0.944 (0.056) 0.764 (0.210)

4. Applications

Here we analyze two time-course gene expression datasets. Although the data characteristics are in a sense similar, the two analyses may serve different purposes. In particular, the first dataset is "older", has been analyzed multiple times in the literature, and has a clearer sample clustering structure. In contrast, the second dataset is more recent, and its analysis may lead to a higher practical impact.

4.1. T-cell data

This dataset was generated in a study of T-cell activation [31]. It is publicly available in the R package longitudinal (http://www.strimmerlab.org/software/longitudinal/) and contains two subsets: tcell.10 and tcell.34. The first subset contains measurements for 10 samples and 58 genes at 10 unequally spaced time points, t ∈ {0, 2, 4, 6, 8, 18, 24, 32, 48, 72}, whereas the second contains measurements for 34 samples and the same genes at the same time points. In [31], the distinctions between the two subsets have been noted, and they have been combined for analysis. Prior to analysis, we conduct data processing, including gene expression normalization using the method developed in [29] and linear transformation of the observed times to [0, 1], and set the knots at 0.06, 0.2, and 0.4 and the order to 3.

The proposed approach identifies two sample clusters, with sizes 10 and 34, which exactly match the original subset structure. The distinctions between the samples in the two subsets have been noted in [31]. As such, they are supposed to belong to different clusters. In this sense, our "finding", although expected, is reassuring. In addition, eight gene clusters are identified, among which there are four trivial clusters of size one. The four non-trivial clusters have sizes 27, 18, 5, and 4. Detailed information on the gene clusters is available from the authors. The eight non-trivial biclusters are presented in Fig. 3. Biclusters 1–4 correspond to tcell.10, and the rest correspond to tcell.34. It is observed that the estimated functions clearly differ across biclusters. The observed temporal trends are highly similar to those reported in [28], which provides support for the validity of our approach.

Fig. 3. Analysis of T-cell data: Curves of observed data (black dotted) and estimated functions (blue solid) for the eight non-trivial biclusters, as well as yellow points indicating the estimated values at t ∈ {0, 2, 4, 6, 8, 18, 24, 32, 48, 72} by the proposed method.

The three alternatives are also applied. The bKmeans approach identifies three sample clusters (with sizes 10, 17, and 17) and four gene clusters (with sizes 9, 15, 19, and 15). Compared to the proposed approach, the adjusted Rand index values are 0.441 (sample), 0.619 (gene), and 0.430 (bicluster). The funHDDC approach identifies two sample clusters (with sizes 10 and 34) and two gene clusters (with sizes 9 and 49). Compared to the proposed approach, the adjusted Rand index values are 1.000 (sample), 0.286 (gene), and 0.452 (bicluster). The funLBM approach identifies two sample clusters (with sizes 10 and 34) and six gene clusters (with sizes 9, 4, 12, 5, 18, and 10). Compared to the proposed approach, the adjusted Rand index values are 1.000 (sample), 0.586 (gene), and 0.646 (bicluster). Unlike for the simulated data, it is difficult to objectively evaluate the accuracy of clustering. However, for the proposed approach, the matching with the original sample distinction and published findings can provide a strong support, which is not shared by the alternatives.

4.2. Vaccine data

This dataset was generated in a relatively recent study [43] and is available at GEO with identifier GSE124533. The study settings have been described in detail in [43]. Briefly, it concerns the time course of whole blood gene expressions, and the samples are healthy adults residing in an inpatient unit. The samples have been randomized into three protocols (305A, 305B, and 305C). Within each protocol, samples have been randomized to receive immunization via either vaccine or saline placebo. The treatments have been referred to as YFV and VZV (under 305A), HBV1 and HBV3 (under 305B), and TIV and ATIV (under 305C). In this experiment, gene expression levels are measured at t ∈ {1, 2, 3, 4, 5, 7, 14, 21, 28} days after immunization. A total of 43 genes have been studied, which are selected from two gene modules defined in the published literature [6,22]. Prior to analysis, gene expression normalization, rescaling of the time points (to the unit interval), and the selection of knots and order are conducted in a similar way as in the previous data analysis.

Two sets of analysis are conducted. In the first set, we focus on the samples under 305A, which contain 20 samples treated with VZV and 20 with YFV. In the second set, we pool all 122 samples from the three protocols. We note that although the gene time courses have been analyzed in [43], there has been insufficient attention to clustering. Complementary to the existing literature, our clustering analysis can potentially reveal sample heterogeneity as well as coordination among genes.

Results for the first set of analysis are presented in Fig. 5, where we observe two sample clusters and two gene clusters, leading to four biclusters. Here the two sample clusters match the VZV and YFV experimental conditions, which provides support for the validity of our analysis. The two gene clusters contain 27 and 16 members, respectively, which is very close to the module structure. Fig. 5 shows that the temporal trends of the four biclusters differ significantly, with the level of variation and the position of the "peak" varying notably. The observed trends are similar to those reported in [43]. We also refer to [43] for pharmacodynamic interpretations of the findings.

In the second set of analysis, we identify four sample clusters, with sizes 96, 5, 20, and 1. In what follows, we focus on the non-trivial clusters. Clusters 1 and 2 contain samples treated with VZV, HBV1, HBV3, ATIV, and TIV, and cluster 3 contains samples treated with YFV. In the original publication, there has been little attention to sample similarity/difference across protocols. Our analysis may suggest the significant difference between YFV and the other treatments as well as the relative similarity of the five treatments (YFV excluded). Our analysis leads to two gene clusters, with sizes 25 and 18. This structure is again very similar to the module structure. The six non-trivial biclusters are shown in Fig. 4, where we observe significant across-cluster differences. Among the six patterns, biclusters 5 and 6 are similar to those observed in the first set of analysis, whereas biclusters 1–4 are relatively different.

Fig. 4. Analysis of vaccine data with samples under all three protocols: Curves of observed data (black dotted) and estimated functions (blue solid) for non-trivial clusters, as well as yellow points indicating the estimated values at t ∈ {1, 2, 3, 4, 5, 7, 14, 21, 28} by the proposed method.

The three alternatives are also applied. The bKmeans approach identifies three sample clusters (with sizes 20, 27, and 75) and two gene clusters (with sizes 26 and 17). Compared to the proposed approach, the adjusted Rand index values are 0.551 (sample), 0.907 (gene), and 0.666 (bicluster). The funHDDC approach identifies two sample clusters (with sizes 20 and 102) and three gene clusters (with sizes 26, 12, and 5). Compared to the proposed approach, the adjusted Rand index values are 0.819 (sample), 0.774 (gene), and 0.758 (bicluster). The funLBM approach identifies four sample clusters (with sizes 20, 39, 24, and 39) and two gene clusters (with sizes 20 and 23). Compared to the proposed approach, the adjusted Rand index values are 0.276 (sample), 0.818 (gene), and 0.386 (bicluster).

5. Discussion

In this article, we have conducted biclustering analysis when functions (more precisely, their realizations at discrete time points), as opposed to scalars, are present. The data structure fits time-course gene expression and other experiments. The analysis objective is considerably more complex than the biclustering analysis of scalars and the one-way clustering of functions. We have developed a novel approach based on the penalized fusion technique. Methodologically, it differs significantly from the existing biclustering and fusion approaches. Theoretically, it has the much-desired consistency property, making it advantageous over some of the existing alternatives that do not have theoretical support. Numerically, it has generated more accurate clustering and estimation in simulation and led to different findings in data analysis.

In our estimation, we have adopted the penalized smoothing technique. An alternative, which may be computationally simpler, is to take fewer basis functions, with which we can eliminate the smoothness penalty. Theoretically and numerically, we expect similar performance. The fusion technique involves pairwise differences/penalties, which may incur a higher computational cost when N and/or q are large. In our simulation, we have considered moderate values, which match our data analysis. It will be of interest to develop computationally more scalable approaches/algorithms, for example via model averaging. This is beyond our scope and is left for future research. In data analysis, findings with certain support have been made. In the literature, most existing studies are on the "static" functionalities of genes. It will be important to further understand the dynamics of gene expressions and more solidly interpret the findings.

Supplementary Material

Supplemental Material

Table 1.

Example 1: Mean, median, and standard error (SE) of K^r, K^c, and K^b as defined in Section 2, as well as the percentage (Per) of identifying the corresponding true numbers, based on 100 replicates.

N Method K^r (Mean Median SE Per) K^c (Mean Median SE Per) K^b (Mean Median SE Per)

30 Proposed 2.83 3.00 0.53 0.90 2.83 3.00 0.53 0.90 8.29 9.00 2.18 0.90
bKmeans 2.76 3.00 0.64 0.66 1.13 1.00 0.46 0.05 3.09 3.00 1.34 0.03
funHDDC 2.63 2.00 0.86 0.28 2.76 3.00 0.43 0.76 7.27 6.00 2.70 0.21
funLBM 4.66 5.00 0.64 0.09 4.43 5.00 0.83 0.22 20.88 25.00 5.31 0.09
60 Proposed 2.91 3.00 0.43 0.93 2.90 3.00 0.41 0.94 8.61 9.00 1.74 0.93
bKmeans 2.86 3.00 0.57 0.66 1.18 1.00 0.54 0.07 3.43 3.00 1.97 0.05
funHDDC 2.20 2.00 0.64 0.04 2.99 3.00 0.10 0.99 6.58 6.00 1.92 0.04
funLBM 3.42 3.00 0.64 0.66 3.24 3.00 0.55 0.82 11.15 9.00 3.31 0.55
90 Proposed 2.93 3.00 0.36 0.96 2.93 3.00 0.36 0.96 8.71 9.00 1.45 0.96
bKmeans 2.83 3.00 0.51 0.74 1.23 1.00 0.58 0.08 3.51 3.00 1.87 0.08
funHDDC 2.14 2.00 0.38 0.12 2.96 3.00 0.20 0.96 6.33 6.00 1.17 0.11
funLBM 3.25 3.00 0.46 0.76 3.30 3.00 0.54 0.74 10.68 9.00 2.03 0.52

Acknowledgments

We thank the Editor-in-Chief, Managing Editor, and two reviewers for insightful comments and suggestions. This work was supported by the National Natural Science Foundation of China (11971404, 72071169), Humanity and Social Science Youth Foundation of Ministry of Education of China (19YJC910010), Basic Scientific Project 71988101 of National Science Foundation of China, 111 Project (B13028), National Institutes of Health (CA204120), and National Science Foundation (1916251).

Appendix A. Proofs

Proof of Proposition 1.

By the definitions of $H_r^{(m+1)}$ and $H_c^{(m+1)}$, for any $H_r$ and $H_c$, we have

$L_\theta(\beta^{(m+1)}, H_r^{(m+1)}, H_c^{(m+1)}, \Lambda_r^{(m)}, \Lambda_c^{(m)}) \le L_\theta(\beta^{(m+1)}, H_r, H_c, \Lambda_r^{(m)}, \Lambda_c^{(m)})$.

Fig. 5. Analysis of vaccine data with samples under 305A: Curves of observed data (black dotted) and estimated functions (blue solid), as well as yellow points indicating the estimated values at t ∈ {1, 2, 3, 4, 5, 7, 14, 21, 28} by the proposed method.

Let $\Xi(\beta^{(m+1)}) = \{(H_r, H_c): B_r\beta^{(m+1)} - \mathrm{vec}(H_r) = 0,\ B_c\beta^{(m+1)} - \mathrm{vec}(H_c) = 0\}$ and $P = \sum_{\delta\in\Delta^{(r)}}p_\tau(\|\eta_\delta^{(r)}\|_2, \gamma_2) + \sum_{\delta\in\Delta^{(c)}}p_\tau(\|\eta_\delta^{(c)}\|_2, (N/q)^{1/2}\gamma_2)$. We can define

$f^{(m+1)} = \inf_{\Xi(\beta^{(m+1)})}\left\{\frac{1}{2}\|Y - U\beta^{(m+1)}\|_2^2 + \frac{1}{2}\gamma_1\beta^{(m+1)\top}M\beta^{(m+1)} + P\right\} = \inf_{\Xi(\beta^{(m+1)})}L_\theta(\beta^{(m+1)}, H_r, H_c, \Lambda_r^{(m)}, \Lambda_c^{(m)})$,

and then $L_\theta(\beta^{(m+1)}, H_r^{(m+1)}, H_c^{(m+1)}, \Lambda_r^{(m)}, \Lambda_c^{(m)}) \le f^{(m+1)}$.

For any integer $n$, we have $\mathrm{vec}(\Lambda_r^{(m+n-1)}) = \mathrm{vec}(\Lambda_r^{(m)}) + \theta\sum_{i=1}^{n-1}[\mathrm{vec}(H_r^{(m+i)}) - B_r\beta^{(m+i)}]$ and $\mathrm{vec}(\Lambda_c^{(m+n-1)}) = \mathrm{vec}(\Lambda_c^{(m)}) + \theta\sum_{i=1}^{n-1}[\mathrm{vec}(H_c^{(m+i)}) - B_c\beta^{(m+i)}]$, and then

$L_\theta(\beta^{(m+n)}, H_r^{(m+n)}, H_c^{(m+n)}, \Lambda_r^{(m+n-1)}, \Lambda_c^{(m+n-1)}) = \frac{1}{2}\|Y - U\beta^{(m+n)}\|_2^2 + \frac{1}{2}\gamma_1\beta^{(m+n)\top}M\beta^{(m+n)} + P + \{\mathrm{vec}(\Lambda_r^{(m)}) + \theta\sum_{i=1}^{n-1}[\mathrm{vec}(H_r^{(m+i)}) - B_r\beta^{(m+i)}]\}^\top[\mathrm{vec}(H_r^{(m+n)}) - B_r\beta^{(m+n)}] + \{\mathrm{vec}(\Lambda_c^{(m)}) + \theta\sum_{i=1}^{n-1}[\mathrm{vec}(H_c^{(m+i)}) - B_c\beta^{(m+i)}]\}^\top[\mathrm{vec}(H_c^{(m+n)}) - B_c\beta^{(m+n)}] + \frac{\theta}{2}\|\mathrm{vec}(H_r^{(m+n)}) - B_r\beta^{(m+n)}\|_2^2 + \frac{\theta}{2}\|\mathrm{vec}(H_c^{(m+n)}) - B_c\beta^{(m+n)}\|_2^2 \le f^{(m+n)}$.

Since the augmented Lagrangian function $L_\theta(\beta, H_r, H_c, \Lambda_r, \Lambda_c)$ is differentiable with respect to $\beta$ and convex with respect to each $\eta_\delta^{(r)}$ and $\eta_\delta^{(c)}$, by Theorem 4.1 of [38] there exists a limit point of $(\beta^{(m)}, H_r^{(m)}, H_c^{(m)})$, denoted by $(\beta^*, H_r^*, H_c^*)$. Then we have

$f^* = \lim_{m\to\infty}f^{(m+1)} = \lim_{m\to\infty}f^{(m+n)} = \inf_{\Xi(\beta^*)}\left\{\frac{1}{2}\|Y - U\beta^*\|_2^2 + \frac{1}{2}\gamma_1\beta^{*\top}M\beta^* + P\right\}$.

For any integer $n$, we have

$\lim_{m\to\infty}L_\theta(\beta^{(m+n)}, H_r^{(m+n)}, H_c^{(m+n)}, \Lambda_r^{(m+n-1)}, \Lambda_c^{(m+n-1)}) = \frac{1}{2}\|Y - U\beta^*\|_2^2 + \frac{1}{2}\gamma_1\beta^{*\top}M\beta^* + P + \lim_{m\to\infty}\mathrm{vec}(\Lambda_r^{(m)})^\top[\mathrm{vec}(H_r^*) - B_r\beta^*] + (n - \tfrac{1}{2})\theta\|\mathrm{vec}(H_r^*) - B_r\beta^*\|_2^2 + \lim_{m\to\infty}\mathrm{vec}(\Lambda_c^{(m)})^\top[\mathrm{vec}(H_c^*) - B_c\beta^*] + (n - \tfrac{1}{2})\theta\|\mathrm{vec}(H_c^*) - B_c\beta^*\|_2^2 \le f^*$.

Thus

$\lim_{m\to\infty}\|r_r^{(m+1)}\|_2^2 = \|B_r\beta^* - \mathrm{vec}(H_r^*)\|_2^2 = 0, \quad \lim_{m\to\infty}\|r_c^{(m+1)}\|_2^2 = \|B_c\beta^* - \mathrm{vec}(H_c^*)\|_2^2 = 0$.

Besides, by the definition of $\beta^{(m+1)}$, we have that

$\partial L_\theta(\beta^{(m+1)}, H_r^{(m)}, H_c^{(m)}, \Lambda_r^{(m)}, \Lambda_c^{(m)})/\partial\beta = -U^\top(Y - U\beta^{(m+1)}) + \gamma_1M\beta^{(m+1)} - \theta B_r^\top[\mathrm{vec}(H_r^{(m)}) + \mathrm{vec}(\Lambda_r^{(m)})/\theta - B_r\beta^{(m+1)}] - \theta B_c^\top[\mathrm{vec}(H_c^{(m)}) + \mathrm{vec}(\Lambda_c^{(m)})/\theta - B_c\beta^{(m+1)}] = -U^\top(Y - U\beta^{(m+1)}) + \gamma_1M\beta^{(m+1)} - B_r^\top\mathrm{vec}(\Lambda_r^{(m)}) - \theta B_r^\top[\mathrm{vec}(H_r^{(m)}) - B_r\beta^{(m+1)}] - B_c^\top\mathrm{vec}(\Lambda_c^{(m)}) - \theta B_c^\top[\mathrm{vec}(H_c^{(m)}) - B_c\beta^{(m+1)}] = -U^\top(Y - U\beta^{(m+1)}) + \gamma_1M\beta^{(m+1)} - B_r^\top\mathrm{vec}(\Lambda_r^{(m+1)}) + \theta B_r^\top[\mathrm{vec}(H_r^{(m+1)}) - \mathrm{vec}(H_r^{(m)})] - B_c^\top\mathrm{vec}(\Lambda_c^{(m+1)}) + \theta B_c^\top[\mathrm{vec}(H_c^{(m+1)}) - \mathrm{vec}(H_c^{(m)})] = 0$.

Then we can obtain

$s_r^{(m+1)} + s_c^{(m+1)} = U^\top(Y - U\beta^{(m+1)}) - \gamma_1M\beta^{(m+1)} + B_r^\top\mathrm{vec}(\Lambda_r^{(m+1)}) + B_c^\top\mathrm{vec}(\Lambda_c^{(m+1)})$.

By $\|B_r\beta^* - \mathrm{vec}(H_r^*)\|_2^2 = 0$ and $\|B_c\beta^* - \mathrm{vec}(H_c^*)\|_2^2 = 0$, we have

$\lim_{m\to\infty}\partial L_\theta(\beta^{(m+1)}, H_r^{(m+1)}, H_c^{(m+1)}, \Lambda_r^{(m)}, \Lambda_c^{(m)})/\partial\beta = -U^\top(Y - U\beta^{(m+1)}) + \gamma_1M\beta^{(m+1)} - B_r^\top\mathrm{vec}(\Lambda_r^{(m+1)}) - B_c^\top\mathrm{vec}(\Lambda_c^{(m+1)}) = 0$.

Therefore $\lim_{m\to\infty}(s_r^{(m+1)} + s_c^{(m+1)}) = 0$. □

Let $|G_{k_r,k_c}^{(r,c)*}| = \sum_{(i,j)\in G_{k_r,k_c}^{(r,c)}}n_{i,j}$ and $n_m = \max_{i\in\{1,\dots,N\}, j\in\{1,\dots,q\}}n_{i,j} < \infty$. Then $|G_{k_r,k_c}^{(r,c)}| \le |G_{k_r,k_c}^{(r,c)*}| \le n_m|G_{k_r,k_c}^{(r,c)}|$. Denote the number of internal knots as $J$, so that $J = p - d$. Recall that $b = \min_{(k_r,k_c)\ne(k_r',k_c')}\|g_{(k_r,k_c)}^* - g_{(k_r',k_c')}^*\|$.

Lemma 1.

Under Condition (C1), there exists a spline approximation $\alpha_{k_r,k_c}^{*\top}U_p(t)$ of the true function $g_{(k_r,k_c)}^*(t)$ for $k_r\in\{1, \dots, K_r\}$ and $k_c\in\{1, \dots, K_c\}$, such that

$\sup_{t\in\mathcal{T}}|g_{(k_r,k_c)}^*(t) - \alpha_{k_r,k_c}^{*\top}U_p(t)| = O(J^{-\kappa})$.

Proof.

Lemma 1 follows from Corollary 6.21 of [34]. This lemma has been used in a number of studies that involve spline expansion [25,42]. We omit the proof here. □

Lemma 2. Under Conditions (C1)–(C3) and $b \gg J^{-\kappa}$, there exists a constant $C_2 > 0$ such that for all $(k_r,k_c)\ne(k_r',k_c')$,

$\|\alpha_{k_r,k_c}^* - \alpha_{k_r',k_c'}^*\|_2 \ge \frac{1}{2}C_2^{-1/2}b$,

when $N$ and $q$ are sufficiently large.

when N and q are sufficiently large.

Proof.

By the triangle inequality, we have

$\|(\alpha_{k_r,k_c}^* - \alpha_{k_r',k_c'}^*)^\top U_p\| \ge \|g_{(k_r,k_c)}^* - g_{(k_r',k_c')}^*\| - \|g_{(k_r,k_c)}^* - \alpha_{k_r,k_c}^{*\top}U_p\| - \|g_{(k_r',k_c')}^* - \alpha_{k_r',k_c'}^{*\top}U_p\|. \quad (A.1)$

Besides, by Theorem 5.4.2 of [13], Condition (C2), and the definition of the rescaled B-spline basis, for any vector $\alpha\in\mathbb{R}^p$, there exists a constant $C_2 > 0$ such that

$\|\alpha^\top U_p\|^2 \le C_2\|\alpha\|_2^2. \quad (A.2)$

Combining (A.1), (A.2), and Lemma 1, we have

$\|\alpha_{k_r,k_c}^* - \alpha_{k_r',k_c'}^*\|_2 \ge C_2^{-1/2}\left\{\|g_{(k_r,k_c)}^* - g_{(k_r',k_c')}^*\| - \|g_{(k_r,k_c)}^* - \alpha_{k_r,k_c}^{*\top}U_p\| - \|g_{(k_r',k_c')}^* - \alpha_{k_r',k_c'}^{*\top}U_p\|\right\} \ge C_2^{-1/2}(b - 2M_2J^{-\kappa}) > C_2^{-1/2}(b - 2\times\tfrac{1}{4}b) = \tfrac{1}{2}C_2^{-1/2}b$,

where the third inequality is obtained when $N$ and $q$ are sufficiently large since $b \gg J^{-\kappa}$. □

Lemma 3

(Bernstein’s Inequality, Lemma 2.2.11 in [39]). For independent random variables Y1, …, Yn with means 0 and E|Yi|mm!Mm2vi/2 for some constants M, vi, and every m ≥ 2,

P(|Y1++Yn|>x)2exp{12x2v+Mx},

where v = v1 + · · · + vn.

Proof of Theorem 1.

Given $\hat\beta^{or}\in\mathcal{M}_G$, when the true block memberships $G_{1,1}^{(r,c)}, \dots, G_{K_r,K_c}^{(r,c)}$ are known, the oracle estimators for all $\beta_{i,j}$'s are the same if $(i,j)\in G_{k_r,k_c}^{(r,c)}$. Thus we can explore the properties of $\hat\beta^{or}$ by examining the properties of the oracle common coefficient vector $\hat\alpha^{or} = (\hat\alpha_{1,1}^{or\top}, \dots, \hat\alpha_{k_r,k_c}^{or\top}, \dots, \hat\alpha_{K_r,K_c}^{or\top})^\top$, which is defined as

$\hat\alpha^{or} = \arg\min_\alpha\sum_{k_r=1}^{K_r}\sum_{k_c=1}^{K_c}\hat L^{or}(\alpha_{k_r,k_c})$,

and

$\hat L^{or}(\alpha_{k_r,k_c}) = \frac{1}{2}\|Y_{(k_r,k_c)} - U_{(k_r,k_c)}\alpha_{k_r,k_c}\|_2^2 + \gamma_1|G_{k_r,k_c}^{(r,c)}|\alpha_{k_r,k_c}^\top D\alpha_{k_r,k_c}$,

where $Y_{(k_r,k_c)} = \mathrm{vec}\{Y_{i,j}, (i,j)\in G_{k_r,k_c}\}$ and $U_{(k_r,k_c)} = (U_{i,j}^\top, (i,j)\in G_{k_r,k_c})^\top$. The corresponding true B-spline coefficient vector is denoted by $\alpha^* = (\alpha_{1,1}^{*\top}, \dots, \alpha_{k_r,k_c}^{*\top}, \dots, \alpha_{K_r,K_c}^{*\top})^\top$. Note that

$\frac{\partial\hat L^{or}(\alpha_{k_r,k_c})}{\partial\alpha_{k_r,k_c}}\Big|_{\alpha_{k_r,k_c}=\hat\alpha_{k_r,k_c}^{or}} - \frac{\partial\hat L^{or}(\alpha_{k_r,k_c})}{\partial\alpha_{k_r,k_c}}\Big|_{\alpha_{k_r,k_c}=\alpha_{k_r,k_c}^*} = \frac{\partial^2\hat L^{or}(\alpha_{k_r,k_c})}{\partial\alpha_{k_r,k_c}\partial\alpha_{k_r,k_c}^\top}\Big|_{\alpha_{k_r,k_c}=\bar\alpha_{k_r,k_c}}(\hat\alpha_{k_r,k_c}^{or} - \alpha_{k_r,k_c}^*)$,

where $\bar\alpha_{k_r,k_c}$ lies between $\hat\alpha_{k_r,k_c}^{or}$ and $\alpha_{k_r,k_c}^*$. Then we have

$\hat\alpha_{k_r,k_c}^{or} - \alpha_{k_r,k_c}^* = -\left(\frac{\partial^2\hat L^{or}(\alpha_{k_r,k_c})}{\partial\alpha_{k_r,k_c}\partial\alpha_{k_r,k_c}^\top}\Big|_{\alpha_{k_r,k_c}=\bar\alpha_{k_r,k_c}}\right)^{-1}\frac{\partial\hat L^{or}(\alpha_{k_r,k_c})}{\partial\alpha_{k_r,k_c}}\Big|_{\alpha_{k_r,k_c}=\alpha_{k_r,k_c}^*}$.

Hence

$\|\hat\alpha_{k_r,k_c}^{or} - \alpha_{k_r,k_c}^*\|_2 \le |G_{k_r,k_c}^{(r,c)*}|\left\|\left(\frac{\partial^2\hat L^{or}(\alpha_{k_r,k_c})}{\partial\alpha_{k_r,k_c}\partial\alpha_{k_r,k_c}^\top}\Big|_{\alpha_{k_r,k_c}=\bar\alpha_{k_r,k_c}}\right)^{-1}\right\|_2 \cdot |G_{k_r,k_c}^{(r,c)*}|^{-1}\left\|\frac{\partial\hat L^{or}(\alpha_{k_r,k_c})}{\partial\alpha_{k_r,k_c}}\Big|_{\alpha_{k_r,k_c}=\alpha_{k_r,k_c}^*}\right\|_2 := A_{k_r,k_c}^{(1)}\times A_{k_r,k_c}^{(2)}. \quad (A.3)$

By Lemma A.8 of [41] and Conditions (C1) and (C2), we can derive that there exists a constant $C_3 > 0$ such that for any $1\le k_r\le K_r$, $1\le k_c\le K_c$,

$P(A_{k_r,k_c}^{(1)} \le C_3) = P\left(\left\|\left(\frac{U_{(k_r,k_c)}^\top U_{(k_r,k_c)}}{|G_{k_r,k_c}^{(r,c)*}|} + \frac{\gamma_1|G_{k_r,k_c}^{(r,c)}|D}{|G_{k_r,k_c}^{(r,c)*}|}\right)^{-1}\right\|_2 \le C_3\right) \ge 1 - p/(Nq). \quad (A.4)$

Besides, note that

$A_{k_r,k_c}^{(2)} = \left\|-\frac{U_{(k_r,k_c)}^\top}{|G_{k_r,k_c}^{(r,c)*}|}\left(Y_{(k_r,k_c)} - g_{(k_r,k_c)}^* + g_{(k_r,k_c)}^* - U_{(k_r,k_c)}\alpha_{k_r,k_c}^*\right) + \frac{\gamma_1|G_{k_r,k_c}^{(r,c)}|}{|G_{k_r,k_c}^{(r,c)*}|}D\alpha_{k_r,k_c}^*\right\|_2 \le \left\|\frac{U_{(k_r,k_c)}^\top}{|G_{k_r,k_c}^{(r,c)*}|}\epsilon_{k_r,k_c}\right\|_2 + \left\|\frac{U_{(k_r,k_c)}^\top}{|G_{k_r,k_c}^{(r,c)*}|}(g_{(k_r,k_c)}^* - U_{(k_r,k_c)}\alpha_{k_r,k_c}^*)\right\|_2 + \frac{\gamma_1|G_{k_r,k_c}^{(r,c)}|}{|G_{k_r,k_c}^{(r,c)*}|}\|D\alpha_{k_r,k_c}^*\|_2 := B_{k_r,k_c}^{(1)} + B_{k_r,k_c}^{(2)} + B_{k_r,k_c}^{(3)}. \quad (A.5)$

Since the rescaled B-spline values are finite, there exists a constant $M_1 > 0$ such that $U_{l,p}(t) \le M_1$ for $l\in\{1, \dots, p\}$. Let $U_{(i,j)l}$ denote the $l$th column of $U_{(i,j)}$, and we verify the condition of Lemma 3 by Condition (C5):

$E|U_{(i,j)l}^\top\epsilon_{i,j}|^m \le E(|U_{(i,j)l}^\top U_{(i,j)l}|^{m/2}|\epsilon_{i,j}^\top\epsilon_{i,j}|^{m/2}) \le (F^{-1}M_1)^mm!\,E(\exp\{F|n_{i,j}^{-1}\epsilon_{i,j}^\top\epsilon_{i,j}|^{1/2}\}) \le (F^{-1}M_1)^mm!\,c_2$.

Applying Lemma 3, we have

$P\left(\Big|\sum_{(i,j)\in G_{k_r,k_c}^{(r,c)}}U_{(i,j)l}^\top\epsilon_{i,j}\Big| > x\right) \le 2\exp\left\{-\frac{1}{2}\frac{x^2}{v + F^{-1}M_1x}\right\}, \quad (A.6)$

where $v = \sum_{(i,j)\in G_{k_r,k_c}^{(r,c)}}v_{i,j}$ and $v_{i,j} = 2F^{-2}M_1^2c_2$.

Let $U_{(k_r,k_c)l}$ denote the $l$th column of $U_{(k_r,k_c)}$. For some constant $0 < C < \infty$, combining Condition (C5) and (A.6), we have

$P\left(|G_{k_r,k_c}^{(r,c)*}|^{-1}\|U_{(k_r,k_c)}^\top\epsilon_{k_r,k_c}\|_\infty > CF^{-1}M_1(\log(Nq)/|G_{k_r,k_c}^{(r,c)*}|)^{1/2}\right) \le \sum_{l=1}^pP\left(|U_{(k_r,k_c)l}^\top\epsilon_{k_r,k_c}| > CF^{-1}M_1(\log(Nq)|G_{k_r,k_c}^{(r,c)*}|)^{1/2}\right) = \sum_{l=1}^pP\left(\Big|\sum_{(i,j)\in G_{k_r,k_c}^{(r,c)}}U_{(i,j)l}^\top\epsilon_{i,j}\Big| > CF^{-1}M_1(\log(Nq)|G_{k_r,k_c}^{(r,c)*}|)^{1/2}\right) \le 2p\exp\left\{-\frac{1}{2}\frac{C^2F^{-2}M_1^2\log(Nq)|G_{k_r,k_c}^{(r,c)*}|}{2F^{-2}M_1^2c_2|G_{k_r,k_c}^{(r,c)}| + CF^{-2}M_1^2(\log(Nq)|G_{k_r,k_c}^{(r,c)*}|)^{1/2}}\right\} \le 2p\exp\{-\log(Nq)\} \le 2p/(Nq)$.

Hence, we have that with probability at least $1 - 2p/(Nq)$,

$B_{k_r,k_c}^{(1)} \le CF^{-1}M_1(p\log(Nq)/|G_{k_r,k_c}^{(r,c)}|)^{1/2}. \quad (A.7)$

By Lemma 1, there exists a constant $M_2 > 0$ such that

$B_{k_r,k_c}^{(2)} \le p^{1/2}\left\|\frac{U_{(k_r,k_c)}^\top}{|G_{k_r,k_c}^{(r,c)*}|}(g_{(k_r,k_c)}^* - U_{(k_r,k_c)}\alpha_{k_r,k_c}^*)\right\|_\infty \le M_1M_2p^{1/2}J^{-\kappa}. \quad (A.8)$

In addition,

$B_{k_r,k_c}^{(3)} \le \frac{\gamma_1|G_{k_r,k_c}^{(r,c)}|}{|G_{k_r,k_c}^{(r,c)*}|}\|\alpha_{k_r,k_c}^*\|_2\|D\|_2 \le p^{1/2}\gamma_1\|\alpha_{k_r,k_c}^*\|_\infty\|D\|_2. \quad (A.9)$

Thus by (A.5), (A.7), (A.8), and (A.9), for any $1\le k_r\le K_r$, $1\le k_c\le K_c$, with probability at least $1 - 2p/(Nq)$,

$A_{k_r,k_c}^{(2)} \le CF^{-1}M_1(p\log(Nq)/|G_{\min}^{(r,c)}|)^{1/2} + M_1M_2p^{1/2}J^{-\kappa} + \max_{k_r,k_c}\|\alpha_{k_r,k_c}^*\|_\infty\|D\|_2\gamma_1p^{1/2}$.

By Condition (C1) and $\gamma_1 = o(|G_{\min}^{(r,c)}|^{-1/2})$, when $N$ and $q$ are sufficiently large, we have

$p^{1/2}J^{-\kappa} \ll (p\log(Nq)/|G_{\min}^{(r,c)}|)^{1/2}, \quad p^{1/2}\gamma_1 \ll (p\log(Nq)/|G_{\min}^{(r,c)}|)^{1/2}$.

Hence, for any $1\le k_r\le K_r$, $1\le k_c\le K_c$, with probability at least $1 - 2p/(Nq)$,

$A_{k_r,k_c}^{(2)} \le C_4(p\log(Nq)/|G_{\min}^{(r,c)}|)^{1/2}$,

where $C_4$ is a large constant. Together with (A.3) and (A.4), for any $1\le k_r\le K_r$, $1\le k_c\le K_c$,

$P\left(\|\hat\alpha_{k_r,k_c}^{or} - \alpha_{k_r,k_c}^*\|_2 \le C_3C_4(p\log(Nq)/|G_{\min}^{(r,c)}|)^{1/2}\right) \ge 1 - P(A_{k_r,k_c}^{(1)} > C_3) - P\left(A_{k_r,k_c}^{(2)} > C_4(p\log(Nq)/|G_{\min}^{(r,c)}|)^{1/2}\right) \ge 1 - 3p/(Nq)$.

By the Bonferroni inequality, we have

$P\left(\sup_{1\le k_r\le K_r, 1\le k_c\le K_c}\|\hat\alpha_{k_r,k_c}^{or} - \alpha_{k_r,k_c}^*\|_2 \le C_3C_4(p\log(Nq)/|G_{\min}^{(r,c)}|)^{1/2}\right) \ge 1 - \sum_{k_r=1}^{K_r}\sum_{k_c=1}^{K_c}P\left(\|\hat\alpha_{k_r,k_c}^{or} - \alpha_{k_r,k_c}^*\|_2 > C_3C_4(p\log(Nq)/|G_{\min}^{(r,c)}|)^{1/2}\right) \ge 1 - 3K_rK_cp/(Nq)$.

By Lemma 1 and (A.2), we have

$\|\hat g_{(k_r,k_c)}^{or} - g_{(k_r,k_c)}^*\| = \|\hat\alpha_{k_r,k_c}^{or\top}U_p - \alpha_{k_r,k_c}^{*\top}U_p + \alpha_{k_r,k_c}^{*\top}U_p - g_{(k_r,k_c)}^*\| \le \|(\hat\alpha_{k_r,k_c}^{or} - \alpha_{k_r,k_c}^*)^\top U_p\| + \|\alpha_{k_r,k_c}^{*\top}U_p - g_{(k_r,k_c)}^*\| \le C_2^{1/2}C_3C_4(p\log(Nq)/|G_{\min}^{(r,c)}|)^{1/2} + M_2J^{-\kappa} \le (C_2^{1/2}C_3C_4 + M_2/2)(p\log(Nq)/|G_{\min}^{(r,c)}|)^{1/2} = C^*(p\log(Nq)/|G_{\min}^{(r,c)}|)^{1/2}$,

where $C^* = \max\{C_3C_4, C_2^{1/2}C_3C_4 + M_2/2\}$. That is,

$P\left(\sup_{1\le k_r\le K_r, 1\le k_c\le K_c}\|\hat g_{(k_r,k_c)}^{or} - g_{(k_r,k_c)}^*\| \le \psi\right) \ge 1 - 3K_rK_cp/(Nq)$,

where $\psi = C^*(p\log(Nq)/|G_{\min}^{(r,c)}|)^{1/2}$. □

Proof of Theorem 2.

Let $\rho_1(t) = \gamma_2^{-1}p_\tau(t, \gamma_2)$ and $\rho_2(t) = ((N/q)^{1/2}\gamma_2)^{-1}p_\tau(t, (N/q)^{1/2}\gamma_2)$. Define

$Q(\beta) = \frac{1}{2}\sum_{i=1}^N\sum_{j=1}^q\left(\|Y_{i,j} - U_{i,j}\beta_{i,j}\|_2^2 + \gamma_1\beta_{i,j}^\top D\beta_{i,j}\right)$,
$\mathrm{Pen}(\beta) = \gamma_2\sum_{(i_1,i_2)\in\Delta^{(r)}}\rho_1(\|\beta_{i_1}^{(r)} - \beta_{i_2}^{(r)}\|_2) + (N/q)^{1/2}\gamma_2\sum_{(j_1,j_2)\in\Delta^{(c)}}\rho_2(\|\beta_{j_1}^{(c)} - \beta_{j_2}^{(c)}\|_2)$,
$Q^G(\alpha) = \frac{1}{2}\sum_{k_r=1}^{K_r}\sum_{k_c=1}^{K_c}\left(\|Y_{(k_r,k_c)} - U_{(k_r,k_c)}\alpha_{k_r,k_c}\|_2^2 + \gamma_1|G_{k_r,k_c}^{(r,c)}|\alpha_{k_r,k_c}^\top D\alpha_{k_r,k_c}\right)$,
$\mathrm{Pen}^G(\alpha) = \gamma_2\sum_{k_r<k_r'}|G_{k_r}^{(r)}||G_{k_r'}^{(r)}|\rho_1(\|\alpha_{k_r}^{(r)} - \alpha_{k_r'}^{(r)}\|_2) + (N/q)^{1/2}\gamma_2\sum_{k_c<k_c'}|G_{k_c}^{(c)}||G_{k_c'}^{(c)}|\rho_2(\|\alpha_{k_c}^{(c)} - \alpha_{k_c'}^{(c)}\|_2)$,

where $\alpha_{k_r}^{(r)} = (\alpha_{k_r,1}^{(r)\top}, \dots, \alpha_{k_r,q}^{(r)\top})^\top$ with $\alpha_{k_r,j}^{(r)} = \alpha_{k_r,k}$ if $j\in G_k^{(c)}$, and $\alpha_{k_c}^{(c)} = (\alpha_{1,k_c}^{(c)\top}, \dots, \alpha_{N,k_c}^{(c)\top})^\top$ with $\alpha_{i,k_c}^{(c)} = \alpha_{k,k_c}$ if $i\in G_k^{(r)}$. Let $L(\beta) = Q(\beta) + \mathrm{Pen}(\beta)$ and $L^G(\alpha) = Q^G(\alpha) + \mathrm{Pen}^G(\alpha)$.

We define two mappings, $\tilde T: \mathcal{M}_G\to\tilde{\mathcal{M}}_G$ and $\hat T: \mathbb{R}^{Nqp}\to\hat{\mathcal{M}}_G$, where the two subspaces are defined by

$\tilde{\mathcal{M}}_G = \{\alpha\in\mathbb{R}^{K_rK_cp}: \alpha_{k_r,k_c} = \beta_{i,j}, \text{ for any } (i,j)\in G_{k_r,k_c}^{(r,c)}, 1\le k_r\le K_r, 1\le k_c\le K_c\}$,
$\hat{\mathcal{M}}_G = \{\alpha\in\mathbb{R}^{K_rK_cp}: \alpha_{k_r,k_c} = |G_{k_r,k_c}^{(r,c)}|^{-1}\sum_{(i,j)\in G_{k_r,k_c}^{(r,c)}}\beta_{i,j}, 1\le k_r\le K_r, 1\le k_c\le K_c\}$.

For every $\beta\in\mathcal{M}_G$, we have $\mathrm{Pen}(\beta) = \mathrm{Pen}^G(\tilde T(\beta))$, and for every $\alpha\in\tilde{\mathcal{M}}_G$, we have $\mathrm{Pen}(\tilde T^{-1}(\alpha)) = \mathrm{Pen}^G(\alpha)$. Hence

$L(\beta) = L^G(\tilde T(\beta)), \quad L^G(\alpha) = L(\tilde T^{-1}(\alpha)). \quad (A.10)$

Consider the neighborhood of $\beta^*$:

$\Theta = \{\beta\in\mathbb{R}^{Nqp}: \sup_{1\le i\le N, 1\le j\le q}\|\beta_{i,j} - \beta_{i,j}^*\|_2 \le \psi\}$.

By the result in Theorem 1, there is an event $E_1$ such that on $E_1$,

$\sup_{1\le i\le N, 1\le j\le q}\|\hat\beta_{i,j}^{or} - \beta_{i,j}^*\|_2 \le \psi$,

and $P(E_1^C) \le 3K_rK_cp/(Nq)$. Hence $\hat\beta^{or}\in\Theta$ on $E_1$. For any $\beta\in\mathbb{R}^{Nqp}$, let $\tilde\beta = \tilde T^{-1}(\hat T(\beta))$. Inspired by [27], we show that $\hat\beta^{or}$ is a strictly local minimizer of objective function (3) with probability tending to 1 through the following two steps:

(i) On $E_1$, $L(\tilde\beta) > L(\hat\beta^{or})$ for any $\beta\in\Theta$ and $\tilde\beta\ne\hat\beta^{or}$.

(ii) There is an event $E_2$ such that $P(E_2^C) \le c_2/(Nq)$. On $E_1\cap E_2$, there is a neighborhood of $\hat\beta^{or}$, denoted by $\Theta^*$, such that $L(\beta) \ge L(\tilde\beta)$ for any $\beta\in\Theta\cap\Theta^*$ for sufficiently large $N$ and $q$.

Therefore, by the results in (i) and (ii), we have $L(\beta) > L(\hat\beta^{or})$ for any $\beta\in\Theta\cap\Theta^*$ with $\tilde\beta\ne\hat\beta^{or}$, so that $\hat\beta^{or}$ is a strictly local minimizer of $L(\beta)$ on $E_1\cap E_2$ with $P(E_1\cap E_2) \ge 1 - 3K_rK_cp/(Nq) - c_2/(Nq)$ for sufficiently large $N$ and $q$.

Firstly, we prove the result in (i). Let $\hat T(\beta) = \alpha = (\alpha_{1,1}^\top, \dots, \alpha_{K_r,K_c}^\top)^\top$ and $\alpha_{k_r}^{(r)*} = (\beta_{i,1}^{(r)*\top}, \dots, \beta_{i,q}^{(r)*\top})^\top$ for $i\in G_{k_r}^{(r)}$. Since

$\|\alpha_{k_r}^{(r)} - \alpha_{k_r'}^{(r)}\|_2 \ge \|\alpha_{k_r}^{(r)*} - \alpha_{k_r'}^{(r)*}\|_2 - 2\sup_{1\le k_r\le K_r}\|\alpha_{k_r}^{(r)} - \alpha_{k_r}^{(r)*}\|_2$,

and

$\sup_{1\le k_r\le K_r}\|\alpha_{k_r}^{(r)} - \alpha_{k_r}^{(r)*}\|_2^2 = \sup_{1\le k_r\le K_r}\left\{\sum_{k_c=1}^{K_c}|G_{k_c}^{(c)}|\Big\|\sum_{i\in G_{k_r}^{(r)}}\sum_{j\in G_{k_c}^{(c)}}\beta_{i,j}/(|G_{k_r}^{(r)}||G_{k_c}^{(c)}|) - \alpha_{k_r,k_c}^*\Big\|_2^2\right\} \le \sup_{1\le k_r\le K_r}|G_{k_r}^{(r)}|^{-1}\sum_{k_c=1}^{K_c}\sum_{i\in G_{k_r}^{(r)}}\sum_{j\in G_{k_c}^{(c)}}\|\beta_{i,j} - \beta_{i,j}^*\|_2^2 \le q\sup_{1\le i\le N, 1\le j\le q}\|\beta_{i,j} - \beta_{i,j}^*\|_2^2, \quad (A.11)$

by Lemma 2, for any $k_r\ne k_r'$,

$\|\alpha_{k_r}^{(r)} - \alpha_{k_r'}^{(r)}\|_2 \ge \frac{1}{2}|G_{\min}^{(c)}|^{1/2}C_2^{-1/2}b - 2q^{1/2}\sup_{1\le i\le N, 1\le j\le q}\|\beta_{i,j} - \beta_{i,j}^*\|_2 \ge \frac{1}{2}|G_{\min}^{(c)}|^{1/2}C_2^{-1/2}b - 2q^{1/2}\psi > a\gamma_2$.

The last inequality follows from the assumption that $|G_{\min}^{(c)}|^{1/2}b \gg \gamma_2 \gg (pq)^{1/2}\log(Nq)/\min\{|G_{\min}^{(r)}|, |G_{\min}^{(c)}|\} \gg q^{1/2}\psi$. Similarly, for any $k_c\ne k_c'$, we have

$\|\alpha_{k_c}^{(c)} - \alpha_{k_c'}^{(c)}\|_2 \ge \frac{1}{2}|G_{\min}^{(r)}|^{1/2}C_2^{-1/2}b - 2N^{1/2}\sup_{1\le i\le N, 1\le j\le q}\|\beta_{i,j} - \beta_{i,j}^*\|_2 \ge \frac{1}{2}|G_{\min}^{(r)}|^{1/2}C_2^{-1/2}b - 2N^{1/2}\psi > a(N/q)^{1/2}\gamma_2$.

Hence by Condition (C4), $\mathrm{Pen}^G(\hat T(\beta)) = C_{pen}$, a constant, and hence $L^G(\hat T(\beta)) = Q^G(\hat T(\beta)) + C_{pen}$ for all $\beta\in\Theta$. Since $\hat\alpha^{or}$ is the unique global minimizer of $Q^G(\alpha)$, $Q^G(\hat T(\beta)) > Q^G(\hat\alpha^{or})$ for all $\hat T(\beta)\ne\hat\alpha^{or}$, and thus $L^G(\hat T(\beta)) > L^G(\hat\alpha^{or})$ for all $\hat T(\beta)\ne\hat\alpha^{or}$. By (A.10), we have $L^G(\hat T(\beta)) = L(\tilde\beta)$ and $L^G(\hat\alpha^{or}) = L(\hat\beta^{or})$. Therefore $L(\tilde\beta) > L(\hat\beta^{or})$ for all $\tilde\beta\ne\hat\beta^{or}$, and the result in (i) is proved.

Next we prove result (ii). For a positive sequence $\nu_n$, let

$\Theta^* = \{\beta\in\mathbb{R}^{Nqp}: \sup_{1\le i\le N}\|\beta_i^{(r)} - \hat\beta_i^{(r)or}\|_2 \le \nu_n, \sup_{1\le j\le q}\|\beta_j^{(c)} - \hat\beta_j^{(c)or}\|_2 \le \nu_n\}$,
$\mathrm{Pen}_r(\beta) = \gamma_2\sum_{(i_1,i_2)\in\Delta^{(r)}}\rho_1(\|\beta_{i_1}^{(r)} - \beta_{i_2}^{(r)}\|_2), \quad \mathrm{Pen}_c(\beta) = (N/q)^{1/2}\gamma_2\sum_{(j_1,j_2)\in\Delta^{(c)}}\rho_2(\|\beta_{j_1}^{(c)} - \beta_{j_2}^{(c)}\|_2)$,

and $\mathrm{Pen}(\beta) = \mathrm{Pen}_r(\beta) + \mathrm{Pen}_c(\beta)$. For $\beta\in\Theta\cap\Theta^*$, by Taylor's expansion, we have

$L(\beta) - L(\tilde\beta) = \Omega_1 + \Omega_2 + \Omega_3, \quad (A.12)$

where

$\Omega_1 = \sum_{i=1}^N\sum_{j=1}^q\left[-U_{i,j}^\top(Y_{i,j} - U_{i,j}\bar\beta_{i,j}) + \gamma_1D\bar\beta_{i,j}\right]^\top(\beta_{i,j} - \tilde\beta_{i,j})$,
$\Omega_2 = \sum_{i=1}^N\left(\frac{\partial\mathrm{Pen}_r(\bar\beta)}{\partial\beta_i^{(r)}}\Big|_{\beta_i^{(r)}=\bar\beta_i^{(r)}}\right)^\top(\beta_i^{(r)} - \tilde\beta_i^{(r)}), \quad \Omega_3 = \sum_{j=1}^q\left(\frac{\partial\mathrm{Pen}_c(\bar\beta)}{\partial\beta_j^{(c)}}\Big|_{\beta_j^{(c)}=\bar\beta_j^{(c)}}\right)^\top(\beta_j^{(c)} - \tilde\beta_j^{(c)})$,

with $\bar\beta = (\bar\beta_{1,1}^\top, \dots, \bar\beta_{N,q}^\top)^\top$ and $\bar\beta_{i,j} = s\beta_{i,j} + (1-s)\tilde\beta_{i,j}$ for some $s\in(0, 1)$.

Firstly, we have

$\Omega_2 = \gamma_2\sum_{i_1<i_2}\rho_1'(\|\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)}\|_2)\|\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)}\|_2^{-1}(\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)})^\top(\beta_{i_1}^{(r)} - \tilde\beta_{i_1}^{(r)}) + \gamma_2\sum_{i_1>i_2}\rho_1'(\|\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)}\|_2)\|\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)}\|_2^{-1}(\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)})^\top(\beta_{i_1}^{(r)} - \tilde\beta_{i_1}^{(r)}) = \gamma_2\sum_{i_1<i_2}\rho_1'(\|\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)}\|_2)\|\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)}\|_2^{-1}(\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)})^\top\left[(\beta_{i_1}^{(r)} - \tilde\beta_{i_1}^{(r)}) - (\beta_{i_2}^{(r)} - \tilde\beta_{i_2}^{(r)})\right]$.

When $i_1, i_2\in G_{k_r}^{(r)}$, $\tilde\beta_{i_1} = \tilde\beta_{i_2}$. Thus

$\Omega_2 = \gamma_2\sum_{k_r=1}^{K_r}\sum_{i_1,i_2\in G_{k_r}^{(r)}, i_1<i_2}\rho_1'(\|\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)}\|_2)\|\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)}\|_2^{-1}(\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)})^\top(\beta_{i_1}^{(r)} - \beta_{i_2}^{(r)}) + \gamma_2\sum_{k_r<k_r'}\sum_{i_1\in G_{k_r}^{(r)}}\sum_{i_2\in G_{k_r'}^{(r)}}\rho_1'(\|\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)}\|_2)\|\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)}\|_2^{-1}(\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)})^\top\left[(\beta_{i_1}^{(r)} - \tilde\beta_{i_1}^{(r)}) - (\beta_{i_2}^{(r)} - \tilde\beta_{i_2}^{(r)})\right]$.

As shown in Theorem 1, $\sup_i\|\tilde\beta_i^{(r)} - \beta_i^{(r)*}\|_2^2 = \sup_{k_r}\|\alpha_{k_r}^{(r)} - \alpha_{k_r}^{(r)*}\|_2^2 \le q\psi^2$. Since $\bar\beta_i^{(r)} = s\beta_i^{(r)} + (1-s)\tilde\beta_i^{(r)}$, $\sup_i\|\bar\beta_i^{(r)} - \beta_i^{(r)*}\|_2 \le sq^{1/2}\psi + (1-s)q^{1/2}\psi = q^{1/2}\psi$. For $k_r\ne k_r'$, $i_1\in G_{k_r}^{(r)}$, $i_2\in G_{k_r'}^{(r)}$, we have

$\|\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)}\|_2 \ge \min_{i_1\in G_{k_r}^{(r)}, i_2\in G_{k_r'}^{(r)}}\|\beta_{i_1}^{(r)*} - \beta_{i_2}^{(r)*}\|_2 - 2\max_i\|\bar\beta_i^{(r)} - \beta_i^{(r)*}\|_2 \ge \frac{1}{2}|G_{\min}^{(c)}|^{1/2}C_2^{-1/2}b - 2q^{1/2}\psi > a\gamma_2$,

and thus $\rho_1'(\|\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)}\|_2) = 0$. Therefore,

$\Omega_2 = \gamma_2\sum_{k_r=1}^{K_r}\sum_{i_1,i_2\in G_{k_r}^{(r)}, i_1<i_2}\rho_1'(\|\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)}\|_2)\|\beta_{i_1}^{(r)} - \beta_{i_2}^{(r)}\|_2 \ge \gamma_2\sum_{k_r=1}^{K_r}\sum_{i_1,i_2\in G_{k_r}^{(r)}, i_1<i_2}\rho_1'(\|\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)}\|_2)q^{-1/2}\sum_{j=1}^q\|\beta_{i_1,j} - \beta_{i_2,j}\|_2 = \gamma_2q^{-1/2}\sum_{k_r=1}^{K_r}\sum_{k_c=1}^{K_c}\sum_{i_1,i_2\in G_{k_r}^{(r)}, i_1<i_2}\sum_{j\in G_{k_c}^{(c)}}\rho_1'(\|\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)}\|_2)\|\beta_{i_1,j} - \beta_{i_2,j}\|_2$.

Similarly to (A.11), $\sup_i\|\tilde\beta_i^{(r)} - \hat\beta_i^{(r)or}\|_2 \le \nu_n$ and $\sup_i\|\beta_i^{(r)} - \hat\beta_i^{(r)or}\|_2 \le \nu_n$. Then we have

$\sup_{i_1<i_2}\|\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)}\|_2 \le 2\sup_i\|\bar\beta_i^{(r)} - \tilde\beta_i^{(r)}\|_2 \le 2\sup_i\|\beta_i^{(r)} - \tilde\beta_i^{(r)}\|_2 \le 2\left(\sup_i\|\beta_i^{(r)} - \hat\beta_i^{(r)or}\|_2 + \sup_i\|\tilde\beta_i^{(r)} - \hat\beta_i^{(r)or}\|_2\right) \le 4\nu_n$.

Hence $\rho_1'(\|\bar\beta_{i_1}^{(r)} - \bar\beta_{i_2}^{(r)}\|_2) \ge \rho_1'(4\nu_n)$ by the concavity of $\rho(\cdot)$. As a result,

$\Omega_2 \ge \gamma_2q^{-1/2}\sum_{k_r=1}^{K_r}\sum_{k_c=1}^{K_c}\sum_{i_1,i_2\in G_{k_r}^{(r)}, i_1<i_2}\sum_{j\in G_{k_c}^{(c)}}\rho_1'(4\nu_n)\|\beta_{i_1,j} - \beta_{i_2,j}\|_2. \quad (A.13)$

Next we consider $\Omega_3$. Similarly to the derivation of (A.13), we can derive

$\Omega_3 \ge \gamma_2q^{-1/2}\sum_{k_r=1}^{K_r}\sum_{k_c=1}^{K_c}\sum_{j_1,j_2\in G_{k_c}^{(c)}, j_1<j_2}\sum_{i\in G_{k_r}^{(r)}}\rho_2'(4\nu_n)\|\beta_{i,j_1} - \beta_{i,j_2}\|_2. \quad (A.14)$

Lastly, for $\Omega_1$, we have

$\Omega_1 = \sum_{i=1}^N\sum_{j=1}^qw_{i,j}^\top(\beta_{i,j} - \tilde\beta_{i,j}) = \sum_{k_r=1}^{K_r}\sum_{k_c=1}^{K_c}\sum_{i_1,i_2\in G_{k_r}^{(r)}}\sum_{j_1,j_2\in G_{k_c}^{(c)}}\frac{w_{i_1,j_1}^\top(\beta_{i_1,j_1} - \beta_{i_2,j_2})}{|G_{k_r,k_c}^{(r,c)}|}$,

and

$\sum_{k_r=1}^{K_r}\sum_{k_c=1}^{K_c}\sum_{i_1,i_2\in G_{k_r}^{(r)}}\sum_{j_1,j_2\in G_{k_c}^{(c)}}\frac{|w_{i_1,j_1}^\top(\beta_{i_1,j_1} - \beta_{i_2,j_2})|}{|G_{k_r,k_c}^{(r,c)}|} \le \sup_{i,j}\|w_{i,j}\|_2\sum_{k_r=1}^{K_r}\sum_{k_c=1}^{K_c}\sum_{i_1,i_2\in G_{k_r}^{(r)}}\sum_{j_1,j_2\in G_{k_c}^{(c)}}\frac{\|\beta_{i_1,j_1} - \beta_{i_2,j_2}\|_2}{|G_{k_r,k_c}^{(r,c)}|} \le 2\sup_{i,j}\|w_{i,j}\|_2\sum_{k_r=1}^{K_r}\sum_{k_c=1}^{K_c}\sum_{i_1,i_2\in G_{k_r}^{(r)}, i_1<i_2}\sum_{j\in G_{k_c}^{(c)}}\frac{\|\beta_{i_1,j} - \beta_{i_2,j}\|_2}{|G_{k_r}^{(r)}|} + 2\sup_{i,j}\|w_{i,j}\|_2\sum_{k_r=1}^{K_r}\sum_{k_c=1}^{K_c}\sum_{j_1,j_2\in G_{k_c}^{(c)}, j_1<j_2}\sum_{i\in G_{k_r}^{(r)}}\frac{\|\beta_{i,j_1} - \beta_{i,j_2}\|_2}{|G_{k_c}^{(c)}|}$,

where $w_{i,j} = -U_{i,j}^\top(Y_{i,j} - U_{i,j}\bar\beta_{i,j}) + \gamma_1D\bar\beta_{i,j}$. Note that

$\sup_{i,j}\|w_{i,j}\|_2 \le \sup_{i,j}\|U_{i,j}^\top(g_{i,j}^* - U_{i,j}\beta_{i,j}^*)\|_2 + \sup_{i,j}\|(U_{i,j}^\top U_{i,j} + \gamma_1D)(\beta_{i,j}^* - \bar\beta_{i,j})\|_2 + \sup_{i,j}\gamma_1\|D\beta_{i,j}^*\|_2 + \sup_{i,j}\|U_{i,j}^\top\epsilon_{i,j}\|_2$.

By Lemma 1, $\sup_{i,j}\|U_{i,j}^\top(g_{i,j}^* - U_{i,j}\beta_{i,j}^*)\|_2 \le n_mM_1M_2p^{1/2}J^{-\kappa}$. Moreover, $\sup_{i,j}\|(U_{i,j}^\top U_{i,j} + \gamma_1D)(\beta_{i,j}^* - \bar\beta_{i,j})\|_2 \le (n_m^{1/2}p^{1/2}M_1 + \gamma_1\|D\|_2)\psi$ and $\sup_{i,j}\gamma_1\|D\beta_{i,j}^*\|_2 \le p^{1/2}\gamma_1\|D\|_2\|\beta^*\|_\infty$. With the Bonferroni inequality, Markov's inequality, and Condition (C5), we have

$P\left(\sup_{i,j}\|U_{(i,j)}^\top\epsilon_{i,j}\|_2 > 2n_{i,j}F^{-1}M_1p^{1/2}\log(Nq)\right) \le \sum_{i=1}^N\sum_{j=1}^qP\left(\|U_{(i,j)}^\top\epsilon_{i,j}\|_2 > 2n_{i,j}F^{-1}M_1p^{1/2}\log(Nq)\right) \le \sum_{i=1}^N\sum_{j=1}^qP\left(F\|n_{i,j}^{-1/2}\epsilon_{i,j}\|_2 > 2\log(Nq)\right) \le c_2/(Nq)$.

Together with Conditions (C1) and (C3), we have

$\sup_{i,j}\|w_{i,j}\|_2 = O(p^{1/2}\log(Nq)) \quad (A.15)$

holds with probability at least $1 - c_2/(Nq)$. Let $\nu_n = o(1)$; then $\rho_1'(4\nu_n)\to1$ and $\rho_2'(4\nu_n)\to1$. Since $\gamma_2 \gg (pq)^{1/2}\log(Nq)/\min\{|G_{\min}^{(r)}|, |G_{\min}^{(c)}|\}$, by (A.12)–(A.15),

$L(\beta) - L(\tilde\beta) = \Omega_1 + \Omega_2 + \Omega_3 \ge \sum_{k_r=1}^{K_r}\sum_{k_c=1}^{K_c}\sum_{i_1,i_2\in G_{k_r}^{(r)}, i_1<i_2}\sum_{j\in G_{k_c}^{(c)}}\left[\gamma_2q^{-1/2}\rho_1'(4\nu_n) - \frac{2\sup_{i,j}\|w_{i,j}\|_2}{|G_{k_r}^{(r)}|}\right]\|\beta_{i_1,j} - \beta_{i_2,j}\|_2 + \sum_{k_r=1}^{K_r}\sum_{k_c=1}^{K_c}\sum_{j_1,j_2\in G_{k_c}^{(c)}, j_1<j_2}\sum_{i\in G_{k_r}^{(r)}}\left[\gamma_2q^{-1/2}\rho_2'(4\nu_n) - \frac{2\sup_{i,j}\|w_{i,j}\|_2}{|G_{k_c}^{(c)}|}\right]\|\beta_{i,j_1} - \beta_{i,j_2}\|_2 \ge 0$

holds with probability at least 1 − c2/(Nq), which completes the proof of result (ii). □

Footnotes

CRediT authorship contribution statement

Kuangnan Fang: Methodology, Formal analysis, Writing – original draft. Yuanxing Chen: Data curation, Investigation, Software, Writing – original draft. Shuangge Ma: Conceptualization, Methodology, Writing – review & editing. Qingzhao Zhang: Conceptualization, Methodology, Validation, Writing – review & editing, Supervision.

Appendix B. Supplementary data

Supplementary material related to this article can be found online at https://doi.org/10.1016/j.jmva.2021.104874. The Supplementary Material contains additional tables and figures for Examples 2–5.

References

• [1] Abraham C, Cornillon PA, Matzner-Løber E, Molinari N, Unsupervised curve clustering using B-splines, Scand. J. Stat. 30 (3) (2003) 581–595.
• [2] Aneiros G, Cao R, Fraiman R, Genest C, Vieu P, Recent advances in functional data analysis and high-dimensional statistics, J. Multivariate Anal. 170 (2019) 3–9.
• [3] Biau G, Devroye L, Lugosi G, On the performance of clustering in Hilbert spaces, IEEE Trans. Inform. Theory 54 (2) (2008) 781–790.
• [4] Bouveyron C, Bozzi L, Jacques J, Jollois F-X, The functional latent block model for the co-clustering of electricity consumption curves, J. R. Stat. Soc. Ser. C. Appl. Stat. 67 (4) (2018) 897–915.
• [5] Boyd S, Parikh N, Chu E, Peleato B, Eckstein J, Distributed optimization and statistical learning via the alternating direction method of multipliers, Found. Trends Mach. Learn. 3 (1) (2011) 1–122.
• [6] Chaussabel D, Quinn C, Shen J, Patel P, Glaser C, Baldwin N, Stichweh D, Blankenship D, Li L, A modular analysis framework for blood genomics studies: application to systemic lupus erythematosus, Immunity 29 (1) (2008) 150–164.
• [7] Chen J, Zhang S, Integrative analysis for identifying joint modular patterns of gene-expression and drug-response data, Bioinformatics 32 (11) (2016) 1724–1732.
• [8] Chi EC, Lange K, Splitting methods for convex clustering, J. Comput. Graph. Statist. 24 (4) (2015) 994–1013.
• [9] Chiou J-M, Li P-L, Functional clustering and identifying substructures of longitudinal data, J. R. Stat. Soc. Ser. B Stat. Methodol. 69 (4) (2007) 679–699.
• [10] Chiou J-M, Li P-L, Correlation-based functional clustering via subspace projection, J. Amer. Statist. Assoc. 103 (484) (2008) 1684–1692.
• [11] Chu W, Li R, Reimherr M, Feature screening for time-varying coefficient models with ultrahigh dimensional longitudinal data, Ann. Appl. Stat. 10 (2) (2016) 596–617.
• [12] Coffey N, Hinde J, Holian E, Clustering longitudinal profiles using P-splines and mixed effects models applied to time-course gene expression data, Comput. Statist. Data Anal. 71 (2014) 14–29.
• [13] DeVore RA, Lorentz GG, Constructive Approximation: Polynomials and Splines Approximation, Springer-Verlag, Berlin, 1993.
• [14] Fan J, Li R, Variable selection via nonconcave penalized likelihood and its oracle properties, J. Amer. Statist. Assoc. 96 (456) (2001) 1348–1360.
• [15] Goia A, Vieu P, An introduction to recent advances in high/infinite dimensional statistics, J. Multivariate Anal. 146 (2016) 1–6.
• [16] Hejblum BP, Skinner J, Thiébaut R, Time-course gene set analysis for longitudinal gene expression data, PLoS Comput. Biol. 11 (6) (2015).
• [17] Jacques J, Preda C, Functional data clustering: a survey, Adv. Data Anal. Classif. 8 (3) (2014) 231–255.
• [18] Jacques J, Preda C, Model-based clustering for multivariate functional data, Comput. Statist. Data Anal. 71 (2014) 92–106.
• [19] Jain AK, Data clustering: 50 years beyond K-means, Pattern Recognit. Lett. 31 (8) (2010) 651–666.
• [20] James GM, Sugar CA, Clustering for sparsely sampled functional data, J. Amer. Statist. Assoc. 98 (462) (2003) 397–408.
• [21] Kerr G, Ruskin HJ, Crane M, Doolan P, Techniques for clustering gene expression data, Comput. Biol. Med. 38 (3) (2008) 283–293.
• [22] Li S, Rouphael N, Duraisingham S, Romero-Steiner S, Presnell S, Davis CW, Schmidt DS, Johnson SE, Molecular signatures of antibody responses derived from a systems biology study of five human vaccines, Nat. Immunol. 15 (2) (2014) 195–204.
• [23] Ling N, Vieu P, Nonparametric modelling for functional data: selected survey and tracks for future, Statistics 52 (4) (2018) 934–949.
• [24] Liu L, Lin L, Subgroup analysis for heterogeneous additive partially linear models and its application to car sales data, Comput. Statist. Data Anal. 138 (2019) 239–259.
• [25] Liu X, Wang L, Liang H, Estimation and variable selection for semiparametric additive partial linear models, Statist. Sinica 21 (3) (2011) 1225–1248.
• [26] Ma P, Castillo-Davis CI, Zhong W, Liu JS, A data-driven clustering method for time course gene expression data, Nucleic Acids Res. 34 (4) (2006) 1261–1269.
• [27] Ma S, Huang J, A concave pairwise fusion approach to subgroup analysis, J. Amer. Statist. Assoc. 112 (517) (2017) 410–423.
• [28] Mankad S, Michailidis G, Biclustering three-dimensional data arrays with plaid models, J. Comput. Graph. Statist. 23 (4) (2014) 943–965.
• [29] Opgen-Rhein R, Strimmer K, Inferring gene dependency networks from genomic longitudinal data: a functional data approach, REVSTAT 4 (2006) 53–65.
• [30] Peng J, Müller H-G, Distance-based clustering of sparsely observed stochastic processes, with applications to online auctions, Ann. Appl. Stat. 2 (3) (2008) 1056–1077.
• [31] Rangel C, Angus J, Ghahramani Z, Lioumi M, Sotheran E, Gaiba A, Wild DL, Falciani F, Modeling T-cell activation using gene expression profiling and state-space models, Bioinformatics 20 (9) (2004) 1361–1372.
• [32] Ruppert D, Selecting the number of knots for penalized splines, J. Comput. Graph. Statist. 11 (4) (2002) 735–757.
• [33] Schmutz A, Jacques J, Bouveyron C, Cheze L, Martin P, Clustering multivariate functional data in group-specific functional subspaces, Comput. Statist. (2020) 1–31.
• [34] Schumaker LL, Spline Functions: Basic Theory, Wiley, New York, 2007.
• [35] Slimen YB, Allio S, Jacques J, Model-based co-clustering for functional data, Neurocomputing 291 (2018) 97–108.
• [36] Stone CJ, The dimensionality reduction principle for generalized additive models, Ann. Statist. 14 (2) (1986) 590–606.
• [37] Suarez AJ, Ghosal S, Bayesian clustering of functional data using local features, Bayesian Anal. 11 (1) (2016) 71–98.
• [38] Tseng P, Convergence of a block coordinate descent method for nondifferentiable minimization, J. Optim. Theory Appl. 109 (3) (2001) 475–494.
• [39] van der Vaart AW, Wellner JA, Weak Convergence and Empirical Processes, Springer, New York, 1996.
• [40] Wang J-L, Chiou J-M, Müller H-G, Functional data analysis, Annu. Rev. Stat. Appl. 3 (1) (2016) 257–295.
• [41] Wang L, Yang L, Spline-backfitted kernel smoothing of nonlinear additive autoregression model, Ann. Statist. 35 (6) (2007) 2474–2503.
• [42] Wang HJ, Zhu Z, Zhou J, Quantile regression in partially linear varying coefficient models, Ann. Statist. 37 (6B) (2009) 3841–3866.
• [43] Weiner J, Lewis DJM, Maertzdorf J, Mollenkopf H-J, Characterization of potential biomarkers of reactogenicity of licensed antiviral vaccines: randomized controlled clinical trials conducted by the BIOVACSAFE consortium, Sci. Rep. 9 (1) (2019) 20362.
• [44] Wu C, Kwon S, Shen X, Pan W, A new algorithm and theory for penalized regression-based clustering, J. Mach. Learn. Res. 17 (1) (2015) 6479–6503.
• [45] Xie J, Ma A, Fennell A, Ma Q, Zhao J, It is time to apply biclustering: a comprehensive review of biclustering applications in biological and biomedical data, Brief. Bioinform. 20 (4) (2019) 1450–1465.
• [46] Xu R, Wunsch D, Survey of clustering algorithms, IEEE Trans. Neural Netw. 16 (3) (2005) 645–678.
• [47] Zhang C-H, Nearly unbiased variable selection under minimax concave penalty, Ann. Statist. 38 (2) (2010) 894–942.
• [48] Zhu X, Qu A, Cluster analysis of longitudinal profiles with subgroups, Electron. J. Stat. 12 (2018) 171–193.
