Abstract
In biomedical data analysis, clustering is commonly conducted. Biclustering analysis conducts clustering in both the sample and covariate dimensions and can more comprehensively describe data heterogeneity. Most of the existing biclustering analyses consider scalar measurements. In this study, motivated by time-course gene expression data and other examples, we take the “natural next step” and consider the biclustering analysis of functionals, under which, for each covariate of each sample, a function (to be exact, its values at discrete measurement points) is observed. We develop a doubly penalized fusion approach, which includes a smoothness penalty for estimating functionals and, more importantly, a fusion penalty for clustering. Statistical properties are rigorously established, providing the proposed approach with a strong theoretical basis. We also develop an effective ADMM algorithm and accompanying R code. Numerical analysis, including simulations, comparisons, and the analysis of two time-course gene expression datasets, demonstrates the practical effectiveness of the proposed approach.
Keywords: Biclustering, Functional data, Penalized fusion, primary 62H30, 62R10
1. Introduction
In biomedical data analysis, clustering has been routinely conducted. The clustering of samples can assist in better understanding sample heterogeneity, and the clustering of covariates can identify those that behave similarly across samples and thereby, for example, improve our understanding of covariate functionalities. Clustering can also serve as the basis of other analyses, for example, regression. Biclustering analysis has also been developed, identifying clustering structures in both the sample and covariate dimensions. It includes sample- and covariate-clustering as special cases and, in a sense, can be more comprehensive. For generic reviews of techniques, theories, and applications of clustering, we refer to [19,46].
This study has been partly motivated by the analysis of gene expression data, for which sample- and covariate-clustering as well as biclustering have been extensively conducted [21,45]. Most gene expression studies generate “snapshot” values. Unlike some other types of omics measurements, gene expression values can be time-dependent, and the temporal trends of gene expressions can have important biological implications [16]. Accordingly, time-course gene expression studies have been conducted, generating multiple measurements at different time points for each gene of each sample. In the analysis of time-course gene expression data, besides simple statistics, functional data analysis (FDA) techniques have been adopted and shown to be powerful [12].
FDA deals with data samples that consist of curves or other infinite-dimensional data objects. Over the last two decades, we have witnessed significant developments in its theory, methods, computation, and applications. For systematic reviews, we refer to [2,15,23,40]. In FDA, clustering analysis has been of particular interest. A popular approach projects functional data into a finite-dimensional space and then applies existing clustering methods. For example, Abraham et al. [1] conduct B-spline expansions and cluster the estimated coefficients using a k-means algorithm. Peng and Müller [30] develop a distance for sparse functional data and apply a k-means algorithm to functional principal component analysis (PCA) scores. Other approaches, such as Bayesian [37], subspace [3,9,10], and model-based [18,20], have also been developed. We refer to [17,40] for surveys on functional data clustering. Most works in this area, however, have focused on either sample- or covariate-clustering.
For biclustering analysis (of gene expression and other types of data), in this article, we take the “natural next step” and consider the scenario where for each covariate of each sample, a function or its realizations at discrete time points are available. We note that, although this study has been partly motivated by gene expression data and some of the discussions are focused on such data, the considered data scenario and proposed technique can have applications far beyond such data. For example, in biomedical studies, many biomarkers measured in blood tests vary across time, and their values can be obtained from medical records. In financial studies, many measures of a company, for example size and stock price, vary across time. As such, our investigation can have broad applications.
There is a vast literature on biclustering analysis with scalar measurements. Directly applying such techniques to the present problem will involve either treating functional measurements as scalars and then computing distances (between covariates and samples) – which may be ineffective by not sufficiently accounting for the functional nature of data – or first estimating functionals and then computing distances between the estimates – which may encounter challenges when a large number of functionals need to be jointly estimated. Our literature review suggests that there are also a handful of recent biclustering methods designed for functional (especially including longitudinal) data. For example, Slimen et al. [35] propose a biclustering method for multivariate functional data based on the Gaussian latent block model (LBM) using the first functional PCA scores. Bouveyron et al. [4] develop an extension of the Gaussian LBM by modeling the whole set of functional PCA scores. In another work [28], a biclustering method with a plaid model is extended to three-dimensional data arrays, of which multivariate longitudinal data is a special case.
For the biclustering analysis of functionals, in this article, we develop a penalized fusion based approach. More specifically, a nonparametric model is assumed for each covariate of each sample, allowing for sufficient flexibility in modeling. A double penalization technique is adopted, which includes a smoothness penalty to regulate nonparametric estimation. The most significant advancement is the second, fusion penalty, which “transforms” clustering in both the sample and covariate dimensions into a penalized estimation problem. Statistical and numerical investigations are conducted, providing the proposed approach with a solid ground. This study may complement and advance beyond the existing ones in multiple aspects. Compared to direct applications of biclustering methods for scalars (that either directly compute distances without functional estimation or estimate functionals separately), the proposed approach can more effectively accommodate the functional nature of data and generate more effective estimation. This is because it “combines” clustering and estimation, and as such, estimation only needs to be conducted for clusters as opposed to individual covariates, potentially leading to a smaller number of parameters and hence more effective estimation. Compared to some of the existing biclustering methods for functionals, such as [4,35], the proposed approach has a much easier way of determining the number of clusters. In addition, unlike [4,35], it does not make stringent distributional assumptions (for example, normality). Meanwhile, rigorous theoretical investigations are conducted beyond methodological developments, granting the proposed approach a stronger statistical basis. It also advances beyond the clustering of functional covariate effects (which assumes homogeneous samples) by simultaneously examining sample heterogeneity, thus being more comprehensive. Additionally, this study may also advance and enrich the penalized fusion technique.
Clustering via penalized fusion has been pioneered in [8] and other studies. Compared to alternative clustering techniques, it is more recent and has notable statistical and numerical advantages [44]. Compared to the existing penalized fusion based clustering, this study differs by conducting biclustering and by having unknown parameters generated from the basis expansion of functionals. Last but not least, this study also provides a practically useful and new way of analyzing time-course gene expression data (and other data with similar characteristics).
The remainder of this article is organized as follows: Section 2 introduces the new biclustering approach via penalized fusion and develops an effective computational algorithm. Statistical properties are established to provide our method with strong theoretical support. Simulation studies and the analysis of two time-course gene expression datasets are conducted in Sections 3 and 4, respectively. Section 5 concludes with a brief discussion. The proofs of the main results are presented in Appendix A.
2. Methods
2.1. Data and model settings
For the j ∈ {1, …, q}th covariate of sample i ∈ {1, …, N}, denote yi,j,1, …, yi,j,ni,j as the ordered measurements (ordered by time for time-course gene expression data), which are the discrete realizations of an unknown underlying functional. Further denote Yi,j = (yi,j,1, …, yi,j,ni,j)⊤ and Y = (Y1,1⊤, …, YN,q⊤)⊤. Under the biclustering analysis framework, assume that data can be “decomposed” into Kr sample (row) groups and Kc covariate (column) groups. Note that, advancing beyond many existing approaches, the numbers of groups in the two dimensions are not pre-specified. Denote ti,j,1, …, ti,j,ni,j as the observed time points. If (sample i, covariate j) belongs to the kr th sample group and the kc th covariate group, then
| yi,j,m = g(kr,kc)(ti,j,m) + εi,j,m, m ∈ {1, …, ni,j} | (1) |
where g(kr,kc) is the unknown mean function, and εi,j,m’s are the random errors with mean zero.
For estimation, we adopt the basis expansion technique. Specifically, denote Up(t) = (U1,p(t), …, Up,p(t))⊤ as the collection of p rescaled basis functions. In the literature, there are extensive studies on choosing the form and number of basis functions [32], which will not be reiterated here. In our numerical study, we adopt B-spline basis functions of order d = 3. Let gi,j(t) be the unknown mean function for the jth covariate of the ith sample; then we have gi,j(t) = Up(t)⊤βi,j,
where βi,j = (βi,j,1, …, βi,j,p)⊤ is the vector of unknown coefficients. Further denote β = (β1,1⊤, …, β1,q⊤, …, βN,q⊤)⊤. For estimation (without clustering), consider the objective function
| ∥Y − Uβ∥22/2 + γ1β⊤Mβ | (2) |
where U = diag(U1,1, …, U1,q, …, UN,q), Ui,j = (Up(ti,j,1), …, Up(ti,j,ni,j))⊤, M = diag(D, …, D), D = δ⊤δ, δ is a (p − 2) × p matrix representing the second order differential operator, and γ1 is a non-negative tuning parameter. In this objective function, the first term measures lack of fit, and the penalty term controls the smoothness of estimation.
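To make (2) concrete, the following sketch fits a single curve by penalized least squares, with a hand-rolled B-spline design matrix and a second-difference penalty. This is a minimal illustration in Python/NumPy; the knot placement, boundary padding, and tuning value are our assumptions, not the authors' implementation.

```python
import numpy as np

def bspline_design(x, knots, degree):
    """B-spline design matrix via the Cox-de Boor recursion.
    `knots` is an extended knot vector (boundary knots repeated degree+1 times)."""
    x = np.asarray(x, dtype=float)
    # degree-0 basis: indicator of each knot interval
    N = [((x >= knots[i]) & (x < knots[i + 1])).astype(float)
         for i in range(len(knots) - 1)]
    # include the right endpoint in the last non-empty interval
    last = max(i for i in range(len(knots) - 1) if knots[i] < knots[i + 1])
    N[last][x == knots[-1]] = 1.0
    for d in range(1, degree + 1):
        N_new = []
        for i in range(len(knots) - d - 1):
            term = np.zeros_like(x)
            den1 = knots[i + d] - knots[i]
            if den1 > 0:
                term = term + (x - knots[i]) / den1 * N[i]
            den2 = knots[i + d + 1] - knots[i + 1]
            if den2 > 0:
                term = term + (knots[i + d + 1] - x) / den2 * N[i + 1]
            N_new.append(term)
        N = N_new
    return np.column_stack(N)

def penalized_fit(t, y, gamma1, n_interior=3, degree=3):
    """Minimize ||y - U beta||^2 + gamma1 * beta' D beta with D = delta' delta,
    delta the second-order difference operator (cf. (2), single curve)."""
    interior = np.linspace(0, 1, n_interior + 2)[1:-1]
    knots = np.concatenate([np.zeros(degree + 1), interior, np.ones(degree + 1)])
    U = bspline_design(t, knots, degree)
    p = U.shape[1]
    delta = np.diff(np.eye(p), n=2, axis=0)   # (p - 2) x p second differences
    beta = np.linalg.solve(U.T @ U + gamma1 * (delta.T @ delta), U.T @ np.asarray(y))
    return U @ beta, beta
```

Since U and M in (2) are block-diagonal, the multi-curve objective decouples into such single-curve solves when no fusion penalty is present.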
2.2. Biclustering via penalized fusion
Under the clustering via penalized fusion framework, two samples (covariates) belong to the same cluster if and only if they have the same regression coefficients. As such, clustering amounts to determining whether two samples (covariates) have the same estimated coefficients. For samples i1, i2 ∈ {1, …, N}, denote βi1,· = (βi1,1⊤, …, βi1,q⊤)⊤ and βi2,· as the length p × q vectors of coefficients. For covariates j1, j2 ∈ {1, …, q}, denote β·,j1 = (β1,j1⊤, …, βN,j1⊤)⊤ and β·,j2 as the length p × N vectors of coefficients. For estimating β and hence determining the clustering structure, we propose minimizing the objective function:
| ∥Y − Uβ∥22/2 + γ1β⊤Mβ + Σ1≤i1<i2≤N pτ(∥βi1,· − βi2,·∥2, γ2) + (N/q)1/2 Σ1≤j1<j2≤q pτ(∥β·,j1 − β·,j2∥2, γ2) | (3) |
Here pτ(·, ·) is a penalty function, τ is a regularization parameter, ∥ · ∥2 is the ℓ2 norm, and γ2 is a data-dependent tuning parameter. (N/q)1/2 is added to make the two penalties comparable. In our numerical study, we adopt MCP [47], that is, pτ(t, γ) = γ∫0t(1 − x/(τγ))+dx with τ > 1. Here (x)+ = x if x > 0, and (x)+ = 0 otherwise. Note that SCAD [14] and some other penalties are also applicable. Denote the estimator as . Let be the distinct values of . Similarly, let be the distinct values of . We can then obtain the block structure of by , which are the distinct values of , and set .
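For reference, MCP and its derivative can be written out as follows (a standalone sketch of the standard forms from [47]; the variable names are ours):

```python
def mcp(t, gam, tau):
    """MCP penalty p_tau(t, gam): gam*|t| - t^2/(2*tau) on [0, tau*gam],
    constant tau*gam^2/2 beyond, so large differences are not penalized further."""
    t = abs(t)
    if t <= tau * gam:
        return gam * t - t ** 2 / (2 * tau)
    return tau * gam ** 2 / 2

def mcp_deriv(t, gam, tau):
    """Derivative gam*(1 - t/(tau*gam))_+ for t >= 0; after the gam^{-1}
    rescaling in (C4), this gives rho'(0+) = 1."""
    return max(gam - abs(t) / tau, 0.0)
```

The flat region beyond τγ is what makes the penalty unbiased for well-separated clusters, in contrast to the Lasso, whose shrinkage never vanishes.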
In (3), the penalty is imposed on the norms of all pairwise differences to promote equality, as in “standard” penalized fusion [8]. Here it is noted that, as in [8], since there is no information on the order of samples/covariates, all pairwise differences are taken, which differs from, for example, the fused Lasso and other fused penalizations. Different from [8], as clustering needs to be conducted in both the sample and covariate dimensions, two fusion penalties are imposed, promoting equality in two directions. It is also noted that each specific coefficient shows up in three different penalties. As shown below, with properly chosen tunings, there is no over-penalization problem. In addition, it is not rare to have a parameter involved in two or more penalties [7].
The proposed approach involves two tuning parameters, which have “ordinary” implications: one controls smoothness, and the other determines the clustering structure. One possibility is to conduct a two-dimensional grid search. Here we adopt the alternative proposed in [48], which has two steps and a lower computational cost. In particular, in the first step, we set γ2 = 0 and select the optimal γ1 by minimizing:
where and with .
In the second step, we fix the value of γ1 at the optimal and select γ2 by minimizing
where and .
2.3. Computation
We develop an effective algorithm based on the ADMM technique. Specifically, we first reformulate (3) as
where Δ(r) = {δ = (i1, i2) : 1 ≤ i1 < i2 ≤ N} and Δ(c) = {δ = (j1, j2) : 1 ≤ j1 < j2 ≤ q}. Optimizing the constrained objective function is equivalent to optimizing the augmented Lagrangian function:
| (4) |
where θ is a positive constant, , , , and . Here we introduce the dual variables and corresponding to the pairs δ in Δ(r) and Δ(c), and the cardinalities of Δ(r) and Δ(c) are denoted by |Δ(r)| and |Δ(c)|.
We consider an iterative algorithm, where the updates in step m + 1 are:
| (5) |
More specifically, when optimizing over β, we consider
| (6) |
where , , , , is an N × 1 zero vector except that its ith element is 1, is a q × 1 zero vector except that its jth element is 1, ⊗ is the Kronecker product, and Ip is a p × p identity matrix. Denote , , , and . Then the update for β is
| (7) |
where vec(Z) is the vectorization of matrix Z by columns.
For Hr, we consider
| (8) |
Denote . With the KKT conditions of (8), we can get a closed form solution of Hr:
| (9) |
Similarly, denote , and we can get a closed form solution of Hc:
| (10) |
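The closed forms in (9) and (10) are groupwise thresholding operators. The following sketch gives the form commonly derived for an MCP proximal step in fusion ADMM, requiring τθ > 1; the exact scaling in the paper's (9)–(10) may differ, so this should be read as an assumption-laden illustration rather than the authors' formula.

```python
import numpy as np

def group_soft(z, t):
    """Group soft-thresholding: shrink the l2 norm of z toward zero by t."""
    nrm = np.linalg.norm(z)
    if nrm <= t:
        return np.zeros_like(z)
    return (1.0 - t / nrm) * z

def mcp_group_update(zeta, gam2, tau, theta):
    """Minimizer of (theta/2)*||h - zeta||^2 + p_tau(||h||_2, gam2) under MCP,
    assuming tau*theta > 1: scaled group soft-thresholding inside the MCP
    region, identity (no shrinkage) beyond it."""
    if np.linalg.norm(zeta) > tau * gam2:
        return zeta.copy()
    return group_soft(zeta, gam2 / theta) / (1.0 - 1.0 / (tau * theta))
```

With the paper's choices θ = 1 and τ = 3, the rescaling factor is 1/(1 − 1/3) = 3/2, so small differences are shrunk (possibly exactly to zero, fusing the pair) while large differences pass through unchanged.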
Consider appropriate initial values for β, Hr, and Hc, with the dual variables Λr and Λc set as zero. The ADMM based algorithm is summarized in Algorithm 1.
Algorithm 1.
| Input: | |
| Response vector Y, basis expansion design matrix U, and difference matrix M; | |
| Tuning parameters γ1 and γ2. Specific to MCP, regularization parameter τ; | |
| Output: | |
| Coefficient vector β, splitting variables Hr and Hc, and dual variables Λr and Λc; | |
| 1: repeat | |
| 2: for m = 0, 1, 2 … do | |
| 3: Update β by (7). | |
| 4: Update Hr by (9). | |
| 5: Update Hc by (10). | |
| 6: Update Λr and Λc by (5). | |
| 7: end for | |
| 8: until the stopping criteria are met, which are set as , , , and in our numerical study. | |
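To illustrate the β/H/Λ cycle of Algorithm 1 end to end, the following toy sketch applies the same ADMM structure to a one-dimensional fusion problem, with an ℓ1 penalty standing in for MCP and scalar coefficients standing in for the spline blocks. It is purely illustrative and is not the proposed estimator.

```python
import numpy as np
from itertools import combinations

def fusion_admm(y, gam=0.2, theta=1.0, n_iter=500):
    """ADMM for min_b (1/2)||y - b||^2 + gam * sum_{i<j} |b_i - b_j|.
    eta plays the role of the splitting variables (cf. Hr, Hc) and
    lam the dual variables (cf. Lambda_r, Lambda_c)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    pairs = list(combinations(range(n), 2))
    A = np.zeros((len(pairs), n))          # all-pairs difference operator
    for r, (i, j) in enumerate(pairs):
        A[r, i], A[r, j] = 1.0, -1.0
    eta, lam = A @ y, np.zeros(len(pairs))
    Q = np.eye(n) + theta * A.T @ A        # fixed coefficient matrix of the b-step
    for _ in range(n_iter):
        b = np.linalg.solve(Q, y + A.T @ (theta * eta - lam))      # cf. update (7)
        z = A @ b + lam / theta
        eta = np.sign(z) * np.maximum(np.abs(z) - gam / theta, 0)  # cf. (9)/(10)
        lam = lam + theta * (A @ b - eta)                          # cf. update (5)
    return b

b = fusion_admm([0.0, 0.1, 5.0, 5.1])
# the two nearby pairs fuse, giving a two-cluster solution
```

On this toy data the two nearby observations on each side fuse into two clusters, mirroring how the β-solve, the thresholded splitting variables, and the dual ascent interact in the full algorithm.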
Proposition 1.
Denote the two primal residuals as and , and the two dual residuals as and . Then
This result establishes convergence of the proposed algorithm. In numerical analysis, we stop the algorithm and conclude convergence when , , and . Following [5], we set the tolerance parameters as follows:
| (11) |
Here ϵabs and ϵrel are predetermined small values, for example 10−3. In all of our numerical analysis, convergence is satisfactorily achieved within a small to moderate number of iterations. The code and example are publicly available at https://github.com/ruiqwy/Biclustering.
2.4. Statistical properties
For a vector , let . For a matrix Zs×h, let and . For any two sequences of real numbers {an} and {bn}, denote bn ≪ an if bn/an = o(1). Let r be a positive integer, v ∈ (0, 1], and κ = r + v > 1.5. Let be the collection of functions g on [0, 1], where the rth derivative g(r) exists and satisfies the Lipschitz condition of order v:
and C is a positive constant.
Define the following collections of index sets for clustering memberships: for samples, for covariates, and for both samples and covariates. Define . Let , , and be the sizes of , , and , respectively. Further define , and can be defined accordingly. Let ρ(t) = γ−1pτ(t, γ). Assume the following conditions.
(C1) for all kr ∈ {1, …, Kr}, kc ∈ {1, …, Kc}, and .
(C2) The distribution of ti,j,m’s, i ∈ {1, …, N}, j ∈ {1, …, q}, m ∈ {1, …, ni,j} follows a density function fT, which is absolutely continuous. There exist constants c1 and C1 such that .
(C3) ni,j’s are uniformly bounded for all i ∈ {1, …, N}, j ∈ {1, …, q}.
(C4) pτ (t, γ) is symmetric, non-decreasing, and concave in t for t ∈ [0, ∞). There exists a constant 0 < a < ∞ such that ρ(t) is constant for all t ≥ aγ, and ρ(0) = 0. ρ′(t) exists and is continuous except at a finite number of points, and ρ′(0+) = 1.
(C5) Let , where ϵi,j,m’s are independent across (i, j) (among different individual observational vectors) and correlated across m (within the same (i, j)). Furthermore, there exist F > 0 and c2 > 0, such that for all i ∈ {1, …, N} and j ∈ {1, …, q},
Similar conditions have been assumed in the literature. The first condition in (C1) ensures that the Hölder’s condition is satisfied [36]. The second condition in (C1) pertains to the growth rate of the number of internal knots, in a way similar to [25] and [24]. Condition (C2) assumes the boundedness of the density function, similarly to [48] and others. Conditions similar to (C3) have been commonly made. In the analysis of high-dimensional data, conditions similar to (C4) have been common, and it is easy to verify that MCP and SCAD satisfy (C4). Condition (C5) gives the boundedness condition for the error terms, and a similar condition can be found in [11].
When the true clustering structure is known, the oracle estimator for β can be defined as
where is defined as the oracle estimator of based on . Let β* be the underlying true coefficient vector and be the true value of . For any L2-integrable function g, denote .
Theorem 1.
Assume that (C1)–(C5) hold. If and , then with probability at least 1 − 3KrKcp/(Nq),
where , and C* is a large constant.
This theorem establishes consistency of the oracle estimates with a high probability. Denote . We can further establish the following result.
Theorem 2.
Assume that (C1)–(C5) and conditions in Theorem 1 hold. If , , and , then there exists a local minimizer of L(β) satisfying
This theorem establishes that the oracle estimator is a local minimizer of the objective function with a high probability. The estimation consistency, along with the separation of the true functions, can lead to clustering consistency.
3. Simulation
We conduct simulation to assess the performance of the proposed approach and compare it against the following alternatives: (a) the bKmeans method [1], which first fits each curve using B-splines and then clusters the estimated coefficients using the k-means technique by rows and columns; (b) the funHDDC method [33], which has been developed for multivariate functional data clustering based on latent mixture models and has been applied to longitudinal data; and (c) the funLBM method [4], which has been developed for functional data biclustering based on latent block models. Here we note that the proposed and funLBM methods conduct biclustering directly, whereas the bKmeans and funHDDC methods have been originally designed for one-way clustering; hence, they are applied twice to achieve both row and column clusterings. In addition, the funHDDC and funLBM methods are not directly applicable to functional data with unequal numbers of measurements. We apply imputation [26] to tackle this problem. As discussed in Section 1, biclustering methods for functional data are very limited. It is possible to modify other existing one-way functional clustering methods to achieve biclustering; however, this demands additional methodological developments. The three alternatives considered here have been chosen because of their closely related frameworks and competitive performance.
In evaluation, we examine both clustering and estimation accuracy. Specifically, when examining clustering accuracy, we consider the estimated numbers of row clusters, column clusters, and biclusters. In addition, we use the Rand index and adjusted Rand index to assess the accuracy of clustering, including RIr and ARIr for row clustering, RIc and ARIc for column clustering, and RIb and ARIb for biclustering. The Rand index is defined by RI = (TP + TN)/(TP + FP + FN + TN), where for example TP is the true positive count, defined as the number of sample pairs from the same cluster and assigned to the same cluster, and the other counts can be defined accordingly. As the Rand index tends to be large even under random clusterings, we also examine the adjusted Rand index, defined as ARI = (RI − E(RI))/(max(RI) − E(RI)), which can partly correct this problem. To evaluate estimation accuracy, we examine the integrated squared error (ISE), defined as the average of ∫01(ĝi,j(t) − g*i,j(t))2dt over all (i, j), where ĝi,j and g*i,j are the estimated and true mean functions.
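The Rand and adjusted Rand indices described above can be computed directly from pair counts; a standalone sketch:

```python
from itertools import combinations

def rand_indices(a, b):
    """Rand index and adjusted Rand index for two label vectors of equal length."""
    pairs = list(combinations(range(len(a)), 2))
    tp = sum(1 for i, j in pairs if a[i] == a[j] and b[i] == b[j])
    tn = sum(1 for i, j in pairs if a[i] != a[j] and b[i] != b[j])
    ri = (tp + tn) / len(pairs)
    # chance correction: expected same-cluster agreement under random labelings
    same_a = sum(1 for i, j in pairs if a[i] == a[j])
    same_b = sum(1 for i, j in pairs if b[i] == b[j])
    expected = same_a * same_b / len(pairs)
    max_index = (same_a + same_b) / 2
    if max_index == expected:  # degenerate case: trivial partitions
        return ri, 1.0
    ari = (tp - expected) / (max_index - expected)
    return ri, ari
```

For biclustering, each (sample, covariate) cell can be given a composite label (row cluster, column cluster) and the same functions applied, which corresponds to RIb and ARIb.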
We consider a total of Kb = 9 biclusters, which are formed by Kr = 3 sample (row) clusters and Kc = 3 covariate (column) clusters. Each curve is observed at ti,j,m’s, m ∈ {1, …, 10}, equally spaced on [0, 1]. The nine true functional forms are g(1,1)(t) = cos(2πt), g(2,1)(t) = 1 − 2exp(−6t), g(3,1)(t) = −1.5t, g(1,2)(t) = 1 + sin(2πt), g(2,2)(t) = 2t2, g(3,2)(t) = t + 1, g(1,3)(t) = 2(sin(2πt) + cos(2πt)), g(2,3)(t) = 1 + t3, and g(3,3)(t). They are also graphically presented in Fig. 1. To better mimic real data, we allow a certain proportion (ζ) of the curves from each bicluster to have 20% missing measurements. When implementing the proposed approach, we choose smoothing splines with the number of internal knots J = 3. We also fix θ = 1 and τ = 3. In what follows, under Examples 1 and 2, N > q, whereas under Example 3, N = q. Under Examples 1–3, the random errors are independent, whereas under Example 4, they are correlated. Note that under Examples 1–4, simulation results are calculated based on automatic cluster selection. Example 5 is designed to investigate the performance of these methods when the numbers of clusters are correctly prespecified. A total of 100 replicates are simulated under each setting.
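The eight fully specified mean functions above can be encoded as follows; the ninth, g(3,3), is not reproduced here, and the noise level σ and the missingness mechanism are assumptions for illustration only.

```python
import numpy as np

# The eight bicluster mean functions specified in the text; the ninth,
# g_(3,3), is not given here and is therefore omitted.
TRUE_FUNCTIONS = {
    (1, 1): lambda t: np.cos(2 * np.pi * t),
    (2, 1): lambda t: 1 - 2 * np.exp(-6 * t),
    (3, 1): lambda t: -1.5 * t,
    (1, 2): lambda t: 1 + np.sin(2 * np.pi * t),
    (2, 2): lambda t: 2 * t ** 2,
    (3, 2): lambda t: t + 1,
    (1, 3): lambda t: 2 * (np.sin(2 * np.pi * t) + np.cos(2 * np.pi * t)),
    (2, 3): lambda t: 1 + t ** 3,
}

def simulate_curve(kr, kc, n_points=10, sigma=0.5, rng=None):
    """One noisy curve from bicluster (kr, kc) at equally spaced points
    on [0, 1]; sigma is an assumed noise level, not taken from the paper."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.linspace(0, 1, n_points)
    return t, TRUE_FUNCTIONS[(kr, kc)](t) + sigma * rng.standard_normal(n_points)
```

Curves with missing measurements can be mimicked by dropping 20% of the points for a proportion ζ of the generated curves.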
Fig. 1.

Example 1: Curves of observed data (black dotted), functions estimated by the proposed method (blue solid), and true functions (red solid) with (a) N = 30 and (b) N = 90 for one replicate. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Example 1.
N = 30, 60, and 90. q = 9. The clusters are balanced, with each row cluster containing N/3 samples and each column cluster containing q/3 covariates. ζ = 0.3. The random errors are iid .
Example 2.
The settings are the same as in Example 1, except that the clusters are unbalanced. The row clusters have sizes 1:2:3, and the column clusters have sizes 2:3:4.
Example 3.
Set (N, q) = (30, 30), (39, 39), (45, 45), ζ = 0.3 and 0.4. The rest are the same as in Example 1.
Example 4.
The settings are similar to those under Example 1. The random errors are correlated with an AR(1) correlation structure, where AR stands for autoregressive. Consider AR coefficients ϕ = 0.2 and 0.8, representing weak and strong correlations.
Example 5.
The settings are the same as those in Example 1. The difference is that the numbers of clusters are correctly prespecified instead of being selected by the BIC criterion.
Results for Example 1 are presented in Figs. 1 and 2 as well as Tables 1 and 2. More specifically, in Fig. 1, we show the true functions for all clusters as well as example observed data and estimated functions. In Table 1, we summarize the numbers of identified row and column clusters as well as biclusters. In Table 2, we summarize the Rand and adjusted Rand index values. In Fig. 2, we present the boxplots of ISE (note that different panels have different ranges for the Y-axis). Results for Examples 2–5 are presented in the Supplementary section. Although different examples have different numerical results, overall, the advantage of the proposed approach is clearly observed. Consider for example Table 1 with N = 30. The proposed approach has a mean number of row clusters of 2.83, compared to 2.76, 2.63, and 4.66 for the three alternatives. When N = 90, the proposed approach has a mean number of biclusters of 8.71, compared to 3.51, 6.33, and 10.68 for the three alternatives. The improved clustering accuracy is further supported by the Rand index values in Table 2. For example, with N = 90, the adjusted Rand index value for biclustering with the proposed approach is 0.964, compared to 0.358, 0.686, and 0.764 with the three alternatives. Fig. 2 shows that, as N increases, the estimation accuracy of the proposed approach (and two of the alternatives) increases. Under all three N values, the proposed approach has significantly smaller ISE values. Moreover, comparing the results of Example 5 with Example 1, we observe similar performance, and the proposed approach still performs better when the numbers of clusters are correctly prespecified.
Fig. 2.

Example 1: Boxplots of ISE with (a) the proposed method, (b) bKmeans, (c) funHDDC, and (d) funLBM.
Table 2.
Example 1: Mean and standard error (shown in parentheses) of RIr, ARIr, RIc, ARIc, RIb, and ARIb based on 100 replicates.
| N | Method | RIr | ARIr | RIc | ARIc | RIb | ARIb |
|---|---|---|---|---|---|---|---|
| 30 | Proposed | 0.940 (0.189) | 0.911 (0.278) | 0.936 (0.203) | 0.910 (0.279) | 0.927 (0.238) | 0.909 (0.281) |
| bKmeans | 0.860 (0.173) | 0.740 (0.290) | 0.296 (0.163) | 0.052 (0.194) | 0.673 (0.174) | 0.307 (0.167) | |
| funHDDC | 0.744 (0.031) | 0.493 (0.074) | 0.940 (0.107) | 0.880 (0.215) | 0.889 (0.051) | 0.598 (0.120) | |
| funLBM | 0.913 (0.053) | 0.786 (0.109) | 0.913 (0.064) | 0.746 (0.153) | 0.951 (0.029) | 0.708 (0.113) | |
| 60 | Proposed | 0.966 (0.138) | 0.947 (0.208) | 0.963 (0.152) | 0.945 (0.212) | 0.959 (0.177) | 0.943 (0.216) |
| bKmeans | 0.887 (0.132) | 0.780 (0.248) | 0.316 (0.195) | 0.077 (0.239) | 0.704 (0.142) | 0.339 (0.191) | |
| funHDDC | 0.767 (0.021) | 0.546 (0.049) | 0.998 (0.025) | 0.995 (0.050) | 0.922 (0.014) | 0.692 (0.044) | |
| funLBM | 0.918 (0.110) | 0.828 (0.221) | 0.929 (0.119) | 0.840 (0.257) | 0.953 (0.052) | 0.796 (0.198) | |
| 90 | Proposed | 0.978 (0.117) | 0.966 (0.176) | 0.975 (0.131) | 0.965 (0.178) | 0.971 (0.154) | 0.964 (0.180) |
| bKmeans | 0.886 (0.134) | 0.778 (0.251) | 0.342 (0.226) | 0.109 (0.279) | 0.709 (0.152) | 0.358 (0.227) | |
| funHDDC | 0.769 (0.017) | 0.551 (0.040) | 0.990 (0.049) | 0.980 (0.098) | 0.919 (0.025) | 0.686 (0.061) | |
| funLBM | 0.909 (0.121) | 0.813 (0.241) | 0.908 (0.130) | 0.793 (0.276) | 0.944 (0.056) | 0.764 (0.210) | |
4. Applications
Here we analyze two time-course gene expression datasets. Although the data characteristics are in a sense similar, the two data analyses serve different purposes. In particular, the first dataset is “older”, has been analyzed multiple times in the literature, and has a clearer sample clustering structure. In contrast, the second dataset is more recent, and its analysis may lead to a higher practical impact.
4.1. T-cell data
This data has been generated in a study of T-cell activation [31]. It is publicly available in the R package longitudinal (http://www.strimmerlab.org/software/longitudinal/) and contains two subsets: tcell.10 and tcell.34. The first subset contains measurements for 10 samples and 58 genes at 10 unequally spaced time points, t ∈ {0, 2, 4, 6, 8, 18, 24, 32, 48, 72}, whereas the second subset contains measurements for 34 samples and the same genes at the same time points. In [31], the distinctions between the two subsets have been noted, and they have been combined for analysis. Prior to analysis, we conduct data processing, including gene expression normalization using the method developed in [29] and linearly transforming the observed times to [0, 1], and we set the knots at 0.06, 0.2, and 0.4 and the order as 3.
The proposed approach identifies two sample clusters, with sizes 10 and 34, which exactly match the original subset structure. The distinctions of the samples in the two subsets have been noted in [31]. As such, they are supposed to belong to different clusters. In this sense, our “finding”, although as expected, is reassuring. In addition, eight gene clusters are identified, among which there are four trivial clusters of size one. The four non-trivial clusters have sizes 27, 18, 5, and 4. Detailed information on the gene clusters is available from the authors. The eight non-trivial biclusters are presented in Fig. 3. Biclusters 1–4 correspond to tcell.10, and the rest correspond to tcell.34. It is observed that the estimated functions clearly differ across biclusters. The observed temporal trends are highly similar to those reported in [28], which provides support for the validity of our approach.
Fig. 3.

Analysis of T-cell data: Curves of observed data (black dotted) and estimated functions (blue solid) for the eight non-trivial biclusters, as well as yellow points indicating the estimated values at t ∈ {0, 2, 4, 6, 8, 18, 24, 32, 48, 72} by the proposed method. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
The three alternatives are also applied. The bKmeans approach identifies three sample clusters (with sizes 10, 17, and 17) and four gene clusters (with sizes 9, 15, 19, and 15). Compared to the proposed approach, the adjusted Rand index values are 0.441 (sample), 0.619 (gene), and 0.430 (bicluster). The funHDDC approach identifies two sample clusters (with sizes 10 and 34) and two gene clusters (with sizes 9 and 49). Compared to the proposed approach, the adjusted Rand index values are 1.000 (sample), 0.286 (gene), and 0.452 (bicluster). The funLBM approach identifies two sample clusters (with sizes 10 and 34) and six gene clusters (with sizes 9, 4, 12, 5, 18, and 10). Compared to the proposed approach, the adjusted Rand index values are 1.000 (sample), 0.586 (gene), and 0.646 (bicluster). Unlike with the simulated data, it is difficult to objectively evaluate the accuracy of clustering. However, for the proposed approach, the matching with the original sample distinction and published findings can provide strong support, which is not shared by the alternatives.
4.2. Vaccine data
This data is generated in a relatively recent study [43] and available at GEO with identifier GSE124533. The study settings have been described in detail in [43]. Briefly, it concerns the time course of whole blood gene expressions, and the samples are healthy adults residing in an inpatient unit. The samples have been randomized into three protocols (305A, 305B, and 305C). Within each protocol, samples have been randomized to receive immunization via either vaccine or saline placebo. The treatments have been referred to as YFV and VZV (under 305A), HBV1 and HBV3 (under 305B), and TIV and ATIV (under 305C). In this experiment, gene expression levels are measured at t ∈ {1, 2, 3, 4, 5, 7, 14, 21, 28} days after immunization. A total of 43 genes have been studied, which are selected from two gene modules defined in the published literature [6,22]. Prior to analysis, gene expression normalization, rescaling of the time points (to the unit interval), and the selection of knots and order are conducted in a similar way as in the previous data analysis.
Two sets of analysis are conducted. In the first set, we focus on the samples under 305A, which contain 20 samples treated with VZV and 20 with YFV. In the second set, we pool all 122 samples from the three protocols. We note that, although the gene time courses have been analyzed in [43], there is insufficient attention to clustering. Complementary to the existing literature, our clustering analysis can potentially reveal sample heterogeneity as well as coordination among genes.
Results for the first set of analysis are presented in Fig. 5, where we observe two sample clusters and two gene clusters, leading to four biclusters. Here the two sample clusters match the VZV and YFV experimental conditions, which provides support for the validity of our analysis. The two gene clusters contain 27 and 16 members, respectively, which are very close to the module structure. Fig. 5 shows that the temporal trends of the four clusters differ significantly, with the level of variation and the position of the “peak” varying markedly. The observed trends are similar to those reported in [43]. We also refer to [43] for pharmacodynamic interpretations of the findings.
In the second set of analysis, we identify four sample clusters, with sizes 96, 5, 20, and 1, respectively. In what follows, we focus on the non-trivial clusters. Clusters 1 and 2 contain samples treated with VZV, HBV1, HBV3, ATIV, and TIV, and cluster 3 contains samples treated with YFV. In the original publication, little attention was paid to sample similarity/difference across protocols. Our analysis may suggest a significant difference between YFV and the other treatments, as well as the relative similarity of the five other treatments (YFV excluded). Our analysis leads to two gene clusters, with sizes 25 and 18, respectively. This structure is again very similar to the module structure. The six non-trivial biclusters are shown in Fig. 4, where we observe significant across-cluster differences. Among the six patterns, biclusters 5 and 6 are similar to those observed in the first set of analysis, while biclusters 1–4 are relatively different.
Fig. 4.

Analysis of vaccine data with samples under all three protocols: Curves of observed data (black dotted) and estimated functions (blue solid) for non-trivial clusters, as well as yellow points indicating the estimated values at t ∈ {1, 2, 3, 4, 5, 7, 14, 21, 28} by the proposed method. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
The three alternative approaches are also applied. The bKmeans approach identifies three sample clusters (with sizes 20, 27, and 75) and two gene clusters (with sizes 26 and 17). Compared to the proposed approach, the adjusted Rand index values are 0.551 (sample), 0.907 (gene), and 0.666 (bicluster). The funHDDC approach identifies two sample clusters (with sizes 20 and 102) and three gene clusters (with sizes 26, 12, and 5). Compared to the proposed approach, the adjusted Rand index values are 0.819 (sample), 0.774 (gene), and 0.758 (bicluster). The funLBM approach identifies four sample clusters (with sizes 20, 39, 24, and 39) and two gene clusters (with sizes 20 and 23). Compared to the proposed approach, the adjusted Rand index values are 0.276 (sample), 0.818 (gene), and 0.386 (bicluster).
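The comparisons above rely on the adjusted Rand index. As a minimal, self-contained sketch (the paper does not specify its implementation; `adjusted_rand_index` is a hypothetical helper), the index can be computed from a contingency table of the two partitions:

```python
from math import comb
from collections import Counter

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two flat clusterings.

    Hypothetical helper; the paper does not show its ARI computation.
    The index is 1 for identical partitions (up to label renaming) and
    has expectation 0 under random labeling.
    """
    n = len(labels_a)
    pairs = Counter(zip(labels_a, labels_b))   # contingency table counts
    a = Counter(labels_a)                      # row sums
    b = Counter(labels_b)                      # column sums
    index = sum(comb(c, 2) for c in pairs.values())
    sum_a = sum(comb(c, 2) for c in a.values())
    sum_b = sum(comb(c, 2) for c in b.values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = 0.5 * (sum_a + sum_b)
    return (index - expected) / (max_index - expected)

# identical partitions (up to label names) give ARI = 1
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # -> 1.0
```

For the bicluster-level comparison, each (sample cluster, gene cluster) pair can be encoded as a single label before computing the index.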
5. Discussion
In this article, we have conducted biclustering analysis when functions (to be exact, their realizations at discrete time points), as opposed to scalars, are present. The data structure fits time-course gene expression and other experiments. The analysis objective is considerably more complex than the biclustering analysis of scalars and the one-way clustering of functions. We have developed a novel approach based on the penalized fusion technique. Methodologically, it differs significantly from the existing biclustering and fusion approaches. Theoretically, it has the much-desired consistency property, making it advantageous over some of the existing alternatives that lack theoretical support. Numerically, it has generated more accurate clustering and estimation in simulations and has led to different findings in data analysis.
In our estimation, we have adopted the penalized smoothing technique. An alternative, which may be computationally simpler, is to take fewer basis functions, with which we can eliminate the smoothness penalty. Theoretically and numerically, we expect similar performance. The fusion technique involves pairwise differences/penalties, which may incur a higher computational cost when N and/or q are large. In our simulation, we have considered moderate values, which match our data analysis. It will be of interest to develop computationally more scalable approaches/algorithms, for example via model averaging. This is beyond our scope and is left for future research. In the data analysis, findings with a certain degree of support have been made. In the literature, most existing studies are on the “static” functionalities of genes. It will be important to further understand the dynamics of gene expressions and more solidly interpret the findings.
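The fusion penalty applies a concave function ρ(·) to pairwise coefficient differences. As an illustration only, the following sketches one common concave choice, the minimax concave penalty (MCP) of [47]; whether this exact penalty is used by the proposed approach is not stated in this excerpt.

```python
def mcp(t, lam, gamma):
    """Minimax concave penalty rho(t; lam, gamma) of [47], applied to the
    magnitude of a pairwise difference t.

    Illustration only: the concave penalty rho(.) in the proposed approach
    is not restated in this excerpt.  For |t| <= gamma*lam the penalty
    grows, then it flattens at the constant gamma*lam**2/2, so large
    differences are not shrunk further, which is what allows genuinely
    distinct clusters to remain separated.
    """
    t = abs(t)
    if t <= gamma * lam:
        return lam * t - t * t / (2.0 * gamma)
    return gamma * lam * lam / 2.0

lam, gamma = 1.0, 3.0
print(mcp(0.0, lam, gamma))   # -> 0.0 : identical coefficients, no penalty
print(mcp(10.0, lam, gamma))  # -> 1.5 : flat region, gamma*lam**2/2
```

With N samples and q covariates, the sample- and covariate-fusion terms contribute N(N−1)/2 and q(q−1)/2 such pairwise penalties, which is the source of the computational cost noted above.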
Supplementary Material
Table 1.
Example 1: Mean, median, and standard error (SE) of the estimated numbers of sample clusters, covariate clusters, and biclusters (as defined in Section 2), as well as the percentage (Per) of identifying the corresponding true numbers based on 100 replicates. The three Mean/Median/SE/Per blocks correspond to the three quantities in this order.

| N | Method | Mean | Median | SE | Per | Mean | Median | SE | Per | Mean | Median | SE | Per |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 30 | Proposed | 2.83 | 3.00 | 0.53 | 0.90 | 2.83 | 3.00 | 0.53 | 0.90 | 8.29 | 9.00 | 2.18 | 0.90 |
| | bKmeans | 2.76 | 3.00 | 0.64 | 0.66 | 1.13 | 1.00 | 0.46 | 0.05 | 3.09 | 3.00 | 1.34 | 0.03 |
| | funHDDC | 2.63 | 2.00 | 0.86 | 0.28 | 2.76 | 3.00 | 0.43 | 0.76 | 7.27 | 6.00 | 2.70 | 0.21 |
| | funLBM | 4.66 | 5.00 | 0.64 | 0.09 | 4.43 | 5.00 | 0.83 | 0.22 | 20.88 | 25.00 | 5.31 | 0.09 |
| 60 | Proposed | 2.91 | 3.00 | 0.43 | 0.93 | 2.90 | 3.00 | 0.41 | 0.94 | 8.61 | 9.00 | 1.74 | 0.93 |
| | bKmeans | 2.86 | 3.00 | 0.57 | 0.66 | 1.18 | 1.00 | 0.54 | 0.07 | 3.43 | 3.00 | 1.97 | 0.05 |
| | funHDDC | 2.20 | 2.00 | 0.64 | 0.04 | 2.99 | 3.00 | 0.10 | 0.99 | 6.58 | 6.00 | 1.92 | 0.04 |
| | funLBM | 3.42 | 3.00 | 0.64 | 0.66 | 3.24 | 3.00 | 0.55 | 0.82 | 11.15 | 9.00 | 3.31 | 0.55 |
| 90 | Proposed | 2.93 | 3.00 | 0.36 | 0.96 | 2.93 | 3.00 | 0.36 | 0.96 | 8.71 | 9.00 | 1.45 | 0.96 |
| | bKmeans | 2.83 | 3.00 | 0.51 | 0.74 | 1.23 | 1.00 | 0.58 | 0.08 | 3.51 | 3.00 | 1.87 | 0.08 |
| | funHDDC | 2.14 | 2.00 | 0.38 | 0.12 | 2.96 | 3.00 | 0.20 | 0.96 | 6.33 | 6.00 | 1.17 | 0.11 |
| | funLBM | 3.25 | 3.00 | 0.46 | 0.76 | 3.30 | 3.00 | 0.54 | 0.74 | 10.68 | 9.00 | 2.03 | 0.52 |
Acknowledgments
We thank the Editor-in-Chief, Managing Editor, and two reviewers for insightful comments and suggestions. This work was supported by the National Natural Science Foundation of China (11971404, 72071169), Humanity and Social Science Youth Foundation of Ministry of Education of China (19YJC910010), Basic Scientific Project 71988101 of National Science Foundation of China, 111 Project (B13028), National Institutes of Health (CA204120), and National Science Foundation (1916251).
Appendix A. Proofs
Proof of Proposition 1.
By the definitions of and , for any Hr and Hc, we have
Fig. 5.

Analysis of vaccine data with samples under 305A: Curves of observed data (black dotted) and estimated functions (blue solid), as well as yellow points indicating the estimated values at t ∈ {1, 2, 3, 4, 5, 7, 14, 21, 28} by the proposed method. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Let and . We can define
and then .
For any integer n, we have and , and then
The augmented Lagrangian function Lθ(β, Hr, Hc, Λr, Λc) is differentiable with respect to β and convex with respect to each and . By Theorem 4.1 of [38], there exists a limit point of , denoted by . Then we have
For all t ≥ 0, we have
Thus
Besides, by the definition of β(m+1), we have that
Then we can obtain
By and , we have
Therefore . □
Let and . Then . Denote the number of internal knots by J; then J = p − d. Recall that .
Lemma 1.
Under Condition (C1), there exists a spline approximation of the true function for kr ∈ {1, …, Kr} and kc ∈ {1, …, Kc}, such that
Proof.
Lemma 1 follows from Corollary 6.21 of [34]. This lemma has been used in a number of studies that involve spline expansion [25,42]. We omit the proof here. □
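Lemma 1 asserts a spline approximation error of order J^(−κ). The following is a minimal numerical illustration of such a rate, using piecewise-linear interpolation (an order-2 spline, for which the error of a twice-differentiable function shrinks like J^(−2)); it illustrates the rate only and is not the construction of [34].

```python
import math

def linear_spline_error(f, n_knots):
    """Max error of piecewise-linear interpolation of f on [0, 1]
    with n_knots equally spaced knots (an order-2 spline).

    Illustration of the J^(-kappa) approximation rate in Lemma 1;
    not the B-spline construction used in the paper.
    """
    knots = [k / (n_knots - 1) for k in range(n_knots)]
    err = 0.0
    for i in range(1001):                      # fine evaluation grid
        t = i / 1000
        j = min(int(t * (n_knots - 1)), n_knots - 2)  # knot interval of t
        t0, t1 = knots[j], knots[j + 1]
        interp = f(t0) + (f(t1) - f(t0)) * (t - t0) / (t1 - t0)
        err = max(err, abs(f(t) - interp))
    return err

f = lambda t: math.sin(2 * math.pi * t)
e8, e16 = linear_spline_error(f, 9), linear_spline_error(f, 17)
print(e8, e16)  # doubling the knots shrinks the error roughly 4-fold
```

Doubling the number of knot intervals from 8 to 16 reduces the maximum error by roughly a factor of 4, consistent with the J^(−2) rate for this smoothness level.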
Lemma 2. Under Conditions (C1)–(C3) and b ≫ J−κ, there exists a constant C2 > 0 such that, for all ,
when N and q are sufficiently large.
Proof.
By the triangle inequality, we have
| (A.1) |
Besides, by Theorem 5.4.2 of [13], Condition (C2), and the definition of the rescaled B-spline basis, for any vector , there exists a constant C2 > 0 such that
| (A.2) |
Combining (A.1), (A.2), and Lemma 1, we have
where the third inequality is obtained when N and q are sufficiently large since b ≫ J−κ. □
Lemma 3
(Bernstein’s Inequality, Lemma 2.2.11 in [39]). For independent random variables Y1, …, Yn with mean zero such that E|Yi|^m ≤ m! M^(m−2) vi/2 for some constants M, vi, and every m ≥ 2,
P(|Y1 + ⋯ + Yn| > x) ≤ 2 exp{−x²/(2(v + Mx))},
where v = v1 + ⋯ + vn.
Proof of Theorem 1.
Given , when the true block memberships are known, the oracle estimators for all βi,j’s are the same if . Thus we can explore the properties of by examining the properties of the oracle common coefficient vector , which is defined as
and
where , . The corresponding true B-spline coefficient vector is denoted by . Note that
where is between and . Then we have
Hence
| (A.3) |
By Lemma A.8 of [41] and Conditions (C1) and (C2), we can derive that there exists a constant C3 > 0 such that for any 1 ≤ kr ≤ Kr, 1 ≤ kc ≤ Kc,
| (A.4) |
Besides, note that
| (A.5) |
Since the rescaled B-spline values are finite, there exists a constant M1 > 0 such that Ul,p(t) ≤ M1 for l ∈ {1, …, p}. Let U(i,j)·l denote the lth column of U(i,j); we verify the condition of Lemma 3 using Condition (C5):
Applying Lemma 3, we have
| (A.6) |
where and .
Let denote the lth column of . For some constant 0 < C < ∞, combining Condition (C5) and (A.6), we have
Hence, we have that with probability at least 1 − 2p/(Nq),
| (A.7) |
By Lemma 1, there exists a constant M2 > 0 such that
| (A.8) |
In addition,
| (A.9) |
Thus by (A.5), (A.7), (A.8), and (A.9), for any 1 ≤ kr ≤ Kr, 1 ≤ kc ≤ Kc, with probability at least 1 − 2p/(Nq),
By Condition (C1) and , when N and q are sufficiently large, we have
Hence, for any 1 ≤ kr ≤ Kr, 1 ≤ kc ≤ Kc, with probability at least 1 − 2p/(Nq),
where C4 is a large constant. Together with (A.3) and (A.4), for any 1 ≤ kr ≤ Kr, 1 ≤ kc ≤ Kc,
By Bonferroni’s inequality, we have
By Lemma 1 and (A.2), we have
where . That is,
where . □
Proof of Theorem 2.
Let and . Define
where with if , with if . Let ,
We define two mappings, and , and the two subspaces are defined by
For every , we have , and for every , we have . Hence
| (A.10) |
Consider the neighborhood of β*:
By the result in Theorem 1, there is an event E1 such that on E1,
and . Hence on E1. For any , let . Inspired by [27], we show that is a strict local minimizer of the objective function (3) with probability tending to 1 through the following two steps:
(i) On E1, for any β ∈ Θ and .
(ii) There is an event E2 such that . On E1 ∩ E2, there is a neighborhood of , denoted by Θ′, such that for any β ∈ Θ′ ∩ Θ for sufficiently large N and q.
Therefore, by the results in (i) and (ii), we have for any β ∈ Θ′ ∩ Θ and , so that is a strict local minimizer of L(β) on E1 ∩ E2 with P(E1 ∩ E2) ≥ 1 − 3KrKcp/(Nq) − c2/(Nq) for sufficiently large N and q.
Firstly, we prove the result in (i). Let and for . Since
and
| (A.11) |
by Lemma 2, for any
The last inequality follows from the assumption that . Similarly, for any , we have
Hence by Condition (C4), , a constant, and hence for all β ∈ Θ. Since is the unique global minimizer of , for all , and thus for all . By (A.10), we have and . Therefore for all , and the result (i) is proved.
Next we prove result (ii). For a positive sequence νn, let
and Pen(β) = Penr(β) + Penc(β). For β ∈ Θ′ ∩ Θ, by Taylor’s expansion, we have
| (A.12) |
where
with and for some s ∈ (0, 1).
Firstly, we have
When i1, , . Thus
As shown in Theorem 1, . Since . For , , , we have
and thus . Therefore,
Similarly to (A.11), and . Then we have
Hence by the concavity of ρ(·). As a result,
| (A.13) |
Next we consider Ω3. Similarly to the derivation of (A.13), we can derive
| (A.14) |
Lastly for Ω1, we have
and
where . Note that
By Lemma 1, . Moreover, . By Bonferroni’s inequality, Markov’s inequality, and Condition (C5), we have
Together with Conditions (C1) and (C3), we have
| (A.15) |
holds with probability at least 1 − c2/(Nq). Letting νn = o(1), we have and . Since , then by (A.12)–(A.15)
holds with probability at least 1 − c2/(Nq), which completes the proof of result (ii). □
Footnotes
CRediT authorship contribution statement
Kuangnan Fang: Methodology, Formal analysis, Writing – original draft. Yuanxing Chen: Data curation, Investigation, Software, Writing – original draft. Shuangge Ma: Conceptualization, Methodology, Writing – review & editing. Qingzhao Zhang: Conceptualization, Methodology, Validation, Writing – review & editing, Supervision.
Appendix B. Supplementary data
Supplementary material related to this article can be found online at https://doi.org/10.1016/j.jmva.2021.104874. The Supplementary section contains additional tables and figures for Examples 2–5.
References
- [1].Abraham C, Cornillon PA, Matzner-Løber E, Molinari N, Unsupervised curve clustering using B-splines, Scand. J. Stat. 30 (3) (2003) 581–595.
- [2].Aneiros G, Cao R, Fraiman R, Genest C, Vieu P, Recent advances in functional data analysis and high-dimensional statistics, J. Multivariate Anal. 170 (2019) 3–9.
- [3].Biau G, Devroye L, Lugosi G, On the performance of clustering in Hilbert spaces, IEEE Trans. Inform. Theory 54 (2) (2008) 781–790.
- [4].Bouveyron C, Bozzi L, Jacques J, Jollois F-X, The functional latent block model for the co-clustering of electricity consumption curves, J. R. Stat. Soc. Ser. C. Appl. Stat. 67 (4) (2018) 897–915.
- [5].Boyd S, Parikh N, Chu E, Peleato B, Eckstein J, Distributed optimization and statistical learning via the alternating direction method of multipliers, Found. Trends Mach. Learn. 3 (1) (2011) 1–122.
- [6].Chaussabel D, Quinn C, Shen J, Patel P, Glaser C, Baldwin N, Stichweh D, Blankenship D, Li L, A modular analysis framework for blood genomics studies: application to systemic lupus erythematosus, Immunity 29 (1) (2008) 150–164.
- [7].Chen J, Zhang S, Integrative analysis for identifying joint modular patterns of gene-expression and drug-response data, Bioinformatics 32 (11) (2016) 1724–1732.
- [8].Chi EC, Lange K, Splitting methods for convex clustering, J. Comput. Graph. Statist. 24 (4) (2015) 994–1013.
- [9].Chiou J-M, Li P-L, Functional clustering and identifying substructures of longitudinal data, J. R. Stat. Soc. Ser. B Stat. Methodol. 69 (4) (2007) 679–699.
- [10].Chiou J-M, Li P-L, Correlation-based functional clustering via subspace projection, J. Amer. Statist. Assoc. 103 (484) (2008) 1684–1692.
- [11].Chu W, Li R, Matthew R, Feature screening for time-varying coefficient models with ultrahigh dimensional longitudinal data, Ann. Appl. Stat. 10 (2) (2016) 596–617.
- [12].Coffey N, Hinde J, Holian E, Clustering longitudinal profiles using P-splines and mixed effects models applied to time-course gene expression data, Comput. Statist. Data Anal. 71 (2014) 14–29.
- [13].DeVore RA, Lorentz GG, Constructive Approximation, Springer-Verlag, Berlin, 1993.
- [14].Fan J, Li R, Variable selection via nonconcave penalized likelihood and its oracle properties, J. Amer. Statist. Assoc. 96 (456) (2001) 1348–1360.
- [15].Goia A, Vieu P, An introduction to recent advances in high/infinite dimensional statistics, J. Multivariate Anal. 146 (2016) 1–6.
- [16].Hejblum BP, Skinner J, Thiébaut R, Time-course gene set analysis for longitudinal gene expression data, PLoS Comput. Biol. 11 (6) (2015).
- [17].Jacques J, Preda C, Functional data clustering: a survey, Adv. Data Anal. Classif. 8 (3) (2014) 231–255.
- [18].Jacques J, Preda C, Model-based clustering for multivariate functional data, Comput. Statist. Data Anal. 71 (2014) 92–106.
- [19].Jain AK, Data clustering: 50 years beyond K-means, Pattern Recognit. Lett. 31 (8) (2010) 651–666.
- [20].James GM, Sugar CA, Clustering for sparsely sampled functional data, J. Amer. Statist. Assoc. 98 (462) (2003) 397–408.
- [21].Kerr G, Ruskin HJ, Crane M, Doolan P, Techniques for clustering gene expression data, Comput. Biol. Med. 38 (3) (2008) 283–293.
- [22].Li S, Rouphael N, Duraisingham S, Romero-Steiner S, Presnell S, Davis CW, Schmidt DS, Johnson SE, Molecular signatures of antibody responses derived from a systems biology study of five human vaccines, Nat. Immunol. 15 (2) (2014) 195–204.
- [23].Ling N, Vieu P, Nonparametric modelling for functional data: selected survey and tracks for future, Statistics 52 (4) (2018) 934–949.
- [24].Liu L, Lin L, Subgroup analysis for heterogeneous additive partially linear models and its application to car sales data, Comput. Statist. Data Anal. 138 (2019) 239–259.
- [25].Liu X, Wang L, Liang H, Estimation and variable selection for semiparametric additive partial linear models, Statist. Sinica 21 (3) (2011) 1225–1248.
- [26].Ma P, Castillo-Davis CI, Zhong W, Liu JS, A data-driven clustering method for time course gene expression data, Nucleic Acids Res. 34 (4) (2006) 1261–1269.
- [27].Ma S, Huang J, A concave pairwise fusion approach to subgroup analysis, J. Amer. Statist. Assoc. 112 (517) (2017) 410–423.
- [28].Mankad S, Michailidis G, Biclustering three-dimensional data arrays with plaid models, J. Comput. Graph. Statist. 23 (4) (2014) 943–965.
- [29].Opgen-Rhein R, Strimmer K, Inferring gene dependency networks from genomic longitudinal data: a functional data approach, REVSTAT 4 (2006) 53–65.
- [30].Peng J, Müller H-G, Distance-based clustering of sparsely observed stochastic processes, with applications to online auctions, Ann. Appl. Stat. 2 (3) (2008) 1056–1077.
- [31].Rangel C, Angus J, Ghahramani Z, Lioumi M, Sotheran E, Gaiba A, Wild DL, Falciani F, Modeling T-cell activation using gene expression profiling and state-space models, Bioinformatics 20 (9) (2004) 1361–1372.
- [32].Ruppert D, Selecting the number of knots for penalized splines, J. Comput. Graph. Statist. 11 (4) (2002) 735–757.
- [33].Schmutz A, Jacques J, Bouveyron C, Cheze L, Martin P, Clustering multivariate functional data in group-specific functional subspaces, Comput. Statist. (2020) 1–31.
- [34].Schumaker LL, Spline Functions: Basic Theory, third ed., Cambridge University Press, Cambridge, 2007.
- [35].Slimen YB, Allio S, Jacques J, Model-based co-clustering for functional data, Neurocomputing 291 (2018) 97–108.
- [36].Stone CJ, The dimensionality reduction principle for generalized additive models, Ann. Statist. 14 (2) (1986) 590–606.
- [37].Suarez AJ, Ghosal S, Bayesian clustering of functional data using local features, Bayesian Anal. 11 (1) (2016) 71–98.
- [38].Tseng P, Convergence of a block coordinate descent method for nondifferentiable minimization, J. Optim. Theory Appl. 109 (3) (2001) 475–494.
- [39].Van Der Vaart AW, Wellner JA, Weak Convergence and Empirical Processes, Springer, New York, 1996.
- [40].Wang J-L, Chiou J-M, Müller H-G, Functional data analysis, Annu. Rev. Stat. Appl. 3 (1) (2016) 257–295.
- [41].Wang L, Yang L, Spline-backfitted kernel smoothing of nonlinear additive autoregression model, Ann. Statist. 35 (6) (2007) 2474–2503.
- [42].Wang HJ, Zhu Z, Zhou J, Quantile regression in partially linear varying coefficient models, Ann. Statist. 37 (6B) (2009) 3841–3866.
- [43].Weiner J, Lewis DJM, Maertzdorf J, Mollenkopf H-J, Characterization of potential biomarkers of reactogenicity of licensed antiviral vaccines: randomized controlled clinical trials conducted by the BIOVACSAFE consortium, Sci. Rep. 9 (1) (2019) 20362.
- [44].Wu C, Kwon S, Shen X, Pan W, A new algorithm and theory for penalized regression-based clustering, J. Mach. Learn. Res. 17 (1) (2016) 6479–6503.
- [45].Xie J, Ma A, Fennell A, Ma Q, Zhao J, It is time to apply biclustering: a comprehensive review of biclustering applications in biological and biomedical data, Brief. Bioinform. 20 (4) (2019) 1450–1465.
- [46].Xu R, Wunsch D, Survey of clustering algorithms, IEEE Trans. Neural Netw. 16 (3) (2005) 645–678.
- [47].Zhang C-H, Nearly unbiased variable selection under minimax concave penalty, Ann. Statist. 38 (2) (2010) 894–942.
- [48].Zhu X, Qu A, Cluster analysis of longitudinal profiles with subgroups, Electron. J. Stat. 12 (2018) 171–193.