Published in final edited form as: Adv Neural Inf Process Syst. 2012;2012:296–304.

Multi-task Vector Field Learning

Binbin Lin 1, Sen Yang 2, Chiyuan Zhang 1, Jieping Ye 2, Xiaofei He 1
PMCID: PMC4201856  NIHMSID: NIHMS497483  PMID: 25332642

Abstract

Multi-task learning (MTL) aims to improve generalization performance by learning multiple related tasks simultaneously and identifying the shared information among tasks. Most existing MTL methods focus on learning linear models under the supervised setting. We propose a novel semi-supervised and nonlinear approach for MTL using vector fields. A vector field is a smooth mapping from the manifold to the tangent spaces which can be viewed as a directional derivative of functions on the manifold. We argue that vector fields provide a natural way to exploit the geometric structure of data as well as the shared differential structure of tasks, both of which are crucial for semi-supervised multi-task learning. In this paper, we develop multi-task vector field learning (MTVFL), which learns the predictor functions and the vector fields simultaneously. MTVFL has the following key properties. (1) The vector fields MTVFL learns are close to the gradient fields of the predictor functions. (2) Within each task, the vector field is required to be as parallel as possible and is therefore expected to span a low dimensional subspace. (3) The vector fields from all tasks share a low dimensional subspace. We formalize our idea in a regularization framework and provide a convex relaxation method to solve the original non-convex problem. Experimental results on synthetic and real data demonstrate the effectiveness of the proposed approach.

1 Introduction

In many applications, labeled data are expensive and time consuming to obtain, while unlabeled data are abundant. The problem of using unlabeled data to improve generalization performance is often referred to as semi-supervised learning (SSL). It is well known that, in order to make semi-supervised learning work, some assumptions on the dependency between the predictor function and the marginal distribution of the data are needed. The manifold assumption [15, 5], which has been widely adopted in the last decade, states that the data are concentrated on a low dimensional manifold and that the predictor function varies smoothly along this manifold.

Multi-task learning was proposed to enhance generalization performance by learning multiple related tasks simultaneously. The abundant literature on multi-task learning demonstrates that learning performance indeed improves when the tasks are related [4, 6, 7]. The key step in MTL is to find the shared information among tasks. Evgeniou et al. [12] proposed a regularized MTL framework which assumes all tasks are related and close to each other. Ando and Zhang [2] proposed a structural learning framework, which assumes that the predictors of different tasks share a common structure on the underlying predictor space; an alternating structure optimization (ASO) method was proposed for linear predictors, where the task parameters are assumed to share a low dimensional subspace. Agarwal et al. [1] generalized the idea of sharing a subspace by assuming that all task parameters lie on a manifold.

In this paper, we consider semi-supervised multi-task learning (SSMTL). Although many SSL methods have been proposed in the literature [10], these methods are often not directly amenable to MTL extensions [18]. Liu et al. [18] proposed an SSMTL framework which encourages related models to have similar parameters; however, it requires that related tasks share similar representations [9]. Wang et al. [19] proposed another SSMTL method under the assumption that the tasks are clustered [4, 14], where the cluster structure is characterized by the task parameters of linear predictor functions. For linear predictors, these task parameters are simply the constant gradients of the predictor functions, which form a first-order differential structure. For general nonlinear predictor functions, we show it is more natural to capture the shared differential structure using vector fields.

In this paper, we propose a novel SSMTL formulation using vector fields. A vector field is a smooth mapping from the manifold to the tangent spaces which can be viewed as a directional derivative of functions on the manifold. In this way, a vector field naturally characterizes the differential structure of functions while also providing a natural way to exploit the geometric structure of data; these are the two most important aspects of SSMTL. Based on this idea, we develop the multi-task vector field learning (MTVFL) method, which learns the prediction functions and the vector fields simultaneously. The vector fields we learn are forced to be close to the gradient fields of the predictor functions. Within each task, the vector field is required to be as parallel as possible; we say that a vector field is parallel if its vectors are parallel along the geodesics on the manifold. In the extreme case when the manifold is a linear (or an affine) space, the geodesics are straight lines and the space spanned by these parallel vectors is simply a one-dimensional subspace. Thus, when the manifold is flat (i.e., has zero curvature) or the curvature is small, these parallel vectors are expected to concentrate on a low dimensional subspace. As an example, we can see from Fig. 1 that the parallel field on the plane spans a one-dimensional subspace and the parallel field on the Swiss roll spans a two-dimensional subspace. For the multi-task case, we further assume that the vector fields from all tasks share a low dimensional subspace. In essence, we use a first-order differential structure to characterize the shared structure of tasks and a second-order differential structure to characterize the task-specific parts. We formalize our idea in a regularization framework and provide a convex relaxation method to solve the original non-convex problem. We have performed experiments using both synthetic and real data; the results demonstrate the effectiveness of our proposed approach.

Figure 1. Examples of parallel fields. The parallel field on $\mathbb{R}^2$ spans a one-dimensional subspace and the parallel field on the Swiss roll spans a two-dimensional subspace.

2 Multi-task Learning: A Vector Field Approach

In this section, we first introduce vector fields and then present multi-task learning via exploring shared structure using vector fields.

2.1 Multi-task Learning Setting and Vector Fields

We first introduce notation and symbols. We are given $m$ tasks, with $n_l$ samples $x_i^l$, $i = 1, \ldots, n_l$, for the $l$-th task. The total number of samples is $n = \sum_l n_l$. For the $l$-th task, we assume the data $\{x_i^l\}$ lie on a $d_l$-dimensional manifold $\mathcal{M}^l$. All of these data manifolds are embedded in the same $D$-dimensional ambient space $\mathbb{R}^D$. It is worth noting that the dimensions of different data manifolds are not required to be the same. Without loss of generality, we assume the first $n'_l$ ($n'_l < n_l$) samples are labeled, with $y_j^l \in \mathbb{R}$ for regression and $y_j^l \in \{-1, 1\}$ for classification, $j = 1, \ldots, n'_l$. The total number of labeled samples is $n' = \sum_l n'_l$. For the $l$-th task, we denote the regression or classification function by $f_l$. The goal of semi-supervised multi-task learning is to learn the function values on the unlabeled data, i.e., $f_l(x_i^l)$, $n'_l + 1 \le i \le n_l$.

Given the $l$-th task, we first construct a nearest neighbor graph by either $\varepsilon$-neighborhood or $k$ nearest neighbors. Let $x_i^l \sim x_j^l$ denote that $x_i^l$ and $x_j^l$ are neighbors. Let $w_{ij}^l$ denote the weight measuring the similarity between $x_i^l$ and $x_j^l$; it can be approximated by the heat kernel weight or the simple 0-1 weight. For each point $x_i^l$, we estimate its tangent space $T_{x_i^l}\mathcal{M}^l$ by performing PCA on its neighborhood. We choose the eigenvectors corresponding to the $d_l$ largest eigenvalues as the basis, since the tangent space $T_{x_i^l}\mathcal{M}^l$ has the same dimension as the manifold $\mathcal{M}^l$. Let $T_i^l \in \mathbb{R}^{D \times d_l}$ be the matrix whose columns constitute an orthonormal basis for $T_{x_i^l}\mathcal{M}^l$. It is easy to show that $P_i^l = T_i^l T_i^{lT}$ is the unique orthogonal projection from $\mathbb{R}^D$ onto the tangent space $T_{x_i^l}\mathcal{M}^l$ [13]. That is, for any vector $a \in \mathbb{R}^D$, we have $P_i^l a \in T_{x_i^l}\mathcal{M}^l$ and $(a - P_i^l a) \perp P_i^l a$.
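A minimal sketch of this tangent-space estimation step, using local PCA on k-nearest-neighbor patches; the function name and neighborhood convention are our own assumptions, not the authors' implementation.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def estimate_tangent_spaces(X, d, k=10):
    """Estimate an orthonormal tangent basis T_i (D x d) at every point by
    PCA on its k-nearest-neighbor patch.  X is an (n, D) data matrix."""
    n, D = X.shape
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nbrs.kneighbors(X)                     # idx[i, 0] is the point itself
    T = np.zeros((n, D, d))
    for i in range(n):
        patch = X[idx[i]] - X[idx[i]].mean(axis=0)  # center the local neighborhood
        # top-d right singular vectors = principal directions of the patch
        _, _, Vt = np.linalg.svd(patch, full_matrices=False)
        T[i] = Vt[:d].T                             # columns: orthonormal tangent basis
    return T

# The orthogonal projection onto the tangent space at x_i is P_i = T[i] @ T[i].T,
# so for any a in R^D, P_i @ a lies in the estimated tangent space.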

We now formally define the vector field and show how to represent it in the discrete case.

Definition 2.1 ([16])

A vector field $X$ on the manifold $\mathcal{M}$ is a continuous map $X : \mathcal{M} \to T\mathcal{M}$, where $T\mathcal{M}$ is the set of tangent spaces, written as $p \mapsto X_p$, with the property that for each $p \in \mathcal{M}$, $X_p$ is an element of $T_p\mathcal{M}$.

We can think of a vector field on the manifold in the same way as we think of a vector field in Euclidean space: an arrow with a given magnitude and direction attached to each point, chosen to be tangent to the manifold. A vector field $V$ on the manifold is called a gradient field if there exists a function $f$ on the manifold such that $\nabla f = V$, where $\nabla$ is the covariant derivative on the manifold. Gradient fields are therefore a particular kind of vector field; they play a critical role in connecting vector fields and functions.

Let $V_l$ be a vector field on the manifold $\mathcal{M}^l$. For each point $x_i^l$, let $V_{x_i^l}$ denote the value of the vector field $V_l$ at $x_i^l$. Recalling the definition of a vector field, $V_{x_i^l}$ should be a vector in the tangent space $T_{x_i^l}\mathcal{M}^l$. Therefore, we can represent it in the coordinates of the tangent space as $V_{x_i^l} = T_i^l v_i^l$, where $v_i^l \in \mathbb{R}^{d_l}$ is the local representation of $V_{x_i^l}$ with respect to $T_i^l$. Let $f_l$ be a function on the manifold $\mathcal{M}^l$. Abusing notation slightly, we also use $f_l$ to denote the vector $f_l = (f_l(x_1^l), \ldots, f_l(x_{n_l}^l))^T$ and use $V_l$ to denote the vector $V_l = (v_1^{lT}, \ldots, v_{n_l}^{lT})^T \in \mathbb{R}^{d_l n_l}$. That is, $V_l$ is a $d_l n_l$-dimensional column vector which concatenates all the $v_i^l$'s for a fixed $l$. Then for each task, we aim to compute the vector $f_l$ and the vector $V_l$.

2.2 Multi-task Vector Field Learning

In this section, we introduce multi-task vector field learning (MTVFL).

Many existing MTL methods capture task relatedness by sharing task parameters. For linear predictors, the task parameters are simply the constant gradient vectors of the predictor functions. For general nonlinear predictor functions, we show it is natural to capture the shared differential structure using vector fields. Let $f$ denote the vector $(f_1^T, \ldots, f_m^T)^T$ and $V$ denote the vector $(V_1^T, \ldots, V_m^T)^T = (v_1^{1T}, \ldots, v_{n_m}^{mT})^T$. We propose to learn $f$ and $V$ simultaneously:

  • The vector field $V_l$ should be close to the gradient field $\nabla f_l$ of $f_l$, which can be formulated as follows:
    $$\min_{f,V} R_1(f,V) = \sum_{l=1}^m R_1(f_l, V_l) := \sum_{l=1}^m \int_{\mathcal{M}^l} \|\nabla f_l - V_l\|^2 \quad (1)$$
  • The vector field $V_l$ should be as parallel as possible:
    $$\min_{V} R_2(V) = \sum_{l=1}^m R_2(V_l) := \sum_{l=1}^m \int_{\mathcal{M}^l} \|\nabla V_l\|_{HS}^2 \quad (2)$$

    where $\nabla$ is the covariant derivative on the manifold and $\|\cdot\|_{HS}$ denotes the Hilbert-Schmidt tensor norm [11]. $\nabla V_l$ measures the change of the vector field; therefore, minimizing $\int_{\mathcal{M}^l}\|\nabla V_l\|_{HS}^2$ enforces the vector field $V_l$ to be parallel.

  • All vector fields share an $h$-dimensional subspace, where $h$ is a predefined parameter:
    $$T_i^l v_i^l = u_i^l + \Theta^T w_i^l, \quad \text{s.t.} \quad \Theta\Theta^T = I_{h \times h}. \quad (3)$$

Since these vector fields are assumed to share a low dimensional subspace, the residual vector $u_i^l$ is expected to be small. We define another term $R_3$ to control the complexity as follows:

$$R_3(v_i^l, w_i^l, \Theta) = \sum_{l=1}^m \sum_{i=1}^{n_l} \left( \alpha\|u_i^l\|^2 + \beta\|T_i^l v_i^l\|^2 \right) \quad (4)$$
$$= \sum_{l=1}^m \sum_{i=1}^{n_l} \left( \alpha\|T_i^l v_i^l - \Theta^T w_i^l\|^2 + \beta\|T_i^l v_i^l\|^2 \right) \quad (5)$$

Note that $\alpha$ and $\beta$ are pre-specified coefficients indicating the importance of the corresponding regularization components. Since we would like the vector field to be parallel, the vector norms are not expected to be too small; and since we assume the vector fields share a low dimensional subspace, the residual vector $u_i^l$ is expected to be small. In practice we suggest using a small $\beta$ and a large $\alpha$. By setting $\beta = 0$, $R_3$ reduces to the regularization term proposed in ASO if we also replace the tangent vectors by the task parameters. Therefore, this formulation is a generalization of ASO.

It can be verified that $w_i^l = \Theta T_i^l v_i^l = \arg\min_{w_i^l} R_3(v_i^l, w_i^l, \Theta)$. Thus we have $u_i^l = T_i^l v_i^l - \Theta^T w_i^l = (I - \Theta^T\Theta) T_i^l v_i^l$. Therefore, we can rewrite $R_3$ as follows:

$$R_3(V,\Theta) = \sum_{l=1}^m \sum_{i=1}^{n_l} \left( \alpha\|u_i^l\|^2 + \beta\|T_i^l v_i^l\|^2 \right) = \sum_{l=1}^m \sum_{i=1}^{n_l} \left( \alpha\|(I-\Theta^T\Theta)T_i^l v_i^l\|^2 + \beta\|T_i^l v_i^l\|^2 \right) = \alpha V^T A_\Theta V + \beta V^T H V \quad (6)$$

where $H$ is a block diagonal matrix with diagonal blocks $T_i^{lT} T_i^l$, and $A_\Theta$ is another block diagonal matrix with diagonal blocks $T_i^{lT}(I - \Theta^T\Theta)^T(I - \Theta^T\Theta)T_i^l = T_i^{lT}(I - \Theta^T\Theta)T_i^l$. Therefore, the proposed formulation solves the following optimization problem:

$$\arg\min_{f,V,\Theta} E(f,V,\Theta) = R_0(f) + \lambda_1 R_1(f,V) + \lambda_2 R_2(V) + \lambda_3 R_3(V,\Theta), \quad \text{s.t. } \Theta\Theta^T = I_{h\times h} \quad (7)$$

where $R_0(f)$ is the loss function. For simplicity, we use the quadratic loss function $R_0(f) = \sum_{l=1}^m \sum_{i=1}^{n'_l} (f_l(x_i^l) - y_i^l)^2$.
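To make the construction of $H$ and $A_\Theta$ in Eq. (6) concrete, here is a minimal sketch for a single task; the function name, input shapes, and use of dense matrices are our own assumptions, not the authors' implementation.

import numpy as np
from scipy.linalg import block_diag

def build_H_and_A(T, Theta):
    """T: list of (D, d_l) tangent bases T_i^l for one task; Theta: (h, D)
    with Theta @ Theta.T = I.  Returns the block diagonal matrices H and
    A_Theta whose diagonal blocks are given below Eq. (6)."""
    D = Theta.shape[1]
    R = np.eye(D) - Theta.T @ Theta          # I - Theta^T Theta (symmetric, idempotent)
    H_blocks = [Ti.T @ Ti for Ti in T]       # T_i^T T_i (identity for orthonormal T_i)
    A_blocks = [Ti.T @ R @ Ti for Ti in T]   # T_i^T (I - Theta^T Theta) T_i
    return block_diag(*H_blocks), block_diag(*A_blocks)

# With the stacked coordinate vector V of this task, R_3 contributes
# alpha * V @ A_Theta @ V + beta * V @ H @ V  to the objective.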

2.3 Objective Function in the Matrix Form

To simplify Eq. (7), in this section we rewrite our objective function in the matrix form.

Using the discrete methods in [17], we have the following discrete form equations:

$$R_1(f_l, V_l) = \sum_i \sum_{j \sim i} w_{ij}^l \left( (x_j^l - x_i^l)^T T_i^l v_i^l - f_j^l + f_i^l \right)^2 \quad (8)$$
$$R_2(V_l) = \sum_{i \sim j} w_{ij}^l \left\| P_i^l T_j^l v_j^l - T_i^l v_i^l \right\|^2 \quad (9)$$

Interestingly, with some algebraic transformations, we have the following matrix forms for our objective functions:

$$R_1(f_l, V_l) = 2f_l^T L_l f_l + V_l^T G_l V_l - 2V_l^T C_l f_l \quad (10)$$

where $L_l$ is the graph Laplacian matrix, $G_l$ is a $d_l n_l \times d_l n_l$ block diagonal matrix, and $C_l = [C_1^{lT}, \ldots, C_{n_l}^{lT}]^T$ is a $d_l n_l \times n_l$ block matrix. Denoting the $i$-th $d_l \times d_l$ diagonal block of $G_l$ by $G_{ii}^l$ and the $i$-th $d_l \times n_l$ block of $C_l$ by $C_i^l$, we have

$$G_{ii}^l = \sum_{j \sim i} w_{ij}^l\, T_i^{lT}(x_j^l - x_i^l)(x_j^l - x_i^l)^T T_i^l, \qquad C_i^l = \sum_{j \sim i} w_{ij}^l\, T_i^{lT}(x_j^l - x_i^l)\, s_{ij}^{lT} \quad (11)$$

where $s_{ij}^l \in \mathbb{R}^{n_l}$ is a selection vector whose elements are all zero except for the $i$-th element being $-1$ and the $j$-th element being $1$. $R_2$ becomes

$$R_2(V_l) = V_l^T B_l V_l \quad (12)$$

where $B_l$ is a $d_l n_l \times d_l n_l$ sparse block matrix. If we index each $d_l \times d_l$ block by $B_{ij}^l$, then we have

$$B_{ii}^l = \sum_{j \sim i} w_{ij}^l \left( Q_{ij}^l Q_{ij}^{lT} + I \right) \quad (13)$$
$$B_{ij}^l = \begin{cases} -2 w_{ij}^l Q_{ij}^l, & \text{if } x_i \sim x_j \\ 0, & \text{otherwise} \end{cases} \quad (14)$$

where $Q_{ij}^l = T_i^{lT} T_j^l$. It is worth noting that both $R_1$ and $R_2$ depend on the tangent spaces $T_i^l$.

Thus we can further write R1(f, V) and R2(V) as follows

$$R_1(f,V) = \sum_{l=1}^m R_1(f_l, V_l) = 2f^T L f + V^T G V - 2V^T C f \quad (15)$$
$$R_2(V) = \sum_{l=1}^m R_2(V_l) = V^T B V \quad (16)$$

where L, G and B are block diagonal matrices with the corresponding l-th block matrix being Ll, Gl and Bl, respectively. C is a column block matrix with the l-th block matrix being Cl.
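As an illustration of Eqs. (10)-(16), the following sketch assembles the per-task matrices $L_l$, $G_l$, $C_l$ and $B_l$ with dense NumPy arrays (a practical implementation would use sparse matrices); the global $L$, $G$, $B$, $C$ are then the block diagonal/stacked matrices described above. The interface and names are assumptions on our part.

import numpy as np

def build_task_matrices(X, T, W):
    """X: (n, D) data of one task; T: (n, D, d) tangent bases; W: (n, n)
    symmetric weight matrix with W[i, j] > 0 iff x_i ~ x_j.
    Returns L (n x n), G, B (dn x dn) and C (dn x n) as in Eqs. (10)-(14)."""
    n, D, d = T.shape
    L = np.diag(W.sum(axis=1)) - W                       # graph Laplacian
    G = np.zeros((d * n, d * n))
    B = np.zeros((d * n, d * n))
    C = np.zeros((d * n, n))
    for i in range(n):
        ri = slice(d * i, d * (i + 1))
        for j in np.nonzero(W[i])[0]:
            w = W[i, j]
            te = T[i].T @ (X[j] - X[i])                  # edge vector in T_i coordinates
            G[ri, ri] += w * np.outer(te, te)            # Eq. (11), block G_ii
            C[ri, i] += -w * te                          # s_ij has -1 at position i ...
            C[ri, j] += w * te                           # ... and +1 at position j
            Q = T[i].T @ T[j]                            # Q_ij = T_i^T T_j
            B[ri, ri] += w * (Q @ Q.T + np.eye(d))       # Eq. (13)
            B[ri, slice(d * j, d * (j + 1))] += -2 * w * Q   # Eq. (14)
    return L, G, C, B

# Then R_1(f_l, V_l) = 2 f @ L @ f + V @ G @ V - 2 V @ C @ f  and  R_2(V_l) = V @ B @ V.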

Let $\tilde{I}$ denote the $n \times n$ diagonal matrix with $\tilde{I}_{ii} = 1$ if the $i$-th data point is labeled and $\tilde{I}_{ii} = 0$ otherwise, and let $y \in \mathbb{R}^n$ be the column vector whose $i$-th element is the label of the $i$-th data point if it is labeled and $0$ otherwise. Then $R_0(f) = \frac{1}{n'}(f-y)^T\tilde{I}(f-y)$. Finally, we obtain the following matrix form of the objective function in Eq. (7), with the constraint $\Theta\Theta^T = I_{h\times h}$:

$$E(f,V,\Theta) = R_0(f) + \lambda_1 R_1(f,V) + \lambda_2 R_2(V) + \lambda_3 R_3(V,\Theta)$$
$$= \frac{1}{n'}(f-y)^T\tilde{I}(f-y) + \lambda_1\left(2f^T L f + V^T G V - 2V^T C f\right) + \lambda_2 V^T B V + \lambda_3 V^T(\alpha A_\Theta + \beta H)V$$
$$= \frac{1}{n'}(f-y)^T\tilde{I}(f-y) + 2\lambda_1 f^T L f + V^T\left(\lambda_1 G + \lambda_2 B + \lambda_3(\alpha A_\Theta + \beta H)\right)V - 2\lambda_1 V^T C f$$

It is worth noting that the matrices $L$, $G$, $B$ and $C$ depend only on the data, while only the matrix $A_\Theta$ depends on $\Theta$.

3 Optimization

In this section, we discuss how to solve the following optimization problem:

$$\arg\min_{f,V,\Theta} E(f,V,\Theta), \quad \text{s.t. } \Theta\Theta^T = I_{h\times h} \quad (17)$$

We use the alternating optimization to solve this problem.

  • Optimization of f and V. For a fixed $\Theta$, the optimal $f$ and $V$ can be obtained via solving
    $$\arg\min_{f,V} E(f,V,\Theta) \quad (18)$$
  • Optimization of Θ. For a fixed $V$, the optimal $\Theta$ can be obtained via solving
    $$\arg\min_{\Theta} R_3(V,\Theta), \quad \text{s.t. } \Theta\Theta^T = I_{h\times h} \quad (19)$$
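The overall procedure can be summarized by the following loop; solve_f_V and update_Theta are hypothetical helper names standing in for the subproblem solvers described in Sections 3.1 and 3.2.

def multi_task_vector_field_learning(solve_f_V, update_Theta, Theta0, n_iters=20):
    """Alternate between the two subproblems of Eqs. (18)-(19):
    solve_f_V(Theta) -> (f, V)  solves the linear system for a fixed Theta;
    update_Theta(V)  -> Theta   updates the shared subspace for a fixed V."""
    Theta = Theta0
    for _ in range(n_iters):
        f, V = solve_f_V(Theta)     # Section 3.1
        Theta = update_Theta(V)     # Section 3.2
    return f, V, Theta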

3.1 Optimization of f and V for a Given Θ

When $\Theta$ is fixed, the objective function is similar to that of the single-task case. However, there are some differences worth mentioning. First, when constructing the nearest neighbor graph, data points from different tasks are disconnected; therefore, tangent spaces are estimated independently for each task. Second, we do not require the tangent spaces of different tasks to have the same dimension.

We note that

$$\frac{\partial E}{\partial f} = 2\left(\frac{1}{n'}\tilde{I} + 2\lambda_1 L\right)f - 2\lambda_1 C^T V - \frac{2}{n'}y \quad (20)$$
$$\frac{\partial E}{\partial V} = -2\lambda_1 C f + 2\left(\lambda_1 G + \lambda_2 B + \lambda_3(\alpha A_\Theta + \beta H)\right)V \quad (21)$$

Requiring the derivatives to vanish, we obtain the following linear system:

$$\begin{pmatrix} \frac{1}{n'}\tilde{I} + 2\lambda_1 L & -\lambda_1 C^T \\ -\lambda_1 C & \lambda_1 G + \lambda_2 B + \lambda_3(\alpha A_\Theta + \beta H) \end{pmatrix} \begin{pmatrix} f \\ V \end{pmatrix} = \begin{pmatrix} \frac{1}{n'}y \\ 0 \end{pmatrix} \quad (22)$$

Except for the matrix $A_\Theta$, all other matrices can be computed in advance and do not change during the iterative process.
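A sketch of this step with SciPy's sparse solver, assuming all matrices in Eq. (22) have already been assembled; the function name, argument list, and use of a direct solver are our own choices.

import numpy as np
from scipy.sparse import bmat
from scipy.sparse.linalg import spsolve

def solve_f_V(I_label, y, L, C, G, B, A_theta, H, lam1, lam2, lam3, alpha, beta, n_labeled):
    """Solve the linear system of Eq. (22) for f and V with Theta fixed.
    I_label is the 0/1 diagonal label indicator, y the zero-padded label vector."""
    n = L.shape[0]
    top_left = I_label / n_labeled + 2 * lam1 * L
    bottom_right = lam1 * G + lam2 * B + lam3 * (alpha * A_theta + beta * H)
    M = bmat([[top_left, -lam1 * C.T],
              [-lam1 * C, bottom_right]], format="csr")
    rhs = np.concatenate([np.asarray(y) / n_labeled, np.zeros(bottom_right.shape[0])])
    sol = spsolve(M, rhs)
    return sol[:n], sol[n:]          # f, then the stacked vector field coordinates V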

3.2 Optimization of Θ for a Given V

Since the functions $R_0(f)$, $R_1(f, V)$ and $R_2(V)$ do not depend on the variable $\Theta$, we only need to optimize $R_3(V, \Theta)$ subject to $\Theta\Theta^T = I_{h\times h}$.

Recalling Eq. (6), we rewrite $R_3(V, \Theta)$ as follows:

$$\hat{\Theta} = \arg\min_\Theta \sum_{l=1}^m \sum_{i=1}^{n_l} \alpha\left( \left\|(I-\Theta^T\Theta)T_i^l v_i^l\right\|^2 + \frac{\beta}{\alpha}\left\|T_i^l v_i^l\right\|^2 \right) = \arg\min_\Theta \alpha\,\mathrm{tr}\!\left(V^T\left(\left(1+\tfrac{\beta}{\alpha}\right)I - \Theta^T\Theta\right)V\right) = \arg\max_\Theta \mathrm{tr}\!\left(\Theta V V^T \Theta^T\right) \quad (23)$$

where $V = (T_1^1 v_1^1, \ldots, T_{n_m}^m v_{n_m}^m)$ is a $D \times n$ matrix with each column being a tangent vector (a slight abuse of the notation $V$). The optimal $\hat{\Theta}$ can be obtained using the singular value decomposition (SVD). Let $V = Z_1 \Sigma Z_2^T$ be the SVD of $V$, where the singular values in $\Sigma$ are arranged in decreasing order. Then the rows of $\hat{\Theta}$ are given by the first $h$ columns of $Z_1$.
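In code, this update is a single thin SVD; the sketch below assumes the ambient-space tangent vectors $T_i^l v_i^l$ from all tasks have been collected into a list (names are ours).

import numpy as np

def update_Theta(tangent_vectors, h):
    """Eq. (23): stack the tangent vectors T_i^l v_i^l as the columns of a
    D x n matrix and take its top-h left singular vectors as the rows of Theta."""
    V_mat = np.column_stack(tangent_vectors)               # D x n
    Z1, _, _ = np.linalg.svd(V_mat, full_matrices=False)   # singular values in decreasing order
    return Z1[:, :h].T                                     # Theta is h x D, Theta @ Theta.T = I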

3.3 Convex Relaxation

The orthogonality constraint in Eq. (23) is non-convex. Next, we propose to convert Eq. (23) into a convex formulation by relaxing its feasible domain into a convex set.

Let $\eta = \beta/\alpha$. It can be verified that the following equality holds: $(1 + \eta)I - \Theta^T\Theta = \eta(1 + \eta)(\eta I + \Theta^T\Theta)^{-1}$. Then we can rewrite $R_3(V, \Theta)$ as $R_3(V, \Theta) = \alpha\eta(1 + \eta)\,\mathrm{tr}\!\left(V^T(\eta I + \Theta^T\Theta)^{-1}V\right)$. Let $\mathcal{M}_e$ be defined as $\mathcal{M}_e = \{M : M = \Theta^T\Theta, \Theta\Theta^T = I, \Theta \in \mathbb{R}^{h\times D}\}$. The convex hull [8] of $\mathcal{M}_e$ can be expressed as the convex set $\mathcal{M}_c = \{M : \mathrm{tr}(M) = h, M \preceq I, M \in \mathbb{S}_+^D\}$, and each element of $\mathcal{M}_e$ is referred to as an extreme point of $\mathcal{M}_c$.

To convert the non-convex problem in Eq. (23) into a convex formulation, we replace $\Theta^T\Theta$ with $M$ and relax the feasible domain into a convex set based on the relationship between $\mathcal{M}_e$ and $\mathcal{M}_c$ presented above; this results in the following optimization problem:

$$\arg\min_{M} R_3(V, M), \quad \text{s.t. } \mathrm{tr}(M) = h, \ M \preceq I, \ M \in \mathbb{S}_+^D \quad (24)$$

where $R_3(V, M)$ is defined as $R_3(V, M) = \alpha\eta(1 + \eta)\,\mathrm{tr}\!\left(V^T(\eta I + M)^{-1}V\right)$. It follows from [3, Theorem 3.1] that the relaxed $R_3$ is jointly convex in $V$ and $M$. After we obtain the optimal $M$, the optimal $\Theta$ can be approximated using the first $h$ eigenvectors (corresponding to the $h$ largest eigenvalues) of the optimal $M$.
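The relaxed objective and the final recovery of Θ from an optimal M can be sketched as follows; solving the convex problem in Eq. (24) itself would be delegated to a standard convex solver, which we do not show, and the helper names are ours.

import numpy as np

def relaxed_R3(V_mat, M, alpha, eta):
    """R_3(V, M) = alpha * eta * (1 + eta) * tr(V^T (eta*I + M)^{-1} V), Eq. (24).
    V_mat is the D x n matrix of tangent vectors; M is a D x D matrix in M_c."""
    D = M.shape[0]
    S = np.linalg.solve(eta * np.eye(D) + M, V_mat)      # (eta*I + M)^{-1} V
    return alpha * eta * (1 + eta) * np.sum(V_mat * S)   # equals tr(V^T S)

def theta_from_M(M, h):
    """Approximate Theta by the eigenvectors of M with the h largest eigenvalues."""
    _, U = np.linalg.eigh(M)        # eigenvalues returned in ascending order
    return U[:, -h:].T              # rows of Theta: top-h eigenvectors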

4 Experiments

In this section, we evaluate our method on one synthetic data set and one real data set. We compare the proposed Multi-Task Vector Field Learning (MTVFL) algorithm against the following methods: (a) Single Task Vector Field Learning (STVFL, or PFR), (b) Alternating Structure Optimization (ASO), and (c) its nonlinear version, Kernelized Alternating Structure Optimization (KASO). The kernel constructed in KASO uses both labeled and unlabeled data, so it can be viewed as a semi-supervised MTL method.

4.1 Synthetic Data

We first construct a synthetic data set to evaluate our method in comparison with the semi-supervised single task learning method (STVFL). We generate two data sets, a Swiss roll and a Swiss roll with a hole, both embedded in 3-dimensional Euclidean space. The Swiss roll is generated by the equations $x = t_1 \cos t_1$, $y = t_2$, $z = t_1 \sin t_1$, where $t_1 \in [3\pi/2, 9\pi/2]$ and $t_2 \in [0, 21]$. The Swiss roll with a hole excludes points with $t_1 \in [9, 12]$ and $t_2 \in [9, 14]$. The ground truth function is $f(x, y, z) = t_1$. This test is a semi-supervised multi-task regression problem: we randomly select a number of labeled data points in each task and try to predict the values on the remaining unlabeled data.

Each data set has 400 points. We construct a nearest neighbor graph for each task. The number of nearest neighbors is set to 5 and the manifold dimension is set to 2, as both are 2-dimensional manifolds. The shared subspace dimension is set to 2. The regularization parameters are chosen via cross-validation. We perform 100 independent trials with randomly selected labeled sets. The performance is measured by the mean squared error (MSE). We also tried ASO and KASO; however, they perform poorly since the data are highly nonlinear. The MSE averaged over the two tasks is presented in Fig. 2. We observe that MTVFL consistently outperforms STVFL, which demonstrates the effectiveness of SSMTL.

Figure 2. (a) Performance of MTVFL and STVFL; (b) the singular value distribution.

We also show the singular value distribution of the ground truth gradient fields. Given the ground truth $f$, we can compute the gradient field $V$ by taking the derivative of $R_1(f, V)$ with respect to $V$. Requiring the derivative to vanish, we get the equation $GV = Cf$. After obtaining $V$, the gradient vector $V_{x_i^l}$ at each point is given by $V_{x_i^l} = T_i^l v_i^l$. We then perform PCA on these vectors; the singular values of their covariance matrix are shown in Fig. 2(b). As can be seen, the number of dominant singular values is 2, which indicates that the ground truth gradient fields concentrate on a 2-dimensional subspace.
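The singular value analysis above can be reproduced with a short sketch, assuming the per-task matrices $G$, $C$, the ground truth vector $f$, and the tangent bases are available; the helper name and input format are our own assumptions.

import numpy as np

def gradient_field_spectrum(tasks):
    """tasks: list of (G, C, f_true, T) per task, with T of shape (n, D, d).
    Solves G V = C f for the ground-truth gradient field, maps the local
    coordinates back to ambient vectors T_i v_i, and returns the singular
    values of the covariance matrix of all these vectors."""
    ambient = []
    for G, C, f_true, T in tasks:
        n, D, d = T.shape
        V = np.linalg.lstsq(G, C @ f_true, rcond=None)[0]            # stacked local coords
        ambient.extend(T[i] @ V[d * i:d * (i + 1)] for i in range(n))
    A = np.vstack(ambient)
    A = A - A.mean(axis=0)
    return np.linalg.svd(A.T @ A / len(A), compute_uv=False)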

4.2 Landmine Detection

We use the landmine data set studied in [20]. There are 29 data sets in total, collected from various real landmine fields. Each example is represented by a 9-dimensional feature vector with a binary label, which is either 1 for a landmine or 0 for clutter. The problem of landmine detection is to predict the labels of unlabeled objects. Among the 29 data sets, sets 1–15 correspond to relatively highly foliated regions and sets 16–29 correspond to bare earth or desert regions. Following [20], we choose data sets 1–10 and 16–24 to form 19 tasks.

The basic setup of all the algorithms is as follows. First, we construct a nearest neighbor graph for each task. The number of nearest neighbors is set to 10 and the manifold dimension is set to 4 empirically; these two parameters are the same for all 19 tasks. The shared subspace dimension is set to 5 for both MTVFL and ASO, and the shared subspace dimension of KASO is set to 10. All the regularization parameters of the four algorithms are chosen via cross-validation. Note that KASO needs to construct a kernel matrix; we use a Gaussian kernel in KASO and the Gaussian width is chosen by searching within [0.01, 10].

We perform 100 independent trials with randomly selected labeled sets. We measure the performance by AUC, the area under the Receiver Operating Characteristic (ROC) curve; a large AUC value indicates good classification performance. Since the data have severely unbalanced labels, following [20], we ensure that there is at least one "1" and one "0" labeled sample in the training set of each task. The AUC averaged over the 19 tasks is presented in Fig. 3(a). As can be seen, MTVFL consistently outperforms the other three algorithms. When the number of labeled data increases, KASO outperforms STVFL. ASO does not improve much as the amount of labeled data increases, probably because the data have severely unbalanced labels and the ground truth predictor function is nonlinear. We also show the singular value distribution of the ground truth gradient fields in Fig. 3(b). The computation of the singular values is the same as in Section 4.1. As can be seen from Fig. 3(b), the number of dominant singular values is 5. The first 5 singular values account for 91.34% of the total sum, which indicates that the ground truth gradient fields concentrate on a 5-dimensional subspace.

Figure 3. (a) Performance of various MTL algorithms; (b) the singular value distribution.

5 Conclusion

In this paper, we propose a new semi-supervised multi-task learning formulation using vector fields. We show that vector fields can naturally capture the shared differential structure among tasks as well as the structure of the data manifolds, both of which are crucial for semi-supervised multi-task learning. Our experimental results on synthetic and real data demonstrate the effectiveness of the proposed method. This work suggests several interesting future directions. One is the relation between learning on task parameters and learning on vector fields; ultimately, both are learning functions. Another direction is to apply other assumptions made in the multi-task learning community, e.g., the cluster assumption, to vector field learning.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grants 61125203 and 61233011, the National Basic Research Program of China (973 Program) under Grant 2012CB316404, NIH (R01 LM010730) and NSF (IIS-0953662, CCF-1025177).

Footnotes

1. The data set is available at http://www.ee.duke.edu/~lcarin/LandmineData.zip.

Contributor Information

Binbin Lin, Email: binbinlinzju@gmail.com.

Sen Yang, Email: senyang@asu.edu.

Chiyuan Zhang, Email: chiyuan.zhang.zju@gmail.com.

Jieping Ye, Email: jieping.ye@asu.edu.

Xiaofei He, Email: xiaofeihe@gmail.com.

References

  • 1. Agarwal A, Daumé H III, Gerber S. Learning multiple tasks using manifold regularization. Advances in Neural Information Processing Systems. 2010;23:46–54.
  • 2. Ando RK, Zhang T. A framework for learning predictive structures from multiple tasks and unlabeled data. Journal of Machine Learning Research. 2005;6:1817–1853.
  • 3. Argyriou A, Micchelli CA, Pontil M, Ying Y. A spectral regularization framework for multi-task structure learning. Advances in Neural Information Processing Systems. 2008;20:25–32.
  • 4. Bakker B, Heskes T. Task clustering and gating for Bayesian multitask learning. Journal of Machine Learning Research. 2003;4:83–99.
  • 5. Belkin M, Niyogi P, Sindhwani V. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research. 2006;7:2399–2434.
  • 6. Ben-David S, Gehrke J, Schuller R. A theoretical framework for learning from a pool of disparate data sources. Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2002:443–449.
  • 7. Ben-David S, Schuller R. Exploiting task relatedness for multiple task learning. Conference on Learning Theory. 2003:567–580.
  • 8. Boyd S, Vandenberghe L. Convex Optimization. Cambridge University Press; 2004.
  • 9. Carlson A, Betteridge J, Wang RC, Hruschka ER Jr, Mitchell TM. Coupled semi-supervised learning for information extraction. Proceedings of the Third ACM International Conference on Web Search and Data Mining. 2010:101–110.
  • 10. Chapelle O, Schölkopf B, Zien A, editors. Semi-Supervised Learning. MIT Press; 2006.
  • 11. Defant A, Floret K. Tensor Norms and Operator Ideals. North-Holland Mathematics Studies. North-Holland, Amsterdam; 1993.
  • 12. Evgeniou T, Micchelli CA, Pontil M. Learning multiple tasks with kernel methods. Journal of Machine Learning Research. 2005;6:615–637.
  • 13. Golub GH, Van Loan CF. Matrix Computations. 3rd ed. Johns Hopkins University Press; 1996.
  • 14. Jacob L, Bach F, Vert JP. Clustered multi-task learning: A convex formulation. Advances in Neural Information Processing Systems. 2009;21:745–752.
  • 15. Lafferty J, Wasserman L. Statistical analysis of semi-supervised regression. Advances in Neural Information Processing Systems. 2007;20:801–808.
  • 16. Lee JM. Introduction to Smooth Manifolds. 2nd ed. Springer Verlag; New York: 2003.
  • 17. Lin B, Zhang C, He X. Semi-supervised regression via parallel field regularization. Advances in Neural Information Processing Systems. 2011;24:433–441.
  • 18. Liu Q, Liao X, Carin L. Semi-supervised multitask learning. Advances in Neural Information Processing Systems. 2008;20:937–944.
  • 19. Wang F, Wang X, Li T. Semi-supervised multi-task learning with task regularizations. Proceedings of the 2009 Ninth IEEE International Conference on Data Mining. IEEE Computer Society; 2009:562–568.
  • 20. Xue Y, Liao X, Carin L, Krishnapuram B. Multi-task learning for classification with Dirichlet process priors. Journal of Machine Learning Research. 2007;8:35–63.
