Abstract
We introduce a data-dependent regularization problem that uses the geometric structure of the data to learn functions from incomplete data. In the course of introducing the problem, we also give another proof of the standard representer theorem. At the end of the paper, two applications in image processing are used to illustrate the function-learning framework.
Keywords: Function learning, Manifold structure, Representer theorem
Introduction
Background
Many machine learning problems involve learning multidimensional functions from incomplete training data. For example, a classification problem can be viewed as learning a function whose values give the classes to which the inputs belong. The direct representation of a function in a high-dimensional space often suffers from the curse of dimensionality: the large number of parameters in the representation translates into the need for extensive training data, which is expensive to obtain. However, researchers have found that many natural datasets exhibit considerable intrinsic structure, usually referred to as manifold structure. This intrinsic structure can then be used to improve the learning results. Assuming that data lie on or close to a manifold has become increasingly common in machine learning, where it is known as the manifold assumption. Although the theoretical reasons why datasets exhibit manifold structure are not fully understood, the assumption is useful for supervised learning and gives excellent performance in practice. In this work, we exploit the manifold structure to learn functions from incomplete training data.
A Motivating Example
One of the main problems in numerical analysis is function approximation. Over the last several decades, researchers have usually considered the following problem when applying the theory of function approximation to real-world problems:
$$\min_{f} \ \|Lf\|_{L^2}^2 \quad \text{s.t.} \quad f(x_i) = y_i, \quad i = 1, \ldots, n, \tag{1}$$

where $L$ is some linear operator, $\{(x_i, y_i)\}_{i=1}^{n}$ are the $n$ accessible observations and $X$ is the input space. We can use the method of Lagrange multipliers to solve Problem (1). Assume that the search space for the function $f$ is large enough (for example, an $L^2$ space). Then the Lagrangian function $C(f)$ is given by
$$C(f) = \|Lf\|_{L^2}^2 + \sum_{i=1}^{n} \eta_i \bigl(f(x_i) - y_i\bigr).$$
Taking the gradient of the Lagrangian function w.r.t. the function $f$ gives us

$$\nabla C(f)[h] = 2\langle Lf, Lh\rangle_{L^2} + \sum_{i=1}^{n}\eta_i\,h(x_i), \qquad h \in L^2.$$

Setting $\nabla C(f) = 0$, we have

$$2\langle Lf, Lh\rangle_{L^2} + \sum_{i=1}^{n}\eta_i\,\langle \delta_{x_i}, h\rangle_{L^2} = 0 \quad \text{for all } h,$$

where $\delta_{x_i} = \delta(\cdot - x_i)$ is the delta function. Suppose $L^*$ is the adjoint operator of $L$. Then we have

$$2L^*Lf + \sum_{i=1}^{n}\eta_i\,\delta_{x_i} = 0,$$

which gives us $f = -\tfrac{1}{2}(L^*L)^{-1}\sum_{i=1}^{n}\eta_i\,\delta_{x_i}$. This implies

$$f(x) = \sum_{i=1}^{n} a_i\,G(x, x_i)$$

for some coefficients $a_i \in \mathbb{R}$ and some function $G$, the Green's function of $L^*L$.
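For concreteness (a sketch under the assumption that $G$ is known and the interpolation constraints are enforced exactly), the coefficients $a_i$ are then determined by plugging this form back into the constraints of Problem (1):

$$\sum_{j=1}^{n} a_j\,G(x_i, x_j) = y_i, \quad i = 1, \ldots, n, \qquad \text{i.e.} \qquad G\mathbf{a} = \mathbf{y}, \quad G_{ij} = G(x_i, x_j),$$

so that learning $f$ reduces to solving an $n \times n$ linear system.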
Kernels and Representer Theorem
As machine learning has developed rapidly in recent years, kernel methods [1] have received much attention. Researchers have found that working directly in the original data space often does not perform well, so one maps the data into a high-dimensional space (the feature space) using some nonlinear mapping (the feature map), where tasks such as classification can be carried out more effectively. When we talk about feature maps, one concept that cannot be avoided is the kernel, which, roughly speaking, is the inner product of the features. Given a positive definite kernel, there is a corresponding reproducing kernel Hilbert space (RKHS) [2] $\mathcal{H}_K$. We can then solve a problem similar to (1) in the RKHS:
$$\min_{f \in \mathcal{H}_K} \|f\|_{\mathcal{H}_K}^2 \quad \text{s.t.} \quad f(x_i) = y_i, \quad i = 1, \ldots, n.$$
A more feasible way is to consider a regularization problem in the RKHS:
$$\min_{f \in \mathcal{H}_K} \sum_{i=1}^{n} \bigl(f(x_i) - y_i\bigr)^2 + \lambda\,\|f\|_{\mathcal{H}_K}^2. \tag{2}$$
Then the search space for $f$ becomes $\mathcal{H}_K$, which is a Hilbert space. Before solving Problem (2), we recall some basic concepts about the RKHS. Suppose we have a positive definite kernel $K: X \times X \to \mathbb{R}$, i.e.,

$$\sum_{i=1}^{m}\sum_{j=1}^{m} c_i c_j K(x_i, x_j) \ge 0 \quad \text{for all } m \in \mathbb{N},\ c_i \in \mathbb{R},\ x_i \in X.$$

Then $\mathcal{H}_K$ is the Hilbert space corresponding to the kernel $K$. It is defined by all possible linear combinations of the kernel, i.e., $\mathcal{H}_K = \overline{\operatorname{span}}\{K(\cdot, x) : x \in X\}$. Thus, for any $f \in \mathcal{H}_K$, there exist coefficients $a_i \in \mathbb{R}$ and points $x_i \in X$ such that

$$f = \sum_{i} a_i\,K(\cdot, x_i).$$
Since $\mathcal{H}_K$ is a Hilbert space, it is equipped with an inner product. The principle for defining the inner product is to let each $x \in X$ have the representer $K(\cdot, x)$, and to have the representer behave like the delta function for functions in $\mathcal{H}_K$ (note that the delta function itself is not in $\mathcal{H}_K$). In other words, we want a result similar to the following formula:

$$\int_X f(y)\,\delta(y - x)\,dy = f(x).$$

This is called the reproducing relation or reproducing property. In $\mathcal{H}_K$, we want to define the inner product so that we have the reproducing relation in $\mathcal{H}_K$:

$$\langle f, K(\cdot, x)\rangle_{\mathcal{H}_K} = f(x).$$

To achieve this goal, we can define

$$\langle K(\cdot, x), K(\cdot, y)\rangle_{\mathcal{H}_K} = K(x, y).$$

Then we have

$$\langle f, K(\cdot, x)\rangle_{\mathcal{H}_K} = \Bigl\langle \sum_i a_i K(\cdot, x_i),\, K(\cdot, x)\Bigr\rangle_{\mathcal{H}_K} = \sum_i a_i\,K(x_i, x) = f(x).$$
With the kernel, the feature map $\Phi$ can be defined as

$$\Phi(x) = K(\cdot, x), \qquad \text{so that} \qquad \langle \Phi(x), \Phi(y)\rangle_{\mathcal{H}_K} = K(x, y).$$
Having this knowledge about the RKHS, we can now look at the solution of Problem (2). It is characterized by the famous result known as the representer theorem, which states that the solution of Problem (2) has the form

$$f(x) = \sum_{i=1}^{n} a_i\,K(x, x_i).$$
The standard proof of the representer theorem is well known and can be found in many places in the literature, see for example [3, 4]. A drawback of the standard proof, however, is that it does not provide an expression for the coefficients $a_i$. In the first part of this work, we provide another proof of the representer theorem. As a by-product, we also build a relation between Problem (1) and Problem (2).
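As a concrete illustration (a minimal sketch, not the paper's implementation), the following code solves the squared-error instance of Problem (2) numerically; under the representer theorem the coefficients satisfy the linear system $(K + \lambda I)\mathbf{a} = \mathbf{y}$. The Gaussian kernel, the data, and the value of $\lambda$ below are placeholder choices.

```python
import numpy as np

def gaussian_kernel(X1, X2, sigma=1.0):
    # K(x, y) = exp(-||x - y||^2 / (2 sigma^2)); any positive definite kernel works here
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_problem2(X, y, lam=1e-2, sigma=1.0):
    # Representer theorem: f(x) = sum_i a_i K(x, x_i); for the squared-error loss
    # the coefficients solve (K + lam * I) a = y.
    K = gaussian_kernel(X, X, sigma)
    a = np.linalg.solve(K + lam * np.eye(len(X)), y)
    return lambda X_new: gaussian_kernel(X_new, X, sigma) @ a

# toy usage: learn f(x) = sin(x) from 20 noiseless samples
X = np.linspace(0.0, 2.0 * np.pi, 20)[:, None]
y = np.sin(X).ravel()
f = fit_problem2(X, y)
print(f(np.array([[1.0], [2.0]])))   # should be close to sin(1), sin(2)
```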
Another Proof of Representer Theorem
To give another proof of the representer theorem, we first build some relations between $\mathcal{H}_K$ and $L^2(X, \mu)$. We endow the dataset $X$ with a measure $\mu$. Then the corresponding $L^2$ inner product is given by

$$\langle f, g\rangle_{L^2} = \int_X f(x)\,g(x)\,d\mu(x).$$
Consider an operator $L$ on $f$ with respect to the kernel $K$:

$$(Lf)(x) = \int_X K(x, y)\,f(y)\,d\mu(y), \tag{3}$$
which is the Hilbert–Schmidt integral operator [5]. This operator is self-adjoint, bounded and compact. By the spectral theorem [6], the eigenfunctions $\{\phi_k\}$ of the operator form an orthonormal basis of $L^2(X, \mu)$, i.e.,

$$\langle \phi_k, \phi_l\rangle_{L^2} = \delta_{kl}.$$
With the operator $L$ defined as in (3), we can look at the relations between $\mathcal{H}_K$ and $L^2(X, \mu)$. Suppose $\{\phi_k\}$ are the eigenfunctions of the operator $L$ and $\{\sigma_k\}$ are the corresponding eigenvalues; then

$$(L\phi_k)(x) = \int_X K(x, y)\,\phi_k(y)\,d\mu(y) = \sigma_k\,\phi_k(x), \tag{4}$$

that is, $\langle K(x, \cdot), \phi_k\rangle_{L^2} = \sigma_k\,\phi_k(x)$. But by the reproducing relation, we have

$$\langle K(x, \cdot), \phi_k\rangle_{\mathcal{H}_K} = \phi_k(x).$$
Now, let us look at how to represent $K(x, y)$ by the eigenfunctions. We have

$$K(x, y) = \sum_{k} \sigma_k\,\phi_k(x)\,\phi_k(y),$$

and each $\sigma_k$ can be computed by

$$\sigma_k = \int_X\int_X K(x, y)\,\phi_k(x)\,\phi_k(y)\,d\mu(x)\,d\mu(y).$$

To see that this expansion of $K$ holds, we can just plug it into (4) to verify it:

$$\int_X K(x, y)\,\phi_k(y)\,d\mu(y) = \int_X \sum_{l}\sigma_l\,\phi_l(x)\,\phi_l(y)\,\phi_k(y)\,d\mu(y) = \sigma_k\,\phi_k(x).$$
Since the eigenfunctions of $L$ form an orthonormal basis of $L^2(X, \mu)$, any $f \in L^2(X, \mu)$ can be written as $f = \sum_k a_k\phi_k$. So we have

$$\|f\|_{L^2}^2 = \sum_{k} a_k^2,$$

while for the $\mathcal{H}_K$ norm, we have

$$\|f\|_{\mathcal{H}_K}^2 = \sum_{k} \frac{a_k^2}{\sigma_k}.$$
Next we show that the orthonormal basis functions $\phi_k$ lie in $\mathcal{H}_K$. Note that

$$\phi_k = \frac{1}{\sigma_k}\,L\phi_k = \frac{1}{\sigma_k}\int_X K(\cdot, y)\,\phi_k(y)\,d\mu(y),$$

which implies

$$\langle \phi_k, \phi_l\rangle_{\mathcal{H}_K} = \frac{1}{\sigma_k}\int_X \phi_k(y)\,\langle K(\cdot, y), \phi_l\rangle_{\mathcal{H}_K}\,d\mu(y) = \frac{1}{\sigma_k}\int_X \phi_k(y)\,\phi_l(y)\,d\mu(y).$$

So we can get

$$\langle \phi_k, \phi_l\rangle_{\mathcal{H}_K} = \frac{\delta_{kl}}{\sigma_k}, \qquad \|\phi_k\|_{\mathcal{H}_K}^2 = \frac{1}{\sigma_k} < \infty.$$

Therefore, we get $\phi_k \in \mathcal{H}_K$.
We now need to investigate, for any $f \in L^2(X, \mu)$, when we have $f \in \mathcal{H}_K$. To have $f \in \mathcal{H}_K$, we need $\|f\|_{\mathcal{H}_K} < \infty$. So, writing $f = \sum_k a_k\phi_k$,

$$\|f\|_{\mathcal{H}_K}^2 = \sum_k \frac{a_k^2}{\sigma_k}.$$

This means that for $f \in \mathcal{H}_K$ we need $\sum_k a_k^2/\sigma_k < \infty$ [7].

Combining all of this analysis, we get the following relation between $\mathcal{H}_K$ and $L^2(X, \mu)$:

$$\mathcal{H}_K = \Bigl\{ f \in L^2(X, \mu) : f = \sum_k a_k\phi_k \ \text{with} \ \sum_k \frac{a_k^2}{\sigma_k} < \infty \Bigr\}.$$
According to this relation, we can give another proof of the representer theorem.
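Before turning to the proof, here is a small numerical sketch of the relation above. It assumes a uniform empirical measure on a finite sample, so the Hilbert–Schmidt operator is approximated by the Gram matrix scaled by $1/n$; all names and the Laplacian kernel choice are illustrative.

```python
import numpy as np

def laplacian_kernel(X1, X2, sigma=1.0):
    # K(x, y) = exp(-||x - y|| / sigma)
    d = np.sqrt(((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1))
    return np.exp(-d / sigma)

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))   # samples standing in for the measure mu
n = len(X)
K = laplacian_kernel(X, X)

# discretized Hilbert-Schmidt operator: (Lf)(x_i) ~ (1/n) sum_j K(x_i, x_j) f(x_j)
sigmas, U = np.linalg.eigh(K / n)
sigmas, U = sigmas[::-1], U[:, ::-1]         # eigenvalues sigma_k in decreasing order
sigmas = np.clip(sigmas, 1e-12, None)        # guard against round-off
phis = U * np.sqrt(n)                        # eigenfunctions normalized in L2(mu)

# take an RKHS function f = sum_i c_i K(., x_i) and expand it in the eigenbasis
c = rng.normal(size=n) / n
f_vals = K @ c
a = phis.T @ f_vals / n                      # a_k = <f, phi_k>_{L2}
print("||f||_{L2}^2  =", np.sum(a ** 2))
print("sum a_k^2/s_k =", np.sum(a ** 2 / sigmas))
print("||f||_{H_K}^2 =", c @ K @ c)          # exact RKHS norm; matches the line above
```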
Proof
Suppose $\{\phi_k\}$ are the eigenfunctions of the operator $L$. Then we can write the solution as $f = \sum_k a_k\phi_k$. To have $f \in \mathcal{H}_K$, we require $\sum_k a_k^2/\sigma_k < \infty$.
We consider here a more general form of Problem (2):

$$\min_{f \in \mathcal{H}_K} \sum_{i=1}^{n} E\bigl(f(x_i), y_i\bigr) + \lambda\,\|f\|_{\mathcal{H}_K}^2,$$

where $E$ is an error function which is differentiable with respect to each $f(x_i)$. We use the tools of the $L^2$ space to obtain the solution.

The cost function of the regularization problem is

$$C(f) = \sum_{i=1}^{n} E\bigl(f(x_i), y_i\bigr) + \lambda\,\|f\|_{\mathcal{H}_K}^2.$$
By substituting $f = \sum_k a_k\phi_k$ into the cost function, we have

$$C(f) = \sum_{i=1}^{n} E\Bigl(\sum_k a_k\phi_k(x_i),\, y_i\Bigr) + \lambda \sum_k \frac{a_k^2}{\sigma_k}.$$
Since

$$\frac{\partial}{\partial a_j}\sum_k a_k\phi_k(x_i) = \phi_j(x_i),$$

differentiating $C$ w.r.t. each $a_j$ and setting the derivative equal to zero gives

$$\sum_{i=1}^{n} \frac{\partial E}{\partial f(x_i)}\,\phi_j(x_i) + \frac{2\lambda\,a_j}{\sigma_j} = 0.$$
Solving for $a_j$, we get

$$a_j = -\frac{\sigma_j}{2\lambda}\sum_{i=1}^{n}\frac{\partial E}{\partial f(x_i)}\,\phi_j(x_i).$$
Since $f = \sum_j a_j\phi_j$, we have

$$f(x) = \sum_j a_j\phi_j(x) = -\frac{1}{2\lambda}\sum_{i=1}^{n}\frac{\partial E}{\partial f(x_i)}\sum_j \sigma_j\,\phi_j(x_i)\,\phi_j(x) = \sum_{i=1}^{n} c_i\,K(x, x_i), \qquad c_i = -\frac{1}{2\lambda}\frac{\partial E}{\partial f(x_i)}.$$

This proves the representer theorem.
Note that this result not only proves the representer theorem, but also gives the expression of the coefficients, namely $c_i = -\frac{1}{2\lambda}\frac{\partial E}{\partial f(x_i)}$.
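For example (specializing to the squared-error loss, which is only one admissible choice of $E$), taking $E\bigl(f(x_i), y_i\bigr) = \bigl(f(x_i) - y_i\bigr)^2$ gives

$$c_i = -\frac{1}{2\lambda}\frac{\partial E}{\partial f(x_i)} = \frac{y_i - f(x_i)}{\lambda},$$

and substituting $f(x_j) = \sum_i c_i K(x_j, x_i)$ turns this into the familiar linear system

$$(K + \lambda I)\,\mathbf{c} = \mathbf{y}, \qquad K_{ji} = K(x_j, x_i).$$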
With the operator $L$ defined in (3), we can also build a relation between Problem (1) and Problem (2). Define the operator in Problem (1) to be the inverse of the Hilbert–Schmidt integral operator; the discussion on the inverse of the Hilbert–Schmidt integral operator can be found in [8]. Note that for the delta function, we have

$$(L\delta_{x_i})(x) = \int_X K(x, y)\,\delta(y - x_i)\,d\mu(y) = K(x, x_i).$$

Then the solution of Problem (1) becomes $f(x) = \sum_{i=1}^{n} a_i\,G(x, x_i)$, where $G$ is now the Green's function associated with the chosen operator. So we have

$$L^{-1}G(\cdot, x_i) = \delta_{x_i}.$$

Applying $L$ on both sides gives

$$G(\cdot, x_i) = L\delta_{x_i} = K(\cdot, x_i),$$

by which we obtain

$$f(x) = \sum_{i=1}^{n} a_i\,K(x, x_i),$$

which is exactly the form given by the representer theorem for Problem (2).
Data-Dependent Regularization
So far, we have introduced the standard representer theorem. As we discussed at the very beginning, however, many natural datasets have manifold structure present in them. Based on the classical Problem (2), we would therefore like to introduce a new learning problem which exploits the manifold structure of the data; we call it the data-dependent regularization problem. Regularization problems have a long history going back to Tikhonov [9], who proposed Tikhonov regularization to solve ill-posed inverse problems.
To exploit the manifold structure of the data, we can divide a function into two parts: the restriction of the function to the manifold and the restriction of the function to the complement of the manifold. The problem can then be formulated as

$$\min_{f \in \mathcal{H}_K} \sum_{i=1}^{n}\bigl(f(x_i) - y_i\bigr)^2 + \beta_1\,\|f_{\mathcal{M}}\|_{\mathcal{H}_{K_{\mathcal{M}}}}^2 + \beta_2\,\|f_{\mathcal{M}^c}\|_{\mathcal{H}_{K_{\mathcal{M}^c}}}^2, \tag{5}$$

where $f_{\mathcal{M}} = f|_{\mathcal{M}}$ and $f_{\mathcal{M}^c} = f|_{\mathcal{M}^c}$. The norms $\|\cdot\|_{\mathcal{H}_{K_{\mathcal{M}}}}$ and $\|\cdot\|_{\mathcal{H}_{K_{\mathcal{M}^c}}}$ will be explained later in detail. The parameters $\beta_1$ and $\beta_2$ control the degree to which the energy of the function on the manifold and outside the manifold is penalized. We will show later that, by controlling the two balancing parameters (setting $\beta_1 = \beta_2$), the standard representer theorem becomes a special case of Problem (5).
We now discuss the functions $f_{\mathcal{M}}$ and $f_{\mathcal{M}^c}$. Consider the ambient space $X$ (or $\mathbb{R}^d$) and a positive definite kernel $K$. Let us first look at the restriction of $K$ to the manifold $\mathcal{M}$. The restriction is again a positive definite kernel [2], and it therefore has a corresponding Hilbert space. We consider the relation between the RKHS $\mathcal{H}_K$ and the restricted RKHS to explain the norms $\|\cdot\|_{\mathcal{H}_{K_{\mathcal{M}}}}$ and $\|\cdot\|_{\mathcal{H}_{K_{\mathcal{M}^c}}}$.
Lemma 1
([10]). Suppose $K: X \times X \to \mathbb{R}$ (or $\mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$) is a positive definite kernel. Let $\mathcal{M}$ be a subset of $X$ (or $\mathbb{R}^d$), and let $\mathcal{F}(\mathcal{M})$ denote all the functions defined on $\mathcal{M}$. Then the RKHS given by the restricted kernel $K_{\mathcal{M}} := K|_{\mathcal{M}\times\mathcal{M}}$ is

$$\mathcal{H}_{K_{\mathcal{M}}} = \bigl\{ g \in \mathcal{F}(\mathcal{M}) : g = f|_{\mathcal{M}} \ \text{for some} \ f \in \mathcal{H}_K \bigr\}, \tag{6}$$

with the norm defined as

$$\|g\|_{\mathcal{H}_{K_{\mathcal{M}}}} = \min\bigl\{ \|f\|_{\mathcal{H}_K} : f \in \mathcal{H}_K,\ f|_{\mathcal{M}} = g \bigr\}.$$
Proof
Define the set

$$S_g = \bigl\{\, \|f\|_{\mathcal{H}_K} : f \in \mathcal{H}_K,\ f|_{\mathcal{M}} = g \,\bigr\}.$$

We first show that the set $S_g$ attains a minimum for every admissible $g$. Choose a minimizing sequence $\{f_j\} \subset \mathcal{H}_K$ with $f_j|_{\mathcal{M}} = g$ and $\|f_j\|_{\mathcal{H}_K} \to \inf S_g$. The sequence is bounded in the Hilbert space $\mathcal{H}_K$ because its norms converge, so by the Banach–Alaoglu theorem [11] it is reasonable to assume that $\{f_j\}$ is weakly convergent. From weak convergence we obtain pointwise convergence according to the reproducing property, so the weak limit still restricts to $g$ on $\mathcal{M}$, and, by the weak lower semicontinuity of the norm, the limit of the sequence attains the minimum.
We further define $\|g\|_{\mathcal{H}_{K_{\mathcal{M}}}} := \min S_g$. We show that the resulting normed space is a Hilbert space by verifying the parallelogram law. In other words, we are going to show that

$$\|g_1 + g_2\|_{\mathcal{H}_{K_{\mathcal{M}}}}^2 + \|g_1 - g_2\|_{\mathcal{H}_{K_{\mathcal{M}}}}^2 = 2\|g_1\|_{\mathcal{H}_{K_{\mathcal{M}}}}^2 + 2\|g_2\|_{\mathcal{H}_{K_{\mathcal{M}}}}^2.$$

Since we defined the norm as an attained minimum, by the definition of $\|\cdot\|_{\mathcal{H}_{K_{\mathcal{M}}}}$ we can choose $f_1, f_2 \in \mathcal{H}_K$ such that

$$f_1|_{\mathcal{M}} = g_1, \qquad \|f_1\|_{\mathcal{H}_K} = \|g_1\|_{\mathcal{H}_{K_{\mathcal{M}}}},$$

and

$$f_2|_{\mathcal{M}} = g_2, \qquad \|f_2\|_{\mathcal{H}_K} = \|g_2\|_{\mathcal{H}_{K_{\mathcal{M}}}}.$$

Since $(f_1 + f_2)|_{\mathcal{M}} = g_1 + g_2$ and $(f_1 - f_2)|_{\mathcal{M}} = g_1 - g_2$, we have

$$\|g_1 + g_2\|_{\mathcal{H}_{K_{\mathcal{M}}}}^2 + \|g_1 - g_2\|_{\mathcal{H}_{K_{\mathcal{M}}}}^2 \le \|f_1 + f_2\|_{\mathcal{H}_K}^2 + \|f_1 - f_2\|_{\mathcal{H}_K}^2 = 2\|f_1\|_{\mathcal{H}_K}^2 + 2\|f_2\|_{\mathcal{H}_K}^2 = 2\|g_1\|_{\mathcal{H}_{K_{\mathcal{M}}}}^2 + 2\|g_2\|_{\mathcal{H}_{K_{\mathcal{M}}}}^2.$$

For the reverse inequality, we first choose $h_1, h_2 \in \mathcal{H}_K$ such that $h_1|_{\mathcal{M}} = g_1 + g_2$, $\|h_1\|_{\mathcal{H}_K} = \|g_1 + g_2\|_{\mathcal{H}_{K_{\mathcal{M}}}}$ and $h_2|_{\mathcal{M}} = g_1 - g_2$, $\|h_2\|_{\mathcal{H}_K} = \|g_1 - g_2\|_{\mathcal{H}_{K_{\mathcal{M}}}}$. Then, since $\tfrac{1}{2}(h_1 + h_2)|_{\mathcal{M}} = g_1$ and $\tfrac{1}{2}(h_1 - h_2)|_{\mathcal{M}} = g_2$,

$$2\|g_1\|_{\mathcal{H}_{K_{\mathcal{M}}}}^2 + 2\|g_2\|_{\mathcal{H}_{K_{\mathcal{M}}}}^2 \le \tfrac{1}{2}\|h_1 + h_2\|_{\mathcal{H}_K}^2 + \tfrac{1}{2}\|h_1 - h_2\|_{\mathcal{H}_K}^2 = \|h_1\|_{\mathcal{H}_K}^2 + \|h_2\|_{\mathcal{H}_K}^2 = \|g_1 + g_2\|_{\mathcal{H}_{K_{\mathcal{M}}}}^2 + \|g_1 - g_2\|_{\mathcal{H}_{K_{\mathcal{M}}}}^2.$$

Therefore, we get

$$\|g_1 + g_2\|_{\mathcal{H}_{K_{\mathcal{M}}}}^2 + \|g_1 - g_2\|_{\mathcal{H}_{K_{\mathcal{M}}}}^2 = 2\|g_1\|_{\mathcal{H}_{K_{\mathcal{M}}}}^2 + 2\|g_2\|_{\mathcal{H}_{K_{\mathcal{M}}}}^2.$$
Next, we show (6) by showing that for all $g$ in the set on the right-hand side of (6) and all $x_0 \in \mathcal{M}$,

$$\langle g, K_{\mathcal{M}}(\cdot, x_0)\rangle_{\mathcal{H}_{K_{\mathcal{M}}}} = g(x_0),$$

where $K_{\mathcal{M}}(\cdot, x_0) = K(\cdot, x_0)|_{\mathcal{M}}$, i.e., the restricted kernel has the reproducing property for the norm defined above.

Choose $f_0 \in \mathcal{H}_K$ such that $f_0|_{\mathcal{M}} = g$ and $\|f_0\|_{\mathcal{H}_K} = \|g\|_{\mathcal{H}_{K_{\mathcal{M}}}}$. This is possible because of the analysis above. In particular, we have

$$\langle f_0, h\rangle_{\mathcal{H}_K} = 0 \quad \text{for every } h \in \mathcal{H}_K \text{ with } h|_{\mathcal{M}} = 0,$$

since otherwise $\|f_0 - t h\|_{\mathcal{H}_K} < \|f_0\|_{\mathcal{H}_K}$ for a suitable $t \in \mathbb{R}$, contradicting the minimality of $f_0$.

Now, for any function $h \in \mathcal{H}_K$ such that $h|_{\mathcal{M}} = K_{\mathcal{M}}(\cdot, x_0)$ (in particular, for the minimum-norm extension of $K_{\mathcal{M}}(\cdot, x_0)$), we have $\bigl(h - K(\cdot, x_0)\bigr)|_{\mathcal{M}} = 0$, so

$$\langle f_0, h\rangle_{\mathcal{H}_K} = \langle f_0, K(\cdot, x_0)\rangle_{\mathcal{H}_K} = f_0(x_0) = g(x_0).$$

Thus,

$$\langle g, K_{\mathcal{M}}(\cdot, x_0)\rangle_{\mathcal{H}_{K_{\mathcal{M}}}} = g(x_0).$$
This completes the proof of the lemma.
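As a concrete finite-dimensional illustration of Lemma 1 (an example added here for intuition, not taken from [10]): if $\mathcal{M} = \{z_1, \ldots, z_m\}$ is a finite set and the Gram matrix $K_{\mathcal{M}} = [K(z_i, z_j)]_{i,j}$ is invertible, then every function on $\mathcal{M}$ is the restriction of some element of $\mathcal{H}_K$, and

$$\|g\|_{\mathcal{H}_{K_{\mathcal{M}}}}^2 = \mathbf{g}^{\top} K_{\mathcal{M}}^{-1}\,\mathbf{g}, \qquad \mathbf{g} = \bigl(g(z_1), \ldots, g(z_m)\bigr)^{\top},$$

the minimum-norm extension being $f = \sum_i c_i\,K(\cdot, z_i)$ with $\mathbf{c} = K_{\mathcal{M}}^{-1}\mathbf{g}$.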
With this lemma, the solution of Problem (5) becomes easy to obtain. By the representer theorem mentioned before, we know that the function minimizing

$$\sum_{i=1}^{n}\bigl(f(x_i) - y_i\bigr)^2 + \lambda\,\|f\|_{\mathcal{H}_K}^2$$

is $f(x) = \sum_{i=1}^{n} a_i\,K(x, x_i)$. Since, by Lemma 1, we have

$$f_{\mathcal{M}} = f|_{\mathcal{M}} = \sum_{i=1}^{n} a_i\,K(\cdot, x_i)\Big|_{\mathcal{M}}$$

and

$$f_{\mathcal{M}^c} = f|_{\mathcal{M}^c} = \sum_{i=1}^{n} a_i\,K(\cdot, x_i)\Big|_{\mathcal{M}^c},$$

we can conclude that the solution of (5) is exactly of the form

$$f(x) = \sum_{i=1}^{n} a_i\,K(x, x_i),$$

where the coefficients $a_i$ are controlled by the parameters $\beta_1$ and $\beta_2$.
With the norms $\|\cdot\|_{\mathcal{H}_{K_{\mathcal{M}}}}$ and $\|\cdot\|_{\mathcal{H}_{K_{\mathcal{M}^c}}}$ being well defined, we would like to seek the relation between $\mathcal{H}_K$, $\mathcal{H}_{K_{\mathcal{M}}}$ and $\mathcal{H}_{K_{\mathcal{M}^c}}$. Before stating the relation, we restate some of the notation to make the statement clearer. Let

$$\mathcal{M}^c = X \setminus \mathcal{M}$$

and

$$K_{\mathcal{M}} = K|_{\mathcal{M}\times\mathcal{M}}, \qquad K_{\mathcal{M}^c} = K|_{\mathcal{M}^c\times\mathcal{M}^c},$$

with the corresponding restricted RKHSs

$$\mathcal{H}_{K_{\mathcal{M}}} = \bigl\{ f|_{\mathcal{M}} : f \in \mathcal{H}_K \bigr\}, \qquad \mathcal{H}_{K_{\mathcal{M}^c}} = \bigl\{ f|_{\mathcal{M}^c} : f \in \mathcal{H}_K \bigr\},$$

equipped with the minimum-norm-extension norms given by Lemma 1. To find the relation between $\mathcal{H}_K$, $\mathcal{H}_{K_{\mathcal{M}}}$ and $\mathcal{H}_{K_{\mathcal{M}^c}}$, we need to pull back the restricted kernels $K_{\mathcal{M}}$ and $K_{\mathcal{M}^c}$ to the original space. To do so, define

$$\tilde{K}_{\mathcal{M}}(x, y) = \begin{cases} K(x, y), & x, y \in \mathcal{M}, \\ 0, & \text{otherwise}, \end{cases}$$

$$\tilde{K}_{\mathcal{M}^c}(x, y) = \begin{cases} K(x, y), & x, y \in \mathcal{M}^c, \\ 0, & \text{otherwise}. \end{cases}$$

Then we have that $\tilde{K}_{\mathcal{M}}$ and $\tilde{K}_{\mathcal{M}^c}$ are again positive definite kernels on the whole space, and $K = \tilde{K}_{\mathcal{M}} + \tilde{K}_{\mathcal{M}^c}$ whenever $K(x, y) = 0$ for $x \in \mathcal{M}$ and $y \in \mathcal{M}^c$. The corresponding Hilbert spaces for $\tilde{K}_{\mathcal{M}}$ and $\tilde{K}_{\mathcal{M}^c}$ are

$$\mathcal{H}_{\tilde{K}_{\mathcal{M}}} = \bigl\{ f : X \to \mathbb{R} \,:\, f|_{\mathcal{M}} \in \mathcal{H}_{K_{\mathcal{M}}},\ f|_{\mathcal{M}^c} = 0 \bigr\},$$

$$\mathcal{H}_{\tilde{K}_{\mathcal{M}^c}} = \bigl\{ f : X \to \mathbb{R} \,:\, f|_{\mathcal{M}^c} \in \mathcal{H}_{K_{\mathcal{M}^c}},\ f|_{\mathcal{M}} = 0 \bigr\}.$$

It is straightforward to define

$$\|f\|_{\mathcal{H}_{\tilde{K}_{\mathcal{M}}}} = \|f|_{\mathcal{M}}\|_{\mathcal{H}_{K_{\mathcal{M}}}},$$

$$\|f\|_{\mathcal{H}_{\tilde{K}_{\mathcal{M}^c}}} = \|f|_{\mathcal{M}^c}\|_{\mathcal{H}_{K_{\mathcal{M}^c}}}.$$
The following lemma shows the relation between $\mathcal{H}_{\tilde{K}_{\mathcal{M}}}$, $\mathcal{H}_{\tilde{K}_{\mathcal{M}^c}}$ and $\mathcal{H}_K$, which also reveals the relation between $\mathcal{H}_{K_{\mathcal{M}}}$, $\mathcal{H}_{K_{\mathcal{M}^c}}$ and $\mathcal{H}_K$ by the Moore–Aronszajn theorem [12].
Lemma 2
Suppose $K_1, K_2 : X \times X \to \mathbb{R}$ (or $\mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$) are two positive definite kernels. If $K = K_1 + K_2$, then

$$\mathcal{H}_K = \bigl\{ f = f_1 + f_2 : f_1 \in \mathcal{H}_{K_1},\ f_2 \in \mathcal{H}_{K_2} \bigr\}$$

is a Hilbert space with the norm defined by

$$\|f\|_{\mathcal{H}_K}^2 = \min\bigl\{ \|f_1\|_{\mathcal{H}_{K_1}}^2 + \|f_2\|_{\mathcal{H}_{K_2}}^2 : f = f_1 + f_2,\ f_1 \in \mathcal{H}_{K_1},\ f_2 \in \mathcal{H}_{K_2} \bigr\}.$$
The idea of the proof of this lemma is exactly the same as the one for Lemma 1. Thus we omit it here.
A direct corollary of this lemma is:
Corollary 1
Under the assumptions of Lemma 2, if $\mathcal{H}_{K_1}$ and $\mathcal{H}_{K_2}$ have no function in common except the zero function, then the norm of $\mathcal{H}_K$ is given simply by

$$\|f\|_{\mathcal{H}_K}^2 = \|f_1\|_{\mathcal{H}_{K_1}}^2 + \|f_2\|_{\mathcal{H}_{K_2}}^2, \qquad f = f_1 + f_2,\ f_1 \in \mathcal{H}_{K_1},\ f_2 \in \mathcal{H}_{K_2}.$$
If we go back to our scenario, the pullback spaces $\mathcal{H}_{\tilde{K}_{\mathcal{M}}}$ and $\mathcal{H}_{\tilde{K}_{\mathcal{M}^c}}$ share only the zero function, so Corollary 1 gives

$$\|f\|_{\mathcal{H}_K}^2 = \|f_{\mathcal{M}}\|_{\mathcal{H}_{K_{\mathcal{M}}}}^2 + \|f_{\mathcal{M}^c}\|_{\mathcal{H}_{K_{\mathcal{M}^c}}}^2.$$

This means that if we set $\beta_1 = \beta_2 = \lambda$ in Problem (5), it reduces to Problem (2). Therefore, the standard representer theorem is a special case of our data-dependent regularization problem (5).
Applications
As we said in the introduction, many engineering problems can be viewed as learning multidimensional functions from incomplete data. In this section, we show two applications of function learning: image interpolation and patch-based image denoising.
Image Interpolation
Image interpolation tries to best approximate the color and intensity of a pixel based on the values at surrounding pixels; see Fig. 1 for an illustration. From the function learning perspective, image interpolation is learning a function from pixel positions to pixel values, given the known pixels and their corresponding positions.
Fig. 1.
Illustration of image interpolation. The original low-resolution image is to be enlarged to a finer grid, so the blue shaded positions are unknown; image interpolation finds the values at these positions. (Color figure online)
We use the Lena image, shown in Fig. 2(a), to give an example of image interpolation using the proposed framework; the corresponding zoomed image is shown in Fig. 2(d). In this image interpolation example, the two balancing parameters are set to be equal and the Laplacian kernel [13] is used:
$$K(x, y) = \exp\Bigl(-\frac{\|x - y\|}{\sigma}\Bigr).$$
Note that other kernels, for example the polynomial kernel or the Gaussian kernel, could also be used for image interpolation. Choosing the right kernel is an interesting problem in itself, but we do not have enough space to compare different kernels in this paper.
Fig. 2.
Illustration of image interpolation. The original image is downsampled by a factor of 3 in each direction, and the proposed function learning framework is used to obtain the interpolation function from the downsampled image. The results show that the proposed framework works for image interpolation.
In Fig. 2(b), we downsampled the original image by a factor of 3 in each direction; the corresponding zoomed image is shown in Fig. 2(e). The interpolation result and its zoomed version are shown in Fig. 2(c) and Fig. 2(f).
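A minimal sketch of this interpolation experiment from the function-learning view (not the exact code used for Fig. 2): each known pixel gives a training pair of coordinates and intensity, a kernel regressor with the Laplacian kernel is fitted to these pairs, and the learned function is evaluated on the fine grid. The factor, σ and λ values are illustrative, and the dense linear solve is only practical for small images or image blocks.

```python
import numpy as np

def laplacian_kernel(X1, X2, sigma=10.0):
    d = np.sqrt(((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1))
    return np.exp(-d / sigma)

def interpolate_image(low_res, factor=3, lam=1e-3, sigma=10.0):
    # Treat the image as a function from pixel coordinates to intensity:
    # learn it from the known (low-resolution) pixels, then evaluate on the fine grid.
    h, w = low_res.shape
    ii, jj = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords_known = np.stack([ii.ravel() * factor, jj.ravel() * factor], axis=1).astype(float)
    values_known = low_res.astype(float).ravel()
    K = laplacian_kernel(coords_known, coords_known, sigma)
    a = np.linalg.solve(K + lam * np.eye(len(coords_known)), values_known)
    H, W = h * factor, w * factor
    II, JJ = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    coords_all = np.stack([II.ravel(), JJ.ravel()], axis=1).astype(float)
    return (laplacian_kernel(coords_all, coords_known, sigma) @ a).reshape(H, W)
```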
Patch-Based Image Denoising
From the function learning point of view, the patch-based image denoising problem can be viewed as learning a function from noisy patches to their “noise-free” center pixels. See Fig. 3 for an illustration.
Fig. 3.
Illustration of patch-based image denoising. It can be viewed as learning a function from noisy patches to the corresponding clean center pixels.
In the patch-based image denoising application, we use the Laplacian kernel as well. We assume that the noisy patches lie close to some manifold, so we set the balancing parameter which controls the energy outside the manifold to be large. We use the images in Fig. 4 as known data to learn the function. Then, for a given noisy image, we can use the learned function to perform denoising. To speed up the learning process, we randomly choose only 10% of the known data to learn the function.
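A rough sketch of this training procedure (an assumed setup, not the exact experimental code): noisy patches are paired with the clean center pixels of the training images, a random 10% of the pairs is kept, and a kernel ridge regressor with the Laplacian kernel maps patches to denoised center values. Patch size, noise level, σ and λ are illustrative, and a 0–255 intensity range is assumed.

```python
import numpy as np

def laplacian_kernel(X1, X2, sigma=50.0):
    d = np.sqrt(((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1))
    return np.exp(-d / sigma)

def extract_patches(img, p=5):
    # all p x p patches (flattened) together with their center-pixel values
    r = p // 2
    H, W = img.shape
    patches = [img[i - r:i + r + 1, j - r:j + r + 1].ravel()
               for i in range(r, H - r) for j in range(r, W - r)]
    centers = [img[i, j] for i in range(r, H - r) for j in range(r, W - r)]
    return np.array(patches, dtype=float), np.array(centers, dtype=float)

def train_denoiser(clean_imgs, noise_std=20.0, keep=0.1, lam=1e-1, sigma=50.0, seed=0):
    rng = np.random.default_rng(seed)
    X, y = [], []
    for img in clean_imgs:
        noisy = img + rng.normal(0.0, noise_std, img.shape)
        noisy_patches, _ = extract_patches(noisy)      # inputs: noisy patches
        _, clean_centers = extract_patches(img)        # targets: clean center pixels
        X.append(noisy_patches)
        y.append(clean_centers)
    X, y = np.vstack(X), np.concatenate(y)
    idx = rng.choice(len(X), size=int(keep * len(X)), replace=False)   # 10% subsample
    X, y = X[idx], y[idx]
    a = np.linalg.solve(laplacian_kernel(X, X, sigma) + lam * np.eye(len(X)), y)
    return lambda patches: laplacian_kernel(patches, X, sigma) @ a     # denoised centers
```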
Fig. 4.
Four training images. We use noisy versions of these images together with their clean pixels to learn the denoising function.
We use the image Baboon to test the learned denoising function. The denoising results are shown in Fig. 5. Each column shows the result corresponding to one noise level.
Fig. 5.
Illustration of the denoising results.
Conclusion and Future Work
In this paper, we introduced a framework for learning functions from part of the data. We gave a data-dependent regularization problem which helps us learn a function using the manifold structure of the data, and we used two applications to illustrate the learning framework. These two applications, however, are only special cases of the data-dependent regularization problem. For a general application, we need to calculate the norms $\|f_{\mathcal{M}}\|_{\mathcal{H}_{K_{\mathcal{M}}}}$ and $\|f_{\mathcal{M}^c}\|_{\mathcal{H}_{K_{\mathcal{M}^c}}}$, which is hard to do since we only have partial data. We therefore need to approximate these two norms from incomplete data and to propose a new learning algorithm so that our framework can be used in general applications; this is part of our future work. Another line of future work is theoretical. We showed that the solution of the data-dependent regularization problem is a linear combination of the kernel, so it can be viewed as a function approximation result. Viewing the solution as an approximating function, we can then consider the error analysis of the approximation.
References
- 1.Schölkopf B, Smola AJ, Bach F. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. Cambridge: MIT Press; 2002.
- 2.Aronszajn N. Theory of reproducing kernels. Trans. Am. Math. Soc. 1950;68(3):337–404. doi: 10.1090/S0002-9947-1950-0051437-7.
- 3.Schölkopf B, Herbrich R, Smola AJ. A generalized representer theorem. In: Helmbold D, Williamson B, editors. Computational Learning Theory. Heidelberg: Springer; 2001. pp. 416–426.
- 4.Argyriou A, Micchelli CA, Pontil M. When is there a representer theorem? Vector versus matrix regularizers. J. Mach. Learn. Res. 2009;10:2507–2529.
- 5.Gohberg I, Goldberg S, Kaashoek MA. Hilbert-Schmidt operators. In: Classes of Linear Operators, vol. I. Basel: Birkhäuser; 1990. pp. 138–147.
- 6.Helmberg G. Introduction to Spectral Theory in Hilbert Space. New York: Courier Dover Publications; 2008.
- 7.Belkin M, Niyogi P, Sindhwani V. Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. J. Mach. Learn. Res. 2006;7:2399–2434.
- 8.Pipkin AC. A Course on Integral Equations. New York: Springer; 1991.
- 9.Tikhonov AN. Regularization of incorrectly posed problems. Soviet Math. Doklady. 1963;4(6):1624–1627.
- 10.Saitoh S, Sawano Y. Theory of Reproducing Kernels and Applications. Singapore: Springer; 2016.
- 11.Rudin W. Functional Analysis. Boston, MA: McGraw-Hill; 1991.
- 12.Amir AD, Luis GCR, Yukawa M, Stanczak S. Adaptive learning for symbol detection: a reproducing kernel Hilbert space approach. Mach. Learn. Future Wirel. Commun. 2020:197–211.
- 13.Kernel Functions for Machine Learning Applications. http://crsouza.com/2010/03/17/kernel-functions-for-machine-learning-applications/