Computational Science – ICCS 2020. 2020 May 22;12139:312–326. doi: 10.1007/978-3-030-50420-5_23

Learning Functions Using Data-Dependent Regularization: Representer Theorem Revisited

Qing Zou
Editors: Valeria V Krzhizhanovskaya, Gábor Závodszky, Michael H Lees, Jack J Dongarra, Peter M A Sloot, Sérgio Brissos, João Teixeira
PMCID: PMC7304014

Abstract

We introduce a data-dependent regularization problem which uses the geometric structure of the data to learn functions from incomplete data. In the course of introducing the problem, we give an alternative proof of the standard representer theorem. At the end of the paper, two applications in image processing are used to illustrate the function learning framework.

Keywords: Function learning, Manifold structure, Representer theorem

Introduction

Background

Many machine learning problems involve learning multidimensional functions from incomplete training data. For example, a classification problem can be viewed as learning a function whose values give the classes that the inputs belong to. The direct representation of a function in a high-dimensional space often suffers from the curse of dimensionality: the large number of parameters in the function representation translates into the need for extensive training data, which is expensive to obtain. However, researchers have found that many natural datasets exhibit considerable intrinsic structure, usually known as manifold structure, and this structure can be used to improve the learning results. Nowadays, the assumption that data lie on or close to a manifold has become common in machine learning; it is called the manifold assumption. Although the theoretical reasons why datasets exhibit manifold structure are not fully understood, exploiting it in supervised learning often gives excellent performance. In this work, we exploit the manifold structure to learn functions from incomplete training data.

A Motivating Example

One of the main problems in numerical analysis is function approximation. Over the last several decades, researchers have typically considered the following problem when applying the theory of function approximation to real-world problems:

\min_{f} \|Lf\|^2 \quad \text{subject to} \quad f(x_i) = y_i, \; i = 1, \dots, n, \qquad (1)

where $L$ is some linear operator, $\{(x_i, y_i)\}_{i=1}^{n}$ are the $n$ accessible observations, and $X$ is the input space. We can use the method of Lagrange multipliers to solve Problem (1). Assume that the search space for the function $f$ is large enough (for example, an $L^2$ space). Then the Lagrangian function $C(f)$ is given by

C(f) = \|Lf\|^2 + \sum_{i=1}^{n} \eta_i \big( f(x_i) - y_i \big).

Taking the gradient of the Lagrangian function with respect to the function $f$ gives us

\nabla C(f) = 2\,L^{*}Lf + \sum_{i=1}^{n} \eta_i\, \delta_{x_i},

where $\delta_{x_i}(\cdot) = \delta(\cdot - x_i)$ is the delta function and $L^{*}$ is the adjoint operator of $L$. Setting $\nabla C(f) = 0$, we have

2\,L^{*}Lf = -\sum_{i=1}^{n} \eta_i\, \delta_{x_i},

which gives us

f = -\frac{1}{2}\,(L^{*}L)^{-1} \sum_{i=1}^{n} \eta_i\, \delta_{x_i}.

This implies $f(x) = \sum_{i=1}^{n} \alpha_i\, G(x, x_i)$ for some $\alpha_i \in \mathbb{R}$, where $G$ is the Green's function of $L^{*}L$.
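For illustration, the following minimal numerical sketch mimics this conclusion in a discrete setting: a Gaussian bump is used as a stand-in for the Green's function $G$ (an assumption made purely for this example), and the coefficients $\alpha_i$ are obtained by enforcing the interpolation constraints $f(x_i) = y_i$.

# A minimal sketch of the conclusion above: the constrained problem is solved by an
# expansion f(x) = sum_i alpha_i G(x, x_i). A Gaussian bump stands in for the Green's
# function G (an illustrative assumption); the coefficients follow from f(x_i) = y_i.
import numpy as np

def green(x, z, sigma=0.5):
    # stand-in Green's function G(x, z)
    return np.exp(-(x - z) ** 2 / (2 * sigma ** 2))

# toy observations (x_i, y_i)
x_train = np.array([0.0, 0.3, 0.6, 1.0])
y_train = np.sin(2 * np.pi * x_train)

# Enforcing f(x_i) = y_i with f = sum_j alpha_j G(., x_j) gives the linear system
# G_mat @ alpha = y, where G_mat[i, j] = G(x_i, x_j).
G_mat = green(x_train[:, None], x_train[None, :])
alpha = np.linalg.solve(G_mat, y_train)

# Evaluate the learned function on a grid and check the constraints.
x_grid = np.linspace(0, 1, 101)
f_grid = green(x_grid[:, None], x_train[None, :]) @ alpha
print(np.allclose(G_mat @ alpha, y_train))  # True: the constraints are satisfied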

Kernels and Representer Theorem

With the rapid development of machine learning in recent years, kernel methods [1] have received much attention. Researchers found that working directly in the original data space often does not perform well, so the data are mapped into a high-dimensional space (the feature space) using a non-linear mapping (the feature map), where tasks such as classification become easier. A concept that cannot be avoided when talking about feature maps is the kernel, which, loosely speaking, is the inner product of the features. With a positive definite kernel, we obtain a corresponding reproducing kernel Hilbert space (RKHS) [2] $\mathcal{H}_K$. We can then solve a problem similar to (1) in the RKHS:

\min_{f \in \mathcal{H}_K} \|f\|_{\mathcal{H}_K}^2 \quad \text{subject to} \quad f(x_i) = y_i, \; i = 1, \dots, n.

A more feasible way is to consider a regularization problem in the RKHS:

\min_{f \in \mathcal{H}_K} \; \sum_{i=1}^{n} \big( f(x_i) - y_i \big)^2 + \lambda\, \|f\|_{\mathcal{H}_K}^2. \qquad (2)

The search space for $f$ then becomes $\mathcal{H}_K$, which is a Hilbert space. Before solving Problem (2), we recall some basic concepts about the RKHS. Suppose we have a positive definite kernel $K : X \times X \to \mathbb{R}$, i.e.,

\sum_{i=1}^{m} \sum_{j=1}^{m} c_i c_j K(x_i, x_j) \ge 0 \quad \text{for all } m \in \mathbb{N}, \; x_1, \dots, x_m \in X, \; c_1, \dots, c_m \in \mathbb{R};

then $\mathcal{H}_K$ is the Hilbert space corresponding to the kernel $K$. It is defined by all possible linear combinations of the kernel, i.e., $\mathcal{H}_K = \overline{\operatorname{span}}\{ K(\cdot, u) : u \in X \}$. Thus, for any $f \in \mathcal{H}_K$, there exist coefficients $\alpha_i \in \mathbb{R}$ and points $u_i \in X$ such that

f(x) = \sum_{i} \alpha_i\, K(x, u_i).

Since $\mathcal{H}_K$ is a Hilbert space, it is equipped with an inner product. The guiding principle in defining the inner product is to let each point $x$ have a representer $K(\cdot, x)$ that behaves like the delta function for functions in $\mathcal{H}_K$ (note that the delta function itself is not in $\mathcal{H}_K$). In other words, we want an analogue of the following formula:

\int f(y)\, \delta(y - x)\, dy = f(x).

This is called the reproducing relation or reproducing property. In $\mathcal{H}_K$, we want to define the inner product so that the reproducing relation holds:

\langle f, K(\cdot, x) \rangle_{\mathcal{H}_K} = f(x).

To achieve this goal, we can define

\Big\langle \sum_{i} \alpha_i K(\cdot, u_i), \; \sum_{j} \beta_j K(\cdot, v_j) \Big\rangle_{\mathcal{H}_K} = \sum_{i} \sum_{j} \alpha_i \beta_j\, K(u_i, v_j).

Then we have

\langle f, K(\cdot, x) \rangle_{\mathcal{H}_K} = \Big\langle \sum_{i} \alpha_i K(\cdot, u_i), \; K(\cdot, x) \Big\rangle_{\mathcal{H}_K} = \sum_{i} \alpha_i\, K(x, u_i) = f(x).

With the kernel, the feature map $\Phi : X \to \mathcal{H}_K$ can be defined as

\Phi(x) = K(\cdot, x).

With this knowledge about the RKHS, we can now look at the solution of Problem (2). It is characterized by the well-known representer theorem, which states that the solution of Problem (2) has the form

f(x) = \sum_{i=1}^{n} \alpha_i\, K(x, x_i).

The standard proof of the representer theorem is well known and can be found in the literature, see for example [3, 4]. A drawback of the standard proof, however, is that it does not provide an expression for the coefficients $\alpha_i$. In the first part of this work, we give another proof of the representer theorem. As a by-product, we also build a relation between Problem (1) and Problem (2).
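As a concrete illustration of the representer theorem, the following sketch solves the squared-loss version of Problem (2) numerically; the Gaussian kernel, the toy data, and the value of $\lambda$ are illustrative assumptions. For the squared loss, the coefficients satisfy the linear system $(K + \lambda I)\alpha = y$, where $K$ here denotes the kernel Gram matrix on the training points (the classical kernel ridge regression solution).

# Kernel ridge regression sketch: the minimizer of the squared-loss version of Problem (2)
# has the representer form f(x) = sum_i alpha_i K(x, x_i), with (K + lambda*I) alpha = y.
import numpy as np

def gaussian_kernel(a, b, sigma=0.3):
    # K(a_i, b_j) = exp(-(a_i - b_j)^2 / (2 sigma^2)) for 1-D inputs
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma ** 2))

rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0, 1, 20))
y_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.standard_normal(20)

lam = 1e-2
K = gaussian_kernel(x_train, x_train)                               # Gram matrix on the training points
alpha = np.linalg.solve(K + lam * np.eye(len(x_train)), y_train)    # representer coefficients

x_test = np.linspace(0, 1, 200)
f_test = gaussian_kernel(x_test, x_train) @ alpha                   # f(x) = sum_i alpha_i K(x, x_i)
print(f_test[:5])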

Another Proof of Representer Theorem

To give another proof of the representer theorem, we first build some relations between $\mathcal{H}_K$ and $L^2(X)$. We endow the dataset $X$ with a measure $\mu$. The corresponding $L^2$ inner product is then given by

\langle f, g \rangle_{L^2} = \int_X f(x)\, g(x)\, d\mu(x).

Consider an operator L on f with respect to the kernel K:

(Lf)(x) = \int_X K(x, y)\, f(y)\, d\mu(y), \qquad (3)

which is a Hilbert-Schmidt integral operator [5]. This operator is self-adjoint, bounded, and compact. By the spectral theorem [6], the eigenfunctions $\{\phi_k\}$ of the operator form an orthonormal basis of $L^2(X)$, i.e.,

\langle \phi_k, \phi_l \rangle_{L^2} = \delta_{kl}.

With the operator $L$ defined as in (3), we can look at the relations between $\mathcal{H}_K$ and $L^2(X)$. Suppose $\{\phi_k\}$ are the eigenfunctions of the operator $L$ and $\{\lambda_k\}$ are the corresponding eigenvalues; then

(L\phi_k)(x) = \int_X K(x, y)\, \phi_k(y)\, d\mu(y) = \lambda_k\, \phi_k(x). \qquad (4)

But by the reproducing relation, we have

\langle \phi_k, K(\cdot, x) \rangle_{\mathcal{H}_K} = \phi_k(x).

Now let us look at how to represent $K(x, y)$ in terms of the eigenfunctions. We have

K(x, y) = \sum_{k} c_k(y)\, \phi_k(x),

and the coefficients $c_k(y)$ can be computed by

c_k(y) = \langle K(\cdot, y), \phi_k \rangle_{L^2} = (L\phi_k)(y) = \lambda_k\, \phi_k(y).

To see that $K(x, y) = \sum_k \lambda_k \phi_k(x)\phi_k(y)$, we can plug it into (4) to verify it:

\int_X \Big( \sum_k \lambda_k\, \phi_k(x)\, \phi_k(y) \Big) \phi_l(y)\, d\mu(y) = \lambda_l\, \phi_l(x).

Since the eigenfunctions of $L$ form an orthonormal basis of $L^2(X)$, any $f \in L^2(X)$ can be written as $f = \sum_k a_k \phi_k$. So we have

\|f\|_{\mathcal{H}_K}^2 = \Big\langle \sum_k a_k \phi_k, \sum_l a_l \phi_l \Big\rangle_{\mathcal{H}_K} = \sum_k \frac{a_k^2}{\lambda_k},

while for the $L^2$ norm, we have

\|f\|_{L^2}^2 = \sum_k a_k^2.

Next we show that the basis functions $\phi_k$ lie in $\mathcal{H}_K$. Note that

\phi_k(x) = \frac{1}{\lambda_k}\, (L\phi_k)(x) = \frac{1}{\lambda_k} \int_X K(x, y)\, \phi_k(y)\, d\mu(y),

which implies

\langle \phi_k, \phi_k \rangle_{\mathcal{H}_K} = \frac{1}{\lambda_k^2} \int_X \int_X K(y, z)\, \phi_k(y)\, \phi_k(z)\, d\mu(y)\, d\mu(z) = \frac{1}{\lambda_k^2} \langle L\phi_k, \phi_k \rangle_{L^2}.

So we can get

\|\phi_k\|_{\mathcal{H}_K}^2 = \frac{1}{\lambda_k^2}\, \lambda_k = \frac{1}{\lambda_k} < \infty.

Therefore, we get $\phi_k \in \mathcal{H}_K$.

We now investigate, for a given $f = \sum_k a_k \phi_k \in L^2(X)$, when we have $f \in \mathcal{H}_K$. For $f$ to be in $\mathcal{H}_K$, we need $\|f\|_{\mathcal{H}_K} < \infty$. So

\|f\|_{\mathcal{H}_K}^2 = \sum_k \frac{a_k^2}{\lambda_k} < \infty.

This means that for $f \in \mathcal{H}_K$, we need $\sum_k a_k^2 / \lambda_k < \infty$ [7].

Combining these analyses, we obtain the following relation between $\mathcal{H}_K$ and $L^2(X)$:

\mathcal{H}_K = \Big\{ f = \sum_k a_k \phi_k \in L^2(X) \;:\; \sum_k \frac{a_k^2}{\lambda_k} < \infty \Big\}.

Based on this relation, we can give another proof of the representer theorem.

Proof

Suppose $\{\phi_k\}$ are the eigenfunctions of the operator $L$. Then we can write the solution as $f = \sum_k a_k \phi_k$. For $f$ to lie in $\mathcal{H}_K$, we require $\sum_k a_k^2/\lambda_k < \infty$.

We consider here a more general form of Problem (2):

\min_{f \in \mathcal{H}_K} \; E\big(f(x_1), \dots, f(x_n)\big) + \lambda\, \|f\|_{\mathcal{H}_K}^2,

where $E$ is an error function that is differentiable with respect to each $f(x_i)$. We use the tools of the $L^2$ space to obtain the solution.

The cost function of the regularization problem is

C(f) = E\big(f(x_1), \dots, f(x_n)\big) + \lambda\, \|f\|_{\mathcal{H}_K}^2.

Substituting $f = \sum_k a_k \phi_k$ into the cost function, we have

C(f) = E\Big( \sum_k a_k \phi_k(x_1), \dots, \sum_k a_k \phi_k(x_n) \Big) + \lambda \sum_k \frac{a_k^2}{\lambda_k}.

Since

\frac{\partial f(x_i)}{\partial a_k} = \phi_k(x_i),

differentiating $C(f)$ with respect to each $a_k$ and setting the result equal to zero gives

\sum_{i=1}^{n} \frac{\partial E}{\partial f(x_i)}\, \phi_k(x_i) + \frac{2\lambda\, a_k}{\lambda_k} = 0.

Solving for $a_k$, we get

a_k = -\frac{\lambda_k}{2\lambda} \sum_{i=1}^{n} \frac{\partial E}{\partial f(x_i)}\, \phi_k(x_i).

Since $f = \sum_k a_k \phi_k$ and $K(x, x_i) = \sum_k \lambda_k \phi_k(x)\phi_k(x_i)$, we have

f(x) = \sum_k a_k\, \phi_k(x) = \sum_{i=1}^{n} \Big( -\frac{1}{2\lambda} \frac{\partial E}{\partial f(x_i)} \Big) \sum_k \lambda_k\, \phi_k(x)\, \phi_k(x_i) = \sum_{i=1}^{n} \alpha_i\, K(x, x_i), \qquad \alpha_i = -\frac{1}{2\lambda} \frac{\partial E}{\partial f(x_i)}.

This proves the representer theorem.

Note that this argument not only proves the representer theorem but also gives the expression of the coefficients: $\alpha_i = -\frac{1}{2\lambda} \frac{\partial E}{\partial f(x_i)}$.
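As a quick sanity check of this expression, the following sketch assumes the squared loss $E = \sum_i (f(x_i) - y_i)^2$, for which $\partial E / \partial f(x_i) = 2\,(f(x_i) - y_i)$, and verifies numerically that the kernel ridge coefficients satisfy $\alpha_i = -\frac{1}{2\lambda}\, \partial E / \partial f(x_i)$; the kernel and the toy data are illustrative choices.

# Numerical check of the derived coefficient expression, assuming the squared loss:
# alpha_i should equal -(1/(2*lam)) * dE/df(x_i) = (y_i - f(x_i)) / lam.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, (15, 2))
y = np.sin(x[:, 0]) + np.cos(x[:, 1])
lam = 0.1

# Laplacian-type kernel on the training points (an illustrative choice).
dists = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
K = np.exp(-dists)

alpha = np.linalg.solve(K + lam * np.eye(len(x)), y)   # solve the regularized problem
f_at_train = K @ alpha                                  # f(x_i) = sum_j alpha_j K(x_i, x_j)
dE_df = 2.0 * (f_at_train - y)                          # derivative of the squared loss

print(np.allclose(alpha, -dE_df / (2.0 * lam)))         # True: matches the derived formula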

With the operator $L$, we can also build a relation between Problem (1) and Problem (2). Take the operator in Problem (1) to be the inverse of the Hilbert-Schmidt integral operator; a discussion of the inverse of the Hilbert-Schmidt integral operator can be found in [8]. Note that for the delta function, we have

(L\delta_{x_i})(x) = \int_X K(x, y)\, \delta(y - x_i)\, d\mu(y) = K(x, x_i).

Then, by the derivation in the motivating example, the solution of Problem (1) satisfies $L^{-1}f = -\frac{1}{2}\sum_{i=1}^{n}\eta_i\,\delta_{x_i}$, so we have $L^{-1}f = \sum_{i=1}^{n}\alpha_i\,\delta_{x_i}$ with $\alpha_i = -\eta_i/2$. Applying $L$ on both sides gives

f = \sum_{i=1}^{n} \alpha_i\, L\delta_{x_i},

by which we obtain

f(x) = \sum_{i=1}^{n} \alpha_i\, K(x, x_i).
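To make the operator $L$ and the norm relation above more tangible, the following discrete sketch takes $\mu$ to be the empirical measure on a set of sample points (an assumption made for illustration), so that $L$ becomes the scaled Gram matrix $\frac{1}{n}K$; its eigenvectors and eigenvalues play the role of $\phi_k$ and $\lambda_k$, and the two norms of a function $f = \sum_k a_k \phi_k$ are compared.

# Discrete sketch of the Hilbert-Schmidt operator (3) under the empirical measure and of
# the relation between the L2 and H_K norms derived above.
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.uniform(-1, 1, n)

K_gram = np.exp(-np.abs(x[:, None] - x[None, :]))     # Laplacian kernel Gram matrix (assumed kernel)
L_mat = K_gram / n                                    # discretization of L under the empirical measure
lam_k, phi = np.linalg.eigh(L_mat)                    # eigenvalues lambda_k, orthonormal eigenvectors

idx = np.argsort(lam_k)[::-1][:20]                    # keep the 20 largest eigenvalues (positive for a PD kernel)
a = rng.standard_normal(20)                           # coefficients a_k of a test function f = sum_k a_k phi_k
f_vals = phi[:, idx] @ a                              # values of f at the sample points

l2_norm_sq = np.sum(a ** 2)                           # discrete analogue of ||f||_{L2}^2  = sum_k a_k^2
hk_norm_sq = np.sum(a ** 2 / lam_k[idx])              # discrete analogue of ||f||_{H_K}^2 = sum_k a_k^2 / lambda_k
print(np.allclose(np.sum(f_vals ** 2), l2_norm_sq))   # True: the eigenvectors are orthonormal
print(l2_norm_sq, hk_norm_sq)                         # the H_K norm penalizes components with small lambda_k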

Data-Dependent Regularization

So far, we have introduced the standard representer theorem. As discussed at the very beginning, however, many natural datasets exhibit manifold structure. Based on the classical Problem (2), we therefore introduce a new learning problem that exploits the manifold structure of the data; we call it the data-dependent regularization problem. Regularization has a long history going back to Tikhonov [9], who proposed Tikhonov regularization to solve ill-posed inverse problems.

To exploit the manifold structure of the data, we divide a function into two parts: the restriction of the function to the manifold and the restriction of the function to the complement of the manifold. The problem can then be formulated as

\min_{f \in \mathcal{H}_K} \; \sum_{i=1}^{n} \big( f(x_i) - y_i \big)^2 + \lambda_1 \|f_{\mathcal{M}}\|_{\mathcal{M}}^2 + \lambda_2 \|f_{\mathcal{M}^c}\|_{\mathcal{M}^c}^2, \qquad (5)

where $f_{\mathcal{M}} = f|_{\mathcal{M}}$ and $f_{\mathcal{M}^c} = f|_{\mathcal{M}^c}$. The norms $\|\cdot\|_{\mathcal{M}}$ and $\|\cdot\|_{\mathcal{M}^c}$ will be explained in detail later. $\lambda_1$ and $\lambda_2$ are two parameters that control how strongly the energy of the function on the manifold and off the manifold is penalized. We will show later that, by controlling the two balancing parameters (setting $\lambda_1 = \lambda_2$), the standard representer theorem becomes a special case of Problem (5).
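Although the norms in (5) are only made precise later in this section, the following hedged sketch shows one way such a two-term problem can be solved numerically with squared loss, assuming the two penalties correspond to two positive definite kernels $K_1$ and $K_2$ (stand-ins for the restricted kernels introduced below). Writing $f = f_1 + f_2$ and expanding both parts at the training points, the optimality conditions lead to a single linear system involving the weighted kernel $K_1/\lambda_1 + K_2/\lambda_2$.

# Hedged sketch of a two-term regularization problem of the form (5) with squared loss.
# With f = f1 + f2 expanded at the training points, the optimality conditions give
# coefficients r solving (G1/lam1 + G2/lam2 + I) r = y, and
# f(x) = sum_j [K1(x, x_j)/lam1 + K2(x, x_j)/lam2] r_j.
import numpy as np

def laplacian(a, b, sigma=1.0):
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return np.exp(-d / sigma)

def gaussian(a, b, sigma=1.0):
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, (30, 2))
y = np.sin(np.pi * X[:, 0]) * np.cos(np.pi * X[:, 1])

lam1, lam2 = 0.1, 10.0                      # lam2 >> lam1 penalizes the "off-manifold" part more
G1, G2 = laplacian(X, X), gaussian(X, X)    # illustrative stand-ins for the two restricted kernels
r = np.linalg.solve(G1 / lam1 + G2 / lam2 + np.eye(len(X)), y)

def predict(X_new):
    return (laplacian(X_new, X) / lam1 + gaussian(X_new, X) / lam2) @ r

print(predict(X[:5]))                       # fitted values at the first five training points

Note that when $\lambda_1 = \lambda_2 = \lambda$ and $K_1 + K_2 = K$, the system reduces to $(K + \lambda I)\alpha = y$, which is consistent with the reduction to Problem (2) discussed at the end of this section.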

We now discuss the functions $f_{\mathcal{M}}$ and $f_{\mathcal{M}^c}$. Consider the ambient space $X$ (or $\mathbb{R}^d$) and a positive definite kernel $K$. Let us first look at the restriction of $K$ to the manifold $\mathcal{M}$. The restriction is again a positive definite kernel [2] and therefore has a corresponding Hilbert space. We use the relation between the RKHS $\mathcal{H}_K$ and the restricted RKHS to explain the norms $\|\cdot\|_{\mathcal{M}}$ and $\|\cdot\|_{\mathcal{M}^c}$.

Lemma 1

([10]). Suppose $K : X \times X \to \mathbb{R}$ (or $\mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$) is a positive definite kernel. Let $\mathcal{M}$ be a subset of $X$ (or $\mathbb{R}^d$), and let $\mathcal{F}(\mathcal{M})$ denote all the functions defined on $\mathcal{M}$. Then the RKHS given by the restricted kernel $K_{\mathcal{M}} = K|_{\mathcal{M} \times \mathcal{M}}$ is

\mathcal{H}_{K_{\mathcal{M}}} = \big\{ g \in \mathcal{F}(\mathcal{M}) \;:\; g = f|_{\mathcal{M}} \text{ for some } f \in \mathcal{H}_K \big\}, \qquad (6)

with the norm defined as

\|g\|_{\mathcal{H}_{K_{\mathcal{M}}}} = \min \big\{ \|f\|_{\mathcal{H}_K} \;:\; f \in \mathcal{H}_K, \; f|_{\mathcal{M}} = g \big\}.

Proof

Define the set

F_g = \big\{ f \in \mathcal{H}_K \;:\; f|_{\mathcal{M}} = g \big\}.

We first show that the minimum of $\|f\|_{\mathcal{H}_K}$ over $F_g$ is attained for any $g$. Choose a minimizing sequence $\{f_n\} \subset F_g$. The sequence is bounded in the Hilbert space $\mathcal{H}_K$, so by the Banach-Alaoglu theorem [11] we may assume that $\{f_n\}$ converges weakly. Weak convergence implies pointwise convergence by the reproducing property, so the weak limit still restricts to $g$ on $\mathcal{M}$ and attains the minimum.

We further define $\|g\|_{\mathcal{H}_{K_{\mathcal{M}}}} = \min_{f \in F_g} \|f\|_{\mathcal{H}_K}$. We show that the resulting space is a Hilbert space by verifying the parallelogram law. In other words, we are going to show that for all $g_1, g_2 \in \mathcal{H}_{K_{\mathcal{M}}}$,

\|g_1 + g_2\|_{\mathcal{H}_{K_{\mathcal{M}}}}^2 + \|g_1 - g_2\|_{\mathcal{H}_{K_{\mathcal{M}}}}^2 = 2\|g_1\|_{\mathcal{H}_{K_{\mathcal{M}}}}^2 + 2\|g_2\|_{\mathcal{H}_{K_{\mathcal{M}}}}^2.

Since the minimum is attained, for every $g \in \mathcal{H}_{K_{\mathcal{M}}}$ there exists $f \in \mathcal{H}_K$ such that

f|_{\mathcal{M}} = g \quad \text{and} \quad \|f\|_{\mathcal{H}_K} = \|g\|_{\mathcal{H}_{K_{\mathcal{M}}}}.

By the definition of the norm, we can choose $f_1, f_2 \in \mathcal{H}_K$ such that

f_1|_{\mathcal{M}} = g_1, \qquad \|f_1\|_{\mathcal{H}_K} = \|g_1\|_{\mathcal{H}_{K_{\mathcal{M}}}},

and

f_2|_{\mathcal{M}} = g_2, \qquad \|f_2\|_{\mathcal{H}_K} = \|g_2\|_{\mathcal{H}_{K_{\mathcal{M}}}}.

Thus, since $(f_1 + f_2)|_{\mathcal{M}} = g_1 + g_2$ and $(f_1 - f_2)|_{\mathcal{M}} = g_1 - g_2$, we have

\|g_1 + g_2\|_{\mathcal{H}_{K_{\mathcal{M}}}}^2 + \|g_1 - g_2\|_{\mathcal{H}_{K_{\mathcal{M}}}}^2 \le \|f_1 + f_2\|_{\mathcal{H}_K}^2 + \|f_1 - f_2\|_{\mathcal{H}_K}^2 = 2\|f_1\|_{\mathcal{H}_K}^2 + 2\|f_2\|_{\mathcal{H}_K}^2 = 2\|g_1\|_{\mathcal{H}_{K_{\mathcal{M}}}}^2 + 2\|g_2\|_{\mathcal{H}_{K_{\mathcal{M}}}}^2.

For the reverse inequality, we first choose $h_1, h_2 \in \mathcal{H}_K$ such that $h_1|_{\mathcal{M}} = g_1 + g_2$, $\|h_1\|_{\mathcal{H}_K} = \|g_1 + g_2\|_{\mathcal{H}_{K_{\mathcal{M}}}}$ and $h_2|_{\mathcal{M}} = g_1 - g_2$, $\|h_2\|_{\mathcal{H}_K} = \|g_1 - g_2\|_{\mathcal{H}_{K_{\mathcal{M}}}}$. Then, since $\tfrac{1}{2}(h_1 + h_2)|_{\mathcal{M}} = g_1$ and $\tfrac{1}{2}(h_1 - h_2)|_{\mathcal{M}} = g_2$,

2\|g_1\|_{\mathcal{H}_{K_{\mathcal{M}}}}^2 + 2\|g_2\|_{\mathcal{H}_{K_{\mathcal{M}}}}^2 \le \tfrac{1}{2}\|h_1 + h_2\|_{\mathcal{H}_K}^2 + \tfrac{1}{2}\|h_1 - h_2\|_{\mathcal{H}_K}^2 = \|h_1\|_{\mathcal{H}_K}^2 + \|h_2\|_{\mathcal{H}_K}^2 = \|g_1 + g_2\|_{\mathcal{H}_{K_{\mathcal{M}}}}^2 + \|g_1 - g_2\|_{\mathcal{H}_{K_{\mathcal{M}}}}^2.

Therefore, we get

\|g_1 + g_2\|_{\mathcal{H}_{K_{\mathcal{M}}}}^2 + \|g_1 - g_2\|_{\mathcal{H}_{K_{\mathcal{M}}}}^2 = 2\|g_1\|_{\mathcal{H}_{K_{\mathcal{M}}}}^2 + 2\|g_2\|_{\mathcal{H}_{K_{\mathcal{M}}}}^2.

Next, we show (6) by showing that for all $g \in \mathcal{H}_{K_{\mathcal{M}}}$ and $p \in \mathcal{M}$,

g(p) = \langle g, K_{\mathcal{M}}(\cdot, p) \rangle_{\mathcal{H}_{K_{\mathcal{M}}}},

where $K_{\mathcal{M}}(\cdot, p) = K(\cdot, p)|_{\mathcal{M}}$.

Choose $f_g \in \mathcal{H}_K$ such that $f_g|_{\mathcal{M}} = g$ and $\|f_g\|_{\mathcal{H}_K} = \|g\|_{\mathcal{H}_{K_{\mathcal{M}}}}$. This is possible because of the analysis above. In particular, we have

g(p) = f_g(p) = \langle f_g, K(\cdot, p) \rangle_{\mathcal{H}_K}.

Now, for any function $h \in \mathcal{H}_K$ such that $h|_{\mathcal{M}} = 0$, we have

\langle f_g, h \rangle_{\mathcal{H}_K} = 0,

since otherwise $f_g - t h$ for a suitable $t \in \mathbb{R}$ would be an extension of $g$ with smaller norm. Thus, decomposing $K(\cdot, p)$ into the minimal-norm extension of $K_{\mathcal{M}}(\cdot, p)$ plus a function vanishing on $\mathcal{M}$,

\langle g, K_{\mathcal{M}}(\cdot, p) \rangle_{\mathcal{H}_{K_{\mathcal{M}}}} = \langle f_g, K(\cdot, p) \rangle_{\mathcal{H}_K} = g(p).

This completes the proof of the lemma.

With this lemma, the solution of Problem (5) becomes easy to obtain. Since the training points $x_i$ lie on the manifold, the data-fit term only involves $f_{\mathcal{M}}$, and by the representer theorem mentioned above we know that the function satisfying

\min_{f_{\mathcal{M}} \in \mathcal{H}_{K_{\mathcal{M}}}} \; \sum_{i=1}^{n} \big( f_{\mathcal{M}}(x_i) - y_i \big)^2 + \lambda_1 \|f_{\mathcal{M}}\|_{\mathcal{M}}^2

is $f_{\mathcal{M}}(x) = \sum_{i=1}^{n} \alpha_i K_{\mathcal{M}}(x, x_i)$. Since we have

\|f_{\mathcal{M}}\|_{\mathcal{M}} = \min \big\{ \|f\|_{\mathcal{H}_K} \;:\; f \in \mathcal{H}_K, \; f|_{\mathcal{M}} = f_{\mathcal{M}} \big\},
\|f_{\mathcal{M}^c}\|_{\mathcal{M}^c} = \min \big\{ \|f\|_{\mathcal{H}_K} \;:\; f \in \mathcal{H}_K, \; f|_{\mathcal{M}^c} = f_{\mathcal{M}^c} \big\},

we can conclude that the solution of (5) is exactly

f(x) = \sum_{i=1}^{n} \alpha_i\, K(x, x_i),

where the coefficients $\alpha_i$ are controlled by the parameters $\lambda_1$ and $\lambda_2$.

With the norms $\|\cdot\|_{\mathcal{M}}$ and $\|\cdot\|_{\mathcal{M}^c}$ well-defined, we would like to find the relation between $\|\cdot\|_{\mathcal{M}}$, $\|\cdot\|_{\mathcal{M}^c}$ and $\|\cdot\|_{\mathcal{H}_K}$. Before stating the relation, we restate some of the notation to make the statement clearer. Let

\mathcal{M}^c = X \setminus \mathcal{M},

and

K_{\mathcal{M}} = K|_{\mathcal{M} \times \mathcal{M}}, \qquad K_{\mathcal{M}^c} = K|_{\mathcal{M}^c \times \mathcal{M}^c},
\mathcal{H}_{\mathcal{M}} = \mathcal{H}_{K_{\mathcal{M}}}, \qquad \|\cdot\|_{\mathcal{M}} = \|\cdot\|_{\mathcal{H}_{K_{\mathcal{M}}}},
\mathcal{H}_{\mathcal{M}^c} = \mathcal{H}_{K_{\mathcal{M}^c}}, \qquad \|\cdot\|_{\mathcal{M}^c} = \|\cdot\|_{\mathcal{H}_{K_{\mathcal{M}^c}}}.

To find the relation between $\|\cdot\|_{\mathcal{M}}$, $\|\cdot\|_{\mathcal{M}^c}$ and $\|\cdot\|_{\mathcal{H}_K}$, we need to pull the restricted kernels $K_{\mathcal{M}}$ and $K_{\mathcal{M}^c}$ back to the original space. To do so, define

\widetilde{K}_{\mathcal{M}}(x, y) = \begin{cases} K(x, y), & x, y \in \mathcal{M}, \\ 0, & \text{otherwise}, \end{cases}
\qquad
\widetilde{K}_{\mathcal{M}^c}(x, y) = \begin{cases} K(x, y), & x, y \in \mathcal{M}^c, \\ 0, & \text{otherwise}. \end{cases}

Then we have $K = \widetilde{K}_{\mathcal{M}} + \widetilde{K}_{\mathcal{M}^c}$. The corresponding Hilbert spaces for $\widetilde{K}_{\mathcal{M}}$ and $\widetilde{K}_{\mathcal{M}^c}$ are

\mathcal{H}_{\widetilde{K}_{\mathcal{M}}} = \big\{ f : X \to \mathbb{R} \;:\; f|_{\mathcal{M}} \in \mathcal{H}_{K_{\mathcal{M}}}, \; f|_{\mathcal{M}^c} = 0 \big\},
\mathcal{H}_{\widetilde{K}_{\mathcal{M}^c}} = \big\{ f : X \to \mathbb{R} \;:\; f|_{\mathcal{M}^c} \in \mathcal{H}_{K_{\mathcal{M}^c}}, \; f|_{\mathcal{M}} = 0 \big\}.

It is straightforward to define

\|f\|_{\mathcal{H}_{\widetilde{K}_{\mathcal{M}}}} = \|f|_{\mathcal{M}}\|_{\mathcal{M}}, \qquad \|f\|_{\mathcal{H}_{\widetilde{K}_{\mathcal{M}^c}}} = \|f|_{\mathcal{M}^c}\|_{\mathcal{M}^c}.

The following lemma shows the relation between $\mathcal{H}_{\widetilde{K}_{\mathcal{M}}}$, $\mathcal{H}_{\widetilde{K}_{\mathcal{M}^c}}$ and $\mathcal{H}_K$, which also reveals the relation between $\|\cdot\|_{\mathcal{M}}$, $\|\cdot\|_{\mathcal{M}^c}$ and $\|\cdot\|_{\mathcal{H}_K}$ by the Moore-Aronszajn theorem [12].

Lemma 2

Suppose $K_1, K_2 : X \times X \to \mathbb{R}$ (or $\mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$) are two positive definite kernels. If $K = K_1 + K_2$, then

\mathcal{H}_K = \big\{ f = f_1 + f_2 \;:\; f_1 \in \mathcal{H}_{K_1}, \; f_2 \in \mathcal{H}_{K_2} \big\}

is a Hilbert space with the norm defined by

\|f\|_{\mathcal{H}_K}^2 = \min \big\{ \|f_1\|_{\mathcal{H}_{K_1}}^2 + \|f_2\|_{\mathcal{H}_{K_2}}^2 \;:\; f = f_1 + f_2, \; f_1 \in \mathcal{H}_{K_1}, \; f_2 \in \mathcal{H}_{K_2} \big\}.

The idea of the proof of this lemma is exactly the same as the one for Lemma 1. Thus we omit it here.

A direct corollary of this lemma is:

Corollary 1

Under the assumptions of Lemma 2, if $\mathcal{H}_{K_1}$ and $\mathcal{H}_{K_2}$ have no function in common except the zero function, then the norm of $f = f_1 + f_2 \in \mathcal{H}_K$ is given simply by

\|f\|_{\mathcal{H}_K}^2 = \|f_1\|_{\mathcal{H}_{K_1}}^2 + \|f_2\|_{\mathcal{H}_{K_2}}^2.

Going back to our scenario, Corollary 1 gives the following result:

\|f\|_{\mathcal{H}_K}^2 = \|f_{\mathcal{M}}\|_{\mathcal{M}}^2 + \|f_{\mathcal{M}^c}\|_{\mathcal{M}^c}^2.

This means that if we set $\lambda_1 = \lambda_2 = \lambda$ in Problem (5), it reduces to Problem (2). Therefore, the standard representer theorem is a special case of our data-dependent regularization problem (5).

Applications

As stated in the introduction, many engineering problems can be viewed as learning multidimensional functions from incomplete data. In this section, we show two applications of function learning: image interpolation and patch-based image denoising.

Image Interpolation

Image interpolation tries to approximate the color and intensity of a pixel from the values at surrounding pixels; see Fig. 1 for an illustration. From the function learning perspective, image interpolation learns a function from the known pixel positions to their values.

Fig. 1.

Illustration of image interpolation. We want to enlarge the original image to a larger grid, so the blue shaded positions are unknown. Using image interpolation, we can find the values at these positions. (Color figure online)

We use the Lena image shown in Fig. 2(a) to give an example of image interpolation using the proposed framework; the zoomed image is shown in Fig. 2(d). In this example, the two balancing parameters are set to be equal and the Laplacian kernel [13] is used:

K(x, y) = \exp\Big( -\frac{\|x - y\|}{\sigma} \Big).

Note that other kernels, for example the polynomial kernel or the Gaussian kernel, could also be used for image interpolation. Choosing the right kernel is an interesting problem in itself, but we do not have space to compare different kernels in this paper.
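The following sketch illustrates how such an interpolation experiment can be set up as function learning; the image content, the subsampling pattern, the kernel width, and the regularization parameter are illustrative assumptions rather than the settings used in the paper.

# Image interpolation as function learning: known pixels and their (row, col) coordinates
# are training data, a Laplacian kernel regressor is fitted, and missing positions are
# filled in by evaluating the learned function.
import numpy as np

def laplacian_kernel(A, B, sigma=2.0):
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return np.exp(-d / sigma)

rng = np.random.default_rng(4)
img = rng.uniform(0, 1, (32, 32))            # stand-in for the original grayscale image

rows, cols = np.meshgrid(np.arange(32), np.arange(32), indexing="ij")
coords = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)
values = img.ravel()

known = (rows % 3 == 0) & (cols % 3 == 0)    # keep every third pixel (downsampling by 3)
Xk, yk = coords[known.ravel()], values[known.ravel()]

lam = 1e-3
alpha = np.linalg.solve(laplacian_kernel(Xk, Xk) + lam * np.eye(len(Xk)), yk)
interpolated = (laplacian_kernel(coords, Xk) @ alpha).reshape(32, 32)
print(np.abs(interpolated - img)[known].max())   # small error at the known pixels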

Fig. 2.

Illustration of image interpolation. The original image is downsampled by a factor of 3 in each direction. We use the proposed function learning framework to obtain the interpolation function from the downsampled image. From the results, we can see that the proposed framework works for image interpolation.

In Fig. 2(b), the original image is downsampled by a factor of 3 in each direction. The zoomed image is shown in Fig. 2(e). The interpolation result and its zoomed version are shown in Fig. 2(c) and Fig. 2(f).

Patch-Based Image Denoising

From the function learning point of view, patch-based image denoising can be viewed as learning a function from noisy patches to their "noise-free" center pixels. See Fig. 3 for an illustration.

Fig. 3.

Illustration of patch-based image denoising. It can be viewed as learning a function from the noisy patches to the clean center pixels.

In the patch-based image denoising application, we also use the Laplacian kernel. We assume that the noisy patches lie close to some manifold, so we set the balancing parameter that controls the energy off the manifold to be large. We use the images in Fig. 4 as known data to learn the function; for a given noisy image, the learned function is then used for denoising. To speed up the learning process, we randomly choose only 10% of the known data to learn the function.
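A rough sketch of this setup is given below. The patch size, noise level, kernel width, and the use of the single-kernel Problem (2) in place of the full data-dependent Problem (5) are simplifying assumptions for illustration; only the random 10% subsampling mirrors the procedure described above.

# Patch-based denoising as function learning: map each noisy patch to an estimate of its
# clean center pixel using kernel regression on a 10% subsample of training patches.
import numpy as np

def laplacian_kernel(A, B, sigma=5.0):
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return np.exp(-d / sigma)

def extract_patches(img, p=5):
    # return all p x p patches (as rows) and their center pixels
    H, W = img.shape
    patches, centers = [], []
    for i in range(H - p + 1):
        for j in range(W - p + 1):
            patches.append(img[i:i + p, j:j + p].ravel())
            centers.append(img[i + p // 2, j + p // 2])
    return np.array(patches), np.array(centers)

rng = np.random.default_rng(5)
clean = rng.uniform(0, 1, (40, 40))                    # stand-in for a training image
noisy = clean + 0.05 * rng.standard_normal(clean.shape)

noisy_patches, _ = extract_patches(noisy)
_, clean_centers = extract_patches(clean)

keep = rng.random(len(noisy_patches)) < 0.10           # random 10% of the known data
Xtr, ytr = noisy_patches[keep], clean_centers[keep]

lam = 1e-2
alpha = np.linalg.solve(laplacian_kernel(Xtr, Xtr) + lam * np.eye(len(Xtr)), ytr)

def denoise_patch(patch):                              # estimate of the clean center pixel
    return laplacian_kernel(patch.reshape(1, -1), Xtr) @ alpha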

Fig. 4.

Four training images. We use noisy images and clean pixels to learn the denoising function.

We use the image Baboon to test the learned denoising function. The denoising results are shown in Fig. 5. Each column shows the result corresponding to one noise level.

Fig. 5.

Illustration of the denoising results.

Conclusion and Future Work

In this paper, we introduced a framework for learning functions from partial data. We posed a data-dependent regularization problem that lets us learn a function using the manifold structure of the data, and we used two applications to illustrate the framework. These two applications, however, cover only part of the framework: they are special cases of the data-dependent regularization problem. For general applications, we need to calculate the norms $\|f_{\mathcal{M}}\|_{\mathcal{M}}$ and $\|f_{\mathcal{M}^c}\|_{\mathcal{M}^c}$, which is hard to do since we only have partial data. We therefore need to approximate these norms from incomplete data and to propose a new learning algorithm so that the framework can be used in general applications; this is part of our future work. Another line of future work is theoretical. We showed that the solution of the data-dependent regularization problem is a linear combination of kernel functions, so it can be viewed as a function approximation result, and one can then consider the error analysis of the approximated function.

Contributor Information

Valeria V. Krzhizhanovskaya, Email: V.Krzhizhanovskaya@uva.nl

Gábor Závodszky, Email: G.Zavodszky@uva.nl.

Michael H. Lees, Email: m.h.lees@uva.nl

Jack J. Dongarra, Email: dongarra@icl.utk.edu

Peter M. A. Sloot, Email: p.m.a.sloot@uva.nl

Sérgio Brissos, Email: sergio.brissos@intellegibilis.com.

João Teixeira, Email: joao.teixeira@intellegibilis.com.

Qing Zou, Email: zou-qing@uiowa.edu.

References

  • 1. Schölkopf B, Smola AJ, Bach F. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. Cambridge: MIT Press; 2002.
  • 2. Aronszajn N. Theory of reproducing kernels. Trans. Am. Math. Soc. 1950;68(3):337–404. doi: 10.1090/S0002-9947-1950-0051437-7.
  • 3. Schölkopf B, Herbrich R, Smola AJ. A generalized representer theorem. In: Helmbold D, Williamson B, editors. Computational Learning Theory. Heidelberg: Springer; 2001. pp. 416–426.
  • 4. Argyriou A, Micchelli CA, Pontil M. When is there a representer theorem? Vector versus matrix regularizers. J. Mach. Learn. Res. 2009;10:2507–2529.
  • 5. Gohberg I, Goldberg S, Kaashoek MA. Hilbert-Schmidt operators. In: Classes of Linear Operators, vol. I. Basel: Birkhäuser; 1990. pp. 138–147.
  • 6. Helmberg G. Introduction to Spectral Theory in Hilbert Space. New York: Courier Dover Publications; 2008.
  • 7. Belkin M, Niyogi P, Sindhwani V. Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. J. Mach. Learn. Res. 2006;7:2399–2434.
  • 8. Pipkin AC. A Course on Integral Equations. New York: Springer; 1991.
  • 9. Tikhonov AN. Regularization of incorrectly posed problems. Soviet Math. Doklady. 1963;4(6):1624–1627.
  • 10. Saitoh S, Sawano Y. Theory of Reproducing Kernels and Applications. Singapore: Springer; 2016.
  • 11. Rudin W. Functional Analysis. Boston, MA: McGraw-Hill; 1991.
  • 12. Awan DA, Cavalcante RLG, Yukawa M, Stanczak S. Adaptive learning for symbol detection: a reproducing kernel Hilbert space approach. In: Machine Learning for Future Wireless Communications. 2020. pp. 197–211.
  • 13. Kernel Functions for Machine Learning Applications. http://crsouza.com/2010/03/17/kernel-functions-for-machine-learning-applications/
