Sensors (Basel, Switzerland). 2021 Feb 19;21(4):1460. doi: 10.3390/s21041460

Sequential Sampling and Estimation of Approximately Bandlimited Graph Signals

Sijie Lin 1, Ke Xu 2, Hui Feng 1,*, Bo Hu 1
Editor: Vladimir Stankovic
PMCID: PMC7922557  PMID: 33669801

Abstract

Graph signal sampling has been widely studied in recent years, but the accurate signal models required by most existing sampling methods are usually unavailable before any observations are made in a practical environment. In this paper, a sequential sampling and estimation algorithm is proposed for approximately bandlimited graph signals, in the absence of prior knowledge concerning signal properties. We approach the problem from a Bayesian perspective in which we formulate the signal prior by a multivariate Gaussian distribution with unknown hyperparameters. To overcome the interconnected problems associated with the parameter estimation, in the proposed algorithm, hyperparameter estimation and sample selection are performed in an alternating way. At each step, the unknown hyperparameters are updated by an expectation maximization procedure based on historical observations, and then the next node in the sampling operation is chosen by uncertainty sampling with the latest hyperparameters. We prove that under some specific conditions, signal estimation in the proposed algorithm is consistent. Subsequent validation of the approach through simulations shows that the proposed procedure yields performance significantly better than existing state-of-the-art approaches, along with robustness across a broad range of signal properties.

Keywords: graph signal, sequential sampling, consistent estimation

1. Introduction

A graph signal is a powerful tool to represent and analyze data with irregular interconnections by defining signal values on graph vertices and assigning edge weights according to correlation or similarity [1]. In the past decade, graph signal processing theory has developed rapidly, extending classical signal processing techniques such as Fourier analysis [1,2], filtering [3,4,5], and sampling and interpolation [6,7,8] to the graph signal setting. Related concepts and methods have found wide application in sensor networks [9,10], brain analysis [11], image processing [12], three-dimensional (3D) point cloud processing [13], and machine learning [14,15,16].

Sampling for (lossless) reconstruction or (minimum error) estimation is a fundamental problem in graph signal processing. For example, a sample survey in a social network needs to carefully select interviewees so as to better predict the attitudes of all users. In a sensor network, it is important to optimize sensor placement so that more information can be collected by a restricted number of sensors under economic constraints [9]. In addition, any graph-based active semisupervised learning task may also be interpreted as a graph signal sampling and estimation problem [14,15].

There have been multiple works on graph signal sampling. The authors of [6,7] extend the Nyquist sampling theory to bandlimited graph signals. More works seek optimal sampling sets for graph signal estimation, where the graph signals are usually assumed to be bandlimited [17,18,19] or approximately bandlimited [17,20] in the frequency domain, or smooth in the vertex domain [17,21]. Sampling strategies mainly include topology-based methods that compute a score for each vertex [18,21], and design-of-experiment (DOE) approaches that optimize a certain scalarization of some target matrix [19,22]. Most of the current sampling methods rely on an accurate bandwidth [6,18,19] or prior distribution [22,23] of the graph signal.

It is natural in many applications to assume that the graph signal is somehow bandlimited or smooth. For instance, people with strong social connections are likely to hold similar opinions, and temperature sensors located close to each other usually get similar measurements. However, in practice it is hard to know the exact bandwidth or prior distribution of the graph signal before any observation. In such cases, those sampling strategies based on an accurate signal model are no longer applicable.

A possible solution is to sample and estimate in a sequential way. Unknown signal properties can be estimated from previous observations, and the subsequent sampling node can be selected under the latest signal model. Such a sequential framework can overcome the difficulty of incomplete system model and take advantage of model-based sample selection and signal estimation [20].

In this work, we investigate the sequential sampling and estimation of approximately bandlimited (ABL) graph signals, whose bandwidth and energy level are both not known in advance. With the aid of a Butterworth low-pass filter, the ABL graph signal is assumed to follow a multivariate Gaussian distribution with unknown hyperparameters. The variance of observation noise is also considered unknown.

In the proposed algorithm, hyperparameter estimation and sample selection are performed in an alternating way. At each step, expectation maximization (EM) [24] is first used to update the maximum marginal likelihood (MML) estimation of unknown hyperparameters based on historical observations. Substituting these latest estimated values into the signal prior and noise distribution, a Bayesian estimation and prediction procedure can then be performed. According to the uncertainty sampling (US) criterion [25], the node with the largest predictive variance is selected as the next node to sample.

In fact, estimating prior distribution from data, namely, empirical Bayes (EB) method, has long been studied [26]. It has been proved that under certain assumptions, the EB posterior distribution of parameter is consistent at its true value [27]. Meanwhile, an alternating approach analogous to the proposed one has been used in sequential DOE for nonlinear least squares (LS) regression. There, parameter estimation is proved to be consistent, and the design is asymptotically optimal [28].

In this paper, we prove that under specific conditions, signal estimation in the proposed algorithm is consistent at the true value. That is, as the number of samples goes to infinity, the estimated graph signal can get arbitrarily close to the true value. The finite-time performance of the proposed algorithm is validated by simulation results. Our algorithm is able to adjust to ABL graph signals with different properties, and it performs better in estimation error than existing methods.

The rest of the paper is organized as follows. We detail the signal prior and observation model in Section 2, and develop a sequential sampling and estimation algorithm in Section 3. The consistency of the proposed algorithm is proved under specific conditions in Section 4, and its finite-time performance is validated by experiments in Section 5. We conclude the paper in Section 6.

Throughout the paper, we use normal-weight italic lowercase letters (e.g., x) to denote scalar variables, bold italic lowercase letters (e.g., x) to denote vectors, bold italic uppercase letters (e.g., X) to denote matrices, and calligraphic uppercase letters (e.g., X) to denote sets.

2. System Model

2.1. Preliminaries

We consider a simple connected weighted undirected graph G = (V, E, W), where V is the set of vertices indexed by {1, …, N}; E is the set of edges between vertex pairs, e.g., (i, j) ∈ E denotes an edge between vertex i and vertex j; and W is the weighted adjacency matrix, with w_ij the weight of the edge between vertex i and vertex j if they are connected, or 0 otherwise. A graph signal that takes a real value on each vertex of the graph can be represented as f ∈ R^N, where f_i is the signal value on vertex i.

The Laplacian matrix is defined as L = D − W, where D = diag(d_1, …, d_N) with d_i = Σ_{j:(i,j)∈E} w_ij is the degree matrix. Being symmetric and positive semi-definite, L has a spectral decomposition L = VΛV^T, where Λ is a diagonal matrix of the eigenvalues 0 = λ_1 < λ_2 ≤ ⋯ ≤ λ_N, and V contains the corresponding orthonormal eigenvectors {v_k}_{k=1}^N as columns. These eigenvectors act as the graph Fourier basis, and the associated eigenvalues are regarded as graph frequencies [1]. The frequency-domain coefficients f̂ of a graph signal f can be calculated via the graph Fourier transform (GFT) as f̂ = V^T f. The graph signal f can be expressed as f = V f̂, which is known as the inverse GFT (IGFT).
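The definitions above can be sketched numerically. The following minimal example builds the Laplacian of a toy path graph (an illustrative choice, not from the paper), takes its spectral decomposition, and verifies that the IGFT inverts the GFT:

```python
import numpy as np

# A minimal GFT/IGFT sketch on a toy 3-vertex path graph;
# the adjacency matrix W and signal f are illustrative.
W = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
D = np.diag(W.sum(axis=1))          # degree matrix
L = D - W                           # combinatorial Laplacian

# Spectral decomposition L = V Lambda V^T (eigenvalues ascending).
lam, V = np.linalg.eigh(L)

f = np.array([1.0, 2.0, 3.0])       # a graph signal
f_hat = V.T @ f                     # GFT
f_rec = V @ f_hat                   # inverse GFT recovers f

print(np.allclose(f_rec, f))        # True
print(np.isclose(lam[0], 0.0))      # True: smallest eigenvalue is 0
```

Since the graph is connected, the smallest graph frequency λ_1 is exactly zero, matching the ordering 0 = λ_1 < λ_2 ≤ ⋯ ≤ λ_N above.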

2.2. Signal Prior

In this work, the graph signal f is assumed to be drawn from the following distribution:

p(f; α, γ) = N(f | 0, (αI_N + γL^{2n})^{−1}), (1)

where I_N denotes the N×N identity matrix, and n, α, γ > 0 are hyperparameters, among which α, γ are assumed unknown a priori. The meanings of the signal model and hyperparameters are explained in the frequency domain as follows.

By GFT, the spectral coefficients {f^k}k=1N follow independent zero-mean Gaussian distributions with variances conforming to a Butterworth low-pass filter ([29], Section 7.3):

p(f̂_k) = N(f̂_k | 0, g²/(1 + (λ_k/λ_c)^{2n})), (2)

where n is the order, λ_c = (α/γ)^{1/(2n)} is the cut-off frequency, and g = 1/√α is the amplitude gain of the filter. Under such a prior, we say the graph signal is approximately bandlimited in a probabilistic sense, where λ_c plays the role of bandwidth, n controls the strictness of bandlimitedness, and g determines the energy level of the graph signal.
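As an illustration of this prior, the sketch below draws one ABL graph signal by sampling independent GFT coefficients with the Butterworth variance profile of Equation (2), and checks that this matches the vertex-domain covariance of Equation (1). The ring graph and hyperparameter values are arbitrary demonstration choices:

```python
import numpy as np

# Illustrative sketch: sample an ABL graph signal from the prior in
# Equation (1) via its frequency-domain form, Equation (2).
rng = np.random.default_rng(0)

N = 30
W = np.zeros((N, N))                      # toy ring graph
for i in range(N):
    W[i, (i + 1) % N] = W[(i + 1) % N, i] = 1.0
L = np.diag(W.sum(axis=1)) - W
lam, V = np.linalg.eigh(L)

n, lam_c, g = 2, 1.0, 1.0                 # order, cut-off, gain (arbitrary)
alpha = 1.0 / g**2                        # g = 1/sqrt(alpha)
gamma = alpha / lam_c**(2 * n)            # lam_c = (alpha/gamma)^{1/(2n)}

var = g**2 / (1.0 + (lam / lam_c)**(2 * n))   # Butterworth variances, Eq. (2)
f_hat = rng.normal(0.0, np.sqrt(var))         # independent Gaussian coefficients
f = V @ f_hat                                 # vertex-domain ABL signal

# Same covariance expressed in the vertex domain, Equation (1):
C_prior = np.linalg.inv(alpha * np.eye(N)
                        + gamma * np.linalg.matrix_power(L, 2 * n))
print(np.allclose(V @ np.diag(var) @ V.T, C_prior))   # True
```

The final check confirms that the diagonal frequency-domain covariance and the vertex-domain covariance (αI_N + γL^{2n})^{−1} are the same object in two bases.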

In particular, when n = 1/2 and δ = α/γ is small, the signal prior Equation (1) becomes

p(f) ∝ exp(−(γ/2) f^T (L + δI_N) f), (3)

which is widely used in graph signal sampling and estimation [9,23]. The restriction on the total variation f^T L f = Σ_{(i,j)∈E} w_ij (f_i − f_j)² can lead to smooth signals over the graph.
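The total-variation identity above is easy to verify numerically. The tiny check below (with an arbitrary toy graph and signal) confirms that the Laplacian quadratic form equals the weighted sum of squared edge differences:

```python
import numpy as np

# Check f^T L f = sum_{(i,j) in E} w_ij (f_i - f_j)^2 on a toy graph;
# W and f are arbitrary illustrative values.
W = np.array([[0., 2., 0.],
              [2., 0., 1.],
              [0., 1., 0.]])
L = np.diag(W.sum(axis=1)) - W
f = np.array([1.0, 3.0, 0.0])

quad = f @ L @ f                    # Laplacian quadratic form
edge_sum = sum(W[i, j] * (f[i] - f[j])**2
               for i in range(3) for j in range(i + 1, 3))
print(np.isclose(quad, edge_sum))   # True
```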

If n → ∞, the out-of-band coefficients become deterministic zeros. The graph signal becomes strictly bandlimited, represented by

f = V_K f̂_K, (4)

where K = max{k | λ_k ≤ λ_c}, f̂_K collects the first K GFT coefficients that may be non-zero, and V_K contains the first K GFT basis vectors.

In conclusion, our signal prior Equation (1) is flexible enough to describe approximately bandlimited graph signals with different bandwidths, strictness of bandlimitedness, and energy levels. It can also cover the most commonly used smoothness prior [9,23] and bandlimitedness prior [6,7,18,19]. Additionally, this model is simple, with only three hyperparameters, and can be represented in the vertex domain without eigendecomposition.

Here, hyperparameters α,γ or equivalently λc,g in the signal prior are assumed unknown, that is, we do not need to have prior knowledge about the bandwidth or energy level of the graph signal. Only a value for n is required to control the strictness of bandlimitedness of the estimated signal. Compared to those methods for only strictly bandlimited graph signals with known bandwidth or smooth graph signals with known prior distribution, our signal prior is more general.

2.3. Observation Model

To overcome the difficulty caused by unknown hyperparameters, a sequential sampling process is considered, where one node is sampled at each step, so that unknown hyperparameters can be estimated from previous observations, and the next node to sample can be selected using the latest hyperparameters.

Suppose that vertex s_t is sampled at step t, and the observed signal value is y_t. The observation model can be expressed as

y_t = ψ_t^T f + w_t, (5)

where ψ_t^T = e_{s_t}^T is the sampling vector with its s_t-th element equal to 1 and all others equal to 0, and w_t is additive zero-mean Gaussian observation noise with unknown precision β,

p(w_t; β) = N(w_t | 0, β^{−1}). (6)

In hyperparameter estimation, signal estimation, and sample selection at step t, the historical observations from step 1 to step t are considered together. Let

y_{1:t} = [y_1, …, y_t]^T,  Ψ_{1:t} = [ψ_1, …, ψ_t]^T = [e_{s_1}, …, e_{s_t}]^T,  w_{1:t} = [w_1, …, w_t]^T. (7)

The noise values w1,,wt involved in different samples (on either different vertices or the same vertex) are assumed to be independent and identically distributed (i.i.d.). According to Equations (5) and (6),

y_{1:t} = Ψ_{1:t} f + w_{1:t}, (8)
p(w_{1:t}; β) = N(w_{1:t} | 0, β^{−1}I_t). (9)

2.4. Problem Formulation

Under the signal and observation model described in Section 2.2 and Section 2.3, at each step t in the sequential sampling and estimation process, the two core problems are (1) how to estimate the unknown hyperparameters α, γ, β as well as the graph signal f based on the historical observations Ψ_{1:t}, y_{1:t},

(α_t, γ_t, β_t, f_t) ← (Ψ_{1:t}, y_{1:t}), (10)

where α_t, γ_t, β_t are the estimated values of the hyperparameters at step t, and f_t denotes the estimated graph signal at step t; and (2) how to select the next node s_{t+1} to be sampled at step t+1 with the latest estimated values of the hyperparameters,

s_{t+1} ← (Ψ_{1:t}, y_{1:t}, α_t, γ_t, β_t). (11)

Our ultimate goal is to minimize the signal estimation error within a given budget, or, from another angle, to reach a certain estimation accuracy with the fewest samples.

3. Algorithm

3.1. Hyperparameter and Signal Estimation

We first focus on unknown hyperparameter estimation from observation data. Here, maximum marginal likelihood (MML) estimation of unknown hyperparameters is adopted:

(α_t, γ_t, β_t) = argmax_{α,γ,β} p(y_{1:t} | Ψ_{1:t}; α, γ, β) = argmax_{α,γ,β} ∫_{R^N} p(y_{1:t} | Ψ_{1:t}, f; β) p(f; α, γ) df. (12)

Then, the signal posterior given estimated hyperparameters is Gaussian:

p(f | Ψ_{1:t}, y_{1:t}; α_t, γ_t, β_t) = N(f | μ_t, C_t), (13)

with mean vector and covariance matrix ([30], Section 10.6)

μ_t = β_t C_t Ψ_{1:t}^T y_{1:t}, (14)
C_t = (α_t I_N + γ_t L^{2n} + β_t Ψ_{1:t}^T Ψ_{1:t})^{−1}. (15)

The minimum mean square error (MMSE) or maximum a posteriori (MAP) estimate of the graph signal is f_t = μ_t.
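The posterior computation of Equations (13)–(15) is a few lines of linear algebra. The sketch below evaluates it on a toy complete graph with illustrative hyperparameters and observations (these values are not outputs of the EM procedure, just stand-ins):

```python
import numpy as np

# Sketch of the Gaussian posterior, Equations (13)-(15), on a toy graph.
N, n = 5, 1
W = np.ones((N, N)) - np.eye(N)            # complete graph
L = np.diag(W.sum(axis=1)) - W
alpha, gamma, beta = 1.0, 0.5, 10.0        # illustrative hyperparameters

# Sample vertices 0 and 2 (Psi stacks the sampling vectors e_{s_t}^T).
Psi = np.zeros((2, N))
Psi[0, 0] = Psi[1, 2] = 1.0
y = np.array([1.2, -0.4])                  # illustrative observations

L2n = np.linalg.matrix_power(L, 2 * n)
C = np.linalg.inv(alpha * np.eye(N) + gamma * L2n + beta * Psi.T @ Psi)
mu = beta * C @ Psi.T @ y                  # MMSE/MAP estimate f_t

print(mu.shape, C.shape)                   # (5,) (5, 5)
```

Note that C is symmetric positive definite by construction, since it is the inverse of a sum of a positive definite prior precision and a positive semi-definite data term.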

However, direct optimization of Equation (12) is intractable. We view f as a hidden variable and introduce expectation maximization (EM) ([24], Section 9.4) into our algorithm to estimate α, γ, β, and thereby obtain the posterior distribution of f. For conciseness, the subscripts t and 1:t are omitted in the description of our EM algorithm.

Consider the l-th EM iteration. In the E step, the hyperparameters α, γ, β are fixed at α_{l−1}, γ_{l−1}, β_{l−1}, and the signal posterior p(f | Ψ, y; α_{l−1}, γ_{l−1}, β_{l−1}) with mean μ_{l−1} and covariance C_{l−1} is computed as in Equations (13)–(15).

In the M step, α, γ, β are updated according to

(α_l, γ_l, β_l) = argmax_{α,γ,β} E_{f|Ψ,y;α_{l−1},γ_{l−1},β_{l−1}}[ln p(y, f | Ψ; α, γ, β)] = argmax_{α,γ,β} E_{f|Ψ,y;α_{l−1},γ_{l−1},β_{l−1}}[ln p(y | Ψ, f; β) + ln p(f; α, γ)], (16)

where E_{f|Ψ,y;α_{l−1},γ_{l−1},β_{l−1}}[·] means the expectation is taken with respect to p(f | Ψ, y; α_{l−1}, γ_{l−1}, β_{l−1}) computed in the E step, the signal prior p(f; α, γ) is given in Equation (1), and the signal likelihood is p(y | Ψ, f; β) = N(y | Ψf, β^{−1}I) according to Equations (8) and (9).

For brevity, denote the expectation operator in Equation (16) as E_{l−1}[·], and the resulting expectation value as E_{l−1}, which is also the objective value of the optimization problem. By direct calculation, we have

E_{l−1} = (1/2) ln|αI_N + γL^{2n}| − (α/2) E_{l−1}[f^T f] − (γ/2) E_{l−1}[f^T L^{2n} f] + (M/2) ln β − (β/2) E_{l−1}[(y − Ψf)^T (y − Ψf)] + const, (17)

where M is the sample size; “const” denotes terms that are independent of α,γ,β; and

E_{l−1}[f^T f] = E_{l−1}[(f − μ_{l−1})^T (f − μ_{l−1}) + 2μ_{l−1}^T f − μ_{l−1}^T μ_{l−1}] = tr(C_{l−1}) + μ_{l−1}^T μ_{l−1}, (18)
E_{l−1}[f^T L^{2n} f] = E_{l−1}[(f − μ_{l−1})^T L^{2n} (f − μ_{l−1}) + 2μ_{l−1}^T L^{2n} f − μ_{l−1}^T L^{2n} μ_{l−1}] = tr(L^{2n} C_{l−1}) + μ_{l−1}^T L^{2n} μ_{l−1}, (19)
E_{l−1}[(y − Ψf)^T (y − Ψf)] = E_{l−1}[(y − Ψμ_{l−1})^T (y − Ψμ_{l−1}) − 2(y − Ψμ_{l−1})^T Ψ(f − μ_{l−1}) + (f − μ_{l−1})^T Ψ^T Ψ (f − μ_{l−1})] = (y − Ψμ_{l−1})^T (y − Ψμ_{l−1}) + tr(Ψ^T Ψ C_{l−1}). (20)

Note that E_{l−1} is concave with respect to α, γ, β; see Appendix A. The maximization problem in Equation (16) can thus be solved by any standard tool for convex optimization. Here, the first-order conditions of the optimization problem are analyzed to give some insight into the hyperparameter estimation, and a possible efficient search method based on these conditions is provided in Appendix B.

Take the partial derivatives of E_{l−1} with respect to α, γ, β, and set them to zero:

∂E_{l−1}/∂α = (1/2) tr((αI_N + γL^{2n})^{−1}) − (1/2) E_{l−1}[f^T f] = 0, (21)
∂E_{l−1}/∂γ = (1/2) tr((αI_N + γL^{2n})^{−1} L^{2n}) − (1/2) E_{l−1}[f^T L^{2n} f] = 0, (22)
∂E_{l−1}/∂β = M/(2β) − (1/2) E_{l−1}[(y − Ψf)^T (y − Ψf)] = 0. (23)

Note that tr((αI_N + γL^{2n})^{−1}), tr((αI_N + γL^{2n})^{−1} L^{2n}), and M/β can be seen as the prior expectations of f^T f, f^T L^{2n} f, and w^T w with respect to p(f; α, γ) and p(w; β). The M step actually looks for a group of hyperparameters that makes these prior expectations equal to their posterior ones. In this sense, the estimated hyperparameters agree with the observations.

Repeat the E and M steps as above until convergence to obtain an MML estimate of the unknown hyperparameters α, γ, β, together with a posterior distribution of the graph signal f. This process is summarized in Algorithm 1. Although the estimated hyperparameters are not guaranteed to be globally optimal by MML, we find their performance satisfactory in our experiments.

Algorithm 1 Hyperparameter and signal estimation by EM.
1: Initialize α_0, γ_0, β_0.
2: for l = 1, 2, … do
3:   Compute μ_{l−1}, C_{l−1} as in Equations (14) and (15).
4:   Update α_l, γ_l, β_l according to Equation (16).
5: end for
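A hedged sketch of Algorithm 1 follows. The E step uses Equations (14)–(15) and the expectations (18)–(20); for the M step, the paper's binary-search method (Appendix B) is not reproduced here, and a generic numerical maximization of Equation (17) (Nelder–Mead in a log-parametrization, to keep the variables positive) is used instead. The graph, sample locations, and iteration counts are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

# EM for (alpha, gamma, beta) under Equations (1), (8), (9); a sketch
# with a generic numerical M step standing in for Appendix B.
rng = np.random.default_rng(2)

N, n = 8, 1
W = np.zeros((N, N))                      # toy ring graph
for i in range(N):
    W[i, (i + 1) % N] = W[(i + 1) % N, i] = 1.0
L = np.diag(W.sum(axis=1)) - W
L2n = np.linalg.matrix_power(L, 2 * n)

s = [0, 2, 4, 6, 1, 3]                    # illustrative sampled vertices
M = len(s)
Psi = np.eye(N)[s]
f_true = rng.normal(size=N)
y = Psi @ f_true + 0.1 * rng.normal(size=M)

def posterior(a, g, b):
    C = np.linalg.inv(a * np.eye(N) + g * L2n + b * Psi.T @ Psi)
    return b * C @ Psi.T @ y, C           # Equations (14), (15)

a, g, b = 1.0, 1.0, 1.0                   # initialization
for _ in range(20):                       # EM iterations
    mu, C = posterior(a, g, b)            # E step
    T1 = np.trace(C) + mu @ mu                        # Eq. (18)
    T2 = np.trace(L2n @ C) + mu @ L2n @ mu            # Eq. (19)
    r = y - Psi @ mu
    T3 = r @ r + np.trace(Psi.T @ Psi @ C)            # Eq. (20)

    def neg_E(x):                         # negative of Eq. (17)
        a_, g_, b_ = np.exp(x)
        _, logdet = np.linalg.slogdet(a_ * np.eye(N) + g_ * L2n)
        return -(0.5 * logdet - 0.5 * a_ * T1 - 0.5 * g_ * T2
                 + 0.5 * M * np.log(b_) - 0.5 * b_ * T3)

    res = minimize(neg_E, np.log([a, g, b]), method="Nelder-Mead")
    a, g, b = np.exp(res.x)               # M step update

mu, C = posterior(a, g, b)
print(a > 0 and g > 0 and b > 0)          # True
```

The alternating structure mirrors Algorithm 1 exactly: each pass computes the posterior under the current hyperparameters, then re-maximizes the expected complete-data log-likelihood.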

3.2. Sample Selection

Having been able to estimate the unknown hyperparameters from previous observations by EM, at each decision step, the subsequent sampling node can be selected using the latest estimated values of hyperparameters.

According to the uncertainty sampling (US) criterion in active learning ([25], Chapter 2), we should scan through all the nodes and pick the one whose observed signal value we are most uncertain about as the next node to sample. Here, the predictive variance is regarded as a measure of uncertainty.

The predictive distribution of an observation y on vertex s with sampling vector ψ^T = e_s^T, given historical observations Ψ_{1:t}, y_{1:t}, is

p(y | Ψ_{1:t}, y_{1:t}, ψ^T; α_t, γ_t, β_t) = N(y | e_s^T μ_t, e_s^T C_t e_s + β_t^{−1}), (24)

where μ_t and C_t are the posterior mean and covariance of the graph signal f, respectively, given Ψ_{1:t} and y_{1:t}. The predictive variance consists of two parts: the estimative variance e_s^T C_t e_s and the noise variance β_t^{−1}, of which the latter is equal for all vertices. The next sampling node s_{t+1} can be decided by

s_{t+1} = argmax_{s∈V} e_s^T C_t e_s. (25)
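In code, Equation (25) is a one-liner: e_s^T C_t e_s is just the s-th diagonal entry of the posterior covariance, so the selection reduces to an argmax over diag(C_t). The covariance below is an illustrative value (and vertices are 0-indexed, whereas the paper indexes them from 1):

```python
import numpy as np

# Uncertainty sampling, Equation (25): pick the vertex with the
# largest diagonal entry of the posterior covariance C_t.
C_t = np.array([[0.5, 0.1, 0.0],
                [0.1, 0.9, 0.2],
                [0.0, 0.2, 0.3]])   # illustrative posterior covariance
s_next = int(np.argmax(np.diag(C_t)))
print(s_next)   # 1: the vertex with the largest predictive variance
```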

3.3. Sequential Sampling and Estimation

Finally, the complete procedure of the proposed sequential sampling and estimation algorithm for approximately bandlimited graph signals is given in Algorithm 2. At each step, the unknown hyperparameters are first re-estimated by EM as in Section 3.1, and then the next node to sample is selected by US as in Section 3.2. The total number of sampling steps T is equal to the sampling budget.

Algorithm 2 Sequential sampling and estimation of approximately bandlimited graph signals.
1: Choose the first sampling node s_1 arbitrarily.
2: for t = 1, 2, …, T do
3:   Sample the vertex s_t and obtain an observation y_t.
4:   Update hyperparameters α_t, γ_t, β_t and signal posterior μ_t, C_t by EM as in Algorithm 1.
5:   Select the next node to sample s_{t+1} by US as in Equation (25).
6: end for
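An end-to-end sketch of Algorithm 2 on a toy graph is given below. For simplicity, the M step here uses the closed-form update β = M / E[(y − Ψf)^T(y − Ψf)] implied by Equation (23), plus a coarse grid search over (α, γ) maximizing Equation (17); this stands in for the search method of the paper's Appendix B, and the graph, grid, noise level, and budget are all illustrative:

```python
import numpy as np

# Sequential sampling and estimation (Algorithm 2) on a toy ring graph,
# with a simplified grid-search M step standing in for Appendix B.
rng = np.random.default_rng(3)

N, n, T = 10, 1, 8
W = np.zeros((N, N))
for i in range(N):
    W[i, (i + 1) % N] = W[(i + 1) % N, i] = 1.0
L = np.diag(W.sum(axis=1)) - W
L2n = np.linalg.matrix_power(L, 2 * n)
lam, V = np.linalg.eigh(L)
f_true = V @ rng.normal(0.0, 1.0 / (1.0 + lam))   # a smooth toy signal

grid = [0.1, 0.3, 1.0, 3.0, 10.0]          # candidate alpha and gamma values
a, g, b = 1.0, 1.0, 1.0
s, samples, ys = 0, [], []                 # first node chosen arbitrarily
for t in range(T):
    samples.append(s)
    ys.append(f_true[s] + 0.05 * rng.normal())    # noisy observation
    Psi = np.eye(N)[samples]
    y = np.array(ys)
    for _ in range(5):                     # EM iterations (Algorithm 1)
        C = np.linalg.inv(a * np.eye(N) + g * L2n + b * Psi.T @ Psi)
        mu = b * C @ Psi.T @ y             # E step, Equations (14)-(15)
        T1 = np.trace(C) + mu @ mu                     # Eq. (18)
        T2 = np.trace(L2n @ C) + mu @ L2n @ mu         # Eq. (19)
        r = y - Psi @ mu
        T3 = r @ r + np.trace(Psi.T @ Psi @ C)         # Eq. (20)
        b = len(y) / T3                    # closed-form beta, Eq. (23)
        best = -np.inf
        for a_ in grid:                    # grid-search M step for alpha, gamma
            for g_ in grid:
                _, ld = np.linalg.slogdet(a_ * np.eye(N) + g_ * L2n)
                E = 0.5 * ld - 0.5 * a_ * T1 - 0.5 * g_ * T2
                if E > best:
                    best, a, g = E, a_, g_
    C = np.linalg.inv(a * np.eye(N) + g * L2n + b * Psi.T @ Psi)
    mu = b * C @ Psi.T @ y
    s = int(np.argmax(np.diag(C)))         # uncertainty sampling, Eq. (25)

err = np.linalg.norm(mu - f_true) / np.linalg.norm(f_true)
print(err >= 0.0)                          # normalized error is nonnegative
```

The outer loop is exactly the alternation of Algorithm 2: observe, re-estimate the hyperparameters and posterior, then select the next node by maximum predictive variance.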

Our sequential sampling strategy takes into account not only graph topology, but also previous observations, by both estimating hyperparameters based on historical data and selecting sampling nodes based on signal posterior. Making full use of historical observations and deciding which nodes to sample online, the proposed algorithm is able to cope with the situation where signal and noise distributions are not completely available in advance, and efficiently select samples to estimate the underlying graph signal.

The computational complexity of a decision step in the proposed algorithm is O(n_t^{EM} n̄_t^{bs} N³), where n_t^{EM} is the number of EM iterations, n̄_t^{bs} is the average number of binary-search iterations in the M step, and N is the graph size. This cost is acceptable in view of the performance improvement, especially in time-insensitive scenarios with a large observation cost. Thanks to the sequential framework with EM hyperparameter updates, the proposed method can efficiently select samples to give an accurate estimate of the signal, and the signal estimation is consistent.

4. Asymptotic Analysis

In the previous section, we developed a sequential sampling and estimation algorithm for approximately bandlimited graph signals with an incomplete system model, where hyperparameter estimation and sample selection are performed in an alternating way. The performance of the proposed algorithm will be validated via simulation in the next section, where we will see its efficient sample selection and accurate signal estimation with a limited sampling budget and little prior knowledge. In this section, we emphasize that the proposed algorithm improves limited-budget performance without sacrificing consistency, which is not trivial for a sequential decision process.

Intuitively, as the number of observations grows, hyperparameters that better fit the true model will be picked out, the truly important nodes will be selected for sampling, and finally an excellent estimate of the graph signal can be obtained. Here, we provide some theoretical support for this intuition. The asymptotic performance of the proposed algorithm is analyzed, and we state and prove the consistency of signal estimation in our method.

Before concentrating on the asymptotic performance of signal estimation, we first investigate the asymptotic behavior of sample selection in the proposed algorithm.

Lemma 1.

Let m_{i,t} denote the number of times vertex i is sampled up to step t. If lim_{t→∞} (1/t) · tr(α_t I_N + γ_t L^{2n})/(N β_t) = 0, then

∃ δ > 0: lim inf_{t→∞} min_{i∈V} m_{i,t}/t > δ.

That is, as t → ∞, all the nodes of the graph will be sampled again and again, with even the least-sampled node maintaining a sampling ratio greater than δ. A proof of Lemma 1 is given in Appendix C. Two direct corollaries follow.

Corollary 1.

For every i ∈ V, there exists δ > 0 such that lim inf_{t→∞} m_{i,t}/t > δ.

Corollary 2.

For every i ∈ V, lim_{t→∞} m_{i,t} = ∞.

Based on these corollaries, we now state our main theorem on the consistency of signal estimation in the proposed algorithm.

Theorem 2.

The signal posterior p(f | Ψ_{1:t}, y_{1:t}; α_t, γ_t, β_t) is consistent at the true value f, if EM converges to MML hyperparameters satisfying the condition in Lemma 1.

The consistency of the signal posterior here means that, as t → ∞, the posterior distribution becomes a point mass at the true value of the graph signal. A proof of Theorem 2 is provided in Appendix D. We then have the following corollary.

Corollary 3.

The MMSE/MAP estimation ft=μt is consistent at f under the conditions in Theorem 2.

It ensures that under specific conditions, as the sample size grows infinitely large, the MMSE/MAP estimation can get arbitrarily close to the true signal value, i.e., the graph signal can be estimated with arbitrary accuracy. This guarantees the asymptotic performance of the proposed algorithm in signal estimation.

The condition lim_{t→∞} (1/t) · tr(α_t I_N + γ_t L^{2n})/(N β_t) = 0 is satisfied when α_t/β_t = o(t) and γ_t/β_t = o(t), or, more sufficiently, if lim sup_{t→∞} α_t < ∞, lim sup_{t→∞} γ_t < ∞, and lim inf_{t→∞} β_t > 0, which ensures that the estimated signal energy and bandwidth do not decrease to zero and the estimated noise variance remains finite. We did not observe any violation of these conditions across numerous experiments in which the true graph signals were not constant and all observations were finite.

5. Experiments

In this section, the finite-time performance of the proposed algorithm is validated via a series of simulation experiments, on both synthetic and real-world data.

The performance of the proposed algorithm is compared to the following methods:

  • RndEM: Select sampling nodes randomly and uniformly, and estimate hyperparameters and the signal using the proposed EM algorithm.

  • OrcUS: Suppose that the hyperparameters are known in advance (the “oracle”). Select sampling nodes by US, and estimate the signal by MMSE/MAP.

  • Anis2016: A heuristic sampling algorithm proposed in Anis et al. [7] to maximize the cut-off frequency such that the sampling set is a uniqueness set for the bandlimited subspace. The graph signal is recovered in the bandlimited subspace by least squares. The estimation order k of cut-off frequency is set to k=4.

  • Perraudin2018: A nonuniform random sampling method proposed in Perraudin et al. [21] with sampling probabilities determined by local uncertainty. Inpainting is done by minimizing total variation. The kernel in the local uncertainty measure is ĝ(λ) = exp(−τλ) with τ = 8.

  • Sakiyama2019: In this method by Sakiyama et al. [31], the graph signal is recovered by a linear combination of localization operators at the sampled nodes. The sampling set is designed such that the energy of the operator at each node is large while the overlapping areas are small. The kernel in the localization operator is ĝ(λ) = exp(−τλ) with τ = 0.5. The Chebyshev polynomial approximation order is P = 12, and the signal recovery order is k = 12.

RndEM and OrcUS are references to verify the effectiveness of the proposed algorithm. Anis2016, Perraudin2018, Sakiyama2019 are three state-of-the-art methods that do not directly rely on signal model. For Anis2016, the benefit of further increasing k is not noticeable in our experiments, but the computations will become less numerically stable. For Perraudin2018 and Sakiyama2019, we use the same kernel type as in the original papers, and the parameters are chosen by experiments to adjust to our settings.

The normalized ℓ2-norm estimation error

e = ‖f_t − f‖_2 / ‖f‖_2 (26)

is taken as a performance index, where f_t is the estimated signal and f is the true value.

5.1. Simulations on Synthetic Data

Three graphs of representative types that have widespread applications in the real world are used for testing in our experiments:

  • G1: A random geometric graph with N = 100 vertices randomly placed in a 1-by-1 square, and edge weights assigned via a thresholded Gaussian kernel
    w_ij = exp(−d_ij²/(2σ²)) if d_ij < r, and 0 otherwise, (27)
    where d_ij is the Euclidean distance between vertex i and vertex j, r = 0.2, and σ = 0.1.
  • G2: A small world graph with N=100 nodes, generated by the Watts–Strogatz model [32] with mean node degree 6 and rewiring probability 0.2.

  • G3: A scale-free graph with N = 100 nodes, generated by the Barabási–Albert model [33,34] with 1 initial node, each newly added node connected to 1 existing node, and connection probabilities proportional to the degrees of existing nodes.
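The G1 construction can be sketched directly from Equation (27). The snippet below uses the paper's N, r, and σ; the vertex placement is random, so any particular realization differs from the one in the paper:

```python
import numpy as np

# Random geometric graph (G1) with the thresholded Gaussian kernel of
# Equation (27); N, r, sigma follow the paper, placement is random.
rng = np.random.default_rng(4)

N, r, sigma = 100, 0.2, 0.1
xy = rng.uniform(0.0, 1.0, size=(N, 2))        # vertices in a unit square
d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)

W = np.where(d < r, np.exp(-d**2 / (2 * sigma**2)), 0.0)
np.fill_diagonal(W, 0.0)                       # no self-loops

print(np.allclose(W, W.T))     # True: undirected graph
print(W.shape)                 # (100, 100)
```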

The graph topologies as well as their eigenvalue distributions are visualized in Figure 1. Due to limited space, we will mainly display and analyze our experimental results on G1 in detail, whereas example results on G2 and G3 are shown at the end to demonstrate the topology adaptability.

Figure 1. The three graphs used in our simulations, and the histograms of their eigenvalue distributions. (a) G1, a random geometric graph and (d) its eigenvalue distribution. (b) G2, a Watts–Strogatz small world graph and (e) its eigenvalue distribution. (c) G3, a Barabási–Albert scale-free graph and (f) its eigenvalue distribution.

Graph signals are generated from the ABL prior distribution Equation (1). Observation noise follows the i.i.d. zero-mean Gaussian distribution Equation (6). A variety of hyperparameters are considered, in order to cover ABL graph signals with different bandwidths, strictness of bandlimitedness, and energy levels, and scenarios with different signal–noise ratios (SNR). The SNR for a random ABL graph signal is defined as

SNR_r = 10 lg [ (Σ_{k=1}^N 1/(α + γλ_k^{2n})) / (N/β) ]. (28)
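Equation (28) compares the total prior signal power (the sum of the per-coefficient variances from Equation (2)) against the total noise power N/β, in decibels. A small worked sketch, with illustrative eigenvalues and hyperparameters:

```python
import numpy as np

# SNR_r of Equation (28): prior signal power over noise power, in dB.
# Eigenvalues and hyperparameters are illustrative values.
N, n = 4, 2
lam = np.array([0.0, 0.5, 1.0, 2.0])
alpha, gamma, beta = 1.0, 0.25, 10.0

sig_power = np.sum(1.0 / (alpha + gamma * lam**(2 * n)))  # sum of prior variances
noise_power = N / beta
snr_db = 10 * np.log10(sig_power / noise_power)
print(np.isfinite(snr_db))   # True
```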

For each setting, 100 graph signals are generated to evaluate the average performance of each method. The average estimation errors under different sample sizes are computed and plotted.

First of all, we compare the performance of the proposed algorithm with RndEM and OrcUS on G1. Hyperparameters are set to n = 2, λ_c = 2, g = 1, and SNR_r = 15 dB. As shown in Figure 2, although in the initial stage the estimation error of the proposed algorithm is similar to RndEM, as the sample size grows, our performance gradually approaches that of OrcUS. This result is as expected, since hyperparameter estimation improves with growing sample size, which leads to more efficient sample selection and more accurate signal estimation. Our superiority to RndEM (EM estimation without US sampling) demonstrates the effectiveness of choosing sampling nodes by US. The performance gap between our method and OrcUS (US sampling with known hyperparameters) tends to disappear when samples are abundant, also supporting the validity of hyperparameter estimation by EM.

Figure 2. Estimation error with growing sample size of the proposed algorithm versus RndEM and OrcUS on G1. Hyperparameters are n = 2, λ_c = 2, g = 1, and SNR_r = 15 dB.

Subsequently, the proposed algorithm is compared to three existing sampling and estimation methods (Anis2016, Perraudin2018, and Sakiyama2019) in scenarios without much prior knowledge about the signal and noise model. A series of experiments with different hyperparameters is conducted on G1 to test and compare their performance across settings. In the first experiment, we fix g = 1, SNR_r = 15 dB, and repeat the simulation with n = 1, 2 and λ_c = 2, 3, corresponding to ABL graph signals with different bandwidth and strictness of bandlimitedness. Simulation results are depicted in Figure 3. From these plots, we can see that the proposed algorithm obtains better or competitive estimation performance compared to state-of-the-art methods on ABL graph signals with various properties. Compared to Perraudin2018 and Sakiyama2019, the proposed method significantly outperforms them after a small number of initial samples in all four settings. As for Anis2016, when n is small or λ_c is large, which means the signal is far from strictly bandlimited or occupies a large bandwidth, we need far fewer samples than Anis2016 to recover the signal with tolerable error. This is because our method estimates the graph signal at all frequencies from the beginning thanks to the Bayesian framework, and our signal prior with hyperparameters estimated from data can better fit ABL graph signals with different properties. Therefore, the proposed algorithm is more robust against different signal properties and obtains better estimation results in most cases.

Figure 3. Estimation errors of the proposed algorithm versus Anis2016, Perraudin2018, and Sakiyama2019 on G1 with different signal bandwidth and strictness of bandlimitedness, i.e., different n and λ_c. We fix g = 1 and SNR_r = 15 dB. (a) n = 1, λ_c = 2. (b) n = 1, λ_c = 3. (c) n = 2, λ_c = 2. (d) n = 2, λ_c = 3.

We continue to investigate the performance for ABL graph signals with different energy levels as well as in scenarios with different signal–noise ratios. We fix n = 2, λ_c = 2. When changing g from 0.01 to 100, SNR_r is fixed at 15 dB. When changing SNR_r from 5 dB to 25 dB, g is fixed at 1. The sampling size is set to M = 60. According to Figure 4a, the proposed algorithm achieves better performance for graph signals with lower energy. For one thing, even though the estimated variance in the signal prior also becomes larger when the signal energy is larger, the zero-mean Gaussian prior still prefers signals with lower energy. For another, as the variance gets larger, the signal prior becomes flatter, and thus weaker, in the sense that it has less preference among different signals. Therefore, more observations are needed to reach the same estimation accuracy. Fortunately, this problem can be overcome by adaptive rescaling, dividing the observed signal values by a sufficiently large number, as long as we know the rough order of magnitude of the signal energy. As for SNR, from Figure 4b, although lower SNR may result in worse performance of the proposed algorithm, our method keeps outperforming the others when samples are adequate.

Figure 4. Estimation errors of the proposed algorithm versus Anis2016, Perraudin2018, and Sakiyama2019 on G1 with different signal energy level and signal–noise ratio, i.e., different g and SNR_r. We fix n = 2, λ_c = 2. The sampling size is set to M = 60. (a) g is changed from 0.01 to 100, while SNR_r is fixed at 15 dB. (b) SNR_r is changed from 5 dB to 25 dB, while g is fixed at 1.

Then, example simulation results on G2 and G3 are given in Figure 5, with n = 2, g = 1, SNR_r = 15 dB, λ_c = 3 for G2, and λ_c = 1 for G3. The proposed algorithm also reaches higher estimation accuracy than the other methods given a sufficient sampling budget. This demonstrates the robustness of the proposed algorithm against different graph topologies.

Figure 5. Estimation errors of the proposed algorithm versus Anis2016, Perraudin2018, and Sakiyama2019 on G2 and G3. (a) Results on G2 with n = 2, λ_c = 3, g = 1, SNR_r = 15 dB. (b) Results on G3 with n = 2, λ_c = 1, g = 1, SNR_r = 15 dB.

From the simulation results above, the proposed algorithm can adjust, without much prior knowledge, to ABL graph signals with different bandwidths, strictness of bandlimitedness, energy levels, and SNRs. In most cases, beyond a small number of initial samples, the proposed algorithm estimates the graph signals with higher accuracy than the existing methods.

Besides the validation of the limited-budget performance, we also verify the consistency of the proposed algorithm by a simulation experiment. We generate an ABL graph signal on G1 with n=2, λc=2, and g=1, and the SNR is set to 15 dB. To observe the asymptotic performance, the maximum sample size is set to 10^5, which is 10^3 times the graph size. We run the proposed algorithm on this graph signal 50 times to examine the mean as well as the variance of the estimation errors.

As shown in Figure 6, the mean normalized l2-norm error tends to decrease continuously as the sample size grows, falling to about 5×10^−3 when the sample size reaches 10^5. The standard deviation also tends to decline with growing sample size, reaching around 3×10^−4 at 10^5 samples. Other ABL graph signals yield similar results. This agrees with the consistency of the proposed algorithm, which is stated and theoretically proved in Section 4.

Figure 6.

Result of the consistency verification experiment. The graph signal on G1 is generated with n=2, λc=2, g=1, and the observation SNR is 15 dB. The curve denotes the mean of the normalized l2-norm error, and the region corresponds to ±1 standard deviation around the mean. In the overview figure, the y-axis is in logarithmic scale to better display the decrease of the mean estimation error, whereas in the partial enlargement it is linear to present the decline of the standard deviation.

5.2. A Real-World Example of Temperature Sensor Network

As an application example, we apply the proposed algorithm to the air temperatures measured by weather stations in China [35]. This example shows a potential application of the proposed algorithm in sensor selection.

We construct a K-nearest neighbor (K-NN) graph from the N=381 weather stations with K=5, where each node corresponds to a weather station, and the great-circle distances between stations are considered in K-NN search. Edge weights are assigned via a Gaussian kernel:

w_{ij} = \exp\left(-\frac{d_{ij}^{2}}{2\sigma^{2}}\right) \quad \text{for } (i,j)\in\mathcal{E}, \qquad (29)

where d_{ij} is the great-circle distance between station i and station j in degrees, and σ=2.
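The graph construction above can be sketched as follows; this is a minimal example assuming station coordinates are given as (latitude, longitude) pairs in degrees (the paper does not spell out its exact K-NN search or symmetrization convention, so a standard maximum-based symmetrization and the function name are our own):

```python
import numpy as np

def gaussian_knn_graph(coords_deg, K=5, sigma=2.0):
    """Build a symmetric K-NN adjacency matrix with Gaussian kernel weights
    from (lat, lon) coordinates in degrees, using great-circle distances."""
    lat = np.radians(coords_deg[:, 0])
    lon = np.radians(coords_deg[:, 1])
    # Central angle between every pair of stations (great-circle distance
    # on the unit sphere), converted back to degrees as in Equation (29).
    cos_angle = (np.sin(lat)[:, None] * np.sin(lat)[None, :]
                 + np.cos(lat)[:, None] * np.cos(lat)[None, :]
                   * np.cos(lon[:, None] - lon[None, :]))
    d = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    N = d.shape[0]
    W = np.zeros((N, N))
    for i in range(N):
        # K nearest neighbors of node i (index 0 of argsort is i itself).
        nbrs = np.argsort(d[i])[1:K + 1]
        W[i, nbrs] = np.exp(-d[i, nbrs] ** 2 / (2 * sigma ** 2))
    # Symmetrize: keep an edge if either endpoint selected it.
    return np.maximum(W, W.T)
```

Closer stations thus receive larger weights, so the resulting graph signal of temperatures is smooth over strongly weighted edges.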

Then, the air temperatures measured by these weather stations can be seen as a graph signal. We consider two such signals in our experiment: f1 at 0:00 on 1 January 2020, which is a winter night, and f2 at 12:00 on 1 July 2020, which is a summer noon. The two graph signals and their spectra are visualized in Figure 7. As expected, the signals are smooth in the vertex domain and approximately bandlimited in the frequency domain.

Figure 7.

Two graph signals of air temperatures measured by 381 weather stations in China, and their spectra. (a) f1 at 0:00 on 1 January 2020, and (b) its spectrum. (c) f2 at 12:00 on 1 July 2020, and (d) its spectrum.

We apply the proposed sequential sampling and estimation algorithm to these graph signals. Additive i.i.d. zero-mean Gaussian observation noise following Equation (6) is added, where SNR_d = 10 lg( f^T f / (N β^{-1}) ) is 15 dB. The known hyperparameter n in the signal prior Equation (1) is set to 1/2, which means the smoothness prior Equation (3) is considered. The observed signal values are divided by 100 as preprocessing, and accordingly the estimated graph signal is multiplied by 100 at the output. The performance of the proposed algorithm is compared to Anis2016, Perraudin2018, and Sakiyama2019. As our first sampling node is chosen at random, Perraudin2018 is a random sampling method, and the observation noise also introduces randomness, we run each method on the same signal 20 times and compare their average estimation errors.
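The noise generation and rescaling preprocessing described above can be sketched as follows; the function name and the synthetic stand-in signal are ours, and the noise precision β is derived from the stated SNR definition:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise_for_snr(f, snr_db):
    """Add i.i.d. zero-mean Gaussian noise whose precision beta is chosen so
    that SNR = 10*lg( (f^T f / N) / beta^{-1} ) equals the requested level."""
    N = f.size
    signal_power = f @ f / N                    # average signal power per node
    beta = 10 ** (snr_db / 10) / signal_power   # noise precision
    noise = rng.normal(0.0, 1.0 / np.sqrt(beta), size=N)
    return f + noise, beta

# Rescaling preprocessing as in the temperature experiment: divide the
# observations by 100 before estimation, multiply the estimate back by 100.
f_true = rng.normal(20.0, 10.0, size=381)       # synthetic "temperatures"
y, beta = add_noise_for_snr(f_true, snr_db=15.0)
y_scaled = y / 100.0                            # values fed to the estimator
```

Such rescaling keeps the signal energy at an order of magnitude for which the zero-mean Gaussian prior remains informative, as discussed in Section 5.1.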

As shown in Figure 8, although the proposed algorithm needs some initial samples to “warm up”, the estimation error then decreases rapidly, surpassing Anis2016, Perraudin2018, and Sakiyama2019 within a short time, and the lead is kept as the sample size further grows. This is because our sequential method with unknown hyperparameters requires a certain number of samples to obtain a reasonably accurate estimate of the signal and noise distributions. Then, based on the latest signal bandwidth, energy level, and noise precision estimated from data, we are able to choose the subsequent sampling nodes more efficiently and estimate the graph signal more accurately than the other methods, which take no account of such signal and noise properties.

Figure 8.

Estimation errors of the two temperature graph signals by the proposed algorithm versus Anis2016, Perraudin2018, and Sakiyama2019. (a) Results on f1. (b) Results on f2.

6. Conclusions

In this paper, a sequential sampling and estimation algorithm is proposed for approximately bandlimited graph signals without complete prior knowledge about the signal and noise properties. In the proposed algorithm, hyperparameter estimation by EM based on historical observations and sample selection by US with the latest estimated hyperparameters are performed in an alternating way. We prove the consistency of signal estimation in the proposed algorithm under specific conditions. The finite-budget performance of the proposed algorithm is validated by a series of simulation results. The proposed method provides a novel and practical sequential framework for graph signal sampling with unknown hyperparameters in the system model, and the experiment on temperature data shows a potential application in sensor selection on sensor networks whose measurements are smooth over the topology. Other potential applications include opinion prediction in social networks, assuming that people with strong social connections are likely to hold similar opinions, and active semisupervised learning tasks on similarity graphs.

Appendix A. Concavity of Objective Function in M Step

The objective function in Equation (16) is concave with respect to α, γ, β, as the Hessian matrix

H=\begin{bmatrix}
\frac{\partial^{2}E_{l-1}}{\partial\alpha^{2}} & \frac{\partial^{2}E_{l-1}}{\partial\alpha\partial\gamma} & \frac{\partial^{2}E_{l-1}}{\partial\alpha\partial\beta}\\
\frac{\partial^{2}E_{l-1}}{\partial\gamma\partial\alpha} & \frac{\partial^{2}E_{l-1}}{\partial\gamma^{2}} & \frac{\partial^{2}E_{l-1}}{\partial\gamma\partial\beta}\\
\frac{\partial^{2}E_{l-1}}{\partial\beta\partial\alpha} & \frac{\partial^{2}E_{l-1}}{\partial\beta\partial\gamma} & \frac{\partial^{2}E_{l-1}}{\partial\beta^{2}}
\end{bmatrix}
=\begin{bmatrix}
-\frac{1}{2}\sum_{k=1}^{N}\frac{1}{(\alpha+\gamma\lambda_k^{2n})^{2}} & -\frac{1}{2}\sum_{k=1}^{N}\frac{\lambda_k^{2n}}{(\alpha+\gamma\lambda_k^{2n})^{2}} & 0\\
-\frac{1}{2}\sum_{k=1}^{N}\frac{\lambda_k^{2n}}{(\alpha+\gamma\lambda_k^{2n})^{2}} & -\frac{1}{2}\sum_{k=1}^{N}\frac{\lambda_k^{4n}}{(\alpha+\gamma\lambda_k^{2n})^{2}} & 0\\
0 & 0 & -\frac{N}{2\beta^{2}}
\end{bmatrix}
=-\frac{1}{2}\sum_{k=1}^{N}\frac{1}{(\alpha+\gamma\lambda_k^{2n})^{2}}
\begin{bmatrix}1\\ \lambda_k^{2n}\\ 0\end{bmatrix}
\begin{bmatrix}1 & \lambda_k^{2n} & 0\end{bmatrix}
-\begin{bmatrix}0&0&0\\0&0&0\\0&0&\frac{N}{2\beta^{2}}\end{bmatrix} \qquad (A1)

is negative semi-definite for all α, γ, β > 0.

Appendix B. Search Method for Optimization Problem in M Step

To maximize E_{l-1} and solve Equation (16), take the partial derivatives with respect to α, γ, β and set them to zero, as in Equations (21)–(23).

From Equation (23), the M-step update of β has closed form

\beta^{l}=\frac{M}{E_{l-1}\left[(y-\Psi f)^{T}(y-\Psi f)\right]}, \qquad (A2)

which is independent of the update of α,γ.

As for α, γ, since the approximately bandlimited prior Equation (1) requires α, γ > 0, if Equations (21) and (22) hold for some α, γ > 0, this (α, γ) is the solution (α^l, γ^l); otherwise, (α^l, γ^l) should be determined from gradient information based on concavity.

Figure A1.

Two cases of the M-step update of α, γ. (a) Case I when Δ > 0: the solution point (α^l, γ^l) lies on the segment PQ strictly between P and Q. (b) Case II when Δ ≤ 0: the endpoint Q is the solution for (α^l, γ^l).

Combining Equations (21) and (22) to have α ∂E_{l-1}/∂α + γ ∂E_{l-1}/∂γ = 0, we get

\alpha E_{l-1}[f^{T}f]+\gamma E_{l-1}[f^{T}L^{2n}f]=N, \qquad (A3)

which constrains (α, γ) to a straight line in the α-γ plane. To ensure α, γ > 0, only the segment PQ in the first quadrant is considered, as shown in Figure A1. The endpoint P has coordinates (0, N/E_{l-1}[f^{T}L^{2n}f]), and the endpoint Q is at (N/E_{l-1}[f^{T}f], 0).

Define the gradient of E_{l-1} with respect to α, γ as g = (∂E_{l-1}/∂α, ∂E_{l-1}/∂γ)^T. Any point A on PQ satisfies g|_A ⊥ OA, where O is the origin. The gradient approaching P,

g\big|_{(\alpha,\gamma)\to P}=(+\infty,0)^{T}, \qquad (A4)

is always rightward as illustrated in Figure A1. Moreover, the gradient at Q

g\big|_{Q}=\left(0,\tfrac{1}{2}\Delta\right)^{T},\quad\text{where } \Delta=\frac{\mathrm{tr}(L^{2n})}{N}E_{l-1}[f^{T}f]-E_{l-1}[f^{T}L^{2n}f], \qquad (A5)

is either upward or downward, depending on the sign of Δ.

If Δ > 0, the gradient at Q is upward, as in Figure A1a. Let d be the unit vector in the direction from P to Q. Consider the directional derivative

\frac{\partial E_{l-1}}{\partial d}=\langle g,d\rangle \propto \frac{\partial E_{l-1}}{\partial\alpha}E_{l-1}[f^{T}L^{2n}f]-\frac{\partial E_{l-1}}{\partial\gamma}E_{l-1}[f^{T}f]. \qquad (A6)

∂E_{l-1}/∂d|_{(α,γ)→P} > 0 and ∂E_{l-1}/∂d|_Q < 0 differ in sign. By continuity of ∂E_{l-1}/∂d, there exists a zero-crossing Z of ∂E_{l-1}/∂d on PQ between P and Q. Moreover, by concavity of E_{l-1}, ∂E_{l-1}/∂d is monotonic along PQ, so the zero-crossing Z can be found using, for example, binary search. Notice that g|_Z satisfies both g|_Z ⊥ PQ and g|_Z ⊥ OZ, so g|_Z = 0, i.e., Equations (21) and (22) hold at Z. Therefore, Z is the solution point for (α^l, γ^l).

Otherwise, if Δ ≤ 0, the gradient at Q is zero or downward, as in Figure A1b. By concavity of E_{l-1}, the endpoint Q is the optimal point in the first quadrant. γ^l is set to 0, and α is updated by

\alpha^{l}=\frac{N}{E_{l-1}[f^{T}f]}. \qquad (A7)

In this case, p(f;α^l,γ^l) = N(f | 0, (α^l)^{-1} I_N), the signal values on different nodes become i.i.d., and only the total energy of the graph signal is constrained.
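The two cases above can be sketched as follows. This is a minimal implementation of the Appendix B search under assumed inputs: the eigenvalues λ_k^{2n} of L^{2n} and the posterior moments E_{l-1}[f^T f], E_{l-1}[f^T L^{2n} f] are supplied as arrays/scalars, and the function and variable names are ours:

```python
import numpy as np

def m_step_alpha_gamma(lmbda2n, Eff, ELf, iters=60):
    """M-step update of (alpha, gamma): lmbda2n holds lambda_k^{2n},
    Eff = E_{l-1}[f^T f], ELf = E_{l-1}[f^T L^{2n} f]."""
    N = lmbda2n.size
    # Delta from Equation (A5) decides between Case I and Case II.
    Delta = np.sum(lmbda2n) / N * Eff - ELf
    if Delta <= 0:
        # Case II: endpoint Q is optimal, gamma^l = 0 (Equation (A7)).
        return N / Eff, 0.0

    def dir_deriv(s):
        # Parametrize PQ by s in (0, 1): s=0 -> P, s=1 -> Q; every such
        # point satisfies alpha*Eff + gamma*ELf = N (Equation (A3)).
        alpha = s * N / Eff
        gamma = (1.0 - s) * N / ELf
        dEda = 0.5 * np.sum(1.0 / (alpha + gamma * lmbda2n)) - 0.5 * Eff
        dEdg = 0.5 * np.sum(lmbda2n / (alpha + gamma * lmbda2n)) - 0.5 * ELf
        # Sign of the directional derivative along P -> Q (Equation (A6)).
        return dEda * ELf - dEdg * Eff

    lo, hi = 1e-12, 1.0 - 1e-12      # derivative > 0 near P and < 0 at Q
    for _ in range(iters):           # binary search for the zero-crossing Z
        mid = 0.5 * (lo + hi)
        if dir_deriv(mid) > 0:
            lo = mid
        else:
            hi = mid
    s = 0.5 * (lo + hi)
    return s * N / Eff, (1.0 - s) * N / ELf
```

When the moments are self-consistent with a prior N(0, (αI_N + γL^{2n})^{-1}), the search recovers that (α, γ) up to numerical precision.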

Appendix C. Proof of Lemma 1

Proof. 

Let \{m_{(r),t}\}_{r=1}^{N} be the order statistics of \{m_{i,t}\}_{i\in V} satisfying m_{(1),t} \le \cdots \le m_{(N),t}, and denote the correspondingly sorted vertex indices by \{i_{(r),t}\}_{r=1}^{N}. Consider η_{i,t} = m_{i,t}/t, and η_{(r),t} = m_{(r),t}/t. Define

R=\min\left\{r \,\middle|\, \exists\delta>0,\ \liminf_{t\to\infty}\eta_{(r),t}>\delta\right\}-1. \qquad (A8)

Note that liminf_{t→∞} η_{(r),t} always exists, as η_{i,t} is bounded in [0,1]. Moreover, the set is not empty, as η_{(N),t} = max_{i∈V} η_{i,t} ≥ 1/N. So 0 ≤ R ≤ N−1. Suppose that R ≥ 1, and let ρ_t = η_{(R),t}. We are going to show that liminf_{t→∞} ρ_t is bounded below by a positive constant, which contradicts the definition of R. We then arrive at the assertion R = 0, and Lemma 1 is proved.

Define P_t = \frac{1}{t}C_t = \frac{1}{t}A_t + \beta_t\sum_{i\in V}\eta_{i,t}e_ie_i^{T}, where A_t = α_t I_N + γ_t L^{2n}. The sampling strategy Equation (25) can then be rewritten as

s_{t+1}=\arg\max_{i\in V}\ e_i^{T}P_t^{-1}e_i. \qquad (A9)
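As a concrete illustration, one step of the strategy in Equation (A9) can be sketched as follows; this is a simplified sketch with assumed inputs (A_t, β_t, and the per-node sample counts m), not the paper's full implementation:

```python
import numpy as np

def next_sample(A_t, beta_t, m, t):
    """One uncertainty-sampling step: pick the node whose score
    e_i^T P_t^{-1} e_i is largest, with P_t = A_t/t + beta_t * diag(m/t).
    m[i] is the number of times node i has been sampled so far, sum(m) = t."""
    eta = m / t
    P_t = A_t / t + beta_t * np.diag(eta)
    # The diagonal of P_t^{-1} gives e_i^T P_t^{-1} e_i for all nodes at once.
    scores = np.diag(np.linalg.inv(P_t))
    return int(np.argmax(scores))
```

Nodes sampled rarely (small η_{i,t}) get large scores, which is why every node keeps being revisited, as the lemma formalizes.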

Define I_t = \{i_{(r),t} \mid r \ge R+1\} and I_t^c = \{i_{(r),t} \mid r \le R\}. By definition, there exist δ > 0 and T < ∞ such that for all t > T and i ∈ I_t, it holds that η_{i,t} > δ. As A_t is positive definite,

\forall t>T,\ \forall i\in I_t,\quad e_i^{T}P_t^{-1}e_i \le e_i^{T}\Big(\beta_t\sum_{j\in V}\eta_{j,t}e_je_j^{T}\Big)^{\dagger}e_i=\frac{1}{\beta_t\eta_{i,t}}<\frac{1}{\beta_t\delta}, \qquad (A10)

where (\cdot)^{\dagger} denotes the pseudo-inverse of a matrix.

Consider i ∈ I_t^c. Define

Q_t^{(1)}=\beta_t(1-\rho_t)\sum_{i\in I_t}e_ie_i^{T}, \qquad (A11)
Q_t^{(2)}=\frac{1}{t}A_t+\beta_t\rho_t\sum_{i\in V}e_ie_i^{T}. \qquad (A12)

As P_t \preceq Q_t^{(1)}+Q_t^{(2)} = \frac{1}{t}A_t + \beta_t\Big(\sum_{i\in I_t}e_ie_i^{T} + \rho_t\sum_{i\in I_t^c}e_ie_i^{T}\Big),

\max_{i\in I_t^c}e_i^{T}P_t^{-1}e_i \ \ge\ \max_{i\in I_t^c}e_i^{T}\big(Q_t^{(1)}+Q_t^{(2)}\big)^{-1}e_i \ =\ \max_{i\in I_t^c,\,z\in\mathbb{R}^N}\ 2e_i^{T}z-z^{T}\big(Q_t^{(1)}+Q_t^{(2)}\big)z \ \ge\ \max_{i\in I_t^c,\,z\in\mathcal{N}(Q_t^{(1)})}\ 2e_i^{T}z-z^{T}Q_t^{(2)}z, \qquad (A13)

where \mathcal{N}(Q_t^{(1)}) is the null space of Q_t^{(1)}. Denote the maximum value in the last line, which is also a lower bound of \max_{i\in I_t^c}e_i^{T}P_t^{-1}e_i, as L_t, and the maximizers as i_t^*, z_t^*. On the one hand, when i is set to i_t^*, we have

L_t=\max_{z\in\mathcal{N}(Q_t^{(1)})}\ 2e_{i_t^*}^{T}z-z^{T}Q_t^{(2)}z. \qquad (A14)

By the Lagrange multiplier method,

z_t^*=\big(Q_t^{(2)}\big)^{-1}\Big(I_N-Q_t^{(1)}\big(Q_t^{(1)}(Q_t^{(2)})^{-1}Q_t^{(1)}\big)^{\dagger}Q_t^{(1)}\big(Q_t^{(2)}\big)^{-1}\Big)e_{i_t^*}, \qquad (A15)
L_t=2e_{i_t^*}^{T}z_t^*-(z_t^*)^{T}Q_t^{(2)}z_t^*=e_{i_t^*}^{T}z_t^*=(z_t^*)^{T}Q_t^{(2)}z_t^*. \qquad (A16)

On the other hand, when z is fixed at z_t^*, we know

e_{i_t^*}^{T}z_t^*=\max_{i\in I_t^c}\ e_i^{T}z_t^*. \qquad (A17)

We further assert that

e_{i_t^*}^{T}z_t^*=\max_{i\in I_t^c}\ \big|e_i^{T}z_t^*\big|. \qquad (A18)

Otherwise, there exists j ∈ I_t^c such that -e_j^{T}z_t^* > e_{i_t^*}^{T}z_t^*, which leads to the contradiction 2e_j^{T}(-z_t^*)-(-z_t^*)^{T}Q_t^{(2)}(-z_t^*) > L_t. Therefore,

\big(e_{i_t^*}^{T}z_t^*\big)^{2}=\max_{i\in I_t^c}\big(e_i^{T}z_t^*\big)^{2}=\max_{i\in I_t^c}(z_t^*)^{T}e_ie_i^{T}z_t^* \ \ge\ \frac{1}{R}(z_t^*)^{T}\sum_{i\in I_t^c}e_ie_i^{T}z_t^* \ =\ \frac{1}{R}(z_t^*)^{T}\Big(\sum_{i\in I_t^c}e_ie_i^{T}+\sum_{i\in I_t}e_ie_i^{T}\Big)z_t^* \ =\ \frac{1}{R}\|z_t^*\|^{2}, \qquad (A19)

where the second-to-last equality uses z_t^* ∈ \mathcal{N}(Q_t^{(1)}), i.e., e_i^{T}z_t^* = 0 for i ∈ I_t. Therefore,

L_t=e_{i_t^*}^{T}z_t^* \ \ge\ \frac{1}{\sqrt{R}}\|z_t^*\|. \qquad (A20)

At the same time, according to the Courant–Fischer theorem, along with the positive definiteness of Q_t^{(2)},

L_t=(z_t^*)^{T}Q_t^{(2)}z_t^* \ \le\ \lambda_{\max}\big(Q_t^{(2)}\big)\|z_t^*\|^{2} \ <\ \mathrm{tr}\big(Q_t^{(2)}\big)\|z_t^*\|^{2}, \qquad (A21)

where λ_max(Q_t^{(2)}) denotes the maximum eigenvalue of Q_t^{(2)}, and tr(Q_t^{(2)}) is its trace. Equations (A20) and (A21) together give

\|z_t^*\| \ >\ \frac{1}{\sqrt{R}\,\mathrm{tr}\big(Q_t^{(2)}\big)}=\frac{1}{\sqrt{R}\left(\frac{1}{t}\mathrm{tr}(A_t)+N\beta_t\rho_t\right)}. \qquad (A22)

Substituting this back into Equation (A20),

\max_{i\in I_t^c}e_i^{T}P_t^{-1}e_i \ \ge\ L_t \ \ge\ \frac{1}{\sqrt{R}}\|z_t^*\| \ >\ \frac{1}{R\left(\frac{1}{t}\mathrm{tr}(A_t)+N\beta_t\rho_t\right)}. \qquad (A23)

Let \rho^* = \frac{\delta}{NR}, and \rho_t^* = \rho^* - \frac{\mathrm{tr}(A_t)}{tN\beta_t}. According to the sampling strategy Equation (A9), together with the previous results Equations (A10) and (A23), for t > T, ρ_t < ρ_t^* implies s_{t+1} ∈ I_t^c. Define \tilde\rho_t = \frac{1}{R}\sum_{i\in I_t^c}\eta_{i,t}. Whenever s_{t+1} ∈ I_t^c,

\rho_{t+1} \ \ge\ \tilde\rho_{t+1} = \frac{Rt\tilde\rho_t+1}{R(t+1)} = \tilde\rho_t+\frac{1-R\tilde\rho_t}{R(t+1)} \ >\ \tilde\rho_t+\frac{(N-R)\delta}{R(t+1)} \ \ge\ \frac{\rho_t}{R}+\frac{(N-R)\delta}{R(t+1)}. \qquad (A24)

By induction, this lower bound on ρ_{t+τ} increases with τ,

\rho_{t+\tau} \ >\ \frac{\rho_t}{R}+\frac{(N-R)\delta}{R}\sum_{k=1}^{\tau}\frac{1}{t+k}, \qquad (A25)

until \rho_{t+\tau} \ge \rho^*_{t+\tau}. Suppose that \rho_{t-1} \ge \rho^*_{t-1} but \rho_t < \rho^*_t at some t > T. Then,

\rho_t \ \ge\ \frac{t-1}{t}\rho_{t-1} \ \ge\ \frac{t-1}{t}\rho^*_{t-1}. \qquad (A26)

If \lim_{t\to\infty}\frac{\mathrm{tr}(A_t)}{tN\beta_t}=0, then \lim_{t\to\infty}\rho^*_t=\rho^*. We can then obtain \liminf_{t\to\infty}\rho_t \ge \lim_{t\to\infty}\frac{(t-1)\rho^*_{t-1}}{Rt}=\frac{\rho^*}{R}=\frac{\delta}{NR^2}>0, which contradicts the definition of R.

Therefore, the only possibility is R=0, and Lemma 1 is proved. □

Pronzato [28] proved a similar statement in sequential design of experiments (DOE) for nonlinear least-squares regression. In contrast, the proposed algorithm is an uncertainty sampling process for a Bayesian estimation problem, which makes our proposition and its proof differ in some respects from theirs.

Appendix D. Proof of Theorem 2

Proof. 

Let U_\epsilon = \{f\in\mathbb{R}^N \mid \forall i\in V,\ |f_i-f_i^*|<\epsilon\} be a “cubic” neighborhood around the true signal f^*, and let U_\epsilon^c = \mathbb{R}^N \setminus U_\epsilon be its complement. We are going to prove that P(U_\epsilon^c \mid \Psi_{1:t}, y_{1:t}; \alpha_t, \gamma_t, \beta_t) converges to 0 with high probability for arbitrary ϵ > 0.

Here, for brevity, γ_t is absorbed into α_t. The target can be written in the fractional form

P\big(U_\epsilon^c \mid \Psi_{1:t},y_{1:t};\alpha_t,\beta_t\big)=\frac{N_t}{D_t}=\frac{\int_{U_\epsilon^c}\frac{p(y_{1:t}\mid\Psi_{1:t},f;\beta_t)}{p(y_{1:t}\mid\Psi_{1:t},f^*;\beta^*)}\,p(f;\alpha_t)\,df}{\int_{\mathbb{R}^N}\frac{p(y_{1:t}\mid\Psi_{1:t},f;\beta_t)}{p(y_{1:t}\mid\Psi_{1:t},f^*;\beta^*)}\,p(f;\alpha_t)\,df}, \qquad (A27)

where f^* is the true signal, and β^* is the true noise precision.

First look at the numerator. According to Corollary 2, as t → ∞, m_{i,t} → ∞. We have

\frac{p(y_{1:t}\mid\Psi_{1:t},f;\beta_t)}{p(y_{1:t}\mid\Psi_{1:t},f^*;\beta^*)} \overset{(a)}{=} \exp\left(-\sum_{i=1}^{N}m_{i,t}\cdot\frac{1}{m_{i,t}}\sum_{j=1}^{m_{i,t}}\ln\frac{p(y_{i,j}\mid f_i^*;\beta^*)}{p(y_{i,j}\mid f_i;\beta_t)}\right) \overset{(b)}{\to} \exp\left(-\sum_{i=1}^{N}m_{i,t}\,\mathrm{KL}\big(p(y_i\mid f_i^*;\beta^*)\,\|\,p(y_i\mid f_i;\beta_t)\big)\right), \qquad (A28)

where y_{i,j} denotes the j-th observation of vertex i; KL(p‖q) is the Kullback–Leibler divergence between two distributions; (a) uses the independence of different observations in y conditioned on Ψ, f, and β; and (b) follows from the Law of Large Numbers.

For any f ∈ U_ϵ^c, there exists k ∈ V such that |f_k − f_k^*| > ϵ. Regardless of the value of β_t,

\mathrm{KL}\big(p(y_k\mid f_k^*;\beta^*)\,\|\,p(y_k\mid f_k;\beta_t)\big)=\frac{1}{2}\left(\beta_t(f_k-f_k^*)^2+\frac{\beta_t}{\beta^*}-\ln\frac{\beta_t}{\beta^*}-1\right) \ >\ \frac{1}{2}\ln\big(1+\beta^*\epsilon^2\big). \qquad (A29)
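The closed-form KL divergence in Equation (A29) and its lower bound can be checked numerically; a small sketch for the univariate Gaussian observation model, with our own helper name:

```python
import numpy as np

def kl_gauss(mu1, beta1, mu2, beta2):
    """KL( N(mu1, 1/beta1) || N(mu2, 1/beta2) ) for univariate Gaussians
    parametrized by mean and precision, as in Equation (A29)."""
    return 0.5 * (beta2 * (mu1 - mu2) ** 2 + beta2 / beta1
                  - np.log(beta2 / beta1) - 1.0)

# Whenever |mu1 - mu2| > eps, the KL exceeds 0.5*ln(1 + beta1*eps^2) no
# matter what beta2 is: minimizing over beta2 yields exactly
# 0.5*ln(1 + beta1*(mu1 - mu2)^2), which is larger than the bound.
```

This is the key fact that makes the numerator N_t decay exponentially regardless of how β_t is estimated.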

For i ≠ k, KL(p(y_i|f_i^*;β^*) ‖ p(y_i|f_i;β_t)) ≥ 0 by definition. Moreover, from Corollary 1, there exists δ > 0 such that m_{k,t}/t > δ for all large t. Thus, for \epsilon_1 = \frac{\delta}{2}\ln(1+\beta^*\epsilon^2) > 0,

\frac{p(y_{1:t}\mid\Psi_{1:t},f;\beta_t)}{p(y_{1:t}\mid\Psi_{1:t},f^*;\beta^*)} \ <\ \exp\left(-\frac{1}{2}m_{k,t}\ln\big(1+\beta^*\epsilon^2\big)\right) \ <\ \exp(-t\epsilon_1) \qquad (A30)

for all f ∈ U_ϵ^c, for sufficiently large t, with high probability.

Therefore, when t becomes large enough, no matter what value α_t takes, the numerator

N_t=\int_{U_\epsilon^c}\frac{p(y_{1:t}\mid\Psi_{1:t},f;\beta_t)}{p(y_{1:t}\mid\Psi_{1:t},f^*;\beta^*)}\,p(f;\alpha_t)\,df \ <\ \exp(-t\epsilon_1) \qquad (A31)

with high probability.

Now turn to the denominator. As α_t, β_t are estimated by maximization of the marginal likelihood (MML),

D_t \ \ge\ \tilde D_t(\alpha,\beta^*)=\int_{\mathbb{R}^N}\frac{p(y_{1:t}\mid\Psi_{1:t},f;\beta^*)}{p(y_{1:t}\mid\Psi_{1:t},f^*;\beta^*)}\,p(f;\alpha)\,df, \qquad (A32)

where α may be chosen arbitrarily.

Consider arbitrary ϵ_2 > 0. Analogously to the above, as t → ∞ and m_{i,t} → ∞, we have

\exp(t\epsilon_2)\,\frac{p(y_{1:t}\mid\Psi_{1:t},f;\beta^*)}{p(y_{1:t}\mid\Psi_{1:t},f^*;\beta^*)} = \exp\left(\sum_{i=1}^{N}m_{i,t}\left(\epsilon_2-\frac{1}{m_{i,t}}\sum_{j=1}^{m_{i,t}}\ln\frac{p(y_{i,j}\mid f_i^*;\beta^*)}{p(y_{i,j}\mid f_i;\beta^*)}\right)\right) \ \to\ \exp\left(\sum_{i=1}^{N}m_{i,t}\Big(\epsilon_2-\mathrm{KL}\big(p(y_i\mid f_i^*;\beta^*)\,\|\,p(y_i\mid f_i;\beta^*)\big)\Big)\right). \qquad (A33)

Define X_{\epsilon_2/2}=\left\{f\in\mathbb{R}^N \,\middle|\, \forall i\in V,\ \mathrm{KL}\big(p(y_i\mid f_i^*;\beta^*)\,\|\,p(y_i\mid f_i;\beta^*)\big)<\frac{\epsilon_2}{2}\right\}. For all f ∈ X_{ϵ_2/2}, when t grows large enough,

\exp(t\epsilon_2)\,\frac{p(y_{1:t}\mid\Psi_{1:t},f;\beta^*)}{p(y_{1:t}\mid\Psi_{1:t},f^*;\beta^*)} \ >\ \exp\left(\frac{t\epsilon_2}{2}\right) \qquad (A34)

with high probability.

Note that the integrand is positive. Shrinking the range of integration, we obtain

\exp(t\epsilon_2)\,\tilde D_t(\alpha,\beta^*) \ \ge\ \exp(t\epsilon_2)\int_{X_{\epsilon_2/2}}\frac{p(y_{1:t}\mid\Psi_{1:t},f;\beta^*)}{p(y_{1:t}\mid\Psi_{1:t},f^*;\beta^*)}\,p(f;\alpha)\,df \ >\ \exp\left(\frac{t\epsilon_2}{2}\right)\int_{X_{\epsilon_2/2}}p(f;\alpha)\,df. \qquad (A35)

We can always choose an α such that \int_{X_{\epsilon_2/2}}p(f;\alpha)\,df > 0, since p(f;α) is Gaussian. Then, \exp(t\epsilon_2)\tilde D_t(\alpha,\beta^*) \to \infty as t → ∞. Therefore, the denominator

D_t \ \ge\ \tilde D_t(\alpha,\beta^*) \ >\ \exp(-t\epsilon_2) \qquad (A36)

for all large t with high probability, where ϵ_2 can be any positive number.

Pick 0 < ϵ_2 < ϵ_1. For ϵ_3 = ϵ_1 − ϵ_2 > 0,

P\big(U_\epsilon^c \mid \Psi_{1:t},y_{1:t};\alpha_t,\beta_t\big)=\frac{N_t}{D_t} \ <\ \exp(-t\epsilon_3) \qquad (A37)

for all sufficiently large t with high probability.

We finally conclude that P(U_ϵ^c | Ψ_{1:t}, y_{1:t}; α_t, β_t) converges to 0 with high probability as t → ∞ for any positive ϵ. □

The proposed algorithm is more complex than the classical empirical Bayes methods discussed in Petrone et al. [27], because in our setting, hyperparameters exist not only in the signal prior but also in the observation model, and i.i.d. complete observations are replaced by partial ones selected by a strategy. That is why a more involved proof of consistent signal estimation in our algorithm is provided here.

Author Contributions

Conceptualization, S.L., H.F., and B.H.; methodology, S.L. and H.F.; software, S.L.; validation, S.L., K.X., and H.F.; formal analysis, S.L. and H.F.; investigation, S.L. and H.F.; resources, H.F. and B.H.; data curation, S.L.; writing—original draft preparation, S.L.; writing—review and editing, K.X. and H.F.; visualization, S.L.; supervision, H.F. and B.H.; project administration, H.F., B.H.; funding acquisition, H.F. and B.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Shanghai Municipal Natural Science Foundation (No. 19ZR1404700), Fudan University-CIOMP Joint Fund (FC2019-003), and 2020 Okawa Foundation Research Grant.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflict of interest.

Footnotes

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Shuman D.I., Narang S.K., Frossard P., Ortega A., Vandergheynst P. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Process. Mag. 2013;30:83–98. doi: 10.1109/MSP.2012.2235192.
2. Sandryhaila A., Moura J.M.F. Discrete signal processing on graphs. IEEE Trans. Signal Process. 2013;61:1644–1656. doi: 10.1109/TSP.2013.2238935.
3. Shuman D.I., Vandergheynst P., Frossard P. Chebyshev polynomial approximation for distributed signal processing. In: Proceedings of the 2011 International Conference on Distributed Computing in Sensor Systems and Workshops (DCOSS); Barcelona, Spain, 27–29 June 2011.
4. Shi X., Feng H., Zhai M., Yang T., Hu B. Infinite impulse response graph filters in wireless sensor networks. IEEE Signal Process. Lett. 2015;22:1113–1117. doi: 10.1109/LSP.2014.2387204.
5. Loukas A., Simonetto A., Leus G. Distributed autoregressive moving average graph filters. IEEE Signal Process. Lett. 2015;22:1931–1935. doi: 10.1109/LSP.2015.2448655.
6. Chen S., Varma R., Sandryhaila A., Kovačević J. Discrete signal processing on graphs: Sampling theory. IEEE Trans. Signal Process. 2015;63:6510–6523. doi: 10.1109/TSP.2015.2469645.
7. Anis A., Gadde A., Ortega A. Efficient sampling set selection for bandlimited graph signals using graph spectral proxies. IEEE Trans. Signal Process. 2016;64:3775–3789. doi: 10.1109/TSP.2016.2546233.
8. Narang S.K., Gadde A., Ortega A. Signal processing techniques for interpolation in graph structured data. In: Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); Vancouver, BC, Canada, 26–31 May 2013; pp. 5445–5449.
9. Sakiyama A., Tanaka Y., Tanaka T., Ortega A. Efficient sensor position selection using graph signal sampling theory. In: Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); Shanghai, China, 20–25 March 2016; pp. 6225–6229.
10. Jabłoński I. Graph signal processing in applications to sensor networks, smart grids, and smart cities. IEEE Sens. J. 2017;17:7659–7666. doi: 10.1109/JSEN.2017.2733767.
11. Huang W., Bolton T.A.W., Medaglia J.D., Bassett D.S., Ribeiro A., Ville D.V.D. A graph signal processing perspective on functional brain imaging. Proc. IEEE. 2018;106:868–885. doi: 10.1109/JPROC.2018.2798928.
12. Cheung G., Magli E., Tanaka Y., Ng M.K. Graph spectral image processing. Proc. IEEE. 2018;106:907–930. doi: 10.1109/JPROC.2018.2799702.
13. Dinesh C., Cheung G., Bajić I.V. Point cloud denoising via feature graph Laplacian regularization. IEEE Trans. Image Process. 2020;29:4143–4158. doi: 10.1109/TIP.2020.2969052.
14. Gadde A., Anis A., Ortega A. Active semi-supervised learning using sampling theory for graph signals. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; New York, NY, USA, 24–27 August 2014; pp. 492–501.
15. Anis A., El Gamal A., Avestimehr A.S., Ortega A. A sampling theory perspective of graph-based semi-supervised learning. IEEE Trans. Inf. Theory. 2019;65:2322–2342. doi: 10.1109/TIT.2018.2879897.
16. Kipf T.N., Welling M. Semi-supervised classification with graph convolutional networks. In: Proceedings of the 5th International Conference on Learning Representations (ICLR), Palais des Congrès Neptune; Toulon, France, 24–26 April 2017.
17. Chen S., Varma R., Singh A., Kovačević J. Signal recovery on graphs: Fundamental limits of sampling strategies. IEEE Trans. Signal Inf. Process. Netw. 2016;2:539–554. doi: 10.1109/TSIPN.2016.2614903.
18. Puy G., Tremblay N., Gribonval R., Vandergheynst P. Random sampling of bandlimited signals on graphs. Appl. Comput. Harmon. Anal. 2018;44:446–475. doi: 10.1016/j.acha.2016.05.005.
19. Xie X., Feng H., Jia J., Hu B. Design of sampling set for bandlimited graph signal estimation. In: Proceedings of the 2017 IEEE Global Conference on Signal and Information Processing (GlobalSIP); Montreal, QC, Canada, 14–16 November 2017; pp. 653–657.
20. Lin S., Xie X., Feng H., Hu B. Active sampling for approximately bandlimited graph signals. In: Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); Brighton, UK, 12–17 May 2019; pp. 5441–5445.
21. Perraudin N., Ricaud B., Shuman D.I., Vandergheynst P. Global and local uncertainty principles for signals on graphs. APSIPA Trans. Signal Inf. Process. 2018;7:e3. doi: 10.1017/ATSIP.2018.2.
22. Xie X., Yu J., Feng H., Hu B. Bayesian design of sampling set for bandlimited graph signals. In: Proceedings of the 2019 IEEE Global Conference on Signal and Information Processing (GlobalSIP); Ottawa, ON, Canada, 11–14 November 2019.
23. Gadde A., Ortega A. A probabilistic interpretation of sampling theory of graph signals. In: Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); South Brisbane, Australia, 19–24 April 2015; pp. 3257–3261.
24. Bishop C.M. Pattern Recognition and Machine Learning. Springer; New York, NY, USA: 2006.
25. Settles B. Active Learning. Morgan & Claypool; San Rafael, CA, USA: 2012.
26. Bernardo J.M., Smith A.F.M. Bayesian Theory. John Wiley & Sons; Hoboken, NJ, USA: 1994.
27. Petrone S., Rousseau J., Scricciolo C. Bayes and empirical Bayes: Do they merge? Biometrika. 2014;101:285–302. doi: 10.1093/biomet/ast067.
28. Pronzato L. One-step ahead adaptive D-optimal design on a finite design space is asymptotically optimal. Metrika. 2010;71:219–238. doi: 10.1007/s00184-008-0227-y.
29. Oppenheim A.V., Schafer R.W. Discrete-Time Signal Processing. 3rd ed. Prentice Hall; Englewood Cliffs, NJ, USA: 2009.
30. Kay S.M. Fundamentals of Statistical Signal Processing, Volume I: Estimation Theory. Prentice Hall; Englewood Cliffs, NJ, USA: 1993.
31. Sakiyama A., Tanaka Y., Tanaka T., Ortega A. Eigendecomposition-free sampling set selection for graph signals. IEEE Trans. Signal Process. 2019;67:2679–2692. doi: 10.1109/TSP.2019.2908129.
32. Watts D.J., Strogatz S.H. Collective dynamics of ‘small-world’ networks. Nature. 1998;393:440–442. doi: 10.1038/30918.
33. Barabási A.L., Albert R. Emergence of scaling in random networks. Science. 1999;286:509–512. doi: 10.1126/science.286.5439.509.
34. Perraudin N., Paratte J., Shuman D., Martin L., Kalofolias V., Vandergheynst P., Hammond D.K. GSPBOX: A toolbox for signal processing on graphs. arXiv. 2016. arXiv:1408.5781v2 [cs.IT].
35. Integrated Surface Database (ISD) by National Climatic Data Center (NCDC) of the USA. Available online: https://www.ncdc.noaa.gov/isd (accessed on 11 December 2020).
