Distributed weighted least-squares estimation with fast convergence for large-scale systems

Damián Edgardo Marelli; Minyue Fu

Automatica. 2015 Jan;51:27–39. doi: 10.1016/j.automatica.2014.10.077

PMCID: PMC4308017 PMID: 25641976

Abstract

In this paper we study a distributed weighted least-squares estimation problem for a large-scale system consisting of a network of interconnected sub-systems. Each sub-system is concerned with a subset of the unknown parameters and has a measurement linear in the unknown parameters with additive noise. The distributed estimation task is for each sub-system to compute the globally optimal estimate of its own parameters using its own measurement and information shared with the network through neighborhood communication. We first provide a fully distributed iterative algorithm to asymptotically compute the global optimal estimate. The convergence rate of the algorithm will be maximized using a scaling parameter and a preconditioning method. This algorithm works for a general network. For a network without loops, we also provide a different iterative algorithm to compute the global optimal estimate which converges in a finite number of steps. We include numerical experiments to illustrate the performances of the proposed methods.

Keywords: Distributed estimation, Distributed state estimation, Large scale optimization, Sensor network, Networked control

1. Introduction

A sensor network is a web of a large number of intelligent sensing and computing nodes connected via a communication network (Dargie & Poellabauer, 2010). The emergence of sensor networks calls for the development of distributed algorithms for a number of tasks to replace the traditional centralized methods. In particular, the development of distributed algorithms for parameter estimation has recently attracted a great deal of attention (Fang and Li, 2008, Kar et al., 2012, Li and AlRegib, 2007, Li and AlRegib, 2009, Lopes and Sayed, 2008, Ribeiro and Giannakis, 2006a, Ribeiro and Giannakis, 2006b, Ribeiro et al., 2010, Xiao et al., 2006). They find applications in environmental and weather monitoring, industrial process monitoring and control, surveillance, state estimation for smart grid, etc.

Existing distributed estimation methods can be classified in several ways. The first classification is according to whether a coordinating node or fusion center (FC) is present. When an FC is present, each node communicates with the FC either directly (via a star topology) or indirectly (via a mesh topology), i.e., there is a directed communication path from each node to the FC. The estimation is carried out at the FC, after some possible pre-processing at each node before transmission. The methods proposed in Fang and Li (2008), Li and AlRegib (2007, 2009) and Ribeiro and Giannakis (2006a, 2006b) are of this type. On the other hand, when no FC is present, estimation is done by executing a cooperative algorithm among all the nodes of the network. The network is connected in some way and each node communicates with its neighboring nodes only. Representative methods of this type include Conejo, de la Torre, and Canas (2007), Gómez-Expósito, Villa Jaén, Gómez-Quiles, Rousseaux, and Van Cutsem (2011), Kar et al. (2012) and Lopes and Sayed (2008). A second classification is according to whether the estimation method is static or dynamic. In static estimation, a set of parameters is estimated using the measurements of all nodes, which collectively form a snapshot of the system at a fixed time. Examples of this type include Fang and Li (2008), Kar et al. (2012), Li and AlRegib (2007, 2009), Lopes and Sayed (2008) and Ribeiro and Giannakis (2006a, 2006b). On the other hand, in dynamic estimation all the nodes track the evolution of a set of parameters for which a dynamic model is available, as in Carli, Chiuso, Schenato, and Zampieri (2008), Hlinka, Sluciak, Hlawatsch, Djuric, and Rupp (2012), Khan and Moura (2008) and Ribeiro et al. (2010). 
Some “hybrid” methods exist, which permit tracking a time-varying sequence of parameters, without a dynamic model, by updating a static estimate at each time (Lopes & Sayed, 2008). A final classification is done by whether a distributed estimation method is a small-scale one or a large-scale one. In a small-scale method, all nodes estimate a common set of parameters (Fang and Li, 2008, Kar et al., 2012, Li and AlRegib, 2007, Li and AlRegib, 2009, Lopes and Sayed, 2008, Ribeiro and Giannakis, 2006a, Ribeiro and Giannakis, 2006b). But in a large-scale method, each node only reconstructs a subset of the parameters, i.e., the collective knowledge of the parameters is distributed among all the nodes (Conejo et al., 2007, Gómez-Expósito et al., 2011, Huang et al., 2012, Khan and Moura, 2008). Large-scale estimation is in general more challenging than its small-scale counterpart.

In this paper we study distributed static estimation for a large-scale system consisting of a network of interconnected sub-systems. Each sub-system is concerned with a subset of the unknown parameters and has measurements, linear in the unknown parameters, corrupted by additive noise. The distributed estimation task is for each sub-system to estimate its local state using its own measurement and information shared with the network through neighborhood communication. We use the weighted least squares (WLS) criterion for optimal estimation. The goal is that the composite estimate of the whole system, consisting of all local estimates, will become globally optimal in the sense that it is the same as the optimal estimate obtained using all the measurements and a centralized estimation method.

This problem is motivated by many applications involving a large-scale spatially distributed system. For example, the state estimation problem for a large power network is concerned with estimating the voltages and phases of the power supply at each sub-system, consisting of a number of buses or a substation, using measurements (provided by, for example, a phasor measurement unit (PMU) or a supervisory control and data acquisition (SCADA) system) (Conejo et al., 2007, Jiang et al., 2008). Interactions of sub-systems are reflected by the fact that local measurements available at each sub-system typically involve neighboring sub-systems. For example, a current measurement at a conjunction depends on the voltage difference of two neighboring buses. In a smart grid setting, each sub-system is only interested in the local state, i.e., its own voltages and phases, using local measurements and information acquired from neighboring sub-systems via neighborhood communication (Tai, Marelli, Rohr, & Fu, 2013). For a large power network, it is both impractical and unnecessary for every sub-system to estimate the whole state of the system, thus distributed estimation methods for local state estimation are naturally called for. The second example is the localization problem for a wireless sensor network, involving estimating the locations of all sensors using relative measurements (e.g., relative distances or relative positions) between sensors (Diao et al., 2013, Khan et al., 2009). For a small sensor network with a few sensing nodes, it is possible to aggregate all the measurements at an FC to compute a global estimate of all sensor locations. Such an algorithm is not scalable, and would require massive computing resources when the network becomes large. It is also unnecessary for each sensor to localize other nodes. A distributed method is preferred, where each node aims to estimate its own location using local measurements and neighborhood communication. 
The third example is a traffic network for a city or a highway system, where each node or sub-system wants to estimate its local state (e.g., traffic flow rates, delays, etc.). Due to the spatial correlations of the traffic flows in different sub-systems, neighboring traffic information is certainly useful in predicting the traffic conditions at each sub-system. Again, distributed estimation methods are preferred over centralized methods. Many other examples in sensor networks can be found in, e.g., Ribeiro et al. (2010), Kar et al. (2012) and Yang et al. (2010).

We first provide a fully distributed iterative algorithm for each node to compute its own local estimate. This algorithm works for a general connected network. Contrary to the method proposed in Conejo et al. (2007), our algorithm guarantees that the composite estimate will converge asymptotically to the global WLS estimate. We then focus on the convergence rate of the algorithm. Since our method is based on Richardson’s method for solving systems of linear equations (Bertsekas & Tsitsiklis, 1997), its convergence rate depends on a certain scaling parameter and a preconditioning matrix. Choosing the optimum scaling parameter requires knowledge of the largest and the smallest eigenvalues of a certain positive definite matrix (the estimation error covariance). A distributed algorithm for estimating these values can be obtained using the power method (Bertsekas & Tsitsiklis, 1997). However, to prevent numerical instability, this approach requires periodically executing a normalization step, which needs to be carried out in a distributed manner. In Yang et al. (2010), this is done using average consensus. A drawback of this approach is that consensus itself converges only asymptotically, which significantly slows down the convergence of the eigenvalue estimation. To avoid this, we propose a different method in which normalization is done locally, at each node, without inter-node communication. In this way, the optimal scaling parameter can be obtained using a distributed method. We then address the problem of designing the preconditioning matrix. Our distributed scenario constrains us to use a block diagonal matrix. Choosing the optimal matrix under this constraint, and using only distributed processing, is a very challenging problem for which we are not able to provide a solution. Nevertheless, we are able to bound the difference between the convergence rate achieved using the optimal matrix, and the convergence rate resulting from a practically feasible matrix design. 
This bound turns out to have a simple expression which depends on the network connectivity. A shortened version of this method appears in the conference paper (Marelli & Fu, 2013).

For an acyclic network (i.e., its communication graph contains no loops), we provide a different iterative algorithm for distributed estimation. As opposed to the previous algorithm, in this one, the composite estimate is guaranteed to converge to the globally optimal estimate in a finite number of steps. Indeed, we show that the convergence time equals the diameter of the aforementioned graph. Numerical experiments show that this method offers a major reduction in convergence time.

The rest of the paper is organized as follows. In Section 2 we describe the distributed WLS estimation problem. In Section 3 we derive the first distributed WLS method, which converges asymptotically. In Section 3.1 we describe distributed methods for finding the value of the scaling parameter which yields the fastest convergence rate. In Section 3.2 we describe a sub-optimal choice for the preconditioning matrix, and we bound its sub-optimality. In Section 4 we introduce the second distributed WLS method, which converges in finite time. Numerical experiments are presented in Section 5, and concluding remarks are given in Section 6. For ease of readability, all proofs are contained in Appendices A and B.

Notation 1

For a vector $x$ , $‖ x ‖$ denotes its 2-norm, and ${[x]}_{i}$ denotes its $i$ -th entry. For a matrix $X$ , $‖ X ‖$ denotes its operator (induced) norm. For square symmetric matrices $X$ and $Y$ , $X < Y$ means that matrix $Y - X$ is positive-definite. For vectors and matrices, the superscript $^{T}$ denotes the transpose, and $^{*}$ denotes the conjugate transpose.

2. Problem description

Consider a network formed by $I$ nodes. For each $i = 1, \dots, I$ , Node $i$ has an associated parameter vector $x_{i} \in \mathbb{C}^{d_{i}}$ , and measures the vector $y_{i} \in \mathbb{C}^{p_{i}}$ , which is given by

y_{i} = \sum_{j = 1}^{I} A_{i, j} x_{j} + v_{i},

(1)

where $v_{i} \sim \mathcal{N} (0, R_{i})$ denotes the measurement noise. The noises $v_{i}$ and $v_{j}$ are statistically independent whenever $i \neq j$ .

Let $x^{T} = [x_{1}^{T}, \dots, x_{I}^{T}]$ , $y^{T} = [y_{1}^{T}, \dots, y_{I}^{T}]$ , $v^{T} = [v_{1}^{T}, \dots, v_{I}^{T}]$ , $R = \mathrm{diag} \{R_{1}, \dots, R_{I}\}$ and $A = {[A_{i, j}]}_{i, j = 1, \dots, I}$ . Then, we can write the measurement model of the whole network as

y = A x + v,

(2)

with $v \sim \mathcal{N} (0, R)$ . The WLS estimator $\hat{x}$ of $x$ is defined by (Kay, 1993, Eq. (8.14))

\hat{x} = \arg \min_{x} {(y - A x)}^{*} R^{- 1} (y - A x) .

Its solution is given by

\hat{x} = Ψ^{- 1} α

(3)

with

α = A^{*} R^{- 1} y and Ψ = A^{*} R^{- 1} A .
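As a centralized point of reference for (3), the following NumPy sketch forms $α$ and $Ψ$ and solves for $\hat{x}$ . The problem sizes, the random data and the diagonal $R$ are assumptions made only for illustration; the cross-check uses the standard fact that WLS coincides with ordinary least squares on the whitened model $R^{- 1 / 2} y = R^{- 1 / 2} A x + R^{- 1 / 2} v$ .

```python
import numpy as np

rng = np.random.default_rng(0)
p, d = 12, 5                            # hypothetical sizes: 12 measurements, 5 unknowns
A = rng.standard_normal((p, d))         # full column rank with probability one
R = np.diag(rng.uniform(0.5, 2.0, p))   # non-singular (diagonal) noise covariance
x_true = rng.standard_normal(d)
y = A @ x_true + rng.multivariate_normal(np.zeros(p), R)

R_inv = np.linalg.inv(R)
alpha = A.conj().T @ R_inv @ y          # alpha = A^* R^{-1} y
Psi = A.conj().T @ R_inv @ A            # Psi   = A^* R^{-1} A
x_hat = np.linalg.solve(Psi, alpha)     # eq. (3), without forming Psi^{-1} explicitly

# Cross-check: ordinary least squares on the whitened model gives the same estimate.
W = np.diag(1.0 / np.sqrt(np.diag(R)))  # R^{-1/2} for a diagonal R
x_ls, *_ = np.linalg.lstsq(W @ A, W @ y, rcond=None)
```

Solving the linear system rather than inverting $Ψ$ is a numerical design choice of this sketch, not something the derivation above requires.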

For the WLS problem to be well defined, we make the following assumption:

Assumption 2

Matrix $A$ has full column rank and $R$ is non-singular.

Computing (3) requires global information, i.e., all the measurements and the information about $A$ and $R$ need to be made available together. Our goal is to derive distributed methods in which Node $i$ computes the component ${\hat{x}}_{i}$ of the estimate $\hat{x}$ , corresponding to $x_{i}$ , using only the local measurement $y_{i}$ and information received from its neighbors (a formal definition of neighborhood will be given later).

In the rest of the paper we use the following notation:

Notation 3

Let $I_{i} = \{j : A_{j, i} \neq 0\}$ denote the set of nodes whose measurements involve the parameters of Node $i$ , and $O_{i} = \{j : A_{i, j} \neq 0\}$ denote the set of nodes whose parameters are involved in the measurements of Node $i$ . Let $N_{i} = I_{i} \cup O_{i}$ be the neighborhood of Node $i$ . We also define $B_{i} = \{j : N_{i} \cap N_{j} \neq \emptyset\}$ to be the set of nodes whose neighborhoods have a non-empty intersection with that of Node $i$ . Notice that $B_{i} = \{j : Ψ_{i, j} \neq 0\}$ .
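The index sets can be read off the block sparsity pattern of $A$ . The sketch below does this for a hypothetical four-node chain; the function name `index_sets` and the 0-based node labels are assumptions of the example. The set returned as `B_sets` is computed as $\{j : I_{i} \cap I_{j} \neq \emptyset\}$ , which is the condition under which the block $Ψ_{i, j}$ of $Ψ = A^{*} R^{- 1} A$ can be nonzero; the two-hop set $\{j : N_{i} \cap N_{j} \neq \emptyset\}$ is returned separately as `T_sets` and always contains it.

```python
def index_sets(pattern):
    """pattern[i][j] is True when block A_{i,j} is nonzero (0-based node labels)."""
    n = len(pattern)
    I_sets = [{j for j in range(n) if pattern[j][i]} for i in range(n)]  # I_i
    O_sets = [{j for j in range(n) if pattern[i][j]} for i in range(n)]  # O_i
    N_sets = [I_sets[i] | O_sets[i] for i in range(n)]                   # N_i
    # Nodes sharing at least one measuring node with i (Psi_{i,j} can be nonzero).
    B_sets = [{j for j in range(n) if I_sets[i] & I_sets[j]} for i in range(n)]
    # Nodes whose neighborhood intersects that of i (two-hop neighbors).
    T_sets = [{j for j in range(n) if N_sets[i] & N_sets[j]} for i in range(n)]
    return I_sets, O_sets, N_sets, B_sets, T_sets

# Hypothetical chain: each node measures its own parameters and those of the next node.
pattern = [
    [True,  True,  False, False],
    [False, True,  True,  False],
    [False, False, True,  True],
    [False, False, False, True],
]
I_sets, O_sets, N_sets, B_sets, T_sets = index_sets(pattern)
```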

3. Asymptotic method for WLS estimation

The distributed WLS method derived in this section uses the following definition of neighbor node.

Definition 4

Node $j$ is a neighbor of Node $i$ if $j \in N_{i}$ .

Also, the proposed method requires the following connectivity assumption:

Assumption 5

For each $i = 1, \dots, I$ , Node $i$ can send/receive information to/from all its neighbors. Also, $A_{i, j}$ , for all $j \in O_{i}$ , and $R_{i}$ are available at Node $i$ .

Although our method works regardless of whether the network is sparse or not, it works most efficiently for sparse networks. A network is called sparse if the cardinality of $N_{i}$ is small for all $i = 1, 2, \dots, I$ .

Consider any block diagonal positive definite matrix

Π = \mathrm{diag} \{Π_{1}, \dots, Π_{I}\}

(4)

with $Π_{i} \in \mathbb{C}^{d_{i} \times d_{i}}, i = 1, \dots, I$ . Then define

Υ = Π^{1 / 2} Ψ Π^{1 / 2}

(5)

and choose

0 < γ < 2 {‖ Υ ‖}^{- 1} .

(6)

Let $\tilde{α} = {(γ Π)}^{1 / 2} α$ and $\tilde{\hat{x}} = {(γ Π)}^{- 1 / 2} \hat{x}$ . From (3) we have

\tilde{\hat{x}} = {(γ Υ)}^{- 1} \tilde{α} .

From (6), $0 < γ Υ < 2 I$ . Hence, $- I < γ Υ - I < I$ and therefore $‖ I - γ Υ ‖ < 1$ . In view of this, we can use Richardson’s method (Bertsekas & Tsitsiklis, 1997) to compute $\tilde{\hat{x}}$ recursively. This leads to

\tilde{\hat{x}} (t + 1) = (I - γ Υ) \tilde{\hat{x}} (t) + \tilde{α} .

(7)

Then, by substituting the expressions of $\tilde{α}$ and $\tilde{\hat{x}}$ , we obtain straightforwardly

\hat{x} (t + 1) = (I - γ Π Ψ) \hat{x} (t) + γ Π α .

(8)

We call $Π$ the preconditioning matrix because, as will be explained in Section 3.2, it is used to increase the convergence rate of the recursion (8).
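A minimal centralized sketch of recursion (8) is given next. The block sizes, the random $Ψ$ and $α$ , and the particular step $γ = 1 / ‖ Υ ‖$ (one admissible value under (6)) are assumptions for illustration. The code uses the fact that $Υ = Π^{1 / 2} Ψ Π^{1 / 2}$ is similar to $Π Ψ$ , so both have the same eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(1)
dims = [2, 3, 2]                          # hypothetical block sizes d_i
n = sum(dims)
A = rng.standard_normal((3 * n, n))
Psi = A.T @ A                             # stands in for A^* R^{-1} A with R = I
alpha = rng.standard_normal(n)

# Block-diagonal preconditioner of the form (4), here with Pi_i = Psi_{i,i}^{-1}.
Pi = np.zeros((n, n))
start = 0
for d in dims:
    s = slice(start, start + d)
    Pi[s, s] = np.linalg.inv(Psi[s, s])
    start += d

# Eigenvalues of Pi Psi equal those of Upsilon (similar matrices); they are real.
eigs = np.linalg.eigvals(Pi @ Psi).real
gamma = 1.0 / eigs.max()                  # satisfies 0 < gamma < 2 / ||Upsilon||

x = np.zeros(n)
for _ in range(2000):
    x = x - gamma * Pi @ (Psi @ x - alpha)    # recursion (8), rearranged

x_direct = np.linalg.solve(Psi, alpha)    # fixed point Psi^{-1} alpha, eq. (3)
```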

Let

α_{i} = \sum_{k \in I_{i}} α_{i}^{(k)},

(9)

with $α_{i}^{(k)} = A_{k, i}^{*} R_{k}^{- 1} y_{k}$ , for $k = 1, \dots, I$ , so that $α^{T} = [α_{1}^{T}, \dots, α_{I}^{T}]$ . Also, for $i, j = 1, \dots, I$ , let

Ψ_{i, j} = \sum_{k : i, j \in O_{k}} Ψ_{i, j}^{(k)},

(10)

where $Ψ_{i, j}^{(k)} = A_{k, i}^{*} R_{k}^{- 1} A_{k, j}$ , for all $k = 1, \dots, I$ , so that $Ψ = {[Ψ_{i, j}]}_{i, j = 1, \dots, I}$ . We have that

{[Ψ \hat{x} (t)]}_{i} = \sum_{j = 1}^{I} Ψ_{i, j} {\hat{x}}_{j} (t) = \sum_{j = 1}^{I} \sum_{k : i, j \in O_{k}} Ψ_{i, j}^{(k)} {\hat{x}}_{j} (t) = \sum_{k : i \in O_{k}} \sum_{j \in O_{k}} Ψ_{i, j}^{(k)} {\hat{x}}_{j} (t) = \sum_{k \in I_{i}} \sum_{j \in O_{k}} Ψ_{i, j}^{(k)} {\hat{x}}_{j} (t) .

(11)

Then, from (8), (11), (9), we obtain

{\hat{x}}_{i} (t + 1) = {\hat{x}}_{i} (t) - γ Π_{i} \sum_{j = 1}^{I} Ψ_{i, j} {\hat{x}}_{j} (t) + γ Π_{i} α_{i} = {\hat{x}}_{i} (t) - γ Π_{i} (\sum_{k \in I_{i}} \sum_{j \in O_{k}} Ψ_{i, j}^{(k)} {\hat{x}}_{j} (t) - \sum_{k \in I_{i}} α_{i}^{(k)}) .

(12)

Notice that the matrices $Ψ_{i, j}^{(k)}$ are only available at Node $k$ . That is, Node $k$ acts as an intermediary between Node $j$ , which transmits ${\hat{x}}_{j} (t)$ , and Node $i$ , which receives $\sum_{j \in O_{k}} Ψ_{i, j}^{(k)} {\hat{x}}_{j} (t)$ . This means that Node $j$ should transmit ${\hat{x}}_{j} (t)$ to all nodes $k$ with $j \in O_{k}$ , or equivalently, to all nodes in $I_{j}$ . However, Node $j$ does not know which nodes are in $I_{j}$ . Thus, Node $j$ simply transmits ${\hat{x}}_{j} (t)$ to all nodes in $N_{j}$ , and it is up to the receiving Node $k$ to detect whether $j \in O_{k}$ or not.

Following the discussion above, we obtain the following algorithm to implement (12).

Algorithm 1 - distributed WLS estimation:

Initialization:

(1) For each $k = 1, \dots, I$ and $i \in O_{k}$ , Node $k$ computes $α_{i}^{(k)}$ and sends it to Node $i$ .
(2) On reception, Node $i$ computes $α_{i} = \sum_{k \in I_{i}} α_{i}^{(k)}$ .
(3) For each $i = 1, \dots, I$ , Node $i$ sets ${\hat{x}}_{i} (1) = 0$ .

Main loop: At time $t \in \mathbb{N}$ :

(1) For each $j = 1, \dots, I$ and $k \in N_{j}$ , Node $j$ sends its current estimate ${\hat{x}}_{j} (t)$ to Node $k$ .
(2) On reception, for each $k = 1, \dots, I$ and $i \in O_{k}$ , Node $k$ sends to Node $i$
${\overset{ˇ}{x}}_{i, k} (t) = \sum_{j \in O_{k}} Ψ_{i, j}^{(k)} {\hat{x}}_{j} (t) .$
(3) On reception, for each $i = 1, \dots, I$ , Node $i$ computes
${\hat{x}}_{i} (t + 1) = {\hat{x}}_{i} (t) - γ Π_{i} (\sum_{k \in I_{i}} {\overset{ˇ}{x}}_{i, k} (t) - α_{i}) .$
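The message flow of Algorithm 1 can be simulated in a single process as sketched below. The four-node chain, the block sizes, the random data and the dictionary-based "messages" are all assumptions of the example. For brevity, $γ$ is computed centrally from the assembled $Ψ$ (a shortcut that a real deployment would replace by one of the distributed designs of Section 3.1), and $Π_{i} = Ψ_{i, i}^{- 1}$ is used as the preconditioner.

```python
import numpy as np

rng = np.random.default_rng(2)
n_nodes, d, p = 4, 2, 6      # hypothetical: 4 nodes, d_i = 2 parameters, p_i = 6 measurements
# Chain sparsity: Node k measures its own parameters and those of Node k+1.
O = [{k, k + 1} & set(range(n_nodes)) for k in range(n_nodes)]
I = [{k for k in range(n_nodes) if i in O[k]} for i in range(n_nodes)]
A = {(k, j): rng.standard_normal((p, d)) for k in range(n_nodes) for j in O[k]}
R_inv = [np.eye(p) / rng.uniform(0.5, 2.0) for _ in range(n_nodes)]
y = [rng.standard_normal(p) for _ in range(n_nodes)]

# Psi_{i,j}^{(k)} is held at Node k only.
Psi_k = {(k, i, j): A[k, i].T @ R_inv[k] @ A[k, j]
         for k in range(n_nodes) for i in O[k] for j in O[k]}
# Initialization: Node k sends alpha_i^{(k)} to Node i, which sums them as in (9).
alpha = [sum(A[k, i].T @ R_inv[k] @ y[k] for k in I[i]) for i in range(n_nodes)]
Pi = [np.linalg.inv(sum(Psi_k[k, i, i] for k in I[i])) for i in range(n_nodes)]

# Centralized shortcut (for the sketch only): assemble Psi to pick gamma and to verify.
n = n_nodes * d
Psi, Pi_full = np.zeros((n, n)), np.zeros((n, n))
for (k, i, j), block in Psi_k.items():
    Psi[i*d:(i+1)*d, j*d:(j+1)*d] += block
for i in range(n_nodes):
    Pi_full[i*d:(i+1)*d, i*d:(i+1)*d] = Pi[i]
eigs = np.linalg.eigvals(Pi_full @ Psi).real
gamma = 2.0 / (eigs.max() + eigs.min())

x = [np.zeros(d) for _ in range(n_nodes)]
for t in range(5000):
    # Step 2: Node k combines the estimates it received and replies to each i in O_k.
    msg = {(i, k): sum(Psi_k[k, i, j] @ x[j] for j in O[k])
           for k in range(n_nodes) for i in O[k]}
    # Step 3: Node i updates its own estimate from the replies of all k in I_i.
    x = [x[i] - gamma * Pi[i] @ (sum(msg[i, k] for k in I[i]) - alpha[i])
         for i in range(n_nodes)]

x_distributed = np.concatenate(x)
x_central = np.linalg.solve(Psi, np.concatenate(alpha))
```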

To implement Algorithm 1, we need to design the scaling factor $γ$ and the preconditioning matrices $Π_{i}$ , for all $i = 1, \dots, I$ . We address these two tasks in Sections 3.1 and 3.2, respectively.

3.1. Distributed design of the scaling factor $γ$

In this section we study two approaches for designing the scaling factor $γ$ . In Section 3.1.1 we describe a distributed algorithm which converges asymptotically to the optimal value of $γ$ , i.e., the resulting $γ$ will achieve the maximum convergence speed. In Section 3.1.2, we give another distributed algorithm which converges in finite time to a sub-optimal value of $γ$ .

3.1.1. Asymptotic algorithm for $γ$

In view of (7), the value of $γ$ that maximizes the convergence rate is

γ = \frac{2}{‖ Υ ‖ + {‖ Υ^{- 1} ‖}^{- 1}},

(13)

because this is the value that minimizes $‖ I - γ Υ ‖$ . In order for each node to find the value of $γ$ given in (13), we need distributed methods for finding $‖ Υ ‖$ and ${‖ Υ^{- 1} ‖}^{- 1}$ . We give these methods below. These methods yield, at Node $i$ and time step $t$ , estimates ${\bar{Υ}}_{i} (t)$ and ${\underline{Υ}}_{i} (t)$ , of $‖ Υ ‖$ and ${‖ Υ^{- 1} ‖}^{- 1}$ , respectively. Then, at the same node and time step, the estimate $γ_{i} (t)$ of $γ$ is obtained by

γ_{i} (t) = \frac{2}{{\bar{Υ}}_{i} (t) + {\underline{Υ}}_{i} (t)} .
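The optimality of (13) can be checked numerically on a small example. In the sketch below the matrix `Upsilon` is a hypothetical random positive definite matrix and `contraction` is a helper name introduced for the example; the check relies on $‖ I - γ Υ ‖ = \max (| 1 - γ λ_{\min} |, | 1 - γ λ_{\max} |)$ for symmetric $Υ$ .

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((8, 5))
Upsilon = M.T @ M                          # hypothetical positive definite Upsilon
eigs = np.linalg.eigvalsh(Upsilon)         # eigenvalues in ascending order
lam_min, lam_max = eigs[0], eigs[-1]       # ||Upsilon^{-1}||^{-1} and ||Upsilon||

def contraction(gamma):
    """Spectral norm of I - gamma * Upsilon."""
    return np.linalg.norm(np.eye(5) - gamma * Upsilon, 2)

gamma_opt = 2.0 / (lam_max + lam_min)      # eq. (13)

# Compare against a grid of admissible values 0 < gamma <= 2 / ||Upsilon||.
grid = np.linspace(1e-3, 2.0 / lam_max, 400)
best_on_grid = min(contraction(g) for g in grid)
kappa = lam_max / lam_min
```

At `gamma_opt` the two extreme eigenvalues of $I - γ Υ$ have equal magnitude, which is why the contraction factor equals $(κ - 1) / (κ + 1)$ .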

Distributed method for finding $‖ Υ ‖$ :

Choose any vector $b (0) \neq 0$ and let $b (t) = Υ^{t} b (0)$ . Then, using (11) with $Π^{1 / 2} b (t)$ in place of $\hat{x} (t)$ , we obtain at Node $i$ ,

b_{i} (t + 1) = {[Υ b (t)]}_{i} = Π_{i}^{1 / 2} {[Ψ Π^{1 / 2} b (t)]}_{i} = Π_{i}^{1 / 2} \sum_{k \in I_{i}} \sum_{j \in O_{k}} Ψ_{i, j}^{(k)} Π_{j}^{1 / 2} b_{j} (t),

(14)

where $b_{i} (t)$ denotes the $i$ -th block component of $b (t)$ . Then, using the power method (Bertsekas & Tsitsiklis, 1997), Node $i$ can asymptotically compute $‖ Υ ‖$ as follows

‖ Υ ‖ = lim_{t \to \infty} \frac{‖ b_{i} (t) ‖}{‖ b_{i} (t - 1) ‖} .

(15)

An inconvenience with the approach above is that $b (t)$ either increases or decreases indefinitely. To avoid this, the vector $b (t)$ can be periodically normalized. In Yang et al. (2010), this was done using average consensus (in the continuous-time case). As we mentioned in Section 1, we avoid the drawbacks of that method by providing an alternative approach in which $b (t)$ is normalized at each node, and each iteration, without inter-node communication. This algorithm is given below:

Algorithm 2 - distributed estimation of $‖ Υ ‖$ : For each $k = 1, \dots, I$ , Node $k$ chooses ${\bar{b}}_{k} (1)$ , with $‖ {\bar{b}}_{k} (1) ‖ = 1$ , and sets $ς_{k} (1) = 1$ and $υ_{i, j}^{(k)} (1) = 1$ , for all $i, j \in N_{k}$ . Then, at time $t \in \mathbb{N}$ :

(1) For each $j = 1, \dots, I$ and $k \in N_{j}$ , Node $j$ sends $(Π_{j}^{1 / 2} {\bar{b}}_{j} (t), ς_{j} (t))$ to Node $k$ .
(2) On reception, for each $k = 1, \dots, I$ and $i \in O_{k}$ , Node $k$ sends $({\overset{ˇ}{b}}_{i}^{(k)} (t), {\bar{ς}}_{i}^{(k)} (t))$ to Node $i$ , where
${\overset{ˇ}{b}}_{i}^{(k)} (t) = \sum_{j \in O_{k}} υ_{i, j}^{(k)} (t) Ψ_{i, j}^{(k)} Π_{j}^{1 / 2} {\bar{b}}_{j} (t), {\bar{ς}}_{i}^{(k)} (t) = \max_{j \in N_{k}} υ_{i, j}^{(k)} (t),$
and
$υ_{i, j}^{(k)} (t) = \frac{ς_{i} (t)}{ς_{j} (t)} υ_{i, j}^{(k)} (t - 1) .$
(3) On reception, for each $i = 1, \dots, I$ , Node $i$ computes
${\bar{b}}_{i} (t + 1) = ς_{i} (t + 1) {\tilde{b}}_{i} (t + 1), ς_{i} (t + 1) = {\max \{‖ {\tilde{b}}_{i} (t + 1) ‖, {\bar{ς}}_{i}^{(k)} (t), k \in I_{i}\}}^{- 1},$
with
${\tilde{b}}_{i} (t + 1) = Π_{i}^{1 / 2} \sum_{k \in I_{i}} {\overset{ˇ}{b}}_{i}^{(k)} (t) .$ (16)
Also, the estimate ${\bar{Υ}}_{i} (t)$ of $‖ Υ ‖$ is
${\bar{Υ}}_{i} (t) = ς_{i} {(t + 1)}^{- 1} .$ (17)

The convergence of Algorithm 2 to $‖ Υ ‖$ is guaranteed by the next theorem.

Theorem 6

Consider the network (1) together with Assumptions 2 and 5. Then, for each $i \in \{1, \dots, I\}$ ,

$\lim_{t \to \infty} {\bar{Υ}}_{i} (t) = ‖ Υ ‖ .$ (18)
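As a centralized point of comparison for the quantity that Algorithm 2 estimates, the sketch below runs the classical power method with a global normalization at every step, which is precisely the operation the distributed algorithm is designed to avoid. It is not an implementation of Algorithm 2. The matrix `Upsilon` is a hypothetical random positive definite matrix, and the first two entries of `b` are treated as the block of a hypothetical "Node 1" in order to illustrate the block-wise ratio in (15).

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((10, 6))
Upsilon = M.T @ M                        # hypothetical positive definite matrix

b = rng.standard_normal(6)
b /= np.linalg.norm(b)
for _ in range(5000):
    b_next = Upsilon @ b                 # one application of Upsilon, as in (14)
    # Ratio of full-vector norms and ratio of the norms of one block, as in (15).
    estimate = np.linalg.norm(b_next) / np.linalg.norm(b)
    block_estimate = np.linalg.norm(b_next[:2]) / np.linalg.norm(b[:2])
    b = b_next / np.linalg.norm(b_next)  # global normalization keeps b bounded

exact = np.linalg.eigvalsh(Upsilon)[-1]  # largest eigenvalue, equal to ||Upsilon||
```

Because the normalization multiplies every block by the same scalar, the block-wise ratio is unaffected by it; this is what a purely local normalization has to compensate for.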

Distributed method for finding ${‖ Υ^{- 1} ‖}^{- 1}$ :

Let $c \geq ‖ Υ ‖$ and $Φ = c I - Υ$ . It follows that

{‖ Υ^{- 1} ‖}^{- 1} = \underline{eig} (Υ) = c - \bar{eig} (Φ) = c - ‖ Φ ‖,

where, for a symmetric matrix $X$ , $\underline{eig} (X)$ and $\bar{eig} (X)$ denote the smallest and largest eigenvalues of $X$ , respectively. Hence, we can find ${‖ Υ^{- 1} ‖}^{- 1}$ by applying Algorithm 2 on $Φ$ , to find $‖ Φ ‖$ . To this end, at Node $i$ and time $t$ , we choose $c = {\bar{Υ}}_{i} (t)$ . This leads to the following algorithm:

Algorithm 3 - distributed estimation of ${‖ Υ^{- 1} ‖}^{- 1}$ : Apply Algorithm 2 with (16) replaced by

{\tilde{b}}_{i} (t + 1) = {\bar{Υ}}_{i} (t) {\bar{b}}_{i} (t) - Π_{i}^{1 / 2} \sum_{k \in I_{i}} {\overset{ˇ}{b}}_{i}^{(k)} (t),

and (17) replaced by

{\underline{Υ}}_{i} (t) = {\bar{Υ}}_{i} (t) - {\bar{Φ}}_{i} (t), {\bar{Φ}}_{i} (t) = ς_{i} {(t + 1)}^{- 1} .
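The shift $Φ = c I - Υ$ behind Algorithm 3 can be illustrated with a centralized sketch. The helper `largest_eig`, the random matrix and the iteration count are assumptions of the example, and the power iteration is the plain normalized one rather than the distributed Algorithm 2.

```python
import numpy as np

def largest_eig(X, iters=20000):
    """Normalized power iteration for a symmetric positive semi-definite X."""
    b = np.ones(X.shape[0]) / np.sqrt(X.shape[0])
    est = 0.0
    for _ in range(iters):
        nb = X @ b
        est = np.linalg.norm(nb)         # ||X b|| with ||b|| = 1
        if est == 0.0:
            break
        b = nb / est
    return est

rng = np.random.default_rng(5)
M = rng.standard_normal((10, 6))
Upsilon = M.T @ M                        # hypothetical positive definite matrix

c = largest_eig(Upsilon)                 # estimate of ||Upsilon||, used as the shift
Phi = c * np.eye(6) - Upsilon            # Phi = c I - Upsilon
lam_min = c - largest_eig(Phi)           # ||Upsilon^{-1}||^{-1} = c - ||Phi||

exact_min = np.linalg.eigvalsh(Upsilon)[0]
```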

3.1.2. Finite-time algorithm for $γ$

A sub-optimal design of $γ$ can be achieved using the following result.

Theorem 7

Condition (6) is satisfied by choosing $γ$ so that

$0 < γ < \frac{2}{max_{i} ϕ_{i}},$

where

$ϕ_{i} = \sum_{k \in I_{i}} υ_{i, k},$

$υ_{i, k} = \sum_{j \in O_{k}} ‖ Π_{i}^{1 / 2} Ψ_{i, j}^{(k)} Π_{j}^{1 / 2} ‖ .$

The design of $γ$ using Theorem 7 requires the global information ${max}_{i} ϕ_{i}$ . For each $i = 1, \dots, I$ , Node $i$ can obtain $ϕ_{i}$ from an initialization stage in which it receives $υ_{i, k}$ from each Node $k$ with $k \in I_{i}$ . Then, ${max}_{i} ϕ_{i}$ can be obtained by running the max-consensus algorithm (Olfati-Saber & Murray, 2004), in parallel with the estimation Algorithm 1. Notice that the max-consensus algorithm converges in finite time.
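The following sketch computes the quantities $ϕ_{i}$ of Theorem 7 on a hypothetical four-node chain and then spreads their maximum with a simple synchronous max-consensus. The chain topology, the choice $R_{k} = I$ , the preconditioner $Π_{i} = Ψ_{i, i}^{- 1}$ , the helper `sqrt_psd` and the final choice $γ = 1 / \max_{i} ϕ_{i}$ are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
n_nodes, d, p = 4, 2, 6
O = [{k, k + 1} & set(range(n_nodes)) for k in range(n_nodes)]   # chain sparsity
I = [{k for k in range(n_nodes) if i in O[k]} for i in range(n_nodes)]
A = {(k, j): rng.standard_normal((p, d)) for k in range(n_nodes) for j in O[k]}
Psi_k = {(k, i, j): A[k, i].T @ A[k, j]            # Psi_{i,j}^{(k)} with R_k = I
         for k in range(n_nodes) for i in O[k] for j in O[k]}

def sqrt_psd(X):
    """Symmetric square root of a positive definite matrix."""
    w, V = np.linalg.eigh(X)
    return (V * np.sqrt(w)) @ V.T

Pi_half = [sqrt_psd(np.linalg.inv(sum(Psi_k[k, i, i] for k in I[i])))
           for i in range(n_nodes)]
# phi_i = sum over k in I_i and j in O_k of ||Pi_i^{1/2} Psi_{i,j}^{(k)} Pi_j^{1/2}||.
phi = [sum(np.linalg.norm(Pi_half[i] @ Psi_k[k, i, j] @ Pi_half[j], 2)
           for k in I[i] for j in O[k]) for i in range(n_nodes)]

# Max-consensus: each node keeps the largest value among itself and its neighbors.
N = [I[i] | O[i] for i in range(n_nodes)]          # here every N_i contains i
value = list(phi)
for _ in range(n_nodes - 1):                       # n_nodes - 1 rounds cover any diameter
    value = [max(value[j] for j in N[i]) for i in range(n_nodes)]
gamma = 1.0 / value[0]                             # lies in (0, 2 / max_i phi_i)

# Verification only: assemble Upsilon and compare with condition (6).
Upsilon = np.zeros((n_nodes * d, n_nodes * d))
for (k, i, j), blk in Psi_k.items():
    Upsilon[i*d:(i+1)*d, j*d:(j+1)*d] += Pi_half[i] @ blk @ Pi_half[j]
```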

3.2. Design of the preconditioning matrix $Π$

As mentioned above, for a given choice of $Υ$ , the fastest convergence rate of Algorithm 1 is achieved when $γ$ is chosen as in (13). Under this choice of $γ$ , we have that

‖ I - γ Υ ‖ = γ ‖ Υ ‖ - 1 = \frac{2 ‖ Υ ‖}{‖ Υ ‖ + {‖ Υ^{- 1} ‖}^{- 1}} - 1 = \frac{‖ Υ ‖ - {‖ Υ^{- 1} ‖}^{- 1}}{‖ Υ ‖ + {‖ Υ^{- 1} ‖}^{- 1}} = \frac{κ (Υ) - 1}{κ (Υ) + 1},

where $κ (Υ) = ‖ Υ ‖ ‖ Υ^{- 1} ‖$ denotes the condition number of $Υ$ . Then, from (7), there exists $K \geq 0$ , such that

‖ \hat{x} - \hat{x} (t) ‖ \leq K {‖ I - γ Υ ‖}^{t} = K exp {t log \frac{κ (Υ) - 1}{κ (Υ) + 1}},

where we recall that $\hat{x}$ denotes the global estimate of $x$ , given by (3). Then, we define the time constant $τ (Υ)$ of the distributed WLS algorithm by

τ (Υ) = \frac{1}{log \frac{κ (Υ) + 1}{κ (Υ) - 1}} .

(19)

Hence, a natural question is whether the preconditioning matrices $Π_{i}$ , $i = 1, \dots, I$ , can be chosen so that $τ (Υ)$ is minimized. While we are not able to answer this question, we have the following result, which follows using an argument similar to the one in Demmel (1983, Theorem 2).

Theorem 8

If $Π_{i} = Ψ_{i, i}^{- 1}$ , for all $i = 1, \dots, I$ , then

$κ (Υ) \leq β κ_{⋆},$

where

$β = max_{i} | B_{i} |,$

$κ_{⋆} = min_{\tilde{Π} \in P} κ ({\tilde{Π}}^{1 / 2} Ψ {\tilde{Π}}^{1 / 2}),$

with $P$ denoting the set of positive definite block diagonal matrices of the form (4).

Theorem 8 states that, if the preconditioning matrices $Π_{i}$ , $i = 1, \dots, I$ , are chosen as

Π_{i} = Ψ_{i, i}^{- 1}

(20)

then $κ (Υ)$ is at most $β$ times the smallest possible value $κ_{⋆}$ achievable using block diagonal preconditioning matrices. Notice that $B_{i} = \{j : I_{i} \cap I_{j} \neq \emptyset\} \subseteq \{j : N_{i} \cap N_{j} \neq \emptyset\}$ . Hence, $β$ is bounded by the maximum number of two-hop neighbors over the whole network, and therefore does not necessarily grow with the network size.
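Theorem 8 can be probed numerically, with the caveat that $κ_{⋆}$ itself is not computed here. Since every member of $P$ yields a condition number no smaller than $κ_{⋆}$ , the bound implies $κ (Υ) \leq β κ ({\tilde{Π}}^{1 / 2} Ψ {\tilde{Π}}^{1 / 2})$ for any ${\tilde{Π}} \in P$ , and the sketch checks this against a handful of randomly drawn block diagonal matrices. The five-node chain, the sizes and the helper names are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(7)
n_nodes, d, p = 5, 2, 6
n = n_nodes * d
A = np.zeros((n_nodes * p, n))
for k in range(n_nodes):                  # chain: Node k measures Nodes k and k+1
    for j in (k, k + 1):
        if j < n_nodes:
            A[k*p:(k+1)*p, j*d:(j+1)*d] = rng.standard_normal((p, d))
Psi = A.T @ A                             # R = I for brevity

def blk(X, i, j):
    return X[i*d:(i+1)*d, j*d:(j+1)*d]

def cond_preconditioned(Pi_blocks):
    """kappa(Pi^{1/2} Psi Pi^{1/2}), computed from the similar matrix Pi Psi."""
    Pi = np.zeros((n, n))
    for i, P_i in enumerate(Pi_blocks):
        Pi[i*d:(i+1)*d, i*d:(i+1)*d] = P_i
    eigs = np.linalg.eigvals(Pi @ Psi).real
    return eigs.max() / eigs.min()

# Choice (20): Pi_i = Psi_{i,i}^{-1}.
kappa_jacobi = cond_preconditioned(
    [np.linalg.inv(blk(Psi, i, i)) for i in range(n_nodes)])
# beta = max_i |B_i| with B_i = {j : Psi_{i,j} != 0}.
beta = max(sum(bool(np.any(blk(Psi, i, j) != 0)) for j in range(n_nodes))
           for i in range(n_nodes))

# Randomly drawn members of P; each gives an upper bound on kappa_star.
others = []
for _ in range(20):
    blocks = []
    for _ in range(n_nodes):
        G = rng.standard_normal((d, d))
        blocks.append(G @ G.T + 0.1 * np.eye(d))
    others.append(cond_preconditioned(blocks))
```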

Now, we have

lim_{κ \to \infty} κ log (\frac{κ + 1}{κ - 1}) = 2 .

Hence, from Theorem 8, for large $κ (Υ)$ , we have

τ (Υ) ≃ \frac{κ (Υ)}{2} \leq \frac{β}{2} min_{\tilde{Π}} κ ({\tilde{Π}}^{1 / 2} Ψ {\tilde{Π}}^{1 / 2}) ≃ β τ_{⋆},

(21)

where

τ_{⋆} = min_{\tilde{Π} \in P} τ ({\tilde{Π}}^{1 / 2} Ψ {\tilde{Π}}^{1 / 2}) .

Hence, if $Π_{i}$ , $i = 1, \dots, I$ , are chosen as in (20), and $κ (Υ)$ is large, then the time constant $τ (Υ)$ is at most $β$ times bigger than the minimum value $τ_{⋆}$ .
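As a quick numerical illustration of (19) and of the approximation $τ (Υ) ≃ κ (Υ) / 2$ used in (21), the sketch below evaluates the time constant for a few condition numbers. The function name `time_constant` and the sample values of $κ$ are assumptions of the example.

```python
import math

def time_constant(kappa):
    """tau from eq. (19), for a condition number kappa > 1."""
    return 1.0 / math.log((kappa + 1.0) / (kappa - 1.0))

# tau approaches kappa / 2 as kappa grows.
for kappa in (2.0, 10.0, 100.0, 1000.0):
    print(kappa, time_constant(kappa), kappa / 2.0)
```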

Remark 9

In view of (20) and (10),

$Π_{i} = {(\sum_{k \in I_{i}} Ψ_{i, i}^{(k)})}^{- 1} .$

Hence, its computation requires the matrices $Ψ_{i, i}^{(k)}$ , $k \in I_{i}$ , to be transmitted from Node $k$ to Node $i$ during an initialization stage.

4. Finite-time method for WLS estimation

In this method we replace the definition of neighborhood by the following one:

Definition 10

Node $j$ is a neighbor of Node $i$ if $j \in B_{i}$ .

Consequently, we replace the connectivity Assumption 5 by the following one:

Assumption 11

For each $i = 1, \dots, I$ , Node $i$ can send/receive information to/from all its neighbors. Also, $Ψ_{j, i}$ , for all $j \in B_{i}$ , and $α_{i}$ are available at Node $i$ .

To illustrate the idea behind the proposed algorithm, we consider a network with two nodes. The next lemma states how to obtain the global optimal solution, at each node, in this simple case.

Lemma 12

Consider the network (1) together with Assumption 2. If there are only two nodes, labeled by $a$ and $b$ , then $Ψ_{a, a} - Ψ_{a, b} {\overset{ˇ}{Σ}}_{b} Ψ_{b, a}$ is an invertible matrix and the global estimate ${\hat{x}}_{a}$ of the components $x_{a}$ associated to Node $a$ is given by

${\hat{x}}_{a} = Σ_{a} (α_{a} - Ψ_{a, b} {\overset{ˇ}{x}}_{b}),$

$Σ_{a} = {(Ψ_{a, a} - Ψ_{a, b} {\overset{ˇ}{Σ}}_{b} Ψ_{b, a})}^{- 1},$

where

${\overset{ˇ}{x}}_{b} = {\overset{ˇ}{Σ}}_{b} α_{b},$

${\overset{ˇ}{Σ}}_{b} = Ψ_{b, b}^{- 1} .$
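Lemma 12 is a Schur-complement identity, and it can be checked numerically on a random two-node example. The block sizes and the random $Ψ$ and $α$ below are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(8)
da, db = 2, 3                                   # hypothetical sizes of x_a and x_b
A = rng.standard_normal((9, da + db))
Psi = A.T @ A                                   # positive definite with probability one
alpha = rng.standard_normal(da + db)

P_aa, P_ab = Psi[:da, :da], Psi[:da, da:]
P_ba, P_bb = Psi[da:, :da], Psi[da:, da:]
alpha_a, alpha_b = alpha[:da], alpha[da:]

Sigma_b_check = np.linalg.inv(P_bb)             # check-Sigma_b = Psi_{b,b}^{-1}
x_b_check = Sigma_b_check @ alpha_b             # check-x_b = check-Sigma_b alpha_b
Sigma_a = np.linalg.inv(P_aa - P_ab @ Sigma_b_check @ P_ba)
x_a = Sigma_a @ (alpha_a - P_ab @ x_b_check)    # estimate of x_a from Lemma 12

x_global = np.linalg.solve(Psi, alpha)          # centralized solution (3)
```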

Our next result is an immediate generalization of the one above, to a network with a star topology, i.e., in which all nodes are only possibly connected to a single one.

Lemma 13

Consider the network (1) together with Assumption 2. Suppose that $Ψ_{j, k} = 0$ , for all $j, k \in \{1, \dots, I\} ∖ \{i\}$ with $j \neq k$ (i.e., all nodes are only possibly connected to Node $i$ ). Then $Ψ_{i, i} - \sum_{j \in B_{i} ∖ \{i\}} Ψ_{i, j} {\overset{ˇ}{Σ}}_{j} Ψ_{j, i}$ is an invertible matrix and ${\hat{x}}_{i}$ is given by

${\hat{x}}_{i} = Σ_{i} (α_{i} - \sum_{j \in B_{i} ∖ \{i\}} Ψ_{i, j} {\overset{ˇ}{x}}_{j}),$

$Σ_{i} = {(Ψ_{i, i} - \sum_{j \in B_{i} ∖ \{i\}} Ψ_{i, j} {\overset{ˇ}{Σ}}_{j} Ψ_{j, i})}^{- 1},$

where

${\overset{ˇ}{x}}_{j} = {\overset{ˇ}{Σ}}_{j} α_{j},$

${\overset{ˇ}{Σ}}_{j} = Ψ_{j, j}^{- 1} .$

Then, using (11), (9) and Lemma 13, we obtain the following algorithm:

Algorithm 4 - distributed WLS estimation:

Initialization: For each $i = 1, \dots, I$ ,

(1) Node $i$ computes
${\overset{ˇ}{x}}_{i} (0) = {\overset{ˇ}{Σ}}_{i} (0) α_{i}, {\overset{ˇ}{Σ}}_{i} (0) = Ψ_{i, i}^{- 1},$
and, for each $j \in B_{i} ∖ \{i\}$ ,
${\overset{ˇ}{x}}_{i, j} (0) = {\overset{ˇ}{x}}_{i} (0), {\overset{ˇ}{Σ}}_{i, j} (0) = {\overset{ˇ}{Σ}}_{i} (0) .$

Main loop: For each $i = 1, \dots, I$ , and time $t \in N$

(1) Node $i$ computes, for each $j \in B_{i} ∖ \{i\}$ ,
$γ_{i, j} (t) = Ψ_{j, i} {\overset{ˇ}{x}}_{i, j} (t - 1), Γ_{i, j} (t) = Ψ_{j, i} {\overset{ˇ}{Σ}}_{i, j} (t - 1) Ψ_{i, j},$
and sends $(γ_{i, j} (t), Γ_{i, j} (t))$ to Node $j$ .
(2) Node $i$ computes
${\overset{ˇ}{x}}_{i} (t) = {\overset{ˇ}{Σ}}_{i} (t) (α_{i} - \sum_{j \in B_{i} ∖ \{i\}} γ_{j, i} (t - 1)), {\overset{ˇ}{Σ}}_{i} (t) = {(Ψ_{i, i} - \sum_{j \in B_{i} ∖ \{i\}} Γ_{j, i} (t - 1))}^{- 1},$
and, for each $j \in B_{i} ∖ \{i\}$ ,
${\overset{ˇ}{x}}_{i, j} (t) = {\overset{ˇ}{Σ}}_{i, j} (t) (α_{i} - \sum_{k \in B_{i} ∖ \{i, j\}} γ_{k, i} (t - 1)), {\overset{ˇ}{Σ}}_{i, j} (t) = {(Ψ_{i, i} - \sum_{k \in B_{i} ∖ \{i, j\}} Γ_{k, i} (t - 1))}^{- 1} .$
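The message-passing idea behind Algorithm 4 can be sketched on a small tree as follows. The sketch makes two assumptions that should be read as such: the message sent from Node $i$ to Node $j$ is built from the quantities ${\overset{ˇ}{x}}_{i, j}$ and ${\overset{ˇ}{Σ}}_{i, j}$ , which exclude Node $j$ 's own contribution, and messages are consumed in the round in which they are sent. The five-node tree, the way $Ψ$ is generated and the helper names are likewise hypothetical.

```python
import numpy as np

rng = np.random.default_rng(9)
d, n_nodes = 2, 5
edges = [(0, 1), (1, 2), (1, 3), (3, 4)]       # hypothetical tree with diameter 3
nbrs = {i: set() for i in range(n_nodes)}
for i, j in edges:
    nbrs[i].add(j)
    nbrs[j].add(i)

# Positive definite Psi whose off-diagonal block pattern matches the tree.
Psi = 0.1 * np.eye(n_nodes * d)
for i, j in edges:
    G = rng.standard_normal((4, 2 * d))
    idx = np.r_[i*d:(i+1)*d, j*d:(j+1)*d]
    Psi[np.ix_(idx, idx)] += G.T @ G
alpha = rng.standard_normal(n_nodes * d)

def blk(i, j):
    return Psi[i*d:(i+1)*d, j*d:(j+1)*d]

def local(i, gam, Gam, exclude=None):
    """Local (estimate, covariance-like matrix) at Node i, skipping neighbor `exclude`."""
    S = blk(i, i).copy()
    r = alpha[i*d:(i+1)*d].copy()
    for k in nbrs[i]:
        if k != exclude:
            S -= Gam[k, i]
            r -= gam[k, i]
    Sigma = np.linalg.inv(S)
    return Sigma @ r, Sigma

# With all-zero messages, local(...) reduces to the initialization Psi_{i,i}^{-1} alpha_i.
gam = {(i, j): np.zeros(d) for i in nbrs for j in nbrs[i]}
Gam = {(i, j): np.zeros((d, d)) for i in nbrs for j in nbrs[i]}
for t in range(3):                             # as many rounds as the diameter
    new_gam, new_Gam = {}, {}
    for i in nbrs:
        for j in nbrs[i]:
            x_ij, S_ij = local(i, gam, Gam, exclude=j)
            new_gam[i, j] = blk(j, i) @ x_ij           # gamma_{i,j}
            new_Gam[i, j] = blk(j, i) @ S_ij @ blk(i, j)   # Gamma_{i,j}
    gam, Gam = new_gam, new_Gam

x_check = np.concatenate([local(i, gam, Gam)[0] for i in range(n_nodes)])
x_global = np.linalg.solve(Psi, alpha)
```

On a tree, each message summarizes the whole subtree behind the sender, so the recursion amounts to repeated Schur-complement elimination as in Lemma 13.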

Our next step is to show that Algorithm 4 converges in finite time to the global WLS solution.

Definition 14

Each pair $(i, j)$ , $i, j \in \{1, \dots, I\}$ , is called an edge if $Ψ_{i, j} \neq 0$ . A path is a concatenation of contiguous edges, and its length is the number of edges forming it. For each $i, j \in \{1, \dots, I\}$ , the distance $d_{i, j}$ between Nodes $i$ and $j$ is defined as the minimum length of a path joining these two nodes. The radius $ρ_{i}$ of Node $i$ is defined as the maximum distance between Node $i$ and any other node in the network. The diameter of the network is the maximum radius among all its nodes. A network is called acyclic if it does not contain a path forming a cycle.
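The graph quantities of Definition 14 can be computed with a breadth-first search. In the sketch below the adjacency structure `nbrs` lists only edges between distinct nodes, the example tree is hypothetical, and `is_acyclic` assumes that the graph is connected.

```python
from collections import deque

def distances_from(source, nbrs):
    """Breadth-first search distances d_{source, j} to every reachable node j."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in nbrs[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def radius(i, nbrs):
    """Maximum distance between Node i and any other node."""
    return max(distances_from(i, nbrs).values())

def diameter(nbrs):
    """Maximum radius among all nodes."""
    return max(radius(i, nbrs) for i in nbrs)

def is_acyclic(nbrs):
    """A connected undirected graph is acyclic iff it has (number of nodes - 1) edges."""
    n_edges = sum(len(v) for v in nbrs.values()) // 2
    return n_edges == len(nbrs) - 1

edges = [(0, 1), (1, 2), (1, 3), (3, 4)]       # hypothetical tree
nbrs = {i: set() for i in range(5)}
for i, j in edges:
    nbrs[i].add(j)
    nbrs[j].add(i)
```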

The next theorem states that, if the network is acyclic, then the algorithm above yields the global estimate at each node in finite time.

Theorem 15

Consider the network (1) together with Assumptions 2 and 11. If the network is acyclic, then, for each $i \in \{1, \dots, I\}$ , $j \in B_{i} ∖ \{i\}$ and $t \in \mathbb{N}$ , the matrices $Ψ_{i, i} - \sum_{k \in B_{i} ∖ \{i\}} Γ_{k, i} (t - 1)$ and $Ψ_{i, i} - \sum_{k \in B_{i} ∖ \{i, j\}} Γ_{k, i} (t - 1)$ are invertible, and for all $t \geq ρ_{i}$ ,

${\overset{ˇ}{x}}_{i} (t) = {\hat{x}}_{i} .$ (22)

5. Simulations

5.1. State estimation in power systems

In the first simulation we use the proposed distributed methods for state estimation in smart electricity networks, involving multi-area interconnected power systems (Huang et al., 2012). To this end, we use the IEEE 118-bus test system, whose specifications are given in Christie (1993). The system’s diagram is shown in Fig. 1, where buses are represented by circles and lines by edges. Some buses have a phasor measurement unit (PMU) installed. These buses are shown in gray. Each PMU measures the voltage of the bus where it is installed, as well as the currents of the lines attached to that bus. The goal is to estimate the state vector $x$ , containing the voltage (a complex phasor) at each bus. For the purposes of state estimation, the buses are clustered in nodes. Two clustering examples are shown in Tables 1 and 3.

Fig. 1 — Diagram of the IEEE 118-bus test system.

Table 1.

Nodes forming a cyclic network topology.

Node	Buses
1	1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 16, 117
2	13, 15, 17, 18, 19, 20, 21, 22, 23, 25, 26, 27, 28, 29, 30, 31, 32, 33, 113, 114, 115
3	24, 38, 70, 71, 72, 73, 74
4	34, 35, 36, 37, 39, 40, 41, 42, 43
5	44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 77, 80, 81, 100, 116
6	75, 76, 78, 79, 82, 95, 96, 97, 98, 118
7	83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94
8	99, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112


Table 3.

Nodes forming an acyclic network topology.

Node	Buses
1	1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 117
2	23, 25, 26, 27, 28, 29, 31, 32, 113, 114, 115
3	5, 16, 17, 18, 19, 20, 21, 22, 24, 30, 33, 34, 35, 36, 37, 39, 40, 71, 72, 73
4	38, 41, 42, 43, 44, 45, 46, 47, 48, 69, 70, 74, 75, 76, 77, 118
5	49, 50, 51, 54, 65, 66, 68, 78, 79, 80, 81, 82, 95, 96, 97, 98, 99, 116
6	52, 53, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 67
7	83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112


Let $P$ denote the number of PMUs in the whole system. For each $p = 1, \dots, P$, let $L_{p}$ denote the number of lines attached to the bus where PMU $p$ is installed. Let also $B_{p}^{T} = [e_{p}^{T}, y_{p, 1}^{T}, \dots, y_{p, L_{p}}^{T}]$, where the vectors $e_{p}$ and $y_{p, l}$, $l = 1, \dots, L_{p}$, are defined such that $e_{p} x$ is the voltage of the installation bus, and $y_{p, l} x$ is the current of the $l$-th attached line (the value of $y_{p, l}$ is taken from Christie, 1993). Then, matrix $A$ in (2) is given by $A^{T} = [A_{1}^{T}, \dots, A_{I}^{T}]$, where, for each $i = 1, \dots, I$, the block of rows $A_{i}$ corresponding to Node $i$ is formed by stacking the matrices $B_{p}$ corresponding to all PMUs contained in Node $i$, i.e., $A_{i}^{T} = [B_{p_{1}}^{T}, \dots, B_{p_{P_{i}}}^{T}]$, where $p_{1}, \dots, p_{P_{i}}$ denote the indexes of those PMUs.

We place the PMUs using the method in Tai et al. (2013). This guarantees that matrix $A$ has full column rank. We also assume that the noise covariance is $R = σ^{2} I$ , with $σ = 0.05$ . Notice that voltage and current values in the test system are per unit values, i.e., they appear divided by the nominal voltage $V_{0}$ and the nominal current $I_{0}$ , respectively. Hence, $σ = 0.05$ means that voltage measurements have a standard deviation of $0.05 \times V_{0}$ volts, and current measurements have one of $0.05 \times I_{0}$ amperes. This leads to a global estimate $\hat{x}$ having a relative estimation error of

$e = 20 \log_{10} \frac{‖ x - \hat{x} ‖}{‖ x ‖} = - 17.45 \, \text{dB} .$ (23)

In the simulations below, we use

$r (t) = 20 \log_{10} \frac{‖ \hat{x} - \hat{x} (t) ‖}{‖ \hat{x} ‖},$ (24)

to measure the relative difference between the global estimate $\hat{x}$ and the one yielded, at time step $t$ , by the proposed distributed algorithms.
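As a minimal sketch of how the relative measures in (23) and (24) can be evaluated, the function below computes $20 \log_{10}$ of a norm ratio for vectors of (possibly complex) entries. The function name and the two-entry vectors are hypothetical illustration values, not data from the test system.

```python
import math


def relative_db(reference, estimate):
    """Return 20*log10(||reference - estimate|| / ||reference||) using Euclidean norms."""
    num = math.sqrt(sum(abs(r - e) ** 2 for r, e in zip(reference, estimate)))
    den = math.sqrt(sum(abs(r) ** 2 for r in reference))
    return 20.0 * math.log10(num / den)


# Hypothetical two-bus phasor vectors, used only to exercise the function.
x_hat = [1.0 + 0.0j, 0.0 + 1.0j]
x_t = [1.01 + 0.0j, 0.0 + 1.0j]
difference_db = relative_db(x_hat, x_t)
```

With this convention, every additional factor of ten in accuracy lowers the measure by 20 dB.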

5.1.1. Cyclic network topology

In the first simulation we cluster the buses into eight nodes, as shown in Table 1. From the definition of $N_{i}$, it follows that $j \in N_{i}$ if there is a bus with a PMU installed, in either Node $i$ or Node $j$, having an attached line coming from a bus inside the other node. Fig. 2 shows the topology of the communication network induced by the clustering given in Table 1.

Fig. 2 — Cyclic network topology induced by the nodes in Table 1.

Fig. 3 shows the convergence of the asymptotic method without preconditioning. To this end, we show the modulus of the estimated voltage of each bus at each step. We see that the convergence is very slow, with a relative difference of $r (10^{6}) = - 37 dB$ , between the global estimate and the one obtained by the distributed algorithm at $t = 10^{6}$ . The reason for the slow convergence is that the condition number of $Ψ$ is 478 972. The preconditioning matrix in (20) gives a condition number of 700, which leads to a much faster convergence. This is shown in Fig. 4, where $r (2 \times 10^{3}) = - 52.47 dB$ . Fig. 5 shows that the convergence of the estimation of $‖ Υ ‖$ and ${‖ Υ^{- 1} ‖}^{- 1}$ , at each node, is much faster than that of the WLS estimation algorithm. Finally, Table 2 shows the complexity at each node. To this end, we measure the number of multiplications in a whole cycle of Algorithms 1–3.
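To give a feel for why the condition number matters, the sketch below evaluates a standard textbook bound rather than the exact recursion of this paper. It assumes an optimally scaled Richardson iteration on a symmetric positive-definite matrix, whose error contracts by $(κ - 1) / (κ + 1)$ per step, and it ignores the transients of the distributed scaling estimation. The function names are hypothetical.

```python
import math


def contraction_factor(kappa):
    """Per-step error contraction of an optimally scaled Richardson iteration
    on a symmetric positive-definite matrix with condition number kappa."""
    return (kappa - 1.0) / (kappa + 1.0)


def steps_per_20_db(kappa):
    """Number of steps needed to shrink the error norm by a factor of 10 (20 dB)."""
    return math.log(10.0) / -math.log(contraction_factor(kappa))


# Condition numbers quoted in the text, without and with preconditioning.
steps_plain = steps_per_20_db(478972)
steps_preconditioned = steps_per_20_db(700)
```

Under this assumption the step count per 20 dB is approximately $κ \ln (10) / 2$, so reducing the condition number from 478 972 to 700 shortens the iteration by roughly the same ratio.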

Fig. 4 — Convergence of the asymptotic method, with preconditioning, in a cyclic network.

Fig. 5 — Convergence of the distributed eigenvalue estimation algorithm.

Table 2.

Complexity at each node, in number of multiplications per iteration, for the cyclic network topology.

Node	1	2	3	4	5	6	7	8
Complexity	5202	8613	10 469	14 848	26 054	11 810	14 352	10 628


5.1.2. Acyclic network topology

In the second simulation we do the clustering such that the induced topology is acyclic. From the definition of $B_{i}$ , it follows that $j \in B_{i}$ if there is a bus (possibly neither in Node $i$ nor in $j$ ), with a PMU installed, having one neighbor bus (i.e., a bus connected to it via an attached line), including possibly itself, in each node, $i$ and $j$ . The clustering and its induced topology are shown in Table 3 and Fig. 6, respectively.

Fig. 6 — Acyclic network topology induced by the nodes in Table 3.

The convergence of the asymptotic method (with preconditioning) is shown in Fig. 7, with a final relative difference with the global estimate of $r (20 \times 10^{3}) = - 69.67 dB$. In this case, we wait for 100 steps before starting with the distributed estimation. The gap caused by this delay can be seen at the beginning of the graph. We introduced this late start so as to give time for Algorithms 2 and 3 to obtain reasonable approximations of $‖ Υ ‖$ and ${‖ Υ^{- 1} ‖}^{- 1}$, respectively. We see that the asymptotic method presents an oscillating behavior between time steps 1500 and 3500. This is because the transients in the estimation of the scaling factor $γ (t)$ cause the recursions (8) to become temporarily unstable. We also see that the asymptotic method requires about $20 \times 10^{3}$ steps to converge. This is because, in this case, preconditioning leads to a condition number of 5264. On the other hand, the convergence of the finite-time method does not depend on the condition number, but on the network diameter, which in this case is four. Fig. 8 shows the convergence of this method in four steps, with a final error of $r (4) = - 223.7 dB$, caused by numerical inaccuracy.

Fig. 8 — Convergence of the finite-time method (in an acyclic network).

Table 4 shows the complexity at each node. To this end, we consider that solving the positive-definite linear system for computing ${\overset{ˇ}{x}}_{i, j} (t)$, using the Cholesky decomposition, requires $n^{3} / 3 + 2 n^{2}$ multiplications ($n^{3} / 3$ for the decomposition and $2 n^{2}$ for solving two triangular linear systems). Also, computing the inverse of the matrix ${\overset{ˇ}{Σ}}_{i} (t)$, again using the Cholesky decomposition, requires $n^{3} / 2$ multiplications (Krishnamoorthy & Menon, 2011).
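The two operation counts quoted above can be tabulated with a few lines. This is only a sketch of the stated formulas; the function names and the dimension $n = 6$ are hypothetical and do not correspond to any node in Table 4.

```python
def cholesky_solve_mults(n):
    """Multiplications to solve an n x n positive-definite system:
    n^3/3 for the decomposition plus 2*n^2 for the two triangular solves."""
    return n ** 3 / 3 + 2 * n ** 2


def cholesky_inverse_mults(n):
    """Multiplications to invert an n x n positive-definite matrix via Cholesky (n^3/2)."""
    return n ** 3 / 2


# Hypothetical local state dimension.
n = 6
solve_cost = cholesky_solve_mults(n)
inverse_cost = cholesky_inverse_mults(n)
```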

Table 4.

Complexity at each node, in number of multiplications per iteration, for the acyclic network topology.

Node	Asymptotic method	Finite-time method
1	4985	1912
2	3645	786
3	14 088	4400
4	10 400	2304
5	17 802	3240
6	3891	1267
7	8919	8437


5.2. Sensor localization

Sensor localization refers to the problem of obtaining the location of each node in a network, based on the knowledge of the locations of a few anchor nodes, as well as the mutual distances between neighbor nodes. A distributed method for carrying out this task is proposed in Khan et al. (2009). This method requires that, for each $i = 1, \dots, I$, Node $i$ lies inside at least one triangle defined by three of its neighbors $N_{i} = \{j, k, l\}$. Then, the coordinates $x_{i}$ of Node $i$ can be written as

$x_{i} = \sum_{j \in N_{i}} c_{i, j} x_{j},$ (25)

where the barycentric coordinates $c_{i, j}$ are given by

$c_{i, j} = \frac{S (\{i\} \cup N_{i} ∖ \{j\})}{S (N_{i})},$

with $S (i, j, k)$ denoting the area of the triangle formed by Nodes $i$ , $j$ and $k$ . The latter can be computed using the Cayley–Menger determinant as follows

$S^{2} (i, j, k) = - \frac{1}{16} \begin{vmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & d_{i, j}^{2} & d_{i, k}^{2} \\ 1 & d_{i, j}^{2} & 0 & d_{j, k}^{2} \\ 1 & d_{i, k}^{2} & d_{j, k}^{2} & 0 \end{vmatrix},$

where $d_{i, j} = ‖ x_{i} - x_{j} ‖$ denotes the distance between Nodes $i$ and $j$ .
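The computation of barycentric coordinates from distances alone can be sketched as follows. The sketch uses the expanded form of the Cayley–Menger determinant, $16 S^{2} = 2 (a b + a c + b c) - (a^{2} + b^{2} + c^{2})$, where $a$, $b$, $c$ are the squared side lengths. The node names, positions and helper functions are hypothetical; the positions are used only to generate distances and to check the result.

```python
import math


def tri_area(d1, d2, d3):
    """Triangle area from its three side lengths (expanded Cayley-Menger determinant)."""
    a, b, c = d1 ** 2, d2 ** 2, d3 ** 2
    s2 = (2.0 * (a * b + a * c + b * c) - (a * a + b * b + c * c)) / 16.0
    return math.sqrt(max(s2, 0.0))  # clamp tiny negative round-off


def barycentric(d, i, neighbors):
    """Barycentric coordinates of node i w.r.t. its three neighbors.
    d maps frozenset({u, v}) to the distance between nodes u and v."""
    def area(u, v, w):
        return tri_area(d[frozenset((u, v))], d[frozenset((u, w))], d[frozenset((v, w))])

    total = area(*neighbors)
    coeffs = {}
    for j in neighbors:
        others = [n for n in neighbors if n != j]
        # Replace neighbor j by node i and take the ratio of areas.
        coeffs[j] = area(i, *others) / total
    return coeffs


# Hypothetical positions: node "i" lies inside the triangle formed by "j", "k", "l".
pos = {"i": (1.0, 1.0), "j": (0.0, 0.0), "k": (4.0, 0.0), "l": (0.0, 4.0)}
d = {frozenset((u, v)): math.hypot(pos[u][0] - pos[v][0], pos[u][1] - pos[v][1])
     for u in pos for v in pos if u != v}
c = barycentric(d, "i", ("j", "k", "l"))
recovered = tuple(sum(c[n] * pos[n][axis] for n in ("j", "k", "l")) for axis in (0, 1))
```

Because the node is inside the triangle, the coefficients are nonnegative and sum to one, and the weighted sum of the neighbor positions reproduces the node position.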

For each $i = 1, \dots, I$ , we have one equation of the form (25), for each triangle containing Node $i$ . We assume that $N$ such triangles exist for each node. Hence, we have $N \times I$ equations. Let $x_{i} \in R^{2}$ , $i = 1, \dots, I$ , denote the node coordinates and $a_{j} \in R^{2}$ , $j = 1, \dots, J$ , denote those of the anchor nodes. Let also $x^{T} = [x_{1}^{T}, \dots, x_{I}^{T}]$ and $a^{T} = [a_{1}^{T}, \dots, a_{J}^{T}]$ . Then, the aforementioned $N \times I$ equations can be written as

x = (C \otimes I_{2}) x + (D \otimes I_{2}) a,

or equivalently,

$y = A x,$ (26)

with $y = (D \otimes I_{2}) a$ and $A = I - C \otimes I_{2}$ . Due to inaccuracy in distance measurements, (26) can be approximately expressed as in (2). In that case, we can use our proposed distributed method to obtain, at each node, a WLS estimation of its coordinates.
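As an illustration of how $A$ and $y$ in (26) are assembled, the sketch below builds $C \otimes I_{2}$ and $D \otimes I_{2}$ for a hypothetical layout with two nodes and three anchors, with one triangle per node ($N = 1$), and checks that the true coordinates satisfy $y = A x$ in the noise-free case. All matrices, positions and function names are assumptions made for this example.

```python
def kron_with_identity2(M):
    """Kronecker product M (x) I_2 for a matrix given as a list of rows."""
    out = []
    for row in M:
        out.append([v for m in row for v in (m, 0.0)])  # first coordinate
        out.append([v for m in row for v in (0.0, m)])  # second coordinate
    return out


def matvec(M, v):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


# Hypothetical barycentric weights: Node 1 uses Node 2 and anchors 2, 3;
# Node 2 uses anchors 1, 2, 3.
C = [[0.0, 0.5], [0.0, 0.0]]
D = [[0.0, 0.25, 0.25], [0.5, 0.25, 0.25]]
a = [0.0, 0.0, 4.0, 0.0, 0.0, 4.0]   # stacked anchor coordinates
x_true = [1.5, 1.5, 1.0, 1.0]        # stacked node coordinates

CI = kron_with_identity2(C)
A = [[(1.0 if r == c else 0.0) - CI[r][c] for c in range(4)] for r in range(4)]
y = matvec(kron_with_identity2(D), a)
residual = [lhs - rhs for lhs, rhs in zip(matvec(A, x_true), y)]
```

With noisy distances the weights in $C$ and $D$ are perturbed, which is where the approximation by the measurement model (2) enters.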

The experiment setup is shown in Fig. 9. It includes three anchor nodes, defining a triangle containing $I = 20$ randomly placed nodes. We use a noise covariance matrix $R = σ^{2} I_{d}$, where $I_{d}$ denotes the identity matrix, and $σ = \sqrt{10^{- 3}} ≃ 31.62$ centimeters. With this setup, the global estimate $\hat{x}$ yields a relative localization error of $e = - 33.39 dB$, defined as in (23). The convergence of the coordinate estimates at each node, using the proposed method, with preconditioning, is shown in Fig. 10. As before, we wait for 10 steps before starting the iterations, to give time for Algorithms 2 and 3 to obtain reasonable approximations of $‖ Υ ‖$ and ${‖ Υ^{- 1} ‖}^{- 1}$, respectively. The convergence of these estimates is shown in Fig. 11. Finally, the complexity at each node is shown in Table 5.

Fig. 10 — Convergence of the node coordinate estimates.

Fig. 11 — Convergence of the estimated eigenvalues.

Table 5.

Complexity at each sensor node, in number of multiplications per iteration.

Node	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19	20
Complexity	170	282	282	426	170	282	602	602	282	426	282	426	282	170	170	426	426	426	426	426


For comparison, we also consider the distributed iterative localization algorithm (DILOC) proposed in Khan et al. (2009). This method solves (26) using Richardson’s recursions to invert matrix $A$. This requires that $N = 1$, i.e., only one equation of the form (25) is considered for each node. In this case, the recursions are guaranteed to converge because, as the authors show, $‖ I - A ‖ < 1$ holds in this problem. Fig. 12 shows the evolution of the relative difference $r (t)$ (defined as in (24)) between the estimates of each method and the global estimate $\hat{x}$. We see that the DILOC method has a faster convergence. This is because the condition number of $A$ is smaller than that of $A^{*} R^{- 1} A$, which is the matrix inverted by our proposed method. However, at $t = 500$, the DILOC method yields $r (500) = - 29.44 dB$, while the proposed one gives $r (500) = - 72.71 dB$. This difference results from the fact that the DILOC method does not produce the WLS solution in the limit.¹
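The Richardson recursion mentioned above can be sketched in a few lines. This is a generic, centralized illustration of the iteration $x (t + 1) = (I - A) x (t) + y$, not the DILOC implementation; the $2 \times 2$ matrix, the right-hand side and the function name are hypothetical, chosen so that $‖ I - A ‖ < 1$.

```python
def richardson(A, y, steps):
    """Iterate x(t+1) = x(t) - A x(t) + y, starting from zero.
    Converges to the solution of A x = y when ||I - A|| < 1."""
    n = len(y)
    x = [0.0] * n
    for _ in range(steps):
        Ax = [sum(A[r][c] * x[c] for c in range(n)) for r in range(n)]
        x = [x[r] - Ax[r] + y[r] for r in range(n)]
    return x


# Hypothetical system: I - A has eigenvalues of modulus about 0.354.
A = [[1.0, -0.5], [-0.25, 1.0]]
y = [1.0, 2.0]
x = richardson(A, y, 100)   # exact solution is (16/7, 18/7)
```

The iteration only inverts $A$, so with noisy data it converges to the solution of (26) itself rather than to a weighted least-squares estimate.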

Fig. 12 — Relative difference with the global estimate $\hat{x}$ vs. iteration $t$ .

6. Conclusion

We proposed two methods for weighted least-squares estimation in large-scale systems. Both methods converge to the global solution and aim to maximize the convergence speed. The first method converges asymptotically and involves a distributed estimation of the scaling parameter upon which the convergence speed depends. To further speed up the convergence, we also use a practically feasible preconditioning method, for which we bounded the speed difference with respect to the fastest theoretically achievable one. The second proposed method has an even faster convergence, as it achieves the global optimum in finite time. However, it is only suitable for applications where the graph produced by the communication network contains no loops.

Biographies


Damián Edgardo Marelli received his Bachelor’s Degree in Electronics Engineering from the Universidad Nacional de Rosario, Argentina, in 1995, and a Ph.D. degree in Electrical Engineering and a Bachelor (Honours) degree in Mathematics from the University of Newcastle, Australia, in 2003. From 2004 to 2005 he held a postdoctoral position at the Laboratoire d’Analyse Topologie et Probabilités, CNRS/Université de Provence, France. Since 2006 he has been a Research Academic at the School of Electrical Engineering and Computer Science at the University of Newcastle, Australia. In 2007 he received a Marie Curie Postdoctoral Fellowship, hosted at the Faculty of Mathematics, University of Vienna, Austria, and in 2010 he received a Lise Meitner Senior Fellowship, hosted at the Acoustics Research Institute of the Austrian Academy of Sciences. His main research interests include signal processing and communications.


Minyue Fu received his Bachelor’s Degree in Electrical Engineering from the University of Science and Technology of China, Hefei, China, in 1982, and M.S. and Ph.D. degrees in Electrical Engineering from the University of Wisconsin-Madison in 1983 and 1987, respectively. From 1983 to 1987, he held a teaching assistantship and a research assistantship at the University of Wisconsin-Madison. He worked as a Computer Engineering Consultant at Nicolet Instruments, Inc., Madison, Wisconsin, during 1987. From 1987 to 1989, he served as an Assistant Professor in the Department of Electrical and Computer Engineering, Wayne State University, Detroit, Michigan. He joined the Department of Electrical and Computer Engineering, the University of Newcastle, Australia, in 1989. Currently, he is a Chair Professor in Electrical Engineering and Head of School of Electrical Engineering and Computer Science. In addition, he was a Visiting Associate Professor at University of Iowa in 1995–1996, and a Senior Fellow/Visiting Professor at Nanyang Technological University, Singapore, 2002. He holds a Qian-ren Professorship at Zhejiang University, China. He is a Fellow of IEEE. His main research interests include control systems, signal processing and communications. He has been an Associate Editor for the IEEE Transactions on Automatic Control, Automatica and Journal of Optimization and Engineering.

Footnotes

^☆

This work was partially supported by the Open Research Project of the State Key Laboratory of Industrial Control Technology, Zhejiang University, China (No. ICT1414) and by the Austrian Science Fund (FWF project M1230-N13). The material in this paper was partially presented at the 52nd IEEE Conference on Decision and Control (CDC 2013), December 10–13, 2013, Florence, Italy. This paper was recommended for publication in revised form by Associate Editor Tongwen Chen under the direction of Editor Ian R. Petersen.

¹ Notice that, in the scenario considered in this work, noisy inter-node distances are only measured once, and they remain unchanged during the whole iteration process. This is in contrast to the scenario considered in Khan et al. (2009), where these distances are re-measured at each iteration.

Contributor Information

Damián Edgardo Marelli, Email: Damian.Marelli@newcastle.edu.au.

Minyue Fu, Email: Minyue.Fu@newcastle.edu.au.

Appendix A. Proofs of Section 3

A.1. Proof of Theorem 6

Fix $t \in N$ . Let $k_{i} (t) \in R$ and

${\bar{b}}_{i} (t) = k_{i} (t) b_{i} (t) .$ (A.1)

From (14), we have

{\bar{b}}_{i} (t + 1) = \frac{k_{i} (t + 1)}{k_{i} (t)} Π_{i}^{1 / 2} \sum_{k \in I_{i}} \sum_{j \in O_{k}} \frac{k_{i} (t)}{k_{j} (t)} Ψ_{i, j}^{(k)} Π_{j}^{1 / 2} {\bar{b}}_{j} (t) .

Let $k_{i} (0) = 1$ and

ς_{i} (t) = \frac{k_{i} (t)}{k_{i} (t - 1)},

so that

k_{i} (t) = \prod_{τ = 1}^{t} ς_{i} (τ) .

Then,

${\bar{b}}_{i} (t + 1) = ς_{i} (t + 1) {\tilde{b}}_{i} (t + 1),$ (A.2)

with

{\tilde{b}}_{i} (t + 1) = Π_{i}^{1 / 2} \sum_{k \in I_{i}} \sum_{j \in O_{k}} υ_{i, j} (t) Ψ_{i, j}^{(k)} Π_{j}^{1 / 2} {\bar{b}}_{j} (t),

υ_{i, j} (t) = \prod_{τ = 1}^{t} \frac{ς_{i} (τ)}{ς_{j} (τ)} .

We need to design $k_{i} (t + 1)$ , or equivalently $ς_{i} (t + 1)$ , to avoid the indefinite increase or decrease of $b (t)$ . In principle, this could be achieved by choosing

k_{i} (t) = {‖ b_{i} (t) ‖}^{- 1},

so that $‖ {\bar{b}}_{i} (t) ‖ = 1$ , for all $t \in N$ . From (A.2), this would lead to

ς_{i} (t + 1) = {‖ {\tilde{b}}_{i} (t + 1) ‖}^{- 1} .

However, the question then arises as to whether some of the scalars $υ_{i, j} (t)$ would grow to infinity. Notice that

υ_{i, j} (t) = \frac{k_{i} (t)}{k_{j} (t)} .

Hence, this could only happen if some vector in the eigenspace associated with the largest eigenvalue of $Υ$ has zero components in the entries corresponding to $b_{j} (t)$. We call a matrix satisfying this property ill-posed. Although the set of ill-posed matrices is nowhere dense (i.e., it is unlikely to have an ill-posed matrix $Υ$), we can avoid the indefinite growth of $υ_{i, j} (t)$ by choosing $ς_{i} (t + 1)$ so that $‖ {\bar{b}}_{i} (t + 1) ‖ \leq 1$ and, for all $j \in B_{i} = \{j : Ψ_{i, j} \neq 0\}$,

ς_{i} (t + 1) υ_{i, j} (t) \leq 1 .

This leads to

$ς_{i} (t + 1) = \min \{ {‖ {\tilde{b}}_{i} (t + 1) ‖}^{- 1}, υ_{i, j}^{- 1} (t), j \in B_{i} \} = {\max \{ ‖ {\tilde{b}}_{i} (t + 1) ‖, υ_{i, j} (t), j \in B_{i} \}}^{- 1} .$

From (15) and (A.1), the estimate ${\bar{Υ}}_{i} (t)$ of $‖ Υ ‖$ at time $t$ is

{\bar{Υ}}_{i} (t) = \frac{‖ b_{i} (t) ‖}{‖ b_{i} (t - 1) ‖} = ς_{i}^{- 1} (t) \frac{‖ {\bar{b}}_{i} (t) ‖}{‖ {\bar{b}}_{i} (t - 1) ‖} .

However, if $Υ$ is ill-posed, $‖ {\bar{b}}_{i} (t) ‖$ will tend to zero. In such a case, ${\bar{Υ}}_{i} (t)$ can be computed by

{\bar{Υ}}_{i} (t) = ς_{j}^{- 1} (t) \frac{‖ {\bar{b}}_{j} (t) ‖}{‖ {\bar{b}}_{j} (t - 1) ‖},

for some neighbor node $j$ for which $‖ {\bar{b}}_{j} (t) ‖$ does not tend to zero. Notice that such a neighbor always exists, for otherwise Node $i$ would be isolated from all other nodes.
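The role of the rescaling in the argument above can be seen in a centralized sketch of the power method. This is not the distributed algorithm of the paper: it uses a single global normalization instead of the per-node factors $k_{i} (t)$, and the $2 \times 2$ symmetric matrix and function names are hypothetical. The estimate of the largest eigenvalue is the ratio of consecutive norms, as in the expression for ${\bar{Υ}}_{i} (t)$.

```python
import math


def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(c * c for c in v))


def estimate_spectral_norm(Y, steps):
    """Power iteration on a symmetric positive-definite matrix Y.
    The iterate is rescaled at every step so that it neither grows nor vanishes;
    the returned estimate is the norm ratio ||Y b|| / ||b||."""
    n = len(Y)
    b = [1.0] * n
    estimate = 0.0
    for _ in range(steps):
        Yb = [sum(Y[r][c] * b[c] for c in range(n)) for r in range(n)]
        scale = norm(Yb)
        estimate = scale / norm(b)
        b = [c / scale for c in Yb]   # keep ||b|| = 1
    return estimate


# Hypothetical matrix with eigenvalues (5 +/- sqrt(5)) / 2.
Y = [[2.0, 1.0], [1.0, 3.0]]
largest = estimate_spectral_norm(Y, 60)
```

Without the rescaling, the iterate would grow or shrink geometrically with the largest eigenvalue, which is the behavior the factors $ς_{i} (t)$ are designed to prevent.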

A.2. Proof of Theorem 7

We need the following result.

Lemma 16

Let $M = {[M_{i, j}]}_{i, j = 1, \dots, I}$ be a block symmetric matrix. Then

$‖ M ‖ \leq \max_{i} ψ_{i},$

where

$ψ_{i} = \sum_{j = 1}^{I} ‖ M_{i, j} ‖ .$

Proof

Let $y = M x$ , with $x = [x_{1}, \dots, x_{I}]$ and $y = [y_{1}, \dots, y_{I}]$ . We have

$‖ y_{i} ‖ \leq \sum_{j = 1}^{I} ‖ M_{i, j} ‖ ‖ x_{j} ‖ = \sum_{j = 1}^{I} {‖ M_{i, j} ‖}^{1 / 2} {(‖ M_{i, j} ‖ {‖ x_{j} ‖}^{2})}^{1 / 2} \leq {(\sum_{j = 1}^{I} ‖ M_{i, j} ‖)}^{1 / 2} {(\sum_{j = 1}^{I} ‖ M_{i, j} ‖ {‖ x_{j} ‖}^{2})}^{1 / 2} .$

Let $ψ = \max_{i} ψ_{i}$. Then,

${‖ y ‖}^{2} = \sum_{i = 1}^{I} {‖ y_{i} ‖}^{2} \leq \sum_{i = 1}^{I} (\sum_{j = 1}^{I} ‖ M_{i, j} ‖) (\sum_{j = 1}^{I} ‖ M_{i, j} ‖ {‖ x_{j} ‖}^{2}) \leq ψ \sum_{i = 1}^{I} \sum_{j = 1}^{I} ‖ M_{i, j} ‖ {‖ x_{j} ‖}^{2} \leq ψ^{2} \sum_{j = 1}^{I} {‖ x_{j} ‖}^{2} = ψ^{2} {‖ x ‖}^{2},$

and the result follows.

Proof of Theorem 7

Using Lemma 16 we have

$‖ Υ ‖ \leq \max_{i} ψ_{i},$

with

$ψ_{i} = \sum_{j = 1}^{I} ‖ Π_{i}^{1 / 2} Ψ_{i, j} Π_{j}^{1 / 2} ‖ = \sum_{j = 1}^{I} ‖ \sum_{k : i, j \in O_{k}} Π_{i}^{1 / 2} Ψ_{i, j}^{(k)} Π_{j}^{1 / 2} ‖ \leq \sum_{j = 1}^{I} \sum_{k : i, j \in O_{k}} ‖ Π_{i}^{1 / 2} Ψ_{i, j}^{(k)} Π_{j}^{1 / 2} ‖ = \sum_{k \in I_{i}} \sum_{j \in O_{k}} ‖ Π_{i}^{1 / 2} Ψ_{i, j}^{(k)} Π_{j}^{1 / 2} ‖ .$

Hence,

$‖ Υ ‖ \leq \max_{i} ϕ_{i},$

and the result follows.

A.3. Proof of Theorem 8

We need the following lemma.

Lemma 17

If $M = [\begin{matrix} I & C \\ C^{*} & I \end{matrix}] \geq 0$ , then $‖ C ‖ \leq 1$ .

Proof

Let $u^{T} = [\begin{matrix} x^{T} & y^{T} \end{matrix}]$ . Then, for any $u$ ,

$u^{*} M u = {‖ x ‖}^{2} + {‖ y ‖}^{2} + 2 \mathrm{Re} (x^{*} C y) \geq 0 .$

Choose $x$ and $y$ such that $‖ x ‖ = ‖ y ‖ = 1$ and $x^{*} C y = - ‖ C ‖$. Then, the inequality above becomes $2 - 2 ‖ C ‖ \geq 0$, i.e., $1 - ‖ C ‖ \geq 0$, and the result follows.

Proof of Theorem 8

Recall that $Π = diag {Π_{1}, \dots, Π_{I}}$ and let $D = \tilde{Π} Π^{- 1}$ , with $\tilde{Π} \in P$ . Then,

${\tilde{Π}}^{1 / 2} Ψ {\tilde{Π}}^{1 / 2} = D^{1 / 2} Υ D^{1 / 2},$

and we have

$κ ({\tilde{Π}}^{1 / 2} Ψ {\tilde{Π}}^{1 / 2}) = κ (D^{1 / 2} Υ D^{1 / 2}) = κ^{2} ({(Υ D)}^{1 / 2}) .$ (A.3)

Let $\bar{σ} (A)$ and $\underline{σ} (A)$ denote the largest and smallest singular values of $A$, respectively. Now

$\bar{σ} ({(Υ D)}^{1 / 2}) = \max_{x \neq 0} \frac{‖ {(Υ D)}^{1 / 2} x ‖}{‖ x ‖} = \max_{x \neq 0} \frac{‖ Υ^{1 / 2} x ‖}{‖ D^{- 1 / 2} x ‖} \geq \frac{‖ Υ^{1 / 2} x_{0} ‖}{‖ D^{- 1 / 2} x_{0} ‖},$

for any $x_{0} \neq 0$ . Similarly,

$\underline{σ} ({(Υ D)}^{1 / 2}) \leq \frac{‖ Υ^{1 / 2} y_{0} ‖}{‖ D^{- 1 / 2} y_{0} ‖},$

for any $y_{0} \neq 0$ . Let $x_{0}$ and $y_{0}$ have unit norm and be such that $D^{- 1 / 2} x_{0} = \underline{σ} (D^{- 1 / 2}) x_{0}$ and $Υ^{1 / 2} y_{0} = \underline{σ} (Υ^{1 / 2}) y_{0}$ . Then,

$κ ({(Υ D)}^{1 / 2}) = \frac{\bar{σ} ({(Υ D)}^{1 / 2})}{\underline{σ} ({(Υ D)}^{1 / 2})} \geq \frac{‖ D^{- 1 / 2} y_{0} ‖}{\underline{σ} (D^{- 1 / 2})} \frac{‖ Υ^{1 / 2} x_{0} ‖}{\underline{σ} (Υ^{1 / 2})} \geq \frac{‖ Υ^{1 / 2} x_{0} ‖}{\underline{σ} (Υ^{1 / 2})} .$

Now, since $D^{- 1 / 2}$ is block diagonal, $x_{0}$ can be chosen so that its nonzero components correspond to only one block of $D^{- 1 / 2}$ . Let $x_{0, b}$ denote the entries of $x_{0}$ in that block, and $Υ_{b}^{1 / 2}$ denote the columns of $Υ^{1 / 2}$ corresponding to the same block. Then,

${‖ Υ^{1 / 2} x_{0} ‖}^{2} = | {(Υ^{1 / 2} x_{0})}^{*} Υ^{1 / 2} x_{0} | = | x_{0, b}^{*} Υ_{b}^{1 / 2} Υ_{b}^{1 / 2} x_{0, b} | = | x_{0, b}^{*} x_{0, b} | = {‖ x_{0} ‖}^{2} = 1 .$

Then,

$κ ({(Υ D)}^{1 / 2}) \geq \frac{1}{\underline{σ} (Υ^{1 / 2})} = \frac{κ (Υ^{1 / 2})}{\bar{σ} (Υ^{1 / 2})} .$

Let $Υ = {[Υ_{i, j}]}_{i, j = 1, \dots, I}$ be the block partition of $Υ$. From Lemma 17, $‖ Υ_{i, j} ‖ \leq 1$, for all $i, j = 1, \dots, I$. Then, from Lemmas 16 and 17,

$\bar{σ} (Υ) \leq \max_{i} \sum_{j = 1}^{I} ‖ Υ_{i, j} ‖ \leq \max_{i} | B_{i} | = β .$

Hence, from (A.3),

$κ ({\tilde{Π}}^{1 / 2} Ψ {\tilde{Π}}^{1 / 2}) \geq \frac{κ (Υ)}{\bar{σ} (Υ)} \geq \frac{κ (Υ)}{β} .$

The result follows since the inequality above holds for any $\tilde{Π}$ .

Appendix B. Proofs of Section 4

Proof of Lemma 12

From Kailath, Sayed, and Hassibi (2000, A.1(v)), we have that $Ψ_{a, a} - Ψ_{a, b} {\overset{ˇ}{Σ}}_{b} Ψ_{b, a}$ and $Ψ_{b, b} - Ψ_{b, a} Ψ_{a, a}^{- 1} Ψ_{a, b}$ are invertible, and

$Ψ^{- 1} = [\begin{matrix} Σ_{a} & - Σ_{a} Ψ_{a, b} Ψ_{b, b}^{- 1} \\ - Σ_{b} Ψ_{b, a} Ψ_{a, a}^{- 1} & Σ_{b} \end{matrix}]$

with $Σ_{a} = {(Ψ_{a, a} - Ψ_{a, b} Ψ_{b, b}^{- 1} Ψ_{b, a})}^{- 1}$ and $Σ_{b} = {(Ψ_{b, b} - Ψ_{b, a} Ψ_{a, a}^{- 1} Ψ_{a, b})}^{- 1}$ . The result then follows from (3).
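The block-inverse formula above can be checked numerically in the simplest case where every block is a scalar. The sketch below is only a sanity check under that assumption; the function name and the values of $Ψ$ are hypothetical.

```python
def block_inverse_2x2(p_aa, p_ab, p_ba, p_bb):
    """Inverse of [[p_aa, p_ab], [p_ba, p_bb]] via Schur complements (scalar blocks)."""
    sigma_a = 1.0 / (p_aa - p_ab * p_ba / p_bb)   # Sigma_a
    sigma_b = 1.0 / (p_bb - p_ba * p_ab / p_aa)   # Sigma_b
    return [[sigma_a, -sigma_a * p_ab / p_bb],
            [-sigma_b * p_ba / p_aa, sigma_b]]


# Hypothetical symmetric positive-definite example.
Psi = [[4.0, 1.0], [1.0, 3.0]]
Psi_inv = block_inverse_2x2(4.0, 1.0, 1.0, 3.0)
product = [[sum(Psi[r][k] * Psi_inv[k][c] for k in range(2)) for c in range(2)]
           for r in range(2)]
```

For matrix-valued blocks the same expressions hold with the divisions replaced by multiplications with the corresponding inverses, in the order written in the formula.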

Proof of Lemma 13

Follows immediately by applying Lemma 12 with $x_{a} = x_{1}$ and $x_{b}^{T} = [x_{2}, \dots, x_{I}]$ .

Before proving Theorem 15, we introduce some notation.

Notation 18

For each $i \in I = \{1, \dots, I\}$, and $j \in B_{i}$, let $M_{i} (0) = M_{j, i} (0) = \{i\}$. Then, for each $t \in N$, define recursively the following two sequences of sets

$M_{i} (t) = ⋃_{k \in M_{i} (t - 1)} B_{k},$

$M_{j, i} (t) = ⋃_{k \in M_{j, i} (t - 1)} B_{k} ∖ \{j\} .$

I.e., $M_{i} (t)$ is the set of indexes of nodes which are at most $t$ edges away from Node $i$, and $M_{j, i} (t)$ is the set resulting after removing from $M_{i} (t)$ the indexes of those nodes which are linked to Node $i$ through Node $j$. For each $t \in N_{0} = N \cup \{0\}$ and $i \in I$, let $ξ_{i}^{T} (t) = [x_{k}^{T} : k \in M_{i} (t)]$, $ξ_{i, j}^{T} (t) = [x_{k}^{T} : k \in M_{i, j} (t)]$, $Ω_{i} (t) = {[A_{k, l}]}_{k \in I, l \in M_{i} (t)}$ and $Ω_{i, j} (t) = {[A_{k, l}]}_{k \in I, l \in M_{i, j} (t)}$. Also, let $({\hat{ξ}}_{i} (t), Ξ_{i} (t))$ be the WLS solution of the reduced system

$y = Ω_{i} (t) ξ_{i} (t) + v,$ (B.1)

i.e.,

${\hat{ξ}}_{i} (t) = Ξ_{i} (t) Ω_{i}^{*} (t) R^{- 1} y$

$Ξ_{i} (t) = {(Ω_{i}^{*} (t) R^{- 1} Ω_{i} (t))}^{- 1}$

and $({\hat{ξ}}_{i, j} (t), Ξ_{i, j} (t))$ be the WLS solution of

$y = Ω_{i, j} (t) ξ_{i, j} (t) + v .$

Proof of Theorem 15

Suppose that, at time $t \in N$, and for each $i \in I$ and $j \in B_{i} ∖ \{i\}$, Node $i$ is able to compute the components $({\overset{ˇ}{x}}_{i} (t), {\overset{ˇ}{Σ}}_{i} (t))$, corresponding to the state $x_{i}$, of the solution $({\hat{ξ}}_{i} (t), Ξ_{i} (t))$, and the components $({\overset{ˇ}{x}}_{i, j} (t), {\overset{ˇ}{Σ}}_{i, j} (t))$, corresponding to the same state, of $({\hat{ξ}}_{i, j} (t), Ξ_{i, j} (t))$. Since the network is acyclic, for each $i \in I$ and each $t \in N$, we have

$M_{i} (t + 1) = {i} \cup ⋃_{j \in B_{i} ∖ {i}} M_{i, j} (t) .$

Then, given that Node $i$ receives $γ_{i, j} (t)$ and $Γ_{i, j} (t)$ from each $j \in B_{i} ∖ \{i\}$, $| B_{i} | - 1$ applications of Lemma 13 (where $| S |$ denotes the number of elements in the set $S$) give that $Ψ_{i, i} - \sum_{k \in B_{i} ∖ \{i\}} Γ_{k, i} (t - 1)$ is invertible, and Node $i$ is able to compute $({\overset{ˇ}{x}}_{i} (t + 1), {\overset{ˇ}{Σ}}_{i} (t + 1))$. Also, $| B_{i} | - 2$ applications of Lemma 13 give that, for each $j \in B_{i} ∖ \{i\}$, $Ψ_{i, i} - \sum_{k \in B_{i} ∖ \{i, j\}} Γ_{k, i} (t - 1)$ is invertible and Node $i$ can compute $({\overset{ˇ}{x}}_{i, j} (t + 1), {\overset{ˇ}{Σ}}_{i, j} (t + 1))$. Then, the result follows after initializing the induction above using $({\overset{ˇ}{x}}_{i} (0), {\overset{ˇ}{Σ}}_{i} (0))$, at each $i \in I$, for which no information exchange is required.

At each $t \in N_{0}$ and $i \in I$ , $({\overset{ˇ}{x}}_{i} (t), {\overset{ˇ}{Σ}}_{i} (t))$ is the WLS solution of the sub-system (B.1). Since (B.1) is obtained by considering only the nodes in $M_{i} (t)$ , and $M_{i} (t) = I$ , for all $t \geq ρ_{i}$ , (22) follows.
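The last step of the proof relies on the sets $M_{i} (t)$ covering the whole network once $t$ reaches the radius $ρ_{i}$. The sketch below traces this growth on a hypothetical five-node tree, assuming that each neighborhood $B_{k}$ contains $k$ itself; the function name and the example are illustrative only.

```python
def reach_sets(B, i, steps):
    """Return [M_i(0), ..., M_i(steps)] with M_i(0) = {i} and
    M_i(t) = union of B_k over k in M_i(t - 1)."""
    M = [{i}]
    for _ in range(steps):
        M.append(set().union(*(B[k] for k in M[-1])))
    return M


# Hypothetical tree with edges 1-2, 2-3, 2-4, 4-5; each B_k includes k.
B = {1: {1, 2}, 2: {1, 2, 3, 4}, 3: {2, 3}, 4: {2, 4, 5}, 5: {4, 5}}
all_nodes = set(B)
M1 = reach_sets(B, 1, 3)   # Node 1 has radius 3
M2 = reach_sets(B, 2, 2)   # Node 2 has radius 2
```

In this example Node 2 covers the network after two steps and Node 1 after three, matching their radii.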

References

Bertsekas Dimitri P., Tsitsiklis John N. 1997. Parallel and distributed computation: numerical methods.
Carli R., Chiuso A., Schenato L., Zampieri S. Distributed Kalman filtering based on consensus strategies. IEEE Journal on Selected Areas in Communications. 2008;26(4):622–633.
Christie, Rich (1993). 118 bus power flow test case. http://www.ee.washington.edu/research/pstca/pf118/pg_tca118bus.htm.
Conejo Antonio J., de la Torre Sebastian, Canas Miguel. An optimization approach to multiarea state estimation. IEEE Transactions on Power Systems. 2007;22(1):213–221.
Dargie Waltenegus, Poellabauer Christian. 1st ed. Wiley; 2010. Fundamentals of wireless sensor networks: theory and practice; p. 8.
Demmel James. The condition number of equivalence transformations that block diagonalize matrix pencils. SIAM Journal on Numerical Analysis. 1983;20(3):599–610.
Diao, Yingfei, Fu, Minyue, & Zhang, Huanshui (2013). Localizability and distributed localization of sensor networks using relative position measurements. In IFAC symposium on large scale systems.
Fang Jun, Li Hongbin. Joint dimension assignment and compression for distributed multisensor estimation. IEEE Signal Processing Letters. 2008;15:174–177.
Gómez-Expósito Antonio, Villa Jaén Antonio de la, Gómez-Quiles Catalina, Rousseaux Patricia, Van Cutsem Thierry. A taxonomy of multi-area state estimation methods. Electric Power Systems Research. 2011;81(4):1060–1069.
Hlinka Ondrej, Sluciak Ondrej, Hlawatsch Franz, Djuric Petar M., Rupp Markus. Likelihood consensus and its application to distributed particle filtering. IEEE Transactions on Signal Processing. 2012;60(8):4334–4349.
Huang Yih-Fang, Werner Stefan, Huang Jing, Kashyap Neelabh, Gupta Vijay. State estimation in electric power grids: meeting new challenges presented by the requirements of the future grid. IEEE Signal Processing Magazine. 2012;29(5):33–43.
Jiang Weiqing, Vittal Vijay, Heydt Gerald T. Diakoptic state estimation using phasor measurement units. IEEE Transactions on Power Systems. 2008;23(4):1580–1589.
Kailath Thomas, Sayed Ali H., Hassibi Babak. Prentice Hall; 2000. Linear estimation.
Kar Soummya, Moura José M.F., Ramanan Kavita. Distributed parameter estimation in sensor networks: nonlinear observation models and imperfect communication. IEEE Transactions on Information Theory. 2012;58(6):3575–3605.
Kay Steven. 1st ed. Prentice Hall; 1993. Fundamentals of statistical signal processing, volume I: estimation theory. Vol. 1; p. 4.
Khan Usman A., Kar Soummya, Moura José M.F. Distributed sensor localization in random environments using minimal number of anchor nodes. IEEE Transactions on Signal Processing. 2009;57(5):2000–2016.
Khan Usman A., Moura José M.F. Distributing the Kalman filter for large-scale systems. IEEE Transactions on Signal Processing. 2008;56(10):4919–4935.
Krishnamoorthy, Aravindh, & Menon, Deepak (2011). Matrix inversion using Cholesky decomposition. arXiv Preprint arXiv:1111.4144.
Li Junlin, AlRegib Ghassan. Rate-constrained distributed estimation in wireless sensor networks. IEEE Transactions on Signal Processing. 2007;55(5):1634–1643.
Li Junlin, AlRegib Ghassan. Distributed estimation in energy-constrained wireless sensor networks. IEEE Transactions on Signal Processing. 2009;57(10):3746–3758.
Lopes Cassio G., Sayed Ali H. Diffusion least-mean squares over adaptive networks: formulation and performance analysis. IEEE Transactions on Signal Processing. 2008;56(7):3122–3136.
Marelli, Damián, & Fu, Minyue (2013). Distributed weighted least squares estimation with fast convergence in large-scale systems. In IEEE conference on decision and control (CDC) (pp. 5432–5437).
Olfati-Saber R., Murray R.M. Consensus problems in networks of agents with switching topology and time-delays. IEEE Transactions on Automatic Control. 2004;49(9):1520–1533.
Ribeiro Alejandro, Giannakis Georgios B. Bandwidth-constrained distributed estimation for wireless sensor networks—part I: Gaussian case. IEEE Transactions on Signal Processing. 2006;54(3):1131–1143.
Ribeiro Alejandro, Giannakis Georgios B. Bandwidth-constrained distributed estimation for wireless sensor networks—part II: unknown probability density function. IEEE Transactions on Signal Processing. 2006;54(7):2784–2796.
Ribeiro Alejandro, Schizas I., Roumeliotis S., Giannakis G. Kalman filtering in wireless sensor networks. IEEE Control Systems. 2010;30(2):66–86.
Tai Xin, Marelli Damián, Rohr Eduardo, Fu Minyue. Optimal PMU placement for power system state estimation with random component outages. International Journal of Electrical Power & Energy Systems. 2013;51:35–42.
Xiao Jin-Jun, Ribeiro Alejandro, Luo Zhi-Quan, Giannakis Georgios B. Distributed compression-estimation using wireless sensor networks. IEEE Signal Processing Magazine. 2006;23(4):27–41.
Yang Peng, Freeman Randy A., Gordon Geoffrey J., Lynch Kevin M., Srinivasa Siddhartha S., Sukthankar Rahul. Decentralized estimation and control of graph connectivity for mobile sensor networks. Automatica. 2010;46(2):390–396.
