PLOS One. 2020 Aug 26;15(8):e0238145. doi: 10.1371/journal.pone.0238145

On the effects of hard and soft equality constraints in the iterative outlier elimination procedure

Vinicius Francisco Rofatto 1,2,*, Marcelo Tomio Matsuoka 1,2,3,6, Ivandro Klein 4,5, Maurício Roberto Veronez 6, Luiz Gonzaga da Silveira Junior 6
Editor: Qichun Zhang 7
PMCID: PMC7449505  PMID: 32845919

Abstract

Reliability analysis allows for the estimation of a system’s probability of detecting and identifying outliers. Failure to identify an outlier can jeopardize the reliability level of a system. Due to its importance, outliers must be appropriately treated to ensure the normal operation of a system. System models are usually developed from certain constraints. Constraints play a central role in model precision and validity. In this work, we present a detailed investigation of the effects of the hard and soft constraints on the reliability of a measurement system model. Hard constraints represent a case in which there exist known functional relations between the unknown model parameters, whereas the soft constraints are employed where such functional relations can be slightly violated depending on their uncertainty. The results highlighted that the success rate of identifying an outlier for the case of hard constraints is larger than soft constraints. This suggested that hard constraints be used in the stage of pre-processing data for the purpose of identifying and removing possible outlying measurements. After identifying and removing possible outliers, one should set up the soft constraints to propagate their uncertainties to the model parameters during the data processing.

Introduction

It is very common to build models (i.e., the equation systems) based on some initial knowledge about a given problem. In other words, models are often set up in a way that the model parameters need to fulfill certain constraints. Such constraints are a priori knowledge embedded into a model to avoid a trivial solution; to guarantee the stability of estimates; to improve the precision and accuracy of the results by reducing the number of unknown parameters, or accordingly, by increasing the redundancy of the system; and to mitigate (or even estimate) a possible systematic effect [1, 2].

The models are usually formulated with minimal constraints or extra (redundant) constraints. In that case, we refer to the so-called equality constraints, which are usually incorporated into a system of equations to create a well-posed model [3]. For the most part, minimal constraints are introduced to solve the problem of rank deficiency in linear (or linearized) systems. The rank deficiency is often caused by a lack of (or insufficient) information about a problem. In the field of geodesy, for example, minimal constraints are external information whose primary role is to specify the coordinate system in which the network station positions will be estimated by the least-squares method (LS). This problem is known as datum definition (or also zero-order design or datum choice problem) [4–9]. Several works have investigated the minimum-constrained adjustment and the datum choice problem in the geodetic literature, focusing on topics like free-adjustment and the role of inner constraints [10–13].

If the number of constraints exceeds the minimum needed to solve the rank deficiency of the equation systems, we say that we have redundant (or extra) constraints. Extra constraints are also used to check the stability of points in geodetic deformation analysis [14–16] and to test the compatibility of constraints with the observations and the rest of the constraints [17–19].

So far we have only distinguished the constraints in terms of numerical quantity. The model can also be subject to hard and soft (or weighted) constraints. Hard constraints represent a case in which there exist known functional relations between the unknown parameters. Soft constraints (or looser constraints), by contrast, apply when such functional relations can be slightly violated depending on their uncertainty [2, 19]. Soft constraints may also be referred to as a pseudo-observation model [20].

The well-known least-squares (LS) method is widely used as a standard method of estimating model parameters in geodetic applications and many other branches of modern science [21–41]. This is due to the flexibility of the LS, since no concepts from probability theory are used in formulating the least-squares minimization problem.

LS is a linear unbiased estimator (LUE), and in some special cases, it coincides with the best linear unbiased estimator (BLUE). The estimator that has the smallest variance of all LUEs is called the best linear unbiased estimator (BLUE). If we have full knowledge of the probability density function (PDF) of the measurements, the method of maximum likelihood estimation (MLE) can also be applied. In case of normally distributed measurements (Gauss–Markov model), the MLE estimators are identical to the BLUE ones, and therefore the LS and MLE principles provide identical results [24, 42]; however, the presence of undesirable outliers in the dataset makes LS no longer unbiased and not coincide with MLE [43].

Here, we assume that an outlier is an observation that has deviated from its most probable value to the point of jeopardizing the mathematical model (functional and stochastic) to which it should belong. Due to their importance, outliers must be appropriately treated to ensure the quality of data analysis [44–50].

In this study, we employed iterative data snooping (IDS), which is a hypothesis test-based outlier elimination procedure. It is important to mention that IDS is not restricted to the field of geodetic statistics, but is a generally applicable method [51, 52]. IDS is iterative and combines estimation, testing and a corrective action [44, 53]. Parameter estimation is often conducted using LS. Then, hypothesis testing is performed with the aim of identifying any outlier that may be present in the dataset. After identification, the suspected outlier is excluded from the dataset as a corrective action (i.e., adaptation), and the LS is restarted without the rejected measurement. If the model redundancy permits, this procedure is repeated until no more (possible) outliers can be identified (see e.g., [23], p. 135). Although in this study we restricted ourselves to the case of one outlier at a time, IDS can also be applied for cases containing multiple (simultaneous) outliers [54]. For more details about multiple (simultaneous) outliers, the reader is referred to [55–57]. Because IDS is based on statistical hypothesis testing, there are chances of both correct and incorrect decisions. Recently, Rofatto et al. [44] provided an algorithm based on Monte Carlo to determine the probability levels associated with IDS. In that case, they described six classes of decisions for IDS, namely probability of correct identification (PCI), probability of missed detection (PMD), probability of wrong exclusion (PWE), probability of over-identification positive (Pover+), probability of over-identification negative (Pover-) and statistical overlap (Pol), as follows:

  • PCI: Probability of identifying and removing correctly an outlying measurement;

  • PMD: Probability of not detecting the outlier (i.e., Type II decision error for IDS);

  • PWE: Probability of identifying and removing a non-outlying measurement while the ‘true’ outlier remains in the dataset (i.e., Type III decision error [58] for IDS);

  • Pover+: Probability of identifying and removing correctly the outlying measurement and others;

  • Pover-: Probability of identifying and removing more than one non-outlying measurement, whereas the ‘true outlier’ remains in the dataset;

  • Pol: occurs in cases where one alternative hypothesis has the same distribution as another one. These hypotheses cannot be distinguished because their test statistics are numerically the same, violating the IDS rule of one outlier at a time. In that case, they are non-separable and an outlier cannot be identified. In other words, it corresponds to the probability of simultaneously flagging two (or more) measurements as outliers.
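The six decision classes above can be read as a rule that maps the outcome of one simulated IDS run to a label. The following is a minimal sketch of that mapping, not the authors' implementation: the function names, the `overlap` flag and the list of example runs are all hypothetical and introduced only for illustration.

```python
from collections import Counter

CLASSES = ("PCI", "PMD", "PWE", "Pover+", "Pover-", "Pol")

def classify_ids_run(removed, true_outlier, overlap=False):
    """Map one IDS run to a decision class.

    removed      -- indices of the measurements excluded by IDS in this run
    true_outlier -- index of the measurement that was actually contaminated
    overlap      -- True if two or more test statistics tied for the maximum
    """
    if overlap:
        return "Pol"                      # non-separable hypotheses
    removed = set(removed)
    if not removed:
        return "PMD"                      # nothing flagged: missed detection
    if true_outlier in removed:
        return "PCI" if len(removed) == 1 else "Pover+"
    return "PWE" if len(removed) == 1 else "Pover-"

def decision_rates(runs, true_outlier):
    """Relative frequency of each class over a list of (removed, overlap) runs."""
    counts = Counter(classify_ids_run(r, true_outlier, o) for r, o in runs)
    return {c: counts[c] / len(runs) for c in CLASSES}

# Hypothetical outcomes of eight runs with the outlier placed in measurement 0.
example_runs = [([0], False), ([], False), ([3], False), ([0, 3], False),
                ([3, 4], False), ([0], False), ([0, 5], True), ([0], False)]
rates = decision_rates(example_runs, true_outlier=0)
print(rates)
```

In a Monte Carlo setting, the relative frequencies returned by `decision_rates` would play the role of the estimated probability levels for a given measurement and outlier magnitude.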

Based on the probabilities of correct detection (PCD=1-PMD) and correct identification (PCI), the minimal biases, MDB (minimal detectable bias) and MIB (minimal identifiable bias), can be computed as sensitivity indicators for outlier detection and identification, respectively. “Outlier Detection” only informs whether or not there might have been at least one outlier; however, the detection does not tell us which measurement is an outlier. The localization of the outlier is a problem of “outlier identification”, i.e., “Outlier Identification” implies the execution of a search among the measurements for the most likely outlier [44]; therefore, the smallest value of an outlier that can be detected, given a certain PCD, defines the MDB. On the other hand, the smallest value of an outlier that can be identified, given a certain PCI, defines the MIB.
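As an illustration of how the MDB and MIB follow from the success-rate curves, the sketch below scans a grid of outlier magnitudes and returns the smallest one whose success rate reaches the required level. The function name and the numerical curves are hypothetical; they are not results from this study, and the sketch assumes the success rate does not decrease with the outlier magnitude.

```python
def minimal_bias(magnitudes, success_rates, target=0.8):
    """Smallest outlier magnitude whose success rate reaches `target`.

    Returns None when the target is never reached on the grid, which
    corresponds to an entry reported as '-' (no MIB or MDB available).
    """
    for magnitude, rate in sorted(zip(magnitudes, success_rates)):
        if rate >= target:
            return magnitude
    return None

# Hypothetical curves, in units of sigma, for a single measurement.
grid = [5.0, 5.5, 6.0, 6.5, 7.0]
pcd_curve = [0.45, 0.62, 0.75, 0.84, 0.91]   # probability of correct detection
pci_curve = [0.00, 0.00, 0.00, 0.00, 0.00]   # identification never succeeds

mdb = minimal_bias(grid, pcd_curve)   # detection threshold reached on the grid
mib = minimal_bias(grid, pci_curve)   # None: no identifiable bias on this grid
print(mdb, mib)
```

Passing PCD values yields an MDB and passing PCI values yields an MIB; the grid resolution bounds how precisely either can be located.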

In this study, we investigated the effects of models subject to constraints (minimum, redundant, hard and soft) on the probability levels associated with IDS. It is important to emphasize that if the standard deviation of a constraint (or of a set of constraints) is changed from zero to a non-zero value, it is called a “relaxation” of the constraint [20].

We also evaluated the effect of relaxing constraints on the MIB and MDB. This assessment is a form of sensitivity analysis. We also highlight that the task of clustering a set of geodetic measurements was applied for the first time in this study. We intend to show that the clusters can be defined according to two deterministic parameters: local redundancy and correlation between the outlier test statistics.

Critical values optimized by the Monte Carlo method were used here [44, 51] in order to compute the decision classes associated with IDS, i.e., PCI, PMD, PWE, Pover+, Pover- and Pol.

Material and methods

We used the procedure provided by Rofatto et al. [44] to compute the probability levels associated with IDS, as well as to estimate both the MDB and the MIB. The procedure is summarized in Fig 1.

Fig 1. Flowchart of the algorithm.

Flowchart of the algorithm to compute the probability levels of Iterative Data Snooping (IDS) for each measurement in the presence of an outlier [44].

The probability levels associated with IDS were computed for each observation individually and for each outlier magnitude; however, they were grouped into clusters based on the local redundancy (ri) and the maximum absolute correlation between the outlier test statistics (ρwi,wj). Furthermore, we took care to control the family-wise error rate. See S1 Appendix in the Supporting Information for more details.

Problem description

To analyze the effects of the constraints on the IDS, an example was taken from a geodetic leveling network with 12 height differences between the points. The equipment used to measure the level difference was an electronic digital level. In that case, the leveling measurement system comprises a special bar-coded staff (also called barcode rod) and a digital level (instrument). A digital level is basically a telescope that enables a horizontal line of sight. Digital levels contain additional electronic image processing components to automatically read and analyze digital (bar-coded) leveling staffs, where the graduation is replaced by a manufacturer-dependent code pattern. Generally, the result is automatically stored in the data collector of the digital level. An example of a “digital level–bar-code staff” system is displayed in Fig 2. For more details about digital levels, see e.g., [59–61].

Fig 2. Digital level—Bar-code staff system.

Example of a digital level—bar-code staff system [44].

The standard deviations of the uncorrelated measurements were the same and taken equal to σ = 1mm. The points are indicated as A to G. The eight network configurations are displayed in Fig 3a–3e and their details are given as follows:

Fig 3. Different constraint scenarios.

Leveling geodetic network subject to different constraint scenarios.

  1. Fig 3a: Network with 1 hard constraint (i.e., network minimally constrained). Since the dimension of the network is 1D, the minimum information necessary to estimate the unknown heights is one. The height of G was fixed as a control point (hard constraint), and 6 unknown heights (A,B,C,D,E,F) were minimally constrained; therefore, the redundancy of the system (or overall degrees of freedom) was r = n − rank(A) = n − u = 12 − 6 = 6.

  2. Fig 3b: Network with 1 extra hard constraint (i.e., two hard constraints). The heights A and D were taken as hard constraints (i.e., heights A and D were fixed). The redundancy of the system in that case was r = 12 − 5 = 7 with 5 unknown heights (B,C,E,F,G) over-constrained.

  3. Fig 3c: Network with 2 extra hard constraints (i.e., three hard constraints). The heights A, D and G were taken as hard constraints. In that case, the redundancy of the system was r = 12 − 4 = 8.

  4. Fig 3d: Network with 2 soft constraints (A and D). In that case, a standard deviation larger than zero was assigned to both constraints i.e., σc > 0. In other words, A and D were processed as being both observations and unknown parameters, i.e., A and D were pseudo-observations. Both constraints were simultaneously relaxed by considering their uncertainties 10 times worse than the measurements (i.e., σc = 10 × σ = 10mm); 10 times better than measurements (i.e., σc = 0.1mm); their uncertainties equal to the measurements (σc = 1mm). In that case, the redundancy of the system was r = 14 − 7 = 7.

  5. Fig 3e: Network processed with A, D and G as pseudo-observations. Those three constraints were simultaneously relaxed by considering their standard deviations equal to σc = 10mm (10 times worse than measurements); σc = 0.1mm (10 times better than measurements); σc = 1mm (the same as the measurements). In that case, the redundancy of the system was r = 15 − 7 = 8.
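The redundancy values quoted in the five scenarios follow from simple counting: each hard constraint removes one unknown, while each soft constraint adds one pseudo-observation and keeps the unknown. A minimal sketch of that bookkeeping is given below; the function name and the dictionary of scenarios are illustrative, and the sketch assumes the constrained design matrix has full column rank.

```python
def redundancy(n_measurements, n_points, n_hard=0, n_soft=0):
    """Overall redundancy r = n - u for a constrained leveling network."""
    n = n_measurements + n_soft   # pseudo-observations add rows
    u = n_points - n_hard         # fixed heights are no longer unknowns
    return n - u

# The leveling network of this section: 12 height differences, 7 points (A to G).
scenarios = {
    "Fig 3a": dict(n_hard=1),
    "Fig 3b": dict(n_hard=2),
    "Fig 3c": dict(n_hard=3),
    "Fig 3d": dict(n_soft=2),
    "Fig 3e": dict(n_soft=3),
}
r_values = {name: redundancy(12, 7, **kw) for name, kw in scenarios.items()}
print(r_values)
```

The counts show why two soft constraints give the same overall redundancy as two hard constraints (r = 7), even though the local redundancy of each measurement differs between the two cases.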

The following system of equations for that problem is given by:

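$$
\begin{aligned}
y_1 + e_1 &= h_B - h_A \\
y_2 + e_2 &= h_C - h_B \\
y_3 + e_3 &= h_D - h_C \\
&\;\;\vdots \\
y_7 + e_7 &= h_B - h_G \\
y_8 + e_8 &= h_C - h_G \\
&\;\;\vdots \\
y_{11} + e_{11} &= h_B - h_F \\
y_{12} + e_{12} &= h_C - h_E
\end{aligned}
\tag{1}
$$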

The design matrix (A) for the system of equations in 1 is given by:

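$$
A =
\begin{bmatrix}
-1 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & -1 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & -1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & -1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & -1 & 1 & 0 \\
1 & 0 & 0 & 0 & 0 & -1 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & -1 \\
0 & 0 & 1 & 0 & 0 & 0 & -1 \\
0 & 0 & 0 & 0 & 1 & 0 & -1 \\
0 & 0 & 0 & 0 & 0 & 1 & -1 \\
0 & 1 & 0 & 0 & 0 & -1 & 0 \\
0 & 0 & 1 & 0 & -1 & 0 & 0
\end{bmatrix}
\tag{2}
$$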

Note that the rank defect of the matrix A is u − rank(A) = 7 − 6 = 1. In that case, at least one constraint is needed in order to avoid the rank deficiency of the matrix A. This is guaranteed when one height is known. For example, for the network in Fig 3a, we have added the height G as known (i.e., as a hard constraint). In that case, the constraint equation should be added into the system in 1, i.e.,

$$
y_{13} = h_G \quad \text{with} \quad \sigma_{y_{13}} = 0,
\tag{3}
$$

noticing that because the standard deviation is zero, the observation is non-stochastic (hard constraint) and the residual ey13 = 0. This can generate problems in the inversion of the covariance matrix of the observations Qe for the calculation of the weight matrix W, because the weight for that constraint would be undefined, i.e., 1/0. In order to avoid that problem, we have eliminated the rank deficiency of matrix A by removing the seventh column of matrix A in 2, associated with the height G. Now, we have u − rank(A) = 6 − 6 = 0. The constraint defines the geodetic datum, i.e., the S-system [62]. Another approach to solving the system of equations in 1 could be based on generalized (pseudo) inverses [63].

The location of the constraints can be chosen in some circumstances, for example, during the design stage of a geodetic network. For the special case of having a minimally constrained system, the location of the constraint will not influence the w-test statistics and the sensitivity indicators (MIB and MDB) [9]; however, more constraints than the minimum necessary to have a solution (i.e., extra constraints or redundant constraints) can change the least-squares residuals and hence w-test statistics and the minimal biases.

From the network with one extra constraint (2 constraints) in Fig 3b, for example, both the first (height A) and fourth column (height D) of matrix A in 2 were eliminated in the case of having the two heights as hard constraints. For the case where these two heights (A and D) were taken as soft constraints, however, two observation equations were added to Eq 1, i.e.,

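$$
\begin{aligned}
y_{13} + e_{13} &= h_A, \quad \sigma_{y_{13}} > 0 \\
y_{14} + e_{14} &= h_D, \quad \sigma_{y_{14}} > 0
\end{aligned}
\tag{4}
$$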

In the case of soft constraints in Eq 4, two rows were added to matrix A. In other words, A and D were taken as pseudo-observations. In that case, the rank deficiency was also null (i.e., u − rank(A) = 7 − 7 = 0), the redundancy of the system was r = n − rank(A) = n − u = 14 − 7 = 7 and the matrix A was given as follows:

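$$
A =
\begin{bmatrix}
-1 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & -1 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & -1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & -1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & -1 & 1 & 0 \\
1 & 0 & 0 & 0 & 0 & -1 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & -1 \\
0 & 0 & 1 & 0 & 0 & 0 & -1 \\
0 & 0 & 0 & 0 & 1 & 0 & -1 \\
0 & 0 & 0 & 0 & 0 & 1 & -1 \\
0 & 1 & 0 & 0 & 0 & -1 & 0 \\
0 & 0 & 1 & 0 & -1 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0
\end{bmatrix}
\tag{5}
$$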

For this example of two soft constraints, and by considering both soft constraints with standard deviation σc = 10mm, the symmetric and positive semi-definite covariance matrix of the observations (Qe) was given as follows:

$$
Q_e = \operatorname{diag}\bigl(\underbrace{1, \ldots, 1}_{12},\, 100,\, 100\bigr)\ \text{mm}^2
\tag{6}
$$

The last two rows and columns of the matrix Qe in 6 refer to the variances (σc² = (10mm)² = 100mm²) of the height constraints A and D, respectively. Similarly, matrices A and Qe were constructed for the other cases studied here.
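To make the construction of A and Qe concrete, the sketch below builds the design matrix from the observation equations in Eq 1 and Eq 2, applies either a hard constraint (dropping the column of the fixed height) or soft constraints (appending pseudo-observation rows), and evaluates the local redundancy numbers as the diagonal of R = I − A(AᵀWA)⁻¹AᵀW with W = Qe⁻¹. This is an illustrative sketch in plain Python, not the authors' software; the helper names are assumptions, and a diagonal Qe is assumed throughout.

```python
# (from, to) pairs for y1..y12: y_k + e_k = h_to - h_from, following Eq 2.
EDGES = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F"), ("F", "A"),
         ("G", "B"), ("G", "C"), ("G", "E"), ("G", "F"), ("F", "B"), ("E", "C")]

def design_matrix(unknowns, soft=""):
    """Rows for the 12 height differences, then one row per soft constraint.
    Heights missing from `unknowns` are treated as fixed (hard constraints)."""
    rows = [[(p == to) - (p == frm) for p in unknowns] for frm, to in EDGES]
    rows += [[int(p == s) for p in unknowns] for s in soft]
    return rows

def invert(m):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices)."""
    n = len(m)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("singular normal matrix (rank deficiency)")
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * c for v, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def local_redundancy(A, sigmas):
    """Diagonal of R = I - A (A^T W A)^-1 A^T W for a diagonal Qe."""
    n, u = len(A), len(A[0])
    w = [1.0 / s ** 2 for s in sigmas]            # weights from Qe
    N = [[sum(w[k] * A[k][i] * A[k][j] for k in range(n)) for j in range(u)]
         for i in range(u)]
    Ninv = invert(N)
    return [1.0 - w[k] * sum(A[k][i] * Ninv[i][j] * A[k][j]
                             for i in range(u) for j in range(u))
            for k in range(n)]

# One hard constraint (G fixed): column G is dropped, sigma = 1 mm.
r_hard = local_redundancy(design_matrix("ABCDEF"), [1.0] * 12)
# Two soft constraints (A and D) with sigma_c = 10 mm appended as rows 13 and 14.
r_soft = local_redundancy(design_matrix("ABCDEFG", soft="AD"),
                          [1.0] * 12 + [10.0] * 2)
print([round(r, 3) for r in r_hard], round(sum(r_hard), 6))
print([round(r, 3) for r in r_soft], round(sum(r_soft), 6))
```

The local redundancy numbers sum to the overall redundancy of each scenario (6 and 7 here), and the printed values can be compared with the ri columns reported in the tables of the results sections. Passing σc = 0 would trigger a division by zero in the weights, which is the numerical counterpart of the undefined weight of a hard constraint.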

Although the measurements are able to identify an outlier for the case of having only one single soft constraint, the pseudo-observation (constraint) is not. In that case, the defect configuration is associated with the additional parameter in the constraint (i.e., the presence of an outlier in the constraint). In other words, an additional parameter on the soft constraint will not be estimable. For example, if the height of point G were taken as a soft constraint, the presence of an outlier in pseudo-observation G would lead to rank deficiency of matrix A, i.e., u − rank(A) = 8 − 7 = 1; therefore, the case of having only one single soft constraint was not considered here.

Result of the hard constraint effects on the iterative outlier elimination procedure

The scenarios in Fig 3a (network minimally constrained), Fig 3b (two hard constraints) and Fig 3c (three hard constraints) were considered here for the analysis. Table 1 gives the local redundancy (ri), the standard deviation of the LS-estimated outlier σi and the maximum absolute correlation (maxρwi,wj) for each scenario of hard constraint set out in this study, i.e., Fig 3a–3c.

Table 1. Local redundancy (ri), standard deviation of the least-squares (LS)-estimated outlier σi and the maximum absolute correlation (maxρwi,wj) for each scenario of hard constraint.

1 hard constraint 2 hard constraints 3 hard constraints
Measurement ri σi maxρwi,wj ri σi maxρwi,wj ri σi maxρwi,wj
y1 0.396 1.589 1.00 0.583 1.309 0.36 0.708 1.188 0.41
y2 0.500 1.414 0.47 0.583 1.309 0.36 0.583 1.309 0.32
y3 0.396 1.589 1.00 0.583 1.309 0.36 0.708 1.188 0.41
y4 0.396 1.589 1.00 0.583 1.309 0.36 0.708 1.188 0.41
y5 0.500 1.414 0.47 0.583 1.309 0.36 0.583 1.309 0.32
y6 0.396 1.589 1.00 0.583 1.309 0.36 0.708 1.188 0.41
y7 0.563 1.333 0.47 0.583 1.309 0.36 0.708 1.188 0.41
y8 0.563 1.333 0.47 0.583 1.309 0.36 0.708 1.188 0.41
y9 0.563 1.333 0.47 0.583 1.309 0.36 0.708 1.188 0.41
y10 0.563 1.333 0.47 0.583 1.309 0.36 0.708 1.188 0.41
y11 0.583 1.309 0.43 0.583 1.309 0.36 0.583 1.309 0.32
y12 0.583 1.309 0.43 0.583 1.309 0.36 0.583 1.309 0.32

Next, the twelve leveling measurements were grouped into four clusters. The four clusters were defined as follows:

  • Cluster 1: y1, y3, y4 and y6.

  • Cluster 2: y2 and y5.

  • Cluster 3: y7, y8, y9 and y10.

  • Cluster 4: y11 and y12.

The probability levels associated with IDS were averaged for each of these clusters. The critical values were k^=3.89, k^=3.93 and k^=3.93 for one hard constraint, two hard constraints and three hard constraints, respectively. These critical values were found for α′ = 0.001. PCI and PCD are displayed in Fig 4 for each number of hard constraints (denoted by h.c.).

Fig 4. PCI and PCD for the case of hard constraints and for α′ = 0.001.

Cluster 1(a,b), Cluster 2(c,d), Cluster 3(e,f) and Cluster 4(g,h).

The outlier magnitudes were defined from |5σ| to |9σ|. The outlier of |5σ| was chosen because it is approximately the lowest MDB0(i) of the network when a single hypothesis test is in play (see S1 Appendix in the Supplementary Material for more details). That MDB0(i) of |5σ| was computed for a significance level of α′ = 0.001 and a power of the test γ0 = 0.8. This strategy reduces the search space for an MIB, because the inequality MIB ≥ MDB0(i) always holds [52, 64]. Remember that the IDS procedure is an example of multiple hypothesis testing. The success rates for outlier detection and outlier identification were both taken as P~CD=P~CI=0.8. Table 2 provides the values of MDB and MIB for the case of hard constraints.
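The choice of |5σ| as the lower end of the search range can be checked with the textbook single-test expression for the minimal detectable bias, MDB0(i) ≈ σ·(z₁₋α/₂ + z_γ)/√ri for uncorrelated measurements. The sketch below assumes that expression (a common approximation for the two-sided test, not necessarily the exact formulation of S1 Appendix) and uses a hypothetical function name.

```python
from math import sqrt
from statistics import NormalDist

def mdb0(r_i, sigma=1.0, alpha=0.001, power=0.8):
    """Approximate single-test minimal detectable bias for one measurement.

    r_i   -- local redundancy number of the measurement
    sigma -- standard deviation of the measurement
    alpha -- significance level of the two-sided test
    power -- required power of the test
    """
    z = NormalDist()
    # Square root of the non-centrality parameter (far tail neglected).
    delta0 = z.inv_cdf(1.0 - alpha / 2.0) + z.inv_cdf(power)
    return sigma * delta0 / sqrt(r_i)

# Largest local redundancy reported for three hard constraints (Table 1).
lowest_mdb0 = mdb0(0.708)
print(round(lowest_mdb0, 2))
```

With the largest local redundancy in Table 1 (ri = 0.708) the value comes out slightly below 5σ, in line with the statement that |5σ| is approximately the lowest MDB0(i) of the network; smaller ri values give larger MDB0(i).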

Table 2. MDB (minimal detectable bias) and MIB (minimal identifiable bias) for the case of hard constraints based on α′ = 0.001 and P~CD=P~CI=0.8.

1 hard constraint 2 hard constraints 3 hard constraints
Cluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)
1 7.5 - 6.3 6.3 5.7 5.7
2 6.7 6.8 6.3 6.4 6.3 6.4
3 6.4 6.4 6.3 6.3 5.8 5.8
4 6.4 6.4 6.4 6.4 6.4 6.4

Fig 5 shows the PWE. Pover+ and Pover- were smaller than 0.001 (i.e., they were practically null). There was no Pol for clusters 2, 3 and 4. We will discuss Pol further later.

Fig 5. PWE for the case of hard constraints and for α′ = 0.001.

Cluster 1(a), Cluster 2(b), Cluster 3(c) and Cluster 4(d).

Result of the soft constraint effects on the iterative outlier elimination procedure

Both configurations in Fig 3d and 3e were analyzed in terms of soft constraints. In that case, the critical values were k^=3.95, k^=3.95 and k^=3.92 for two soft constraints with σc = 0.1mm, σc = 1mm and σc = 10mm, respectively. In the case of three soft constraints, the critical values found were k^=3.99, k^=3.99 and k^=3.96 for σc = 0.1mm, σc = 1mm and σc = 10mm, respectively. All these critical values were computed for α′ = 0.001. Table 3 gives the local redundancy (ri), the standard deviation of the LS-estimated outlier σi and the maximum absolute correlation (maxρwi,wj) for the scenarios of two constraints.

Table 3. Local redundancy (ri), standard deviation of the LS-estimated outlier σi(mm) and the maximum absolute correlation (maxρwi,wj) for each scenario of two soft constraints.

σc = 0.1mm σc = 1mm σc = 10mm
Measurement ri σi maxρwi,wj ri σi maxρwi,wj ri σi maxρwi,wj
y1 0.581 1.312 0.564 0.471 1.457 0.681 0.397 1.587 0.994
y2 0.582 1.311 0.376 0.533 1.369 0.423 0.501 1.413 0.471
y3 0.581 1.312 0.564 0.471 1.457 0.681 0.397 1.587 0.994
y4 0.581 1.312 0.564 0.471 1.457 0.681 0.397 1.587 0.994
y5 0.582 1.311 0.376 0.533 1.369 0.423 0.501 1.413 0.471
y6 0.581 1.312 0.564 0.471 1.457 0.681 0.397 1.587 0.994
y7 0.583 1.310 0.359 0.571 1.324 0.423 0.563 1.333 0.471
y8 0.583 1.310 0.359 0.571 1.324 0.423 0.563 1.333 0.471
y9 0.583 1.310 0.359 0.571 1.324 0.423 0.563 1.333 0.471
y10 0.583 1.310 0.359 0.571 1.324 0.423 0.563 1.333 0.471
y11 0.583 1.309 0.358 0.583 1.309 0.398 0.583 1.309 0.433
y12 0.583 1.309 0.358 0.583 1.309 0.398 0.583 1.309 0.433
y13 0.007 1.163 1.000 0.300 1.826 1.000 0.497 14.189 1.000
y14 0.007 1.163 1.000 0.300 1.826 1.000 0.497 14.189 1.000

From Table 3, five clusters were defined for each case of two soft constraints, i.e., for the case where heights A and D were given as soft constraints in Fig 3d, as follows:

  • Cluster 1: y1, y3, y4 and y6.

  • Cluster 2: y2 and y5.

  • Cluster 3: y7, y8, y9 and y10.

  • Cluster 4: y11 and y12.

  • Cluster 5: y13 and y14.

PCI and PCD for the measurements (Cluster 1 to Cluster 4) subject to the scenarios of two soft constraints (heights A and D) are displayed in Fig 6.

Fig 6. PCI and PCD for the measurements subject to the scenarios of two soft constraints for α′ = 0.001.

Cluster 1(a,b), Cluster 2(c,d), Cluster 3(e,f) and Cluster 4(g,h).

Note that Cluster 5 is associated with the two soft constraints (i.e., y13 and y14). The PCI for both of these soft constraints was null; however, the PCD was not. Fig 7 shows PCD for these two soft constraints (i.e., heights A and D).

Fig 7. Probability of PCD and PCI for the two soft constraints and for α′ = 0.001.

Cluster 5: heights A and D.

The PWE for the measurements (Cluster 1 to Cluster 4) subject to the scenarios of two soft constraints (heights A and D) are displayed in Fig 8. Fig 9 gives PWE for two constraints (i.e., heights A and D). The Pover+ and Pover- and the Pol were practically null for that case. The sensitivity indicators (MDB and MIB) for each scenario of two soft constraints are displayed in Table 4.

Fig 8. The PWE for the measurements subject to the scenarios of two soft constraints for α′ = 0.001.

Cluster 1(a), Cluster 2(b), Cluster 3(c) and Cluster 4(d).

Fig 9. The PWE for the two soft constraints and for α′ = 0.001.

Cluster 5: heights A and D.

Table 4. MDB and MIB for the case of two soft constraints based on α′ = 0.001 and P~CD=P~CI=0.8.

σc = 10mm σc = 1mm σc = 0.1mm
Cluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)
1 7.5 25 7 7.1 6.3 6.3
2 6.8 6.8 6.6 6.6 6.3 6.3
3 6.4 6.4 6.4 6.4 6.3 6.3
4 6.3 6.3 6.3 6.3 6.3 6.3
5 6.8 - 8.8 - 57 -

Table 5 gives the local redundancy (ri), the standard deviation of the LS-estimated outlier σi and the maximum absolute correlation (maxρwi,wj) for the scenarios of three soft constraints.

Table 5. Local redundancy (ri), standard deviation of the LS-estimated outlier σi(mm) and the maximum absolute correlation (maxρwi,wj) for each scenario of the three soft constraints.

σc = 0.1mm σc = 1mm σc = 10mm
Measurement ri σi maxρwi,wj ri σi maxρwi,wj ri σi maxρwi,wj
y1 0.702 1.194 0.660 0.502 1.411 0.577 0.398 1.586 0.992
y2 0.582 1.311 0.326 0.533 1.369 0.412 0.501 1.413 0.470
y3 0.702 1.194 0.660 0.502 1.411 0.577 0.398 1.586 0.992
y4 0.702 1.194 0.660 0.502 1.411 0.577 0.398 1.586 0.992
y5 0.582 1.311 0.326 0.533 1.369 0.412 0.501 1.413 0.470
y6 0.702 1.194 0.660 0.502 1.411 0.577 0.398 1.586 0.992
y7 0.704 1.192 0.415 0.602 1.289 0.412 0.563 1.333 0.470
y8 0.704 1.192 0.415 0.602 1.289 0.412 0.563 1.333 0.470
y9 0.704 1.192 0.415 0.602 1.289 0.412 0.563 1.333 0.470
y10 0.704 1.192 0.415 0.602 1.289 0.412 0.563 1.333 0.470
y11 0.583 1.309 0.326 0.583 1.309 0.385 0.583 1.309 0.433
y12 0.583 1.309 0.326 0.583 1.309 0.385 0.583 1.309 0.433
y13 0.012 0.904 0.660 0.425 1.534 0.542 0.663 12.283 0.501
y14 0.012 0.904 0.660 0.425 1.534 0.542 0.663 12.283 0.501
y15 0.019 0.718 0.63 0.5 1.414 0.542 0.665 12.268 0.501

The PCI and PCD in Fig 10 were computed for the clusters based on Table 5, as follows:

Fig 10. The PCI and PCD for the measurements subject to the scenarios of three soft constraints for α′ = 0.001.

Cluster 1(a,b), Cluster 2(c,d), Cluster 3(e,f) and Cluster 4(g,h).

  • Cluster 1: y1, y3, y4 and y6.

  • Cluster 2: y2 and y5.

  • Cluster 3: y7, y8, y9 and y10.

  • Cluster 4: y11 and y12.

  • Cluster 5: y13 and y14.

  • Cluster 6: y15.

Fig 11 shows PCI and PCD for the three soft constraints, i.e., for Cluster 5 (heights A and D) and Cluster 6 (height G) in Fig 3e. The PWE for the measurements (Cluster 1 to Cluster 4) subject to the scenarios of three soft constraints (heights A, D and G) are displayed in Fig 12. Fig 13 gives PWE for three constraints (i.e., heights A, D and G). The Pover+, Pover- and Pol were also practically null for that case of three soft constraints. The sensitivity indicators (MDB and MIB) for each scenario of three soft constraints are displayed in Table 6.

Fig 11. The PCI and PCD for the three constraints and for α′ = 0.001.

Cluster 5(a,b) and Cluster 6(c,d).

Fig 12. The PWE for the measurements subject to the scenarios of the three soft constraints and for α′ = 0.001.

Cluster 1(a), Cluster 2(b), Cluster 3(c) and Cluster 4(d).

Fig 13. The PWE for the three constraints and for α′ = 0.001.

Cluster 6(b).

Table 6. MDB and MIB for the case of the three soft constraints based on α′ = 0.001 and P~CD=P~CI=0.8.

σc = 10mm σc = 1mm σc = 0.1mm
Cluster MDB (σ) MIB (σ) MDB (σ) MIB (σ) MDB (σ) MIB (σ)
1 7.5 22 6.8 6.9 5.8 5.9
2 6.8 6.9 6.6 6.7 6.4 6.4
3 6.4 6.4 6.3 6.3 5.8 5.8
4 6.3 6.3 6.3 6.3 6.3 6.3
5 5.9 6.0 7.4 7.5 43.5 45
6 5.9 5.9 6.9 6.9 34.6 35.5

Discussion

We started by analyzing the scenario of one hard constraint in Fig 3a. Table 1 shows that the maximum correlation between w-test statistics for the measurements constituting Cluster 1 is exactly equal to 1.00 (i.e., maxρwi,wj=1.00). This means that the measurements belonging to Cluster 1 are connected with unknown heights whose connections are limited to only two. Both unknown heights A and D are tied only to two measurements (i.e., y1 and y6 linked to A, and y3 and y4 linked to D); therefore, if an outlier occurred in one of these measurements, we would only be able to analyze the consistency between them, but we would not be able to distinguish which of them was contaminated by an outlier. This means that we would only be able to detect them, because the w-test statistics could be larger than a critical value k^; however, in that case, the values of w-test statistics would be the same, and we would not have only one unique maximum w-test statistics, but would actually have four maximum w-test statistics. In other words, the equation systems associated with the measurements of Cluster 1 are linearly dependent [65]; therefore, there is no reliability in terms of outlier identification for Cluster 1, as can be seen in Fig 3a.
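The unit correlation for Cluster 1 can also be traced to the least-squares orthogonality condition. The following is a sketch under the assumption of equal weights (W = σ⁻²I, as in this network with G fixed), with ê denoting the vector of LS residuals, a symbol introduced here for illustration. The columns of A in Eq 2 associated with h_A and h_D each contain only two non-zero entries, so that:

```latex
A^{\top} W \hat{e} = 0
\;\Longrightarrow\;
\begin{cases}
-\hat{e}_1 + \hat{e}_6 = 0 & \text{(column of } h_A\text{)} \\
\phantom{-}\hat{e}_3 - \hat{e}_4 = 0 & \text{(column of } h_D\text{)}
\end{cases}
\;\Longrightarrow\;
w_1 = w_6, \quad w_3 = w_4 .
```

Because the paired residuals are identical for every realization of the measurements, their normalized test statistics coincide as well, which is consistent with the value maxρwi,wj = 1.00 listed for y1, y3, y4 and y6 under one hard constraint in Table 1.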

From Fig 3b, we note that there is reliability in terms of outlier detection for Cluster 1, and it is caused by overlapping w-test statistics. The probability of statistics overlap (Pol) for Cluster 1 in the scenario of a minimally constrained network is displayed in Fig 14.

Fig 14. PCD and Pol for Cluster 1 subject to one hard constraint and for α′ = 0.001.

The problem of not having more connections (i.e., more measurements) for the unknown heights A and D in the case of one hard constraint with G fixed is overcome when these heights (A and D) are taken as hard constraints in Fig 3b or when the heights A, D and G are hard constraints in Fig 3c. Fig 3a and 3b show that the measurements of Cluster 1 are able to identify an outlier when two hard constraints (A and D fixed) are in play. The case of three hard constraints (A, D and G fixed) in Fig 3e and 3f is also verified by our results, i.e., there is reliability in terms of both outlier detection and identification for these measurements in those conditions.

From Table 2, we observe different behavior for the clusters as follows:

  • Cluster 1: there was no MIB for the case of having only one single hard constraint, whereas there was MDB = MIB for the other cases; however, both MDB and MIB decrease significantly with the increase in the number of hard constraints.

  • Cluster 2: MDB was slightly smaller than MIB. Both MDB and MIB were practically the same for the case of having two or three hard constraints.

  • Cluster 3: MDB = MIB for all cases of hard constraints; however, both MDB and MIB decrease significantly with the increase in the number of hard constraints.

  • Cluster 4: MDB and MIB were equal for all cases.

In terms of outlier detection and identification, Cluster 1 was the most sensitive to constraints, Cluster 3 was relatively sensitive, Cluster 2 was relatively insensitive and Cluster 4 was completely insensitive; see Fig 4. The reason for this is that the local redundancy (ri) of Cluster 1 increased with the number of hard constraints, whereas that of Cluster 4 remained the same; see Table 1.

Leaving aside the cases of Pol, the network presents low correlation between the least-squares residuals (ρwi,wj < 0.5) and high local redundancy (ri > 0.5). Because of this, PWE was less than 1%; see Fig 5. Pover+ and Pover- were practically null. Consequently, PCI ≈ PCD. For this reason, the family-wise error rate (α′) should be increased in order to raise the success rate of outlier detection and identification [44].

From Fig 15, we observe that increasing α′ increases both PCI and PCD for outlier magnitudes from 5σ to 6σ in the case of three hard constraints and from 5σ to 6.8σ in the case of two hard constraints. Although the rates of Pover+ and PWE also increase, the increase is not significant compared with the improvement in correct identification (PCI) and correct detection (PCD). The same analysis can be done for the other clusters.
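Why a larger α′ raises the detection and identification rates can be seen from the critical value of the test. The sketch below assumes, as an illustration and not as a transcription of the procedure in S1 Appendix, that the critical value is the (1 - α′) quantile of max|wi| under the null hypothesis, estimated by Monte Carlo simulation. The design matrix is a hypothetical five-observation network with one fixed height, and critical_value is an illustrative name.

```python
import numpy as np

# Hypothetical design matrix (assumption): 5 height differences, unknown
# heights A, B, C, with D held fixed as the single hard constraint.
A = np.array([
    [-1.0,  1.0,  0.0],   # A -> B
    [ 0.0, -1.0,  1.0],   # B -> C
    [ 0.0,  0.0, -1.0],   # C -> D (D fixed)
    [-1.0,  0.0,  1.0],   # A -> C
    [ 0.0, -1.0,  0.0],   # B -> D (D fixed)
])


def critical_value(A, alpha, sigma=1.0, n_sim=200_000, seed=0):
    """(1 - alpha) quantile of max|w_i| under H0, by Monte Carlo."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    # Redundancy matrix for uncorrelated, equally weighted observations.
    R = np.eye(n) - A @ np.linalg.solve(A.T @ A, A.T)
    noise = rng.normal(0.0, sigma, size=(n_sim, n))
    residuals = noise @ R.T
    # w_i = e_i / (sigma * sqrt(r_i)) for uncorrelated observations.
    w = residuals / (sigma * np.sqrt(np.diag(R)))
    return float(np.quantile(np.abs(w).max(axis=1), 1.0 - alpha))


for alpha in (0.001, 0.1):
    print("alpha' =", alpha, "critical value ~", round(critical_value(A, alpha), 2))
```

A larger α′ gives a smaller critical value, so smaller outliers exceed it; the price is a higher rate of false alarms under the null hypothesis.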

Fig 15. The PCI (a), PCD (b), Pover+ (c) and PWE (d) for Cluster 1 subject to two and three hard constraints and for α′ = 0.001 and α′ = 0.1.


In terms of soft constraints for the case of two constraints in Fig 3d, we observe from Table 3 that the larger the relaxation of the constraint (i.e., the larger the standard deviation of the constraint, σc), the larger the residuals correlation (ρwi,wj) and the standard deviation of the outlier (σi), and the smaller the local redundancy (ri). Consequently, the probabilities of correct identification (PCI) and correct detection (PCD) decrease as the constraints are relaxed, whereas PWE increases. This can be more clearly verified in Fig 6a, 6b and 8a for Cluster 1, whose measurements are connected with the constraints A and D (i.e., y13 and y14 in Table 3, respectively).
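The dependence of the local redundancy and of the residuals correlation on σc can be sketched numerically. The example assumes a hypothetical four-point levelling network with σ = 1mm (not the network of Fig 3) in which the soft constraints on A and D enter as pseudo-observations; build_model and reliability are illustrative names. It uses the standard expressions Qe = Qy - A (A^T Qy^-1 A)^-1 A^T, ri = (Qe Qy^-1)ii, and the w-test correlation derived from Qy^-1 Qe Qy^-1.

```python
import numpy as np

# Hypothetical network (assumption): 5 height differences among 4 points,
# plus soft constraints (pseudo-observations) on the heights of A and D.
POINTS = ["A", "B", "C", "D"]
LINES = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "C"), ("B", "D")]
SOFT = ["A", "D"]


def build_model(sigma, sigma_c):
    """Design matrix and covariance for measurements plus soft constraints."""
    n_m, n_c = len(LINES), len(SOFT)
    A = np.zeros((n_m + n_c, len(POINTS)))
    for i, (frm, to) in enumerate(LINES):
        A[i, POINTS.index(frm)] = -1.0
        A[i, POINTS.index(to)] = 1.0
    for k, p in enumerate(SOFT):
        A[n_m + k, POINTS.index(p)] = 1.0   # pseudo-observation of a height
    variances = np.r_[np.full(n_m, sigma**2), np.full(n_c, sigma_c**2)]
    return A, np.diag(variances)


def reliability(A, Qy):
    """Local redundancy numbers and w-test correlation matrix."""
    W = np.linalg.inv(Qy)
    N = A.T @ W @ A
    Qe = Qy - A @ np.linalg.solve(N, A.T)   # covariance of the residuals
    r = np.diag(Qe @ W)
    M = W @ Qe @ W
    d = np.sqrt(np.diag(M))
    return r, M / np.outer(d, d)


for sigma_c in (0.1, 1.0, 10.0):
    r, rho = reliability(*build_model(1.0, sigma_c))
    off_diag = np.abs(rho[:5, :5] - np.eye(5))
    print(f"sigma_c = {sigma_c} mm: r_i (measurements) = {r[:5].round(3)}, "
          f"max |rho| among measurements = {off_diag.max():.3f}, "
          f"|rho| between the two constraints = {abs(rho[5, 6]):.3f}")
```

In this sketch the two pseudo-observations share a single condition equation, so their w-test statistics are fully correlated whatever the value of σc, which is the situation in which a constraint can be detected but not identified.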

Note from Fig 8a that the PWE increases as the magnitude of the outlier (∇i) increases; however, this is only true up to a certain limit of outlier magnitude. The effect of the residuals correlation ρwi,wj on the rates of PWE and PCI tends to decrease as the magnitude of the outlier ∇i increases. This effect is more clearly verified for Cluster 1 in the case where the precision of the constraints is ten times worse than that of the measurements (σc = 10σ = 10mm).

Note from Fig 6 that identifying an outlier in Cluster 1 (i.e., y1, y3, y4 and y6) when σc = 10mm is more difficult than in the other clusters. This is because Cluster 1 has a higher residuals correlation (ρwi,wj = 0.994) than the other clusters. We observe that the larger the relaxation of the constraints, the larger the effect of the correlation ρwi,wj on the success rate of outlier identification (PCI). Consequently, the sensitivity indicator for outlier identification (MIB) is higher. Table 2 reveals that the ratio between MIB and MDB for Cluster 1 in the scenario where the standard deviation of the two soft constraints is σc = 10mm is MIB/MDB = 25/7.5 = 3.3. On the other hand, the ratio between MIB and MDB is practically one (i.e., MIB/MDB = 1.0) for the other scenarios.

If the family-wise error rate (α′) were increased for the case where the two soft constraints of σc = 10mm are in play, we would not have great advantages for Cluster 1, due to its high residuals correlation (ρwi,wj = 99.4%). From Fig 16, we can observe that the PCI for outlier magnitudes from 5σ to 8σ is effectively larger for a user-defined α′ = 0.1 than for α′ = 0.001; however, the success rate is still less than 80%, i.e., PCI < 0.8. Note, for example, that the correct identification rate is PCI = 56% for an outlier magnitude of ∇i = 8σ and α′ = 0.1. For α′ = 0.1, MIB = 33.5σ = 33.5mm, whereas for α′ = 0.001, MIB = 25σ = 25mm; therefore, in that case, the MIB for PCI = 0.8 (80%) with α′ = 0.1 would be 34% larger than with the user-defined α′ = 0.001.
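The two ratios quoted for Cluster 1 with σc = 10mm can be checked with a few lines of arithmetic. The numbers are the ones stated in the text; the variable names are illustrative.

```python
# Values quoted in the text for Cluster 1 with two soft constraints, sigma_c = 10 mm.
mdb = 7.5                # MDB in units of sigma
mib_alpha_0001 = 25.0    # MIB in units of sigma for alpha' = 0.001
mib_alpha_01 = 33.5      # MIB in units of sigma for alpha' = 0.1

ratio = mib_alpha_0001 / mdb
increase_pct = (mib_alpha_01 / mib_alpha_0001 - 1.0) * 100.0

print(f"MIB/MDB = {ratio:.1f}")                          # MIB/MDB = 3.3
print(f"MIB increase with alpha' = 0.1: {increase_pct:.0f}%")  # 34%
```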

Fig 16. The PCI for Cluster 1 subject to two soft constraints (2 s.c.) A and D for α′ = 0.001 and α′ = 0.1.

The soft constraints A and D were grouped in Cluster 5 (i.e., A and D were treated as pseudo-observations in the model). There is no reliability in terms of outlier identification for the constraints, because the residual correlation between them is ρwi,wj = 100%, as can be seen in Table 3 for y13 and y14; however, these soft constraints are able to detect an outlier. In that case, the PCD in Fig 7 is mainly caused by the Pol, as can be seen for σc = 10mm in Fig 17. From Table 4, we observe that the larger the relaxation of the constraints, the larger the MDB. Note that the values of MDB are given in units of σc, and thus the MDB for σc = 10mm is larger than those for σc = 1mm and σc = 0.1mm, i.e., we have the following inequality: 6.8σc = 6.8 × 10mm = 68mm (for σc = 10mm) > 8.8σc = 8.8 × 1mm = 8.8mm (for σc = 1mm) > 57σc = 57 × 0.1mm = 5.7mm (for σc = 0.1mm). In that case, if the FWE (α′) were increased, the rate of outlier detection by Cluster 5 (i.e., by the soft constraints) would increase.
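Because the MDB of the constraints is expressed in units of σc, the comparison across scenarios only makes sense after conversion to millimetres. A short sketch of that conversion, using the factors quoted in the text (the dictionary names are illustrative):

```python
# MDB of the soft constraints in units of sigma_c, keyed by sigma_c in mm
# (values quoted in the text).
mdb_factor = {10.0: 6.8, 1.0: 8.8, 0.1: 57.0}

# Convert each MDB to millimetres: MDB_mm = factor * sigma_c.
mdb_mm = {sigma_c: k * sigma_c for sigma_c, k in mdb_factor.items()}

for sigma_c, value in mdb_mm.items():
    print(f"sigma_c = {sigma_c} mm -> MDB = {value:.1f} mm")
```

The factor in units of σc decreases as the constraint is relaxed, but the MDB in millimetres increases, which is the ordering stated above.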

Fig 17. The PCD and Pol for the two soft constraints A and D (Cluster 5) with σc = 10mm and for α′ = 0.001.

The effects of the relaxation of the constraints on the performance of the IDS observed for two soft constraints are also verified in the case of three soft constraints, as can be seen in Figs 10, 11, 12 and 13.

In the case of three soft constraints in Fig 3e, there is reliability in terms of outlier identification for the three pseudo-observations y13, y14 and y15 (i.e., for A, D and G), as seen in Fig 10 and Table 6. In that case, we also observe that the PCD of the soft constraints A and D (i.e., Cluster 5) was approximately 13% for σc = 10mm, 16% for σc = 1mm and 24% for σc = 0.1mm larger than in the scenario of the network subject to two soft constraints. Table 6 reveals that the advantage of having three soft constraints instead of two is that the constraints become identifiable in the presence of an outlier. The behavior of the PCD, PCI and PWE was similar to the case of two soft constraints. Furthermore, the larger the relaxation of the constraints, the smaller the residuals correlation between the measurements and the soft constraints and the larger the residuals correlation among the measurements.

We also observe that the case of two soft constraints with σc = 0.1mm was comparable with two hard constraints (see e.g., Tables 2 and 6) in terms of the probability levels associated with IDS for the measurements (i.e., clusters 1, 2, 3 and 4). In the same way, for the case of two soft constraints with σc = 1mm or σc = 10mm, the probability levels were similar to those of one hard constraint for those measurements, with the benefit that two soft constraints provide reliability in terms of outlier identification for Cluster 1. Finally, the three soft constraints with σc = 1mm and σc = 10mm were comparable to the two soft constraints for that scenario of constraints relaxation, whereas the three soft constraints with σc = 0.1mm showed outcomes similar to those of three hard constraints for the measurements (see e.g., Tables 2 and 6). In that case, however, an advantage of the three soft constraints over the three hard constraints is the possibility of analyzing the sensitivity of the constraints. We emphasize that the stochastic models of the measurements and constraints were assumed to be well-known and defined for the analyses performed here.

Conclusion

We highlight the main findings of this research as follows:

  • Under a system of high local redundancy (ri > 0.5) and low residuals correlation (ρwi,wj < 0.5), increasing the family-wise error rate (FWE) of the test statistic improves the performance of the procedure for both hard and soft constraints.

  • The PCI of the observations is larger for hard constraints than for soft constraints.

  • The larger the relaxation of the constraints, the larger the effect of the residuals correlation (ρwi,wj) on the success rate of outlier identification (PCI) of the observations. Consequently, the higher the sensitivity indicator for outlier identification (MIB), the more difficult it becomes to identify an outlier.

  • Under a scenario of soft constraints, one should set out at least three soft constraints in order to identify an outlier in the constraints.

  • Hard constraints should be used in the stage of pre-processing data for the purpose of identifying and removing possible outlying measurements. In that process, one should opt to set out the redundant hard constraints at points in the network where the fewest connections exist. After identifying and removing possible outliers, the soft constraints should be employed to propagate the uncertainties of the constraints (pseudo-observations) to the model parameters during the process of least-squares estimation.
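The last recommendation can be illustrated with a minimal least-squares sketch. It assumes a hypothetical four-point levelling network with made-up heights and measurement errors (not data from this study); hard_solution and soft_solution are illustrative names. Hard constraints fix the heights of A and D exactly, whereas soft constraints add them as pseudo-observations weighted by 1/σc², so that their uncertainty is propagated to the covariance matrix of the estimated heights.

```python
import numpy as np

POINTS = ["A", "B", "C", "D"]
LINES = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "C"), ("B", "D")]
FIXED, FREE = [0, 3], [1, 2]          # A and D constrained; B and C free


def incidence():
    """One row per height difference h_to - h_frm."""
    A = np.zeros((len(LINES), len(POINTS)))
    for i, (frm, to) in enumerate(LINES):
        A[i, POINTS.index(frm)] = -1.0
        A[i, POINTS.index(to)] = 1.0
    return A


def hard_solution(y, h_known):
    """Hard constraints: known heights are removed from the unknowns."""
    A = incidence()
    y_red = y - A[:, FIXED] @ h_known
    A_free = A[:, FREE]
    x = np.empty(len(POINTS))
    x[FIXED] = h_known
    x[FREE] = np.linalg.solve(A_free.T @ A_free, A_free.T @ y_red)
    return x


def soft_solution(y, h_known, sigma, sigma_c):
    """Soft constraints: known heights enter as weighted pseudo-observations."""
    A = np.vstack([incidence(), np.eye(len(POINTS))[FIXED]])
    obs = np.concatenate([y, h_known])
    w = np.concatenate([np.full(len(y), sigma**-2.0),
                        np.full(len(h_known), sigma_c**-2.0)])
    Qx = np.linalg.inv(A.T @ (w[:, None] * A))   # covariance of the heights
    return Qx @ A.T @ (w * obs), Qx


# Made-up heights (mm) and measurement errors (mm), for illustration only.
h_true = np.array([0.0, 2500.0, 1200.0, -600.0])
y = incidence() @ h_true + np.array([0.8, -1.1, 0.4, 1.3, -0.6])
h_known = h_true[FIXED]

print("hard:", hard_solution(y, h_known).round(3))
for sigma_c in (0.1, 1.0, 10.0):
    x, Qx = soft_solution(y, h_known, 1.0, sigma_c)
    print(f"soft, sigma_c = {sigma_c} mm:", x.round(3),
          "std (mm):", np.sqrt(np.diag(Qx)).round(3))
```

In this sketch a very small σc reproduces the hard-constraint solution, while a larger σc leaves the constrained heights free to move and inflates the standard deviations of all estimated heights.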

Supporting information

S1 Appendix. Description of the method.

Provides a broad theoretical framework and detailed description of the method used to estimate the Iterative Data-Snooping probability levels.

(PDF)

Acknowledgments

The authors would like to thank the two anonymous reviewers who contributed to the improvement of the manuscript.

Data Availability

All relevant data are within the manuscript and its Supporting Information files.

Funding Statement

The CNPq - Conselho Nacional de Desenvolvimento Científico e Tecnológico - Brasil had the role of providing the study grant for MTM (proc. n°103587/2019-5); and PETROBRAS (Grant Number 2018/00545-0) had the role of paying both the publication fee and the professional language editing service. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Fang X. Weighted total least-squares with constraints: a universal formula for geodetic symmetrical transformations. Journal of Geodesy. 2015;89(5):459–469. doi: 10.1007/s00190-015-0790-8
  • 2. Amiri-Simkooei AR. Weighted Total Least Squares with Singular Covariance Matrices Subject to Weighted and Hard Constraints. J Surv Eng. 2017;143(4):04017018. doi: 10.1061/(ASCE)SU.1943-5428.0000239
  • 3. Courant R, Hilbert D. Methods of Mathematical Physics. vol. 1. Wiley-VCH; 1989.
  • 4. Grafarend EW. Optimization of Geodetic Networks. The Canadian Surveyor. 1974;28(5):716–723. doi: 10.1139/tcs-1974-0120
  • 5. Teunissen P. Zero Order Design: Generalized Inverses, Adjustment, the Datum Problem and S-Transformations. In: Grafarend EW, Sansò F, editors. Optimization and Design of Geodetic Networks. Berlin, Heidelberg: Springer Berlin Heidelberg; 1985. p. 11–55.
  • 6. Dermanis A. Free network solutions with the DLT method. ISPRS Journal of Photogrammetry and Remote Sensing. 1994;49(2):2–12. doi: 10.1016/0924-2716(94)90061-2
  • 7. Kotsakis C. Reference frame stability and nonlinear distortion in minimum-constrained network adjustment. Journal of Geodesy. 2012;86(9):755–774. doi: 10.1007/s00190-012-0555-6
  • 8. Kotsakis C. In: Grafarend E, editor. Datum Definition and Minimal Constraints. Cham: Springer International Publishing; 2018. p. 1–6. doi: 10.1007/978-3-319-02370-0_157-1
  • 9. Matsuoka MT, Rofatto VF, Klein I, Roberto Veronez M, da Silveira LG, Neto JBS, et al. Control Points Selection Based on Maximum External Reliability for Designing Geodetic Networks. Applied Sciences. 2020;10(2). doi: 10.3390/app10020687
  • 10. Baarda W. S-transformations and criterion matrices. Publ on geodesy, New Series. 1973;5(2).
  • 11. Schaffrin B. Aspects of Network Design. In: Grafarend EW, Sansò F, editors. Optimization and Design of Geodetic Networks. Berlin, Heidelberg: Springer Berlin Heidelberg; 1985. p. 548–597.
  • 12. Xu P. A general solution in geodetic nonlinear rank-defect models. Bollettino di geodesia e scienze affini. 1997.
  • 13. Altamimi Z, Dermanis A. The Choice of Reference System in ITRF Formulation. In: Sneeuw N, Novák P, Crespi M, Sansò F, editors. VII Hotine-Marussi Symposium on Mathematical Geodesy. Berlin, Heidelberg: Springer Berlin Heidelberg; 2012. p. 329–334.
  • 14. Velsink H. On the deformation analysis of point fields. J Geod. 2015;89(11):1071–1087. doi: 10.1007/s00190-015-0835-z
  • 15. Velsink H. Extendable linearised adjustment model for deformation analysis. Survey Review. 2015;47(345):397–410. doi: 10.1179/1752270614Y.0000000140
  • 16. Velsink H. Time Series Analysis of 3D Coordinates Using Nonstochastic Observations. Journal of Applied Geodesy. 2016;10(1):5–16. doi: 10.1515/jag-2015-0027
  • 17. Rao CR. Markoff’s Theorem with Linear Restrictions on Parameters. Sankhyā: The Indian Journal of Statistics (1933-1960). 1945;7(1):16–19.
  • 18. Lehmann R, Neitzel F. Testing the compatibility of constraints for parameters of a geodetic adjustment model. Journal of Geodesy. 2013;87(6):555–566. doi: 10.1007/s00190-013-0627-2
  • 19. Velsink H. Testing Methods for Adjustment Models with Constraints. Journal of Surveying Engineering. 2018;144(4):04018009. doi: 10.1061/(ASCE)SU.1943-5428.0000260
  • 20. Velsink H. Testing deformation hypotheses by constraints on a time series of geodetic observations. J Appl Geod. 2017;12(1):77–93. doi: 10.1515/jag-2017-0028
  • 21. Baarda W. A testing procedure for use in geodetic networks. Publ on geodesy, New Series. 1968;2(5).
  • 22. Teunissen PJG. First and second moments of non-linear least-squares estimators. Bull Geodesique (Journal of Geodesy). 1989;63:253–262. doi: 10.1007/BF02520475
  • 23. Teunissen PJG. Testing Theory: an introduction. 2nd ed. Delft University Press; 2006.
  • 24. Amiri-Simkooei AR. Least-squares variance component estimation: theory and GPS applications [PhD thesis]. Delft University of Technology; 2007. Available from: https://ncgeo.nl/index.php/en/publicatiesgb/publications-on-geodesy.
  • 25. Koch KR. Parameter estimation and hypothesis testing in linear models. 2nd ed. Springer; 1999.
  • 26. Tao H, Feng H, Xu L, Miao M, Long H, Yue J, et al. Estimation of Crop Growth Parameters Using UAV-Based Hyperspectral Remote Sensing Data. Sensors. 2020;20(5). doi: 10.3390/s20051296
  • 27. Han M, Wang Q, Wen Y, He M, He X. The Application of Robust Least Squares Method in Frequency Lock Loop Fusion for Global Navigation Satellite System Receivers. Sensors. 2020;20(4). doi: 10.3390/s20041224
  • 28. Wei C. Estimation for the Discretely Observed Cox–Ingersoll–Ross Model Driven by Small Symmetrical Stable Noises. Symmetry. 2020;12(3). doi: 10.3390/sym12030327
  • 29. Sakic P, Ballu V, Royer JY. A Multi-Observation Least-Squares Inversion for GNSS-Acoustic Seafloor Positioning. Remote Sensing. 2020;12(3). doi: 10.3390/rs12030448
  • 30. Livadiotis G. General Fitting Methods Based on Lq Norms and their Optimization. Stats. 2020;3(1):16–31. doi: 10.3390/stats3010002
  • 31. Farooq SZ, Yang D, Ada ENJ. A Cycle Slip Detection Framework for Reliable Single Frequency RTK Positioning. Sensors. 2020;20(1). doi: 10.3390/s20010304
  • 32. Araveeporn A. Comparing Parameter Estimation of Random Coefficient Autoregressive Model by Frequentist Method. Mathematics. 2020;8(1). doi: 10.3390/math8010062
  • 33. Zhang C, Peng T, Zhou J, Ji J, Wang X. An Improved Autoencoder and Partial Least Squares Regression-Based Extreme Learning Machine Model for Pump Turbine Characteristics. Applied Sciences. 2019;9(19). doi: 10.3390/app9193987
  • 34. Ji J, Yang M, Jiang L, He J, Teng Z, Liu Y, et al. Output-Only Parameters Identification of Earthquake-Excited Building Structures with Least Squares and Input Modification Process. Applied Sciences. 2019;9(4). doi: 10.3390/app9040696
  • 35. Büchele D, Chao M, Ostermann M, Leenen M, Bald I. Multivariate chemometrics as a key tool for prediction of K and Fe in a diverse German agricultural soil-set using EDXRF. Scientific Reports. 2019;9(1):17588. doi: 10.1038/s41598-019-53426-5
  • 36. Zhang J, Richardson JD, Dunkley BT. Classifying post-traumatic stress disorder using the magnetoencephalographic connectome and machine learning. Scientific Reports. 2020;10(1):5937. doi: 10.1038/s41598-020-62713-5
  • 37. Yalage Don SM, Schmidtke LM, Gambetta JM, Steel CC. Aureobasidium pullulans volatilome identified by a novel, quantitative approach employing SPME-GC-MS, suppressed Botrytis cinerea and Alternaria alternata in vitro. Scientific Reports. 2020;10(1):4498. doi: 10.1038/s41598-020-61471-8
  • 38. Bica R, Palarea-Albaladejo J, Kew W, Uhrin D, Pacheco D, Macrae A, et al. Nuclear Magnetic Resonance to Detect Rumen Metabolites Associated with Enteric Methane Emissions from Beef Cattle. Scientific Reports. 2020;10(1):5578. doi: 10.1038/s41598-020-62485-y
  • 39. Chen X, Qiao W, Miao W, Zhang Y, Mu X, Wang J. The Dependence of Implicit Solvent Model Parameters and Electronic Absorption Spectra and Photoinduced Charge Transfer. Scientific Reports. 2020;10(1):3713. doi: 10.1038/s41598-020-60757-1
  • 40. Weaving D, Jones B, Ireton M, Whitehead S, Till K, Beggs CB. Overcoming the problem of multicollinearity in sports performance data: A novel application of partial least squares correlation analysis. PLOS ONE. 2019;14(2):1–16. doi: 10.1371/journal.pone.0211776
  • 41. Chen Y. Spatial Autocorrelation Approaches to Testing Residuals from Least Squares Regression. PLOS ONE. 2016;11(1):1–19. doi: 10.1371/journal.pone.0146865
  • 42. Kargoll B. On the theory and application of model misspecification tests in geodesy [Doctoral thesis]. University of Bonn, Landwirtschaftliche Fakultät. German, Bonn; 2007.
  • 43. Lehmann R. On the formulation of the alternative hypothesis for geodetic outlier detection. J Geod. 2013;87(4):373–386. doi: 10.1007/s00190-012-0607-y
  • 44. Rofatto VF, Matsuoka MT, Klein I, Roberto Veronez M, da Silveira LG. A Monte Carlo-Based Outlier Diagnosis Method for Sensitivity Analysis. Remote Sensing. 2020;12(5). doi: 10.3390/rs12050860
  • 45. Goldstein M, Uchida S. A Comparative Evaluation of Unsupervised Anomaly Detection Algorithms for Multivariate Data. PLOS ONE. 2016;11(4):1–31. doi: 10.1371/journal.pone.0152173
  • 46. Faria B, Vistulo de Abreu F. Cellular frustration algorithms for anomaly detection applications. PLOS ONE. 2019;14(7):1–31. doi: 10.1371/journal.pone.0218930
  • 47. Aljably R, Tian Y, Al-Rodhaan M, Al-Dhelaan A. Anomaly detection over differential preserved privacy in online social networks. PLOS ONE. 2019;14(4):1–20. doi: 10.1371/journal.pone.0215856
  • 48. El Azami M, Hammers A, Jung J, Costes N, Bouet R, Lartizien C. Detection of Lesions Underlying Intractable Epilepsy on T1-Weighted MRI as an Outlier Detection Problem. PLOS ONE. 2016;11(9):1–21. doi: 10.1371/journal.pone.0161498
  • 49. Gautier M, Hocking TD, Foulley JL. A Bayesian Outlier Criterion to Detect SNPs under Selection in Large Data Sets. PLOS ONE. 2010;5(8):1–16. doi: 10.1371/journal.pone.0011913
  • 50. George NI, Bowyer JF, Crabtree NM, Chang CW. An Iterative Leave-One-Out Approach to Outlier Detection in RNA-Seq Data. PLOS ONE. 2015;10(6):1–10. doi: 10.1371/journal.pone.0125224
  • 51. Lehmann R. Improved critical values for extreme normalized and studentized residuals in Gauss–Markov models. J Geod. 2012;86(12):1137–1146. doi: 10.1007/s00190-012-0569-0
  • 52. Rofatto VF, Matsuoka MT, Klein I, Veronez MR, Bonimani ML, Lehmann R. A half-century of Baarda’s concept of reliability: a review, new perspectives, and applications. Surv Rev. 2018;0(0):1–17.
  • 53. Zaminpardaz S, Teunissen PJG. DIA-datasnooping and identifiability. J Geod. 2019;93(1):85–101. doi: 10.1007/s00190-018-1141-3
  • 54. Kok JJ, States U. On data snooping and multiple outlier testing [microform] / Johan J. Kok. U.S. Dept. of Commerce, National Oceanic and Atmospheric Administration, National Ocean Service, Charting and Geodetic Services: For sale by the National Geodetic Information Center, NOAA Rockville, Md; 1984.
  • 55. Knight NL, Wang J, Rizos C. Generalised measures of reliability for multiple outliers. Journal of Geodesy. 2010;84(10):625–635. doi: 10.1007/s00190-010-0392-4
  • 56. Gui Q, Li X, Gong Y, Li B, Li G. A Bayesian unmasking method for locating multiple gross errors based on posterior probabilities of classification variables. J Geod. 2011;85(4):191–203. doi: 10.1007/s00190-010-0429-8
  • 57. Klein I, Matsuoka MT, Guzatto MP, Nievinski FG. An approach to identify multiple outliers based on sequential likelihood ratio tests. Surv Rev. 2017;49(357):449–457. doi: 10.1080/00396265.2016.1212970
  • 58. Hawkins DM. Identification of Outliers. 1st ed. Springer; Netherlands; 1980.
  • 59. Algarni DA, Ali AE. Heighting and Distance Accuracy with Electronic Digital Levels. Journal of King Saud University - Engineering Sciences. 1998;10(2):229–239. doi: 10.1016/S1018-3639(18)30698-6
  • 60. Takalo M, Rouhiainen P. Development of a System Calibration Comparator for Digital Levels in Finland. Nordic Journal of Surveying and Real Estate Research. January;1(2).
  • 61. Wiedemann W, Wagner A, Wunderlich T. Using IATS to Read and Analyze Digital Levelling Staffs. In: Paar R, Marendić A, Zrinjski M, editors. SIG 2016. Varazdin, Croatia: Croatian Geodetic Society; 2016. p. 515–526. Available from: http://www.geof.unizg.hr/pluginfile.php/7437/mod_book/chapter/173/TS6_2.pdf.
  • 62. Teunissen PJG. The Geometry of Geodetic Inverse Linear Mapping and Non-linear Adjustment. Publications on Geodesy, New Series. 1985;8(1).
  • 63. Rao CR, Mitra SK. Generalized inverse of a matrix and its applications; 1972.
  • 64. Imparato D, Teunissen PJG, Tiberius CCJM. Minimal Detectable and Identifiable Biases for quality control. Surv Rev. 2019;51(367):289–299. doi: 10.1080/00396265.2018.1437947
  • 65. Hekimoglu S, Erenoglu RC, Sanli DU, Erdogan B. Detecting Configuration Weaknesses in Geodetic Networks. Survey Review. 2011;43(323):713–730. doi: 10.1179/003962611X13117748892632

Decision Letter 0

Qichun Zhang

29 Jun 2020

PONE-D-20-10931

On the Effects of Hard and Soft Equality Constraints in the Iterative Outlier Elimination Procedure

PLOS ONE

Dear Dr. ROFATTO,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Based on the comments from the reviewers, a minor revision is needed to improve the quality of the manuscript. Some details should be highlighted in the revised version. Meanwhile, proofreading is recommended before submitting the revised version. Please consider and respond to all the comments of the reviewers; a separate response letter is essential.

Please submit your revised manuscript by Aug 13 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Qichun 'Kit' Zhang, PhD, FHEA, CEng, MIET, SMIEEE

Academic Editor

PLOS ONE

University of Bradford

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We suggest you thoroughly copyedit your manuscript for language usage, spelling, and grammar. If you do not know anyone who can help you do this, you may wish to consider employing a professional scientific editing service.  

Whilst you may use any professional scientific editing service of your choice, PLOS has partnered with both American Journal Experts (AJE) and Editage to provide discounted services to PLOS authors. Both organizations have experience helping authors meet PLOS guidelines and can provide language editing, translation, manuscript formatting, and figure formatting to ensure your manuscript meets our submission guidelines. To take advantage of our partnership with AJE, visit the AJE website (http://learn.aje.com/plos/) for a 15% discount off AJE services. To take advantage of our partnership with Editage, visit the Editage website (www.editage.com) and enter referral code PLOSEDIT for a 15% discount off Editage services.  If the PLOS editorial team finds any language issues in text that either AJE or Editage has edited, the service provider will re-edit the text for free.

Upon resubmission, please provide the following:

  • The name of the colleague or the details of the professional service that edited your manuscript

  • A copy of your manuscript showing your changes by either highlighting them or using track changes (uploaded as a *supporting information* file)

  • A clean copy of the edited manuscript (uploaded as the new *manuscript* file)

3. Thank you for stating the following in the Acknowledgments Section of your manuscript:

'The authors would like to thank CNPq|Conselho Nacional de Desenvolvimento Cientfico e Tecnologico|Brasil (proc. nº 103587/2019-5) and PETROBRAS (Grant Number 2018/00545-0) for funding the research.'

We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

'The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.'

At this time, please address the following queries:

  1. Please clarify the sources of funding (financial or material support) for your study. List the grants or organizations that supported your study, including funding received from your institution.

  2. State what role the funders took in the study. If the funders had no role in your study, please state: “The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.”

  3. If any authors received a salary from any of your funders, please state which authors and which funders.

  4. If you did not receive any funding for this study, please state: “The authors received no specific funding for this work.”

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The study “On the Effects of Hard and Soft Equality Constraints in the Iterative Outlier Elimination Procedure” is interesting. The paper is well set. The data and methodology parts are well described. However, I would recommend the authors add some more explanations and also describe the innovation of this paper in the introduction section. Moreover, attention should be given to the following highlighted points before resubmitting.

1. Improve the quality of Figure 1 (flowchart of the algorithm), because the text inside the boxes is hard to read.

2. Page 3/20, line 85: the phrase “the minimal biases, MDB (Minimal Detectable Bias) and MIB (Minimal Identifiable Bias)” is defined again, in a different style, on page 4/20, line 108: “the minimal biases - Minimal Detectable Bias (MDB) and Minimal Identifiable Bias (MIB)”. Please use the same style throughout the paper and, once an abbreviation has been defined, use only the abbreviation afterwards. The same repetition occurs on page 7/20, line 227: “MIB (Minimal Identifiable Bias)”.

3. Page 6/20, line 168: the expression 1/0=∞ is not true. 1/0 is undefined because division is defined in terms of multiplication: a/b = x is defined to mean that b*x = a. There is no x such that 0*x = 1, since 0*x = 0 for all x. Thus 1/0 does not exist; it is undefined.

4. In Table 1, with two hard constraints, why are the local redundancy, the standard deviation of the least-squares estimated outlier, and the maximum absolute correlation all equal across the twelve measurements, while for one and three hard constraints there is some variation across the twelve measurements?

5. Page 9/20: abbreviations are redefined far too often; please avoid this. In lines 254 to 259, the probabilities of correct identification (PCI) and correct detection (PCD) are used and defined 3 and 4 times, respectively. Other abbreviations are also repeated quite often, i.e., the over-identification cases (P over+ and P over-) and the statistical overlap (Pol).

6. Check the value of MIB in Table 4, column 3: is it equal to 25?

7. Check the value of MIB in Table 6, column 3: is it equal to 22?

8. Page 12/20, line 324: “Cluster 4: MDB e MIB were equal for all cases.” What does the lowercase “e” stand for here?

9. Lastly, special attention must be given to the language as well. Some sentences are completely wrong.

Reviewer #2: The authors investigated the effect of soft and hard constraints in the iterative outlier elimination procedure.

The paper is well written and addresses an important issue in the field of Statistical Process Control. The paper can be accepted for publication after careful handling of the following points.

i) Reduce the length of the Conclusion section. Only include important findings.

ii) Include high-quality figures.

iii) There are some missing lines or extra legend items in Figures 4(c) - 4(h).

iv) Figures should be of the same size.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Aug 26;15(8):e0238145. doi: 10.1371/journal.pone.0238145.r002

Author response to Decision Letter 0


17 Jul 2020

Editor: We have carefully reviewed the comments and thoroughly revised the manuscript accordingly. We would like to update our Funding Statement, as follows:

- The CNPq - Conselho Nacional de Desenvolvimento Científico e Tecnológico - Brasil had the role of providing the study grant for the second author (proc. nº103587/2019-5); and

- The PETROBRAS (Grant Number 2018/00545-0) had the role of paying both the publication fee and the professional language editing service.

We have provided the following files as requested:

English language editing certificate by MDPI.

A copy of our manuscript showing our changes by highlighting them.

A clean copy of the edited manuscript.

Reviewer 1: We have incorporated all of your suggestions into our revision. They were very helpful. Thank you.

Reviewer 2: We have incorporated all of your suggestions into our revision. They were very helpful. Thank you for your help.

Attachment

Submitted filename: Response to Reviewers.pdf

Decision Letter 1

Qichun Zhang

11 Aug 2020

On the Effects of Hard and Soft Equality Constraints in the Iterative Outlier Elimination Procedure

PONE-D-20-10931R1

Dear Dr. ROFATTO,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Qichun Kit Zhang, PhD

Academic Editor

PLOS ONE

Additional Editor Comments:

Both reviewers are satisfied with the revised version. The concerns have been addressed well and the quality of the manuscript has been improved. The paper is acceptable and ready to publish.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Reviewer #1: As my suggestions from the previous review round have been met, I suggest acceptance of the paper in its current form.

Reviewer #2: All comments have been addressed in the revised submission. The manuscript is acceptable in its current format.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Saddam Akber Abbasi

Acceptance letter

Qichun Zhang

14 Aug 2020

PONE-D-20-10931R1

On the Effects of Hard and Soft Equality Constraints in the Iterative Outlier Elimination Procedure

Dear Dr. Rofatto:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Qichun Zhang

Academic Editor

PLOS ONE

Associated Data

    Supplementary Materials

    S1 Appendix. Description of the method.

    Provides a broad theoretical framework and detailed description of the method used to estimate the Iterative Data-Snooping probability levels.

    (PDF)

    Data Availability Statement

    All relevant data are within the manuscript and its Supporting Information files.

