Abstract
We present an approach for motion clustering based on a novel observation that a signature for putative pixel correspondences can be generated by collecting their residuals with respect to model hypotheses drawn randomly from the data. Inliers of the same motion cluster should have strongly correlated residuals, which are low when a hypothesis is consistent with the data in the cluster and high otherwise. After evaluating a number of hypotheses, members of the same cluster can be identified based on these correlations. Due to this property, we named our approach Inlier Clustering based on the Residuals of Random Hypotheses (ICR). An important advantage of ICR is that it does not require an inlier-outlier threshold or parameter tuning. In addition, we propose a supervised recursive formulation of ICR (r-ICR) that, unlike many motion clustering methods, does not require the number of clusters to be known a priori, as long as annotated data are available for training. We validate ICR and r-ICR on several publicly available datasets for robust geometric model fitting.
Keywords: Motion Segmentation, Model Estimation, Clustering
1. Introduction
Robust model fitting, clustering a set of observations into groups according to the parametric model they adhere to, and outlier removal are central tasks in many estimation problems in computer vision and robotics, including SLAM in the presence of moving objects, as well as detection and segmentation of pedestrians or vehicles in driving scenarios. One class of methods for addressing such problems is RANSAC [9] and its variants [25], which randomly generate model hypotheses and verify them by accumulating support from the data. Alternatively, one can cluster the observations [8, 15] and then estimate model parameters from the resulting uncontaminated clusters. There are several trade-offs when one considers the entire spectrum of available methods for solving these types of problems, such as robustness to perturbations of the inliers and to the presence of outliers, the ability to handle data with multiple structures, and computational efficiency.
In this paper, we present an approach for inlier clustering based on a novel use of the residuals of each observation with respect to a number of randomly drawn hypotheses as the observation’s signature. Our approach is robust to perturbations and outliers, can handle data with multiple structures, and is computationally efficient. Two algorithms have been implemented based on this approach. The first algorithm, named Inlier Clustering based on Residuals (ICR), starts with a random clustering as initialization. A minimal set of observations are then sampled from each cluster to generate additional hypotheses and the residuals of the observations with respect to these hypotheses are used to grow the signatures and refine the clustering. The assumption is that observations from the same true cluster should have similar residuals: low residuals for hypotheses drawn from samples from their cluster and random, mostly large, residuals otherwise. Figure 1 shows an example of a few objects undergoing different motions and the ICR results.
Fig. 1.

(a) and (b): image pair. (c): first view with extracted features. (d): ground truth clustering (red dots represent outliers). (e) and (f): ICR clustering results.
The second algorithm is a supervised recursive formulation of ICR, which we refer to as r-ICR. r-ICR takes as input a data cluster and uses a Support Vector Machine (SVM) to determine whether the cluster is contaminated or not. Contamination can be either due to the presence of outliers or multiple instances of the motion model. If a cluster is classified as contaminated, ICR is deployed to split it into two clusters. The process is recursively applied on the resulting sub-clusters until they are either pure or too small to be split. r-ICR does not require knowledge of the true number of clusters, K, which is an important advantage over most of the literature. The cost of this capability comes in two forms: the need for annotated training data, containing pure and contaminated clusters, and over-segmentation of the outlier clusters in the output.
Both ICR and r-ICR do not require a threshold, which is a significant advantage compared to the majority of RANSAC-style algorithms. One of the important challenges to robust estimation and geometric model-based segmentation methods is the need for an inlier-outlier threshold, which is typically set by a domain expert. Moreover, the threshold may have to be set per environment, scene, intrinsic camera parameters or pose. Our approach is a new attempt to eliminate this requirement, along with prior and recent work [2, 3, 16, 24, 27].
Motion segmentation experiments on the Hopkins155 [29], the AdelaideRMF [31] and DTU MVS dataset [14] show that both methods are effective in the presence of multiple motions and outliers, while baseline algorithms have limitations under certain conditions. The contributions of this paper are:
The Inlier Clustering based on Residuals (ICR) approach for geometry-based clustering that requires no threshold, even in the presence of multiple clusters.
r-ICR, a recursive formulation of ICR which does not require the number of clusters to be externally specified.
Validation of both ICR and r-ICR on real datasets for motion segmentation including quantitative comparisons with several baseline methods.
ICR and r-ICR outperform baseline methods in terms of computation time, while their complexity grows more slowly as a function of the number of points.
2. Related Work
We categorize relevant publications into two groups: those focusing on the estimation of parametric models and those focusing on motion segmentation, or more generally on assigning the inputs to the model instance they are most compatible with. The first group includes robust estimators, primarily RANSAC [9] and its variants, which have been surveyed by Raguram et al. [25]. Here, we focus on robust estimators that share one or more of the key aspects of our approach: being threshold-free, the capability to partition the data into more than two clusters in one step (ICR) and the capability to cluster the data without knowledge of the number of clusters (r-ICR).
RECON [26] does not require a user-specified threshold; it distinguishes model hypotheses generated from contaminated and uncontaminated minimal subsets by examining their residuals, under the assumption that there is only one structure in the data and the outliers are unstructured. Methods for discriminating inliers and outliers based on residual analysis have also been published by Zhang and Kosecka [32] and Chin et al. [5]. The former is a RANSAC variant, while the latter is an extension of the mixed integer linear program solver for the maximum consensus problem [19]. Both assume that there is a single structure (model instance) in the data. Similar to RECON [26], we operate on the residuals, but anticipate the presence of multiple structures. The StaRSaC algorithm [6] examines the variance of the estimated parameters as the threshold changes to find ranges of stability. Within these ranges, inliers are detected and model parameters are estimated.
Our approach also bears some similarity to Locally Optimized RANSAC (LO-RANSAC) [7, 18] which refines approximate inlier sets generated from minimal subsets that appear to be “promising.” We also progressively refine the clusters of inliers. GC-RANSAC [1] is an extension of LO-RANSAC [7] that alternates graph-cut optimization and model re-fitting to enforce spatial coherence of the inliers and outliers. Recently, MAGSAC [2] and MAGSAC++ [3] eliminate the need for a user-defined inlier-outlier threshold by marginalizing over a range of noise scales. MAGSAC++ overcomes the long run times of MAGSAC by reformulating the problem as M-estimation, and also includes a new sampler favoring the detection of local structures. Fragoso et al. [10] proposed ANSAC (Adaptive Non-Minimal Sample and Consensus) to address a limitation of RANSAC which produces inaccurate hypotheses due to the use of minimal subsets of inliers corrupted by noise. CONSAC [16] is a learning-based approach that guides a RANSAC estimator to sequentially find model instances based on previous searches and detections.
The second group of relevant papers address the problem by attempting to partition the data into groups, each of which corresponds to a model instance. J-Linkage [28] applies agglomerative clustering based on whether points agree or not with random model hypotheses. These agreements are based on a threshold and are encoded as binary variables. T-Linkage [21] uses continuous variables to represent the point-model agreement and the Tanimoto distance for agglomerative clustering, but requires a soft threshold to be provided. Magri and Fusiello [22] later formulated multi-structure fitting as a set coverage problem. Wang et al. [30] rely on robust inlier threshold estimation for each model instance and on merging of consistent hypotheses on small data subsets. Isack and Boykov [13] formulate geometric multi-model fitting as an optimal labeling problem considering model fit, local smoothness and the number of models being fitted.
The sparse subspace clustering (SSC) algorithm [8] is based on the observation that data points which lie in the union of multiple low-dimensional manifolds in a high-dimensional space can be clustered by imposing a sparsity constraint on the number of manifolds required to represent them. The number and dimension of the subspaces or the inlier threshold need not be known in advance.
The work of Li et al. [20] is similar to ours: it does not explicitly estimate the model during the inlier clustering process. The method links consecutive image pairs by imposing joint sparsity constraints on the sequence. It has the advantage that it does not require the number of groups to be specified.
Jung et al. [15] proposed the randomized voting (RV) algorithm for rigid motion segmentation. RV is an iterative algorithm that aims to group the input point correspondences according to their compatibility with model hypotheses. Correspondences vote for each hypothesis and the strength of their votes can be used to determine their membership in the motion clusters. The final clusters are extracted via spectral clustering on an affinity matrix derived from the vote accumulators. RV does not need an inlier threshold.
Lai et al. [17] proposed a multi-frame method that imposes sparsity on the correlation matrix of point trajectories across frames. General motions are viewed as compositions of multiple homographies, while the number of motions is estimated in the process. Similar to our work, residuals are used to indicate whether points should be grouped together. Unlike our work, the values of the residuals are dropped and only their indices after sorting are used as point signatures. Spectral clustering is used to infer the final clusters.
3. Approach
In this section, we present our approach in its general form without limiting the description to a particular model for the inliers. We begin with ICR, where the number of clusters, K, is provided as an input, and proceed with r-ICR, which is a recursive formulation of ICR that does not require K.
3.1. Problem Definition and Notation
The goal of ICR is to partition a given set of N observations xi ∈ X, i = 1,2, …, N into K clusters, with each cluster corresponding to an instance of a given model, or a collection of outliers. We denote the gth cluster after partitioning as Gg, g = 1,2, …, K. On the other hand, the goal of r-ICR is to partition the data until the remaining clusters are either “pure” (outlier-free) or too small to support a model hypothesis.
In order to describe the geometric relationship of the set of observations, we must specify a geometric model Φ: Φ(θ, x) = 0, where θ is a vector containing the parameters of the model. For example, the model equation Φ for lines on a plane is θᵀx = 0, with x in homogeneous coordinates. A model generation function of Φ, denoted as ΘΦ, also needs to be specified. θ can be determined from a sampled set of observations (denoted by s) through the generation function ΘΦ (i.e. θ = ΘΦ(s)). If θ is determined by observations from the gth cluster Gg, we denote it as θg.
A residual model of the geometric model Φ must also be specified. The residual is a measure of the consistency between an observation and an instantiated model. We use ri,g to denote the residual of xi w.r.t. the model instance of the gth cluster. For example, if we want to cluster 2D points into multiple straight lines, the distance of a point to a line can be used as the residual. An N × K residual matrix R is formed by combining the residuals of all observations to all model instances:
R = [ri,g], i = 1, 2, …, N, g = 1, 2, …, K,  (1)
where each row represents the residuals of an observation w.r.t. every model instance and each column represents the residuals of every point w.r.t. a specific model instance.
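For concreteness, the construction of R for the line example can be sketched as follows. This is our own minimal illustration (the function names and toy data are not from the paper), with the point-to-line distance serving as the residual:

```python
import numpy as np

def line_from_points(p, q):
    """Model generation function for 2D lines: theta is the cross product of the
    two points in homogeneous coordinates, so theta^T x = 0 on the line."""
    theta = np.cross(np.append(p, 1.0), np.append(q, 1.0))
    return theta / np.linalg.norm(theta[:2])  # normalize so the residual is a distance

def residual_matrix(points, thetas):
    """N x K matrix of point-to-line distances, as in Eq. (1)."""
    X = np.hstack([points, np.ones((len(points), 1))])  # homogeneous coordinates
    return np.abs(X @ np.array(thetas).T)

# toy data: three points on the line y = 0, three on the line y = x
pts = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0],
                [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
thetas = [line_from_points(pts[0], pts[1]),   # hypothesis drawn from cluster 1
          line_from_points(pts[3], pts[4])]   # hypothesis drawn from cluster 2
R = residual_matrix(pts, thetas)
print(R.shape)  # (6, 2); points on y = 0 have a near-zero entry in column 0
```

Rows of R already hint at cluster membership: each point has a near-zero residual in the column of the hypothesis drawn from its own cluster.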
ICR is iterative: at the end of each iteration, residuals are re-computed and the observations are re-clustered. Therefore, we use a superscript t to denote the result of the tth iteration. For example, the gth cluster after the tth iteration is denoted by Gg^t, and the parameters of the model instance of this cluster are denoted by θg^t. (We use a capital T to indicate matrix transpose, e.g. R^T.)
3.2. ICR
Algorithm 1 shows the steps of ICR. If we expect outliers to be present in the input, we set the number of clusters to K so as to obtain K − 1 clusters containing the inliers of each motion and one containing the outliers. Before entering the first iteration, clusters are initialized by randomly partitioning the observation set X into K subsets. At each iteration, the parameters of a model instance for each cluster are instantiated through the model generation function ΘΦ, using only the minimum required number of samples to decrease the probability of contamination.
Then, the residuals of all observations w.r.t. the generated models are computed. Each residual is defined using an appropriate distance metric between an observation and an instance of the model. For each observation, xi, K residuals are computed as we generate a model instance per cluster at each iteration. The residual matrix R1~t is formed by appending the column residuals computed by the current iteration (Rt) to the right of the global residual matrix from the previous iteration (R1~t−1).
R1~t = [R1~t−1 | Rt]  (2)
Algorithm 1: ICR
A row of the residual matrix forms a descriptor of its respective observation since observations from the same true cluster have low residuals when the selected hypothesis has been drawn from their motion cluster and high residuals when the hypothesis is drawn from a different cluster, is mixed or contains outliers. The observations are clustered again after each iteration by applying a Gaussian mixture model [23] on the residuals. It is important to note that clustering takes place in the tK-dimensional space of residuals and not in the space of the coordinates of pixels or 3D points.
It is also important to note that clustering does not take place on the entire residual matrix at the end of all iterations because minimal samples (to form model hypotheses) are drawn from the current clusters at a given iteration. Drawing a minimal sample from a given cluster increases the probability of selecting inliers from the same model and thus instantiating valid models. The process terminates when the maximum number of iterations Tcutoff is reached. Compared to r-ICR, which is described below, ICR does not require supervision in the form of data with ground truth.
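Since Algorithm 1 is not reproduced here, the loop described above can be sketched for the 2D-line model as follows. This is an illustrative sketch, not the authors' MATLAB implementation: the helper structure is ours, and scikit-learn's GaussianMixture stands in for the GMM module, with the diagonal covariance and 0.001 regularization listed in Section 4.4:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def icr(points, K, t_cutoff=30, rng=None):
    """Sketch of ICR for 2D lines: grow per-point residual signatures from
    hypotheses drawn from the current clusters, then re-cluster with a GMM."""
    rng = np.random.default_rng(rng)
    N = len(points)
    labels = rng.integers(0, K, size=N)            # random initial clustering
    X = np.hstack([points, np.ones((N, 1))])       # homogeneous coordinates
    R = np.empty((N, 0))                           # global residual matrix R^{1~t}
    for t in range(t_cutoff):
        cols = []
        for g in range(K):
            members = np.flatnonzero(labels == g)
            if len(members) < 2:                   # too small for a minimal sample
                cols.append(rng.random(N))         # placeholder column
                continue
            p, q = points[rng.choice(members, 2, replace=False)]
            theta = np.cross(np.append(p, 1.0), np.append(q, 1.0))
            theta /= np.linalg.norm(theta[:2])     # residual = point-line distance
            cols.append(np.abs(X @ theta))
        R = np.hstack([R, np.column_stack(cols)])  # Eq. (2): append K new columns
        gmm = GaussianMixture(K, covariance_type='diag', reg_covar=1e-3,
                              random_state=0)
        labels = gmm.fit_predict(R)                # re-cluster in residual space
    return labels, R

# toy data: two horizontal lines, 10 points each
pts = np.vstack([np.column_stack([np.linspace(0, 5, 10), np.zeros(10)]),
                 np.column_stack([np.linspace(0, 5, 10), np.full(10, 4.0)])])
labels, R = icr(pts, K=2, t_cutoff=20, rng=0)
print(R.shape)  # (20, 40): 20 iterations, each appending K = 2 residual columns
```

Note that, per the text, the clustering at each iteration uses the full accumulated matrix R1~t, not just the columns of the current iteration.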
3.3. r-ICR
Unlike ICR, r-ICR allows clustering the data into an initially unknown number of clusters K, as shown in Algorithm 2. In each recursion, r-ICR initially splits the input cluster in two by calling ICR with K = 2. Then, a Support Vector Machine (SVM) with a radial basis function (RBF) kernel determines whether the split should be accepted or not. A cluster may consist of (i) an almost pure set of inliers, (ii) a set of outliers, (iii) a set of inliers and a set of outliers, or (iv) two or more sets of inliers from different model instances with or without outliers.
Algorithm 2: r-ICR
The first two cases should not be split, while the last two should be split. If the classifier determines that a split is necessary, the algorithm is applied recursively on the two sub-clusters, otherwise it returns a single cluster. It also returns when contaminated clusters are too small to split.
An SVM operating on feature vectors derived from cluster-level residuals determines whether a cluster is contaminated or not. Let us denote by Ri the average residual of all observations in cluster Xi w.r.t. the model instance θi drawn using all observations in the same cluster Xi. Let us also denote the input cluster to be split by X0, and the sub-clusters after the tentative split by X1 and X2. A 7D feature vector V describing the contamination is defined as follows. It includes both differences and ratios to allow the classifier to make decisions based on both absolute and relative quantities.
| (3) |
The SVM is trained on clusters that are labeled as contaminated or uncontaminated. The labeling can be generated automatically using the ground truth correspondence labels that come with the data. The RBF kernel was chosen because it led to the highest classification accuracy.
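A sketch of the split classifier follows, under stated assumptions: the exact seven entries of Eq. (3) are not reproduced here, so `contamination_features` below is a hypothetical composition of differences and ratios of R0, R1 and R2 in the spirit of the text, scikit-learn's SVC stands in for libSVM, and the synthetic training data are ours:

```python
import numpy as np
from sklearn.svm import SVC

def contamination_features(R0, R1, R2):
    """Hypothetical 7D feature vector: differences and ratios of the average
    residuals of the parent cluster (R0) and the two tentative sub-clusters
    (R1, R2). The paper's exact entries in Eq. (3) may differ."""
    eps = 1e-12
    return np.array([R0, R1, R2,
                     R0 - R1, R0 - R2,
                     R1 / (R0 + eps), R2 / (R0 + eps)])

# synthetic labels: a contaminated parent has a large average residual that
# drops sharply after the split; a pure cluster changes little
rng = np.random.default_rng(0)
def sample(contaminated):
    R0 = rng.uniform(1.0, 2.0) if contaminated else rng.uniform(0.0, 0.2)
    drop = rng.uniform(0.7, 0.95) if contaminated else rng.uniform(0.0, 0.2)
    return contamination_features(R0, R0 * (1 - drop), R0 * (1 - drop))

Xtr = np.array([sample(c) for c in [True] * 50 + [False] * 50])
ytr = np.array([1] * 50 + [0] * 50)        # 1 = contaminated (split), 0 = pure
clf = SVC(kernel='rbf', gamma='scale').fit(Xtr, ytr)
print(clf.predict([sample(True), sample(False)]))
```

The design rationale mirrors the text: absolute quantities (R0) catch clusters with large residuals, while ratios (R1/R0) catch clusters whose residuals shrink disproportionately after a split.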
4. Experiments
In this section, we compare the accuracy and efficiency of ICR and r-ICR on two-view motion segmentation with five baseline approaches: Randomized Voting (RV) [15], RANSAC (RC) [9], Sparse Subspace Clustering (SSC) [8], T-linkage (T-l) [21] and RansaCov (Rcov) [22]. RV and SSC are state-of-the-art approaches that do not require an inlier threshold while RansaCov, RANSAC and T-linkage do. We implemented our algorithms and RANSAC in MATLAB and used the MATLAB implementations of RV, SSC, T-linkage and RansaCov provided by their authors. Our RANSAC implementation detects K clusters of inliers by detecting a model instance with the largest number of inliers, removing the observations that fit the model, and repeating this process K − 1 times on the remaining observations. Following the SSC authors’ guidelines, we provided K to SSC for fairness, even though it is not strictly required. It is worth noting that T-linkage does not require K.
4.1. Residual Model
In these experiments, we use ICR to partition pixel correspondences into clusters consistent with an epipolar geometry represented by a fundamental matrix F. F plays the role of θ and the epipolar constraint, x′ᵀFx = 0, is the model equation. Due to measurement noise and quantization, corresponding pixel pairs do not strictly satisfy the epipolar constraint. The Sampson distance [11] is used as the residual model. We used the eight-point algorithm [12] for estimating the fundamental matrix in all methods.
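One standard formulation of the Sampson distance for a correspondence (x, x′) and fundamental matrix F is sketched below; the toy fundamental matrix (a pure translation along the x axis) is our own illustration:

```python
import numpy as np

def sampson_distance(F, x1, x2):
    """First-order geometric error of a correspondence w.r.t. F [11];
    x1 (view 1) and x2 (view 2) are homogeneous 3-vectors."""
    Fx1 = F @ x1                 # epipolar line of x1 in view 2
    Ftx2 = F.T @ x2              # epipolar line of x2 in view 1
    num = (x2 @ F @ x1) ** 2     # squared algebraic error
    den = Fx1[0] ** 2 + Fx1[1] ** 2 + Ftx2[0] ** 2 + Ftx2[1] ** 2
    return num / den

# pure translation along x: F = [t]_x with t = (1, 0, 0), so inliers keep
# the same y coordinate across the two views
F = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])
x1 = np.array([2.0, 3.0, 1.0])
x2 = np.array([5.0, 3.0, 1.0])   # same y: consistent with the motion
print(sampson_distance(F, x1, x2))  # 0.0 for this inlier
```

A correspondence whose second point drifts off the epipolar line (e.g. a different y coordinate here) yields a strictly positive residual, which is what ICR accumulates into the signature.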
4.2. Data
For these experiments, we have used multiple datasets: the Hopkins155 [29], the AdelaideRMF [31] and one we created from the DTU MVS dataset [14]. Hopkins155 [29] is a very popular dataset for motion segmentation. It consists of 120 two-motion and 35 three-motion video sequences, and provides a ground truth set of feature tracks over all frames without outlier corruption. AdelaideRMF [31] is a homography and fundamental matrix estimation dataset. It consists of 38 sequences, each containing a set of corresponding pairs over two frames: 19 sequences for fundamental matrix estimation (AdelaideRMF-F) and 19 for homography estimation (AdelaideRMF-H). AdelaideRMF-F is similar to Hopkins155; it provides a set of corresponding feature points on the two images and their labels, which indicate the model they belong to or whether they are outliers. The image pairs for fundamental matrix estimation contain one to four motions. Between the acquisition of the first and the second image, the camera and some objects in the scene are moved, while other objects remain at the same position. Examples are shown in Figure 1. In addition, we created a modified version of AdelaideRMF-H by keeping a single motion cluster, along with the outliers, per image set. We refer to these data as the “homography data,” but the task is still fundamental matrix estimation.
We also created a two-view dataset from the DTU MVS [14] dataset, which provides ground truth point clouds and calibration information for several multi-view stereo image sets. Given this information, we use two of the projection matrices to project the point cloud and generate a set of ground truth corresponding pairs over two of the images. The created dataset consists of 8 pairs of images. Each pair has around 400 corresponding features undergoing the same motion, since the scenes are rigid. We then added a set of outliers, created from AdelaideRMF-F, as needed. The resulting dataset thus contains around 400 inlier feature points and up to 1000 outliers, which are realistic since they were extracted from real data. Details on the inlier-outlier ratio for all datasets are provided in Table 1.
Table 1.
Average numbers of inliers and outliers in each of the datasets used in our experiments.
| | Adelaide F | Adelaide H | Hopkins | DTU |
|---|---|---|---|---|
| inliers | 115.74 | 133.26 | 342.02 | 294.37 |
| outliers | 147.79 | 286.58 | 0 | Variable* |
4.3. Evaluation of r-ICR
As r-ICR does not require the number of motions, it splits the clusters containing mostly outliers into smaller clusters. This happens because outliers, by their nature, do not converge to residual signatures similar to those of other outliers. For inlier clusters, r-ICR achieves high accuracy, but when we count the fragmented outlier clusters as mistakenly classified, the overall accuracy of r-ICR is significantly reduced. r-ICR performs similarly to ICR and other baseline methods in terms of clustering inliers, as well as separating inliers from outliers. In order to ensure a fair evaluation, as in [8], we introduce r-ICR+K, which is provided with the value of K after r-ICR has converged. r-ICR+K sorts the clusters returned by r-ICR in ascending order by their average residuals; then, it labels the first K − 1 as inlier clusters and merges all the remaining clusters into a single outlier cluster.
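The r-ICR+K post-processing can be sketched as follows; the cluster contents and residual values in the usage example are hypothetical:

```python
import numpy as np

def merge_outlier_clusters(clusters, avg_residuals, K):
    """r-ICR+K post-processing: sort the clusters returned by r-ICR by their
    average residual, label the K-1 lowest as inlier clusters, and merge the
    rest into a single outlier cluster."""
    order = np.argsort(avg_residuals)
    inliers = [clusters[i] for i in order[:K - 1]]
    outliers = [p for i in order[K - 1:] for p in clusters[i]]
    return inliers, outliers

# four clusters from r-ICR (lists of observation indices); the two with large
# average residuals are fragments of the outlier population
clusters = [[0, 1, 2], [3, 4], [5], [6, 7]]
avg_res = [0.1, 5.0, 0.2, 7.5]
inl, out = merge_outlier_clusters(clusters, avg_res, K=3)
print(inl, sorted(out))  # [[0, 1, 2], [5]] [3, 4, 6, 7]
```

This relies on the observation, also noted in Section 4.5, that outlier clusters mostly appear after inlier clusters when ordered by average residual.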
4.4. Parameter Settings
The parameters of RV are set as follows: the number of trials (the number of outer iterations of the RV algorithm) is set to 5, the number of iterations (equivalent to Tcutoff) to 100, the voting strength to 2, the decay to 0.9 and the initialization to 50. Note that we run RV on only two frames in these experiments. The parameters of SSC are set as follows: “projection” and “affine” are set to “no,” while “outliers” is set to “yes.” For RANSAC, the number of iterations (equivalent to Tcutoff in ICR) is set to 3000, while the inlier threshold is set empirically to achieve good performance. The number of iterations was set so that RANSAC and ICR are given comparable amounts of time to cluster the data. For T-linkage, RansaCov and RANSAC, the threshold was set empirically on each dataset. ICR takes one parameter, the number of iterations Tcutoff, which we set to 100, the same as RV. There are also two parameters for the GMM clustering module, the covariance type and the regularization value, set to diagonal and 0.001, respectively.
For all approaches except r-ICR and T-linkage, we set the number of clusters, K, to be the number of motions plus one (for the outliers).
For r-ICR, we used libSVM [4] to train an SVM with an RBF kernel. The SVM was trained on the fundamental matrix estimation pairs, when it was tested on the homography pairs and vice versa. To collect training data we created clusters containing the inliers of one or multiple motions with and without outliers as well as clusters containing only outliers. We created 77 clusters from the homography pairs and 118 clusters from the fundamental matrix pairs. Each cluster was split 500 times using ICR with different random initializations and random seeds and the resulting clusters were also split recursively. All these clusters can be automatically labelled using the ground truth provided with the dataset. The total number of training instances was 55900 for the homography data and 38500 for the fundamental matrix data. The classifier achieved 95.66% and 97.79% accuracy using 10-fold cross-validation on the homography and the fundamental matrix estimation dataset, respectively.
To evaluate r-ICR on the homography dataset, which contains outdoor images, we trained the classifier on the fundamental matrix dataset, which contains indoor images, and vice versa to examine whether the SVM generalizes well to different datasets.
The experiments were run on a MacBook Pro. The RANSAC and ICR implementations are not optimized. All results reported below are computed by taking the median misclassification error (ME) of 100 runs on each image pair and then averaging the median MEs.
4.5. Results
Table 2 reports quantitative results on the AdelaideRMF-F, AdelaideRMF-H, Hopkins and DTU experiments. The numbers of motions and outliers in the scene strongly affect the results of each method.
Table 2.
Misclassification error (ME) on the Hopkins (HK), AdelaideRMF (A-F and A-H) and DTU datasets. The first column indicates the dataset or subset.
| Data | T-L** | RCov | RC | RV | SSC | ICR | r-ICR+K |
|---|---|---|---|---|---|---|---|
| HK | 5.78* | 0.98* | 33.21 | 0.77 * | 9.43 | 9.47 | 31.18 |
| 1 A-F | N/A | N/A | 4.23 | 13.30 | 24.09 | 8.47 | 0.32 |
| 2 A-F | N/A | N/A | 29.99 | 5.74 | 28.78 | 16.05 | 9.21 |
| 3 A-F | N/A | N/A | 30.86 | 16.90 | 30.86 | 23.89 | 18.71 |
| 4 A-F | N/A | N/A | 40.05 | 16.46 | 31.85 | 24.53 | 20.10 |
| All A-F | 9.37* | 6.04 * | 28.91 | 12.57 | 28.88 | 18.23 | 11.98 |
| A-H | N/A | N/A | 4.72 | 12.79 | 33.49 | 5.00 | 2.19 |
| DTU-10 | 0.97 | 0.50 | 0.34 | 0.21 | 39.39 | 0.73 | 2.21 |
| DTU-50 | 5.49 | 0.32 | 0.97 | 0.26 | 42.27 | 0.00 | 0.00 |
| DTU-100 | 8.95 | 0.03 | 1.29 | 0.19 | 44.20 | 0.00 | 0.00 |
| DTU-200 | 13.25 | 0.07 | 1.10 | 0.02 | 44.37 | 0.00 | 0.00 |
| DTU-500 | 19.51 | 0.04 | 2.46 | 10.66 | 46.09 | 7.63 | 0.22 |
| DTU-1000 | 48.60 | 0.01 | 9.49 | 32.98 | 47.85 | 45.53 | 8.37 |
* denotes results reported by their authors and ** denotes that a method does not need the number of clusters K a priori. The number after DTU indicates the number of outliers added to the data.
Figure 2 shows examples of qualitative results, including results for r-ICR, on the AdelaideRMF-F and AdelaideRMF-H datasets. It can be seen that r-ICR is able to cluster all inliers very accurately, but splits the outlier cluster into multiple clusters. When the clusters are ordered in ascending order according to their average residual values, outlier clusters mostly appear after inlier clusters. This gives an advantage to r-ICR when applied to tasks which only require detecting the largest inlier cluster such as visual odometry and Simultaneous Localization and Mapping (SLAM).
Fig. 2.

Clustering results for: (first six pictures) a fundamental matrix estimation example (biscuitbook) with two motions and outliers; (last six pictures) a homography estimation example (elderhallb) with one motion and outliers. Under each image, we list the name of the algorithm and the ME. Note: ME results are for the specific run shown in the figure. (*) denotes the only algorithm not provided with the number of clusters K.
Table 3 lists the run times for each approach in MATLAB. ICR is significantly faster than the other methods, while r-ICR+K, RV and SSC have similar run times. An RV iteration is more expensive than an ICR iteration. This is because ICR hypotheses are more likely to be correct, since the current clusters are considered during sampling. RANSAC iterations are lightweight, but more iterations are required and model instances must be extracted one by one. Lastly, the computational time of T-linkage and RansaCov grows exponentially as the number of points increases.
Table 3.
Computation time in seconds for experiments on the Hopkins (HK), AdelaideRMF (A-F and A-H) and DTU datasets. The number after DTU indicates the number of outliers added to the data.
| Data | T-l | RCov | RC | RV | SSC | ICR | r-ICR+K |
|---|---|---|---|---|---|---|---|
| HK | N/A | N/A | 33.96 | N/A | N/A | 0.68 | 0.83 |
| 1 A-F | N/A | N/A | 21.25 | 4.91 | 0.99 | 0.78 | 3.27 |
| 2 A-F | N/A | N/A | 46.52 | 7.26 | 1.06 | 0.79 | 4.27 |
| 3 A-F | N/A | N/A | 65.49 | 8.58 | 0.86 | 0.73 | 2.75 |
| 4 A-F | N/A | N/A | 112.41 | 10.71 | 1.15 | 0.72 | 4.61 |
| All A-F | N/A | N/A | 55.12 | 7.61 | 0.99 | 0.76 | 3.54 |
| A-H | N/A | N/A | 27.44 | 5.74 | 10.59 | 0.97 | 4.65 |
| DTU-10 | 7.02 | 28.82 | 24.64 | 2.41 | 1.12 | 0.72 | 0.77 |
| DTU-50 | 9.33 | 30.42 | 26.68 | 2.75 | 1.48 | 0.77 | 1.86 |
| DTU-100 | 12.94 | 34.60 | 27.96 | 3.58 | 1.86 | 0.80 | 2.67 |
| DTU-200 | 24.86 | 52.17 | 33.41 | 4.75 | 3.15 | 0.87 | 5.19 |
| DTU-500 | 76.60 | 107.91 | 47.29 | 5.76 | 9.76 | 1.34 | 13.74 |
| DTU-1000 | 496.30 | 478.05 | 59.04 | 7.35 | 66.61 | 1.63 | 29.00 |
The most important observations are the following:
r-ICR+K ranks first on overall average on both the AdelaideRMF-F and AdelaideRMF-H datasets.
ICR ranks third overall on AdelaideRMF-F, trailing RV on data with three or four motions and trailing RANSAC on one motion and on the AdelaideRMF-H dataset.
RV ranks first when there are two, three and four motions. It is the only algorithm whose error does not monotonically increase with the number of motions on both AdelaideRMF-F and Hopkins. Without proof, we attribute this to the observation that RV benefits from having competing clusters of similar quality, which is not the case when only one motion is present among outliers.
RANSAC is very effective only when there is one inlier group among outliers, as shown by its results on the one-motion subset of AdelaideRMF-F, on AdelaideRMF-H and on the DTU datasets.
On the Hopkins dataset, ICR does not outperform the state-of-the-art methods but achieves 9.47%. However, r-ICR+K performs poorly, which could be due to the combination of the absence of outliers and the presence of multiple motions.
SSC is stable as the number of motions varies.
ICR and r-ICR+K perform very well on one motion plus outliers across datasets.
In terms of clustering speed, ICR is the fastest in all experiments.
RansaCov outperforms the other algorithms in most experiments, but it becomes exponentially slower as the number of points increases.
In the DTU experiments, we observe zero errors when ICR and r-ICR are applied on data with 50–200 outliers, but large errors when fewer (10) outliers are included. This is due to the fact that a sufficient number of outliers are required for ICR and r-ICR to distinguish them from inliers. Otherwise, the algorithms divide the inlier group.
5. Conclusion
We have presented an approach for clustering data as inliers to instances of a geometric model and for detecting outliers. An unsupervised and a supervised algorithm based on this approach were also presented. Our approach is threshold-free and can handle one or more clusters that fit geometric models in the presence of outliers. It is based on the observation that signatures for the data can be obtained from their residuals w.r.t. randomly drawn model hypotheses. Moreover, r-ICR does not require the number of clusters to be specified a priori. It uses a classifier to decide if a cluster is contaminated; and if so, it recursively splits it.
We applied the proposed approaches on motion segmentation guided by the epipolar constraint. Quantitative results show that ICR and r-ICR are computationally efficient and achieve competitive results to a number of baseline methods representing the state-of-the-art motion segmentation paradigms. In future work, we plan to conduct additional experiments on estimating different models, such as homographies and 3D similarity transformations.
We are optimistic that our algorithms will find practical applications. Variations in the scene, camera response function, camera pose, number of motions and number of pixels are all important factors that may affect a motion segmentation system. In real-world applications, predicting the number of moving objects in a scene is virtually impossible. Predicting their motion properties is another daunting task. Therefore, algorithms that do not require knowing the number of clusters or setting an inlier threshold a priori are valuable. Moreover, reasonable computational complexity as a function of the number of primitives is also desirable.
Highlights.
ICR is an approach for geometry-based clustering that requires no threshold, even in the presence of multiple clusters.
r-ICR is a recursive formulation of ICR which does not require the number of clusters to be externally specified.
Validation of both ICR variants on real datasets for motion segmentation.
Unlike the baselines, the time complexity of ICR and r-ICR grows more slowly as a function of the number of points.
Acknowledgments
Research reported in this publication was supported by the National Institute of Nursing Research of the National Institutes of Health under Award Number R01NR015371.
Footnotes
Declaration of interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References
- [1].Barath D, Matas J, 2018. Graph-cut RANSAC, in: IEEE Conf. on Computer Vision and Pattern Recognition, pp. 6733–6741.
- [2].Barath D, Matas J, Noskova J, 2019. MAGSAC: marginalizing sample consensus, in: IEEE Conf. on Computer Vision and Pattern Recognition, pp. 10197–10205.
- [3].Barath D, Noskova J, Ivashechkin M, Matas J, 2020. MAGSAC++, a fast, reliable and accurate robust estimator, in: IEEE Conf. on Computer Vision and Pattern Recognition, pp. 1304–1312.
- [4].Chang CC, Lin CJ, 2011. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST) 2, 27.
- [5].Chin TJ, Heng Kee Y, Eriksson A, Neumann F, 2016. Guaranteed outlier removal with mixed integer linear programs, in: IEEE Conf. on Computer Vision and Pattern Recognition, pp. 5858–5866.
- [6].Choi J, Medioni G, 2009. StaRSaC: Stable random sample consensus for parameter estimation, in: IEEE Conf. on Computer Vision and Pattern Recognition, pp. 675–682.
- [7].Chum O, Matas J, Kittler J, 2003. Locally optimized RANSAC. Pattern Recognition, 236–243.
- [8].Elhamifar E, Vidal R, 2013. Sparse subspace clustering: Algorithm, theory, and applications. IEEE Trans. on Pattern Analysis and Machine Intelligence 35, 2765–2781.
- [9].Fischler MA, Bolles RC, 1981. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM 24, 381–395.
- [10].Fragoso V, Sweeney C, Sen P, Turk M, 2017. ANSAC: adaptive non-minimal sample and consensus, in: British Machine Vision Conference, pp. 43.1–43.11.
- [11].Hartley R, Zisserman A, 2003. Multiple view geometry in computer vision. Cambridge University Press.
- [12].Hartley RI, 1997. In defense of the eight-point algorithm. IEEE Trans. on Pattern Analysis and Machine Intelligence 19, 580–593.
- [13].Isack H, Boykov Y, 2012. Energy-based geometric multi-model fitting. Int. Journal of Computer Vision 97, 123–147.
- [14].Jensen R, Dahl A, Vogiatzis G, Tola E, Aanæs H, 2014. Large scale multi-view stereopsis evaluation, in: IEEE Conf. on Computer Vision and Pattern Recognition, pp. 406–413.
- [15].Jung H, Ju J, Kim J, 2014. Rigid motion segmentation using randomized voting, in: IEEE Conf. on Computer Vision and Pattern Recognition, pp. 1210–1217.
- [16].Kluger F, Brachmann E, Ackermann H, Rother C, Yang MY, Rosenhahn B, 2020. CONSAC: robust multi-model fitting by conditional sample consensus, in: IEEE Conf. on Computer Vision and Pattern Recognition, pp. 4634–4643.
- [17].Lai T, Wang H, Yan Y, Chin TJ, Zhao WL, 2017. Motion segmentation via a sparsity constraint. IEEE Transactions on Intelligent Transportation Systems 18, 973–983.
- [18].Lebeda K, Matas J, Chum O, 2012. Fixing the locally optimized RANSAC–full experimental evaluation, in: British Machine Vision Conference, pp. 1–11.
- [19].Li H, 2009. Consensus set maximization with guaranteed global optimality for robust geometry estimation, in: IEEE Conf. on Computer Vision and Pattern Recognition, pp. 1074–1080.
- [20].Li Z, Guo J, Cheong LF, Zhiying Zhou S, 2013. Perspective motion segmentation via collaborative clustering, in: Int. Conf. on Computer Vision, pp. 1369–1376.
- [21].Magri L, Fusiello A, 2014. T-Linkage: A continuous relaxation of J-Linkage for multi-model fitting, in: IEEE Conf. on Computer Vision and Pattern Recognition, pp. 3954–3961.
- [22].Magri L, Fusiello A, 2016. Multiple model fitting as a set coverage problem, in: IEEE Conf. on Computer Vision and Pattern Recognition, pp. 3318–3326.
- [23].McLachlan G, Peel D, 2000. Mixtures of factor analyzers. Finite Mixture Models, 238–256.
- [24].Moisan L, Moulon P, Monasse P, 2012. Automatic homographic registration of a pair of images, with a contrario elimination of outliers. Image Processing On Line 2, 56–73.
- [25].Raguram R, Chum O, Pollefeys M, Matas J, Frahm JM, 2013. USAC: a universal framework for random sample consensus. IEEE Trans. on Pattern Analysis and Machine Intelligence 35, 2022–2038.
- [26].Raguram R, Frahm JM, 2011. RECON: Scale-adaptive robust estimation via residual consensus, in: Int. Conf. on Computer Vision, pp. 1299–1306.
- [27].Stewart CV, 1995. MINPRAN: a new robust estimator for computer vision. IEEE Transactions on Pattern Analysis and Machine Intelligence 17, 925–938.
- [28].Toldo R, Fusiello A, 2008. Robust multiple structures estimation with J-Linkage, in: European Conf. on Computer Vision, pp. 537–547.
- [29].Tron R, Vidal R, 2007. A benchmark for the comparison of 3-d motion segmentation algorithms, in: IEEE Conf. on Computer Vision and Pattern Recognition.
- [30].Wang H, Chin TJ, Suter D, 2012. Simultaneously fitting and segmenting multiple-structure data with outliers. IEEE Trans. on Pattern Analysis and Machine Intelligence 34, 1177–1192.
- [31].Wong HS, Chin TJ, Yu J, Suter D, 2011. Dynamic and hierarchical multi-structure geometric model fitting, in: Int. Conf. on Computer Vision, pp. 1044–1051.
- [32].Zhang W, Kosecka J, 2006. A new inlier identification procedure for robust estimation problems, in: Robotics: Science and Systems Conference.
