Abstract
The ability to handle outliers is essential for applying perspective-n-point (PnP) methods in practice, but conventional RANSAC+P3P or P4P methods have high time complexity. We propose a fast PnP solution named R1PPnP that handles outliers by combining a soft re-weighting mechanism with the 1-point RANSAC scheme. We first present a PnP algorithm, which serves as the core of R1PPnP, for solving the PnP problem in outlier-free situations. The core algorithm is an optimization process that minimizes an objective function constructed with a random control point. Then, to reduce the impact of outliers, we propose a reprojection error-based re-weighting method and integrate it into the core algorithm. Finally, we employ the 1-point RANSAC scheme to try different control points. Experiments with synthetic and real-world data demonstrate that R1PPnP is accurate and is faster than RANSAC+P3P or P4P methods, especially when the percentage of outliers is large. In addition, comparisons on outlier-free synthetic data show that R1PPnP is among the most accurate and fastest of the PnP solutions that usually serve as the final refinement step of RANSAC+P3P or P4P. Compared with REPPnP, the state-of-the-art PnP algorithm with an explicit outlier-handling mechanism, R1PPnP is slower but is not subject to REPPnP's limit on the percentage of outliers.
Keywords: Perspective-n-Point, 1-Point RANSAC, soft re-weighting, robustness to outliers
1. Introduction
The perspective-n-point (PnP) problem aims to determine the position and orientation of a calibrated camera from n known correspondences between three-dimensional (3D) object points and their two-dimensional (2D) image projections. PnP is a core problem in the computer vision field and has found many applications, such as robot vision navigation [1], augmented reality [2], and computer animation. In the past decades, many effective PnP approaches have been proposed with very fast computational speed [3] [4] and high accuracy [5] [6].
To date, most PnP algorithms are designed under the assumption that no outlier exists among the given 3D-2D correspondences. However, in practical applications, this outlier-free assumption is often difficult to satisfy. This is because image feature detection and matching approaches, such as SURF [7], BRISK [8] and ORB [9], do not always give perfect results due to scaling, illumination, shadow and occlusion. Outliers are often unavoidable and they have a significant impact on the PnP methods. Even a small percentage of outliers will lead to a significant decrease in accuracy. Hence, the ability to handle outliers is essential for performing PnP algorithms in practical applications. The most common outliers handling mechanism is to combine a PnP (n = 3 or 4) algorithm [10] [11] [12] with the RANSAC-based scheme [13] to eliminate outliers, and then perform a more accurate PnP algorithm with the remaining inliers to refine the result. A number of very fast closed-form P3P [14] or P4P [15] algorithms have been proposed. However, their RANSAC combination scheme still needs many trials until the selected three or four 3D-2D correspondences are all inliers, which results in a high time complexity. Hence, the computational speed decreases significantly as the percentage of outliers increases. To reduce the time complexity, one natural idea is to utilize the PnP algorithm with a smaller n. However, when n = 1 or 2, the PnP problem has infinitely many solutions, which makes the conventional RANSAC-based scheme infeasible.
To the best of our knowledge, except for RANSAC-based methods, the only PnP method that addresses outliers is REPPnP [3] proposed by Ferraz et al., which is the state-of-the-art PnP method that is robust to outliers. REPPnP integrates an outlier rejection mechanism with camera pose estimation. It formulates the pose estimation problem as a low-rank homogeneous system in which the solution lies on its one-dimensional (1D) null space. Outlier correspondences are those rows of the linear system that perturb the null space and are progressively detected by projecting them on an iteratively estimated solution of the null space. Although REPPnP is very fast and accurate, it suffers from a severe limitation that it cannot handle more than approximately 50% of outliers.
In this paper, we propose a robust 1-point RANSAC-based PnP method named R1PPnP. We first present an optimal iterative process as the core PnP algorithm of R1PPnP. The core algorithm takes a random 3D-2D correspondence as the control point. To address outliers, we propose a soft weight assignment method according to reprojection errors to distinguish inliers and outliers, and integrate it into the core algorithm. The weight factors associated with outliers decrease significantly during the iteration to reduce the impact of outliers. Finally, we employ the 1-point RANSAC scheme to try different control points for the core PnP algorithm. By using this combination of the RANSAC scheme and the soft weight assignment, the algorithm is capable of eliminating outliers when the selected control point is an inlier.
The main advantage of R1PPnP is that it has much lower time complexity and is much faster than conventional RANSAC+P3P or P4P methods, especially when the percentage of outliers is large. Compared with REPPnP, the proposed R1PPnP does not suffer from the percentage of outliers limitation.
This paper is organized as follows. In Section 2, we describe the fundamental model used in R1PPnP. The details of the core algorithm are given in Section 3, in which we also provide its proof of convergence, local minima analysis and the strategy to select control points. The outliers handling mechanism, including soft weight assignment and the 1-point RANSAC scheme, is introduced in Section 4, together with details of the termination conditions. Evaluation results are presented in Section 5. Section 6 discusses the results and describes planned future work.
1.1. Related Works
The PnP problem, coined by Fischler and Bolles [13], is articulated as follows: Given the relative spatial locations of n control points, and given the angle to every pair of control points Pi from an additional point called the center of perspective C, find the lengths of the line segments joining C to each of the control points. The PnP problem has been studied for many years. In early studies, direct linear transformation (DLT) [16] was used as a solution in a straightforward way by solving a linear system. However, DLT ignores the intrinsic camera parameters, which are assumed to be known, and therefore generally leads to a less stable pose estimate.
In the past decade, researchers have proposed many PnP methods to improve speed, accuracy, and robustness to outliers. The PnP methods can be roughly classified into non-iterative and iterative methods. Generally speaking, non-iterative methods are more efficient but are unstable under image noise and outliers. Many non-iterative PnP methods are based on a small number of points (n = 3, 4). They are referred to as P3P [17] [11] [14] or P4P [18] [15] [19] [20] methods. P3P uses the smallest subset of control points that yields a finite number of solutions [14] [21]. When the intrinsic camera parameters are known and we have n ≥ 4 points, the solution is generally unique. Triggs proposed a PnP method with four or five correspondences [22]. These PnP methods, which are based on only a few correspondences, do not make use of redundant points and are very sensitive to noise and outliers. However, due to their efficiency and their ability to compute a pose from a small point set, P3P or P4P methods are very useful in combination with a RANSAC-like scheme to reject outliers. There are also many non-iterative PnP methods that are able to make use of redundant points but are quite time consuming. For example, Ansar’s method is O(n^8) [23] and Fiore’s is O(n^2) [24]. Schweighofer proposed an O(n) PnP method named SDP, but it is slow [25]. In recent years, three excellent O(n) non-iterative PnP methods, EPnP [4], RPnP [26] and UPnP [27], have been proposed, and these methods are very efficient and accurate even compared to iterative methods.
Iterative PnP methods [6] [28] [29] [30] are mostly optimization methods that decrease their energy function in the iterative process. They are generally more accurate and robust, but slower. For example, Dementhon proposed POSIT, which is easy to implement [29], and further proposed SoftPOSIT to handle situations in which the correspondence relationships are unknown [31]. Although SoftPOSIT has a certain ability to handle outliers, the strong assumption that all correspondences are unknown makes it slow. Lu’s method [6] is the most accurate iterative PnP method but may get stuck in local minima. Schweighofer discussed the local minima situation of Lu’s method and proposed a method to avoid this limitation [32].
PnP algorithms are widely used in applications such as structure from motion [33] and monocular SLAM [34], which require dealing with hundreds or even thousands of noisy feature points and outliers in real time. Because outliers have a much greater impact on PnP accuracy than Gaussian white image noise, it is necessary for a PnP algorithm to handle outliers efficiently. The conventional method to handle outliers is to combine a RANSAC-like scheme with P3P or P4P algorithms. In addition, the L1-norm is widely used to handle a certain amount of outliers [35] [36] because the L1-norm penalty is less sensitive to outliers than the L2-norm penalty. Although an L1-norm-based energy function is more robust to outliers, it cannot completely eliminate their influence and its computation is more complex.
Ferraz et al. proposed a very fast PnP method that can handle up to 50% of outliers [3]. The outlier rejection mechanism is integrated within the pose estimation pipeline with negligible computational overhead. Compared to Ferraz’s method, the R1PPnP algorithm proposed in this paper demonstrates much stronger robustness, but is slower.
2. Fundamental Model
In this paper we denote the camera frame as c and the world frame as w. For point i, without taking into account the distortion, the perspective projection equations are employed to describe the pinhole camera model,
$$u_i = f\,\frac{x_i^c}{z_i^c}, \qquad v_i = f\,\frac{y_i^c}{z_i^c} \tag{1}$$
where f is the camera focal length, xi = [ui, vi, f]T is the image homogeneous coordinate in pixels, and $X_i^c = [x_i^c, y_i^c, z_i^c]^T$ is the real-world coordinate with respect to the camera frame.
According to (1),
$$X_i^c = \lambda_i^* x_i \tag{2}$$
where $\lambda_i^* = z_i^c / f$ is the normalized depth of point i. Eq. (2) indicates that an object point lies on the straight line of sight of the related image point.
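As a small numerical illustration of the projection model and the line-of-sight relation, the following sketch projects one camera-frame point and checks that scaling the homogeneous image coordinate by the normalized depth recovers the point. The function names and the sample values are hypothetical, and the sketch assumes the normalized depth is the z-coordinate divided by f, as the text suggests.

```python
def project(point_c, f):
    """Pinhole projection of a camera-frame point; returns x = [u, v, f]."""
    xc, yc, zc = point_c
    if zc <= 0:
        raise ValueError("point must lie in front of the camera")
    return [f * xc / zc, f * yc / zc, f]


def normalized_depth(point_c, f):
    """Depth of the point divided by the focal length (assumed definition)."""
    return point_c[2] / f


f = 1000.0                      # focal length in pixels (sample value)
X_c = [0.5, -1.0, 4.0]          # sample point in the camera frame
x = project(X_c, f)             # homogeneous image coordinate [u, v, f]
lam = normalized_depth(X_c, f)

# The object point lies on the line of sight: X_c = lam * x.
back = [lam * c for c in x]
print(x, lam, back)
```

The check only restates the geometry: every point on the ray through the image point is a scalar multiple of [u, v, f].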
The relationship between the camera and world frame coordinate of point i is
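$$X_i^c = R X_i^w + t \tag{3}$$
where $X_i^w$ is the coordinate of point i in the world frame, R ∈ SO(3) is the rotation matrix and t ∈ ℝ³ is the translation vector. R and t are the variables that need to be estimated in the PnP problem.
Similarly to the translation elimination method used in [37] [38], subtracting (3) for two points i and o gives
$$X_i^c - X_o^c = R\,(X_i^w - X_o^w) \tag{4}$$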
In the proposed R1PPnP algorithm, o ∈ [1, N] denotes the index of the control point and N is the number of 3D-2D correspondences. R1PPnP represents the shape of the point cloud by the relative positions between the control point o and the other N − 1 points. Denoting $S_i = X_i^w - X_o^w$, where S means “shape”, then, according to (2) and (4),
$$\lambda_i^* x_i - \lambda_o^* x_o = R S_i \tag{5}$$
We divide both sides of (5) by the depth of the control point $\lambda_o^*$, and rewrite (5) as
$$\lambda_i x_i - x_o = \mu R S_i \tag{6}$$
where $\lambda_i = \lambda_i^* / \lambda_o^*$ and $\mu = 1 / \lambda_o^*$ is the scale factor. We have
$$t = \frac{1}{\mu}\,x_o - R X_o^w \tag{7}$$
According to (6) and (7), the PnP problem can be solved by minimizing the objective function
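$$f(R, \mu, \lambda_i) = \sum_{i=1,\, i \neq o}^{N} \left\| \lambda_i x_i - x_o - \mu R S_i \right\|^2 \tag{8}$$
where $\|\cdot\|$ is the L2-norm.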
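The translation elimination can be checked numerically. The sketch below is an illustration under stated assumptions rather than the authors' code: it assumes the relative depth is the ratio of a point's normalized depth to that of the control point, that µ is the reciprocal of the control point's normalized depth, and that the residual of each point is λi xi − xo − µ R Si with Si the world-frame offset from the control point. The pose, the three sample points and all helper names are hypothetical.

```python
import math


def matvec(M, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(M[r][c] * v[c] for c in range(3)) for r in range(3)]


def add(a, b):
    return [p + q for p, q in zip(a, b)]


def sub(a, b):
    return [p - q for p, q in zip(a, b)]


def scale(s, a):
    return [s * p for p in a]


f = 1000.0
angle = math.radians(30.0)      # sample rotation about the z-axis
R = [[math.cos(angle), -math.sin(angle), 0.0],
     [math.sin(angle), math.cos(angle), 0.0],
     [0.0, 0.0, 1.0]]
t = [0.1, -0.2, 6.0]
world = [[0.0, 0.0, 0.0], [1.0, 0.5, -0.5], [-1.0, 0.3, 1.0]]

cam = [add(matvec(R, X), t) for X in world]          # camera-frame points
depth = [Xc[2] / f for Xc in cam]                    # normalized depths
image = [scale(1.0 / d, Xc) for d, Xc in zip(depth, cam)]  # [u, v, f]

o = 1                           # index of the control point
mu = 1.0 / depth[o]             # scale factor
objective = 0.0                 # sum of squared residuals at the true pose
for i in range(len(world)):
    if i == o:
        continue
    lam = depth[i] / depth[o]   # relative depth w.r.t. the control point
    S_i = sub(world[i], world[o])
    residual = sub(sub(scale(lam, image[i]), image[o]),
                   scale(mu, matvec(R, S_i)))
    objective += sum(r * r for r in residual)

# Translation recovered from R, mu and the control point alone.
t_recovered = sub(scale(1.0 / mu, image[o]), matvec(R, world[o]))
print(objective, t_recovered)
```

At the true pose the objective is zero up to rounding, and the translation follows from R, µ and the control point, which is why only R, µ and the relative depths need to be estimated.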
The objective function (8) is based on Euclidean distances in 3D space. Compared with the reprojection error cost, Eq. (8) gives more weight to points with larger depths: the same reprojection error produces a larger 3D residual for an object point with greater depth. To solve this problem, we normalize the cost function (8) by the depths of the points and propose the objective function of our R1PPnP algorithm, that is
$$f^*(R, \mu, \lambda_i) = \sum_{i=1,\, i \neq o}^{N} \left\| \frac{1}{\lambda_i}\left( \lambda_i x_i - x_o - \mu R S_i \right) \right\|^2 \tag{9}$$
where 1/λi is introduced to adjust the weight of point i to eliminate the inequity among points in Eq. (8).
We estimate R, μ and λi (i = 1, …, N, i ≠ o) by minimizing the objective function (9), the variables of which consist of two parts: the camera pose {R, µ} and the relative depths with respect to the control point {λi}. To describe the following algorithm intuitively, we introduce two sets of points: pi and qi. With a randomly selected control point o, points pi are determined by the camera pose {R, µ}, and points qi are determined by depths λi. We have
$$p_i = x_o + \mu R S_i \tag{10}$$
$$q_i = \lambda_i x_i \tag{11}$$
As shown in Fig. 1, points pi are attached to the virtual object obtained by rotating and scaling the real object around the control point po = qo = xo. qi is the projection of pi on the corresponding line of sight. The objective function (9) is equivalent to
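$$f^*(R, \mu, \lambda_i) = \sum_{i=1,\, i \neq o}^{N} \left\| \frac{1}{\lambda_i}\left( q_i - p_i \right) \right\|^2 \tag{12}$$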
Fig. 1.
Demonstration of geometrical relationships with a bunny model. The mouth point is the control point o. In the algorithm, all virtual points rotate and scale around the control point po = qo = xo. We use the tail point to exemplify pi and its projection qi. Plane A is parallel to the imaging plane and passes through the camera optical center. Without loss of generality and for clearer demonstration, in this figure we use focal length f = 1 and all depths are distances between points and plane A.
As shown in Fig. 1, when the objective function approaches the global optimal solution, point pi gets close to qi and the z-component of pi gets close to fλi. Hence, it is expected that the objective function (12) has optimal solutions similar to those of the conventional reprojection error cost, because
| (13) |
3. Core Algorithm Design
We first introduce the core algorithm of R1PPnP, which solves the PnP problem in outlier-free situations. This section introduces the core algorithm process, proof of convergence, the local minima avoidance mechanism and the strategy to select the control point.
The core algorithm of R1PPnP is an iterative optimization process with the objective function (9) or (12). In each iteration, it alternately estimates the point sets qi and pi (i = 1, …, N, i ≠ o), fixing one point set and updating the other so as to minimize the objective function.
(1). qi estimation stage.
Because the points qi are independent of one another, our algorithm seeks the closest qi for each pi. According to (11), points qi are constrained to the related lines of sight. Hence, we orthogonally project pi onto the related lines of sight to obtain the points’ relative depths with respect to the control point o by
$$\lambda_i = \frac{x_i^T p_i}{x_i^T x_i} \tag{14}$$
Then, points qi are updated according to Eq.(11).
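The qi estimation stage amounts to an orthogonal projection of each pi onto a ray through the camera center. A minimal sketch, assuming the line of sight of point i is the set of scalar multiples of xi; the function name and sample values are hypothetical:

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def project_to_line_of_sight(p, x):
    """Closest point to p on the line {lam * x}; returns (lam, q)."""
    x_sq = dot(x, x)
    if x_sq == 0.0:
        raise ValueError("line-of-sight direction must be non-zero")
    lam = dot(x, p) / x_sq
    return lam, [lam * c for c in x]


x_i = [100.0, -50.0, 1000.0]    # sample image point [u, v, f]
p_i = [0.9, -0.2, 6.5]          # sample virtual point
lam_i, q_i = project_to_line_of_sight(p_i, x_i)

# The residual p_i - q_i is perpendicular to the line of sight.
residual = [a - b for a, b in zip(p_i, q_i)]
print(lam_i, dot(residual, x_i))
```

Because each projection uses only its own pi and xi, the stage is independent per point, which is why an outlier cannot disturb the depths of the inliers here.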
(2). pi estimation stage.
Points pi are determined by R and μ. According to (10), the updated R and µ should make points {µRSi} have the smallest weighted sum of squared distances to points {qi − xo}, subject to R^T R = I3×3. According to the objective function (12), the weights used in this stage are the values of 1/λi from the previous iteration.
Denoting matrices and , then according to Ref. [39]
| (15) |
Because points pi are directly generated from Si according to Eq. (10), Eq. (15) suggests that R is updated according to the differences between points pi and qi in 3D space. However, Fig. 2 demonstrates that with this method the updating rate of µ may be slow when the range of depths is large. To achieve a faster convergence rate, we update the scale factor µ by comparing the projected image coordinates of pi, which are denoted as vi, with the real image points xi. Denoting matrices B = [v1 − xo, …, vN − xo]3×N and C = [x1 − xo, …, xN − xo]3×N, µ is updated by
| (16) |
| (17) |
Fig. 2.

Demonstration of the updating method of the scale factor µ. One possible method is to update µ according to the Euclidean distances between pi, qi and o, which works for p1 and q1 because their depths are close to that of o. However, this method may result in a slow µ updating rate for p2 and q2 because . Hence, it is more efficient to compare vi and xi to move points pi to the related lines of sight.
Finally, points pi are updated according to Eq.(10).
3.1. Proof of Convergence
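We first provide the mathematical proof of the convergence of R1PPnP when not using 1/λi as weights in the objective function (12). Let k denote the iteration index. $q_i^{(k+1)}$ is obtained by orthogonally projecting $p_i^{(k)}$ onto the line of sight i, and $q_i^{(k)}$ and $q_i^{(k+1)}$ are both on the line of sight i. Hence, the three points $p_i^{(k)}$, $q_i^{(k)}$, and $q_i^{(k+1)}$ form a right-angled triangle with the right angle at $q_i^{(k+1)}$. Therefore, for each index i, i ≠ o,
$$\left\| p_i^{(k)} - q_i^{(k+1)} \right\| \le \left\| p_i^{(k)} - q_i^{(k)} \right\| \tag{18}$$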
In the p(k+1) updating stage, the updated R and µ make the objective function (12) smaller. Hence,
| (19) |
According to (18), (19), and the objective function (12),
| (20) |
Hence, the objective function will strictly decrease until when not using 1/λi as weights. However, when 1/λi is applied in the objective function, the above convergence proof is not mathematically rigorous because . As the iteration proceeds, the changes of λi become small, which makes (20) hold. In addition, the experimental results in this paper also support the assumption that our algorithm is convergent.
3.2. Local Minima Avoidance
We have concluded that the iterative process of R1PPnP is convergent. However, we still need to address situations in which R1PPnP may get stuck in local minima. To demonstrate the iterative process more intuitively, we introduce a 1D camera working in 2D space, as shown in Fig. 3. In this demonstration, an object with four points Pi, i = 1, …, 4, is projected onto the camera image plane and the image points xi are obtained. P1 is selected as the control point, which means o = 1. Different initial values may result in different convergence results.
Fig. 3.
Iterative process with 2D space and 1D camera imaging plane. P1, P2, P3 and P4 are the object points; P1 is selected as the control point (o = 1). (a) The process makes the estimated pose approach the correct solution; (b) the rotation related to p(k+1) is worse than that related to p(k), which means the process is approaching a local minimum, which is a mirror-image form of the true object shape.
Fig. 3(a) demonstrates a process that is approaching the correct global optimal result. Beginning with points p(k), the algorithm projects p(k) onto their related lines of sight and obtains points q(k+1). Then, according to q(k+1), the algorithm updates the rotation R and scale factor µ to generate points p(k+1). In this process, the rotation and scale factor related to p(k+1) are closer to the truth than those related to p(k), and finally the algorithm will reach the correct solution.
However, in the process shown in Fig. 3(b), p(k+1) has a larger pose error than p(k), and the algorithm will finally get stuck in a local minimum. The reconstructed points p(k) or p(k+1) come in mirror-image forms of the real object points P. Without loss of generality, in either 2D or 3D space, a mirror-image form means that the handedness of the point cloud has been changed, which should not happen in reality. The reason the core algorithm of R1PPnP may generate points with a different handedness is that its rotation estimation equation (15) does not enforce det(R) = 1.
In practice we found it inappropriate to constrain det(R) = 1 from the beginning of the algorithm. Instead, we allow the iteration process to approach the mirror-image form. This is because we found that, with the constraint det(R) = 1 from the beginning, the algorithm has many types of local minima and they are unpredictable. However, without this constraint, the convergence direction of the core algorithm becomes predictable, with only two types of convergence. The algorithm may reach the global optimal result directly or the approximate mirror-image form. For the latter case, the estimated det(R) = −1.
Hence, according to the above analysis, we propose the local minima avoidance mechanism. The algorithm begins with a random initial value and control point. When the algorithm converges to a result with det(R) = −1, we perform a mirror flip by
| (21) |
3.3. Control Point o Selection
Selecting different points as the control point o may result in different convergence rates. Without taking noise into account, the correct value of rotation R should be the same for any control point o in a PnP task. Hence, larger rotation updating steps in the iteration process suggest that fewer iterations are required to converge to the correct value when starting from the same initial value. In R1PPnP, R is updated according to the differences between points pi and qi, i = 1, …, N, i ≠ o. When point o is close to pi, the rotation updating steps are more likely to be large, as shown in Fig. 4(a). The updating rates of µ also follow this analysis. Therefore, we prefer to select the control point o from the center of the point cloud, which has better odds of having smaller distances to the rest of the point cloud, to achieve a faster convergence rate, as shown in Fig. 4(b).
Fig. 4.
In R1PPnP, the selection of control point o is related to the convergence rate. (a) An example to illustrate this behavior with 2D space and 1D camera imaging plane. According to the R and µ updating methods in Eqs. (15) and (16), ∠ (p − o1, q − o1) > ∠ (p − o2, q − o2) and suggest that the iteration process is more likely to have larger R and µ updating rates when o is closer to p. (b) A real-world example to illustrate this behavior; the radius of a circle represents the number of iterations required when using this feature point as the control point o.
4. Outliers Handling Mechanism
The ability to handle outliers robustly and quickly is the main contribution of the proposed R1PPnP algorithm. Our outliers handling mechanism combines a soft weight assignment method and the 1-point RANSAC scheme.
4.1. Soft Re-weighting
R1PPnP mainly consists of the qi and pi estimation stages. As described in Section 3, in the qi stage, calculations related to each point are independent of the others. Hence, outliers do not affect inliers in the qi stage. However, in the pi stage, outliers perturb the camera pose estimation results. To reduce the impact of outliers, the basic idea of our soft re-weighting method is to assign each 3D-2D correspondence a weight factor, and to make the weight factors related to outliers small when estimating the camera pose in the pi stage.
One possible method to assign weights is based on least median of squares [40]; however, this method cannot handle more than 50% of outliers. We designed a soft weight assignment method embedded in the iteration process. To distinguish inliers and outliers, the weights of 3D-2D correspondences are determined by
$$w_i = \begin{cases} 1, & e_i \le H \\ H / e_i, & e_i > H \end{cases} \tag{22}$$
where ei denotes the reprojection error of point i with the current R and µ during iteration, and H is the inlier threshold: points with final reprojection errors smaller than H are considered inliers. The re-weighting rule (22) means that a point with a large reprojection error will have a small weight during the estimation of the camera pose, which is designed under the reasonable assumption that outliers have much larger reprojection errors than inliers. Although inliers may also have reprojection errors larger than H during the iteration process, it is acceptable to assign weights smaller than 1 to inliers as long as outliers have much smaller weights. Hence, we simply use H as the benchmark to assign weights.
According to the R estimation given by Eq. (15), we multiply each column of matrices A and S by the corresponding weight factor,
| (23) |
Similarly, to update µ using (16),
| (24) |
Since inliers have much larger weights, R and µ are mainly estimated with inliers.
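To make the effect of soft re-weighting concrete, the sketch below computes weights from reprojection errors. It assumes one plausible form of the rule, namely a weight of 1 for errors up to H and H/e for larger errors; this form is an assumption consistent with the description (weights never exceed 1 and shrink as the error grows), and the function name and sample errors are hypothetical.

```python
def soft_weight(error, threshold):
    """Assumed soft weight: 1 up to the threshold, threshold/error beyond it."""
    if error < 0 or threshold <= 0:
        raise ValueError("error must be >= 0 and threshold must be > 0")
    return 1.0 if error <= threshold else threshold / error


H = 10.0                                  # inlier threshold in pixels
errors = [2.0, 10.0, 40.0, 500.0]         # sample reprojection errors
weights = [soft_weight(e, H) for e in errors]
print(weights)
```

Under this assumed rule, a correspondence with a 500-pixel error contributes fifty times less to the pose update than an inlier, without ever being discarded outright, so a point that is misjudged early in the iteration can still regain weight later.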
4.2. 1-Point RANSAC Scheme
The core algorithm of R1PPnP is based on a randomly selected control point o. In outlier-free situations, our algorithm works with any control point. However, in situations with outliers, the control point o must be an inlier for the algorithm to work. Hence, we employ the 1-point RANSAC scheme to try different 3D-2D correspondences as the control point until the algorithm finds the correct solution. The 1-point RANSAC scheme combines naturally with the core algorithm because the core algorithm can perform the computation with any control point o ∈ [1, N]. We assume that all 3D-2D correspondences are equally likely to be inliers. Hence, without loss of generality, we select the control point o starting from the center of all image points and moving outward. This is because we found that R1PPnP needs fewer iterations to converge when o is closer to the center, as discussed in Section 3.3.
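The center-outward ordering of candidate control points can be sketched as a sort by distance to the center of the image points. The sketch assumes that "center" means the centroid of the image points, which the text does not state explicitly; the function name and sample points are hypothetical.

```python
def control_point_order(image_points):
    """Indices of image points sorted by distance to their centroid."""
    n = len(image_points)
    if n == 0:
        return []
    cu = sum(p[0] for p in image_points) / n
    cv = sum(p[1] for p in image_points) / n
    return sorted(
        range(n),
        key=lambda i: (image_points[i][0] - cu) ** 2
        + (image_points[i][1] - cv) ** 2,
    )


points = [(0.0, 0.0), (320.0, 240.0), (300.0, 250.0),
          (640.0, 480.0), (10.0, 470.0)]
order = control_point_order(points)
print(order)   # candidates are tried in this order by the 1-point RANSAC loop
```

Each RANSAC trial would then take the next index from this list as the control point o and run the re-weighted core algorithm.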
4.3. Algorithm Flow Chart
The overall flow chart of R1PPnP is shown in Fig. 5. We first detect as many inliers as possible inside the RANSAC framework; then, based on the detected inliers, we run the R1PPnP algorithm without the re-weighting mechanism to obtain more accurate results.
Fig. 5.

The overall flow chart of R1PPnP.
Appropriate termination conditions seek balance between speed and precision for RANSAC-based or iterative algorithms. As shown in Fig. 5, two types of termination conditions need to be specified for R1PPnP.
(1). RANSAC Termination Condition
The standard RANSAC termination condition [13] was employed for R1PPnP, that is
$$\text{trials} \ge \frac{\log(1 - p)}{\log\left(1 - p_{\text{inliers}}^{\,s}\right)} \tag{25}$$
where p is the certainty and we use p = 0.99 for all RANSAC-based methods in this paper, trials is the number of RANSAC trials, pinliers = (maximum number of detected inliers) / (number of all points), s is the number of control points needed in each RANSAC trial. s = 1 for R1PPnP, and s = 3, 4 for RANSAC+P3P and P4P respectively.
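The practical effect of the sample size s can be seen by evaluating the standard RANSAC trial bound, log(1 − p) / log(1 − pinliers^s), for s = 1, 3 and 4. The sketch below is an illustration only; the function name is hypothetical and the inlier ratio is a sample value.

```python
import math


def required_trials(certainty, inlier_ratio, sample_size):
    """Smallest integer number of trials satisfying the standard RANSAC bound."""
    if not 0.0 < certainty < 1.0:
        raise ValueError("certainty must be in (0, 1)")
    if not 0.0 < inlier_ratio <= 1.0:
        raise ValueError("inlier_ratio must be in (0, 1]")
    all_inliers = inlier_ratio ** sample_size   # chance that one sample is clean
    if all_inliers >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - certainty) / math.log(1.0 - all_inliers))


# 50% outliers, certainty p = 0.99, for 1-point, P3P and P4P sampling.
trials_at_half = [required_trials(0.99, 0.5, s) for s in (1, 3, 4)]
print(trials_at_half)
```

With half of the correspondences being outliers, the bound evaluates to 7, 35 and 72 trials for s = 1, 3 and 4, and the gap widens quickly as the inlier ratio drops because the ratio enters the bound raised to the power s.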
During the RANSAC process, the camera pose estimated by conventional RANSAC+P3P or P4P methods is based on a very small number of points. Because of image noise, the estimated pose varies with the inliers chosen as the control points. This is especially serious when the image noise is large. To improve accuracy, the termination condition (25) implies that the standard RANSAC scheme may continue looking for better results after finding a large percentage of inliers. In contrast, R1PPnP takes all points into account when estimating the pose, which makes it insensitive to the selected control point o. Therefore, it is reasonable to assume that when pinliers is large enough (we used a threshold of 60%), no improvement can be found and the RANSAC process of R1PPnP can be terminated. The accuracy evaluation results in this paper support this assumption.
(2). Termination Conditions for R1PPnP Iterations
As shown in Fig. 5, we first detect as many inliers as possible and the related termination condition A is satisfied when the detected number of inliers becomes stable, that is,
| (26) |
where k is the index of iterations, Ninlier is the number of detected inliers. According to our experience, in most cases no more inliers would be detected if Ninlier has not increased in 20 iterations.
A good termination condition A should stop the iteration process as early as possible when point o is an outlier, and should not interrupt it when point o is an inlier. We proposed the termination condition A with a window size of 20 iterations to seek a balance between speed and robustness. This termination condition is not based on the comparison of parameters of adjacent iterations because in R1PPnP, the dynamically updated weights wi may make the convergence process complex, especially when point o is an outlier. As shown in Fig. 6(a), with an outlier as the control point, may take many iterations to converge to zero, which is slow. By monitoring the change in the detected number of inliers over a larger window, the termination decision can be more robust and efficient, as shown in Fig. 6(b).
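A windowed stopping test of this kind can be sketched as follows. This is one possible reading of termination condition A, not the authors' exact rule: it assumes the process stops at iteration k when the inlier count at k is no larger than the count 20 iterations earlier. The function names and the sample inlier counts are hypothetical.

```python
def inlier_count_stalled(history, window=20):
    """True if the latest inlier count does not exceed the one `window` steps back."""
    if len(history) <= window:
        return False
    return history[-1] <= history[-1 - window]


def first_stop_iteration(counts, window=20):
    """Iteration index at which the windowed test first fires, or None."""
    history = []
    for k, n_inliers in enumerate(counts):
        history.append(n_inliers)
        if inlier_count_stalled(history, window):
            return k
    return None


# Inlier control point: the count grows for 30 iterations, then saturates.
growing = [min(k, 30) for k in range(60)]
# Outlier control point: the count never grows.
flat = [3] * 40
print(first_stop_iteration(growing), first_stop_iteration(flat))
```

In this sample, the run with a growing inlier count continues until 20 iterations after saturation, while the flat run is abandoned as soon as the window is full, which matches the stated goal of stopping early for outlier control points without interrupting inlier ones.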
Fig. 6.

Experiments with synthetic data (ordinary 3D case, 50% of outliers) to demonstrate the iteration process of R1PPnP when the control point o is an outlier. Randomly colored lines are results with different control points. (a) The changes of estimated camera pose between frames k and k – 1 are complex during the iteration process, based on which it is difficult to decide when to stop the process. (b) It is more robust and efficient to stop the process when no more inliers can be detected.
The refinement stage makes the pose estimation results more accurate based on the detected inliers. Without the re-weighting mechanism, the convergence process is much simpler. Hence, the termination condition B is satisfied when the estimated rotation becomes stable, that is,
| (27) |
5. Experiments
The performance of the proposed R1PPnP algorithm was evaluated by comparing it against state-of-the-art PnP methods. The algorithms were implemented as MATLAB scripts and executed on a computer with an Intel Core i7 2.60 GHz CPU. We used both synthetic and real-world data to conduct evaluation experiments. The initial values for R1PPnP are R = diag{1, 1, 1} and µ = 1e−4. RANSAC+P3P or P4P methods also used the standard termination condition (25).
5.1. Synthetic Experiments
Synthetic experiments in this paper shared the following parameters. The camera focal length is 1,000 pixels with a resolution of 640 × 480. Two types of synthetic data were generated. (1) Ordinary three-dimensional (3D) case: object points were randomly and uniformly distributed in a cube region [−2, 2] × [−2, 2] × [4, 8]. (2) Quasi-singular case: The distribution cube is [1, 2] × [1, 2] × [4, 8]. For each experiment result, we report the mean values of 100 trials.
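For accuracy evaluation, the rotation error is measured in degrees between the true rotation Rtrue and the estimated R as $e_{rot}(\text{deg}) = \max_{k \in \{1,2,3\}} \arccos\left(r_{k,true}^T r_k\right) \times 180 / \pi$, where rk,true and rk are the kth columns of Rtrue and R respectively. The translation error is $e_{trans}(\%) = \|t_{true} - t\| / \|t\| \times 100$.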
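The two error measures can be sketched in a few lines. The definitions used here are assumptions based on the metrics commonly used in PnP evaluations: the rotation error is the largest angle, in degrees, between corresponding columns of the true and estimated rotation matrices, and the translation error is the norm of the difference relative to the norm of the estimate, in percent. Function names and sample values are hypothetical.

```python
import math


def rotation_error_deg(R_true, R_est):
    """Largest angle (degrees) between corresponding columns of two rotations."""
    worst = 0.0
    for k in range(3):
        cosine = sum(R_true[r][k] * R_est[r][k] for r in range(3))
        cosine = max(-1.0, min(1.0, cosine))   # guard against rounding
        worst = max(worst, math.degrees(math.acos(cosine)))
    return worst


def translation_error_percent(t_true, t_est):
    """Relative translation error in percent (assumed: normalized by the estimate)."""
    norm_est = math.sqrt(sum(c * c for c in t_est))
    if norm_est == 0.0:
        raise ValueError("estimated translation must be non-zero")
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(t_true, t_est)))
    return 100.0 * diff / norm_est


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
a = math.radians(5.0)
rot_z_5deg = [[math.cos(a), -math.sin(a), 0.0],
              [math.sin(a), math.cos(a), 0.0],
              [0.0, 0.0, 1.0]]
rot_err = rotation_error_deg(identity, rot_z_5deg)
trans_err = translation_error_percent([0.0, 0.0, 5.0], [0.0, 0.0, 4.0])
print(rot_err, trans_err)
```

Taking the maximum over columns reports the worst-aligned axis, so a rotation about the z-axis by 5 degrees yields a 5-degree error even though the z-column itself is unchanged.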
5.1.1. Outlier-Free Synthetic Situations
Most PnP algorithms do not have the ability to handle outliers, and even a small percentage of outliers will significantly reduce the accuracy, as shown in Fig.7. Thus, although outlier-free situations are not the main concern of R1PPnP, we first conducted comparison experiments between the proposed R1PPnP and other PnP algorithms in outlier-free situations. The reason for this comparison is that RANSAC+P3P or P4P methods usually need other PnP methods as the final refinement step. Hence the accuracy and speed in outlier-free situations are also related to the performance in situations with outliers.
Fig. 7.

Except for REPPnP and R1PPnP, most PnP methods cannot handle outliers.
Here we only performed the core algorithm of R1PPnP without outliers handling mechanism. The termination condition for R1PPnP iterations was as Eq.(27). In this experiment, we compared our proposed R1PPnP with the following PnP methods: LHM [6], EPnP [4], RPnP [26], DLS [41], OPnP [5], ASPnP [42], SDP [25], PPnP [43], EPPnP [3] and REPPnP [3].
In our accuracy evaluation experiments, the number of points was 100 and we added different levels of Gaussian image noise from 0 to 10 pixels. As shown in Figs. 8 and 9, for both ordinary 3D and quasi-singular cases, R1PPnP gave the most accurate rotation estimation results together with OPnP and SDP. For ordinary 3D cases, R1PPnP was among the most accurate methods for estimating translation and was only slightly less accurate than OPnP. However, for quasi-singular cases, the translation accuracy of R1PPnP was not state-of-the-art. ASPnP became unstable with large image noise; hence its mean accuracy decreased significantly compared with that under small image noise. Although PPnP can sometimes provide accurate rotation estimation results in ordinary 3D cases, it also suffered from instability in some random cases, as shown by the jitter in Fig. 8. PPnP and LHM cannot handle the quasi-singular cases.
Fig. 8.

Accuracy with outlier-free synthetic data (ordinary 3D cases). Number of points was 100. Different levels of image noises were added.
Fig. 9.

Accuracy with outlier-free synthetic data (quasi-singular cases). Number of points was 100. Different levels of image noises were added. PPnP is out of range. The accuracy of all PnP methods decreased significantly compared with those in ordinary 3D cases, as shown in Fig.8.
To evaluate runtime, Gaussian image noise with a standard deviation of σ = 5 pixels was added and the number of points increased from 100 to 1000. As shown in Fig. 10, the proposed R1PPnP, together with EPPnP, REPPnP and ASPnP showed superior computational speed. The runtime of R1PPnP did not grow significantly with respect to the number of points. We suspect that this results from the intrinsic parallel optimization of the matrix computation of MATLAB 2014a.
Fig. 10.
Runtime results with outlier-free synthetic data. Standard deviation of image noise σ = 5 pixels. The number of points increased from 100 to 1000. (Left) Ordinary 3D cases. (Right) Quasi-singular cases.
Generally speaking, in outlier-free situations, R1PPnP was among the state-of-the-art methods in terms of both accuracy and computational speed. One drawback of R1PPnP is that the accuracy of translation estimation in quasi-singular cases was not among the best.
5.1.2. Synthetic Situations with Outliers
The main advantage of R1PPnP is that it can handle a large percentage of outliers much faster than conventional methods. To demonstrate this, we compared it with the following RANSAC-based PnP methods: (RANSAC + P3P [14]); (RANSAC + RP4P + RPnP [26]); (RANSAC + P3P [14] + ASPnP [42]); and (RANSAC + P3P [14] + OPnP [5]). According to the evaluations in outlier-free situations, OPnP is the most accurate PnP method, and ASPnP and RPnP are fast. We selected these methods as the final refinement step to fully demonstrate the performance of RANSAC+refinement-like methods. Another important method is REPPnP [3], which is the state-of-the-art PnP algorithm that addresses outliers.
The experiments were conducted as follows. Ninlier = 100 correct matches (inliers) between 3D object points and 2D image points were generated. Noutlier mismatches (outliers) were generated by randomly pairing 3D and 2D points. The true percentage of outliers is poutliers = Noutlier / (Ninlier + Noutlier). Gaussian image noise with a standard deviation of σ = 5 pixels was added. For R1PPnP and the RANSAC-based methods, the reprojection error threshold used to distinguish inliers from outliers was H = 10 pixels.
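The data generation described above can be sketched as follows. This is only a minimal illustration, not the authors' generator: the function names, the focal length, the image size, the 3D point ranges and the identity camera pose are all assumptions made for brevity, and the outlier fraction uses the usual definition Noutlier / (Ninlier + Noutlier).

```python
import random


def make_correspondences(n_inlier=100, n_outlier=50, sigma=5.0,
                         focal=800.0, width=640, height=480, seed=0):
    """Return (points_3d, points_2d, is_inlier) for one synthetic trial.

    Assumption: 3D points are expressed directly in the camera frame, so
    the true pose is the identity. Focal length, image size and point
    ranges are illustrative values, not taken from the paper.
    """
    rng = random.Random(seed)
    points_3d, points_2d, is_inlier = [], [], []
    for i in range(n_inlier + n_outlier):
        # Random 3D point in front of the camera.
        x, y, z = rng.uniform(-2, 2), rng.uniform(-2, 2), rng.uniform(4, 8)
        points_3d.append((x, y, z))
        if i < n_inlier:
            # Inlier: pinhole projection plus Gaussian image noise.
            u = focal * x / z + width / 2 + rng.gauss(0.0, sigma)
            v = focal * y / z + height / 2 + rng.gauss(0.0, sigma)
        else:
            # Outlier: a 2D point unrelated to the 3D point.
            u, v = rng.uniform(0, width), rng.uniform(0, height)
        points_2d.append((u, v))
        is_inlier.append(i < n_inlier)
    return points_3d, points_2d, is_inlier


def outlier_percentage(n_inlier, n_outlier):
    """Fraction of correspondences that are outliers."""
    return n_outlier / (n_inlier + n_outlier)
```

With Ninlier fixed at 100, sweeping Noutlier changes the outlier fraction; for example, 100 outliers correspond to a fraction of 0.5.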
As shown in Fig. 11, REPPnP began to fail when the percentage of outliers was larger than 50% in ordinary 3D cases, and larger than only 5% in quasi-singular cases. R1PPnP and the RANSAC-based methods were capable of handling situations with a large percentage of outliers. R1PPnP was more accurate than the RANSAC-based methods for both rotation and translation estimation. Compared with the other RANSAC-based methods, R1PPnP was much faster, especially when the percentage of outliers was large.
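One way to see why sampling a single point helps at high outlier ratios is the textbook RANSAC trial-count estimate, k = log(1 - c) / log(1 - w^s), where w is the inlier ratio, s the sample size and c the desired confidence. The sketch below uses that standard formula only; it is not the termination condition used in the experiments, and the function name and the default confidence of 0.99 are assumptions. It also assumes that a trial succeeds when every sampled point is an inlier.

```python
import math


def ransac_trials(outlier_ratio, sample_size, confidence=0.99):
    """Textbook number of RANSAC trials needed to draw, with the given
    confidence, at least one sample that contains only inliers."""
    inlier_ratio = 1.0 - outlier_ratio
    p_good = inlier_ratio ** sample_size  # chance that one sample is all inliers
    if p_good >= 1.0:
        return 1
    if p_good <= 0.0:
        raise ValueError("no all-inlier sample can be drawn")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_good))


if __name__ == "__main__":
    for ratio in (0.3, 0.5, 0.7):
        counts = {s: ransac_trials(ratio, s) for s in (1, 3, 4)}
        print(f"outliers={ratio:.0%}: {counts}")
```

Under these assumptions, 50% outliers require 7, 35 and 72 trials for sample sizes 1, 3 and 4 respectively, and the gap widens as the outlier ratio grows.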
Fig. 11.

Average accuracy on synthetic data with outliers. (a)-(b) Accuracy with ordinary 3D cases; (c)-(d) Accuracy with quasi-singular cases.
5.2. Real-world Image Data
Our real-world experiments were conducted on the DTU robot image data [44], which provides images and the related 3D point clouds obtained by structured light scanning. The true values of the rotations and translations are known. The images have a resolution of 800 × 600. Datasets numbered 1 to 30 were used. In each dataset, images were captured under 19 different illumination situations and from 119 camera positions. We selected 10 of the 19 illumination situations. Hence, a total of 30 × 10 × 119 = 35700 images were included in this evaluation. Following the dataset instructions, for each dataset and illumination situation, we used the image numbered 25 as the reference image and performed SURF matching [7] between the reference image and the other images. The inlier threshold was H = 5 pixels for all methods. For each image, we ran all algorithms 5 times and used the average value for the subsequent statistics.
As shown in Fig. 14, the total number of correspondences and the percentage of outliers varied with the object, the illumination situation and the camera pose. Although a clear comparison requires that only one factor differ at a time, this kind of variable control is difficult for PnP evaluation on real-world data because SURF matching results are unpredictable. In our experiments, we found that the performance of PnP algorithms was mainly affected by the percentage of outliers, rather than by the total number of correspondences. Therefore, in this section we report the evaluation results by comparing the statistics of the PnP methods within each range of outlier percentage. Because the true number of inliers was unknown, for each image we took the maximum number of inliers detected by any algorithm as the ground truth.
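The grouping described above can be sketched as follows. The record layout (`n_corr`, `inliers`, `rot_err`), the function names and the choice of ten equal-width ranges are hypothetical, introduced only for illustration; the bin edges used in the actual evaluation are not specified here.

```python
from collections import defaultdict
from statistics import mean


def outlier_bin(n_corr, inliers_per_method, n_bins=10):
    """Index of the outlier-percentage range that one image falls into.

    The largest inlier count found by any method is taken as the ground
    truth. Integer arithmetic avoids floating-point edge effects at the
    bin boundaries.
    """
    max_inliers = max(inliers_per_method.values())
    n_outliers = n_corr - max_inliers
    return min(n_outliers * n_bins // n_corr, n_bins - 1)


def aggregate(records, n_bins=10):
    """Mean rotation error per method within each outlier range.

    Each record is a dict with hypothetical keys:
      'n_corr'  : total number of correspondences in the image
      'inliers' : {method name: number of detected inliers}
      'rot_err' : {method name: rotation error}
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for rec in records:
        b = outlier_bin(rec["n_corr"], rec["inliers"], n_bins)
        for method, err in rec["rot_err"].items():
            grouped[b][method].append(err)
    return {b: {m: mean(errs) for m, errs in per_method.items()}
            for b, per_method in grouped.items()}
```

With ten ranges, bin 2 covers images whose estimated outlier percentage lies in [20%, 30%). The same grouping can be applied to translation error or runtime by swapping the aggregated field.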
Fig. 14.
Examples of images and R1PPnP reprojection results. Green circles are all SURF correspondences and blue stars are the reprojected inliers detected by R1PPnP. First row: images with different illumination situations and the 3D point cloud. Second and third rows: different datasets.
As shown in Fig. 13(c), as the percentage of outliers increased, the runtime of R1PPnP did not grow significantly compared with conventional RANSAC+P3P or P4P methods. When poutliers < 30%, R1PPnP was slower than pure RANSAC+P3P, but was much more accurate, as shown in Fig. 13(a) and (b). To improve accuracy, RANSAC+P3P needs another PnP method, such as OPnP or ASPnP, as the final refinement step. Compared with these refinement methods, R1PPnP was slightly less accurate than OPnP, which was the most accurate PnP method according to both the synthetic and the real-world experiments, but it was much faster even when the percentage of outliers was small.
Fig. 13.
Statistical results with real-world data. The x-axis shows ranges of the percentage of outliers. (a) Rotation error. (b) Translation error. (c) Runtime. (d) Number of detected inliers relative to the maximum over all methods; zero means that the method found the most inliers.
Fig. 15 shows the histogram of the number of R1PPnP iterations on all 35700 images. As shown in Fig. 5, the iteration number includes iterations with the re-weighting mechanism that obtained the best results in RANSAC trials, and the subsequent refinement iterations without re-weighting. The average number of required iterations is 51.3.
Fig. 15.

Histogram of the number of iterations of R1PPnP in real-world experiments.
6. Conclusions
We present a fast and robust PnP solution named R1PPnP for handling outliers. We integrate a soft re-weighting method into an iterative PnP process to distinguish inliers from outliers, and employ the 1-point RANSAC scheme to select the control point. The number of trials is greatly reduced compared with conventional RANSAC+P3P or P4P methods; hence, R1PPnP is much faster. Synthetic and real-world experiments demonstrated its feasibility. Besides its good performance, another advantage of R1PPnP is that it is relatively easy to implement, because all of its steps involve only simple calculations. For example, its minima avoidance mechanism only requires computing the determinant of the rotation matrix and setting λnew = 1/λold. R1PPnP is most appropriate as a replacement for conventional RANSAC+P3P methods when the percentage of outliers and/or the image white noise is large. R1PPnP is also more appropriate for large point clouds because of its low time complexity and the requirement to try control points. Future work includes developing an extension for planar cases and applying R1PPnP in a SLAM system to handle outliers when a new frame is encountered.
Fig. 12.

Average runtime and number of required RANSAC samples with ordinary 3D synthetic data. We do not give the results for quasi-singular cases because they are very close to those for ordinary 3D cases. RANSAC+P3P or P4P needs more than 10 RANSAC trials when poutliers = 0 because the large image noise (σ = 5 pixels) usually prevents P3P or P4P methods from finding, with 3 or 4 inliers, a pose that satisfies the termination condition Eq. (25). In contrast, the required number of RANSAC trials of R1PPnP is not sensitive to image noise because all points are taken into account.
Acknowledgment
This work is supported by NIH P41EB015898.
Biography

Haoyin Zhou is a research fellow at the Surgical Planning Laboratory, Brigham and Women’s Hospital, Harvard Medical School. He received his B.S. and Ph.D. degrees from Tsinghua University, Beijing, China, in 2009 and 2014, respectively. His research interests include PnP, structure-from-motion, 3D non-rigid dense reconstruction, learning-based segmentation and their applications in the surgical navigation field.

Tao Zhang (M’00-SM’11) was born in March 1969. He received his B.S., M.S. and Ph.D. degrees from Tsinghua University, Beijing, China, in 1993, 1995 and 1999, respectively. He received his second Ph.D. degree from Saga University, Saga, Japan, in 2002. He is currently a Professor and Deputy Head of the Department of Automation, School of Information Science and Technology, Tsinghua University, Beijing, China. He is the author or coauthor of more than 200 papers and three books. His current research includes robotics, image processing, control theory, artificial intelligence, and navigation and control of spacecraft.

Jayender Jagadeesan received his B.S. degree from the Indian Institute of Technology, Kanpur, India, in 2003 and his Ph.D. degree from The University of Western Ontario, London, ON, Canada, in 2007. He is now an assistant professor at Harvard Medical School and a research associate at Brigham and Women’s Hospital, Boston, MA. His research interests include surgical navigation, medical image registration and segmentation, medical robotics and augmented reality.
Footnotes
Contributor Information
Haoyin Zhou, Email: zhouhaoyin@bwh.harvard.edu, Surgical Planning Laboratory, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, 02115, USA.
Tao Zhang, Email: taozhang@tsinghua.edu.cn, Department of Automation, School of Information Science and Technology, Tsinghua University, Beijing, China, 100086.
Jayender Jagadeesan, Email: jayender@bwh.harvard.edu, Surgical Planning Laboratory, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, 02115, USA.
References
- [1] Mur-Artal R, Montiel JMM, and Tardós JD, “ORB-SLAM: A versatile and accurate monocular slam system,” IEEE Transactions on Robotics, vol. 31, no. 5, pp. 1147–1163, 2015.
- [2] Müller M, Rassweiler M-C, Klein J et al., “Mobile augmented reality for computer-assisted percutaneous nephrolithotomy,” vol. 8, no. 4, pp. 663–675, 2013.
- [3] Ferraz L, Binefa X, and Moreno-Noguer F, “Very fast solution to the PnP problem with algebraic outlier rejection,” in CVPR, 2014, pp. 501–508.
- [4] Lepetit V, Moreno-Noguer F, and Fua P, “EPnP: An accurate O(n) solution to the pnp problem,” IJCV, vol. 81, no. 2, pp. 155–166, 2009.
- [5] Zheng Y, Kuang Y et al., “Revisiting the PnP problem: A fast, general and optimal solution,” in ICCV, 2013, pp. 2344–2351.
- [6] Lu C-P, Hager GD, and Mjolsness E, “Fast and globally convergent pose estimation from video images,” IEEE TPAMI, vol. 22, no. 6, pp. 610–622, 2000.
- [7] Bay H, Tuytelaars T, and Van Gool L, “SURF: Speeded up robust features,” ECCV, pp. 404–417, 2006.
- [8] Leutenegger S, Chli M, and Siegwart RY, “BRISK: Binary robust invariant scalable keypoints,” in ICCV. IEEE, 2011, pp. 2548–2555.
- [9] Rublee E, Rabaud V, Konolige K, and Bradski G, “ORB: An efficient alternative to SIFT or SURF,” in ICCV. IEEE, 2011, pp. 2564–2571.
- [10] Nistér D, “A minimal solution to the generalised 3-point pose problem,” in CVPR, vol. 1. IEEE, 2004, pp. 67–79.
- [11] Gao X-S, Hou X-R, Tang J, and Cheng H-F, “Complete solution classification for the perspective-three-point problem,” IEEE TPAMI, vol. 25, no. 8, pp. 930–943, 2003.
- [12] Josephson K and Byrod M, “Pose estimation with radial distortion and unknown focal length,” in CVPR. IEEE, 2009, pp. 2419–2426.
- [13] Fischler MA and Bolles RC, “Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography,” Communications of the ACM, vol. 24, no. 6, pp. 381–395, 1981.
- [14] Kneip L, Scaramuzza D, and Siegwart R, “A novel parametrization of the perspective-three-point problem for a direct computation of absolute camera position and orientation,” in CVPR. IEEE, 2011, pp. 2969–2976.
- [15] Bujnak M, Kukelova Z, and Pajdla T, “A general solution to the P4P problem for camera with unknown focal length,” in CVPR. IEEE, 2008, pp. 1–8.
- [16] Hartley R and Zisserman A, Multiple view geometry in computer vision. Cambridge University Press, 2003.
- [17] DeMenthon D and Davis LS, “Exact and approximate solutions of the perspective-three-point problem,” IEEE TPAMI, vol. 14, no. 11, pp. 1100–1105, 1992.
- [18] Abidi MA and Chandra T, “A new efficient and direct solution for pose estimation using quadrangular targets: Algorithm and evaluation,” IEEE TPAMI, vol. 17, no. 5, pp. 534–538, 1995.
- [19] Horaud R, Conio B, Leboulleux O, and Lacolle B, “An analytic solution for the perspective-4-point problem,” Computer Vision, Graphics, and Image Processing, vol. 47, no. 1, pp. 33–44, 1989.
- [20] Quan L and Lan Z, “Linear n-point camera pose determination,” IEEE TPAMI, vol. 21, no. 8, pp. 774–780, 1999.
- [21] Wolfe WJ, Mathis D, Sklair CW, and Magee M, “The perspective view of three points,” IEEE TPAMI, vol. 13, no. 1, pp. 66–73, 1991.
- [22] Triggs B, “Camera pose and calibration from 4 or 5 known 3d points,” in ICCV, vol. 1. IEEE, 1999, pp. 278–284.
- [23] Ansar A and Daniilidis K, “Linear pose estimation from points or lines,” IEEE TPAMI, vol. 25, no. 5, pp. 578–589, 2003.
- [24] Fiore PD, “Efficient linear solution of exterior orientation,” IEEE TPAMI, vol. 23, no. 2, pp. 140–148, 2001.
- [25] Schweighofer G and Pinz A, “Globally optimal o(n) solution to the pnp problem for general camera models,” in BMVC, 2008, pp. 1–10.
- [26] Li S, Xu C, and Xie M, “A robust o(n) solution to the Perspective n-Point problem,” IEEE TPAMI, vol. 34, no. 7, pp. 1444–1450, 2012.
- [27] Kneip L, Li H, and Seo Y, “UPnP: An optimal o(n) solution to the absolute pose problem with universal applicability,” in ECCV. Springer, 2014, pp. 127–142.
- [28] Lowe DG, “Fitting parameterized three-dimensional models to images,” IEEE TPAMI, vol. 13, no. 5, pp. 441–450, 1991.
- [29] Dementhon DF and Davis LS, “Model-based object pose in 25 lines of code,” IJCV, vol. 15, no. 1, pp. 123–141, 1995.
- [30] Horaud R, Dornaika F, and Lamiroy B, “Object pose: The link between weak perspective, paraperspective, and full perspective,” IJCV, vol. 22, no. 2, pp. 173–189, 1997.
- [31] David P, Dementhon D, Duraiswami R, and Samet H, “Soft-POSIT: Simultaneous pose and correspondence determination,” International Journal of Computer Vision, vol. 59, no. 3, pp. 259–284, 2004.
- [32] Schweighofer G and Pinz A, “Robust pose estimation from a planar target,” IEEE TPAMI, vol. 28, no. 12, pp. 2024–2030, 2006.
- [33] Havlena M, Torii A, and Pajdla T, “Efficient structure from motion by graph optimization,” ECCV, pp. 100–113, 2010.
- [34] Mur-Artal R and Tardós JD, “Fast relocalisation and loop closing in keyframe-based slam,” in ICRA. IEEE, 2014, pp. 846–853.
- [35] Kahl F, Agarwal S, Chandraker MK, Kriegman D, and Belongie S, “Practical global optimization for multiview geometry,” IJCV, vol. 79, no. 3, pp. 271–284, 2008.
- [36] Ke Q and Kanade T, “Quasiconvex optimization for robust geometric reconstruction,” IEEE TPAMI, vol. 29, no. 10, 2007.
- [37] Kneip L, Furgale P, and Siegwart R, “Using multi-camera systems in robotics: Efficient solutions to the npnp problem,” in ICRA. IEEE, 2013, pp. 3770–3776.
- [38] Lee GH, Li B, Pollefeys M, and Fraundorfer F, “Minimal Solutions for the Multi-Camera Pose Estimation Problem,” IJRR, vol. 34, no. 7, pp. 837–848, 2015.
- [39] Arun KS, Huang TS, and Blostein SD, “Least-squares fitting of two 3D point sets,” IEEE TPAMI, no. 5, pp. 698–700, 1987.
- [40] Simpson D, “Introduction to rousseeuw (1984) least median of squares regression,” pp. 433–461, 1997.
- [41] Hesch JA and Roumeliotis SI, “A direct least-squares (DLS) method for PnP,” in ICCV. IEEE, 2011, pp. 383–390.
- [42] Zheng Y, Sugimoto S, and Okutomi M, “ASPnP: An accurate and scalable solution to the perspective-n-point problem,” IEICE Transactions on Information and Systems, vol. 96, no. 7, pp. 1525–1535, 2013.
- [43] Garro V, Crosilla F, and Fusiello A, “Solving the PnP problem with anisotropic orthogonal procrustes analysis,” in 3DIMPVT. IEEE, 2012, pp. 262–269.
- [44] Aanæs H, Dahl AL, and Pedersen KS, “Interesting interest points,” IJCV, vol. 97, no. 1, pp. 18–35, 2012.






