Sensors (Basel, Switzerland). 2018 Apr 10;18(4):1159. doi: 10.3390/s18041159

PL-VIO: Tightly-Coupled Monocular Visual–Inertial Odometry Using Point and Line Features

Yijia He 1,2,*, Ji Zhao 3, Yue Guo 1,2, Wenhao He 1, Kui Yuan 1
PMCID: PMC5948686  PMID: 29642648

Abstract

To address the problem of estimating camera trajectory and to build a structural three-dimensional (3D) map based on inertial measurements and visual observations, this paper proposes point–line visual–inertial odometry (PL-VIO), a tightly-coupled monocular visual–inertial odometry system exploiting both point and line features. Compared with point features, lines provide significantly more geometrical structure information on the environment. To obtain both computation simplicity and representational compactness of a 3D spatial line, Plücker coordinates and orthonormal representation for the line are employed. To tightly and efficiently fuse the information from inertial measurement units (IMUs) and visual sensors, we optimize the states by minimizing a cost function which combines the pre-integrated IMU error term together with the point and line re-projection error terms in a sliding window optimization framework. The experiments evaluated on public datasets demonstrate that the PL-VIO method that combines point and line features outperforms several state-of-the-art VIO systems which use point features only.

Keywords: sensor fusion, visual–inertial odometry, tightly-coupled, point and line features

1. Introduction

Localization and navigation have attracted much attention in recent years with respect to a wide range of applications, particularly for self-driving cars, service robots, and unmanned aerial vehicles, etc. Several types of sensors are utilized for localization and navigation, such as global navigation satellite systems (GNSSs) [1], laser lidar [2,3], inertial measurement units (IMUs), and cameras [4,5]. However, they have obvious respective drawbacks: GNSSs only provide reliable localization information if there is a clear sky view [6]; laser lidar suffers from a reflection problem for objects with glass surfaces [7]; measurements from civilian IMUs are noisy, such that inertial navigation systems may drift quickly due to error accumulation [8]; and monocular simultaneous localization and mapping (SLAM) can only recover the motion trajectory up to a certain scale and it tends to be lost when the camera moves fast or illumination changes dramatically [9,10,11]. As a result, sensor fusion methods, especially for visual–inertial navigation systems, have drawn widespread attention [12]. The acceleration and angular velocity information from an IMU can significantly improve the monocular SLAM system [13,14]. Furthermore, both IMUs and cameras are light-weight and low-cost, and as such they are widely used in civilian applications.

According to whether measurements from the sensors are fused directly or indirectly, visual–inertial odometry (VIO) systems can be divided into two main streams: loosely-coupled and tightly-coupled approaches. Loosely-coupled approaches [15,16] process images and IMU measurements with two estimators that estimate the relative motion separately, and then fuse the two estimates to obtain the final result. Tightly-coupled approaches [17,18] use one estimator to find the optimal estimate by fusing the raw measurements from the camera and IMU directly. Compared to loosely-coupled approaches, tightly-coupled approaches are generally more accurate and robust. In this paper, the proposed PL-VIO method is a tightly-coupled VIO system. Related works on tightly-coupled VIO can be categorized by the number of times the measurement model is linearized [14]. Approaches based on the extended Kalman filter (EKF) process each measurement only once in the update step, while batch nonlinear optimization approaches re-linearize multiple times during optimization. Filtering-based approaches [19,20] integrate IMU measurements to propagate/predict the state, and then update/correct the latest state with visual measurements. Since the coordinates of three-dimensional (3D) landmarks are included in the state vector, the computational complexity of the EKF increases quadratically with the number of landmarks. To address this problem, Mourikis and Roumeliotis [21] proposed the multi-state constraint Kalman filter (MSCKF), which marginalizes out the landmark coordinates from the state vector. A drawback of this method is that a landmark's measurements are only used to update the state once the landmark has left the camera's field of view, which means that not all current visual measurements are used in the filter. Furthermore, the linearization errors make the filter inconsistent [14].

Optimization-based approaches obtain the optimal estimate by minimizing a joint nonlinear cost function containing IMU measurement residuals and visual re-projection residuals. Thus, optimization-based approaches can re-linearize the state vector at different points and achieve higher accuracy than filtering-based methods [14]. The IMU measurement constraints are computed by integrating the IMU measurements between frames. However, the standard IMU integration is tied to the IMU body state at the first frame: when the estimated state changes, all integrated IMU measurements need to be re-calculated. Lupton and Sukkarieh [22] proposed an IMU pre-integration technique which avoids such repeated integrations. IMU pre-integration has been widely used in optimization-based VIO [18,23,24]. Forster et al. [14] reformulated IMU pre-integration by treating the rotation group as a manifold instead of using Euler angles. Liu et al. [13] proposed a continuous pre-integration method. Although optimization-based approaches achieve high accuracy, the computation becomes expensive as more and more landmarks are added into the optimization. OKVIS [18] uses a sliding window to bound the computation by marginalizing out the measurements from the oldest states. Shen et al. [23] proposed a two-way marginalization to selectively marginalize the body states and landmarks.

Although significant achievements have been made in the VIO area, most VIO systems use only point features as the visual information. However, point detection in textureless environments and point tracking in illumination-changing scenes are challenging [25,26]. In contrast, line segments are a suitable alternative in such scenes. Additionally, line segments provide more structural information on the environment than points [27]. For visual-only SLAM, there are several works combining point and line features to estimate camera motion [28,29]. The simplest way to integrate line features into a SLAM system is to use two endpoints to represent the line. However, matching the endpoints of a line across different views is difficult. Furthermore, a 3D spatial line only has four degrees-of-freedom (DoFs), while two 3D endpoints introduce six parameters, which results in over-parameterization. Bartoli and Sturm [30] proposed the orthonormal representation, which uses a three-DoF rotation matrix and a one-DoF rotation matrix to update the line parameters during optimization. The orthonormal representation has been used in some stereo visual SLAM systems [27,31]. For VIO approaches, Kottas and Roumeliotis [26] investigated the observability of VIO using line features only. Kong et al. [25] built a stereo VIO system combining point and line features by utilizing trifocal geometry. However, these works involve filtering-based VIO. In our proposed PL-VIO method, we integrate line features into an optimization framework in order to achieve higher accuracy than filtering-based methods.

To build a structural 3D map and obtain the camera’s motion, we propose the PL-VIO system, which optimizes the system states by jointly minimizing the IMU pre-integration constraints together with the point and line re-projection errors in sliding windows. Compared to the traditional methods which only use point features, our method utilizes the additional line feature, aiming to increase the robustness and accuracy in an illumination-changing environment. Our main contributions are as follows:

  • To the best of our knowledge, the proposed PL-VIO is the first optimization-based monocular VIO system using both points and lines as landmarks.

  • To tightly and efficiently fuse the information from visual and inertial sensors, we introduce a sliding window model with IMU pre-integration constraints and point/line features. To represent a 3D spatial line compactly in optimization, the orthonormal representation for a line is employed. All the Jacobian matrices of error terms with respect to IMU body states are derived for solving the sliding window optimization efficiently.

  • We compare the performances of the proposed PL-VIO with three state-of-the-art monocular VIO methods including ROVIO [17], OKVIS [18], and VINS-Mono [32] on both the EuRoc dataset and the PennCOSYVIO dataset, for which detailed evaluation results are reported.

The remainder of this paper is organized as follows. First, we describe the mathematical preliminaries in Section 2, and then formulate the sliding window-based visual–inertial fusion method in Section 3. Next, we describe our PL-VIO system and implementation details in Section 4. Section 5 shows the experimental results. Finally, conclusions and potential future works are given in Section 6.

2. Mathematical Formulation

2.1. Notations

Figure 1 illustrates the visual–inertial sensors and the visual observations for point and line features. We denote $c_i$ as the camera frame at time $t=i$ and $b_i$ as the IMU body frame at the same time; $w$ is the Earth's inertial frame. $(\cdot)^{c}$ means the vector $(\cdot)$ is expressed in frame $c$. The quaternion $q_{xy}$ rotates a vector from frame $y$ to frame $x$, and its corresponding matrix form is $R_{xy}$. The vector $p_{xy}$ translates a vector from frame $y$ to frame $x$. The quaternion $q_{bc}$ and vector $p_{bc}$ are the extrinsic parameters between the camera frame and the body frame; these extrinsic parameters are provided by the datasets or calibrated with the Kalibr calibration toolbox [33]. $f_j$ and $\mathcal{L}_j$ are the $j$th point landmark and line landmark in the map, respectively. $z$ represents a measurement; specifically, $z^{c_i}_{f_j}$ is the observation of the $j$th point feature in the $i$th camera frame, and $z_{b_i b_j}$ represents a pre-integrated IMU measurement between two keyframes.

Figure 1. An illustration of visual–inertial sensors, point observations, and line observations. IMU: inertial measurement unit.

2.2. IMU Pre-Integration

A 6-axis IMU, including a 3-axis accelerometer and a 3-axis gyroscope, can measure the acceleration $a$ and the angular velocity $\omega$ of the body frame with respect to the inertial frame [14]. The raw measurements $\hat{\omega}$ and $\hat{a}$ are affected by bias and white noise:

$$\hat{\omega}^{b} = \omega^{b} + b_g^{b} + n_g^{b} \quad (1)$$
$$\hat{a}^{b} = R_{bw}\left( a^{w} + g^{w} \right) + b_a^{b} + n_a^{b} \quad (2)$$

where $b_g^{b}, b_a^{b}$ and $n_g^{b}, n_a^{b}$ are the biases and white noises of the gyroscope and accelerometer, respectively, and $g^{w} = [0, 0, g]^{\top}$ is the gravity vector in frame $w$. We use the following kinematics for the IMU-driven system [34]:

$$\dot{p}_{wb_t} = v^{w}_t, \qquad \dot{v}^{w}_t = a^{w}_t, \qquad \dot{q}_{wb_t} = q_{wb_t} \otimes \begin{bmatrix} 0 \\ \tfrac{1}{2}\omega^{b_t} \end{bmatrix} \quad (3)$$

where ⊗ denotes the quaternion multiplication operation.

Given the IMU body state at time $t=i$, namely $p_{wb_i}, v^{w}_i, q_{wb_i}$, and a series of values of $\omega$ and $a$ over the interval $t\in[i,j]$, we can obtain the body state at time $t=j$ by integrating Equation (3):

$$\begin{aligned}
p_{wb_j} &= p_{wb_i} + v^{w}_i\,\Delta t + \iint_{t\in[i,j]} \left( R_{wb_t} a^{b_t} - g^{w} \right) \delta t^2 \\
v^{w}_j &= v^{w}_i + \int_{t\in[i,j]} \left( R_{wb_t} a^{b_t} - g^{w} \right) \delta t \\
q_{wb_j} &= \int_{t\in[i,j]} q_{wb_t} \otimes \begin{bmatrix} 0 \\ \tfrac{1}{2}\omega^{b_t} \end{bmatrix} \delta t
\end{aligned} \quad (4)$$

where $\Delta t$ is the time difference between $i$ and $j$. In Equation (4), the body state propagation starts from the $i$th frame $b_i$. When the state of $b_i$ changes, all the measurements need to be re-propagated. Since body states are adjusted at each iteration of the optimization, evaluating Equation (4) repeatedly is time-consuming. By decomposing $q_{wb_t}$ as $q_{wb_i}\otimes q_{b_i b_t}$, Equation (4) can be written as:

$$\begin{aligned}
p_{wb_j} &= p_{wb_i} + v^{w}_i\,\Delta t - \tfrac{1}{2} g^{w} \Delta t^2 + R_{wb_i}\,\alpha_{b_i b_j} \\
v^{w}_j &= v^{w}_i - g^{w}\Delta t + R_{wb_i}\,\beta_{b_i b_j} \\
q_{wb_j} &= q_{wb_i} \otimes q_{b_i b_j}
\end{aligned} \quad (5)$$

where

$$\begin{aligned}
\alpha_{b_i b_j} &= \iint_{t\in[i,j]} \left( R_{b_i b_t} a^{b_t} \right) \delta t^2 \\
\beta_{b_i b_j} &= \int_{t\in[i,j]} \left( R_{b_i b_t} a^{b_t} \right) \delta t \\
q_{b_i b_j} &= \int_{t\in[i,j]} q_{b_i b_t} \otimes \begin{bmatrix} 0 \\ \tfrac{1}{2}\omega^{b_t} \end{bmatrix} \delta t
\end{aligned} \quad (6)$$

$z_{b_i b_j} = [\alpha_{b_i b_j}, \beta_{b_i b_j}, q_{b_i b_j}]$ are called pre-integration measurements [22] and can be calculated directly without knowing the body state at $b_i$, which means that re-propagation is not necessary when the body state changes. We treat the pre-integrated measurements as constraint factors between successive keyframes.

The pre-integration model (Equation (6)) is derived in continuous time and neglects the biases and noises. In practice, IMU measurements are collected at discrete times, and noise should be considered. In this work, we use mid-point integration to integrate the IMU measurements. The IMU body propagation using measurements at discrete moments $k$ and $k+1$ is calculated by:

$$\begin{aligned}
\hat{\omega} &= \tfrac{1}{2}\left( \left( \hat{\omega}^{b_k} - b_g^{b_k} + n_g^{b_k} \right) + \left( \hat{\omega}^{b_{k+1}} - b_g^{b_k} + n_g^{b_{k+1}} \right) \right) \\
\hat{q}_{b_i b_{k+1}} &= \hat{q}_{b_i b_k} \otimes \begin{bmatrix} 1 \\ \tfrac{1}{2}\hat{\omega}\,\delta t \end{bmatrix} \\
\hat{a} &= \tfrac{1}{2}\left( R_{b_i b_k}\left( \hat{a}^{b_k} - b_a^{b_k} + n_a^{b_k} \right) + R_{b_i b_{k+1}}\left( \hat{a}^{b_{k+1}} - b_a^{b_{k+1}} + n_a^{b_{k+1}} \right) \right) \\
\hat{\alpha}_{b_i b_{k+1}} &= \hat{\alpha}_{b_i b_k} + \hat{\beta}_{b_i b_k}\,\delta t + \tfrac{1}{2}\hat{a}\,\delta t^2 \\
\hat{\beta}_{b_i b_{k+1}} &= \hat{\beta}_{b_i b_k} + \hat{a}\,\delta t
\end{aligned} \quad (7)$$

At the beginning, $k=i$, $q_{b_i b_i}$ is the identity quaternion and $\alpha_{b_i b_i}, \beta_{b_i b_i}$ are zero vectors. In Equation (7), in order to compute the pre-integration measurements efficiently, we assume the biases are constant between two keyframes:

$$b_a^{b_k} = b_a^{b_{k+1}}, \qquad b_g^{b_k} = b_g^{b_{k+1}}, \qquad \forall k \in [i, j-1] \quad (8)$$

In practice, biases change slowly. We model biases with random walk noises:

$$b_a^{b_{k+1}} = b_a^{b_k} + n_{b_a}\,\delta t, \qquad b_g^{b_{k+1}} = b_g^{b_k} + n_{b_g}\,\delta t \quad (9)$$

where the Gaussian white noises are defined as $n_{b_a}\sim\mathcal{N}(0,\sigma_{b_a}^2)$ and $n_{b_g}\sim\mathcal{N}(0,\sigma_{b_g}^2)$. When the biases change by small increments, instead of re-computing the pre-integrated measurements iteratively, we use a first-order approximation to update $\hat{q}_{b_i b_j}, \hat{\alpha}_{b_i b_j}, \hat{\beta}_{b_i b_j}$ [14]:

$$\begin{aligned}
\hat{\alpha}_{b_i b_j} &\approx \hat{\alpha}_{b_i b_j} + J^{\alpha}_{b_{a_i}}\,\delta b_a^{b_i} + J^{\alpha}_{b_{g_i}}\,\delta b_g^{b_i} \\
\hat{\beta}_{b_i b_j} &\approx \hat{\beta}_{b_i b_j} + J^{\beta}_{b_{a_i}}\,\delta b_a^{b_i} + J^{\beta}_{b_{g_i}}\,\delta b_g^{b_i} \\
\hat{q}_{b_i b_j} &\approx \hat{q}_{b_i b_j} \otimes \begin{bmatrix} 1 \\ \tfrac{1}{2} J^{q}_{b_{g_i}}\,\delta b_g^{b_i} \end{bmatrix}
\end{aligned} \quad (10)$$

where $J^{\alpha}_{b_{a_i}} = \dfrac{\partial \alpha_{b_i b_j}}{\partial \delta b_a^{b_i}}$, $J^{\alpha}_{b_{g_i}} = \dfrac{\partial \alpha_{b_i b_j}}{\partial \delta b_g^{b_i}}$, $J^{\beta}_{b_{a_i}} = \dfrac{\partial \beta_{b_i b_j}}{\partial \delta b_a^{b_i}}$, $J^{\beta}_{b_{g_i}} = \dfrac{\partial \beta_{b_i b_j}}{\partial \delta b_g^{b_i}}$, and $J^{q}_{b_{g_i}} = \dfrac{\partial q_{b_i b_j}}{\partial \delta b_g^{b_i}}$ are the Jacobian matrices of the pre-integrated measurements with respect to the biases at time $i$. They can be derived with the error-state transition matrices, as shown in Appendix A. The covariance matrix of the pre-integrated measurements $\Sigma_{b_i b_j}$ can also be computed iteratively during IMU propagation; more details are provided in Appendix A.
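As a concrete illustration of Equations (7) and (10), the following Python sketch propagates the pre-integrated terms with mid-point integration and applies the first-order bias correction. This is our own minimal example under the constant-bias assumption of Equation (8); noise terms are omitted and all function and variable names are ours, not the authors' implementation.

```python
import numpy as np

def quat_mult(q, r):
    """Hamilton product of quaternions stored as [w, x, y, z]."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def quat_to_rot(q):
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)]])

class Preintegration:
    """Mid-point pre-integration of Eq. (7), biases held fixed (Eq. (8))."""
    def __init__(self, ba, bg):
        self.ba, self.bg = ba.copy(), bg.copy()
        self.alpha = np.zeros(3)             # position pre-integration
        self.beta = np.zeros(3)              # velocity pre-integration
        self.q = np.array([1., 0., 0., 0.])  # rotation pre-integration

    def propagate(self, acc0, gyr0, acc1, gyr1, dt):
        w = 0.5 * ((gyr0 - self.bg) + (gyr1 - self.bg))
        q_next = quat_mult(self.q, np.concatenate(([1.0], 0.5 * w * dt)))
        q_next /= np.linalg.norm(q_next)
        a = 0.5 * (quat_to_rot(self.q) @ (acc0 - self.ba)
                   + quat_to_rot(q_next) @ (acc1 - self.ba))
        self.alpha += self.beta * dt + 0.5 * a * dt**2
        self.beta += a * dt
        self.q = q_next

    def corrected(self, dba, dbg, J_a_ba, J_a_bg, J_b_ba, J_b_bg, J_q_bg):
        """First-order bias correction of Eq. (10), avoiding re-propagation."""
        alpha = self.alpha + J_a_ba @ dba + J_a_bg @ dbg
        beta = self.beta + J_b_ba @ dba + J_b_bg @ dbg
        dq = np.concatenate(([1.0], 0.5 * (J_q_bg @ dbg)))
        return alpha, beta, quat_mult(self.q, dq / np.linalg.norm(dq))
```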

2.3. Geometric Representation of Line

A straight line has only four DoFs, so its compact parameterization requires only four parameters. In our system, we treat a straight line in 3D space as an infinite line and adopt two parameterizations, as in [27]. Plücker line coordinates, consisting of six parameters, are used for transformation and projection due to their simplicity. The orthonormal representation, consisting of four parameters, is used for optimization due to its compactness.

2.3.1. Plücker Line Coordinates

In Figure 2a, a 3D spatial line $\mathcal{L}$ in Plücker coordinates is represented by $\mathcal{L} = (n^{\top}, d^{\top})^{\top} \in \mathbb{R}^6$, where $d\in\mathbb{R}^3$ is the line direction vector and $n\in\mathbb{R}^3$ is the normal vector of the plane determined by the line and the coordinate origin. The Plücker coordinates are over-parameterized since there is an implicit constraint between $n$ and $d$, i.e., $n^{\top}d = 0$. Therefore, the Plücker coordinates cannot be used directly in unconstrained optimization. However, with a 3D line represented by a normal vector and a direction vector, it is simple to perform triangulation from two views geometrically, and it is also convenient to model the line geometry transformation.

Figure 2. Plücker coordinates for line features. (a) Plücker line coordinates; (b) initialization of a newly observed line.

For line geometry transformation, given the transformation matrix $T_{cw} = \begin{bmatrix} R_{cw} & p_{cw} \\ 0 & 1 \end{bmatrix}$ from the world frame $w$ to the camera frame $c$, we can transform the Plücker coordinates of a line by [30]

$$\mathcal{L}^{c} = \begin{bmatrix} n^{c} \\ d^{c} \end{bmatrix} = \mathcal{T}_{cw}\,\mathcal{L}^{w} = \begin{bmatrix} R_{cw} & [p_{cw}]_{\times} R_{cw} \\ 0 & R_{cw} \end{bmatrix} \mathcal{L}^{w} \quad (11)$$

where $[\cdot]_{\times}$ denotes the skew-symmetric matrix of a three-dimensional vector, and $\mathcal{T}_{cw}$ is the transformation matrix used to transform a line from frame $w$ to frame $c$.

When a new line landmark is observed in two different camera views, the Plücker coordinates are easily calculated. As shown in Figure 2b, the 3D line $\mathcal{L}$ is captured by cameras $c_1$ and $c_2$ as $z^{c_1}_{\mathcal{L}}$ and $z^{c_2}_{\mathcal{L}}$, respectively. The line segment $z^{c_1}_{\mathcal{L}}$ in the normalized image plane can be represented by two endpoints, $s^{c_1} = [u_s, v_s, 1]^{\top}$ and $e^{c_1} = [u_e, v_e, 1]^{\top}$. Three non-collinear points, namely the two endpoints of a line segment and the camera center $C = [x_0, y_0, z_0]^{\top}$, determine a plane $\pi = [\pi_x, \pi_y, \pi_z, \pi_w]^{\top}$ in 3D space:

$$\pi_x (x - x_0) + \pi_y (y - y_0) + \pi_z (z - z_0) = 0 \quad (12)$$

where

$$\begin{bmatrix} \pi_x \\ \pi_y \\ \pi_z \end{bmatrix} = [s^{c_1}]_{\times}\, e^{c_1}, \qquad \pi_w = \pi_x x_0 + \pi_y y_0 + \pi_z z_0 \quad (13)$$

Given the two planes $\pi_1$ and $\pi_2$ expressed in camera frame $c_1$, the dual Plücker matrix $\mathcal{L}^{*}$ can be computed by

$$\mathcal{L}^{*} = \begin{bmatrix} [d]_{\times} & n \\ -n^{\top} & 0 \end{bmatrix} = \pi_1 \pi_2^{\top} - \pi_2 \pi_1^{\top} \in \mathbb{R}^{4\times4} \quad (14)$$

The Plücker coordinates $\mathcal{L}$ in camera frame $c_1$ are then easily extracted from the dual Plücker matrix $\mathcal{L}^{*}$. It can be seen that $n$ and $d$ do not need to be unit vectors.
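The two-view initialization of Equations (12)–(14) can be sketched as follows. This is an illustrative implementation (names are ours) that uses the standard homogeneous plane convention, where the last plane component is stored with a negative sign; the recovered Plücker line is defined up to scale.

```python
import numpy as np

def plane_from_line_obs(s, e, C):
    """Plane through two endpoints (homogeneous, z = 1) and a point C."""
    normal = np.cross(s - C, e - C)          # reduces to [s]x e when C is the origin
    return np.concatenate([normal, [-normal @ C]])

def triangulate_line(s1, e1, s2, e2, R_c1c2, p_c1c2):
    """Plücker coordinates (n, d) in frame c1 from two views (Eq. (14))."""
    # Plane from the observation in c1 (camera centre at the origin of c1).
    pi1 = plane_from_line_obs(s1, e1, np.zeros(3))
    # Endpoints observed in c2, expressed in frame c1; camera centre of c2 is p_c1c2.
    s2_c1 = R_c1c2 @ s2 + p_c1c2
    e2_c1 = R_c1c2 @ e2 + p_c1c2
    pi2 = plane_from_line_obs(s2_c1, e2_c1, p_c1c2)
    # Dual Plücker matrix L* = pi1 pi2^T - pi2 pi1^T = [[d]x, n; -n^T, 0].
    L_star = np.outer(pi1, pi2) - np.outer(pi2, pi1)
    n = L_star[:3, 3]
    d = np.array([L_star[2, 1], L_star[0, 2], L_star[1, 0]])
    return n, d
```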

2.3.2. Orthonormal Representation

Since a 3D spatial line has only four DoFs, the orthonormal representation $(U, W) \in SO(3) \times SO(2)$ is more suitable than Plücker coordinates during optimization. Additionally, the orthonormal representation and Plücker coordinates can be converted into each other, which means we can adopt both of them in a SLAM system for different purposes. In this section, we introduce the details of the orthonormal representation. As shown in Figure 2a, a coordinate system is defined on the 3D line: the normalized normal vector and the normalized direction vector are two axes of the coordinate system, and the third axis is their cross product. We define the rotation matrix between the line coordinate frame and the camera frame as $U$:

$$U = R(\psi) = \begin{bmatrix} \dfrac{n}{\|n\|} & \dfrac{d}{\|d\|} & \dfrac{n\times d}{\|n\times d\|} \end{bmatrix} \quad (15)$$

where $\psi = [\psi_1, \psi_2, \psi_3]^{\top}$ consists of the rotation angles around the x-, y-, and z-axes of the camera frame. The relationship between the Plücker coordinates and $U$ is:

$$\begin{bmatrix} n & d \end{bmatrix} = \begin{bmatrix} \dfrac{n}{\|n\|} & \dfrac{d}{\|d\|} & \dfrac{n\times d}{\|n\times d\|} \end{bmatrix} \begin{bmatrix} \|n\| & 0 \\ 0 & \|d\| \\ 0 & 0 \end{bmatrix} \quad (16)$$

Since the pair $(\|n\|, \|d\|)$ in Equation (16) has only one DoF, we can use trigonometric functions to represent it:

$$W = \begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix} = \dfrac{1}{\sqrt{\|n\|^2 + \|d\|^2}} \begin{bmatrix} \|n\| & -\|d\| \\ \|d\| & \|n\| \end{bmatrix} \quad (17)$$

where $\phi$ is a rotation angle. Recall that the distance from the coordinate origin to the 3D line is $d = \|n\|/\|d\|$, so $W$ contains the information about this distance. According to the definitions of $U$ and $W$, the four DoFs consist of three DoFs from the rotation matrix, which transforms the line coordinate frame to the camera frame, and one DoF from the distance $d$. We use $\mathcal{O} = [\psi, \phi]$ as the minimal representation of a 3D spatial line during optimization.

Once a 3D line L has been optimized with the orthonormal representation, the corresponding Plücker coordinates for the line can be computed by:

$$\mathcal{L}' = \begin{bmatrix} w_1 u_1^{\top} & w_2 u_2^{\top} \end{bmatrix}^{\top} = \dfrac{1}{\sqrt{\|n\|^2 + \|d\|^2}}\,\mathcal{L} \quad (18)$$

where $u_i$ is the $i$th column of matrix $U$, $w_1 = \cos\phi$, and $w_2 = \sin\phi$. There is a scale factor between $\mathcal{L}'$ and $\mathcal{L}$, but they represent the same 3D spatial line.
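A minimal sketch of the conversion between the two parameterizations (Equations (15)–(18)) is given below; it is our own illustration rather than the authors' code, and the round-trip check at the end only verifies that the recovered line has the same direction vectors up to scale.

```python
import numpy as np

def plucker_to_orthonormal(n, d):
    """(U, W) of Eqs. (15)-(17) from Plücker coordinates (n, d)."""
    u1 = n / np.linalg.norm(n)
    u2 = d / np.linalg.norm(d)
    u3 = np.cross(n, d)
    u3 /= np.linalg.norm(u3)
    U = np.column_stack([u1, u2, u3])
    s = np.sqrt(np.linalg.norm(n)**2 + np.linalg.norm(d)**2)
    w1, w2 = np.linalg.norm(n) / s, np.linalg.norm(d) / s
    W = np.array([[w1, -w2], [w2, w1]])
    return U, W

def orthonormal_to_plucker(U, W):
    """Plücker coordinates of Eq. (18), up to a common scale factor."""
    w1, w2 = W[0, 0], W[1, 0]
    return w1 * U[:, 0], w2 * U[:, 1]

# Round trip: the recovered line equals the original up to scale.
n0, d0 = np.array([0., 2., 0.]), np.array([3., 0., 0.])
U, W = plucker_to_orthonormal(n0, d0)
n1, d1 = orthonormal_to_plucker(U, W)
print(np.cross(n0, n1), np.cross(d0, d1))  # both ~0: same directions
```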

3. Tightly-Coupled Visual–Inertial Fusion

In visual SLAM, bundle adjustment is used to optimize the camera poses and 3D map points by minimizing the re-projection error in the image planes. Bundle adjustment by nonlinear optimization can be treated as a factor graph [35], as shown in Figure 3a: round nodes are the camera poses or 3D landmarks that need to be optimized; edges with square boxes represent the visual measurements acting as constraints between nodes. For visual–inertial fusion, we can also use the tightly-coupled graph-based framework to optimize all the state variables of the visual–inertial system [23]. As shown in Figure 3b, the factor graph not only contains the visual measurements, but also takes the pre-integrated IMU measurements as edges to constrain the successive IMU body states.

Figure 3. Factor graphs of (a) the visual simultaneous localization and mapping (SLAM) problem versus (b) visual–inertial SLAM.

3.1. Sliding Window Formulation

In order to achieve both computation efficiency and high accuracy, we use the sliding window formulation for factor graph optimization. The full state variables in a sliding window at time i are defined as:

$$\begin{aligned}
\mathcal{X} &= \left[ x_n, x_{n+1}, \cdots, x_{n+N},\ \lambda_m, \lambda_{m+1}, \cdots, \lambda_{m+M},\ \mathcal{O}_o, \mathcal{O}_{o+1}, \cdots, \mathcal{O}_{o+O} \right] \\
x_i &= \left[ p_{wb_i},\, q_{wb_i},\, v^{w}_i,\, b_a^{b_i},\, b_g^{b_i} \right], \qquad i \in [n, n+N]
\end{aligned} \quad (19)$$

where xi is described by the ith IMU body position, velocity, and orientation in the world frame, and biases in the IMU body frame. Subscripts n,m, and o are the start indexes of body states, point landmarks, and line landmarks, respectively. N is the number of keyframes in the sliding window. M and O are the numbers of point landmarks and line landmarks observed by all keyframes in the sliding window, respectively. We only use one variable, the inverse depth λk, to parameterize the kth point landmark from its first observed keyframe. Ol is the orthonormal representation of the lth line feature in the world frame w.

We optimize all the state variables in the sliding window by minimizing the sum of cost terms from all the measurement residuals:

$$\min_{\mathcal{X}} \left\{ \rho\!\left( \left\| r_p - J_p\,\mathcal{X} \right\|^2_{\Sigma_p} \right) + \sum_{i\in\mathcal{B}} \rho\!\left( \left\| r_b\!\left(z_{b_i b_{i+1}}, \mathcal{X}\right) \right\|^2_{\Sigma_{b_i b_{i+1}}} \right) + \sum_{(i,j)\in\mathcal{F}} \rho\!\left( \left\| r_f\!\left(z^{c_i}_{f_j}, \mathcal{X}\right) \right\|^2_{\Sigma^{c_i}_{f_j}} \right) + \sum_{(i,l)\in\mathcal{L}} \rho\!\left( \left\| r_l\!\left(z^{c_i}_{\mathcal{L}_l}, \mathcal{X}\right) \right\|^2_{\Sigma^{c_i}_{\mathcal{L}_l}} \right) \right\} \quad (20)$$

where $r_b(z_{b_i b_{i+1}}, \mathcal{X})$ is the IMU measurement residual between body states $x_i$ and $x_{i+1}$; $\mathcal{B}$ is the set of all pre-integrated IMU measurements in the sliding window; $r_f(z^{c_i}_{f_j}, \mathcal{X})$ and $r_l(z^{c_i}_{\mathcal{L}_l}, \mathcal{X})$ are the point feature re-projection residual and the line feature re-projection residual, respectively; $\mathcal{F}$ and $\mathcal{L}$ are the sets of point features and line features observed by the camera frames; $\{r_p, J_p\}$ is the prior information obtained after marginalizing out a frame from the sliding window [23], where $J_p$ is the prior Jacobian matrix from the Hessian matrix of the previous optimization; and $\rho$ is the Cauchy robust function used to suppress outliers.

We use the Levenberg–Marquardt algorithm to solve the nonlinear optimization problem (20). The optimal state estimate $\mathcal{X}$ can be found by iteratively updating an initial guess $\mathcal{X}_0$:

$$\mathcal{X}_{t+1} = \mathcal{X}_{t} \oplus \delta\mathcal{X} \quad (21)$$

where ⊕ is the operator used to update parameters with increment δX. For position, velocity, bias, and inverse depth, the update operator and increments δ are easily defined:

$$p = p + \delta p, \qquad v = v + \delta v, \qquad \lambda = \lambda + \delta\lambda, \qquad b = b + \delta b \quad (22)$$

However, the update operator and the increment $\delta$ for the attitude $q$ are more complicated. A quaternion uses four parameters to represent a three-DoF rotation, so it is over-parameterized; the increment for rotation should be only three-dimensional. Similar to [18], we use a perturbation $\delta\theta\in\mathbb{R}^3$ in the tangent space as the rotation increment. Thus, the rotation $q$ can be updated by quaternion multiplication:

$$q = q \otimes \delta q, \qquad \delta q \approx \begin{bmatrix} 1 \\ \tfrac{1}{2}\delta\theta \end{bmatrix} \quad (23)$$

We can also write this update in rotation matrix form:

$$R \approx R\,\left( I + [\delta\theta]_{\times} \right) \quad (24)$$

where I is a 3×3 identity matrix. Similarly, we can update the orthonormal representation as:

$$U \approx U\,\left( I + [\delta\psi]_{\times} \right), \qquad W \approx W \left( I + \begin{bmatrix} 0 & -\delta\phi \\ \delta\phi & 0 \end{bmatrix} \right) \quad (25)$$

The increment for the orthonormal representation is $\delta\mathcal{O} = [\delta\psi^{\top}, \delta\phi]^{\top} \in \mathbb{R}^4$. Finally, the increment $\delta\mathcal{X}$ used during the optimization can be defined as:

$$\begin{aligned}
\delta\mathcal{X} &= \left[ \delta x_n, \delta x_{n+1}, \cdots, \delta x_{n+N},\ \delta\lambda_m, \delta\lambda_{m+1}, \cdots, \delta\lambda_{m+M},\ \delta\mathcal{O}_o, \delta\mathcal{O}_{o+1}, \cdots, \delta\mathcal{O}_{o+O} \right] \\
\delta x_i &= \left[ \delta p,\, \delta\theta,\, \delta v,\, \delta b_a^{b_i},\, \delta b_g^{b_i} \right], \qquad i \in [n, n+N]
\end{aligned} \quad (26)$$

At each iteration, the increment $\delta\mathcal{X}$ is obtained by solving the normal equations of problem (20):

$$\left( H_p + H_b + H_f + H_l \right) \delta\mathcal{X} = -\left( b_p + b_b + b_f + b_l \right) \quad (27)$$

where $H_p, H_b, H_f$, and $H_l$ are the Hessian matrices of the prior residuals, the IMU measurement residuals, and the point and line re-projection residuals, respectively. For a residual $r_{(\cdot)}$, we have $H_{(\cdot)} = J_{(\cdot)}^{\top}\Sigma_{(\cdot)}^{-1}J_{(\cdot)}$ and $b_{(\cdot)} = J_{(\cdot)}^{\top}\Sigma_{(\cdot)}^{-1}r_{(\cdot)}$, where $J_{(\cdot)}$ is the Jacobian matrix of the residual vector $r_{(\cdot)}$ with respect to $\delta\mathcal{X}$, and $\Sigma_{(\cdot)}$ is the covariance matrix of the measurements. The residuals and Jacobian matrices are defined in the following subsections.
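The assembly and solution of Equation (27) can be sketched as follows, assuming each residual block provides its Jacobian with respect to the full increment vector. The small constant damping term here stands in for the adaptive Levenberg–Marquardt damping actually used; names and structure are ours.

```python
import numpy as np

def gauss_newton_step(blocks, dim):
    """One solve of the normal equations (27).

    `blocks` is a list of (J, r, Sigma) residual blocks, each with the
    Jacobian J taken with respect to the full increment vector of size `dim`.
    """
    H = np.zeros((dim, dim))
    b = np.zeros(dim)
    for J, r, Sigma in blocks:
        W = np.linalg.inv(Sigma)          # information matrix of this block
        H += J.T @ W @ J                  # H_(.) = J^T Sigma^-1 J
        b += J.T @ W @ r                  # b_(.) = J^T Sigma^-1 r
    # A constant damping term stands in for Levenberg-Marquardt damping.
    return np.linalg.solve(H + 1e-8 * np.eye(dim), -b)
```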

3.2. IMU Measurement Model

Since the pre-integrated IMU measurement computed by Equation (10) is a constraint factor between two successive keyframes bi and bj, the IMU measurement residual can be defined as:

$$r_b(z_{b_i b_j}, \mathcal{X}) = \begin{bmatrix} r_p \\ r_{\theta} \\ r_v \\ r_{b_a} \\ r_{b_g} \end{bmatrix} = \begin{bmatrix} R_{b_i w}\left( p_{wb_j} - p_{wb_i} - v^{w}_i \Delta t + \tfrac{1}{2} g^{w} \Delta t^2 \right) - \hat{\alpha}_{b_i b_j} \\ 2\left[ \hat{q}_{b_j b_i} \otimes \left( q_{b_i w} \otimes q_{wb_j} \right) \right]_{xyz} \\ R_{b_i w}\left( v^{w}_j - v^{w}_i + g^{w}\Delta t \right) - \hat{\beta}_{b_i b_j} \\ b_a^{b_j} - b_a^{b_i} \\ b_g^{b_j} - b_g^{b_i} \end{bmatrix} \in \mathbb{R}^{15\times 1} \quad (28)$$

where $[\cdot]_{xyz}$ extracts the vector (imaginary) part of a quaternion, which is used to approximate the three-dimensional rotation error [18].
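A direct transcription of Equation (28) might look as follows; this is a sketch with our own quaternion helpers and dictionary-based states, not the authors' code.

```python
import numpy as np

def quat_mult(q, r):  # Hamilton product, [w, x, y, z]
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def quat_conj(q):
    return q * np.array([1., -1., -1., -1.])

def quat_to_rot(q):
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)]])

def imu_residual(state_i, state_j, preint, g_w, dt):
    """15-dim residual of Eq. (28); states are dicts with p, q, v, ba, bg."""
    R_bi_w = quat_to_rot(quat_conj(state_i['q']))   # rotation world -> body_i
    r_p = R_bi_w @ (state_j['p'] - state_i['p'] - state_i['v'] * dt
                    + 0.5 * g_w * dt**2) - preint['alpha']
    q_err = quat_mult(quat_conj(preint['q']),
                      quat_mult(quat_conj(state_i['q']), state_j['q']))
    r_theta = 2.0 * q_err[1:]          # vector part approximates the rotation error
    r_v = R_bi_w @ (state_j['v'] - state_i['v'] + g_w * dt) - preint['beta']
    r_ba = state_j['ba'] - state_i['ba']
    r_bg = state_j['bg'] - state_i['bg']
    return np.concatenate([r_p, r_theta, r_v, r_ba, r_bg])
```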

During the nonlinear optimization, the Jacobian matrix of the IMU measurement residual with respect to the body state xi and xj is computed by:

$$J_b = \left[ \dfrac{\partial r_b}{\partial \delta x_i},\ \dfrac{\partial r_b}{\partial \delta x_j} \right]$$

$$\dfrac{\partial r_b}{\partial \delta x_i} = \begin{bmatrix}
-R_{b_i w} & \left[ R_{b_i w}\left( p_{wb_j} - p_{wb_i} - v^{w}_i \Delta t + \tfrac{1}{2} g^{w}\Delta t^2 \right) \right]_{\times} & -R_{b_i w}\Delta t & -J^{\alpha}_{b_{a_i}} & -J^{\alpha}_{b_{g_i}} \\
0 & -\left[ \left[ q_{wb_j}^{-1} \otimes q_{wb_i} \right]_L \left[ \hat{q}_{b_i b_j} \right]_R \right]_{3\times 3} & 0 & 0 & J^{r_{\theta}}_{b_{g_i}} \\
0 & \left[ R_{b_i w}\left( v^{w}_j - v^{w}_i + g^{w}\Delta t \right) \right]_{\times} & -R_{b_i w} & -J^{\beta}_{b_{a_i}} & -J^{\beta}_{b_{g_i}} \\
0 & 0 & 0 & -I & 0 \\
0 & 0 & 0 & 0 & -I
\end{bmatrix} \in \mathbb{R}^{15\times 15}$$

$$\dfrac{\partial r_b}{\partial \delta x_j} = \begin{bmatrix}
R_{b_i w} & 0 & 0 & 0 & 0 \\
0 & \left[ \left[ \hat{q}_{b_i b_j}^{-1} \otimes q_{wb_i}^{-1} \otimes q_{wb_j} \right]_L \right]_{3\times 3} & 0 & 0 & 0 \\
0 & 0 & R_{b_i w} & 0 & 0 \\
0 & 0 & 0 & I & 0 \\
0 & 0 & 0 & 0 & I
\end{bmatrix} \in \mathbb{R}^{15\times 15} \quad (29)$$

where $[q]_L$ and $[q]_R$ are the left- and right-quaternion-product matrices, respectively [36]. The operator $[\cdot]_{3\times3}$ extracts the bottom-right $3\times3$ block of $(\cdot)$. The remaining block is computed as $J^{r_\theta}_{b_{g_i}} = \dfrac{\partial r_\theta}{\partial \delta b_g^{b_i}} = -\left[ q_{wb_j}^{-1} \otimes q_{wb_i} \otimes \hat{q}_{b_i b_j} \right]_{L,\,3\times3} J^{q}_{b_{g_i}}$.

3.3. Point Feature Measurement Model

For point features, the re-projection error of a 3D point is the image distance between the projected point and the observed point. In this work, we handle the point and line feature measurements in the normalized image plane. Given the $k$th point feature's measurement in frame $c_j$, $z^{c_j}_{f_k} = [u^{c_j}_{f_k}, v^{c_j}_{f_k}, 1]^{\top}$, the re-projection error is defined as:

$$r_f(z^{c_j}_{f_k}, \mathcal{X}) = \begin{bmatrix} \dfrac{x^{c_j}}{z^{c_j}} - u^{c_j}_{f_k} \\ \dfrac{y^{c_j}}{z^{c_j}} - v^{c_j}_{f_k} \end{bmatrix}, \qquad
f^{c_j}_k = \begin{bmatrix} x^{c_j} \\ y^{c_j} \\ z^{c_j} \end{bmatrix} = R_{bc}^{\top}\left( R_{wb_j}^{\top}\left( R_{wb_i}\left( R_{bc}\,\dfrac{1}{\lambda_k} \begin{bmatrix} u^{c_i}_{f_k} \\ v^{c_i}_{f_k} \\ 1 \end{bmatrix} + p_{bc} \right) + p_{wb_i} - p_{wb_j} \right) - p_{bc} \right) \quad (30)$$

where $z^{c_i}_{f_k} = [u^{c_i}_{f_k}, v^{c_i}_{f_k}, 1]^{\top}$ is the first observation of the feature, in camera frame $c_i$, and the inverse depth $\lambda_k$ of the feature is also defined in camera frame $c_i$.
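The residual of Equation (30) amounts to chaining the camera–body–world transforms, as in the following sketch (names and conventions are ours):

```python
import numpy as np

def point_residual(obs_i, obs_j, inv_depth,
                   R_wbi, p_wbi, R_wbj, p_wbj, R_bc, p_bc):
    """Point re-projection residual of Eq. (30) on the normalized image plane.

    obs_i, obs_j: normalized observations [u, v] in frames c_i and c_j.
    inv_depth:    inverse depth of the point in frame c_i.
    """
    f_ci = np.array([obs_i[0], obs_i[1], 1.0]) / inv_depth   # 3D point in camera i
    f_bi = R_bc @ f_ci + p_bc                                # camera i -> body i
    f_w = R_wbi @ f_bi + p_wbi                               # body i -> world
    f_bj = R_wbj.T @ (f_w - p_wbj)                           # world -> body j
    f_cj = R_bc.T @ (f_bj - p_bc)                            # body j -> camera j
    return np.array([f_cj[0] / f_cj[2] - obs_j[0],
                     f_cj[1] / f_cj[2] - obs_j[1]])
```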

In order to minimize the point’s re-projection error (30), we need to optimize the rotation and the position of frame bi,bj, and the feature inverse depth λ. The corresponding Jacobian matrix can be obtained by the chain rule:

$$J_f = \dfrac{\partial r_f}{\partial f^{c_j}} \left[ \dfrac{\partial f^{c_j}}{\partial \delta x_i},\ \dfrac{\partial f^{c_j}}{\partial \delta x_j},\ \dfrac{\partial f^{c_j}}{\partial \delta\lambda} \right] \quad (31)$$

With

$$\begin{aligned}
\dfrac{\partial r_f}{\partial f^{c_j}} &= \begin{bmatrix} \dfrac{1}{z^{c_j}} & 0 & -\dfrac{x^{c_j}}{(z^{c_j})^2} \\ 0 & \dfrac{1}{z^{c_j}} & -\dfrac{y^{c_j}}{(z^{c_j})^2} \end{bmatrix} \\
\dfrac{\partial f^{c_j}}{\partial \delta x_i} &= \begin{bmatrix} R_{bc}^{\top} R_{wb_j}^{\top} & -R_{bc}^{\top} R_{wb_j}^{\top} R_{wb_i} [f^{b_i}]_{\times} & 0 & 0 & 0 \end{bmatrix}_{3\times 15} \\
\dfrac{\partial f^{c_j}}{\partial \delta x_j} &= \begin{bmatrix} -R_{bc}^{\top} R_{wb_j}^{\top} & R_{bc}^{\top} [f^{b_j}]_{\times} & 0 & 0 & 0 \end{bmatrix}_{3\times 15} \\
\dfrac{\partial f^{c_j}}{\partial \delta\lambda} &= -\dfrac{1}{\lambda_k}\, R_{bc}^{\top} R_{wb_j}^{\top} R_{wb_i} R_{bc}\, f^{c_i}
\end{aligned} \quad (32)$$

where $f^{b_i}$ is the 3D point vector in the $i$th IMU body frame. We define the covariance matrix of a point measurement $\Sigma^{c_i}_{f_k}$ as a $2\times2$ diagonal matrix by assuming that the detected point features have pixel noise in both the vertical and horizontal directions of the image plane.

3.4. Line Feature Measurement Model

The re-projection error of a line measurement is defined as the distance from the detected endpoints to the projected line. For a pinhole camera, a 3D spatial line $\mathcal{L} = [n^{\top}, d^{\top}]^{\top}$ can be projected onto the camera image plane by [27]:

$$l = \begin{bmatrix} l_1 \\ l_2 \\ l_3 \end{bmatrix} = \mathcal{K}\, n^{c} = \begin{bmatrix} f_y & 0 & 0 \\ 0 & f_x & 0 \\ -f_y c_x & -f_x c_y & f_x f_y \end{bmatrix} n^{c} \quad (33)$$

where $\mathcal{K}$ is the projection matrix for a line feature. When projecting a line onto the normalized image plane, $\mathcal{K}$ is the identity matrix. From the projection Equation (33), the projected line depends only on the normal vector $n^{c}$.

Given a 3D line $\mathcal{L}^{w}_l$ with orthonormal representation $\mathcal{O}_l$ in the world frame, we first transform it into camera frame $c_i$ by Equation (11). Then, we project it onto the image plane to obtain the projected line $l^{c_i}_l$. The re-projection error in camera frame $c_i$ is defined as

$$r_l(z^{c_i}_{\mathcal{L}_l}, \mathcal{X}) = \begin{bmatrix} d(s^{c_i}_l, l^{c_i}_l) \\ d(e^{c_i}_l, l^{c_i}_l) \end{bmatrix} \quad (34)$$

With d(s,l) indicating the distance function from endpoint s to the projection line l:

$$d(s, l) = \dfrac{s^{\top} l}{\sqrt{l_1^2 + l_2^2}} \quad (35)$$
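Equations (33)–(35) can be sketched as follows; only the normal part $n^{c}$ of the Plücker line is needed, and $\mathcal{K}$ reduces to the identity on the normalized image plane. This is illustrative code, not the authors' implementation.

```python
import numpy as np

def line_projection_matrix(fx, fy, cx, cy):
    """Line projection matrix K of Eq. (33)."""
    return np.array([[fy, 0.0, 0.0],
                     [0.0, fx, 0.0],
                     [-fy * cx, -fx * cy, fx * fy]])

def line_reprojection_error(n_c, s_obs, e_obs, K_line=np.eye(3)):
    """Residual of Eqs. (34)-(35): endpoint-to-line distances in the image.

    n_c:          normal part of the Plücker line in the camera frame.
    s_obs, e_obs: homogeneous endpoints [u, v, 1] of the detected segment.
    """
    l = K_line @ n_c                      # projected infinite line l = [l1, l2, l3]
    denom = np.hypot(l[0], l[1])
    return np.array([s_obs @ l, e_obs @ l]) / denom
```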

The ith body state and lth line parameters are optimized by minimizing the line re-projection error rl(zLlci). The corresponding Jacobian matrix can be obtained by the chain rule:

$$J_l = \dfrac{\partial r_l}{\partial l^{c_i}}\, \dfrac{\partial l^{c_i}}{\partial \mathcal{L}^{c_i}} \left[ \dfrac{\partial \mathcal{L}^{c_i}}{\partial \delta x_i},\ \dfrac{\partial \mathcal{L}^{c_i}}{\partial \mathcal{L}^{w}}\, \dfrac{\partial \mathcal{L}^{w}}{\partial \delta\mathcal{O}} \right] \quad (36)$$

With

$$\begin{aligned}
\dfrac{\partial r_l}{\partial l^{c_i}} &= \begin{bmatrix}
-\dfrac{l_1\, (s^{c_i}_l)^{\top} l}{(l_1^2+l_2^2)^{3/2}} + \dfrac{u_s}{(l_1^2+l_2^2)^{1/2}} & -\dfrac{l_2\, (s^{c_i}_l)^{\top} l}{(l_1^2+l_2^2)^{3/2}} + \dfrac{v_s}{(l_1^2+l_2^2)^{1/2}} & \dfrac{1}{(l_1^2+l_2^2)^{1/2}} \\
-\dfrac{l_1\, (e^{c_i}_l)^{\top} l}{(l_1^2+l_2^2)^{3/2}} + \dfrac{u_e}{(l_1^2+l_2^2)^{1/2}} & -\dfrac{l_2\, (e^{c_i}_l)^{\top} l}{(l_1^2+l_2^2)^{3/2}} + \dfrac{v_e}{(l_1^2+l_2^2)^{1/2}} & \dfrac{1}{(l_1^2+l_2^2)^{1/2}}
\end{bmatrix}_{2\times 3} \\
\dfrac{\partial l^{c_i}}{\partial \mathcal{L}^{c_i}} &= \begin{bmatrix} \mathcal{K} & 0 \end{bmatrix}_{3\times 6} \\
\dfrac{\partial \mathcal{L}^{c_i}}{\partial \delta x_i} &= \begin{bmatrix} \mathcal{T}_{bc}^{-1} \begin{bmatrix} R_{wb}^{\top}[d^{w}]_{\times} \\ 0_{3\times 3} \end{bmatrix} & \mathcal{T}_{bc}^{-1} \begin{bmatrix} \left[ R_{wb}^{\top}\left( n^{w} + [d^{w}]_{\times}\, p_{wb} \right) \right]_{\times} \\ \left[ R_{wb}^{\top} d^{w} \right]_{\times} \end{bmatrix} & 0 & 0 & 0 \end{bmatrix}_{6\times 15} \\
\dfrac{\partial \mathcal{L}^{c_i}}{\partial \mathcal{L}^{w}}\, \dfrac{\partial \mathcal{L}^{w}}{\partial \delta\mathcal{O}} &= \mathcal{T}_{wc}^{-1} \begin{bmatrix} 0 & -w_1 u_3 & w_1 u_2 & -w_2 u_1 \\ w_2 u_3 & 0 & -w_2 u_1 & w_1 u_2 \end{bmatrix}_{6\times 4}
\end{aligned} \quad (37)$$

The derivation details are provided in Appendix B. Similar to the point measurement covariance matrix, the covariance matrix of line measurement ΣLlci is defined by assuming the endpoints of a line segment have pixel noise.

4. Monocular Visual Inertial Odometry with Point and Line Features

As shown in Figure 4, the proposed PL-VIO system has two main modules: the front end and the back end. The front-end module is used to pre-process the measurements from IMU and camera. The back-end module is used to estimate and optimize the body states. We will introduce the details in the following subsections.

Figure 4. Overview of our point–line visual–inertial odometry (PL-VIO) system. (A) The front-end module extracts information from the raw measurements; (B) the back-end module estimates and optimizes the body states.

4.1. Front End

The front end extracts information from the raw measurements of the camera and IMU. The body state is updated by propagation with each new IMU measurement, and the newest body state is used as the initial value in sliding window optimization. Additionally, the new IMU measurements are pre-integrated to constrain the successive IMU body states during optimization.

As for image processing, the point and line features are detected in two separate threads. When a new frame arrives, the point features are tracked from the previous frame to the new frame by the KLT optical flow algorithm [37]. Then, we use a RANSAC framework with an essential matrix test to remove outliers. If the number of tracked point features falls below a threshold after outlier rejection, new corner features detected by the FAST detector [38] are added. As for the line features, line segments in the new frame are detected by the LSD detector [39] and matched with those in the previous frame by the appearance-based LBD descriptor [40]. We use geometric constraints to remove outliers among the line matches: the distance between the midpoints of two matched lines should be no more than $\delta_{dist}^{th}$ pixels, and the angle difference should be no more than $\delta_{angle}^{th}$ degrees (a simplified sketch of this front end is given below). After feature detection and matching, the point features and the endpoints of the line segments are projected onto the normalized image plane. Additionally, a frame is selected as a new keyframe if the average parallax of the tracked point features is larger than a threshold.
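A simplified version of this front end, using standard OpenCV calls for KLT tracking and essential-matrix RANSAC plus the geometric line-match check described above, might look as follows. Thresholds, helper names, and the overall structure are illustrative assumptions, not the authors' code.

```python
import cv2
import numpy as np

def track_points(img_prev, img_next, pts_prev, K):
    """KLT tracking plus RANSAC essential-matrix outlier rejection.

    pts_prev: Nx2 float32 pixel coordinates from the previous frame.
    K:        3x3 camera intrinsic matrix.
    Returns the surviving (prev, next) correspondences.
    """
    pts_next, status, _ = cv2.calcOpticalFlowPyrLK(
        img_prev, img_next, pts_prev.reshape(-1, 1, 2), None,
        winSize=(21, 21), maxLevel=3)
    ok = status.ravel() == 1
    p0, p1 = pts_prev[ok], pts_next.reshape(-1, 2)[ok]
    if len(p0) >= 8:
        _, inliers = cv2.findEssentialMat(p0, p1, K, method=cv2.RANSAC,
                                          prob=0.999, threshold=1.0)
        if inliers is not None:
            keep = inliers.ravel().astype(bool)
            p0, p1 = p0[keep], p1[keep]
    # If too few tracks survive, new FAST corners would be detected here,
    # e.g. with cv2.FastFeatureDetector_create().
    return p0, p1

def line_match_is_valid(seg_a, seg_b, dist_th=60.0, angle_th_deg=30.0):
    """Geometric check on an LBD match: midpoint distance and direction angle."""
    mid_a = 0.5 * (seg_a[0] + seg_a[1])
    mid_b = 0.5 * (seg_b[0] + seg_b[1])
    if np.linalg.norm(mid_a - mid_b) > dist_th:
        return False
    da, db = seg_a[1] - seg_a[0], seg_b[1] - seg_b[0]
    cosang = abs(da @ db) / (np.linalg.norm(da) * np.linalg.norm(db) + 1e-12)
    angle = np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))
    return angle <= angle_th_deg
```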

4.2. Back End

In the back-end thread, the point and line features are first triangulated to build the re-projection residuals. In order to obtain good landmark estimates, the inverse depth of a point feature is estimated using all of its observations. For line triangulation, we only choose the two frames with the largest spatial separation in the sliding window to initialize the Plücker coordinates.

After we obtain the initial estimates of the map points/lines and the IMU body state predicted from the IMU measurements, the sliding window optimization described in Section 3 is used to find the optimal states. To limit the size of the state vector $\mathcal{X}$, a two-way marginalization strategy is adopted to remove states from the sliding window [23]. When the second-newest frame $x_{n+N-1}$ is a keyframe, we marginalize out the oldest frame $x_n$ together with all its measurements. Otherwise, if the second-newest frame is not a keyframe, we discard its visual measurements and keep its IMU measurements in the pre-integration. New prior information is generated from the marginalized measurements, preserving the constraint information of the removed states (a schematic sketch of this policy follows).
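The two-way marginalization policy can be summarized by the following schematic sketch. The data structures and names are ours, and the actual construction of the prior $\{r_p, J_p\}$ is omitted.

```python
def update_window(frames, new_frame, max_keyframes=10):
    """Two-way marginalization policy of Section 4.2 (schematic sketch only).

    frames: list of frame dicts ordered oldest -> newest; each dict has at
    least an 'is_keyframe' flag. Returns the removed frame and how it leaves.
    """
    frames.append(new_frame)
    if len(frames) <= max_keyframes + 1:
        return None, None                       # window not full yet
    if frames[-2]['is_keyframe']:
        # The oldest keyframe and all its measurements are marginalized
        # into the prior {r_p, J_p}.
        return frames.pop(0), 'marginalize_oldest'
    # Otherwise the second-newest frame's visual measurements are discarded,
    # while its IMU measurements stay merged into the pre-integration term
    # between its neighbours.
    return frames.pop(-2), 'discard_visual_keep_imu'
```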

Lastly, we cull invalid map points and lines. If the inverse depth of a point is negative, we delete this point from the map. If the re-projection residuals of a line exceed a threshold, the line is removed from the map.

4.3. Implementation Details

To bootstrap the VIO system, we adopt the visual–inertial alignment method [41] to recover the scale, gravity vector, initial biases, and velocity of the initial IMU body state. The sliding window contains $N=10$ keyframes. Each frame has at most 150 point features and 150 line segments. The thresholds used in line matching are set to $\delta_{dist}^{th}=60$ pixels and $\delta_{angle}^{th}=30$ degrees. Since the visual–inertial system has four unobservable DoFs (the yaw direction and the absolute position), optimizing over all six DoFs may introduce illusory information in the roll and pitch directions by automatically taking steps along these directions to minimize the cost function. After the sliding window optimization, we therefore reset the body states by rotating back the increments along the roll and pitch directions (a sketch of this correction follows). All the numerical optimizations are solved with the Levenberg–Marquardt method in the Ceres solver library [42]. The line detection and matching code is provided by OpenCV 3 [43].
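The reset of the roll and pitch increments can be sketched as follows. This is our interpretation of the described procedure rather than the authors' exact code: the rotation change of a reference keyframe is reduced to its yaw component, and the rest is rotated back out of the whole window.

```python
import numpy as np

def remove_roll_pitch_drift(R_old_first, R_new_first, states):
    """Undo optimizer-introduced roll/pitch changes on the reference keyframe.

    Only yaw and absolute position are unobservable, so any roll/pitch change
    of the reference keyframe is treated as illusory and rotated back out.
    states: list of dicts with rotation 'R', position 'p', velocity 'v'.
    """
    dR = R_new_first @ R_old_first.T              # total rotation change
    yaw = np.arctan2(dR[1, 0], dR[0, 0])          # yaw part of the change
    c, s = np.cos(yaw), np.sin(yaw)
    dR_yaw = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    correction = dR_yaw @ dR.T                    # cancels the roll/pitch part
    for st in states:
        st['R'] = correction @ st['R']
        st['p'] = correction @ st['p']
        st['v'] = correction @ st['v']
    return states
```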

5. Experimental Results

We evaluated our PL-VIO system using two public benchmark datasets: the EuRoc MAV Dataset [44] and the PennCOSYVIO Dataset [45]. The accuracy of the PL-VIO method is compared with that of three state-of-the-art monocular VIO methods to validate its advantages: ROVIO [17] and OKVIS [18] in monocular mode, and VINS-Mono [32] without loop closure. ROVIO is a tightly-coupled VIO based on the extended Kalman filter (EKF); it directly uses the intensity errors of the images to find the optimal state during the update step. OKVIS is a sliding window optimization algorithm with point features which can work in monocular or stereo mode. VINS-Mono is a complete VIO-SLAM system employing point features to optimize the IMU body states in a sliding window, and it performs loop closure. All of the experiments were performed on a computer with an Intel Core i7-6700HQ CPU at 2.60 GHz and 16 GB of RAM, running ROS Kinetic [46].

5.1. EuRoc MAV Dataset

The EuRoc micro aerial vehicle (MAV) datasets were collected by an MAV in two indoor scenes; they contain stereo images from a global-shutter camera at 20 FPS and synchronized IMU measurements at 200 Hz [44]. Each dataset provides a ground truth trajectory given by a VICON motion capture system. All the extrinsic and intrinsic parameters are also provided in the datasets. In our experiments, we only used the images from the left camera.

The main advantage of line features is that they provide significant geometry structure information with respect to the environment. As an example, we show the reconstructed map built by PL-VIO from the MH_05_difficult sequence in Figure 5. The four images in Figure 5a–d were captured by an MAV flying in a machine hall, which showed the room’s structure. As shown in Figure 5d, the line segment detection in the weak illumination scene was more robust than point feature detection. From the reconstructed 3D map, it can be seen that the geometry of the environment is described by the line segments, and thus semantic information could be extracted from the map. This is useful for robot navigation.

Figure 5. Mapping results in the MH_05_difficult sequence. (a–d) Detected point and line features for different frames. (e) The reconstructed line map (red) and the trajectory (green). The scenes with rich line features (such as the window, baluster, and cabinet) are clearly reconstructed in the map. (f) The reconstructed point map (blue). Compared to the line map, the structure is hard to identify in the point map.

For quantitative evaluation, we compared our PL-VIO with three state-of-the-art monocular visual–inertial methods: ROVIO [17], OKVIS [18] in monocular mode, and VINS-Mono [32] without loop closure. For a fair comparison, the default parameters provided by the authors of these methods were used. We chose the absolute pose error (APE) as the evaluation metric, which directly compares the estimated trajectory with the ground truth [47]. The open-source package evo (https://michaelgrupp.github.io/evo/) provides an easy-to-use interface for evaluating the trajectories of odometry and SLAM algorithms, and we employed this tool to evaluate the methods in this section. Table 1 shows the root mean square error (RMSE) of translation and rotation over the whole trajectory, and the corresponding histograms are provided in Figure 6.
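For reference, the translation APE RMSE boils down to the following computation: rigidly align the estimated trajectory to the ground truth and take the RMSE of the remaining position errors. This is our re-implementation of the standard Umeyama-aligned metric, shown only to make the metric concrete; the reported numbers were produced with evo.

```python
import numpy as np

def translation_rmse(traj_est, traj_gt):
    """RMSE of the absolute translation error after SE(3) (Umeyama) alignment.

    traj_est, traj_gt: Nx3 arrays of time-associated positions.
    """
    mu_e, mu_g = traj_est.mean(axis=0), traj_gt.mean(axis=0)
    E, G = traj_est - mu_e, traj_gt - mu_g
    U, _, Vt = np.linalg.svd(G.T @ E)          # cross-covariance SVD
    D = np.eye(3)
    if np.linalg.det(U @ Vt) < 0:              # keep a proper rotation
        D[2, 2] = -1.0
    R_align = U @ D @ Vt                       # rotation mapping est -> gt
    t_align = mu_g - R_align @ mu_e
    aligned = traj_est @ R_align.T + t_align
    err = aligned - traj_gt
    return np.sqrt((err ** 2).sum(axis=1).mean())
```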

Table 1.

The root mean square error (RMSE) results on several EuRoc MAV sequences. The translation (m) and rotation (rad) errors are listed. The numbers in bold indicate the estimated trajectory that is closest to the benchmark trajectory.

Seq. ROVIO OKVIS-Mono VINS-Mono PL-VIO
Trans. Rot. Trans. Rot. Trans. Rot. Trans. Rot.
MH_02_easy 0.59075 2.21181 0.36059 0.06095 0.25663 0.04802 0.23274 0.04204
MH_03_medium 0.40709 1.92561 0.21534 0.02622 0.11239 0.02027 0.11224 0.02016
MH_04_difficult 0.88363 2.30330 0.23984 0.01943 0.15366 0.02173 0.13942 0.01915
MH_05_difficult 1.17578 2.27213 0.39644 0.01987 0.30351 0.01038 0.25687 0.00892
V1_01_easy 0.18153 2.03399 0.08583 0.09665 0.05843 0.09995 0.05916 0.09869
V1_02_medium 0.19563 1.93652 0.12207 0.04073 0.06970 0.03022 0.07656 0.02871
V1_03_difficult 0.17091 2.02069 0.19613 0.06591 0.14531 0.08021 0.13016 0.04382
V2_02_medium 0.60686 1.84458 0.18253 0.04773 0.10218 0.04558 0.09450 0.04177
V2_03_difficult 0.18912 1.92380 0.30513 0.07527 0.26446 0.06162 0.26085 0.06098

Figure 6. RMSEs for ROVIO, OKVIS-Mono, and VINS-Mono without loop closure, and for the proposed PL-VIO on the EuRoc dataset. (a) RMSEs in translation. (b) RMSEs in rotation. The rotation errors of ROVIO are one order of magnitude higher than those of the other three methods, so its results are not shown in (b).

Table 1 shows that our PL-VIO, which jointly optimizes points and lines, provided the best rotation performance on eight sequences, with V1_01_easy being the only exception. Our method also performed best on six sequences for translation, except for V1_01_easy, V1_02_medium, and V1_03_difficult. However, the difference with respect to the best results was only at the submillimeter level. The results in Table 1 show that integrating line features into VIO can improve the accuracy of motion estimation. To demonstrate the results intuitively, several heat maps of trajectories estimated by PL-VIO and VINS-Mono are shown in Figure 7. The redder the color, the larger the translation error. Comparing the three pairs of trajectories, we conclude that PL-VIO with line features gave smaller errors than VINS-Mono when the camera underwent rapid rotation. Furthermore, these sequences with rapid rotation cause large changes in the viewing direction and lighting conditions, which is especially challenging for tracking point features [25,26,28]. As shown in Figure 8, 27 point pairs (including 10 outliers) were matched successfully, while 33 lines were matched successfully and all the line matches were correct.

Figure 7. Comparison of the proposed method versus VINS-Mono. The trajectories in the left column were produced by VINS-Mono on the (a) MH_05_difficult; (c) MH_04_difficult; and (e) V1_03_difficult sequences. The trajectories in the right column were produced by the proposed PL-VIO on the (b) MH_05_difficult; (d) MH_04_difficult; and (f) V1_03_difficult sequences. Colors encode the corresponding absolute pose errors.

Figure 8. Tracked point/line features under rapid rotation in V1_03_difficult. Lighting conditions and view directions changed between successive frames. Matched point features are marked with the same color in frames (a,b). Matched line segment features are marked with the same color in frames (c,d).

In addition, the computation times of the different methods are listed in Table 2. The computational efficiency of the filter-based ROVIO was the highest, while its accuracy was the lowest. The proposed PL-VIO system was the most time-consuming method, but its computation time was dominated by the line detection and matching step. In Section 5.3, the computation times of the different modules in PL-VIO are measured independently, showing that the overall efficiency of the PL-VIO system depends directly on line detection and matching.

Table 2.

Average running times of different methods on the EuRoc dataset (unit: millisecond).

Seq. ROVIO OKVIS-Mono VINS-Mono PL-VIO
MH_02_easy 15 31 63 127
MH_03_medium 15 30 62 112
MH_04_difficult 15 24 54 108
MH_05_difficult 15 27 58 102
V1_01_easy 14 26 45 93
V1_02_medium 15 23 37 86
V1_03_difficult 15 20 29 82
V2_02_medium 15 22 33 86
V2_03_difficult 15 21 27 85

5.2. PennCOSYVIO Dataset

The PennCOSYVIO dataset is a recent VIO benchmark with synchronized data collected by hand-held equipment while walking from outdoors to indoors around a large glass building (see Figure 9) [45]. Challenging factors include illumination changes, rapid rotations, and repetitive structures. All the intrinsic and extrinsic parameters as well as the ground truth trajectory are provided. We used the data collected by the VI-Sensor, the same sensor used in the EuRoc MAV datasets.

Figure 9. Images from the PennCOSYVIO dataset. Red lines are detected line segments. (a) Outside the glass building. (b) Inside the glass building.

We compared the proposed PL-VIO with the VINS-Mono without loop closure. To evaluate fairly, the same parameters were used for PL-VIO and VINS-Mono. The same metric and evaluation tools used in Section 5.1 were employed here to evaluate the trajectories. Table 3 lists the results.

Table 3.

The RMSE of the absolute pose error (APE) for different algorithms. The numbers in bold indicate the estimated trajectory that is closest to the benchmark trajectory.

Algorithm Translation Error (m) Rotation Error (rad)
VINS-Mono 1.14690 0.04156
PL-VIO 1.05975 0.03742

Furthermore, an evaluation tool (https://daniilidis-group.github.io/penncosyvio/) is also provided with the PennCOSYVIO dataset; it adopts two metrics, the APE and the relative pose error (RPE). The RPE is expressed as a percentage by dividing the error by the path length [45]. The creators of PennCOSYVIO carefully selected the evaluation parameters, so their tool is well suited to evaluating VIO approaches on this dataset. Therefore, we also adopted this evaluation tool in our experiments, and the results are listed in Table 4.

Table 4.

The results evaluated by the PennCOSYVIO evaluation tool for different algorithms. The rotation errors for the APE and the relative pose error (RPE) are expressed in degrees. The translation errors are reported separately along the x-, y-, and z-axes; for the APE they are expressed in meters, while for the RPE they are expressed in percentages (%). The numbers in bold indicate the estimated trajectory that is closest to the benchmark trajectory.

Algorithm APE RPE
x y z Rot. x y z Rot.
VINS-Mono 0.423 0.173 0.861 2.3477 2.807 1.844 4.663 1.9337
PL-VIO 0.524 0.070 0.769 2.0782 2.375 1.844 4.361 1.7350

From Table 3 and Table 4, it can be seen that PL-VIO obtained the best performance for the rotation part. The translation APE evaluated by the PennCOSYVIO tool provides more detail: compared to VINS-Mono, PL-VIO gave smaller errors along the y-axis and z-axis, and a smaller error sum over the three axes. VINS-Mono obtained better performance only along the x-axis.

5.3. Computing Time

Finally, we evaluated the average execution time of our PL-VIO on the V1_02_medium sequence, because this image sequence was collected in a typical indoor scene. Table 5 shows the execution time of each block. We can see that line detection and matching, which runs at 11 Hz in the front end, is the efficiency bottleneck. State-of-the-art line detection and matching methods, such as the combination of LSD and LBD, are not yet fast enough for real-time VIO/SLAM systems. Note that our method is independent of the particular line feature detection and matching method, so improving their efficiency is beyond the scope of this paper. Marginalization in the back end is another time-consuming part. We observed that the inefficiency of marginalization is caused by the fill-in that occurs when marginalizing out features, which makes the Hessian matrix less sparse. This problem could potentially be mitigated by discarding some features when performing marginalization in order to maintain a sparse Hessian matrix [18].

Table 5.

Mean execution time of PL-VIO on the V1_02_medium sequence.

Module Operation Time (ms) Rate (Hz) Thread ID
front end point feature detection and matching 4 25 1
line feature detection and matching 86 11 2
IMU forward propagation 1 100 3
back end nonlinear optimization 28 15 4
marginalization 35 15 4
feature triangulation and culling 2 15 4

6. Conclusions

This paper presents PL-VIO, a novel tightly-coupled monocular visual–inertial odometry algorithm which optimizes the system states in a sliding window with both point and line features. The proposed PL-VIO system has two main modules: the front end and the back end. The front-end module propagates the IMU body state and detects and matches point/line features. The back-end module estimates and optimizes the body states. In the back-end module, a line landmark is treated as an infinite 3D spatial line, and its orthonormal representation is employed to parameterize it compactly during optimization. Furthermore, all the Jacobian matrices of the error terms are given in detail for solving the sliding window optimization efficiently. We also provide evaluation results of the proposed PL-VIO compared to three state-of-the-art monocular VIO methods, including ROVIO [17], OKVIS [18], and VINS-Mono [32], on both the EuRoc dataset and the PennCOSYVIO dataset. According to the analysis and results, two further conclusions are as follows:

  1. The reconstructed 3D map with line features can provide geometrical information with respect to the environment, and thus semantic information could be extracted from the map. This is useful for robot navigation.

  2. Line features can improve the system accuracy both for translation and rotation, especially in illumination-changing scenes. However, the line detection and matching are time-consuming and become the bottlenecks in the efficiency of the system.

In the future, we plan to improve our system by introducing the structural constraints between 3D spatial lines, such as parallel or coplanar lines in Manhattan-world scenes [48]. Geometric constraints among these lines have the potential to further improve localization precision and reduce rotation accumulation errors.

Acknowledgments

This research work was supported by the National Natural Science Foundation of China (Grant No. 61421004). We would like to thank Yang Ding for testing line matching methods. Finally, Yijia He would like to thank, in particular, the support received from Qiang Tang and Fangbo Qin.

Appendix A

The IMU error-state propagation equation can be defined as [49]:

$$\begin{bmatrix} \delta\alpha_{b_i b_{k+1}} \\ \delta\theta_{b_i b_{k+1}} \\ \delta\beta_{b_i b_{k+1}} \\ \delta b_a^{b_{k+1}} \\ \delta b_g^{b_{k+1}} \end{bmatrix} = F_k \begin{bmatrix} \delta\alpha_{b_i b_k} \\ \delta\theta_{b_i b_k} \\ \delta\beta_{b_i b_k} \\ \delta b_a^{b_k} \\ \delta b_g^{b_k} \end{bmatrix} + G_k \begin{bmatrix} n_a^{b_k} \\ n_g^{b_k} \\ n_a^{b_{k+1}} \\ n_g^{b_{k+1}} \\ n_{b_a} \\ n_{b_g} \end{bmatrix} \quad (A1)$$

With

$$F_k = \begin{bmatrix}
I & f_{12} & I\,\delta t & -\tfrac{1}{4}\left( R_{b_i b_k} + R_{b_i b_{k+1}} \right)\delta t^2 & f_{15} \\
0 & I - [\hat{\omega}]_{\times}\delta t & 0 & 0 & -I\,\delta t \\
0 & f_{32} & I & -\tfrac{1}{2}\left( R_{b_i b_k} + R_{b_i b_{k+1}} \right)\delta t & f_{35} \\
0 & 0 & 0 & I & 0 \\
0 & 0 & 0 & 0 & I
\end{bmatrix} \quad (A2)$$

$$G_k = \begin{bmatrix}
\tfrac{1}{4} R_{b_i b_k}\delta t^2 & g_{12} & \tfrac{1}{4} R_{b_i b_{k+1}}\delta t^2 & g_{14} & 0 & 0 \\
0 & \tfrac{1}{2} I\,\delta t & 0 & \tfrac{1}{2} I\,\delta t & 0 & 0 \\
\tfrac{1}{2} R_{b_i b_k}\delta t & g_{32} & \tfrac{1}{2} R_{b_i b_{k+1}}\delta t & g_{34} & 0 & 0 \\
0 & 0 & 0 & 0 & I\,\delta t & 0 \\
0 & 0 & 0 & 0 & 0 & I\,\delta t
\end{bmatrix} \quad (A3)$$

$$\begin{aligned}
f_{12} &= \dfrac{\partial \alpha_{b_i b_{k+1}}}{\partial \delta\theta_{b_i b_k}} = -\tfrac{1}{4}\left( R_{b_i b_k}\left[ \hat{a}^{b_k} - b_a^{b_k} \right]_{\times}\delta t^2 + R_{b_i b_{k+1}}\left[ \hat{a}^{b_{k+1}} - b_a^{b_k} \right]_{\times}\left( I - [\hat{\omega}]_{\times}\delta t \right)\delta t^2 \right) \\
f_{32} &= \dfrac{\partial \beta_{b_i b_{k+1}}}{\partial \delta\theta_{b_i b_k}} = -\tfrac{1}{2}\left( R_{b_i b_k}\left[ \hat{a}^{b_k} - b_a^{b_k} \right]_{\times}\delta t + R_{b_i b_{k+1}}\left[ \hat{a}^{b_{k+1}} - b_a^{b_k} \right]_{\times}\left( I - [\hat{\omega}]_{\times}\delta t \right)\delta t \right) \\
f_{15} &= \dfrac{\partial \alpha_{b_i b_{k+1}}}{\partial \delta b_g^{b_k}} = -\tfrac{1}{4}\left( R_{b_i b_{k+1}}\left[ \hat{a}^{b_{k+1}} - b_a^{b_k} \right]_{\times}\delta t^2 \right)\left( -\delta t \right) \\
f_{35} &= \dfrac{\partial \beta_{b_i b_{k+1}}}{\partial \delta b_g^{b_k}} = -\tfrac{1}{2}\left( R_{b_i b_{k+1}}\left[ \hat{a}^{b_{k+1}} - b_a^{b_k} \right]_{\times}\delta t \right)\left( -\delta t \right) \\
g_{12} &= \dfrac{\partial \alpha_{b_i b_{k+1}}}{\partial n_g^{b_k}} = g_{14} = \dfrac{\partial \alpha_{b_i b_{k+1}}}{\partial n_g^{b_{k+1}}} = -\tfrac{1}{4}\left( R_{b_i b_{k+1}}\left[ \hat{a}^{b_{k+1}} - b_a^{b_k} \right]_{\times}\delta t^2 \right)\left( \tfrac{1}{2}\delta t \right) \\
g_{32} &= \dfrac{\partial \beta_{b_i b_{k+1}}}{\partial n_g^{b_k}} = g_{34} = \dfrac{\partial \beta_{b_i b_{k+1}}}{\partial n_g^{b_{k+1}}} = -\tfrac{1}{2}\left( R_{b_i b_{k+1}}\left[ \hat{a}^{b_{k+1}} - b_a^{b_k} \right]_{\times}\delta t \right)\left( \tfrac{1}{2}\delta t \right)
\end{aligned} \quad (A4)$$

We define the error-state vector as $\eta_{k+1} = [\delta\alpha_{b_i b_{k+1}}, \delta\theta_{b_i b_{k+1}}, \delta\beta_{b_i b_{k+1}}, \delta b_a^{b_{k+1}}, \delta b_g^{b_{k+1}}]$ and the noise vector as $n_k = [n_a^{b_k}, n_g^{b_k}, n_a^{b_{k+1}}, n_g^{b_{k+1}}, n_{b_a}, n_{b_g}]$. Equation (A1) can then be written in the compact matrix form:

$$\eta_{k+1} = F_k\, \eta_k + G_k\, n_k \quad (A5)$$

The covariance of the pre-integrated measurements can be computed iteratively based on the linear model (A5) [14]:

$$\Sigma_{b_i b_{k+1}} = F_k\, \Sigma_{b_i b_k}\, F_k^{\top} + G_k\, \Sigma_n\, G_k^{\top} \quad (A6)$$

where $\Sigma_n$ is the covariance of the raw IMU measurement noise, and the initial covariance is $\Sigma_{b_i b_i} = 0_{15\times15}$. We can also compute the Jacobian matrix of the pre-integrated measurements $z_{b_i b_j}$ with respect to the error state $\eta_i$ iteratively with [49]:

$$J_{i,k+1} = F_k\, J_{i,k} \quad (A7)$$

with $J_{i,i} = I$. The Jacobian matrices used in Equation (10) can then be extracted from the corresponding blocks of $J_{i,j}$.
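In code, one propagation step of Equations (A6) and (A7) is simply the following (a sketch with our own names):

```python
import numpy as np

def propagate_covariance_and_jacobian(F_k, G_k, Sigma, J, Sigma_n):
    """One step of Eqs. (A6) and (A7).

    F_k, G_k:  error-state transition and noise matrices of Eqs. (A2)-(A3).
    Sigma:     15x15 covariance of the pre-integrated measurement so far.
    J:         15x15 Jacobian of the pre-integration w.r.t. the initial error state.
    Sigma_n:   18x18 covariance of the raw IMU noise terms.
    """
    Sigma_next = F_k @ Sigma @ F_k.T + G_k @ Sigma_n @ G_k.T
    J_next = F_k @ J
    return Sigma_next, J_next

# Initial conditions: Sigma_{b_i b_i} = 0 and J_{i,i} = I.
Sigma = np.zeros((15, 15))
J = np.eye(15)
```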

Appendix B

The Jacobian matrix of the line re-projection error with respect to the orthonormal representation is:

$$\dfrac{\partial r_l}{\partial l} = \begin{bmatrix} \dfrac{\partial r_1}{\partial l_1} & \dfrac{\partial r_1}{\partial l_2} & \dfrac{\partial r_1}{\partial l_3} \\ \dfrac{\partial r_2}{\partial l_1} & \dfrac{\partial r_2}{\partial l_2} & \dfrac{\partial r_2}{\partial l_3} \end{bmatrix} = \begin{bmatrix}
-\dfrac{l_1\, s_l^{\top} l}{(l_1^2+l_2^2)^{3/2}} + \dfrac{u_s}{(l_1^2+l_2^2)^{1/2}} & -\dfrac{l_2\, s_l^{\top} l}{(l_1^2+l_2^2)^{3/2}} + \dfrac{v_s}{(l_1^2+l_2^2)^{1/2}} & \dfrac{1}{(l_1^2+l_2^2)^{1/2}} \\
-\dfrac{l_1\, e_l^{\top} l}{(l_1^2+l_2^2)^{3/2}} + \dfrac{u_e}{(l_1^2+l_2^2)^{1/2}} & -\dfrac{l_2\, e_l^{\top} l}{(l_1^2+l_2^2)^{3/2}} + \dfrac{v_e}{(l_1^2+l_2^2)^{1/2}} & \dfrac{1}{(l_1^2+l_2^2)^{1/2}}
\end{bmatrix}_{2\times 3} \quad (A8)$$

$$\dfrac{\partial \mathcal{L}^{w}}{\partial \delta\mathcal{O}} = \begin{bmatrix} \dfrac{\partial \mathcal{L}^{w}}{\partial u_1} & \dfrac{\partial \mathcal{L}^{w}}{\partial u_2} & \dfrac{\partial \mathcal{L}^{w}}{\partial w_1} & \dfrac{\partial \mathcal{L}^{w}}{\partial w_2} \end{bmatrix}_{6\times 8} \begin{bmatrix} \dfrac{\partial u_1}{\partial \delta\psi} & \dfrac{\partial u_1}{\partial \delta\phi} \\ \dfrac{\partial u_2}{\partial \delta\psi} & \dfrac{\partial u_2}{\partial \delta\phi} \\ \dfrac{\partial w_1}{\partial \delta\psi} & \dfrac{\partial w_1}{\partial \delta\phi} \\ \dfrac{\partial w_2}{\partial \delta\psi} & \dfrac{\partial w_2}{\partial \delta\phi} \end{bmatrix}_{8\times 4} = \begin{bmatrix} w_1 I_{3\times3} & 0_{3\times3} & u_1 & 0_{3\times1} \\ 0_{3\times3} & w_2 I_{3\times3} & 0_{3\times1} & u_2 \end{bmatrix}_{6\times 8} \begin{bmatrix} 0 & -u_3 & u_2 & 0 \\ u_3 & 0 & -u_1 & 0 \\ 0 & 0 & 0 & -w_2 \\ 0 & 0 & 0 & w_1 \end{bmatrix}_{8\times 4} = \begin{bmatrix} 0 & -w_1 u_3 & w_1 u_2 & -w_2 u_1 \\ w_2 u_3 & 0 & -w_2 u_1 & w_1 u_2 \end{bmatrix}_{6\times 4} \quad (A9)$$

To compute the line re-projection error, a spatial line in the world frame $w$ is first transformed to the body frame $b$, and then transformed to the camera frame $c$ with the extrinsic parameters $\mathcal{T}_{bc}$:

$$\mathcal{L}^{c} = \mathcal{T}_{bc}^{-1}\, \mathcal{T}_{wb}^{-1}\, \mathcal{L}^{w} = \mathcal{T}_{bc}^{-1} \begin{bmatrix} R_{wb}^{\top}\left( n^{w} + [d^{w}]_{\times}\, p_{wb} \right) \\ R_{wb}^{\top}\, d^{w} \end{bmatrix}_{6\times1} \quad (A10)$$

The Jacobian matrix of the line re-projection error with respect to rotation of the ith IMU body state is:

$$\dfrac{\partial \mathcal{L}^{c}}{\partial \delta\theta} = \mathcal{T}_{bc}^{-1} \begin{bmatrix} \dfrac{\partial \left( \left( I - [\delta\theta]_{\times} \right) R_{wb}^{\top}\left( n^{w} + [d^{w}]_{\times} p_{wb} \right) \right)}{\partial \delta\theta} \\ \dfrac{\partial \left( \left( I - [\delta\theta]_{\times} \right) R_{wb}^{\top} d^{w} \right)}{\partial \delta\theta} \end{bmatrix} = \mathcal{T}_{bc}^{-1} \begin{bmatrix} \left[ R_{wb}^{\top}\left( n^{w} + [d^{w}]_{\times} p_{wb} \right) \right]_{\times} \\ \left[ R_{wb}^{\top} d^{w} \right]_{\times} \end{bmatrix}_{6\times3} \quad (A11)$$

The Jacobian matrix of the line re-projection error with respect to position of the ith IMU body state is as follows:

$$\dfrac{\partial \mathcal{L}^{c}}{\partial \delta p} = \mathcal{T}_{bc}^{-1} \begin{bmatrix} \dfrac{\partial\, R_{wb}^{\top}\left( n^{w} + [d^{w}]_{\times}\left( p_{wb} + \delta p \right) \right)}{\partial \delta p} \\ \dfrac{\partial\, R_{wb}^{\top} d^{w}}{\partial \delta p} \end{bmatrix} = \mathcal{T}_{bc}^{-1} \begin{bmatrix} R_{wb}^{\top} [d^{w}]_{\times} \\ 0 \end{bmatrix}_{6\times3} \quad (A12)$$

Author Contributions

Yijia He and Ji Zhao conceived and designed the algorithm; Yijia He performed the experiments, analyzed the data, and drafted the paper; Yue Guo and Wenhao He contributed analysis tools; Ji Zhao and Kui Yuan revised the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  • 1. Groves P.D. Principles of GNSS, Inertial, and Multisensor Integrated Navigation Systems. Artech House; Norwood, MA, USA: 2013.
  • 2. Martínez J.L., Morán M., Morales J., Reina A.J., Zafra M. Field Navigation Using Fuzzy Elevation Maps Built with Local 3D Laser Scans. Appl. Sci. 2018;8:397. doi: 10.3390/app8030397.
  • 3. Hess W., Kohler D., Rapp H., Andor D. Real-time loop closure in 2D LIDAR SLAM; Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA); Stockholm, Sweden. 16–21 May 2016; pp. 1271–1278.
  • 4. Liu L., Mei T., Niu R., Wang J., Liu Y., Chu S. RBF-Based Monocular Vision Navigation for Small Vehicles in Narrow Space below Maize Canopy. Appl. Sci. 2016;6:182. doi: 10.3390/app6060182.
  • 5. Valiente D., Gil A., Payá L., Sebastián J.M., Reinoso Ó. Robust Visual Localization with Dynamic Uncertainty Management in Omnidirectional SLAM. Appl. Sci. 2017;7:1294. doi: 10.3390/app7121294.
  • 6. Borraz R., Navarro P.J., Fernández C., Alcover P.M. Cloud Incubator Car: A Reliable Platform for Autonomous Driving. Appl. Sci. 2018;8:303. doi: 10.3390/app8020303.
  • 7. Wang X., Wang J. Detecting glass in simultaneous localisation and mapping. Robot. Auton. Syst. 2017;88:97–103. doi: 10.1016/j.robot.2016.11.003.
  • 8. Titterton D., Weston J.L. Strapdown Inertial Navigation Technology. Volume 17. The Institution of Engineering and Technology; Stevenage, UK: 2004.
  • 9. Cadena C., Carlone L., Carrillo H., Latif Y., Scaramuzza D., Neira J., Reid I., Leonard J.J. Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age. IEEE Trans. Robot. 2016;32:1309–1332. doi: 10.1109/TRO.2016.2624754.
  • 10. Mur-Artal R., Tardós J.D. ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras. IEEE Trans. Robot. 2017;33:1255–1262. doi: 10.1109/TRO.2017.2705103.
  • 11. Engel J., Koltun V., Cremers D. Direct sparse odometry. IEEE Trans. Pattern Anal. Mach. Intell. 2018;40:611–625. doi: 10.1109/TPAMI.2017.2658577.
  • 12. Gui J., Gu D., Wang S., Hu H. A review of visual inertial odometry from filtering and optimisation perspectives. Adv. Robot. 2015;29:1289–1301. doi: 10.1080/01691864.2015.1057616.
  • 13. Liu Y., Chen Z., Zheng W., Wang H., Liu J. Monocular Visual-Inertial SLAM: Continuous Preintegration and Reliable Initialization. Sensors. 2017;17:2613. doi: 10.3390/s17112613.
  • 14. Forster C., Carlone L., Dellaert F., Scaramuzza D. On-Manifold Preintegration for Real-Time Visual–Inertial Odometry. IEEE Trans. Robot. 2017;33:1–21. doi: 10.1109/TRO.2016.2597321.
  • 15. Weiss S., Siegwart R. Real-time metric state estimation for modular vision-inertial systems; Proceedings of the 2011 IEEE International Conference on Robotics and Automation (ICRA); Shanghai, China. 9–13 May 2011; pp. 4531–4537.
  • 16. Kneip L., Weiss S., Siegwart R. Deterministic initialization of metric state estimation filters for loosely-coupled monocular vision-inertial systems; Proceedings of the 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); San Francisco, CA, USA. 25–30 September 2011; pp. 2235–2241.
  • 17. Bloesch M., Burri M., Omari S., Hutter M., Siegwart R. Iterated extended Kalman filter based visual-inertial odometry using direct photometric feedback. Int. J. Robot. Res. 2017;36:1053–1072. doi: 10.1177/0278364917728574.
  • 18. Leutenegger S., Lynen S., Bosse M., Siegwart R., Furgale P. Keyframe-based visual–inertial odometry using nonlinear optimization. Int. J. Robot. Res. 2015;34:314–334. doi: 10.1177/0278364914554813.
  • 19. Jones E.S., Soatto S. Visual-inertial navigation, mapping and localization: A scalable real-time causal approach. Int. J. Robot. Res. 2011;30:407–430. doi: 10.1177/0278364910388963.
  • 20. Bloesch M., Omari S., Hutter M., Siegwart R. Robust visual inertial odometry using a direct EKF-based approach; Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); Hamburg, Germany. 28 September–2 October 2015; pp. 298–304.
  • 21. Mourikis A.I., Roumeliotis S.I. A multi-state constraint Kalman filter for vision-aided inertial navigation; Proceedings of the 2007 IEEE International Conference on Robotics and Automation; Roma, Italy. 10–14 April 2007; pp. 3565–3572.
  • 22. Lupton T., Sukkarieh S. Visual-inertial-aided navigation for high-dynamic motion in built environments without initial conditions. IEEE Trans. Robot. 2012;28:61–76. doi: 10.1109/TRO.2011.2170332.
  • 23. Shen S., Michael N., Kumar V. Tightly-coupled monocular visual-inertial fusion for autonomous flight of rotorcraft MAVs; Proceedings of the 2015 IEEE International Conference on Robotics and Automation (ICRA); Seattle, WA, USA. 26–30 May 2015; pp. 5303–5310.
  • 24. Mur-Artal R., Tardós J.D. Visual-inertial monocular SLAM with map reuse. IEEE Robot. Autom. Lett. 2017;2:796–803. doi: 10.1109/LRA.2017.2653359.
  • 25. Kong X., Wu W., Zhang L., Wang Y. Tightly-coupled stereo visual-inertial navigation using point and line features. Sensors. 2015;15:12816–12833. doi: 10.3390/s150612816.
  • 26. Kottas D.G., Roumeliotis S.I. Efficient and consistent vision-aided inertial navigation using line observations; Proceedings of the 2013 IEEE International Conference on Robotics and Automation (ICRA); Karlsruhe, Germany. 6–10 May 2013; pp. 1540–1547.
  • 27. Zhang G., Lee J.H., Lim J., Suh I.H. Building a 3-D Line-Based Map Using Stereo SLAM. IEEE Trans. Robot. 2015;31:1364–1377. doi: 10.1109/TRO.2015.2489498.
  • 28. Pumarola A., Vakhitov A., Agudo A., Sanfeliu A., Moreno-Noguer F. PL-SLAM: Real-time monocular visual SLAM with points and lines; Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA); Singapore. 29 May–3 June 2017; pp. 4503–4508.
  • 29. Gomez-Ojeda R., Moreno F.A., Scaramuzza D., Gonzalez-Jimenez J. PL-SLAM: A Stereo SLAM System through the Combination of Points and Line Segments. arXiv. 2017. arXiv:1705.09479.
  • 30. Bartoli A., Sturm P. The 3D line motion matrix and alignment of line reconstructions. Int. J. Comput. Vis. 2004;57:159–178. doi: 10.1023/B:VISI.0000013092.07433.82.
  • 31. Zuo X., Xie X., Liu Y., Huang G. Robust Visual SLAM with Point and Line Features. arXiv. 2017. arXiv:1711.08654.
  • 32. Qin T., Li P., Shen S. VINS-Mono: A Robust and Versatile Monocular Visual-Inertial State Estimator. arXiv. 2017. arXiv:1708.03852.
  • 33. Furgale P., Rehder J., Siegwart R. Unified temporal and spatial calibration for multi-sensor systems; Proceedings of the 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); Tokyo, Japan. 3–7 November 2013; pp. 1280–1286.
  • 34. Kok M., Hol J.D., Schön T.B. Using inertial sensors for position and orientation estimation. arXiv. 2017. arXiv:1704.06053.
  • 35. Kaess M., Johannsson H., Roberts R., Ila V., Leonard J.J., Dellaert F. iSAM2: Incremental smoothing and mapping using the Bayes tree. Int. J. Robot. Res. 2012;31:216–235. doi: 10.1177/0278364911430419.
  • 36. Sola J. Quaternion kinematics for the error-state Kalman filter. arXiv. 2017. arXiv:1711.02508.
  • 37. Lucas B.D., Kanade T. An iterative image registration technique with an application to stereo vision; Proceedings of the 7th International Joint Conference on Artificial Intelligence (IJCAI); Vancouver, BC, Canada. 24–28 August 1981.
  • 38. Rosten E., Porter R., Drummond T. Faster and better: A machine learning approach to corner detection. IEEE Trans. Pattern Anal. Mach. Intell. 2010;32:105–119. doi: 10.1109/TPAMI.2008.275.
  • 39. Von Gioi R.G., Jakubowicz J., Morel J.M., Randall G. LSD: A fast line segment detector with a false detection control. IEEE Trans. Pattern Anal. Mach. Intell. 2010;32:722–732. doi: 10.1109/TPAMI.2008.300.
  • 40. Zhang L., Koch R. An efficient and robust line segment matching approach based on LBD descriptor and pairwise geometric consistency. J. Vis. Commun. Image Represent. 2013;24:794–805. doi: 10.1016/j.jvcir.2013.05.006.
  • 41. Yang Z., Shen S. Monocular visual–inertial state estimation with online initialization and camera–IMU extrinsic calibration. IEEE Trans. Autom. Sci. Eng. 2017;14:39–51. doi: 10.1109/TASE.2016.2550621.
  • 42. Agarwal S., Mierle K. Ceres Solver. [(accessed on 9 April 2018)]. Available online: http://ceres-solver.org.
  • 43. Kaehler A., Bradski G. Learning OpenCV 3: Computer Vision in C++ with the OpenCV Library. O'Reilly Media, Inc.; Sebastopol, CA, USA: 2016.
  • 44. Burri M., Nikolic J., Gohl P., Schneider T., Rehder J., Omari S., Achtelik M.W., Siegwart R. The EuRoC micro aerial vehicle datasets. Int. J. Robot. Res. 2016;35:1157–1163. doi: 10.1177/0278364915620033.
  • 45. Pfrommer B., Sanket N., Daniilidis K., Cleveland J. PennCOSYVIO: A challenging visual inertial odometry benchmark; Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA); Singapore. 29 May–3 June 2017; pp. 3847–3854.
  • 46. Quigley M., Conley K., Gerkey B., Faust J., Foote T., Leibs J., Wheeler R., Ng A.Y. ROS: An Open-Source Robot Operating System; ICRA Workshop on Open Source Software; Kobe, Japan: 2009; p. 5.
  • 47. Sturm J., Engelhard N., Endres F., Burgard W., Cremers D. A benchmark for the evaluation of RGB-D SLAM systems; Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); Vilamoura, Portugal. 7–12 October 2012; pp. 573–580.
  • 48. Lu Y., Song D., Yi J. High level landmark-based visual navigation using unsupervised geometric constraints in local bundle adjustment; Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA); Hong Kong, China. 31 May–7 June 2014; pp. 1540–1545.
  • 49. Yang Z., Shen S. Tightly-Coupled Visual–Inertial Sensor Fusion Based on IMU Pre-Integration. Technical Report. Hong Kong University of Science and Technology; Hong Kong, China: 2016.
