High Precision Calibration Algorithm for Binocular Stereo Vision Camera using Deep Reinforcement Learning

Jie Ren; Fuyu Guan; Tingting Wang; Baoshan Qian; Chunlin Luo; Guoliang Cai; Ce Kan; Xiaofeng Li

doi:10.1155/2022/6596868

2022 Mar 31;2022:6596868.

PMCID: PMC8989564 PMID: 35401726

Abstract

Camera calibration is one of the most important aspects of computer vision research. To address the insufficient precision of existing methods, a high precision calibration algorithm for binocular stereo vision cameras using deep reinforcement learning is proposed. Firstly, a binocular stereo camera model is established, and camera calibration is divided into internal and external parameter calibration. Secondly, the internal parameter calibration is completed by solving the vanishing points associated with the camera optical center and the distortion values on the camera plane. Based on the internal parameters, deep learning is used to fit the value function. A target network is established to adjust the parameters of the value function, and the convergence of the value function is calculated to optimize reinforcement learning. The deep reinforcement learning fitting structure is built, the camera data are input, and the external parameter calibration is completed through continuous updating and convergence. Finally, the high precision calibration of the binocular stereo vision camera is completed. The results show that the calibration errors of the proposed algorithm with checkerboard calibration boards of two different sizes are only 0.36% and 0.35%, respectively. The value function converges quickly, the parameter calculation accuracy is high, the overall time consumption is short, and the calibration results are stable.

1. Introduction

At the moment, computer vision is a hot research field. It is widely used in various fields and is particularly useful in UAV visual positioning, robot navigation, and other areas [1, 2]. Binocular stereo vision is based on the premise of mimicking human vision, and it employs two cameras to complete visual measurement using parallax calculations. It offers numerous advantages, including noncontact, high precision, and great concealment. It is capable of meeting people's growing measuring and detecting requirements. Therefore, binocular stereo vision has a promising application future [3]. High precision camera calibration is one of the keys to ensuring the effective functioning of a binocular stereo vision system. As a result, it is vital to investigate the high precision calibration of binocular stereo vision cameras.

The primary goal of binocular stereo vision calibration is to calculate the internal parameters and spatial position parameters of the camera, as well as to determine the correspondence between two-dimensional and three-dimensional coordinates [4], thereby ensuring the accuracy of the vision system measurement. Existing camera calibration technologies fall into two main types: traditional calibration and self-calibration. The traditional camera calibration method computes the camera's internal parameters based on a predetermined model and appearance data such as target size. This method has the disadvantages of complex operation and high equipment dependence. Self-calibration does not need external help; it calculates the camera parameters only from the feature point data shared between the target images. Although this method is easier than traditional camera calibration, its calibration accuracy is low [5, 6]. Therefore, this paper proposes a high precision calibration algorithm for binocular stereo vision cameras using deep reinforcement learning and attempts to combine deep learning and reinforcement learning to offer a new concept for binocular stereo vision camera calibration. The main contributions of this paper are as follows: (1) camera distortion is considered when calculating the internal parameters of the camera, which improves their calculation accuracy; (2) the external parameters of the camera are calculated using a deep reinforcement learning algorithm, which fully utilizes the advantages of deep learning and reinforcement learning; (3) the proposed algorithm can effectively complete high precision camera calibration and has practical application value.

2. Related Work

In the field of computer science, binocular stereo vision is a hot topic, and many researchers have proposed camera calibration methods. Literature [7] proposed an alternating adjustment-based camera calibration algorithm for binocular stereo vision systems, established a binocular vision calibration system with the left and right camera coordinates as reference coordinates, and optimized the internal parameters of the two cameras through alternating adjustment experiments to achieve the best value. The optimal distortion parameters and internal and external parameters are then obtained by optimizing all internal and external parameters, although the algorithm converges slowly. The deep learning is updated using the projection vector of feature points, which is also used to find the best translation vector. Literature [8] used the singular value decomposition approach to calculate the relative attitude matrix during the absolute azimuth interpretation stage. The pose estimation problem of a stereo vision measuring system based on feature points is solved, and stereo vision is extended. However, only one pose parameter from the two collected images is optimized, so the algorithm does not effectively increase camera calibration accuracy. Literature [9] established and calibrated a heterogeneous binocular stereo vision system, which included a high-definition color camera and an infrared thermal camera, and designed an algorithm for accurate positioning and sorting of calibration points on the calibration plate. The cameras are then calibrated, as is the binocular stereo vision system. This method has a low error rate, but it takes a long time. Literature [10] demonstrated online calibration of the external parameters of dynamic binocular stereo vision for rectangular images of undetermined size. The elliptical pose and heading reference system is used in real time to provide an approximate value of the rotation angle, and the rotation angle of each camera is solved iteratively using only a single rectangular centroid according to the homology map between images. To complete the camera calibration, the yaw angle is corrected according to the matching rectangle prime angle. However, the algorithm's accuracy is low. Literature [11] examined methods for calibrating the internal and external parameters of an ultra-wide field of view long wave infrared camera. To address camera imaging distortion and low resolution, an external parameter calibration method based on the least squares method is proposed, and the calibration results of a long wave infrared camera are evaluated in conjunction with the internal parameter data. Experiments validate the approach's correctness, but its stability is low. Literature [12] investigated the parallel binocular stereo vision system and a zoom calibration method. The image information is gathered using the triangulation concept, the baseline accuracy is ensured by moving the camera, the calibration results are produced, and a BP neural network is used to process the calibration data further to increase the visual measurement accuracy. However, due to the characteristics and mutual constraints of the left and right images in binocular stereo vision, this strategy is prone to local optima, and its overall stability is unsatisfactory. To address the disadvantages of these methods, this work investigates high precision calibration for a binocular stereo vision camera using deep reinforcement learning, with an emphasis on the camera's internal and external parameters. Experiments validate the algorithm's performance and show that camera calibration can be accomplished quickly.

3. Methodology

3.1. Binocular Stereo Vision Camera Model

Through the imaging lens, the camera projects three-dimensional coordinates onto two-dimensional coordinates. This process is known as the imaging transformation, and its description is referred to as the camera model. The camera model can be used to determine the positional relationship between each point on the measured image and the object in space [13]. Binocular stereo vision cameras use the parallax principle to obtain image information from the left and right cameras. Figure 1 shows the locations and coordinate systems of the two cameras in binocular stereo vision measurement. O-XYZ represents the coordinate system of the left camera, whose origin is located at the origin of the global coordinate system. The coordinate system of the left camera image is o-x₁y₁z₁, the coordinate system of the right camera is o-xyz, and the coordinate system of the right camera image is o-x₂y₂z₂. The camera transformation model is then developed using the imaging lens principle [14].

\begin{aligned} A_{l} \begin{bmatrix} X \\ Y \\ 1 \end{bmatrix} &= \begin{bmatrix} c_{X} & b_{l} & d_{l} \\ 0 & c_{Y} & f_{l} \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_{1} \\ y_{1} \\ z_{1} \end{bmatrix}, \\ A_{r} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} &= \begin{bmatrix} c_{x} & b_{r} & d_{r} \\ 0 & c_{x} & f_{r} \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_{2} \\ y_{2} \\ z_{2} \end{bmatrix}, \end{aligned}

(1)

where A_l and A_r represent the image scale coefficients of the left and right cameras, and c_X and c_x represent the scale coefficients of the left and right cameras. Using axis d and axis f as measurement scales, (d_l, f_l) and (d_r, f_r) are the optical centers of the left and right cameras, while b_l and b_r are the error coefficients in the vertical direction of the left and right cameras.
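As a minimal illustration of equation (1), the Python sketch below applies the left camera matrix to a point (x₁, y₁, z₁) and divides by the scale coefficient A_l. The function name `project_point` and all numeric values are assumptions chosen for illustration (the principal point is simply placed at the center of a 1280 × 960 image); they are not calibration results from this paper.

```python
def project_point(point, c_X, c_Y, b, d, f):
    """Evaluate equation (1): A * [X, Y, 1]^T = M * [x1, y1, z1]^T.

    M is the upper-triangular matrix [[c_X, b, d], [0, c_Y, f], [0, 0, 1]].
    Returns the image coordinates (X, Y) and the scale coefficient A.
    """
    x1, y1, z1 = point
    if z1 == 0:
        raise ValueError("z1 = 0: the scale coefficient vanishes")
    # First two rows of M applied to the point (homogeneous image coordinates).
    X_h = c_X * x1 + b * y1 + d * z1
    Y_h = c_Y * y1 + f * z1
    # The third row of M is (0, 0, 1), so the scale coefficient equals z1.
    A = z1
    return X_h / A, Y_h / A, A


# Illustrative values only (assumed, not taken from the paper).
X, Y, A_l = project_point((0.1, -0.05, 2.0), c_X=800.0, c_Y=800.0, b=0.0, d=640.0, f=480.0)
print(X, Y, A_l)
```

Because the third row of the matrix is (0, 0, 1), the scale coefficient is just the third coordinate of the point, which is why the sketch rejects points with z₁ = 0.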

There will be some translation and rotation during the pixel location conversion of the left and right cameras. The original position coordinate of the target object is designated as K(x_k, y_k, z_k), and a corner of the target object is chosen for translation.

By comparing the coordinates of the corner point before and after the pose transformation of the target object, the translation matrix T can be obtained as follows:

T = \begin{bmatrix} x_{k}^{'} - x_{k} \\ y_{k}^{'} - y_{k} \\ z_{k}^{'} - z_{k} \end{bmatrix},

(2)

where K′(x_k′, y_k′, z_k′) is the corner position of the target object after translation.

Assuming that the target object's rotation angles along the global coordinate system O-X₀Y₀Z₀ are γ, λ and μ, respectively, the rotation matrix for different angles of rotation around the X₀, Y₀ and Z₀ axes can be expressed as

R_{X_{0}} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos γ & \sin γ \\ 0 & - \sin γ & \cos γ \end{bmatrix},

(3)

R_{Y_{0}} = \begin{bmatrix} \cos λ & 0 & - \sin λ \\ 0 & 1 & 0 \\ \sin λ & 0 & \cos λ \end{bmatrix},

(4)

R_{Z_{0}} = \begin{bmatrix} \cos μ & \sin μ & 0 \\ - \sin μ & \cos μ & 0 \\ 0 & 0 & 1 \end{bmatrix} .

(5)

A rotation by a given angle about a fixed axis can be regarded as a superposition of rotations about the X₀, Y₀ and Z₀ axes.

Using the rotation matrix of equation (3) together with the translation matrix T, the relationship between the initial pose coordinate k of the target object's corner and the transformed pose coordinate k′ can be calculated [15]:

\begin{bmatrix} x_{k}^{'} \\ y_{k}^{'} \\ z_{k}^{'} \end{bmatrix} = R_{X_{0}} \begin{bmatrix} x_{k} \\ y_{k} \\ z_{k} \end{bmatrix} + T .

(6)
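The rotation matrices of equations (3)-(5) and the pose transformation of equation (6) can be sketched in Python as follows. The helper names and the composition order R = R_Z0 · R_Y0 · R_X0 are assumptions; the text does not specify the order in which the three rotations are combined.

```python
import math


def rot_x(gamma):
    """Rotation about X0 with the sign convention of equation (3)."""
    c, s = math.cos(gamma), math.sin(gamma)
    return [[1.0, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]]


def rot_y(lam):
    """Rotation about Y0 with the sign convention of equation (4)."""
    c, s = math.cos(lam), math.sin(lam)
    return [[c, 0.0, -s], [0.0, 1.0, 0.0], [s, 0.0, c]]


def rot_z(mu):
    """Rotation about Z0 with the sign convention of equation (5)."""
    c, s = math.cos(mu), math.sin(mu)
    return [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transform(point, rotation, translation):
    """Pose transformation k' = R k + T, as in equation (6)."""
    return [sum(rotation[i][k] * point[k] for k in range(3)) + translation[i]
            for i in range(3)]


# Assumed composition order: rotate about X0, then Y0, then Z0.
R = matmul(rot_z(math.pi / 2), matmul(rot_y(0.0), rot_x(0.0)))
T = [0.5, 0.0, 0.0]
corner = [1.0, 0.0, 0.0]
corner_new = transform(corner, R, T)
print(corner_new)  # approximately [0.5, -1.0, 0.0] with the signs of equation (5)
```

With a pure translation (all angles zero), `transform` reduces to k′ = k + T, which matches the definition of T in equation (2).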

3.2. High Precision Calibration Algorithm for Binocular Stereo Vision Camera

Camera calibration is the process of comparing the camera system to the measurement standard and determining the camera parameters through coordinate and related factor calculations [16, 17]. From two-dimensional data, camera calibration can determine the true location state of the measured object. It is not only a significant step in computer vision research, but it is also a necessary connection in binocular vision noncontact measurement. The accuracy of the stereo vision measurement method is directly affected by whether the computation is accurate or not [18, 19].

Internal parameter calibration and external parameter calibration are the two primary types of camera calibration. Table 1 describes the parameters.

Table 1.

Camera parameter description.

Parameter	Expression	Degrees of freedom
Internal parameters	Effective focal length f_u, f_v; optical center u₀, v₀; nonvertical factor ξ	5
Internal parameters	Radial distortion parameters χ₁, χ₂; tangential distortion parameters p₁, p₂	4
External parameters	Rotation matrix	3
External parameters	Translation matrix T	3

Table 1 lists the internal camera parameters, which include the focal length, optical center, nonvertical factor, and the distortion parameters involved in the perspective transformation. External parameters, namely the rotation matrix and the translation matrix, are used to determine the positional relationship of the camera coordinate system. The rotation matrix and the translation matrix each have three degrees of freedom, which gives a total of six external camera parameters. These external parameters are normally obtained through experimental calculation [20], and this parameter calculation process can be regarded as camera calibration.

3.2.1. Internal Parameter Calibration

According to projective geometry, parallel lines intersect the plane at infinity at a point known as the vanishing point, whose position is determined by the direction of the lines. According to this theory, vanishing points exist on the camera plane and can be connected to the camera's optical center, so they can be used to calibrate the camera's internal parameters. It is assumed that there are two vanishing points on the camera plane, G and H, in the vertical and parallel directions, respectively, which are connected to the camera's optical center O to produce OG and OH. If the coordinates of the camera's principal point are (d, f) and the coordinates of the vanishing points G and H are (g_i, h_i) and (g_j, h_j), then

\begin{matrix} O G = {((g_{i} - d) c_{X}, (h_{i} - f) c_{x}, f_{0})}^{T}, \end{matrix}

(7)

\begin{matrix} O H = {((g_{j} - d) c_{X}, (h_{j} - f) c_{x}, f_{0})}^{T}, \end{matrix}

(8)

where f₀ is the focal length of the camera, and T is the transpose symbol.

G and H are a pair of orthogonal vanishing points, so the following equation holds:

\begin{matrix} O G \cdot O H = 0. \end{matrix}

(9)

Expanding this condition gives the vanishing point constraint shown in equation (10).

\begin{matrix} (g_{i} - d) (g_{j} - d) {(c_{X})}^{2} + (h_{i} - f) (h_{j} - f) {(c_{x})}^{2} + {(f_{0})}^{2} = 0. \end{matrix}

(10)
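Equation (10) can be rearranged to give the focal length when the principal point and scale coefficients are known: f₀² = −[(g_i − d)(g_j − d)c_X² + (h_i − f)(h_j − f)c_x²]. The Python sketch below evaluates this expression. The function name, the default unit scale coefficients, and the sample vanishing points are illustrative assumptions rather than values from the experiments.

```python
import math


def focal_from_vanishing_points(G, H, principal_point, c_X=1.0, c_x=1.0):
    """Solve equation (10) for the focal length f0.

    G = (g_i, h_i) and H = (g_j, h_j) are orthogonal vanishing points,
    principal_point = (d, f).
    """
    g_i, h_i = G
    g_j, h_j = H
    d, f = principal_point
    f0_squared = -((g_i - d) * (g_j - d) * c_X ** 2
                   + (h_i - f) * (h_j - f) * c_x ** 2)
    if f0_squared <= 0:
        # The two points cannot come from orthogonal directions for this
        # principal point, so equation (10) has no real solution.
        raise ValueError("vanishing points are inconsistent with orthogonality")
    return math.sqrt(f0_squared)


# Assumed example: two vanishing points placed symmetrically about the
# principal point (640, 480) on the horizontal line through it.
f0 = focal_from_vanishing_points(G=(1440.0, 480.0), H=(-160.0, 480.0),
                                 principal_point=(640.0, 480.0))
print(f0)
```

The explicit check on the sign of f₀² is a design choice of this sketch: noisy vanishing point estimates can make the right-hand side negative, and failing loudly is safer than returning a complex or meaningless focal length.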

The internal parameter calibration of the camera can be accomplished preliminarily using equation (10). The camera model is typically split into linear and nonlinear models based on the imaging geometry. However, the linear model rests on an ideal assumption and can only express the relationship between image coordinates and spatial coordinates in a simplified way [21]. In actual image capture, distortion and deformation occur owing to the influence of numerous factors. If the imaging position in the linear model is (U, V), the real imaging position is (U₁, V₁):

\begin{cases} U_{1} = U + β, \\ V_{1} = V + α, \end{cases}

(11)

where β and α are distortion value in transverse and longitudinal imaging direction.

Radial and tangential distortion are the most common types of camera distortion. The tangential distortion is usually minor and unnoticeable. As a result, the radial distortion polynomial is used to express the camera distortion value.

\begin{cases} β = χ p r^{2} (U_{1} - d), \\ α = χ p r^{2} (V_{1} - f), \end{cases}

(12)

where χ represents the radial distortion parameter of the camera, p represents the tangential distortion parameter, and r represents the radial distance measured from the image center.
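A small Python sketch of equations (11) and (12): given an observed image point (U₁, V₁), it evaluates the distortion values β and α and recovers the linear-model position (U, V). It assumes that r is the Euclidean distance from the observed point to the principal point (d, f), which the text does not state explicitly, and the coefficient values are made up for illustration.

```python
def distortion_offsets(U1, V1, d, f, chi, p):
    """Distortion values (beta, alpha) of equation (12).

    Assumption: r^2 = (U1 - d)^2 + (V1 - f)^2, the squared distance from
    the observed point to the principal point.
    """
    r_squared = (U1 - d) ** 2 + (V1 - f) ** 2
    beta = chi * p * r_squared * (U1 - d)
    alpha = chi * p * r_squared * (V1 - f)
    return beta, alpha


def linear_model_position(U1, V1, d, f, chi, p):
    """Invert equation (11): U = U1 - beta, V = V1 - alpha."""
    beta, alpha = distortion_offsets(U1, V1, d, f, chi, p)
    return U1 - beta, V1 - alpha


# Illustrative values in normalized image coordinates (assumed).
beta, alpha = distortion_offsets(0.3, 0.4, d=0.0, f=0.0, chi=0.1, p=1.0)
U, V = linear_model_position(0.3, 0.4, d=0.0, f=0.0, chi=0.1, p=1.0)
print(beta, alpha, U, V)
```

Because equation (12) is written in terms of the observed coordinates, the correction is a direct evaluation and needs no iteration; a point at the principal point receives zero correction.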

3.2.2. External Parameter Calibration using Deep Reinforcement Learning

Internal parameters are used to calibrate the camera's external parameters. In general, the precision calibration board is chosen to compute the corresponding relationship between camera coordinates and spatial coordinates, as well as to define the structural parameters of the binocular vision system. For external parameter calibration, the deep reinforcement learning algorithm is applied in this study.

The deep reinforcement learning algorithm combines deep learning and reinforcement learning. It has both the feature extraction ability of deep learning and the decision-making ability of reinforcement learning. The applicable space of the traditional reinforcement learning algorithm is narrow and discrete. By incorporating deep learning, reinforcement learning overcomes the limitation that it cannot be applied to high-dimensional data analysis, which allows it to be applied to practical scenes with large spaces [22]. Figure 2 shows the deep reinforcement learning framework.

The goal of reinforcement learning, as shown in Figure 2, is to learn the best approach through environmental interaction and reward accumulation. It is a constant process in which agents interact with their surroundings in order to attain their objectives. The camera external parameter calibration process can be seen as a reinforcement learning problem, and the optimal parameters can be determined as much as feasible through the camera target and coordinate analysis, according to the description of reinforcement learning.

At present, classical reinforcement learning can be classified into three types: value-based reinforcement learning, policy-based reinforcement learning, and actor-critic learning, which combines value and policy. The actor-critic method is a hybrid of the two approaches. It has the benefits of the policy method for generating actions and handling continuous actions, but it requires the calculation of the value function. In this study, the actor-critic method is therefore chosen to calibrate the camera's external parameters. The value function must be calculated, and deep learning is a powerful function calculation tool. When applying deep learning to reinforcement learning, however, a neural network must be used to fit the mapping relationship. This forms a very complex mapping network whose parameters must be adjusted continuously, so the adjustment and convergence of the value function become a critical problem. To tackle this challenge, this work examines the structure fitting of deep reinforcement learning.

(1) Deep Reinforcement Learning Structure Fitting. When deep learning and reinforcement learning are merged, deep learning is used to fit the value function so that the capability of reinforcement learning can be fully used; this is referred to as deep reinforcement learning structure fitting. The structure fitting problem is addressed mainly by improving the value function calculation, that is, by building a target network to adjust the value function parameters and to ensure the convergence of the value function.

The estimating procedure of the state action value function is frequently done in practice using function approximation, which is stated as

\begin{matrix} Q (q, a, ϖ) = Q^{'} (q, a), \end{matrix}

(13)

where Q(q, a) is the state action value function, q denotes the state, a denotes the action, and ϖ denotes the value function's parameter, which is the reinforcement learning parameter. Equation (14) shows the update method for the value function parameter ϖ.

\begin{matrix} ϖ = ϖ_{0} + φ \nabla Q (q, a), \end{matrix}

(14)

where ϖ₀ is the initial value of the function parameter, and φ is the update coefficient of the value function.
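To make equations (13) and (14) concrete, the following Python sketch uses a linear value function Q(q, a; ϖ) = Σ ϖᵢ xᵢ(q, a), for which the gradient with respect to ϖ is simply the feature vector x. The linear form, the feature values, and the function names are assumptions made for illustration; the paper fits the value function with a neural network. Practical algorithms usually also scale the gradient by a temporal-difference error, which equation (14) does not show and which is therefore omitted here.

```python
def q_value(weights, features):
    """Linear approximation of the state action value function, equation (13)."""
    return sum(w * x for w, x in zip(weights, features))


def update_weights(weights0, features, phi):
    """Parameter update of equation (14): w = w0 + phi * grad Q.

    For the linear Q assumed here, grad Q with respect to the weights is
    the feature vector itself.
    """
    return [w + phi * x for w, x in zip(weights0, features)]


# Hypothetical features of one (state, action) pair and initial weights.
features = [1.0, 2.0]
w0 = [0.0, 0.5]
w1 = update_weights(w0, features, phi=0.1)
print(w1, q_value(w0, features), q_value(w1, features))
```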

When a neural network is used to calculate the value function, its parameters must be updated constantly to complete the training. These parameters are the parameters of the value function. To adjust them toward their optimal values [23, 24], a target network is built, and its parameters are updated in hard or soft mode. Hard mode applies when the network unit size must be rigorously controlled: the number of operating steps is fixed, and after these steps are completed the network parameters are updated by copying. Soft mode applies when the network unit size is affected by the overall division unit size, and each update is small. The target network parameters (neural network parameters) can then be updated as stated in equation (15).

θ^{'} = \begin{cases} θ, & \text{if hard}, \\ (1 - η) θ^{'} + η θ, & \text{if soft}, \end{cases}

(15)

where θ denotes the neural network's parameters and θ′ denotes the updated target network parameters (on the right-hand side, θ′ is the value before the update). The value function parameters are updated in the same way, as given by equation (16).

ϖ^{'} = \begin{cases} ϖ, & \text{if hard}, \\ (1 - η) ϖ^{'} + η ϖ, & \text{if soft}, \end{cases}

(16)

where ϖ′ is the updated value function parameters, and η is a small value in soft mode, which can help update the parameters properly.
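The two update modes of equations (15) and (16) can be sketched in Python as below. The sketch assumes that the soft branch blends the current target parameters with the online parameters, as is standard for target networks; the function name `update_target`, the string mode flags, and the numeric values are hypothetical.

```python
def update_target(target, online, mode, eta=0.01):
    """Return new target parameters from the online parameters.

    hard: copy the online parameters.
    soft: move each target parameter a fraction eta toward the online one
          (assumed form: (1 - eta) * target + eta * online).
    """
    if mode == "hard":
        return list(online)
    if mode == "soft":
        return [(1.0 - eta) * t + eta * o for t, o in zip(target, online)]
    raise ValueError("mode must be 'hard' or 'soft'")


theta_target = [0.0, 1.0]   # hypothetical target network parameters
theta_online = [1.0, 3.0]   # hypothetical online network parameters

hard = update_target(theta_target, theta_online, "hard")
soft = update_target(theta_target, theta_online, "soft", eta=0.1)
print(hard, soft)
```

A small η keeps each soft update small, so the target changes slowly and the value function estimate is less likely to oscillate; the hard mode trades that smoothness for an exact copy at fixed intervals.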

According to equation (16), after n iterations, the value function parameters form the sequence shown in equation (17).

\begin{matrix} ϖ_{0} ⟶ ϖ_{1} ⟶ \dots ⟶ ϖ_{n} . \end{matrix}

(17)

Following equation (17), the parameters of the value function finally converge, that is,

\begin{matrix} \lim_{n ⟶ \infty} ϖ_{n} = ϖ . \end{matrix}

(18)
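The convergence expressed by equations (17) and (18) can be illustrated with a scalar example in Python. The sketch repeatedly applies a soft update toward a fixed value and stops when successive iterates differ by less than a tolerance. Holding the online parameter fixed, the stopping rule, and all numeric values are simplifying assumptions; in actual training both parameter sets change.

```python
def iterate_until_converged(target0, online, eta, tol=1e-6, max_iter=10_000):
    """Generate the sequence w_0 -> w_1 -> ... -> w_n of equation (17).

    Each step is a soft update toward the fixed value `online`. Iteration
    stops when |w_{n} - w_{n-1}| < tol; returns (w_n, n).
    """
    target = target0
    for n in range(1, max_iter + 1):
        updated = (1.0 - eta) * target + eta * online
        if abs(updated - target) < tol:
            return updated, n
        target = updated
    raise RuntimeError("no convergence within max_iter iterations")


value, n_iter = iterate_until_converged(target0=0.0, online=1.0, eta=0.1)
print(value, n_iter)
```

Under these assumptions the gap to the limit shrinks by a factor (1 − η) at every step, so the sequence converges geometrically for any 0 < η ≤ 1.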

(2) Camera External Parameter Calibration. The binocular stereo vision camera data is input, and the external parameters of the camera are calibrated using deep reinforcement learning calculations based on the fitting structure.

Input: sample data is collected by a binocular camera;
Output: camera external parameter calibration results.

Reinforcement learning parameters are expressed as value function parameters. The initial reinforcement learning parameter is ϖ, and the initial value of the neural network parameters is θ. The deep reinforcement learning structure and related parameters are initialized, and deep reinforcement learning is then used to calibrate the parameters of the binocular stereo vision camera as follows:

(1) Half of the binocular stereo vision camera data in the experimental data set are selected as training samples.
(2) The numbers of hidden layers and nodes of the neural network are determined based on the size of the training samples.
(3) The fitting structure of deep reinforcement learning is constructed, as shown in Figure 3.
(4) The neural network is used to fit the camera data in order to obtain the value function, and the target network's value function and parameters are established.
(5) The value function and neural network are iterated repeatedly, using equations (15) and (16) to adjust the reinforcement learning parameter ϖ′ and the neural network parameter θ′ until the parameters converge.
(6) Binocular stereo vision camera data are collected again as test data and input into the deep reinforcement learning structure, and the camera's external parameters, including the rotation matrix and translation matrix, are calculated.
(7) The calibration of the camera's external parameters is complete.
(8) End.

Figure 3. Deep reinforcement learning fitting structure.

4. Experimental Analysis and Results

To evaluate the performance of the binocular stereo vision camera's high-precision calibration algorithm based on deep reinforcement learning, an experimental binocular stereo vision system is constructed.

4.1. Experimental Environment

The experiments are built on the VS2019 development platform. The simulation runs on Windows 10, and the algorithm is developed with opencv2.49. The experimental apparatus consists of two cameras, two checkerboard calibration boards, and a computer. Table 2 shows the camera performance indices.

Table 2.

Camera performance index.

Performance index	Numerical value
Pixel	1280 × 960
Sampling frequency	60 Hz
Baseline length	60 mm
Focal length	6 mm
Optical dimension	1/3

AutoCAD software is used in the experiment to draw the checkerboard images, which are then printed and made into calibration boards, as illustrated in Figure 4.

4.2. Data Set

The experimental data are drawn from three sources: the KITTI data set, the Cityscapes data set, and a visual system measurement data set. The KITTI data set is the world's largest visual measurement data set for autonomous driving scenarios, and it is used for visual ranging, target detection, and tracking. Its data gathering platform is equipped with four cameras, one sensor, and one GPS navigation system to collect image data in a variety of scenarios such as cities, towns, and roads, including 389 pairs of stereo images and optical flow maps. The Cityscapes data set is large in scale, containing stereoscopic street images of 50 distinct cities together with pixel-level annotations, including 5,000 high-quality pixel-level annotations and 20,000 coarse annotations. This data set is well suited to training deep neural networks. For the visual system measurement data set, the vision system collects stereoscopic images of six streets using binocular cameras, yielding a total of 20,000 images with a resolution of 1280 × 960 pixels. During the experimental test, 1,000 images are chosen from each of the three data sets, for a total of 3,000 images. The first half of the data is used to train the deep reinforcement learning algorithm, while the other half is used for experimental testing.

The experiments were performed under the same noise and lighting conditions to ensure consistent image acquisition. Two groups of experiments were conducted, one with a 10 mm and one with a 20 mm checkerboard calibration board. The binocular stereo vision system captured a total of 1,000 images. The collected images are filtered and preprocessed to strengthen the image edge information, which increases image quality and prevents interference from external variables such as noise and illumination. The measurement field of view is 7 m × 6 m, the checkerboard calibration boards are placed randomly in the camera system's measurement field, and the spacing between the two calibration boards is 4 m.

4.3. Evaluation Criteria

(1)
Calibration precision: This study proposes a high precision calibration algorithm, so a dedicated comparison of calibration accuracy is required to validate it. The precision of the calibration results is expressed by the calibration error, which is calculated as shown in equation (19).
$\begin{matrix} e = \sqrt{{(x_{w} - x_{w}^{'})}^{2} + {(y_{w} - y_{w}^{'})}^{2}}, \end{matrix}$ (19)
where e is calibration error. (x_w, y_w) and (x_w′, y_w′) represent the real coordinates and measurement coordinates of the target pixel, respectively.
(2)
Convergence of value function: The convergence of the value function is one of the keys to fitting deep learning to reinforcement learning in deep reinforcement learning algorithms. This experiment therefore plots the loss function curves of the value function network for several algorithms to verify that the proposed method converges.
(3)
Parameter calculation accuracy: Parameter adjustment is also one of the keys to realize the fitting of deep learning and reinforcement learning. Therefore, parameter calculation accuracy is also an effective index to show the performance of the proposed algorithm. The accuracy calculation Equation is as follows:
$\begin{matrix} Accu = \frac{L_{1}}{L_{tot}} \times 100 \%, \end{matrix}$ (20)
where L_tot represents the actual number of parameter calculations. L₁ is the number of correct parameters in the calculation result.
(4)
Camera calibration time consumption: Camera calibration is an important prerequisite in the application of binocular stereo vision system. It is very important for the vision system to complete camera calibration quickly.
(5)
Stability of calibration results: The stability of the calibration results of the proposed algorithm is compared with those of Literature [7], Literature [8], Literature [9], Literature [11], and Literature [12]. Stability is measured from the change in the sequence of camera calibration result data. It is assumed that the calibration data series contains terms with the same keywords. If the relative order of these terms does not change after sorting, the algorithm is stable.
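The two quantitative criteria above, the calibration error of equation (19) and the parameter calculation accuracy of equation (20), can be computed with the short Python sketch below. The function names and sample numbers are illustrative assumptions and do not reproduce any measurement from the experiments.

```python
import math


def calibration_error(true_xy, measured_xy):
    """Equation (19): Euclidean distance between real and measured coordinates."""
    return math.hypot(true_xy[0] - measured_xy[0], true_xy[1] - measured_xy[1])


def parameter_accuracy(n_correct, n_total):
    """Equation (20): share of correctly calculated parameters, in percent."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_correct / n_total


# Hypothetical values for illustration.
e = calibration_error((3.0, 4.0), (0.0, 0.0))
accu = parameter_accuracy(19, 20)
print(e, accu)
```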

4.4. Results and Discussion

4.4.1. Comparison of Calibration Precision

This paper's main goal is to achieve high precision calibration of a binocular stereo vision camera. The proposed algorithm is therefore compared with the algorithms in Literature [7], Literature [8], Literature [9], Literature [11], and Literature [12] to assess its effectiveness, as shown in Table 3.

Table 3.

Calibration errors of different algorithms.

Algorithm	Calibration error (%), 10 mm checkerboard calibration board	Calibration error (%), 20 mm checkerboard calibration board
The proposed algorithm	0.36	0.35
Literature [7] algorithm	0.90	1.32
Literature [8] algorithm	1.66	3.27
Literature [9] algorithm	1.94	2.58
Literature [11] algorithm	5.65	5.20
Literature [12] algorithm	1.74	4.22

Table 3 shows clear differences between the algorithms. The calibration errors of the proposed algorithm are 0.36% and 0.35% for the 10 mm and 20 mm checkerboard calibration boards, respectively. By comparison, the minimum calibration error over the two checkerboard calibration boards is 0.90% for Literature [7], 1.66% for Literature [8], 1.94% for Literature [9], 5.20% for Literature [11], and 1.74% for Literature [12]. Compared with the five literature algorithms, the proposed algorithm has the lowest error on both boards, demonstrating that the deep reinforcement learning algorithm used in this paper for camera calibration has high precision and a better calibration effect than the traditional algorithms.
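The minimum errors quoted in this paragraph can be re-derived from Table 3 with a few lines of Python. The dictionary simply transcribes the table; the final ratio, which compares the best baseline with the larger of the two errors of the proposed algorithm, is an extra illustration and not a figure reported in the paper.

```python
# Calibration errors (%) from Table 3: (10 mm board, 20 mm board).
errors = {
    "Proposed": (0.36, 0.35),
    "Literature [7]": (0.90, 1.32),
    "Literature [8]": (1.66, 3.27),
    "Literature [9]": (1.94, 2.58),
    "Literature [11]": (5.65, 5.20),
    "Literature [12]": (1.74, 4.22),
}

# Minimum error of each algorithm over the two boards.
min_errors = {name: min(pair) for name, pair in errors.items()}

# Best baseline (smallest minimum error among the literature algorithms).
baselines = {k: v for k, v in min_errors.items() if k != "Proposed"}
best_baseline = min(baselines, key=baselines.get)

# Ratio of the best baseline error to the worst error of the proposed algorithm.
ratio = baselines[best_baseline] / max(errors["Proposed"])

print(min_errors)
print(best_baseline, ratio)
```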

4.4.2. Comparison of Convergence of Value Function

The loss function curve of the value function network is plotted with the number of iterations on the abscissa and the mean square loss on the ordinate, as shown in Figure 5.

Comparison of loss function curve of value function network.

According to Figure 5, every algorithm eventually converges, and the mean square loss decreases as the number of iterations grows. The loss curve of the proposed algorithm begins to level off when the number of iterations is close to 30; at that point the mean square loss is close to 0 and the convergence of the value function is complete. The algorithms in Literature [8], Literature [11], and Literature [12] converge only after about 70 iterations. The Literature [7] and Literature [9] algorithms converge relatively poorly, with a minimum root mean square error above 0.2 after convergence. The proposed algorithm therefore converges faster and to a lower loss than the five literature algorithms, which supports the effectiveness of the designed target network for value function convergence.
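A statement such as "converges at about 30 iterations" can be made precise by fixing a loss tolerance and finding the first iteration from which the curve stays below it. The Python sketch below does this on synthetic curves; the function `convergence_iteration`, the tolerance, and the exponential curves are assumptions for illustration and are not the data behind Figure 5.

```python
import math
from typing import Optional, Sequence


def convergence_iteration(losses: Sequence[float], tol: float = 0.01) -> Optional[int]:
    """Return the first 1-based iteration from which the loss stays at or below tol.

    Returns None if the final recorded loss is still above tol (or no losses are given).
    """
    last_above = 0
    for iteration, loss in enumerate(losses, start=1):
        if loss > tol:
            last_above = iteration
    if last_above == len(losses):
        return None
    return last_above + 1


# Synthetic curve that decays towards zero.
fast_curve = [math.exp(-0.2 * i) for i in range(1, 101)]
# Synthetic curve that plateaus at a residual loss of 0.2 and never reaches tol.
plateau_curve = [0.2 + math.exp(-0.1 * i) for i in range(1, 101)]

print(convergence_iteration(fast_curve))
print(convergence_iteration(plateau_curve))
```

Tracking the last iteration above the tolerance, rather than the first one below it, avoids reporting convergence for a curve that dips briefly and then rises again.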

4.4.3. Comparison of Parameter Calculation Accuracy

This study adjusts the value function parameters in reinforcement learning and uses a neural network to update them continually, which completes the fitting between deep learning and reinforcement learning. The accuracy of the parameter calculation is therefore critical for camera calibration: accurate calibration results cannot be obtained if the parameters are calculated with low accuracy. Figure 6 shows the comparison of parameter calculation accuracy.

As shown in Figure 6, the neural network is used to update the parameters of the value function. Over many iterations, the value function in reinforcement learning is adjusted with high accuracy, and the maximum calculation accuracy of the proposed algorithm is about 95%. Among the compared algorithms, the one in Literature [9] calculates the parameters relatively well, with a highest accuracy of around 80%, but it still falls well short of the proposed algorithm. This comparison supports the parameter calculation performance of the proposed algorithm and, in turn, the high accuracy calibration of the binocular vision camera parameters in this study.

4.4.4. Comparison of Camera Calibration Time Consumption

Table 4 shows the camera calibration time consumption results.

Table 4.

Comparison of camera calibration time consumption of different algorithms.

Algorithm	KITTI data set (s)	Cityscapes data set (s)	Measurement data set of a vision system (s)
The proposed algorithm	5.2	6.2	5.3
Literature [7] algorithm	9.2	16.5	20.2
Literature [8] algorithm	7.1	14.7	21.1
Literature [9] algorithm	8.1	13.0	18.5
Literature [11] algorithm	10.4	11.1	15.4
Literature [12] algorithm	9.1	12.5	49.5


Table 4 shows clear differences in calibration time when different data sets are used as data sources. The proposed algorithm takes 5.2 s, 6.2 s, and 5.3 s on the KITTI data set, the Cityscapes data set, and the vision system measurement data set, respectively, for an average of 5.6 s. The algorithms in Literature [7], Literature [8], Literature [9], Literature [11], and Literature [12] are slower on every data set, with averages over the three data sets ranging from about 12.3 s to 23.7 s. The deep reinforcement learning technique therefore runs efficiently and reduces the time needed for camera calibration in this work.
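The averages quoted for Table 4 follow from simple arithmetic on the table entries. As a sketch, the Python snippet below recomputes the per-algorithm mean over the three data sets and the resulting speed-up of the proposed algorithm; the variable names are illustrative.

```python
from statistics import mean

# Calibration time (s) copied from Table 4:
# (KITTI, Cityscapes, vision system measurement data set).
times = {
    "Proposed": (5.2, 6.2, 5.3),
    "Literature [7]": (9.2, 16.5, 20.2),
    "Literature [8]": (7.1, 14.7, 21.1),
    "Literature [9]": (8.1, 13.0, 18.5),
    "Literature [11]": (10.4, 11.1, 15.4),
    "Literature [12]": (9.1, 12.5, 49.5),
}

# Mean calibration time per algorithm over the three data sets.
average = {name: mean(values) for name, values in times.items()}

# How many times longer each baseline takes on average than the proposed algorithm.
speedup = {
    name: average[name] / average["Proposed"]
    for name in times
    if name != "Proposed"
}

for name, value in average.items():
    print(f"{name}: average {value:.1f} s")
for name, value in speedup.items():
    print(f"{name}: {value:.1f}x slower than the proposed algorithm")
```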

4.4.5. Comparison of Stability of Calibration Results

The stability comparison results of camera calibration results are shown in Figure 7.

Stability comparison of calibration results.

According to Figure 7, the stability of the proposed algorithm is substantially higher than that of the other five literature algorithms, with the overall stability held at approximately 92%. Among the other algorithms, the highest stability of the Literature [7], Literature [8], and Literature [9] algorithms is close to 80%, while the stability of the Literature [11] and Literature [12] algorithms is almost 60%. This indicates that the proposed binocular vision camera calibration algorithm is less affected by external interference and yields more stable calibration results.

5. Conclusions and Future Works

This paper uses deep learning to improve reinforcement learning, builds a deep reinforcement learning fitting structure, and investigates the calibration process of a binocular stereo vision camera. The calibration of the camera's internal and external parameters is described in detail, and the proposed algorithm is validated experimentally. The results show that the proposed algorithm can complete high precision calibration of the camera and has some theoretical value. The study still has limitations: the various properties of the camera target are not thoroughly explored. Future work should account for target distance, image color, and other parameters in order to broaden the application scope and improve the application efficiency of the camera.

Acknowledgments

This work was supported by the Natural Science Foundation of Heilongjiang Province of China under grant no. LH2021F040.

Data Availability

Readers can access the data supporting the conclusions of the study from the KITTI data set, the Cityscapes data set, and the measurement data set of a vision system.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

1. Tang B., Jiang L. Binocular stereovision omnidirectional motion handling robot. International Journal of Advanced Robotic Systems. 2020;17(3):172988142092685. doi: 10.1177/1729881420926852.
2. Tian X., Liu R., Wang Z., Ma J. High quality 3D reconstruction based on fusion of polarization imaging and binocular stereo vision. Information Fusion. 2022;77:19–28. doi: 10.1016/j.inffus.2021.07.002.
3. Ren F., Zhu C., He M. Temperature effect on granite strain burst based on binocular stereovision technology. Environmental Earth Sciences. 2019;78(24):720–726. doi: 10.1007/s12665-019-8744-8.
4. Li T., Liu C., Yang L., Wang L., Yang D. Binocular stereo vision calibration based on alternate adjustment algorithm. Optik. 2018;173(4):13–20. doi: 10.1016/j.ijleo.2018.07.103.
5. Zhou H., Li C., Sun G., Yin J., Ren F. Calibration and location analysis of heterogeneous binocular stereo vision system. Applied Optics. 2021;60(24):7214–7222. doi: 10.1364/ao.428054.
6. Jiang J., Zeng L., Chen B., Lu Y., Xiong W. An accurate and flexible technique for camera calibration. Computing. 2019;101(4):1971–1988. doi: 10.1007/s00607-019-00723-6.
7. Yin H., Ma Z., Zhong M., et al. SLAM-based self-calibration of a binocular stereo vision rig in real-time. Sensors. 2020;20(3):621–626. doi: 10.3390/s20030621.
8. Chen X., Lin J., Sun Y., Ma H., Zhu J. Analytical solution of uncertainty with the GUM method for a dynamic stereo vision measurement system. Optics Express. 2021;29(6):8967–8984. doi: 10.1364/oe.422048.
9. Zou W., Wei Z., Liu F. High-accuracy calibration of line-structured light vision sensors using a plane mirror. Optics Express. 2019;27(24):34681–34704. doi: 10.1364/oe.27.034681.
10. Wang Y., Wang X., Yin L. Estimation of extrinsic parameters for dynamic binocular stereo vision using unknown-sized rectangle images. Review of Scientific Instruments. 2019;90(6):065108. doi: 10.1063/1.5086352.
11. Wang Z., Li G., Liu B., Huang F., Chen Y. Extrinsic parameters calibration of ultra-wide angle long-wave infrared stereo vision and evaluation of intrinsic and extrinsic parameters. Spectroscopy and Spectral Analysis. 2020;40(9):2670–2675.
12. Yin C., Chu X., Yang S., Li L., Sui G. High-precision zoom camera calibration of stereo vision measurement system with single camera. Optical Technique. 2019;45(6):668–676.
13. Steger C., Ulrich M. A camera model for line-scan cameras with telecentric lenses. International Journal of Computer Vision. 202;129(3):80–99.
14. Machado A. S., Ignacio Priego-Quesada J., Jimenez-Perez I., Gil-Calvo M., Pivetta Carpes F., Perez-Soriano P. Influence of infrared camera model and evaluator reproducibility in the assessment of skin temperature responses to physical exercise. Journal of Thermal Biology. 2021;98(5):102913. doi: 10.1016/j.jtherbio.2021.102913.
15. Liu Y., Zhang J., Sen L. Pose estimation of curved objects based on binocular vision and vectors of the tangent plane. Laser & Optoelectronics Progress. 2020;57(4):041506. doi: 10.3788/lop57.041506.
16. Hsu H.-M., Cai J., Wang Y., Hwang J.-N., Kim K.-J. Multi-target multi-camera tracking of vehicles using metadata-aided Re-id and trajectory-based camera link model. IEEE Transactions on Image Processing. 2021;30:5198–5210. doi: 10.1109/tip.2021.3078124.
17. Liu J., Gao F., Luo X. A review of deep reinforcement learning based on value function and strategy gradient. Chinese Journal of Computers. 2019;42(6):1406–1438.
18. Liang X., Minhe F., Huang J., Wang Q., Ma Y., Liu Z. Novel deep reinforcement learning algorithm based on attention-based value function and autoregressive environment model. Journal of Software. 2020;31(4):948–966.
19. Zhu B., Jiang Y., Zhao J., Chen H., Deng W. W. A car-following control algorithm based on deep reinforcement learning. China Journal of Highway and Transport. 2019;32(6):53–60.
20. Mnih V., Kavukcuoglu K., Silver D., et al. Human-level control through deep reinforcement learning. Nature. 2015;518(7540):529–533. doi: 10.1038/nature14236.
21. Manchella K., Umrawal A. K., Aggarwal V. FlexPool: a distributed model-free deep reinforcement learning algorithm for joint passengers and goods transportation. IEEE Transactions on Intelligent Transportation Systems. 2021;22(4):2035–2047. doi: 10.1109/tits.2020.3048361.
22. Zhou J., Xue S., Xue Y., Liao Y., Liu J., Zhao W. A novel energy management strategy of hybrid electric vehicle via an improved TD3 deep reinforcement learning. Energy. 2021;224:120118. doi: 10.1016/j.energy.2021.120118.
23. Liu X., Liu Z., Duan G., Cheng J., Jiang X., Tan J. Precise and robust binocular camera calibration based on multiple constraints. Applied Optics. 2018;57(18):5130–5140. doi: 10.1364/ao.57.005130.
24. Jia Z., Yang J., Liu W., et al. Improved camera calibration method based on perpendicularity compensation for binocular stereo vision measurement system. Optics Express. 2015;23(12):15205–15223. doi: 10.1364/oe.23.015205.
