Skip to main content
Sensors (Basel, Switzerland) logoLink to Sensors (Basel, Switzerland)
. 2024 Jun 21;24(13):4035. doi: 10.3390/s24134035

SC-AOF: A Sliding Camera and Asymmetric Optical-Flow-Based Blending Method for Image Stitching

Jiayi Chang 1,2,3,*, Qing Li 2, Yanju Liang 3, Liguo Zhou 4,*
Editor: Yun Zhang
PMCID: PMC11244014  PMID: 39000817

Abstract

Parallax processing and structure preservation have long been important and challenging tasks in image stitching. In this paper, an image stitching method based on sliding camera to eliminate perspective deformation and asymmetric optical flow to solve parallax is proposed. By maintaining the viewpoint of two input images in the mosaic non-overlapping area and creating a virtual camera by interpolation in the overlapping area, the viewpoint is gradually transformed from one to another so as to complete the smooth transition of the two image viewpoints and reduce perspective deformation. Two coarsely aligned warped images are generated with the help of a global projection plane. After that, the optical flow propagation and gradient descent method are used to quickly calculate the bidirectional asymmetric optical flow between the two warped images, and the optical-flow-based method is used to further align the two warped images to reduce parallax. In the image blending, the softmax function and registration error are used to adjust the width of the blending area, further eliminating ghosting and reducing parallax. Finally, by comparing our method with APAP, AANAP, SPHP, SPW, TFT, and REW, it has been proven that our method can not only effectively solve perspective deformation, but also gives more natural transitions between images. At the same time, our method can robustly reduce local misalignment in various scenarios, with higher structural similarity index. A scoring method combining subjective and objective evaluations of perspective deformation, local alignment and runtime is defined and used to rate all methods, where our method ranks first.

Keywords: image stitching, sliding cameras, asymmetric optical flow, image blending

1. Introduction

Image stitching is a technology that can align and blend multiple images to generate a high-resolution, wide field-of-view and artifact-free mosaic. It has broad and promising applications in many fields such as virtual reality, remote sensing mapping, and urban modeling. The calculation of the global homography, as an important step in image stitching [1,2], directly determines the image alignment accuracy and the final user experience. However, global homography only works for planar scenes or rotation-only camera motions. For non-planar scenes or when the optical centers of cameras do not coincide, homography tends to cause misalignment, resulting in blurring and ghosting in the mosaic. It can also cause perspective deformation, making the final mosaic blurred and severely stretched at the edges. Many solutions have been proposed to solve the problems of parallax and perspective deformation in image stitching, so as to improve the quality of stitched images. But most state-of-the-art mesh-based [3,4,5] and multi-plane [6,7,8] methods are time-consuming and vulnerable to false matches.

In this work, an innovative image stitching method combining sliding camera (SC) and asymmetric optical flow (AOF), referred to as the SC-AOF method, is proposed to reduce both perspective deformation and alignment error. In the non-overlapping area of the mosaic, the SC-AOF method manages to keep the viewpoint of the mosaic the same as or one rotation around the camera Z axis from those of the input images. In the overlapping area of the mosaic, the viewpoint is changed from one input image viewpoint to another, which can effectively solve the perspective deformation at the edge. A global projection plane is estimated to project input images onto the mosaic. After that, an asymmetric optical flow method is employed to further align the images. In the blending, the softmax function and alignment error are used to dynamically adjust the width of the blending area to further eliminate ghosting and improve the mosaic quality. This paper makes the following contributions:

  • The SC-AOF method innovatively uses an approach based on sliding camera to reduce perspective deformation. Combined with either a global projection model or a local projection model, this method can effectively reduce the perspective deformation.

  • An optical-flow-based image alignment and blending method is adopted to further mitigate misalignment and improve the stitching quality of the mosaic generated by a global projection model.

  • Each step in the SC-AOF method can be combined with other methods to improve the stitching quality of those methods.

This article is organized as follows. Section 2 presents the related works. Section 3 first introduces the overall method of this article, then an edge stretching reduction method based on sliding camera and a local misalignment reduction method based on asymmetric optical flow are elaborated in detail. Section 4 presents our qualitative and quantitative experimental results compared with other methods. Finally, Section 5 summarizes our method.

2. Related Works

For the local alignment, APAP (as-projective-as-possible) [8,9] uses the weighted DLT (direct linear transform) method to estimate the location-dependent homography and then eliminate misalignment. However, if some key points match incorrectly, the image areas near these key points may have incorrect homography, resulting in serious alignment errors and distortion. APAP needs to estimate homography using DLT for each image cell, and therefore APAP runs much slower than the global homography warping. REW (robust elastic warping) [10,11] uses the TPS (thin-plate spline) interpolation method to convert discrete matched feature points into a deformation field, which is used to warp the image and achieve accurate local alignment. The estimation of TPS parameters and the deformation field is fast, so REW has excellent running efficiency. TFT (triangular facet approximation) [6] uses the Delaunay triangulation method and the matched feature points to triangulate the mosaic canvas, and the warping inside each triangle is determined by the homography calculated based on the three triangle vertices, so the false matches will lead to serious misalignment. TFT estimates a plane for every triangle instead of a homography for every cell, so TFT depends on the number of triangular facets in efficiency and runs faster than APAP generally. The warping-residual-based image stitching method [7] first estimates multiple homography matrices, and calculates warping residuals of each matched feature point using the multiple homography matrices. The homography of each region is estimated using moving DLT with the difference of the warping residuals as weight, which means the method can handle larger parallax than APAP, but is less robust to the incorrectly estimated homography and runs slower than APAP. The NIS (natural image stitching) [12] method estimates a pixel-to-pixel transformation based on feature matches and the depth map to achieve accurate local alignment. In [13], by increasing feature correspondences and optimizing hybrid terms, sufficient correct feature correspondences are obtained in the low-texture areas to eliminate misalignment. The two methods require additional runtime to enhance robustness, but also are susceptible to the uneven distribution and false matches of feature points.

For perspective deformation, SPHP (shape preserving half projective) [14,15] spatially combines perspective transformation and similarity transformation to reduce deformation. Perspective transformation can better align pixels in overlapping areas, and similarity transformation preserves the viewpoint of the original image in non-overlapping areas. AANAP (adaptive as-natural-as-possible) [16] derives the appropriate similarity transform directly based on matched feature points, and uses weights to gradually transit from perspective transform to similarity transform. The transitions from the homography of the overlapping area to the similarity matrix of the non-overlapping area adopted by SPHP and AANAP are artificial and unnatural, and can generate some “strange” homography matrices, causing significant distortion in the overlapping area. Both SPHP and AANAP require the estimation of homography or similarity matrices for each cell, and thus have the same efficiency issue as APAP. GSP (global similarity prior) [17,18] adds a global similarity prior to constrain the warping of each image so that it resembles a similarity transformation as a whole and avoids large perspective distortion. In SPW (single-projective warp) [19], the quasi-homography warp [20] is adopted to mitigate projective distortion and preserve the single perspective. SPSO (Structure Preservation and Seam Optimization) [4] uses a hybrid warping model based on multi-homography and mesh-based warp to obtain precise alignment of areas at different depths while preserving local and global image structures. GES-GSP (geometric structure preserving-global similarity prior) [21] employs deep learning-based edge detection to extract various types of large-scale edges, and further introduces large-scale geometric structure preservation to GSP to preserve the curves in images and protect them from distortion. GSP, SPW, SPSO and GES-GSP are based on content preserving warping and require constructing and solving a linear equation with m variables and n equations to acquire the corresponding coordinates after mesh warping, in which m is the number of cell vertices multiplied by 2, n is the number of alignment constraints, structural preservation constraints, and other constraints. Both m and n are generally larger, therefore more runtime is required.

Based on the above analysis, generating a natural mosaic quickly and robustly remains a challenging task.

3. Methodology

The flow chart of the SC-AOF algorithm is illustrated in Figure 1. The details on each of its steps are described below.

Figure 1.

Figure 1

Flow chart of SC-AOF method. After the detection and matching of feature points, the camera parameters are obtained in advance or estimated. Then the two warped images are calculated using SC method, and the mosaic that is coarsely aligned can be obtained. Finally, the AOF method is used to further align the two warped images to generate a blended mosaic with higher alignment accuracy.

Step 1: Feature point detection and matching. SIFT (scale-invariant feature transform) and SURF (speed-up robust feature) methods are generally used to detect and describe key points from two input images. Using the KNN (k-nearest neighbor) method, a group of matched points is extracted from the key points and used for camera parameter estimation in step 2 and global projection plane calculation in step 3.

Step 2: Camera parameter estimation. The intrinsic and extrinsic camera parameters are the basis of the SC method, and can be obtained in advance or estimated. When camera parameters are known, we can skip step 1 and directly start from step 3. When camera parameters are unknown, they can be estimated by minimizing the epipolar and planar errors, as described in Section 3.3.

Step 3: Sliding camera-based image projection. In this step, we estimate the global projection plane first, then adjust the camera projection matrix and generate a virtual camera in the overlapping area by interpolation, and obtain the warped images by global planar projection, as detailed in Section 3.1. Misalignment can be found in the two warped images obtained in the current step. Therefore, we need to use the AOF method in step 4 to further improve the alignment accuracy.

Step 4: Flow-based image blending. In this step, we first calculate the bidirectional asymmetric optical flow between the two warped images, then further align and blend the two warped images to generate a mosaic using the optical flow (see Section 3.2 for more details).

3.1. SC: Viewpoint Preservation Based on Sliding Camera

The sliding camera (SC) method is proposed for the first time to solve perspective deformation, and is the first step in the SC-AOF method. For this reason, this section will first introduce the stitching process of this method, and then detail how to calculate the global projection plane and the sliding projection matrix required by this method.

3.1.1. SC Stitching Process

In order to ensure that the mosaic can maintain the perspective of the two input images, the SC method is used. That is, in the non-overlapping area, the viewpoints of the two input images are preserved. In the overlapping area, the viewpoint of the camera is gradually transformed from I1 to I2.

As shown in Figure 2, the image I1 and I2 are back-projected onto the projection surface n, so that the corresponding non-overlapping areas Ω1, Ω2 and overlapping area Ωo are obtained. Assume that the pixels in the mosaic I are u1,u2,,u8, which correspond to the sampling points S1,S2,,S8 on the projection surface n. When the sampling points are within the projection area Ω1 of image  I1, the mosaic is generated from the viewpoint of I1.S1,S2,S3 are the intersection points of the backprojection lines of u1,u2,u3 in I1 and the projection surface n. Therefore, ui=P1Sii=1, 2, 3, where P1 is the projection matrix of I1. When the sampling points are within the projection area Ω2 of image I2, the mosaic is generated from the camera viewpoint of I2. Similarly, we obtain Si and ui=P2Si, where i = 6, 7, 8. In the overlapping area Ωo of I1 and I2, the SC method is used to generate a virtual camera, whose viewpoint gradually transitions from the viewpoint of I1 to that of I2. S4 and S5 are the intersection points of the back-projection lines of u4, u5 in the visual camera and projection plane n, respectively. The virtual camera’s image is generated from images I1 and I2 using perspective transformation. For example, pixel u4 of the virtual camera corresponds to pixel u14 in I1 and pixel u24 in I2, and are generated by blending the latter two pixels.

Figure 2.

Figure 2

Image stitching based on sliding cameras. n is the projection surface, which is fitted by scene points p1,p2,,p6. Stitched image I can be generated by projection of sampling points S1,S2,,S8. The points S1,S2,S3 in the area Ω1 are generated by back-projection of pixels in I1. Similarly, the points S6,S7,S8 in the area Ω2 are generated by back-projection of pixels in I2. The points S4,S5 in the area Ωo are generated by back-projection of pixels in virtual cameras. The pixel values of S4,S5 correspond to the fused pixel values of projection in I1 and I2. P1 and P2 are the camera projection matrices of images I1 and I2. To unify the pixel coordinates of I1 and I2, P2 is adjusted to P2 using the method in Section 3.1.3.

Global projection surface calculation. In order to match the corresponding pixels u14 of I1 and u24 of I2, the projection surface n needs to be as close as possible to the real scene point; we can use the moving plane method [7,8,9] or the triangulation method [6] to obtain a more accurate scene surface. Since the SC-AOF method will use the optical flow to further align the images, for the stitching speed and stability, only the global plane is calculated as the projection surface. Section 3.1.2 will calculate the optimal global projection surface using the matched points.

Sliding camera generation. Generally, since the pixel coordinates of I1 and I2 are not uniform, in the mosaic I, when I(u~)=I1(P1S) in the non-overlapping area of I1, I(u~)=I2(P2S) is false in the non-overlapping area of I2, where S is the sampling point on the projection surface. It is necessary to adjust the projection matrix of I2 to P2, so that I(u~)=I2(P2S). The red camera is shown in Figure 2. Section 3.1.3 will deduce the adjustment method of the camera projection matrix, and interpolate in the overlapping area to generate a sliding camera, and obtain the warped images of I1 and I2.

3.1.2. Global Projection Surface Calculation

The projection matrices of cameras C1 and C2 corresponding to images I1 and I2 are:

P1=K1[I3×3|0]P2=K2R[I3×3|t] (1)

where K1 and K2 are the intrinsic parameter matrices of C1 and C2 respectively; R is the inter-camera rotation matrix; and t is location of the optical center of C2 in the coordinate system C1.

The relationship between the projection u1 in I1 and the projection u2 in I2 of a 3D point p on plane n is:

u~2K2R(I3×3+tnT)K11u~1=Hu~1 (2)

where u~1 and u~2 are the homogeneous coordinates of u1 and u2, respectively. The intersection point p satisfies nTp+1=0. means that u~2 is parallel to Hu~1.

If camera parameters K1,K2, R and t are known, then we can deduce the following Equation (3) from Equation (2)

nTy~1=(RTy~2×t)T(RTy~2×y~1)(RTy~2×t)T(RTy~2×t)=b (3)

where y~1=K11u~1 and y~2=K21u~2 are the normalized coordinates of u~1 and u~2, respectively.

We use Equation (3) of all matched points to construct an overdetermined equation and obtain the fitted global plane n by solving this equation. Since the optical-flow-based stitching method will be used to further align the images, the RANSAC method is not used here to calculate the plane with the most inliers. Instead, the global plane that fits all feature points as closely as possible is selected, misalignment caused by global plane projection will be better solved during optical flow blending.

3.1.3. Projection Matrix Adjustment and Sliding Camera Generation

To preserve the viewpoint in the non-blending area of I2, it is only required to satisfy Iu~=I2Nu~=I2u~2, where u~ is the homogeneous coordinate of a pixel in the mosaic, u~2 is the homogeneous coordinate of a pixel in I2, N is a similarity transformation between I2 and I, and can be obtained by fitting the matched feature points:

N*=minSj=1n||Nu~1ju~2j||2 (4)

where u~1j and u~2j are the homogeneous coordinates of pixels in I1 and I2 respectively.

Therefore, in the non-overlapping area of I2, u~=N*1u~2=N*1K2R2(St), where S is the corresponding 3D point of u~2 on plane n. So we get the projection matrix P2=N*1K2R2.

By RQ decomposition, the internal parameter matrix K2 and rotation R2 are extracted from P2:

N*1K2R2=K2K21N*1K2R2=K2K*R*R2=K2R2 (5)

where K2 and R2 are upper triangular matrix and rotation matrix respectively; and the third line of both matrices is (001).

Compared with P2, P2 has a different intrinsic parameter matrix, and its rotation matrix only differs by one rotation around Z axis, and its optical center t is not changed.

tm=(1m)03×1+mt (6)
Km=(1m)K1+mK2 (7)
qm=sin((1m)θ)sin(θ)q1+sin(mθ)sin(θ)q2 (8)

where q1,q2,qm represent the quaternions corresponding to I3×3, R2 and Rm, θ is the angle between q1 and q2, and m is the weighting coefficient.

As depicted by Figure 3, the weighting coefficient m can be calculated by the method in AANAP [16]:

m=kmP*,kmKM/|kmKM|2 (9)
Figure 3.

Figure 3

The diagram of gradient weight. The quadrilateral is the boundary of the overlapping area of I1 and mapped image of I2 using H1, where O is the center of I1 and O is the warped point of the center point of I2 using H1. km and KM are the projection points closest to O and O on the line OO of the quadrilateral vertices, respectively. P* indicates the pixel coordinates within the overlapping area that need to calculate weighted parameter m.

In the overlapping area, if u corresponds to sliding camera (Km,Rm,tm), then the relation between u and ui in Ii(i=1,2) can be expressed as:

u~=KmRm(I+tmnT/d)K11u~1=Hm1u~1 (10)
u~=Hm1H1u~2=Hm2u~2 (11)

Equations (10) and (11) are also applicable to the non-overlapping area. Projecting I1 and I2 through Hm1 and Hm2 onto the mosaic, respectively, to get warped images I1 and I2. Obviously:

Ii(u~)=Ii((Hmi)1u~)(i=1,2) (12)

Figure 4 shows the experiment result on two school images used in [10]. Due to the parallax between I1 and I2, blending I1 and I2 will cause ghosting. Therefore, the next section will use an optical-flow-based blending method (AOF) to further align the images.

Figure 4.

Figure 4

Figure 4

Image stitching based on sliding cameras and global projection plane. (a,b) show the warped images I1 and I2 of the input images of a school; (c) shows the average blending images of I1 and I2. That is, in the overlapping area, the blended value is (I1+I2)/2.

3.2. AOF: Image Alignment Based on Asymmetric Optical Flow

The mosaic generated by the SC method will inevitably have misalignment in most cases. So, the optical-flow-based method is further employed to achieve more accurate alignment. This section firstly introduces the image alignment process based on asymmetric optical flow (AOF), and then details the calculation method of AOF.

3.2.1. Image Blending Process of AOF

I1 and I2 are projected onto the custom projection surface to obtain warped images I1 and I2, which are then blended to generate the mosaic I. As the 3D points of the scene are not always on the projection plane, ghosting artifacts can be seen in the mosaic, as shown in Figure 4 in the previous section. Direct multi-band image blending will lead to artifacts and blurring. As shown in Figure 5, point P is projected to two points p1 and p2 in the mosaic, resulting in duplication of content. To solve the ghosting problem in the mosaic, the optical-flow-based blending method in [22] is adopted.

Figure 5.

Figure 5

Image blending based on optical flow. B1E2 is the projection surface of the mosaic. In the overlapping areas (denoted by B2E1) of I1 and I2, we need to blend I1 and I2. The 3D point P is outside the projection surface. When P is projected onto the projection surface, ghosting points p1 and p2 appear. Through the weighted blending of asymmetric optical flow, p1 and p2 are merged into point p~, which solves the ghosting problem of stitching.

Suppose F21p2 represents the optical flow value of p2 in I2 and F12p1 represents the optical flow value of p1 in I1. If the blending weight of pixel p~ in the overlapping area is λ (from the non-overlapping area of I1 to the non-overlapping area of I2), λ gradually transitions from 0 to 1, as shown in Figure 5, then after blending, the pixel value of image I at p~ is:

Ip~=1λI1p~1+λI2(p~2) (13)

where p~1=p~+λF21p~ represents the corresponding value of p~ in I1, and p~2=p~+1λF12p~ represents the corresponding value of p~ in I2. That is, for any pixel p~ in the overlapping area of the mosaic, its final pixel value can be obtained by a weighted combination of its corresponding values in the two warped images using optical flow.

To achieve get a better blending effect, following the method presented by Meng and Liu [23], a softmax function is used to facilitate the mosaic transition quickly from I1 to I2, narrowing the blending area. Furthermore, if the optical flow value of a warped image is larger, the salience is higher, and the blending weight of the warped image should be increased accordingly. Therefore, the following blending weight β can be employed:

β=exp(αsλ(1+αmM2))exp(αs(1λ)(1+αmM1))+exp(αsλ(1+αmM2)) (14)

where M1=||F21(p~)|| and M2=||F12(p~)|| represents the optical flow magnitude; αs is the shape coefficient of the softmax function; and αm denotes the enhancement coefficient of the optical flow. The larger αs and αm are, the closer β is to 0 or 1, and the smaller the image transition area becomes.

Also, similar to multi-band blending, a wider blending area is used in smooth and color-consistent areas, and a narrower blending area is used in color-inconsistent areas. And the pixel consistency is measured using Dc:

Dc=||I1(p~1)I2(p~2)|| (15)

The final blending parameter α is obtained:

λd=tanh (cdDc) (16)
α=(1λd)λ+λdβ (17)

β corresponds to a fast transition from I1 to I2, λ corresponds to a linear transition from I1 to I2. When the color differs slightly, the transition from I1 to I2 is linear, and when the color difference is large, we tend to have a fast transition from I1 to I2.

Then the pixel value of the mosaic is:

I(p~)=(1α)I1(p~1)+αI2(p~2) (18)

The curve in the left panel of Figure 6 shows the curve of β with respect to λ under different optical flow intensities. β can be used to achieve quick transition of the mosaic from I1 to I2, narrowing the transition area. In the case of a large optical flow, the blending weight of the corresponding image can be increased to reduce the transition area. The curve in the right panel of Figure 6 shows the influence of λd on the curve of α as a function of λ. When λd is small, a wider fusion area tends to be used; otherwise, a narrower fusion area is used, which is similar to the blending of different frequency bands in a multi-band blending method.

Figure 6.

Figure 6

Blending parameter curves. The figure on the left shows the β curves at different optical flow intensities. The right figure shows the α curve at different λd values.

3.2.2. Calculation of Asymmetric Optical Flow

The general pipeline of the optical flow calculation is to construct an image pyramid, calculate the optical flow of each layer from coarse to fine, and use the estimated current-layer optical flow divided by the scaling factor as the initial optical flow of the finer layer until the optical flow of the finest layer is obtained [23,24,25,26]. Different methods are proposed to achieve better solutions that satisfy brightness constancy assumptions, solve large displacements and appearance variation [27,28], and address edge blur and improve temporal consistency [29,30,31]. Recently, some deep learning methods have been proposed. For example, RAFT (recurrent all-pairs field transforms for optical flow) [32] extracts per-pixel features, builds multi-scale 4D correlation volumes for all pairs of pixels, and iteratively updates a flow field through a recurrent unit. FlowFormer (optical flow Transformer) [33] is based on a transformer neural network architecture with a novel encoder which effectively aggregates cost information of correlation volume into compact latent cost tokens, and a recurrent cost decoder which recurrently decodes cost features to iteratively refine the estimated optical flows.

In order to improve the optical flow calculation speed, we use the method based on optical flow propagation and gradient descent adopted in Facebook surround360 [34] to calculate the optical flow. When calculating the optical flow of each layer, first calculate the optical flow of each pixel from top to bottom and from left to right. From the optical flow values of the current-layer left and top pixels and upper-layer same-position pixel, the value with minimum error represented by Equation (19) is selected as the initial value of the current pixel. Then a gradient descent method is performed to update the optical flow value of the current pixel, and is then spread to the right and bottom pixels, as a candidate for the initial optical flow of the right and bottom pixels. After completing the forward optical flow propagation from top to bottom and from left to right, perform a reverse optical flow propagation and gradient descent from bottom to top and from right to left to obtain the final optical flow value.

When calculating the optical flow value F(u) of pixel u, the error function E(F(u)) used is:

E(F(u))=EI+αSES+αTET (19)
EI(F(u))=||I1(u)I2(u+F(u))|| (20)
ES(F(u))=||F(u)G(u;σ)F(u)|| (21)
ET(F(u))=||F(u)D(1/W,1/H)|| (22)

where EI denotes the optical flow alignment error of the edge image (which is Gaussian filtered to improve the robustness); ES denotes the consistency error of the optical flow; G(u;σ)F(u) denotes the Gaussian-filtered optical flow of pixel u; ET denotes the magnitude error after normalization of optical flow, with excessively large optical flow being penalized; W and H are the width and height of the current-layer image, respectively, D(1/W,1/H) denotes the diagonal matrix with diagonal elements 1/W and 1/H.

3.3. Estimation of Image Intrinsic and Extrinsic Parameters

The SC-AOF method requires known camera parameters of images I1 and I2. When only the intrinsic parameters K1 and K2 of an image are known, the essential matrix t×R between two images can be obtained by feature point matching, and the rotation matrix R and translation vector t between images can be obtained by decomposing the essential matrix. When both intrinsic and extrinsic parameters are unknown, the intrinsic parameters can be estimated by calibration [35,36] firstly, and then the extrinsic parameters of the image can be estimated accordingly. In these cases, both intrinsic and extrinsic parameters of image I1 and I2 can be estimated robustly.

When none of the above methods is feasible, it is necessary to calculate the fundamental matrix from the matched feature points and restore the camera internal and external parameters.

When the camera has zero skew, the known principal point and aspect ratio, then each intrinsic parameter matrix has only one degree of freedom (focal length of the camera). The total degree of freedom of the camera parameters is 7 (where t has 2 degrees of freedom due to the inability to recover scale, R has 3 degrees of freedom, and each camera has 1 degree of freedom), which is equal to the fundamental matrix F’s degree of freedom. The internal and external parameters of the image can be recovered using a self-calibration method [37]. But even if these constraints are met, the camera parameters by solved [37] suffer from large errors when the scene is approximately planar and the matching error is large. Therefore, we use the method of optimizing the objective function in [6] to solve the internal and external parameters of the camera.

To obtain an accurate fundamental matrix, firstly, the feature points need to be distributed more evenly in the image. As shown in Figure 7, a uniform and sparse distribution of feature points can both reduce the computation time and obtain more robust intrinsic and extrinsic camera parameters and global projection planes, which will lead to improved stitching results.

Figure 7.

Figure 7

The impact of feature point distribution on stitching results. The feature points are marked by small color circles, and the blue boxes indicate the regions where the enlarged images are located in the mosaics. The feature points in (a) are concentrated in the grandstand. The corresponding mosaic (c) is misaligned in the playground area. The feature points in (b) are evenly distributed within a 2×2 grid. Although the total number of feature points is smaller, the mosaic (d) has better quality. (e,f) show the detail of mosaics.

Secondly, it is necessary to filter the matched feature points to exclude the influence of outliers. Use the similarity transformation to normalize the matched feature points. After normalization, the mean value of the feature points is 0, and the average distance to the origin is 2.

Thirdly, multiple homographies are estimated to exclude outlier points. Let Fcond and n denote all matched feature points and the total number of matched feature points. In Fcond, the RANSAC method with threshold η=0.01 is applied to compute homography H1 and its inlier set Finlier1, and the matches of isolated feature points which have no neighboring points within a 50 pixel distance are removed from Finlier1. A new candidate set is generated by removing Finlier1 from Fcond. Repeat the above steps to calculate m homography matrices Hm and corresponding inlier set Finlierm until ||Finlierm||<20 or |Finlierm||<0.05n. The final inlier set is Finlier=i=1mFinlieri. If m=1, then there is only one valid plane. In this case, apply the RANSAC method with threshold η=0.1 to recalculate homography H1 and the corresponding inlier set Finlier1.

After excluding the outliers, for any matched points {x1,x2} in the inlier set Finlier, the cost function is:

E(x1,x2)=1λh(re,σe)+λh(rp,σp) (23)

where λ balanced epipolar constraint and the infinite homography constraint, and generally take λ=0.01. h is a robust kernel function, which can mitigate the effect of mis-matched points on the optimization of camera internal and external parameters. re and rp denote the projection errors of the epipolar constraint and of the infinite homography constraint, respectively:

re=1ρx2TK2T[t]×RK11x1=1ρx2TFx1 (24)
rp=x21ωK2RK11x1 (25)

where ρ denotes the length of the vector composed of the first two components of Fx1. That is, assuming Fx1=(a,b,c)T, then ρ=a2+b2. w represents the third component value of the vector K2RK11x1.

4. Experiment

To verify the effectiveness of the SC-AOF method, the mosaics generated by our method and the existing APAP [4], AANAP [16], SPHP [14], TFT [7], REW [10] and SPW [18] methods are compared on some typical datasets used by others to verify the feasibility and advantages of the SC-AOF method in solving deformation and improving alignment. Next, the SC-AOF method is used together with other methods to demonstrate its compatibility. The image pairs used in the comparison experiment are shown in Figure 8.

Figure 8.

Figure 8

The image dataset for comparative experiments. The image pairs are initially used by stitching methods such as APAP, AANAP, and REW.

4.1. Effectiveness Analysis of SC-AOF Method

In this section, various methods of image stitching are compared and analyzed based on three indicators: perspective deformation, local alignment and running speed. The experimental setup is as follows.

  • The first two experiments compare typical methods for solving perspective deformation and local alignment, respectively, and all the methods in the first two experiments are included in the third experiment to show the superiority of the SC-AOF method in all aspects.

  • Since the averaging methods generally underperform compared to linear blending ones, all methods to be compared adopt linear blending to achieve the best performance.

  • All methods other than ours use the parameters recommended by their proposers. Our SC-AOF method has the following parameter settings in optical-flow-based image blending: αs= 10, αm= 100, and cd= 10.

4.1.1. Perspective Deformation Reduction

Figure 9 shows the results of the SC-AOF method versus the SPHP, APAP, AANAP and SPW methods for perspective deformation reduction in image stitching. School, building and park square datasets were used in this experiment. We can see from Figure 9 that, compared with the other methods, our SC-AOF method changes the viewpoint of the stitched image in a more natural manner and effectively eliminates perspective deformation. As explained below, all other methods underperform compared to our SC-AOF method.

Figure 9.

Figure 9

Figure 9

Comparison of perspective deformation processing. From the first row to the last row, the mosaics generated by our method, AANAP, SPHP, SPW and APAP on the datasets are presented, respectively. The red elliptical boxes indicate the unnatural transitions in the mosaics.

The image stitched using the APAP method has its edges stretched to a large extent. This is because it does not process perspective deformation. This method only serves as a reference to verify the effectiveness of perspective-deformation-reducing algorithms.

The AANAP algorithm can achieve a smooth transition between the two viewpoints, but results in severely “curved edges”. And there is even more severe edge stretching for the park square dataset than that of the APAP method. This is because, when the AANAP method extrapolates from homographies, it linearizes the homography in addition to similarity transformation, causing affine deformation in the final transformation.

Compared with the APAP method, the SPW method makes no significant improvement in perspective deformation, except for the image in the first row. SPW preserves perspective consistency, so a multiple-viewpoint method excels in solving perspective deformation compared to single-viewpoint method.

The SPHP algorithm performs well overall. However, it causes severe distortions in some areas (red circles in Figure 8c) due to the rapid change of viewpoints. This is because the SPHP method estimates the similarity transformation and interpolated homographies from global homography. As a result, the similarity transformation cannot reflect the real scene information and the interpolated homographies may deviate from a reasonable image projection.

4.1.2. Local Alignment

Figure 10 and Figure 11 show the results of the SC-AOF method versus APAP, TFT and REW methods for local alignment in image stitching. It can be seen that SC-AOF performs well in all scenes, showing the effectiveness of our method in local alignment.

Figure 10.

Figure 10

Qualitative comparison on the garden image pairs. From the first row to the last row, the mosaics and detail views generated by our method, APAP, TFT, REW are presented, respectively. The red boxes indicate the regions where the enlarged images are located in the mosaics. The red circles highlight errors and distortions.

Figure 11.

Figure 11

Comparison of image alignment on the wall and cabinet image pairs. From the first row to the last row, the mosaics and detail views generated by our method, APAP, TFT, REW are presented, respectively. The blue boxes indicate the region where the enlarged images are located. The red circles highlight errors and distortions.

  • The APAP method performs fairly well in most images, though with some alignment errors. This is because the moving DLT method smooths the mosaics to some extent.

  • The TFT-generated stitched image is of excellent quality in planar areas. But when there is a sudden depth change in the scene, there are serious distortions. This is because large errors appear when calculating planes using three vertices of a triangle in the area with sudden depth changes.

  • The REW method has large alignment errors in the planar area and aligns the images better than the APAP and TFT method in all other scenes. This is because the fewer feature points in the planar area might be filtered out as mismatched points by the REW method.

The SSIM (structural similarity) [38] is employed to objectively describe the alignment accuracy of different methods. SSIM measures the similarity between two images J1 and J2 to be blended in the overlapping area. For our two-step alignment method, J1(u) = I1u+λF21u, J2(u) = I2u+(1λ)F12u. The structural similarity is defined as:

SSIM(J1,J2)=(2μ1μ2+C1)(2σ12+C2)(μ12+μ22+C1)(σ12+σ22+C2) (26)

where μ1 and σ1 represent the mean and standard deviation of pixel values within the overlapping area O of J1, respectively. μ2 and σ2 are the corresponding mean and standard deviation of J2, respectively. σ12 is the covariance of pixel values in the overlapping area of J1 and J2. C1=(k1L)2 and C2=(k2L)2 are constants used to maintain stability, where k1=0.01, k1=0.03, and L is the dynamic range of pixel values (for 8-bit grayscale images, L=255).

The scores of all methods on the datasets building1, building2, garden, building, school, park-square, wall, cabinet, campus-square and racetracks are listed in Table 1. The best SSIM value is highlighted in bold.

Table 1.

Comparison of SSIM.

APAP AANAP SPHP TFT REW SPW Ours
building1 0.88 0.87 0.75 0.88 0.89 0.86 0.90
building2 0.82 0.82 0.75 0.92 0.76 0.81 0.93
garden 0.90 0.92 0.81 0.82 0.95 0.92 0.93
building3 0.93 0.94 0.89 0.70 0.96 0.90 0.96
school 0.89 0.91 0.67 0.90 0.91 0.87 0.93
wall 0.83 0.91 0.68 0.90 0.82 0.81 0.92
park-square 0.95 0.96 0.80 0.97 0.97 0.95 0.97
cabinet 0.91 0.91 0.87 0.89 0.98 0.92 0.96
campus-square 0.92 0.94 0.84 0.95 0.98 0.93 0.97
racetracks 0.74 0.79 0.68 0.86 0.83 0.70 0.85
  • APAP and AANAP have high scores on all image pairs, but the scores are lower than our method and REW, proving that APAP and AANAP blur mosaics to some extent.

  • When SPHP is not combined with APAP, only the global homography is used to align the images, resulting in lower scores compared to other methods.

  • TFT has higher scores on the datasets except for the building dataset. TFT can improve alignment accuracy but also bring instability.

  • SPW combines quasi-homography and content-preserving warping to align images, which add other constraints while also reducing the accuracy of alignment, resulting in lower scores compared to REW and our method.

  • Both REW and our method use a global homography matrix to coarsely align the images. Afterwards, in REW and our method, a deformation field and optical flow are applied to further align the images, respectively. Therefore, both methods have higher scores and robustness than other methods.

4.1.3. Stitching Speed Comparison

The running speed is a direct reflection of the efficiency of each stitching method. Figure 12 shows the speed of the SC-AOF method versus the APAP, AANAP, SPHP, TFT, REW and SPW methods. The same image pairs as in the SSIM comparison are used in this experiment. It can be seen that the REW algorithm has the fastest stitching speed. The reason is that it only needs to calculate TPS parameters based on feature point matching and then compute the transformations of grid points quickly. Our SC-AOF method ranks second in terms of stitching speed, and the AANAP algorithm requires the longest running time. Both the APAP and AANAP methods calculate the local homographies based on moving DLT, and the AANAP method also needs to calculate the Taylor expansion of anchor points.

Figure 12.

Figure 12

Comparison on elapsed time. Our method is second only to REW in speed and is superior to other methods.

4.1.4. Overall Scoring for All the Methods

In order to comprehensively and quantitatively evaluate our method and other methods in improving local alignment and reducing perspective deformation, we define a scoring method that assigns an integer score ranging from 0 to 10 to estimate the effectiveness and efficiency of stitching each image pair using each method. The total score is obtained by adding up the scores from four aspects:

  1. The subjective scoring of perspective deformation reduction. The scores from 0 to 2, respectively, indicate severe deformation, slight relief of deformation, and less deformation.

  2. The subjective scoring of local alignment. The score ranges from 0 to 2, where 0 indicates obvious ghosting in many regions, 1 indicates few or mild mismatches, and 2 indicates no apparent alignment errors.

  3. The objective scoring of local alignment. The score ranges from 0 to 3. We define the mean and standard deviation of the SSIM values of different methods on the same image pair as μ and σ, the SSIM of current method is x, the score of the method is 0, 1, 2 and 3, respectively, when x satisfies xμ<σ, xμσ,0, xμ0,σ and xμ>σ.

  4. The scoring of running time. Like the objective scoring for local alignment, we score 0 when the running time of the method is greater than the mean plus standard deviation. When the time is less than the mean plus standard deviation and greater than the mean, the score is 1. The score is 2 when the time is less than the mean and greater than the mean minus standard deviation. Otherwise, the score is 3.

The scoring results of these methods on the image pairs are shown in Table 2. The image pairs in Table 2 include those used in SSIM and runtime comparison, as well as the test image pairs in the Appendix B (specific comparison of the mosaics generated by different methods are shown in the Appendix B). Every scoring is displayed in the format as “the score of perspective deformation reduction + the subjective score of local alignment + the objective score of local alignment + the score of running time = the overall score”. The highest score is bolded and highlighted. Our SC-AOF method has the highest scores in all the image pairs except the worktable image pair. Given that our method and REW all scored highly and have the same scores on some image pairs, in order to prove that our method is indeed ahead of REW, rather than due to statistical bias, we performed a Wilcoxon test using MATLAB 2018b on all scores of our method and REW. The resultant p-values of 0.0106 and h = 1 prove that the scores of REW and our method come from different distributions, our method has the better overall performance, and our method can maintain a desirable operation efficiency while guaranteeing the final image quality. Our method can have broad applications and promotion significance.

Table 2.

The soring results on the image pairs.

APAP AANAP SPHP TFT REW SPW Ours
building1 1 + 2 + 2 + 2 = 7 2 + 2 + 2 + 0 = 6 2 + 2 + 0 + 2 = 6 1 + 2 + 2 + 1 = 6 1 + 2 + 2 + 3 = 8 1 + 2 + 1 + 2 = 6 2 + 2 + 2 + 2 = 8
building2 0 + 1 + 1 + 2 = 4 1 + 2 + 1 + 0 = 4 2 + 0 + 0 + 2 = 4 1 + 0 + 3 + 0 = 4 1 + 2 + 0 + 3 = 6 1 + 1 + 1 + 2 = 5 2 + 2 + 3 + 2 = 9
garden 1 + 2 + 2 + 1 = 6 2 + 2 + 2 + 0 = 6 2 + 1 + 0 + 1 = 4 1 + 1 + 0 + 2 = 4 1 + 2 + 3 + 3 = 9 2 + 2 + 2 + 1 = 7 2 + 2 + 3 + 2 = 9
building3 1 + 1 + 2 + 2 = 6 2 + 1 + 2 + 0 = 5 2 + 2 + 1 + 2 = 7 0 + 0 + 0 + 0 = 0 1 + 2 + 2 + 3 = 8 1 + 2 + 2 + 2 = 7 2 + 2 + 2 + 2 = 8
school 0 + 2 + 2 + 2 = 6 2 + 2 + 2 + 0 = 6 2 + 2 + 0 + 2 = 6 0 + 2 + 2 + 2 = 6 0 + 2 + 2 + 2 = 6 1 + 2 + 2 + 2 = 7 2 + 2 + 2 + 2 = 8
wall 0 + 1 + 1 + 2 = 4 2 + 2 + 2 + 0 = 6 2 + 0 + 0 + 2 = 4 1 + 2 + 2 + 1 = 6 1 + 1 + 1 + 2 = 5 0 + 1 + 1 + 2 = 4 2 + 2 + 3 + 2 = 9
park-square 1 + 2 + 2 + 2 = 7 1 + 2 + 2 + 0 = 5 2 + 0 + 0 + 1 = 3 1 + 2 + 2 + 2 = 7 1 + 2 + 2 + 3 = 8 1 + 2 + 2 + 0 = 5 2 + 2 + 2 + 2 = 8
cabinet 1 + 1 + 1 + 2 = 5 2 + 1 + 1 + 0 = 4 2 + 2 + 0 + 2 = 6 2 + 0 + 1 + 2 = 5 2 + 2 + 3 + 2 = 9 1 + 2 + 2 + 2 = 7 2 + 2 + 3 + 2 = 9
campus-square 0 + 2 + 1 + 2 = 5 1 + 2 + 2 + 0 = 5 2 + 2 + 0 + 2 = 6 0 + 2 + 2 + 2 = 6 0 + 2 + 3 + 3 = 8 0 + 2 + 1 + 2 = 5 2 + 2 + 2 + 2 = 8
racetracks 2 + 1 + 1 + 2 = 6 2 + 1 + 2 + 0 = 5 2 + 0 + 0 + 1 = 3 2 + 2 + 3 + 2 = 9 2 + 2 + 2 + 3 = 9 1 + 1 + 0 + 1 = 3 2 + 2 + 3 + 2 = 9
roundabout 2 + 2 + 2 + 2 = 8 2 + 2 + 2 + 0 = 6 2 + 2 + 0 + 1 = 5 2 + 1 + 2 + 1 = 6 2 + 2 + 2 + 2 = 8 2 + 2 + 0 + 2 = 6 2 + 2 + 2 + 2 = 8
fence 1 + 2 + 2 + 2 = 7 2 + 2 + 2 + 0 = 6 2 + 1 + 0 + 2 = 5 1 + 1 + 2 + 1 = 5 1 + 2 + 2 + 2 = 7 2 + 2 + 2 + 2 = 8 2 + 2 + 2 + 2 = 8
railtracks 2 + 1 + 1 + 2 = 6 2 + 2 + 2 + 0 = 6 2 + 0 + 0 + 2 = 4 2 + 2 + 2 + 2 = 8 2 + 2 + 2 + 2 = 8 2 + 1 + 1 + 0 = 4 2 + 2 + 3 + 2 = 9
temple 1 + 2 + 2 + 2 = 7 2 + 2 + 2 + 0 = 6 2 + 2 + 0 + 2 = 6 1 + 1 + 2 + 2 = 4 1 + 2 + 2 + 2 = 7 1 + 1 + 1 + 2 = 5 2 + 2 + 2 + 2 = 8
corner 2 + 2 + 2 + 1 = 7 2 + 1 + 2 + 0 = 5 2 + 1 + 1 + 2 = 6 2 + 0 + 0 + 2 = 4 2 + 2 + 2 + 2 = 8 2 + 1 + 2 + 2 = 7 2 + 2 + 2 + 2 = 8
shelf 2 + 2 + 2 + 1 = 7 2 + 2 + 2 + 0 = 6 2 + 2 + 0 + 2 = 6 2 + 1 + 2 + 2 = 7 2 + 2 + 2 + 3 = 9 2 + 2 + 2 + 1 = 7 2 + 2 + 2 + 2 = 8
standing-he 1 + 1 + 2 + 2 = 6 2 + 1 + 1 + 1 = 5 2 + 1 + 1 + 2 = 6 0 + 0 + 1 + 2 = 3 1 + 1 + 2 + 2 = 6 1 + 1 + 2 + 0 = 4 2 + 1 + 3 + 2 = 8
foundation 1 + 2 + 2 + 2 = 7 2 + 2 + 2 + 0 = 6 2 + 2 + 0 + 2 = 6 1 + 1 + 3 + 2 = 7 2 + 2 + 2 + 2 = 8 1 + 2 + 1 + 0 = 4 2 + 2 + 2 + 2 = 8
guardbar 1 + 1 + 2 + 1 = 5 2 + 1 + 2 + 0 = 5 2 + 1 + 0 + 2 = 5 1 + 1 + 3 + 2 = 7 1 + 1 + 2 + 2 = 6 1 + 1 + 1 + 1 = 4 2 + 1 + 2 + 2 = 7
office 1 + 1 + 2 + 2 = 6 2 + 1 + 2 + 0 = 5 2 + 1 + 0 + 2 = 5 0 + 1 + 2 + 1 = 4 1 + 1 + 1 + 3 = 6 0 + 1 + 2 + 1 = 4 2 + 2 + 3 + 2 = 9
plantain 1 + 2 + 2 + 2 = 7 1 + 2 + 2 + 0 = 5 2 + 2 + 1 + 2 = 7 0 + 0 + 0 + 1 = 1 2 + 1 + 2 + 3 = 8 1 + 2 + 2 + 2 = 7 2 + 2 + 2 + 2 = 8
building4 1 + 1 + 2 + 1 = 5 2 + 1 + 2 + 0 = 5 2 + 1 + 0 + 2 = 5 1 + 1 + 2 + 2 = 6 1 + 2 + 2 + 2 = 7 1 + 2 + 1 + 2 = 6 2 + 2 + 3 + 2 = 9
potberry 2 + 2 + 2 + 2 = 8 2 + 2 + 2 + 2 = 8 2 + 2 + 0 + 2 = 6 2 + 2 + 2 + 1 = 7 2 + 1 + 2 + 3 = 8 1 + 1 + 1 + 0 = 3 2 + 2 + 2 + 2 = 8
lawn 1 + 2 + 2 + 2 = 7 1 + 2 + 2 + 2 = 7 2 + 2 + 0 + 0 = 4 1 + 2 + 2 + 1 = 6 1 + 2 + 2 + 3 = 8 1 + 2 + 2 + 1 = 6 2 + 2 + 2 + 2 = 8
worktable 2 + 1 + 2 + 2 = 7 2 + 1 + 1 + 0 = 4 1 + 1 + 0 + 2 = 4 2 + 0 + 2 + 2 = 6 2 + 2 + 2 + 2 = 8 2 + 1 + 2 + 0 = 5 2 + 2 + 2 + 2 = 8

4.2. Compatibility of SC-AOF Method

The SC-AOF method can not only be used independently to generate stitched image with reduced perspective deformation and low alignment error, but also be decomposed (into SC method and image blending method) and combined with other methods to improve the quality of the mosaic.

4.2.1. SC Module Compatibility Analysis

The sliding camera (SC) module in the SC-AOF method can not only be used in the global alignment model, but also be combined with other local alignment models (e.g., APAP and TFT) to solve perspective deformation while maintaining the alignment accuracy. The implementation steps are as follows.

  1. Use the global similarity transformation to project I2 onto the I1 coordinate system to calculate the size and mesh vertices of the mosaic;

  2. Use Equations (6)–(9) to calculate the weights of mesh vertices and the projection matrix, replace the homography H in (2) with the homography matrix in local alignment model, and bring them into (12) to compute the warped images and blend them.

Figure 13 presents the stitched images when using the TFT algorithm alone vs. using the TFT algorithm combined with the SC method. The combined method is more effective in mitigating edge stretching, and it generates more natural images. This shows that the SC method can effectively solve perspective deformation suffered by the local alignment method.

Figure 13.

Figure 13

The combination of TFT and moving cameras method. (a) The mosaics created using TFT. (b) The mosaics obtained by adding the moving camera method to TFT.

4.2.2. Blending Module Compatibility Analysis

The asymmetric optical-flow-based blending in the SC-AOF method can also be used in other methods to enhance the final stitching effect. The implementation steps are as follows.

  1. Generate two projected images using one of the other algorithms and calculate the blending parameters based on the overlapping areas;

  2. Set the optical flow value to be 0, replace linear blending parameter λ with α in Equation (17) to blend warped images, preserve the blending band width in the low-frequency area and narrow the blending width in the high-frequency area to obtain a better image stitching effect.

Figure 14 shows the image stitching effect of the APAP algorithm when using linear blending vs. when using our blending method. It can be seen that the blurring and ghosting in the stitched image are effectively mitigated when using our blending method. This shows that our blending algorithm can blend the aligned images better.

Figure 14.

Figure 14

The combination of APAP and our blending method. (a) The mosaic and detail view generated by the APAP using linear blending. (b) The results of APAP combined with our blending method. The red elliptical boxes indicate the regions where the enlarged images are located.

5. Conclusions

In this paper, to solve the perspective deformation and misalignment in image stitching using homographies, a SC-AOF method is proposed. In image warping, a new virtual camera and a projection matrix are generated as the observation perspective in the overlapping area by interpolating between two projection matrices. The overlapping area transitions gradually from one viewpoint to another to achieve preservation of the viewpoint and the smooth transition of the stitched image, and thus solve the perspective deformation problem. In image blending, the optical-flow-based blending algorithm is proposed to further improve alignment accuracy. The width of the blending area is automatically adjusted according to the softmax function and alignment accuracy. Finally, extensive comparison experiments are conducted to demonstrate the effectiveness of our algorithm in reducing perspective deformation and improving alignment accuracy. In addition, our algorithm had broad applicability, as its component modules can be used with other algorithms to mitigate edge stretching and improve alignment accuracy.

However, the proposed local alignment method may fail if the input images contain large parallax, which cause severe occlusion to prevent us from obtaining the correct optical flow. The problem of local alignment failure caused by large parallax also exists in other local alignment methods. Exploring more robust optical flow calculation and occlusion processing methods to reduce misalignment in a large parallax scene is an interesting research direction for future work.

Appendix A

Given that the abundance of symbols and abbreviations in this paper can lead to reading confusion, the symbols and abbreviations are listed and explained in Table A1 and Table A2.

Table A1.

The symbols and their explanations.

Symbols Description
I1,I2,I the source image pair and the final mosaic
n the global projection plane
Ω1,Ω2 the non-overlapping area of I1,I2
Ω0 the overlapping area of I1 and I2
P1,P2 the camera projection matrix of I1,I2
Pim,Pm the projection matrix of virtual sliding camera
P2 the adjusted projection matrix of I2 to unify the pixel coordinates of I1 and I2
ui,u the non-homogeneous coordinate of the pixel in I
u~i,u~ the homogeneous coordinate of the pixel in I
u1i,u1,u2i,u2 the non-homogeneous coordinate of the pixel in I1,I2
u~1i,u~1,u~2i,u~2 the homogeneous coordinate of the pixel in I1,I2
S,Si the sampling points on the plane n which are projected onto u and ui in I
p,pi the 3D scene points
C1,C2 the cameras corresponding to I1 and I2
K1,K2 the internal parameter matrices of C1 and C2
I3×3 the 3 by 3 identity matrix
R,t the rotation matrix and the location of the optical center of C2 in C1s  coordinate system
y~1,y~2 normalized coordinates of u~1 and u~2
N* the similarity matrix between I1 and I2
K2,R2 the rotation matrix and translation vector corresponding to P2
Km,Rm,tm the internal parameter matrix, the rotation matrix and the translation vector corresponding to Pm
q1,q2,qm the quaternions corresponding to I3×3, R2 and Rm
Hmii=1,2 the homography between Ii and I
Iii=1,2 the warped image of Ii using Hmi
Fijpi the optical flow of pi in Ii which makes Ii(pi)=Ij(pi+Fijpi)
p~,p~1,p~2 the pixel in I and corresponding pixel in I1,I2
M1,M2 the optical flow magnitude of F21p~ and F12p~
αs,αm the softmax function’s shape coefficien and the optical flow’s enhancement coefficient
λ the weight which makes I transition from I1 to I2 linearly
β the softmax weight to transition I from I1 to I2 fastly
α the linear combinationof λ and β which makes the transition of I from I1 to I2 depends on the color difference
Dc,λd the color difference and the hyperbolic tangent function of the color difference
F(u) the optical flow of u
E(F(u)) the error function of F(u) used for solving the optimal optical flow
EI,ES,ET the optical flow’s alignment error, consistency error and penalty for large value
Finlieri,Hi the i-th homography transforming I1 to I2 and the corresponding inlier set
Fcond,Finlier the initial set and the final inlier set of matched feature points
re,rp the projection errors of the epipolar constraint and of the infinite homography constraint
h(r,σ) the robust kernel function to reduce the impact of false matches to optimization

Table A2.

The abbreviations and their explanations.

Abbreviation Meaning
SC sliding camera, proposed by us to solve the perspective deformation
AOF asymmetric optical flow, proposed by us to slove the local alignment
APAP as-projective-as-possible, used to solve the local alignment by location-dependent homography warping
DLT direct linear transform, used for estimating the parameters of the homography
REW robust elastic warping, used to improve the local alignment using deformation fields
TPS thin-plate spline, used to compute deformation fields corresponding to matched feature points
TFT triangular facet approximation, using scene triangular facet estimating to improve the local alignment
NIS natural image stitching, a local alignment method using the depth map
SPHP shape preserving half projective, solving perspective deformation by gradually changing the resultant warp from projective to similarity
AANAP adaptive as-natural-as-possible, a method to solve perspective deformation
GSP global similarity prior, used to align images and reduce deformation
SPW single-projective warp, which adopts the quasi-homography warp to mitigate projective distortion and preserve single perspective
SPSO structure preservation and seam optimization, a method can obtain precise alignment while preserving local and global image structures.
GES-GSP geometric structure preserving-global similarity prior, based on GSP to futher protect the large-scale geometric structure from distortion
SIFT scale-invariant feature transform, a feature detection and description method
SURF speed-up robust feature, a feature detection and description method, faster than SIFT
KNN k-nearest neighbor, a feature matching method
RAFT recurrent all-pairs field transforms, estimating optical flow based on deep learning
RANSAC random sample consensus, used to filter outliers and estimate model parameters

Appendix B

In this section, some supplementary experiments about perspective deformation reduction and local alignment are added. The image pairs used in the supplementary experiments are shown in Figure A1.

Figure A1.

Figure A1

The image dataset for supplementary comparative experiments. The image pairs are initially used by stitching methods such as APAP, AANAP, REW. Four yellow Chinese artistic characters are shown on the temple’s wall in the upper image of (d).

Figure A2, Figure A3 and Figure A4 show the comparisons of perspective deformation reduction of our method, AANAP, SPHP, SPW and APAP.

The comparisons of local alignment of our method, APAP, TFT and REW are shown in Figure A5, Figure A6, Figure A7, Figure A8, Figure A9, Figure A10, Figure A11 and Figure A12. The detail images inside the red rects are the image regions with misalignment and shown directly right the mosaics.

Figure A2.

Figure A2

Qualitative comparisons of perspective deformation reduction on the building1, fence and building4 image pairs.

Figure A3.

Figure A3

Qualitative comparisons of perspective deformation reduction on the foundation and office image pairs.

Figure A4.

Figure A4

Qualitative comparisons of perspective deformation reduction on the standing-he and lawn image pairs.

Figure A5.

Figure A5

Qualitative comparisons of local alignment on the railtracks image pair. The red boxes indicate the regions where the enlarged images are located in the mosaics.

Figure A6.

Figure A6

Qualitative comparisons of local alignment on the worktable image pair. The red boxes indicate the regions where the enlarged images are located in the mosaics.

Figure A7.

Figure A7

Qualitative comparisons of local alignment on the temple image pair. The red boxes indicate the regions where the enlarged images are located in the mosaics.

Figure A8.

Figure A8

Qualitative comparisons of local alignment on the guardbar image pair. The red boxes indicate the regions where the enlarged images are located in the mosaics.

Figure A9.

Figure A9

Qualitative comparisons of local alignment on the roundabout image pair. The red boxes indicate the regions where the enlarged images are located in the mosaics.

Figure A10.

Figure A10

Qualitative comparisons of local alignment on the potberry image pair. The red boxes indicate the regions where the enlarged images are located in the mosaics.

Figure A11.

Figure A11

Qualitative comparisons of local alignment on the plantain image pair. TFT failed to stitch this image pair. The red boxes indicate the regions where the enlarged images are located in the mosaics.

Figure A12.

Figure A12

Qualitative comparisons of local alignment on the shelf and corner image pairs. TFT failed to stitch the corner image pair. The Red circles highlight errors and distortions.

The scores of all methods on the image pairs roundabout, fence, railtracks, temple, corner, shelf, standing-he, foundation, guardbar, office, plantain, building4, potberry, lawn and worktable are listed in Table A3. The best SSIM value is highlighted in bold.

Table A3.

Comparison on SSIM.

APAP AANAP SPHP TFT REW SPW Ours
roundabout 0.85 0.86 0.77 0.86 0.86 0.76 0.87
fence 0.93 0.95 0.81 0.95 0.95 0.93 0.95
railtracks 0.77 0.90 0.62 0.92 0.85 0.77 0.94
temple 0.90 0.91 0.73 0.95 0.94 0.85 0.96
corner 0.98 0.97 0.91 0.73 0.98 0.97 0.96
shelf 0.98 0.98 0.84 0.95 0.97 0.96 0.97
standing-he 0.72 0.77 0.64 0.35 0.78 0.71 0.84
foundation 0.75 0.78 0.58 0.83 0.76 0.71 0.80
guardbar 0.74 0.74 0.58 0.79 0.77 0.65 0.76
office 0.79 0.78 0.55 0.84 0.65 0.75 0.88
plantain 0.85 0.85 0.67 0.31 0.82 0.85 0.90
building4 0.71 0.72 0.58 0.73 0.74 0.70 0.78
potberry 0.89 0.89 0.68 0.93 0.87 0.82 0.91
lawn 0.92 0.95 0.79 0.95 0.95 0.93 0.95
worktable 0.87 0.84 0.59 0.86 0.97 0.85 0.97

Figure A13 shows the speed of the SC-AOF method versus the APAP, AANAP, SPHP, TFT, REW and SPW methods. The same image pairs as in the SSIM comparison are used in this experiment.

Figure A13.

Figure A13

Comparison on elapsed time. Our method is only second to REW in speed and is superior to other methods.

Author Contributions

Conceptualization, Q.L. and J.C.; methodology, J.C.; software, J.C.; validation, J.C.; formal analysis, Q.L.; investigation, Q.L. and J.C.; resources, Y.L.; data curation, Y.L.; writing—original draft preparation, J.C.; writing—review and editing, L.Z.; visualization, L.Z.; supervision, Q.L.; project administration, Q.L. All authors have read and agreed to the published version of the manuscript.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

Author Jiayi Chang and Yanju Liang were employed by the company Wuxi Iot Innovation Center Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Funding Statement

This research received no external funding.

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

References

  • 1.Abbadi N.K.E.L., Al Hassani S.A., Abdulkhaleq A.H. A review over panoramic image stitching techniques. J. Phys. Conf. Ser. 2021;1999:012115. doi: 10.1088/1742-6596/1999/1/012115. [DOI] [Google Scholar]
  • 2.Gómez-Reyes J.K., Benítez-Rangel J.P., Morales-Hernández L.A., Resendiz-Ochoa E., Camarillo-Gomez K.A. Image mosaicing applied on UAVs survey. Appl. Sci. 2022;12:2729. doi: 10.3390/app12052729. [DOI] [Google Scholar]
  • 3.Xu Q., Chen J., Luo L., Gong W., Wang Y. UAV image stitching based on mesh-guided deformation and ground constraint. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021;14:4465–4475. doi: 10.1109/JSTARS.2021.3061505. [DOI] [Google Scholar]
  • 4.Wen S., Wang X., Zhang W., Wang G., Huang M., Yu B. Structure Preservation and Seam Optimization for Parallax-Tolerant Image Stitching. IEEE Access. 2022;10:78713–78725. doi: 10.1109/ACCESS.2022.3194245. [DOI] [Google Scholar]
  • 5.Tang W., Jia F., Wang X. An improved adaptive triangular mesh-based image warping method. Front. Neurorobotics. 2023;16:1042429. doi: 10.3389/fnbot.2022.1042429. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Li J., Deng B., Tang R., Wang Z., Yan Y. Local-adaptive image alignment based on triangular facet approximation. IEEE Trans. Image Process. 2019;29:2356–2369. doi: 10.1109/TIP.2019.2949424. [DOI] [PubMed] [Google Scholar]
  • 7.Lee K.Y., Sim J.Y. Warping residual based image stitching for large parallax; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; Seattle, WA, USA. 16–18 June 2020; pp. 8198–8206. [Google Scholar]
  • 8.Zhu S., Zhang Y., Zhang J., Hu H., Zhang Y. ISGTA: An effective approach for multi-image stitching based on gradual transformation matrix. Signal Image Video Process. 2023;17:3811–3820. doi: 10.1007/s11760-023-02609-9. [DOI] [Google Scholar]
  • 9.Zaragoza J., Chin T.J., Brown M.S., Suter D. As-Projective-As-Possible Image Stitching with Moving DLT; Proceedings of the Computer Vision and Pattern Recognition (CVPR); Portland, OR, USA. 25–27 June 2013. [Google Scholar]
  • 10.Li J., Wang Z., Lai S., Zhai Y., Zhang M. Parallax-tolerant image stitching based on robust elastic warping. IEEE Trans. Multimed. 2017;20:1672–1687. doi: 10.1109/TMM.2017.2777461. [DOI] [Google Scholar]
  • 11.Xue F., Zheng D. Elastic Warping with Global Linear Constraints for Parallax Image Stitching; Proceedings of the 2023 15th International Conference on Advanced Computational Intelligence (ICACI); Seoul, Republic of Korea. 6–9 May 2023; pp. 1–6. [Google Scholar]
  • 12.Liao T., Li N. Natural Image Stitching Using Depth Maps. arXiv. 20222202.06276 [Google Scholar]
  • 13.Cong Y., Wang Y., Hou W., Pang W. Feature Correspondences Increase and Hybrid Terms Optimization Warp for Image Stitching. Entropy. 2023;25:106. doi: 10.3390/e25010106. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Chang C.H., Sato Y., Chuang Y.Y. Shape-preserving half-projective warps for image stitching; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; Columbus, OH, USA. 23–28 June 2014; pp. 3254–3261. [Google Scholar]
  • 15.Chen J., Li Z., Peng C., Wang Y., Gong W. UAV image stitching based on optimal seam and half-projective warp. Remote Sens. 2022;14:1068. doi: 10.3390/rs14051068. [DOI] [Google Scholar]
  • 16.Lin C.-C., Pankanti S.U., Ramamurthy K.N., Aravkin A.Y. Adaptive as-natural-as-possible image stitching; Proceedings of the Computer Vision & Pattern Recognition; Boston, MA, USA. 7–10 June 2015; [DOI] [Google Scholar]
  • 17.Chen Y., Chuang Y. Natural Image Stitching with the Global Similarity Prior; Proceedings of the European Conference on Computer Vision; Amsterdam, The Netherlands. 11–14 October 2016; Berlin/Heidelberg, Germany: Springer International Publishing; 2016. [Google Scholar]
  • 18.Cui J., Liu M., Zhang Z., Yang S., Ning J. Robust UAV thermal infrared remote sensing images stitching via overlap-prior-based global similarity prior model. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020;14:270–282. doi: 10.1109/JSTARS.2020.3032011. [DOI] [Google Scholar]
  • 19.Liao T., Li N. Single-perspective warps in natural image stitching. IEEE Trans. Image Process. 2019;29:724–735. doi: 10.1109/TIP.2019.2934344. [DOI] [PubMed] [Google Scholar]
  • 20.Li N., Xu Y., Wang C. Quasi-homography warps in image stitching. IEEE Trans. Multimed. 2017;20:1365–1375. doi: 10.1109/TMM.2017.2771566. [DOI] [Google Scholar]
  • 21.Du P., Ning J., Cui J., Huang S., Wang X., Wang J. Geometric Structure Preserving Warp for Natural Image Stitching; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; New Orleans, LA, USA. 19–24 June 2022; pp. 3688–3696. [Google Scholar]
  • 22.Bertel T., Campbell N.D.F., Richardt C. Megaparallax: Casual 360 panoramas with motion parallax. IEEE Trans. Vis. Comput. Graph. 2019;25:1828–1835. doi: 10.1109/TVCG.2019.2898799. [DOI] [PubMed] [Google Scholar]
  • 23.Meng M., Liu S. High-quality Panorama Stitching based on Asymmetric Bidirectional Optical Flow; Proceedings of the 2020 5th International Conference on Computational Intelligence and Applications (ICCIA); Virtual. 19–21 June 2020; pp. 118–122. [Google Scholar]
  • 24.Hofinger M., Bulò S.R., Porzi L., Knapitsch A., Pock T., Kontschieder P. Improving optical flow on a pyramid level; Proceedings of the European Conference on Computer Vision; Glasgow, UK. 23–28 August 2020; Cham, Switzerland: Springer International Publishing; 2020. pp. 770–786. [Google Scholar]
  • 25.Shah S.T.H., Xiang X. Traditional and modern strategies for optical flow: An investigation. SN Appl. Sci. 2021;3:289. doi: 10.1007/s42452-021-04227-x. [DOI] [Google Scholar]
  • 26.Zhai M., Xiang X., Lv N., Kong X. Optical flow and scene flow estimation: A survey. Pattern Recognit. 2021;114:107861. doi: 10.1016/j.patcog.2021.107861. [DOI] [Google Scholar]
  • 27.Liu C., Yuen J., Torralba A. Sift flow: Dense correspondence across scenes and its applications. IEEE Trans. Pattern Anal. Mach. Intell. 2010;33:978–994. doi: 10.1109/TPAMI.2010.147. [DOI] [PubMed] [Google Scholar]
  • 28.Zhao S., Zhao L., Zhang Z., Zhou E., Metaxas D. Global matching with overlapping attention for optical flow estimation; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; New Orleans, LA, USA. 19–24 June June 2022; pp. 17592–17601. [Google Scholar]
  • 29.Rao S., Wang H. Robust optical flow estimation via edge preserving filtering. Signal Process. Image Commun. 2021;96:116309. doi: 10.1016/j.image.2021.116309. [DOI] [Google Scholar]
  • 30.Jeong J., Lin J., Porikli F., Kwak N. Imposing consistency for optical flow estimation; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; New Orleans, LA, USA. 19–24 June 2022; pp. 3181–3191. [Google Scholar]
  • 31.Anderson R., Gallup D., Barron J.T., Kontkanen J., Snavely N., Hernández C., Agarwal S., Seitz S.M. Jump: Virtual reality video. ACM Trans. Graph. (TOG) 2016;35:1–13. doi: 10.1145/2980179.2980257. [DOI] [Google Scholar]
  • 32.Teed Z., Deng J. Raft: Recurrent all-pairs field transforms for optical flow; Proceedings of the Computer Vision–ECCV 2020: 16th European Conference; Glasgow, UK. 23–28 August 2020; Berlin/Heidelberg, Germany: Springer International Publishing; 2020. pp. 402–419. Proceedings, Part II 16. [Google Scholar]
  • 33.Huang Z., Shi X., Zhang C., Wang Q., Cheung K.C., Qin H., Dai J., Li H. Flowformer: A transformer architecture for optical flow; Proceedings of the European Conference on Computer Vision; Tel Aviv, Israel. 23–27 October 2022; Cham, Switzerland: Springer Nature; 2022. pp. 668–685. [Google Scholar]
  • 34. [(accessed on 1 January 2022)]. Available online: https://github.com/facebookarchive/Surround360.
  • 35.Zhang Y. 3-D Computer Vision: Principles, Algorithms and Applications. Springer Nature; Singapore: 2023. Camera calibration; pp. 37–65. [Google Scholar]
  • 36.Zhang Y., Zhao X., Qian D. Learning-Based Framework for Camera Calibration with Distortion Correction and High Precision Feature Detection. arXiv. 20222202.00158 [Google Scholar]
  • 37.Fang J., Vasiljevic I., Guizilini V., Ambrus R., Shakhnarovich G., Gaidon A., Walter M.R. Self-supervised camera self-calibration from video; Proceedings of the 2022 International Conference on Robotics and Automation (ICRA); Philadelphia, PA, USA. 23–27 May 2022; pp. 8468–8475. [Google Scholar]
  • 38.Wang Z., Bovik A.C., Sheikh H.R., Simoncelli E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004;13:600–612. doi: 10.1109/TIP.2003.819861. [DOI] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.


Articles from Sensors (Basel, Switzerland) are provided here courtesy of Multidisciplinary Digital Publishing Institute (MDPI)

RESOURCES