Author manuscript; available in PMC: 2023 Aug 24.
Published in final edited form as: Comput Vis Image Underst. 2019 Sep 25;189:102825. doi: 10.1016/j.cviu.2019.102825

Region-based image registration for remote sensing imagery

Azubuike Okorie 1, Sokratis Makrogiannis 1,*
PMCID: PMC10448482  NIHMSID: NIHMS1922095  PMID: 37622168

Abstract

We propose an automatic region-based registration method for remote sensing imagery. In this method, we aim to register two images by matching region properties to address possible errors caused by local feature estimators. We apply automated image segmentation to identify the regions and calculate regional Fourier descriptors and standardized regional intensity descriptors for each region. We define a joint matching cost, as a linear combination of Euclidean distances, to establish and extract correspondences between regions. The segmentation technique utilizes kernel density estimators for edge localization, followed by morphological reconstruction and the watershed transform. We evaluated the registration performance of our method on synthetic and real datasets. We measured the registration accuracy by calculating the root-mean-squared error (RMSE) between the estimated transformation and the ground truth transformation. The results obtained using the joint intensity-Fourier descriptor were compared to the results obtained using Harris, minimum eigenvalue, features from accelerated segment test (FAST), speeded-up robust features (SURF), binary robust invariant scalable keypoints (BRISK) and KAZE keypoint descriptors. The joint intensity-Fourier descriptor yielded average RMSE of 0.446 ± 0.359 pixels and 1.152 ± 0.488 pixels on two satellite imagery datasets consisting of 35 image pairs in total. These results indicate the capacity of the proposed technique for high accuracy. Our method also produces a lower registration error than the compared feature-based methods.

1. Introduction

Image registration has found wide applicability in various application domains including computer vision, remote sensing, biomedicine, and security. For example, image registration can be used in target recognition applications for matching real-time images with a target. In the biomedical imaging domain, image registration can be used to study the growth of tumors. Also, disease diagnosis can be carried out by matching two images from different modalities. In remote sensing, it can be used for monitoring climatic change and land-use using satellite images. Also in the field of computer vision and pattern recognition, it can be used for matching stereo images for object/shape recognition (Brown, 1992; Zitova and Flusser, 2003). Most registration techniques are classified into intensity-based or feature-based categories.

Feature-based registration methods rely on locally invariant and distinctive image features for registration. The major stages in feature-based algorithms are feature detection, feature extraction, feature matching and estimation of the geometric transform (Fig. 1). Feature detection deals with the detection of salient image features, such as corners, intersections, blobs, edges, and regions (Tuytelaars and Mikolajczyk, 2008). For the purpose of finding point correspondences, such features are usually required to be locally invariant to geometric transformations (e.g., changes in viewpoint, rotation, translation, and scale) and to photometric changes (e.g., noise and occlusion). Some known feature detectors in the literature are the Harris corner detector (Harris and Stephens, 1988), corner detection by the minimum eigenvalue method (Shi and Tomasi, 1994), features from accelerated segment test (FAST) (Muja and Lowe, 2009), Speeded-Up Robust Features (SURF) (Bay et al., 2006), Binary Robust Invariant Scalable Keypoints (BRISK) (Leutenegger et al., 2011) and KAZE by Alcantarilla et al. (2012).

Fig. 1.

Fig. 1.

Stages in Feature-based registration.

The FAST algorithm by Muja and Lowe (2009, 2012) primarily detects a corner at a pixel p using threshold tests over a Bresenham circle of radius 3. In BRISK, points of interest are detected in both image and scale dimensions using a saliency criterion. The effectiveness of this technique is boosted by detecting keypoints in octave and intra-octave layers of the scale-space pyramid. The location and scale of each keypoint are obtained in the continuous domain via quadratic function fitting (Leutenegger et al., 2011). The SURF algorithm by Bay et al. (2006) uses the Fast-Hessian detector for feature detection. Here the scale space is analyzed by up-scaling the filter instead of repeatedly reducing the original image. In order to localize interest points in image and scale spaces, non-maximum suppression is applied on a 3 × 3 × 3 neighborhood. The maxima of the determinant of the Hessian matrix are then interpolated in scale and image space with the method proposed by Brown and Lowe (2002).

Feature extraction is typically performed on regions centered around detected features. This step aims at measuring features that can distinguish the points of interest, such as object corners, edges, lines and intersections. FAST features are extracted by forming feature vectors using pixel intensities from the 16-pixel circle. The orientations of pixels are determined by assigning either a positive or negative sign to the pixels, depending on whether they are greater than the center or not (Muja and Lowe, 2009, 2012). SURF features are extracted in two steps. First, a reproducible orientation is fixed based on information from a circular region around the interest point; rotation invariance is achieved by computing Haar wavelet responses in the x and y directions. The descriptor component then defines a square region, aligned to the selected orientation, from which the SURF descriptor is extracted (Bay et al., 2006).
Feature matching finds point correspondences between the reference and the input image. The FAST approach minimizes the sum of squared differences (SSD) of the feature vectors between the feature points. The $O(N^2)$ cost of testing every feature against every other feature is avoided by sorting the feature set by the mean value of the feature vectors $f$ (Muja and Lowe, 2009). In the BRISK method, two feature descriptors are matched by computing their Hamming distances, as discussed in BRIEF (Calonder et al., 2010). The dissimilarity measure between them is determined by the difference in the number of bits in the two descriptors (Leutenegger et al., 2011).

Okorie and Makrogiannis (2017) validated the accuracy of feature-based registration methods on remotely sensed imagery, by calculating pixel discrepancy error between the ground truth transformation and the estimated geometric transform obtained for each of the descriptors mentioned above. Their results show that feature-based registration techniques have the capacity of subpixel accuracy. Feature-based techniques were also applied to remote sensing datasets for image registration in Dare and Dowman (2001), Xiong and Zhang (2009), Makrogiannis and Bourbakis (2010), Huo et al. (2012), Ye and Shan (2014), Gong et al. (2014) and Wang et al. (2015).

Despite their good performance, feature-based techniques make use of mostly low- or mid-level image entities; therefore, they lack the higher-level visual representation that may prove advantageous for difficult matching problems. In the literature, few methods have been proposed that aim to identify, represent and match regions corresponding to objects or their meaningful parts (Li et al., 1995; Liang et al., 1997; Dare and Dowman, 2001; Rendas and Barral, 2008; Goncalves et al., 2011a,b; Troglio et al., 2012; Kybic and Borovec, 2014; Kybic et al., 2015; Han et al., 2017). An early technique by Liang et al. (1997) employed multiscale segmentation followed by Powell direction set optimization and applied this method to biomedical images. This method required manual selection of regions by the user in the matching stage. Dare and Dowman (2001) employed global shape features to match control points. Rendas and Barral (2008) utilized the MDL criterion between segmented images to find an optimal match for linear registration; however, they did not propose a segmentation method, and region shape is not explicitly used. Goncalves et al. (2011b) apply histogram processing to identify modes and flat regions of the intensity distribution and calculate connected components to form regions. This method represents each region by global shape characteristics and calculates a pairwise cost function over the object pairs. The rotation and translation parameters are sequentially estimated by identifying outliers in their distribution. However, this method does not address scale differences. In Goncalves et al. (2011a), the authors used PCA for multisensor fusion and multi-level Otsu thresholding for segmentation, calculated SIFT features, and used a statistical approach for matching, similar to their previous work. In this method, multi-level Otsu thresholding is susceptible to noise and may fail for non-Gaussian distributed pixel intensities. In Troglio et al. (2012), ellipsoidal features were extracted using image processing operations and the watershed transform, and the Hough transform was utilized to match craters and rocks of compact shape for planetary image registration. An object-based registration method for very high resolution (VHR) images was proposed by Han et al. (2017) that aimed to correct local residual misalignments produced by standard registration methods.

In this work, we transition from pixel-centric to region-based registration for remotely sensed imagery. Our hypothesis is that if we can obtain similar regions in both reference and input images, irrespective of geometric or photometric changes, then these regions can be matched to register the two images. We propose a complete method for region delineation, matching, and geometric transformation. We apply automated image segmentation to identify the regions. In contrast to previous works, we extract and combine object-based intensity and shape information to represent the image content. We introduce regional Fourier descriptors and standardized regional intensity descriptors for each region. We define a joint matching cost, as a linear combination of Euclidean distances, to establish and extract correspondences between regions. To evaluate the performance of this method we compare its registration accuracy to feature-based methods.

In Section 2, we introduce our method including the segmentation stage, the regional shape and intensity descriptors, and our approach to calculating joint region similarity scores. We validate the registration accuracy for real satellite imagery data and compare our method with contemporary feature-based registration in Section 3. We discuss our main findings in Section 4 and we conclude this work in Section 5.

2. Region-based registration

We propose a method in which we match regions instead of local image features for registration. The main stages of this algorithm are displayed in Fig. 2. The algorithm is organized according to the major components of feature-based registration.

Fig. 2.

Fig. 2.

Flow chart for region-based registration.

2.1. Region delineation and feature detection

In this stage, we delineate the image scene into regions and identify the corresponding features for matching. We apply a histogram matching technique to reduce the effect of multisensor variations on the segmentation. Then we calculate a probabilistic map of the edge location and strength using Parzen kernel density estimation. We apply morphological reconstruction to reduce the false minima and delineate the corresponding regions by applying watershed transform to the reconstructed edge map. The regions obtained after watershed segmentation are then represented by their respective centroids. We briefly describe the steps as follows.

Intensity histogram matching.

This is an intensity mapping technique that aims to transform the histogram of an image to match the histogram of a reference image. The purpose is to produce a similar distribution of intensities in the pair of images that we will register. This pre-processing stage was necessary due to the variation of intensity in the pairs of images in our dataset, which could result from differences in the sensors used during acquisition. An example of this process is shown in Fig. 3. We observe in Fig. 3 that the histogram of the input image (middle column) has been transformed (last column) so that the intensity distribution is similar to the reference image (first column).
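The mapping above can be sketched in a few lines of NumPy by sending each input intensity to the reference intensity with the closest empirical CDF value (a minimal sketch; the paper does not specify its implementation, and the function name is ours):

```python
import numpy as np

def match_histograms(source, reference):
    """Map source intensities so their distribution matches the reference's."""
    src_vals, src_idx, src_counts = np.unique(source.ravel(),
                                              return_inverse=True,
                                              return_counts=True)
    ref_vals, ref_counts = np.unique(reference.ravel(), return_counts=True)
    # Empirical CDFs of both images.
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size
    # For each source intensity, find the reference intensity at the same CDF level.
    mapped = np.interp(src_cdf, ref_cdf, ref_vals)
    return mapped[src_idx].reshape(source.shape)
```

In practice, the input image of each pair would be matched against its reference before segmentation.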

Fig. 3.

Fig. 3.

Histogram matching example. (Top, from left to right) The reference image, the input image and an output image of histogram matching. (Bottom, from left to right) Corresponding histograms.

Parzen kernel density estimation probabilistic edge detection.

In our approach, we calculate a measure of edge likelihood from the probability density estimate of the intensity at the median in the neighborhood of each pixel. Our premise is that high probability estimates correspond to uniform distribution models occurring within a region, while low probability values indicate the presence of an edge (Economou et al., 2001; Makrogiannis et al., 2005; Makrogiannis and Bourbakis, 2010). The Parzen Kernel Density estimator (Parzen, 1962) is a non-parametric kernel density estimator given by

$f_n(x)=\frac{1}{s}\int K\!\left(\frac{x-y}{s}\right)dF_n(y)\approx\frac{1}{n\,p_{bw}}\sum_{j=1}^{p_{kl}^2}K\!\left(\frac{x-X_j}{p_{bw}}\right)$ (1)

where $X_j$ corresponds to the intensity values in a $p_{kl}\times p_{kl}$ window, $p_{kl}$ is the kernel length, and $p_{bw}$ is the bandwidth, which determines the density of the edges. Our choice for $K(x)$ is the Gaussian kernel, defined by

$K(x)=\frac{1}{\sqrt{2\pi}}e^{-x^2/2}.$ (2)
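A minimal sketch of Eqs. (1) and (2), under our reading that the Gaussian-kernel density of the window intensities is evaluated at the window median, and that a low density signals an edge (the conversion of density into edge strength and the function name are ours):

```python
import numpy as np

def parzen_edge_map(image, p_kl=3, p_bw=5.0):
    """Edge likelihood map: a low kernel-density value at the window
    median suggests mixed intensity populations, i.e. an edge."""
    img = image.astype(float)
    pad = p_kl // 2
    padded = np.pad(img, pad, mode='edge')
    density = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            win = padded[i:i + p_kl, j:j + p_kl].ravel()
            med = np.median(win)
            # Gaussian-kernel density of the window intensities at the median.
            k = np.exp(-0.5 * ((med - win) / p_bw) ** 2) / np.sqrt(2 * np.pi)
            density[i, j] = k.sum() / (win.size * p_bw)
    # High density marks region interiors; invert so large values mark edges.
    return density.max() - density
```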

H-minima transform.

The edge maps from the preceding step also include false edges introduced by noise. In order to reduce over-segmentation, we applied the h-minima transform. The h-minima transform (Soille, 2013) is a morphological image reconstruction algorithm which suppresses all regional minima with depths less than or equal to some given threshold h. Fig. 5 shows the result of applying h-minima transform with height h=2 to the Parzen edge maps in Fig. 4. As seen in the result, the spurious edges created by noise have been smoothed.

Fig. 5.

Fig. 5.

Example of segmentation after the h-minima transform (h = 2). (From left to right) Edge estimate after applying the h-minima transform, and the result of watershed segmentation.

Fig. 4.

Fig. 4.

Example of segmentation using Parzen Kernel density estimation. (From left to right) Original image corrupted by Gaussian noise with zero mean and standard deviation σ=5, intensity of the highlighted portion in the original image, edge regularization using Parzen kernel density estimation, and the result of watershed segmentation.

Watershed segmentation.

The next step in our region detection algorithm is to apply watershed segmentation to the reconstructed edge map. Watershed segmentation is a morphological image segmentation algorithm. The principle follows from the construction of dams (watershed lines) between flooded catchment basins corresponding to regional minima, to prevent water from moving from one catchment basin to another. The watershed lines demarcate one region from the other (See Fig. 6). In our experiments, we used this algorithm to delineate regions on real satellite imagery.
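The h-minima suppression and watershed steps might be sketched with SciPy as follows; the reconstruction-by-erosion loop is a textbook (unoptimized) h-minima implementation, and deriving markers from the regional minima of the reconstructed map is our assumption, not a detail given in the paper:

```python
import numpy as np
from scipy import ndimage as ndi

def h_minima_suppress(edge_map, h):
    """Suppress regional minima shallower than h via morphological
    reconstruction by erosion of (edge_map + h) over edge_map."""
    mask = edge_map.astype(float)
    seed = mask + h
    while True:
        nxt = np.maximum(ndi.grey_erosion(seed, size=(3, 3)), mask)
        if np.array_equal(nxt, seed):
            return nxt
        seed = nxt

def watershed_regions(edge_map, h=2.0):
    """Label the catchment basins of the reconstructed edge map."""
    rec = h_minima_suppress(edge_map, h)
    # Markers: connected components of the regional minima of the
    # reconstructed map (pixels unchanged by a 3x3 erosion).
    markers, _ = ndi.label(rec == ndi.grey_erosion(rec, size=(3, 3)))
    return ndi.watershed_ift(np.round(rec).astype(np.uint16), markers)
```

Each labeled region is then represented by its centroid, as described above.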

Fig. 6.

Fig. 6.

Watershed labels for synthetic images.

2.2. Feature extraction

A region feature describes properties that distinguish a region from other regions. We represent the region features of an image in an $N\times L$ matrix, where N is the number of regions and L is the length of each feature vector. We propose to use joint shape and intensity descriptors. We represent the shape by Fourier descriptors and we also calculate region-based intensity features.

2.2.1. Regional Fourier Descriptor (RFD)

Fourier descriptors represent the shape of a 2D region by taking the Fourier transform of the boundary, after converting each boundary point $(x,y)$ to a complex number $z=x+iy$. The original boundary can be reconstructed by taking the inverse Fourier transform. The boundary may be approximated by applying the inverse transform to selected Fourier coefficients. Here we utilize $2^{3+k}$, $k=0,1,2,\ldots$, coefficients for shape boundary Fourier description. We enabled translation invariance by subtracting the region centroid from the boundary coordinates (Zhang et al., 2002), i.e.,

$z(t)=[x(t)-x_c]+j[y(t)-y_c]$ (3)

where $x_c=\frac{1}{N}\sum_{t=0}^{N-1}x(t)$ and $y_c=\frac{1}{N}\sum_{t=0}^{N-1}y(t)$. In Zhang et al. (2002), the shape boundary signature in Eq. (3) is termed the complex coordinate function.

We also considered the centroid distance function,

r(t)=|z(t)| (4)

as another shape boundary signature, calculated as the magnitude of the complex number $z$. The advantage of the complex coordinate function is that it is computationally efficient and can be used for shape boundary reconstruction. Nevertheless, in our experiments, the centroid distance function yielded better results.
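The two boundary signatures of Eqs. (3) and (4) can be sketched directly:

```python
import numpy as np

def shape_signatures(boundary_xy):
    """Complex coordinate and centroid distance signatures (Eqs. (3)-(4))."""
    x = boundary_xy[:, 0].astype(float)
    y = boundary_xy[:, 1].astype(float)
    z = (x - x.mean()) + 1j * (y - y.mean())   # complex coordinate function
    r = np.abs(z)                              # centroid distance function
    return z, r
```

Subtracting the centroid makes both signatures translation invariant by construction.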

Let $u(t)$, $t=0,1,2,\ldots,M-1$, represent the shape signature. We apply the 1D Fourier transform to $u$ to calculate the frequency domain representation of the shape signature

$\omega(\tau)=\sum_{t=0}^{M-1}u(t)\,e^{-j2\pi\tau t/M},\quad\tau=0,1,2,\ldots,M-1$ (5)

where $u$ is either $r$ or $z$ above. The DC Fourier coefficient corresponds to $\tau=0$ and is given by

$\omega(0)=\sum_{t=0}^{M-1}u(t)$ (6)

Next we calculate the magnitude $|\omega(\tau)|$, $\tau=0,1,\ldots,M-1$, of the Fourier transform (5). We scale by dividing the magnitude by the DC coefficient. Because the DC coefficient may be zero, especially when using the complex coordinate function on a symmetric boundary, we scale according to the following formula

$\omega_s(\tau)=\begin{cases}\omega(\tau)/|\omega(0)|,&\text{if }|\omega(0)|\neq 0\\ \omega(\tau),&\text{if }|\omega(0)|=0\end{cases}$ (7)

We shift the Fourier coefficients to yield $\gamma(\tau)=\omega_s(\tau-M/2)$, so that $\omega_s(0)=\gamma(M/2)$, and compute the magnitude

$r(\tau)=|\gamma(\tau)|.$ (8)

We locate the median index $m$ of the centered coefficients and extract $L=N_{FC}$ Fourier coefficients around the center (median) location

$FD=\left[r\!\left(m-\left(\tfrac{L}{2}-1\right)\right),\ldots,r(m-1),\,r(m),\,r(m+1),\ldots,r\!\left(m+\tfrac{L}{2}\right)\right]^T$ (9)

where $L=2^{3+k}$, $k=0,1,2,\ldots,K$, and $L\leq M$. Eq. (9) yields our Fourier descriptor, which is scale, rotation, and translation invariant. We denote the regional Fourier descriptor of centroid distances by RFD.CD.
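Putting Eqs. (5)-(9) together, a minimal sketch of the RFD computation (the function and parameter names are ours) could read:

```python
import numpy as np

def regional_fourier_descriptor(signature, L=8):
    """Scale-normalized, centered Fourier descriptor of a shape signature."""
    w = np.fft.fft(signature)                       # Eq. (5)
    dc = np.abs(w[0])
    ws = w / dc if dc != 0 else w                   # DC scaling, Eq. (7)
    mag = np.abs(np.fft.fftshift(ws))               # center the spectrum
    m = mag.size // 2                               # median (center) index
    return mag[m - (L // 2 - 1): m + L // 2 + 1]    # L coefficients around it
```

Taking magnitudes removes the phase induced by rotation or starting-point shifts, and DC scaling removes the scale factor, so descriptors of rotated or rescaled versions of the same boundary coincide.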

2.2.2. Standardized Regional Intensity Descriptor (SRID)

Intensity features are vectors of length $L=BS^2$, whose components consist of the pixel intensity values obtained from a $BS\times BS$ block neighborhood around the centroid. We used an 11 × 11 block size in our experiments.

We calculate the regional intensity features for each of the input and reference images. Next, we concatenate the resulting $M\times L$ and $N\times L$ matrices of input and reference features to obtain an $(M+N)\times L$ matrix. Then the concatenated features are scaled by calculating the z-score of each component $x_{i,j}$, such that

$z_{ij}=\frac{x_{ij}-\mu_j}{\sigma_j},\quad i=1,2,\ldots,(M+N),\;\; j=1,2,\ldots,L$ (10)

where μj and σj are the mean and standard deviation of the jth column. We utilize the standardized scores in the matching stage.
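A sketch of the SRID computation, assuming edge padding at image borders (a detail the paper does not specify; function names are ours):

```python
import numpy as np

def block_features(image, centroids, bs=11):
    """bs*bs intensity vector around each region centroid (edge-padded)."""
    pad = bs // 2
    padded = np.pad(image.astype(float), pad, mode='edge')
    return np.array([padded[r:r + bs, c:c + bs].ravel()
                     for r, c in np.round(centroids).astype(int)])

def joint_zscore(feats_in, feats_ref):
    """Standardize input and reference features with shared statistics (Eq. (10))."""
    stacked = np.vstack([feats_in, feats_ref])
    mu = stacked.mean(axis=0)
    sigma = stacked.std(axis=0)
    sigma[sigma == 0] = 1.0          # guard against flat columns
    z = (stacked - mu) / sigma
    return z[:len(feats_in)], z[len(feats_in):]
```

Standardizing the concatenated matrix, rather than each image separately, keeps input and reference features on a common scale.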

2.2.3. Standardized Global Shape Descriptor (SGSD)

As a reference method for experimental comparisons, we also calculate conventional global shape features (Yang et al., 2008). SGSD employs the major and minor principal axis lengths, the principal axes ratio, eccentricity, and solidity, $s=(\lambda_{major},\lambda_{minor},r,e,\varsigma)$, which we describe next:

The major principal axis length of a region $\lambda_{major}$ is a scalar that specifies the length, in pixels, of the major axis of an ellipse having the same shape moments, up to second order, as the region.

The minor principal axis length of a region $\lambda_{minor}$ is a scalar that specifies the length, in pixels, of the minor axis of an ellipse having the same shape moments, up to second order, as the region.

The principal axes ratio r is the ratio of the major to the minor principal axes, r=λmajor/λminor.

The eccentricity $e$ of a region is the ratio of the distance between the foci to the major principal axis length of the ellipse having the same second moments as the region, i.e., $e=\frac{\|f_1-f_2\|_2}{\lambda_{major}}$, where $f_1,f_2$ are the foci of an ellipse with the same shape moments, up to second order, as the region.

The solidity $\varsigma$ of a region is the ratio of the area of the region to the area of its convex hull, $\varsigma=\frac{\text{Area of shape}}{\text{Area of convex hull of the shape}}$.

Shape features are scaled to standardized z–scores using the same strategy as described in the standardization of intensity features. We then create joint region descriptors by combining two or more standardized features described above.
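For illustration, a moment-based sketch of the global shape features (solidity is omitted for brevity; the $4\sqrt{\lambda}$ axis-length convention, as used by common regionprops implementations, is our assumption):

```python
import numpy as np

def global_shape_descriptor(region_mask):
    """(lambda_major, lambda_minor, axes ratio, eccentricity) from the
    second-order moments of a binary region mask."""
    rows, cols = np.nonzero(region_mask)
    coords = np.column_stack([rows, cols]).astype(float)
    cov = np.cov(coords, rowvar=False)
    evals = np.sort(np.linalg.eigvalsh(cov))[::-1]      # descending
    lam_major, lam_minor = 4.0 * np.sqrt(evals)         # ellipse axis lengths
    ratio = lam_major / lam_minor
    ecc = np.sqrt(1.0 - (lam_minor / lam_major) ** 2)
    return np.array([lam_major, lam_minor, ratio, ecc])
```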

2.3. Feature matching

Feature matching is the process of establishing correspondences between the regional descriptors. The similarity between features is measured by a cost function. We describe our feature matching steps next.

2.3.1. Calculation of joint score matrix

A score (similarity) matrix is an M×N matrix that consists of numerical measures of similarities between a pair of regions from the input and reference images, which are represented by their respective feature vectors.

We calculate pairwise similarity using a joint cost function that weighs the contribution of each feature according to its range. Let u=u1,,uL and v=v1,,vL be vectors in RL representing scaled input and reference features, respectively.

Joint cost function.

We define a joint cost function that is a weighted combination of Euclidean score matrices

$D_J(u,v)=\sum_{i=1}^{p}w_i\left[\sum_{k=1}^{L_i}(u_k-v_k)^2\right]^{\frac{1}{2}}$ (11)

where $p$ is the number of descriptors, $w_i$ is the descriptor weight given by $w_i=\frac{1}{L_i(b_k-a_k)^2}$, $[a_k,b_k]$ defines the range of the scaled $i$th descriptor under consideration, and $L_i$ is the descriptor's dimensionality. The three descriptor types are SRID, RFD and SGSD.

To compute the scores in Eq. (11), we first compute the score matrices for each descriptor type and then combine them to obtain a joint score matrix. The weights wi balance the contribution of each descriptor type to prevent possible bias.
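A sketch of the joint score computation of Eq. (11); deriving the range $[a_k,b_k]$ from the observed feature values is our reading of the weight definition:

```python
import numpy as np

def score_matrix(feats_in, feats_ref):
    """Pairwise Euclidean distances (M x N) between two feature sets."""
    diff = feats_in[:, None, :] - feats_ref[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def joint_score(descriptor_pairs):
    """Weighted sum of per-descriptor score matrices (Eq. (11)).

    descriptor_pairs: list of (input_feats, reference_feats) arrays,
    one pair per descriptor type (e.g. SRID, RFD).
    """
    total = 0.0
    for fin, fref in descriptor_pairs:
        li = fin.shape[1]
        rng = max(fin.max(), fref.max()) - min(fin.min(), fref.min())
        w = 1.0 / (li * rng ** 2) if rng > 0 else 1.0   # range-based weight
        total = total + w * score_matrix(fin, fref)
    return total
```

Dividing by the dimensionality and squared range keeps a long, wide-ranged descriptor from dominating the joint cost.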

2.3.2. Cost minimization

In this stage, we perform a nearest neighbor search on the joint score matrix. The minimum cost for each row of the score matrix is obtained and indexed as (row, col). The row is the index of the reference feature, whereas col is the index of the input feature. Given a reference feature, indexed row, the minimum score in a row describes the closest match with index col among all the input features.

Next, we obtain the second-lowest cost for each row to reject ambiguous matches, as discussed next. The minimum scores undergo double thresholding to identify the best matches. First, we use the weak matching threshold $\tau_w$ to remove weak matches. The second threshold is the ambiguity threshold $\tau_a$. The ratio of the minimum to the second-smallest score is calculated and compared to $\tau_a$. The minimum scores whose ratio is less than $\tau_a$ are considered unambiguous and are kept. Finally, we extract the pairs of points that correspond to the unambiguous scores. These matches are used for estimating the geometric transform.
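The double-thresholding step can be sketched as follows (the threshold values below are illustrative defaults, not the tuned ones):

```python
import numpy as np

def match_regions(score, tau_w=1.0, tau_a=0.8):
    """Nearest-neighbor matching with weak-match and ambiguity thresholds."""
    matches = []
    for row in range(score.shape[0]):
        order = np.argsort(score[row])
        best, second = score[row, order[0]], score[row, order[1]]
        if best > tau_w:                            # weak match: reject
            continue
        if second > 0 and best / second >= tau_a:   # ambiguous match: reject
            continue
        matches.append((row, order[0]))
    return matches
```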

2.4. Estimation of geometric transform

In this section, we discuss the method of fitting an affine geometric transform on unambiguous matched points. There are different methods for fitting a geometric transform on 2D to 2D correspondences, but in our case, we used the maximum likelihood estimation sampling consensus (MLESAC) (Torr and Zisserman, 2000; Hartley and Zisserman, 2003).

Given a 2D-to-2D correspondence $x_i \leftrightarrow x_i'$, we wish to obtain an affine geometric transform $H$ such that

$x_i'=Hx_i$ (12)

where $H$, $x_i$ and $x_i'$ are given by

$H=\begin{bmatrix}h_{11}&h_{12}&h_{13}\\h_{21}&h_{22}&h_{23}\\0&0&1\end{bmatrix};\quad x_i=\begin{bmatrix}x_i\\y_i\\1\end{bmatrix};\quad x_i'=\begin{bmatrix}x_i'\\y_i'\\1\end{bmatrix}.$ (13)

Because $x_i'$ and $Hx_i$ are collinear, we have $x_i'\times Hx_i=\mathbf{0}$. We define $h_1=[h_{11},h_{12},h_{13}]^T$, $h_2=[h_{21},h_{22},h_{23}]^T$, $h_3=[0,0,1]^T$ so that

$Hx_i=\begin{bmatrix}h_1^Tx_i\\h_2^Tx_i\\h_3^Tx_i\end{bmatrix}.$

Then the cross product can be written as

$x_i'\times Hx_i=\begin{pmatrix}y_i'h_3^Tx_i-h_2^Tx_i\\h_1^Tx_i-x_i'h_3^Tx_i\\x_i'h_2^Tx_i-y_i'h_1^Tx_i\end{pmatrix}.$

This yields a system of three equations in the entries of H,

$\begin{bmatrix}\mathbf{0}^T&-x_i^T&y_i'x_i^T\\x_i^T&\mathbf{0}^T&-x_i'x_i^T\\-y_i'x_i^T&x_i'x_i^T&\mathbf{0}^T\end{bmatrix}\begin{pmatrix}h_1\\h_2\\h_3\end{pmatrix}=\mathbf{0}.$ (14)

Eq. (14) has the form

$A_ih=\mathbf{0},\quad\text{with } h=[h_1^T,h_2^T,h_3^T]^T$ (15)

where h is a 9-vector containing entries of H, and Ai is a 3 × 9 matrix.

The third row of $A_i$ is a linear combination of rows 1 and 2 through the relation $R_3=-(x_i'R_1+y_i'R_2)$. So the rank of $A_i$ is 2 and we may solve for $h$ using the equation

$\begin{bmatrix}\mathbf{0}^T&-x_i^T&y_i'x_i^T\\x_i^T&\mathbf{0}^T&-x_i'x_i^T\end{bmatrix}\begin{pmatrix}h_1\\h_2\\h_3\end{pmatrix}=\mathbf{0}$ (16)

The affine transform matrix has six degrees of freedom, corresponding to h11,h12,h13,h21,h22,h23. Each correspondence xixi gives rise to two linearly independent equations in the entries of H. Hence H can be computed from three correspondences. The system (Ah=0) to solve becomes

$\begin{bmatrix}0&0&0&-x_1&-y_1&-1&y_1'\\x_1&y_1&1&0&0&0&-x_1'\\0&0&0&-x_2&-y_2&-1&y_2'\\x_2&y_2&1&0&0&0&-x_2'\\0&0&0&-x_3&-y_3&-1&y_3'\\x_3&y_3&1&0&0&0&-x_3'\end{bmatrix}\begin{pmatrix}h_{11}\\h_{12}\\h_{13}\\h_{21}\\h_{22}\\h_{23}\\1\end{pmatrix}=\mathbf{0}$ (17)

Since the rank of $A$ is 6, it has a one-dimensional null space, which provides a solution for $h$ subject to $\|h\|=1$. $h$ can be found by the singular value decomposition $A=U\Sigma V^T$; it is the right singular vector (column of $V$) corresponding to the smallest singular value of $A$.
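A sketch of this SVD solution, stacking the two rows of Eq. (16) per correspondence with $h_3=[0,0,1]^T$ substituted, as in Eq. (17) (function name is ours):

```python
import numpy as np

def affine_from_correspondences(pts, pts_p):
    """Estimate the affine matrix H from >= 3 correspondences via SVD."""
    rows = []
    for (x, y), (xp, yp) in zip(pts, pts_p):
        # Two equations per correspondence: yp = h2.x and xp = h1.x.
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, yp])
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -xp])
    A = np.array(rows)
    _, _, Vt = np.linalg.svd(A)
    h = Vt[-1]                    # right singular vector, smallest sigma
    h = h / h[-1]                 # enforce the trailing 1
    return np.array([[h[0], h[1], h[2]],
                     [h[3], h[4], h[5]],
                     [0.0, 0.0, 1.0]])
```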

The MLESAC estimator randomly selects three correspondences $x_i \leftrightarrow x_i'$, $i=1,2,3$, at a time from the set of matched points, and estimates the affine matrix $\hat{H}$ and the corrected correspondences $\hat{x}_i \leftrightarrow \hat{x}_i'$ that minimize

$D=\sum_i d(x_i,\hat{x}_i)^2+d(x_i',\hat{x}_i')^2$ (18)

with $\hat{x}_i'=\hat{H}\hat{x}_i$. During the minimization, the algorithm compares the distances in Eq. (18) against a distance threshold $\epsilon$. The corrected correspondences $\hat{x}_i \leftrightarrow \hat{x}_i'$ that satisfy $D<\epsilon$ are called inliers.
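For illustration, a plain RANSAC-style consensus loop over random three-point samples; full MLESAC replaces the inlier count with a likelihood score over the residuals (Torr and Zisserman, 2000), which this sketch omits:

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine fit: dst ~ A @ src + t, returned as a 3x3 matrix."""
    X = np.hstack([src, np.ones((len(src), 1))])
    M, *_ = np.linalg.lstsq(X, dst, rcond=None)      # 3x2 solution
    H = np.eye(3)
    H[:2, :3] = M.T
    return H

def ransac_affine(src, dst, n_iter=200, eps=1.0, seed=0):
    """Consensus loop over random 3-point samples (RANSAC-style)."""
    rng = np.random.default_rng(seed)
    best_inl = np.zeros(len(src), bool)
    ones = np.ones((len(src), 1))
    for _ in range(n_iter):
        idx = rng.choice(len(src), 3, replace=False)
        H = fit_affine(src[idx], dst[idx])
        proj = (np.hstack([src, ones]) @ H.T)[:, :2]
        inl = np.linalg.norm(proj - dst, axis=1) < eps
        if inl.sum() > best_inl.sum():
            best_inl = inl
    # Refit on the inliers of the best sample.
    return fit_affine(src[best_inl], dst[best_inl]), best_inl
```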

3. Experiments and results

In this section, we present registration results on satellite images using the described region-based method. We describe the datasets and how we generated the ground truth. We outline the experiments and the validation technique. We report results from our method and compare them to results we obtained using feature-based methods, i.e., Harris, minimum eigenvalue, FAST, SURF, BRISK and KAZE.

3.1. Datasets

In this work, we employed two datasets of satellite imagery. These datasets present challenges related to the physical sensors, contrast and noise properties, and resolution limitations.

UCSB dataset.

This dataset contains 19 pairs of satellite images obtained from the UCSB Image Processing and Vision Research Lab (IPVRL-UCSB, 0000) dataset. The image sizes range from 250 × 250 to 600 × 600. 18 out of the 19 pairs were acquired using Landsat TM sensors of different bands (0, 3, 4, 5, 7, 8). The remaining pair was obtained with an AVIRIS-Band-19 sensor. The images in each pair have the same size. The images in the UCSB dataset include viewpoint, temporal, and spectral variations. The pairs of images in the UCSB dataset are shown in Fig. 7. The images in the first pair of the UCSB dataset are images from a Mojave Desert sequence taken with an optical sensor. This image pair consists of single-sensor images and the size of each image in the pair is 512 by 512. The eleventh pair comprises multispectral optical aerial images of a mountainous area acquired by a thematic mapper (TM) simulator. These test images (size: 512 by 512) were cut out from different bands of raw images of size 1000 by 766 after translation and rotation. This pair corresponds to the zeroth and eighth bands from a 12-band set.

Fig. 7.

Fig. 7.

The first real dataset (source: UCSB). Consists of 19 pairs of satellite images obtained with Landsat sensors of different bands.

USGS dataset.

Our USGS test dataset was obtained from the online database of the U.S. Geological Survey that contains Landsat imagery in Geographic Tagged Image File Format (GeoTIFF). We selected images of the Delaware–Maryland–Virginia Peninsula in the United States, which were acquired at eight different time points between the years 2002 and 2011. They were acquired using Landsat TM sensors of bands 4 and 5. Each band of the data is stored as a grayscale, uncompressed, 8-bit string of unsigned integers. The image matrix sizes range from 7191 × 7951 pixels to 7191 × 8061 pixels with a spatial resolution of 30 m. The entire dataset is characterized by multi-temporal, multi-spectral and viewpoint variations, because of the different bands and timepoints, which made it suitable for testing registration performance. For the purpose of registration, we chose as reference the Landsat Band 5 image acquired in 2007, and rotated and translated all other Landsat 4 and 5 images, in addition to the existing viewpoint misalignments. This step generated 16 pairs of images in the USGS dataset to register, as shown in Fig. 8. The resulting distributions of ground truth transformation parameters over all the USGS image pairs are 13.003° ± 0.002° for the rotation, 869.40 ± 96.698 for the horizontal translation and −846.958 ± 20.174 for the vertical translation.

Fig. 8.

Fig. 8.

The second dataset (source: USGS). The image acquired in 2007, band 5, denoted by a blue background, is used as the reference.

3.2. Validation

In order to validate the results of automated registration, we computed the root-mean-squared error (RMSE) and the geodesic distance (GD) between the points obtained from the ground-truth transformation $T$ and the estimated transformation $\hat{T}$ on the reference domain. We computed ground truth transformations for each image pair in both datasets by registering the image pairs using about 40 manually selected pairs of control points from both images.

Root Mean Squared Error (RMSE).

Let $\omega=[u,v,1]=T[x,y,1]$ and $\hat{\omega}=[\hat{u},\hat{v},1]=\hat{T}[x,y,1]$ be 2D homogeneous coordinates, where $1\leq x\leq M$ and $1\leq y\leq N$. Let $\Omega=\{\omega=(u,v):0<u\leq M \text{ and } 0<v\leq N\}$ be the set of points $\omega$ in the spatial domain of the reference image. Then the RMSE between the estimated geometric transformation $\hat{T}$ and the ground truth transformation $T$ is given as

$\mathrm{RMSE}(T,\hat{T})=\left(\frac{1}{|\Omega|}\sum_{\omega\in\Omega}\|\omega-\hat{\omega}\|_2^2\right)^{\frac{1}{2}}$ (19)

where $|\Omega|$ denotes the cardinality of the set $\Omega$, and $\|\omega-\hat{\omega}\|_2^2=(u-\hat{u})^2+(v-\hat{v})^2$.
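Eq. (19) can be sketched by evaluating both transforms on the reference grid:

```python
import numpy as np

def transform_rmse(T, T_hat, M, N):
    """RMSE (Eq. (19)) between two 3x3 transforms over an M x N grid."""
    xs, ys = np.meshgrid(np.arange(1, M + 1), np.arange(1, N + 1))
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])   # 3 x |Omega|
    d = (T @ pts)[:2] - (T_hat @ pts)[:2]
    return np.sqrt((d ** 2).sum(axis=0).mean())
```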

Geodesic Distance (GD).

The geodesic distance between two points (p,pˆ) in Ω×R is the shortest path along the topographic surface (Peyre, 2019) produced by the reference image. Let γ(t)=(u(t),v(t),R(u(t),v(t))), t[0,1] be a parametric 3D curve. The length of the curve is given by

$L(\gamma(t)):=\int_0^1 d\gamma=\int_0^1 W(\gamma(t))\,\|\gamma'(t)\|\,dt$ (20)

where W is a weighted length metric. A geodesic curve is the curve that minimizes the length L in (20), γˆ=argminγL(γ(t)). Then the geodesic distance GD between (p,pˆ) is given by

$GD(p,\hat{p})=L(\hat{\gamma}(t)).$ (21)

We utilized the fast marching method by Sethian (1996) to reduce the computational time. We calculated the average geodesic distance over the pairs of test points on the image topographic surface. This is a distance measure that considers the intensity differences between points, therefore it may provide further insight into the effect of misalignment.

3.3. Results

UCSB dataset results.

In these experiments, we tested our automated registration method using individual shape and intensity descriptors, or their combinations. We also tested several feature-based techniques that have been published in the literature, that is, Harris, the minimum eigenvalue method denoted by MINEIGEN, FAST, SURF, BRISK and KAZE. We used grid search to tune the parameters for all compared methods. In our method, we tuned the following parameters: Parzen kernel length, Parzen bandwidth, h-minima height, match threshold, and ambiguity threshold. We report the results from these experiments in Tables 1 and 3 and Fig. 9.

Table 1.

Summary of root mean squared errors before and after registration of UCSB data for validation of registration accuracy of feature-based and region-based techniques.

RMSE before registration (all image pairs): 107.315 ± 66.442 pixels (median 98.9).

Descriptor    RMSE after registration: Mean ± Stdev / Median

HARRIS        0.684 ± 0.441 / 0.551
MINEIGEN      0.68 ± 0.41 / 0.634
FAST          0.695 ± 0.416 / 0.715
SURF          0.742 ± 0.408 / 0.747
BRISK         0.98 ± 0.658 / 0.8
KAZE          12.768 ± 51.077 / 0.556
SRID          0.596 ± 0.402 / 0.599
RFD.CD        53.348 ± 96.742 / 1.386
SGSD          71.187 ± 84.283 / 18.713
SRID.RFD.CD   0.446 ± 0.359 / 0.267
SRID.SGSD     0.522 ± 0.349 / 0.509
Table 3.

Summary of point correspondences for the UCSB dataset. Comparison of registration results of feature-based and region-based techniques. TVP — total valid points, UM — unambiguous matches, INL — inliers.

| Descriptor | TVP (Mean ± Stdev) | TVP (Median) | UM (Mean ± Stdev) | UM (Median) | INL/UM (Mean ± Stdev) | INL/UM (Median) |
|---|---|---|---|---|---|---|
| HARRIS | 2509.8 ± 1883.4 | 2226 | 465.5 ± 338 | 396 | 0.665 ± 0.205 | 0.661 |
| MINEIGEN | 5572.3 ± 3204.1 | 4538 | 1030.1 ± 569.3 | 800 | 0.603 ± 0.217 | 0.6647 |
| FAST | 2507.2 ± 2048.4 | 1756 | 529.2 ± 498.6 | 349 | 0.645 ± 0.197 | 0.655 |
| SURF | 1695.3 ± 1089.1 | 1563 | 377.8 ± 236.8 | 382 | 0.518 ± 0.221 | 0.484 |
| BRISK | 4114.8 ± 3477.2 | 3279 | 753.8 ± 669.2 | 585 | 0.598 ± 0.193 | 0.604 |
| KAZE | 7743.9 ± 3772.2 | 7644 | 1587.4 ± 750.5 | 1653 | 0.584 ± 0.239 | 0.609 |
| SRID | 2161.2 ± 1318.5 | 1689 | 268.5 ± 234.8 | 196 | 0.43 ± 0.27 | 0.353 |
| RFD.CD | 1590.3 ± 1357.7 | 968 | 346.7 ± 312.6 | 203 | 0.037 ± 0.038 | 0.024 |
| SGSD | 2269.9 ± 1376.2 | 1777 | 501.4 ± 307.8 | 411 | 0.034 ± 0.044 | 0.016 |
| SRID.RFD.CD | 1516.9 ± 1298.9 | 926 | 170.5 ± 176.5 | 92 | 0.293 ± 0.18 | 0.286 |
| SRID.SGSD | 2161.2 ± 1318.5 | 1689 | 318.6 ± 251.2 | 261 | 0.317 ± 0.179 | 0.291 |
Fig. 9.

Comparison of RMSE values produced by feature-based and region-based techniques for each image pair of the UCSB dataset.

In Table 1 we present a summary of the root mean squared error (RMSE) values for the registration of the 19 image pairs in this dataset, reporting the mean, standard deviation and median of the RMSE before and after registration. The RMSE before registration expresses the average pixel discrepancy, which is quite large, as we can also observe in Fig. 8. We observe that our joint regional Fourier and intensity descriptor, denoted by SRID.RFD.CD, yields the lowest average RMSE among all methods at 0.446 ± 0.359 pixels and the lowest median at 0.267 pixels. Furthermore, the other joint regional shape and intensity descriptor, SRID.SGSD, produced more accurate registration on average than the feature-based techniques, with an average RMSE of 0.522 ± 0.349 pixels.
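The RMSE between an estimated transformation and the ground-truth transformation can be sketched as follows. The 2 × 3 affine parameterization and the choice of evaluation points are illustrative assumptions; the paper does not specify its exact point set in this excerpt:

```python
import math

def apply_affine(A, pts):
    """Apply a 2x3 affine matrix A = [[a, b, tx], [c, d, ty]] to 2-D points."""
    return [(A[0][0] * x + A[0][1] * y + A[0][2],
             A[1][0] * x + A[1][1] * y + A[1][2]) for x, y in pts]

def transform_rmse(A_est, A_gt, pts):
    """RMSE (in pixels) between the estimated and ground-truth affine
    transformations, evaluated over a set of test points."""
    P, Q = apply_affine(A_est, pts), apply_affine(A_gt, pts)
    se = [(px - qx) ** 2 + (py - qy) ** 2
          for (px, py), (qx, qy) in zip(P, Q)]
    return math.sqrt(sum(se) / len(se))
```

For example, an estimate that misses a ground-truth translation of (3, 4) pixels yields an RMSE of 5 pixels regardless of the evaluated points.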

Fig. 9 displays the bar graph of the individual RMSE values for each image pair using feature-based methods and our method with the joint shape and intensity descriptors. This offers a more detailed visualization of the performance of each technique and shows that SRID.RFD.CD achieves subpixel accuracy in terms of RMSE for 18 out of the 19 pairs.

Fig. 10 displays the bar graph of the geodesic distances produced by the feature-based and region-based methods on the UCSB dataset. We report a summary of these results in Table 2 using the mean, standard deviation and median. From the summary, we observe that SRID.SGSD scored the lowest average geodesic distance at 0.453 ± 0.537 pixels, followed by the SRID.RFD.CD method at 0.56 ± 0.681 pixels. These results show that our region-based methods yield more accurate registration than the feature-based methods.

Fig. 10.

Comparison of geodesic distance values (based on 100 randomly sampled points) produced by feature-based and region-based techniques for each image pair of the UCSB dataset.

Table 2.

Summary of geodesic distance values before and after registration of UCSB data for validation of registration accuracy of feature-based and region-based techniques.

| Descriptor | GD before registration (Mean ± Stdev) | GD before (Median) | GD after registration (Mean ± Stdev) | GD after (Median) |
|---|---|---|---|---|
| HARRIS | | | 0.629 ± 0.632 | 0.468 |
| MINEIGEN | | | 0.628 ± 0.555 | 0.689 |
| FAST | | | 0.638 ± 0.571 | 0.64 |
| SURF | | | 0.764 ± 0.674 | 0.702 |
| BRISK | | | 0.692 ± 0.588 | 0.655 |
| KAZE | 71.039 ± 61.459 | 53.217 | 11.511 ± 46.871 | 0.498 |
| SRID | | | 2.096 ± 3.619 | 0.909 |
| RFD.CD | | | 149.983 ± 82.999 | 160.631 |
| SGSD | | | 118.38 ± 89.559 | 118.647 |
| SRID.RFD.CD | | | 0.56 ± 0.681 | 0.558 |
| SRID.SGSD | | | 0.453 ± 0.537 | 0.242 |

In Table 3, the total valid points (TVP) represent the sum of points in both reference and input images, for which features are extracted. Unambiguous matches (UM) refer to the total number of correspondences obtained after applying the matching thresholds outlined in Section 2.3 and detailed in Muja and Lowe (2009, 2012) and Lowe (2004). Inliers (INL) correspond to the number of correspondences obtained after estimation of the geometric transform by MLESAC. We also report the ratio of inliers to unambiguous matches (INL/UM). The region-based methods, in general, produce fewer unambiguous matches and lower INL/UM ratios than their feature-based counterparts.
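The selection of unambiguous matches from a pairwise descriptor-distance matrix can be sketched with a match threshold τm and an ambiguity (nearest-neighbor ratio) threshold τa, following the ratio-test idea of Lowe (2004). The function name and the assumption that distances are normalized to (0, 1] are illustrative; the paper's exact rule is given in its Section 2.3, outside this excerpt:

```python
def unambiguous_matches(score, tau_m, tau_a):
    """Select matches from a distance matrix score[i][j] (lower = better).
    Keep (i, best_j) only if the best distance passes the match threshold
    tau_m, and the best/second-best ratio passes the ambiguity threshold
    tau_a (rejecting matches whose runner-up is nearly as good)."""
    matches = []
    for i, row in enumerate(score):
        order = sorted(range(len(row)), key=row.__getitem__)
        best, second = order[0], order[1]
        if row[best] <= tau_m and row[best] <= tau_a * row[second]:
            matches.append((i, best))
    return matches
```

With τa near 1 the test is lenient, which matches the observation in the discussion that the first and second minimum distances per row are often close.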

Finally, we display the registered image pairs produced by SRID.RFD.CD in Fig. 11.

Fig. 11.

Registered image pairs from the UCSB dataset using SRID.RFD.CD.

USGS dataset results.

We performed similar experiments to evaluate the registration accuracy and other performance measures for each stage. We also compared our region-based techniques with feature-based techniques. We report the registration accuracy results in Table 4 and Fig. 12, and the feature matching performance measures in Table 6. The results show that SRID.RFD.CD performed very well, with an RMSE of 1.152 ± 0.488 pixels and a median of 1.03 pixels over all 16 image pairs. Among the other methods, MINEIGEN, SRID and SRID.SGSD produce good results, but most of the feature-based methods produced high errors. In Table 6 we observe that our joint regional intensity and shape techniques produce a sufficient ratio of inliers for estimating the geometric transformation. Fig. 12 shows the individual RMSE values for each method and each image pair. Finally, we display the registered image pairs produced by SRID.RFD.CD in Fig. 14.

Table 4.

Summary of root mean squared errors before and after registration of USGS data for validation of registration accuracy of feature-based and region-based techniques.

| Descriptor | RMSE before registration (Mean ± Stdev) | RMSE before (Median) | RMSE after registration (Mean ± Stdev) | RMSE after (Median) |
|---|---|---|---|---|
| HARRIS | | | 256.468 ± 988.988 | 1.018 |
| MINEIGEN | | | 1.143 ± 0.364 | 1.060 |
| FAST | | | 2341.732 ± 7168.596 | 1.514 |
| SURF | | | 1361.969 ± 5265.975 | 1.977 |
| BRISK | | | 456.078 ± 1760.07 | 1.507 |
| KAZE | 692.538 ± 17.493 | 691.6 | 3.18 ± 6.461 | 1.392 |
| SRID | | | 1.24 ± 0.719 | 1.140 |
| RFD.CD | | | 2129.787 ± 746.366 | 2324.1 |
| SGSD | | | 2187.098 ± 496.543 | 2264.4 |
| SRID.RFD.CD | | | 1.152 ± 0.488 | 1.027 |
| SRID.SGSD | | | 1.206 ± 0.393 | 1.277 |
Fig. 14.

Registered image pairs from the USGS dataset using SRID.RFD.CD.

Table 6.

Summary of point correspondences for the USGS dataset.

| Descriptor | TVP (Mean ± Stdev) | TVP (Median) | UM (Mean ± Stdev) | UM (Median) | INL/UM (Mean ± Stdev) | INL/UM (Median) |
|---|---|---|---|---|---|---|
| HARRIS | 9548 ± 2308 | 9460.5 | 1338.8 ± 540.6 | 1278.5 | 0.361 ± 0.188 | 0.351 |
| MINEIGEN | 56705.8 ± 3074.3 | 57248.5 | 8034.8 ± 2903 | 7549.5 | 0.319 ± 0.184 | 0.285 |
| FAST | 8582.4 ± 3022.1 | 7367.5 | 900.5 ± 758.4 | 761.5 | 0.276 ± 0.189 | 0.290 |
| SURF | 5595.1 ± 1936.7 | 5125.5 | 857.6 ± 605.6 | 837.5 | 0.303 ± 0.165 | 0.274 |
| BRISK | 12549.1 ± 4786.9 | 10769.5 | 1322.1 ± 1103.1 | 1161 | 0.284 ± 0.188 | 0.285 |
| KAZE | 56778.7 ± 8997.9 | 58165 | 6011.1 ± 1850 | 6009 | 0.190 ± 0.111 | 0.181 |
| SRID | 22367.9 ± 10875.6 | 19958.5 | 653.8 ± 724.2 | 589.5 | 0.305 ± 0.163 | 0.255 |
| RFD.CD | 3918.8 ± 7790.5 | 1907.5 | 710.2 ± 1279.6 | 392 | 0.024 ± 0.056 | 0.010 |
| SGSD | 22367.9 ± 10875.6 | 19958.5 | 4015.2 ± 1291.7 | 4250 | 0.001 ± 0.001 | 0.001 |
| SRID.RFD.CD | 17527 ± 11078.9 | 10887.5 | 566.8 ± 355.8 | 405 | 0.159 ± 0.191 | 0.113 |
| SRID.SGSD | 22367.9 ± 10875.6 | 19958.5 | 4015.2 ± 1291.7 | 4250 | 0.246 ± 0.158 | 0.201 |
Fig. 12.

Comparison of RMSE values produced by feature-based and region-based techniques for each image pair of the USGS dataset.

Fig. 13 and Table 5 contain the validation results for the USGS dataset using the geodesic distance. SRID.RFD.CD scored the lowest mean geodesic distance, followed by SRID.SGSD, at 0.711 ± 0.354 and 1.12 ± 0.449 pixels, respectively. These results confirm that our region-based registration descriptors are capable of producing more accurate registration in a complicated dataset than their feature-based counterparts.

Fig. 13.

Comparison of geodesic distance values (based on 100 randomly sampled points) produced by feature-based and region-based techniques for each image pair of the USGS dataset.

Table 5.

Summary of geodesic distance values before and after registration of USGS data for validation of registration accuracy of feature-based and region-based techniques.

| Descriptor | GD before registration (Mean ± Stdev) | GD before (Median) | GD after registration (Mean ± Stdev) | GD after (Median) |
|---|---|---|---|---|
| HARRIS | | | 145.329 ± 577.183 | 1.081 |
| MINEIGEN | | | 1.148 ± 0.403 | 1.206 |
| FAST | | | 308.623 ± 863.958 | 1.266 |
| SURF | | | 171.41 ± 678.032 | 1.708 |
| BRISK | | | 160.39 ± 635.371 | 1.612 |
| KAZE | 364.476 ± 29.368 | 365.375 | 1.957 ± 2.343 | 1.444 |
| SRID | | | 175.282 ± 693.726 | 1.336 |
| RFD.CD | | | 2378.024 ± 756.813 | 2336.817 |
| SGSD | | | 2472.603 ± 632.975 | 2392.214 |
| SRID.RFD.CD | | | 0.711 ± 0.354 | 0.702 |
| SRID.SGSD | | | 1.12 ± 0.449 | 1.17 |

4. Discussion

UCSB dataset discussion.

In Tables 1 and 2 we observe that the region-based registration techniques with joint shape and intensity descriptors (SRID.RFD.CD and SRID.SGSD) produce the lowest registration errors. In Fig. 11 we can visually confirm the accuracy of the registration produced by SRID.RFD.CD. However, the RMSE and geodesic distance values increase significantly when we utilize only the shape descriptors (RFD.CD and SGSD). SRID alone performs very well, but still not as well as the joint descriptors. This indicates that intensity and shape features complement each other, and their joint use improves the feature matching performance. The feature-based methods HARRIS, MINEIGEN, FAST and SURF produce good results, but are still less accurate than SRID.RFD.CD. A desirable property of RFD is that it represents the complete shape information at an accuracy level that is controlled by the number of Fourier coefficients. The introduction of regional intensity information produces a more accurate representation. We observe in Table 2 that the mean geodesic distance of SRID.RFD.CD is higher than that of SRID.SGSD, while the reverse relationship is observed in Table 1 for the RMSE. This is because the geodesic distance utilizes the intensity (or the gradient) of the reference image, in addition to the spatial coordinates. Overall, the RMSE and geodesic distance validation results are consistent.
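The complementarity of intensity and shape features is exploited through the joint matching cost, which the abstract describes as a linear combination of Euclidean distances. A minimal sketch follows; the weight `lam` is an illustrative parameter, as the paper's actual weights are not given in this excerpt:

```python
import math

def euclid(u, v):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def joint_cost(srid_a, srid_b, rfd_a, rfd_b, lam=0.5):
    """Joint matching cost between two regions as a linear combination of
    the intensity-descriptor (SRID) distance and the Fourier shape
    descriptor (RFD) distance. The weight lam is illustrative only."""
    return lam * euclid(srid_a, srid_b) + (1 - lam) * euclid(rfd_a, rfd_b)
```

Region pairs would then be matched by minimizing this cost over all candidate pairs, so that a pair must agree in both intensity and shape to score well.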

To fine-tune the algorithm parameters, we performed a grid search over all image pairs, spanning the parameter space of the threshold h in the h-minima transform, the Parzen kernel length (pkl) and Parzen bandwidth (pbw) for edge estimation, the number of Fourier coefficients (NFC) for the extraction of shape-boundary Fourier descriptors, and the matching threshold τm and ambiguity threshold τa for feature matching. Greater values of h remove more local minima. The values h = {3, 5, 7} were able to reduce over-segmentation caused by intensity variations. Furthermore, the Parzen kernel length pkl is related to the number of samples for density estimation and the scale of detection, and the Parzen bandwidth pbw controls the sensitivity to intensity contrast. The following values produced good results for the UCSB dataset: pkl = {5, 7, 9}, pbw ∈ [42, 45]. The numbers of Fourier coefficients that produced an effective shape representation were NFC = {8, 16, 32}. The number of unambiguous matches depends on the magnitude of the ambiguity threshold τa. Because of the closeness of the first and second minimum matches in each row of the score matrix, it was ideal to choose τa ∈ [0.9, 1]. The range of the threshold for selecting strong matches τm is (0, 1], but in our real dataset experiments, our results were obtained for values in the interval [0.5, 0.9].
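The grid search described above can be sketched generically. The `score_fn` callback is a hypothetical stand-in for the full segment-describe-match-register pipeline, returning a registration error (e.g., mean RMSE over image pairs) for one parameter combination:

```python
from itertools import product

def grid_search(score_fn, grid):
    """Exhaustive grid search over a parameter space.

    grid maps parameter names to lists of candidate values; score_fn maps a
    parameter dict to a scalar error. Returns the best parameter dict and
    its score."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        s = score_fn(params)
        if s < best_score:
            best_params, best_score = params, s
    return best_params, best_score
```

For the parameters above, `grid` would hold the candidate sets for h, pkl, pbw, NFC, τm and τa, at the cost of evaluating the pipeline once per combination.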

In Table 3, the total valid point (TVP) values are influenced by the number of Fourier coefficients (NFC) and the block size of the intensity features. When extracting joint Fourier and intensity features, we first reject regions with fewer coefficients than the parameter NFC, and second, regions whose intensity blocks exceed the boundary of the image. The regions that satisfy both conditions are returned as valid regions for matching. The centroids of the valid regions are the valid points, and the sum of the valid points in both images is the total valid points (TVP).
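The regional Fourier descriptor from the centroid-distance signature (RFD.CD), including the rejection of regions with too few boundary samples, can be sketched as follows. This follows the common construction surveyed by Zhang and Lu (2002): the paper's exact variant is not reproduced in this excerpt, and a naive DFT stands in for the FFT used in practice:

```python
import cmath
import math

def rfd_cd(boundary, nfc):
    """Sketch of a regional Fourier descriptor from the centroid-distance
    shape signature. Distances from the region centroid to the boundary
    points are transformed with a DFT; the magnitudes of the first nfc
    non-DC coefficients, normalized by the DC term, give a translation-,
    rotation- and scale-invariant shape descriptor. Regions with too few
    boundary samples are rejected as invalid (returns None)."""
    n = len(boundary)
    if n <= nfc:
        return None  # invalid region: fewer boundary samples than nfc
    cx = sum(x for x, _ in boundary) / n
    cy = sum(y for _, y in boundary) / n
    sig = [math.hypot(x - cx, y - cy) for x, y in boundary]
    # Naive DFT of the centroid-distance signature (FFT in practice).
    coeffs = [sum(sig[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n)) for k in range(nfc + 1)]
    dc = abs(coeffs[0])
    return [abs(c) / dc for c in coeffs[1:]]
```

A circular boundary yields a constant signature, so all normalized coefficients are near zero, while elongated or irregular regions spread energy into the higher coefficients.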

When reviewing the INL/UM ratios, we observed that the region-based descriptors produced lower ratios than the feature-based descriptors; for example, 0.293 ± 0.18 for the joint intensity-shape descriptor versus 0.598 ± 0.193 for BRISK. Hence, feature-based descriptors may yield more reproducible matches prior to the estimation of the geometric transform. Nevertheless, our joint descriptors SRID.RFD.CD and SRID.SGSD still produce a sufficient number of inliers for estimating an accurate geometric transformation.

USGS dataset discussion.

The RMSE and geodesic distance results in Tables 4 and 5 indicate that SRID.RFD.CD produces very good accuracy, followed by MINEIGEN, SRID.SGSD, SRID and KAZE. Because this dataset is challenging, owing to the significantly larger image matrix size and the detection of more feature points and fine details, the performance of the other feature-based registration techniques declines in comparison with the UCSB dataset. The individual RMSE and geodesic distance values in Figs. 12 and 13 indicate the higher level of difficulty of this dataset for some image pairs, such as P1, P3, P13, and P15. Nevertheless, SRID.RFD.CD was able to produce good results. To assess the accuracy of registration, we show in Fig. 15 registration results by SRID.RFD.CD, zoomed in on coastline and inland regions of USGS image pairs. It is evident that these images contain fine detail that needs to be preserved in order to find accurate point correspondences. We also note the temporal and multispectral variations that further complicate the registration task.

Fig. 15.

Examples of misalignments and registration results on USGS data.

Our grid search parameter tuning strategy produced the following suitable ranges for the USGS data: h = {3, 5, 7, 9, 11}, pkl = {3, 5}, pbw ∈ [25, 43], NFC = {8, 16, 32}, τm ∈ [0.1, 1], τa ∈ [0.9, 1]. We note that the parameter intervals are similar to those for the UCSB data, except for the pbw interval. The pbw level is related to the level of detail to be preserved; therefore, a lower value corresponds to increased sensitivity to spectral values, which is a desirable algorithm property for this dataset.

The point correspondence summary in Table 6 shows that the region-based techniques detect enough keypoints for matching. We note that the registration accuracy of the region-based techniques is related to the INL/UM ratio. This relation implies that if a sufficient number of inliers is detected, then the registration result is expected to be accurate. Regarding the stability of the solutions, we note that a sufficient number of inliers supports the convergence to good solutions and helps avoid local minima. The SRID.RFD.CD method produces, in general, fewer inliers than some feature-based techniques such as MINEIGEN and KAZE; therefore, SRID.RFD.CD is more sensitive to outliers. This result implies that there is potential for further improving the robustness and discriminant capability of this descriptor.

Regarding the implementation, we ran the experiments on a Windows platform with an Intel Xeon Silver 4114 2.2 GHz CPU and 64 GB of RAM. Our region-based methods were implemented in Matlab, whereas the compared approaches were developed in C++ using the OpenCV library; therefore, the computation times of these groups of techniques are not directly comparable. Qualitative comparisons indicate that, while the calculation of the RFD descriptors increases the computational complexity of the region-based approach, this complexity is comparable to that of multi-scale analysis methods such as KAZE.

Qualitative comparisons.

Next, we qualitatively compare our method with published region-based methods, such as the automatic image registration through segmentation methods by Goncalves et al. (2011b,a), the contour-based approach by Li et al. (1995), and the methods by Dare and Dowman (2001) and Han et al. (2017).

HAIRIS, by Goncalves et al. (2011b), uses mode delineation for segmentation, while the method published in Goncalves et al. (2011a) uses the Otsu thresholding approach. These segmentation methods are straightforward and are often affected by noise. In our method, the use of kernel density estimation for edge detection followed by watershed segmentation is advantageous, in the sense that it is robust to noise and produces closed regions. Methods like HAIRIS require a separate pre-filtering process for noise reduction, which may reduce the accuracy of registration because of the blurring of edges and object boundaries. Our method handles the effect of noise implicitly through the parameters of the Parzen kernel density edge estimation and the h-minima transform. HAIRIS was modeled to address only additive noise, hence it may fail in the presence of other types of noise. Our experiments using simulated additive noise on synthetic images showed that, by adjusting the parameter of the h-minima transform, the effect of noise is minimized.
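For reference, the Parzen (1962) window estimator underlying the edge detection step can be sketched in one dimension. This shows only the generic Gaussian-kernel density estimate, not the full edge detector of the paper; the bandwidth `bw` plays the role of the pbw parameter discussed above:

```python
import math

def parzen_kde(samples, x, bw):
    """Parzen-window density estimate at x from 1-D intensity samples,
    using a Gaussian kernel with bandwidth bw. Smaller bw increases
    sensitivity to local intensity structure; larger bw smooths it out."""
    n = len(samples)
    norm = 1.0 / (n * bw * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / bw) ** 2) for s in samples)
```

In the edge-detection setting, densities estimated from neighboring pixel windows would be compared, with a large density dissimilarity indicating an edge.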

The method of Dare and Dowman (2001) for automatic feature-based registration of SAR and SPOT images involves an initial manual selection of control points and approximate registration using patch matching, before finding the final accurate registration using edge matching. The possible downsides of this method, and how our method addresses them, are as follows. First, the manual selection of control points may be a laborious task, especially for a large dataset such as the UCSB and USGS datasets with a total of 35 image pairs; even choosing a minimum of four pairs of points for each pair of images would amount to 4 × 35 × 2 = 280 control points in our experiments. Second, in the approximate registration using patch extraction and matching, thresholding, homogeneity-based and region-growing segmentation algorithms were used for patch extraction. One drawback of these segmentation approaches is that they are task specific. In contrast, our use of watershed segmentation based on Parzen kernel density estimation, accompanied by the h-minima morphological transformation, enables us to estimate ranges of parameter values based on physical characteristics such as intensity contrast and spatial and spectral resolution. This helps to produce stable results in multi-sensor and multispectral image registration.

The joint feature extraction proposed in our work can produce distinctive regions that yield optimal matches. The feature descriptors used in the method of Dare and Dowman (2001), whose components are the area, perimeter, and the width and height of the minimum bounding rectangle, or in HAIRIS, which uses the area, axis ratio, perimeter and fractal dimension, may not perform as well as the joint descriptors SRID.RFD.CD and SRID.SGSD, because global shape features alone are less discriminant than joint intensity and shape features, as we observed from the results of the SGSD method.

The method of Han et al. (2017) assumes that the images to be registered have already been moderately aligned. This could imply that it may not work effectively for extensive misalignments. It also suggests that the user has manually or interactively generated a priori information on the geometric variations between the reference and input images. In contrast, our method can address the extensive geometric (rotation and translation) misalignments that occur, for example, in the UCSB dataset.

Another advantage of our method is that the segmentation stage produces closed regions, whereas the contour-based approach by Li et al. (1995) produces both closed and open contours, which may reduce the effectiveness of feature extraction and matching and lead to difficulties in finding correspondences between regions.

5. Conclusion

In this work, we introduced a segmentation-based registration method for remote sensing imagery. We proposed region-based descriptors and a joint matching cost function for image registration. The region-based descriptors are the joint standardized regional intensity and regional Fourier descriptor obtained using the centroid-distance shape signature (SRID.RFD.CD), and the joint standardized regional intensity and global shape descriptor (SRID.SGSD). Our joint matching cost function is a weighted combination of Euclidean distances.

In the image segmentation stage, we employ non-parametric kernel density edge estimation and morphological reconstruction before applying the watershed transform, to reduce over-segmentation. We also apply intensity histogram matching in the pre-processing stage, to address sensor variations between the two images of each pair.

We evaluated the performance of our method on two datasets of satellite images, comprising 35 image pairs in total. We also tested the performance of recent feature-based registration techniques on the same data to enable comparisons. We validated the registration accuracy by calculating the root mean squared error (RMSE) between the ground truth transformations, obtained by fitting an affine transformation to manually selected control points, and the estimated geometric transformations, obtained by fitting an affine transformation to the correspondences established by our methods using maximum likelihood estimation sampling consensus (MLESAC).
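The affine fit on control points can be sketched with plain least squares via the normal equations. This shows only the least-squares step; the MLESAC wrapper used for the estimated transformations, which robustly rejects outlier correspondences, is omitted here:

```python
def solve3(M, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    A = [row[:] + [bi] for row, bi in zip(M, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(3):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * ac for a, ac in zip(A[r], A[col])]
    return [A[i][3] / A[i][i] for i in range(3)]

def fit_affine(src, dst):
    """Least-squares affine fit mapping src points to dst points; returns a
    2x3 matrix [[a, b, tx], [c, d, ty]]. Since x' = a*x + b*y + tx and
    y' = c*x + d*y + ty share the same design matrix, the two rows are
    solved independently with the same 3x3 normal-equation system."""
    n = len(src)
    sx = sum(x for x, _ in src)
    sy = sum(y for _, y in src)
    sxx = sum(x * x for x, _ in src)
    syy = sum(y * y for _, y in src)
    sxy = sum(x * y for x, y in src)
    M = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rows = []
    for k in (0, 1):  # k = 0 fits x', k = 1 fits y'
        b = [sum(src[i][0] * dst[i][k] for i in range(n)),
             sum(src[i][1] * dst[i][k] for i in range(n)),
             sum(dst[i][k] for i in range(n))]
        rows.append(solve3(M, b))
    return rows
```

With at least three non-collinear control-point pairs the system is determined; with more, the fit minimizes the squared residuals, which is how the manually selected control points define the ground-truth transformation.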

Our techniques produced very good results on these datasets in comparison to the feature-based techniques. Our joint descriptor, denoted by SRID.RFD.CD, achieved the smallest registration error among the compared techniques. These results support the applicability of the proposed region-based technique to the automated registration of satellite images.

Acknowledgment

This research was supported by the Delaware NASA EPSCoR/RID program, USA under subaward number 42384.

Footnotes

No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.cviu.2019.102825.

References

  1. Alcantarilla PF, Bartoli A, Davison AJ, 2012. KAZE features. In: Proceedings of the 12th European Conference on Computer Vision - Volume Part VI. ECCV'12, Springer-Verlag, Berlin, Heidelberg, pp. 214–227. 10.1007/978-3-642-33783-3_16.
  2. Bay H, Tuytelaars T, Van Gool L, 2006. SURF: Speeded up robust features. In: European Conference on Computer Vision. Springer, pp. 404–417.
  3. Brown LG, 1992. A survey of image registration techniques. ACM Comput. Surv. (CSUR) 24 (4), 325–376.
  4. Brown M, Lowe D, 2002. Invariant features from interest point groups. In: Proc. BMVC. pp. 23.1–23.10. 10.5244/C.16.23.
  5. Calonder M, Lepetit V, Strecha C, Fua P, 2010. BRIEF: Binary robust independent elementary features. In: European Conference on Computer Vision. Springer, pp. 778–792.
  6. Dare P, Dowman I, 2001. An improved model for automatic feature-based registration of SAR and SPOT images. ISPRS J. Photogramm. Remote Sens. 56 (1), 13–28. 10.1016/S0924-2716(01)00031-4.
  7. Economou G, Fotinos A, Makrogiannis S, Fotopoulos S, 2001. Color image edge detection based on nonparametric density estimation. In: IEEE Int. Conference on Image Processing. pp. 922–925.
  8. Goncalves H, Corte-Real L, Goncalves JA, 2011a. Automatic image registration through image segmentation and SIFT. IEEE Trans. Geosci. Remote Sens. 49 (7), 2589–2600. 10.1109/TGRS.2011.2109389.
  9. Goncalves H, Goncalves JA, Corte-Real L, 2011b. HAIRIS: A method for automatic image registration through histogram-based image segmentation. IEEE Trans. Image Process. 20 (3), 776–789. 10.1109/TIP.2010.2076298.
  10. Gong M, Zhao S, Jiao L, Tian D, Wang S, 2014. A novel coarse-to-fine scheme for automatic image registration based on SIFT and mutual information. IEEE Trans. Geosci. Remote Sens. 52 (7), 4328–4338. 10.1109/TGRS.2013.2281391.
  11. Han Y, Bovolo F, Bruzzone L, 2017. Segmentation-based fine registration of very high resolution multitemporal images. IEEE Trans. Geosci. Remote Sens. 55 (5), 2884–2897. 10.1109/TGRS.2017.2655941.
  12. Harris C, Stephens M, 1988. A combined corner and edge detector. In: Proc. of Fourth Alvey Vision Conference. pp. 147–151.
  13. Hartley R, Zisserman A, 2003. Multiple View Geometry in Computer Vision. Cambridge University Press.
  14. Huo C, Pan C, Huo L, Zhou Z, 2012. Multilevel SIFT matching for large-size VHR image registration. IEEE Geosci. Remote Sens. Lett. 9 (2), 171–175. 10.1109/LGRS.2011.2163491.
  15. IPVRL-UCSB. Satellite Imagery Data Set, Department of Electrical and Computer Engineering, University of California, Santa Barbara. http://old.vision.ece.ucsb.edu/registration/satellite/testimag/.
  16. Kybic J, Borovec J, 2014. Automatic simultaneous segmentation and fast registration of histological images. In: Proc. IEEE 11th Int. Symp. Biomedical Imaging (ISBI). pp. 774–777. 10.1109/ISBI.2014.6867985.
  17. Kybic J, Dolejší M, Borovec J, 2015. Fast registration of segmented images by normal sampling. In: Proc. IEEE Conf. Computer Vision and Pattern Recognition Workshops (CVPRW). pp. 11–19. 10.1109/CVPRW.2015.7301311.
  18. Leutenegger S, Chli M, Siegwart RY, 2011. BRISK: Binary robust invariant scalable keypoints. In: 2011 International Conference on Computer Vision. IEEE, pp. 2548–2555.
  19. Li H, Manjunath BS, Mitra SK, 1995. A contour-based approach to multisensor image registration. IEEE Trans. Image Process. 4 (3), 320–334.
  20. Liang Z-P, Pan H, Magin RL, Ahuja N, Huang TS, 1997. Automated image registration by maximization of a region similarity metric. Int. J. Imaging Syst. Technol. 8 (6), 513–518.
  21. Lowe DG, 2004. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60 (2), 91–110.
  22. Makrogiannis S, Bourbakis NG, 2010. Efficient registration of multitemporal and multisensor aerial images based on alignment of nonparametric edge features. J. Electron. Imaging 19 (1), 013002. 10.1117/1.3293436.
  23. Makrogiannis S, Vanhamel I, Fotopoulos S, Sahli H, Cornelis JP, 2005. Watershed-based multiscale segmentation method for color images using automated scale selection. J. Electron. Imaging 14 (3), 033007.
  24. Muja M, Lowe DG, 2009. Fast approximate nearest neighbors with automatic algorithm configuration. In: VISAPP (1), pp. 331–340.
  25. Muja M, Lowe DG, 2012. Fast matching of binary features. In: Computer and Robot Vision (CRV), 2012 Ninth Conference on. IEEE, pp. 404–410.
  26. Okorie A, Makrogiannis S, 2017. Automated feature-based registration techniques for satellite imagery. In: Proc. IEEE Int. Geoscience and Remote Sensing Symp. (IGARSS). pp. 5137–5140. 10.1109/IGARSS.2017.8128159.
  27. Parzen E, 1962. On estimation of a probability density function and mode. Ann. Math. Stat. 33 (3), 1065–1076.
  28. Peyre G, 2019. Numerical Tours of Data Science. https://nbviewer.jupyter.org/github/gpeyre/numerical-tours/blob/master/matlab/fastmarching_1_2d.ipynb.
  29. Rendas M, Barral A, 2008. MDL region-based image registration. In: 19th International Conference on Pattern Recognition (ICPR). pp. 1–4. 10.1109/ICPR.2008.4760951.
  30. Sethian JA, 1996. A fast marching level set method for monotonically advancing fronts. Proc. Natl. Acad. Sci. 93 (4), 1591–1595.
  31. Shi J, Tomasi C, 1994. Good features to track. In: Proceedings CVPR'94, IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE, pp. 593–600.
  32. Soille P, 2013. Morphological Image Analysis: Principles and Applications. Springer Science & Business Media.
  33. Torr PH, Zisserman A, 2000. MLESAC: A new robust estimator with application to estimating image geometry. Comput. Vis. Image Underst. 78 (1), 138–156.
  34. Troglio G, Le Moigne J, Benediktsson JA, Moser G, Serpico SB, 2012. Automatic extraction of ellipsoidal features for planetary image registration. IEEE Geosci. Remote Sens. Lett. 9 (1), 95–99. 10.1109/LGRS.2011.2161263.
  35. Tuytelaars T, Mikolajczyk K, 2008. Local invariant feature detectors: a survey. Found. Trends Comput. Graph. Vis. 3 (3), 177–280.
  36. Wang X, Li Y, Wei H, Liu F, 2015. An ASIFT-based local registration method for satellite imagery. Remote Sens. 7 (6), 7044–7061. 10.3390/rs70607044.
  37. Xiong Z, Zhang Y, 2009. A novel interest-point-matching algorithm for high-resolution satellite images. IEEE Trans. Geosci. Remote Sens. 47 (12), 4189–4200. 10.1109/TGRS.2009.2023794.
  38. Yang M, Kpalma K, Ronsin J, 2008. A Survey of Shape Feature Extraction Techniques. In-Tech.
  39. Ye Y, Shan J, 2014. A local descriptor based registration method for multispectral remote sensing images with non-linear intensity differences. ISPRS J. Photogramm. Remote Sens. 90, 83–95. 10.1016/j.isprsjprs.2014.01.009.
  40. Zhang D, Lu G, 2002. A comparative study of Fourier descriptors for shape representation and retrieval. In: Proc. of 5th Asian Conference on Computer Vision (ACCV). pp. 646–651.
  41. Zitova B, Flusser J, 2003. Image registration methods: a survey. Image Vis. Comput. 21 (11), 977–1000.
