Abstract
The superpixel-based graph convolutional network with local and global information (SGCN-LG) is introduced in this paper for polarimetric synthetic aperture radar (PolSAR) image classification. The number of superpixels (SPs) is automatically determined by analyzing the second order differences of the standard deviations of the pixels within square patches of various sizes. The local graph is constructed from neighboring SPs, where the number of neighbors for each SP is automatically determined by considering multiple hyper windows around all composing pixels of that SP. Moreover, the nearest neighboring SPs from all classes are chosen from the entire scene to construct the global graph, which contains the discrimination information and the relationships among labeled SPs. The local and global features are fused to produce the classification map. According to the experimental results, the proposed SGCN-LG model outperforms several powerful PolSAR classification models.
Keywords: Graph convolutional network, Deep learning, Polarimetric SAR, Superpixel, Classification
Subject terms: Electrical and electronic engineering, Computational science
Introduction
Polarimetric synthetic aperture radar (PolSAR) images, with their all-time imaging capability and the scattering information they provide through the transmission and reception of electromagnetic waves with various polarizations in different directions1, are among the best image sources for remote sensing applications such as classification2-3. In the past decades, most studies have focused on the physical scattering mechanism, where various target decomposition methods such as Cloude–Pottier, Freeman, Pauli and Krogager4-5 have been used for scattering feature extraction. The extracted features can be classified by an appropriate classifier such as a Bayesian classifier6 or support vector machine (SVM)7. Moreover, some previous methods have focused on the PolSAR statistical distribution, where the Wishart distribution was introduced for PolSAR image classification8. However, due to the complex nature of the imaged scene, the nonlinear relationships in the input image, and the complexity of textural structures, more efficient polarimetric and spatial features are required for accurate PolSAR image classification9–11.
Sparse representation with dictionary learning combined with a nonlinear transformation is proposed in the nonlinear projection dictionary pair learning (NDPL) method for PolSAR image classification12. The composite kernel-hybrid discrimination random field (CK-HDRF) method utilizes the advantages of composite kernels in handling nonlinearity and high dimensionality, besides the discriminative random field for modelling the posterior distribution, to analyze complex texture13.
Recently, deep learning-based models have shown superior performance in various image processing applications such as PolSAR image classification14–16. In particular, the convolutional neural network (CNN) has achieved high success in PolSAR image analysis17-18. A PolSAR image has a 3D nature, with spatial information in the first two dimensions and polarimetric information in the third dimension. Therefore, a three-dimensional CNN capturing 3D patches as input can simultaneously extract scattering and spatial features19. The residual convolutional neural network with autoencoder based attention (RCNN-AA) is introduced in20. It benefits from the convolutional autoencoder (CAE) to attend to fine features in the PolSAR image. The scaled difference between the original input patch and its approximation obtained by the CAE is considered as the attention weight containing information about the fine spatial features. The attention feature maps, beside the original ones, are fed into the residual CNN. The discriminative features based high confidence classification (DFC) introduced in2 uses several approaches to improve PolSAR image classification. It selects pre-determined convolutional kernels from the important regions of the image without requiring learning, so it does not need a high volume of training samples. Through a multi-view analysis, diverse classification maps with different information are generated. Moreover, a feature space with reduced dimensionality, minimum overlap, and maximum class separability is provided by a two-step discriminant analysis method. Finally, the classification map is generated by high confidence decision fusion.
Convolutional architectures extract spatial features from neighborhood regions. However, they cannot effectively extract global information from the whole image. To solve this issue, transformers and attention-based modules have been introduced, which globally extract long-range dependencies and interactions21-22. Transformers were initially introduced for natural language processing. Thereafter, the vision transformer (ViT) was introduced to capture global information in the image domain by using the self-attention mechanism23. A ViT based PolSAR classifier is introduced in24. Due to the presence of objects with different sizes and shapes in natural scenes, there are heterogeneous regions in PolSAR images with various contextual information, which can be represented at different levels and scales. To explore this rich source of information, the multi-scale and multi-level attention learning (MMAL) network is introduced in25. It utilizes the cross-attention mechanism to explore the relationships between low-level and high-level features and between medium-level and high-level features at multiple scales.
Using a high number of polarimetric features as the input of deep learning models such as CNNs can provide various scattering information. However, a high dimensional feature cube is not very effective, especially when only a small training set is available. To solve this issue, an attention based polarimetric feature selection (AFS) convolutional network, called AFS-CNN, is proposed in26, which performs feature selection and classification in an end-to-end framework. In addition to the CNN and its variants, other forms of deep learning models such as recurrent neural networks have been tried for PolSAR image analysis. For example, in27, the neighborhood regions are converted to spatial sequences. Then, multi-scale spatial features are explored by applying an attention-based multi-scale spatial enhanced long short-term memory (AMSE-LSTM) network. To extract the pixel-based scattering relationships in a PolSAR image, a graph-based complex-valued 3DCNN is used besides a random field with high order cliques in the deep features based high order triple discrimination random field (DF-HoTDF) model28.
Segmentation based PolSAR analysis can improve classification performance. Considering superpixels (SPs) instead of disjoint pixels not only reduces noise and explores contextual information but also reduces computation. The simple linear iterative clustering (SLIC) method is a simple and efficient segmentation algorithm29. More advanced SP generation methods have been introduced for PolSAR image segmentation to preserve details in heterogeneous regions and produce smooth representations in homogeneous regions. An improved version of SLIC is proposed for PolSAR images in30, which adapts to the polarimetric characteristics and statistical measures of PolSAR. Moreover, it uses polarimetric feature similarities as statistical distances in its clustering function. The revised Wishart distance is integrated with the geodesic distance for PolSAR clustering through a cross-iteration strategy in31. In32, a fuzzy SP algorithm is introduced, which uses the correlation between scattering information to cluster pixels. A hierarchical energy driven method is introduced in33 for PolSAR image segmentation. At the coarse level, it uses the histogram intersections of the coherency matrix for SP generation, and at the fine level, it uses the Wishart energy for SP evaluation. In34, the relationship between the initial SP size and the structural complexity of PolSAR is established beside the determinant ratio test, which leads to a reliable SP generation method with adaptive size estimation.
A composite kernel-based elastic net classifier based on SPs is introduced for PolSAR image classification in35. First, three types of features are extracted using SP segmentation at different scales. Then, these features are mapped by constructing a composite kernel exploiting the correlation and diversity between the different features. Finally, the elastic net classifier is integrated with the composite kernel for PolSAR image classification using limited training samples. Advanced methods have recently been suggested for SAR image segmentation. For example, in36, both the SP generation and merging steps are incorporated into a unified deep network. First, a differentiable SP generation method is employed for oversegmentation of the single-polarization SAR image. Its output is the likelihood of pixels belonging to different SPs. Then, in the merging part, the soft SP set is converted into a self-connected weighted graph. As an advantage, the shapes of SPs are iteratively adjusted according to the boundaries during training. SPs are introduced into hypothesis test theory in37 for PolSAR change detection and built-up area extraction. To this end, the PolSAR image is first oversegmented into a set of SPs, and the probability density function of a SP's reflectivity is derived. Then, a superpixelwise likelihood-ratio test statistic is presented to measure the similarity of the covariance matrices of two superpixels for unsupervised change detection.
To deal with the small sample size problem in CNNs, a dual branch CNN is introduced in38, which uses a SP algorithm to expand the number of labeled samples. The first branch of the CNN extracts polarization features and the second branch extracts spatial features. Moreover, an ensemble learning algorithm is used with the dual branch CNN to improve the classification results. Due to high level feature extraction, deep learning methods may cause edge confusion. To handle this issue, in39, a double channel CNN with an edge preserving Markov random field is proposed. One subnetwork uses the Wishart based complex matrix to learn the statistical characteristics, and another subnetwork learns high level semantic features. Although the Vision Transformer (ViT) has shown great performance for PolSAR image classification, it requires a large number of labeled samples for training and encounters semantic misalignment due to fixed patch tokenization. To address these issues, a SP content-aware and semi-supervised ViT network is suggested in40. To generate the token sequences, SPs with random sizes are divided into blocks and masked randomly. To implement a semi-supervised ViT, both supervised and unsupervised learning are integrated.
Graph neural networks have been used in a limited number of works for PolSAR image classification. In41, a graph convolutional network is used for neural architecture search. To this end, a graph is constructed whose nodes are the pixels of the PolSAR image. It introduces a search space whose components come from several graph neural networks. To deal with the small sample size situation in PolSAR image classification, a graph-based semisupervised deep learning method is proposed in42. The PolSAR image is modeled as an undirected graph in which labeled and unlabeled pixels are the nodes, and the weighted edges show the similarities between pixels. A CNN model is used for polarimetric feature extraction and outputs the class labels to the graph model.
In CNNs and many other deep learning models, the input of the model consists of fixed size patches where the label is assigned to the central pixel. Considering the relationships among adjacent pixels in neighborhood regions and taking SPs as the input of the model may improve the classification map by involving local information and reducing noisy pixels. On the other hand, graph-based networks, by aggregating node features, can provide an improved feature representation. The graph convolutional network (GCN)43-44, by utilizing nonlocal features and modelling data structures, enhances the feature representation. To benefit from the advantages of both SP based and graph based analysis, several works have integrated these approaches. A SP-wise segmentation network for single-polarization SAR images is introduced in45. It first uses a differentiable boundary-aware clustering method to estimate task-specific SPs using a simple fully convolutional network. Then, a soft graph convolution network takes the association map and produces the SP-wise segmentation. As an advantage, both the SP generation and graph convolution parts are trained under a unified framework, and the shapes of SPs are adjusted according to the segmentation results, adhering to the boundaries.
The feature enhanced SP hypergraph neural network (FESHNN) is introduced in46 for PolSAR image classification, which benefits from the advantages of SP-based graph models for the extraction of polarimetric and spatial correlations. Its feature discrimination is enhanced by refining the local features contained in pixels and SPs. However, this method ignores the global information contained in the class labels across the scene. The efficiency of the SP-based GCN method depends substantially on the SP segmentation result, which is affected by speckle noise and scattering confusion. To deal with this difficulty, a hybrid weighted fuzzy SP-based GCN method is introduced in47, which corrects the edge pixels by defining a fuzzy projection matrix. The features are transformed from the SP level to the pixel level, where the features of edge pixels are computed from all neighboring SPs to refine the edges toward the most similar region. Both the multifeature distances and the revised Wishart distance are used to define the hybrid weighted adjacency matrix. This method disregards the local individual features of pixels and captures the global contextual information. To combine local and global features, the graph network is integrated with a 3DCNN into a unified framework.
As mentioned, most existing PolSAR classification methods take fixed patches as input and utilize CNNs for feature extraction and classification, where only local information is exploited. The polarimetric information in PolSAR images is complex, and using only local information may not be sufficient to provide an accurate classification map. In contrast to pixel-based methods, approaches such as graph neural networks, which take irregular SPs as input and update features through the graph structure based on the information of adjacent nodes, learn global information beside the local one, and so improve PolSAR image classification. However, the scale of SPs affects the classification results due to the existence of objects with various shapes and sizes. To handle this issue, a multiscale SP guided weighted graph convolutional network is proposed in48. First, it segments the PolSAR image into SPs at three different scales. Then, the correlation among SPs is used to form the adjacency matrix, and the weighted graph convolutional network is utilized to provide the SP feature representation. Finally, a multiscale feature cascade fusion module is introduced to provide the pixel level features.
For several reasons, the SP-based analysis of a PolSAR image can be preferred to pixel-based analysis: (1) due to the noisy nature of SAR images, the SP representation of a PolSAR image is more appropriate than its pixel representation because SPs suppress noisy pixels; (2) the SP representation implicitly explores the spatial information of the PolSAR image; and (3) the use of SPs instead of pixels reduces the computational burden. Usually, the appropriate number of SPs is set manually. In49, a formula is presented for computing the SP segmentation scale for a hyperspectral image, where inherent properties of hyperspectral images such as spatial size, texture ratio, spatial resolution, and the number of categories are taken into account to compute the number of SPs. However, selecting an appropriate number of SPs in a PolSAR image requires trial and error, which is a troublesome task. On the other hand, limited works have studied the ability of graph convolutional networks for PolSAR image classification. Because of the ability of graphs to explore the hidden relationships among the defined nodes, they are great tools for feature extraction in complex feature spaces. Moreover, by providing a holistic view of all nodes, graphs can explore global information.
Although different works have shown great success in providing accurate classification maps by utilizing the advantages of SPs and graphs, they mostly utilize the SP based graph structure only for global feature extraction. Some works have integrated graph networks with CNNs to combine global and local features. However, exploring the irregular local structure is not possible with a convolutional network. To address this issue, a SP based network is designed in this work, which utilizes graph networks for exploring both local and global features. A local graph is composed from neighboring SPs, and a global graph is composed from the nearest SPs of the different classes across the PolSAR image. The local and global graphs are individually analyzed using graph convolutional networks, and their extracted features are then fused and used for PolSAR image classification. Moreover, most SP based networks suffer from the difficulty of determining the number of SPs through trial and error. This issue is also addressed in this work by introducing an automatic method for determining the number of SPs. In addition, most graph-based networks have a high number of learnable parameters and are highly complicated. A simple dual graph convolutional network with a low number of learnable parameters is proposed in this work, which is simple to implement and runs fast in the prediction (test) phase.
To improve PolSAR image classification, a SP-based graph convolutional network (SGCN) is introduced here, which consists of two branches. While the first branch extracts local spatial features through a local graph constructed from unlabeled neighboring SPs, the second branch extracts global information through a global graph constructed from the nearest labeled SPs of all classes. The main contributions of this work are as follows:
A graph convolutional network with two branches containing the local and global information is constructed for PolSAR image classification.
The local graph provides the local neighborhood information as well as the structure of the unlabeled samples.
The global graph contains the global class information and the structure of the labeled samples.
The number of SPs is determined automatically by computing the standard deviation vector and its second-order difference vector.
The number of neighboring SPs is automatically determined by considering multiple local windows around all composing pixels of each SP.
The SGCN model is assessed through an ablation study where only the local branch is used (SGCN-L), only the global branch is used (SGCN-G), and both the local and global branches are fused together (SGCN-LG). The experimental results show the superior performance of the proposed SGCN-LG model on different PolSAR images. Comparison with several state-of-the-art methods shows that SGCN-LG outperforms its competitors while using a lower number of training samples.
Proposed SGCN-LG model
In this work, the superpixel based graph convolutional network (SGCN) with local and global feature fusion, called SGCN-LG, is proposed for PolSAR image classification. The proposed SGCN model consists of two branches, a local graph and a global graph, where each graph is composed of superpixels (SPs) as the graph nodes. The local and global features, containing the data structures of the spatial neighbors and the class neighbors, respectively, are eventually fused to classify the input SP. The flowchart of the SGCN-LG framework and the proposed network are shown in Figs. 1 and 2, respectively.
Fig. 1.
Flowchart of the proposed SGCN-LG framework.
Fig. 2.
The proposed network in the SGCN-LG model.
As mentioned, SGCN-LG consists of a local part (SGCN-L) and a global part (SGCN-G), which are explained in more detail in the following. Before that, the generation of SPs with automatic determination of their number is described.
Superpixel generation with automatic determination of the number of superpixels
The proposed graph model uses the SPs of the PolSAR image as nodes. The SLIC algorithm is used here for the generation of SPs because it is a well-known method with a simple implementation. The SLIC algorithm is applied to the first principal component (PC1) obtained by the principal component analysis (PCA) transform50, which is normalized by:
$$\overline{\mathrm{PC1}} = \frac{\mathrm{PC1}-\min(\mathrm{PC1})}{\max(\mathrm{PC1})-\min(\mathrm{PC1})} \tag{1}$$

where min(·) and max(·) compute the minimum and maximum values among all pixels of the PC1 image. The SLIC algorithm is applied to PC1, which contains the polarimetric components with the most energy; hence, it takes the polarimetric information of the PolSAR image into account.

In this algorithm, the number of SPs, denoted as $S$, is given as an input, which is a user-defined parameter. Although an appropriate value of $S$ can be determined for each dataset by experiments, a simple method is proposed in this work for the automatic determination of the number of SPs, which provides appropriate results for various PolSAR images.
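As a concrete illustration of the input prepared for SLIC, the following NumPy sketch computes PC1 of a real-valued feature cube and min-max normalizes it. The function name and the synthetic cube are illustrative, not from the paper.

```python
import numpy as np

def pc1_normalized(img):
    """Compute the first principal component of a (H, W, C) feature cube
    and min-max normalize it as in Eq. (1); SLIC would then be applied
    to the returned image.  A sketch with illustrative names."""
    h, w, c = img.shape
    flat = img.reshape(-1, c).astype(float)
    flat -= flat.mean(axis=0)                  # center the features
    cov = np.cov(flat, rowvar=False)           # channel covariance
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    pc1 = (flat @ eigvecs[:, -1]).reshape(h, w)  # largest-eigenvalue component
    return (pc1 - pc1.min()) / (pc1.max() - pc1.min())  # Eq. (1)

rng = np.random.default_rng(0)
cube = rng.normal(size=(32, 32, 9))            # synthetic 9-channel image
pc1n = pc1_normalized(cube)
```

The normalized output lies in [0, 1] by construction, which keeps the SLIC distance term well scaled.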
The number of SPs in an image can be approximated by the number of square patches that cover it. With $S$ patches of size $w \times w$, each patch contains $w^2$ pixels. Let $N$ be the total number of pixels in the image, so that $S = \mathrm{round}(N/w^2)$, where round(·) rounds the input argument. To find the appropriate patch size $w$, 25 odd numbers are considered as the patch size candidates in a vector $s = (3\!:\!2\!:\!51)^T$, where $3\!:\!2\!:\!51$ denotes the range from 3 to 51 with step 2, and $(\cdot)^T$ is the transpose operation. For each candidate patch size $s_j$, the PolSAR image in each polarimetric channel is divided into $M_j$ patches of size $s_j \times s_j$, where $C$ denotes the number of polarimetric channels. The standard deviation (std) of the pixels in each patch is computed, and the average std over all patches in all polarimetric channels is computed for each patch size as follows:

$$\sigma_j = \frac{1}{C M_j} \sum_{c=1}^{C} \sum_{m=1}^{M_j} \mathrm{std}\big(P^{j}_{m,c}\big) \tag{2}$$

where $P^{j}_{m,c}$ is the $m$th patch generated in the $c$th channel for patch size $s_j$, and std(·) computes the standard deviation of the pixels within the input patch. Computing the std value for all assumed patch sizes yields the vector $\boldsymbol{\sigma} = (\sigma_1, \ldots, \sigma_{25})^T$. The difference operation is then applied to $\boldsymbol{\sigma}$. For a vector $v$ with length $n$, the difference operation calculates the differences between adjacent elements of $v$ as follows:

$$\mathrm{diff}(v) = \left(v_2 - v_1,\; v_3 - v_2,\; \ldots,\; v_n - v_{n-1}\right) \tag{3}$$

The first order differences of $\boldsymbol{\sigma}$ are computed as:

$$d_1 = \mathrm{diff}(\boldsymbol{\sigma}) \tag{4}$$

and applying the difference operation to $d_1$ gives the second order differences:

$$d_2 = \mathrm{diff}(d_1) \tag{5}$$

The output dimension of the difference operation equals the dimension of the input vector minus one. So, because the dimension of $\boldsymbol{\sigma}$ is 25, the dimensions of $d_1$ and $d_2$ are 24 and 23, respectively. The vector $\boldsymbol{\sigma}$ and its first and second order differences are plotted versus the patch size for the Sanfrancisco image in Fig. 3. Because the length of $d_2$ is two units less than the length of $\boldsymbol{\sigma}$, the x-axis for the plot of $\boldsymbol{\sigma}$ in Fig. 3(a) starts from 3, while the x-axes for $d_1$ and $d_2$ start from 5 and 7, respectively, in Fig. 3(b) and Fig. 3(c).
Fig. 3.
The (a) $\boldsymbol{\sigma}$ values, (b) the first order differences, and (c) the second order differences versus the patch size.
As seen, with increasing patch size, the std value generally increases, $d_1$ decreases, and the associated $d_2$ takes negative values. This is expected because, with increasing patch size, more pixels differing from the central pixel may be located in the patch, which increases the variance. Moreover, with increasing patch size, the changes of the std values, which correspond to the differences $d_1$, decrease up to the point where the patch still contains related and similar pixels. In other words, once the patch size grows beyond this point, the variations of the changes start to increase, which shows that the patch may contain pixels unrelated to the central one. Therefore, the first place where $d_1$ starts to increase, or equivalently where $d_2$ becomes positive, can indicate the appropriate patch size:

$$q = \mathrm{find}(d_2 > 0) \tag{6}$$

where the vector $q$ holds the indices associated with the positive values of $d_2$. The first index of $q$, i.e., $q(1)$, corresponds to the first place where $d_2$ is positive. Because $d_2$ is two elements shorter than $\boldsymbol{\sigma}$, the appropriate patch size corresponds to index $q(1)+2$ in the patch size vector $s$, i.e., $s(q(1)+2)$. However, considering the first index, $q(1)$, leads to the selection of a relatively small patch size, and so a high number of SPs, which not only increases the graph computations but also does not provide highly accurate results. Instead of $q(1)$, the use of $q(3)$ is therefore suggested for determining the appropriate patch size. In other words, instead of the first place where $d_2$ becomes positive, the third place where $d_2$ becomes positive, i.e., $q(3)$, is considered, which leads to the selection of a larger patch size, and so, a lower number of SPs. For example, in Fig. 3, the third place where $d_2$ takes a positive value in Fig. 3(c), associated with the third place where $d_1$ starts to increase in Fig. 3(b), corresponds to the selected patch size in Fig. 3(a).
The number of pixels in each SP is approximated by $s(q(3)+2)^2$, where $s(q(3)+2)$ is the appropriate patch size selected from the patch size vector $s$, the indices $q$ are determined by (6), and $q(3)$ is the third element of the vector $q$. The number of SPs, $S = \mathrm{round}\big(N / s(q(3)+2)^2\big)$, is finally obtained and used as the input of the SLIC segmentation algorithm.
To process an image with homogeneous regions, the image should be partitioned into a smaller number of SPs, where each SP covers a relatively large homogeneous region with a high number of similar pixels. In contrast, to process an image with heterogeneous regions, the image should be partitioned into a larger number of SPs with small areas, where each SP covers a small region containing a low number of similar pixels.
In images with homogeneous regions, the adjacent pixels within a patch have a smaller variance. So, $d_1$ starts to increase later, and therefore, $d_2$ becomes positive later. In other words, a larger patch size is selected from the patch size vector $s$, which is associated with a larger number of pixels in each SP and a lower number of SPs $S$. In contrast, in images with heterogeneous regions, there are high variations in the image. So, $d_1$ starts to increase earlier, and therefore, $d_2$ becomes positive earlier. Thus, a smaller patch size is selected, which is associated with fewer pixels per SP and a larger $S$.
According to the above method, the numbers of SPs are obtained for the three datasets, which will be introduced in Sect. 3: the Flevoland, Sanfrancisco, and Oberpfaffenhofen images. The generated SP maps for the three datasets are shown in Fig. 4.
Fig. 4.
SP maps of Flevoland, Sanfrancisco and Oberpfaffenhofen images.
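The selection procedure of Eqs. (2)-(6) can be sketched in NumPy as follows. This is a simplified reading under two stated assumptions: patches are taken non-overlapping, and the guard used when fewer than three positive entries of $d_2$ exist is ours, not from the paper.

```python
import numpy as np

def auto_superpixel_count(img, sizes=range(3, 52, 2)):
    """Sketch of the automatic SP-number selection: average patch std per
    candidate size (Eq. 2), first/second differences (Eqs. 4-5), and the
    third positive entry of d2 picks the patch size (Eq. 6)."""
    h, w, c = img.shape
    sizes = list(sizes)
    sigma = []
    for p in sizes:
        stds = []
        for ch in range(c):
            for i in range(0, h - p + 1, p):       # non-overlapping patches
                for j in range(0, w - p + 1, p):
                    stds.append(img[i:i+p, j:j+p, ch].std())
        sigma.append(np.mean(stds))
    d1 = np.diff(sigma)                            # Eq. (4)
    d2 = np.diff(d1)                               # Eq. (5)
    q = np.flatnonzero(d2 > 0)                     # Eq. (6)
    if len(q) == 0:                                # guard: our own fallback
        idx = len(d2) - 1
    elif len(q) >= 3:
        idx = q[2]                                 # third positive place, q(3)
    else:
        idx = q[-1]
    best = sizes[idx + 2]                          # +2: d2 is two diffs shorter
    return max(1, int(round(h * w / best ** 2)))   # S = round(N / w*^2)

rng = np.random.default_rng(1)
S = auto_superpixel_count(rng.normal(size=(120, 120, 1)))
```

The returned $S$ would then be passed to SLIC as its target number of segments.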
Graph construction
An undirected graph $G = (V, E)$ is constructed, with the vertex set $V$ containing $n$ nodes and the edge set $E$, where $e_{ij}$ represents the edge between nodes $v_i$ and $v_j$, i.e., the edge $(v_i, v_j)$. The adjacency matrix $A \in \mathbb{R}^{n \times n}$ is composed from the edge set $E$ as follows:

$$A_{ij} = \begin{cases} \exp\!\left(-\dfrac{\lVert x_i - x_j \rVert_2}{\max\limits_{(v_p, v_q) \in E} \lVert x_p - x_q \rVert_2}\right), & (v_i, v_j) \in E \\ 0, & \text{otherwise} \end{cases} \tag{7}$$

where max(·), as the maximum operator, finds the maximum value among the input elements, and $x_i \in \mathbb{R}^{F}$ is the feature vector associated with node $v_i$, with $F$ being the dimensionality of the feature vector. The diagonal degree matrix $D$ is constructed as $D_{ii} = \sum_{j} A_{ij}$. With $I_n$ as the identity matrix of dimensions $n \times n$, $\tilde{A} = A + I_n$ is the adjacency matrix with added self-connections and $\tilde{D}$ is the degree matrix of $\tilde{A}$. The normalized adjacency matrix is $\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$. Considering the architecture introduced for the graph convolution operation51-52, we have:
$$H^{(l+1)} = \varphi\!\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)}\right) \tag{8}$$

where $\varphi$ is the activation function, which is the rectified linear unit (ReLU) here. $H^{(l)}$ denotes the feature matrix obtained after $l$ layers, and $H^{(0)} = X$, where $X$ is the input feature matrix. $W^{(l)}$ is the trainable weight matrix for multiplication in layer $l$. To implement the trainable weight matrix multiplication, an elementwise product followed by a 2D convolutional operator with one filter is applied in this work (see Fig. 2).
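A single propagation step of Eq. (8) can be sketched in NumPy as below. Note this uses a plain dense weight matrix for $W^{(l)}$ rather than the paper's elementwise-product-plus-convolution implementation, and the toy graph and weights are illustrative.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution layer following Eq. (8):
    ReLU(D~^{-1/2} (A + I) D~^{-1/2} H W).  Sketch with a dense W."""
    n = A.shape[0]
    A_hat = A + np.eye(n)                      # add self-connections
    d = A_hat.sum(axis=1)                      # degrees of A~
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt   # normalized adjacency
    return np.maximum(0.0, A_norm @ H @ W)     # ReLU activation

A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])                   # 3-node path graph
H0 = np.eye(3)                                 # one-hot node features
W0 = np.full((3, 2), 0.5)                      # toy trainable weights
H1 = gcn_layer(A, H0, W0)                      # shape (3, 2)
```

Stacking two such calls gives the two-layer propagation used by typical GCN classifiers.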
SGCN-L
In the SGCN-L model, a graph is constructed on a given SP and its
spatial neighbors. After segmentation of the PolSAR image in previous section, a local graph is constructed from each SP and its adjacent SPs. Assume that a superpixel
contains
pixels. Around each pixel of the SP, a
local window is considered that
![]() |
9 |
where the appropriate size of
can be obtained through experiments (it is discussed in Sect. 3.2). The value
is approximation of the number of pixels in a SP. For considering a super window containing
SPs, which consists of about
pixels, the length of the square window will be about the square root of
. For each pixel of the SP, a
window is constituted and label (number) of the SP that each pixel of this window belongs to it is saved. So, for superpixel
that has
pixels, the central pixel 1 and its neighbors in its
local window belongs to
SPs. Similarly, the neighbors of pixel 2 in its
neighborhood window belongs to
SPs, and eventually, the neighbors of pixel
in its
neighborhood window belongs to
SPs. Because the adjacent pixels may belong to the same SP, the unique SPs is countered. So,
are the number of unique SPs in pixels 1, 2,…,
, respectively. The minimum number among
is obtained by:
![]() |
10 |
So,
neighbors is determined for superpixel
. This process is repeated for all
SPs in the image and
neighbors is selected for
th SP. Minimum of the obtained numbers is finally considered as
:
![]() |
11 |
Eventually, for each SP,
SPs that are located in local window of its composing pixels are selected as
local neighboring SPs. After here to next, for simplicity in notations,
is written instead of
. According to what explained, the number of neighboring SPs for three datasets are obtained as follows:
in Flevoland,
in Sanfrancisco, and
in Oberpfaffenhofen image.
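The counting of Eqs. (10)-(11) can be sketched directly on a superpixel label map. Two stated assumptions: the window is truncated at image borders, and the unique count includes the SP the pixel itself belongs to.

```python
import numpy as np

def neighbor_count(labels, window):
    """Sketch of Eqs. (10)-(11): for every superpixel, count the unique
    SP labels seen in an l-by-l window around each of its pixels, take the
    per-SP minimum (Eq. 10), then the global minimum over SPs (Eq. 11)."""
    h, w = labels.shape
    r = window // 2
    per_sp_min = {}
    for i in range(h):
        for j in range(w):
            sp = labels[i, j]
            patch = labels[max(0, i-r):i+r+1, max(0, j-r):j+r+1]
            n_unique = len(np.unique(patch))   # unique SPs in this window
            per_sp_min[sp] = min(per_sp_min.get(sp, n_unique), n_unique)
    return min(per_sp_min.values())            # common neighborhood size k

# toy 4x4 label map with four 2x2 superpixels
labels = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1],
                   [2, 2, 3, 3],
                   [2, 2, 3, 3]])
k = neighbor_count(labels, 3)
```

On this toy map, each SP has a corner pixel whose 3-by-3 window sees only that SP, so the global minimum is 1.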
Now, for each SP, a local graph with $k$ nodes is made, where the nodes are the adjacent SPs that neighbor the given SP. The feature matrix for each SP is:

$$X_L = \left[\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k\right]^T \in \mathbb{R}^{k \times C} \tag{12}$$

where $\bar{x}_j$ is the mean of the pixels belonging to superpixel $SP_j$, and $C$ is the number of polarimetric channels. For each pixel of the PolSAR image, 9 elements of the coherency matrix $T$ are used as the feature vector as follows:

$$x = \left[T_{11}, T_{22}, T_{33}, \mathrm{Re}(T_{12}), \mathrm{Im}(T_{12}), \mathrm{Re}(T_{13}), \mathrm{Im}(T_{13}), \mathrm{Re}(T_{23}), \mathrm{Im}(T_{23})\right]^T \tag{13}$$

So, we have $C = 9$. Although many target decomposition methods can explore polarimetric scattering features, they are not used here for two reasons: (1) for simplicity and to avoid extra computations; and (2) because the PC1 of the features is used for the generation of SPs, the polarimetric features of the coherency matrix may be sufficient.
The local adjacency matrix is denoted by $A_L$. For each SP of the PolSAR image, the local feature matrix $X_L$ and the local adjacency matrix $A_L$ are used to compose the local graph convolutional model as described in the "Graph construction" section. The local graph has two inputs: $X_L$ and $A_L$. Because the dimensionality of the inputs must be fixed in the network, a fixed neighborhood size $k$ has to be considered for all SPs.
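Building the 9-element real feature vector of Eq. (13) from a Hermitian coherency matrix can be sketched as below; the ordering of the real/imaginary off-diagonal terms is one common convention and an assumption here.

```python
import numpy as np

def coherency_features(T):
    """Real 9-element feature vector from a 3x3 complex coherency matrix T,
    following Eq. (13): diagonal powers plus Re/Im of the upper triangle."""
    return np.array([T[0, 0].real, T[1, 1].real, T[2, 2].real,
                     T[0, 1].real, T[0, 1].imag,
                     T[0, 2].real, T[0, 2].imag,
                     T[1, 2].real, T[1, 2].imag])

# a Hermitian toy coherency matrix
T = np.array([[2.0 + 0.0j, 0.5 + 0.1j, 0.2 - 0.3j],
              [0.5 - 0.1j, 1.0 + 0.0j, 0.1 + 0.2j],
              [0.2 + 0.3j, 0.1 - 0.2j, 0.5 + 0.0j]])
x = coherency_features(T)                      # length C = 9
```

Averaging these vectors over the pixels of a superpixel would give the $\bar{x}_j$ rows of $X_L$ in Eq. (12).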
SGCN-G
Assume that there are labeled samples from
classes of dataset. For each labeled pixel, the SP that belongs to it is considered as the labeled SP (training sample) where label of the given pixel is assigned to the SP. For each SP, the nearest SP from each given class is selected.
nearest neighbors from
classes are used as the nodes to form the global graph for the given SP. The mean of pixels in each SP is used as the representative feature vector of that SP and the Euclidean distance is considered for computing the nearest SPs.
Because the training samples are globally located in entire the scene, the composed graph contains the global information with features from labeled samples of all classes. For each SP of the PolSAR image, the global feature matrix
and the global adjacency matrix
are used to compose the global graph convolutional model according to what described in the “2.2. graph construction” section.
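The node selection for the global graph can be sketched as follows; the function and variable names are illustrative, and the toy 1-D features stand in for the SP mean coherency vectors.

```python
import numpy as np

def global_neighbors(feat, labeled_feats, labeled_classes):
    """Sketch of the global-graph node selection: for a given SP feature
    vector, pick the Euclidean-nearest labeled SP from each class."""
    chosen = []
    for cls in sorted(set(labeled_classes)):
        idx = [i for i, c in enumerate(labeled_classes) if c == cls]
        dists = [np.linalg.norm(feat - labeled_feats[i]) for i in idx]
        chosen.append(idx[int(np.argmin(dists))])  # nearest SP of this class
    return chosen                                  # L nodes, one per class

feats = np.array([[0.0], [1.0], [10.0], [11.0]])   # toy labeled SP features
classes = [0, 0, 1, 1]
nodes = global_neighbors(np.array([0.2]), feats, classes)
```

The selected indices form the $L$ nodes of the global graph; their feature vectors stack into $X_G$.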
Feature fusion and classification
Outputs of the two branches of the model are individually flattened and then fused through a concatenation layer. The aim of this work is to provide a simple, light network with a relatively low number of parameters that yields efficient classification results even in small-sample-size situations, so the local and global branches are fused simply by concatenating the extracted features. Finally, a fully connected (FC) layer with $C$ neurons, a softmax layer, and a classification layer are used to find the label of the input SP as the model's output. For each SP, the local and global feature matrices and adjacency matrices are given as input, and the label of the given SP is obtained as output. The proposed model is thus a SP-based classification: the label of each SP is assigned to all pixels that compose it.
The classification map is filtered by the guided filter, with the first principal component as the guidance image, to provide a classification map aligned with the real class boundaries. The guided filter has two free parameters, which are set in the experiments: $r$, which determines the length of the filtering window, and $\epsilon$, the regularization parameter. For more details about the guided filter, the interested reader is referred to53.
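The fusion head can be sketched in a few lines; the weight matrix `W` and bias `b` stand in for the learned FC parameters and are assumptions of this sketch.

```python
import numpy as np

def fuse_and_classify(local_feats, global_feats, W, b):
    """Concatenate the flattened local- and global-branch outputs and
    apply one fully connected layer followed by softmax (C neurons,
    one per class)."""
    z = np.concatenate([local_feats.ravel(), global_feats.ravel()])
    logits = W @ z + b
    e = np.exp(logits - logits.max())   # numerically stable softmax
    probs = e / e.sum()
    return int(np.argmax(probs)), probs
```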
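The single-channel guided filter itself is straightforward to implement; the pure-NumPy sketch below follows the standard formulation of He et al.53. In the paper the guidance image is the first principal component and the filter is applied to the classification map; the per-dataset values of $r$ and $\epsilon$ are set experimentally and are not reproduced here.

```python
import numpy as np

def _box(x, r):
    """Mean over a (2r+1)x(2r+1) window (edge padding), via running sums."""
    k = 2 * r + 1
    p = np.pad(x, r, mode="edge")
    c = np.cumsum(p, axis=0)
    p = np.vstack([c[k - 1:k], c[k:] - c[:-k]])
    c = np.cumsum(p, axis=1)
    p = np.hstack([c[:, k - 1:k], c[:, k:] - c[:, :-k]])
    return p / (k * k)

def guided_filter(guide, src, r, eps):
    """Edge-preserving smoothing of `src`, steered by `guide` (2-D floats)."""
    mean_I, mean_p = _box(guide, r), _box(src, r)
    var_I = _box(guide * guide, r) - mean_I ** 2
    cov_Ip = _box(guide * src, r) - mean_I * mean_p
    a = cov_Ip / (var_I + eps)      # eps regularises flat regions
    b = mean_p - a * mean_I
    return _box(a, r) * guide + _box(b, r)
```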
Experiments
Datasets and parameter settings
Three real L-band PolSAR images are used for the experiments. The first dataset is Flevoland, acquired by AIRSAR, which contains 15 classes and 750 × 1024 pixels. The second image, also acquired by AIRSAR, is the Sanfrancisco Bay image with 5 classes and 900 × 1024 pixels. The third dataset, acquired by the electronically steered array radar (ESAR), is Oberpfaffenhofen with 4 classes and 1297 × 935 pixels.
In the Flevoland image, 100 training samples (labeled pixels) per class are used, and in the two other datasets, 500 training samples per class. For the guided filter applied at the end of the proposed method, the window size $r$ and regularization parameter $\epsilon$ are set separately for the Flevoland, Sanfrancisco, and Oberpfaffenhofen datasets. For pixel-based classification methods, smaller window sizes in the guided filter are recommended; but because the proposed method is a superpixel-based classification, the obtained classification maps contain larger homogeneous regions, and so larger guided filters are applied to them. The Adam optimizer with an initial learning rate of 0.001, a batch size of 50, and 200 epochs is used for training the proposed methods.
The proposed framework is assessed in three cases: when only the local graph is used, i.e., SGCN-L; when only the global graph is used, i.e., SGCN-G; and when the features of both local and global graphs are fused, i.e., SGCN-LG. These proposed models are compared with SVM, a two-dimensional CNN (2DCNN), a three-dimensional CNN (3DCNN), and some state-of-the-art PolSAR classification methods. SVM is assessed in two different cases, where pixels or SPs are used as the classifier input, i.e., SVM (pixel) and SVM (superpixel). For the implementation of SVM, a polynomial kernel of degree 3 is used.
Due to the use of relatively small training sets, low-depth 2DCNN and 3DCNN networks are used as competitors. In the 2DCNN, two convolutional layers, each followed by batch normalization, ReLU, and dropout with dropping probability 0.2, are used; each convolutional layer uses 4 filters with stride 2 and "same" padding. In the 3DCNN, similar settings are used, with 3-D convolutional filters instead of 2-D ones. At the end of the 2DCNN and 3DCNN models, two fully connected layers are used, the second with $C$ neurons, followed by softmax and classification layers. The inputs of the 2DCNN and 3DCNN are image patches centered on each pixel.
Assessment of parameters’ effects
The effect of the number of superpixels on classification accuracy and prediction time for the Flevoland dataset is represented in Table 1. As said before, settings that determine a larger number of SPs cause a higher computational burden. The effect on the computational burden is larger for the global graph, i.e., SGCN-G, than for SGCN-L, because in SGCN-G, for each given SP, the nearest SPs from each class have to be found by computing Euclidean distances. As seen, the prediction time of SGCN-G (containing the global graph) and SGCN-LG (containing both local and global graphs) with 1229 SPs is about half of their prediction time with 3413 or 2657 SPs. From the classification-accuracy point of view, in the SGCN-L model, the accuracy obtained with 1229 SPs is significantly better than with 3413 or 2657 SPs. In the SGCN-G model, the accuracy decreases as the number of SPs decreases from 3413 to 1229: with fewer SPs generated in the PolSAR image, there may not be enough of them to accurately find the nearest neighbors from the different classes. In SGCN-LG, the accuracies obtained with 2657 and 1229 SPs, with little difference between them, are better than with 3413 SPs. Generally, for the main proposed method, i.e., SGCN-LG, 1229 SPs is the best choice in terms of both classification accuracy and prediction time.
Table 1.
Effect of the number of superpixels on classification accuracy and prediction time.

| No. of superpixels | Metric | SGCN-L | SGCN-G | SGCN-LG |
|---|---|---|---|---|
| 3413 | Overall accuracy | 90.82 | 96.08 | 98.44 |
| | Prediction time (seconds) | 1.25 | 1.10 | 1.53 |
| 2657 | Overall accuracy | 88.82 | 94.64 | 99.35 |
| | Prediction time (seconds) | 1.27 | 1.01 | 1.53 |
| 1229 | Overall accuracy | 95.64 | 92.15 | 99.26 |
| | Prediction time (seconds) | 1.24 | 0.55 | 0.73 |
The number of considered SPs should be large enough to fit the heterogeneous regions, such that non-similar pixels are assigned to different SPs. On the other hand, increasing the number of SPs increases the computational burden. According to Table 1, the classification accuracy and prediction time are obtained for the different models of the proposed framework with three different numbers of SPs in the Flevoland dataset, i.e., 3413, 2657, and 1229. The experiments show that 1229 is an appropriate number of SPs for the Flevoland dataset in terms of both classification accuracy and prediction time.
In Table 2, the effect of the neighborhood parameter is assessed for the Flevoland dataset. For each value of this parameter, the associated local window size $w$ computed according to (9) and the number of local neighboring SPs $K$ obtained by (11) are given. For each parameter value, the overall accuracy achieved by the local branch of the proposed model, i.e., SGCN-L, is represented, along with the running time needed to provide the neighbors for constructing the local graph. As seen from this table, a larger parameter value leads to a larger local neighborhood window $w$ and to the selection of more neighboring SPs $K$, which requires more running time. Increasing the local window size involves more spatial information from the local regions and improves the classification accuracy up to a point; beyond that point, the larger window includes redundant and unrelated spatial information, which may degrade the class-discrimination ability. As seen from this table, increasing the local window size up to $w = 77$ improves the classification accuracy, but after that the OA decreases; enlarging the window further not only decreases the accuracy but also increases the computation time. Accordingly, the parameter is set to 9 for all PolSAR images in this work. Generally, a larger local window requires more running time to find the neighboring SPs; however, these computations are done in the training phase and do not cause delay in the test (prediction) phase.
Table 2.
Effect of the neighborhood parameter on providing the neighboring superpixels.

| Parameter | Local window size $w$ | No. of neighboring SPs $K$ | Computation time (seconds) | OA of SGCN-L |
|---|---|---|---|---|
| 3 | 45.00 | 6 | 110.62 | 79.78 |
| 5 | 57.00 | 8 | 167.77 | 88.57 |
| 7 | 67.00 | 10 | 210.25 | 90.92 |
| 9 | 77.00 | 13 | 265.65 | 95.64 |
| 11 | 85.00 | 15 | 307.15 | 93.07 |
The whole network is trained in a unified, end-to-end supervised manner. In other words, the learnable parameters of both the local and global graphs are determined by supervised learning, and so the efficiency of both is affected by the number of training samples. The inputs of the local graph are $\mathbf{X}_L$ and $\mathbf{A}_L$ (of sizes governed by $K$ and $d$), and the inputs of the global graph are $\mathbf{X}_G$ and $\mathbf{A}_G$ (of sizes governed by $C$ and $d$), where $d = 9$ is the number of polarimetric features, $K$ is the number of local neighboring SPs (for example, in Flevoland, we obtain $K = 13$), and $C$ is the number of classes ($C = 15$ in Flevoland). Generally, the sizes of the input data, i.e., the feature matrices and adjacency matrices of the constructed local and global graphs, are not large. So, the proposed models SGCN-L, SGCN-G and SGCN-LG remain relatively efficient with limited training samples.
In Table 3, the overall accuracies obtained by the different cases of the proposed framework for the Flevoland dataset are reported for different numbers of training samples. As seen, when the number of training samples is low (10 or 50 per class), SGCN-G works better than SGCN-L. With 10 training samples per class, SGCN-G ranks first, with a significant margin over SGCN-L and SGCN-LG. The main proposed method, SGCN-LG, achieves high overall accuracy (OA) even with a low number of labeled samples (50 per class). Although the efficiency of SGCN-L and SGCN-G improves significantly as the number of labeled samples increases, SGCN-LG, which fuses the information from both, is less sensitive to the number of training samples: its difference between 100 and 150 training samples per class is not significant.
Table 3.
Overall accuracy obtained in different sizes of training set.
| No. of training samples per class | SGCN-L | SGCN-G | SGCN-LG |
|---|---|---|---|
| 10 | 48.86 | 78.51 | 67.15 |
| 50 | 68.49 | 86.09 | 95.97 |
| 100 | 95.64 | 92.15 | 99.26 |
| 150 | 98.68 | 94.82 | 99.29 |
Classification results
In Table 4, the classification results obtained for the Flevoland dataset are reported: the classification accuracy of each class, average accuracy (AA), overall accuracy (OA), and kappa coefficient (K). As seen, the proposed SGCN-LG model provides the highest AA, OA, K and Macro-F1 values. After that, SVM (superpixel) ranks second, and SGCN-L, 3DCNN, SGCN-G, 2DCNN, and SVM (pixel) occupy the next ranks, respectively. Most classes in the Flevoland image are agricultural regions with grained texture and varied polarimetric characteristics. So, in agricultural classes such as "Lucerne", "Beet", "Grass", "Rapeseed" and "Wheat 3", SGCN-L, which explores local information from the neighborhood context, performs better than SGCN-G, which focuses on global feature extraction. In contrast, in the "Buildings" class, which has a coarser texture, SGCN-G outperforms SGCN-L. Generally, in this dataset, SGCN-L results in a more accurate classification map than SGCN-G, which shows that the local features contained in neighboring SPs are more important than the global class information in this image. 100 samples per category (1500 in total) are used as the training set, which is relatively small. SVM, which has low sensitivity to the number of training samples, provides highly accurate results when implemented superpixel-based, where the use of SPs effectively improves its performance. Although a low depth is considered for the 2DCNN and 3DCNN architectures, it seems that the small training set used is still not enough for good learning of their models.
Table 4.
Classification results for the Flevoland dataset.
| Name of class | # samples | SGCN-L | SGCN-G | SGCN-LG | SVM (pixel) | SVM (superpixel) | 2DCNN | 3DCNN |
|---|---|---|---|---|---|---|---|---|
| Stembeans | 6103 | 99.57 | 99.25 | 99.41 | 96.00 | 98.31 | 97.20 | 97.64 |
| Peas | 9111 | 100.00 | 100.00 | 99.95 | 96.53 | 99.73 | 97.32 | 96.90 |
| Forest | 14,944 | 93.75 | 92.92 | 99.59 | 86.32 | 98.32 | 90.20 | 97.42 |
| Lucerne | 9477 | 97.30 | 82.57 | 94.68 | 91.09 | 97.38 | 95.42 | 95.28 |
| Wheat | 17,283 | 98.86 | 97.18 | 99.86 | 78.67 | 98.76 | 83.93 | 92.89 |
| Beet | 10,050 | 90.88 | 84.95 | 98.64 | 90.11 | 97.25 | 94.44 | 95.26 |
| Potatoes | 15,292 | 76.49 | 89.61 | 98.11 | 75.22 | 91.90 | 88.27 | 86.86 |
| Bare soil | 3078 | 100.00 | 100.00 | 100.00 | 99.71 | 100.00 | 100.00 | 100.00 |
| Grass | 6269 | 99.79 | 89.26 | 99.98 | 79.01 | 98.37 | 78.96 | 88.47 |
| Rapeseed | 12,690 | 95.55 | 84.04 | 99.76 | 80.64 | 98.12 | 85.58 | 92.41 |
| Barley | 7156 | 100.00 | 99.85 | 100.00 | 93.00 | 99.68 | 99.34 | 96.12 |
| Wheat 2 | 10,591 | 99.87 | 99.79 | 99.87 | 79.09 | 98.22 | 92.81 | 88.68 |
| Wheat 3 | 21,300 | 99.83 | 85.54 | 99.97 | 85.82 | 98.70 | 94.02 | 95.63 |
| Water | 13,476 | 98.26 | 98.66 | 99.90 | 96.96 | 99.13 | 98.81 | 97.27 |
| Buildings | 476 | 85.08 | 97.27 | 90.55 | 94.75 | 97.90 | 90.97 | 95.38 |
| AA | 95.68 | 93.39 | 98.68 | 88.19 | 98.12 | 92.49 | 94.42 | |
| OA | 95.64 | 92.15 | 99.26 | 86.10 | 97.89 | 91.81 | 93.99 | |
| K | 95.24 | 91.45 | 99.19 | 84.86 | 97.70 | 91.07 | 93.44 | |
| Macro-F1 | 95.47 | 88.62 | 97.43 | 85.85 | 97.97 | 91.79 | 93.97 | |
The proposed SGCN model, a graph constructed on the SPs, learns well especially when both local and global information are fused. To assess whether the differences among the classification methods are statistically significant, the Z scores are computed according to McNemar's test54, and the results are shown in Table 5. It is seen that SGCN-LG provides positive Z values much larger than 1.96 with respect to the other methods, which shows the superior performance of SGCN-LG from the statistical point of view. The Pauli RGB, ground truth map (GTM), and the classification maps of the Flevoland dataset are shown in Fig. 5.
Table 5.
McNemar's test results for the Flevoland dataset.
| SGCN-L | SGCN-G | SGCN-LG | SVM (pixel) | SVM (superpixel) | 2DCNN | 3DCNN | |
|---|---|---|---|---|---|---|---|
| SGCN-L | 0 | 42.84 | − 70.48 | 94.02 | − 39.13 | 45.14 | 21.29 |
| SGCN-G | − 42.84 | 0 | − 103.48 | 56.73 | − 76.39 | 3.65 | − 21.63 |
| SGCN-LG | 70.48 | 103.48 | 0 | 138.69 | 35.14 | 101.38 | 83.29 |
| SVM (pixel) | − 94.02 | − 56.73 | − 138.69 | 0 | − 122.74 | − 61.67 | − 88.71 |
| SVM (superpixel) | 39.13 | 76.39 | − 35.14 | 122.74 | 0 | 78.90 | 57.30 |
| 2DCNN | − 45.14 | − 3.65 | − 101.38 | 61.67 | − 78.90 | 0 | − 32.96 |
| 3DCNN | − 21.29 | 21.63 | − 83.29 | 88.71 | − 57.30 | 32.96 | 0 |
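Z scores of this kind can be computed directly from the disagreement counts of two classifiers on the test set, as in the minimal sketch below (the continuity-corrected variant of McNemar's statistic is omitted, assuming large counts).

```python
import numpy as np

def mcnemar_z(pred_a, pred_b, truth):
    """Z score of McNemar's test between two classifiers.
    f12: samples classifier A labels correctly and B labels wrongly;
    f21: the reverse.  |Z| > 1.96 indicates a significant accuracy
    difference at the 5% level; a positive Z favours classifier A."""
    a_ok = pred_a == truth
    b_ok = pred_b == truth
    f12 = np.sum(a_ok & ~b_ok)
    f21 = np.sum(~a_ok & b_ok)
    return (f12 - f21) / np.sqrt(f12 + f21)
```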
Fig. 5.
Classification maps for the Flevoland dataset.
As seen, SGCN-L, SGCN-G, SGCN-LG, and SVM (superpixel), which use SPs as input, provide cleaner classification maps than SVM (pixel), 2DCNN and 3DCNN. Although patches are used as input in 2DCNN and 3DCNN, a patch is representative only of its central pixel, to which the output label is assigned.
The classification accuracies and Z scores for the Sanfrancisco dataset are reported in Tables 6 and 7, respectively. The Sanfrancisco image has classes with large and approximately uniform texture, so the difference between SGCN-L and SGCN-G is not significant in most classes of this dataset. However, in the "Ocean" class, which is inherently
Table 6.
Classification results for the Sanfrancisco dataset.
| Name of class | # samples | SGCN-L | SGCN-G | SGCN-LG | SVM (pixel) | SVM (superpixel) | 2DCNN | 3DCNN |
|---|---|---|---|---|---|---|---|---|
| Bare soil | 15,628 | 66.80 | 67.19 | 59.85 | 39.67 | 70.62 | 83.89 | 85.95 |
| Mountain | 63,295 | 93.67 | 91.07 | 92.86 | 61.54 | 85.76 | 90.91 | 92.89 |
| Ocean | 328,118 | 82.36 | 97.69 | 98.29 | 81.26 | 61.95 | 95.90 | 95.45 |
| Urban | 343,465 | 94.17 | 96.18 | 98.68 | 29.37 | 84.86 | 89.90 | 87.45 |
| Vegetation | 54,758 | 79.06 | 83.15 | 87.89 | 33.84 | 45.92 | 82.36 | 83.79 |
| AA | 83.21 | 87.06 | 87.51 | 49.14 | 69.82 | 88.59 | 89.11 | |
| OA | 87.76 | 94.95 | 96.58 | 53.55 | 72.67 | 91.79 | 90.86 | |
| K | 81.71 | 92.12 | 94.63 | 38.89 | 61.84 | 87.54 | 86.24 | |
| Macro-F1 | 75.10 | 87.09 | 89.95 | 44.10 | 58.03 | 81.76 | 80.57 | |
Table 7.
McNemar's test results for the Sanfrancisco dataset.
| SGCN-L | SGCN-G | SGCN-LG | SVM (pixel) | SVM (superpixel) | 2DCNN | 3DCNN | |
|---|---|---|---|---|---|---|---|
| SGCN-L | 0 | − 190.34 | − 243.33 | 443.93 | 248.62 | − 90.72 | − 68.85 |
| SGCN-G | 190.34 | 0 | − 87.79 | 551.39 | 399.39 | 107.22 | 129.59 |
| SGCN-LG | 243.33 | 87.79 | 0 | 573.08 | 421.36 | 152.64 | 171.20 |
| SVM (pixel) | − 443.93 | − 551.39 | − 573.08 | 0 | − 237.37 | − 513.21 | − 502.50 |
| SVM (superpixel) | − 248.62 | − 399.39 | − 421.36 | 237.37 | 0 | − 332.58 | − 312.76 |
| 2DCNN | 90.72 | − 107.22 | − 152.64 | 513.21 | 332.58 | 0 | 48.29 |
| 3DCNN | 68.85 | − 129.59 | − 171.20 | 502.50 | 312.76 | − 48.29 | 0 |
a homogeneous region, SGCN-G yields a significantly better classification result than SGCN-L. In this dataset, SGCN-G works significantly better than SGCN-L according to AA, OA, K, Macro-F1 and the Z scores of McNemar's test. This shows that, in this image, the global features contained in the nearest labeled SPs of each class carry much more discriminative information than the unlabeled neighboring SPs in the local regions. The proposed SGCN-LG model, considering both the local features of unlabeled samples and the global features of labeled samples, provides the best classification results. SVM does not work well in either the pixel-based or the superpixel-based case. 2DCNN and 3DCNN achieve close results: the AA of 3DCNN is higher than that of 2DCNN, but the OA, K and Macro-F1 of 2DCNN are better. Note that, due to the higher number of learnable parameters in 3DCNN, 2DCNN may work better than 3DCNN when the training set is not large enough. In this dataset, SGCN-LG, SGCN-G and 2DCNN are the best candidates. The classification maps for the Sanfrancisco dataset are shown in Fig. 6. Although SVM (superpixel) uses SPs as input, it fails to align with the real class boundaries, and there are large false-alarm regions in its classification map. Moreover, in SGCN-L, the class shown in blue in the GTM (Bare soil) is wrongly assigned to a large area of the class shown in green (Ocean). SVM (pixel) fails to work. Although 2DCNN and 3DCNN provide relatively accurate results, their classification maps are very noisy.
Fig. 6.
Classification maps for the Sanfrancisco dataset.
The classification results and Z scores for the Oberpfaffenhofen dataset are represented in Tables 8 and 9, respectively. In homogeneous classes such as "Open areas" and "Wood land", SGCN-G outperforms SGCN-L, while in classes with more contextual detail such as "Built-up areas", SGCN-L works significantly better than SGCN-G. In this dataset, SGCN-G generally works better than SGCN-L, and SGCN-LG ranks first with a statistically significant margin over the other methods. SVM provides an OA of less than 50% in both the pixel-based and superpixel-based cases, which is not acceptable. 3DCNN and SGCN-G rank second and third, respectively. The classification maps are shown in Fig. 7. It can be seen that the class shown in yellow in the GTM (Built-up areas) is not well detected by SVM (pixel). There are many false-alarm regions in SVM (superpixel). 2DCNN and 3DCNN provide highly noisy classification maps. In SGCN-L, there are also many false-alarm regions. In contrast, SGCN-LG and SGCN-G provide more accurate and cleaner classification maps. Although SGCN-LG and SGCN-G provide higher classification accuracy than the other models, they are superpixel-based methods, which cannot preserve edges and class boundaries as well as pixel-level methods.
Table 8.
Classification results for the Oberpfaffenhofen dataset.
| Name of class | # Total samples | SGCN-L | SGCN-G | SGCN-LG | SVM (pixel) | SVM (superpixel) | 2DCNN | 3DCNN |
|---|---|---|---|---|---|---|---|---|
| Open areas | 625,029 | 48.70 | 68.95 | 70.48 | 32.78 | 65.14 | 72.92 | 73.20 |
| Wood land | 202,032 | 89.02 | 92.92 | 89.79 | 74.45 | 11.13 | 70.27 | 72.36 |
| Built-up areas | 190,202 | 67.30 | 33.08 | 71.79 | 14.49 | 27.10 | 38.10 | 43.45 |
| Road | 195,432 | 26.66 | 29.15 | 39.37 | 29.41 | 22.61 | 26.52 | 28.77 |
| AA | 57.92 | 56.02 | 67.85 | 37.78 | 31.49 | 51.95 | 54.44 | |
| OA | 54.78 | 60.90 | 68.89 | 36.31 | 43.32 | 59.54 | 61.23 | |
| K | 38.58 | 43.25 | 55.04 | 14.83 | 16.08 | 39.86 | 42.61 | |
| Macro-F1 | 50.81 | 52.13 | 63.94 | 33.13 | 31.22 | 50.73 | 52.89 | |
Table 9.
McNemar's test results for the Oberpfaffenhofen dataset.
| SGCN-L | SGCN-G | SGCN-LG | SVM (pixel) | SVM (superpixel) | 2DCNN | 3DCNN | |
|---|---|---|---|---|---|---|---|
| SGCN-L | 0 | − 116.47 | − 309.93 | 298.86 | 172.32 | − 81.38 | − 111.56 |
| SGCN-G | 116.47 | 0 | − 214.26 | 405.36 | 287.54 | 27.13 | − 6.66 |
| SGCN-LG | 309.93 | 214.26 | 0 | 508.32 | 400.36 | 180.26 | 150.89 |
| SVM (pixel) | − 298.86 | − 405.36 | − 508.32 | 0 | − 105.65 | − 399.15 | − 420.41 |
| SVM (superpixel) | − 172.32 | − 287.54 | − 400.36 | 105.65 | 0 | − 266.46 | − 293.75 |
| 2DCNN | 81.38 | − 27.13 | − 180.26 | 399.15 | 266.46 | 0 | − 44.73 |
| 3DCNN | 111.56 | 6.66 | − 150.89 | 420.41 | 293.75 | 44.73 | 0 |
Fig. 7.
Classification maps for the Oberpfaffenhofen dataset.
To assess the impact of guided filtering on the proposed network and its branches, the OA values obtained with and without guided filtering for all datasets are reported in Table 10. According to the obtained results, by removing noise while preserving the class boundaries according to the first principal component used as the guidance image, the guided filtering increases the OA in all cases. This improvement is significant for SGCN-L in the Sanfrancisco and Oberpfaffenhofen datasets.
Table 10.
The OA in both cases of without applying the guided filter and with applying the guided filter.
| Dataset | SGCN-L | SGCN-G | SGCN-LG | |||
|---|---|---|---|---|---|---|
| Without guided filtering | With guided filtering | Without guided filtering | With guided filtering | Without guided filtering | With guided filtering | |
| Flevoland | 93.21 | 95.64 | 90.36 | 92.15 | 98.15 | 99.26 |
| Sanfrancisco | 77.15 | 87.76 | 92.26 | 94.95 | 93.44 | 96.58 |
| Oberpfaffenhofen | 50.00 | 54.78 | 58.15 | 60.90 | 65.45 | 68.89 |
In this paper, the image is partitioned into SPs, and for each given SP two small graphs are constructed, which leads to low computation. The feature matrix and adjacency matrix of the local graph are $\mathbf{X}_L$ and $\mathbf{A}_L$, and those of the global graph are $\mathbf{X}_G$ and $\mathbf{A}_G$, where $d = 9$ is the number of polarimetric features, $K$ is the number of local neighboring SPs (for example, in Flevoland, we obtain $K = 13$), and $C$ is the number of classes ($C = 15$ in Flevoland). Generally, the sizes of the constructed graphs are relatively small, and so their processing does not require heavy computation.
In Table 11, the number of learnable parameters of each method is reported. As seen, all cases of the proposed framework, i.e., SGCN-L, SGCN-G and SGCN-LG, have a low number of learnable parameters, even smaller than the considered light 2DCNN. Each of SGCN-L and SGCN-G contains approximately half of the learnable parameters of SGCN-LG. In Table 12, the running times of the training and test (prediction) phases are reported for the different methods on the Flevoland dataset. Although the SGCN-L, SGCN-G and SGCN-LG models have high training times compared to the other methods, they run much faster than 2DCNN and 3DCNN in the test phase, and even faster than the pixel-based SVM. Although the superpixel-based SVM has the lowest running time, it is not as efficient across the various PolSAR images as the other methods.
Table 11.
The number of learnable parameters in each model.
| Method | SGCN-L | SGCN-G | SGCN-LG | 2DCNN | 3DCNN |
|---|---|---|---|---|---|
| No. of learnable parameters | 1.8k | 2k | 3.8k | 5.9k | 23.5k |
Table 12.
The running time in each model.
| Method | SGCN-L | SGCN-G | SGCN-LG | SVM (pixel) | SVM (superpixel) | 2DCNN | 3DCNN |
|---|---|---|---|---|---|---|---|
| Training time (seconds) | 1804.58 | 1753.27 | 3050.37 | 3.04 | 0.20 | 19.59 | 18.74 |
| Test time (seconds) | 1.24 | 0.55 | 0.73 | 3.60 | 0.02 | 72.62 | 74.25 |
Generally, the proposed framework, with its low number of learnable parameters, fast prediction, and high efficiency on different images, is a good candidate for PolSAR image classification.
Comparison with several state-of-the-art methods
The proposed SGCN-LG model is compared with several advanced methods in this section. The comparison results obtained for the Flevoland dataset using 100 training samples per class are reported in Table 13. It can be seen that the proposed SGCN-LG model provides the highest OA and kappa coefficient: the use of graph structures on SPs with fusion of the local and global views leads to the best performance. In terms of AA, DFC ranks first, and SGCN-LG ranks second by a small margin. Generally, after SGCN-LG, the best performance is obtained by DFC, which benefits from pre-determined convolutional kernels, multi-view analysis, two-step discriminant analysis,
Table 13.
Comparison with some state-of-the-art methods.
| Name of class | # samples | DFC | RCNN-AA | MMAL | SGCN-LG |
|---|---|---|---|---|---|
| Stembeans | 6103 | 99.56 | 99.08 | 99.23 | 99.41 |
| Peas | 9111 | 98.83 | 97.38 | 97.99 | 99.95 |
| Forest | 14,944 | 99.92 | 97.57 | 96.75 | 99.59 |
| Lucerne | 9477 | 94.68 | 97.79 | 96.81 | 94.68 |
| Wheat | 17,283 | 99.15 | 95.25 | 91.55 | 99.86 |
| Beet | 10,050 | 98.69 | 97.64 | 96.18 | 98.64 |
| Potatoes | 15,292 | 96.55 | 97.54 | 96.44 | 98.11 |
| Bare soil | 3078 | 100.00 | 99.74 | 100.00 | 100.00 |
| Grass | 6269 | 99.22 | 95.09 | 91.42 | 99.98 |
| Rapeseed | 12,690 | 98.16 | 95.29 | 91.65 | 99.76 |
| Barley | 7156 | 99.80 | 99.43 | 99.83 | 100.00 |
| Wheat 2 | 10,591 | 99.01 | 97.26 | 96.88 | 99.87 |
| Wheat 3 | 21,300 | 99.32 | 96.67 | 97.04 | 99.97 |
| Water | 13,476 | 99.98 | 99.80 | 99.93 | 99.90 |
| Buildings | 476 | 99.79 | 93.49 | 84.87 | 90.55 |
| AA | 98.84 | 97.27 | 95.77 | 98.68 | |
| OA | 98.72 | 97.26 | 96.15 | 99.26 | |
| K | 98.61 | 97.01 | 95.80 | 99.19 | |
and high-confidence decision fusion. RCNN-AA, which uses a convolutional-autoencoder-based attention to provide an appropriate input for a residual CNN, ranks third in terms of AA, OA and kappa coefficient. MMAL, which utilizes a cross-attention mechanism to compute the relationships among multi-level features, shows lower performance than the other methods. The corresponding classification maps are shown in Fig. 8. The cleanest classification map with the least noise is achieved by SGCN-LG; after that, DFC provides the most accurate classification map. There are more noisy pixels in the MMAL map than in those of the other methods.
Fig. 8.
Classification maps of some state-of-the-art methods.
In the following, some advantages of the proposed SGCN-LG model, which lead to its high performance, are summarized:
- The use of SPs has benefits such as noise reduction, provision of spatial information, and reduced complexity of the graph computations.
- With automatic determination of the number of SPs and of the number of spatially neighboring SPs used to construct the local graph, there is no free parameter in building the graph model.
- While the local information of each SP is obtained from its neighbors in adjacent areas, the global information, containing knowledge of the classes present in the labeled samples, is extracted from the whole image.
Conclusion
Superpixels (SPs) are used as the composing nodes of graphs for PolSAR image classification in this work. The number of SPs is automatically determined by analyzing a differential operator applied to the standard deviations of pixels in patches of different sizes. Two graphs, local and global, are constructed from the generated SPs: the local graph is made of adjacent SPs in local regions, and the global graph is made of the nearest labeled SPs from all classes. The local and global features are fused for classification; the relationships among spatial features are explored by the local graph, and the global structure of the labeled SPs is extracted by the global graph. The proposed model, benefiting from the advantages of SPs, graph networks, and local-global feature fusion, provides highly accurate classification results with the same or smaller training sets compared to several state-of-the-art methods. However, some challenges remain for future work. For example, accounting for within-superpixel variations, especially in heterogeneous regions, is important for pixel-based classification; assigning different weights to different pixels of a given SP could enhance the classification map. Moreover, the SLIC method is used here for PolSAR image segmentation outside the network; training an adaptive SP-generation block integrated with the classification network in a unified framework could lead to better alignment of class boundaries in the final classification result.
Author contributions
Maryam Imani has all roles of Conceptualization; Methodology; Software; Validation; Formal analysis; Investigation; Writing, review & editing.
Data availability
The datasets are available online at https://ietr-lab.univ-rennes1.fr/polsarpro-bio/san-francisco and https://github.com/fudanxu/CV-CNN?tab=readme-ov-file.
Declarations
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Duan, D. & Wang, Y. Reflection of and vision for the decomposition algorithm development and application in earth observation studies using PolSAR technique and data. Remote Sens. Environ. 261, 112498 (2021).
- 2. Imani, M. Two-step discriminant analysis based multi-view polarimetric SAR image classification with high confidence. Sci. Rep. 12, 5984 (2022).
- 3. Gomez, L., Alvarez, L., Mazorra, L. & Frery, A. C. Fully PolSAR image classification using machine learning techniques and reaction-diffusion systems. Neurocomputing 255, 52–60 (2017).
- 4. Li, H., Chen, J., Li, Q., Wu, G. & Chen, J. Mitigation of reflection symmetry assumption and negative power problems for the model-based decomposition. IEEE Trans. Geosci. Remote Sens. 54(12), 7261–7271 (2016).
- 5. Ghazvinizadeh, A. H., Imani, M. & Ghassemian, H. Residual network based on entropy–anisotropy–alpha target decomposition for polarimetric SAR image classification. Earth Sci. Inf. 16, 357–366 (2023).
- 6. Bilal, M., Israr, H., Shahid, M. & Khan, A. Sentiment classification of Roman-Urdu opinions using Naïve Bayesian, decision tree and KNN classification techniques. J. King Saud Univ. - Comput. Inform. Sci. 28(3), 330–344 (2016).
- 7. Parand, K., Aghaei, A. A., Jani, M. & Ghodsi, A. Parallel LS-SVM for the numerical simulation of fractional Volterra’s population model. Alexandria Eng. J. 60(6), 5637–5647 (2021).
- 8. Sánchez-Lladó, F. J., Pajares, G. & López-Martínez, C. Improving the Wishart synthetic aperture radar image classifications through deterministic simulated annealing. ISPRS J. Photogrammetry Remote Sens. 66(6), 845–857 (2011).
- 9. Shi, J., Wang, W., Jin, H. & He, T. Complex matrix and multi-feature collaborative learning for polarimetric SAR image classification. Appl. Soft Comput. 134, 109965 (2023).
- 10. Imani, M. Entropy/anisotropy/alpha based 3D Gabor filter bank for PolSAR image classification. Geocarto Int. 37(27), 18491–18519 (2022).
- 11. Imani, M. Classification using ridge regression-based polarimetric-spatial feature extraction in polarimetric SAR. In 2021 26th International Computer Conference, Computer Society of Iran (CSICC), Tehran, Iran, 1–5 (2021).
- 12. Chen, Y. et al. Nonlinear projective dictionary pair learning for PolSAR image classification. IEEE Access 9, 70650–70661 (2021).
- 13. Song, W., Wu, Y. & Guo, P. Composite kernel and hybrid discriminative random field model based on feature fusion for PolSAR image classification. IEEE Geosci. Remote Sens. Lett. 18(6), 1069–1073 (2021).
- 14. Latif, S. D. et al. Assessing rainfall prediction models: exploring the advantages of machine learning and remote sensing approaches. Alexandria Eng. J. 82, 16–25 (2023).
- 15. Imani, M. Integration of the k-nearest neighbours and patch-based features for PolSAR image classification by using a two-branch residual network. Remote Sens. Lett. 12(11), 1112–1122 (2021).
- 16. Wang, J. et al. Parameter selection of Touzi decomposition and a distribution improved autoencoder for PolSAR image classification. ISPRS J. Photogrammetry Remote Sens. 186, 246–266 (2022).
- 17. Imani, M. Low frequency and radar’s physical based features for improvement of convolutional neural networks for PolSAR image classification. Egypt. J. Remote Sens. Space Sci. 25, 55–62 (2022).
- 18. Shang, R., Wang, J., Jiao, L., Yang, X. & Li, Y. Spatial feature-based convolutional neural network for PolSAR image classification. Appl. Soft Comput. 123, 108922 (2022).
- 19. Zhang, P., Liu, C., Chang, X., Li, Y. & Li, M. Metric-based meta-learning model for few-shot PolSAR image terrain classification. In 2021 CIE International Conference on Radar (Radar), Haikou, Hainan, China, 2529–2533 (2021).
- 20. Imani, M. Residual convolutional neural network with autoencoder based attention for PolSAR image classification. In 2024 13th Iranian/3rd International Machine Vision and Image Processing Conference (MVIP), Tehran, Iran, 1–6 (2024).
- 21. Yang, Z., Wu, Y., Li, M., Hu, X. & Li, Z. Unsupervised change detection in PolSAR images using Siamese encoder–decoder framework based on graph-context attention network. Int. J. Appl. Earth Obs. Geoinf. 124, 103511 (2023).
- 22. Ling, J., Wei, S., Gamba, P., Liu, R. & Zhang, H. Advancing SAR monitoring of urban impervious surface with a new polarimetric scattering mixture analysis approach. Int. J. Appl. Earth Obs. Geoinf. 124, 103541 (2023).
- 23. Zhang, Z. C., Chen, Z. D., Wang, Y., Luo, X. & Xu, X. S. A vision transformer for fine grained classification by reducing noise and enhancing discriminative information. Pattern Recogn. 145, 109979 (2024).
- 24. Dong, H., Zhang, L. & Zou, B. Exploring vision transformers for polarimetric SAR image classification. IEEE Trans. Geosci. Remote Sens. 60, 1–15, Art. 5219715 (2022).
- 25. Imani, M. Attention based multi-level and multi-scale convolutional network for PolSAR image classification. Adv. Space Res. 75(11), 7971–7986 (2025).
- 26. Dong, H., Zhang, L., Lu, D. & Zou, B. Attention-based polarimetric feature selection convolutional network for PolSAR image classification. IEEE Geosci. Remote Sens. Lett. 19, 1–5, Art. 4001705 (2022).
- 27. Hua, W., Wang, X., Zhang, C. & Jin, X. Attention-based multiscale sequential network for PolSAR image classification. IEEE Geosci. Remote Sens. Lett. 19, 1–5, Art. 4506505 (2022).
- 28. Song, W., Wu, Y. & Xiao, X. Nonstationary PolSAR image classification by deep-features-based high-order triple discriminative random field. IEEE Geosci. Remote Sens. Lett. 18(8), 1406–1410 (2021).
- 29. Choi, K. S. & Oh, K. W. Subsampling-based acceleration of simple linear iterative clustering for superpixel segmentation. Comput. Vis. Image Underst. 146, 1–8 (2016).
- 30. Yin, J. et al. SLIC superpixel segmentation for polarimetric SAR images. IEEE Trans. Geosci. Remote Sens. 60, 1–17, Art. 5201317 (2022).
- 31. Li, M. et al. Efficient superpixel generation for polarimetric SAR images with cross-iteration and hexagonal initialization. Remote Sens. 14, 2914 (2022).
- 32. Guo, Y. et al. Adaptive fuzzy learning superpixel representation for PolSAR image classification. IEEE Trans. Geosci. Remote Sens. 60, 1–18, Art. 5217818 (2022).
- 33. Yang, S., Yuan, X., Liu, X. & Chen, Q. Superpixel generation for polarimetric SAR using hierarchical energy maximization. Comput. Geosci. 135, 104395 (2020).
- 34. Li, M. et al. Superpixel generation for polarimetric SAR images with adaptive size estimation and determinant ratio test distance. Remote Sens. 15, 1123 (2023).
- 35. Cao, Y., Wu, Y., Li, M., Liang, W. & Zhang, P. PolSAR image classification using a superpixel-based composite kernel and elastic net. Remote Sens. 13(3), 380 (2021).
- 36. Ma, F., Zhang, F., Xiang, D., Yin, Q. & Zhou, Y. Fast task-specific region merging for SAR image segmentation. IEEE Trans. Geosci. Remote Sens. 60, 1–16, Art. 5222316 (2022).
- 37. Zhang, F., Sun, X., Ma, F. & Yin, Q. Superpixelwise likelihood ratio test statistic for PolSAR data and its application to built-up area extraction. ISPRS J. Photogrammetry Remote Sens. 209, 233–248 (2024).
- 38. Hua, W., Zhang, C., Xie, W. & Jin, X. Polarimetric SAR image classification based on ensemble dual-branch CNN and superpixel algorithm. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 15, 2759–2772 (2022).
- 39. Shi, J. et al. Polarimetric synthetic aperture radar image classification based on double-channel convolution network and edge-preserving Markov random field. Remote Sens. 15, 5458 (2023).
- 40. Ren, J., Zhu, K., Hu, M., Shang, R. & Zhang, M. Polarimetric SAR image classification based on superpixel content-aware and semi-supervised ViT network. Appl. Soft Comput. 186(Part A), 114040 (2026).
- 41. Liu, H. et al. Graph convolutional networks by architecture search for PolSAR image classification. Remote Sens. 13, 1404 (2021).
- 42. Bi, H., Sun, J. & Xu, Z. A graph-based semisupervised deep learning model for PolSAR image classification. IEEE Trans. Geosci. Remote Sens. 57(4), 2116–2132 (2019).
- 43. Yu, B., Xie, H., Fu, Y. & Xu, Z. Three-way graph convolutional network for multi-label classification in multi-label information system. Appl. Soft Comput. 161, 111767 (2024).
- 44. Xu, D. et al. Difference-guided multiscale graph convolution network for unsupervised change detection in PolSAR images. Neurocomputing 555, 126611 (2023).
- 45. Ma, F., Zhang, F., Yin, Q., Xiang, D. & Zhou, Y. Fast SAR image segmentation with deep task-specific superpixel sampling and soft graph convolution. IEEE Trans. Geosci. Remote Sens. 60, 1–16, Art. 5214116 (2022).
- 46. Geng, J., Wang, R. & Jiang, W. Polarimetric SAR image classification based on feature enhanced superpixel hypergraph neural network. IEEE Trans. Geosci. Remote Sens. 60, 1–12, Art. 5237812 (2022).
- 47. Shi, J., He, T., Ji, S., Nie, M. & Jin, H. CNN-improved superpixel-to-pixel fuzzy graph convolution network for PolSAR image classification. IEEE Trans. Geosci. Remote Sens. 61, 1–18, Art. 4410118 (2023).
- 48. Wang, R., Nie, Y. & Geng, J. Multiscale superpixel-guided weighted graph convolutional network for polarimetric SAR image classification. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 17, 3727–3741 (2024).
- 49. Zhu, W., Zhao, C., Feng, S. & Qin, B. Multiscale short and long range graph convolutional network for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 60, 1–15, Art. 5535815 (2022).
- 50. Zhou, Q., Gao, Q., Wang, Q., Yang, M. & Gao, X. Sparse discriminant PCA based on contrastive learning and class-specificity distribution. Neural Netw. 167, 775–786 (2023).
- 51. Kipf, T. N. & Welling, M. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR 2017), Toulon, France (2017).
- 52. Mou, L., Lu, X., Li, X. & Zhu, X. X. Nonlocal graph convolutional networks for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 58(12), 8246–8257 (2020).
- 53. Imani, M. A random patches based edge preserving network for land cover classification using polarimetric synthetic aperture radar images. Int. J. Remote Sens. 42(13), 4946–4964 (2021).
- 54. Roggo, Y., Duponchel, L. & Huvenne, J. P. Comparison of supervised pattern recognition methods with McNemar’s statistical test: application to qualitative analysis of sugar beet by near-infrared spectroscopy. Anal. Chim. Acta 477(2), 187–200 (2003).
Data Availability Statement
The datasets are available online at https://ietr-lab.univ-rennes1.fr/polsarpro-bio/san-francisco and https://github.com/fudanxu/CV-CNN?tab=readme-ov-file.