Scientific Reports. 2026 Jan 4;16:4736. doi: 10.1038/s41598-025-34965-6

Superpixel-based graph convolutional neural network for polarimetric synthetic aperture radar image classification

Maryam Imani
PMCID: PMC12873321  PMID: 41486311

Abstract

The superpixel-based graph convolutional network with local and global information (SGCN-LG) is introduced for polarimetric synthetic aperture radar (PolSAR) image classification in this paper. The number of superpixels (SPs) is determined automatically by analyzing the second order differences of the standard deviations of pixels within square patches of various sizes. The local graph is constructed using neighboring SPs, where the number of neighbors for each SP is automatically determined by considering multiple hyper windows around all composing pixels of that SP. Moreover, the nearest neighboring SPs from all classes are chosen from the entire scene to construct the global graph, which contains the discrimination information and the relationships among labeled SPs. The local and global features are fused to achieve the classification map. According to the experimental results, the proposed SGCN-LG model outperforms several powerful PolSAR classification models.

Keywords: Graph convolutional network, Deep learning, Polarimetric SAR, Superpixel, Classification

Subject terms: Electrical and electronic engineering, Computational science

Introduction

Polarimetric synthetic aperture radar (PolSAR) images, with their ability to image at any time and to provide scattering information through the transmission and reception of electromagnetic waves with various polarizations in different directions1, are among the best image sources for remote sensing applications such as classification2,3. In the past decades, most studies have focused on the physical scattering mechanism, where various target decomposition methods such as Cloude–Pottier, Freeman, Pauli and Krogager4,5 have been used for scattering feature extraction. The extracted features can be classified by an appropriate classifier such as Bayesian6 or the support vector machine (SVM)7. Moreover, some previous methods have focused on the PolSAR statistical distribution, where the Wishart distribution was introduced for PolSAR image classification8. However, due to the complex nature of the imaged scene, the nonlinear relationships in the input image and the complexity of textural structures, more efficient polarimetric and spatial features are required for accurate PolSAR image classification9–11.

Sparse representation with dictionary learning combined with nonlinear transformation is proposed in the nonlinear projection dictionary pair learning (NDPL) method for PolSAR image classification12. The composite kernel-hybrid discrimination random field (CK-HDRF) method utilizes the advantages of composite kernels in handling nonlinearity and high dimensionality, beside the discriminative random field for modelling the posterior distribution, to analyze the complex texture13.

Recently, deep learning-based models have shown superior performance in various image processing applications such as PolSAR image classification14–16. In particular, the convolutional neural network (CNN) has been highly successful for PolSAR image analysis17,18. A PolSAR image has a 3D nature, with spatial information in the first two dimensions and polarimetric information in the third dimension. Therefore, a three dimensional CNN capturing 3D patches as input can simultaneously extract scattering and spatial features19. The residual convolutional neural network with autoencoder based attention (RCNN-AA) is introduced in20. It benefits from a convolutional autoencoder (CAE) for attending to fine features in the PolSAR image. The scaled difference of the original input patch from its approximation obtained by the CAE is considered as the attention weight containing information about the fine spatial features. The attention feature maps, beside the original ones, are fed into the residual CNN. The discriminative features based high confidence classification (DFC) introduced in2 uses several approaches to improve PolSAR image classification. It selects pre-determined convolutional kernels from the important regions of the image without requiring learning, so it does not need a high volume of training samples. Through a multi-view analysis, diverse classification maps with different information are generated. Moreover, a feature space with reduced dimensionality, containing minimum overlap and maximum class separability, is provided by a two-step discriminant analysis method. Finally, the classification map is generated by high confidence decision fusion.

The convolutional architectures extract spatial features from neighborhood regions. However, they cannot extract global information from the whole image well. To solve this issue, transformers and attention-based modules have been introduced, which globally extract long-range dependencies and interactions21,22. The transformers were initially introduced for natural language processing. Thereafter, the vision transformer (ViT) was introduced to capture global information in the image domain by using the self-attention mechanism23. A ViT based PolSAR classifier is introduced in24. Due to the presence of objects with different sizes and shapes in natural scenes, there are heterogeneous regions in PolSAR images with various contextual information, which can be represented at different levels and scales. To explore this rich source of information, the multi-scale and multi-level attention learning (MMAL) network is introduced in25. It utilizes the cross-attention mechanism to explore the relationships between low-level and high-level features and between medium-level and high-level features at multiple scales.

The use of a high number of polarimetric features at the input of deep learning models such as CNNs can provide various scattering information. However, the use of a high dimensional feature cube is not so effective, especially when a small training set is available. To solve this issue, an attention based polarimetric feature selection (AFS) convolutional network, called AFS-CNN, is proposed in26, which performs feature selection and classification in an end-to-end framework. In addition to the CNN and its variants, other forms of deep learning models such as recurrent neural networks have been tried for PolSAR image analysis. For example, in27, the neighborhood regions are converted to spatial sequences. Then, multi-scale spatial features are explored by applying an attention-based multi-scale spatial enhanced long short-term memory (AMSE-LSTM). To extract the pixel-based scattering relationships in the PolSAR image, a graph-based complex-valued 3DCNN is used beside a random field with high order cliques in the deep features based high order triple discrimination random field (DF-HoTDF) model28.

Segmentation based PolSAR analysis can improve classification performance. Considering superpixels (SPs) instead of disjoint pixels not only reduces the noise and explores the contextual information but also reduces the computations. The simple linear iterative clustering (SLIC) method is a simple and efficient segmentation algorithm29. More advanced SP generation methods have been introduced for PolSAR image segmentation to preserve details in heterogeneous regions and produce smooth representations in homogeneous regions. An improved version of SLIC is proposed for PolSAR images in30, which adapts to the polarimetric characteristics and statistical measures of PolSAR. Moreover, it uses polarimetric feature similarities as statistical distances in its clustering function. The revised Wishart distance is integrated with the geodesic distance for PolSAR clustering through a cross-iteration strategy in31. In32, a fuzzy SP algorithm is introduced, which uses the correlation between scattering information to cluster pixels. A hierarchical energy driven method is introduced in33 for PolSAR image segmentation. At the coarse level, it uses the histogram intersections of the coherency matrix for SP generation, and at the fine level, it uses the Wishart energy for SP evaluation. In34, the relationship between the initial SP size and the structural complexity of PolSAR is constructed beside the determinant ratio test, which leads to a reliable SP generation method with adaptive size estimation.

A composite kernel-based elastic net classifier based on SPs is introduced for PolSAR image classification in35. At first, three types of features are extracted using SP segmentation at different scales. Then, these features are mapped by constructing a composite kernel exploiting the correlation and diversity between different features. Finally, the elastic net classifier is integrated with the composite kernel for PolSAR image classification using limited training samples. Advanced methods have recently been suggested for SAR image segmentation. For example, in36, both the SP generation and merging steps are incorporated into a unified deep network. At first, a differentiable SP generation method is employed for oversegmentation of the single polarization SAR image. Its output is the likelihood of pixels belonging to different SPs. Then, in the merging part, the soft SP set is converted into a self-connected weighted graph. As an advantage, the shapes of the SPs are iteratively adjusted according to the boundaries during training. SPs are introduced into hypothesis test theory in37 for PolSAR change detection and built-up area extraction. To this end, the PolSAR image is firstly oversegmented into a set of SPs, and the probability density function of a SP's reflectivity is derived. Then, a superpixelwise likelihood-ratio test statistic is presented to measure the similarity of the covariance matrices of two superpixels for unsupervised change detection.

To deal with the small sample size problem in CNNs, a dual branch CNN is introduced in38, which uses a SP algorithm for expanding the number of labeled samples. The first branch of the CNN extracts polarization features and the second branch extracts spatial features. Moreover, an ensemble learning algorithm is used with the dual branch CNN to improve the classification results. Due to high level feature extraction, deep learning methods may cause edge confusion. To handle this issue, in39, a double channel CNN with an edge preserving Markov random field is proposed. One subnetwork uses the Wishart based complex matrix to learn the statistical characteristics and another subnetwork learns high level semantic features. Although the Vision Transformer (ViT) has shown great performance for PolSAR image classification, it requires a large set of labeled samples for training and suffers from semantic misalignment due to fixed patch tokenization. To address these issues, a SP content-aware and semi-supervised ViT network is suggested in40. To generate the token sequences, SPs with random sizes are divided into blocks and masked randomly. To implement a semi-supervised ViT, both supervised and unsupervised learning are integrated.

Graph neural networks have been introduced in a limited number of works for PolSAR image classification. In41, a graph convolutional network is used for neural architecture search. To this end, a graph is constructed whose nodes are pixels of the PolSAR image. It introduces a search space whose components come from several graph neural networks. To deal with the small sample size situation in PolSAR image classification, a graph-based semisupervised deep learning method is proposed in42. The PolSAR image is modeled as an undirected graph where labeled and unlabeled pixels are considered as the nodes, and the weighted edges show similarities between pixels. A CNN model is used for polarimetric feature extraction and outputs the class labels to the graph model.

In the CNN and many other deep learning models, the input of the model is fixed size patches where the label is assigned to the central pixel. Considering the relationships among adjacent pixels in neighborhood regions, and considering SPs as the input of the model, may improve the classification map by involving local information and reducing noisy pixels. On the other hand, graph-based networks, by aggregating the node features, can provide an improved feature representation. The graph convolutional network (GCN)43,44, by utilizing nonlocal features and modelling the data structures, leads to enhanced feature representation. To benefit from the advantages of SP based analysis and graph based analysis, several works have integrated these approaches. A SP-wise segmentation network for single polarization SAR images is introduced in45. It firstly uses a differentiable boundary-aware clustering method to estimate task-specific SPs using a simple fully convolutional network. Then, a soft graph convolution network takes the association map and produces the SP-wise segmentation. As an advantage, both the SP generation and graph convolution parts are trained under a unified framework, and the shapes of the SPs are adjusted according to the segmentation results, adhering to the boundaries.

The feature enhanced SP hypergraph neural network (FESHNN) is introduced in46 for PolSAR image classification, which benefits from the advantages of SP-based graph models for extraction of polarimetric correlation and spatial correlation. Its feature discrimination is enhanced by refining the local features contained in pixels and SPs. However, this method ignores the global information contained in class labels across the scene. The efficiency of a SP-based GCN method is substantially dependent on the SP segmentation result, which is affected by speckle noise and scattering confusion. To deal with this difficulty, a hybrid weighted fuzzy SP-based GCN method is introduced in47, which corrects the edge pixels by defining a fuzzy projection matrix. The features are transformed from SP to pixel, where the edge pixels' features are computed from all neighboring SPs to refine edges toward the most similar region. Both multifeature distances and the revised Wishart distance are used to define the hybrid weighted adjacency matrix. This method disregards the local individual features of pixels and captures the global contextual information. For combining local and global features, the graph network is integrated with a 3DCNN into a unified framework.

As said, most existing PolSAR classification methods take fixed patches as input and utilize a CNN for feature extraction and classification, where only local information is utilized. The polarimetric information in PolSAR images is complex, and the use of only local information may not be sufficient for providing an accurate classification map. In contrast to pixel-based methods, approaches such as graph neural networks, which consider irregular SPs as input and update features through the graph structure based on the information of adjacent nodes, learn global information beside the local one, and thereby improve PolSAR image classification. However, the scale of the SPs affects the classification results due to the existence of objects with various shapes and sizes. To handle this issue, a multiscale SP guided weighted graph convolutional network is proposed in48. At first, it segments the PolSAR image into SPs at three different scales. Then, the correlation among SPs is used to form the adjacency matrix, and the weighted graph convolutional network is utilized for providing the SP feature representation. Finally, a multiscale feature cascade fusion module is introduced for providing the pixel level features.

For several reasons, the SP-based analysis of a PolSAR image can be preferred over pixel-based analysis: (1) due to the noisy nature of SAR images, the SP representation of a PolSAR image is more appropriate than its pixel representation because SPs suppress noisy pixels; (2) the SP representation implicitly explores the spatial information of the PolSAR image; and (3) the use of SPs instead of pixels reduces the computational burden. Usually, the appropriate number of SPs is set manually. In49, a formula is presented for computing the SP segmentation scale for a hyperspectral image, where inherent properties of hyperspectral images such as spatial size, texture ratio, spatial resolution, and the number of categories are taken into account to compute the number of SPs. However, the selection of an appropriate number of SPs in a PolSAR image requires trial and error, which is a troublesome task. On the other hand, limited works have studied the ability of graph convolutional networks for PolSAR image classification. Because of the ability of graphs to explore the hidden relationships among defined nodes, they are great tools for feature extraction in complex feature spaces. Moreover, graphs, by providing a total view of all nodes, can explore the global information.

Although different works have shown great success in providing accurate classification maps by utilizing the advantages of SPs and graphs, they mostly utilize the SP based graph structure for global feature extraction. Some works have integrated graph networks with CNNs to combine global and local features. However, exploring the irregular local structure is not possible by applying a convolutional network. To address this issue, a SP based network is designed in this work, which utilizes graph networks for exploring both local and global features. A local graph from neighborhood SPs, and a global graph from the nearest SPs of different classes across the PolSAR image, are composed. The local and global graphs are individually analyzed using graph convolutional networks, and then their extracted features are fused and used for PolSAR image classification. Moreover, most SP based networks have the difficulty of determining the number of SPs through trial and error. This issue is also addressed in this work by introducing an automatic method for determining the number of SPs. In addition, most graph-based networks usually have a high number of learnable parameters and are highly complicated. A simple dual graph convolutional network with a low number of learnable parameters is proposed in this work, which is simply implemented and runs fast in the prediction (test) phase.

To improve PolSAR image classification, a SP-based graph convolutional network (SGCN) is introduced here, which consists of two branches. While the first branch extracts local spatial features through the local graph constructed from unlabeled neighboring SPs, the second branch extracts global information through the global graph constructed from the nearest labeled SPs from all classes. The main contributions of this work are as follows:

  1. A graph convolutional network with two branches containing the local and global information is constructed for PolSAR image classification.

  2. The local graph provides the local neighborhood information and also the unlabeled samples structures.

  3. The global graph contains the global class information and the labeled samples structures.

  4. The number of SPs is determined automatically by computing the standard deviation vector and its second order differential vector.

  5. The number of neighboring SPs is automatically determined by considering multiple local windows around all composing pixels of each SP.

The SGCN model is assessed with an ablation study where just the local branch is used (SGCN-L), where just the global branch is used (SGCN-G), and where both local and global branches are fused together (SGCN-LG). The experimental results show the superior performance of the proposed SGCN-LG model on different PolSAR images. Comparison with several state-of-the-art methods shows that SGCN-LG outperforms its competitors while using a lower number of training samples.

Proposed SGCN-LG model

In this work, the superpixel-based graph convolutional network (SGCN) for local and global feature fusion, called SGCN-LG, is proposed for PolSAR image classification. The proposed SGCN model consists of two branches, a local graph and a global graph, where each graph is composed of superpixels (SPs) as the graph nodes. The local and global features, containing the data structures of spatial neighbors and class neighbors, respectively, are eventually fused to classify the input SP. The flowchart of the SGCN-LG framework and the proposed network are shown in Figs. 1 and 2, respectively.

Fig. 1. Flowchart of the proposed SGCN-LG framework.

Fig. 2. The proposed network in the SGCN-LG model.

As said, SGCN-LG contains two parts, local (SGCN-L) and global (SGCN-G), which are explained in more detail in the following. But before that, the generation of SPs with automatic determination of their number is described.

Superpixel generation with automatic determination of the number of superpixels

The proposed graph model uses the SPs of the PolSAR image as nodes. The SLIC algorithm is used here for generation of SPs because it is a well-known method with simple implementation. The SLIC algorithm is applied to the first principal component (PC1) obtained by the principal component analysis (PCA) transform50, which is normalized by:

$$\overline{\mathrm{PC1}} = \frac{\mathrm{PC1} - \min(\mathrm{PC1})}{\max(\mathrm{PC1}) - \min(\mathrm{PC1})} \tag{1}$$

where $\min(\cdot)$ and $\max(\cdot)$ compute the minimum and maximum values among all pixels of the PC1 image. The SLIC algorithm is applied to PC1, which contains the polarimetric components with the most energy, so it takes the polarimetric information of the PolSAR image into account.
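The PC1 normalization of Eq. (1) can be sketched as follows; the function name `normalized_pc1` and the plain eigendecomposition-based PCA are illustrative choices, not the paper's implementation. The resulting map would then be handed to an off-the-shelf SLIC implementation (e.g. `skimage.segmentation.slic`).

```python
import numpy as np

def normalized_pc1(img):
    """First principal component of an (H, W, d) PolSAR feature cube,
    min-max normalized to [0, 1] as in Eq. (1)."""
    h, w, d = img.shape
    x = img.reshape(-1, d)
    x = x - x.mean(axis=0)
    # principal axis = leading eigenvector of the channel covariance matrix
    cov = x.T @ x / x.shape[0]
    _, vecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    pc1 = (x @ vecs[:, -1]).reshape(h, w)  # project onto the last eigenvector
    return (pc1 - pc1.min()) / (pc1.max() - pc1.min())
```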

In this algorithm, the number of SPs, denoted as $n_{sp}$, is given as input, which is a user defined parameter. Although the appropriate value of $n_{sp}$ can be determined for each dataset by experiments, a simple method is proposed in this work for automatic determination of $n_{sp}$, which provides appropriate results for various PolSAR images.

The number of SPs in an image can be approximated by the number of square patches tiling it. With $w \times w$ patches, each patch contains $w^2$ pixels. Let $N$ be the total number of pixels in the image. So, we have $n_{sp} = \mathrm{round}(N / w^2)$, where $\mathrm{round}(\cdot)$ rounds the input argument. To find the appropriate patch size $w$, 25 odd numbers are considered as patch size candidates in a vector $\mathbf{w} = [3, 5, 7, \ldots, 51]^T$, i.e., the odd numbers from 3 to 51 with step 2, where $(\cdot)^T$ is the transpose operation. For each considered patch size $w_k$, the PolSAR image in each polarimetric channel is divided into $n_p$ patches $P_i^c$, where $c = 1, \ldots, d$, and $d$ denotes the number of polarimetric channels. The standard deviation (std) of the pixels in each patch is computed. The average std of all patches in all polarimetric channels associated with each patch size is computed as follows:

$$\sigma(w_k) = \frac{1}{d\,n_p}\sum_{c=1}^{d}\sum_{i=1}^{n_p} \mathrm{std}\left(P_i^c(w_k)\right) \tag{2}$$

where $P_i^c(w_k)$ is the $i$th patch generated in the $c$th channel associated with patch size $w_k$, and $\mathrm{std}(\cdot)$ computes the standard deviation of the pixels within the input patch. By computing the std value for all assumed patch sizes, the vector $\boldsymbol{\sigma} = [\sigma(w_1), \sigma(w_2), \ldots, \sigma(w_{25})]^T$ is obtained. The difference operation is then applied to $\boldsymbol{\sigma}$. For a vector $\mathbf{x}$ with length $l$, the $\mathrm{diff}$ operation calculates the differences between adjacent elements of $\mathbf{x}$ as follows:
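The per-patch-size std average of Eq. (2) can be sketched with plain NumPy; tiling into non-overlapping patches and the helper name `mean_patch_std` are assumptions of this sketch. Evaluating it for every candidate size yields the std vector used below.

```python
import numpy as np

# the 25 candidate odd patch sizes 3, 5, ..., 51
patch_sizes = np.arange(3, 52, 2)

def mean_patch_std(img, w):
    """Average standard deviation over all non-overlapping w x w patches
    of an (H, W, d) image, pooled across the d channels (Eq. (2))."""
    h, wd, d = img.shape
    stds = []
    for c in range(d):
        for i in range(0, h - w + 1, w):
            for j in range(0, wd - w + 1, w):
                stds.append(img[i:i + w, j:j + w, c].std())
    return float(np.mean(stds))
```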

$$\mathrm{diff}(\mathbf{x}) = [x_2 - x_1,\ x_3 - x_2,\ \ldots,\ x_l - x_{l-1}]^T \tag{3}$$

To find the first order differences of $\boldsymbol{\sigma}$, we compute:

$$\Delta\boldsymbol{\sigma} = \mathrm{diff}(\boldsymbol{\sigma}) \tag{4}$$

By applying the difference operation to $\Delta\boldsymbol{\sigma}$, the second order differences are also obtained:

$$\Delta^2\boldsymbol{\sigma} = \mathrm{diff}(\Delta\boldsymbol{\sigma}) \tag{5}$$

The output dimension of the difference operation is equal to the dimension of the input vector minus one. So, because the dimension of $\boldsymbol{\sigma}$ is 25, the dimensions of $\Delta\boldsymbol{\sigma}$ and $\Delta^2\boldsymbol{\sigma}$ are 24 and 23, respectively. The vector $\boldsymbol{\sigma}$ and its first and second order differences versus the patch size are shown for the Sanfrancisco image in Fig. 3. Because the length of $\Delta^2\boldsymbol{\sigma}$ is two units less than the length of $\boldsymbol{\sigma}$, the x-axis for the plot of $\boldsymbol{\sigma}$ in Fig. 3(a) starts from 3, while the x-axes for $\Delta\boldsymbol{\sigma}$ and $\Delta^2\boldsymbol{\sigma}$ start from 5 and 7, respectively, in Fig. 3(b) and Fig. 3(c).

Fig. 3. The (a) $\boldsymbol{\sigma}$ values, (b) first order differences, and (c) second order differences versus the patch size.

As seen, with increasing patch size, the std value generally increases, $\Delta\boldsymbol{\sigma}$ decreases, and the associated $\Delta^2\boldsymbol{\sigma}$ takes negative values. This is expected because, with increasing patch size, more pixels different from the central pixel may be located in the patch, which increases the variance. Moreover, as the patch size increases, the changes in the variance, which correspond to the differences of the std values, decrease as long as the patch still contains related and similar pixels. In other words, when increasing the patch size from one value to the next causes the change variations to start increasing, the patch may contain pixels unrelated to the central one. Therefore, the first place where $\Delta\boldsymbol{\sigma}$ starts to increase, i.e., where $\Delta^2\boldsymbol{\sigma}$ becomes positive, can indicate the appropriate patch size:

$$\mathbf{q} = \mathrm{find}\left(\Delta^2\boldsymbol{\sigma} > 0\right) \tag{6}$$

where the vector $\mathbf{q}$ contains the indices associated with the positive values of $\Delta^2\boldsymbol{\sigma}$. The first index of $\mathbf{q}$, i.e., $q_1$, corresponds to the first place where $\Delta^2\boldsymbol{\sigma}$ is positive, and the appropriate patch size would then correspond to index $q_1$ in the patch size vector $\mathbf{w}$, i.e., $w_{q_1}$. However, taking the first index $q_1$ leads to the selection of a relatively small patch size, and so a high number of SPs, which not only increases the graph computations but also does not provide highly accurate results. Therefore, instead of $q_1$, the use of $q_3$ is suggested for determining the appropriate patch size. In other words, instead of the first place where $\Delta^2\boldsymbol{\sigma}$ becomes positive, the third place, i.e., $q_3$, is considered, which leads to the selection of a larger patch size, and so a lower number of SPs. For example, in Fig. 3, the third place where $\Delta^2\boldsymbol{\sigma}$ is positive in Fig. 3(c), associated with the third place where $\Delta\boldsymbol{\sigma}$ starts to increase in Fig. 3(b), corresponds to the patch size $w_{q_3}$ in Fig. 3(a).

The number of pixels in each SP is approximated by $w_{q_3}^2$, where $w_{q_3}$ is the appropriate patch size selected from the patch size vector $\mathbf{w}$, the indices $\mathbf{q}$ are determined by (6), and $q_3$ is the third element of the vector $\mathbf{q}$. The number of SPs, $n_{sp} = \mathrm{round}(N / w_{q_3}^2)$, is finally obtained and used as input to the SLIC segmentation algorithm.
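The patch-size selection of Eqs. (3)-(6) and the resulting SP count reduce to a few lines of NumPy. The alignment of the second-difference indices with the size vector (an offset of 2, matching the shifted x-axes of Fig. 3) and the fallback when fewer than three positive entries exist are assumptions of this sketch.

```python
import numpy as np

def select_patch_size(sigma, sizes):
    """Pick the patch size at the third index where the second order
    difference of the std curve becomes positive (Eq. (6))."""
    d2 = np.diff(sigma, n=2)             # second order differences, length len(sigma) - 2
    q = np.flatnonzero(d2 > 0)           # indices of positive entries, as in Eq. (6)
    idx = q[2] if len(q) >= 3 else q[0]  # third positive place, q_3
    # d2[k] is aligned with sizes[k + 2], matching Fig. 3(c)'s shifted x-axis
    return sizes[idx + 2]

def n_superpixels(n_pixels, w_star):
    """n_sp = round(N / w*^2): roughly one SP per w* x w* patch."""
    return int(round(n_pixels / w_star ** 2))
```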

To process an image with homogeneous regions, the image should be partitioned into a smaller number of SPs, where each SP covers a relatively large homogeneous region with a high number of similar pixels. In contrast, to process an image with heterogeneous regions, the image should be partitioned into a larger number of SPs with small areas, where each SP covers a small region containing a low number of similar pixels.

In images with homogeneous regions, the adjacent pixels within a patch have a smaller variance. So, $\Delta\boldsymbol{\sigma}$ starts to increase later, and therefore, $\Delta^2\boldsymbol{\sigma}$ becomes positive later. In other words, a larger patch size from the patch size vector $\mathbf{w}$ is selected, which is associated with a larger number of pixels in each SP, i.e., a larger $w_{q_3}^2$, and a lower number of SPs $n_{sp}$. In contrast, in images with heterogeneous regions, there are high variations in the image. So, $\Delta\boldsymbol{\sigma}$ starts to increase earlier, and therefore, $\Delta^2\boldsymbol{\sigma}$ becomes positive earlier. Thus, a smaller patch size is selected, which is associated with a smaller $w_{q_3}^2$ and a larger $n_{sp}$.

According to the above method, the number of SPs $n_{sp}$ is obtained for each of the three datasets, which will be introduced in Sect. 3: the Flevoland, Sanfrancisco and Oberpfaffenhofen images. The generated SP maps for the three datasets are shown in Fig. 4.

Fig. 4. SP maps of the Flevoland, Sanfrancisco and Oberpfaffenhofen images.

Graph construction

An undirected graph $G = (V, E)$ is constructed, with $V$ as the vertex set containing $K$ nodes and $E$ as the edge set, where $e_{ij}$ represents the edge between nodes $v_i$ and $v_j$, i.e., $e_{ij} \in E$. The adjacency matrix $\mathbf{A} \in \mathbb{R}^{K \times K}$ is composed from the edge set $E$ as follows:

$$A_{ij} = 1 - \frac{\left\| \mathbf{f}_i - \mathbf{f}_j \right\|}{\max\limits_{m,n} \left\| \mathbf{f}_m - \mathbf{f}_n \right\|} \tag{7}$$

where $\max(\cdot)$ is the maximum operator, which finds the maximum value among the input elements, and $\mathbf{f}_i \in \mathbb{R}^d$ is the feature vector associated with node $v_i$, where $d$ is the dimensionality of the feature vector. The diagonal degree matrix $\mathbf{D}$ is constructed as $D_{ii} = \sum_j A_{ij}$. With $\mathbf{I}$ as the $K \times K$ identity matrix, $\tilde{\mathbf{A}} = \mathbf{A} + \mathbf{I}$ is the adjacency matrix with added self-connections and $\tilde{\mathbf{D}}$ is the degree matrix of $\tilde{\mathbf{A}}$. The normalized adjacency matrix will be $\tilde{\mathbf{D}}^{-1/2} \tilde{\mathbf{A}} \tilde{\mathbf{D}}^{-1/2}$. Considering the architecture introduced for the graph convolution operation51,52, we have:

$$\mathbf{H}^{(l+1)} = \sigma\!\left(\tilde{\mathbf{D}}^{-1/2} \tilde{\mathbf{A}} \tilde{\mathbf{D}}^{-1/2} \mathbf{H}^{(l)} \mathbf{W}^{(l)}\right) \tag{8}$$

where $\sigma(\cdot)$ is the activation function, here the rectified linear unit (ReLU). $\mathbf{H}^{(l)}$ denotes the feature matrix obtained after $l$ layers, and $\mathbf{H}^{(0)} = \mathbf{X}$ is the input feature matrix. $\mathbf{W}^{(l)}$ is the trainable weight matrix for multiplication in layer $l$. To implement the trainable weight matrix multiplication, an elementwise product followed by a 2D convolutional operator with one filter is applied in this work (see Fig. 2).
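A single layer of Eq. (8) is straightforward to write down; this dense NumPy version (hypothetical helper `gcn_layer`) replaces the paper's convolutional realization of the weight multiplication with a plain matrix product.

```python
import numpy as np

def gcn_layer(H, A, W):
    """One graph convolution layer, Eq. (8):
    H' = ReLU(D~^{-1/2} A~ D~^{-1/2} H W), with A~ = A + I."""
    A_tilde = A + np.eye(A.shape[0])            # add self-connections
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    A_norm = A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)      # ReLU activation
```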

SGCN-L

In the SGCN-L model, a graph is constructed on a given SP and its $K$ spatial neighbors. After segmentation of the PolSAR image as in the previous section, a local graph is constructed from each SP and its adjacent SPs. Assume that a superpixel $SP_i$ contains $m_i$ pixels. Around each pixel of the SP, a $w_s \times w_s$ local window is considered, where

$$w_s = \mathrm{round}\left(\sqrt{c\, w_{q_3}^2}\right) \tag{9}$$

and the appropriate value of $c$ can be obtained through experiments (it is discussed in Sect. 3.2). The value $w_{q_3}^2$ is an approximation of the number of pixels in a SP. For a super window containing $c$ SPs, which consists of about $c\, w_{q_3}^2$ pixels, the side length of the square window will be about $\sqrt{c\, w_{q_3}^2}$. For each pixel of the SP, a $w_s \times w_s$ window is constituted and the label (number) of the SP to which each pixel of this window belongs is saved. So, for superpixel $SP_i$ with $m_i$ pixels, pixel 1 and its neighbors in its $w_s \times w_s$ local window belong to $k_1$ SPs. Similarly, the neighbors of pixel 2 in its $w_s \times w_s$ neighborhood window belong to $k_2$ SPs, and eventually, the neighbors of pixel $m_i$ in its $w_s \times w_s$ neighborhood window belong to $k_{m_i}$ SPs. Because adjacent pixels may belong to the same SP, only the unique SPs are counted. So, $k_1, k_2, \ldots, k_{m_i}$ are the numbers of unique SPs around pixels $1, 2, \ldots, m_i$, respectively. The minimum among $k_1, k_2, \ldots, k_{m_i}$ is obtained by:

[Equation (10)]

So, Inline graphic neighbors are determined for superpixel Inline graphic. This process is repeated for all Inline graphic SPs in the image, and Inline graphic neighbors are selected for the Inline graphic-th SP. The minimum of the obtained numbers is finally taken as Inline graphic:

[Equation (11)]

Eventually, for each SP, the Inline graphic SPs located in the local windows of its composing pixels are selected as its Inline graphic local neighboring SPs. From here on, for notational simplicity, Inline graphic is written instead of Inline graphic. Following this procedure, the numbers of neighboring SPs for the three datasets are obtained as Inline graphic in Flevoland, Inline graphic in Sanfrancisco, and Inline graphic in Oberpfaffenhofen.
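The neighbor-counting procedure above can be sketched as follows; the label map, the dictionary-based bookkeeping and the toy example are illustrative assumptions, and boundary windows are simply clipped to the image:

```python
import numpy as np

def local_neighbor_count(sp_labels, w):
    """For each superpixel, count the unique SPs seen in the w-by-w window
    around each of its pixels, take the minimum over its pixels (Eq. 10),
    then take the minimum over all SPs (Eq. 11).

    sp_labels : 2D integer map assigning every pixel to a superpixel
    w         : odd local window length
    """
    h, wd = sp_labels.shape
    r = w // 2
    per_sp_min = {}
    for i in range(h):
        for j in range(wd):
            window = sp_labels[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1]
            n_unique = len(np.unique(window))   # unique SPs around this pixel
            s = sp_labels[i, j]
            per_sp_min[s] = min(per_sp_min.get(s, n_unique), n_unique)
    return min(per_sp_min.values())             # global K over all SPs

# toy 6x6 label map with four 3x3 superpixels
labels = np.repeat(np.repeat(np.arange(4).reshape(2, 2), 3, axis=0), 3, axis=1)
print(local_neighbor_count(labels, 3))  # prints 1
```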

Now, for each SP, a local graph with Inline graphic nodes is made where the nodes are the adjacent SPs that are neighbors of the given SP. The feature matrix for each SP is:

[Equation (12)]

where Inline graphic is the mean of pixels that belong to superpixel Inline graphic, and Inline graphic is the number of polarimetric channels. For each pixel of the PolSAR image, 9 elements of the coherency matrix Inline graphic are used as the feature vector as follows:

[Equation (13)]

So, we have Inline graphic. Although many target decomposition methods can extract polarimetric scattering features, we do not use them for two reasons: (1) simplicity and reduced computation; (2) since the first principal component (PC1) of the features is used to generate the SPs, the polarimetric features of the coherency matrix may already be sufficient.

Also, the local adjacency matrix is denoted by Inline graphic. For each SP of the PolSAR image, the local feature matrix Inline graphic and the local adjacency matrix Inline graphic are used to compose the local graph convolutional model, as described in the “2.2. Graph construction” section. The local graph has two inputs: Inline graphic and Inline graphic. Because the dimensionality of the inputs must be fixed in the network, a fixed neighborhood size Inline graphic is considered for all SPs.
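A sketch of how the local-graph inputs might be assembled from the superpixel label map and the per-pixel coherency features; the fully connected adjacency with self-loops is an assumption (the structure of the local adjacency matrix is not detailed here), and all names are illustrative:

```python
import numpy as np

def local_graph_inputs(features, sp_labels, neighbor_ids):
    """Build the local-graph inputs for one superpixel.

    features     : (H, W, b) per-pixel polarimetric features (b = 9 here)
    sp_labels    : (H, W) superpixel label map
    neighbor_ids : the K neighboring superpixel indices (fixed K for all SPs)
    Returns the (K, b) feature matrix whose rows are the mean feature
    vectors of the K neighboring SPs, and an assumed fully connected
    local adjacency matrix with self-loops.
    """
    k = len(neighbor_ids)
    x = np.stack([features[sp_labels == s].mean(axis=0) for s in neighbor_ids])
    a = np.ones((k, k))  # assumed fully connected local graph
    return x, a

# toy example: 6x6 map of four superpixels, 9 polarimetric features per pixel
labels = np.repeat(np.repeat(np.arange(4).reshape(2, 2), 3, axis=0), 3, axis=1)
feats = np.ones((6, 6, 9)) * labels[..., None]
x_l, a_l = local_graph_inputs(feats, labels, [0, 2])
```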

SGCN-G

Assume that there are labeled samples from Inline graphic classes of the dataset. For each labeled pixel, the SP that contains it is considered a labeled SP (training sample), and the label of the given pixel is assigned to the SP. For each SP, the nearest labeled SP from each class is selected. The Inline graphic nearest neighbors from the Inline graphic classes are used as the nodes of the global graph for the given SP. The mean of the pixels in each SP is used as its representative feature vector, and the Euclidean distance is used to find the nearest SPs.

Because the training samples are distributed over the entire scene, the composed graph contains global information with features from labeled samples of all classes. For each SP of the PolSAR image, the global feature matrix Inline graphic and the global adjacency matrix Inline graphic are used to compose the global graph convolutional model, as described in the “2.2. Graph construction” section.
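The nearest-labeled-SP selection can be sketched as below; the function and variable names are illustrative assumptions, and ties are broken by `argmin`'s first match:

```python
import numpy as np

def global_neighbors(sp_mean, labeled_means, labeled_classes, n_classes):
    """Pick, for a given superpixel, the nearest labeled SP of each class
    by Euclidean distance between mean feature vectors.

    sp_mean         : (b,) mean feature vector of the given SP
    labeled_means   : (n_labeled, b) mean features of the labeled SPs
    labeled_classes : (n_labeled,) class index of each labeled SP
    Returns a (C, b) global feature matrix, one row per class.
    """
    rows = []
    for c in range(n_classes):
        cand = labeled_means[labeled_classes == c]   # labeled SPs of class c
        d = np.linalg.norm(cand - sp_mean, axis=1)   # Euclidean distances
        rows.append(cand[np.argmin(d)])
    return np.stack(rows)

# toy example: four labeled SPs with 1-D features, two classes
means = np.array([[0.0], [1.0], [5.0], [6.0]])
cls = np.array([0, 0, 1, 1])
g = global_neighbors(np.array([0.9]), means, cls, 2)
```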

Feature fusion and classification

The outputs of the two branches of the model are individually flattened and then fused through a concatenation layer. The aim of this work is a simple and light network with a relatively small number of parameters that yields efficient classification results even in small-sample-size situations. So, fusion of the local and global branches is done simply by concatenating the extracted features. Finally, a fully connected (FC) layer with Inline graphic neurons, a softmax layer and a classification layer are used to find the label of the input SP as the model's output. In this model, for each SP, the local and global feature matrices and adjacency matrices are given as input, and the label of the given SP is obtained as output. The proposed model is thus a superpixel-based classifier: the label of each SP is assigned to all pixels that compose it.
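A minimal sketch of this fusion head, assuming the two branch outputs are plain arrays; the single FC layer plus softmax follows the description above, while the weight shapes are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def fuse_and_classify(h_local, h_global, w_fc, b_fc):
    """Flatten the two branch outputs, concatenate them, and apply one
    fully connected layer with softmax (C neurons) to label the SP."""
    f = np.concatenate([h_local.ravel(), h_global.ravel()])
    return softmax(f @ w_fc + b_fc)

# toy example: local branch gives 4x3 features, global branch 5x3, C = 6
rng = np.random.default_rng(1)
h_l, h_g = rng.normal(size=(4, 3)), rng.normal(size=(5, 3))
w = rng.normal(size=(27, 6))  # 4*3 + 5*3 = 27 fused features
p = fuse_and_classify(h_l, h_g, w, np.zeros(6))
```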

The classification map is filtered by the guided filter with the first principal component as the guidance image to provide a classification map aligned with the real class boundaries. The guided filter has two free parameters, which are set in the experiments: Inline graphic determines the length of the filtering window, Inline graphic, and Inline graphic is the regularization parameter. For more details about the guided filter, the interested reader is referred to53.
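For reference, a minimal gray-scale guided filter in the spirit of He et al.'s formulation can be written as below; the cumulative-sum box filter and the edge padding are simplifying assumptions, not the paper's exact implementation:

```python
import numpy as np

def box(img, r):
    """Mean filter of radius r via 2D cumulative sums, with edge padding."""
    pad = np.pad(img, r, mode='edge')
    c = np.cumsum(np.cumsum(pad, axis=0), axis=1)
    c = np.pad(c, ((1, 0), (1, 0)))
    n = 2 * r + 1
    return (c[n:, n:] - c[:-n, n:] - c[n:, :-n] + c[:-n, :-n]) / n**2

def guided_filter(guide, src, r, eps):
    """Gray-scale guided filter: smooths src while preserving the edges of
    guide. r is the window radius and eps the regularization parameter."""
    mi, mp = box(guide, r), box(src, r)
    corr_ip = box(guide * src, r)
    var_i = box(guide * guide, r) - mi * mi
    a = (corr_ip - mi * mp) / (var_i + eps)
    b = mp - a * mi
    return box(a, r) * guide + box(b, r)

# sanity check: a constant image passes through unchanged
out = guided_filter(np.ones((8, 8)), np.ones((8, 8)), 2, 1e-3)
```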

Experiments

Datasets and parameter settings

Three real L-band PolSAR images are used for experiments. The first dataset is Flevoland, acquired by AIRSAR, which contains 15 classes with 750 × 1024 pixels. The second image, also acquired by AIRSAR, is the Sanfrancisco Bay image with 5 classes and 900 × 1024 pixels. The third dataset, acquired by the electronically steered array radar (ESAR), is Oberpfaffenhofen with 4 classes and 1297 × 935 pixels.

In the Flevoland image, 100 training samples (labeled pixels) per class are used, and in the two other datasets, 500 training samples per class. For applying the guided filter at the end of the proposed method, the following settings are used: Inline graphic and Inline graphic in Flevoland, Inline graphic and Inline graphic in Sanfrancisco, and Inline graphic and Inline graphic in the Oberpfaffenhofen dataset. For pixel-based classification methods, smaller window sizes in the guided filter are recommended. But, because the proposed method is a superpixel-based classification, there are larger homogeneous regions in the obtained classification maps, and so larger guided filters are applied to them. The Adam optimizer with an initial learning rate of 0.001, a batch size of 50 and 200 epochs is used for training the proposed methods.

The proposed framework is assessed in three cases: when only the local graph is used, i.e., SGCN-L, when only the global graph is used, i.e., SGCN-G, and when the features of both local and global graphs are fused, i.e., SGCN-LG. These proposed models are compared with SVM, the two-dimensional CNN (2DCNN), the three-dimensional CNN (3DCNN), and some state-of-the-art PolSAR classification methods. SVM is assessed in two different cases, where pixels or SPs are used as the input to the classifier, i.e., SVM (pixel) and SVM (superpixel). For the implementation of SVM, a polynomial kernel of degree 3 is used.

Due to the use of relatively small training sets, low-depth 2DCNN and 3DCNN networks are used as competitors. In 2DCNN, two convolutional layers followed by batch normalization, ReLU, and dropout with dropping probability 0.2 are used. Four filters of size Inline graphic with stride 2 are used in the first and second convolutional layers, with “same” padding. In the 3DCNN, similar settings are used, with Inline graphic convolutional filters instead of Inline graphic filters. At the end of the 2DCNN and 3DCNN models, two fully connected layers with Inline graphic and Inline graphic neurons, respectively, are used, followed by softmax and classification layers. The inputs of 2DCNN and 3DCNN are patches of size Inline graphic where Inline graphic.

Assessment of parameters’ effects

The effect of the selection of Inline graphic, Inline graphic and Inline graphic on classification accuracy and prediction time for the Flevoland dataset is reported in Table 1. As noted before, the use of Inline graphic and Inline graphic leads to a larger number of SPs, which causes a higher computational burden. This effect is stronger for the global graph, i.e., SGCN-G, than for SGCN-L, because in SGCN-G, for each given SP, the nearest SPs from each class must be found by computing Euclidean distances. As seen, the prediction times of SGCN-G (containing the global graph) and SGCN-LG (containing both local and global graphs) with Inline graphic are about half of those with Inline graphic and Inline graphic. From the classification accuracy point of view, in the SGCN-L model, the accuracy obtained with Inline graphic is significantly better than with Inline graphic and Inline graphic. In the SGCN-G model, the accuracy decreases from Inline graphic to Inline graphic. With Inline graphic, a smaller number of SPs is generated in the PolSAR image, which may not be sufficient for accurately finding the nearest neighbors from different classes. In SGCN-LG, the accuracies obtained with Inline graphic to Inline graphic, which differ only slightly from each other, are better than with Inline graphic. Generally, for the main proposed method, i.e., SGCN-LG, Inline graphic is the best choice compared to Inline graphic and Inline graphic in terms of both classification accuracy and prediction time.

Table 1.

Effect of the selection of Inline graphic, Inline graphic and Inline graphic on classification accuracy and prediction time.

Inline graphic No. of superpixels Metric SGCN-L SGCN-G SGCN-LG
Inline graphic 3413 Overall accuracy 90.82 96.08 98.44
Prediction time (seconds) 1.25 1.10 1.53
Inline graphic 2657 Overall accuracy 88.82 94.64 99.35
Prediction time (seconds) 1.27 1.01 1.53
Inline graphic 1229 Overall accuracy 95.64 92.15 99.26
Prediction time (seconds) 1.24 0.55 0.73

The number of considered SPs should be large enough to fit heterogeneous regions so that dissimilar pixels are assigned to different SPs. On the other hand, increasing the number of SPs increases the computational burden.

According to Table 1, for three different numbers of SPs in the Flevoland dataset, i.e., Inline graphic, Inline graphic, and Inline graphic, the classification accuracy and prediction time are obtained for the different models of the proposed framework. The experiments show that Inline graphic is an appropriate number of SPs for the Flevoland dataset in terms of both classification accuracy and prediction time.

In Table 2, the effect of the selection of the parameter Inline graphic is assessed for the Flevoland dataset. For different values of Inline graphic, the associated local window size Inline graphic computed according to (9) and the number of local neighboring SPs Inline graphic obtained by (11) are given. For each value of the parameter Inline graphic, the overall accuracy achieved by the local branch of the proposed model, i.e., SGCN-L, is reported, together with the running time for finding the neighbors used to construct the local graph. As seen from this table, selecting a larger value of the parameter Inline graphic leads to a larger local neighborhood window Inline graphic and to the selection of more neighboring SPs Inline graphic, which requires more running time. Enlarging the local window improves the classification accuracy up to a point by involving more spatial information from the local regions; beyond that point, the larger window includes redundant and unrelated spatial information, which may degrade the class discrimination ability. As seen from this table, increasing the local window size up to Inline graphic improves the classification accuracy, but after that the OA decreases. Further enlarging the local window not only decreases the classification accuracy but also increases the computation time. Accordingly, the parameter Inline graphic is set to Inline graphic for all PolSAR images in this work. Generally, a larger local window size requires more running time to find the neighboring SPs. However, these computations are done in the training phase and do not cause delay in the test (prediction) phase.

Table 2.

Effect of selection of parameter Inline graphic in providing the neighboring superpixels.

Inline graphic Local window size Inline graphic No. of neighboring superpixels Inline graphic Computation time (seconds) OA of SGCN-L
3 45.00 6 110.62 79.78
5 57.00 8 167.77 88.57
7 67.00 10 210.25 90.92
9 77.00 13 265.65 95.64
11 85.00 15 307.15 93.07

The whole network is trained in a unified, end-to-end supervised manner. In other words, the learnable parameters of both the local and global graphs are determined by supervised learning, so the efficiency of both is affected by the number of training samples. The inputs of the local graph are Inline graphic and Inline graphic, and the inputs of the global graph are Inline graphic and Inline graphic, where Inline graphic is the number of polarimetric features, Inline graphic is the number of local neighboring SPs (for example, in Flevoland, we obtain Inline graphic), and Inline graphic is the number of classes (Inline graphic in Flevoland). Generally, the sizes of the input data, i.e., the feature and adjacency matrices of the constructed local and global graphs, are not large. So, the proposed SGCN-L, SGCN-G and SGCN-LG models remain relatively efficient even with limited training samples.

In Table 3, the overall accuracies obtained by the different cases of the proposed framework for the Flevoland dataset are reported for different numbers of training samples. As seen, when the number of training samples is low (10 or 50 per class), SGCN-G works better than SGCN-L. With 10 training samples per class, SGCN-G ranks first by a significant margin over SGCN-L and SGCN-LG. The main proposed method, SGCN-LG, achieves high overall accuracy (OA) even with a low number of labeled samples (50 per class). Although the efficiency of SGCN-L and SGCN-G improves significantly as the number of labeled samples increases, SGCN-LG, which fuses the information from both, is less sensitive to the number of training samples: its difference between 100 and 150 training samples per class is not significant.

Table 3.

Overall accuracy obtained in different sizes of training set.

No. of training samples per class SGCN-L SGCN-G SGCN-LG
10 48.86 78.51 67.15
50 68.49 86.09 95.97
100 95.64 92.15 99.26
150 98.68 94.82 99.29

Classification results

In Table 4, the classification results obtained for the Flevoland dataset are reported: the classification accuracy of each class, average accuracy (AA), overall accuracy (OA), and kappa coefficient (K). As seen, the proposed SGCN-LG model provides the highest AA, OA, K and Macro-F1 values. SVM (superpixel) ranks second, followed by SGCN-L, 3DCNN, SGCN-G, 2DCNN, and SVM (pixel), respectively. Most classes in the Flevoland image correspond to agricultural regions with grained texture and varied polarimetric characteristics. So, in agricultural classes such as “Lucerne”, “Beet”, “Grass”, “Rapeseed” and “Wheat 3”, SGCN-L, which explores local information from the neighborhood context, performs better than SGCN-G, which focuses on global feature extraction. In contrast, in the “Buildings” class, which has coarser texture, SGCN-G outperforms SGCN-L. Generally, in this dataset, SGCN-L yields a more accurate classification map than SGCN-G, indicating that the local features contained in neighboring SPs are more important than the global class information in this image. 100 samples per class (1500 samples in total) are used as the training set, which is relatively small. SVM, which has low sensitivity to the number of training samples, provides highly accurate results when implemented at the superpixel level, where the use of SPs effectively improves its performance. Although low-depth 2DCNN and 3DCNN architectures are considered, the small training set still seems insufficient for properly learning their models.

Table 4.

Classification results for the Flevoland dataset.

Name of class # samples SGCN-L SGCN-G SGCN-LG SVM (pixel) SVM (superpixel) 2DCNN 3DCNN
Stembeans 6103 99.57 99.25 99.41 96.00 98.31 97.20 97.64
Peas 9111 100.00 100.00 99.95 96.53 99.73 97.32 96.90
Forest 14,944 93.75 92.92 99.59 86.32 98.32 90.20 97.42
Lucerne 9477 97.30 82.57 94.68 91.09 97.38 95.42 95.28
Wheat 17,283 98.86 97.18 99.86 78.67 98.76 83.93 92.89
Beet 10,050 90.88 84.95 98.64 90.11 97.25 94.44 95.26
Potatoes 15,292 76.49 89.61 98.11 75.22 91.90 88.27 86.86
Bare soil 3078 100.00 100.00 100.00 99.71 100.00 100.00 100.00
Grass 6269 99.79 89.26 99.98 79.01 98.37 78.96 88.47
Rapeseed 12,690 95.55 84.04 99.76 80.64 98.12 85.58 92.41
Barley 7156 100.00 99.85 100.00 93.00 99.68 99.34 96.12
Wheat 2 10,591 99.87 99.79 99.87 79.09 98.22 92.81 88.68
Wheat 3 21,300 99.83 85.54 99.97 85.82 98.70 94.02 95.63
Water 13,476 98.26 98.66 99.90 96.96 99.13 98.81 97.27
Buildings 476 85.08 97.27 90.55 94.75 97.90 90.97 95.38
AA 95.68 93.39 98.68 88.19 98.12 92.49 94.42
OA 95.64 92.15 99.26 86.10 97.89 91.81 93.99
K 95.24 91.45 99.19 84.86 97.70 91.07 93.44
Macro-F1 95.47 88.62 97.43 85.85 97.97 91.79 93.97

The proposed SGCN model, which builds a graph on the SPs, learns well, especially when both local and global information are fused. To assess whether the differences among the classification methods are statistically significant, Z scores are computed according to McNemar's test54, and the results are shown in Table 5. It is seen that SGCN-LG yields positive Z values much larger than 1.96 with respect to the other methods, which shows the superior performance of SGCN-LG from the statistical point of view. The Pauli RGB, ground truth map (GTM), and the classification maps of the Flevoland dataset are shown in Fig. 5.
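The Z scores in Table 5 follow McNemar's test; a minimal sketch of the statistic, assuming f01 and f10 denote the discordant error counts of the two compared classifiers:

```python
import math

def mcnemar_z(f01, f10):
    """Z score of McNemar's test from the discordant counts:
    f01 = samples misclassified by method 2 but not by method 1,
    f10 = the reverse. |Z| > 1.96 indicates significance at the 5% level."""
    return (f01 - f10) / math.sqrt(f01 + f10)

# toy example with assumed counts
print(round(mcnemar_z(30, 10), 2))  # 3.16
```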

Table 5.

McNemar's test results for the Flevoland dataset.

SGCN-L SGCN-G SGCN-LG SVM (pixel) SVM (superpixel) 2DCNN 3DCNN
SGCN-L 0 42.84 − 70.48 94.02 − 39.13 45.14 21.29
SGCN-G − 42.84 0 − 103.48 56.73 − 76.39 3.65 − 21.63
SGCN-LG 70.48 103.48 0 138.69 35.14 101.38 83.29
SVM (pixel) − 94.02 − 56.73 − 138.69 0 − 122.74 − 61.67 − 88.71
SVM (superpixel) 39.13 76.39 − 35.14 122.74 0 78.90 57.30
2DCNN − 45.14 − 3.65 − 101.38 61.67 − 78.90 0 − 32.96
3DCNN − 21.29 21.63 − 83.29 88.71 − 57.30 32.96 0

Fig. 5. Classification maps for the Flevoland dataset.

As seen, SGCN-L, SGCN-G, SGCN-LG, and SVM (superpixel), which use SPs as input, provide cleaner classification maps than SVM (pixel), 2DCNN and 3DCNN. Although 2DCNN and 3DCNN use patches as input, each patch represents only its central pixel, and the output label is assigned to that center.

The classification accuracies and Z scores for the Sanfrancisco dataset are reported in Tables 6 and 7, respectively. The Sanfrancisco image has classes with large and approximately uniform texture. So, the difference between SGCN-L and SGCN-G is not significant in most classes of this dataset. However, in the “Ocean” class, which is inherently

Table 6.

Classification results for the Sanfrancisco dataset.

Name of class # samples SGCN-L SGCN-G SGCN-LG SVM (pixel) SVM (superpixel) 2DCNN 3DCNN
Bare soil 15,628 66.80 67.19 59.85 39.67 70.62 83.89 85.95
Mountain 63,295 93.67 91.07 92.86 61.54 85.76 90.91 92.89
Ocean 328,118 82.36 97.69 98.29 81.26 61.95 95.90 95.45
Urban 343,465 94.17 96.18 98.68 29.37 84.86 89.90 87.45
Vegetation 54,758 79.06 83.15 87.89 33.84 45.92 82.36 83.79
AA 83.21 87.06 87.51 49.14 69.82 88.59 89.11
OA 87.76 94.95 96.58 53.55 72.67 91.79 90.86
K 81.71 92.12 94.63 38.89 61.84 87.54 86.24
Macro-F1 75.10 87.09 89.95 44.10 58.03 81.76 80.57

Table 7.

McNemar's test results for the Sanfrancisco dataset.

SGCN-L SGCN-G SGCN-LG SVM (pixel) SVM (superpixel) 2DCNN 3DCNN
SGCN-L 0 − 190.34 − 243.33 443.93 248.62 − 90.72 − 68.85
SGCN-G 190.34 0 − 87.79 551.39 399.39 107.22 129.59
SGCN-LG 243.33 87.79 0 573.08 421.36 152.64 171.20
SVM (pixel) − 443.93 − 551.39 − 573.08 0 − 237.37 − 513.21 − 502.50
SVM (superpixel) − 248.62 − 399.39 − 421.36 237.37 0 − 332.58 − 312.76
2DCNN 90.72 − 107.22 − 152.64 513.21 332.58 0 48.29
3DCNN 68.85 − 129.59 − 171.20 502.50 312.76 − 48.29 0

a homogeneous region, the use of SGCN-G yields significantly better classification results than SGCN-L. In this dataset, SGCN-G works significantly better than SGCN-L according to the AA, OA, K, Macro-F1 and McNemar's test Z scores. This shows that the global features contained in the nearest labeled SPs from each class carry much more discrimination information than the unlabeled neighboring SPs in the local regions of this image. The proposed SGCN-LG model, considering both the local features of unlabeled samples and the global ones of labeled samples, provides the best classification results. SVM performs poorly in both the pixel-based and superpixel-based cases. 2DCNN and 3DCNN achieve close results: the AA of 3DCNN is higher than that of 2DCNN, but the OA, K and Macro-F1 of 2DCNN are better. Note that, due to the higher number of learnable parameters in 3DCNN, 2DCNN may work better than 3DCNN when the training set is not large enough. In this dataset, SGCN-LG, SGCN-G and 2DCNN are the best candidates. The classification maps for the Sanfrancisco dataset are shown in Fig. 6. Although SVM (superpixel) uses SPs as input, it fails to align with the real class boundaries, and there are large false-alarm regions in its classification map. Moreover, in SGCN-L, the label of the blue class in the GTM (Bare soil) is wrongly assigned to a large area of the green class (Ocean). SVM (pixel) fails entirely. Although 2DCNN and 3DCNN provide relatively accurate results, their classification maps are very noisy.

Fig. 6. Classification maps for the Sanfrancisco dataset.

The classification results and Z score values for the Oberpfaffenhofen dataset are reported in Tables 8 and 9, respectively. In homogeneous classes such as “Open areas” and “Wood land”, SGCN-G outperforms SGCN-L, while in classes with more contextual detail, such as “Built-up areas”, SGCN-L performs significantly better than SGCN-G. In this dataset, SGCN-G generally works better than SGCN-L, and SGCN-LG ranks first, with a statistically significant difference with respect to the other methods. SVM provides an OA below 50% in both the pixel-based and superpixel-based cases, which is not acceptable. 3DCNN and SGCN-G rank second and third, respectively. The classification maps are shown in Fig. 7. It can be seen that the class shown in yellow in the GTM (Built-up areas) is not well detected by SVM (pixel). There are many false-alarm regions in SVM (superpixel). 2DCNN and 3DCNN provide very noisy classification maps, and SGCN-L also shows many false-alarm regions. In contrast, SGCN-LG and SGCN-G provide more accurate and cleaner maps. However, although SGCN-LG and SGCN-G provide higher classification accuracy than the other models, they are superpixel-based methods and cannot preserve edges and class boundaries as well as pixel-level methods.

Table 8.

Classification results for the Oberpfaffenhofen dataset.

Name of class # Total samples SGCN-L SGCN-G SGCN-LG SVM (pixel) SVM (superpixel) 2DCNN 3DCNN
Open areas 625,029 48.70 68.95 70.48 32.78 65.14 72.92 73.20
Wood land 202,032 89.02 92.92 89.79 74.45 11.13 70.27 72.36
Built-up areas 190,202 67.30 33.08 71.79 14.49 27.10 38.10 43.45
Road 195,432 26.66 29.15 39.37 29.41 22.61 26.52 28.77
AA 57.92 56.02 67.85 37.78 31.49 51.95 54.44
OA 54.78 60.90 68.89 36.31 43.32 59.54 61.23
K 38.58 43.25 55.04 14.83 16.08 39.86 42.61
Macro-F1 50.81 52.13 63.94 33.13 31.22 50.73 52.89

Table 9.

McNemar's test results for the Oberpfaffenhofen dataset.

SGCN-L SGCN-G SGCN-LG SVM (pixel) SVM (superpixel) 2DCNN 3DCNN
SGCN-L 0 − 116.47 − 309.93 298.86 172.32 − 81.38 − 111.56
SGCN-G 116.47 0 − 214.26 405.36 287.54 27.13 − 6.66
SGCN-LG 309.93 214.26 0 508.32 400.36 180.26 150.89
SVM (pixel) − 298.86 − 405.36 − 508.32 0 − 105.65 − 399.15 − 420.41
SVM (superpixel) − 172.32 − 287.54 − 400.36 105.65 0 − 266.46 − 293.75
2DCNN 81.38 − 27.13 − 180.26 399.15 266.46 0 − 44.73
3DCNN 111.56 6.66 − 150.89 420.41 293.75 44.73 0

Fig. 7. Classification maps for the Oberpfaffenhofen dataset.

To assess the impact of the guided filtering on the proposed network and its branches, the OA values obtained both without and with guided filtering are reported for all datasets in Table 10. According to the obtained results, because the filter removes noise while preserving the class boundaries according to the first principal component used as the guidance image, guided filtering increases the OA in all cases. This improvement is significant for SGCN-L in the Sanfrancisco and Oberpfaffenhofen datasets.

Table 10.

The OA without and with applying the guided filter.

Dataset SGCN-L SGCN-G SGCN-LG
Without guided filtering With guided filtering Without guided filtering With guided filtering Without guided filtering With guided filtering
Flevoland 93.21 95.64 90.36 92.15 98.15 99.26
Sanfrancisco 77.15 87.76 92.26 94.95 93.44 96.58
Oberpfaffenhofen 50.00 54.78 58.15 60.90 65.45 68.89

In this paper, the image is partitioned into SPs. Then, for each given SP, two small graphs are constructed, which leads to low computational cost. The feature and adjacency matrices of the local graph are Inline graphic and Inline graphic, and those of the global graph are Inline graphic and Inline graphic, where Inline graphic is the number of polarimetric features, Inline graphic is the number of local neighboring SPs (for example, in Flevoland, we obtain Inline graphic), and Inline graphic is the number of classes (Inline graphic in Flevoland). Generally, the constructed graphs are relatively small, so processing them does not require heavy computation.

In Table 11, the number of learnable parameters of each method is reported. As seen, all cases of the proposed framework, i.e., SGCN-L, SGCN-G and SGCN-LG, have a small number of learnable parameters, even fewer than the light 2DCNN considered here. Each of SGCN-L and SGCN-G contains approximately half of the learnable parameters of SGCN-LG. In Table 12, the running times of the different methods in both the training and test (prediction) phases are reported for the Flevoland dataset. Although the SGCN-L, SGCN-G and SGCN-LG models have high training times compared to the other methods, in the test phase they run much faster than 2DCNN and 3DCNN, and even faster than the pixel-based SVM. Although the superpixel-based SVM has the lowest running time, it is not as efficient across different PolSAR images as the other methods.

Table 11.

The number of learnable parameters in each model.

Method SGCN-L SGCN-G SGCN-LG 2DCNN 3DCNN
No. of learnable parameters 1.8k 2k 3.8k 5.9k 23.5k

Table 12.

The running time in each model.

Method SGCN-L SGCN-G SGCN-LG SVM (pixel) SVM (superpixel) 2DCNN 3DCNN
Training time (seconds) 1804.58 1753.27 3050.37 3.04 0.20 19.59 18.74
Test time (seconds) 1.24 0.55 0.73 3.60 0.02 72.62 74.25

Generally, the proposed framework, with a low number of learnable parameters, fast running in the prediction phase, and high efficiency on different images, can be a good candidate for PolSAR image classification.

Comparison with several state-of-the-art methods

The proposed SGCN-LG model is compared with several advanced methods in this section. The comparison results obtained for the Flevoland dataset using 100 training samples per class are reported in Table 13. It can be seen that the proposed SGCN-LG model provides the highest OA and kappa coefficient. The use of graph structures on SPs with the fusion of the local and global views leads to the best performance. In terms of AA, DFC ranks first, and SGCN-LG ranks second by a small margin. Generally, after SGCN-LG, the best performance is obtained by DFC, which benefits from the advantages of pre-determined convolutional kernels, multi-view analysis, two-step discriminant analysis,

Table 13.

Comparison with some state-of-the-art methods.

Name of class # samples DFC RCNN-AA MMAL SGCN-LG
Stembeans 6103 99.56 99.08 99.23 99.41
Peas 9111 98.83 97.38 97.99 99.95
Forest 14,944 99.92 97.57 96.75 99.59
Lucerne 9477 94.68 97.79 96.81 94.68
Wheat 17,283 99.15 95.25 91.55 99.86
Beet 10,050 98.69 97.64 96.18 98.64
Potatoes 15,292 96.55 97.54 96.44 98.11
Bare soil 3078 100.00 99.74 100.00 100.00
Grass 6269 99.22 95.09 91.42 99.98
Rapeseed 12,690 98.16 95.29 91.65 99.76
Barley 7156 99.80 99.43 99.83 100.00
Wheat 2 10,591 99.01 97.26 96.88 99.87
Wheat 3 21,300 99.32 96.67 97.04 99.97
Water 13,476 99.98 99.80 99.93 99.90
Buildings 476 99.79 93.49 84.87 90.55
AA 98.84 97.27 95.77 98.68
OA 98.72 97.26 96.15 99.26
K 98.61 97.01 95.80 99.19

and high-confidence decision fusion. RCNN-AA, which uses convolutional-autoencoder-based attention to provide an appropriate input for a residual CNN, ranks third in terms of AA, OA and kappa coefficient. MMAL, which utilizes a cross-attention mechanism to compute the relationships among multi-level features, shows lower performance than the other methods. The corresponding classification maps are shown in Fig. 8. The cleanest classification map, with the least noise, is achieved by SGCN-LG, followed by DFC. There are more noisy pixels in the MMAL map than in those of the other methods.

Fig. 8. Classification maps of some state-of-the-art methods.

In the following, some advantages of the proposed SGCN-LG model that lead to its high performance are summarized:

  1. The use of SPs has several benefits, such as noise reduction, provision of spatial information, and reduced complexity of the graph computations.

  2. With automatic determination of the number of SPs Inline graphic and of the number of spatially neighboring SPs Inline graphic used to construct the local graph, there is no free parameter in building the graph model.

  3. While the local information of each SP is obtained from its neighbors in adjacent areas, the global information, containing knowledge of the classes present in the labeled samples, is extracted from the whole image.

Conclusion

Superpixels (SPs) are used as the composing nodes of graphs for PolSAR image classification in this work. The number of SPs is automatically determined by analyzing a differential operator applied to the standard deviation values of patches with different sizes. Local and global graphs are constructed from the generated SPs, where the local graph is made of adjacent SPs in local regions, and the global graph is made of the nearest labeled SPs from all classes. The local and global features are fused for classification. The relationships among spatial features are explored by the local graph, and the global structure of the labeled SPs is extracted by the global graph. The proposed model, benefiting from the advantages of SPs, graph networks, and local and global feature fusion, provides highly accurate classification results with the same or smaller training sets compared to several state-of-the-art methods. However, some challenges remain for future work. For example, considering within-superpixel variations, especially in heterogeneous regions, is important for pixel-based classification; assigning different weights to different pixels of a given SP could enhance the classification map. Moreover, the SLIC method is used here for PolSAR image segmentation outside the network; training an adaptive SP generation block integrated with the classification network in a unified framework could lead to better alignment of class boundaries in the final classification result.

Author contributions

Maryam Imani performed all roles: Conceptualization; Methodology; Software; Validation; Formal analysis; Investigation; Writing, review & editing.

Data availability

The datasets are available online at https://ietr-lab.univ-rennes1.fr/polsarpro-bio/san-francisco and https://github.com/fudanxu/CV-CNN?tab=readme-ov-file.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1. Duan, D. & Wang, Y. Reflection of and vision for the decomposition algorithm development and application in earth observation studies using PolSAR technique and data. Remote Sens. Environ. 261, 112498 (2021).
  • 2. Imani, M. Two-step discriminant analysis based multi-view polarimetric SAR image classification with high confidence. Sci. Rep. 12, 5984 (2022).
  • 3. Gomez, L., Alvarez, L., Mazorra, L. & Frery, A. C. Fully PolSAR image classification using machine learning techniques and reaction-diffusion systems. Neurocomputing 255, 52–60 (2017).
  • 4. Li, H., Chen, J., Li, Q., Wu, G. & Chen, J. Mitigation of reflection symmetry assumption and negative power problems for the model-based decomposition. IEEE Trans. Geosci. Remote Sens. 54(12), 7261–7271 (2016).
  • 5. Ghazvinizadeh, A. H., Imani, M. & Ghassemian, H. Residual network based on entropy–anisotropy–alpha target decomposition for polarimetric SAR image classification. Earth Sci. Inf. 16, 357–366 (2023).
  • 6. Bilal, M., Israr, H., Shahid, M. & Khan, A. Sentiment classification of Roman-Urdu opinions using Naïve Bayesian, decision tree and KNN classification techniques. J. King Saud Univ. - Comput. Inform. Sci. 28(3), 330–344 (2016).
  • 7. Parand, K., Aghaei, A. A., Jani, M. & Ghodsi, A. Parallel LS-SVM for the numerical simulation of fractional Volterra’s population model. Alexandria Eng. J. 60(6), 5637–5647 (2021).
  • 8. Sánchez-Lladó, F. J., Pajares, G. & López-Martínez, C. Improving the Wishart synthetic aperture radar image classifications through deterministic simulated annealing. ISPRS J. Photogrammetry Remote Sens. 66(6), 845–857 (2011).
  • 9. Shi, J., Wang, W., Jin, H. & He, T. Complex matrix and multi-feature collaborative learning for polarimetric SAR image classification. Appl. Soft Comput. 134, 109965 (2023).
  • 10. Imani, M. Entropy/anisotropy/alpha based 3D Gabor filter bank for PolSAR image classification. Geocarto Int. 37(27), 18491–18519 (2022).
  • 11. Imani, M. Classification using ridge regression-based polarimetric-spatial feature extraction in polarimetric SAR. In 2021 26th International Computer Conference, Computer Society of Iran (CSICC), Tehran, Iran, 1–5 (2021).
  • 12. Chen, Y. et al. Nonlinear projective dictionary pair learning for PolSAR image classification. IEEE Access 9, 70650–70661 (2021).
  • 13. Song, W., Wu, Y. & Guo, P. Composite kernel and hybrid discriminative random field model based on feature fusion for PolSAR image classification. IEEE Geosci. Remote Sens. Lett. 18(6), 1069–1073 (2021).
  • 14. Latif, S. D. et al. Assessing rainfall prediction models: exploring the advantages of machine learning and remote sensing approaches. Alexandria Eng. J. 82, 16–25 (2023).
  • 15. Imani, M. Integration of the k-nearest neighbours and patch-based features for PolSAR image classification by using a two-branch residual network. Remote Sens. Lett. 12(11), 1112–1122 (2021).
  • 16. Wang, J. et al. Parameter selection of Touzi decomposition and a distribution improved autoencoder for PolSAR image classification. ISPRS J. Photogrammetry Remote Sens. 186, 246–266 (2022).
  • 17. Imani, M. Low frequency and radar’s physical based features for improvement of convolutional neural networks for PolSAR image classification. Egypt. J. Remote Sens. Space Sci. 25, 55–62 (2022).
  • 18. Shang, R., Wang, J., Jiao, L., Yang, X. & Li, Y. Spatial feature-based convolutional neural network for PolSAR image classification. Appl. Soft Comput. 123, 108922 (2022).
  • 19. Zhang, P., Liu, C., Chang, X., Li, Y. & Li, M. Metric-based meta-learning model for few-shot PolSAR image terrain classification. In 2021 CIE International Conference on Radar (Radar), Haikou, Hainan, China, 2529–2533 (2021).
  • 20. Imani, M. Residual convolutional neural network with autoencoder based attention for PolSAR image classification. In 2024 13th Iranian/3rd International Machine Vision and Image Processing Conference (MVIP), Tehran, Iran, 1–6 (2024).
  • 21. Yang, Z., Wu, Y., Li, M., Hu, X. & Li, Z. Unsupervised change detection in PolSAR images using Siamese encoder–decoder framework based on graph-context attention network. Int. J. Appl. Earth Obs. Geoinf. 124, 103511 (2023).
  • 22. Ling, J., Wei, S., Gamba, P., Liu, R. & Zhang, H. Advancing SAR monitoring of urban impervious surface with a new polarimetric scattering mixture analysis approach. Int. J. Appl. Earth Obs. Geoinf. 124, 103541 (2023).
  • 23. Zhang, Z. C., Chen, Z. D., Wang, Y., Luo, X. & Xu, X. S. A vision transformer for fine grained classification by reducing noise and enhancing discriminative information. Pattern Recogn. 145, 109979 (2024).
  • 24. Dong, H., Zhang, L. & Zou, B. Exploring vision transformers for polarimetric SAR image classification. IEEE Trans. Geosci. Remote Sens. 60, 1–15, Art. 5219715 (2022).
  • 25. Imani, M. Attention based multi-level and multi-scale convolutional network for PolSAR image classification. Adv. Space Res. 75(11), 7971–7986 (2025).
  • 26. Dong, H., Zhang, L., Lu, D. & Zou, B. Attention-based polarimetric feature selection convolutional network for PolSAR image classification. IEEE Geosci. Remote Sens. Lett. 19, 1–5, Art. 4001705 (2022).
  • 27. Hua, W., Wang, X., Zhang, C. & Jin, X. Attention-based multiscale sequential network for PolSAR image classification. IEEE Geosci. Remote Sens. Lett. 19, 1–5, Art. 4506505 (2022).
  • 28. Song, W., Wu, Y. & Xiao, X. Nonstationary PolSAR image classification by deep-features-based high-order triple discriminative random field. IEEE Geosci. Remote Sens. Lett. 18(8), 1406–1410 (2021).
  • 29. Choi, K. S. & Oh, K. W. Subsampling-based acceleration of simple linear iterative clustering for superpixel segmentation. Comput. Vis. Image Underst. 146, 1–8 (2016).
  • 30. Yin, J. et al. SLIC superpixel segmentation for polarimetric SAR images. IEEE Trans. Geosci. Remote Sens. 60, 1–17, Art. 5201317 (2022).
  • 31. Li, M. et al. Efficient superpixel generation for polarimetric SAR images with cross-iteration and hexagonal initialization. Remote Sens. 14, 2914 (2022).
  • 32. Guo, Y. et al. Adaptive fuzzy learning superpixel representation for PolSAR image classification. IEEE Trans. Geosci. Remote Sens. 60, 1–18, Art. 5217818 (2022).
  • 33. Yang, S., Yuan, X., Liu, X. & Chen, Q. Superpixel generation for polarimetric SAR using hierarchical energy maximization. Comput. Geosci. 135, 104395 (2020).
  • 34. Li, M. et al. Superpixel generation for polarimetric SAR images with adaptive size estimation and determinant ratio test distance. Remote Sens. 15, 1123 (2023).
  • 35. Cao, Y., Wu, Y., Li, M., Liang, W. & Zhang, P. PolSAR image classification using a superpixel-based composite kernel and elastic net. Remote Sens. 13(3), 380 (2021).
  • 36. Ma, F., Zhang, F., Xiang, D., Yin, Q. & Zhou, Y. Fast task-specific region merging for SAR image segmentation. IEEE Trans. Geosci. Remote Sens. 60, 1–16, Art. 5222316 (2022).
  • 37. Zhang, F., Sun, X., Ma, F. & Yin, Q. Superpixelwise likelihood ratio test statistic for PolSAR data and its application to built-up area extraction. ISPRS J. Photogrammetry Remote Sens. 209, 233–248 (2024).
  • 38. Hua, W., Zhang, C., Xie, W. & Jin, X. Polarimetric SAR image classification based on ensemble dual-branch CNN and superpixel algorithm. IEEE J. Sel. Top. Appl. Earth Observations Remote Sens. 15, 2759–2772 (2022).
  • 39. Shi, J. et al. Polarimetric synthetic aperture radar image classification based on double-channel convolution network and edge-preserving Markov random field. Remote Sens. 15, 5458 (2023).
  • 40. Ren, J., Zhu, K., Hu, M., Shang, R. & Zhang, M. Polarimetric SAR image classification based on superpixel content-aware and semi-supervised ViT network. Appl. Soft Comput. 186(Part A), 114040 (2026).
  • 41. Liu, H. et al. Graph convolutional networks by architecture search for PolSAR image classification. Remote Sens. 13, 1404 (2021).
  • 42. Bi, H., Sun, J. & Xu, Z. A graph-based semisupervised deep learning model for PolSAR image classification. IEEE Trans. Geosci. Remote Sens. 57(4), 2116–2132 (2019).
  • 43. Yu, B., Xie, H., Fu, Y. & Xu, Z. Three-way graph convolutional network for multi-label classification in multi-label information system. Appl. Soft Comput. 161, 111767 (2024).
  • 44. Xu, D. et al. Difference-guided multiscale graph convolution network for unsupervised change detection in PolSAR images. Neurocomputing 555, 126611 (2023).
  • 45. Ma, F., Zhang, F., Yin, Q., Xiang, D. & Zhou, Y. Fast SAR image segmentation with deep task-specific superpixel sampling and soft graph convolution. IEEE Trans. Geosci. Remote Sens. 60, 1–16, Art. 5214116 (2022).
  • 46. Geng, J., Wang, R. & Jiang, W. Polarimetric SAR image classification based on feature enhanced superpixel hypergraph neural network. IEEE Trans. Geosci. Remote Sens. 60, 1–12, Art. 5237812 (2022).
  • 47. Shi, J., He, T., Ji, S., Nie, M. & Jin, H. CNN-improved superpixel-to-pixel fuzzy graph convolution network for PolSAR image classification. IEEE Trans. Geosci. Remote Sens. 61, 1–18, Art. 4410118 (2023).
  • 48. Wang, R., Nie, Y. & Geng, J. Multiscale superpixel-guided weighted graph convolutional network for polarimetric SAR image classification. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 17, 3727–3741 (2024).
  • 49. Zhu, W., Zhao, C., Feng, S. & Qin, B. Multiscale short and long range graph convolutional network for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 60, 1–15, Art. 5535815 (2022).
  • 50. Zhou, Q., Gao, Q., Wang, Q., Yang, M. & Gao, X. Sparse discriminant PCA based on contrastive learning and class-specificity distribution. Neural Netw. 167, 775–786 (2023).
  • 51. Kipf, T. N. & Welling, M. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR 2017), Toulon, France (2017).
  • 52. Mou, L., Lu, X., Li, X. & Zhu, X. X. Nonlocal graph convolutional networks for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 58(12), 8246–8257 (2020).
  • 53. Imani, M. A random patches based edge preserving network for land cover classification using polarimetric synthetic aperture radar images. Int. J. Remote Sens. 42(13), 4946–4964 (2021).
  • 54. Roggo, Y., Duponchel, L. & Huvenne, J. P. Comparison of supervised pattern recognition methods with McNemar’s statistical test: application to qualitative analysis of sugar beet by near-infrared spectroscopy. Anal. Chim. Acta 477(2), 187–200 (2003).
