Computational Intelligence and Neuroscience. 2022 Apr 28;2022:3854635. doi: 10.1155/2022/3854635

Hyperspectral Image Classification: Potentials, Challenges, and Future Directions

Debaleena Datta 1, Pradeep Kumar Mallick 1, Akash Kumar Bhoi 2,3,4, Muhammad Fazal Ijaz 5, Jana Shafi 6, Jaeyoung Choi 7
PMCID: PMC9071975  PMID: 35528334

Abstract

Recent advances in imaging science and technology have drawn attention to hyperspectral imagery and remote sensing. Current intelligent technologies, such as support vector machines, sparse representations, active learning, extreme learning machines, transfer learning, and deep learning, are typically based on machine learning. These techniques enrich the processing of such three-dimensional, multiband, high-resolution images with their precision and fidelity. This article presents an extensive survey of the contributions of machine learning and deep learning to landcover classification based on hyperspectral images. The objective of this study is threefold. First, after reading a large pool of Web of Science (WoS), Scopus, SCI, and SCIE-indexed and SCIE-related articles, we provide a novel, entirely systematic approach to review work that helps identify research gaps and develop embedded questions. Second, we emphasize contemporary advances in machine learning (ML) methods for classifying hyperspectral images, with a brief, organized overview and a thorough assessment of the literature involved. Finally, we draw conclusions to assist researchers in expanding their understanding of the relationship between machine learning and hyperspectral images for future research.

1. Introduction

Hyperspectral imagery is one of the most significant developments in remote sensing and imaging science. Hyperspectral imagery (HSI) is a technology that combines Geographic Information System (GIS) and remote sensing. HSI has applications in ecological protection, security, agriculture and horticulture, crop specification and monitoring, medical diagnosis, identification, and quantification [1]. RGB images have three dimensions: width, height, and three color bands or channels carrying the red, green, and blue information. They are stored as a 3D byte array that explicitly holds a color value for each pixel in the image, that is, a combination of RGB intensities laid onto a color plane. In contrast, an HSI is a hypercube with a large number of spectral bands and hence possesses a high resolution and an enormous amount of embedded information of all kinds: spectral, spatial, and temporal. This information enables various applications to detect and characterize land covers, which are the most widely explored [2]. RGB images are captured by digital RGB cameras, which can characterize objects only by their shape and color. Moreover, the embedded information is minimal since only three bands are available, all within the range visible to humans. HSI, on the other hand, is captured by specialized hyperspectral sensors, that is, imaging spectrometers, carried on aircraft or satellites. They cover a broad range of scenes by acquiring a large number of consecutive bands that are not confined to the visible light spectrum. However, compared with a digital sensor that records light in just three wide channels, a hyperspectral sensor's channels are much narrower, which makes the spectral resolution and data volume much higher and the data harder to store, mine, and manage [3]. 
Furthermore, processing data with such a large number of bands poses many obstacles, such as noise introduced by image calibration, geometric distortion, noisy labels, and limited or unbalanced labeled training samples [4–6], that is, the Hughes phenomenon and dimensionality reduction-related artifacts such as overfitting, redundancy, spectral variability, and loss of significant features between the channels [7].
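
To make the difference in data volume concrete, the short sketch below counts the stored values in an RGB image and in a hyperspectral cube with the same footprint. The 145 × 145 footprint and the 224 bands are borrowed from the AVIRIS Indian Pines scene described in Section 3.4; the helper name `cube_size` and the assumed storage of 1 byte per RGB sample and 2 bytes per hyperspectral sample are illustrative assumptions, not figures stated in this article.

```python
def cube_size(height, width, bands, bytes_per_sample):
    """Return (number of values, size in bytes) of a height x width x bands cube."""
    n_values = height * width * bands
    return n_values, n_values * bytes_per_sample

# RGB image: three wide bands, assumed 1 byte per sample.
rgb_values, rgb_bytes = cube_size(145, 145, 3, 1)

# Hyperspectral cube: 224 narrow contiguous bands, assumed 2 bytes per sample.
hsi_values, hsi_bytes = cube_size(145, 145, 224, 2)

print(rgb_values)                         # 63075
print(hsi_values)                         # 4709600
print(round(hsi_values / rgb_values, 1))  # 74.7
```

Even for this small scene, the cube holds roughly 75 times as many values as the RGB image, which is the storage and processing burden referred to above.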

Classifying HSIs is considered an intrinsically nonlinear problem [8]. The initial approaches used linear-transformation-based statistical techniques, such as principal component methods, that is, principal component analysis (PCA) [9] and independent component analysis (ICA) [10]; discriminant analysis methods, that is, linear [11] and Fisher [12]; wavelet transforms [13]; and composite [14], probabilistic [15], and generalized [16] kernel methods, all of which showed promising outcomes. Still, their focus was limited to spatial information. These feature extraction techniques, assisted by basic classifiers, were costly in terms of space and time and were not sufficiently accurate. Following these traditional techniques, researchers became keenly interested in applying emerging computer-based methods that made the entire process smoother and more accurate. The last decade can be considered the period of fastest growth in computer-based technologies, owing to the emergence of machine learning (ML). ML is a powerful algorithmic tool that is often likened to human cognition. It represents a complex system through abstraction. Hence, it can reduce complexity and probe the vast amount of HS data to extract the hidden discriminative features, both spectral and spatial [17]. It thereby overcomes many of the stumbling blocks to achieving the desired accuracy in identifying the classes to which the objects in the target HSI data belong. Such methods can act as all-in-one techniques that serve the purpose without further assistance. Keeping this in mind, we conducted an extensive survey of the various discriminative machine and deep learning (ML, DL) models for HSI. 
In most of the literature, the HSI datasets commonly used for landcover classification are AVIRIS Indian Pines (IP), Kennedy Space Center (KSC), Salinas Valley (SV), and ROSIS-03 University of Pavia (UP), along with the less frequently used Pavia Center, Botswana, University of Houston (HU), etc. They are pre-refined and publicly available at [18] for download and further processing.

The motivation of our work is divided into three parts. First, a novel methodology is proposed for the review work that is entirely systematic and helps find the inspiration in forming the research gaps and embedded questions after going through a large pool of research articles. Second, this work focuses on the current advancements of ML technologies for classifying HSI, with their brief, methodical description and a detailed review of the literature involved with them. Finally, the inferences are drawn and help the researchers boost knowledge for their future research. The key contributions made to the research field on hyperspectral imagery by our novel effort are as follows:

  1. A thorough review of the analytical and classification work carried out to date on HS imagery using ML/DL techniques.

  2. Emphasis on the categorized methods that have been explored and practiced most frequently so far, including a brief interpretation of the most recent technologies and the highlighted hybrid techniques.

  3. An open knowledge base that acts as a reservoir of relevant information and interprets the research on each technique in terms of its methodology, advantages and limitations, and future strategies. This may help in choosing a suitable objective for further research in the field of HSIs.

  4. A clear picture of the growth of interest in the field, with a coherent, substantial specification (benefits and drawbacks) of each method individually, informing researchers about the favorable results and the difficulties of a chosen technique.

  5. A brief account of the most recent research on HSIs, which identifies the currently adopted technologies as hot spots, along with a focus on research areas of wider interest, that is, the hybridized methods popular among researchers for addressing the problem and achieving the desired experimental results.

The rest of the article is arranged as follows: Section 2 briefly explains the constraints faced by researchers in dealing with HSI; Section 3 presents the methodology of the research along with the motive behind this review; Section 4 describes seven ML techniques, namely, support vector machine (SVM), sparse representation (SR), Markov random field (MRF), extreme learning machine (ELM), active learning (AL), deep learning (DL), and transfer learning (TL); Section 5 presents the complete summary of the literature review in the form of answers to the research questions; Section 6 draws the conclusions; and Section 7 explains the limitations and future work.

2. Constraints of HSI Classification

Since the emergence of hyperspectral images, several difficulties have hindered their analysis and processing. Initially, the field suffered from poor-quality hyperspectral sensors and from insufficient, low-quality data. Advances in applied science have eased matters, but some well-known, persistent difficulties remain to be overcome. Some of them are stated as follows:

  1. Lack of high-resolution, noise-free Earth observation (EO) images: Early spectrometers were not very efficient. As a result, noise caused by water vapor, atmospheric pollutants, and other atmospheric perturbations modifies the signals coming from the Earth's surface. Several efforts have been made over the last decades to produce high-quality hyperspectral data for Earth observation and to develop a wide range of high-performance spectrometers that combine the power of digital imaging and spectroscopy and allow numerous embedded spatial-spectral features to be extracted [19].

  2. Hindrances in the extraction of features: During data gathering, redundancy across contiguous spectral bands results in the availability of duplicated information, both spatially and spectrally, obstructing the optimal and discriminative retrieval of spatial-spectral characteristics [7].

  3. The large spatial variability and interclass similarity: The collected hyperspectral dataset contains unusable noisy bands due to acquisition errors, which result in a loss of information about the unique identity, that is, the spectral signatures, and in excessive intraclass variability. Furthermore, at poor resolution, each pixel covers a broad spatial region of the Earth's surface, which mixes spectral signatures and increases interclass similarity in border regions, thus creating inconsistencies and uncertainties for the classification algorithms employed [19].

  4. Limitation of available training samples and insufficient labeled data: Aerial spectrometers cover relatively small areas, so they can collect only a limited amount of hyperspectral data. This restricts the number of training samples available for classification models [20]. In addition, HSIs typically contain classes that correspond to a single scene, and the learning procedures of available classification models require labeled data. However, labeling each pixel requires human skill and is arduous and time-consuming [21].

  5. Lack of balance among interclass samples: Class imbalance, where the number of samples varies widely from class to class, diminishes the usefulness of many existing algorithms, because enhancing minority class accuracy without compromising majority class accuracy is a difficult task in itself [22].

  6. The higher dimensionality: Because information is spread over many channels, such high-band-count images increase estimation errors. The curse of dimensionality is a significant drawback for supervised classification algorithms, as it significantly impacts their performance and accuracy [23].

The possible solutions to the above limitations that also represent the possible operations that are performed to analyze and comprehend the HSIs can be (1) technological advancement to make versatile and robust hardware for the spectrometers to capture the scenes more accurately, (2) spectral unmixing and resolution enhancement for better feature extraction and distinguishing capability of the embedded objects, (3) image compression-restoration and dimensionality reduction for addressing the high-dimensions and lack of data, and (4) use of robust classifiers that are capable of dealing with the above issues as well as promote fast computation ability [7].

These hurdles were very prominent for methods that classify HSI based on explicit feature extraction. After ML/DL came onto the scene, operations on HSI became easier because explicit feature extraction is not needed; ML/DL also offers advantages such as robustness to noise and lower time complexity. However, despite these positive aspects, ML/DL has drawbacks in specific respects [19], including parameter tuning, numerous local minima in training procedures, and compression [20], overfitting, optimization, and convergence problems.

3. Research Methodology

This section is divided into three categories that will assist in understanding the review procedure and its ambition.

3.1. Planning of the Review

Three systematic advances are utilized that comprise the planning behind our work. First, based on efficacy and frequency of applicability on classifying HSIs, seven most recently used ML techniques have been chosen in this article for review, which establishes the operational relationship and compatibility with the issue of categorizing the land covers of a particular scene captured as HSI. Second, this relationship provides all the shortfalls and benefits of those methods and their potential possibilities. Finally, we identified the limitations of our present review work and how to rectify them in the future.

3.2. Conducting the Review

The entire review work has been conducted in the following steps:

  • (a) Collection of literature: The literature was collected using the keywords “Hyperspectral image classification,” “Machine learning techniques,” and “Deep learning techniques” from the most relevant search engine, that is, Google (Google Scholar), which provides scholarly articles on the topic. The collection includes Web of Science (WoS), Scopus, SCI, and SCIE-indexed and SCIE-related articles, from both journals and conferences. Several methods that assist the classification of hyperspectral data are used throughout the literature, of which ML techniques seem to be the most convenient and promising.

  • (b) Screening: The collected research papers constitute the raw data, sorted by category and in chronological order of the ML techniques used over time. Screening was based on the following constraints:
    1. Time period: Studies published in the range of 2010–2021 are included in this work. Studies published before 2010 are not included.
    2. Methodology: Studies on analytical operations on HSI (denoising, spectral unmixing, etc.) other than classifying the underlying land covers are rejected.
    3. Type: Studies that deal with hyperspectral images of a particular land scene are considered; medical hyperspectral imagery, water reservoirs, etc., are discarded.
    4. Design of study: Studies comprising experimental outcomes and an elaboration of the models are accepted; other literature-based articles and review papers were used only for background knowledge.
    5. Language: Only studies written in English are considered.

    Figure 1 shows, as pie charts with percentages, the number of studies screened in each category of the chosen ML techniques. Figure 2 shows, as bar graphs, the number of the most recent articles screened for each chosen ML-based method in the period from 2015 to 2021.

  • (c) Selection: Out of all the papers screened on the abovementioned criteria, a few of the most eligible are handpicked. The selection was made on specific parameters: the modeling strategy and algorithm, its suitability to the modern technological scenario, and the final result, that is, the corresponding overall accuracy (COA) for each dataset used. Journals with a good citation index were preferred.

  • (d) Analysis and inference: The selected papers are thoroughly reviewed to determine their contributions, restrictions, and future propositions. Based on this analysis, deductions are drawn to show the pathway for further research.

Figure 1. The statistical pie charts of screened articles on ML/DL techniques used for HSI classification (source: SCI, SCIE, Scopus, WoS).

Figure 2. The statistical bar graph of screened articles on ML/DL techniques used for HSI classification from 2015 to 2021 (source: SCI, SCIE, Scopus, WoS): (a) ML. (b) DL.

3.3. Research Investigations (RI)

The analysis raises the following queries:

  •   RI 1: What is the significance of traditional ML and DL for analyzing HSI?

  •   RI 2: How is ML/DL more impactful on HSI than other non-ML strategies?

  •   RI 3: What are the advantages and challenges faced by the researchers for the chosen ML/DL-based algorithm for HSI classification?

  •   RI 4: What are the emerging literary works of ML/DL on HSI classification in the year 2021?

  •   RI 5: How are ML- and DL-based hybrid techniques helping scientists in HSI classification?

  •   RI 6: What are the latest emerging techniques associated with addressing classifying HSIs?

3.4. Datasets

The HSI datasets are pre-refined and publicly available for download and further processing. Six datasets are described here concisely:

  1. AVIRIS Indian Pines: This dataset was acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor on June 12, 1992. The scene is the Indian Pines test site in north-western Indiana, USA, and contains an agricultural area characterized by crops of regular geometry and some irregular forest zones. It consists of 145 × 145 pixels with a spectral resolution of 10 nm and a spatial resolution of 20 m per pixel, and 224 spectral reflectance bands in the wavelength range 0.4–2.5 μm, of which 24 noisy bands are removed due to their low signal-to-noise ratio. The scene contains 16 different classes of land cover.

  2. Salinas Valley: This scene was acquired by the AVIRIS sensor over various agricultural fields of Salinas Valley, California, USA, in 1998. The scene is characterized by a high spatial resolution of 3.7 m per pixel and a spectral resolution of 10 nm. The area covered comprises 512 × 217 pixels with a wavelength range of 0.4–2.5 μm. Of the 224 reflectance bands, 20 noisy bands are discarded because they fall in water absorption regions. The scene comprises 16 different land classes.

  3. Pavia Center: This scene was captured by the reflective optics system imaging spectrometer (ROSIS-03) sensor during a flight campaign over Pavia, northern Italy. It has 115 spectral bands, of which only 102 are useful. Its spectral coverage is 0.43–0.86 μm, with a spectral resolution of 4 nm and a spatial resolution of 1.3 m per pixel over 1096 × 1096 pixels. There are 9 different land cover classes in the area.

  4. Pavia University: This scene was captured by the same sensor at the same time as Pavia Center, over the University of Pavia in 2001. It has the same structural features as Pavia Center, differing only in that 103 of the 115 bands are retained after discarding 12 noisy bands, with a size of 610 × 340 pixels. The scene contains 9 classes of urban environmental constructions.

  5. Kennedy Space Center: This scene was acquired by the NASA AVIRIS sensor over Kennedy Space Center, Florida, USA, on March 23, 1996. It was taken from an altitude of approximately 20 kilometres, with a spatial resolution of 18 metres and a spectral resolution of 10 nm. The wavelength range of the scene is 0.4–2.5 μm, and its spatial size is 512 × 614 pixels; 48 of the 224 bands were removed as water absorption or low signal-to-noise bands, leaving 176. The ground truth contains 13 classes predefined by the center personnel.

  6. Botswana: The scene was acquired by the Hyperion sensor on the NASA EO-1 satellite over the Okavango Delta, Botswana, in southern Africa, on May 31, 2001. It has a spatial resolution of 30 metres and a spectral resolution of 10 nm, and was acquired over a strip 7.7 kilometres wide. Of the 242 bands, which cover 1476 × 256 pixels over a wavelength range of 400–2500 nm, 97 are considered water-corrupted and noisy; hence, the remaining 145 are useful. The scene comprises 14 land cover classes.
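
As a quick check of the band arithmetic in the descriptions above, the following sketch computes the number of usable bands for the datasets whose total and discarded band counts are both stated explicitly. The dictionary layout and the helper name `retained_bands` are illustrative choices, not part of any dataset distribution.

```python
# (total bands, bands discarded) as stated in the dataset descriptions above.
datasets = {
    "Indian Pines": (224, 24),
    "Salinas Valley": (224, 20),
    "Pavia University": (115, 12),
    "Botswana": (242, 97),
}

def retained_bands(total, discarded):
    """Number of spectral bands left after removing the noisy ones."""
    return total - discarded

usable = {name: retained_bands(t, d) for name, (t, d) in datasets.items()}
for name, bands in usable.items():
    print(f"{name}: {bands} usable bands")
```

The results (200, 204, 103, and 145 bands, respectively) are the spectral depths that a classifier actually sees for these scenes.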

4. Machine Learning-Based Techniques for HSI Classification

ML technologies are not only intelligent and cognitive; their accuracy has also risen sharply owing to built-in capabilities such as the extraction, selection, and reduction of joint spatial-spectral features as well as contextual ones [24–26]. Moreover, the hidden dense layers of extensive networks, with their various allocated functions, work as intelligent learners by creating dictionaries or learning spaces to store deterministic information and then separating the landcover classes through their classification units [27–29]. The latest ML techniques that assist in classifying hyperspectral data, that is, SVM, SRC, ELM, MRF, AL, DL, and TL, are shown by category in Figure 3 and are discussed hereafter in detail.

Figure 3. The categories of the eminent machine learning techniques used for HSI classification.

4.1. Support Vector Machine (SVM)

SVM is a pattern-recognition technique rooted in statistical learning theory. The basic idea of SVM training is to find the optimal linear hyperplane that minimizes the expected classification error, for binary or multiclass problems [30], as depicted in Figure 4. For linearly separable binary classification, let $(x_i, y_i)$, $i = 1, \ldots, N$, be the set of training samples with $x_i \in \mathbb{R}^n$ and $y_i \in \{-1, +1\}$. The general form of the linear decision function in $n$-dimensional space, with the classification hyperplane, is

$$g(x) = w^{T} x + b = 0, \tag{1}$$

where $w$ is the weight vector normal to the hyperplane and $b$ is the bias (offset) of the hyperplane. A separating hyperplane in canonical form, with margin $2/\lVert w \rVert$, must satisfy the following constraints:

$$y_i \left( w^{T} x_i + b \right) \geq 1. \tag{2}$$

Figure 4. Classification strategy by multiclass SVM.

For data that are not linearly separable, as is typical in multiclass scenarios, the data points are mapped to a higher-dimensional, possibly infinite-dimensional space $S$ by a mapping function $\psi$, for example $\psi(x) = (x_1^2, x_2^2, \sqrt{2}\,x_1 x_2)$ for $x = (x_1, x_2)$. Linear operations performed in $S$ correspond to nonlinear operations in the original input space. Let $K(x_i, x_j) = \psi(x_i)^{T} \psi(x_j)$ be the kernel function, which computes the inner products of the mapped training samples.
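
The identity behind the kernel function can be checked numerically for the example mapping ψ(x) = (x1², x2², √2·x1·x2): its inner product equals the squared inner product of the original vectors, (x · z)². The sketch below is a minimal illustration; the function names and the two test vectors are arbitrary choices, and this particular mapping yields a three-dimensional space rather than an infinite-dimensional one.

```python
import math

def psi(x):
    """Explicit feature map psi(x) = (x1^2, x2^2, sqrt(2) * x1 * x2)."""
    x1, x2 = x
    return (x1 * x1, x2 * x2, math.sqrt(2) * x1 * x2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def poly_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel K(x, z) = (x . z)^2."""
    return dot(x, z) ** 2

x = (1.0, 2.0)
z = (3.0, -1.0)

explicit = dot(psi(x), psi(z))   # inner product computed in the mapped space S
implicit = poly_kernel(x, z)     # same quantity without ever forming psi
print(math.isclose(explicit, implicit))  # True
```

Because the kernel is evaluated directly in the input space, the mapped vectors never have to be stored, which is what makes very high-dimensional spaces $S$ tractable.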

Constructing SVM requires values of the constants, that is, Lagrange's multipliers, α = (α1,…, αN) so that

$$P(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{N} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \tag{3}$$

is maximized with respect to $\alpha$, subject to the constraints

$$\sum_{i=1}^{N} \alpha_i y_i = 0, \quad \alpha_i \geq 0 \text{ for all } i. \tag{4}$$

Because most $\alpha_i$ are equal to zero, the samples corresponding to nonzero $\alpha_i$ are the support vectors. In terms of the support vectors, the optimal classification function is

$$f(x) = \sum_{i=1}^{N} \alpha_i y_i K(x_i, x) + b. \tag{5}$$
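
A two-sample toy problem makes the dual formulation tangible. In the sketch below, the training points, the linear kernel, and the multipliers α = (0.5, 0.5) with b = 0 are assumptions chosen so that the solution can be verified by hand; this is not a general-purpose solver and not the procedure of any cited paper. The code checks the equality constraint, evaluates the dual objective, and applies the decision function, whose sign gives the predicted class.

```python
def linear_kernel(u, v):
    return sum(a * b for a, b in zip(u, v))

# Two training samples, one per class (toy data chosen for illustration).
X = [(1.0, 0.0), (-1.0, 0.0)]
y = [+1, -1]
alpha = [0.5, 0.5]   # Lagrange multipliers that maximize the dual for this toy set
b = 0.0

def dual_objective(alpha, X, y, kernel):
    """P(alpha) = sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j K(x_i, x_j)."""
    n = len(X)
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * kernel(X[i], X[j])
               for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * quad

def decision(x, alpha, X, y, b, kernel):
    """f(x) = sum_i alpha_i y_i K(x_i, x) + b; the predicted label is sign(f(x))."""
    return sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alpha, y, X)) + b

print(sum(a * yi for a, yi in zip(alpha, y)))                   # 0.0
print(dual_objective(alpha, X, y, linear_kernel))               # 0.5
print([decision(x, alpha, X, y, b, linear_kernel) for x in X])  # [1.0, -1.0]
```

Both training samples sit exactly on the margin (|f(x)| = 1), so both are support vectors; with w = (1, 0), the margin 2/‖w‖ equals 2, the distance between the two points.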

The application of SVM to HSI classification started two decades ago [31, 32]. Beginning with the critical issues of applying binary SVMs [33] and fuzzy-based SVM [34], namely the fuzzy input-fuzzy output support vector machine (F2-SVM), SVM evolved to incorporate dimensionality reduction and the mixing of morphological details [35]. It has also been combined with particle swarm optimization (PSO) [36] and with wavelet analysis and semi-parametric estimation [37], as the classifier “wavelet SVM” (WSVM). Table 1 summarizes the research carried out so far on HSI classification using SVM.

Table 1. Summary of review of HSI classification using SVM.

| Year | Method used | Dataset and COA | Research remarks and future scope |
|---|---|---|---|
| 2011 | Multiclass SVM [38] | San Diego3: 98.86% | Outperforms traditional SVM and deals better with the Hughes effect |
| 2012 | Fuzzy decision tree-support vector machine (FDT-SVM) [39] | Washington DC mall: 94.35% | Efficient testing accuracy, reduced computational and storage demand, understandable structure, and reduction of the Hughes effect |
| 2014 | Semi-supervised SVM kernel-spectral fuzzy C-means (KSFCM) [40] | IP: 98.52% | Enhanced classification and clustering by fully exploring both labeled and unlabeled samples |
| 2014 | SVM-radial basis function (SVM-RBF) [41] | IP: 88.7%, UP: 94.7% | Outperforms other existing kernel-based methods |
| 2015 | Regional kernel-based SVM (RKSVM) [42] | UP: 95.40%, IP: 92.55% | Outperforms pixel-point-based SVM-CK |
| 2017 | Multiscale segmentation of super-pixels (MSP-SVMsub) [43] | UP: 97.57%, IP: 95.28% (MSP-SVMsub) | Addresses the difficulty that classic OBIC-based methods have in determining the appropriate segmentation size; reduces the Hughes phenomenon |
| 2018 | Extended morphological profiles (EMP), differential morphological profiles (DMP), Gabor filtering with SVM [44] | UP: 98.46%, IP: 98.01% (MFSVM-GF) | Outperforms several advanced classifiers: SVM, super-pixel-based SVM, SVM-CK, multifeature SVM, EPF |
| 2019 | SVM-PCA [24] | IP: 91.37%, UP: 98.46% | Outperforms Naïve Bayes, decision tree, k-NN |

4.2. Sparse Representation and Classification (SRC)

The sparse method depends on dictionary learning, which refines the parameter values based on the current training observations while accumulating the knowledge of previous observations. It then generates the sparse coefficient vector using sparse coding. The method is efficient because dictionary learning extracts the rich features embedded in the HSI dataset. SR can classify images pixelwise by representing the patches around a pixel as a linear combination of several elements taken from the dictionary. The generalization of SRC called multiple SRC (mSRC) has three chief parameters: patch size, sparsity level, and dictionary size. Dictionary learning, the first step of sparse representation, uses the K-SVD algorithm. Let $Y = [y_1, y_2, \ldots, y_N]$ be a matrix of L2-normalized training samples $y_i \in \mathbb{R}^m$ [45–47].

The dictionary is learned by solving

$$\min_{D,B} \lVert Y - DB \rVert_F^2 \quad \text{such that } \lVert b_i \rVert_0 \leq S \text{ for all } i, \tag{6}$$

where $D \in \mathbb{R}^{m \times n}$ is the learned overcomplete dictionary with $n > m$ atoms, $B = [b_1, b_2, \ldots, b_N]$ is the matrix of corresponding sparse coding vectors $b_i \in \mathbb{R}^n$, and $\lVert \cdot \rVert_F$ is the Frobenius norm. The sparsity $S$ limits the number of nonzero coefficients in each $b_i$. In the next step, sparse coding, the dictionary $D$ is given and a sample $y$ is represented as the linear combination $y = Db$, where $b$ is sparse. For the final classification step, suppose that a dictionary $D_j$ is trained for each class $j \in \{1, \ldots, M\}$ of an image. The classification of a new patch $y_{\text{test}}$ is then achieved by estimating a representation error $E_j$ for each class. The class assignment rule [47] is calculated through a pseudoprobability measure $P(C_j)$ for each class error $E_j$ as

$$\hat{j} = \arg\max_j P(C_j), \quad \text{where } P(C_j) = \frac{1}{M-1} \cdot \frac{\sum_{k=1, k \neq j}^{M} E_k}{\sum_{k=1}^{M} E_k}. \tag{7}$$
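
The class assignment rule can be sketched in a few lines. To stay self-contained, the example below fixes the sparsity level at S = 1, so that sparse coding reduces to picking the single best atom of each class dictionary; a real system would use a proper sparse solver and dictionaries learned with K-SVD, neither of which is reproduced here. The class names, atoms, and test sample are hypothetical values chosen for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def residual_one_atom(y, dictionary):
    """Smallest residual norm of a 1-sparse fit of y over unit-norm atoms."""
    best = None
    for atom in dictionary:
        coeff = dot(atom, y)   # least-squares coefficient for a unit-norm atom
        err = math.sqrt(sum((yi - coeff * ai) ** 2 for yi, ai in zip(y, atom)))
        best = err if best is None else min(best, err)
    return best

def pseudo_probabilities(errors):
    """P(C_j) = 1/(M-1) * (sum_{k != j} E_k) / (sum_k E_k); assumes sum_k E_k > 0."""
    m, total = len(errors), sum(errors)
    return [(total - e) / ((m - 1) * total) for e in errors]

# Hypothetical class dictionaries with unit-norm atoms in R^2.
names = ["class_a", "class_b", "class_c"]
dictionaries = [
    [(1.0, 0.0), (0.8, 0.6)],
    [(0.0, 1.0)],
    [(-0.6, 0.8)],
]
y_test = (0.9, 0.1)

errors = [residual_one_atom(y_test, d) for d in dictionaries]
probs = pseudo_probabilities(errors)
label = names[max(range(len(probs)), key=lambda j: probs[j])]
print(label)  # class_a
```

The measure sums to one over the classes, and the class with the smallest representation error receives the largest value, so maximizing P(C_j) is equivalent to minimizing E_j.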

mSRC obtains residuals of disjoint sparse representations of $y_{\text{test}}$ for all classes $j$. After each iteration $k$, each dictionary $D_j$ is updated by eliminating the atoms with nonzero coefficients in $b_j$, and after $Q$ total iterations $y_{\text{test}}$ is assigned to the class

$$\hat{j} = \arg\max_j \sum_{k=1}^{Q} P^{k}(C_j). \tag{8}$$

Sparse representation is an essential and efficient machine learning method in many areas, including denoising, restoration, target identification, recognition, and monitoring. It may become even more important when combined with logistic regression, adaptivity, and super-pixels to extract joint features globally and locally. SR has high potential for being combined with methods such as PCA, ICA, Markov random fields, conditional random fields, extreme learning machines, and DL methods such as CNN and graph convolutional networks. Table 2 summarizes the research performed so far on HSI classification employing SRC.

Table 2. Summary of review of HSI classification using sparse representation.

| Year | Method used | Dataset and COA | Research remarks and future scope |
|---|---|---|---|
| 2013 | Kernel sparse representation classification (KSRC) [45] | IP: 96.8%, UP: 98.34%, KSC: 98.95% | Lacks automatic selection of window size, spatial image quality, and filtering degree of class spatial relations |
| 2014 | Multiscale adaptive sparse representation (MASR) [46] | UP: 98.47%, IP: 98.43%, SV: 97.33% | MASR outperformed the single-scale JSRM approach and several other classifiers in classification maps and accuracy. Future scope: the structural dictionary should be more inclusive and trained by discriminative learning algorithms |
| 2015 | Sparse multinomial logistic regression (SMLR) [47] | IP: 97.71%, UP: 98.69% | Being a pixelwise supervised method, it performs better than other contemporary methods. Future scope: the model can be improved via more technical validation, exploitation of MRF, and a structured sparsity-inducing norm that enhances the interpretability, stability, and identity of the learned model |
| 2015 | Super-pixel-based discriminative sparse model (SBDSM) [377] | IP: 97.12%, SV: 99.37%, UP: 97.33%, Washington DC mall: 96.84% | Harnesses spatial context effectively through the super-pixel concept, which improves speed and classification accuracy. Future scope: a systematic way to adjust the number of super-pixels to various conditions, and application of SR to other remote sensing practices |
| 2015 | Shape-adaptive joint sparse representation classification (SAJSRC) [48] | IP: 98.45%, UP: 98.16%, SV: 98.53% | The local area is shape-adapted for every test pixel, rather than a fixed square window, for adaptive exploration of spatial PCs, which makes the method outperform other corresponding methods. Future scope: shape-adaptive region searching can be used instead of the reduced-dimensional map to explore the complete spatial information of the actual HSI |
| 2017 | Multiple-feature-based adaptive sparse representation (MFASR) [49] | IP: 97.99%, UP: 98.39%, Washington DC mall: 97.26% | Full utilization of all embedded joint features in SA regions makes the method superior to some cutting-edge approaches. Future scope: selecting features automatically and improving dictionary learning to reduce the computational cost |
| 2018 | Weighted joint nearest neighbor and joint sparse representation (WJNN-JSR) [50] | UP: 97.42%, IP: 93.95%, SV: 95.61%, Pavia Center: 99.27% | The model was improved using the Gaussian weighted method and incorporates the conventional test pixel area to achieve a new measure of classification knowledge: the Euclidean-weighted joint size. Future scope: more effective approaches to applying the system and further increases in classification accuracy |
| 2019 | Log-Euclidean kernel-based joint sparse representation (LogEKJSR) [51] | IP: 97.25%, UP: 99.06%, SV: 99.36% | Specializes in extracting covariance features from a spatial square neighborhood and computing the similarity of covariance matrices using the conventional Gaussian form of kernel. Future scope: creating adaptive local regions using super-pixel segmentation and learning the required kernel using multiple kernel learning |
| 2019 | Multiscale super-pixels and guided filter (MSS-GF) [52] | IP: 97.58%, UP: 99.17% | Exploits spatial and edge details in HSI effectively; various regional scales are used to build MSSs that capture accurate spatial information, and GF improves the classification maps for near-edge misclassifications. Future scope: additional efficient methods for extracting local features and segmenting super-pixels |
| 2019 | Joint sparse representation-self-paced learning (JSR-SPL) [53] | IP: 96.60%, SV: 98.98% | The findings are more precise and reliable than those of other JSR methods |
| 2019 | Maximum-likelihood estimation based JSR (MLEJSR) [54] | IP: 96.69%, SV: 98.91%, KSC: 97.13% | The model is reliable in the presence of outliers |
| 2020 | Global spatial and local spectral similarity-based manifold learning-group sparse representation-based classifier (GSLS-ML-GSRC) [55] | UP: 93.42%, Washington DC mall: 91.64%, SV: 93.79% | The fusion makes the method outperform other contemporary methods focused on nonlocal or local similarities |
| 2020 | Sparse-adaptive hypergraph discriminant analysis (SAHDA) [56] | Washington DC mall: 95.28% | Effectively depicts the multiple complicated aspects of the HSI; spatial knowledge will be considered in future work |

4.3. Markov Random Field (MRF)

An MRF describes a set of random variables that satisfy the Markov property and is depicted by an undirected graph. It is similar to a Bayesian network, but whereas a Bayesian network is directed and acyclic, an MRF is undirected and may contain cycles. An MRF is a graphical model of a joint probability distribution, as illustrated in Figure 5. It is defined on an undirected graph G = (V, E), in which V is the set of nodes representing the random variables and E is the set of edges connecting them.

Figure 5.


Given the green nodes, the black node is independent of other nodes.

Based on the Markov properties [57], the neighborhood set Nc of a node c is defined as

$$N_c = \{\, d \in V \mid (c, d) \in E \,\}. \tag{9}$$

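The joint distribution of Y is determined by the conditional probability of each $Y_c$, which depends only on the neighbors of c:

$$P(Y_c \mid Y_{V \setminus \{c\}}) = P(Y_c \mid Y_{N_c}). \tag{10}$$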

Accordingly, the joint distribution on the graph G takes the form of a Gibbs distribution over the maximal cliques (C) in G:

$$P(y) = \prod_{m \in C} \psi_m(y_m) = \frac{1}{Z} e^{-\frac{1}{T} \sum_{m \in C} V_m(y_m)}, \tag{11}$$

where Z is the partition function. Therefore, equation (11) can be rewritten as

$$P(y) = \frac{1}{Z} e^{-\frac{1}{T} U(y)}, \tag{12}$$

where T is the temperature, whose value is generally 1, and $U(y) = \sum_{m \in C} V_m(y_m)$ represents the energy.
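As a minimal sketch of equations (11) and (12), the following Python snippet enumerates every labeling of a tiny 2 × 2 pixel grid and normalizes the Gibbs weights by the partition function Z. The Potts-style pairwise potential (a penalty `beta` for each pair of neighboring pixels with different labels), the function names, and the toy grid are illustrative assumptions and are not taken from the surveyed works.

```python
import itertools
import math


def potts_energy(labels, edges, beta=1.0):
    """U(y): sum of pairwise clique potentials V_m(y_m).

    Assumed potential: each edge whose two pixels disagree costs `beta`.
    """
    return sum(beta for (c, d) in edges if labels[c] != labels[d])


def gibbs_distribution(n_nodes, n_classes, edges, T=1.0, beta=1.0):
    """Return {labeling: P(y)} with P(y) = exp(-U(y) / T) / Z."""
    labelings = list(itertools.product(range(n_classes), repeat=n_nodes))
    weights = [math.exp(-potts_energy(y, edges, beta) / T) for y in labelings]
    Z = sum(weights)  # partition function
    return {y: w / Z for y, w in zip(labelings, weights)}


# 2 x 2 pixel grid (nodes 0..3) with a 4-neighborhood; the cliques are the edges.
edges = [(0, 1), (2, 3), (0, 2), (1, 3)]
dist = gibbs_distribution(n_nodes=4, n_classes=2, edges=edges)

smooth = dist[(0, 0, 0, 0)]   # all pixels share one label, U = 0
noisy = dist[(0, 1, 1, 0)]    # every neighboring pair disagrees, U = 4
print(smooth > noisy)
```

Smooth labelings receive a higher probability than noisy ones, which is the mechanism by which an MRF prior suppresses "salt and pepper" errors. Exhaustive enumeration is feasible only for toy graphs; practical methods rely on approximate inference.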

Markov models describe a stochastic process represented by a graph. Their key advantage is that, given the present state, future states do not depend on past states, which suits randomly varying data such as HSIs. Variants of Markov random fields include adaptive, hierarchical, cascaded, and probabilistic models, as well as blends with the Gaussian mixture model, joint sparse representation, transfer learning, etc., and these have produced strong results. Hidden Markov random fields are well suited to the unsupervised classification of HSIs, where the model parameters are estimated so that each pixel is assigned to its appropriate cluster [58], leading to precise classification. Table 3 lists the research carried out so far on HSI classification using MRF.

Table 3.

Summary of review of HSI classification using MRF.

Year Method used Dataset and COA Research remarks and future scope
2011 Adaptive-MRF (a-MRF) [59] IP—92.55% Handles homogeneous problem of “salt and pepper” areas and the possibility of overcorrection impact on class boundaries

2014 Hidden MRF and SVM (HMRF-SVM) [60] IP—90.50%, SV—97.24% Outperforms SVM and improves overall accuracy outcomes by nearly 8% and 3.2%, respectively

2014 Probabilistic SR with MRF-based multiple linear logistic (PSR-MLL) [61] IP—97.8%, UP—99.1%, Pavia center—99.4% Exceeds other modern contemporary methods in terms of accuracy

2014 MRF with Gaussian mixture model (GMM-MRF) [62] UP(LFDA-GMM-MRF)-90.88% UP(LPNMF-GMM-MRF)—94.96% Advantageous for a vast range of operating conditions and spatial-spectral information to preserve multimodal statistics
GMM classificatory distributions are to be considered in the future

2011 MRF with sparse multinomial logistic regression classifier—spatially adaptive total variation regularization (MRF-SMLR-SpATV) [63] UP—90.01%, IP—97.85%, Pavia center—99.23% Efficient time complexity of the model
Improvisation of the model by implementing GPU and learning dictionaries are the future agendas

2016 Multitask joint sparse representation (MJSR) and a stepwise Markov random field framework (MSMRF) [64] IP—92.11%, UP—92.52% The gradual optimization explores the spatial correlation, which significantly improves the effectiveness and accuracy of the classification

2016 MRF with hierarchical statistical region merging (HSRM) [65] SVMMRF-HSRM: IP—93.10%, SV—99.15%, UP— 86.52%; MLRsubMRF-HSRM-IP—82.60%, SV—88.16%, UP—95.52% Better solution to the technique of majority voting that suffers from the problem of scale choice
Considering the spatial features in the spatial prior model of objects of the different groups in the future

2018 Integration of optimum dictionary learning with extended hidden Markov random field (ODL-EMHRF) [66] ODL-EMHRF-ML-IP—98.56%, UP—99.63%; ODL-EMHRF-EM-IP—98.47%, UP—99.58% The method has been proven to be better than SVM-associated EMRF

2018 Label-dependent spectral mixture model (LSMM) fused with MRF (LSMM-MRF) [67] The Konka image—94.19%, the shipping scene—66.45% Efficient unsupervised classification strategy that considers spectral information in mixed pixels and the impact of spatial correlation
Enhanced theoretical derivations of EM steps

2019 Adaptive interclass-pair penalty and spectral similarity information (aICP2-SSI) along with MRF and SVM [68] UP—98.10%, SV—96.40%, IP— 96.14% Outperforms other MRF-based methods
More efficient edge-preserving strategies, more spectral similitude, and class separable calculation methods as future research

2019 Cascaded version of MRF (CMRF) [69] IP—98.56%, Botswana—99.32%, KSC—99.24% Backpropagation tunes the model parameters and least computation expenses

2020 Fusion of transfer learning and MRF (TL-MRF) [70] IP—93.89%, UP—91.79% TL is taken to be very effective for HSI classification
Future research for reducing the number of calculations involved in the existing

2020 MRF with capsule net (caps-MRF) [71] IP—98.52%, SV—99.74%, Pavia center—99.84% Ensures that relevant information is preserved, and the spatial constraint of the MRF helps achieve more precise model convergence
The combination of CapsNet with several postclassification techniques

4.4. Extreme Learning Machine (ELM)

ELM is an efficient learning algorithm based on the single hidden layer feedforward neural network (SLFNN), and it is applied to pattern classification and regression. Let $(x_i, p_i) \in R^n \times R^m$ be N arbitrary distinct samples, where $x_i = [x_{i1}, \ldots, x_{in}]^T \in R^n$ and $p_i = [p_{i1}, \ldots, p_{im}]^T \in R^m$ [72]. The standard SLFNN with $\hat{N}$ hidden nodes and activation function f(x) is modeled mathematically as

$$\sum_{i=1}^{\hat{N}} \alpha_i f_i(x_j) = \sum_{i=1}^{\hat{N}} \alpha_i f(w_i \cdot x_j + b_i) = O_j; \quad j = 1, \ldots, N. \tag{13}$$

Here, $w_i = [w_{i1}, \ldots, w_{in}]^T$ is the weight vector connecting the input nodes to the ith hidden node, $\alpha_i = [\alpha_{i1}, \ldots, \alpha_{im}]^T$ is the weight vector connecting the ith hidden node to the output nodes $O_j$, $b_i$ is the bias of the ith hidden node, and $w_i \cdot x_j$ is the inner product. Zero error for the N samples can be written in matrix form as $A\alpha = P$, where $A(w_1, \ldots, w_{\hat{N}}, b_1, \ldots, b_{\hat{N}}, x_1, \ldots, x_N)$ is the hidden layer output matrix of the neural network; the ith column of A is the output of the ith hidden node with respect to the inputs $x_1, \ldots, x_N$. Training the SLFNN amounts to finding specific $\hat{\alpha}$, $\hat{w}_i$, and $\hat{b}_i$ ($i = 1, \ldots, \hat{N}$) [73] such that

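$$\left\| A(\hat{w}_1, \ldots, \hat{w}_{\hat{N}}, \hat{b}_1, \ldots, \hat{b}_{\hat{N}}, x_1, \ldots, x_N)\,\hat{\alpha} - P \right\| = \min_{w, \alpha, b} \left\| A(w_1, \ldots, w_{\hat{N}}, b_1, \ldots, b_{\hat{N}}, x_1, \ldots, x_N)\,\alpha - P \right\|. \tag{14}$$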

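This is equivalent to minimizing the cost function U. With gradient-based algorithms, the set W of weights ($\alpha_i$, $w_i$) and biases $b_i$ is adjusted iteratively over the epochs as

$$W_k = W_{k-1} - \eta \frac{\partial U(W)}{\partial W}; \quad U = \sum_{k=1}^{N} \left( \sum_{j=1}^{\hat{N}} \alpha_j f(w_j \cdot x_k + b_j) - p_k \right)^2. \tag{15}$$

The learning rate η must be chosen carefully for good convergence, and $\hat{N} \ll N$ is needed for better generalization performance.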
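The formulation above can be illustrated with a short Python sketch. In the standard ELM formulation, the hidden weights $w_i$ and biases $b_i$ are drawn at random and left fixed, and only the output weights α are obtained by solving $A\alpha = P$ in the least-squares sense, which avoids the iterative update of equation (15). The sketch assumes a single output (m = 1), a sigmoid activation, and a small ridge term added for numerical stability; all function names and the toy data are hypothetical.

```python
import math
import random


def hidden_output(X, W, b):
    """Hidden layer output matrix A, with A[j][i] = f(w_i . x_j + b_i)."""
    return [[1.0 / (1.0 + math.exp(-(sum(wk * xk for wk, xk in zip(w, x)) + bi)))
             for w, bi in zip(W, b)] for x in X]


def solve(M, y):
    """Solve M a = y by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [row[:] + [yi] for row, yi in zip(M, y)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * c for a, c in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def train_elm(X, P, n_hidden, ridge=1e-6, seed=0):
    """Random hidden layer, then least-squares output weights alpha."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    A = hidden_output(X, W, b)
    # Regularized normal equations: (A^T A + ridge * I) alpha = A^T P
    AtA = [[sum(row[r] * row[c] for row in A) + (ridge if r == c else 0.0)
            for c in range(n_hidden)] for r in range(n_hidden)]
    AtP = [sum(row[r] * t for row, t in zip(A, P)) for r in range(n_hidden)]
    return W, b, solve(AtA, AtP)


def predict(X, W, b, alpha):
    return [sum(a * h for a, h in zip(alpha, row)) for row in hidden_output(X, W, b)]


# Toy data: six 2-D samples with scalar targets, three hidden nodes (N_hat < N).
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.5], [0.2, 0.8]]
P = [0.0, 1.0, 1.0, 0.0, 0.5, 0.9]
W, b, alpha = train_elm(X, P, n_hidden=3)
print(predict(X, W, b, alpha))
```

Because only a linear system is solved, training needs no learning rate and no epochs, which is the main reason ELM is reported to train quickly.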

ELMs were proposed to overcome the disadvantages of conventionally trained single hidden layer feedforward neural networks and to improve learning ability and generalization performance. ELM is a supervised method, but extending it to semi-supervised and unsupervised versions is highly recommended for dealing with huge amounts of data such as HSIs, which are primarily unlabeled and suffer from a lack of training samples. Great potential also lies in ELM variants other than those mentioned here [74], such as two-hidden-layer ELM, multilayer ELM, feature mapping-based ELM, incremental ELM, and deep ELM, to achieve higher precision in classifying HSIs. Table 4 summarizes the research carried out so far on HSI classification using ELM.

Table 4.

Summary of review of HSI classification using ELM.

Year Method used Dataset and COA Research remarks and future scope
2014 Ensemble extreme learning machines (E2LM)-bagging-based ELMs (BagELMs) and AdaBoost-based (BoostELMs) [72] UP—94.3%, KSC—97.71%, SV—97.19% BoostELM performs better than kernel and other EL methods
Performance of other differentiable or nondifferentiable activation functions

2015 Kernel-based ELM—composite kernel (KELM-CK) [75] IP—95.9%, UP—93.5%, SV—96.4% Outperforms other SVM-CK-based models

2015 ELM's two-level fusions: feature-level fusion (FF-ELM) and decision-level fusion (DF-ELM) [76] FF-ELM: UP—98.11%, IP—92.93%, SV—99.12%; DF-ELM—UP—99.25%, IP—93.58%, SV—99.63% Outperforms basic ELM models

2016 Hierarchical local-receptive-field-based ELM (HL-ELM) [77] IP—98.36%, UP—98.59% Surpasses other ELM methods in terms of accuracy and training speed

2017 Genetic-firefly algorithm with ELM (3FA-ELM) [78] HyDice DC mall—97.36%, HyMap—95.58% Low complexity (ELM), better adaptability, and searching capability (FA)
Execution time needs to be reduced in future

2017 Local receptive fields-based kernel ELM (LRF-KELM) [79] IP—98.29% Outperforms other ELM models

2017 Distributed KELM based on MapReduce framework with Gabor filtering (DK-Gabor-ELMM) [80] IP—92.8%, UP—98.8% Outperforms other ELM models

2017 Loopy belief propagation with ELM (ELM-LBP) [81] IP—97.29% Efficient time complexity

2018 Mean filtering with RBF-based KELM (MF-KELM) [82] IP—98.52% The model offers the most negligible computational hazard

2018 Augmented sparse multinomial logistic ELM (ASMLELM) [83] IP—98.85%, UP—99.71%, SV—98.92% Improved classification accuracy by extended multi-attribute profiles and more SR

2018 ELM with enhanced composite feature (ELM-ECF) [84] IP—98.8%, UP—99.7%, SV—99.5% Low complexity and multiscale spatial feature for better accuracy
Incorporate feature-fusion technology

2019 Local block multilayer sparse ELM (LBMSELM) [85] IP—89.31%, UP—89.47%, SV—90.03% Performs anomaly and target detection. Reduced computational overhead and increased classification accuracy by inverse free; saliency detection and gravitational search

2019 ELM-based heterogeneous domain adaptation (EHDA) [25] HU-DC —97.51%, UP-DC —96.63%, UP-HU —97.53% Outperforms other HDA methods. Invariant feature selection

2019 Spectral-spatial domain-specific convolutional deep ELM (S2CDELM) [86] IP—97.42%, UP—99.72% Easy construction with high training-testing speed
Merge of DL with ELM

2020 Cumulative variation weights and comprehensive evaluated ELM (CVW-CEELM) [87] IP—98.5%, UP—99.4% Accuracy achieved due to the weight determination of multiple weak classifiers. Multiscale neighborhood choice and optimized feature selection

4.5. Active Learning (AL)

AL is a special type of supervised ML approach that builds a high-performance classifier while minimizing the size of the training dataset by actively selecting valuable data points. The general structure of AL is shown in Figure 6. There are three categories of AL: stream-based selective sampling, in which each unlabeled data point is examined in turn and the learner decides whether or not to query its label; pool-based sampling, in which the whole dataset is considered before selecting the best set of queries; and membership query synthesis, in which the learner creates new data points, a form of data augmentation, for the user to label. The decision to select the most informative data points depends on the uncertainty measure used in the selection. In an active learning scenario, the most informative data points are those the classifier is least sure about. The uncertainty measures for a data point x [88] are

  • Least Confidence (LC): Selects the data point for which the classifier is least certain about its chosen class. With $y^*$ as the most likely label sequence and $\phi$ as the learning model, LC is represented as
    $$S_{LC}(x) = 1 - P(y^* \mid x; \phi). \tag{16}$$
  • Smallest Margin Uncertainty (SMU): Represents the difference between the classification probability of the most likely class ($y_1^*$) and that of the second-best class ($y_2^*$), written mathematically as
    $$S_{SMU}(x) = P_{\phi}(y_1^* \mid x) - P_{\phi}(y_2^* \mid x). \tag{17}$$
  • Largest Margin Uncertainty (LMU): Represents the difference between the classification probability of the most likely class ($y_1^*$) and that of the least likely class ($y_{\min}$), written mathematically as
    $$S_{LMU}(x) = P_{\phi}(y_1^* \mid x) - P_{\phi}(y_{\min} \mid x). \tag{18}$$
  • Sequence Entropy (SE): Measures the disorder in a system; a higher entropy implies a more disordered condition. SE is denoted as
    $$S_{SE}(x) = -\sum_{\hat{y}} P(\hat{y} \mid x; \phi) \log P(\hat{y} \mid x; \phi), \tag{19}$$
    with $\hat{y}$ ranging over all possible label sequences for input x.
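The four measures in equations (16) to (19) can be sketched in a few lines of Python. The sketch assumes that each unlabeled pixel comes with a vector of per-class probabilities produced by the current classifier, rather than probabilities over label sequences; the pool contents and all names are hypothetical.

```python
import math


def least_confidence(p):
    """Eq. (16): one minus the probability of the most likely class."""
    return 1.0 - max(p)


def smallest_margin(p):
    """Eq. (17): best minus second-best class probability."""
    top = sorted(p, reverse=True)
    return top[0] - top[1]


def largest_margin(p):
    """Eq. (18): best minus least likely class probability."""
    return max(p) - min(p)


def sequence_entropy(p):
    """Eq. (19): Shannon entropy of the predicted distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)


def query_by(pool, measure, uncertain_when="high"):
    """Pick the pool item the classifier is least sure about.

    LC and entropy are largest for uncertain items, while the two
    margins are smallest for them, hence the `uncertain_when` switch.
    """
    chooser = max if uncertain_when == "high" else min
    return chooser(pool, key=lambda name: measure(pool[name]))


# Hypothetical pool: class probabilities of three unlabeled pixels.
pool = {
    "a": [0.90, 0.05, 0.05],
    "b": [0.40, 0.35, 0.25],
    "c": [0.60, 0.30, 0.10],
}
print(query_by(pool, least_confidence))          # pixel to label next
print(query_by(pool, smallest_margin, "low"))
```

In a pool-based loop, the selected pixel would be labeled by the annotator, added to the training set, and the classifier retrained before the next query.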

Figure 6.


Principle of active learning.

Although it is not yet a customary approach, AL can considerably reduce human effort, time, and processing cost for a large batch of unlabeled data. The method prioritizes, within a huge pool of unlabeled data, the data whose labeling will have the highest impact on training. The supervised model is trained repeatedly through active queries and keeps improving itself to predict the class of each remaining data point. AL is advantageous for its dynamic and incremental approach to training the model so that it learns the most suitable label for each data cluster [89]. Table 5 lists the research performed so far on HSI classification using AL.

Table 5.

Summary of review of HSI classification using active learning.

Year Method used Dataset and COA Research remarks and future scope
2008 AL with expectation-maximization-binary hierarchical classifier (BHC-EM-AL) and maximum-likelihood (ML-EM-AL) [90] Range: KSC-90-96%, Botswana—94-98% Better learning levels than the random choice of data points and an entropy-based AL
Measurement of the efficacy of the active learning-based knowledge transfer approach while systematically increasing the spatial/temporal segregation of the data sources

2010 Semi-supervised-segmentation with AL and multinomial logistic regression (MLR-AL) [91] IP—79.90%, SV—97.47% Innovative mechanisms for selecting unlabeled training samples automatically, AL to enhance segmentation results
Testing the segmentation in various scenarios influenced by limited a priori accessibility of training images

2013 Maximizer of the posterior marginal by loopy belief propagation with AL (MPM-LBP-AL) [92] IP—94.76%, UP—85.78% Improved accuracy than previous AL applications
Use parallel-computer-architectures such as commodity—clusters or GPUs to build computationally proficient implementation

2015 Hybrid AL-MRF, that is, uncertainty sampling breaking ties (MRF-AL-BT), passive selection approach random sampling (MRF-AL-RS), and the combination (MRF-AL-BT + RS) [93] IP—94.76%, UP—85.78% (MRF-AL-RS provides the highest accuracies) Outperforms conventional AL and SVM AL methods due to MRF regularization and pixelwise output
Merge the model with other effective AL methods and test them with a limited number of training samples

2015 Integration of AL and Gaussian process classifier (GP-AL) [94] IP—89.49%, Pavia center—98.22% Empirical autonomation of AL achieves reasonable accuracy
Adding diversity criterion to the heuristics and contextual information with the model and reducing computation time

2016 AL with hierarchical segmentation (HSeg) tree: adding features and adding samples (Adseg_AddFeat + AddSamp) [95] IP—82.77%, UP—92.23% Outruns several baseline methods-selecting appropriate training data from already existing labeled datasets and potentially decreasing manual laboratory labeling
Reduce the computational time that limits its applicability on large-scale datasets

2016 Multiview 3D redundant discrete wavelet transform-based AL (3D-RDWT-MV-AL) [96] HU—99%, KSC—99.8%, UP—95%, IP—90% The precious method as a combination of an initial process with AL, improved classification

2017 Discovering representativeness and discriminativeness by semi-supervised active learning (DRDbSSAL) [97] Botswana—97.03%, KSC—93.47%, UP—93.03%, IP—88.03% Novel approach with efficient accuracy

2017 Multicriteria AL [98] KSC—99.71%, UP—99.66%, IP—99.44% Surpasses other existing AL methods regarding stability, accuracy, robustness, and computational hazard
A multi-objective optimization strategy and the usage of advanced attribute-based profile features

2018 Feature-driven AL associated with morphological profiles and Gabor filter [99] IP—99.5%, UP—99.84%, KSC—99.53% (Gabor-BT) A discriminative feature space is designed to gather helpful information into restricted samples

2018 Multiview intensity-based AL (MVAL)-multiview intensity-based query-representative strategy (MVIQ-R) [100] UP—98%, Botswana—99.5%, KSC—99.9%, IP—95% Focus on pixel intensity obtains unique feature and hence better performance
Selection of combination of optimal attribute features

2019 Super-pixel with density peak augmentation (DPA)-based semi-supervised AL (SDP-SSAL) [101] IP—90.08%, UP—85.61% Novel approach proposed based on super-pixels density metric
Development of a pixelwise solution to produce super-pixel-based neighborhoods

2020 Adaptive multiview ensemble spectral classifier and hierarchical segmentation (Ad-MVEnC_Spec + Hseg) [102] KSC—97.63%, IP—87.1%, HU—93.3% Enhancement in the view sufficiency, and promotion of the disagreement level by the dynamic view, provides lower computational complexity due to parallel computing

2020 Spectral-spatial feature fusion using spatial coordinates-based AL (SSFFSC-AL) [103] IP—100%, UP—98.43% High running speed can successfully address the “salt and pepper” phenomenon but drops a few if similar class samples are distributed in different regions differently
The sampling weight parameter conversion to an adaptive parameter is adjusted adaptively as the training samples are modified

4.6. Deep Learning (DL)

Deep learning is the most renowned ML technology in terms of application and accuracy. Although it is considered the next step of ML, it also borrows concepts from artificial intelligence. DL comprises algorithms that emulate functions of the human brain, that is, creativity, enhanced analysis, and proper decision-making, based on pure or hybrid large networks for any given real-life problem. It has improved performance on computer-based problems, especially unsupervised ones, in practical applications such as machine translation, image reconstruction and classification, computer vision, and automated analysis [104]. The basic structure of any DL model is an architecture with three types of layers: an input layer, through which input data are fed to the next layer(s); the intermediate hidden layer(s), responsible for all the computations required by the given problem; and the output layer, which receives the data generated by the hidden layer(s) and provides the final output. The steps involved in DL models are as follows: understanding the problem properly, collecting the input database, selecting the most appropriate algorithm, training the model with the sample source database, and finally testing on the target database [105].

DL models are more efficient and advantageous over other ML models due to the following reasons [19]:

  1. Their capability to extract hidden and complicated structures from raw data is inextricably linked to their ability to learn internal representations and generalize any form of knowledge.

  2. They can accommodate a wide range of data types, for example, 2D imagery data and complex 3D data such as medical imagery and remote sensing. In addition, they can use HSI data's spectral and spatial domains in both standalone and linked ways [106–108].

  3. They provide architects a lot of versatility in terms of layer types, blocks, units, and depth.

  4. Furthermore, their learning approach can be tailored to various learning strategies, from unsupervised to supervised, including intermediate strategies.

  5. Additionally, developments in processing techniques, including batch partitioning and high-performance computation, especially on distributed and parallel architecture, have enabled DL models to find better opportunities and solutions when coping with enormous volumes of data [109].

The models that are broadly used for HSI classification are described as follows.

  • (a)

    Autoencoder (AE): AEs are the fundamental unsupervised deep models based on the backpropagation rule. An AE consists of two parts: the encoder, which connects the input vector to the hidden layer through a weight matrix, and the decoder, which maps the hidden layer output to a reconstruction vector through a specific weight matrix. Stacked autoencoders (SAEs) are AEs with multiple hidden layers, where the output of every hidden layer is fed to the successive hidden layer as input. Training an SAE comprises three steps: (1) the first AE is trained to obtain the learned feature vector; (2) the former layer's feature vector is taken as input to the next layer, and this process is repeated until training is complete; (3) after all the hidden layers have been trained, backpropagation with a labeled training set is used to reduce the cost function and update the weights for fine-tuning [110]. The architecture of an SAE is depicted in Figure 7.

  • Let $x_n \in R^m$, n = 1, 2,…, N, represent the unlabeled input dataset, $E_n$ be the hidden encoder vector computed from $x_n$, and $y_n$ be the decoder vector of the output layer [111].
    Encoder: $$E_n = g(W_i x_n + b_i), \tag{20}$$
  • where g is the encoding function, $W_i$ the encoder weight matrix, and $b_i$ the encoder bias vector.
    Decoder: $$y_n = f(W_j E_n + b_j), \tag{21}$$
  • where f is the decoding function, $W_j$ the decoder weight matrix, and $b_j$ the decoder bias vector.

  • The reconstruction error in SAE is denoted as
    $$\Phi(\Theta) = \arg\min_{\theta, \theta'} \frac{1}{N} \sum_{k=1}^{N} L(x_k, y_k), \quad \text{where the loss function is } L(x_k, y_k) = \| x_k - y_k \|^2. \tag{22}$$
  • AEs are unsupervised neural networks that embed several convolutional hidden layers based on nonlinear activation functions and transformations [112]. There is a high risk of data loss during training, but with specialized training an AE models specific data types well. There are AEs for many purposes, such as convolutional, sparse, variational, deep, contractive, and denoising AEs, applied to data compression, noise removal, feature extraction, image augmentation, and image coloring. AEs provide a vast platform for further research on their applicability and their capability to participate in hybridization. Table 6 describes a few research works on AEs.
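The encoder, decoder, and reconstruction loss of equations (20) to (22) can be sketched as a plain forward pass. The sketch assumes a sigmoid for both g and f and uses all-zero toy weights so that the result is easy to check by hand; the function names and data are hypothetical, and the optimization of the weights is omitted.

```python
import math


def affine(W, x, b):
    """Compute W x + b for a weight matrix W (list of rows)."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-z)) for z in v]


def encode(x, W_i, b_i):
    """Eq. (20): E_n = g(W_i x_n + b_i)."""
    return sigmoid(affine(W_i, x, b_i))


def decode(E, W_j, b_j):
    """Eq. (21): y_n = f(W_j E_n + b_j)."""
    return sigmoid(affine(W_j, E, b_j))


def reconstruction_loss(X, W_i, b_i, W_j, b_j):
    """Eq. (22) objective: mean squared L2 distance between x_k and y_k."""
    total = 0.0
    for x in X:
        y = decode(encode(x, W_i, b_i), W_j, b_j)
        total += sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return total / len(X)


# Toy setup: 2 input bands, 1 hidden unit, all weights and biases zero,
# so every reconstruction is [0.5, 0.5].
W_i, b_i = [[0.0, 0.0]], [0.0]
W_j, b_j = [[0.0], [0.0]], [0.0, 0.0]
print(reconstruction_loss([[1.0, 0.0]], W_i, b_i, W_j, b_j))
```

Training would adjust the four parameter sets to lower this value; in a stacked AE, the codes returned by `encode` would serve as the inputs of the next AE.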

  • (b)
    Convolutional Neural Network (CNN): It is a famous deep neural network that works like a human visual cortex with many interconnected layers applied widely in image, speech, and signal processing. It assigns learnable and modifiable weights and biases to the input image to identify various objects or patterns with differentiable features. As shown in Figure 8, each layer of CNN possesses filtering capabilities with ascending complexities: the first layer learns filtering corners and edges; intermediate layers learn object parts filtering; and the last layer learns filtering out the entire object in different locations and shapes. The comparison between the layers in terms of several parameters is shown in Table 7. It consists of four layers [117, 118]:
    • (1)
      Convolution: This operation gives CNN its name; it is a dot product of the original pixel values with the weights defined in the filter, or kernel. The results are summed into one number representing all the pixels covered by the filter. Let I be the input hypercube of dimension p × q × r, where p × q denotes the spatial size of I and r the number of bands, and let $i_k$ be the kth feature map of I. Let each convolutional layer contain d filters, with weight $W_m$ and bias $b_m$ representing the mth filter. The output of the mth filter with transformation function g is denoted as
      $$Y_m = \sum_{k=1}^{r} g(i_k \cdot W_m + b_m); \quad m = 1, 2, \ldots, d. \tag{23}$$
    • (2)
      Activation: The convolution layer produces a matrix significantly smaller than the actual image. The matrix is passed through an activation layer (generally rectified linear unit, aka ReLU), adding nonlinearity that enables the network to train itself through backpropagation.
    • (3)
      Pooling: This step downsamples further and reduces the matrix size. A filter is passed over the results of the previous layer and selects one number from each set of values (generally the maximum, which is called max-pooling). This allows the network to train much more quickly by concentrating on the most valuable information in each image feature. For an m × m square window neighborhood S with N elements and activation value $z_{ij}$ at location (i, j), average pooling is formulated as
      $$T = \frac{1}{N} \sum_{(i,j) \in S} z_{ij}. \tag{24}$$
    • (4)
      Fully Connected (FC): A typical multilayer perceptron structure. Its input is a one-dimensional vector representing the output of the preceding layers. Its output is a list of probabilities for the possible labels attached to the image, and the label that receives the highest probability is the classification decision. With transformation function g, N input samples, input X″, output Y″, weight matrix W, and bias constant b, it is represented mathematically as
      $$Y'' = \sum_{j=1}^{N} g(W X'' + b). \tag{25}$$
    • CNN is the most in-demand and widely explored of all DL models. The functional units of convolutional layers are kernels, which specialize in extracting the most relevant and enriched spatial and spectral features from the given dataset through automated filtering by the convolution operation [119]; that work also gives a thorough description of CNNs. The most popular variants are attention-based CNN, ResNet, CapsNet, LeNet, AlexNet, VGG, etc. Some of them remain unexplored for classifying HSI. The detailed research work on CNN for HSI classification is listed in Table 8.
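The four layer types described above can be chained in a small pure-Python sketch. To keep it short, it assumes a single band (r = 1), a single 2 × 2 filter, ReLU as the activation, non-overlapping average pooling as in equation (24), and a softmax at the end of the fully connected layer; the image, the filter, and the weights are toy values chosen for illustration.

```python
import math


def conv2d_valid(img, kernel, bias=0.0):
    """Slide the kernel over the image and take dot products (no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[i + u][j + v] * kernel[u][v]
                 for u in range(kh) for v in range(kw)) + bias
             for j in range(out_w)] for i in range(out_h)]


def relu(fmap):
    return [[max(0.0, z) for z in row] for row in fmap]


def avg_pool(fmap, m):
    """Eq. (24) on non-overlapping m x m windows: T = (1/N) * sum of z_ij."""
    return [[sum(fmap[i + u][j + v] for u in range(m) for v in range(m)) / (m * m)
             for j in range(0, len(fmap[0]) - m + 1, m)]
            for i in range(0, len(fmap) - m + 1, m)]


def softmax(z):
    mx = max(z)
    e = [math.exp(v - mx) for v in z]
    s = sum(e)
    return [v / s for v in e]


def fully_connected(x, W, b):
    """Map the flattened features to class probabilities."""
    return softmax([sum(w * xi for w, xi in zip(row, x)) + bi
                    for row, bi in zip(W, b)])


# 5 x 5 single-band image with a vertical edge between columns 1 and 2.
img = [[1, 1, 0, 0, 0] for _ in range(5)]
edge_filter = [[1, -1], [1, -1]]                   # responds to that edge

fmap = relu(conv2d_valid(img, edge_filter))        # 4 x 4 feature map
pooled = avg_pool(fmap, 2)                         # 2 x 2 after pooling
flat = [v for row in pooled for v in row]          # input of the FC layer
probs = fully_connected(flat, [[1, 0, 1, 0], [0, 1, 0, 1]], [0.0, 0.0])
print(pooled, probs)
```

In a trained network, the filter and the fully connected weights would be learned by backpropagation rather than fixed by hand, and the convolution would sum over all r spectral bands.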
  • (c)
    Recurrent Neural Network (RNN): An RNN is a very efficient DL approach that follows a sequential framework with a definite timestamp t. “Recurrent” refers to performing the same task for each sequence element, with the output depending on the preceding computations. In other words, an RNN has a “memory” that holds information about what has been calculated so far: the output of a particular recurrent neuron is fed back as input to the same node, which helps the network predict the output efficiently. This is represented in Figure 9, where the RNN is unrolled, that is, the complete sequence is shown neuron by neuron across the entire network structure. It consists of the following steps:
    • (1)
      Let X = […, xt−1, xt, xt+1,…] be the input vector, where xt represents the input at timestamp t.
    • (2)
      ht is the “memory of the network,” the hidden state at timestamp t. Initially, h−1 is set to the zero vector to calculate the first hidden state. The current state ht is calculated from the previous hidden state ht−1 as [132]
      $$h_t = f(P x_t + W h_{t-1}), \tag{26}$$
    • where f denotes a nonlinear function, such as tanh or ReLU, and P and W are the weight matrices applied to the input and to the previous hidden state, respectively.
    • (3)
      Let Y = […, yt−1, yt, yt+1,…] be the output vector, where yt represents the output at timestamp t, generally computed with a softmax function: yt = softmax(Q ht).
    • RNN is an efficient deep model with large potential. Its recurrent looping structure enables it to store relevant information about the spatial-spectral relationships between pixels and their neighbors. There are several RNN architectures based on inputs/outputs, as stated in [133], and five categories based on LSTM [134]. These variants can be used in collaboration with other methods such as MRF and PCA to assess their accuracy.
    • The literature studies based on RNN are cataloged in Table 9.
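The three steps above translate directly into a short forward pass. The sketch below assumes tanh for the nonlinearity f, a zero initial hidden state, and toy weight matrices P, W, and Q; treating the spectrum of one pixel as the input sequence is one possible use and is an assumption of this example, as are all names.

```python
import math


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def softmax(z):
    mx = max(z)
    e = [math.exp(v - mx) for v in z]
    s = sum(e)
    return [v / s for v in e]


def rnn_forward(X, P, W, Q):
    """Run h_t = tanh(P x_t + W h_{t-1}) and y_t = softmax(Q h_t) over X."""
    h = [0.0] * len(W)                      # initial hidden state is zero
    outputs = []
    for x_t in X:
        pre = [a + b for a, b in zip(matvec(P, x_t), matvec(W, h))]
        h = [math.tanh(v) for v in pre]     # updated "memory"
        outputs.append(softmax(matvec(Q, h)))
    return outputs, h


# Toy sizes: 1 input feature per step, 2 hidden units, 2 output classes.
P = [[1.0], [0.0]]
W = [[0.5, 0.0], [0.0, 0.5]]
Q = [[1.0, 0.0], [0.0, 1.0]]

# A pixel spectrum read band by band (hypothetical reflectance values).
spectrum = [[0.2], [0.4], [0.9]]
outputs, h_last = rnn_forward(spectrum, P, W, Q)
print(outputs[-1])
```

The same weights are reused at every step, and the last hidden state summarizes the whole sequence, which is what allows the final output to depend on all preceding bands.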
  • (d)

    Deep Belief Network (DBN): DBNs are formed by greedily stacking and training restricted Boltzmann machines (RBMs), which are trained with an unsupervised learning algorithm based on “contrastive divergence.” RBMs take a probabilistic approach to neural networks and are thus called stochastic neural networks. Each RBM is made of three parts: a visible unit (input layer), an invisible unit (hidden layer), and a bias unit. The general structure of a DBN is depicted in Figure 10.

  • For a DBN, the joint distribution of the input vector X and the n hidden layers $h_1, \ldots, h_n$ is defined as [137]
    $$P(X, h_1, \ldots, h_n) = \left( \prod_{i=0}^{n-2} P(h_i \mid h_{i+1}) \right) P(h_{n-1}, h_n), \tag{27}$$
  • where $X = h_0$, $P(h_i \mid h_{i+1})$ is the conditional distribution of the visible units given the hidden units of the RBM at level i + 1, and $P(h_{n-1}, h_n)$ is the visible-hidden joint distribution in the top-level RBM. A DBN has two phases: the pretraining phase consists of numerous RBM layers, and the fine-tuning phase is simply a feedforward NN.

  • A DBN is a generative graphical model; that is, it can produce all the distinct outcomes possible for a particular case, and it learns to extract a deep hierarchical representation of the training data. DBNs are structurally more capable than RNNs as they lack loops, are pretrained in an unsupervised way, and are computationally efficient, particularly for classification problems. Minor modifications or collaborations can improve DBNs in function and accuracy. Table 10 lists works done on DBN.

  • (e)

    Generative Adversarial Network (GAN): GAN is one of the most recent DL models and is rapidly gaining ground in technical research. The GAN model is trained using two kinds of neural networks: the “generative network” or “generator,” which learns to generate new viable samples, and the “discriminative network” or “discriminator,” which learns to discriminate generated instances from existing instances. Discriminative algorithms seek to classify the input data, which is given as a collection of certain features; the algorithm maps features to labels [140]. In contrast, generative algorithms do not classify the input data; given a certain label, they attempt to create features that match it. During training, the generator tries to get better at deceiving the discriminator, and the discriminator tries to catch the counterfeits produced by the generator. Thus, the training procedure is termed adversarial training. The generator and discriminator should each be trained against a static opponent, keeping the discriminator constant while training the generator and keeping the generator constant while training the discriminator. That helps to understand the gradients better.

Figure 7.


The network structure of stacked autoencoders; input X-to-E is the encoding phase; E-to-output Y is the decoding phase.

Table 6.

Summary of the review of HSI classification using deep learning—AE.

Year | Method used | Dataset and COA | Research remarks and future scope
2013 | Autoencoders (AE) [110] | KSC: 4%, Pavia city: 14.36% (error rates) | This article opened a considerable doorway of research, including other deep models for better accuracy
2014 | Stacked autoencoder and logistic regression (SAE-LR) [113] | KSC: 98.76%, Pavia city: 98.52% | Highly accurate in comparison to RBF-SVM and performs testing in optimized time limit than SVM or KNN but fails in training time efficiency
2016 | Spatial updated deep AE with collaborative representation-based classifier (SDAE-CR) [114] | IP: 99.22%, Pavia center: 99.9%, Botswana: 99.88% | Highly structured in extracting high specialty deep features and not the hand-crafted ones and accurate. Future scope: improving the deep network architecture and selection of parameters
2019 | Compact and discriminative stacked autoencoder (CDSAE) [115] | UP: 97.59%, IP: 95.81%, SV: 96.07% | Efficient in dealing with feature space in low dimension, but the computation cost is high as per architecture size
2021 | Stacked autoencoder with distance-based spatial-spectral vector [116] | SV: 97.93%, UP: 99.34%, surrey: 94.31% | Augmentation of EMAP features with the geometrically allocated spatial-spectral feature vectors achieves excellent results. Better tuning of hyperparameter and more powerful computational tool required. Future scope: improving the training model to become unified and classified in a more generalized and accurate way

Figure 8.

The CNN architecture deploying the layers.

Table 7.

Comparison of convolutional layers.

Arguments | Convolution layer | Pooling layer | Fully connected layer
Input | (i) 3D-cube, preceding set of feature maps | (i) 3D-cube, preceding set of feature maps | (i) Flattened 3D-cube, preceding set of feature maps
Parameters | (i) Kernel counts; (ii) Kernel size; (iii) Activation function (ReLU); (iv) Stride; (v) Padding; (vi) Type and value of regularization | (i) Stride; (ii) Size of window | (i) Number of nodes; (ii) Activation function, selected based on the role of the layer: ReLU for aggregating information, softmax for producing the final classification
Action | (i) Application of filters made of small kernels to extricate features; (ii) Learning; (iii) One bias per filter; (iv) Application of activation function on each feature map value | (i) Reduction of dimensionality; (ii) Extraction of the maximum or average of a region; (iii) Sliding window framework | (i) Aggregate information from final feature maps; (ii) Generate final classification
Output | (i) 3D-cube, a 2D-map per filter | (i) 3D-cube, a 2D-map per filter, reduced spatial dimensions | (i) 3D-cube, a 2D-map per filter
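
The layer roles compared in Table 7 can be illustrated with a minimal NumPy sketch. The function names (`conv_layer`, `max_pool`, `dense_softmax`), the toy 8 × 8 patch with 4 bands, and the random weights are assumptions made purely for illustration; they are not taken from any of the reviewed works. In this sketch the final dense layer returns one probability per class.

```python
import numpy as np

def conv_layer(cube, kernels, biases, stride=1, padding=0):
    """Convolution layer: one 2D feature map per filter, followed by ReLU.

    cube:    (H, W, C) input cube or preceding set of feature maps
    kernels: (K, kh, kw, C) bank of K small filters
    biases:  (K,) one bias per filter
    """
    if padding:
        cube = np.pad(cube, ((padding, padding), (padding, padding), (0, 0)))
    H, W, _ = cube.shape
    K, kh, kw, _ = kernels.shape
    out_h = (H - kh) // stride + 1
    out_w = (W - kw) // stride + 1
    out = np.zeros((out_h, out_w, K))
    for i in range(out_h):
        for j in range(out_w):
            patch = cube[i * stride:i * stride + kh, j * stride:j * stride + kw, :]
            # Each filter is applied to the full patch across all input channels.
            out[i, j, :] = np.tensordot(kernels, patch, axes=([1, 2, 3], [0, 1, 2])) + biases
    return np.maximum(out, 0.0)  # ReLU on every feature-map value

def max_pool(cube, window=2, stride=2):
    """Pooling layer: sliding window that keeps the maximum of each region."""
    H, W, K = cube.shape
    out_h = (H - window) // stride + 1
    out_w = (W - window) // stride + 1
    out = np.zeros((out_h, out_w, K))
    for i in range(out_h):
        for j in range(out_w):
            region = cube[i * stride:i * stride + window, j * stride:j * stride + window, :]
            out[i, j, :] = region.max(axis=(0, 1))
    return out

def dense_softmax(cube, weights, bias):
    """Fully connected layer on the flattened cube with a softmax output."""
    logits = cube.reshape(-1) @ weights + bias
    logits = logits - logits.max()  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

rng = np.random.default_rng(0)
patch = rng.normal(size=(8, 8, 4))                 # toy patch: 8 x 8 pixels, 4 bands
kernels = rng.normal(size=(3, 3, 3, 4)) * 0.1      # 3 filters of size 3 x 3
feature_maps = conv_layer(patch, kernels, np.zeros(3), stride=1, padding=1)
pooled = max_pool(feature_maps, window=2, stride=2)
weights = rng.normal(size=(pooled.size, 5)) * 0.1  # 5 hypothetical land-cover classes
class_probs = dense_softmax(pooled, weights, np.zeros(5))
print(feature_maps.shape, pooled.shape, class_probs.shape)  # (8, 8, 3) (4, 4, 3) (5,)
```

The convolution keeps the spatial size here only because a padding of 1 is used with 3 × 3 kernels; the pooling step then halves each spatial dimension.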

Table 8.

Summary of review of HSI classification using deep learning—CNN.

Year | Method used | Dataset and COA | Research remarks and future scope
2015 | Convolutional neural network and multilayer perceptron (CNN-MLP) [120] | Pavia city: 99.91%, UP: 99.62%, SV: 99.53%, IP: 98.88% | Far better than SVM, RBF mixed classifiers, the effective convergence rate can be useful for large datasets. Future scope: detection of human behavior from hyperspectral video sequences
2016 | 3D-CNN [121] | IP: 98.53%, UP: 99.66%, KSC: 97.07% | A landmark in terms of quality and overall performance. Future scope: mapping performance to be accelerated by postclassification processing
2016 | Spectral-spatial feature-based classification (SSFC) [122] | Pavia center: 99.87%, UP: 96.98% | More accurate than other methods. Future scope: inclusion of optimal observation scale for improved outcome
2016 | CNN-based simple linear iterative clustering (SLIC-CNN) [123] | KSC: 100%, UP: 99.64%, IP: 97.24% | Deals with a limited dataset; uses spectral and local-spatial probabilities as an enhanced estimate in the Bayesian inference
2017 | Pixel-pair feature enhanced deep CNN (CNN-PPF) [124] | IP: 94.34%, SV: 94.8%, UP: 96.48% | Overcomes the significant parameter and bulk-data problems of DL, PPFs make the system unique and reliable, and voting strategy makes the more enhanced evaluations in classification
2017 | Multiscale 3D deep convolutional neural network (M3D-DCNN) [125] | IP: 97.61%, UP: 98.49%, SV: 97.24% | Outperforms popular methods like RBF-SVM and combinations of CNNs. Future scope: removing data limitations and improving the network architecture
2018 | 2D-CNN, 3D-CNN, recurrent 2D-CNN (R-2D-CNN), and recurrent 3-D-CNN (R-3D-CNN) [126] | IP: 99.5%, UP: 99.97%, Botswana: 99.38%, PaviaC: 96.79%, SV: 99.8%, KSC: 99.85% | R-3D-CNN outperforms all other CNNs mentioned and proves to be very potent in both fast convergence and feature extraction but suffers from the limited sample problem. Future scope: applying prior knowledge and transfer learning
2019 | 3D lightweight convolutional neural network (CNN) (3D-LWNet) [127] | UP: 99.4%, IP: 98.87%, KSC: 98.22% | Provides irrelevance to the sources of data. Future scope: architecture is to be improvised by intelligent algorithms
2020 | Hybrid spectral CNN (HybridSN) [128] | IP: 99.75%, UP: 99.98%, SV: 100% | Removes the shortfalls of passing over the essential spectral bands and complex, the tedious structure of 2D-CNN and 3D-CNN exclusively and outruns all other contemporary CNN methods superiorly, like SSRN and M-3D-CNN
2020 | Heterogeneous TL based on CNN with attention mechanism (HT-CNN-attention) [129] | SV: 99%, UP: 97.78%, KSC: 99.56%, IP: 96.99% | Efficient approach regardless of the sample selection strategies chosen
2020 | Quantum genetic-optimized SR based CNN (QGASR-CNN) [27] | UP: 91.6%, IP: 94.1% | With enhanced accuracy, overfitting and “salt-and-pepper” noise are resolved. Future scope: improvement of operational performance by the relation between feature mapping and selection of parameters
2020 | Rotation-equivariant CNN2D (reCNN2D) [130] | IP: 97.78%, UP: 98.89%, SV: 98.18% | Provides robustness and optimal generalization and accuracy without any data augmentation
2020 | Spectral-spatial dense connectivity-attention 3D-CNN (SSDANet) [131] | UP: 99.97%, IP: 99.29% | Higher accuracy but high computational hazard. Future scope: optimization by using other efficient algorithms

Figure 9.

The RNN structure with recurrent neurons.

Table 9.

Summary of review of HSI classification using deep learning—RNN.

Year | Method used | Dataset and COA | Research remarks and future scope
2017 | Gated recurrent unit-based RNN with parametric rectified tanh as activation function (RNN-GRU-pretanh) [132] | UP: 88.85%, HU: 89.85%, IP: 88.63% | An enhanced model that utilizes the intrinsic feature provided by HS pixels with better accuracy than SVM. The study is limited to only spectral features. Future scope: incorporation of deep end-to-end convolutional RNN with both spatial-spectral features
2019 | Spectral-spatial cascaded recurrent neural network (SSCasRNN) [135] | IP: 91.79%, UP: 90.30% | Outruns pure RNN and CNN models due to the perfect placement of convolutional and recurrent layers to explore joint information
2020 | Geometry-aware deep RNN (Geo-DRNN) [136] | UP: 98.05%, IP: 97.77% | Due to encoding the complex geometrical structures, the data lack space. Future scope: minimization of memory-occupation
2021 | 2D and 3D spatial attention-driven recurrent feedback convolutional neural network (SARFNN) [28] | IP: 99.15%, HU: 86.05% | Integrating attention and feedback mechanism with recurrent nets in two layers, 2D and 3D, enables efficient accuracy

Figure 10.

The detailed DBN structure.

Table 10.

Summary of review of HSI classification using deep learning—DBN.

Year | Method used | Dataset and COA | Research remarks
2015 | Deep belief network and logistic regression (DBN-LR) [137] | IP: 95.95%, Pavia City: 99.05% | The drawback in training time complexity, it is super-fast testing, and result generating capability outperforms RBF-SVM with EMP
2019 | Spectral-adaptive segmented deep belief network (SAS-DBN) [138] | UP: 93.15%, HU: 98.35% | Capable of addressing the complexities and other subsidiaries of limited samples
2020 | Conjugate gradient update-based DBN (CGDBN) [139] | UP: 97.31% | Better approach towards stability and convergence of the training model. High time complexity

In a GAN model, let D and G denote the discriminator and the generator, respectively, where G maps the noise space θ to the real data space x. G(θ) denotes the fake output generated by G, and D(y) and D(G(θ)) are D's outputs for real and fake training samples, respectively. Pθ(θ) and Pd(y) represent the input noise distribution and the original data distribution, respectively, with θ ∼ Pθ(θ) [141], as shown in Figure 11.

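The loss function for D: $L_D = \max \left[ \log D(y) + \log\left(1 - D(G(\theta))\right) \right]$. (28)
The loss function for G: $L_G = \min \left[ \log D(y) + \log\left(1 - D(G(\theta))\right) \right]$. (29)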

Figure 11.

The GAN architecture.

Combining equations (28) and (29), the total loss of the entire dataset represented by the min-max value function is given by

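$\min_G \max_D V(D, G) = \min_G \max_D \left\{ \mathbb{E}_{y \sim P_d(y)}\left[\log D(y)\right] + \mathbb{E}_{\theta \sim P_\theta(\theta)}\left[\log\left(1 - D(G(\theta))\right)\right] \right\}$. (30)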
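
The value function in equation (30) and the alternating updates described for adversarial training can be sketched on a one-dimensional toy problem. Everything below is an illustrative assumption rather than a method from the reviewed papers: the discriminator is a single logistic unit with parameters `w` and `c`, the generator merely shifts the noise by a parameter `b`, the real data are Gaussian samples, and the gradients are written out by hand.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def discriminator(x, w, c):
    """Toy discriminator D(x): a single logistic unit (assumption)."""
    return sigmoid(w * x + c)

def generator(theta, b):
    """Toy generator G(theta): shifts the noise by b (assumption)."""
    return theta + b

def value_function(d_real, d_fake):
    """Sample estimate of V(D, G) = E[log D(y)] + E[log(1 - D(G(theta)))]."""
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

def d_step(y, theta, w, c, b, lr):
    """Gradient ascent on V with respect to (w, c); the generator is held constant."""
    fake = generator(theta, b)
    d_real, d_fake = discriminator(y, w, c), discriminator(fake, w, c)
    grad_w = np.mean((1.0 - d_real) * y) - np.mean(d_fake * fake)
    grad_c = np.mean(1.0 - d_real) - np.mean(d_fake)
    return w + lr * grad_w, c + lr * grad_c

def g_step(theta, w, c, b, lr):
    """Gradient descent on E[log(1 - D(G(theta)))] w.r.t. b; the discriminator is held constant."""
    d_fake = discriminator(generator(theta, b), w, c)
    grad_b = np.mean(-d_fake * w)
    return b - lr * grad_b

rng = np.random.default_rng(0)
y = rng.normal(loc=4.0, scale=1.0, size=256)   # stand-in for real samples y ~ P_d
theta = rng.normal(size=256)                   # noise samples theta ~ P_theta
w, c, b = 0.1, 0.0, 0.0                        # arbitrary starting parameters

def current_value(w, c, b):
    return value_function(discriminator(y, w, c),
                          discriminator(generator(theta, b), w, c))

v_before = current_value(w, c, b)
w, c = d_step(y, theta, w, c, b, lr=0.05)      # D improves, G held constant
v_after_d = current_value(w, c, b)
b_new = g_step(theta, w, c, b, lr=0.05)        # G improves, D held constant
v_after_g = current_value(w, c, b_new)

# Repeating the two steps gives the alternating (adversarial) training loop.
w_t, c_t, b_t = w, c, b_new
for _ in range(50):
    w_t, c_t = d_step(y, theta, w_t, c_t, b_t, lr=0.05)
    b_t = g_step(theta, w_t, c_t, b_t, lr=0.05)
```

In this sketch the discriminator step raises the estimated value and the generator step lowers it, which mirrors the max over D and the min over G in equation (30).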

GAN is a generative modeling neural network architecture based on adversarial training: it uses a model to generate new instances that could plausibly have been drawn from an existing sample distribution. Hence, GANs have become popular for classifying HSIs, as they compensate for the shortage of data and classify the data effectively. There are several types of GANs: simple types such as conditional GAN, vanilla GAN, and deep convolutional GAN, and complex types such as Pix2Pix GAN, CycleGAN, StackGAN, and InfoGAN [142]. These may be very useful for images like HSIs as they can deal with related issues. The research works based on the GAN are listed in Table 11.

Table 11.

Summary of review of HSI classification using deep learning—GAN.

Year | Method used | Dataset and COA | Research remarks and future scope
2018 | Hyperspectral 1D generative adversarial networks (HSGAN) [140] | IP: 83.53% | Outperforms CNN, KNN, etc.
2018 | 3D augmented GAN [143] | SV: 93.67%, IP: 91.1%, KSC: 98.12% | Data augmentation solved the problem of overfitting and improved class accuracy
2019 | Conditional GAN with conditional variational AE (CGAN-CVAE) [144] | UP: 83.85%, DC Mall: 89.36% | Semi-supervised and ensemble prediction technique ensures the model's training under limited sample conditions
2020 | Semi-supervised variational GAN (SSVGAN) [145] | UP: 84.35%, Pavia Center: 97.15%, DC Mall: 92.21%, Jiamusi: 64.76% | Outperforms other GAN variants, that is, CVAEGAN and ACGAN, but it suffers from feature matching, overfitting, and convergence problem. Future scope: correction through metric learning method
2020 | Spectral-spatial GAN-conditional random field (SS-GANCRF) [146] | IP: 96.3%, UP: 99.31% | Enhanced classification capability. Future scope: creating an end-to-end training system, graph constraint placed on the convolutional layers
2021 | Adaptive weighting feature-fusion generative adversarial network (AWF2-GAN) [147] | IP: 97.53%, UP: 98.68% | Exploration of the entire joint feature space and fusion of them, joint loss function, and the central loss gained intraclass sensitivity from local neighboring areas and offered an efficient spatial regularization outcome
2021 | Variational generative adversarial network with crossed spatial and spectral interactions (CSSVGAN) [148] | IP: 93.61%, UP: 99.11%, SV: 97% | Increased classification potential by utilizing transformer and GAN

4.7. Transfer Learning (TL)

It is currently one of the hottest topics in interactive learning, and much of it remains to be explored. It is an approach in which knowledge gained in one or more source tasks is transferred and used to improve the learning of a related target task. TL is represented diagrammatically in Figure 12 and mathematically as follows:

Figure 12.

The principle of transfer learning.

A domain D is represented as {X, P(X)}, with X = {x1,…, xn}, xi ∈ X, where X denotes the feature space and P(X) is the marginal probability distribution of the sample data points X [149].

A task T is represented as {Y, P(Y|X)} = {Y, Φ}, with Y = {y1,…, yn}, yi ∈ Y, where Y is the label space and Φ is the predictive objective function, which is learned from (feature vector, label) pairs (xi, yi), xi ∈ X, yi ∈ Y, and computed as the conditional probability.

For every feature vector in D, Φ predicts its corresponding label as Φ(xi) = yi.

Let DS and DT be the source and target domains and TS and TT the source and target tasks, respectively, with DS ≠ DT and TS ≠ TT. TL aims to learn P(YT|XT), that is, the target conditional probability distribution in DT, using the knowledge obtained from DS and TS.
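
As a hedged illustration of this definition, the sketch below learns a predictive function on a source domain and reuses its parameters as the starting point for a target domain with only a few labeled samples. The two-class Gaussian data, the logistic-regression model, and all function names (`make_domain`, `train_logreg`, `accuracy`) are assumptions chosen for brevity; the reviewed TL methods use far richer models.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(X, y, w, b, steps, lr=0.1):
    """Gradient descent on the logistic loss, starting from the given (w, b)."""
    for _ in range(steps):
        p = sigmoid(X @ w + b)
        w = w - lr * X.T @ (p - y) / len(y)
        b = b - lr * np.mean(p - y)
    return w, b

def accuracy(X, y, w, b):
    return np.mean((sigmoid(X @ w + b) > 0.5) == y)

def make_domain(rng, n, centers):
    """Two-class Gaussian blobs; the class centers fix the marginal P(X) of the domain."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, 2)) + np.asarray(centers)[y]
    return X, y

rng = np.random.default_rng(0)
# Source domain D_S: plenty of labeled samples.
Xs, ys = make_domain(rng, 500, [(-2.0, -2.0), (2.0, 2.0)])
# Target domain D_T: shifted feature distribution, only 10 labeled samples.
Xt, yt = make_domain(rng, 10, [(-1.5, -2.5), (2.5, 1.5)])
Xt_test, yt_test = make_domain(rng, 200, [(-1.5, -2.5), (2.5, 1.5)])

# Learn the source predictive function, then transfer it as the initialization.
w_src, b_src = train_logreg(Xs, ys, np.zeros(2), 0.0, steps=200)
w_tgt, b_tgt = train_logreg(Xt, yt, w_src, b_src, steps=20)   # brief fine-tuning on D_T

acc = accuracy(Xt_test, yt_test, w_tgt, b_tgt)
print(f"target accuracy after transfer: {acc:.3f}")
```

Starting the target training from `w_src` and `b_src` instead of zeros is what carries the source knowledge over; only a short fine-tuning run on the target samples is then needed in this toy setting.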

Traditional learning is isolated: each model is trained for a particular task on a particular dataset, independently of the others, and no knowledge that could be passed from one model to another is retained. TL, in contrast, has the human-like capability of transferring knowledge; that is, knowledge from previously trained models can be leveraged to train new models, which makes training faster and more accurate and reduces the amount of training data required. Table 12 summarizes the research works on transfer learning.

Table 12.

Summary of review of HSI classification using transfer learning.

Year | Method used | Dataset and COA | Research remarks and future scope
2018 | Deep mapping-based heterogeneous transfer learning model (DLTM) [150] | Washington DC Mall: 96.25% | Capable of binary classification. Future scope: improvisation to multiclass classification
2018 | AL with stacked sparse autoencoder (AL-SSAE) [151] | UP: 99.48%, center of Pavia: 99.8%, SV: 99.45% | Domains, both source, and target possess finely tuned hyperparameters. Future scope: architectural parameters need to be modified further to enhance the classification accuracy
2020 | Heterogeneous TL based on CNN with attention mechanism (HT-CNN-attention) [152] | SV: 99%, UP: 97.78%, KSC: 99.56%, IP: 96.99% | Efficient approach regardless of the sample selection strategies chosen
2020 | ELM-based ensemble transfer learning (TL-ELM) [26] | UP: 98.12%, Pavia center: 96.25% | Efficient accuracy and transferability with high training speed. Future scope: inclusion of SuperPCA and knowledge transfer
2020 | Lightweight shuffled group convolutional neural network (abbreviated as SG-CNN) [153] | Botswana: 99.67%, HU: 99.4%, Washington DC: 97.06% | Fine-tuned model as compared to CNN architectures, low computational cost for training. Future scope: inclusion of more grouped convolutional architectures
2021 | Super-pixel pooling convolutional neural network with transfer learning (SP-CNN) [154] | SV: 95.99%, UP: 93.18%, IP: 94.45% | More excellent parameter optimization with more accuracy using a limited number of samples and in a very short period for both training and testing. Future scope: optimal super-pixel segmentation and merging with different CNN architectures

5. Discussion

Based on the reviewed articles, we can draw the desired inferences that provide answers to the investigative questions mentioned in Section 2 and show the clear motive and benefits of this review.

RI 1: What is the significance of traditional ML and DL for analyzing HSI?

Ans: Hyperspectral data have certain restrictions, as cited in Section 1. Statistical classifiers initially addressed them, but the operations and analysis became much easier and more accurate once machine-dependent ML/DL strategies were introduced [155, 156]. The general advantages that ML/DL algorithms offer researchers dealing with HSIs are as follows: (i) easy handling of high-dimensional data, that is, the troubles of the Hughes phenomenon are removed [115, 125]; (ii) equal capability with labeled and unlabeled samples [99, 150]; (iii) precise and meticulous choice of features [51, 127]; (iv) highly precise models for real hypercubes and hence top-notch classification accuracy [119, 154]; (v) removal of overfitting, noise, and other hurdles to a much greater extent [120, 147]; (vi) embedded spatial-spectral feature extraction and selection units [119, 133]; (vii) imitation of the human brain to solve multiclass problems [136, 138].

RI 2: How are ML/DL more impactful on HSI than other non-ML strategies?

Ans: Early work on hyperspectral data suffered from the limitations of the data. In the preliminary research stage, scientists followed the traditional methodology for classifying HSIs, that is, preprocessing (if required), extraction and selection of discriminative characteristics, and then running a classifier on those features to identify the land cover groups. Hence, they emphasized feature extraction techniques such as PCA [9], ICA [10], and wavelets [13], assisted by basic classifiers such as extended morphological profiles [2, 157], NN [158, 159], logistic regression [160], edge-preserving filters [10, 161], density functions/matrices [162], and Bayes law of classification [163, 164]. Although simple in structure and design and easy to implement, these classic mathematics-oriented techniques were not enough to handle data as voluminous as HSI. They also could not predict multiclass problems well enough, which is essential for HSI datasets, whose land covers belong to multiple classes of regions. In addition, these methods were not accurate in feature selection and extraction, nor could they cope with the storage of such bulk data. For these reasons, researchers struggled to analyze, process, and classify HSIs properly. In contrast, advances in ML/DL technologies have opened a broad avenue of research that researchers are still exploring, combining methods in different groupings to address the real-life HSI classification problem and the limitations mentioned above [26, 131]. The advantages and disadvantages of the ML and non-ML strategies applied for HSI classification are tabulated in Table 13.

RI 3: What are the advantages and challenges faced by the researchers for the chosen ML/DL-based algorithm for HSI classification?

Ans: The advantages and challenges of the ML- and DL-based techniques are listed in Table 14.

Table 13.

Comparison between ML and non-ML techniques for HSI classification.

Methods | Advantages | Disadvantages
Classical state-of-art techniques | (i) Simple structure and design; (ii) Less time consumption; (iii) Easy to implement; (iv) Dimension handling skillfully by PCA and ICA; (v) Better binary and moderate multiclass classification by kernel and SVM | (i) High space complexity due to the storage of bulk data; (ii) Based on empirical identities, hence a tedious workpiece; (iii) Feature selection and extraction are not accurate; (iv) Suffers from limited labeled sample problem, Hughes phenomenon, and noise
Advanced machine learning techniques | (i) Easy dealing with high-dimensional data, that is, troubles of Hughes phenomenon removed; (ii) Equally manipulative to labeled and unlabeled samples; (iii) Precise and meticulous choice of features; (iv) High-end-precise models to deal with real hypercubes, hence, top-notch classification accuracy; (v) Removes overfitting, noises, and other hurdles to a much greater extent; (vi) Mimics the human brain to solve multiclass problems | (i) The construction of the model is difficult due to its complex network-alike structure; (ii) High time complexity due to training and testing of the huge amount of raw HSI data; (iii) Extremely expensive design; (iv) Strenuous to implement

RI 4: What are the emerging literary works of ML/DL on HSI classification in the year 2021?

Ans: Among recent years, 2021 appears especially promising in terms of technical advances on this problem. New techniques, including hybrid ones, are emerging that take the solution to a new level. An MRF with a band-weighted discrete spectral mixture model (MRF-BDSMM) in a Bayesian framework is proposed in [165]; it is an unsupervised adaptive approach that accommodates heterogeneous noise and finds the abundant labeled subpixels to extract joint features. A combination of kernel-based ELM with PCA, local binary pattern (LBP), and the gray-wolf optimization algorithm (PLG) is proposed as a novel methodology that reduces the large dimensionality, seeks global and local spatial features, and optimizes the KELM parameters to obtain the class labels [166]. A variant of SRC, dual sparse representation graph-based collaborative propagation (DSRG-CP), is proposed in [167]; it separates the spatial and spectral dimensions, each with its own graph, and combines the outcomes to improve labeling with limited samples. AL remains one of the hot topics: it is integrated with a Fredholm kernel regularized model (AMKFL) that enables better labeling than manual labeling, even for noisy images [168]; it is tied to DL with augmentation of training samples to label the uncertain hypercubes accurately (ADL-UL) [169]; it facilitates iterative training sample augmentation by expanding the hypercubes and adding discriminative joint features (ITSA-AL-SS) [170]; and it extracts locally unique multiscale spatial characteristics from super-pixels (MSAL) [171]. Attention-based CNNs are proposed in [172, 173]: the former (SSAtt-CNN) combines two attention subnetworks, spatial and spectral, with CNN as the base, and the latter (FADCNN) is a dense spectral-spatial CNN with a feedback attention technique that sets the band weights for better mining and utilization of dominant features.
GAN is one of the most exploited methods to date: [174] proposes the full utilization of shallow features from the unlabeled bands through a multitasking network (MTGAN); in [175], the discriminator is based on a capsule network and convolutional long short-term memory to extract less visible features and integrate them into high-level contextual characteristics (CCAPS-GAN); 1D and 2D CapsGAN together form a dual-channel spectral-spatial fusion capsule GAN (DcCaps-GAN) in [176]; and generative adversarial minority oversampling for 3D hypercubes (3D-HyperGAMO) is presented in [177], which focuses on minority-class features, using existing ones to label and classify them properly.

RI 5: How are ML- and DL-based hybrid techniques helping scientists in HSI classification?

Ans: Since HSIs first emerged, their analysis and information extraction have faced many hurdles. The large number of highly correlated bands and the rich spatial-spectral feature signatures across the electromagnetic spectrum embedded in them have always been a matter of concern. Thus, finding an appropriate technology for classifying such interconnected, feature-dense, high-dimensional images is tedious and strenuous. The classification methods chosen so far have mostly been either supervised, which require a sufficient number of quality-labeled samples, or unsupervised, in which the lack of coherence between the spectral clusters and the target regions prevents the desired accuracy from being obtained. To overcome such problems, a semi-supervised method is needed that combines supervised and unsupervised methods; it is named the hybrid method. A hybrid method is advantageous in robustness and flexibility towards high-dimensional data.

The hybrid methods have the following benefits:

  1. Specifically designed to overcome the limitations and take advantage of the methodologies involved in the concerned hybrid to achieve a deep, rich, and insightful conclusion (general).

  2. Addressing and resolving multiple issues in handling and analyzing HSI data at the same time, depending on the methods chosen for hybridizing [179–183].

  3. Coherence in time, space, and cost complexities [184–186].

  4. Better interpretability, quality, and effectiveness, leading to the construction of a more refined framework [180, 182, 183, 187–194].

  5. Deterministic spectral, spatial, and contextual feature extraction, reduction, and selection, combined to achieve the desired accuracy and performance [182, 183, 187, 188, 195–197].

ML, being a standard versatile technology, can merge with traditional techniques like PCA for its benefit. As stated in [195, 198], PCA is exploited for feature extraction, selection, and reduction to achieve higher accuracy and performance quality. PCA is considered one of the best preprocessing methods to date for improved spectral dimension reduction [180], proper selection of spectral bands and their multiscale features in a segmented format [181, 199], noise-reduced spectral analysis [27], and feature extraction [130, 196]. PCA in collaboration with SVM [195, 200], with DL for feature reduction and better classification [182, 183], with CNN for multiscale feature extraction [188, 189], and with sparse tensor technology [190] has been highly regarded. All these recent collaborations, along with the merging of ICA-DCT with CNN cited in [191], are evidence that although PCA is categorized under traditional methods, it remains highly relevant for handling HSIs.
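
As a rough illustration of PCA used as a spectral preprocessing step, the sketch below projects every pixel spectrum of a hypercube onto its leading principal components. The function name `pca_reduce_bands` and the synthetic cube (30 correlated bands generated from 3 latent signals) are assumptions for demonstration; real pipelines in the cited works differ in detail.

```python
import numpy as np

def pca_reduce_bands(cube, n_components):
    """Project each pixel spectrum onto the top principal components.

    cube: (H, W, B) hypercube with B spectral bands.
    Returns the reduced cube (H, W, n_components) and the explained-variance
    ratio of each retained component.
    """
    H, W, B = cube.shape
    pixels = cube.reshape(-1, B)               # one row per pixel spectrum
    centered = pixels - pixels.mean(axis=0)
    # SVD of the centered pixel matrix; rows of vt are the principal directions.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    reduced = centered @ vt[:n_components].T
    return reduced.reshape(H, W, n_components), explained[:n_components]

rng = np.random.default_rng(0)
# Toy cube: 30 highly correlated bands built from 3 latent signals plus small noise.
latent = rng.normal(size=(20 * 20, 3))
mixing = rng.normal(size=(3, 30))
cube = (latent @ mixing + 0.01 * rng.normal(size=(400, 30))).reshape(20, 20, 30)

reduced, ratio = pca_reduce_bands(cube, n_components=3)
print(reduced.shape, ratio.sum())
```

Because the toy bands are driven by only three latent signals, three components retain almost all of the variance; on real HSI data the number of components has to be chosen empirically.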

Some other hybridizations are also explored by researchers, such as SRC with mathematical index of divergence-correlation [192], Gabor-cube filter [193], and ELM [83, 85]; ELM with CNN [86] and TL [26]; AL based on super-pixel profile [201, 202], AL with CNN [203], CapsNet [204], CNN [204, 205], and TL [151, 184]; CNN with attention-aided methodology [172, 173, 185] and GAN [186]; GAN with dynamic neighborhood majority voting mechanism [194, 197], CapsNet [175, 176, 206, 207]; and TL with MRF [70]. These articles report robust performance while mitigating the computational complexity imposed by raw HSI data, building stronger models that achieve higher accuracy.

RI 6: What are the latest emerging techniques associated with addressing classifying HSIs?

Ans: The following are the most recent research directions that have opened new paths for this purpose:

  1. DSVM: This recent concept combines DL with the traditional kernel SVM. It stacks four deep layers of kernels, namely exponential and Gaussian radial basis functions (ERBF and GRBF), neural, and polynomial, with SVMs as the hidden layer units [208]. This approach has outperformed several efficient DL methods, with nearly 100% accuracy for the IP and UP datasets.

  2. Conditional Random Fields (CRFs): These are a structured generalization of multinomial logistic regression in the form of graphical models, based on a priori continuity: neighboring pixels with analogous spectral signatures are assumed to share the same labels. They extensively explore the hidden spectral-contextual information. In [146], a CRF is combined with a semi-supervised GAN whose trained discriminators produce softmax predictions that are guided by dense CRF graph constraints to improve HSI classification maps. A collaboration between 3D-CNN and CRF is proposed in [209] to make a deep CRF capable of extracting the semantic correlations between patches of hypercubes through CNN-based unary and pairwise potential functions. A semi-supervised approach is presented in [210], embedding subspace learning and a 3D convolutional autoencoder to remove redundancy in joint features and obtain class sets using an iterative algorithm. In [211], a CRF with Gaussian edge potentials associated with deep metric learning (DML) classifies HSI data pixelwise using the geographical distances between pixels and the Euclidean distances between the features. A novel framework using an HSI feature learning network (HSINet) with CRF is proposed in [212]; it is a trainable end-to-end DL model with backpropagation that extracts joint features, edges, and colors based on subpixels, pixels, and super-pixels. In [213], a decision fusion model including CRF and MRF is built based on sparse unmixing and soft classifier outputs.

  3. Random Forest (RF): It is an efficient algorithm that ensembles classification and regression trees. It makes the HSI classification model noise-tolerant, inherently multiclass, robust, parallelizable, and fast. In [214], RF is compared to the DL algorithm, which outshined the classification accuracy. A new cascaded RF framework is shown in [215] that uses a boosting strategy to generate and train base classifiers and the Hierarchical Random Subspace Method to select features and suitable base classifiers based on the diversity of the features. A novel combination of semi-supervised learning, AL, and RF is featured in [216], where queries based on spatial information are fed to AL, and the labeled samples are then classified by RF through semi-supervision. A deep cube CNN model that extracts pixelwise joint features, which are then classified by RF, is presented in [217, 218].

  4. Graph Convolutional Network (GCN): A descendant of CNN designed to generalize convolution to graph-structured data. It consists of three steps: feature aggregation, feature transformation, and classification. Being well suited to graph modeling, it captures the spatial interrelations between the classes. In [219], the distinct features collected from CNN and GCN are fused in additive, elementwise, and concatenated ways. A new framework of globally consistent GCN is introduced in [220], which first generates a spatial-spectral locally optimized graph whose global high-order neighbors obtain enriched contextual information through topologically consistent graph connectivity; finally, those global features determine the classes. The dual GCN network of [221] works with a limited number of training samples: the first GCN extracts all the significant features and the second learns the label distribution. A novel idea of deep attention GCN is introduced in [222], based on a similarity criterion that mixes a kernel-spectral angle mapper and spectral information divergence to group analogous spectra. A collaboration between CNN and GCN is presented in [223] to extract pixelwise and super-pixelwise joint features by learning small-scale regular regions and large-scale irregular regions.
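
The three GCN steps named in item 4 (feature aggregation, feature transformation, and classification) can be sketched as follows. The symmetric normalization with self-loops used here is one common choice and is an assumption, as are the four-node chain graph, the random features, and the function names; none of this reproduces a specific reviewed model.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} with self-loops (assumed choice)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_norm, features, weights):
    aggregated = a_norm @ features   # step 1: aggregate features over graph neighbors
    return aggregated @ weights      # step 2: transform the aggregated features

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Toy graph: 4 nodes (for example pixels or super-pixels) connected in a chain.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 6))      # 6 hypothetical spectral features per node
w1 = rng.normal(size=(6, 8)) * 0.1      # untrained weights, for shape illustration only
w2 = rng.normal(size=(8, 3)) * 0.1      # 3 hypothetical classes

a_norm = normalize_adjacency(adj)
hidden = np.maximum(gcn_layer(a_norm, features, w1), 0.0)   # ReLU between layers
probs = softmax(gcn_layer(a_norm, hidden, w2))              # step 3: classification
labels = probs.argmax(axis=1)
print(probs.shape, labels.shape)
```

Since the weights are random, the predicted labels carry no meaning; the sketch only shows how neighbor information enters each node's prediction through the normalized adjacency matrix.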

6. Conclusion

This article describes the various technologies and procedures used for HSI classification from its inception to date. As mentioned above, there are many barriers to dealing with high-band data such as HSI. Despite that, many researchers have taken an interest in this field over the last decade, improving the existing techniques or inventing new ones. With the considerable improvement in technologies and the introduction of ML into HSI classification, classification has become more accurate than with traditional and contemporary state-of-the-art methodologies. As a result, DL has emerged as the most prominent tool for HSI classification in the last half of this decade. The more researchers focused on this, the more they explored remote sensing and space imagery features.

This review article presents, for every method and its submethods, information about performance, research gaps, and achievements. In addition, it introduces a novel research methodology that distinguishes this work from others. After going through each methodology's details, the most significant inferences have been drawn, which add further novelty to our work. It also shows future researchers a path for choosing an appropriate technique and its alternatives, which sets it apart from other contemporary review works on this subject. Furthermore, it provides the details of the most recent research scenario on HSI classification and some of the currently developed techniques that might be highly useful in future research. Our study is unique and novel in several aspects: (1) it includes the research works carried out in the last decade, that is, 2010–2020, and the most recent papers of the previous year, i.e., 2021, as mentioned in Section 3; (2) the number of papers referred to here is above 200, outnumbering other review papers; (3) the review is carried out by selecting the most appropriate papers solely dedicated to our subject of interest, that is, machine learning techniques serving the purpose of hyperspectral image classification, and the findings from those works are systematically arranged in tabular format (Tables 1–12); (4) the objective behind this review work is expressed by RI 1–6, which also provide a clear view of the recent technological advances and applications that researchers are developing; (5) Table 14 provides an explicit idea of the pros and cons of each ML technique described in this manuscript when applied for classifying hyperspectral images, which will help researchers in their future research; and (6) a researcher who wishes to write a literature review can follow our proposed methodology, which depicts the flow of work in a methodical way [224].

Table 14.

The advantages and challenges of the ML- and DL-based techniques for HSI classification.

| ML/DL techniques | Advantages | Challenges |
|---|---|---|
| Support vector machine | (i) Robust to outliers, the Hughes effect, and high dimensionality, since dimensionality reduction is not strictly necessary [32, 41, 43] | (i) Works very well for binary classification but fails to generate accurate classes for multiclass problems [31] |
| | (ii) Supports supervised, semi-supervised, and unsupervised problems with less overfitting risk [24, 33, 37, 44] | (ii) Training time is high for high-class datasets such as HSI [31, 32] |
| | (iii) The sigmoid-kernel form handles unlabeled and unstructured HSI datasets better than the earlier forms [35, 40–42] | (iii) Difficulty in fine-tuning the parameters [41, 42] |
| | (iv) Capable of solving both binary and multiclass classification problems, outperforming several methods [39] | (iv) Complex interpretability [33, 35] |
| | (v) Performance can improve when assisted by other supporting methods [36, 40–42] | (v) Lack of easy generalization to datasets with multiple classes [33, 35] |
| | | (vi) Complexity in building the model due to a lack of sufficient labeled samples [31, 32] |
| Sparse representation and classification | (i) A dictionary with relevant data is used for learning with a minimal number of optimal parameters [45, 46] | (i) Building the dictionary incurs high overhead costs [50] |
| | (ii) Builds precise and powerful classification models with higher interpretability through sparse coding [49, 50, 54] | (ii) The dictionary or the coding might cause loss of information [48, 178] |
| | (iii) Optimized memory usage [53, 55, 178] | (iii) Difficulty in representing high-resolution image data such as HSI through a sparse matrix [47, 48] |
| | (iv) Reduces the estimated variance between the classes to produce better outcomes [49, 56, 178] | |
| Markov random field | (i) Works well for a wide range of unstructured problems, with no direct dependency between classes and parameters [67, 69] | (i) Normalizing high-dimensional data can be laborious [63, 70] |
| | (ii) Better denoising effect [59] | (ii) Suffers from a lack of undirected training data, which may not be possible to represent graphically [61, 62] |
| | (iii) Robust for both spatial and spectral distributions [62, 64] | (iii) Poor interpretability [63, 68] |
| | (iv) Time complexity is low due to the graphical representation of data [63] | |
| Extreme learning machines | (i) Less training time and a faster learning rate than previous methods [86] | (i) Higher computational burden [76–80] |
| | (ii) Avoids local minima and finishes the job in a single iteration [83, 87] | (ii) A wrong choice of the number of hidden-layer neurons may cause redundancy in the model and hence affect the classification accuracy [85, 86] |
| | (iii) Mitigates overfitting caused by the many bands in HSIs [83] | (iii) The algorithm still needs considerable advancement to handle HSI data well [78, 82, 86] |
| | (iv) Builds an enhanced model with better prediction performance at an optimized cost [86] | |
| | (v) Improved generalization ability, robustness, and controllability [78, 84, 85] | |
| Active learning | (i) A very efficient way of learning for both supervised and semi-supervised problems [91, 97, 101, 103] | (i) Higher computational burden [76–80] |
| | (ii) Ease in segregating the interclass and intraclass features through active query sets [91, 95, 102, 103] | (ii) A wrong choice of the number of hidden-layer neurons may cause redundancy in the model and hence affect the classification accuracy [85, 86] |
| | (iii) Training speed is comparatively high for data that are not very large in scale [103] | (iii) The algorithm still needs considerable advancement to handle HSI data well [78, 82, 86] |
| | (iv) Solid knowledge-based models can be generated [103] | |
| | (v) Achieves greater classification accuracies for unlabeled HSIs [95, 102] | |
| Deep learning | (i) Processes diverse, unstructured, and unlabeled raw HSI datasets well without requiring data preprocessing [110, 122, 125, 144] | (i) Needs a large amount of HSI data, which is practically unavailable [123, 136] |
| | (ii) Can address supervised, semi-supervised, and, in particular, unsupervised learning problems [127, 128, 137] | (ii) Extremely expensive to generate an appropriate model by training on a complex data structure such as HSIs [114, 139, 148] |
| | (iii) Dimension reduction, denoising, and feature extraction are embedded capabilities [27, 114, 124] | (iii) Low interpretability [131, 147] |
| | (iv) Addresses issues such as the Hughes phenomenon, overfitting, and convergence effectively [120, 124, 145] | (iv) Lacks a sound theoretical basis, so it is hard to see where an error occurs and how to rectify it [122, 124, 145] |
| | (v) Robust and adaptive to new features introduced in the dataset [26, 123, 145] | (v) High time and space complexity and computational burden [131, 136, 148] |
| | (vi) The hidden-layer neurons have proven effective in training the desired model with high-quality prior knowledge (DBN, RNN, CNN) [127, 129, 135, 138] | |
| | (vii) Computational efficiency with high-performance speed (CNN, SAE) [114, 115, 127, 128] | |
| | (viii) Data augmentation facility (GAN) [143, 145] | |
| Transfer learning | (i) Works as a combination of different models, whether traditional or recent machine learning techniques, which together yield a highly improved hybrid model [151, 152] | (i) Data overfitting [150] |
| | (ii) Capable of transferring knowledge from the source domain, that is, a pretrained model, to the target domain, that is, a new model, to enrich it [151, 152] | (ii) Complex structure of the model [150, 151] |
| | (iii) Greater feature extraction and selection capability [152] | (iii) Less interpretability |
| | (iv) Stable model with highly optimized parameters and hyperparameters [154] | (iv) Difficulty in implementation |
| | (v) High training speed and accuracy with low computational cost [26, 153] | |
| | (vi) Reduced computational cost and training time complexity [153, 154] | |
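
Table 14 notes that extreme learning machines avoid local minima and finish training in a single pass. A minimal sketch of that idea is given here, assuming a single hidden layer whose random weights are never updated and output weights obtained from one ridge-regularized least-squares solve. The function names, the synthetic two-class "spectra", and the values of `n_hidden` and `ridge` are illustrative assumptions and are not taken from the reviewed papers.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]  # augmented copy [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for row in range(n):
            if row != col:
                factor = aug[row][col]
                aug[row] = [rv - factor * cv for rv, cv in zip(aug[row], aug[col])]
    return [row[n:] for row in aug]


def hidden_layer(x, weights, biases):
    """Map one input vector through the fixed random hidden layer (tanh units)."""
    return [
        math.tanh(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
        for w, b in zip(weights, biases)
    ]


def train_elm(samples, labels, n_classes, n_hidden=10, ridge=1e-3, seed=0):
    """Fit output weights in one solve; hidden weights stay random."""
    rng = random.Random(seed)
    n_features = len(samples[0])
    weights = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(n_hidden)]
    biases = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    h = [hidden_layer(x, weights, biases) for x in samples]
    # One-hot targets, one column per class.
    t = [[1.0 if c == y else 0.0 for c in range(n_classes)] for y in labels]
    # Normal equations: (H^T H + ridge * I) beta = H^T T
    hth = [
        [sum(row[i] * row[j] for row in h) + (ridge if i == j else 0.0)
         for j in range(n_hidden)]
        for i in range(n_hidden)
    ]
    htt = [
        [sum(h[s][i] * t[s][c] for s in range(len(h))) for c in range(n_classes)]
        for i in range(n_hidden)
    ]
    beta = solve_linear_system(hth, htt)
    return weights, biases, beta


def predict_elm(model, x):
    """Return the class index with the largest output score."""
    weights, biases, beta = model
    h = hidden_layer(x, weights, biases)
    scores = [sum(h[i] * beta[i][c] for i in range(len(h))) for c in range(len(beta[0]))]
    return max(range(len(scores)), key=scores.__getitem__)


def make_toy_pixels(n_per_class, n_bands=4, seed=1):
    """Generate synthetic two-class 'spectra' (hypothetical data, not a real HSI scene)."""
    rng = random.Random(seed)
    samples, labels = [], []
    for label, level in enumerate((0.2, 0.8)):
        for _ in range(n_per_class):
            samples.append([level + rng.uniform(-0.05, 0.05) for _ in range(n_bands)])
            labels.append(label)
    return samples, labels


train_x, train_y = make_toy_pixels(20, seed=1)
test_x, test_y = make_toy_pixels(10, seed=2)
model = train_elm(train_x, train_y, n_classes=2)
accuracy = sum(predict_elm(model, x) == y for x, y in zip(test_x, test_y)) / len(test_y)
print(f"held-out accuracy on toy data: {accuracy:.2f}")
```

The sketch also illustrates the challenge listed in the same row of the table: `n_hidden` is the only structural choice, and a poor value either underfits or adds redundant hidden units. Real HSI pixels have hundreds of bands, so a practical implementation would replace the hand-written solver with a numerical linear algebra library.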

7. Limitations of Present Work and Its Future Scope

The study has some limitations: (i) we used a limited number of keywords in the current research; (ii) we focused on only seven popular ML techniques; (iii) we explain the emerging methodologies only briefly; and (iv) the experimental details are not fully discussed.

In future work, we would like to explore more keywords, more techniques, and more studies that offer a better understanding of other learning methods, both traditional and contemporary. In addition, there are several hybrid strategies, along with other prominent and recent ML/DL techniques, that we look forward to exploring both qualitatively and quantitatively.

Acknowledgments

Jana Shafi would like to thank the Deanship of Scientific Research, Prince Sattam bin Abdul Aziz University, for supporting this work. This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (Grant no. 2022R1C1C1004590).

Acronym

HS: Hyperspectral
HSI: Hyperspectral image
GIS: Geographic Information System
PCA: Principal component analysis
ICA: Independent component analysis
SVM: Support vector machine
SR: Sparse representation
SRC: Sparse representation and classification
MRF: Markov random field
HMRF: Hidden Markov random field
ELM: Extreme learning machine
AL: Active learning
HU: University of Houston
TL: Transfer learning
DL: Deep learning
AE: Autoencoders
SAE: Stacked autoencoders
CNN: Convolutional neural network
RNN: Recurrent neural network
DBN: Deep belief network
GAN: Generative adversarial network
IP: Indian Pines
KSC: Kennedy Space Center
SV: Salinas Valley
UP: University of Pavia

Contributor Information

Muhammad Fazal Ijaz, Email: fazal@sejong.ac.kr.

Jaeyoung Choi, Email: jychoi19@gachon.ac.kr.

Data Availability

Publicly available data are used in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  • 1. Khan M. J., Khan H. S., Yousaf A., Khurshid K., Abbas A. Modern trends in hyperspectral image analysis: a review. IEEE Access. 2018;6:14118–14129. doi: 10.1109/access.2018.2812999.
  • 2. Falco N., Benediktsson J. A., Bruzzone L. Spectral and spatial classification of hyperspectral images based on ICA and reduced morphological attribute profiles. IEEE Transactions on Geoscience and Remote Sensing. 2015;53(11):6223–6240.
  • 3. Adão T., Hruška J., Pádua L., et al. Hyperspectral imaging: a review on UAV-based sensors, data processing and applications for agriculture and forestry. Remote Sensing. 2017;9:p. 1110.
  • 4. Northcutt C., Jiang L., Chuang I. Confident learning: estimating uncertainty in dataset labels. https://www.jair.org/index.php/jair/article/view/12125
  • 5. Xu Z., Lu D., Wang Y., et al. Noisy labels are treasure: mean-teacher-assisted confident learning for hepatic vessel segmentation. Proceedings of the Medical Image Computing and Computer Assisted Intervention – MICCAI 2021; 27 September 2021; Strasbourg, France. Springer; https://link.springer.com/chapter/10.1007%2F978-3-030-87193-2_1
  • 6. Northcutt C. G., Wu T., Chuang I. L. Learning with confident examples: rank pruning for robust classification with noisy labels. https://arxiv.org/abs/1705.01936
  • 7. Ghamisi P., Yokoya N., Li J., et al. Advances in hyperspectral image and signal processing: a comprehensive overview of the state of the art. IEEE Geoscience and Remote Sensing Magazine. Dec. 2017;5(4):37–78. doi: 10.1109/mgrs.2017.2762087.
  • 8. Han T., Goodenough D. G. Investigation of nonlinearity in hyperspectral imagery using surrogate data methods. IEEE Transactions on Geoscience and Remote Sensing. Oct. 2008;46(10):2840–2847. doi: 10.1109/tgrs.2008.2002952.
  • 9. Beirami B. A., Mokhtarzade M. Band grouping SuperPCA for feature extraction and extended morphological profile production from hyperspectral images. IEEE Geoscience and Remote Sensing Letters (Early Access). 2020;17:1–5. doi: 10.1109/lgrs.2019.2958833.
  • 10. Xia J., Bombrun L., Adalı T., Berthoumieu Y., Germain C. Spectral–spatial classification of hyperspectral images using ICA and edge-preserving filter via an ensemble strategy. IEEE Transactions on Geoscience and Remote Sensing. 2016;54(8):4971–4982.
  • 11. Imani M., Ghassemian H. Principal component discriminant analysis for feature extraction and classification of hyperspectral images. Proceedings of the 2014 Iranian Conference on Intelligent Systems (ICIS); 4 February 2014; Bam, Iran. IEEE.
  • 12. Li W., Prasad S., Fowler J. E., Bruce L. M. Locality-preserving discriminant analysis in kernel-induced feature spaces for hyperspectral image classification. IEEE Geoscience and Remote Sensing Letters. 2011;8(5):894–898.
  • 13. Cao X., Yao J., Fu X., Bi H., Hong D. An enhanced 3-D discrete wavelet transform for hyperspectral image classification. IEEE Geoscience and Remote Sensing Letters (Early Access). 2020;18:1–5.
  • 14. Peng J., Chen H., Zhou Y., Li L. Ideal regularized composite kernel for hyperspectral image classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. 2017;10(4):1563–1574. doi: 10.1109/jstars.2016.2621416.
  • 15. Li J., Marpu P. R., Plaza A., Bioucas-Dias J. M., Benediktsson J. A. Generalized composite kernel framework for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing. 2013;51(9):4816–4829.
  • 16. Liu J., Wu Z., Li J., Plaza A., Yuan Y. Probabilistic-kernel collaborative representation for spatial–spectral hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing. 2016;54(4):2371–2384. doi: 10.1109/tgrs.2015.2500680.
  • 17. Kumar M. S., Keerthi V., Anjnai R. N., Sarma M. M., Bothale V. Evalution of machine learning methods for hyperspectral image classification. Proceedings of the 2020 IEEE India Geoscience and Remote Sensing Symposium (InGARSS); 1 December 2020; Ahmedabad, India. IEEE; pp. 225–228.
  • 18. Hyperspectral remote sensing scenes. http://www.ehu.eus/ccwintco/index.php/Hyperspectral_Remote_Sensing_Scenes
  • 19. Paoletti M. E., Haut J. M., Plaza J., Plaza A. Deep learning classifiers for hyperspectral imaging: a review. ISPRS Journal of Photogrammetry and Remote Sensing. 2019;158(December):279–317. doi: 10.1016/j.isprsjprs.2019.09.006.
  • 20. Chowdhary C. L., Patel P. V., Kathrotia K. J., Attique M., Perumal K., Ijaz M. F. Analytical study of hybrid techniques for image encryption and decryption. Sensors. 2020;20(18):p. 5162. doi: 10.3390/s20185162.
  • 21. Jia S., Jiang S., Lin Z., Li N., Xu M., Yu S. A survey: deep learning for hyperspectral image classification with few labeled samples. Neurocomputing. 2021;448:179–204. doi: 10.1016/j.neucom.2021.03.035.
  • 22. Quan Y., Zhong X., Feng W., Chan J. C.-W., Li Q., Xing M. SMOTE-based weighted deep rotation forest for the imbalanced hyperspectral data classification. Remote Sensing. 2021;13(3):p. 464. doi: 10.3390/rs13030464.
  • 23. Hughes G. On the mean accuracy of statistical pattern recognizers. IEEE Transactions on Information Theory. January 1968;14(1):55–63. doi: 10.1109/tit.1968.1054102.
  • 24. Pathak D. K., Kalita S. K. Spectral spatial feature based classification of hyperspectral image using support vector machine. Proceedings of the 2019 6th International Conference on Signal Processing and Integrated Networks (SPIN); 7 March 2019; Noida, India. IEEE.
  • 25. Zhou L., Ma L. Extreme learning machine-based heterogeneous domain adaptation for classification of hyperspectral images. IEEE Geoscience and Remote Sensing Letters. 2019;16(11):1781–1785.
  • 26. Liu X., Hu Q., Cai Y., Cai Z. Extreme learning machine-based ensemble transfer learning for hyperspectral image classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. 13:3892–3902.
  • 27. Chen H., Miao F., Shen X. Hyperspectral remote sensing image classification with CNN based on quantum genetic-optimized sparse representation. IEEE Access. 8:99900–99909.
  • 28. Li H. C., Li S. S., Hu W. S., Feng J. H., Sun W. W., Du Q. Recurrent feedback convolutional neural network for hyperspectral image classification. IEEE Geoscience and Remote Sensing Letters. 2021;19.
  • 29. Mian Qaisar S. Signal-piloted processing and machine learning based efficient power quality disturbances recognition. PLoS One. 2021;16(5):e0252104. doi: 10.1371/journal.pone.0252104.
  • 30. Chamasemani F., Singh Y. P. Multi-class support vector machine (SVM) classifiers -- an application in hypothyroid detection and classification. Proceedings of the 2011 Sixth International Conference on Bio-Inspired Computing; 27 September 2011; Penang, Malaysia. IEEE; pp. 351–356.
  • 31. Zhang J., Zhang Y., Zhou T. Classification of hyperspectral data using support vector machine. Proceedings of the 2001 International Conference on Image Processing (Cat. No.01CH37205); 7 October 2001; Thessaloniki, Greece. IEEE.
  • 32. Camps-Valls G., Gomez-Chova L., Calpe-Maravilla J., et al. Robust support vector method for hyperspectral data classification and knowledge discovery. IEEE Transactions on Geoscience and Remote Sensing. July 2004;42(7):1530–1542. doi: 10.1109/tgrs.2004.827262.
  • 33. Melgani F., Bruzzone L. Classification of hyperspectral remote sensing images with support vector machines. IEEE Transactions on Geoscience and Remote Sensing. 2004;42(8):1778–1790.
  • 34. Borasca B., Bruzzone L., Carlin L., Zusi M. A fuzzy-input fuzzy-output SVM technique for classification of hyperspectral remote sensing images. Proceedings of the 7th Nordic Signal Processing Symposium - NORSIG 2006; 7 June 2006; Reykjavik, Iceland. IEEE.
  • 35. Fauvel M., Benediktsson J. A., Chanussot J., Sveinsson J. R. Spectral and spatial classification of hyperspectral data using SVMs and morphological profiles. IEEE Transactions on Geoscience and Remote Sensing. 2008;46(11):3804–3814.
  • 36. Ding S., Chen L. Classification of hyperspectral remote sensing images with support vector machines and particle swarm optimization. Proceedings of the International Conference on Information Engineering and Computer Science; 19 December 2009; Wuhan, China. IEEE.
  • 37. Du P., Tan K., Xing X. Wavelet SVM in Reproducing Kernel Hilbert Space for hyperspectral remote sensing image classification. Optics Communications. 2010;283(24):4978–4984. doi: 10.1016/j.optcom.2010.08.009.
  • 38. Mianji F. A., Zhang Y. Semisupervised support vector machine classification for hyperspectral imagery. Proceedings of the 2011 International Conference on Communications and Signal Processing; 10 February 2011.
  • 39. Moustakidis S., Mallinis G., Koutsias N., Theocharis J. B., Petridis V. SVM-based fuzzy decision trees for classification of high spatial resolution remote sensing images. IEEE Transactions on Geoscience and Remote Sensing. Jan. 2012;50(1):149–169. doi: 10.1109/tgrs.2011.2159726.
  • 40. Shao Z., Zhang L., Zhou X., Ding L. A novel hierarchical semisupervised SVM for classification of hyperspectral images. IEEE Geoscience and Remote Sensing Letters. 2014;11(9):1609–1613.
  • 41. Kuo B., Ho H., Li C., Hung C., Taur J. A kernel-based feature selection method for SVM with RBF kernel for hyperspectral image classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. Jan. 2014;7(1):317–326. doi: 10.1109/jstars.2013.2262926.
  • 42. Peng J., Zhou Y., Chen C. L. P. Region-kernel-based support vector machines for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing. 2015;53(9):4810–4824.
  • 43. Yu H., Gao L., Liao W., Zhang B., Pižurica A., Philips W. Multiscale super-pixel-level subspace-based support vector machines for hyperspectral image classification. IEEE Geoscience and Remote Sensing Letters. 2017;14(11):2142–2146.
  • 44. Zhang C., Han M., Xu M. Multi-feature classification of hyperspectral image via probabilistic SVM and guided filter. Proceedings of the 2018 International Joint Conference on Neural Networks (IJCNN); 8 July 2018; Rio de Janeiro, Brazil. IEEE.
  • 45. Liu J., Wu Z., Wei Z., Xiao L., Sun L. Spatial-spectral kernel sparse representation for hyperspectral image classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. Dec. 2013;6(6):2462–2471. doi: 10.1109/jstars.2013.2252150.
  • 46. Fang L., Li S., Kang X., Benediktsson J. A. Spectral–spatial hyperspectral image classification via multiscale Adaptive sparse representation. IEEE Transactions on Geoscience and Remote Sensing. Dec. 2014;52(12):7738–7749. doi: 10.1109/tgrs.2014.2318058.
  • 47. Du P., Xue Z., Li J., Plaza A. Learning discriminative sparse representations for hyperspectral image classification. IEEE Journal of Selected Topics in Signal Processing. Sept. 2015;9(6):1089–1104. doi: 10.1109/jstsp.2015.2423260.
  • 48. Fu W., Li S., Fang L., Kang X., Benediktsson J. A. Hyperspectral image classification via shape-adaptive joint sparse representation. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. Feb. 2016;9(2):556–567. doi: 10.1109/jstars.2015.2477364.
  • 49. Fang L., Wang C., Li S., Benediktsson J. A. Hyperspectral image classification via multiple-feature-based adaptive sparse representation. IEEE Transactions on Instrumentation and Measurement. July 2017;66(7):1646–1657. doi: 10.1109/tim.2017.2664480.
  • 50. Tu B., Huang S., Fang L., Zhang G., Wang J., Zheng B. Hyperspectral image classification via weighted joint nearest neighbor and sparse representation. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. 2018;11(11):4063–4075.
  • 51. Yang W., Peng J., Sun W., Du Q. Log-euclidean kernel-based joint sparse representation for hyperspectral image classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. Dec. 2019;12(12):5023–5034. doi: 10.1109/jstars.2019.2952408.
  • 52. Dundar T., Ince T. Sparse representation-based hyperspectral image classification using multiscale super-pixels and guided filter. IEEE Geoscience and Remote Sensing Letters. Feb. 2019;16(2):246–250. doi: 10.1109/lgrs.2018.2871273.
  • 53. Peng J., Sun W., Du Q. Self-paced joint sparse representation for the classification of hyperspectral images. IEEE Transactions on Geoscience and Remote Sensing. Feb. 2019;57(2):1183–1194. doi: 10.1109/tgrs.2018.2865102.
  • 54. Peng J., Li L., Tang Y. Y. Maximum likelihood estimation-based joint sparse representation for the classification of hyperspectral remote sensing images. IEEE Transactions on Neural Networks and Learning Systems. June 2019;30(6):1790–1802. doi: 10.1109/TNNLS.2018.2874432.
  • 55. Yu H., Gao L., Liao W., et al. Global spatial and local spectral similarity-based manifold learning group sparse representation for hyperspectral imagery classification. IEEE Transactions on Geoscience and Remote Sensing. May 2020;58(5):3043–3056. doi: 10.1109/tgrs.2019.2947032.
  • 56. Luo F., Zhang L., Zhou X., Guo T., Cheng Y., Yin T. Sparse-adaptive hypergraph discriminant analysis for hyperspectral image classification. IEEE Geoscience and Remote Sensing Letters. June 2020;17(6):1082–1086. doi: 10.1109/lgrs.2019.2936652.
  • 57. Markov random field. https://en.wikipedia.org/wiki/Markov_random_field
  • 58. Altalib G., Ahmed E. Land cover classification using hidden Markov models. International Journal of Computer Networks and Communications Security. 2013;1:165–172.
  • 59. Zhang B., Li S., Jia X., Gao L., Peng M. Adaptive Markov random field approach for classification of hyperspectral imagery. IEEE Geoscience and Remote Sensing Letters. 2011;8(5):973–977.
  • 60. Ghamisi P., Benediktsson J. A., Ulfarsson M. O. Spectral–spatial classification of hyperspectral images based on hidden Markov random fields. IEEE Transactions on Geoscience and Remote Sensing. May 2014;52(5):2565–2574. doi: 10.1109/tgrs.2013.2263282.
  • 61. Xu L., Li J. Bayesian classification of hyperspectral imagery based on probabilistic sparse representation and Markov random field. IEEE Geoscience and Remote Sensing Letters. April 2014;11(4):823–827. doi: 10.1109/lgrs.2013.2279395.
  • 62. Li W., Prasad S., Fowler J. E. Hyperspectral image classification using Gaussian mixture models and Markov random fields. IEEE Geoscience and Remote Sensing Letters. Jan. 2014;11(1):153–157. doi: 10.1109/lgrs.2013.2250905.
  • 63. Sun L., Wu Z., Liu J., Xiao L., Wei Z. Supervised spectral–spatial hyperspectral image classification with weighted Markov random fields. IEEE Transactions on Geoscience and Remote Sensing. March 2015;53(3):1490–1503. doi: 10.1109/tgrs.2014.2344442.
  • 64. Yuan Y., Lin J., Wang Q. Hyperspectral image classification via multitask joint sparse representation and stepwise MRF optimization. IEEE Transactions on Cybernetics. Dec. 2016;46(12):2966–2977. doi: 10.1109/TCYB.2015.2484324.
  • 65. Golipour M., Ghassemian H., Mirzapour F. Integrating hierarchical segmentation maps with MRF prior for classification of hyperspectral images in a bayesian framework. IEEE Transactions on Geoscience and Remote Sensing. Feb. 2016;54(2):805–816. doi: 10.1109/tgrs.2015.2466657.
  • 66. Ghasrodashti E. K., Helfroush M. S., Danyali H. Sparse-based classification of hyperspectral images using extended hidden Markov random fields. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. 2018;11(11):4101–4112.
  • 67. Fang Y., Xu L., Peng J., Yang H., Wong A., Clausi D. A. Unsupervised bayesian classification of a hyperspectral image based on the spectral mixture model and Markov random field. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. 2018;11(9):3325–3337.
  • 68. Pan C., Gao X., Wang Y., Li J. Markov random fields integrating adaptive interclass-pair penalty and spectral similarity for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing. May 2019;57(5):2520–2534. doi: 10.1109/tgrs.2018.2874077.
  • 69. Cao X., Wang X., Wang D., Zhao J., Jiao L. Spectral–spatial hyperspectral image classification using cascaded Markov random fields. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. Dec. 2019;12(12):4861–4872. doi: 10.1109/jstars.2019.2938208.
  • 70. Jiang X., Zhang Y., Li Y., Li S., Zhang Y. Hyperspectral image classification with transfer learning and Markov random fields. IEEE Geoscience and Remote Sensing Letters. March 2020;17(3):544–548. doi: 10.1109/lgrs.2019.2923647.
  • 71. Jiang X., Zhang Y., Liu W., et al. Hyperspectral image classification with CapsNet and Markov random fields. IEEE Access. 8:191956–191968.
  • 72. Samat A., Du P., Liu S., Li J., Cheng L. E2LMs: ensemble extreme learning machines for hyperspectral image classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. April 2014;7(4):1060–1069. doi: 10.1109/jstars.2014.2301775.
  • 73. Ding S., Zhao H., Zhang Y., Xu X., Nie R. Extreme learning machine: algorithm, theory and applications. Artificial Intelligence Review. 2015;44(1):103–115. doi: 10.1007/s10462-013-9405-z.
  • 74. A multiple hidden layers extreme learning machine method and its application. https://www.hindawi.com/journals/mpe/2017/4670187/
  • 75. Zhou Y., Peng J., Chen C. L. P. Extreme learning machine with composite kernels for hyperspectral image classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. June 2015;8(6):2351–2360. doi: 10.1109/jstars.2014.2359965.
  • 76. Li W., Chen C., Su H., Du Q. Local binary patterns and extreme learning machine for hyperspectral imagery classification. IEEE Transactions on Geoscience and Remote Sensing. July 2015;53(7):3681–3693. doi: 10.1109/tgrs.2014.2381602.
  • 77. Lv Q., Niu X., Dou Y., Xu J., Lei Y. Classification of hyperspectral remote sensing image using hierarchical local-receptive-field-based extreme learning machine. IEEE Geoscience and Remote Sensing Letters. March 2016;13(3):434–438. doi: 10.1109/lgrs.2016.2517178.
  • 78. Su H., Cai Y., Du Q. Firefly-algorithm-inspired framework with band selection and extreme learning machine for hyperspectral image classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. Jan. 2017;10(1):309–320. doi: 10.1109/jstars.2016.2591004.
  • 79. Shen Y., Chen J., Xiao L. Supervised classification of hyperspectral images using local-receptive-fields-based kernel extreme learning machine. Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP); 17–20 September 2017; Beijing, China. IEEE.
  • 80. Ku J., Zheng B. Distributed extreme learning machine with kernels based on MapReduce for spectral-spatial classification of hyperspectral image. Proceedings of the 2017 IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on Embedded and Ubiquitous Computing (EUC); 21 July 2017; Guangzhou, China. IEEE.
  • 81. Cao F., Yang Z., Jiang M., Chen W., Ye Q., Ling W. Spectral-spatial classification of hyperspectral image using extreme learning machine and loopy Belief propagation. Proceedings of the 2017 IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData); 21 June 2017; Exeter, UK. IEEE.
  • 82. Shang W., Wu Z., Xu Y., Zhang Y., Wei Z. Hyperspectral supervised classification using mean filtering based kernel extreme learning machine. Proceedings of the 2018 Fifth International Workshop on Earth Observation and Remote Sensing Applications (EORSA); 18 June 2018; Xi’an, China. IEEE.
  • 83. Cao F., Yang Z., Ren J., et al. Sparse representation-based augmented multinomial logistic extreme learning machine with weighted composite features for spectral–spatial classification of hyperspectral images. IEEE Transactions on Geoscience and Remote Sensing. 2018;56(11):6263–6279.
  • 84. Jiang M., Cao F., Lu Y. Extreme learning machine with enhanced composite feature for spectral-spatial hyperspectral image classification. IEEE Access. 6:22645–22654.
  • 85. Cao F., Yang Z., Ren J., Chen W., Han G., Shen Y. Local block multilayer sparse extreme learning machine for effective feature extraction and classification of hyperspectral images. IEEE Transactions on Geoscience and Remote Sensing. 2019;57(8):5580–5594.
  • 86. Shen Y., Xiao L., Chen J., Pan D. A spectral-spatial domain-specific convolutional deep extreme learning machine for supervised hyperspectral image classification. IEEE Access. 7:132240–132252.
  • 87. Yin Y., Wei L. Hyperspectral image classification using comprehensive evaluation model of extreme learning machine based on cumulative variation weights. IEEE Access. 8:187991–188003.
  • 88.Introduction to active learning. https://towardsdatascience.com/introduction-to-active-learning-117e0740d7cc .
  • 89.Active learning machine learning: what it is and how it works. https://algorithmia.com/blog/active-learning-machine-learning .
  • 90.Rajan S., Ghosh J., Crawford M. M. An active learning approach to hyperspectral data classification. IEEE Transactions on Geoscience and Remote Sensing . April 2008;46(4):1231–1242. doi: 10.1109/tgrs.2007.910220. [DOI] [Google Scholar]
  • 91.Li J., Bioucas-Dias J. M., Plaza A. Semisupervised hyperspectral image segmentation using multinomial logistic regression with active learning. IEEE Transactions on Geoscience and Remote Sensing . 2010;48(11):4085–4098. [Google Scholar]
  • 92.Li J., Bioucas-Dias J. M., Plaza A. Spectral–spatial classification of hyperspectral data using loopy Belief propagation and active learning. IEEE Transactions on Geoscience and Remote Sensing . Feb. 2013;51(2):844–856. doi: 10.1109/tgrs.2012.2205263. [DOI] [Google Scholar]
  • 93.Sun S., Zhong P., Xiao H., Wang R. An MRF model-based active learning framework for the spectral-spatial classification of hyperspectral imagery. IEEE Journal of Selected Topics in Signal Processing . Sept. 2015;9(6):1074–1088. doi: 10.1109/jstsp.2015.2414401. [DOI] [Google Scholar]
  • 94.Sun S., Zhong P., Xiao H., Wang R. Active learning with Gaussian process classifier for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing . April 2015;53(4):1746–1760. doi: 10.1109/tgrs.2014.2347343. [DOI] [Google Scholar]
  • 95.Zhang Z., Pasolli E., Crawford M. M., Tilton J. C. An active learning framework for hyperspectral image classification using hierarchical segmentation. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing . Feb. 2016;9(2):640–654. doi: 10.1109/jstars.2015.2493887. [DOI] [Google Scholar]
  • 96.Zhou X., Prasad S., Crawford M. M. Wavelet-domain multiview active learning for spatial-spectral hyperspectral image classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing . 2016;9(9):4047–4059. [Google Scholar]
  • 97.Wang Z., Du B., Zhang L., Zhang L., Jia X. A novel semisupervised active-learning algorithm for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing . June 2017;55(6):3071–3083. doi: 10.1109/tgrs.2017.2650938. [DOI] [Google Scholar]
  • 98.Patra S., Bhardwaj K., Bruzzone L. A spectral-spatial multicriteria active learning technique for hyperspectral image classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing . Dec. 2017;10(12):5213–5227. doi: 10.1109/jstars.2017.2747600. [DOI] [Google Scholar]
  • 99.Liu C., He L., Li Z., Li J. Feature-driven active learning for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing . Jan. 2018;56(1):341–354. doi: 10.1109/tgrs.2017.2747862. [DOI] [Google Scholar]
  • 100.Xu X., Li J., Li S. Multiview intensity-based active learning for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing . Feb. 2018;56(2):669–680. doi: 10.1109/tgrs.2017.2752738. [DOI] [Google Scholar]
  • 101.Liu C., Li J., He L. Super-pixel-based semisupervised active learning for hyperspectral image classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing . Jan. 2019;12(1):357–370. [Google Scholar]
  • 102.Zhang Z., Pasolli E., Crawford M. M. An adaptive multiview active learning approach for spectral–spatial classification of hyperspectral images. IEEE Transactions on Geoscience and Remote Sensing . April 2020;58(4):2557–2570. doi: 10.1109/tgrs.2019.2952319. [DOI] [Google Scholar]
  • 103.Mu C., Liu J., Liu Y., Liu Y. Hyperspectral image classification based on active learning and spectral-spatial feature fusion using spatial coordinates. IEEE Access . 03 January 2020;8:6768–6781. doi: 10.1109/access.2019.2963624. [DOI] [Google Scholar]
  • 104.Li S., Song W., Fang L., Chen Y., Ghamisi P., Benediktsson J. A. Deep learning for hyperspectral image classification: an overview. IEEE Transactions on Geoscience and Remote Sensing . 2019;57(9):6690–6709. [Google Scholar]
  • 105.What is deep learning and how does it work. https://towardsdatascience.com/what-is-deep-learning-and-how-does-it-work-f7d02aa9d477 .
  • 106.Subasi A., Mian Qaisar S. The ensemble machine learning-based classification of motor imagery tasks in brain-computer interface. Journal of Healthcare Engineering . 2021;2021:12. doi: 10.1155/2021/1970769. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 107.Alsinglawi B., Alshari O., Alorjani M., et al. An explainable machine learning framework for lung cancer hospital length of stay prediction. Scientific Reports . 2022;12(1):1–10. doi: 10.1038/s41598-021-04608-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 108.Alsinglawi B., Alnajjar F., Mubin O., Novoa M., Karajeh O., Darwish O. Benchmarking predictive models in electronic health records: sepsis length of stay prediction. Proceedings of the International Conference on Advanced Information Networking and Applications; 15 April 2020; Caserta, Italy. Springer; pp. 258–267. https://link.springer.com/chapter/10.1007/978-3-030-44041-1_24 . [DOI] [Google Scholar]
  • 109.Srinivasu P. N., SivaSai J. G., Ijaz M. F., Bhoi A. K., Kim W., Kang J. J. Classification of skin disease using deep learning neural networks with MobileNet V2. Sensors . 2021;21(8):p. 2852. doi: 10.3390/s21082852. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 110.Lin Z., Chen Y., Zhao X., Wang G. Spectral-spatial classification of hyperspectral image using autoencoders. Proceedings of the 2013 9th International Conference on Information, Communications & Signal Processing; 10 December 2013; Tainan, Taiwan. IEEE; [DOI] [Google Scholar]
  • 111.Liu G., Bao H., Han B. A stacked autoencoder-based deep neural network for achieving gearbox fault diagnosis. Advancements in Mathematical Methods for Pattern Recognition and its Applications . 2018;2018:10. doi: 10.1155/2018/5105709. [DOI] [Google Scholar]
  • 112.Autoencoders tutorial: a beginner’s guide to autoencoders. https://www.edureka.co/blog/autoencoders-tutorial/
  • 113.Chen Y., Lin Z., Zhao X., Wang G., Gu Y. Deep learning-based classification of hyperspectral data. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing . June 2014;7(6):2094–2107. doi: 10.1109/jstars.2014.2329330. [DOI] [Google Scholar]
  • 114.Ma X., Wang H., Geng J. Spectral–spatial classification of hyperspectral image based on deep auto-encoder. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing . Sept. 2016;9(9):4073–4085. doi: 10.1109/jstars.2016.2517204. [DOI] [Google Scholar]
  • 115.Zhou P., Han J., Cheng G., Zhang B. Learning compact and discriminative stacked autoencoder for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing . July 2019;57(7):4823–4833. doi: 10.1109/tgrs.2019.2893180. [DOI] [Google Scholar]
  • 116.Madani H., McIsaac K. Distance transform-based spectral-spatial feature vector for hyperspectral image classification with stacked autoencoder. Remote Sensing . 2021;13:p. 1732. doi: 10.3390/rs13091732. [DOI] [Google Scholar]
  • 117.Layers of a convolutional neural network. https://wiki.tum.de/display/lfdv/Layers+of+a+Convolutional+Neural+Network .
  • 118.Basic CNN architecture: explaining 5 layers of convolutional neural network. https://www.upgrad.com/blog/basic-cnn-architecture/
  • 119.A survey of the recent architectures of deep convolutional neural networks. https://arxiv.org/ftp/arxiv/papers/1901/1901.06032.pdf .
  • 120.Makantasis K., Karantzalos K., Doulamis A., Doulamis N. Deep supervised learning for hyperspectral data classification through convolutional neural networks. Proceedings of the 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS); 26 July 2015; Milan, Italy. IEEE; [DOI] [Google Scholar]
  • 121.Chen Y., Jiang H., Li C., Jia X., Ghamisi P. Deep feature extraction and classification of hyperspectral images based on convolutional neural networks. IEEE Transactions on Geoscience and Remote Sensing . Oct. 2016;54(10):6232–6251. doi: 10.1109/tgrs.2016.2584107. [DOI] [Google Scholar]
  • 122.Zhao W., Du S. Spectral–spatial feature extraction for hyperspectral image classification: a dimension reduction and deep learning approach. IEEE Transactions on Geoscience and Remote Sensing . 2016;54(8):4544–4554. [Google Scholar]
  • 123.Cao J., Chen Z., Wang B. Deep convolutional networks with super-pixel segmentation for hyperspectral image classification. Proceedings of the 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS); 10 July 2016; [DOI] [Google Scholar]
  • 124.Li W., Wu G., Zhang F., Du Q. Hyperspectral image classification using deep pixel-pair features. IEEE Transactions on Geoscience and Remote Sensing . Feb. 2017;55(2):844–853. doi: 10.1109/tgrs.2016.2616355. [DOI] [Google Scholar]
  • 125.He M., Li B., Chen H. Multi-scale 3D deep convolutional neural network for hyperspectral image classification. Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP); 17 September 2017; Beijing, China. IEEE; [DOI] [Google Scholar]
  • 126.Yang X., Ye Y., Li X., Lau R. Y. K., Zhang X., Huang X. Hyperspectral image classification with deep learning models. IEEE Transactions on Geoscience and Remote Sensing . 2018;56(9):5408–5423. [Google Scholar]
  • 127.Zhang H., Li Y., Jiang Y., Wang P., Shen Q., Shen C. Hyperspectral classification based on lightweight 3-D-CNN with transfer learning. IEEE Transactions on Geoscience and Remote Sensing . 2019;57(8):5813–5828. [Google Scholar]
  • 128.Roy S. K., Krishna G., Dubey S. R., Chaudhuri B. B. HybridSN: exploring 3-D–2-D CNN feature hierarchy for hyperspectral image classification. IEEE Geoscience and Remote Sensing Letters . Feb. 2020;17(2):277–281. doi: 10.1109/lgrs.2019.2918719. [DOI] [Google Scholar]
  • 129.Hu W., Li H., Pan L., Li W., Tao R., Du Q. Spatial–spectral feature extraction via deep ConvLSTM neural networks for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing . June 2020;58(6):4237–4250. doi: 10.1109/tgrs.2019.2961947. [DOI] [Google Scholar]
  • 130.Paoletti M. E., Haut J. M., Roy S. K., Hendrix E. M. T. Rotation equivariant convolutional neural networks for hyperspectral image classification. IEEE Access . 2020;8:179575–179591. [Google Scholar]
  • 131.Zhang X., Wang Y., Zhang N., et al. Spectral-spatial three-dimensional convolutional neural network for hyperspectral image classification. IEEE Access . 2020;8:127167–127180. [Google Scholar]
  • 132.Mou L., Ghamisi P., Zhu X. X. Deep recurrent neural networks for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing . July 2017;55(7):3639–3655. doi: 10.1109/tgrs.2016.2636241. [DOI] [Google Scholar]
  • 133.Types of RNN (recurrent neural network). https://iq.opengenus.org/types-of-rnn/
  • 134.5 types of LSTM recurrent neural networks and what to do with them. https://www.exxactcorp.com/blog/Deep-Learning/5-types-of-lstm-recurrent-neural-networks-and-what-to-do-with-them .
  • 135.Hang R., Liu Q., Hong D., Ghamisi P. Cascaded recurrent neural networks for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing . 2019;57(8):5384–5394. [Google Scholar]
  • 136.Hao S., Wang W., Salzmann M. Geometry-aware deep recurrent neural networks for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, Early Access . 2020;59:1–13. [Google Scholar]
  • 137.Chen Y., Zhao X., Jia X. Spectral–spatial classification of hyperspectral data based on deep belief network. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing . June 2015;8(6):2381–2392. doi: 10.1109/jstars.2015.2388577. [DOI] [Google Scholar]
  • 138.Mughees A., Tao L. Multiple deep-belief-network-based spectral-spatial classification of hyperspectral images. Tsinghua Science and Technology . April 2019;24(2):183–194. doi: 10.26599/tst.2018.9010043. [DOI] [Google Scholar]
  • 139.Chen C., Ma Y., Ren G. Hyperspectral classification using deep belief networks based on conjugate gradient update and pixel-centric spectral block features. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing . 2020;13:4060–4069. doi: 10.1109/jstars.2020.3008825. [DOI] [Google Scholar]
  • 140.Zhan Y., Hu D., Wang Y., Yu X. Semisupervised hyperspectral image classification based on generative adversarial networks. IEEE Geoscience and Remote Sensing Letters . Feb. 2018;15(2):212–216. doi: 10.1109/lgrs.2017.2780890. [DOI] [Google Scholar]
  • 141.The math behind GANs (generative adversarial networks). https://towardsdatascience.com/the-math-behind-gans-generative-adversarial-networks-3828f3469d9c .
  • 142.Introduction to generative adversarial networks (GANs): types, and applications, and implementation. https://heartbeat.fritz.ai/introduction-to-generative-adversarial-networks-gans-35ef44f21193 .
  • 143.Zhu L., Chen Y., Ghamisi P., Benediktsson J. A. Generative adversarial networks for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing . 2018;56(9):5046–5063. [Google Scholar]
  • 144.Wang H., Tao C., Qi J., Li H., Tang Y. Semi-supervised variational generative adversarial networks for hyperspectral image classification. Proceedings of the IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium; 28 July 2019; Yokohama, Japan. IEEE; [DOI] [Google Scholar]
  • 145.Tao C., Wang H., Qi J., Li H. Semisupervised variational generative adversarial networks for hyperspectral image classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing . Feb. 2020;13:914–927. doi: 10.1109/jstars.2020.2974577. [DOI] [Google Scholar]
  • 146.Zhong Z., Li J., Clausi D. A., Wong A. Generative adversarial networks and conditional random fields for hyperspectral image classification. IEEE Transactions on Cybernetics . July 2020;50(7):3318–3329. doi: 10.1109/tcyb.2019.2915094. [DOI] [PubMed] [Google Scholar]
  • 147.Liang H., Bao W., Shen X. Adaptive weighting feature fusion approach based on generative adversarial network for hyperspectral image classification. Remote Sensing . 2021;13:p. 198. doi: 10.3390/rs13020198. [DOI] [Google Scholar]
  • 148.Li Z., Zhu X., Xin Z., Guo F., Cui X., Wang L. Variational generative adversarial network with crossed spatial and spectral interactions for hyperspectral image classification. Remote Sensing . 2021;13:p. 3131. doi: 10.3390/rs13163131. [DOI] [Google Scholar]
  • 149.Tsiakmaki M., Kostopoulos G., Kotsiantis S., Ragos O. Transfer learning from deep neural networks for predicting student performance. Applied Sciences . 2020;10(6):p. 2145. doi: 10.3390/app10062145. [DOI] [Google Scholar]
  • 150.Lin J., Ward R., Wang Z. J. Deep transfer learning for hyperspectral image classification. Proceedings of the 2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP); 29 August 2018; Yokohama, Japan. IEEE; [Google Scholar]
  • 151.Deng C., Xue Y., Liu X., Li C., Tao D. Active transfer learning network: a unified deep joint spectral–spatial feature learning model for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing . March 2019;57(3):1741–1754. doi: 10.1109/tgrs.2018.2868851. [DOI] [Google Scholar]
  • 152.He X., Chen Y., Ghamisi P. Heterogeneous transfer learning for hyperspectral image classification based on convolutional neural network. IEEE Transactions on Geoscience and Remote Sensing . May 2020;58(5):3246–3263. doi: 10.1109/tgrs.2019.2951445. [DOI] [Google Scholar]
  • 153.Liu Y., Gao L., Xiao C., Qu Y., Zheng K., Marinoni A. Hyperspectral image classification based on a shuffled group convolutional neural network with transfer learning. Remote Sensing . 2020;12:p. 1780. doi: 10.3390/rs12111780. [DOI] [Google Scholar]
  • 154.Xie F., Gao Q., Jin C., Zhao F. Hyperspectral image classification based on superpixel pooling convolutional neural network with transfer learning. Remote Sensing . 2021;13:p. 930. doi: 10.3390/rs13050930. [DOI] [Google Scholar]
  • 155.Kumar Y., Koul A., Singla R., Ijaz M. F. Artificial intelligence in disease diagnosis: a systematic literature review, synthesizing framework and future research agenda. Journal of Ambient Intelligence and Humanized Computing . 2022;13:1–28. doi: 10.1007/s12652-021-03612-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 156.Ijaz M. F., Attique M., Son Y. Data-driven cervical cancer prediction model with outlier detection and over-sampling methods. Sensors . 2020;20(10):p. 2809. doi: 10.3390/s20102809. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 157.Mura M. D., Villa A., Benediktsson J. A., Chanussot J., Bruzzone L. Classification of hyperspectral images by using extended morphological attribute profiles and independent component analysis. IEEE Geoscience and Remote Sensing Letters . May 2011;8(3):542–546. doi: 10.1109/lgrs.2010.2091253. [DOI] [Google Scholar]
  • 158.Xia J., Chanussot J., Du P., He X. (Semi-) supervised probabilistic principal component analysis for hyperspectral remote sensing image classification. Proceedings of the 2012 4th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS); 4 June 2012; IEEE; [Google Scholar]
  • 159.Ren Y., Liao L., Maybank S. J., Zhang Y., Liu X. Hyperspectral image spectral-spatial feature extraction via tensor principal component analysis. IEEE Geoscience and Remote Sensing Letters . 2017;14(9):1431–1435. [Google Scholar]
  • 160.Kutluk S., Kayabol K., Akan A. Classification of hyperspectral images using mixture of probabilistic PCA models. Proceedings of the 2016 24th European Signal Processing Conference (EUSIPCO); 29 August 2016; Budapest, Hungary. IEEE; [DOI] [Google Scholar]
  • 161.Kang X., Xiang X., Li S., Benediktsson J. A. PCA-based edge-preserving features for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing . Dec. 2017;55(12):7140–7151. doi: 10.1109/tgrs.2017.2743102. [DOI] [Google Scholar]
  • 162.Chiang S., Chang C., Ginsberg I. W. Unsupervised hyperspectral image analysis using independent component analysis. Proceedings of the IGARSS 2000. IEEE 2000 International Geoscience and Remote Sensing Symposium. Taking the Pulse of the Planet: The Role of Remote Sensing in Managing the Environment. Proceedings (Cat. No.00CH37120); 24 July 2000; Honolulu, HI, USA. IEEE; [Google Scholar]
  • 163.Villa A., Benediktsson J. A., Chanussot J., Jutten C. Independent component discriminant analysis for hyperspectral image classification. Proceedings of the 2010 2nd Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing; 14–16 June 2010; Reykjavik, Iceland. IEEE; [Google Scholar]
  • 164.Villa A., Benediktsson J. A., Chanussot J., Jutten C. Hyperspectral image classification with independent component discriminant analysis. IEEE Transactions on Geoscience and Remote Sensing . Dec. 2011;49(12):4865–4876. doi: 10.1109/tgrs.2011.2153861. [DOI] [Google Scholar]
  • 165.Chen Y., Xu L., Fang Y., et al. Unsupervised Bayesian subpixel mapping of hyperspectral imagery based on band-weighted discrete spectral mixture model and Markov random field. IEEE Geoscience and Remote Sensing Letters . Jan. 2021;18(1):162–166. doi: 10.1109/lgrs.2020.2967104. [DOI] [Google Scholar]
  • 166.Chen H., Miao F., Chen Y., Xiong Y., Chen T. A hyperspectral image classification method using multifeature vectors and optimized KELM. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing . 2021;14:2781–2795. doi: 10.1109/jstars.2021.3059451. [DOI] [Google Scholar]
  • 167.Saboori A., Ghassemian H., Razzazi F. Active multiple kernel Fredholm learning for hyperspectral images classification. IEEE Geoscience and Remote Sensing Letters . Feb. 2021;18(2):356–360. doi: 10.1109/lgrs.2020.2969970. [DOI] [Google Scholar]
  • 168.Zhang Y., Cao G., Wang B., Li X., Amoako P. Y. O., Shafique A. Dual sparse representation graph-based copropagation for semisupervised hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing . 2021;60 [Google Scholar]
  • 169.Lei Z., Zeng Y., Liu P., Su X. Active deep learning for hyperspectral image classification with uncertainty learning. IEEE Geoscience and Remote Sensing Letters . 2021;19 [Google Scholar]
  • 170.Ma K. Y., Chang C. I. Iterative training sampling coupled with active learning for semisupervised spectral-spatial hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing . 2021;59 [Google Scholar]
  • 171.Lu Q., Wei L. Multiscale superpixel-based active learning for hyperspectral image classification. IEEE Geoscience and Remote Sensing Letters . 2021;19 [Google Scholar]
  • 172.Hang R., Li Z., Liu Q., Ghamisi P., Bhattacharyya S. S. Hyperspectral image classification with attention-aided CNNs. IEEE Transactions on Geoscience and Remote Sensing . March 2021;59(3):2281–2293. doi: 10.1109/tgrs.2020.3007921. [DOI] [Google Scholar]
  • 173.Yu C., Han R., Song M., Liu C., Chang C. I. Feedback attention-based dense CNN for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing . 2021;60 [Google Scholar]
  • 174.Hang R., Zhou F., Liu Q., Ghamisi P. Classification of hyperspectral images via multitask generative adversarial networks. IEEE Transactions on Geoscience and Remote Sensing . Feb. 2021;59(2):1424–1436. doi: 10.1109/tgrs.2020.3003341. [DOI] [Google Scholar]
  • 175.Wang W. Y., Li H. C., Deng Y. J., Shao L. Y., Lu X. Q., Du Q. Generative adversarial capsule network with ConvLSTM for hyperspectral image classification. IEEE Geoscience and Remote Sensing Letters . March 2021;18(3):523–527. doi: 10.1109/lgrs.2020.2976482. [DOI] [Google Scholar]
  • 176.Wang J., Guo S., Huang R., Li L., Zhang X., Jiao L. Dual-channel capsule generation adversarial network for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing . 2021;60 [Google Scholar]
  • 177.Roy S. K., Haut J. M., Paoletti M. E., Dubey S. R., Plaza A. Generative adversarial minority oversampling for spectral-spatial hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing . 2021;60 [Google Scholar]
  • 178.Fang L., Li S., Kang X., Benediktsson J. A. Spectral–spatial classification of hyperspectral images with a super-pixel-based discriminative sparse model. IEEE Transactions on Geoscience and Remote Sensing . 2015;53(8):4186–4201. [Google Scholar]
  • 179.Ma P., Ren J., Zhao H., Sun G., Murray P., Zheng J. Multiscale 2-D singular spectrum analysis and principal component analysis for spatial–spectral noise-robust feature extraction and classification of hyperspectral images. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing . 2021;14:1233–1245. doi: 10.1109/jstars.2020.3040699. [DOI] [Google Scholar]
  • 180.Arsa D. M. S., Sanabila H. R., Rachmadi M. F., Gamal A., Jatmiko W. Improving principal component analysis performance for reducing spectral dimension in hyperspectral image classification. Proceedings of the 2018 International Workshop on Big Data and Information Security (IWBIS); 12 May 2018; Jakarta, Indonesia. IEEE; pp. 123–128. [DOI] [Google Scholar]
  • 181.Baisantry M., Sao A. K. Band selection using segmented PCA and component loadings for hyperspectral image classification. Proceedings of the IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium; 28 July 2019; Yokohama, Japan. IEEE; pp. 3812–3815. [Google Scholar]
  • 182.Hossain M. M., Hossain M. A. Feature reduction and classification of hyperspectral image based on multiple kernel PCA and deep learning. Proceedings of the 2019 IEEE International Conference on Robotics, Automation, Artificial-intelligence and Internet-of-Things (RAAICON); 29 November 2019; Dhaka, Bangladesh. IEEE; pp. 141–144. [DOI] [Google Scholar]
  • 183.Ruiz D., Bacca B., Caicedo E. Hyperspectral images classification based on inception network and kernel PCA. IEEE Latin America Transactions . December 2019;17(12):1995–2004. doi: 10.1109/tla.2019.9011544. [DOI] [Google Scholar]
  • 184.Lin J., Zhao L., Li S., Ward R., Wang Z. J. Active-learning-incorporated deep transfer learning for hyperspectral image classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing . Nov. 2018;11(11):4048–4062. doi: 10.1109/jstars.2018.2874225. [DOI] [Google Scholar]
  • 185.Lin J., Mou L., Zhu X. X., Ji X., Wang Z. J. Attention-aware pseudo-3-D convolutional neural network for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing . 2021;59 [Google Scholar]
  • 186.Neagoe V., Diaconescu P. CNN hyperspectral image classification using training sample augmentation with generative adversarial networks. Proceedings of the 2020 13th International Conference on Communications (COMM); 18 June 2020; Bucharest, Romania. IEEE; pp. 515–519. [Google Scholar]
  • 187.Zhang X., Jiang X., Jiang J., Zhang Y., Liu X., Cai Z. Spectral-spatial and superpixelwise PCA for unsupervised feature extraction of hyperspectral imagery. IEEE Transactions on Geoscience and Remote Sensing . 2021;60 [Google Scholar]
  • 188.Abbasi A. N., He M. Convolutional neural network with PCA and batch normalization for hyperspectral image classification. Proceedings of the IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium; 28 July 2019; Yokohama, Japan. IEEE; pp. 959–962. [Google Scholar]
  • 189.Haque M. R., Mishu S. Z. Spectral-spatial feature extraction using PCA and multi-scale deep convolutional neural network for hyperspectral image classification. Proceedings of the 2019 22nd International Conference on Computer and Information Technology (ICCIT); 18 December 2019; Dhaka, Bangladesh. IEEE; pp. 1–6. [DOI] [Google Scholar]
  • 190.Sun W., Yang G., Peng J., Du Q. Lateral-slice sparse tensor robust principal component analysis for hyperspectral image classification. IEEE Geoscience and Remote Sensing Letters . Jan. 2020;17(1):107–111. doi: 10.1109/lgrs.2019.2915315. [DOI] [Google Scholar]
  • 191.Abbasi A. N., He M. CNN with ICA-PCA-DCT joint preprocessing for hyperspectral image classification. Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC); 18 November 2019; Lanzhou, China. IEEE; pp. 595–600. [Google Scholar]
  • 192.Baisantry M., Sao A. K., Shukla D. P. Band selection using combined divergence–correlation index and sparse loadings representation for hyperspectral image classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing . 2020;13:5011–5026. doi: 10.1109/jstars.2020.3014784. [DOI] [Google Scholar]
  • 193.Jia S., Hu J., Xie Y., Shen L., Jia X., Li Q. Gabor cube selection based multitask joint sparse representation for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing . June 2016;54(6):3174–3187. doi: 10.1109/tgrs.2015.2513082. [DOI] [Google Scholar]
  • 194.Zhan Y., Qin J., Huang T., et al. Hyperspectral image classification based on generative adversarial networks with feature fusing and dynamic neighborhood voting mechanism. Proceedings of the IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium; 28 July 2019; Yokohama, Japan. IEEE; pp. 811–814. [Google Scholar]
  • 195.Joy A. A., Hasan M. A. M. A hybrid approach of feature selection and feature extraction for hyperspectral image classification. Proceedings of the 2019 International Conference on Computer, Communication, Chemical, Materials and Electronic Engineering (IC4ME2); 11 July 2019; Rajshahi, Bangladesh. IEEE; pp. 1–4. [DOI] [Google Scholar]
  • 196.Ali U. A. M. E., Hossain M. A., Islam M. R. Analysis of PCA based feature extraction methods for classification of hyperspectral image. Proceedings of the 2019 2nd International Conference on Innovation in Engineering and Technology (ICIET); 23 December 2019; Dhaka, Bangladesh. pp. 1–6. [Google Scholar]
  • 197.Zhan Y., Wu K., Liu W., et al. Semi-supervised classification of hyperspectral data based on generative adversarial networks and neighborhood majority voting. Proceedings of the IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium; 22 July 2018; Valencia, Spain. IEEE; pp. 5756–5759. [Google Scholar]
  • 198.Champa A. I., Rabbi M. F., Banik N. Improvement in hyperspectral image classification by using hybrid subspace detection technique. Proceedings of the 2019 International Conference on Sustainable Technologies for Industry 4.0 (STI); 24 December 2019; Dhaka, Bangladesh. IEEE; pp. 1–5. [DOI] [Google Scholar]
  • 199.Fu H., Sun G., Ren J., Zhang A., Jia X. Fusion of PCA and segmented-PCA domain multiscale 2-D-SSA for effective spectral-spatial feature extraction and data classification in hyperspectral imagery. IEEE Transactions on Geoscience and Remote Sensing .
  • 200.Chen G. Y. Multiscale filter-based hyperspectral image classification with PCA and SVM. Journal of Electrical Engineering . 2021;72(1):40–45. doi: 10.2478/jee-2021-0006. [DOI] [Google Scholar]
  • 201.Xue Z., Zhou S., Zhao P. Active learning improved by neighborhoods and superpixels for hyperspectral image classification. IEEE Geoscience and Remote Sensing Letters . March 2018;15(3):469–473. doi: 10.1109/lgrs.2018.2794980. [DOI] [Google Scholar]
  • 202.Bhardwaj K., Das A., Patra S. Spectral-spatial active learning with superpixel profile for classification of hyperspectral images. Proceedings of the 2020 6th International Conference on Signal Processing and Communication (ICSC); 5 March 2020; Noida, India. IEEE; pp. 149–155. [Google Scholar]
  • 203.Cao X., Yao J., Xu Z., Meng D. Hyperspectral image classification with convolutional neural network and active learning. IEEE Transactions on Geoscience and Remote Sensing . July 2020;58(7):4604–4616. doi: 10.1109/tgrs.2020.2964627. [DOI] [Google Scholar]
  • 204.Paoletti M. E., Haut J. M., Plaza J., Plaza A. Training CapsNets via active learning for hyperspectral image classification. Proceedings of the IGARSS 2020 - 2020 IEEE International Geoscience and Remote Sensing Symposium; 26 September 2020; Waikoloa, HI, USA. IEEE; pp. 40–43. [Google Scholar]
  • 205.Haut J. M., Paoletti M. E., Plaza J., Li J., Plaza A. Active learning with convolutional neural networks for hyperspectral image classification using a new Bayesian approach. IEEE Transactions on Geoscience and Remote Sensing . Nov. 2018;56(11):6440–6461. doi: 10.1109/tgrs.2018.2838665. [DOI] [Google Scholar]
  • 206.Wang X., Tan K., Chen Y. CapsNet and triple-GANs towards hyperspectral classification. Proceedings of the 2018 Fifth International Workshop on Earth Observation and Remote Sensing Applications (EORSA); 18 June 2018; Xi’an, China. IEEE; pp. 1–4. [Google Scholar]
  • 207.Wang X., Tan K., Du Q., Chen Y., Du P. Caps-TripleGAN: GAN-assisted CapsNet for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing . Sept. 2019;57(9):7232–7245. doi: 10.1109/tgrs.2019.2912468. [DOI] [Google Scholar]
  • 208.Okwuashi O., Ndehedehe C. E. Deep support vector machine for hyperspectral image classification. Pattern Recognition . 2020;103:p. 107298. doi: 10.1016/j.patcog.2020.107298. [DOI] [Google Scholar]
  • 209.Alam F. I., Zhou J., Liew A. W., Jia X., Chanussot J., Gao Y. Conditional random field and deep feature learning for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing . March 2019;57(3):1612–1628. doi: 10.1109/tgrs.2018.2867679. [DOI] [Google Scholar]
  • 210.Cao Y., Mei J., Yuebin W., et al. SLCRF: subspace learning with conditional random field for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing . 2021;59 [Google Scholar]
  • 211.Liang Y., Zhao X., Guo A. J. X., Zhu F. Hyperspectral image classification with deep metric learning and conditional random field. IEEE Geoscience and Remote Sensing Letters . June 2020;17(6):1042–1046. doi: 10.1109/lgrs.2019.2939356. [DOI] [Google Scholar]
  • 212.Wang Y., Mei J., Zhang L., et al. Self-supervised feature learning with CRF embedding for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing . May 2019;57(5):2628–2642. doi: 10.1109/tgrs.2018.2875943. [DOI] [Google Scholar]
  • 213.Andrejchenko V., Liao W., Philips W., Scheunders P. Decision fusion framework for hyperspectral image classification based on Markov and conditional random fields. Remote Sensing . 2019;11(6):p. 624. [Google Scholar]
  • 214.Rissati J. V., Molina P. C., Anjos C. S. Hyperspectral image classification using random forest and deep learning algorithms. Proceedings of the 2020 IEEE Latin American GRSS & ISPRS Remote Sensing Conference (LAGIRS); 22 March 2020; Santiago, Chile. IEEE; p. 132. [Google Scholar]
  • 215.Zhang Y., Cao G., Li X., Wang B. Cascaded random forest for hyperspectral image classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. 2018;11(4):1082–1094. doi: 10.1109/jstars.2018.2809781.
  • 216.Zhang Y., Cao G., Li X., Wang B., Fu P. Active semi-supervised random forest for hyperspectral image classification. Remote Sensing. 2019;11(24):p. 2974. doi: 10.3390/rs11242974.
  • 217.Li T., Leng J., Kong L., et al. DCNR: deep cube CNN with random forest for hyperspectral image classification. Multimedia Tools and Applications. 2019;78:3411–3433. doi: 10.1007/s11042-018-5986-5.
  • 218.Wang A., Wang Y., Chen Y. Hyperspectral image classification based on convolutional neural network and random forest. Remote Sensing Letters. 10(11):1086–1094.
  • 219.Hong D., Gao L., Yao J., Zhang B., Plaza A., Chanussot J. Graph convolutional networks for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing. 2021;59.
  • 220.Ding Y., Guo Y., Chong Y., Pan S., Feng J. Global consistent graph convolutional network for hyperspectral image classification. IEEE Transactions on Instrumentation and Measurement. 2021;70:1–16. Article 5501516. doi: 10.1109/tim.2021.3056750.
  • 221.He X., Chen Y., Ghamisi P. Dual graph convolutional network for hyperspectral image classification with limited training samples. IEEE Transactions on Geoscience and Remote Sensing. 2021;60.
  • 222.Bai J., Ding B., Xiao Z., Jiao L., Chen H., Regan A. C. Hyperspectral image classification based on deep attention graph convolutional network. IEEE Transactions on Geoscience and Remote Sensing. 2021;60.
  • 223.Liu Q., Xiao L., Yang J., Wei Z. CNN-enhanced graph convolutional network with pixel- and superpixel-level feature fusion for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing. 2021;59.
  • 224.Signoroni A., Savardi M., Baronio A., Benini S. Deep learning meets hyperspectral image analysis: a multidisciplinary review. Journal of Imaging. 2019;5(5):p. 52. doi: 10.3390/jimaging5050052.

Data Availability Statement

Publicly available data are used in this study.


Articles from Computational Intelligence and Neuroscience are provided here courtesy of Wiley
