Scientific Reports. 2026 Feb 27;16:8641. doi: 10.1038/s41598-026-38036-2

Near space hyperspectral interferometric imaging image quality assessment with a physically grounded dataset

Cheng Jiang 1, Chiming Tong 1, Zhongqi Ma 1
PMCID: PMC12979658  PMID: 41760697

Abstract

Near-space hyperspectral interferometric imaging (20–100 km altitude) is essential for atmospheric observation. It enables high-resolution profiling of greenhouse gases and wind fields. However, this modality is highly vulnerable to nonlinear degradations, including Littrow angle deviations, platform vibrations, and sensor non-uniformities. These factors severely hinder accurate image quality assessment (IQA). Existing IQA benchmarks are primarily built on natural images and lack both physical realism and domain-specific distortions. Consequently, models trained on them often fail to address the physics-driven degradations in interferometric systems. To overcome this limitation, we introduce NSIQ, the first IQA benchmark designed for near-space interferometric imaging. NSIQ contains 201 grayscale interferograms generated with a physics-consistent simulation framework and includes six representative degradation types derived from realistic system-level distortions. Each sample is annotated with hybrid quality labels that combine expert perceptual scores with normalized physical parameters, providing a multi-dimensional view of image quality. Benchmarking results reveal that state-of-the-art IQA methods, while effective on natural-image datasets, suffer substantial performance drops on NSIQ. This highlights the urgent need for domain-adaptive and physically grounded IQA models. The release of NSIQ will facilitate research in environmental monitoring, atmospheric modeling, and intelligent remote sensing. It also provides a foundation for long-term observation and a deeper understanding of the Earth system.

Keywords: Image quality assessment, Near-space hyperspectral interferometric imaging, Benchmark dataset, Remote sensing image

Subject terms: Engineering, Optics and photonics

Introduction

Near space plays an indispensable role in atmospheric science research, covering the upper stratosphere, the entire mesosphere, and the lower thermosphere. This region serves as a critical observation platform for climate-relevant parameters, including greenhouse gas concentrations1, wind field structures2, and atmospheric energy transport3 processes. In recent years, the Spatial Heterodyne Spectrometer and Imaging System (SHIS) has emerged as a pivotal technique for acquiring interferometric data from near-space platforms, owing to its high spectral resolution, compact structural design, and scan-free imaging capability. Interferograms captured by SHIS inherently contain detailed spectral information crucial for accurate atmospheric retrieval and subsequent scientific inference.

Image Quality Assessment (IQA), an essential component for ensuring image reliability, has made significant progress in natural image processing. Existing benchmark datasets such as KADID-10k4, KonIQ-10k5, and CLIVE6 have respectively laid methodological foundations in synthetic distortion modeling, real-world content diversity, and low-light perceptual quality evaluation. However, these datasets predominantly focus on natural scenes and lack physical consistency and domain specificity necessary for remote sensing, particularly for scientific interferometric imaging. While publicly available remote sensing datasets such as GOSAT7 provide valuable spectral and temporal information, they generally lack annotations related to perceptual image quality. This omission limits their suitability for developing and benchmarking IQA models in scientific imaging applications. In the absence of perceptual labels, existing IQA methods trained on such datasets struggle to capture the subjective quality attributes essential for accurate interpretation of scientific observations.

In near-space interferometric imaging, quality degradation arises from a complex interplay of physical factors, including platform-induced vibrations, Littrow angle deviations, detector non-uniformities, and electronic noise. These factors are highly system-dependent and difficult to isolate in real-world conditions. More critically, they break core assumptions of many IQA models, including spatial stationarity, homogeneous distortion types, and the availability of pristine reference images. Once these assumptions are violated, both full-reference (FR) and no-reference (NR) approaches exhibit degraded performance in scientific contexts. This exposes a key theoretical gap in current IQA research: existing methods lack mechanisms to integrate physically grounded priors and fail to capture the complex, domain-specific quality cues required for scientific interpretation.

To address these challenges, we propose NSIQ, the first dedicated IQA benchmark specifically designed for near-space interferometric imaging. NSIQ consists of 201 high-fidelity grayscale interferograms generated via a physics-driven simulation framework that systematically models six representative degradation mechanisms inherent to SHIS imaging, covering a wide spectrum from mild to severe. Each sample is equipped with hybrid annotations that integrate expert perceptual judgments with standardized physical metrics, thereby bridging human-centered perceptual modeling and system-level physical characterization. This dual-source annotation strategy not only ensures perceptual relevance but also embeds physical interpretability, establishing NSIQ as a unique and domain-specific benchmark for advancing physically grounded IQA research.

We further conduct a comprehensive benchmarking of a diverse set of full-reference (FR) and no-reference (NR) IQA methods on the proposed NSIQ dataset, and consistently find that their performance remains unsatisfactory, particularly for models originally trained on natural image distortions. These observations reveal the limited applicability of existing approaches in physics-driven interferometric scenarios and emphasize the urgent need for physically informed IQA models explicitly tailored to scientific imaging tasks. In summary, the key contributions of this work are as follows:

  1. We construct NSIQ, the first high-fidelity IQA dataset for near-space interferometric imaging, featuring six physically grounded degradation types and dual-source annotations combining expert perception and system-level metrics.

  2. We perform a systematic evaluation of state-of-the-art FR and NR IQA models on NSIQ, revealing significant performance degradation and uncovering their modeling limitations when confronted with physics-driven distortions in scientific imagery.

Related work

Existing IQA and remote sensing datasets

Widely used IQA datasets, including LIVE8, TID20139, and CSIQ10, are built upon natural scene imagery. These datasets apply synthetic degradations such as compression artifacts, additive Gaussian noise, blurring, chromatic shifts, and contrast alterations. While such datasets have been instrumental in advancing IQA algorithms, their distortion types and imaging conditions are mostly limited to natural image scenarios, offering little support for remote sensing, particularly in hyperspectral and atmospheric imaging contexts.

In the remote sensing community, existing public datasets are mostly developed for high-level semantic tasks such as land cover classification11 or semantic segmentation12. Other datasets like MODIS13 or GOSAT7 support geophysical parameter inversion tasks such as atmospheric retrieval or radiative transfer modeling. However, these datasets rarely include perceptual annotations aligned with human visual assessment, and thus are unsuitable for direct use in IQA benchmarking.

To date, there is no publicly available IQA dataset specifically tailored for near-space hyperspectral interferometric imaging. These systems, operating under the spatial heterodyne principle, require precise reconstruction of interference fringes, accurate spectral demultiplexing, and robust spatial calibration. However, the acquired imagery is inherently vulnerable to a broad spectrum of degradations, including platform-induced instability, inter-channel spectral crosstalk, electronic sensor noise, and nonlinear response characteristics.

These artifacts often appear as large-scale geometric deformations, spectral misalignments, and fringe distortions. Such effects cannot be realistically reproduced by conventional synthetic distortion pipelines. Therefore, constructing a high-fidelity benchmark dataset that incorporates physically induced distortions, perceptual annotations, and real observational data is essential for advancing quality control and model development in this emerging imaging modality.

IQA methodologies

IQA methods are typically divided into FR and NR categories. FR approaches compare distorted images with a reference, using metrics such as PSNR and SSIM. While computationally simple, these pixel- or statistic-based methods often misalign with human perception, especially under complex structural distortions.

To improve perceptual alignment, recent FR methods integrate structural and multi-scale features beyond traditional pixel-wise comparisons. For instance, VIF14 and FSIM15 incorporate multi-scale information fidelity and phase congruency, respectively, allowing for improved sensitivity to local structural degradation. More recently, deep learning-based FR models such as DISTS16 and PieAPP17 have leveraged learned feature representations to quantify semantic similarity in high-dimensional perceptual spaces. Further advancing this trend, hybrid architectures like AHIQ18 integrate convolutional encoders with Transformer modules, facilitating the joint modeling of fine textures and global semantic coherence.

Nonetheless, the assumption that an undistorted reference image is available rarely holds in real-world applications. In near-space hyperspectral interferometric imaging, where signals are reconstructed through fringe demodulation and spectral unmixing, the notion of a perfect reference is often physically ambiguous and practically unattainable.

To overcome the limitations of FR methods, NR IQA approaches assess image quality directly from distorted images without requiring a reference. Early models such as BRISQUE19 and NIQE20 rely on natural scene statistics (NSS) to quantify quality as deviations from statistical norms.

Deep learning has improved NR-IQA by replacing hand-crafted features with learned representations. CNN-based models like CNNIQA21 and DB-CNN22 capture local quality-relevant structures, while architectures such as MUSIQ23 and MANIQA24 incorporate multi-scale encoding and attention mechanisms to handle complex, non-local distortions. Recent work has explored multimodal large language models such as Rl-IQA25, which combine vision and language for joint quality and aesthetic assessment. However, most existing NR-IQA models are trained on natural image datasets and do not account for physically grounded distortions.

Despite recent advances, the effectiveness of NR-IQA models remains fundamentally limited by their dependence on natural image priors and the lack of training data that reflect structured, physically grounded degradations. In the context of near-space hyperspectral interferometric imaging, where distortions arise from complex interactions among platform motion, optical interference, and sensor response, such limitations become particularly pronounced. Conventional NR models, trained on natural image distributions, struggle to generalize to these physics-driven degradations, resulting in suboptimal performance and limited reliability in scientific applications.

These challenges underscore the urgent need for a dedicated IQA dataset tailored to this domain. A dataset that incorporates perceptual annotations alongside diverse, system-specific degradation types would not only serve as a benchmark for evaluating existing methods but also enable the development of physically informed, perceptually consistent IQA models. Establishing such a dataset is thus essential for bridging the gap between current IQA research and the unique requirements of near-space scientific imaging systems.

NSIQ datasets

This section presents the design rationale, simulation methodology, and structural specification of the proposed NSIQ dataset. We begin by introducing the physical principles of near-space hyperspectral interferometric imaging and describing the core components of the forward simulation framework. We then detail a comprehensive degradation modeling strategy that systematically simulates six types of system-level degradation, enabling diverse quality levels. Finally, we specify the dataset structure to support the evaluation of both FR and NR IQA methods.

Physically consistent imaging simulation

We propose a systematic simulation framework for hyperspectral interferometric imaging, specifically designed to emulate degradation mechanisms in near-space observations with high physical fidelity. The overall simulation process is divided into two stages: (1) modeling of ideal image formation and (2) controlled injection of degradation effects. Each stage maintains transparent physical assumptions and controllable system parameters, ensuring that the resulting synthetic data remain both interpretable and reproducible.

As illustrated in Fig. 1(a), the front-end optical system employs a cylindrical lens to segment the entire field of view into a set of linear narrow slices. The local interferometric information within each slice is projected onto designated rows and columns of the focal plane array, enabling spatial–spectral joint encoding with one-dimensional spatial resolution.

Fig. 1.

Fig. 1

Illustration of the physically consistent simulation pipeline for near-space hyperspectral interferometric imaging. (a) Schematic diagram of the front-end optical system and spatial–spectral encoding mechanism, where the cylindrical lens segments the field of view into linear slices projected onto the focal plane array. (b) Spatial heterodyne spectrometer and imaging system. (c) Example of a simulated noise-free interferogram and its local regions: (1) high-contrast central fringes, (2) smooth lateral regions, and (3) low-modulation boundary zones.

Under ideal, noise-free conditions, we consider the interferometric response at a specific spatial position x on the focal plane, resulting from a monochromatic wave component with wavenumber \(\sigma\). The recorded intensity can be modeled as \(I(x,\sigma) = I_0(\sigma)\left[1 + \cos\varphi(x,\sigma)\right]\), where \(I_0(\sigma)\) denotes the baseline irradiance and \(\varphi(x,\sigma)\) is the phase difference introduced by the interferometer. Assuming the spatial heterodyne interferometer operates under quasi-parallel Littrow diffraction geometry, the phase delay at position x can be analytically expressed as \(\varphi(x,\sigma) = 8\pi(\sigma-\sigma_L)\,x\tan\theta_L\), where \(\sigma_L\) is the Littrow central wavenumber and \(\theta_L\) denotes the Littrow angle determined by the grating alignment.

Since realistic illumination is broadband rather than monochromatic, the total interferometric intensity I(x) at a spatial location x is obtained by integrating over the full wavenumber domain. Let \(B(\sigma)\) denote the spectral radiance of the incident light, and \(\tau(\sigma)\) be the system transmission efficiency. The cumulative intensity becomes:

\[
I(x) = \int_{0}^{\infty} B(\sigma)\,\tau(\sigma)\left[1 + \cos\bigl(8\pi(\sigma-\sigma_L)\,x\tan\theta_L\bigr)\right]\mathrm{d}\sigma \tag{1}
\]

This formulation describes how spatial fringe modulation encodes spectral information and serves as the basis for spectral demodulation in subsequent processing.
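As a concrete illustration, the broadband fringe integral of Eq. (1) can be discretized and evaluated numerically. The sketch below uses the standard spatial heterodyne fringe-frequency term; all source, grating, and geometry parameters are hypothetical placeholders, not values from the paper:

```python
import numpy as np

def ideal_interferogram(x, sigma, B, tau, sigma_L, theta_L):
    """Discretized Eq. (1): integrate the cosine fringe term over wavenumber.

    x: detector positions; sigma: wavenumber grid; B, tau: spectral radiance
    and transmission sampled on sigma; sigma_L, theta_L: Littrow parameters.
    """
    dsigma = sigma[1] - sigma[0]
    # fringe phase 8*pi*(sigma - sigma_L)*x*tan(theta_L), as in Eq. (1)
    phase = 8.0 * np.pi * (sigma[None, :] - sigma_L) * np.tan(theta_L) * x[:, None]
    return np.sum(B * tau * (1.0 + np.cos(phase)), axis=1) * dsigma

# hypothetical narrow-band Gaussian source near the Littrow wavenumber
x = np.linspace(-0.5, 0.5, 1024)               # detector coordinate (cm)
sigma = np.linspace(14990.0, 15010.0, 401)     # wavenumbers (cm^-1)
B = np.exp(-((sigma - 15005.0) / 2.0) ** 2)    # Gaussian emission line
tau = np.ones_like(sigma)                      # flat transmission
I = ideal_interferogram(x, sigma, B, tau, sigma_L=15000.0, theta_L=np.deg2rad(7.0))
```

For such a narrow-band source, the fringe envelope peaks at the zero-OPD position in the detector center and decays symmetrically toward the edges.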

Real-world SHIS imaging is inevitably affected by system-level imperfections. To account for system-level imperfections, we introduce perturbation terms into the ideal fringe model. The resulting interferometric intensity is formulated as:

\[
I(x) = \int_{0}^{\infty} B(\sigma)\left[\tau_1(\sigma) + \tau_2(\sigma) + 2\sqrt{\tau_1(\sigma)\tau_2(\sigma)}\,\cos\bigl(8\pi(\sigma-\sigma_L)(x-\Delta x)\tan(\theta_L+\delta\theta) + \delta\varphi(x)\bigr)\right]\mathrm{d}\sigma \tag{2}
\]

where \(\tau_1(\sigma)\) and \(\tau_2(\sigma)\) are the transmission efficiencies of the two interferometric arms. The terms \(\delta\theta\), \(\Delta x\), and \(\delta\varphi(x)\) represent angular misalignment, lateral displacement, and residual phase error, respectively. The integration bounds are defined by \(|x| \le (W/2)\cos\theta_L\), where W is the effective beam width and \(\theta_L\) is the nominal Littrow angle. This extended formulation serves as the basis for simulating the degradation scenarios in the subsequent modeling phase.

In this context, Fig. 1(b) establishes the physical configuration of the spatial heterodyne spectrometer and imaging system, which underpins the formulation of our physically consistent simulation framework and guides the subsequent degradation modeling. Figure 1(c) further provides a noise-free simulated interferogram, which serves as a reference baseline for systematically analyzing how individual physical perturbations influence fringe modulation and spatial integrity.

Physically motivated degradation models

In practical spatial heterodyne interferometric imaging systems, image quality is affected not only by the idealized optical formation process, but also by a wide range of physical perturbations and hardware-induced noise sources. To simulate the degradation of interferograms under realistic observation conditions, we formulate six representative degradation models grounded in physical principles. For each physical degradation type, parameters are sampled within physically plausible ranges derived from interferometric imaging principles. Uniform sampling is adopted within each range to ensure balanced coverage of degradation severities, while different degradation factors are modeled independently for controlled analysis.
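The independent uniform sampling of degradation parameters can be sketched as follows. The ranges for phase error and Littrow deviation follow the values quoted later in the dataset analysis; the remaining four ranges are illustrative placeholders, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(7)

# First two ranges follow the dataset analysis section; the rest are
# hypothetical normalized ranges used only for illustration.
PARAM_RANGES = {
    "phase_error":            (0.15, 0.60),
    "littrow_deviation":      (0.04, 0.20),
    "vibration_rms":          (0.0, 1.0),
    "detector_nonuniformity": (0.0, 1.0),
    "electronic_noise":       (0.0, 1.0),
    "sampling_error":         (0.0, 1.0),
}

def sample_degradation_params():
    """One independent uniform draw per degradation factor."""
    return {name: float(rng.uniform(lo, hi))
            for name, (lo, hi) in PARAM_RANGES.items()}

params = sample_degradation_params()
```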

Phase errors. Phase errors arise from spatially varying perturbations introduced by multiple system-level imperfections, including electronic drift, grating misalignment, and ZOPD localization uncertainty. These factors alter the nominal optical path distribution across the detector, thereby disrupting local phase coherence. The consequence is a gradual loss of fringe visibility, structural distortion of the interferogram, and a measurable reduction in modulation efficiency.

From a physical modeling perspective, phase error can be expressed as a spatially dependent perturbation superimposed on the ideal modulation phase,

\[
\varphi(x) = \varphi_0(x) + 2\pi\sigma\,\Delta d(x) \tag{3}
\]

where \(\varphi_0(x)\) denotes the ideal phase distribution determined by the nominal optical path difference (OPD), \(\sigma\) is the optical wavenumber, and \(\Delta d(x)\) represents the spatially varying OPD disturbance caused by mechanical or electronic instabilities. This additive representation captures how even small fluctuations in \(\Delta d(x)\) can propagate through the optical system, producing non-uniform phase shifts across the detector.

Such perturbations have two primary consequences. First, the coherence of the recorded interferogram becomes spatially variant, leading to degraded fringe contrast in regions of strong signal intensity. Second, the cumulative accumulation of phase deviations across the detector plane results in distorted fringe geometry, particularly in central regions where phase sensitivity is highest. As illustrated in Fig. 2(a), increasing the magnitude of the OPD disturbance \(\Delta d(x)\) produces a visible transition from sharp, well-modulated fringes to distorted and irregular patterns, reflecting the loss of spatial modulation fidelity under phase errors. This physically motivated model thus provides a quantitative link between microscopic perturbations in optical path length and macroscopic degradation of interferometric image quality.
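A minimal numerical sketch of this mechanism, assuming a hypothetical random-walk OPD drift and illustrative system parameters, shows how the additive phase perturbation distorts the fringe geometry while the ideal fringes retain near-unit visibility:

```python
import numpy as np

rng = np.random.default_rng(0)

def fringe_pattern(x, sigma0, opd_slope, delta_d):
    """1 + cos(phi0(x) + 2*pi*sigma0*delta_d(x)), with phi0 from a linear OPD ramp."""
    phi0 = 2.0 * np.pi * sigma0 * opd_slope * x      # ideal phase (Eq. (3)'s phi_0)
    return 1.0 + np.cos(phi0 + 2.0 * np.pi * sigma0 * delta_d)

x = np.linspace(-0.5, 0.5, 2048)
sigma0 = 15000.0                                     # illustrative wavenumber (cm^-1)
clean = fringe_pattern(x, sigma0, 4e-3, np.zeros_like(x))
delta_d = np.cumsum(rng.normal(0.0, 1e-7, x.size))   # random-walk OPD drift (cm)
noisy = fringe_pattern(x, sigma0, 4e-3, delta_d)

def visibility(I):
    """Global Michelson contrast of a fringe pattern."""
    return (I.max() - I.min()) / (I.max() + I.min())
```

Note that a pure phase error mainly warps fringe positions; the global contrast stays high while the local pattern departs visibly from the ideal one.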

Fig. 2.

Fig. 2

Illustration of the impact of four individual degradation variables on near-space interferometric imagery: (a) phase errors, (b) Littrow angle deviations, (c) vibration errors, and (d) Electrical system noise. The top row presents the corresponding pseudocolor interferograms, while the bottom row displays the associated 2D frequency spectra.

A consistent pseudo-color mapping and a unified contrast normalization strategy are applied to all visual examples in Fig. 2, ensuring that perceptual differences across images primarily reflect the effects of different physical degradation mechanisms rather than visualization-dependent scaling.

Littrow angle errors. Littrow angle errors arise from slight misalignment between the two diffraction gratings, typically caused by mechanical instability or thermal drift. These angular deviations alter the intended phase modulation geometry, shifting fringe frequency and symmetry.

We model this effect by introducing an angular perturbation \(\Delta\theta\) to the Littrow angle, which modifies the spatial phase term from \(\tan\theta_L\) to \(\tan(\theta_L+\Delta\theta)\). For small deviations, a first-order approximation gives:

\[
\varphi(x,\sigma) \approx 8\pi(\sigma-\sigma_L)\,x\left(\tan\theta_L + \frac{\Delta\theta}{\cos^2\theta_L}\right) \tag{4}
\]

This results in nonlinear changes to fringe spacing across the detector, as the phase modulation term becomes position-dependent in an asymmetric way. As shown in Fig. 2(b), Littrow angle errors introduce a spatially dependent shift in fringe frequency, leading to enhanced central intensity and asymmetric fringe broadening across the interferogram.
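The first-order expansion of the tangent term underlying the small-deviation approximation can be checked numerically; the Littrow angle and perturbation below are illustrative values, not system parameters from the paper:

```python
import numpy as np

def tan_first_order(theta, dtheta):
    """First-order Taylor expansion: tan(theta + d) ~ tan(theta) + d / cos(theta)**2."""
    return np.tan(theta) + dtheta / np.cos(theta) ** 2

theta = np.deg2rad(7.0)     # hypothetical Littrow angle
dtheta = np.deg2rad(0.1)    # hypothetical small angular misalignment
exact = np.tan(theta + dtheta)
approx = tan_first_order(theta, dtheta)
rel_err = abs(exact - approx) / exact
```

For misalignments of a tenth of a degree the first-order term reproduces the exact tangent to well below 0.01% relative error, which justifies the linearized phase model.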

Vibration errors. Near-space imaging platforms, such as spacecraft or high-altitude aerial vehicles, are inevitably subject to micro-vibrations induced by mechanical operations, aerodynamic turbulence, and thermally driven structural deformations. Physically, these perturbations can be modeled as a stochastic displacement \(\delta d(t)\) superimposed on the nominal OPD:

\[
d(t) = d_0 + \delta d(t) \tag{5}
\]

where \(d_0\) denotes the undisturbed optical path difference and \(\delta d(t)\) represents the vibration-induced temporal displacement. The corresponding modulation phase at wavenumber \(\sigma\) is expressed as \(\varphi(t) = 2\pi\sigma\,d(t)\), where \(\varphi(t)\) is the instantaneous modulation phase and \(\sigma\) is the optical wavenumber. Due to temporal integration over the exposure interval, fluctuations in \(\delta d(t)\) cause a partial loss of coherence in the recorded interferogram. Assuming \(\delta d(t)\) follows a zero-mean Gaussian distribution with root-mean-square (RMS) amplitude \(d_{rms}\), the effective fringe visibility can be written as

\[
V(\sigma) = \exp\left(-2\pi^2\sigma^2 d_{rms}^2\right) \tag{6}
\]

where \(V(\sigma)\) denotes the visibility factor and \(d_{rms}\) is the RMS vibration amplitude mapped into the optical path domain.

This exponential coherence-decay formulation establishes a direct physical connection between the statistical properties of platform vibrations and the degradation of interferometric contrast. As shown in Fig. 2(c), increasing vibration amplitude leads to a significant drop in central fringe contrast and gradual disappearance of fringe visibility toward the edges, transitioning the interferogram from sharp patterns to spatially diffuse structures.
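A quick Monte-Carlo check, assuming the zero-mean Gaussian jitter model with hypothetical amplitudes, confirms that time-averaging a vibrating fringe reproduces the standard exponential visibility factor:

```python
import numpy as np

rng = np.random.default_rng(42)

def analytic_visibility(sigma, d_rms):
    """Exponential coherence decay for zero-mean Gaussian OPD jitter."""
    return np.exp(-2.0 * np.pi ** 2 * sigma ** 2 * d_rms ** 2)

# Monte-Carlo: average the instantaneous fringe over many jittered exposures
sigma, d0, d_rms = 15000.0, 1.0e-4, 5.0e-6     # cm^-1 and cm, illustrative
jitter = rng.normal(0.0, d_rms, 200_000)
mc = np.mean(np.cos(2.0 * np.pi * sigma * (d0 + jitter)))
expected = np.cos(2.0 * np.pi * sigma * d0) * analytic_visibility(sigma, d_rms)
```

The empirical average matches the analytic prediction to within Monte-Carlo noise, linking the vibration statistics directly to the fringe-contrast loss.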

Detector non-uniformity. Detector non-uniformity arises from fabrication tolerances, material inconsistencies, and aging effects, leading to pixel-wise variations in quantum efficiency (QE) and dark current noise.

During a limited integration time \(\Delta t\), photon detection follows Poisson statistics, and QE variations introduce additional spatial noise. The standard deviation of calibrated shot noise is approximated by:

\[
\sigma_{shot}(x) \approx \varepsilon\sqrt{\frac{I(x)\,A_d\,\Delta t}{h\,c\,\bar{\sigma}}} \tag{7}
\]

where \(\varepsilon\) is the calibration error rate, \(A_d\) is the pixel area of the detector, h denotes Planck's constant, c denotes the speed of light, and \(\bar{\sigma}\) is the mean wavenumber of the spectral band. In the absence of incident light, dark current, arising from thermally generated charge carriers, also exhibits spatial variability. Taking both photon and dark current contributions into account, the overall impact of detector non-uniformity on the interferogram can be approximated as \(I_{obs}(x) = I(x) + n(x)\), where \(n(x) \sim \mathcal{N}(0, \sigma_{total}^2)\). \(\sigma_{total}^2\) accounts for the combined variance from shot-noise non-uniformity and dark-signal variations.
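The Poisson character of shot noise can be verified with a short simulation; the count level and sample size below are arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(3)

# Photon detection is Poissonian: the variance equals the mean count, so the
# relative shot noise falls as 1/sqrt(N) with longer integration.
mean_counts = 10_000.0
samples = rng.poisson(mean_counts, 100_000)
rel_noise = samples.std() / samples.mean()      # expected ~ 1/sqrt(10_000) = 0.01
```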

Electrical system noise. In SHIS, detector outputs are transmitted through analog amplification, analog-to-digital conversion, and digital readout circuits, each introducing distinct noise sources due to hardware limitations. Quantization noise arises from the finite resolution of the ADC and is modeled as \(\sigma_{ADC} = Q_{sat}/(2^{M}\sqrt{12})\), where \(Q_{sat}\) is the saturation charge and M is the number of quantization bits.

Readout noise, caused by thermal and bandwidth fluctuations in preamplifiers, is expressed as \(\sigma_{read} = k_T\,\sigma_0\), where \(\sigma_0\) denotes the baseline readout noise and \(k_T\) is a temperature correction factor.

Bit error noise, stemming from clock jitter and transmission interference, is modeled as:

graphic file with name d33e704.gif 8

where BER represents the bit error rate, SR denotes the spectral resolution, and \(\sigma_f\) is the wavenumber corresponding to the interferogram fringe location. Assuming statistical independence, the total electronic noise is \(\sigma_{elec} = \sqrt{\sigma_{ADC}^2 + \sigma_{read}^2 + \sigma_{bit}^2}\), where \(\sigma_{bit}\) denotes the bit-error noise of Eq. (8). This noise component degrades the local signal-to-noise ratio, especially in low-intensity regions and high-resolution spectral bands.
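The root-sum-square combination of independent electronic noise terms can be sketched as follows; the ADC depth, saturation charge, and readout figures are hypothetical placeholders:

```python
import numpy as np

def quantization_noise(q_sat, bits):
    """Uniform quantization noise: std of one LSB, q_sat / (2**bits * sqrt(12))."""
    return q_sat / (2 ** bits * np.sqrt(12.0))

def total_electronic_noise(sigma_adc, sigma_read, sigma_bit):
    """Root-sum-square combination under statistical independence."""
    return float(np.sqrt(sigma_adc ** 2 + sigma_read ** 2 + sigma_bit ** 2))

s_adc = quantization_noise(q_sat=100_000.0, bits=14)    # hypothetical 14-bit ADC
s_tot = total_electronic_noise(s_adc, sigma_read=5.0, sigma_bit=1.0)
```

Because the terms add in quadrature, the largest single contributor (here the assumed readout noise) dominates the total.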

Interferogram sampling errors. SHIS relies on focal plane arrays to synchronously sample the interferogram in the spatial domain. However, due to limited detector resolution and fixed pixel size in practical system design and implementation, fringe patterns are often sampled at near-critical or even sub-Nyquist rates, leading to systematic sampling errors during acquisition. To emulate these degradations, we apply spatial down-sampling, non-uniform sampling, and quantization noise to an ideal interferogram. The effect of spatial under-sampling can be approximated as:

\[
\hat{I}(x,y) = \mathcal{S}\!\left\{ I(m\Delta_x,\; n\Delta_y) \right\} \tag{9}
\]

where \(\mathcal{S}\) denotes the spatial interpolation operator, and \(\Delta_x\) and \(\Delta_y\) represent the effective sampling intervals along the horizontal and vertical directions, respectively.
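A toy example of spatial under-sampling followed by linear interpolation (fringe frequency and sampling steps are illustrative, not system values) shows how sub-Nyquist acquisition inflates the reconstruction error:

```python
import numpy as np

def undersample_and_interp(img, step):
    """Keep every `step`-th column, then linearly interpolate the gaps back."""
    cols = np.arange(0, img.shape[1], step)
    full = np.arange(img.shape[1])
    return np.stack([np.interp(full, cols, row[cols]) for row in img])

x = np.linspace(-0.5, 0.5, 512)
fringe = 1.0 + np.cos(2.0 * np.pi * 40.0 * x)    # 40 fringes across the detector
img = np.tile(fringe, (8, 1))                    # toy 8-row interferogram
mild = undersample_and_interp(img, step=2)       # ~6.4 samples per fringe period
severe = undersample_and_interp(img, step=8)     # ~1.6 samples: sub-Nyquist
mild_rmse = float(np.sqrt(np.mean((mild - img) ** 2)))
severe_rmse = float(np.sqrt(np.mean((severe - img) ** 2)))
```

With fewer than two samples per fringe period, the sampled pattern aliases to a lower spatial frequency and the reconstruction error grows by several times.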

Although each degradation mechanism is modeled independently in this study to facilitate controlled analysis, multiple physical degradations may coexist and interact in real interferometric imaging systems, jointly affecting image formation and quality.

Quality label generation

We establish a dual-source label generation framework to ensure the reliability of image quality annotation in the proposed IQA dataset. Each interferogram is assigned a quality score that combines perceptual human judgment with physically grounded system degradation metrics. The overall scoring scheme is illustrated in Fig. 3.

Fig. 3.

Fig. 3

Illustration of the quality score generation process. The final quality score is obtained by combining physical degradation parameters extracted from image simulation metadata and subjective evaluation scores provided by expert human raters based on visual assessments.

Specifically, the final quality label Q for each sample is defined as a weighted combination:

\[
Q = \alpha\,Q_{subj} + (1-\alpha)\,Q_{phys} \tag{10}
\]

Here, \(Q_{subj}\) denotes subjective quality scores assigned by 27 domain experts under standardized viewing conditions, based on fringe clarity, spatial modulation, and overall appearance. Scores were constrained to the range [1, 5] and normalized to reduce inter-rater bias. The hybrid quality score combines subjective perceptual scores and physics-driven objective metrics through a weighting parameter \(\alpha\), which is empirically determined based on preliminary validation. The same fixed value of \(\alpha\) is used for all samples to ensure stability and reproducibility of the dataset.

\(Q_{phys}\) reflects the impact of normalized physical degradation parameters. Although noise sources were modeled separately, their interactions may yield non-linear effects; thus, \(Q_{phys}\) is not simply additive. Further analysis is provided in the supplementary material.
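The weighted combination of Eq. (10) is straightforward to implement. Since the paper's fixed weighting value is not recoverable from the extracted text, the value below is purely illustrative:

```python
def hybrid_quality_label(q_subjective, q_physical, alpha):
    """Weighted combination of expert score and physics-derived score (Eq. (10))."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * q_subjective + (1.0 - alpha) * q_physical

# alpha = 0.5 is illustrative only; the paper fixes its own value for all samples
q = hybrid_quality_label(0.8, 0.6, alpha=0.5)
```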

Dataset specification

The NSIQ dataset contains 201 grayscale interferograms simulating diverse near-space atmospheric conditions. Each image has a resolution of 1024 × 610 pixels, with the vertical axis (610 pixels) representing altitudes from 10 to 71 km, and the horizontal axis encoding spatial modulation from optical interference fringes.

Samples are synthesized by injecting one or more of six physically grounded degradation types. Parameters are uniformly sampled within realistic bounds to cover mild to severe distortion levels, enabling robust model generalization under varied noise conditions.

All images are stored in grayscale .tif format to preserve the full fringe dynamic range. To support various IQA protocols, each sample includes one or more quality labels and is paired with two types of annotation files in .txt format. The NSIQ dataset developed in this work is openly accessible at https://huggingface.co/datasets/Bwhitebear/NSIQ/tree/main. The dataset comprises near-space hyperspectral interferometric images and associated image quality labels, formatted in standard digital representations to facilitate reproducibility, benchmarking, and further methodological development in image quality assessment.

Dataset analysis and benchmarking

Statistical properties of NSIQ

To systematically evaluate the NSIQ dataset in terms of its completeness in physical degradation modeling and the representativeness of perceptual quality annotations, we analyze it from three perspectives: the distribution of degradation parameters, the statistical behavior of subjective scores, and their interaction mechanisms, as illustrated in Fig. 4.

Fig. 4.

Fig. 4

Statistical characteristics of the NSIQ dataset. (a) Radar chart of the relative proportion of six simulated perturbation parameters. (b) Histogram of DMOS scores. (c) Frequency of each perturbation factor within the dataset.

Figure 4(a) shows the relative proportions of six representative perturbation sources. These degradations originate from core physical processes in interferometric imaging systems, including optical path instability, frequency drift, mechanical-thermal vibration, and electronic response noise. The radar chart indicates that all degradation dimensions are well balanced, effectively mitigating sample bias toward any particular noise pattern.

Figure 4(b) presents the distribution of subjective DMOS scores across all samples. The scores span the full perceptual range of approximately 1 to 5, forming a continuous and right-skewed distribution with a dominant peak in the mid-quality region. Such a continuous labeling structure is particularly well-suited for training regression-based IQA models and supports fine-grained discriminative learning.

Further, Fig. 4(c) illustrates the numerical value ranges of the six degradation parameters. Each parameter is sampled within a physically plausible tolerance band, covering typical intervals from minimally perceptible to severely deteriorated system conditions. For instance, phase errors are sampled from 0.15 to 0.6, and Littrow angle deviations are controlled within 0.04 to 0.2. As shown in the bar chart, these parameters exhibit clear numeric continuity and scale separation, ensuring both the controllability and physical interpretability of the degradation simulation. This also provides a strong basis for multifactor sensitivity analysis and robustness evaluation of IQA models.

Benchmark experiments on the NSIQ dataset

To comprehensively evaluate the discriminative capacity and inherent challenges of the proposed NSIQ dataset, we benchmarked 14 representative IQA models, including both NR-IQA (DEIQA26, MUSIQA23, ASNA27, RE-IQA36, GraphIQA28, TRes29, MANIQA24, DiffVIQA30) and FR-IQA (DIST31, DMM32, DeepJSD33, ADISTS34, PieApp17, CSIM35) methods. For trainable deep learning models, we adopted a conventional 8:2 train-test split, where 80% of the interferometric images were used for fine-tuning and the remaining 20% for testing. Model training was conducted on a computer equipped with two NVIDIA RTX 3090 GPUs.

To mitigate the performance variance caused by random data splits and ensure the statistical reliability of the results, all experiments were repeated 50 times with different seeds, and the reported performance is the average over these repetitions. We evaluate all models using four widely used metrics: Spearman Rank-order Correlation Coefficient (SRCC), Pearson Linear Correlation Coefficient (PLCC), Kendall Rank Correlation Coefficient (KRCC), and Root Mean Square Error (RMSE). The averaged results on the test set are summarized in Table 1, which reveals how well each model performs under physically grounded degradation scenarios.
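The four metrics map directly onto standard SciPy routines; the sketch below is a generic illustration of this protocol, not the authors' evaluation code.

```python
import numpy as np
from scipy.stats import kendalltau, pearsonr, spearmanr

def iqa_metrics(pred, gt):
    """SRCC, PLCC, KRCC, and RMSE between predicted and reference scores."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    return {
        "SRCC": spearmanr(pred, gt)[0],   # rank-order correlation
        "PLCC": pearsonr(pred, gt)[0],    # linear correlation
        "KRCC": kendalltau(pred, gt)[0],  # pairwise rank agreement
        "RMSE": float(np.sqrt(np.mean((pred - gt) ** 2))),
    }
```

Averaging over the 50 random splits then reduces to calling this once per seed on the held-out predictions and taking the mean of each metric.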

Table 1.

Benchmark results of 14 IQA methods on the NSIQ dataset. Both FR and NR models are evaluated using SRCC, PLCC, KRCC, and RMSE.

Methods  DIST31  DMM32   DeepJSD33  ADISTS34  PieApp17  DEIQA26  MUSIQA23
SRCC     0.7916  0.6566  0.0917     0.7169    0.5680    0.2521   0.8591
PLCC     0.8076  0.6846  0.0807     0.7728    0.5918    0.2986   0.8606
KRCC     0.5979  0.4760  0.0604     0.5283    0.4077    0.3312   0.6716
RMSE     3.2066  2.6338  3.1780     3.1450    2.5885    2.4997   0.5602

Methods  ASNA27  CSIM35  RE-IQA36  GraphIQA28  TRes29  MANIQA24  DiffVIQA30
SRCC     0.2578  0.4102  0.4980    0.4872      0.5745  0.5357    0.8689
PLCC     0.2788  0.6781  0.5772    0.5657      0.6027  0.6115    0.8713
KRCC     0.3814  0.2768  0.3481    0.3352      0.3613  0.3787    0.6822
RMSE     6.0778  2.1984  2.9120    2.9304      2.8610  2.7861    0.1624

To facilitate holistic comparison, we further provide a radar chart visualization that summarizes the joint behavior of representative methods across all evaluation metrics, as illustrated in Fig. 6. In this visualization, RMSE is inverted and all metrics are normalized to ensure consistent comparison.

Fig. 6.

Radar chart illustrating the multi-metric performance of representative IQA methods. SRCC, PLCC, KRCC, and RMSE are jointly considered. RMSE is inverted and all metrics are normalized to [0,1] for visualization, with larger values indicating better performance.
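The preprocessing described in this caption (negating RMSE so that larger is better, then min-max normalizing every metric across methods) can be sketched as follows; this is an illustrative reconstruction, not the authors' plotting code.

```python
import numpy as np

def normalize_for_radar(scores):
    """Min-max normalize each metric across methods to [0, 1].

    `scores` maps method name -> {"SRCC": ..., "PLCC": ..., "KRCC": ..., "RMSE": ...}.
    RMSE is negated first so that larger always means better on the chart.
    """
    metrics = ["SRCC", "PLCC", "KRCC", "RMSE"]
    out = {method: {} for method in scores}
    for metric in metrics:
        vals = np.array([scores[m][metric] for m in scores], dtype=float)
        if metric == "RMSE":
            vals = -vals  # invert: lower RMSE -> higher normalized value
        lo, hi = vals.min(), vals.max()
        norm = (vals - lo) / (hi - lo) if hi > lo else np.ones_like(vals)
        # dict iteration order matches the order used to build `vals`
        for method, v in zip(scores, norm):
            out[method][metric] = float(v)
    return out
```

After this step every axis of the radar chart spans [0, 1], so methods can be compared by enclosed area rather than raw metric scale.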

To further validate the generalization challenge imposed by NSIQ, we conducted transfer experiments on several trainable IQA models. Specifically, we took models pretrained on natural-image datasets such as KonIQ-10k and CSIQ and directly applied their weights to quality prediction on interferometric images from NSIQ, keeping network architectures and hyperparameters unchanged. Without any fine-tuning, these models exhibited significant performance degradation on NSIQ: multiple metrics dropped sharply, and some models lost correlation with human subjective scores entirely.

These results indicate that while existing models excel on conventional natural-image distortions, their discriminative capacity fails to generalize to degradations driven by physical system perturbations. Unlike the synthetic distortions used in traditional IQA datasets, NSIQ captures quality degradations rooted in real-world observational phenomena, characterized by complex system couplings and multi-scale physical dynamics. Such degradations are inherently nonlinear, spatially variant, and physically grounded, exceeding the representational scope of models trained on natural scene statistics.

To further examine the robustness of mainstream IQA algorithms on physically degraded interferometric data, Fig. 5 compares the per-sample predictions of six representative methods (DIST, DMM, DeepJSD, PieApp, MANIQA, and MUSIQA) across 200 test samples. For each method, the predicted quality scores are shown alongside the reference scores, enabling a direct visual comparison of accuracy and consistency.

Fig. 5.

Test results of six IQA methods on the proposed dataset. For each method, the predicted quality scores (red) are compared with the reference quality scores (blue) across 200 test samples. All scores are mapped to a unified range of [1, 5] to enable direct comparison under the same coordinate scale. The noticeable fluctuations and mismatches between the two curves indicate the difficulty of existing IQA models in accurately capturing perceptual quality under physics-driven degradations in interferometric imaging.

The results show substantial fluctuations in the predicted scores for all methods, with none consistently matching the reference labels across the full test set. Methods such as DeepJSD and DIST are highly sensitive to subtle degradations, often producing sharp prediction spikes and local mismatches with ground-truth. Learning-based approaches demonstrate greater prediction consistency, but residual biases remain under severe degradations.

Overall, these findings reveal that existing IQA models, primarily trained and validated on natural image distortions, struggle to capture the perceptual impact of physical degradations unique to interferometric imaging. The observed inconsistencies underscore the need for specialized quality metrics and more robust model adaptation in near-space or physics-constrained imaging scenarios. The NSIQ benchmark therefore provides a challenging and physically meaningful testbed for advancing physically interpretable IQA methodologies.

It should be noted that the observed performance degradation of existing IQA models is specific to the physics-driven degradations modeled in the proposed NSIQ dataset. These degradations arise from interferometric imaging mechanisms and differ fundamentally from distortions commonly encountered in natural-image or conventional remote-sensing IQA benchmarks. We also acknowledge that the current NSIQ dataset has a moderate scale and consists of static image samples, which may limit the diversity of quality variations and temporal effects represented in the benchmark.

Limitation

The proposed NSIQ dataset currently has several limitations. First, its scale remains modest (201 grayscale interferograms), which may limit its usefulness for training large-scale deep networks or self-supervised frameworks. This constraint arises chiefly from the high computational cost of synthesizing physically accurate interferometric data. Future work will expand the dataset to cover a wider range of physically representative scenarios and observational conditions.

Second, NSIQ currently includes only static frames, omitting temporally correlated degradations inherent to real-world near-space observations. Extending the dataset to include time-series interferograms would enable the exploration of dynamic quality variations and facilitate video-level benchmarking.

Furthermore, most existing state-of-the-art IQA methods inherently assume spatially uniform distortions, lacking sensitivity to localized, region-specific artifacts prevalent in NSIQ. Models employing global pooling operations consequently tend to underestimate local quality degradations, resulting in perceptual bias. As illustrated through patch-wise analyses in the supplementary material, addressing this issue requires developing methods explicitly sensitive to spatially heterogeneous distortions.

Future work will focus on expanding the dataset size and on building image quality assessment datasets from real satellite observations, so as to further validate and extend NSIQ.

Conclusion

In this work, we presented NSIQ, a physically grounded image quality assessment dataset designed for near-space hyperspectral interferometric imaging. By modeling representative physical degradation mechanisms and combining perceptual annotations with physics-driven quality measures, NSIQ bridges the gap between conventional natural-image IQA benchmarks and the requirements of interferometric remote sensing scenarios. Extensive benchmarking experiments demonstrate that existing IQA methods face notable challenges under physics-driven degradations, highlighting the necessity of domain-aware quality modeling. Beyond serving as a benchmark, NSIQ provides a practical foundation for developing and validating IQA methods tailored to near-space interferometric observations, and has the potential to support automated quality assessment in applications such as atmospheric retrieval and interferometric remote sensing, where reliable image quality evaluation is essential.

Author contributions

C.J. conceived and supervised the overall study, established the theoretical framework, and developed the physically grounded simulation models for near-space interferometric imaging. He also performed the analytical derivations and drafted the main manuscript. C.T. implemented the dataset generation and conducted experimental benchmarking of both full-reference and no-reference IQA models. She was responsible for data visualization and the preparation of all figures and tables, as well as assisting in result interpretation and manuscript organization. Z.M. contributed to data validation, statistical analysis, and manuscript revision. He provided critical feedback that helped refine the structure and clarity of the paper. All authors reviewed and approved the final version of the manuscript.

Funding

This work was supported by the National Key Research and Development Program of China under grant no. 2022YFB3901800, and no. 2022YFB3901804.

It was also supported in part by the National Natural Science Foundation of China under grant no. 42471425.

Data availability

The NSIQ dataset presented in this study is publicly available to support transparency, reproducibility, and further research. The dataset includes simulated near-space hyperspectral interferometric images together with corresponding image quality annotations generated under physically grounded degradation models. All data, including the simulated interferograms and associated hybrid quality annotations, are hosted on Hugging Face at https://huggingface.co/datasets/Bwhitebear/NSIQ/tree/main and can be freely accessed and used for non-commercial research purposes under the dataset’s specified license.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1.Filonchyk, M., Peterson, M. P., Zhang, L., Hurynovich, V. & He, Y. Greenhouse gases emissions and global climate change: Examining the influence of CO2, CH4, and N2O. Sci. Total Environ. 173359 (2024). [DOI] [PubMed]
  • 2.Gao, H. et al. Urban wind field prediction based on sparse sensors and physics-informed graph-assisted auto-encoder. Comput. Civ. Infrastructure Eng.39, 1409–1430 (2024). [Google Scholar]
  • 3.Tan, J. et al. Harvesting energy from atmospheric water: grand challenges in continuous electricity generation. Adv. Mater.36, 2211165 (2024). [DOI] [PubMed] [Google Scholar]
  • 4.Lin, H., Hosu, V. & Saupe, D. Kadid-10k: A large-scale artificially distorted iqa database. In 2019 Eleventh International Conference on Quality of Multimedia Experience (QoMEX), 1–3 (IEEE, 2019).
  • 5.Hosu, V., Lin, H., Sziranyi, T. & Saupe, D. Koniq-10k: An ecologically valid database for deep learning of blind image quality assessment. IEEE Transactions on Image Processing29, 4041–4056 (2020). [DOI] [PubMed] [Google Scholar]
  • 6.Ghadiyaram, D. & Bovik, A. C. Massive online crowdsourced study of subjective and objective picture quality. IEEE Transactions on Image Processing25, 372–387 (2015). [DOI] [PubMed] [Google Scholar]
  • 7.Kuze, A., Suto, H., Nakajima, M. & Hamazaki, T. Thermal and near infrared sensor for carbon observation Fourier-transform spectrometer on the greenhouse gases observing satellite for greenhouse gases monitoring. Appl. Opt. 48, 6716–6733 (2009). [DOI] [PubMed] [Google Scholar]
  • 8.Sheikh, H. R. LIVE image quality assessment database release 2 (2005).
  • 9.Ponomarenko, N. et al. Image database tid2013: Peculiarities, results and perspectives. Signal processing: Image communication30, 57–77 (2015). [Google Scholar]
  • 10.Larson, E. C. & Chandler, D. M. Most apparent distortion: full-reference image quality assessment and the role of strategy. J. electronic imaging19, 011006–011006 (2010). [Google Scholar]
  • 11.Green, R. O. et al. Imaging spectroscopy and the airborne visible/infrared imaging spectrometer (AVIRIS). Remote Sens. Environ. 65, 227–248 (1998). [Google Scholar]
  • 12.Phiri, D. et al. Sentinel-2 data for land cover/use mapping: A review. Remote sensing12, 2291 (2020). [Google Scholar]
  • 13.Pagano, T. S. & Durham, R. M. Moderate resolution imaging spectroradiometer (modis). In Sensor Systems for the Early Earth Observing System Platforms, vol. 1939, 2–17 (SPIE, 1993).
  • 14.Han, Y., Cai, Y., Cao, Y. & Xu, X. A new image fusion performance metric based on visual information fidelity. Information fusion14, 127–135 (2013). [Google Scholar]
  • 15.Zhang, L., Zhang, L., Mou, X. & Zhang, D. Fsim: A feature similarity index for image quality assessment. IEEE transactions on Image Processing20, 2378–2386 (2011). [DOI] [PubMed] [Google Scholar]
  • 16.Tsai, P.-F., Peng, H.-N., Liao, C.-H. & Yuan, S.-M. Full-reference image quality assessment with transformer and dists. Mathematics11, 1599 (2023). [Google Scholar]
  • 17.Prashnani, E., Cai, H., Mostofi, Y. & Sen, P. Pieapp: Perceptual image-error assessment through pairwise preference. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1808–1817 (2018).
  • 18.Lao, S. et al. Attentions help cnns see better: Attention-based hybrid image quality assessment network. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 1140–1149 (2022).
  • 19.Mittal, A., Moorthy, A. K. & Bovik, A. C. Blind/referenceless image spatial quality evaluator. In 2011 Conference Record of the Forty Fifth Asilomar Conference on Signals, Systems and Computers (ASILOMAR), 723–727, 10.1109/ACSSC.2011.6190099 (2011).
  • 20.Mittal, A., Soundararajan, R. & Bovik, A. C. Making a "completely blind" image quality analyzer. IEEE Signal Processing Letters 20, 209–212 (2012). [Google Scholar]
  • 21.Kang, L., Ye, P., Li, Y. & Doermann, D. Convolutional neural networks for no-reference image quality assessment. In Proceedings of the IEEE conference on computer vision and pattern recognition, 1733–1740 (2014).
  • 22.Zhang, W., Ma, K., Yan, J., Deng, D. & Wang, Z. Blind image quality assessment using a deep bilinear convolutional neural network. IEEE Transactions on Circuits Syst. for Video Technol.30, 36–47. 10.1109/TCSVT.2018.2886771 (2020). [Google Scholar]
  • 23.Ke, J., Wang, Q., Wang, Y., Milanfar, P. & Yang, F. Musiq: Multi-scale image quality transformer. In Proceedings of the IEEE/CVF international conference on computer vision, 5148–5157 (2021).
  • 24.Yang, S. et al. Maniqa: Multi-dimension attention network for no-reference image quality assessment. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 1191–1200 (2022).
  • 25.Li, M., Wang, R., Sun, L., Bai, Y. & Chu, X. Next token is enough: Realistic image quality and aesthetic scoring with multimodal large language model. arXiv preprint arXiv:2503.06141 (2025).
  • 26.Qin, G. et al. Data-efficient image quality assessment with attention-panel decoder. In Proceedings of the AAAI Conference on Artificial Intelligence37, 2091–2100 (2023). [Google Scholar]
  • 27.Ayyoubzadeh, S. M. & Royat, A. (ASNA) An attention-based Siamese-difference neural network with surrogate ranking loss function for perceptual image quality assessment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 388–397 (2021).
  • 28.Sun, S., Yu, T., Xu, J., Zhou, W. & Chen, Z. Graphiqa: Learning distortion graph representations for blind image quality assessment. IEEE Transactions on Multimed.25, 2912–2925 (2022). [Google Scholar]
  • 29.Golestaneh, S. A., Dadsetan, S. & Kitani, K. M. No-reference image quality assessment via transformers, relative ranking, and self-consistency. In Proceedings of the IEEE/CVF winter conference on applications of computer vision, 1220–1230 (2022).
  • 30.Wang, Z. et al. Diffusion model-based visual compensation guidance and visual difference analysis for no-reference image quality assessment. IEEE Transactions on Image Process. (2025). [DOI] [PubMed]
  • 31.Ding, K., Ma, K., Wang, S. & Simoncelli, E. P. Image quality assessment: Unifying structure and texture similarity. IEEE transactions on pattern analysis and machine intelligence44, 2567–2581 (2020). [DOI] [PubMed] [Google Scholar]
  • 32.Chen, B. et al. Debiased mapping for full-reference image quality assessment. IEEE Transactions on Multimedia (2025).
  • 33.Liao, X. et al. Deepwsd: Projecting degradations in perceptual space to wasserstein distance in deep feature space. In Proceedings of the 30th ACM International Conference on Multimedia, 970–978 (2022).
  • 34.Ding, K., Liu, Y., Zou, X., Wang, S. & Ma, K. Locally adaptive structure and texture similarity for image quality assessment. In Proceedings of the 29th ACM International Conference on multimedia, 2483–2491 (2021).
  • 35.Ghazouali, S. E., Michelucci, U., Hillali, Y. E. & Nouira, H. CSIM: A copula-based similarity index sensitive to local changes for image quality assessment. arXiv preprint arXiv:2410.01411 (2024).
  • 36.Saha, A., Mishra, S. & Bovik, A. C. Re-iqa: Unsupervised learning for image quality assessment in the wild. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 5846–5855 (2023).


Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
