Abstract
We introduce EEGET-RSOD, a simultaneous electroencephalography (EEG) and eye-tracking dataset for remote sensing object detection. The dataset contains EEG and eye-tracking data recorded while 38 remote sensing experts located specific objects in 1,000 remote sensing images within a limited time frame. This task reflects the typical cognitive processes associated with human visual search and object identification in remote sensing imagery. To our knowledge, EEGET-RSOD is the first publicly available dataset to offer synchronized eye-tracking and EEG data for remote sensing images. The dataset will not only advance the study of human visual cognition in real-world environments but also bridge the gap between human cognition and artificial intelligence, enhancing the interpretability and reliability of AI models in geospatial applications.
Subject terms: Human behaviour, Geography, Neurophysiology
Background & Summary
Remote sensing images are important media through which people perceive and understand the real world and make decisions1. Compared with general natural images, remote sensing images provide unique insights into human visual cognitive mechanisms owing to their overhead perspective and broad coverage2. Remote sensing object detection, a fundamental method for large-scale monitoring of geographic environments and surface changes, is used to search for and identify specific objects or phenomena in remote sensing images3,4. This technique is widely used in areas such as urban planning5, precision agriculture6, and disaster monitoring7. Recent progress in deep learning has significantly improved the ability of detection models to automatically learn and extract features from images, leading to higher detection efficiency and accuracy8. However, remote sensing object detection remains a data-driven statistical learning process that requires a large amount of labeled data for training9. Typically, training an object detection algorithm for a specific remote sensing task (e.g., detecting a given land cover type) requires experts to visually interpret and manually label hundreds of images and thousands of objects, which is undoubtedly a resource-intensive undertaking. Therefore, our goal is to effectively integrate human cognitive processes into artificial intelligence (AI) approaches to remote sensing object detection. In other words, we aim first to identify and extract the main cognitive processes of the human brain (reflected in human EEG and eye-tracking data) during object detection in remote sensing images, and then to decode these brain and eye signals so that they can replace or supplement human annotations.
Previous research has shown that human eye movements and brain activities can help improve and interpret AI models for remote sensing tasks10,11. Eye-tracking data reveal explicit patterns of visual attention, whereas EEG signals elucidate implicit cognitive processes. For example, eye-tracking data can be used to label the locations of candidate objects, which can then be identified via convolutional neural networks (CNNs)12. EEG data are widely used to train brain-computer interface (BCI) and object detection models to improve the efficiency of remote sensing interpretation13,14. In addition, eye-tracking and EEG data have been used to explain and enhance deep learning models in areas such as feature extraction, attention modules, and classification15,16. However, to date, no integrated eye-tracking and EEG dataset has been available for investigating human visual attention in remote sensing object detection tasks. This is mainly because designing synchronized eye-tracking and EEG experiments entails higher costs and more complex procedures, including extended setup and calibration, managing system interference, aligning data with differing temporal resolutions, and handling increased noise and artifacts during preprocessing. Yet it is precisely these challenges that make such data extremely valuable. Synchronizing eye tracking with EEG allows researchers to combine observable visual attention patterns with internal neural responses. This makes it possible to identify not only where people focus their attention in remote sensing images but also the brain activity that occurs at the same time. This integration bridges the gap between behavioral and neural correlates of attention. It also facilitates mutual validation through complementary analytical methods, offering a more comprehensive understanding of human cognitive processes during complex tasks. Furthermore, such datasets can provide both interpretability benchmarks and valuable insights for deep learning.
In this paper, we present the EEGET-RSOD dataset, which includes synchronized EEG and eye-tracking data from 38 remote sensing interpretation experts. It covers a total of 1,000 remote sensing images with corresponding object label data. Each visual stimulus was presented for 3,000 ms, aligning with the typical human process of searching for and identifying objects in remote sensing images13,17. Experts were required to locate the objects in each stimulus within this time. In previous studies, rapid serial visual presentation (RSVP)18,19 was often used to investigate the cognitive mechanisms of human visual search and recognition, and several eye-tracking and brain activity datasets (such as COCO-Search1820 and BioTest EEG21) have been collected. In these datasets, stimuli are typically presented for 200 to 500 ms, which ensures alignment between neural activity and visual behavior but eliminates important aspects of normal human object search. This is why simultaneously recording EEG and eye-tracking data is central to this study. Eye-tracking data allow us to determine which objects are being fixated at different moments of human object detection, enabling us to extract the corresponding EEG signals for each object. These signals have the potential to reflect complex visual perception, search, and target detection processes, providing valuable insights into human cognitive patterns in the real world. The EEGET-RSOD dataset therefore has a wide range of applications and considerable value for human vision research. It should attract the joint interest of researchers in geography, cognitive science, and artificial intelligence, and it will promote interdisciplinary research such as geospatial brain-like intelligence9 and neurocognitive geography22.
Methods
Participants
To recruit qualified participants (i.e., experts in remote sensing object detection) for the experimental tasks, potential participants needed to pass a remote sensing image interpretation test in addition to having completed remote sensing courses. The test consisted of 10 questions in which potential participants were required to search for airplanes in images within 5 seconds and count the number of occurrences. Participants who achieved an accuracy rate of over 90% were regarded as experts in remote sensing object detection and were eligible to participate in the formal experiment. On this basis, a total of 40 participants were recruited, including 17 males and 23 females, with an average age of 22.5 years (SD = 2.03). We explained the principles of eye tracking and EEG, along with potential risks, to the participants and collected their signed consent forms before the formal experiment. All participants had normal or corrected-to-normal vision, and none had a history of brain-related conditions. This study was approved by the Ethics Committee of Beijing Normal University (No. 20221024111). All participants took part voluntarily and consented to the collection and use of their eye-tracking and EEG data, including the publication and public disclosure of data that do not contain direct personal identification information.
Stimuli
The stimuli for this experiment were selected from the DIOR dataset, a large dataset for remote sensing object detection that primarily includes satellite and aerial images23. This dataset covers various scenes, object categories, and complex backgrounds, making it well-suited for remote sensing object detection in complex environments. We selected 900 remote sensing images containing airplanes, with each image containing between 1 and 6 airplanes. Additionally, 100 images without airplanes were chosen to reduce participants’ expectation effects regarding the objects24,25 (Fig. 1A). All the images were 800 × 800 pixels in size. The 1,000 images were evenly divided into 10 blocks, each containing 90 images with airplanes and 10 images without airplanes, with a random order for each block.
Fig. 1.
Experimental setup. (A) Examples of stimuli; (B) Eye tracking and simultaneous EEG acquisition system; (C) Experimental procedure; (D) The position of the EEG electrodes following the 10–20 system.
Experimental procedure
In the experiment, the participants first completed a 13-point eye-tracking calibration. Before the formal experiment began, the participants underwent a keypress training session in which three test images were used to familiarize them with the procedure. They then viewed the first block, which contained 100 remote sensing images. Each image was displayed for 3,000 ms, during which the participants were instructed to search for and fixate on all airplanes in the image. A grayscale image, displayed for 500 ms, was shown between each pair of remote sensing images. After completing a block, participants could rest for a few minutes and start the next block when they were ready. Each participant was required to complete five blocks. A 13-point eye-tracking calibration was used before the formal experiment began, and a 5-point calibration was performed before each subsequent block. The participants were instructed to remain as still as possible during the task (Fig. 1C).
Data acquisition
The experiment was conducted in a quiet laboratory. Stimulus presentation was controlled via SMI’s Experiment Center. The participants used a laptop with a screen resolution of 1920 × 1080 and a refresh rate of 60 Hz and sat 60 cm away from the screen (Fig. 1B).
Eye-tracking data were collected using an SMI RED250 eye tracker, with a sampling rate of 250 Hz, an accuracy of 0.4°, and a spatial resolution of 0.03°. Eye-tracking calibration was performed at the start of the experiment. During the validation step, calibration was repeated until the error between any two measurements at a single point was less than 0.5° or until the average error across all points was less than 1°.
EEG data were collected using an NE Enobio 32 system with a sampling rate of 500 Hz. Thirty-two conductive silver chloride gel electrodes were applied according to the international 10–20 system (Fig. 1D), with a reference electrode placed at the right mastoid. The electrodes were positioned as follows: frontal (Fp1, Fp2, Fz, F3, F4, F7, and F8), central (Cz, C3, and C4), temporal (T7 and T8), parietal (P7 and P8), and occipital (Oz, O1, and O2). Additional electrodes were placed in the frontal (AF3 and AF4), central frontal (FC1, FC2, FC5, and FC6), central parietal (CP1, CP2, CP5, and CP6), and occipital‒parietal (PO3 and PO4) regions. Prior to recording, the impedance of each electrode was checked to ensure good contact, with the impedance maintained below 5 kΩ. Electrode impedance levels were checked before each block.
Data temporal alignment
Eye movement and EEG data were recorded on the same computer and were synchronized via keyboard inputs. Specifically, key events were assigned to the beginning and end of each block. At the start of each block, the participants were instructed to press the numeric key “1” to initiate the block. Upon completing a block, participants were required to press the numeric key “3” to end it. When the participant pressed the designated key, the eye-tracking and EEG acquisition software simultaneously recorded the key press and its corresponding timestamp. Each participant thus provided 10 synchronized keystroke records.
The conversion between eye-tracking time and EEG time is given by

$$\frac{T_{\mathrm{ET}}}{1000} = L_{\mathrm{EEG}} \times 2 + b \tag{1}$$

where $T_{\mathrm{ET}}$ is the eye-tracking timestamp, $L_{\mathrm{EEG}}$ is the EEG sample point count (latency), and $b$ is the temporal discrepancy between the two recording systems in milliseconds. Eye-tracking data are recorded as timestamps with a precision of one-thousandth of a millisecond (0.001 ms), so the timestamp must be divided by 1000 to convert it into milliseconds. In contrast, EEG data are indexed by sample point counts; because the data were acquired at a sampling rate of 500 Hz, the interval between sample points is 2 ms, and the latency must therefore be multiplied by 2 to convert it into milliseconds. Using the timestamps of the synchronized keystrokes in the eye movement and EEG recordings, we obtained the precise time of the same button-press event in both data streams, expressed in milliseconds. Subtracting these two time records yields the temporal discrepancy between the two systems, denoted $b$. We also provide a file containing the $b$ value for each participant.
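As a minimal illustration of Eq. (1), the following Python sketch converts timestamps from both systems to a common millisecond scale and estimates b from a shared key-press event; the function and variable names are ours and are not part of the released files.

```python
EEG_SAMPLING_RATE_HZ = 500   # EEG sample interval = 1000 / 500 = 2 ms
ET_TICKS_PER_MS = 1000       # eye-tracking timestamps have 0.001 ms resolution


def et_timestamp_to_ms(et_timestamp):
    """Convert an eye-tracking timestamp (0.001 ms ticks) to milliseconds."""
    return et_timestamp / ET_TICKS_PER_MS


def eeg_latency_to_ms(eeg_latency):
    """Convert an EEG latency (sample index at 500 Hz) to milliseconds."""
    return eeg_latency * (1000 / EEG_SAMPLING_RATE_HZ)


def estimate_offset_b(keypress_et_timestamp, keypress_eeg_latency):
    """Temporal discrepancy b of Eq. (1), estimated from the same key-press
    event recorded by both systems."""
    return et_timestamp_to_ms(keypress_et_timestamp) - eeg_latency_to_ms(keypress_eeg_latency)


def et_event_to_eeg_latency(et_timestamp, b):
    """Map any eye-tracking event time onto the EEG sample axis."""
    t_eeg_ms = et_timestamp_to_ms(et_timestamp) - b
    return int(round(t_eeg_ms * EEG_SAMPLING_RATE_HZ / 1000))
```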
Figure 2 shows an example of a blink in the eye movement data together with its signature in the EEG and the corresponding topographic maps, illustrating the good synchronization accuracy of our eye-tracking and EEG acquisition.
Fig. 2.
Visualization of single-trial EEG and eye-tracking blink data.
Data Records
The dataset presented in this paper is available on Figshare26. The structure of the dataset is illustrated in Fig. 3. The dataset contains seven types of data: raw EEG data, preprocessed EEG data, eye-tracking data, data alignment files, saliency maps, stimulus material data, and object annotation data.
Fig. 3.
The structure of the dataset.
EEG recording
EEG data were collected using an NE Enobio 32 system with a sampling rate of 500 Hz. The reference electrode was placed on the right mastoid, and all channels were positioned according to the extended 10–20 system. A conductive gel was applied to reduce the impedance at each electrode. The files are named by participant ID and saved in EDF+ format. All raw EEG data are stored in the “EEG.zip” file.
Additionally, we provide preprocessed EEG files (“EEG_clean data.zip”). Preprocessing was performed with EEGLAB. First, channel locations were set for the raw EEG data. A 0.1 Hz high-pass filter, an 80 Hz low-pass filter, and a 50 Hz notch filter were applied. Next, the data were rereferenced to the average of all channels. Finally, independent component analysis (ICA) was applied, and eye-movement artifacts were removed manually. The preprocessed EEG data are also organized by participant ID and saved in EDF format. Note that the preprocessed data are provided only for users who wish to use them directly; we recommend removing artifacts manually from the raw data to achieve better results.
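Preprocessing was performed with EEGLAB; for users who prefer Python, a roughly equivalent pipeline can be sketched with MNE-Python. The file path, montage name, and excluded component indices below are placeholders, not values used to produce the released files.

```python
import mne

# Load one participant's raw EEG recording (path is illustrative).
raw = mne.io.read_raw_edf("EEG/P01.edf", preload=True)
raw.set_montage("standard_1020", on_missing="ignore")

# Band-pass 0.1-80 Hz and 50 Hz notch, as in the released preprocessing.
raw.filter(l_freq=0.1, h_freq=80.0)
raw.notch_filter(freqs=50.0)

# Re-reference to the whole-brain average.
raw.set_eeg_reference("average")

# ICA for ocular/EMG artifacts; components are inspected and excluded manually.
ica = mne.preprocessing.ICA(n_components=20, random_state=97)
ica.fit(raw.copy().filter(l_freq=1.0, h_freq=None))  # high-pass copy for a stabler ICA fit
ica.plot_components()          # visual inspection
ica.exclude = [0, 1]           # indices chosen after inspection (example only)
ica.apply(raw)
```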
Eye-tracking recording
Eye-tracking data were collected using an SMI RED 250 binocular eye tracker, with a sampling rate of 250 Hz. File names are organized by participant ID and saved in TXT format. Each ET data frame is stored on a separate line and includes the recording time, event type (i.e., Message and Sample), gaze coordinates for both eyes, pupil size, gaze point type (i.e., Fixation, Saccade, and Blink), and corresponding stimulus material. An individual gaze plot is shown in Fig. 1.
For fixation and saccade detection, we used a method based on time and spatial thresholds to identify fixations, which is suitable for a sampling rate of 250 Hz. A fixation is defined as the gaze remaining within 0.5° of visual angle for at least 50 ms. Saccade detection is based on velocity, time and amplitude thresholds. The minimum saccade duration was set to 5 ms, the minimum saccade amplitude was set to 0.5°, and the peak velocity threshold was set to 50°/s. When the eye movement velocity, time and amplitude exceed the corresponding thresholds, the period is marked as a saccade.
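The following simplified dispersion-based (I-DT) fixation detector uses the duration and dispersion thresholds above; it assumes gaze coordinates already converted to degrees of visual angle and timestamps in milliseconds, and it is only a sketch, not the exact implementation used to produce the released events.

```python
import numpy as np


def idt_fixations(t_ms, x_deg, y_deg, max_dispersion_deg=0.5, min_duration_ms=50):
    """Simplified I-DT fixation detection.

    t_ms: sample timestamps (ms); x_deg, y_deg: gaze position in degrees of
    visual angle (NumPy arrays). Returns (start_ms, end_ms, mean_x, mean_y) tuples.
    """
    fixations, start, n = [], 0, len(t_ms)
    while start < n:
        end = start
        # Grow the window while its dispersion stays below the threshold.
        while end + 1 < n:
            xs, ys = x_deg[start:end + 2], y_deg[start:end + 2]
            dispersion = (xs.max() - xs.min()) + (ys.max() - ys.min())
            if dispersion > max_dispersion_deg:
                break
            end += 1
        if t_ms[end] - t_ms[start] >= min_duration_ms:
            fixations.append((t_ms[start], t_ms[end],
                              float(np.mean(x_deg[start:end + 1])),
                              float(np.mean(y_deg[start:end + 1]))))
            start = end + 1
        else:
            start += 1
    return fixations
```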
Time synchronization data
This file is used for aligning the timestamps of the eye-tracking data with the EEG data. It contains the parameter b, which is required for the time conversion formula (Eq. (1)). Each participant’s data corresponds to a unique b value.
Gaze heatmap
The heatmaps generated from eye-tracking points are considered ground truths for training and testing visual saliency detection models. These heatmaps show the regions that attract the most attention from human vision. We created gaze heatmaps on the basis of kernel density estimates of all participants’ eye-tracking points. Each gaze heatmap is named the same as its corresponding stimulus material (Fig. 1).
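A gaze heatmap for one stimulus can be regenerated from the pooled fixation points with a Gaussian kernel density estimate, as sketched below; the bandwidth (sigma_px) is an assumed value and may differ from the one used to produce the released maps.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def gaze_heatmap(fix_x, fix_y, width=800, height=800, sigma_px=30):
    """Kernel-density gaze heatmap from pooled fixation coordinates (pixels).

    Returns a (height, width) array normalised to sum to 1.
    """
    grid = np.zeros((height, width), dtype=float)
    for x, y in zip(fix_x, fix_y):
        xi, yi = int(round(x)), int(round(y))
        if 0 <= xi < width and 0 <= yi < height:
            grid[yi, xi] += 1.0                      # accumulate fixation counts
    heat = gaussian_filter(grid, sigma=sigma_px)     # Gaussian kernel smoothing
    return heat / heat.sum() if heat.sum() > 0 else heat
```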
Stimulus and object label data
The dataset includes the 1,000 selected remote sensing images, along with the object label files from the DIOR dataset (Fig. 1).
Technical Validation
Eye-tracking validation
We employed four methods to assess the quality of the eye-tracking data: the valid sampling rate, the participants’ object detection rate, the stability of the fixation duration, and the consistency of the fixation point distribution. Each method evaluates a different aspect of the reliability of the eye-tracking data.
The valid sampling rate measures how well the eye tracker tracked the eyes during the experiment, indicating the quality of the data collection27,28. We used BeGaze’s algorithm to identify invalid data points caused by factors such as blinking, tracking loss, or device calibration issues; the remaining data were considered valid. The valid sampling rate is the ratio of valid data points to the total number of data points. As shown in Table 1, the sampling rate exceeded 90% for 34 participants and was between 85% and 90% for 5 participants; the sampling rates of the remaining 2 participants (D01 and D02) fell below this threshold, and these participants were therefore excluded from the dataset.
Table 1.
Details of all participants in the dataset.
Participant No. | Sex | Age (years) | Educational level | Object detection rate | Effective sampling rate |
---|---|---|---|---|---|
P01 | Female | 21 | Undergraduate student | 86.11% | 95.01% |
P02 | Female | 21 | Undergraduate student | 81.26% | 96.77% |
P03 | Female | 21 | Undergraduate student | 85.13% | 93.93% |
P04 | Male | 26 | Master’s student | 77.78% | 87.21% |
P05 | Female | 24 | Master’s student | 87.87% | 80.19% |
P06 | Male | 25 | Master’s student | 96.59% | 90.00% |
P07 | Male | 26 | Master’s student | 92.03% | 90.72% |
P08 | Male | 25 | Master’s student | 72.04% | 86.74% |
P09 | Female | 20 | Undergraduate student | 93.22% | 90.99% |
P10 | Male | 20 | Undergraduate student | 95.43% | 97.26% |
P11 | Male | 20 | Undergraduate student | 94.50% | 87.16% |
P12 | Male | 21 | Undergraduate student | 90.21% | 84.15% |
P13 | Male | 27 | Master’s student | 95.83% | 82.36% |
P14 | Male | 25 | Master’s student | 93.86% | 88.83% |
P15 | Female | 19 | Undergraduate student | 97.50% | 83.89% |
P16 | Female | 18 | Undergraduate student | 97.11% | 88.21% |
P17 | Female | 22 | Master’s student | 94.19% | 91.19% |
P18 | Female | 24 | Master’s student | 88.15% | 89.14% |
P19 | Female | 21 | Master’s student | 72.85% | 89.27% |
P20 | Female | 22 | Master’s student | 90.91% | 85.64% |
P21 | Female | 20 | Undergraduate student | 94.08% | 95.94% |
P22 | Female | 20 | Undergraduate student | 78.35% | 38.58% |
P23 | Female | 21 | Undergraduate student | 88.67% | 94.94% |
P24 | Male | 21 | Undergraduate student | 91.78% | 94.53% |
P25 | Male | 25 | Master’s student | 85.34% | 85.56% |
P26 | Female | 24 | Master’s student | 98.83% | 90.80% |
P27 | Female | 21 | Undergraduate student | 95.29% | 96.48% |
P28 | Male | 24 | Master’s student | 89.94% | 95.28% |
P29 | Male | 24 | Master’s student | 84.25% | 97.46% |
P30 | Male | 24 | Master’s student | 96.55% | 93.88% |
P31 | Female | 24 | Master’s student | 92.67% | 80.84% |
P32 | Male | 25 | Master’s student | 84.49% | 89.57% |
P33 | Male | 25 | Master’s student | 91.78% | 94.53% |
P34 | Female | 18 | Undergraduate student | 97.31% | 86.47% |
P35 | Male | 23 | Master’s student | 92.67% | 80.84% |
P36 | Female | 22 | Master’s student | 90.26% | 88.37% |
P37 | Female | 23 | Master’s student | 92.30% | 94.06% |
P38 | Female | 24 | Master’s student | 97.11% | 88.21% |
D01 | Female | 23 | Master’s student | - | 75.12% |
D02 | Male | 25 | Master’s student | - | 66.72% |
The object detection rate measures how well participants completed the task, specifically, whether they identified all the objects in the remote sensing images. We used label data from the DIOR dataset to define the object areas. Considering the limits of the human visual field, we defined a circular region centered on the fixation point with a radius of 50 pixels as the effective visual field. If a participant’s effective visual field overlapped with the object area, the object was considered detected (see Fig. 4). The detection rate was calculated as the ratio of detected objects to the total number of objects. The results are shown in Table 1. On average, the participants’ detection rate was 90.11%, with a highest rate of 98.83% and a lowest rate of 72.04%. A total of 24 participants achieved a detection rate above 90%, indicating that most participants successfully completed the task as designed. Ten participants had detection rates between 80% and 90%, whereas 4 participants had rates below 80%, demonstrating that the remote sensing objects used in this experiment were challenging; no ceiling effect was observed29. Furthermore, we examined the correlation between the valid sampling rate and the detection rate using Pearson’s correlation coefficient. The results indicated no significant correlation between the two (r = 0.232, p > 0.05), suggesting that the loss of some eye-tracking data points did not affect the assessment of the participants’ task completion.
Fig. 4.
Objective detection rate measurement method. These images are used as illustrations. The red dots and lines are the scanpaths of the participant’s eye movement, and the blue areas are the objects that need to be searched.
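A minimal sketch of the detection criterion described above: an object counts as detected if the 50-pixel effective visual field around any fixation overlaps its DIOR bounding box. The function names and the dictionary-based inputs are our own conventions.

```python
FOV_RADIUS_PX = 50  # radius of the effective visual field around a fixation


def fixation_hits_box(fx, fy, box):
    """True if the 50 px field around fixation (fx, fy) overlaps box = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = box
    nearest_x = min(max(fx, xmin), xmax)   # closest point of the box to the fixation
    nearest_y = min(max(fy, ymin), ymax)
    return (fx - nearest_x) ** 2 + (fy - nearest_y) ** 2 <= FOV_RADIUS_PX ** 2


def detection_rate(fixations_per_image, boxes_per_image):
    """Ratio of detected objects to all labeled objects.

    Both arguments are dicts keyed by image id: fixations are (x, y) pixel
    coordinates, boxes are DIOR (xmin, ymin, xmax, ymax) rectangles.
    """
    detected = total = 0
    for image_id, boxes in boxes_per_image.items():
        fixations = fixations_per_image.get(image_id, [])
        for box in boxes:
            total += 1
            if any(fixation_hits_box(fx, fy, box) for fx, fy in fixations):
                detected += 1
    return detected / total if total else 0.0
```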
The eye movements of the participants in each image reflected their visual attention during the experimental process30,31. We used an I-DT (dispersion-threshold identification) method based on temporal and spatial thresholds to identify fixations. The minimum fixation duration was set to 50 ms, with a maximum dispersion of 0.5°. Saccade detection was based on a peak velocity threshold of 50°/s, with a minimum saccade duration of 5 ms and a minimum saccade amplitude of 0.5°. Figure 5 shows the average fixation duration, average saccade amplitude, and average pupil size for each participant on remote sensing images with and without objects. When viewing images without objects, participants displayed longer fixation durations and larger saccade amplitudes than when viewing images containing objects. This finding indicates that participants exhibited distinct eye movement patterns during the remote sensing object detection task. Longer fixation durations and larger saccade amplitudes typically suggest more complex visual processing and more extensive visual searching32,33. However, there were no significant differences in pupil size across the different images, indicating that focused attention and cognitive load remained consistent when viewing both object-present and object-absent remote sensing images34,35.
Fig. 5.
Eye movement indices of the participants on object and non-object images.
Additionally, we performed a statistical analysis of saccade direction in remote sensing images with and without objects. The saccade direction is defined as the vector from the previous gaze point to the subsequent gaze point36. We defined the positive x-axis as 0 degrees and divided the 360-degree range into 24 equal angular segments in a counterclockwise manner. We then counted, for each image, the participants’ saccades falling in each direction. Finally, we compared object-present and object-absent images in terms of the number of fixations and the coefficient of variation and skewness of the directional counts. The results are shown in Fig. 6.
Fig. 6.
Saccade direction difference and visualization.
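The 24-bin directional statistics can be reproduced with the following sketch, which follows the convention above (0° along the positive x-axis, counted counterclockwise); the flip of the y-axis assumes screen coordinates that increase downward.

```python
import numpy as np


def saccade_direction_histogram(start_xy, end_xy, n_bins=24):
    """Count saccades per angular bin (0 deg = +x axis, counterclockwise)."""
    start_xy = np.asarray(start_xy, dtype=float)
    end_xy = np.asarray(end_xy, dtype=float)
    dx = end_xy[:, 0] - start_xy[:, 0]
    # Screen y grows downward, so negate dy to obtain a conventional angle.
    dy = -(end_xy[:, 1] - start_xy[:, 1])
    angles = np.degrees(np.arctan2(dy, dx)) % 360.0
    bins = (angles // (360.0 / n_bins)).astype(int)
    return np.bincount(bins, minlength=n_bins)
```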
The results indicate that participants exhibited a greater number of fixations in object-absent images than in object-present images, alongside a lower directional bias in gaze. We visualized the gaze trajectories and the fixation counts for each direction. In object-present remote sensing images, participants demonstrated a more consistent gaze trajectory with a clear directional focus. Conversely, in object-absent images, participants exhibited a more structured search pattern, akin to the gaze behavior observed during text reading.
The fixation locations reflect the spatial distribution of participants’ visual attention when viewing the same image37,38. Heatmaps are commonly used to visualize and analyze the consistency of participants’ fixation distributions. We randomly selected the eye-tracking data of N participants, generated heatmaps from their fixation points, and used two similarity metrics, the Pearson linear correlation coefficient (CC) and the Kullback‒Leibler (KL) divergence39, to measure the difference between these heatmaps and the heatmap built from all participants’ fixation points. CC is an intuitive measure of the linear relationship between two heatmaps, whereas KL divergence measures the difference between their distributions. A CC value close to 1 or −1 indicates a strong linear correlation, whereas a CC value of 0 indicates no linear correlation. The results show that as the number of participants decreased, the CC value gradually decreased and its variance increased (Fig. 7A); when fewer than 11 participants were included, the CC fell below 0.9. The KL divergence is often used to measure the difference between a heatmap distribution and the reference distribution, with lower KL values indicating greater similarity. As the number of participants decreased, the KL values gradually increased, as did their variance (Fig. 7B); when fewer than 9 participants were included, the KL value exceeded 0.1. These results indicate that as the number of participants increased, the variation in the saliency map decreased and gradually stabilized (Fig. 7C), demonstrating that the current sample size is sufficient to capture consistent patterns in the eye-tracking data. Specifically, the stabilization of the saliency map indicates that the eye-tracking data already reflect stable visual behavior patterns, which supports the reliability and robustness of the dataset for its intended applications.
Fig. 7.
Consistency of fixation point distributions.
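The two similarity metrics used in this analysis can be computed for any pair of heatmaps as follows; the epsilon smoothing added before the KL computation is our choice to avoid division by zero.

```python
import numpy as np


def cc(map_a, map_b):
    """Pearson linear correlation coefficient between two heatmaps."""
    return float(np.corrcoef(map_a.ravel(), map_b.ravel())[0, 1])


def kl_divergence(p_map, q_map, eps=1e-12):
    """KL divergence D(P || Q) after normalising both maps to probability distributions."""
    p = p_map.ravel() + eps
    q = q_map.ravel() + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))
```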
EEG validation
We used event-related potentials (ERPs) as a key metric to evaluate the EEG data in this experiment. EEG preprocessing was manually performed using EEGLAB40 and FieldTrip41. First, a bandpass filter of 0.1–80 Hz was applied to the EEG signals using EEGLAB to remove nonneural artifacts. Additionally, a notch filter of 48–52 Hz was employed to eliminate line noise interference. Next, we performed rereferencing using the average of all the electrodes. We subsequently applied blind source separation through ICA42 to remove artifacts such as blinks, eye movements, and electromyographic (EMG) noise. EEG data were segmented into epochs based on stimulus onset, with each epoch time-locked to events ranging from -500 ms to 1000 ms. Baseline correction was applied using the prestimulus period from -300 to 0 ms. On the basis of previous research, we selected the Pz, Cz, and Oz channels, as they are associated with visual neural responses.
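A sketch of the epoching and ERP averaging step in MNE-Python, using the same window (−500 to 1,000 ms), baseline (−300 to 0 ms), and channels (Pz, Cz, Oz); the preprocessed `raw` object, the event array, and the event code are assumed to come from earlier steps.

```python
import mne


def stimulus_erp(raw, events, event_code=1, picks=("Pz", "Cz", "Oz")):
    """Epoch cleaned EEG around stimulus onsets and return the average ERP.

    'raw' is a preprocessed mne.io.Raw object; 'events' is an (n, 3) array of
    stimulus-onset events built from the synchronized key-press/stimulus log.
    """
    epochs = mne.Epochs(raw, events, event_id={"stimulus": event_code},
                        tmin=-0.5, tmax=1.0, baseline=(-0.3, 0.0),
                        picks=list(picks), preload=True)
    return epochs.average()
```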
The ERP is a specific type of evoked potential caused by particular psychological or cognitive activities, reflecting changes in cortical potentials. P300 is an ERP component closely associated with attention and memory processes43 and is typically observed in regions such as the frontal and parietal lobes44. In this experiment, we observed a prominent P300 component in the Pz channel (Fig. 8A), indicating that the stimuli elicited attention-related processes in the participants45. Another common ERP component, N1, reflects the brain’s early processing of visual stimuli46. A distinct N1 component was observed in the Cz channel (Fig. 8B), which may be related to stimulus classification and memory updating47. In the Oz channel, we observed components resembling P1 and N2 (Fig. 8C)48. The P1 component likely reflects the primary visual cortex’s response to early attention tasks49. The N2 component is associated with stimulus recognition, classification, and semantic evaluation50,51.
Fig. 8.
ERPs of different channels.
Existing research has demonstrated that the power spectral density (PSD) characteristics of EEG can be used to assess participants’ cognitive load52 and attention level53. The alpha band (8–12 Hz) is associated with the resting state, and lower alpha power indicates heightened attention and information processing54. The theta band (3–7 Hz) is linked to memory encoding and working memory tasks, with increased theta activity often indicating higher cognitive load55. In contrast, the beta band (13–20 Hz) is related to active attention, information processing, and task execution, with increased beta activity reflecting higher levels of attentiveness56. Consequently, we selected the alpha/theta and alpha/beta power ratios of the frontal channels (Fz, F3, and F4) as indicators of cognitive load and attention level. These metrics were used to compare participants’ attention states when viewing the two types of remote sensing images.
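A sketch of the alpha/theta and alpha/beta ratios over the frontal channels using a Welch PSD; the band edges follow the definitions above, while the Welch segment length is our choice.

```python
import numpy as np
from scipy.signal import welch


def band_power(signal, fs, fmin, fmax):
    """Mean Welch PSD of one channel within [fmin, fmax] Hz."""
    freqs, psd = welch(signal, fs=fs, nperseg=2 * fs)
    band = (freqs >= fmin) & (freqs <= fmax)
    return float(psd[band].mean())


def attention_ratios(frontal_signals, fs=500):
    """Alpha/theta and alpha/beta ratios averaged over frontal channels (e.g., Fz, F3, F4)."""
    ratios_at, ratios_ab = [], []
    for sig in frontal_signals:                      # one 1-D array per channel
        theta = band_power(sig, fs, 3, 7)
        alpha = band_power(sig, fs, 8, 12)
        beta = band_power(sig, fs, 13, 20)
        ratios_at.append(alpha / theta)
        ratios_ab.append(alpha / beta)
    return float(np.mean(ratios_at)), float(np.mean(ratios_ab))
```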
Figure 9 shows that most participants exhibited no significant differences in attention state between object-present and object-absent remote sensing images. This indicates that participants maintained similar cognitive load and attention levels across both image types, suggesting that they exerted consistent effort in the object detection task regardless of whether objects were present in the remote sensing images.
Fig. 9.
The mean alpha/theta and alpha/beta band power ratio of all the participants on object and non-object images.
In summary, the replication of ERP results demonstrates that our EEG data reliably capture stable neural activity patterns. This not only confirms the adequacy of both the participant sample size and the number of stimuli but also supports the high quality of the data. In addition, the PSD validation further supports the stability of participants’ cognitive states throughout the experiment, reinforcing the data’s reliability.
Furthermore, classifying behaviors on the basis of brain signals is a common and significant task in the field of EEG57, serving as a robust indicator of data validity. This classification involves interpreting EEG signals to identify patterns associated with visual behaviors. Such tasks have applications in BCIs and assistive technologies, where individuals can identify objects in remote sensing images solely through intent58.
As shown in Fig. 10, we used the label files to define the extent of the objects within each image. Next, we categorized fixations according to their locations, distinguishing between fixations within the object region and those outside it. We then converted the fixation start times into EEG data timestamps via Eq. (1). Finally, we inserted both fixation types (within-object and outside-object) into the EEG data as event markers at these timestamps.
Fig. 10.
EEG events based on fixation classification.
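A sketch of turning fixation onsets into labeled EEG event markers, combining the Eq. (1) conversion with the bounding-box overlap test; the event codes and input conventions are illustrative and are not the codes used in the released files.

```python
import numpy as np

WITHIN_OBJECT, OUTSIDE_OBJECT = 1, 2   # illustrative event codes


def fixation_events(fixations, boxes, b, radius_px=50, sfreq=500):
    """Build an MNE-style (n, 3) event array from fixation onsets.

    fixations: iterable of (onset_et_timestamp, x_px, y_px), timestamps in the
    eye tracker's 0.001 ms units; boxes: DIOR boxes (xmin, ymin, xmax, ymax)
    for the viewed image; b: per-participant offset from Eq. (1) in ms.
    """
    events = []
    for onset_ts, x, y in fixations:
        # Eq. (1): eye-tracking time (ms) minus b gives EEG time (ms) -> sample index.
        latency = int(round((onset_ts / 1000.0 - b) * sfreq / 1000.0))
        hit = any(
            (x - min(max(x, xmin), xmax)) ** 2 + (y - min(max(y, ymin), ymax)) ** 2
            <= radius_px ** 2
            for xmin, ymin, xmax, ymax in boxes
        )
        events.append([latency, 0, WITHIN_OBJECT if hit else OUTSIDE_OBJECT])
    return np.array(events, dtype=int)
```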
We used these event markers to create epochs and selected five different time windows for analysis (see Table 2 for details). All the epochs were divided into training and testing sets at a 70:30 ratio, and classification was performed using EEGNet59. The following hyperparameters were used for training: a batch size of 32, the stochastic gradient descent (SGD) optimizer, a learning rate of 5e-4, and a maximum of 50 iterations. The classification results are shown in Table 2, and a minimal training sketch following these settings is given after the table.
Table 2.
Classification results for EEGNet at different epochs.
Time window size | Start and end time | Mean Accuracy | Maximum Accuracy
---|---|---|---
500 ms | −500~0 ms | 57.85% | 84.54%
500 ms | 0~500 ms | 59.63% | 84.62%
500 ms | 500~1000 ms | 57.38% | 82.00%
500 ms | 1000~1500 ms | 57.65% | 84.62%
1000 ms | −500~500 ms | 59.72% | 80.13%
1000 ms | 0~1000 ms | 57.60% | 81.41%
1000 ms | 500~1500 ms | 58.15% | 85.90%
1500 ms | −500~1000 ms | 57.94% | 84.62%
1500 ms | 0~1500 ms | 58.15% | 85.90%
2000 ms | −500~1500 ms | 59.04% | 82.69%
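The training configuration reported above can be expressed as a minimal PyTorch loop; `EEGNet` here stands for any implementation with (trials × 1 × channels × samples) input, such as the one in the linked repository, and the data tensors are assumed to have been built from the labeled epochs.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset, random_split


def train_eegnet(model, X, y, epochs=50, batch_size=32, lr=5e-4):
    """Train/test split and hyperparameters follow the settings reported above.

    X: float tensor (n_trials, 1, n_channels, n_samples); y: long tensor of labels.
    'model' is any EEGNet implementation (e.g., from the linked repository).
    """
    dataset = TensorDataset(X, y)
    n_train = int(0.7 * len(dataset))                       # 70:30 split
    train_set, test_set = random_split(dataset, [n_train, len(dataset) - n_train])
    train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    test_loader = DataLoader(test_set, batch_size=batch_size)

    optimizer = torch.optim.SGD(model.parameters(), lr=lr)  # SGD, lr = 5e-4
    criterion = nn.CrossEntropyLoss()

    for _ in range(epochs):                                 # up to 50 iterations
        model.train()
        for xb, yb in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(xb), yb)
            loss.backward()
            optimizer.step()

    model.eval()
    correct = total = 0
    with torch.no_grad():
        for xb, yb in test_loader:
            correct += (model(xb).argmax(dim=1) == yb).sum().item()
            total += yb.numel()
    return correct / total                                  # test accuracy
```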
In summary, the EEGET-RSOD dataset offers a valuable resource for enhancing machine learning models and deepening our understanding of human cognitive processes in the context of remote sensing object detection. However, certain limitations of the current study should be acknowledged. First, the dataset was collected from experienced remote sensing experts, whose cognitive patterns may not fully represent those of the general population. Although their shared demographic and professional backgrounds may further limit the generalizability of the findings, this focus offers valuable insights into expert-specific cognitive processes and strategies, helping to establish a baseline for understanding domain-related expertise. Second, the dataset focuses specifically on aircraft detection in remote sensing imagery, which may constrain the applicability of the results to other object types or tasks of greater complexity. Finally, the laboratory-controlled observation conditions, coupled with fixed image presentation durations, might not fully capture real-world observation behavior. To address these limitations, future studies could incorporate a wider variety of stimuli, recruit participants with more diverse demographic and professional backgrounds, and design tasks that better simulate real-world remote sensing applications.
Acknowledgements
This research was supported by the National Natural Science Foundation of China (Key Program, No. 42230103).
Author contributions
Bing He: Study design and conceptualization; Data collection; Data curation; Data validation; Manuscript writing; Manuscript correction. Hongqiang Zhang: Data curation; Data validation. Tong Qin: Study design; Data validation; Manuscript correction. Bowen Shi: Study design; Data collection; Data curation. Qiao Wang: Study design and conceptualization; Manuscript correction; Supervision. Weihua Dong: Study design and conceptualization; Data validation; Manuscript correction; Supervision. All authors reviewed and approved the final version of the manuscript.
Code availability
For the classification task, code is available at: https://github.com/Bing-1997/EEGET_RSOD/tree/main/eegnet.
For Technical Validation, code is available at: https://github.com/Bing-1997/EEGET_RSOD.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
The online version contains supplementary material available at 10.1038/s41597-025-04995-w.
References
- 1. Xiong, Z. T., Zhang, F. H., Wang, Y., Shi, Y. L. & Zhu, X. X. EarthNets: Empowering artificial intelligence for Earth observation. IEEE Geoscience and Remote Sensing Magazine, 10.1109/mgrs.2024.3466998 (2024).
- 2. Shen, Y. Y., Liu, D., Zhang, F. Z. & Zhang, Q. L. Fast and accurate multi-class geospatial object detection with large-size remote sensing imagery using CNN and Truncated NMS. ISPRS Journal of Photogrammetry and Remote Sensing 191, 235–249, 10.1016/j.isprsjprs.2022.07.019 (2022).
- 3. Cheng, G. & Han, J. W. A survey on object detection in optical remote sensing images. ISPRS Journal of Photogrammetry and Remote Sensing 117, 11–28, 10.1016/j.isprsjprs.2016.03.014 (2016).
- 4. Liu, Z. G., Gao, Y., Du, Q. Q., Chen, M. & Lv, W. Q. YOLO-Extract: Improved YOLOv5 for Aircraft Object Detection in Remote Sensing Images. IEEE Access 11, 1742–1751, 10.1109/access.2023.3233964 (2023).
- 5. Crivellari, A., Wei, H., Wei, C. Z. & Shi, Y. H. Super-resolution GANs for upscaling unplanned urban settlements from remote sensing satellite imagery - the case of Chinese urban village detection. International Journal of Digital Earth 16, 2623–2643, 10.1080/17538947.2023.2230956 (2023).
- 6. Hong, R. et al. Yolo-Light: Remote Straw-Burning Smoke Detection Based on Depthwise Separable Convolution and Channel Attention Mechanisms. Applied Sciences-Basel 13, 10.3390/app13095690 (2023).
- 7. Bo, W. H. et al. BASNet: Burned Area Segmentation Network for Real-Time Detection of Damage Maps in Remote Sensing Images. IEEE Transactions on Geoscience and Remote Sensing 60, 10.1109/tgrs.2022.3197647 (2022).
- 8. Zhang, L., Zhang, L. & Yuan, Q. Large Remote Sensing Model: Progress and Prospects. Geomatics and Information Science of Wuhan University 48, 1574–1581 (2023).
- 9. Jiao, L. C. et al. Brain-Inspired Remote Sensing Interpretation: A Comprehensive Survey. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 16, 2992–3033, 10.1109/jstars.2023.3247455 (2023).
- 10. Wang, X. J. et al. In 3rd International Conference on Computational Intelligence and Applications (ICCIA), 171–174 (2018).
- 11. He, B., Qin, T., Shi, B. W. & Dong, W. H. How do human detect targets of remote sensing images with visual attention? International Journal of Applied Earth Observation and Geoinformation 132, 10.1016/j.jag.2024.104044 (2024).
- 12. Li, X. B., Jiang, B. T., Wang, S. J., Shen, L. & Fu, Y. Z. A Human-Computer Fusion Framework for Aircraft Recognition in Remote Sensing Images. IEEE Geoscience and Remote Sensing Letters 17, 297–301, 10.1109/lgrs.2019.2918955 (2020).
- 13. Fan, L. W. et al. DC-tCNN: A Deep Model for EEG-Based Detection of Dim Targets. IEEE Transactions on Neural Systems and Rehabilitation Engineering 30, 1727–1736, 10.1109/tnsre.2022.3184725 (2022).
- 14. Long, J. W., Fang, Z. X. & Wang, L. B. SK-MMFMNet: A multi-dimensional fusion network of remote sensing images and EEG signals for multi-scale marine target recognition. Information Fusion 108, 10.1016/j.inffus.2024.102402 (2024).
- 15. Moradizeyveh, S. et al. When Eye-Tracking Meets Machine Learning: A Systematic Review on Applications in Medical Image Analysis. arXiv preprint arXiv:2403.07834 (2024).
- 16. Hollenstein, N. et al. Decoding EEG Brain Activity for Multi-Modal Natural Language Processing. Frontiers in Human Neuroscience 15, 10.3389/fnhum.2021.659410 (2021).
- 17. Blacker, K. J., Peltier, C., McKinley, R. A. & Biggs, A. T. What Versus How in Visual Search: Effects of Object Recognition Training, Strategy Training, and Non-invasive Brain Stimulation on Satellite Image Search. Journal of Cognitive Enhancement 4, 131–144, 10.1007/s41465-020-00165-5 (2020).
- 18. Mitchell, D. C. Locus of the experimental effects in the rapid serial visual presentation (RSVP) task. Perception & Psychophysics 25, 143–149, 10.3758/bf03198801 (1979).
- 19. Gerson, A. D., Parra, L. C. & Sajda, P. Cortical origins of response time variability during rapid discrimination of visual objects. Neuroimage 28, 342–353, 10.1016/j.neuroimage.2005.06.026 (2005).
- 20. Chen, Y. P. et al. COCO-Search18 fixation dataset for predicting goal-directed attention control. Scientific Reports 11, 10.1038/s41598-021-87715-9 (2021).
- 21. Gifford, A. T., Dwivedi, K., Roig, G. & Cichy, R. M. A large and rich EEG dataset for modeling human visual object recognition. Neuroimage 264, 10.1016/j.neuroimage.2022.119754 (2022).
- 22. Yang, T. et al. Neurocognitive geography: exploring the nexus between geographic environments, the human brain, and behavior. Science Bulletin, 10.1016/j.scib.2025.01.044 (2025).
- 23. Li, K., Wan, G., Cheng, G., Meng, L. Q. & Han, J. W. Object detection in optical remote sensing images: A survey and a new benchmark. ISPRS Journal of Photogrammetry and Remote Sensing 159, 296–307, 10.1016/j.isprsjprs.2019.11.023 (2020).
- 24. van Moorselaar, D., Lampers, E., Cordesius, E. & Slagter, H. A. Neural mechanisms underlying expectation-dependent inhibition of distracting information. Elife 9, 10.7554/eLife.61048 (2020).
- 25. Won, B. Y., Forloines, M., Zhou, Z. H. & Geng, J. J. Changes in visual cortical processing attenuate singleton distraction during visual search. Cortex 132, 309–321, 10.1016/j.cortex.2020.08.025 (2020).
- 26. He, B. et al. A simultaneous EEG and eye-tracking dataset for remote sensing object detection. figshare, 10.6084/m9.figshare.26943565 (2025).
- 27. Hollenstein, N. et al. ZuCo, a simultaneous EEG and eye-tracking resource for natural sentence reading. Scientific Data 5, 10.1038/sdata.2018.291 (2018).
- 28. Blignaut, P. & Wium, D. Eye-tracking data quality as affected by ethnicity and experimental design. Behavior Research Methods 46, 67–80, 10.3758/s13428-013-0343-0 (2014).
- 29. Wang, L. J., Zhang, Z. Y., McArdle, J. J. & Salthouse, T. A. Investigating ceiling effects in longitudinal data analysis. Multivariate Behavioral Research 43, 476–496, 10.1080/00273170802285941 (2008).
- 30. Smith, D. T., Rorden, C. & Jackson, S. R. Exogenous orienting of attention depends upon the ability to execute eye movements. Current Biology 14, 792–795, 10.1016/j.cub.2004.04.035 (2004).
- 31. Dong, W. et al. New research progress of eye tracking-based map cognition in cartography since 2008. Acta Geographica Sinica 74, 599–614 (2019).
- 32. Dong, W. H., Zheng, L. Y., Liu, B. & Meng, L. Q. Using Eye Tracking to Explore Differences in Map-Based Spatial Ability between Geographers and Non-Geographers. ISPRS International Journal of Geo-Information 7, 10.3390/ijgi7090337 (2018).
- 33. Dong, W. H., Yang, T. Y., Liao, H. & Meng, L. Q. How does map use differ in virtual reality and desktop-based environments? International Journal of Digital Earth 13, 1484–1503, 10.1080/17538947.2020.1731617 (2020).
- 34. Alnaes, D. et al. Pupil size signals mental effort deployed during multiple object tracking and predicts brain activity in the dorsal attention network and the locus coeruleus. Journal of Vision 14, 10.1167/14.4.1 (2014).
- 35. Pei, X. Z. et al. A simultaneous electroencephalography and eye-tracking dataset in elite athletes during alertness and concentration tasks. Scientific Data 9, 10.1038/s41597-022-01575-0 (2022).
- 36. Erkelens, C. J. & Sloot, O. B. Initial directions and landing positions of binocular saccades. Vision Research 35, 3297–3303, 10.1016/0042-6989(95)00077-r (1995).
- 37. Dong, W. H., Liao, H., Roth, R. E. & Wang, S. Y. Eye Tracking to Explore the Potential of Enhanced Imagery Basemaps in Web Mapping. Cartographic Journal 51, 313–329, 10.1179/1743277413y.0000000071 (2014).
- 38. de Winter, J. C. F., Dodou, D. & Tabone, W. How do people distribute their attention while observing The Night Watch? Perception 51, 763–788, 10.1177/03010066221122697 (2022).
- 39. Bylinskii, Z., Judd, T., Oliva, A., Torralba, A. & Durand, F. What Do Different Evaluation Metrics Tell Us About Saliency Models? IEEE Transactions on Pattern Analysis and Machine Intelligence 41, 740–757, 10.1109/tpami.2018.2815601 (2019).
- 40. Delorme, A. & Makeig, S. EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. Journal of Neuroscience Methods 134, 9–21, 10.1016/j.jneumeth.2003.10.009 (2004).
- 41. Oostenveld, R., Fries, P., Maris, E. & Schoffelen, J. M. FieldTrip: Open Source Software for Advanced Analysis of MEG, EEG, and Invasive Electrophysiological Data. Computational Intelligence and Neuroscience 2011, 10.1155/2011/156869 (2011).
- 42. Jung, T. P. et al. Removing electroencephalographic artifacts by blind source separation. Psychophysiology 37, 163–178, 10.1111/1469-8986.3720163 (2000).
- 43. Li, F. L. et al. Inter-subject P300 variability relates to the efficiency of brain networks reconfigured from resting- to task-state: Evidence from a simultaneous event-related EEG-fMRI study. Neuroimage 205, 10.1016/j.neuroimage.2019.116285 (2020).
- 44. Polich, J. Updating P300: An integrative theory of P3a and P3b. Clinical Neurophysiology 118, 2128–2148, 10.1016/j.clinph.2007.04.019 (2007).
- 45. Devillez, H., Guyader, N. & Guérin-Dugué, A. An eye fixation-related potentials analysis of the P300 potential for fixations onto a target object when exploring natural scenes. Journal of Vision 15, 10.1167/15.13.20 (2015).
- 46. Wascher, E., Hoffmann, S., Sanger, J. & Grosjean, M. Visuo-spatial processing and the N1 component of the ERP. Psychophysiology 46, 1270–1277, 10.1111/j.1469-8986.2009.00874.x (2009).
- 47. Brem, S. et al. Increasing expertise to a novel script modulates the visual N1 ERP in healthy adults. International Journal of Behavioral Development 42, 333–341, 10.1177/0165025417727871 (2018).
- 48. Yuan, J. & Fu, S. Brief review of event-related potential correlates of visual consciousness and related studies. Chinese Science Bulletin 57, 3336–3345 (2012).
- 49. Schindler, S. & Bublatzky, F. Attention and emotion: An integrative review of emotional face processing as a function of attention. Cortex 130, 362–386, 10.1016/j.cortex.2020.06.010 (2020).
- 50. Lawson, A. L. et al. Sensation seeking predicts brain responses in the old-new task: Converging multimodal neuroimaging evidence. International Journal of Psychophysiology 84, 260–269, 10.1016/j.ijpsycho.2012.03.003 (2012).
- 51. Cai, B. Y. et al. In 6th International IEEE EMBS Conference on Neural Engineering (NER), 89–92 (2013).
- 52. Liu, Y. X. et al. Fusion of Spatial, Temporal, and Spectral EEG Signatures Improves Multilevel Cognitive Load Prediction. IEEE Transactions on Human-Machine Systems 53, 357–366, 10.1109/thms.2023.3235003 (2023).
- 53. Fuentes-Martinez, V. J., Romero, S., Lopez-Gordo, M. A., Minguillon, J. & Rodríguez-Alvarez, M. Low-Cost EEG Multi-Subject Recording Platform for the Assessment of Students’ Attention and the Estimation of Academic Performance in Secondary School. Sensors 23, 10.3390/s23239361 (2023).
- 54. Pitchford, B. & Arnell, K. M. Resting EEG in alpha and beta bands predicts individual differences in attentional breadth. Consciousness and Cognition 75, 10.1016/j.concog.2019.102803 (2019).
- 55. Castro-Meneses, L. J., Kruger, J. L. & Doherty, S. Validating theta power as an objective measure of cognitive load in educational video. Educational Technology Research and Development 68, 181–202, 10.1007/s11423-019-09681-4 (2020).
- 56. Jurewicz, K. et al. EEG-neurofeedback training of beta band (12-22 Hz) affects alpha and beta frequencies - A controlled study of a healthy population. Neuropsychologia 108, 13–24, 10.1016/j.neuropsychologia.2017.11.021 (2018).
- 57. Ngo, T. D. et al. An EEG & eye-tracking dataset of ALS patients & healthy people during eye-tracking-based spelling system usage. Scientific Data 11, 10.1038/s41597-024-03501-y (2024).
- 58. Abiri, R., Borhani, S., Sellers, E. W., Jiang, Y. & Zhao, X. P. A comprehensive review of EEG-based brain-computer interface paradigms. Journal of Neural Engineering 16, 10.1088/1741-2552/aaf12e (2019).
- 59. Lawhern, V. J. et al. EEGNet: a compact convolutional neural network for EEG-based brain-computer interfaces. Journal of Neural Engineering 15, 10.1088/1741-2552/aace8c (2018).