MethodsX. 2025 Aug 27;15:103589. doi: 10.1016/j.mex.2025.103589

An in-flight multimodal data collection method for assessing pilot cognitive states and performance in general aviation

Rongbing Xu a,f, Shi Cao a,f, Michael Barnett-Cowan b,f, Gulnaz Bulbul c,f, Elizabeth Irving d,f, Ewa Niechwiej-Szwedo b,f, Suzanne Kearns e,f
PMCID: PMC12414839  PMID: 40926804

Abstract

Human factors are central to aviation safety, with pilot cognitive states such as workload, stress, and situation awareness playing important roles in flight performance and safety. Although flight simulators are widely used for training and scientific research, they often lack the ecological validity needed to replicate pilot cognitive states from real flights. To address these limitations, a new in-flight data collection methodology for general aviation using a Cessna 172 aircraft, which is one of the most widely used aircraft for pilot training, is presented.

The dataset combines:

• Human data from wearable physiological sensors (electroencephalography, electrocardiography, electrodermal activity, and body temperature) and eye-tracking glasses.

• Flight data from an ADS-B flight recorder.

• Pilots’ self-reported cognitive states and flight performance rated by an instructor.

The paper describes the sensor setup, flight task design, and data synchronization procedures. Potential analyses using statistical and machine learning methods are discussed to classify cognitive states and demonstrate the dataset’s value. This methodology supports human factors research and has practical value for applications in pilot training, performance evaluation, and aviation safety management. The method was applied in a field study with 25 participants, from which 20 complete multimodal datasets were retained after data cleaning. After collecting additional data, the resulting dataset will support further research on pilot performance and behavior.

Keywords: In-flight data, Multimodal data collection, Pilot performance, Workload, Stress, Situation awareness, Electroencephalogram, Electrocardiogram, Electrodermal activity, Eye tracking

Graphical abstract

[Image: graphical abstract]

Specifications table

Subject area: Engineering
More specific subject area: Aviation, Human Factors, Ergonomics
Name of your method: In-Flight Pilot Multimodal Data Collection Method
Name and reference of original method: None
Resource availability:
  • Aircraft: Cessna 172N (https://en.wikipedia.org/wiki/Cessna_172).
  • EEG: Muse S headband (https://www.choosemuse.com/) with Mind Monitor mobile application (https://mind-monitor.com/).
  • ECG/HRV: Polar H10 chest strap (https://www.polar.com/) with Polar Sensor Logger application (https://play.google.com/store/apps/details?id=com.j_ware.polarsensorlogger).
  • EDA/Temperature: EmbracePlus wristband (https://www.empatica.com/) with Care Lab mobile application (https://care.empatica.com/).
  • Eye tracking: Pupil Labs Neon glasses (https://pupil-labs.com/) connected to a mobile phone, data processed via Pupil Labs Cloud (https://cloud.pupil-labs.com/).
  • Flight data: Sentry Plus ADS-B receiver (https://flywithsentry.com/sentry-plus) connected to an Apple iPad running ForeFlight EFB application (https://apps.apple.com/us/app/foreflight-mobile-efb/id333252638), data viewable on ForeFlight Web (https://plan.foreflight.com/).
  • Performance assessment and subjective reports: Competency-Based Assessment Grade Sheet (WISA Technical Report 2024–001).
  • Data-collection devices: An Android phone and an iPhone/iPad.

Background

Despite significant safety improvements in aviation driven by technology and procedures, human factors remain a primary cause in 70–80% of accidents and incidents [1,2]. Actions and decisions by pilots, controllers, and maintenance personnel are often implicated, highlighting the need to understand human capabilities and limitations within aviation systems.

Pilot cognitive states, including workload, stress, situation awareness (SA), attention, fatigue, and decision-making, are crucial for safe flight operations [3]. Cognitive incapacitation, potentially triggered by high stress, poses a significant risk due to its subtle nature [4]. Both excessive workload and varying stress levels can impair performance [5], while strong SA (perceiving, understanding, and projecting environmental elements) is essential for effective decision-making [6].

Traditional pilot performance assessments, often relying on subjective instructor evaluations [7], lack granularity and may miss subtle indicators of cognitive overload or inefficient strategies. Objective, data-driven methods integrating flight parameters and physiological signals offer a more detailed and consistent assessment, enabling trend identification and personalized training. Researchers increasingly use physiological signals, such as electroencephalogram (EEG), electrocardiogram (ECG), and electrodermal activity (EDA), as well as eye-tracking, flight data, and pilot performance to objectively assess pilot cognitive states [8–16]. Such real-time data provide insights into internal cognitive processes, potentially allowing early detection of stress or high workload [8] and informing better training and human-centered system design [17].

However, much existing research relies on Flight Simulation Training Devices (FSTDs) [9–16]. While safe and controlled, simulators lack ecological validity; they may not replicate the psychological realism or consequences of actual flight, potentially altering pilot behavior and stress responses [18]. Simulators often fail to capture the nuances of real-world stressors, limiting their utility for studying cognitive incapacitation or training for unexpected events [19]. In-flight research, though more complex and costly, provides unparalleled realism.

To address these limitations of simulator studies and the scarcity of real-world data, this paper introduces a novel multimodal in-flight data acquisition methodology for general aviation. Unlike previous work that predominantly relies on simulated environments, this approach captures synchronized data during actual flight operations in a Cessna 172, including continuous physiological signals (EEG, ECG, EDA, body temperature), eye-tracking data, objective flight parameters, expert performance ratings from an onboard instructor, and subjective pilot reports. This methodology represents one of the first attempts to integrate consumer-grade wearable sensors and eye-tracking technologies into real-world general aviation settings. It provides a scientifically rigorous framework for capturing ecologically valid data on pilot cognition and performance, supporting future research using statistical and machine learning analyses to ultimately enhance aviation safety, training effectiveness, and support system design.

Method details

The primary objectives of this study are to: (1) develop a standardized, replicable methodology for in-flight multimodal data collection in general aviation; (2) enable objective assessment of pilot cognitive states and performance using synchronized physiological, behavioral, and expert-evaluated data; and (3) demonstrate the feasibility and real-world applicability of using wearable, non-invasive sensors and consumer-grade devices in operational flight environments.

This section outlines the proposed method in a structured and replicable manner, detailing procedures for participant recruitment, experimental scenario design, selection and configuration of aircraft and sensing devices, standardized performance and cognitive assessment methods, data acquisition workflows, signal processing pipelines, and data storage solutions. The entire process can be reliably reproduced by other researchers.

Participant recruitment

Recruitment procedures target licensed or student pilots to meet specific inclusion criteria relevant to the research questions (e.g., private pilot license holders with current medical certificates, minimum flight hours, specific experience in the Cessna 172). Exclusion criteria may include medical conditions contraindicating participation or use of certain medications. Demographic information (age, gender), flight experience (total hours, hours in type, instrument ratings), race, first language, and other relevant factors are collected. The number of participants is determined based on statistical power considerations for the intended analysis.

Furthermore, a certified flight instructor (CFI) is required to supervise all flights, monitor in-aircraft tasks, rate pilot performance, and record participants’ self-reported cognitive states. To ensure consistency and reduce subjectivity, the same experienced CFI should be employed throughout the data collection procedure.

All procedures require approval from an institutional review board (IRB) or ethics committee. Participants must provide written informed consent prior to participation, after being fully informed of the procedures, potential risks, and data confidentiality measures. Documenting participant characteristics is crucial, as human variability in demographics (age and gender), flight experience, culture (race and first language), and inherent capabilities significantly influences performance and responses in operational settings.

Scenario design and aircraft

Flight data collection should be conducted at an airport that is suitable for flight training and research maneuvers. Air traffic control (ATC) service is optional; however, an uncontrolled aerodrome with low air traffic is preferred. Six scenarios, including five distinct Visual Flight Rules (VFR) flight tasks, are designed to elicit a range of pilot cognitive states and performance demands:

  • Rest: A baseline period, typically on the ground before engine starts, to establish baseline physiological readings.

  • Take-off: The sequence including taxi, engine run-up, take-off roll, and initial climb.

  • Steep Turn: Performing a coordinated turn at a steep bank angle (e.g., 45 degrees), requiring precise control and heightened attention.

  • Power-on Stall (1500 RPM): Practicing stall entry and recovery procedures, a critical skill involving significant changes in aircraft handling and potentially elevated stress.

  • Landing: The approach, flare, touchdown, and rollout phases, demanding high levels of visual attention, fine motor control, and decision-making.

  • Full Circuit: Flying a complete traffic pattern around the airport, integrating take-off, climb, level flight, turns, descent, and landing into a continuous sequence.

The method utilizes a Cessna 172N model aircraft, a four-seat, single-engine, high-wing aircraft. This aircraft is one of the most common platforms for flight training and personal use worldwide, making findings relevant to a large segment of the aviation community. Its relatively standard avionics configuration provides a representative environment for studying fundamental piloting tasks.

Devices and data type selection

A comprehensive suite of sensors and assessment tools is selected to capture the pilot's state and operational context during flight. The selection prioritizes relatively non-invasive, wearable, and portable technologies suitable for integration into the constrained cockpit environment of a general aviation aircraft, while aiming to maximize data quality and relevance for the assessment of human factors. Fig. 1 shows how participants wore the devices. Table 1 presents the details of the devices and tools used, along with the types of data collected.

Fig. 1. Setup of wearable devices on participants during flight tasks.

Table 1. Summary of multimodal data collection systems.

| Modality | Device / Tool | Recorded Parameters | Unit / Format | Sampling Rate |
|---|---|---|---|---|
| EEG | Muse S + Mind Monitor | Raw EEG signals (AF7, AF8, TP9, TP10) | µV | 256 Hz |
| | | Power spectral density in Delta, Theta, Alpha, Beta, Gamma bands | µV² or normalized units | 1 Hz (post-FFT) |
| | | Relative and absolute band power, band power ratios (e.g., Theta/Beta) | Ratio / % | 1 Hz |
| ECG / HRV | Polar H10 | Heart rate (HR) | bpm | 1 Hz |
| | | R-R intervals (beat-to-beat) | ms | Event-based |
| | | HRV metrics (SDNN, RMSSD, LF, HF, LF/HF ratio) | ms, ms², ratio | Computed during analysis |
| EDA | EmbracePlus | Skin conductance level (SCL) | µS | 4 Hz |
| | | Skin conductance responses (SCR frequency, amplitude, rise time) | µS, count | Computed during analysis |
| Body Temperature | EmbracePlus | Peripheral skin temperature | °C | 1 Hz |
| Eye Tracking | Pupil Labs Neon | Gaze position (x, y coordinates) | Pixels / AOI-mapped | 200 Hz |
| | | Fixation duration and count | ms, count | Computed during analysis |
| | | Saccade amplitude, velocity, frequency | °/s | Computed during analysis |
| | | Pupil diameter | mm | 200 Hz |
| | | Blink rate and duration | count/min, ms | Computed during analysis |
| | | Scene video (with gaze overlay) | mp4 | 30 Hz (video) |
| Flight Data | Sentry Plus + ForeFlight | GPS latitude, longitude, altitude | °, ft | 1 Hz |
| | | Ground speed, vertical speed | knots, ft/min | 1 Hz |
| | | Magnetic heading, track, pitch, roll | ° | 1 Hz |
| | | Time | UTC | 1 Hz |
| Instructor Ratings | CFI + Grade Sheet | Maneuver scores (climb, descent, turn, stall, landing, circuit) | Score (0–3 or pass/fail) | Per maneuver |
| Self-Report | Pilot (post-task) | Workload, stress, situation awareness ratings | 0–10 scale | Post-segment |
| | Pilot (pre-flight) | Baseline stress rating | 0–10 scale | Once (pre-flight) |

Electroencephalography (EEG)

A Muse S headband (Fig. 2) and the Mind Monitor application (Fig. 3) are used to capture EEG data. This commercially available headband uses dry electrodes positioned over frontal and temporal regions (AF7, AF8, TP9, and TP10) and has been validated in multiple scientific studies [20,21]. It wirelessly transmits data (256 Hz), including raw EEG signals and derived frequency band power (Alpha, Beta, Gamma, Delta, Theta), potentially offering insights into cognitive states such as workload, attention, engagement, and drowsiness [22]. Mind Monitor is a mobile application that connects to the Muse headband and streams raw EEG data in real time. It applies basic preprocessing, including notch filtering and optional bandpass filtering, and provides access to raw EEG signals, absolute and relative band powers, and FFT-based spectral data (1 Hz) for further analysis.

Fig. 2. Muse S headband.

Fig. 3. Mind Monitor application.
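To make the band power ratio in Table 1 concrete, the following minimal sketch computes a Theta/Beta ratio from one second of raw single-channel EEG sampled at 256 Hz. It is not Mind Monitor's algorithm: the band edges (Theta 4–8 Hz, Beta 13–30 Hz) are conventional values assumed here, the naive DFT without windowing is chosen only to keep the example dependency-free, and the test signal is synthetic.

```python
import cmath
import math

def band_power(signal, fs, f_lo, f_hi):
    """Sum of per-bin power for DFT bins with f_lo <= f < f_hi (naive DFT, no windowing)."""
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]  # remove the DC offset
    power = 0.0
    for k in range(1, n // 2):
        freq = k * fs / n
        if f_lo <= freq < f_hi:
            coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            power += abs(coeff) ** 2 / n ** 2
    return power

FS = 256  # Muse S raw EEG sampling rate reported in the text

# Synthetic signal (illustration only): a 6 Hz component of amplitude 10
# plus a 20 Hz component of amplitude 5, one second long.
eeg = [10 * math.sin(2 * math.pi * 6 * t / FS) + 5 * math.sin(2 * math.pi * 20 * t / FS)
       for t in range(FS)]

theta = band_power(eeg, FS, 4, 8)    # assumed Theta band edges
beta = band_power(eeg, FS, 13, 30)   # assumed Beta band edges
theta_beta_ratio = theta / beta
```

For real recordings, a windowed estimate (for example Welch's method) over artifact-free epochs would be the more usual choice; the ratio logic stays the same.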

Electrocardiography (ECG)

Heart rate and R-R intervals are recorded using a Polar H10 chest strap (Fig. 4). Previous studies have validated this device for accurately capturing beat-to-beat intervals, which are essential for Heart Rate Variability (HRV) analysis [23,24]. Like the Muse S headband, it transmits data via Bluetooth to the data receiver, typically a mobile phone. Fig. 5 shows the Polar Sensor Logger application used during data collection. HRV metrics derived from these intervals (e.g., SDNN, RMSSD, LF/HF ratio) serve as robust indicators of autonomic nervous system activity, reflecting physiological responses to stress, mental workload, and fatigue [25]. HR is logged at 1 Hz, and R-R intervals are recorded beat by beat in milliseconds.

Fig. 4. Polar H10 chest strap.

Fig. 5. Polar Sensor Logger application.
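The time-domain HRV metrics named above follow directly from the R-R series. The sketch below computes SDNN and RMSSD using their standard definitions; the function name and the sample intervals are hypothetical, and artifact or ectopic-beat correction, which real recordings would need first, is deliberately left out.

```python
import math
import statistics

def hrv_time_domain(rr_ms):
    """Return SDNN, RMSSD (ms) and mean heart rate (bpm) from R-R intervals in ms."""
    if len(rr_ms) < 3:
        raise ValueError("need at least three R-R intervals")
    # SDNN: sample standard deviation of the R-R intervals.
    sdnn = statistics.stdev(rr_ms)
    # RMSSD: root mean square of successive differences.
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    # Mean heart rate from the mean R-R interval (60,000 ms per minute).
    mean_hr = 60000.0 / statistics.mean(rr_ms)
    return {"sdnn_ms": sdnn, "rmssd_ms": rmssd, "mean_hr_bpm": mean_hr}

# Hypothetical R-R series in ms, for illustration only.
example_rr = [800, 810, 790, 805, 795]
metrics = hrv_time_domain(example_rr)
```

Frequency-domain metrics (LF, HF, LF/HF) require resampling the R-R series to a uniform rate and a spectral estimate, so they are not covered by this sketch.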

Electrodermal activity (EDA) & body temperature

An EmbracePlus (Fig. 6) research-grade wearable device, worn on the left wrist (non-dominant hand), is used to measure EDA (via skin conductance) and peripheral skin temperature. EmbracePlus is a successor to the Empatica E4, a wristband that offers real-time physiological data acquisition and streaming, which is also widely used for research purposes [26,27]. EDA provides a sensitive measure of sympathetic nervous system arousal, often correlating with emotional responses, stress, and cognitive load [28]. Skin temperature fluctuations can also be associated with stress responses. The device logs data at specified sampling rates (e.g., EDA at 4 Hz, Temperature at 1 Hz) in standard units (microsiemens for EDA, degrees Celsius for temperature). All data are transmitted via Bluetooth to a mobile phone running the Care Lab app (Fig. 7), which automatically uploads the data to a private Amazon S3 bucket managed by the device manufacturer.

Fig. 6. EmbracePlus watch.

Fig. 7. Care Lab application.
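Skin conductance responses (SCRs) are listed in Table 1 as computed during analysis. One simple way to count them is a trough-to-peak rule, sketched below under stated assumptions: the 0.05 µS minimum amplitude is a hypothetical threshold, the input is assumed to be already filtered, and the sample values are invented. This is an illustration of the idea, not the processing used by the authors or by the device manufacturer.

```python
def detect_scr(eda_us, min_amplitude=0.05):
    """Return (peak_index, amplitude) pairs for trough-to-peak rises >= min_amplitude (µS)."""
    responses = []
    trough = eda_us[0]  # most recent local minimum
    for i in range(1, len(eda_us) - 1):
        prev, cur, nxt = eda_us[i - 1], eda_us[i], eda_us[i + 1]
        if prev > cur <= nxt:
            # Local minimum: start measuring the next rise from here.
            trough = cur
        elif prev < cur >= nxt:
            # Local maximum: accept it if the rise from the last trough is large enough.
            amplitude = cur - trough
            if amplitude >= min_amplitude:
                responses.append((i, amplitude))
    return responses

FS_EDA = 4  # EDA sampling rate in Hz, as reported in the text

# Hypothetical 4 Hz EDA trace in µS, for illustration only.
eda = [0.50, 0.50, 0.52, 0.60, 0.58, 0.54, 0.52,
       0.53, 0.52, 0.51, 0.55, 0.66, 0.62, 0.58]
scrs = detect_scr(eda)
scr_per_min = len(scrs) / (len(eda) / FS_EDA / 60.0)
```

Dividing the count by the segment duration gives the SCR frequency for a flight task, which can then be compared across tasks.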

Eye tracking

Gaze behavior is monitored using Pupil Labs Neon eye-tracking glasses (Fig. 8). These wearable glasses provide video of the pilot’s field of view along with overlaid gaze position, enabling analysis of visual scan patterns and attention allocation. Pupil Labs eye trackers have been widely used in studies of cognitive states and human performance [14,29]. Key metrics include gaze coordinates, pupil diameter (pupillometry, linked to cognitive load), blink rate, and blink duration (potentially related to fatigue or attention lapses). The system records data at 200 Hz and provides information on fixations, saccades, and pupil size. The device is wired to a mobile phone for data transfer and power. All recorded data and videos are uploaded to the Pupil Labs Cloud (Fig. 9) for further processing and metric extraction.

Fig. 8. Pupil Labs Neon glasses.

Fig. 9. Pupil Labs Cloud interface.

Flight data

Objective flight data are captured using a Sentry Plus ADS-B receiver (Fig. 10) connected to an Apple iPad running the ForeFlight Electronic Flight Bag (EFB) application. This setup logs a range of parameters analogous to a basic flight data recorder (FDR), including time-stamped GPS position (latitude, longitude), altitude (GPS and pressure), ground speed, vertical speed, track, and potentially attitude information (pitch, roll, magnetic heading) derived from the Sentry’s internal sensors. This provides crucial context about the aircraft’s state and trajectory, enabling objective performance assessment (e.g., adherence to flight path, stability of control) and correlation with physiological and behavioral data. Data are recorded at a 1 Hz sampling rate and can be viewed on ForeFlight Web (Fig. 11).

Fig. 10. Sentry Plus ADS-B receiver.

Fig. 11. ForeFlight’s Webpage.
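As an illustration of objective performance assessment from the 1 Hz flight log, the sketch below computes altitude and bank-angle deviations for one maneuver segment such as a steep turn. The field names (`alt_ft`, `roll_deg`), the target values, and the sample rows are hypothetical; they are not taken from the ForeFlight export format or from the grade sheet.

```python
import math

def maneuver_deviation(rows, target_alt_ft, target_bank_deg):
    """Summarize altitude and bank deviations for one maneuver segment (1 Hz rows)."""
    alt_err = [r["alt_ft"] - target_alt_ft for r in rows]
    # Bank error uses the absolute roll so left and right turns are treated alike.
    bank_err = [abs(r["roll_deg"]) - target_bank_deg for r in rows]
    return {
        "alt_rmse_ft": math.sqrt(sum(e * e for e in alt_err) / len(alt_err)),
        "alt_max_abs_ft": max(abs(e) for e in alt_err),
        "bank_mean_abs_err_deg": sum(abs(e) for e in bank_err) / len(bank_err),
    }

# Hypothetical steep-turn samples, for illustration only.
steep_turn = [
    {"alt_ft": 3000, "roll_deg": 44},
    {"alt_ft": 3040, "roll_deg": 47},
    {"alt_ft": 2970, "roll_deg": 45},
    {"alt_ft": 3010, "roll_deg": 43},
]
deviation = maneuver_deviation(steep_turn, target_alt_ft=3000, target_bank_deg=45)
```

Metrics of this kind can sit alongside the instructor's scores for the same segment, giving a continuous measure to compare against the discrete rating.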

Performance assessment

Pilot performance on specific maneuvers is evaluated by a CFI occupying the right seat, using a structured Competency-Based Assessment Grade Sheet adapted from a University of Waterloo Institute for Sustainable Aeronautics (WISA) technical report [30]. The grade sheet provides standardized criteria for grading performance on tasks such as climbs, descents, turns, stalls, navigation, and landings. Criteria typically involve adherence to tolerances for altitude, airspeed, heading, procedural correctness, and overall control smoothness. This yields objective, expert-rated scores linked to specific flight segments.

Subjective cognitive states

Subjective cognitive states are assessed using single-item self-reports on a 0–10 scale, as completing multi-item questionnaires during in-aircraft tasks was not feasible. Pilots rate their perceived workload, stress, and situation awareness immediately after each flight segment. A pre-flight stress rating is also recorded as a baseline. This approach allows for quick and minimally intrusive data collection and is consistent with previous studies that have used single-item measures to assess cognitive states in real-world aviation settings [31,32].

Data collection procedure

A standardized procedure is followed for each data collection session to ensure consistency and safety.

Pre-flight

Upon arrival, participants receive a detailed briefing about the study objectives, flight plan, tasks involved, sensor operation, and safety procedures. They provide written informed consent and complete a baseline questionnaire for demographic information. The research team assists the pilot in fitting physiological sensors (EEG headband, ECG chest strap, EDA/Temp watch) and eye-tracking glasses. Signal quality checks (e.g., EEG electrode impedance, ECG signal detection) and eye-tracker calibration are performed to ensure reliable data acquisition. The CFI receives a separate briefing on their role as safety pilot, evaluator using the competency-based assessment sheet, and the method for interaction during the flight (minimizing non-essential communication). Prior to settling in the cockpit, data recording is initiated simultaneously across all acquisition systems (physiological sensors, eye tracker, flight data logger).

In-flight

The flight scenario is designed to encompass a range of standard general aviation maneuvers typically encountered in training and operational flying, selected to elicit variations in cognitive load, attention demands, and potential stress. Throughout the flight, participants perform each task following the given task list. The CFI monitors the pilot’s performance, provides instruction only when necessary for safety or procedural correction, records participants’ self-reported cognitive states, and completes the relevant sections of the competency-based assessment grade sheet during or immediately after the execution of specific maneuvers. The CFI’s primary role is ensuring flight safety.

Post-flight

After the engine shutdown and securing the aircraft, data recording is stopped. The sensors are removed from the participant. A debriefing session is conducted with the pilot to discuss their experience, clarify any events that occurred during the flight, and gather qualitative feedback. Data from all recording devices are then downloaded and securely stored for subsequent processing and analysis.

Synchronization strategy

Accurate temporal alignment across all data streams is essential for integrated analysis. In this study, all recording devices are connected to mobile phones, with each dataset referencing the phone’s system time, although with varying timestamp formats. Synchronization involves converting all timestamps to a common format and aligning them accordingly. Downsampling is applied as needed to ensure consistency across sampling rates. The start and end points of each flight task are identified using eye-tracking video recordings, which provide a reliable reference for segmenting the data. This streamlined synchronization approach ensures consistent alignment across physiological signals, eye-tracking data, and flight parameters, enabling precise temporal analysis.
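The steps described above (convert timestamps to a common format, downsample, align) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the timestamp formats, stream names, and sample values are assumptions, all clocks are assumed to be convertible to UTC, and averaging within each whole second is only one possible downsampling rule.

```python
from collections import defaultdict
from datetime import datetime, timezone

def to_epoch_seconds(value, fmt=None):
    """Convert a timestamp to UTC epoch seconds.

    fmt=None  -> value is already epoch seconds
    fmt="ms"  -> value is epoch milliseconds
    otherwise -> value is a string parsed with strptime(fmt) and treated as UTC
    """
    if fmt is None:
        return float(value)
    if fmt == "ms":
        return float(value) / 1000.0
    parsed = datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
    return parsed.timestamp()

def downsample_to_1hz(samples):
    """Average (epoch_seconds, value) samples that fall within the same whole second."""
    bins = defaultdict(list)
    for t, v in samples:
        bins[int(t)].append(v)  # int() truncates to the whole second
    return {sec: sum(vs) / len(vs) for sec, vs in bins.items()}

def align(streams):
    """Join named 1 Hz streams on the seconds that are present in every stream."""
    common = set.intersection(*(set(s) for s in streams.values()))
    return [{"t": sec, **{name: s[sec] for name, s in streams.items()}}
            for sec in sorted(common)]

# Hypothetical raw streams: EDA at 4 Hz with epoch-millisecond stamps,
# heart rate at 1 Hz with string stamps.
eda_raw = [(1735732800000 + i * 250, 0.50 + 0.01 * i) for i in range(8)]
hr_raw = [("2025-01-01 12:00:00.000", 72), ("2025-01-01 12:00:01.000", 74)]
HR_FMT = "%Y-%m-%d %H:%M:%S.%f"

eda = downsample_to_1hz((to_epoch_seconds(t, "ms"), v) for t, v in eda_raw)
hr = downsample_to_1hz((to_epoch_seconds(t, HR_FMT), v) for t, v in hr_raw)
aligned = align({"eda_uS": eda, "hr_bpm": hr})
```

Keeping only the seconds shared by all streams makes gaps explicit; whether to interpolate across short dropouts instead is an analysis decision this sketch does not make.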

Data structure and organization

The application of this methodology yields a rich, high-dimensional, multimodal dataset capturing various facets of pilot state and performance in real flight conditions. The data structure and organization follow a previously published method [33]. The dataset comprises several distinct types of data:

Time-series physiological data

Continuous recordings from the physiological sensors, including multiple channels of EEG, ECG R-R intervals or raw waveforms, EDA (skin conductance), and peripheral skin temperature. These data require extensive preprocessing, including noise filtering (e.g., powerline interference, motion artifacts), artifact detection and correction (e.g., EEG eye blinks, ECG/EDA movement artifacts), and segmentation into epochs aligned with flight phases or maneuvers.

Time-series eye-tracking data

Continuous streams capturing gaze coordinates (x, y position on the scene camera or cockpit map), pupil diameter, and blink events (rate, duration). Preprocessing includes detection of fixations, saccades, and blinks, as well as mapping gaze data to cockpit areas of interest (AOIs) such as instruments or external references to analyze visual attention patterns.
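Fixation detection can be illustrated with the classic dispersion-threshold (I-DT) approach, sketched below. Pupil Labs Cloud provides its own fixation output, so this code is not a description of that detector; the thresholds (25 px dispersion, 100 ms minimum duration) and the synthetic gaze samples are assumptions chosen only for the example.

```python
def _dispersion(window):
    """Dispersion of a window of (t, x, y) samples: x range plus y range."""
    xs = [s[1] for s in window]
    ys = [s[2] for s in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def detect_fixations_idt(samples, max_dispersion, min_duration_ms):
    """Dispersion-threshold (I-DT) fixation detection.

    samples: list of (t_ms, x, y) sorted by time.
    Returns a list of (start_ms, end_ms, centroid_x, centroid_y).
    """
    fixations = []
    i, n = 0, len(samples)
    while i < n:
        # Grow the initial window until it spans the minimum duration.
        j = i
        while j < n and samples[j][0] - samples[i][0] < min_duration_ms:
            j += 1
        if j >= n:
            break  # not enough data left for another fixation
        if _dispersion(samples[i:j + 1]) <= max_dispersion:
            # Extend the window while the dispersion stays under the threshold.
            while j + 1 < n and _dispersion(samples[i:j + 2]) <= max_dispersion:
                j += 1
            window = samples[i:j + 1]
            cx = sum(s[1] for s in window) / len(window)
            cy = sum(s[2] for s in window) / len(window)
            fixations.append((window[0][0], window[-1][0], cx, cy))
            i = j + 1
        else:
            i += 1  # drop the first sample and try again
    return fixations

# Synthetic 200 Hz gaze (5 ms steps): two stable clusters with small jitter.
gaze = [(k * 5, 100 + k % 3, 100 + k % 2) for k in range(30)]
gaze += [(k * 5, 400 + k % 3, 300 + k % 2) for k in range(30, 60)]
fixations = detect_fixations_idt(gaze, max_dispersion=25, min_duration_ms=100)
```

Each detected fixation centroid can then be tested against area-of-interest boundaries to attribute dwell time to instruments or to the outside view.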

Time-series flight data

Continuous recordings of aircraft state parameters, including GPS position, altitude, airspeed (or ground speed if pitot-static information is unavailable), vertical speed, and attitude (pitch, roll, heading). These data are generally cleaner but still require synchronization with other streams.

Event-based performance data

Discrete performance scores assigned by the CFI for specific flight maneuvers (e.g., altitude maintenance during a steep turn, quality of landing flare). These scores are linked to specific time intervals or events in the continuous data streams.

Discrete subjective data

Single-point measurements include pre-flight baseline stress ratings and post-task single-item scores for workload, stress, and situation awareness. These measures provide concise subjective assessments of the pilot’s cognitive state after each task.

To facilitate analysis and sharing, all data are systematically organized. Comma-separated values (CSV) files are used as the standard format, with all data files including synchronized timestamps (e.g., UTC or seconds since experiment start) for precise multimodal alignment. A detailed data dictionary defining all variables, units, and coordinate systems is maintained. This structure allows researchers to easily extract and link data associated with specific flight phases (e.g., take-off, landing) or discrete events (e.g., stall recovery, steep turn), supporting flexible and targeted analyses.
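A small sketch of how such a CSV layout supports phase-level extraction is given below. The column names (`t`, `hr_bpm`, `eda_uS`), the task windows, and the data are hypothetical and stand in for whatever the data dictionary defines; `t` is taken to be seconds since experiment start.

```python
import csv
import io
from statistics import mean

def summarize_by_task(csv_text, tasks, column):
    """Mean of `column` within each task window.

    csv_text: synchronized CSV with a 't' column (seconds since experiment start)
    tasks:    list of (task_name, start_s, end_s), with end_s exclusive
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    for name, start, end in tasks:
        values = [float(r[column]) for r in rows
                  if r[column] != "" and start <= float(r["t"]) < end]
        summary[name] = mean(values) if values else None
    return summary

# Hypothetical synchronized file and task windows, for illustration only.
SYNC_CSV = """t,hr_bpm,eda_uS
0,70,0.50
1,72,0.51
2,71,0.50
3,88,0.62
4,90,0.66
5,92,0.70
"""
TASKS = [("rest", 0, 3), ("takeoff", 3, 6)]
hr_by_task = summarize_by_task(SYNC_CSV, TASKS, "hr_bpm")
```

The same task table can be reused for every modality once all files share one time base, so per-task features from different sensors line up row for row.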

Method validation

This study successfully validated a practical and ecologically valid method for multimodal in-flight data collection designed to objectively assess pilot cognitive states and performance in general aviation. The validation involved 25 healthy licensed and student pilots (recruited based on minimum flight hours and valid Transport Canada Aviation Medical Certificates) and one CFI at Brantford Municipal Airport (CYFD), Ontario, Canada, following approval from the University of Waterloo Research Ethics Board.

Despite some technical challenges leading to the exclusion of 5 datasets, 20 complete multimodal datasets were successfully collected and retained for analysis. These datasets provide a rich, synchronized view of pilot behavior and cognition, integrating physiological signals (EEG, ECG, EDA, body temperature), eye-tracking behavior, flight telemetry, expert instructor performance ratings, and subjective self-reports, all captured within the authentic operational demands of a Cessna 172 during actual flight. Table 2 presents summary statistics describing the demographic characteristics of the participants and key flight parameters collected during the study.

Table 2. Participant and flight characteristics.

| Variable | Mean | SD | Min | Max |
|---|---|---|---|---|
| Age (years) | 32 | 12.8 | 18 | 65 |
| Total flight hours | 256 | 271 | 20 | 1100 |
| Number of tasks completed | 6 | 0 | 6 | 6 |
| Average total study duration (min) | 81.6 | 10.5 | 66.1 | 93.2 |

The method proved feasible to implement in a real-world flight environment and was well-tolerated. Participants reported minimal discomfort or distraction from the wearable sensors, and the flight instructor confirmed that the structured scenarios and evaluation process were manageable and did not compromise flight safety. The use of brief, single-item self-reports post-task was deemed efficient. This methodology yielded high-quality, ecologically valid data, addressing key limitations associated with simulator-based research by capturing behavior under real consequences and stressors.

The resulting dataset supports diverse analytical approaches, from statistical correlations between physiological features, gaze, and performance, to machine learning models for classifying cognitive states like workload or predicting performance variations. This integrative capability is crucial for moving towards real-time pilot state monitoring and adaptive systems. The methodology's strengths lie in its ecological validity and multimodal richness, offering a holistic view relevant to general aviation. The data enables detailed investigations into workload, stress, attention, and SA during flight, supporting the development of data-driven performance assessment tools. Furthermore, this work has practical implications for enhancing pilot training, informing safety management systems, and guiding the design of human-centered cockpit interfaces and automation. Multiple research projects leveraging this dataset are currently underway at WISA.

Discussion

This study presents a validated and ecologically grounded in-flight data collection methodology that enables objective assessment of pilot cognitive states and performance. By integrating synchronized physiological signals, eye-tracking behavior, flight telemetry, instructor evaluations, and subjective self-reports, the protocol offers a comprehensive, real-world dataset suitable for advancing human factors research. The approach proved feasible and well-tolerated, with minimal disruption to flight operations, and yielded 20 complete datasets despite some technical issues.

This multimodal method provides a unique opportunity to investigate pilot workload, stress, attention, and situation awareness through feature extraction and statistical or machine learning analyses. The strengths of the methodology lie in its ecological validity, data richness, and relevance to general aviation. It supports the development of data-driven assessment tools and has practical implications for pilot training, safety management, and adaptive cockpit design.

While the method was successfully implemented, several challenges and limitations were encountered, highlighting areas for future refinement. Technical issues resulted in incomplete data for 5 out of 25 participants. The primary causes included Bluetooth connectivity problems between sensors and recording devices, sensor displacement (particularly EEG headset slippage during maneuvers leading to signal loss), mobile app crashes, and data logging or file saving errors. These issues underscore the complexities of deploying multiple sensor systems reliably in a dynamic flight environment.

Data management also presented practical hurdles. Data streams were stored on different devices using varied formats and timestamps, necessitating careful manual conversion, alignment, and synchronization during post-processing. Although using eye-tracking video to identify task start/end points for synchronization was effective, it proved time-intensive. Matching sampling rates across modalities required downsampling, and rigorous file naming conventions were essential to manage the large number of files generated per participant. Processing and analyzing such complex, multimodal data demands significant technical resources and expertise.

Beyond technical aspects, managing potential sensor discomfort and minimizing data artifacts caused by flight movement or environmental factors remain ongoing challenges. Furthermore, accounting for individual differences in pilot physiology and cognition during analysis is crucial for deriving generalizable findings.

Conclusion

This methodology lays a strong foundation for advancing both theoretical and applied research in aviation human factors. It addresses a notable gap in the literature, as no comprehensive, published protocol currently exists for in-aircraft multimodal pilot data collection. While previous studies have focused on simulator-based environments, in-aircraft data collection presents unique challenges that require additional considerations—such as ensuring flight safety, minimizing cockpit distractions, and simplifying subjective measures (e.g., replacing long questionnaires like NASA-TLX with quick single-item ratings). The current protocol demonstrates a practical and safe approach for capturing synchronized physiological, behavioral, and performance data during real flights.

A key contribution of this work is the integration of multiple consumer-grade biosensors into a unified, operationally feasible system that can be deployed in general aviation settings. This includes streamlined synchronization methods and a flexible structure for collecting ecologically valid, time-aligned datasets.

Future research will focus on increasing sample size, refining sensor systems, improving synchronization techniques, and enabling real-time pilot state monitoring. Expanding the study to different aircraft, flight conditions, and longitudinal comparisons will further enhance its applicability. Addressing current limitations by improving sensor robustness, streamlining synchronization processes, and enabling automated data analysis will be critical for future work. Ultimately, overcoming these challenges will support the development of comprehensive, evidence-based approaches to assess and enhance pilot performance in increasingly complex operational environments.

Ethics statements

All procedures involving human participants in this study were reviewed and approved by the University of Waterloo Research Ethics Board. Informed consent was obtained from all participants prior to their involvement, confirming their understanding that the collected data would be fully anonymized, stored securely in a database, and potentially utilized in future research studies.

CRediT authorship contribution statement

Rongbing Xu: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing – original draft, Visualization, Project administration. Shi Cao: Conceptualization, Methodology, Investigation, Resources, Writing – review & editing, Supervision, Funding acquisition. Michael Barnett-Cowan: Writing – review & editing, Resources. Gulnaz Bulbul: Writing – review & editing, Resources. Elizabeth Irving: Writing – review & editing, Resources. Ewa Niechwiej-Szwedo: Writing – review & editing, Resources. Suzanne Kearns: Writing – review & editing, Resources.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The development of this data collection methodology was supported by a Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant (RGPIN-2024-04808 to S.C.). Special thanks to Dr. Yedong Chen for his support in creating the graphical abstract.

Data availability

Data will be made available on request.

References

  • 1. Maurino D.E. Human factors and aviation safety: what the industry has, what the industry needs. Ergonomics. 2000;43(7):952–959. doi: 10.1080/001401300409134.
  • 2. Shappell S., Detwiler C., Holcomb K., Hackworth C., Boquet A., Wiegmann D. Human error and commercial aviation accidents: a comprehensive, fine-grained analysis using HFACS. 2006.
  • 3. Martins A.P. A review of important cognitive concepts in aviation. Aviation. 2016;20(2):65–84. doi: 10.3846/16487788.2016.1196559.
  • 4. Causse M., et al. Cognitive incapacitation in aviation: a narrative review. Theor. Issues Ergon. Sci. 2025:1–19. doi: 10.1080/1463922X.2025.2475431.
  • 5. Masi G., Amprimo G., Ferraris C., Priano L. Stress and workload assessment in aviation: a narrative review. Sensors. 2023;23(7):3556. doi: 10.3390/s23073556.
  • 6. Nguyen T., Lim C.P., Nguyen N.D., Gordon-Brown L., Nahavandi S. A review of situation awareness assessment approaches in aviation environments. IEEE Syst. J. 2019;13(3):3590–3603. doi: 10.1109/JSYST.2019.2918283.
  • 7. Mavin T.J., Roth W. A holistic view of cockpit performance: an analysis of the assessment discourse of flight examiners. Int. J. Aviat. Psychol. 2014. doi: 10.1080/10508414.2014.918434.
  • 8. Han S.-Y., Kim J.-W., Lee S.-W. Recognition of pilot’s cognitive states based on combination of physiological signals. 2019 7th International Winter Conference on Brain-Computer Interface (BCI); IEEE; 2019. pp. 1–5.
  • 9. Aricò P., Borghini G., Di Flumeri G., Colosimo A., Bonelli S., Babiloni F. Adaptive automation triggered by EEG-based mental workload evaluation: a passive brain–computer interface application in realistic flight simulations. Front. Hum. Neurosci. 2017;11:243. doi: 10.3389/fnhum.2017.00243.
  • 10. Yiu C.Y., et al. Towards safe and collaborative aerodrome operations: assessing shared situational awareness for adverse weather detection with EEG-enabled Bayesian neural networks. Adv. Eng. Inform. 2022;53. doi: 10.1016/j.aei.2022.101698.
  • 11. Wang H., Jiang N., Pan T., Si H., Li Y., Zou W. Cognitive load identification of pilots based on physiological-psychological characteristics in complex environments. J. Adv. Transp. 2020. doi: 10.1155/2020/5640784.
  • 12. Ahmed T., Powner M.B., Qassem M., Kyriacou P.A. Colorimetric determination of salivary cortisol levels in artificial saliva for the development of a portable colorimetric sensor (Salitrack). Sci. 2024. doi: 10.3390/sci6020020.
  • 13. Zhang C., et al. Assessing pilot workload during takeoff and climb under different weather conditions: a fNIRS-based modelling using deep learning algorithms. IEEE Trans. Aerosp. Electron. Syst. 2024.
  • 14. Ayala N., Kearns S., Irving E., Cao S., Niechwiej-Szwedo E. The effects of a dual task on gaze behavior examined during a simulated flight in low-time pilots. Front. Psychol. 2024;15. doi: 10.3389/fpsyg.2024.1439401.
  • 15. Xu R., Cao S., Kearns S.K., Niechwiej-Szwedo E., Irving E. Computational cognitive modeling of pilot performance in pre-flight and take-off procedures. J. Aviat./Aerosp. Educ. Res. 2024;33(4). doi: 10.58940/2329-258X.2026.
  • 16. Paul N., Moncion B., Cao S. An experimental comparison on the effectiveness of various levels of simulator fidelity on ab initio pilot training. Ergonomics. 2025:1–17. doi: 10.1080/00140139.2024.2449110.
  • 17. Carretta T.R., Teachout M.S., Ree M.J., Barto E.L., King R.E., Michaels C.F. Consistency of the relations of cognitive ability and personality traits to pilot training performance. Int. J. Aviat. Psychol. 2014;24(4):247–264. doi: 10.1080/10508414.2014.949200.
  • 18. Perfect P., White M., Padfield G., Gubbels A. Rotorcraft simulation fidelity: new methods for quantification and assessment. Aeronaut. J. 2013;117(1189):235–282. doi: 10.1017/S0001924000007983.
  • 19. Caldwell J. Assessing the impact of stressors on performance: observations on levels of analyses. Biol. Psychol. 1995;40(1–2):197–208. doi: 10.1016/0301-0511(95)05115-5.
  • 20. Krigolson O.E., Williams C.C., Norton A., Hassall C.D., Colino F.L. Choosing MUSE: validation of a low-cost, portable EEG system for ERP research. Front. Neurosci. 2017;11:109. doi: 10.3389/fnins.2017.00109.
  • 21. Krigolson O.E., et al. Using Muse: rapid mobile assessment of brain performance. Front. Neurosci. 2021;15. doi: 10.3389/fnins.2021.634147.
  • 22. Bell M.A., Cuevas K. Using EEG to study cognitive development: issues and practices. J. Cogn. Dev. 2012;13(3):281–294. doi: 10.1080/15248372.2012.691143.
  • 23. Schaffarczyk M., Rogers B., Reer R., Gronwald T. Validity of the Polar H10 sensor for heart rate variability analysis during resting state and incremental exercise in recreational men and women. Sensors. 2022;22(17):6536. doi: 10.3390/s22176536.
  • 24. Speer K.E., Semple S., Naumovski N., McKune A.J. Measuring heart rate variability using commercially available devices in healthy children: a validity and reliability study. Eur. J. Investig. Health Psychol. Educ. 2020;10(1):390–404. doi: 10.3390/ejihpe10010029.
  • 25. Forte G., Favieri F., Casagrande M. Heart rate variability and cognitive function: a systematic review. Front. Neurosci. 2019;13:710. doi: 10.3389/fnins.2019.00710.
  • 26. McCarthy C., Pradhan N., Redpath C., Adler A. Validation of the Empatica E4 wristband. 2016 IEEE EMBS International Student Conference (ISC); IEEE; 2016. pp. 1–4.
  • 27. Milstein N., Gordon I. Validating measures of electrodermal activity and heart rate variability derived from the Empatica E4 utilized in research settings that involve interactive dyadic states. Front. Behav. Neurosci. 2020;14:148. doi: 10.3389/fnbeh.2020.00148.
  • 28. Buchwald M., Kupiński S., Bykowski A., Marcinkowska J., Ratajczyk D., Jukiewicz M. Electrodermal activity as a measure of cognitive load: a methodological approach. 2019 Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA); IEEE; 2019. pp. 175–179.
  • 29. Picanço C.R., Tonneau F. A low-cost platform for eye-tracking research: using Pupil© in behavior analysis. J. Exp. Anal. Behav. 2018;110(2):157–170. doi: 10.1002/jeab.448.
  • 30. Moncion B., Cao S., Kearns S. Creating competency-based assessment grade sheets and a rubric for private pilot licence training. WISA Technical Report (2024-001). Waterloo Institute for Sustainable Aeronautics, University of Waterloo; 2024.
  • 31. Elo A.-L., Leppänen A., Jahkola A. Validity of a single-item measure of stress symptoms. Scand. J. Work Env. Health. 2003:444–451. doi: 10.5271/sjweh.752.
  • 32. Ames L., George E. Revision and Verification of a Seven-Point Workload Scale. AFFTC-TIM-93-01. Air Force Flight Test Center; Edwards AFB, CA: 1993.
  • 33. Xu R., et al. A comprehensive data collection and processing protocol for general aviation pilot performance assessment and behaviour research. WISA Technical Report (2025-001). Waterloo Institute for Sustainable Aeronautics, University of Waterloo; 2025.

