PLOS One. 2021 Feb 18;16(2):e0247061. doi: 10.1371/journal.pone.0247061

Visual scanning strategies in the cockpit are modulated by pilots’ expertise: A flight simulator study

Christophe Lounis 1,*, Vsevolod Peysakhovich 1, Mickaël Causse 1
Editor: Peter James Hills
PMCID: PMC7891757  PMID: 33600487

Abstract

During a flight, pilots must rigorously monitor their flight instruments, since monitoring is one of the critical activities that contribute to updating their situation awareness. Monitoring is cognitively demanding but necessary for timely intervention in the event of a parameter deviation. Many studies have shown that a large proportion of commercial aviation accidents involved poor cockpit monitoring by the crew. Eye-tracking research has developed numerous metrics to examine visual strategies in fields such as art viewing, sports, chess, reading, aviation, and space. In this article, we propose to use both basic and advanced eye metrics to study visual information acquisition, gaze dispersion, and gaze patterning among novices and pilots. The experiment involved a group of sixteen certified professional pilots and a group of sixteen novices during a manual landing task scenario performed in a flight simulator. The two groups landed three times with different levels of difficulty (manipulated via a dual-task paradigm). Compared to novices, professional pilots had a higher perceptual efficiency (more numerous and shorter dwells), a better distribution of attention, an ambient mode of visual attention, and more complex and elaborate visual scanning patterns. We classified pilots’ profiles (novice vs. expert) by machine learning based on Cosine KNN (K-Nearest Neighbors) using transition matrices. Several eye metrics were also sensitive to the landing difficulty. Our results can benefit the aviation domain by helping to assess the monitoring performance of crews, improve initial and recurrent training, and ultimately reduce incidents and accidents due to human error.

Introduction

Monitoring activity in the cockpit

Throughout the flight, pilots must build and update their situation awareness (SA) to maintain flight safety margins [1]. The flight crew cannot update their SA without monitoring specific flight instruments (e.g., attitude indicator, speed, altimeter, engine parameters) and the external environment (in clear weather). The monitoring activity, particularly critical during dynamic flight phases such as take-off and landing, includes the observation and interpretation of flight path data, aircraft configuration status, automation modes, and on-board systems. It supposes a real-time comparison of instrument data or system modes against the values expected for the current flight phase. Rigorous cockpit monitoring allows timely corrective actions in case of a parameter deviation, ensuring an optimal level of safety [2]. This monitoring activity is structured in sequences of attentional shifts from one instrument to another.

Irregularities in these sequences can undermine the safety margins. In numerous cases of aircraft accidents, pilots’ visual scanning has been described as “inadequate”, “ineffective”, or “insufficient” [3]. Since the 1994 report by the National Transportation Safety Board, which found that inappropriate monitoring was involved in 84% of major accidents in the United States [4], numerous studies have investigated the visual behavior of pilots. However, a “practical guide for improving flight path monitoring” by the Flight Safety Foundation [5], which investigated 188 accidents with monitoring issues, underlines that many monitoring errors still occur, most of them during dynamic phases of flight (e.g., climb, descent, approach, and landing). In 2013, the Federal Aviation Administration required airlines to include an explicit training program to improve monitoring skills [6, 7]. Following the PARG study [7], the Bureau d’Enquêtes et d’Analyses (French Investigation Agency) encouraged the use of eye-tracking systems to finely analyze and improve crews’ visual scanning. Interestingly, an extensive survey conducted on 931 pilots during the PARG study [8] showed that most pilots need a better description of what a “standard” visual circuit in the cockpit is. Similarly, in another recent survey [9], 75% of pilots deemed it helpful to know the required visual patterns for the different flight phases to enhance their cockpit monitoring skills.

Visual scanning strategies as a marker of expertise

The relationship between visual scanning skills and performance has been highlighted in experiments where participants were trained to gaze at relevant areas. For instance, Shapiro et al. [10] demonstrated that video game players who were trained using efficient visual scanning examples showed better performance than those given random pattern training or no training at all. In an air traffic control study, Kang and Landry [11] enhanced novices’ performance in a conflict detection task by presenting experts’ visual scans overlaid on the radar screen during the task. The study also showed that this visual presentation outperformed the “instruction-only” condition. These studies support the relationship between visual patterns and task performance, and demonstrate the possibility of improving these patterns with adequate training. Task performance increases with experience and the associated expertise. Links between visual scanning strategies and expertise have been observed in fields such as radiology, driving, sport, military aviation, and chess (e.g., [12–14]). Gegenfurtner, Lehtinen, and Säljö [15] conducted a meta-analysis and highlighted that experts (compared to non-experts) generally demonstrate more fixations on task-relevant areas as well as shorter fixations. In their review of eye movements in medicine and chess, Reingold and Sheridan [16] labeled this greater perceptual effectiveness of experts “superior perceptual encoding of domain-related patterns”.

Several studies in the aeronautical domain showed that pilots’ visual scanning strategies (e.g., duration and frequency of fixations) evolve with the level of expertise [17–25]. According to Bellenkes et al. [26], the fixations of experts are shorter and fixations on instruments are more frequent. Similarly, Kasarskis, Stehwien, and Hickox [27] noticed that expert pilots (1500–2150 flight hours) perform more fixations and have shorter dwell times than novices (40–70 flight hours), and argued that experts have more structured visual patterns. Lorenz et al. [28] showed that experts (3000–10300 flight hours) spend more time looking outside the cockpit than novices (13–500 flight hours) during a taxiing task. Furthermore, a study involving fighter pilots flying high-speed, low-altitude flights [29] highlighted the importance of efficient visual scanning strategies: the pilots who achieved the best flight performance made shorter fixations on the head-down tactical display and alternated more frequently between the tactical display and the outside world. Similar results were found for experts (>1000 hours) and novices (200–400 hours) playing flight simulation games [30]. Because visual scanning appears to differentiate expert and novice pilots’ performance, it is interesting to examine which eye-tracking metrics are available in the literature [31] to compare visual scanning strategies using various approaches, such as the estimation of the distribution and patterning of the visual scan.

The objective of the present work is to provide a framework of eye movement data analysis techniques to study visual scanning strategies in novices and experts. These eye movement measures and algorithms are presented in light of the results of an experiment involving novice and expert pilots during a landing scenario performed in a flight simulator. We examined the impact of expertise and of the difficulty of the flight scenario on visual attention allocation. The participants performed the same landing scenario three times under varying difficulty conditions. Two difficulty conditions incorporated a supplementary visual monitoring task, with different time pressures, to make cockpit monitoring more complex by increasing visuomotor activity. We analyzed the effect of the participants’ profile (pilot vs. novice) as well as the effects of landing difficulty on numerous standard (number of dwells, average dwell times) and advanced eye movement metrics (Lempel-Ziv Complexity, Gaze Transition Entropy, attentional modes, N-gram methods) presented in the following section.

State-of-the-art visual scanning metrics

Classical eye movement measures, such as fixation duration, dwell time, or the number of fixations, provide relevant results when comparing novices vs. experts. However, statistical analyses of these metrics often involve time-averaging operations, thus neglecting the information regarding the sequence of instrument scanning. Consequently, a rich part of the data that reflects the dynamics of the deployment of attention processes is lost or not fully exploited. Numerous other metrics are available to explore and characterize visual scanning strategies in more depth. We use the broad term “visual scanning” to describe a sequence made up of at least one dwell on one area of interest (AOI), followed by a transition and a dwell on another AOI; “visual scanning pattern” is used when the visual scanning is made up of repeated sequences of a given “visual scanning”. One approach to examining visual scanning strategies is to analyze the transition matrix (e.g., [32–35]), a second is the characterization of the fluctuation between ambient and focal visual behavior [36], and another is to derive global pattern metrics such as entropy (e.g., [37, 38]; see [39] for a review). More generally, in this paper, we classified visual scanning metrics into three AOI-based approaches: one based on Markov chains (transition matrix), another based on attentional modes, and the last based on sequence analyses. Fig 1 presents a comparison of the visual scanning metrics described below (e.g., formula, definition, strengths, shortcomings, etc.).

Fig 1. Overview of the different visual scanning metrics classified by approaches.


Markov chains

Several metrics allow examining whether visual scanning is narrow or wide.

The transition matrix probabilities

The transition matrix contains information about how often a transition from one Area Of Interest (AOI) to another occurred, based on subsequent dwells of the visual scan. This method provides a data representation that can also lead to the development of stochastic and queuing models [40] of the pilot’s scanning in the cockpit. It can be extended to three dimensions by considering the locations of the previous two dwells, which Norris et al. [41] described as a second-order Markov chain. Jones et al. [42] showed that transition matrices are sensitive to flight maneuvers. Based on transition matrices, Hayashi [43] proposed in 2004 a Hidden Markov Model approach corresponding to different flight tasks. This work was later used to model the dwell patterns of space shuttle crews [44].
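As an illustration of this approach, a first-order transition matrix can be estimated from a chronological sequence of dwelled AOIs. The sketch below is ours, not the authors’ implementation; the function names and the 1-based AOI coding are illustrative assumptions:

```python
def transition_counts(dwell_sequence, n_aois):
    """Count transitions between successive dwells on AOIs.

    dwell_sequence: chronological list of visited AOI indices (1..n_aois),
    with consecutive repetitions already merged into single dwells.
    """
    counts = [[0] * n_aois for _ in range(n_aois)]
    for a, b in zip(dwell_sequence, dwell_sequence[1:]):
        counts[a - 1][b - 1] += 1
    return counts

def transition_probabilities(counts):
    """Row-normalize the counts into first-order Markov transition probabilities."""
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])
    return probs
```

Each row of the probability matrix then gives the conditional distribution of the next AOI given the current one, which is the representation used by Markov-chain and Hidden-Markov-Model analyses.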

Transition matrix density

Introduced by Goldberg and Kotval [31], the transition matrix density describes the dispersion of attention over time [45]. It provides a single quantitative value by dividing the number of active transition cells (i.e., those containing at least one transition) by the total number of cells. An unusually dense transition matrix (large index value), with most cells filled with at least one transition, can indicate a dispersed, lengthy, and wandering visual scan (this can reflect an extensive search on a display, for example) [46]. A sparse matrix can reflect a more efficient and directed search, for example when using computer software [40], or, in other contexts, can indicate a failure to properly monitor the environment, for example when a novice driver directs their gaze continuously to the road while ignoring/forgetting the rearview mirrors, or when a pilot excessively engages their visual attention on a single instrument (e.g., [47]).
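Following the Goldberg and Kotval definition above, the density reduces a matrix of transition counts to a single value between 0 and 1 (an illustrative Python sketch; the function name is ours):

```python
def transition_matrix_density(counts):
    """Number of active cells (>= 1 transition) divided by the total number of cells."""
    cells = [c for row in counts for c in row]
    return sum(1 for c in cells if c > 0) / len(cells)
```

For a 10 × 10 AOI matrix, a value near 1 suggests a dispersed, wandering scan, while a value near 0 suggests a narrow, directed one.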

Attentional modes

K coefficient

Another evaluation of the dispersion of attention is a parametric scale called the K coefficient, introduced by Krejtz et al. [48]. This metric was developed in studies of artwork (e.g., painting) and map viewing [49] to investigate the dynamics of the visual scan (focal vs. ambient) during such tasks. In a recent study, Lounis et al. [50] used this method with modified input data, using dwells and transitions instead of fixations and saccades. During various flight phases with automation in a full-flight simulator, they calculated for each pilot the mean difference between the standardized values (z-scores) of each dwell duration and of the following transition amplitude, where di is the duration of the i-th dwell and ai+1 the amplitude of the transition that occurs after the i-th dwell; μd and μa are the mean dwell durations and transition amplitudes, respectively, and ρd and ρa the corresponding standard deviations.

κi = (di − μd)/ρd − (ai+1 − μa)/ρa,  K = (1/n) ∑i=1..n κi (1)

Values of κi close to zero indicate relative similarity between dwell durations and transition amplitudes. Positive values of κi reflect relatively long dwells followed by short transition amplitudes, which indicate focal attention. Negative values of κi refer to situations where relatively short dwells are followed by relatively long transitions, suggesting ambient (diffuse) attention. According to Heitz and Engle [51], in the diffuse mode, visual attention is allocated to all regions of the visual field in roughly equal proportion; in the focused mode, attention is concentrated on a few areas of interest, specified by a central or peripheral cue. An extremely focused mode can be compared to the concept of attentional tunneling [47]. It is worth noting that the values of the K coefficient should be interpreted together with dwell duration results, because different groups can have different average dwell durations and transition amplitudes.
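The computation of Eq (1) can be sketched as follows (an illustrative Python version under our assumptions: sample standard deviations are used, and the last dwell, which has no following transition, is dropped from the pairing):

```python
import statistics

def k_coefficient(dwell_durations, transition_amplitudes):
    """Mean difference between the z-scored duration of dwell i and the
    z-scored amplitude of the transition that follows it (i = 1..n-1).
    Expects len(transition_amplitudes) == len(dwell_durations) - 1.
    """
    mu_d = statistics.mean(dwell_durations)
    sd_d = statistics.stdev(dwell_durations)
    mu_a = statistics.mean(transition_amplitudes)
    sd_a = statistics.stdev(transition_amplitudes)
    # zip truncates to the n-1 (dwell, following transition) pairs
    kappas = [(d - mu_d) / sd_d - (a - mu_a) / sd_a
              for d, a in zip(dwell_durations, transition_amplitudes)]
    return statistics.mean(kappas)
```

A negative return value would point to ambient (diffuse) attention, a positive one to focal attention, keeping in mind the interpretation caveats above.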

Sequence analyses

The sequence analyses approach allows measuring the extent to which the time sequence of eye movements is ordered or random during a flight.

Gaze Transition Entropy (GTE)

Defined by Shannon and Weaver [52], entropy is a measure of the lack of predictability in a sequence. This metric enables evaluating the structure of gaze behavior [53]. When applied to eye-tracking data, transition entropy describes the amount of information needed to describe the visual strategies, following the formula:

GTE = −∑i=1..n p(i) [∑j=1..n p(j|i) log2 p(j|i)],  i ≠ j (2)

where i represents the “from” AOI and j the “to” AOI. Higher transition entropy denotes more randomness and more frequent switching between AOIs [54]. Ephrath, Tole, Stephens, and Young [55] noticed an increase in entropy with increasing pilot mental workload (induced by adding a secondary task). Van de Merwe et al. [56] found that entropy increased as a result of a cockpit instrument failure, a condition that most likely produces increased mental workload. More recently, using GTE, Allsop and Gray [57] revealed that visual scanning became more random during an anxiety-inducing landing scenario. Diaz-Piedra et al. [58] observed a significant decrease in pilots’ gaze entropy when they faced a more complex scenario.
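Equation (2) can be sketched as follows (an illustrative Python version, ours; the stationary probability p(i) is estimated here from the relative frequency of the “from” AOI among all transitions):

```python
from collections import Counter
from math import log2

def gaze_transition_entropy(aoi_sequence):
    """Shannon entropy of first-order gaze transitions between distinct AOIs."""
    # count transitions between successive, distinct AOIs
    transitions = Counter((a, b) for a, b in zip(aoi_sequence, aoi_sequence[1:])
                          if a != b)
    total = sum(transitions.values())
    # p(i): relative frequency of each "from" AOI among all transitions
    from_counts = Counter()
    for (a, _), c in transitions.items():
        from_counts[a] += c
    gte = 0.0
    for (a, _), c in transitions.items():
        p_i = from_counts[a] / total   # stationary probability of AOI a
        p_ji = c / from_counts[a]      # conditional transition probability
        gte -= p_i * p_ji * log2(p_ji)
    return gte
```

A perfectly regular back-and-forth scan between two AOIs yields an entropy of 0 bits, whereas unpredictable switching among many AOIs drives the value up.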

Lempel-Ziv complexity

The complexity (i.e., the quantity and diversity) of visual scanning patterns can be assessed using the Lempel-Ziv Complexity (LZC). LZC was defined by Lempel and Ziv in 1976 (for a review, see [59]) as a data compression algorithm computing the minimum number of bits from which a particular message or file can effectively be reconstructed. This algorithm counts the number of different patterns in a sequence when scanned from left to right. For instance, the Lempel-Ziv complexity of s = 101001010010111 is 7, because when scanned from left to right, 7 different patterns are observed: 1∣0∣10∣01∣010∣0101∣11. Recently, LZC was applied to dwell transitions to evaluate the number of different visual scanning patterns [60].
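The left-to-right parsing described above can be sketched as follows (an illustrative Python version, ours): symbols are accumulated until the current word is one not yet in the pattern dictionary, which closes a pattern and starts a new word.

```python
def lempel_ziv_complexity(sequence):
    """Count new patterns when the sequence is scanned left to right:
    each pattern is the shortest word not yet seen in the dictionary."""
    patterns = set()
    word = ()
    count = 0
    for symbol in sequence:
        word += (symbol,)
        if word not in patterns:
            patterns.add(word)
            count += 1
            word = ()
    if word:  # a trailing, already-seen fragment counts as one more pattern
        count += 1
    return count
```

Symbols can be characters of a binary string, as in the worked example, or AOI indices from a dwell transition vector.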

N-gram sequences

N-grams are an essential component of many methods in bioinformatics, including genome and transcriptome assembly, metagenomic sequencing, and error correction of sequence reads [61]. Basically, an N-gram model predicts the occurrence of an AOI based on the occurrence of its N − 1 previous AOIs. It answers the question: how far back in the history of a sequence of AOIs should we go to predict the next AOI? For instance, a bigram model (N = 2) predicts the occurrence of an AOI given only its previous AOI (as N − 1 = 1 in this case). Similarly, a trigram model (N = 3) predicts the occurrence of an AOI based on its previous two AOIs. The common N-gram sequence analysis uses the n-gram frequency-based method [62] to identify the number of common 3-, 4-, 5-, and 6-gram sequences in each group. This method makes it possible to count the occurrences of AOI N-grams for each pilot, and thus to compare the intra-group pattern consistency for each N-gram.
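Counting AOI N-grams per participant and intersecting them across a group can be sketched as follows (illustrative Python, ours; the function names are assumptions):

```python
from collections import Counter

def ngram_counts(aoi_sequence, n):
    """Occurrences of each n-gram of successive AOIs in a dwell sequence."""
    return Counter(tuple(aoi_sequence[i:i + n])
                   for i in range(len(aoi_sequence) - n + 1))

def common_ngrams(group_sequences, n):
    """N-grams produced by every participant in a group."""
    sets = [set(ngram_counts(seq, n)) for seq in group_sequences]
    return set.intersection(*sets)
```

The size of the intersection for each n (3 to 6 here) is one way to quantify how consistent the scanning patterns are within a group.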

Current study

In the present study, we evaluated the efficiency of the previously described metrics on eye-tracking data from novice and expert pilots. Our main hypotheses were that expert pilots should exhibit different visual behaviors than novices, including more numerous dwells and shorter dwell times, following the idea that superior perceptual encoding comes with expertise. We also expected all advanced metrics to be sensitive to expertise, with more visual scanning complexity (as evaluated by the Lempel-Ziv complexity and the visual pattern lengths) and more regular visual scanning (as evaluated by the transition entropy) in experts. We also assumed that the participants’ profile could be classified (using machine learning) based on the way they switched from one instrument to another, using transition matrices. Finally, we hypothesized that the addition of a parallel monitoring task should also impact ocular behavior, notably by increasing complexity, reducing regularity, and generating an ambient mode of attention (i.e., more diffuse attention).

Materials and methods

For reproducibility purposes, the protocol is available on protocols.io; DOI: dx.doi.org/10.17504/protocols.io.zb5f2q6.

Participants

Thirty-two participants, all males, took part in this experiment. All had normal or corrected-to-normal vision. They were not informed about the exact purpose of the study. They were divided into two groups according to their flying experience. The first group, called “novices”, consisted of participants with no real flight experience (n = 16, mean age 25.7 ± 5.5 years). They were recruited from a French aerospace engineering school (ISAE-SUPAERO, Toulouse, France). All these novice participants had advanced theoretical knowledge about aeronautical engineering, were familiar with the various information given by the instruments in the cockpit (altimeter, attitude, etc.), and had basic notions of how to manually interact with the aircraft. Our experimental flight scenarios were relatively simple: the participant had to control the trajectory and the speed of the aircraft. The scenarios did not require complex navigation activities or interacting with automation. Thus, the scenarios were feasible for these novices after a relatively short training session. The second group, called “pilots”, consisted of active professional airline pilots (n = 16, mean age 34.39 ± 8.86 years) with a minimum of 1600 flight hours (mean = 4321.73 ± 2911.41 hours). They were recruited from various airline companies. They all had flown the A320 and were currently flying the A320 (68.75%) or the B737 (31.25%) at the time of the experiment.

Ethics statement

This research project was approved by the local institutional Research Ethics Committee of the University of Toulouse (Comité d’Ethique de la Recherche de l’Université de Toulouse, code N°2019-131) and was conducted in accordance with the Helsinki Declaration. Volunteers signed an informed consent prior to the experiment and were informed of their right to stop their participation at any time.

Materials

Flight simulator

We used an A320-like flight simulator (“PEGASE”) located at ISAE-SUPAERO (Toulouse, France), see Fig 2. Like in the A320 aircraft, flight instruments included a Primary Flight Display (PFD), a Navigation Display (ND), an Electronic Central Aircraft Monitoring display (ECAM), and an FCU (Flight Control Unit). The field of view covered by the simulator is about 180°. The participants controlled the aircraft with a side-stick, two thrust levers, and a rudder. We recorded flight data to calculate flight performance during the landing.

Fig 2. ISAE-SUPAERO flight simulator with its external screens.


Flight scenarios

Participants manually (i.e., without the autopilot) performed the same landing scenario three times, under three different conditions. The “control scenario” (CS) was a nominal landing without a supplementary task. The “easy dual task scenario” (EDTS) and the “difficult dual task scenario” (HDTS) were similar to the control scenario except that participants were asked to perform a supplementary monitoring task. The purpose of this supplementary task was to increase the level of visuo-attentional effort: participants had to regularly check the ND Zone on the ND screen to say the displayed value aloud at the right time. In the easy dual task scenario, participants were asked to say aloud the distance between the aircraft and the airfield threshold every 0.5 Nm (information provided by a radio beacon located near the airfield and displayed in the ND Zone, see Fig 3). In the difficult dual task scenario, they were asked to say this distance aloud every 0.2 Nm. The experimenter stayed in the cockpit during the entire experiment. Each of the three landing scenarios consisted of performing an approach/landing to Toulouse-Blagnac Airport, Runway LFBO 14R. The flight began at 1.2159° longitude and 43.7626° latitude. During each scenario, the participants had to comply with the same specific flight instructions: maintain a vertical speed between +500 ft/min and −800 ft/min, a speed of 130 knots, and a heading of 143° (corresponding to Runway 14R). We chose these values because they roughly correspond to a standard landing speed for a commercial aircraft. The negative vertical speed of −800 ft/min approximately corresponds to the vertical speed at 130 kt with an approach angle of three degrees. We defined a tolerance range in case the participant was not well stabilized on the approach slope and had to regain altitude (+500 ft/min maximum).
Each landing scenario started at an altitude of 2000 ft and lasted approximately four minutes. The order of the three scenarios was randomized across participants to avoid learning effects. The performance dependent variables were heading, vertical speed, and speed deviations. The number of omissions (i.e., when the participant omitted to call out the distance) during the supplementary task was also computed.

Fig 3.


Overview of the ten different AOIs: (1) Attitude indicator, (2) Speed tape, (3) Vertical speed tape, (4) Flight mode annunciator, (5) Heading tape, (6) Navigation display, (7) ND zone (displays the distance to recall during the two landing scenarios with the supplementary task), (8) Flight control unit, (9) Electronic centralized aircraft monitoring, (10) Out of the window.

Eye movements recordings

Eye movements were recorded at 60 Hz using a Smart Eye remote eye tracker (Smart Eye AB, Sweden). The system detects face/head movements, eye movements, and gaze direction. Gaze direction and eyelid positions are determined by combining image edge information with 3-D models of the eye and eyelids. As presented in Fig 3, the system uses five cameras integrated into the cockpit. A major advantage of using several cameras is that eye and head tracking can be maintained despite significant head motions (translations and rotations) or occlusion of one of the cameras by the participant (e.g., by a hand). The Smart Eye system allows designing a 3D environment and establishing calibration points (in the vicinity of the AOIs). Once the world model is designed, only an automatic calibration needs to be run for each participant.

World model and area of interest

The cockpit was split into 10 AOIs, corresponding to the different flight instruments and displays that pilots can examine during a flight (see Fig 3). We chose to restrict our analysis to the instruments that display information directly related to the flight parameters (altitude, speed, etc.) and to the external view (i.e., Out of the Window).

Procedure

First, participants filled out the consent form and provided demographic information such as their flight qualification (aircraft type) and flight experience (total flight hours). Participants were briefed on the study and instructed about the different flight scenarios. Then, they were invited to sit in the flight deck at the captain position (left seat). The eye-tracking system was calibrated using an 11-point calibration. Following the Smart Eye manual recommendations, the 11 points were located in the vicinity of the AOIs. Participants performed a training session consisting of flying a landing scenario twice. All participants (including novices) were able to control the aircraft correctly after these two landings. Then, the participants performed the same landing scenario as during training three times, with varying levels of difficulty.

Data processing

Flight simulator and eye-tracking data were analyzed using MATLAB R2019b with custom scripts. The data were recorded from the beginning of the landing scenario to touch-down. Because the landing duration depends on the pilot’s actions, landing durations could differ by a few seconds. As a consequence, the beginning of each scenario was trimmed to obtain the same duration for every participant, corresponding to 14,000 frames sampled at 60 Hz for the eye-tracking data and 233 frames at 1 Hz for the flight simulator data.

Eye tracking data

Fig 4 shows the entire eye-tracking analysis pipeline. Each AOI was coded using a number from 1 to 10 corresponding to the flight instruments (see Fig 3). Only AOI-based data were extracted in this experiment and concatenated to obtain two chronological vectors containing the indices of the visited AOIs (from 1 to 10) and the time spent on them. Dwells shorter than 200 ms [40] were discarded. Furthermore, consecutive fixations on the same area were merged (e.g., for 1, 1, 4, 4, 5, 5, 5, 6 we only consider 1, 4, 5, 6). The transition vector (the vector containing the transitions between AOI numbers) was used to compute the LZC and the GTE. Concerning the transition matrices, given their high dimensionality, it is difficult to use classical inferential statistics. Therefore, we applied machine learning algorithms to the concatenated transition matrices to compare the two groups of participants (novice vs. pilot). Various types of machine learning models were tested (SVM, LDA, K-Nearest Neighbors; for a review, see [63]), and the algorithm achieving the best accuracy (Cosine KNN) was selected. The transition probabilities from one AOI to another were taken as features, yielding a total of 100 features (i.e., 10 AOIs × 10 AOIs). A principal component analysis (PCA) was used to reduce the number of features, restricting the model to 35 features corresponding to the main transition probabilities of the matrices. Five-fold cross-validation was used, which is a good trade-off between bias and variance estimation [64]. According to Combrisson and Jerbi [65], the theoretical chance level for classification at p < 0.05 with two classes is around 58%. The K coefficient, the transition entropy, and the Lempel-Ziv complexity were computed following the methods of [48], [54], and [60], respectively.
Finally, based on the transition vector, the n-gram frequency-based method [62] was used to identify the number of common 3-, 4-, 5-, and 6-gram sequences in each group. After counting the occurrences of given n-grams for each participant, the number of common sequences of each n-gram was calculated for each group (novices/pilots).
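The dwell preprocessing described above (merging consecutive samples on the same AOI, then discarding dwells shorter than 200 ms) can be sketched as follows. The original analyses used MATLAB; this Python version is an illustrative sketch of ours, with durations assumed to be in seconds:

```python
def preprocess_dwells(aoi_samples, durations, min_dwell=0.2):
    """Merge consecutive samples on the same AOI into single dwells,
    then discard dwells shorter than min_dwell seconds."""
    merged = []
    for aoi, dur in zip(aoi_samples, durations):
        if merged and merged[-1][0] == aoi:
            merged[-1][1] += dur  # extend the current dwell on the same AOI
        else:
            merged.append([aoi, dur])  # start a new dwell
    return [(aoi, dur) for aoi, dur in merged if dur >= min_dwell]
```

The resulting AOI sequence is the input of the transition-matrix, entropy, LZC, and n-gram computations.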

Fig 4. Analysis pipeline for the eye tracking data.


Flight simulator data

The flying performances were examined to quantify the ability of the participants to comply with the specific flight instructions given by the experimenter. As presented in Fig 5, Root Mean Square Errors (RMSEs) were calculated for three flight parameters: speed, vertical speed, and heading. In this experiment, the predicted values corresponded to the specific thresholds given by the experimenter (i.e., speed of 130 kt; vertical speed above +500 ft/min or below −800 ft/min; heading deviating from 143°) and the observed values corresponded to the actual performances. The deviations were calculated following the formula:

RMSE(k, k+1) = √[(1/n) ∑i=1..n (Oi − Pi)²] (3)

where, for the n data points between points k and k + 1, Pi is the predicted value and Oi the observed value.
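Equation (3) can be computed directly (illustrative Python, ours):

```python
from math import sqrt

def rmse(observed, predicted):
    """Root mean square error between observed flight parameters and targets."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
```

Applied, for instance, to the recorded speed trace against the constant 130 kt target, it yields the speed deviation score used in the analyses below.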

Fig 5. Analysis pipeline for the flight parameters data.


Statistical analysis

We performed a 2 × 3 repeated measures analysis of variance (ANOVA) for each dependent variable (i.e., dual task omissions, average dwell time, total number of dwells, LZC, transition entropy, K coefficient, RMSE heading, RMSE vertical speed, RMSE speed), with group (novices, pilots) as the between-subjects factor and scenario difficulty as the within-subjects factor (three levels: control scenario, easy dual task scenario, difficult dual task scenario). The normal distribution of each dependent variable was checked. We used the Greenhouse-Geisser and Huynh-Feldt adjustments to correct for violations of the sphericity assumption when needed. Bonferroni post-hoc tests were performed for multiple comparisons, and only significant Bonferroni post-hoc comparisons are reported. The level of significance was set to α = 0.05, and partial η2 was used to estimate effect sizes.

Results

Flight performances

The flight performances are shown in Fig 6.

Fig 6. Flight performances for heading, vertical speed, and speed deviations among novices and pilots groups.


Error bars represent SD and * indicates main effects p < 0.05. (CS = control scenario; EDTS = Easy dual task scenario; HDTS = Hard dual task scenario).

Heading

There was no significant main effect of the group, F(1, 30) = 0.03, p = 0.874, nor main effect of the scenario, F(2, 60) = 0.9, p = 0.39, on heading deviations. The scenario × group interaction was not significant, F(2, 60) = 0.4, p = 0.67.

Speed

A significant main effect of the group on speed deviation was found, F(1, 30) = 4.3, p < 0.05, η2 = 0.13, with the novice’s group (M = 5.46; SD = 1.94) showing higher speed deviation than pilot’s group (M = 2.66; SD = 1.97). Analyses also revealed a significant main effect of the scenario, F(2, 60) = 3.6, p < 0.05, η2 = 0.11. Bonferroni post-hoc test showed that speed deviation was lower during the control scenario (M = 2.95; SD = 0.93) compared to the easy dual task scenario (M = 3.70; SD = 1.02) and the difficult dual task scenario (M = 5.53; SD = 2.79). There was a significant effect of scenario × group interaction, F(2, 60) = 3.3, p < 0.05, η2 = 0.09. Bonferroni post-hoc test showed that the speed deviation was lower for the pilot’s group in the difficult dual task scenario (M = 2.93; SD = 3.97) compared to the novice’s group in the difficult dual task scenario (M = 8.13; SD = 4.02).

Vertical speed

Analyses revealed a significant main effect of the group, F(1, 30) = 11.4, p < 0.05, η2 = 0.28, on vertical speed deviation, with the novice’s group (M = 565; SD = 130) showing higher vertical speed deviation than pilot’s group (M = 258; SD = 134). Analyses also revealed a significant main effect of the scenario, F(2, 60) = 5.1, p < 0.01, η2 = 0.15. Bonferroni post-hoc test showed that the vertical speed deviation was lower during the control scenario (M = 265; SD = 103) compared to the easy dual task scenario (M = 403; SD = 141) and the difficult dual task scenario (M = 566; SD = 184). The scenario × group interaction was not significant, F(2, 60) = 0.7, p = 0.52, η2 = 0.02.

Dual task omissions

Analyses showed (Fig 7) a significant main effect of the group on omissions, F(1, 30) = 35.3, p < 0.05, η2 = 0.54. The novice’s group had a higher number of omissions (M = 2.75; SD = 1) than the pilot’s group (M = 0.68; SD = 0.5). Analyses also revealed a significant main effect of the scenario, F(1, 30) = 24.8, p < 0.05, η2 = 0.45. Bonferroni post-hoc test showed that the difficult dual task scenario (M = 2.37; SD = 0.52) yielded more omissions than the easy dual task scenario (M = 1.06; SD = 0.3). The scenario × group interaction was significant, F(1, 30) = 16.2, p < 0.05, η2 = 0.35. Bonferroni post-hoc test showed that novices made more omissions during the difficult dual task scenario (M = 3.95; SD = 2) than during the easy dual task scenario (M = 1.5; SD = 1), whereas the number of omissions did not differ between the two scenarios for pilots.

Fig 7. Omission number for the easy dual task scenario and hard dual task scenario among novices and pilots groups.


Error bars represent SD and * indicates main effects p < 0.05.

Basic eye metrics

Average dwell times

Analyses showed (Fig 8) a significant main effect of group, F(1, 30) = 8.1, p < 0.05, η2 = 0.22, with shorter average dwell times for the pilot’s group (M = 1.1; SD = 0.2) compared to the novice’s group (M = 1.51; SD = 0.21). We also found a significant main effect of the scenario, F(2, 60) = 19.0, p < 0.05, η2 = 0.39. Bonferroni post-hoc tests showed that the average dwell time was shorter during the easy dual task scenario (M = 1.16; SD = 0.12) and the difficult dual task scenario (M = 1.16; SD = 0.17) than during the control scenario (M = 1.58; SD = 0.22). There was no significant scenario × group interaction, F(2, 60) = 2.3, p = 0.11, η2 = 0.07. The time spent gazing outside the defined AOIs was relatively low (M = 4.21% for experts; M = 4.62% for novices); see supplementary material for details (S1 Fig).

Fig 8. From left to right: the average dwell time and the number of dwells, averaged over all scenarios, among novice and pilot groups.


Error bars represent SD and * indicates main effects p < 0.05.

Number of dwells

Analyses showed (Fig 8) a significant main effect of group, F(1, 30) = 13.3, p < 0.05, η2 = 0.31, with a higher number of dwells for the pilot’s group (M = 188; SD = 21) compared to the novice’s group (M = 137.5; SD = 19.9). Analyses also revealed a significant main effect of the scenario, F(2, 60) = 13.2, p < 0.05, η2 = 0.31. Bonferroni post-hoc showed that the number of dwells was higher during easy dual task scenario (M = 172; SD = 16) and during the difficult dual task scenario (M = 177; SD = 18) compared to the control scenario (M = 137; SD = 17). There was no significant scenario × group interaction, F(2, 60) = 0.7, p = 0.50, η2 = 0.02.

Markov chain and machine learning

The confusion matrix presented in Fig 9 shows that the approach based on Cosine KNN reached a classification accuracy of up to 91.7% when classifying expertise from transition matrices during the baseline scenario. As shown in Fig 10, the differences between novices and pilots mainly appear as a sparser distribution of transition probabilities from one instrument to another in the pilot’s group. Most of the AOIs explored by the novice group were concentrated in the PFD (AOIs 1 to 5, see Fig 4), while the pilot group explored other combinations of AOIs.

Fig 9. Confusion matrix of fivefold cross-validation using the Cosine K-Nearest neighbors among novices and Pilots groups during the baseline scenario.


Fig 10. AOI-based representations of Markov chains (left) and transition matrices (right) among novice (top) and pilot (bottom) groups during the baseline scenario.

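To make the transition-matrix features and the cosine distance underlying the KNN classifier concrete, here is a minimal, self-contained Python sketch. The dwell sequences below are invented toy data; the actual study used fivefold cross-validation over real scanpaths and a full KNN classifier rather than a single pairwise distance:

```python
def transition_matrix(aoi_sequence, n_aois):
    """Row-normalised first-order transition matrix from a sequence of AOI indices."""
    counts = [[0.0] * n_aois for _ in range(n_aois)]
    for src, dst in zip(aoi_sequence, aoi_sequence[1:]):
        counts[src][dst] += 1.0
    for row in counts:
        total = sum(row)
        if total:
            for j in range(n_aois):
                row[j] /= total
    return counts

def cosine_distance(u, v):
    """Distance used by a cosine KNN classifier on flattened transition matrices."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return 1.0 - dot / (norm_u * norm_v)

# Invented toy dwell sequences over 3 AOIs, for illustration only
novice_scan = [0, 1, 0, 1, 0, 1, 0]   # strict back-and-forth between two AOIs
pilot_scan = [0, 1, 2, 0, 2, 1, 0]    # transitions spread over more AOI pairs
novice_vec = sum(transition_matrix(novice_scan, 3), [])  # flatten to feature vector
pilot_vec = sum(transition_matrix(pilot_scan, 3), [])
print(round(cosine_distance(novice_vec, pilot_vec), 2))  # 0.42
```

The repetitive toy "novice" scan concentrates all probability mass on two transitions, while the varied "pilot" scan spreads it over six, which is the kind of difference the classifier exploits.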

Attentional modes and K coefficient

Analyses showed (Fig 11) no significant effect of the group, F(1, 30) = 3.3, p = 0.07, η2 = 0.10, on the K coefficient. However, the main effect of scenario was significant, F(2, 60) = 38.1, p < 0.01, η2 = 0.56. Bonferroni post-hoc test showed that K coefficient was lower during the easy dual task scenario (M = -0.12; SD = 0.06) and during the difficult dual task scenario (M = -0.01; SD = 0.12) compared to the control scenario (M = 0.28; SD = 0.10). There was also a significant difference between the easy dual task scenario (M = -0.12; SD = 0.06) and the difficult dual task scenario (M = 0; SD = 0.12). The scenario × group interaction was significant, F(2, 60) = 4.8, p = 0.01, η2 = 0.15. Bonferroni post-hoc test showed that K coefficient was lower for the pilot’s group in the control scenario (M = 0.14; SD = 0.16) compared to the novice’s group in the control scenario (M = 0.41; SD = 0.16). Bonferroni post-hoc test also showed that K coefficient was lower for the pilot’s group in the difficult dual task scenario (M = -0.10; SD = 0.17) compared to the novice’s group in the difficult dual task scenario (M = 0.09; SD = 0.16).

Fig 11. Ambient focal K coefficient during the control scenario, the easy dual task scenario, and hard dual task scenario among novices and pilots groups.


K > 0 indicates focal visual attention, whereas K < 0 indicates ambient visual attention. Error bars represent SD and * indicates main effects p < 0.05.
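For reference, the coefficient K of Krejtz et al. [48] averages, over the scanpath, the standardized fixation duration minus the standardized amplitude of the saccade that follows it. A simplified sketch, assuming n fixation durations and n-1 saccade amplitudes (the data below are hypothetical):

```python
from statistics import mean, stdev

def coefficient_k(fix_durations, sacc_amplitudes):
    """Ambient/focal coefficient K (after Krejtz et al., 2016): for each fixation,
    standardised duration minus the standardised amplitude of the following saccade,
    averaged over the scanpath. sacc_amplitudes[i] follows fixation i, so the last
    fixation has no paired saccade and is dropped by zip(). K > 0 -> focal."""
    mu_d, sd_d = mean(fix_durations), stdev(fix_durations)
    mu_a, sd_a = mean(sacc_amplitudes), stdev(sacc_amplitudes)
    return mean((d - mu_d) / sd_d - (a - mu_a) / sd_a
                for d, a in zip(fix_durations, sacc_amplitudes))

# Long fixations followed by short saccades -> focal attention (K > 0)
print(coefficient_k([600, 550, 580, 120], [0.5, 0.7, 0.6]) > 0)   # True
# Short fixations followed by large saccades -> ambient attention (K < 0)
print(coefficient_k([120, 130, 125, 600], [8.0, 9.0, 10.0]) < 0)  # True
```

This sketch uses the sample standard deviation; implementation details (e.g., normalization choices) may differ from the study's pipeline.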

Sequence analyses

Transition entropy

Analyses showed (Fig 12) a significant main effect of group, F(1, 30) = 6.0, p < 0.05, η2 = 0.17, with the novice’s group (M = 1.22; SD = 0.2) showing lower transition entropy than the pilot’s group (M = 1.56; SD = 0.2). Analyses also revealed a significant main effect of the scenario, F(2, 60) = 8.4, p < 0.05, η2 = 0.22. Bonferroni post-hoc test showed that the transition entropy was higher during the easy dual task scenario (M = 1.50; SD = 0.16) and the difficult dual task scenario (M = 1.44; SD = 0.17) than during the control scenario (M = 1.23; SD = 0.15). The scenario × group interaction was not significant, F(2, 60) = 0.2, p = 0.82, η2 = 0.01.

Fig 12. Transition entropy during the control scenario, the easy dual task scenario, and the hard dual task scenario among novices and pilots groups.


Error bars represent SD and * indicates main effects p < 0.05.
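Transition entropy can be computed as the conditional Shannon entropy of the first-order transition probabilities, weighted by the observed probability of leaving each source AOI. A minimal sketch with toy sequences (the exact estimator used in the study may differ):

```python
from collections import Counter
from math import log2

def transition_entropy(aoi_sequence):
    """Conditional Shannon entropy (bits) of first-order gaze transitions:
    H = -sum_i p(i) * sum_j p(j|i) * log2 p(j|i), where p(i) is the observed
    probability of a transition leaving AOI i. Higher = less predictable scanning."""
    pairs = list(zip(aoi_sequence, aoi_sequence[1:]))
    src_counts = Counter(src for src, _ in pairs)
    pair_counts = Counter(pairs)
    entropy = 0.0
    for (src, _), count in pair_counts.items():
        p_src = src_counts[src] / len(pairs)
        p_cond = count / src_counts[src]
        entropy -= p_src * p_cond * log2(p_cond)
    return entropy

# A strictly repetitive scan is fully predictable -> 0 bits
print(transition_entropy([0, 1, 0, 1, 0, 1]))                # 0.0
# A more varied scan requires more bits to describe
print(round(transition_entropy([0, 1, 2, 0, 2, 1, 0]), 2))   # 1.0
```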

Common N-grams sequence

As presented in Fig 13, the count of common n-gram sequences revealed that pilots shared more common sequences than novices in all scenarios (control, easy dual task, and hard dual task). The easy and hard dual task scenarios yielded more common sequences in both groups compared to the control scenario. Regardless of the n-gram length (3, 4, 5, or 6), pilots had more common sequences than novices during the control scenario. For example, the most frequent trigram for the novices was OTW⇒VS⇒OTW—a transition from out-of-the-window to vertical speed and back—repeated 6.4 times on average. For the pilots, the most frequent trigram, OTW⇒ECAM⇒OTW, occurred 17.4 times on average. We also note that the ten most frequent n-grams included the same AOI at least twice (for instance, repeated transitions between the same instruments). For novices, the trigrams involving three unique AOIs were OTW⇒SPD⇒ATT, repeated 2.6 times on average, OTW⇒VS⇒ATT (2.1 times), and OTW⇒ATT⇒SPD (2 times). For pilots, they were OTW⇒ECAM⇒ATT, repeated 9.4 times on average, OTW⇒VS⇒ATT (8.6 times), OTW⇒ATT⇒VS (4.6 times), and OTW⇒HDG⇒ATT (3.8 times). In both the easy and hard dual task scenarios, the most frequent trigram for both groups involved the ND zone display, OTW⇒NDz⇒OTW. It occurred 17.6 times on average for novices and 19.1 times for pilots during the easy dual task scenario, and 21.1 times for novices and 22.2 times for pilots during the hard dual task scenario. For novices, only one frequent trigram with unique AOIs found in the control scenario was also found during the easy dual task scenario (OTW⇒VS⇒ATT), and it was not found during the hard dual task scenario. As for the pilots, of the four trigrams with unique AOIs found in the control scenario, only two were found in the easy dual task scenario, and only one in the hard dual task scenario (see Table 1).
Interestingly, the most frequent 5-gram among novices was OTW⇒SPD⇒OTW⇒SPD⇒OTW, repeated 1.5 times on average, whereas the most frequent 5-gram among pilots was OTW⇒VS⇒ATT⇒OTW⇒ATT, repeated 3 times on average.

Fig 13. Number of common patterns sequence by N-grams length during the control scenario, the easy dual task scenario, and the hard dual task scenario among novices and pilots groups.


Table 1. The most frequent trigrams involving unique AOIs in the pilot group during the Control Scenario (CS), the Easy Dual-Task Scenario (EDTS), and the Hard Dual-Task Scenario (HDTS).
Frequent trigram with unique AOIs Av. occur. in the CS Av. occur. in the EDTS Av. occur. in the HDTS
OTW⇒ECAM⇒ATT 9.4 0 0
OTW⇒VS⇒ATT 8.6 7.7 0
OTW⇒ATT⇒VS 4.6 5.5 0
OTW⇒HDG⇒ATT 3.8 0 0

Av. occur. = average occurrences.
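The n-gram counting itself is straightforward: slide a window of length n over the dwell sequence and count each pattern. A small sketch using the paper's AOI labels (the dwell sequence is invented for illustration):

```python
from collections import Counter

def ngram_counts(aoi_sequence, n):
    """Count every contiguous n-gram (scanning pattern of length n) in a dwell sequence."""
    return Counter(tuple(aoi_sequence[i:i + n])
                   for i in range(len(aoi_sequence) - n + 1))

# Invented dwell sequence using the paper's AOI labels, for illustration only
scan = ["OTW", "VS", "OTW", "VS", "ATT", "OTW", "VS", "OTW"]
trigrams = ngram_counts(scan, 3)
print(trigrams.most_common(1))  # [(('OTW', 'VS', 'OTW'), 2)]

# Trigrams involving only unique AOIs, as distinguished in Table 1
unique_trigrams = {g: c for g, c in trigrams.items() if len(set(g)) == 3}
print(len(unique_trigrams))  # 3
```

Common sequences between participants can then be found by intersecting the keys of their respective counters.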

Lempel-Ziv Complexity (LZC)

Analyses showed (Fig 14) a significant main effect of group, F(1, 30) = 10.0, p < 0.05, η2 = 0.25, with a higher LZC for the pilot’s group (M = 40.3; SD = 5.6) compared to the novice’s group (M = 33; SD = 5.2). There was also a significant main effect of the scenario, F(2, 60) = 13.2, p < 0.05, η2 = 0.30. Bonferroni post-hoc test showed that LZC was higher during easy dual task (M = 40.46; SD = 4.4) and difficult dual task scenario (M = 37.9; SD = 4.97) than during the control scenario (M = 31.7; SD = 3.76). The scenario × group interaction was not significant, F(2, 60) = 0.5, p = 0.62, η2 = 0.02.

Fig 14. Lempel-Ziv complexity during the control scenario, the easy dual task scenario, and hard dual task scenario among novices and pilots groups.

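Lempel-Ziv complexity counts the number of distinct phrases in a dictionary parsing of the symbol (AOI) sequence. The sketch below uses a simple LZ78-style parsing (the paper's exact variant may differ), showing that a repetitive scanpath yields fewer phrases than a varied one:

```python
def lempel_ziv_complexity(sequence):
    """Number of distinct phrases in an LZ78-style dictionary parsing of a symbol
    sequence: each phrase is grown symbol by symbol until it has never been seen
    before, then added to the dictionary. Repetitive scanpaths parse into few
    phrases; varied ones into many."""
    phrases = set()
    count = 0
    phrase = ()
    for symbol in sequence:
        phrase = phrase + (symbol,)
        if phrase not in phrases:
            phrases.add(phrase)
            count += 1
            phrase = ()
    return count

# A repetitive back-and-forth scan compresses well...
print(lempel_ziv_complexity([0, 1, 0, 1, 0, 1, 0, 1]))  # 4
# ...while a varied scan over more AOI combinations does not
print(lempel_ziv_complexity([0, 1, 2, 3, 0, 2, 1, 3]))  # 6
```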

Discussion

Several previous studies have reported differences between pilots and novices in how they scan cockpit instruments, using standard metrics such as fixation duration, dwell times, or number of saccades. In this work, standard and advanced eye metrics were analyzed in sixteen novices and sixteen professional pilots during landing scenarios involving different levels of visuo-attentional effort. All the metrics used in this study allowed characterizing visual scanning, and we examined the impact of expertise and flying difficulty on visual scanning strategies. As our results showed, a large number of standard and advanced metrics were sensitive to these two factors. Each metric has its own strengths and weaknesses for understanding visual strategies. For instance, while a transition matrix and an entropy value are closely related, the information they convey differs. A transition matrix measures the preferred paths when consulting AOIs: it highlights the strength of the links between AOIs, whereas the entropy reflects the disorder of these transition sequences. Their applications can also differ. For example, if the aim is to redesign a cockpit panel, transition matrices can be very useful because they quantify the strength of the relationship between AOIs; instruments that are often gazed at consecutively could be placed closer together, sparing the pilot’s visual attention effort. Concerning the LZC and n-gram methods, the n-gram analysis compares the patterns used within a group, while the LZC assesses the compressibility of the patterns (how varied they are).

Flight performances

Our results showed that expert pilots had better flying performance than novices. In particular, they had lower speed and vertical speed deviations. The heading variable was most likely not sensitive because the aircraft was nearly aligned with the airfield at the beginning of the scenarios. We assume that these superior flying performances are at least partially due to better visual scanning strategies gained with expertise.

Basic eye metrics

Expert pilots had shorter average dwell times and a higher number of dwells compared to novices. This result has been interpreted in the literature [15, 66, 67] as an important sign of expertise, built on an optimization of visual information processing that allows faster extraction of information when consulting a flight instrument. This strategy allows pilots to consult the various instruments more often, resulting in a better updating of situation awareness [68]. This result also supports the existence of superior perceptual encoding of domain-related patterns [40] in expert pilots.

Markov chains and attentional mode

Based on transition matrices, a machine learning approach using the Cosine KNN algorithm reached an accuracy of up to 91.7% in classifying expertise. Expert pilots had more heterogeneous transition probabilities when switching from one instrument to another. This suggests that experts include more flight instruments in their visual scans and succeed in balancing their time between them. The focal-ambient K coefficient showed that attention was dominantly focal (positive values) in both groups during the control scenario. However, attention was more focal in the novice’s group than in the pilot’s group. It can be assumed that expert pilots distribute their visual attention more widely in space than novices. The K coefficient was also sensitive to task difficulty. Adding a monitoring task (easy dual task scenario), which introduced a supplementary display to monitor, switched visual attention from focal to ambient in both groups. Interestingly, when the time pressure of the monitoring task was further increased (hard dual task scenario), the induced dual task changed the ambient-focal strategy of the novices, whereas the pilot group kept their strategy consistent across experimental scenarios.

Sequence analyses

As shown by the transition entropy analysis, more information (bits) was required to describe expert pilots’ visual strategies than those of the novice group; the pilot group thus exhibited more complex visual scanning patterns. The n-gram analysis of common sequences highlighted more similar visual strategies within the professional pilot group, built with expertise, as well as more elaborate strategies when considering common scanning patterns of size 6 (6-grams). Furthermore, this analysis revealed that some complex patterns (those including only distinct flight instruments) found in the control scenario were still present in both the easy and hard dual task scenarios. We expected that adding a double task would impact the visual scanning. Our results revealed that pilots kept the visual scanning strategies related to the manual landing task by maintaining the visual patterns found in the control scenario during both dual task scenarios. This is consistent with the dual task and flight performances: maintaining the patterns related to the landing task during the dual task scenarios preserved the visual activity needed to maintain flight performance and perform callbacks. Finally, AOI redundancies were found in both groups, i.e., n-grams containing the same AOI more than once. The Lempel-Ziv complexity showed that these redundancies were lower in the pilot group: pilots displayed a higher complexity and richness of visual patterns, containing a larger variety of possible combinations.

Expertise theories

Three theories can explain expert superiority in visual domains. First, the theory of long-term working memory [69] assumes that expertise extends the capacities for information processing and that the limited-capacity assumption should be reconsidered within an expert’s specific domain. Under this hypothesis, experts encode and retrieve information more rapidly than novices, and this rapid information processing is reflected in shorter dwell durations. The second theory is the information-reduction hypothesis [70], which assumes that expertise optimizes the amount of processed information by neglecting task-irrelevant information. Our results showed that the expert group maintained the visual scanning strategies related to the piloting activity during the hard dual task scenario, while novices under-performed during this scenario. This highlights the experts’ ability to focus on the information relevant to the task while neglecting redundant information. Finally, the third theory is the holistic model of image perception [71], which focuses on the extension of the visual span. Charness et al. [72] showed that experts extract information from widely distanced and parafoveal regions, producing patterns of saccadic selectivity driven by piece saliency [73]. Our results showed that experts outperformed the novice group in maintaining their speed, yet the n-gram analysis revealed that scanning patterns involving the speed AOI were frequent among novices but not among pilots. These results suggest that experts can process such information through parafoveal vision.

Limitations

There are some limitations to this study. We compared professional pilots with non-pilots only; comparing two very different profiles can artificially increase the observed differences in ocular behavior. Further research should consider participants with different levels of expertise, from novice to expert (e.g., every 1000 flight hours), to examine more finely how visual strategies develop with expertise. Another limitation concerns the flight simulator used in the study. While it is reasonably representative and simulates real flight with all primary displays, a full flight simulator would better fit the operational context. This experiment could also be replicated under different meteorological conditions. In addition, although eye-tracking devices are increasingly mature and accurate (about 1° at a distance of one meter), special care should be taken when analyzing contiguous AOIs, since the accuracy limitations of eye-tracking systems could lead to errors in this situation. Most eye-tracking studies also rely on the eye-mind hypothesis, which states that users fixate on an area related to the currently processed information. However, pilots can perceive some information in peripheral vision, for example, speed changes via the movement of the speed tape [74]. Experts may succeed in maintaining a constant speed by looking only at the attitude zone, which would explain why the SPD AOI, corresponding to the speed tape, is not often found in the most frequent patterns (n-grams). Finally, we should also specify that eye tracking captures only overt attention, for example when a person moves the eyes toward an object, and not covert attention, when an individual focuses attention on an object without moving the eyes toward it.

Conclusion

This work highlighted the differences between novices and expert pilots in visual scanning strategies and flight performances. Our results confirmed that expertise exerts a top-down modulation on gaze behaviour [10]. We used a wide variety of standard and advanced metrics to uncover the modifications of gaze behavior brought by expertise. Expert pilots have a more efficient perception of information, a better dispersion of their attention, and more elaborate visual patterns. Expertise also makes it possible, despite a dual task costly in visuo-attentional resources, to maintain the visual patterns linked with the flying task (i.e., the irrelevant dual task did not alter the nominal visual behavior). Overall, the eye metrics used in this research are relevant for finely assessing pilots’ gaze behavior in the cockpit and can contribute to better characterizing visual scanning, an important topic for safety [75]. These eye metrics could be used to evaluate pilots during their training program; for example, it might be possible to follow the evolution of their scanning strategies and determine whether they tend to resemble those of expert pilots. In the future, it might be possible to assess cockpit monitoring during real flight [76, 77]. In this sense, recent studies investigated the possibility of using an eye-tracking assistant to warn pilots, based on a database of the visual behaviour of expert pilots [78–80]. Our results suggest that such on-board eye tracking could be customized based on pilot experience. Finally, we believe that the eye metrics employed in this study can also be useful for practitioners and researchers in other fields such as air traffic control and automotive driving.

Supporting information

S1 Fig. Time spent gazing outside the defined AOIs for each participant during all scenarios.

(TIF)

S1 Data

(XLSX)

S2 Data

(XLSX)

S3 Data

(XLSX)

S4 Data

(XLSX)

S5 Data

(XLSX)

S6 Data

(XLSX)

Acknowledgments

We are especially grateful to all the pilots who participated in the experiment, and to Kevin J. Verdière, Emilie S. Jahanpour, Quentin Chenot, Evelyne Lepron, and Lili for their valuable assistance. The authors thank the PEGASE simulator technical team. The manuscript was improved thanks to three anonymous reviewers.

Data Availability

All relevant data are within the manuscript and its Supporting information files.

Funding Statement

This research was supported by a chair grant from Dassault Aviation (“CASAC”, holder: MC). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Endsley, M. R. (1997, October). Supporting situation awareness in aviation systems. In 1997 IEEE International Conference on Systems, Man, and Cybernetics. Computational Cybernetics and Simulation (Vol. 5, pp. 4177-4181). IEEE.
  • 2. Wei H., Zhuang D., Wanyan X., & Wang Q. (2013). An experimental analysis of situation awareness for cockpit display interface evaluation based on flight simulation. Chinese Journal of Aeronautics, 26(4), 884–889. 10.1016/j.cja.2013.04.053
  • 3. Jarvis S. R. (2017). Concurrent pilot instrument monitoring in the automated multi-crew airline cockpit. Aerospace Medicine and Human Performance, 88(12), 1100–1106. 10.3357/AMHP.4882.2017
  • 4. National Transportation Safety Board (NTSB). A Review of Flightcrew-Involved, Major Accidents of U.S. Carriers, 1978 through 1990. Safety Study NTSB/SS-94/01; NTSB: Washington, DC, USA, 1994.
  • 5. Flight Safety Foundation, The Active Pilot Monitoring Working Group. A Practical Guide for Improving Flight Path Monitoring (November 2014).
  • 6. Federal Aviation Administration (FAA). Safety Alert for Operators No. 15011, Roles and Responsibilities for Pilot Flying (PF) and Pilot Monitoring (PM); Flight Standards Service: Washington, DC, USA, 2015. Available online:
  • 7. Federal Aviation Administration (FAA). 14 CFR Part 121 Qualification, Service, and Use of Crewmembers and Aircraft Dispatchers; Final Rule; Federal Register: Washington, DC, USA, 2013; Volume 78, Issue 218. Available online: http://www.gpo.gov/fdsys/granule/FR-2013-11-12/2013-26845 (accessed on 2 November 2017).
  • 8. Bureau d’Enquêtes et d’Analyses (BEA). Study on Aeroplane State Awareness during Go-Around. 2013. Available online: https://www.bea.aero/etudes/asaga/asaga.study.pdf (accessed on 2 November 2017).
  • 9. Lefrancois, O., Matton, N., Gourinat, Y., Peysakhovich, V., & Causse, M. (2016). The role of pilots’ monitoring strategies in flight performance.
  • 10. Shapiro K. L., & Raymond J. E. (1989). Training of efficient oculomotor strategies enhances skill acquisition. Acta Psychologica, 71(1-3), 217–242. 10.1016/0001-6918(89)90010-3
  • 11. Kang Z., & Landry S. J. (2014). Using scanpaths as a learning method for a conflict detection task of multiple target tracking. Human Factors, 56(6), 1150–1162. 10.1177/0018720814523066
  • 12. Nibbeling N., Oudejans R. R., & Daanen H. A. (2012). Effects of anxiety, a cognitive secondary task, and expertise on gaze behavior and performance in a far aiming task. Psychology of Sport and Exercise, 13(4), 427–435. 10.1016/j.psychsport.2012.02.002
  • 13. Williams A. M., Singer R. N., & Frehlich S. G. (2002). Quiet eye duration, expertise, and task complexity in near and far aiming tasks. Journal of Motor Behavior, 34(2), 197–207. 10.1080/00222890209601941
  • 14. Vickers J. N., & Lewinski W. (2012). Performing under pressure: Gaze control, decision making and shooting performance of elite and rookie police officers. Human Movement Science, 31(1), 101–117. 10.1016/j.humov.2011.04.004
  • 15. Gegenfurtner A., Lehtinen E., & Säljö R. (2011). Expertise differences in the comprehension of visualizations: A meta-analysis of eye-tracking research in professional domains. Educational Psychology Review, 23(4), 523–552. 10.1007/s10648-011-9174-7
  • 16. Reingold E. M., & Sheridan H. (2011). Eye movements and visual expertise in chess and medicine. In Oxford Handbook on Eye Movements (Vol. 528). Oxford, UK: Oxford University Press.
  • 17. Ziv G. (2016). Gaze behavior and visual attention: A review of eye tracking studies in aviation. The International Journal of Aviation Psychology, 26(3-4), 75–104. 10.1080/10508414.2017.1313096
  • 18. Lai M. L., Tsai M. J., Yang F. Y., Hsu C. Y., Liu T. C., Lee S. W. Y., et al. (2013). A review of using eye-tracking technology in exploring learning from 2000 to 2012. Educational Research Review, 10, 90–115. 10.1016/j.edurev.2013.10.001
  • 19. Robinski, M., & Stein, M. (2013). Tracking visual scanning techniques in training simulation for helicopter landing.
  • 20. Yang J. H., Kennedy Q., Sullivan J., & Fricker R. D. (2013). Pilot performance: assessing how scan patterns & navigational assessments vary by flight expertise. Aviation, Space, and Environmental Medicine, 84(2), 116–124. 10.3357/ASEM.3372.2013
  • 21. Yu C. S., Wang E. M. Y., Li W. C., Braithwaite G., & Greaves M. (2016). Pilots’ visual scan patterns and attention distribution during the pursuit of a dynamic target. Aerospace Medicine and Human Performance, 87(1), 40–47. 10.3357/AMHP.4209.2016
  • 22. Haslbeck A., Schubert E., Gontar P., & Bengler K. (2012). The relationship between pilots’ manual flying skills and their visual behavior: a flight simulator study using eye tracking. Advances in Human Aspects of Aviation, 561–568.
  • 23. Haslbeck A., & Zhang B. (2017). I spy with my little eye: Analysis of airline pilots’ gaze patterns in a manual instrument flight scenario. Applied Ergonomics, 63, 62–71. 10.1016/j.apergo.2017.03.015
  • 24. Brams S., Hooge I. T., Ziv G., Dauwe S., Evens K., De Wolf T., et al. (2018). Does effective gaze behavior lead to enhanced performance in a complex error-detection cockpit task? PloS One, 13(11), e0207439. 10.1371/journal.pone.0207439
  • 25. Peißl S., Wickens C. D., & Baruah R. (2018). Eye-tracking measures in aviation: A selective literature review. The International Journal of Aerospace Psychology, 28(3-4), 98–112. 10.1080/24721840.2018.1514978
  • 26. Bellenkes A. H., Wickens C. D., & Kramer A. F. (1997). Visual scanning and pilot expertise: the role of attentional flexibility and mental model development. Aviation, Space, and Environmental Medicine.
  • 27. Kasarskis, P., Stehwien, J., Hickox, J., Aretz, A., & Wickens, C. (2001, March). Comparison of expert and novice scan behaviors during VFR flight. In Proceedings of the 11th International Symposium on Aviation Psychology (Vol. 6, pp. 325-335).
  • 28. Lorenz B., Biella M., Teegen U., Stelling D., Wenzel J., Jakobi J., et al. (2006). Performance, situation awareness, and visual scanning of pilots receiving onboard taxi navigation support during simulated airport surface operation. Human Factors and Aerospace Safety.
  • 29. Svensson E., Angelborg-Thanderez M., Sjöberg L., & Olsson S. (1997). Information complexity-mental workload and performance in combat aircraft. Ergonomics, 40(3), 362–380. 10.1080/001401397188206
  • 30. Xiong, W., Wang, Y., Zhou, Q., Liu, Z., & Zhang, X. (2016, July). The research of eye movement behavior of expert and novice in flight simulation of landing. In International Conference on Engineering Psychology and Cognitive Ergonomics (pp. 485-493). Springer, Cham.
  • 31. Goldberg J. H., & Kotval X. P. (1999). Computer interface evaluation using eye movements: methods and constructs. International Journal of Industrial Ergonomics, 24(6), 631–645. 10.1016/S0169-8141(98)00068-7
  • 32. Fitts, P. M., Jones, R. E., & Milton, J. L. (1949). Eye Fixations of Aircraft Pilots. III. Frequency, Duration, and Sequence Fixations When Flying Air Force Ground-Controlled Approach System (GCA). Air Materiel Command, Wright-Patterson AFB, OH.
  • 33. Jones, R. E., Milton, J. L., & Fitts, P. M. (1949). Eye Fixations of Aircraft Pilots, IV. Frequency, Duration and Sequence of Fixations During Routine Instrument Flight. US Air Force Technical Report, 5975.
  • 34. Jones, R. E., Milton, J. L., & Fitts, P. M. (1949). Eye Fixations of Aircraft Pilots, I. A review of prior eye-movement studies and a description of a technique for recording the frequency, duration and sequences of eye-fixations during instrument flight. Wright-Patterson AFB, OH, USAF Tech. Rep, 5837.
  • 35. Senders J. W. (1966). A re-analysis of the pilot eye-movement data. IEEE Transactions on Human Factors in Electronics, (2), 103–106. 10.1109/THFE.1966.232330
  • 36. Follet B., Le Meur O., & Baccino T. (2011). New insights into ambient and focal visual fixations using an automatic classification algorithm. i-Perception, 2(6), 592–610. 10.1068/i0414
  • 37. Tole, J. R., Stephens, A. T., Vivaudou, M., Ephrath, A. R., & Young, L. R. (1983). Visual scanning behavior and pilot workload.
  • 38. Di Nocera F., Camilli M., & Terenzi M. (2007). A random glance at the flight deck: Pilots’ scanning strategies and the real-time assessment of mental workload. Journal of Cognitive Engineering and Decision Making, 1(3), 271–285. 10.1518/155534307X255627
  • 39. Glaholt, M. G. (2014). Eye tracking in the cockpit: a review of the relationships between eye movements and the aviator’s cognitive state. Defence Research and Development Toronto (Canada).
  • 40. Goldberg, J. H., Stimson, M. J., Lewenstein, M., Scott, N., & Wichansky, A. M. (2002, March). Eye tracking in web search tasks: design implications. In Proceedings of the 2002 Symposium on Eye Tracking Research & Applications (pp. 51-58).
  • 41. Norris J. R. (1998). Markov Chains (No. 2). Cambridge University Press.
  • 42. Jones, D. H. (1985). An error-dependent model of instrument-scanning behavior in commercial airline pilots. NASA Contractor Report 3908. NASA, Langley.
  • 43. Hayashi, M. (2004). Hidden Markov Models for analysis of pilot instrument scanning and attention switching (Doctoral dissertation, Massachusetts Institute of Technology).
  • 44. Hayashi, M., Beutter, B., & McCann, R. S. (2005, October). Hidden Markov model analysis for space shuttle crewmembers’ scanning behavior. In 2005 IEEE International Conference on Systems, Man and Cybernetics (Vol. 2, pp. 1615-1622). IEEE.
  • 45. Ognjanovic S., Thüring M., Murphy R. O., & Hölscher C. (2019). Display clutter and its effects on visual attention distribution and financial risk judgment. Applied Ergonomics, 80, 168–174. 10.1016/j.apergo.2019.05.008
  • 46. Holmqvist K., Nyström M., Andersson R., Dewhurst R., Jarodzka H., & Van de Weijer J. (2011). Eye Tracking: A Comprehensive Guide to Methods and Measures. OUP Oxford.
  • 47. Wickens C. D., & Alexander A. L. (2009). Attentional tunneling and task management in synthetic vision displays. The International Journal of Aviation Psychology, 19(2), 182–199. 10.1080/10508410902766549
  • 48. Krejtz K., Duchowski A., Krejtz I., Szarkowska A., & Kopacz A. (2016). Discerning ambient/focal attention with coefficient K. ACM Transactions on Applied Perception (TAP), 13(3), 1–20. 10.1145/2896452
  • 49. Krejtz K., Çöltekin A., Duchowski A. T., & Niedzielska A. (2017). Using coefficient K to distinguish ambient/focal visual attention during map viewing. Journal of Eye Movement Research, 10(2), 1–13.
  • 50. Lounis, C. A., Hassoumi, A., Lefrancois, O., Peysakhovich, V., & Causse, M. (2020, June). Detecting ambient/focal visual attention in professional airline pilots with a modified Coefficient K: a full flight simulator study. In ACM Symposium on Eye Tracking Research and Applications (pp. 1-6).
  • 51. Heitz R. P., & Engle R. W. (2007). Focusing the spotlight: Individual differences in visual attention control. Journal of Experimental Psychology: General, 136(2), 217 10.1037/0096-3445.136.2.217 [DOI] [PubMed] [Google Scholar]
  • 52.Shannon, C. E., & Weaver, W. (1949). The Mathematical Theory of Communication. Urbana, IL: University of Illinois Press.
  • 53. Shiferaw B., Downey L., & Crewther D. (2019). A review of gaze entropy as a measure of visual scanning efficiency. Neuroscience & Biobehavioral Reviews, 96, 353–366. 10.1016/j.neubiorev.2018.12.007 [DOI] [PubMed] [Google Scholar]
  • 54. Krejtz K., Duchowski A., Szmidt T., Krejtz I., González Perilli F., Pires A., et al. (2015). Gaze transition entropy. ACM Transactions on Applied Perception (TAP), 13(1), 1–20. 10.1145/2834121 [DOI] [Google Scholar]
  • 55.Ephrath, A. R., Tole, J. R., Stephens, A. T., & Young, L. R. (1980, October). Instrument Scan—Is it an Indicator of the Pilot’s Workload?. In Proceedings of the Human Factors Society Annual Meeting (Vol. 24, No. 1, pp. 257-258). Sage CA: Los Angeles, CA: SAGE Publications.
  • 56. Van de Merwe K., van Dijk H., & Zon R. (2012). Eye movements as an indicator of situation awareness in a flight simulator experiment. The International Journal of Aviation Psychology, 22(1), 78–95. 10.1080/10508414.2012.635129 [DOI] [Google Scholar]
  • 57. Allsop J., & Gray R. (2014). Flying under pressure: Effects of anxiety on attention and gaze behavior in aviation. Journal of Applied Research in Memory and Cognition, 3(2), 63–71. 10.1016/j.jarmac.2014.04.010 [DOI] [Google Scholar]
  • 58. Diaz-Piedra C., Rieiro H., Cherino A., Fuentes L. J., Catena A., & Di Stasi L. L. (2019). The effects of flight complexity on gaze entropy: An experimental study with fighter pilots. Applied ergonomics, 77, 92–99. 10.1016/j.apergo.2019.01.012 [DOI] [PubMed] [Google Scholar]
  • 59. Aboy M., Hornero R., Abásolo D., & Álvarez D. (2006). Interpretation of the Lempel-Ziv complexity measure in the context of biomedical signal analysis. IEEE transactions on biomedical engineering, 53(11), 2282–2288. 10.1109/TBME.2006.883696 [DOI] [PubMed] [Google Scholar]
  • 60.Lounis, C., Peysakhovich, V., & Causse, M. (2020). Lempel-Ziv Complexity of dwell sequences: visual scanning pattern differences between novice and expert aircraft pilots. In 1st International Workshop on Eye-Tracking in Aviation.
  • 61. Melsted P., & Pritchard J. K. (2011). Efficient counting of k-mers in DNA sequences using a bloom filter. BMC bioinformatics, 12(1), 333 10.1186/1471-2105-12-333 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Reani, M., Peek, N., & Jay, C. (2018, June). An investigation of the effects of n-gram length in scanpath analysis for eye-tracking research. In Proceedings of the 2018 ACM Symposium on Eye Tracking Research & Applications (pp. 1-8).
  • 63. Suthaharan S. (2016). Machine learning models and algorithms for big data classification. Integrated Series in Information Systems, 36, 1–12. [Google Scholar]
  • 64. Friedman J., Hastie T., & Tibshirani R. (2001). The elements of statistical learning (Vol. 1, No. 10) New York: Springer series in statistics. [Google Scholar]
  • 65. Combrisson E., & Jerbi K. (2015). Exceeding chance level by chance: The caveat of theoretical chance levels in brain signal classification and statistical assessment of decoding accuracy. Journal of neuroscience methods, 250, 126–136. 10.1016/j.jneumeth.2015.01.010 [DOI] [PubMed] [Google Scholar]
  • 66. Charness N., Reingold E. M., Pomplun M., & Stampe D. M. (2001). The perceptual aspect of skilled performance in chess: Evidence from eye movements. Memory & cognition, 29(8), 1146–1152. 10.3758/BF03206384 [DOI] [PubMed] [Google Scholar]
  • 67. Curby K. M., Glazek K., & Gauthier I. (2009). A visual short-term memory advantage for objects of expertise. Journal of Experimental Psychology: Human Perception and Performance, 35(1), 94 10.1037/0096-1523.35.1.94 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68. Yu C. S., Wang E. M. Y., Li W. C., & Braithwaite G. (2014). Pilots’ visual scan patterns and situation awareness in flight operations. Aviation, space, and environmental medicine, 85(7), 708–714. 10.3357/ASEM.3847.2014 [DOI] [PubMed] [Google Scholar]
  • 69. Ericsson K. A., & Kintsch W. (1995). Long-term working memory. Psychological review, 102(2), 211 10.1037/0033-295X.102.2.211 [DOI] [PubMed] [Google Scholar]
  • 70. Haider H., & Frensch P. A. (1999). Eye movement during skill acquisition: more evidence for the information-reduction hypothesis. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25(1), 172. [Google Scholar]
  • 71. Kundel H. L., Nodine C. F., Conant E. F., & Weinstein S. P. (2007). Holistic component of image perception in mammogram interpretation: gaze-tracking study. Radiology, 242(2), 396–402. 10.1148/radiol.2422051997 [DOI] [PubMed] [Google Scholar]
  • 72. Reingold E. M., Charness N., Pomplun M., & Stampe D. M. (2001). Visual span in expert chess players: Evidence from eye movements. Psychological Science, 12(1), 48–55. 10.1111/1467-9280.00309 [DOI] [PubMed] [Google Scholar]
  • 73. Charness N., Reingold E. M., Pomplun M., & Stampe D. M. (2001). The perceptual aspect of skilled performance in chess: Evidence from eye movements. Memory & cognition, 29(8), 1146–1152. 10.3758/BF03206384 [DOI] [PubMed] [Google Scholar]
  • 74.Alamán, J. R., Causse, M., & Peysakhovich, V. (2020). Attentional span of aircraft pilots: did you look at the speed?. In 1st International Workshop on Eye-Tracking in Aviation.
  • 75.Li, W. C., Chiu, F. C., Kuo, Y. S., & Wu, K. J. (2013, July). The investigation of visual attention and workload by experts and novices in the cockpit. In International Conference on Engineering Psychology and Cognitive Ergonomics (pp. 167-176). Springer, Berlin, Heidelberg.
  • 76. Peysakhovich V., Lefrançois, O., Dehais F., & Causse M. (2018). The neuroergonomics of aircraft cockpits: the four stages of eye-tracking integration to enhance flight safety. Safety, 4(1), 8 10.3390/safety4010008 [DOI] [Google Scholar]
  • 77.Schwerd, S., & Schulte, A. (2020, July). Experimental Validation of an Eye-Tracking-Based Computational Method for Continuous Situation Awareness Assessment in an Aircraft Cockpit. In International Conference on Human-Computer Interaction (pp. 412-425). Springer, Cham.
  • 78.Lounis, C., Peysakhovich, V., & Causse, M. (2019, July). Flight eye tracking assistant (feta): Proof of concept. In International Conference on Applied Human Factors and Ergonomics (pp. 739-751). Springer, Cham.
  • 79.Lounis, C., Peysakhovich, V., & Causse, M. (2018, June). Intelligent cockpit: eye tracking integration to enhance the pilot-aircraft interaction. In Proceedings of the 2018 ACM Symposium on Eye Tracking Research & Applications (pp. 1-3).
  • 80.Dubois, E., Blättler, C., Camachon, C., & Hurter, C. (2017, June). Eye movements data processing for ab initio military pilot training. In International Conference on Intelligent Decision Technologies (pp. 125-135). Springer, Cham.

Decision Letter 0

Peter James Hills

11 Sep 2020

PONE-D-20-21444

Visual scanning strategies in the cockpit and pilot expertise: a flight simulator study

PLOS ONE

Dear Dr. LOUNIS,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

As you will see, the reviewers made many interesting and important comments, which I will not repeat in full here. I believe that the most important aspects to address are: improving the clarity and level of detail; discussing surprising and unexpected findings; and controlling for the fact that the AOIs are of different sizes (this needs a good discussion as to what method you choose, for psychological and practical purposes - using an area-normalised technique might be useful to understand the importance of particular areas psychologically, but may not be as important practically). Potentially, you might want to include multiple analyses in the results. Whatever you choose to do, ensure it is discussed fully. I look forward to receiving your interesting work.

Please submit your revised manuscript by Oct 26 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Peter James Hills, PhD

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Thank you for stating the following financial disclosure:

"The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript."

At this time, please address the following queries:

  1. Please clarify the sources of funding (financial or material support) for your study. List the grants or organizations that supported your study, including funding received from your institution.

  2. State what role the funders took in the study. If the funders had no role in your study, please state: “The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.”

  3. If any authors received a salary from any of your funders, please state which authors and which funders.

  4. If you did not receive any funding for this study, please state: “The authors received no specific funding for this work.”

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: No

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The article “Visual scanning strategies in the cockpit and pilot expertise: a flight simulator study” is written at a good level and meets the requirements for a scientific contribution. The topic of the manuscript is relevant and of interest to the audience of this journal. The research methodology and treatment of the study are appropriate and applied properly. The manuscript contains sufficient and appropriate references. The level of English is high. The conclusion summarizes the main results and contributions of the manuscript.

Page 7, Eye tracking data: “only AOI-based data were used in this experiment”. What are the other areas that are not part of the AOIs? (overhead panel, throttle…)

Page 8, Flight simulator data: “In this experiment, the predicted values correspond to the different specific threshold given by the experimenter (i.e., speed 130 Kt; vertical speed below -500 ft/min and above +800 ft/min; heading different from 143°)”. Based on what did you choose these values?

Page 11, Visual patterns identification: the article compares tri-gram patterns OTW-VS-OTW, OTW-ECAM-OTW, etc. This marking is opaque. I recommend using a numerical marking according to the distribution of AOIs in Fig 2 (Overview of the ten different AOIs).

I recommend publishing the article in the journal.

Reviewer #2: Manuscript ID: PONE-D-20-21444

Title: "Visual scanning strategies in the cockpit and pilot expertise: a flight simulator study"

This is a nice study dealing with scanning strategies among aviators. The study is well-done and the methods are sound. Future studies should address more specific questions and measurements, for instance whether small saccades produced during fixation (i.e. microsaccades; see McCamy et al., 2014 for a work that discusses the relationship between scene informativeness and ocular targeting), also differ between expert and novice. However, my feeling is that the current study already makes a valuable contribution to the field in present form.

Notwithstanding the above, I have some comments and suggestions.

Introduction

#1. The literature reviewed in the introduction and discussion seems incomplete. When revising your paper, please take care to ensure your reference list is up to date, and that any recent papers that are of relevance to your work are cited.

#2. I suggest adding a table comparing the eye movement metrics you described (e.g. main differences, shortcomings, strength, etc…). It will tremendously help the reader to have a big picture of your arguments.

Methods

#3. I suggest adding more details about (or better defining) your novice group. It looks quite strange that participants with no flight experience were able to fly an aircraft. Did they receive any basic flight training? If they have received some basic flight courses during their education in aeronautics, it should be stated (number of hours). It will help to strengthen your results and discussion. If this group lacks minimum flight notions on how to interact with the aircraft, it should be reported as a main shortcoming of your study. In this case, any comparisons with the expert group will be pointless, and any mention (including the title) comparing novices and experts should be toned down.

#4. Please, add more details about the choice of your sample size. You might want to state if the number of pilots was considered appropriate based on a previous cohort.

#5. What is the field of view covered by the simulator?

#6. The AOIs are very different in size. Is this controlled for in any way? Furthermore, AOIs 6 and 7 are adjacent to each other and are relatively small, at least in terms of the subtended visual field. For example, if a pilot is looking at 6, does he really need to foveate 7 to get the necessary information? Moreover, can your eye tracking system resolve these different AOIs reliably? You might want to discuss this issue in your limitation section.

#7. Did you re-calibrate the 5 cameras, and if necessary update the eye tracker's 3D settings, for each pilot? Please clarify this issue.

#8. The authors stated that they have used machine learning models. Which models? Why the plural?

Results

#9. It is surprising that participants with no flight experience (novices) were able to behave similarly to the expert pilots (i.e., Heading). This supports my comment on the description of your sample. Furthermore, it feels strange that novices and pilots behave the same during the easy dual-task scenario, but very differently in the other two. Is there any plausible explanation for this?

Discussion

#10. I suggest providing a theoretical framework for describing your results. While this is not absolutely necessary to publish the results, it will strengthen the argument about differences in visual strategy.

Suggested references:

Shiferaw, B., Downey, L., & Crewther, D. (2019). A review of gaze entropy as a measure of visual scanning efficiency. Neuroscience & Biobehavioral Reviews, 96, 353-366.

Diaz-Piedra, C., Rieiro, H., Cherino, A., Fuentes, L. J., Catena, A., & Di Stasi, L. L. (2019). The effects of flight complexity on gaze entropy: An experimental study with fighter pilots. Applied ergonomics, 77, 92-99.

McCamy, M. B., Otero-Millan, J., Di Stasi, L. L., Macknik, S. L., & Martinez-Conde, S. (2014). Highly informative natural scene regions increase microsaccade production during visual scanning. Journal of neuroscience, 34(8), 2956-2966.

Reviewer #3: The manuscript presents an experiment that explored the potential of several eye metrics to differentiate between pilots with different flight expertise. It presents really interesting, innovative metrics (k coefficient, transition entropy, n-gram coding), although the authors do not really justify the selection of these metrics, and the classification of metrics seems forced (most metrics look to me more like gaze patterning metrics than gaze dispersion metrics). Anyway, the methods used are generally sound and valid, and I feel that the manuscript will be of interest and very useful for researchers in the field.

More detailed comments are provided below.

1. The manuscript needs a good, thoughtful proofread. Also, the messages are sometimes not clear, and therefore the “story” that the authors are telling is hard to follow. I am not only proposing that the authors remove typos and spelling errors, or carefully edit the whole manuscript for correctness and clarity, but that they improve the organization of the contents. Some errors, from the abstract section alone, are as follows. Some of them should have been detected prior to submission as they clearly affect the flow of the text.

- The first sentence is written in a way that simplifies the construct of situational awareness: here, situational awareness would merely represent the monitoring of flight instruments. Moreover, monitoring the flight instruments is not the only demanding activity in the cockpit, and just monitoring them does not guarantee a timely intervention in an emergency situation.

- I know that there are texts that use the term “situation awareness”, whereas others use “situational awareness”. In this case, authors should only choose one term and stick to it.

- It is not clear what the authors mean by “qualify” visual strategies or visual information in the abstract. It needs to be clarified whether the authors are referring to “study”, or “examine”, or any other synonym, or maybe to “quantify”.

- Does “visual information taking” mean “visual information acquisition”?

- Does “efficient perceptual efficiency” mean “higher perceptual efficiency”?

- The sentence “The two groups performed ...” in the abstract should be rephrased as its structure does not comply with English grammar rules.

- “complex” and “elaborate” are essentially synonyms. What does “elaborate” add to the sentence?

- The authors mentioned “better dispersion of their (pilots) attention” in the abstract. Attention is a very complex construct. What are the authors referring to with the idea of dispersed attention? I understand that they are probably referring to orienting, a kind of attention triggered by external cues (visual or in other modalities) that usually implies movements of the eyes toward a target location. It is important that the authors differentiate between overt and covert orienting attention, as the latter may be especially relevant for expert pilots.

- The sentence “These visual scanning differences…” in the abstract should be rephrased. The scanning differences are being used to classify pilots; as written, it reads the other way around.

- The sentence “Our results can benefit…” is saying the same thing twice. Also, “benefit for aviation” does not make sense. I understand it would be “benefit the aviation”.

2. The authors presented a classification of gaze patterns depending on “spatial dispersion” and “its structure”. I am not sure why the “gaze dispersion metrics” are called that way. For example, the transition matrix includes a significant amount of temporal structure, and the k coefficient, while it somehow includes dispersion (saccade size), is influenced at least 50% by dwell time, which again is a temporal parameter. Also, it is unclear why the transition entropy is not included here, since it is a measure very similar to the transition matrix density. In general, I am not sure there is a clear distinction between the two sets of parameters; if any, the comparison is more between global and fine-level structure.
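[Editor's note: for concreteness, the two quantities compared here both derive from the same first-order transition matrix. The sketch below is ours, not the authors'; the function name is hypothetical. It computes the matrix density and the gaze transition entropy of Krejtz et al. (2015) from a dwell sequence of AOI indices.]

```python
import numpy as np

def transition_metrics(aoi_sequence, n_aois):
    """Build a first-order AOI transition matrix from a dwell sequence,
    then derive two of the metrics under discussion: matrix density and
    gaze transition entropy (Krejtz et al., 2015)."""
    counts = np.zeros((n_aois, n_aois))
    for a, b in zip(aoi_sequence, aoi_sequence[1:]):
        if a != b:  # count only between-AOI transitions
            counts[a, b] += 1

    # Density: share of possible off-diagonal cells actually visited.
    density = np.count_nonzero(counts) / (n_aois * (n_aois - 1))

    # Transition entropy: H_t = -sum_i p_i * sum_j p_ij * log2(p_ij),
    # weighting each row's entropy by the share of outgoing transitions.
    row_sums = counts.sum(axis=1)
    p_i = row_sums / row_sums.sum()
    H_t = 0.0
    for i in range(n_aois):
        if row_sums[i] == 0:
            continue
        p_ij = counts[i] / row_sums[i]
        nz = p_ij > 0
        H_t -= p_i[i] * np.sum(p_ij[nz] * np.log2(p_ij[nz]))
    return density, H_t
```

Because both quantities are functions of the same count matrix, substantial overlap between them is to be expected, which is the reviewer's point.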

Regarding the transition matrix density, the authors stated that “A sparse matrix (small index value) indicates a more efficient and directed search”. It is not necessarily more efficient, since it can be a marker of missing vital information. For example, a novice driver can direct his gaze continuously to the road while ignoring/forgetting the rearview mirrors.

Regarding the k coefficient, the authors stated that “Values of Ki close to zero indicate relative similarity between dwell durations and transition amplitudes.” It would be interesting to know if this comes from long dwelling periods followed by large saccades or short dwells and small saccades, since they are probably generated by very different cognitive/physiological states.
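[Editor's note: the ambiguity raised here can be made explicit by keeping the per-dwell K_i series rather than a single average. A minimal sketch following Krejtz et al. (2016); the code and function names are ours, not the authors'.]

```python
import numpy as np

def coefficient_k_series(dwell_durations, saccade_amplitudes):
    """K_i = z(dwell_i) - z(amplitude of the following saccade).
    Positive K_i: relatively long dwell followed by a short saccade
    (focal processing); negative K_i: short dwell followed by a long
    saccade (ambient processing). Note that if z-scores are computed
    over the same recording, the grand mean of K_i is zero by
    construction, so the metric is typically averaged over moving
    windows to track ambient/focal dynamics over time."""
    d = np.asarray(dwell_durations, dtype=float)
    a = np.asarray(saccade_amplitudes, dtype=float)
    z_d = (d - d.mean()) / d.std()
    z_a = (a - a.mean()) / a.std()
    return z_d - z_a

def windowed_k(k_series, window=5):
    """Moving-window mean of the K_i series."""
    k = np.asarray(k_series, dtype=float)
    return np.convolve(k, np.ones(window) / window, mode="valid")
```

Inspecting z_d and z_a separately for the near-zero K_i would answer the reviewer's question: K_i ≈ 0 with both z-scores large reflects long dwells with large saccades, while both z-scores small reflects short dwells with small saccades.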

Finally, there is an analysis using n-grams that is used on the data but not described here, which is extremely confusing.
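[Editor's note: the n-gram analysis referred to amounts to counting fixed-length sub-sequences of the AOI dwell string (cf. Reani et al., 2018). An illustrative sketch, ours rather than the authors', using the AOI labels from the manuscript:]

```python
from collections import Counter

def ngram_counts(aoi_labels, n=3):
    """Count n-grams (tri-grams by default) in a dwell sequence of AOI
    labels, as in the OTW-VS-OTW patterns compared in the manuscript."""
    grams = zip(*(aoi_labels[i:] for i in range(n)))
    return Counter("-".join(g) for g in grams)
```

The most frequent tri-grams can then be compared between groups, which is how patterns such as OTW-VS-OTW emerge from the raw dwell sequences.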

3. Methods:

- The authors defined the flight scenarios used, but never defined the abbreviations used across the rest of the manuscript.

- As far as I can tell, there is not a catch-all AOI being defined (i.e., subject is looking somewhere the design did not account for). Is there a significant amount of time spent looking outside these AOIs?

- The eye tracking data section is probably the most important in the manuscript, and it is very hard to understand. The authors should rewrite it carefully to improve readability. Moreover:

1) The authors defined two types of entropy in the introduction section, but did not clarify which one they finally used.

2) Pattern identification has not been explained before.

3) Machine learning models should be described.

4) A potential addition is an analysis of metric redundancy. Metrics such as the transition matrices and the transition entropy seem to be measuring very similar things, and much the same holds for the LZC and the n-grams. Is it really useful to have all of them?

4. Results:

- Figure 9 shows differences in transition matrices between expert and novice pilots. Which one has the more homogeneous distribution? The authors need to state it in the results section. Also, it seems like most of the novice complexity comes from exploration of AOIs 1-5. Is this relevant?

- The true positive-false positive seems redundant, since it has the exact same information as the confusion matrix.

- The focal-ambient K coefficient showed that attention was dominantly focal (positive value) in both groups, but not in all scenarios. Is that correct?

- In the hard dual-task scenario, authors found that dual-task changed the ambient-focal strategy of the novices, while experienced pilots kept their strategy consistent across experimental scenarios. Any idea as to why?

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.


Author response to Decision Letter 0


2 Dec 2020

Please find all the responses in the document "response to reviewer.pdf".

Reviewer #1

The article “Visual scanning strategies in the cockpit and pilot expertise: a flight simulator study” is written at a good level and meets the requirements for a scientific contribution. The topic of the manuscript is relevant and of interest to the audience of this journal. The research methodology and treatment of the study are appropriate and applied properly. The manuscript contains sufficient and appropriate references. The level of English is high. The conclusion summarizes the main results and contributions of the manuscript. Page 7, Eye tracking data: “only AOI-based data were used in this experiment”. What are the other areas that are not part of the AOIs? (overhead panel, throttle…)

• Thank you very much for these positive and kind comments. We indeed did not integrate these AOIs, as our experiment specifically addressed monitoring issues during the landing phase. Consequently, we restricted our investigation to the flight instruments that displayed relevant information during this maneuver. The flight scenarios were designed in such a way that no action was required on the overhead panel. We recognize that we could have analyzed the time spent gazing at the throttle, as the position of the thrust levers can be monitored by crews. However, we chose to restrict our analysis to instruments that display information directly related to the flight parameters (altitude, speed, etc.). We clarified this point in the paper.

Page 8, Flight simulator data: “In this experiment, the predicted values correspond to the different specific threshold given by the experimenter (i.e., speed 130 Kt; vertical speed below -500 ft/min and above +800 ft/min; heading different from 143°)”. Based on what did you choose these values?

• These values were chosen because they roughly correspond to reality. In commercial aircraft, a standard landing speed is around 130 kt. The chosen negative vertical speed (–500 ft/min) roughly corresponds to the vertical speed during the approach at 130 kt, with an angle of approach of three degrees. We defined a tolerance range in case the participant was not well stabilized on the approach slope. Consequently, we gave these instructions to our participants to perform their landing. We added these details in the paper:

o “We chose these values because they roughly correspond to a standard landing speed with a commercial aircraft. The negative vertical speed of 800 ft/min approximately corresponds to the vertical speed at 130 kt with an angle of approach of three degrees. We defined a tolerance range in case the participant was not well stabilized on the approach slope and had to gain altitude (+500 ft/min maximum).”
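[Editor's note: the correspondence between approach speed, glide-path angle, and vertical speed can be verified with a back-of-the-envelope check. The computation below is ours and assumes ground speed equals the 130 kt airspeed, i.e., no wind; it uses descent rate = ground speed × tan(flight-path angle).]

```python
import math

KT_TO_FT_PER_MIN = 6076.12 / 60  # 1 kt = 1 nautical mile/hour ≈ 101.3 ft/min

ground_speed_kt = 130.0   # approach speed from the scenario
glide_path_deg = 3.0      # standard glide-path angle

# Vertical component of the flight path, in ft/min.
descent_rate = (ground_speed_kt * KT_TO_FT_PER_MIN
                * math.tan(math.radians(glide_path_deg)))
```

This gives roughly 690 ft/min for a three-degree approach at 130 kt, which is the order of magnitude of the thresholds quoted above.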

Page 11, Visual patterns identification: the article compares trigram patterns OTW-VS-OTW, OTW-ECAM-OTW, etc. This marking is opaque. I recommend using a numerical marking according to the distribution of AOIs in Fig 2 (overview of the ten different AOIs).

• We thank reviewer #1 for this suggestion. The authors would like to keep the labels (OTW, VS, SPD...) since they are commonly used in experiments involving flight instruments. If reviewer #1 feels that numbers could clarify the presentation, we might add the numerical mark in parentheses in addition to the label.

Reviewer #2

#1. The literature reviewed in the introduction and discussion seems incomplete. When revising your paper, please take care to ensure your reference list is up to date, and that any recent paper that are of relevance to your work are cited.

• We expanded the introduction and discussion sections by adding recent supplementary references relevant to our work. In particular, the following references were introduced: Diaz-Piedra et al. (2019); Shiferaw et al. (2019); Haslbeck and Zhang (2017); Allsop (2014); Brams et al. (2018); Alamán et al. (2020); Dubois et al. (2017); Schwerd and Schulte (2020); Peissl et al. (2019).

#2. I suggest adding a table comparing the eye movement metrics you described (e.g. main differences, shortcomings, strength, etc…). It will tremendously help the reader to have a big picture of your arguments.

• We thank reviewer #2 for this excellent suggestion that helped clarify the presentation of the various eye metrics. We added a table comparing the different metrics.

#3. I suggest adding more details (or better defining) of your novice group. It looks quite strange that participants with no flight experience were able to flight an aircraft. Did they receive any basic flight training? If they have received some basic flight courses during their education in aeronautics, it should be stated (number of hours). It will help to strengthen your results and discussion. If this group lacks minimum flight notions on how to interact with the aircraft, it should be reported as a main shortcoming of your study. In this case, any comparisons with the expert group will be pointless, and any mention (including the title) comparing novices and experts should be toned down.

• All novice participants had advanced theoretical knowledge about aeronautical engineering, they all knew perfectly well the various information given by the instruments in the cockpit (altimeter, altitude, etc.), and all had notions on how to manually interact with the aircraft. However, they had no experience of real flying. In our experiment, the participant simply had to control the trajectory and the speed of the aircraft, thus the task was feasible for beginners (the scenarios did not require complex navigation activities or interaction with automation). However, the authors acknowledge that assessing different levels of expertise, including pilots with low, moderate, and high experience, would be a desirable extension of this work. We clarified this point in the participant section:

o “A first group called “novices” consisted of participants with no real flight experience (n = 16, mean age 25.65 ± 5.47 years). They were recruited from a French aerospace engineering school (ISAE-SUPAERO, Toulouse, France). All these novice participants had advanced theoretical knowledge about aeronautical engineering, were familiar with the various information given by the instruments in the cockpit (altimeter, altitude, etc.), and had notions on how to manually interact with the aircraft. Our experimental flight scenarios were relatively simple: the participant had to control the trajectory and the speed of the aircraft. The scenarios did not require complex navigation activities or interacting with automation. Thus, the scenarios were feasible for these novices after a relatively short training session.”

• They indeed all went through a training session; we slightly expanded this point:

o “Participants performed a training session, consisting of performing a landing scenario twice. All participants (including the novices) were able to control the aircraft correctly after these two landings. Then, the participants performed three times the same landing scenario as during the training, but with varying levels of complexity.”

• Finally, we expanded the discussion of this possible shortcoming in the limitation section as follows:

o “There are some limitations to this study. We compared professional pilots with non-pilots only. The comparison of these two very different profiles can artificially increase the observed differences in terms of ocular behavior. Further research should consider participants with different levels of expertise, from novice to expert (e.g., every 1000 flight hours), to finely examine how visual strategies develop with expertise.”

#4. Please, add more details about the choice of your sample size. You might want to state if the number of pilots was considered appropriate based on a previous cohort.

• With a total of 32 participants, divided into two groups (16 novices and 16 experts), our study is in the range of other works. Some studies involved a smaller sample size (e.g., 10 novices and 10 pilots for Ottati et al. (1999); 10 novices and 6 expert pilots for Kasarskis (2001); 14 novices and 14 pilots for Schriver et al. (2008)). Other studies involved a relatively similar sample size: 36 pilots for Wen-Chin Li et al. (2012), with various amounts of flight hour experience. In order to determine a sufficient sample size, we initially conducted a statistical power analysis to determine the sample size required to detect an effect with power = 0.9 as a function of standardized effect size (alpha = 0.05). We compared average dwell times on all cockpit AOIs for 5 novices and 5 experts who conducted a pretest scenario. This power analysis showed that approximately 10 participants per group were sufficient to reach a power of 0.90.
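Such a power analysis can also be sanity-checked by simulation. The sketch below is purely illustrative (it is not the analysis used in the paper): the standardized effect size of 1.5, the critical t value, and the simulation count are all assumptions chosen only to show the mechanics.

```python
import random
import statistics

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / (va / len(a) + vb / len(b)) ** 0.5

def simulated_power(n_per_group, effect_size, n_sim=2000, t_crit=2.1, seed=1):
    """Fraction of simulated experiments detecting a true standardized
    mean difference of `effect_size` (two-sided test, |t| > t_crit)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        novices = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        experts = [rng.gauss(effect_size, 1.0) for _ in range(n_per_group)]
        hits += abs(welch_t(novices, experts)) > t_crit
    return hits / n_sim

# With a large standardized effect, ~10 participants per group already
# yields high power.
print(simulated_power(n_per_group=10, effect_size=1.5))
```

Running the function with increasing group sizes shows how power grows with n, which is the trade-off the pretest was used to quantify.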

#5. What is the field of view covered by the simulator?

• The field of view covered by the simulator is 180°. This information has been added in the paper.

#6. The AOIs are very different in size. Is this controlled for in any way? Furthermore, AOIs 6 and 7 are adjacent to each other and are relatively small, at least in terms of the subtended visual field. For example, if a pilot is looking at 6, does he really need to foveate 7 to get the necessary information? Moreover, can your eye tracking system resolve these different AOIs reliably? You might want to discuss this issue in your limitation section.

• Reviewer #2 is right. Indeed, the AOIs are of different sizes since they correspond to the sizes of the various instruments. We did not control for these size variations since we believe that this has no particular impact on our analysis, as the AOIs are the same for novices and experts. However, we fully acknowledge that relatively small and close AOIs can bias results due to the possibility of processing information in peripheral vision and also due to the accuracy limitations of eye tracker systems. Regarding AOIs 6 and 7, we believe that it was not problematic since the distance between the center of the ND and the values in AOI 7 is relatively large. Also, when defining AOI 7, being aware of the accuracy limitations of eye tracking systems, we took a slight margin around AOI 6 in order to ensure that dwells on AOI 7 would be captured. Indeed, there is no other reason for the pilot to glance close to that location, except reading the values. We expanded the discussion of these issues in the limitation section as follows:

o “Most eye-tracking studies rely on the eye-mind hypothesis, which states that users fixate an area that relates to the currently processed information. However, special care should be taken when analyzing areas of interest close to each other. Pilots can perceive some information in peripheral vision, for example, a speed change via the movement of the speed tape (Alamán, J. R., Causse, M., & Peysakhovich, V. (2020). Attentional span of aircraft pilots: did you look at the speed? In 1st International Workshop on Eye-Tracking in Aviation). The experts may succeed in maintaining a constant speed by looking only at the attitude zone. This would explain why the “AOI SPD” corresponding to the speed tape is not often found in the most frequent patterns (n-grams). Finally, eye tracker devices are more and more mature and accurate (about 1° at a distance of one meter). However, care should be taken when analyzing contiguous AOIs; the limited accuracy of eye tracking systems could lead to errors in this situation.”

#7. Did you re-calibrate the 5 cameras, and eventually update the eye tracker 3D setting, for each pilot? Please clarify this issue.

• The Smart Eye system allows designing a 3D environment and establishing calibration points (in the vicinity of the AOIs). Once the world model is designed, we only need to run an automatic calibration for each participant.

#8. Authors stated that have used machine learning models. Which models? In plural?

• We used different classification algorithms (Support Vector Machine, LDA, decision trees, etc.) to determine which algorithm achieved the best accuracy. The best-performing algorithm was used in this paper to classify expertise based on the transition matrices.

Results

#9. It is surprising that participants with no flight experience (novices) were able to behave similarly to the expert pilots (i.e., heading). This supports my comment on the description of your sample. Furthermore, it feels strange that novices and pilots behave the same during the easy dual task scenario, but very differently in the other two. Is there any plausible explanation for this?

• This is an excellent remark. The heading metric was not really sensitive to expertise since the aircraft was nearly in front of the airfield at the beginning of the scenario. Consequently, it was relatively trivial to keep the aircraft on the correct heading. Speed and vertical speed values were much more complex to maintain, explaining why pilots performed systematically better than novices on these two variables. We clarified this point in the paper:

o “Our results showed that expert pilots had better flying performance than novices. In particular, they had lower speed and vertical speed deviations (the heading variable was not sensitive, most likely because the aircraft was nearly in front of the airfield at the beginning of the scenario). We assume that this superior flying performance is at least partially due to better visual scanning strategies gained with expertise.”

Discussion

#10. I suggest providing a theoretical framework for describing your results. While this is not absolutely necessary to publish the results, it will strengthen the argument about differences in visual strategy.

• Thank you very much for this excellent suggestion. We added a section in discussion referring to three possible theoretical models that can account for our results, in particular regarding the differences between novices and experts.

o “Three theories can explain the expert superiority in visual domains. First, the theory of long-term working memory \\cite{bib70} assumes that expertise extends the capacities for information processing. This theory assumes that the limited-capacity assumption should be reconsidered within an expert’s specific domain. According to this hypothesis, experts encode and retrieve information more rapidly than novices. This rapid information processing is reflected in shorter dwell durations. The second theory is the information-reduction hypothesis \\cite{bib71}, which assumes that expertise optimizes the amount of processed information by neglecting task-irrelevant information. Our results demonstrated that the expert group maintained the visual scanning strategies related to the piloting activity during the hard dual task scenario, while novices under-performed during this scenario. This result highlights the experts’ ability to focus on the information relevant to performing the task while neglecting redundant information. Finally, the third theory is the holistic model of image perception \\cite{bib72}, which focuses on the extension of the visual span. Charness et al. \\cite{bib73} showed that experts extract information from widely distanced and parafoveal regions, producing patterns of saccadic selectivity by piece saliency \\cite{bib74}. Our results showed that experts outperformed the novice group in maintaining their speed. The n-gram analysis revealed that the visual scanning patterns related to speed were not found for the pilots, whereas novices presented this AOI in their sequences. These results suggest the ability of experts to process information through parafoveal processing.”

Reviewer #3

The manuscript presents an experiment that explored the potential of several eye metrics to differentiate between pilots with different flight expertise. It presents really interesting, innovative metrics (k coefficient, transition entropy, n-gram coding), although authors do not really justify the selection of such metrics and the classification of metrics seems forced (most metrics look to me more as gaze patterning metrics than gaze dispersion metrics). Anyway, the methods used are generally sound and valid and I feel that the manuscript will be of interest and very useful for researchers in the field.

More detailed comments are provided below.

1. The manuscript needs a good, thoughtful proofread. Also, the messages are sometimes not clear and therefore the “story” that authors are telling is hard to follow. I am not only proposing authors to remove typos and spelling errors, or to carefully edit the whole manuscript for correctness and clarity, but to improve the organization of the contents. Some errors, only from the abstract section, are as follows. Some of them should have been detected prior to submission as they clearly affect the flow of the text.

• We proofread the manuscript and we separated the beginning of the manuscript into two sections "Introduction" and "State-of-the-art visual scanning metrics" to facilitate the reading. We also enhanced the text flow to improve the reading experience.

The first sentence is written in a way that simplifies the construct of situational awareness: here, situational awareness would merely represent the monitoring of flight instruments. Moreover, monitoring the flight instruments is not the only demanding activity in the cockpit, and just monitoring them does not guarantee a timely intervention in an emergency situation.

• We do agree with the reviewer #3, we mitigated the first sentence as follow:

o “During a flight, pilots must rigorously monitor their flight instruments since it is one of the critical activities that contribute to update their situation awareness. This task is cognitively demanding, but it is necessary for timely intervention in the event of a parameter deviation.”

I know that there are texts that use the term “situation awareness”, whereas others use “situational awareness”. In this case, authors should only choose one term and stick to it.

• We homogenized the term and now use only the term “situation awareness”

It is not clear what authors mean by “qualify” visual strategies or visual information in the abstract. It needs to be clarified if authors are referring to “study”, or to “examine”, or any other synonym, or maybe they are referring to “quantify”.

• We replaced the term with “study”.

Does “visual information taking” mean “visual information acquisition”?

• We replaced the term with “visual information acquisition”, which is more widely used.

Does “efficient perceptual efficiency” mean “higher perceptual efficiency”?

• We replaced by “higher perceptual efficiency”

The sentence “The two groups performed ...” in the abstract should be rephrased as its structure does not comply with English grammar rules.

• Thanks, the sentence has been rephrased as follows:

o “The two groups landed three times with different levels of difficulty (manipulated via a double task paradigm)”.

“complex” and “elaborate” are essentially synonyms. What does “elaborate” add to the sentence?

• We believe that “elaborate” brings information suggesting the beneficial result of experience and is thus more positively connoted than “complex”. We can remove this word if reviewer #3 feels that it does not bring any additional information beyond “complex”.

Authors mentioned “better dispersion of their (pilots) attention” in the abstract. Attention is a very complex construct. What are authors referring to with the idea of a dispersed attention? I understand that they are probably referring to orienting, a kind of attention triggered by external cues (visual or in other modalities) that usually implies the movements of the eyes toward a target location. It is important that authors differentiate between overt and covert orienting attention, as the latter may be especially relevant for expert pilots.

• We intended to refer to the idea that visual attention was more spatially distributed across the visual field (i.e., in more equal proportion vs. concentrated on a few areas of interest). In other words, their attention was not focused on a single channel of information (a phenomenon called attentional tunneling when extreme, cf. Wickens 2005, Attentional Tunneling and Task Management). In the diffuse mode, visual attention is allocated to all regions of the visual field in equal proportion; in the focused mode, attention is concentrated on one area of interest, specified by a central or peripheral cue (Heitz, R. P., & Engle, R. W. (2007). Focusing the spotlight: Individual differences in visual attention control. Journal of Experimental Psychology: General, 136(2), 217).

• We replaced the term by "better distribution of attention” to the cockpit instruments and we slightly developed the notions of diffuse vs. focused attention as well as attentional tunneling concept in the attentional modes section:

o “According to Heitz et al. (2007), in the diffuse mode, visual attention is allocated to all regions of the visual field in equal proportion; in the focused mode, attention is concentrated at one area of interest, specified by a central or peripheral cue. An extremely focused mode could be compared to the concept of attentional tunneling (Wickens et al. 2009).”

• We also added a note in the limitation section on overt vs. covert orienting of attention with respect to the eye-tracking device:

o “Finally, we should also specify that eye tracking allows capturing only overt attention, for example when a person moves his eyes in the direction of an object, and not covert attention, when an individual focuses his attention on an object without moving the eyes toward it.”

The sentence “These visual scanning differences…” in the abstract should be rephrased. The scanning differences are being used to classify pilots; this reads like the other way around.

• The reviewer is right; we replaced the sentence with:

o We classified pilot's profiles (novices -- experts) by machine learning based on Cosine KNN (K-Nearest Neighbors) using transition matrices.

The sentence “Our results can benefit…” is saying the same thing twice. Also, “benefit for aviation” does not make sense. I understand it would be “benefit the aviation”.

• Reviewer #3 is right; we made the necessary revisions in the text concerning the points mentioned above to improve readability.

o “Our results can benefit the aviation domain by helping to speed up learning, assess the monitoring performance of the crew, and ultimately reduce incidents or accidents due to human error.”

2. Authors presented a classification of gaze patterns depending on “spatial dispersion” and “its structure”. I am not sure why the “gaze dispersion metrics” are called that way. For example, the transition matrix includes a significant amount of temporal structure, and the k coefficient, while it somehow includes dispersion (saccade size), is influenced at least 50% by dwell time, which again is a temporal parameter. Also, it is unclear why the transition entropy is not included here, since it is a very similar measure to the transition matrix density. In general, I am not sure there is a clear distinction between the two sets of parameters, but, if any, the comparison is more between global and fine level structure.

• Reviewer #3 is right; we struggled to propose a clear classification, and we agree that it did not describe the specificity of each metric in a sufficiently accurate fashion. We propose a new classification in the text, and a table is now introduced to describe the different metrics. We hope that this classification is more accurate.

o “One approach to analyzing visual scanning strategies is to analyze the transition matrix; a second one is the characterization of fluctuations between ambient/focal visual behavior; another one is to derive global pattern metrics such as entropy. More generally, in this paper, we classified visual scanning metrics into three AOI-based approaches: one based on Markov chains (transition matrix), another based on the attentional modes, and the last one based on sequence analyses. Figure 14 presents a comparison of the visual scanning metrics described below (e.g., formula, definition, strengths, shortcomings, etc.).”

Regarding the transition matrix density, authors stated that “A sparse matrix (small index value) indicates a more efficient and directed search”. It is not necessarily more efficient, since it can be a marker of missing vital information. For example, a novice driver can direct his gaze continuously to the road while ignoring/forgetting the rearview mirrors.

• Reviewer #3 is absolutely right; this sentence has been improved. We now mention the fact that a sparse matrix can reflect a more efficient and directed search, for example when using computer software (Goldberg, 2002), or, in other contexts, can indicate a failure to properly monitor the environment, for example when a novice driver directs his gaze continuously to the road while ignoring/forgetting the rearview mirrors or when a pilot is excessively engaging his visual attention on a single channel of information (e.g., Wickens 2009). The following sentence was introduced:

o “A sparse matrix can reflect a more efficient and directed search, for example when using computer software (Goldberg, 2002), or, in other contexts, can indicate a failure to properly monitor the environment, for example when a novice driver directs his gaze continuously to the road while ignoring/forgetting the rearview mirrors or when a pilot is excessively engaging his visual attention on a single instrument (e.g., Wickens 2009).”
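To make the density notion concrete, here is a minimal Python sketch (the AOI sequence and the number of AOIs are invented for illustration) that builds a transition count matrix from a sequence of AOI visits and computes the proportion of non-zero cells:

```python
def transition_counts(aoi_sequence, n_aoi):
    """Count transitions between distinct AOIs (numbered 1..n_aoi)."""
    counts = [[0] * n_aoi for _ in range(n_aoi)]
    for src, dst in zip(aoi_sequence, aoi_sequence[1:]):
        if src != dst:
            counts[src - 1][dst - 1] += 1
    return counts

def density(matrix):
    """Proportion of non-zero cells; a low value indicates a sparse,
    directed scan, a high value a more exhaustive scan."""
    cells = [c for row in matrix for c in row]
    return sum(1 for c in cells if c > 0) / len(cells)

# A scan locked on AOI 1, with occasional checks of AOIs 2, 3 and 4
scan = [1, 2, 1, 3, 1, 2, 1, 4, 1]
print(density(transition_counts(scan, n_aoi=4)))  # 6 of 16 cells used -> 0.375
```

The same density value can thus come from efficient or deficient monitoring, which is exactly the ambiguity raised by the reviewer.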

Regarding the k coefficient, authors stated that “Values of Ki close to zero indicate relative similarity between dwell durations and transition amplitudes.” It would be interesting to know if this comes from long dwelling periods followed by large saccades or short dwells and small saccades, since they are probably generated by very different cognitive/physiological states.

• Reviewer #3 is right; future studies should investigate this question. We added:

o “It is worth noting that the values of the K coefficient should be interpreted together with dwell duration results because different groups can have different average values of dwell duration and transition amplitudes.”
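As a rough illustration of the reviewer's point, the per-event K values can be computed as a standardized dwell duration minus the standardized amplitude of the following transition (a sketch in the spirit of the K coefficient literature; the dwell and amplitude samples below are hypothetical):

```python
import statistics

def k_series(dwells, next_amplitudes):
    """K_i = z(dwell_i) - z(amplitude_{i+1}); positive values indicate
    focal events (long dwell, short transition), negative values ambient
    ones. Standardization uses the mean and SD of the trial itself, so
    the grand mean of K is near zero: compare window or group averages."""
    mu_d, sd_d = statistics.mean(dwells), statistics.stdev(dwells)
    mu_a, sd_a = statistics.mean(next_amplitudes), statistics.stdev(next_amplitudes)
    return [(d - mu_d) / sd_d - (a - mu_a) / sd_a
            for d, a in zip(dwells, next_amplitudes)]

# Two focal events (long dwells, small transitions), then two ambient ones
ks = k_series([400, 380, 120, 100], [1.0, 0.8, 8.0, 9.0])
print([round(k, 2) for k in ks])
```

Because either a long dwell or a short transition can push K_i up, the individual dwell and amplitude distributions are indeed needed to tell the two situations apart.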

Finally, there is an analysis using n-grams that is used on the data but not described here, which is extremely confusing.

• To the authors' knowledge, this is the first time that the n-gram comparison is used in a study involving eye tracking data, thus we could not cite any reference, and the n-gram analysis was described in the method/data processing section only. However, we agree with reviewer #3 that a description was needed; this metric is now depicted in the table.

3. Methods:

Authors defined the flight scenarios used, but never the abbreviations used across the rest of the manuscript.

• The abbreviations concerning the flight scenarios were defined in the figure captions. But reviewer #3 is right, and we have now added the abbreviations in the method section as well to ease the reading.

o “Participants manually (i.e., without the autopilot) performed the same landing scenario three times, under three different conditions. The “control scenario” (CS) was a nominal landing without a supplementary task. The “easy dual task scenario” (EDTS) and the “hard dual task scenario” (HDTS) were similar to the “control scenario” except that participants were asked to perform a supplementary monitoring task.”

As far as I can tell, there is not a catch-all AOI being defined (i.e., subject is looking somewhere the design did not account for). Is there a significant amount of time spent looking outside these AOIs?

• This is a very good question. We ensured that there is not a significant amount of time spent looking outside the AOIs. The graph below shows the percentage of time spent outside an AOI for each participant, numbered from 1 to 16. We added this graph in the supplementary materials.

The eye tracking data section is probably the most important on the manuscript and it is very hard to understand. Authors should rewrite it carefully to improve readability.

• This section has been rewritten to improve clarity and readability.

o “Figure 3 shows the entire eye tracking analysis pipeline. Each AOI was coded using numbers from 1 to 10 corresponding to the flight instruments (see \\ref{Fig3}). Only AOI-based data were extracted in this experiment and concatenated to obtain two chronological vectors containing the indices of the visited AOIs (from 1 to 10) and the time spent on them. Dwells shorter than 200 ms \\cite{bib53} were discarded. Furthermore, consecutive fixations in the same area were merged (e.g., for 1, 1, 4, 4, 5, 5, 5, 6 we only consider 1, 4, 5, 6). The transition vector (the vector containing the transitions between AOI numbers) was used to compute LZC and GTE. Concerning the transition matrices, given their high dimensionality, it is difficult to use classical inferential statistics. Therefore, we applied machine learning algorithms to the concatenated transition matrices to compare the two groups of participants (novices vs. pilots). \\hl{Various machine learning model types were used (SVM, LDA, K-Nearest Neighbors; for a review see} \\cite{bib67}). \\hl{The algorithm achieving the best accuracy (Cosine KNN) was selected in this paper}. The transition probabilities from one AOI to another were taken as features, raising the number of features to a total of 100 (i.e., 10 AOIs × 10 AOIs). A principal component analysis (PCA) was used to reduce the number of features. This restricted the model to 35 features corresponding to the main transition probabilities of the matrices. Five-fold cross-validation was used, which is a good trade-off between bias and variance estimation \\cite{bib54}. According to Combrisson and Jerbi \\cite{bib55}, the theoretical chance level for classification at $p$ $<$ 0.05 with two classes is around 58\\%. Concerning the K coefficient, the transition entropy, and the Lempel-Ziv complexity, they were computed following the methods of \\cite{bib44}, \\cite{bib50}, and \\cite{bib52}, respectively. Finally, based on the transition vector, the n-gram frequency-based method \\cite{bib56} was used to identify the number of common 3, 4, 5, and 6-gram sequences in each group. After counting the occurrences of given n-grams for each participant, the number of common sequences of each n-gram was calculated for each group (Novices/Pilots).”
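The dwell filtering and merging step described in this pipeline can be sketched in a few lines of Python (a minimal illustration; the 200 ms threshold is the one cited in the text, the sample durations are invented):

```python
def preprocess_dwells(aoi_ids, dwell_ms, min_dwell=200):
    """Drop dwells shorter than min_dwell ms, then merge consecutive
    visits to the same AOI, summing their durations."""
    ids, durations = [], []
    for aoi, ms in zip(aoi_ids, dwell_ms):
        if ms < min_dwell:
            continue
        if ids and ids[-1] == aoi:
            durations[-1] += ms
        else:
            ids.append(aoi)
            durations.append(ms)
    return ids, durations

raw_ids = [1, 1, 4, 4, 5, 5, 5, 6]
raw_ms = [250, 300, 220, 150, 400, 260, 210, 500]
ids, ms = preprocess_dwells(raw_ids, raw_ms)
print(ids)  # the 1, 1, 4, 4, 5, 5, 5, 6 example collapses to [1, 4, 5, 6]
```

The resulting `ids` vector is the transition vector from which LZC, GTE, and the transition matrices are then derived.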


1) Authors defined two types of entropy in the introduction section but did not clarify the one they finally used.

• We agree that this creates useless confusion. We have clarified this point by removing from the text the entropy type that is not used in this paper.

2) Pattern identification has not been explained before.

• We have clarified this point by adding supplementary elements described in the introduction and method sections:

o In introduction: “N-gram analysis is an essential component of many methods in bioinformatics, including genome and transcriptome assembly, metagenomic sequencing, and error correction of sequence reads \\cite{bib66}. Basically, an N-gram model might predict the occurrence of an AOI based on the occurrence of its N–1 previous AOIs. So here, we are answering the question: how far back in the history of a sequence of AOIs should we go to predict the next AOI? For instance, a bigram model (N=2) predicts the occurrence of an AOI given only its previous AOI (as N–1=1 in this case). Similarly, a trigram model (N=3) predicts the occurrence of an AOI based on its previous two AOIs. The common N-gram sequence analysis uses the n-gram frequency-based method \\cite{bib56} to identify the number of common 3, 4, 5, and 6-gram sequences in each group. With this method, it is possible to count the occurrences of N-grams of AOIs for each pilot, which allows comparing, for each N-gram, the intra-group pattern consistency.”

o In method: “The n-gram frequency-based method \\cite{bib56} was used to identify the number of common 3, 4, 5, and 6-gram sequences in each group. After counting the occurrences of given n-grams for each participant, the number of common sequences of each n-gram was calculated for each group (Novices/Pilots).”
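The counting steps above can be sketched as follows (a minimal Python illustration; the AOI labels and sequences are invented):

```python
from collections import Counter

def ngram_counts(aoi_sequence, n):
    """Frequency of each length-n run of consecutive AOI visits."""
    return Counter(tuple(aoi_sequence[i:i + n])
                   for i in range(len(aoi_sequence) - n + 1))

def common_ngrams(group_sequences, n):
    """n-grams present in every participant's sequence of a group."""
    return set.intersection(*(set(ngram_counts(s, n)) for s in group_sequences))

scan = ['OTW', 'VS', 'OTW', 'SPD', 'OTW', 'VS', 'OTW']
print(ngram_counts(scan, 3).most_common(1))  # OTW-VS-OTW occurs twice
```

`common_ngrams` applied per group then gives the number of shared 3- to 6-gram sequences, i.e. the intra-group pattern consistency.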

3) Machine learning models should be described.

• We have added a description concerning Machine learning models:

o “We used different classification algorithms (Support Vector Machine, LDA, decision trees, etc.) to determine which algorithm achieved the best accuracy. The best-performing algorithm (Cosine KNN) was used in this paper to classify expertise based on the transition matrices.”
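As an illustration of the selected approach, a Cosine KNN classifier over flattened transition matrices can be written in a few lines. This is a toy sketch with 2-dimensional feature vectors and invented labels; in the paper the features are the 100 transition probabilities reduced to 35 by PCA:

```python
def cosine_distance(u, v):
    """1 - cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = (sum(x * x for x in u) * sum(y * y for y in v)) ** 0.5
    return 1.0 - dot / norm

def knn_predict(train_X, train_y, x, k=3):
    """Majority label among the k training samples nearest to x."""
    order = sorted(range(len(train_X)), key=lambda i: cosine_distance(train_X[i], x))
    votes = [train_y[i] for i in order[:k]]
    return max(set(votes), key=votes.count)

# Toy training set: each vector stands for a flattened transition matrix
train_X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
train_y = ['novice', 'novice', 'pilot', 'pilot']
print(knn_predict(train_X, train_y, [0.95, 0.05]))  # -> novice
```

Cosine distance only compares the direction of the transition-probability vectors, which makes the classifier insensitive to overall differences in the number of recorded transitions.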

4) A potential thing that could be added is an analysis on metric redundancy. Metrics such as the transition matrices and the transition entropy seem to be measuring very similar things, and kind of the same with the LZC and the n-grams. Is it really useful to have all of them?

• The authors thank reviewer #3 for this very relevant comment. As proposed by another reviewer, we added a table comparing the different metrics (shortcomings, strengths, formula, approach). We believe that each metric has its strengths and weaknesses in bringing an understanding of visual strategies. We added a supplementary explanation in the discussion:

o “All the metrics used in this study allowed characterizing visual scanning. We examined the impact of expertise and flying difficulty on visual scanning strategies. As our results showed, a large number of standard and advanced metrics were sensitive to these two factors. Each metric has its strengths and weaknesses in bringing an understanding of visual strategies. For instance, while a transition matrix measure and an entropy value are closely related, the information presented by one and the other is different. A transition matrix makes it possible to measure the preferred paths when consulting AOIs. It highlights the strength of the links between AOIs, while the entropy reflects the disorder of these transition sequences. The applications of these metrics can also differ. For example, if the aim is to redesign a cockpit panel, transition matrices can be very useful because they give the strength of the relationship between AOIs. This metric can help bring together instruments that are often gazed at consecutively, which would help spare the pilot's visual attention effort. Concerning the LZC and N-gram methods, the N-gram method compares the patterns used within the group, while LZC assesses the compressibility of the patterns (how varied the patterns are).”
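To illustrate the difference in spirit, a simple Lempel-Ziv-style phrase count over an AOI sequence can be sketched as follows (this is one of several LZC variants, not necessarily the exact one used in the paper; the two sequences are invented):

```python
def lz_complexity(sequence):
    """Number of distinct phrases in a simple left-to-right Lempel-Ziv
    parsing: each phrase is the shortest run not seen before. Repetitive
    (compressible) scanpaths yield lower values than varied ones."""
    seen, phrase, count = set(), (), 0
    for symbol in sequence:
        phrase += (symbol,)
        if phrase not in seen:
            seen.add(phrase)
            count += 1
            phrase = ()
    return count + (1 if phrase else 0)  # count a trailing, already-seen run

repetitive = [1, 2] * 8  # a rigid back-and-forth scan
varied = [1, 2, 3, 4, 2, 4, 1, 3, 3, 1, 4, 2, 2, 1, 3, 4]
print(lz_complexity(repetitive), lz_complexity(varied))
```

The repetitive scan parses into fewer phrases than the varied one of the same length, which is exactly the compressibility contrast drawn above between LZC and the n-gram comparison.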

4. Results:

Figure 9 shows differences in transition matrices between expert and novice pilots. Which one has the more homogeneous distribution? Authors need to state it in the results section. Also, it seems like most of the novice complexity comes from exploration of AOIs 1-5. Is this relevant?

• Reviewer #3 is right; we added other elements in the text to discuss these results.

o The differences in transition matrices between novices and pilots mainly consist in a sparser distribution of transition probabilities from one instrument to another in the pilot group. Most of the AOIs explored by the novice group were concentrated in the PFD (AOIs 1 to 5, see Fig 3), while the pilot group explored other combinations of AOIs.

The true positive-false positive seems redundant, since it has the exact same information as the confusion matrix.

• This is the standard diagram for presenting machine learning results. We propose to keep it, but if reviewer #3 is strongly opposed to it, we can delete it.
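As background on the classification discussed here, a minimal sketch of KNN classification under cosine distance on flattened transition matrices (toy data, invented for illustration; this is not the authors' actual pipeline):

```python
import numpy as np

def cosine_knn_predict(train_X, train_y, query, k=3):
    """Classify a flattened transition matrix by majority vote among
    its k nearest training samples under cosine distance
    (1 - cosine similarity)."""
    def cosine_dist(u, v):
        return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    dists = np.array([cosine_dist(x, query) for x in train_X])
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy flattened transition matrices: novice-like scanning concentrates
# transitions on the first AOIs, expert-like scanning spreads them out
train_X = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.8, 0.2, 0.0, 0.0],
                    [0.25, 0.25, 0.25, 0.25],
                    [0.3, 0.2, 0.3, 0.2]])
train_y = np.array(["novice", "novice", "expert", "expert"])
query = np.array([0.9, 0.1, 0.0, 0.0])
print(cosine_knn_predict(train_X, train_y, query))  # -> novice
```

Cosine distance compares the *shape* of the transition distribution rather than its magnitude, which suits row-normalized transition matrices.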

The focal-ambient K coefficient showed that attention was dominantly focal (positive value) in both groups, but not in all scenarios. Is that correct? In the hard dual-task scenario, authors found that dual-task changed the ambient-focal strategy of the novices, while experienced pilots kept their strategy consistent across experimental scenarios. Any idea as to why?

• Yes, this is indeed the case. These results and their interpretation are covered in the discussion section, where we added more details to clarify this point.

o Our explanation for this result is that during the easy dual task, novices can set up visual strategies that allow them to both fly the aircraft and perform the dual task. As shown in both groups (novices and pilots), the addition of the dual task required the participants to look at an extra zone (the ND zone), which would explain the ambient mode observed in both groups. However, during the difficult dual task, time pressure prevented the novices from checking the value displayed in the ND zone as frequently as they should have, which led to more omission errors than in the easy dual task. The consequence of this visual behaviour is a shift toward a more focal mode, whereas the pilots managed to maintain their visual strategies under high time pressure.
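For reference, the ambient/focal coefficient K discussed here can be sketched as follows, under the definition of Krejtz et al. (2016): each fixation's z-scored duration minus the z-scored amplitude of the saccade that follows it. The toy trial values are invented for illustration; this is not the authors' pipeline.

```python
import numpy as np

def coefficient_k_series(fix_durations_ms, sacc_amplitudes_deg):
    """Per-fixation coefficient K_i = z(duration_i) - z(amplitude of
    the saccade following fixation i) (Krejtz et al., 2016).
    Averaged over an analysis window, K > 0 indicates focal attention
    (long fixations, short saccades) and K < 0 ambient attention.
    Z-scores are taken over the whole trial, so the window mean of
    K_i, not the full-trial mean, is the informative quantity."""
    d = np.asarray(fix_durations_ms, dtype=float)
    a = np.asarray(sacc_amplitudes_deg, dtype=float)  # len(a) == len(d) - 1
    zd = (d - d.mean()) / d.std()
    za = (a - a.mean()) / a.std()
    # Pair fixation i with the saccade that follows it
    return zd[: len(za)] - za

# Toy trial, invented for illustration: 6 fixations, 5 saccades
k_series = coefficient_k_series([200, 250, 300, 180, 220, 400],
                                [5, 2, 8, 3, 1])
```

A fixation followed by a small saccade yields a positive K_i (focal sampling of nearby instruments); one followed by a large saccade yields a negative K_i (ambient sweeping across the panel).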

Attachment

Submitted filename: Response to Reviewers.pdf

Decision Letter 1

Peter James Hills

1 Feb 2021

Visual scanning strategies in the cockpit are modulated by pilots' expertise: a flight simulator study

PONE-D-20-21444R1

Dear Dr. LOUNIS,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Peter James Hills, PhD

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

Reviewer #3: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: No

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: (No Response)

Reviewer #2: The authors have addressed all my comments and it was a pleasure to read the revised version of the manuscript.

I am pleased to endorse this manuscript.

Reviewer #3: The authors have done a very thorough job in revising the manuscript and addressing my concerns. I believe that this manuscript is now suitable for publication, although I would urge authors to look for editing help from someone with full professional proficiency in English and to ensure the reference list is up to date.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

Acceptance letter

Peter James Hills

3 Feb 2021

PONE-D-20-21444R1

Visual scanning strategies in the cockpit are modulated by pilots’ expertise: a flight simulator study 

Dear Dr. Lounis:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr Peter James Hills

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. Time spent gazing outside the defined AOI for each participant during all scenarios.

    (TIF)

    S1 Data

    (XLSX)

    S2 Data

    (XLSX)

    S3 Data

    (XLSX)

    S4 Data

    (XLSX)

    S5 Data

    (XLSX)

    S6 Data

    (XLSX)

    Attachment

    Submitted filename: Response to Reviewers.pdf

    Data Availability Statement

    All relevant data are within the manuscript and its Supporting information files.

