Low-level factors increase gaze-guidance under cognitive load: A comparison of image-salience and semantic-salience models

Kerri Walter; Peter Bex

doi:10.1371/journal.pone.0277691

2022 Nov 28;17(11):e0277691.

Editor: Marcela de Lourdes Peña Garay

PMCID: PMC9704686 PMID: 36441789

Abstract

Growing evidence links eye movements and cognitive functioning; however, there is debate concerning what image content is fixated in natural scenes. Competing approaches have argued that low-level/feedforward and high-level/feedback factors contribute to gaze-guidance. We used one low-level model (Graph Based Visual Salience, GBVS) and a novel language-based high-level model (Global Vectors for Word Representation, GloVe) to predict gaze locations in a natural image search task, and we examined how fixated locations during this task vary under increasing levels of cognitive load. Participants (N = 30) freely viewed a series of 100 natural scenes for 10 seconds each. Between scenes, subjects identified a target object from the scene a specified number of trials (N) back among three distracter objects of the same type but from alternate scenes. The N-back was adaptive: N-back increased following two correct trials and decreased following one incorrect trial. Receiver operating characteristic (ROC) analysis of gaze locations showed that as cognitive load increased, there was a significant increase in prediction power for GBVS, but not for GloVe. Similarly, there was no significant difference in the area under the ROC between the minimum and maximum N-back achieved across subjects for GloVe (t(29) = -1.062, p = .297), while there was a coherent upward trend for GBVS (t(29) = -1.975, p = .058) that did not reach significance. A permutation analysis showed that gaze locations were correlated with GBVS, indicating that salient features were more likely to be fixated. However, gaze locations were anti-correlated with GloVe, indicating that objects with low semantic consistency with the scene were more likely to be fixated. These results suggest that fixations are drawn towards salient low-level image features and this bias increases with cognitive load. Additionally, there is a bias towards fixating improbable objects that does not vary under increasing levels of cognitive load.

Introduction

Fixational eye movements in natural scenes can be driven by both bottom-up, low-level factors and top-down, high-level factors. Low-level factors include basic sensory features, such as contrast, edges, brightness, and color. Several research groups have developed low-level feature-based models in which the probability of a location being fixated is correlated with image salience [1–4]. Other groups have developed high-level semantic salience based models in which the probability of an image location being fixated is correlated with information-based maps of image and task content [5–11].

Image salience

While free-viewing a scene, some evidence suggests that gaze [1–4] and overt attention [4] follow a path based on areas that are visually salient, i.e., areas with high local variation in image statistics. A model for visual saliency was originally proposed by Itti, Koch, & Niebur [12] in which local center-surround difference maps of linear filter responses to color, intensity and orientation are combined and weighted to generate local feature salience maps, the peaks of which are sequentially fixated. Many elaborations of this general approach have now been proposed; for recent reviews, see [13, 14]. In this paper, we employ graph-based visual saliency (GBVS), which is one such example that combines three main low-level features: color, edge-orientation, and intensity [2]. We chose the GBVS model due to its robustness and documented ability in predicting human fixations, as well as its accessibility as an open-source toolbox.

Semantic salience

Contrary evidence suggests that gaze follows a path based on the meaning and context of the setting [5, 15–18] or locations that are relevant for future action [19–21]. These findings support a theory of top-down processes guiding gaze deployment, in which prior experience, knowledge of the world [22], and task objectives [19, 23–25] guide where we look to find relevant information. Note that these changes in knowledge and task are unaccompanied by any change in low-level salience [15, 18, 26]. In this paper, we develop a language-based method utilizing Global Vectors for Word Representation (GloVe) [27] to compute object information based on descriptors of image content [6, 10].

Cognitive impairment

There is evidence to suggest eye movements and cognitive functioning are interconnected, as individuals with cognitive impairments exhibit abnormal oculomotor behaviors when performing certain tasks. For example, eye movement related correlates have been identified for neurodegenerative diseases such as Alzheimer’s, where viewing strategies become more erratic, and saccades and smooth pursuit are slowed [28–31] (for review see [32]). Similarly, children with Autism Spectrum Disorder execute fewer eye movements when processing language and social cues [33, 34], coupled with an exaggerated center-bias when viewing images. It has been proposed that these patterns are driven by reduced attention to gaze locations related to social information processing [34]. In related work, we have demonstrated that oculomotor parameters in neurotypical control subjects are similarly affected by increasing levels of cognitive load. In a demanding visual search task, we observed a decrease in the number of fixations and saccades, coupled with the lengthening of individual fixation durations, under increasing levels of cognitive demand in neurotypical subjects [35].

Present study

The amount of effort actively invoked by working memory is called cognitive load [36]. Working memory is often measured with N-back tasks as numerous studies have shown that increasing the demands of an N-back task is associated with increased activity in brain regions associated with working memory [37–41]. Based on these findings, we designed an adaptive N-back protocol that conforms to each subject’s individualized working memory load, thus pushing subjects to their discrete maximum cognitive capacity.

Previously, we demonstrated that the manipulation of cognitive load alters the oculomotor characteristics of a participant’s search [35]. In the present study, we re-examine the data from the previous paradigm to examine whether cognitive load also alters the low-level and high-level scene content that is fixated. Because top-down processing is affected by cognitive load [42], we hypothesize that the relative contributions of low-level and high-level factors involved in the guidance of gaze will be affected by changing cognitive load. Specifically, we predict that gaze will be guided by semantically relevant objects within the scene (top-down), but as cognitive load increases, gaze will be guided by more visually salient features (bottom-up). Furthermore, we hypothesize that individual differences in working memory capacity (subjects who have high or low performance in the present cognitive load task), will be associated with different viewing strategies. Thus, individuals who are able to identify images from many trials back may utilize more efficient strategies to view and encode images in memory that may be reflected in the information selected. In order to predict gaze guidance, we will use the GBVS model to determine which areas of an image are visually salient and the GloVe model to determine which areas are semantically salient. Because we predict a shift in gaze strategy, we hypothesize that the GloVe model will be a better predictor of gaze under low cognitive load, and GBVS will be a better predictor under high cognitive load.

Methods

Apparatus

Our task was programmed using MATLAB (The MathWorks, Inc., Natick, MA) with Psychtoolbox [43] and analyzed using MATLAB with the Text Analytics and Statistics and Machine Learning toolboxes. The experiment was run on a Dell OptiPlex 9020 desktop computer (Dell Inc. Round Rock, TX) with a Quadro K420 graphics card (nVidia, Santa Clara, CA). Stimuli were presented on a 60 cm x 34 cm BenQ XL2720Z LCD monitor (BenQ Corporation, Taipei, Taiwan) set to a screen resolution of 1,920 × 1,080 pixels at 120 Hz. A chinrest was utilized to stabilize the head position of participants, who were seated 63 cm from the screen (width = 50.9°). Eye movements were recorded using an Eyelink 1000 (SR Research Ltd. Mississauga, Ontario, Canada) with the MATLAB Eyelink Toolbox [44], where the sampling rate was set to 1,000 Hz. We used the built-in Eyelink nine-point calibration and validation procedures at the beginning of the experiment and in between blocks.
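The reported screen width of 50.9° can be checked from the monitor width and viewing distance given above. The following is a minimal sketch, assuming the standard flat-screen formula for an extent centred on the line of sight; the function names are illustrative and not taken from the authors' code, and the pixels-per-degree figure is a screen-wide average rather than a value reported in the paper.

```python
import math


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Full visual angle (degrees) of an extent centred on the line of sight."""
    return math.degrees(2 * math.atan((size_cm / 2) / distance_cm))


def pixels_per_degree(size_cm: float, distance_cm: float, size_px: int) -> float:
    """Average pixels per degree across the whole extent (an approximation)."""
    return size_px / visual_angle_deg(size_cm, distance_cm)


# Values from the Apparatus description: 60 cm wide screen, 63 cm away, 1,920 px wide.
width_deg = visual_angle_deg(60.0, 63.0)
ppd = pixels_per_degree(60.0, 63.0, 1920)

print(f"screen width: {width_deg:.1f} deg")
print(f"average resolution: {ppd:.1f} px/deg")
```

The computed width agrees with the 50.9° stated in the text to within rounding.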

Stimuli

We selected 100 images from the LabelMe database [45] (see [35] for selection criteria), a collection of natural scenes in which objects have been outlined and labeled by human volunteers. These scenes and their corresponding labeled annotations are made available for public use and provided us with unbiased, pre-labeled images. We selected 50 indoor and 50 outdoor images from the LabelMe database, requiring each scene to contain at least 15 unique objects, to have at least 75% of the image labeled, and to be a large, clear image. All 100 images were presented in random order to each subject. When presented to participants, all images were resized without cropping or changing the aspect ratio to approximately 1,280 x 960 pixels. An example of an indoor and an outdoor scene used in the experiment can be found in Fig 1.

Fig 1 — Subjects viewed a scene for 10 seconds. The scene was then replaced with a 4AFC Task that was present until the subject responded. The task was to select by mouse click, the object from the image N back in the stream (in this case N = 0, referring to the scene directly preceding the 4AFC Task). For a depiction over a longer course of trials, see [35].

Participants

We recruited a total of 33 subjects (7 male, 26 female) with self-reported normal or corrected vision from the Northeastern undergraduate population to participate in this study and excluded 3 subjects due to program crashes (N = 2) or Eyelink calibration issues (N = 1); thus, we analyzed data for 30 included subjects (7 male, 23 female). We determined a stopping number of 30 before data collection, and tested until 30 usable subjects were obtained. Course credit was given as compensation for each subject’s time. Written consent was obtained before the experiment, where all subjects read and signed an informed consent form. For any subjects under the age of 18, written consent was obtained from a parent or guardian. This experiment was performed in accordance with the tenets of the Declaration of Helsinki, and the experimental procedure was approved by the institutional review board at Northeastern University, IRB #: 14-09-16 (Psychophysical Study of Visual perception and Eye Movement Control).

Procedure

As described in [35], participants were shown a scene for 10 seconds and were instructed to view the scene freely. The scene was then replaced with a 4 alternative forced choice task (4AFC) composed of similar objects (i.e., having the same word label), where one object was taken from the target scene and three distracter objects were taken from random scenes within the experiment. For example, if the object were a car, the 4AFC stimulus would contain four cars, where the target car was taken from the target scene, and the other three cars were taken from alternate scenes within the experiment. Objects presented in the 4AFC task were resized to 300 pixels along their longest dimension, while maintaining their original aspect ratio. To prevent overmagnification in the 4AFC display, only objects larger than 100 x 100 pixels were chosen as targets or distracters. Objects were spliced from a rectangular section containing 10% of the surrounding background. This was done to provide minimal context of the object background, because in pilot studies we found that the task was too difficult to complete with no object context. The target object was chosen at random every trial, and the target scene was dependent on the N-back task.
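The object-preparation rules above (size filter, resize to 300 pixels on the longest side, and a crop with some surrounding background) can be sketched as follows. This is an illustration only: the function names are hypothetical, the size filter is assumed to require both dimensions to exceed 100 pixels, and the "10% of the surrounding background" is assumed here to mean a bounding box grown by 10% in each dimension, which the text does not specify.

```python
def is_eligible(width_px: int, height_px: int, min_side: int = 100) -> bool:
    """Assumed reading of 'larger than 100 x 100 pixels': both sides exceed 100."""
    return width_px > min_side and height_px > min_side


def resized_dims(width_px: int, height_px: int, longest: int = 300) -> tuple:
    """Scale so the longest side equals `longest`, keeping the aspect ratio."""
    scale = longest / max(width_px, height_px)
    return round(width_px * scale), round(height_px * scale)


def padded_box(x0, y0, x1, y1, img_w, img_h, pad=0.10):
    """Grow a bounding box by `pad` of its size (assumption), clamped to the image."""
    dx = (x1 - x0) * pad / 2
    dy = (y1 - y0) * pad / 2
    return (max(0, round(x0 - dx)), max(0, round(y0 - dy)),
            min(img_w, round(x1 + dx)), min(img_h, round(y1 + dy)))


# Hypothetical 200 x 100 px object inside a 1,280 x 960 px scene.
box = padded_box(100, 100, 300, 200, 1280, 960)
print(box, resized_dims(box[2] - box[0], box[3] - box[1]))
```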

The N-back began at 0, meaning the target object was selected from the scene directly preceding it. Whenever a subject answered two 4AFC trials correctly in a row, they received a prompt informing them that the N-back was increased by 1, with no maximum. The target object would then be selected from the scene 1 back in the image stream. If at any point a subject answered incorrectly, the N-back was decreased by 1, and the subject received a prompt informing them that the N-back was decreased by 1, with a minimum of 0. Subjects received feedback for 750ms, then a new scene was presented, and the cycle continued. Eye movements were recorded at 1000Hz. This adaptive paradigm ensured that all subjects were tested at their individual maximum levels of cognitive load (Fig 1). Lighting conditions were controlled for using blackout curtains, all subjects were tested in the same experimental room using the same equipment, and the luminance of our screen remained constant in an otherwise dark room [46].
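The adaptive rule described above (up one level after two consecutive correct responses, down one level after any error, floor of 0, no ceiling) can be sketched as a small staircase. The names below are illustrative, and one detail is an assumption not stated in the text: the correct-response streak is reset to zero after each level increase.

```python
def update_n_back(n: int, correct: bool, streak: int) -> tuple:
    """Return (new_n, new_streak) after one 4AFC response."""
    if correct:
        streak += 1
        if streak == 2:           # two correct in a row: raise the level
            return n + 1, 0       # assumption: streak restarts after an increase
        return n, streak
    return max(0, n - 1), 0       # any error: lower the level, never below 0


def run_staircase(responses):
    """Return the N-back level in force on each trial, and the level after the last one."""
    n, streak, history = 0, 0, []
    for correct in responses:
        history.append(n)
        n, streak = update_n_back(n, correct, streak)
    return history, n


# Hypothetical response sequence: four correct, then three errors.
levels, final_n = run_staircase([True, True, True, True, False, False, False])
print(levels, final_n)
```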

Image salience: Graph Based Visual Saliency (GBVS)

We used the GBVS model developed by Harel, Koch, and Perona [2], implemented in MATLAB (https://www.mathworks.com/products/matlab.html) with the default parameters, to compute image salience maps. The GBVS model highlights areas of a scene that have high image salience by using channels of color, orientation, and intensity to create saliency heatmaps. Three individual conspicuity maps are created, coding local variation in each channel, and they are averaged together to create a single salience heatmap [2]. We applied Gaussian smoothing proportional to eye tracking precision, as described below.

Semantic salience: Global Vectors for Word Representation (GloVe)

For the semantic-based analysis, we highlighted areas of a scene that have high semantic salience by calculating the semantic similarity between a descriptive label for each scene and the label for each object in the scene supplied with each LabelMe image. We used PlacesCNN [47] to generate the scene labels for each of our 100 selected images. PlacesCNN is an algorithm that has been trained on a database of various indoor and outdoor scenes. PlacesCNN analyzes an image based on the content in the scene and provides a scene label of what it assumes the scene location is. For example, a scene with a single desk and computer might be best matched with the scene label “home_office”. PlacesCNN returns scene labels in descending order of most-likely scene description, and we selected the top five most-likely scene labels for each of our images. For a list of selected PlacesCNN labels, see S2 Table.

Because the LabelMe database contains noise in the form of junk labels or invalid objects [10], we used the criteria in Table 1 to manually edit object labels before performing our semantic analysis.

Table 1. List of criteria for manually editing object labels.

Criteria	Examples
Removed non-words	“gtreve”, “aqq”, “df”, “44”
Removed test objects	Areas labeled “test”
Fixed spelling errors	“coffe marker” → “coffee maker”
Separated conjoined words	“personwalking” → “person walking”
Removed unnecessary adjectives	“frontal”, “occluded”, “crop”, “side”
Removed obscure shapes	“triangle” over non-discrete area
Fixed mislabeled/duplicate label objects	“big brother” over “traffic light”
Removed scribble* objects	“sheep pen”

Criteria and some examples of how the object labels were edited. A full list of these edits can be found in the S1 Table.

*Scribble objects are a type of add-on object in LabelMe. In all cases, these labels were either depicting additional incorrect objects, or were duplicates of properly labeled objects.

We then used the GloVe dataset developed by Stanford University [27] to calculate the similarity between each object in a scene and each of the top five scene labels. GloVe is a pre-trained regression model that uses both global matrix factorization and local context window methods. GloVe is trained on large web-based datasets; we used the Common Crawl dataset, comprising 840 billion tokens and 2.2 million words. GloVe categorizes words along feature dimensions to create a similarity web of terms; the angles and vector lengths of the compared words are then used to obtain a semantic similarity value between 0 (not similar) and 1 (identical) for any pair of words. For example, the words “office” and “desk” return a semantic similarity value of 0.6319, while “office” and “parrot” return a value of 0.0673. All pixels within the mask area for each object were assigned the semantic similarity value between that object and scene label, and the final semantic salience map was compiled from the average of the individual maps created from the five scene labels.
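The map construction described above can be sketched in a few lines. This is a toy illustration, not the authors' MATLAB code: it assumes that the similarity measure is cosine similarity clipped at 0, that later objects overwrite earlier ones where masks overlap, and it uses made-up two-dimensional vectors in place of real GloVe embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def semantic_map(height, width, objects, scene_labels, vectors):
    """Average, over scene labels, of maps whose pixels hold object-scene similarity.

    objects: list of (label, mask) pairs, where mask is a set of (row, col) pixels.
    """
    acc = [[0.0] * width for _ in range(height)]
    for scene in scene_labels:
        layer = [[0.0] * width for _ in range(height)]
        for label, mask in objects:
            sim = max(0.0, cosine_similarity(vectors[label], vectors[scene]))
            for r, c in mask:
                layer[r][c] = sim  # assumption: later objects overwrite overlaps
        for r in range(height):
            for c in range(width):
                acc[r][c] += layer[r][c] / len(scene_labels)
    return acc


# Toy vectors (hypothetical, not real GloVe values).
TOY = {"office": [1.0, 0.0], "desk": [0.8, 0.6], "parrot": [0.0, 1.0]}
toy_map = semantic_map(1, 2, [("desk", {(0, 0)}), ("parrot", {(0, 1)})], ["office"], TOY)
print(toy_map)
```

In the toy example the "desk" pixel receives a high value and the "parrot" pixel a value of zero, mirroring the direction of the office/desk and office/parrot example in the text.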

The GloVe dataset does not contain dual words, for example there is no “window blinds”, but it does contain “window” and “blinds” as separate entities. In order to generate a single word for dual-word objects, we calculated the individual vectors for all components of a multi-word, then used vector math to calculate the closest common word between the multiple vectors. In this example, “window” was the closest relevant word between “window” and “blinds”. We performed this step for both objects and scene labels.
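One way to picture the multi-word step is sketched below. The text does not specify the "vector math" used, so this example assumes the component vectors are averaged and the vocabulary word with the highest cosine similarity to that average is returned; the vectors are made up for illustration and are not real GloVe values.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def closest_word(label, vectors):
    """Map a (possibly multi-word) label to the single nearest vocabulary word.

    Assumption: 'nearest' means highest cosine similarity to the mean of the
    component word vectors.
    """
    parts = label.split()
    dim = len(vectors[parts[0]])
    mean = [sum(vectors[p][i] for p in parts) / len(parts) for i in range(dim)]
    return max(vectors, key=lambda word: cosine_similarity(vectors[word], mean))


# Toy vocabulary (hypothetical vectors).
TOY = {"window": [1.0, 0.5], "blinds": [0.9, 0.1], "parrot": [0.0, 1.0]}
print(closest_word("window blinds", TOY))
```

With these toy vectors the two-word label resolves to "window", matching the outcome described in the text, although the real result depends on the actual embeddings.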

An alternative high-level approach entails hand-labeled analysis of image regions to create meaning-maps [5]. This method requires intensive labeling by several human raters, which is impractical for many applications, especially movies. Because the ratings are completed without context or task, this model, like low-level image salience models, is unable to deal with fixation changes with task [18]. By using manually-assigned estimates of meaning, human labelers employ an unknown combination of low-level features and high-level semantic factors that side-steps the problem of these two sources of gaze-guidance and is therefore unsuitable for the present analysis.

Spatial smoothing

We applied Gaussian smoothing to the heatmaps of both GBVS and GloVe models based on the average manufacturer error reported in the Eyelink 1000 manual. Eyelink reports between 0.25° and 0.50° of error, so we used a standard deviation of 0.375° for the GBVS and GloVe maps. This Gaussian smoothed the transitions between areas of differing significance within the image. We normalized all salience maps from 0–1 before performing an ROC analysis (Fig 2).
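The smoothing and normalization steps can be sketched as a separable Gaussian blur followed by min-max scaling. This is a pure-Python illustration under stated assumptions: the conversion from 0.375° to pixels uses the screen-wide average of 1,920 pixels over 50.9°, the kernel is truncated at three standard deviations, and edges are handled by clamping; none of these details are given in the text.

```python
import math

# Assumed conversion: screen-average pixels per degree from the Apparatus section.
sigma_px = 0.375 * 1920 / 50.9


def gaussian_kernel(sigma, truncate=3.0):
    """Normalised 1D Gaussian weights out to `truncate` standard deviations."""
    radius = int(truncate * sigma + 0.5)
    weights = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_rows(grid, kernel):
    """Convolve every row with the kernel, clamping indices at the edges."""
    radius = len(kernel) // 2
    out = []
    for row in grid:
        n = len(row)
        out.append([
            sum(w * row[min(max(c + k - radius, 0), n - 1)] for k, w in enumerate(kernel))
            for c in range(n)
        ])
    return out


def transpose(grid):
    return [list(col) for col in zip(*grid)]


def smooth(grid, sigma):
    """Separable 2D Gaussian blur: rows first, then columns."""
    kernel = gaussian_kernel(sigma)
    return transpose(smooth_rows(transpose(smooth_rows(grid, kernel)), kernel))


def normalise(grid):
    """Rescale values to the range 0-1."""
    lo = min(min(r) for r in grid)
    hi = max(max(r) for r in grid)
    if hi == lo:
        return [[0.0] * len(r) for r in grid]
    return [[(v - lo) / (hi - lo) for v in r] for r in grid]


# Toy 5 x 5 map with a single salient pixel, blurred with a small sigma.
toy = [[0.0] * 5 for _ in range(5)]
toy[2][2] = 1.0
blurred = normalise(smooth(toy, 1.0))
print(f"sigma in pixels: {sigma_px:.1f}")
```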

Fig 2 — An example image (a) was analyzed with b) low-level image salience and c) high-level semantic salience, see text for details. Areas with yellow values denote higher relevance coded by each model. A Gaussian of 0.375° was implemented to smooth the heatmap edges between distinct saliencies. Blue circles represent fixations, red dotted lines represent saccades. Heatmap plots demonstrate representative gaze data from 2 individual subjects during the same trial.

Results

We were successful in actively engaging and increasing working memory load. An increase in working memory load can be measured using response time [38, 39, 41], as well as pupil dilation [48] (for details, see [35]). As N-back increased, we observed a significant increase in response time compared to N = 0 (t-tests comparing N = 0 with all other N-backs; all were significant at p < .01 except N = 9 and N = 10, owing to low sample size), as well as a significant increase when comparing subjects’ low-load (N = 0) and high-load (N = maximum achieved) conditions (t(29) = −5.717, p < .001).

For ROC analysis, we set levels of specificity as 100 steps from 0 to 1 (in correspondence with the range of salience values). We quantified true positives as the areas of heatmap within each specificity step that do contain gaze points, true negatives as areas not within the current specificity that do not contain gaze points, false positives as areas within the current specificity that do not contain gaze points, and false negatives as areas not within the current specificity that do contain gaze points. The area under our ROC curves (AUROC) is the prediction power of each map and quantifies how well each heatmap predicted the gaze locations of our subjects [49].
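The ROC procedure above can be sketched by sweeping a threshold over the normalised salience map and counting fixated and non-fixated pixels above it. This is a minimal illustration, assuming pixels are the unit of analysis, thresholds run from 0 to 1 in 100 equal steps, the curve is anchored at (0, 0), and the area is computed with the trapezoid rule; the function names are hypothetical.

```python
def roc_points(salience, fixated, steps=100):
    """Return (false positive rate, true positive rate) pairs for each threshold.

    salience: 2D list of values in [0, 1]; fixated: set of (row, col) gaze pixels.
    """
    pos = [salience[r][c] for r, c in fixated]
    neg = [v for r, row in enumerate(salience)
           for c, v in enumerate(row) if (r, c) not in fixated]
    points = [(0.0, 0.0)]  # anchor so the curve always starts at the origin
    for i in range(steps + 1):
        t = i / steps
        tpr = sum(v >= t for v in pos) / len(pos)   # TP / (TP + FN)
        fpr = sum(v >= t for v in neg) / len(neg)   # FP / (FP + TN)
        points.append((fpr, tpr))
    return points


def auroc(points):
    """Trapezoid-rule area under the ROC curve."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Toy maps: one that predicts the fixated pixel perfectly, one uninformative.
perfect = auroc(roc_points([[1.0, 0.0], [0.0, 0.0]], {(0, 0)}))
chance = auroc(roc_points([[0.5, 0.5], [0.5, 0.5]], {(0, 0)}))
print(perfect, chance)
```

An AUROC below 0.5, as reported for GloVe, corresponds to fixated pixels tending to have lower map values than non-fixated pixels.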

When comparing performance at maximum and minimum load, we performed paired samples t-tests for each salience model. When comparing performance across all Ns, because of the unbalanced nature of our data (N = 0 having 929 cases while N = 10 has 1 case), we performed a linear mixed effect model analysis across N-back. For a more detailed analysis on the distribution of N-back, see [35].
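For the minimum versus maximum load comparison, the paired-samples t statistic is the mean of the per-subject differences divided by its standard error. The sketch below uses made-up AUROC values, not the study data, and the function name is illustrative.

```python
import math
import statistics


def paired_t(first, second):
    """Paired-samples t statistic and degrees of freedom for first minus second."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical per-subject AUROC values at minimum and maximum load.
min_load = [0.70, 0.72, 0.71, 0.74]
max_load = [0.73, 0.72, 0.76, 0.75]
t_stat, df = paired_t(min_load, max_load)
print(f"t({df}) = {t_stat:.3f}")
```

With 30 subjects the degrees of freedom are 29, as in the reported tests. Under this ordering (minimum minus maximum), a negative t indicates higher values at maximum load.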

Permutation analysis

We performed a permutation analysis to determine if any of the models predicted subjects’ gaze above chance levels. We calculated the AUROCs for corresponding gaze and image for each subject (the subject’s gaze pattern taken from the image they viewed, paired with the heatmap for that same image), and compared each AUROC with those of every non-corresponding gaze and image combination. To calculate the null distribution, we overlaid gaze data with non-corresponding images, and compared the AUROC values for each map. Because the heatmaps from non-corresponding images are unrelated to the locations of features and objects in the image, the gaze data should be uncorrelated with the model prediction of gaze locations.
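The comparison against the null distribution can be summarised as a z-score of the corresponding-pair AUROC relative to the non-corresponding-pair AUROCs. The sketch below assumes a normal approximation with a one-tailed p-value, which is not stated in the text but reproduces the p-values reported for the z-scores in this section; the null values are made up.

```python
import math
import statistics


def z_against_null(observed, null_values):
    """Standardise an observed AUROC against the null distribution."""
    return (observed - statistics.mean(null_values)) / statistics.stdev(null_values)


def one_tailed_p(z):
    """Probability of a standard normal value at least as extreme as z, one tail."""
    return 0.5 * math.erfc(abs(z) / math.sqrt(2))


# Hypothetical null AUROCs from non-corresponding gaze and heatmap pairs.
null_aurocs = [0.40, 0.50, 0.60]
z_toy = z_against_null(0.60, null_aurocs)

# z-scores reported in the text for GBVS and GloVe.
p_gbvs = one_tailed_p(0.734)
p_glove = one_tailed_p(-0.582)
print(f"{p_gbvs:.3f} {p_glove:.3f}")
```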

Fig 3 shows the average AUROC value of corresponding gaze and heatmap pairs for each model across all subjects (red stars), plotted along the distribution of AUROCs for all non-corresponding pairs. The GBVS model had a corresponding mean of .731 (z = 0.734, p = .231), and the GloVe model had a corresponding mean of .458 (z = -0.582, p = .280).

The results of this permutation analysis suggest that during the free viewing intervals of the current task, image salience is a better predictor of gaze compared to semantic salience. The observation that the corresponding image and gaze analysis for GloVe is lower than the midpoint of the null distribution suggests that subjects tend to look at incongruent, or unexpected, semantic objects, implying subjects are actively looking at objects that are not relevant to the scene, although this effect was not significant.

Model summary

Fig 4 shows the overall AUROC for each model, averaged across observers and N-back levels. Overall, the image salience algorithm outperformed the semantic salience model, with a higher average mean AUROC (.731) and lower average standard deviation of AUROC (.079) across subjects. Fixation predictions from the GloVe model did not differ from chance, with a mean AUROC of .458, and had a higher standard deviation of AUROC (.118), demonstrating more variance overall across all subjects. A paired samples t-test shows that these groups are significantly different in both mean (t(29) = 40.662, p < .001) and standard deviation (t(29) = -16.726, p < .001). The results suggest that image salience models provide a better prediction of gaze guidance than the language-based semantic model for this task.

GBVS

Fig 5 shows the area under the ROC (AUROC) for GBVS as a function of N-back for each subject (faint grey) and as box plots for all subjects. A linear mixed effects model analysis showed a significant effect of N-back for GBVS (χ²(1, N = 30) = 8.256, p < .01). This demonstrates that as subjects reach higher N-backs (or as cognitive load increases), gaze is guided more by low-level image salience features. This also suggests that the gaze of subjects who excel at this cognitive load task (those reaching higher N-backs, or those who have high cognitive load capacities) is guided more by low-level image salience features than that of subjects who have lower cognitive load capacities.

There were large individual differences in task performance, with subjects reaching a maximum N-back between 2 and 10, with a median of 5 [35]. We therefore used the trials on which each subject reached their individual highest N-back as their maximum load, and N-back = 0 as the lowest load for each subject. Fig 7A shows the image salience model AUROC for minimum and maximum load for each subject (faint grey) and as box plots for all subjects. There was a trend upwards for image salience from minimum to maximum individualized N-back across subjects, however this did not reach significance (t(29) = -1.975, p = .058). This is consistent with a trend in which gaze becomes more biased towards salient low-level image features as cognitive load increased in the present task.

Fig 7 — Grey lines represent individual traces. Black line represents mean values. Minimum N (minimum load) for all subjects was N = 0, maximum N (maximum load) was variable across subjects. Red lines show median, boxes show interquartile range and whiskers show 95% confidence intervals.

GloVe

Fig 6 shows the AUROC as a function of N-back for GloVe for each subject (faint grey) and as box plots for all subjects. A linear mixed effects analysis showed no effect of N-back for GloVe (χ²(1, N = 30) = 1.487, p = .223). This suggests that the gaze of subjects who excel at this cognitive load task is not biased towards image locations with higher or lower semantic similarity in the scene compared to those who have low cognitive load capacities.

Fig 7B shows the GloVe semantic model AUROC for minimum and maximum load for each subject (faint grey) and as box plots for all subjects. There was no significant difference between AUROC for the minimum and maximum N-back achieved across subjects (t(29) = -1.062, p = .297). This suggests that the semantic image-viewing strategy was unaffected by cognitive load in the present task.

Discussion

We move our eyes 2–3 times per second in order to position our high-resolution fovea on image locations of interest. This process involves oculomotor processes that coordinate these binocular eye movements, sensory processes that encode image features, and cognitive processes that estimate the locations of task-relevant objects. Growing evidence in neurological studies has identified an association between eye movement behavior and cognitive processing, and we recently identified a change in oculometrics as a function of cognitive load in healthy young subjects. Specifically, with increasing cognitive load we observed a decrease in number of fixations and saccades, and an increase in fixation duration [35], an effect that is similar to differences observed between neurotypical and cognitively impaired populations (for review see [32]). In this study, we hypothesized that cognitive load could be associated with changes in low-level and high-level factors involved in viewing strategy.

Due to the nature of our task, which was task independent at the time of scene viewing, subjects did not know which object would become the target when they viewed each scene. This created an implied information-gathering task for scene viewing. Therefore, we assumed that subjects would attempt to generate a broad mental representation of the full scene during the viewing phase, while maintaining a memory of previous scenes for the identification tasks. We hypothesized that as cognitive load increased, gaze guidance would shift towards low-level sensory factors and away from high-level cognitive factors. We quantified sensory gaze guidance with an image salience model, and cognitive gaze guidance with a language-based semantic model.

Overall, the AUROC for image salience was greater than for semantic salience, indicating that low-level features generally predict fixation locations in natural scenes for the present task better than high-level semantic content. The prediction power of the image salience model (GBVS), significantly increased with increasing cognitive load. This suggests that as cognitive load increased, the role of low-level image features in directing fixations increased. Contrarily, the prediction power of the semantic model (GloVe), did not significantly change with cognitive load. The increase in prediction power of the GBVS model across N-back suggests that subjects who excel at this task, or those who were able to reach higher N-backs, utilize a more image salience based viewing strategy compared to those who struggle with this task, or those who could only reach lower N-backs.

In addition to comparing AUROC as a function of N-back, we also examined AUROC at the minimum and maximum N-back achieved on an individual subject basis. This comparison was not significant for either the GBVS or the GloVe model, although it was close to reaching significance for GBVS. However, Fig 7 demonstrates more consistency in the GBVS model, and more consistency in the individual subject trends. Although this overall trend did not reach significance, it demonstrates a more coherent tendency towards an image salience based viewing strategy under increased cognitive load.

An additional explanation could be that task relevance is entirely responsible for shifts in related gaze. It has already been demonstrated that task influences gaze [19, 23–25]. In our paradigm, the task required subjects to utilize memory in order to recall objects within scenes. Because this task is based in semantics, we saw a decrease in gaze within visually salient areas with increased cognitive load. However, how would we expect gaze to shift in a task where an increase in cognitive load requires searching for more visually salient features? For example, walking through a cluttered area at increasing speed–where an increase in cognitive load (speed at which you must scan the environment and avoid obstacles) is reliant on attention to visually salient features. In this case, perhaps the opposite result would be true: an increase in load would result in a decrease in gaze towards semantically salient features. As we only have empirical evidence for a semantically related task we cannot draw conclusions towards a visually salient one, however further research into this area may hold interesting findings on how task influences gaze.

Although there was no significant relationship between AUROC and cognitive load for the GloVe model of semantic salience, our permutation analysis demonstrated that the GloVe model predicted gaze less than chance, meaning there was a negative correlation between semantic salience and the probability of an object being fixated. This finding indicates that observers may be more likely to fixate an ‘outlier’ object that is inconsistent with its semantic context than an object that is semantically related to its context.

A potential confounding factor in our design is the frequency of certain objects in our images. For example, windows occur in both outdoor and indoor scenes, and in the case of large buildings, there are numerous windows in one image. This led to “window” being a common search term due to the random nature of our object selection process, and as a result “window” was chosen more often than other objects. Observant participants may pick up on this trend and change their search strategy to scan for windows. Future iterations of this task should include a failsafe to ensure common objects such as “window” are not chosen as a search term more often than other terms. A solution would be to condition the random selection of search objects upon their frequency, so that the resulting sampling be uniform per unique item.
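The proposed remedy can be sketched as a two-stage draw: first choose a unique label uniformly at random, then choose one instance of that label. This is one possible reading of the suggestion, with hypothetical names and toy data; it makes each unique label equally likely regardless of how many instances appear in the scenes.

```python
import random
from collections import defaultdict


def pick_target(objects, rng):
    """Pick a target so that every unique label is equally likely.

    objects: list of (label, object_id) pairs; rng: a random.Random instance.
    """
    by_label = defaultdict(list)
    for label, obj_id in objects:
        by_label[label].append(obj_id)
    label = rng.choice(sorted(by_label))        # stage 1: uniform over labels
    return label, rng.choice(by_label[label])   # stage 2: uniform within label


# Toy scene set: eight windows, one desk, one lamp.
toy_objects = [("window", i) for i in range(8)] + [("desk", 8), ("lamp", 9)]
rng = random.Random(0)
draws = [pick_target(toy_objects, rng)[0] for _ in range(3000)]
window_share = draws.count("window") / len(draws)
print(f"share of 'window' targets: {window_share:.2f}")
```

Under naive sampling over instances, "window" would be chosen about 80% of the time in this toy set; with the two-stage draw its share is close to one third.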

The GBVS model was a better predictor overall with this task. This suggests that image salience is the driving factor when searching during our task, rather than semantics as quantified by our language-based model. The viewing portion of our paradigm did not have an explicit task, although wide exploration of the scene was implied given that the search item in an upcoming trial was unknown; this finding therefore provides evidence that image salience guides eye movements during exploration of natural scenes. It is possible that semantic models may have greater predictive power when observers are assigned an explicit task, such as to search for a specific object (e.g. window), in which case we would expect observers to look at window-like objects. Additionally, the GBVS model predicted human fixations increasingly well as cognitive load increased. This implies that as cognitive load increases, gaze guidance relies more on sensory factors (image salience) and less on cognitive factors (semantic salience).

Supporting information

S1 Table. List of all manually edited object labels from the LabelMe database.

(XLSX)

S2 Table. Top 5 scene labels obtained from PlacesCNN for each image.

(XLS)

Data Availability

The subject data and images used for this study are available at osf.io/f5dhn.

Funding Statement

PB was funded by grant EY032162 from the NIH Research Project Grant Program (NIH R01 EY032162). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Borji A, Sihite DN, Itti L. Objects do not predict fixations better than early saliency: A re-analysis of Einhauser et al.’s data. J Vis. 2013;13: 1–4. doi: 10.1167/13.10.18
2. Harel J, Koch C, Perona P. Graph-Based Visual Saliency. Adv Neural Inf Process Syst. 2007;19: 545–552. doi: 10.7551/mitpress/7503.003.0073
3. Itti L, Koch C. Computational modelling of visual attention. Nat Rev Neurosci. 2001;2: 194–203. doi: 10.1038/35058500
4. Parkhurst D, Law K, Niebur E. Modeling the role of salience in the allocation of overt visual attention. Vision Res. 2002;42: 107–123. doi: 10.1016/s0042-6989(01)00250-4
5. Henderson JM, Hayes TR, Peacock CE, Rehrig G. Meaning and attentional guidance in scenes: A review of the meaning map approach. Vis Switz. 2019;3. doi: 10.3390/vision3020019
6. Hwang AD, Wang H-C, Pomplun M. Semantic guidance of eye movements in real-world scenes. Vision Res. 2011;51: 1192–1205. doi: 10.1016/j.visres.2011.03.010
7. Nyström M, Holmqvist K. Semantic Override of Low-level Features in Image Viewing–Both Initially and Overall. J Eye Mov Res. 2008;2: 11. doi: 10.16910/jemr.2.2.2
8. Onat S, Açık A, Schumann F, König P. The Contributions of Image Content and Behavioral Relevancy to Overt Attention. Antal A, editor. PLoS ONE. 2014;9: e93254. doi: 10.1371/journal.pone.0093254
9. Rider AT, Coutrot A, Pellicano E, Dakin SC, Mareschal I. Semantic content outweighs low-level saliency in determining children’s and adults’ fixation of movies. J Exp Child Psychol. 2018;166: 293–309. doi: 10.1016/j.jecp.2017.09.002
10. Rose D, Bex P. The Linguistic Analysis of Scene Semantics: LASS. Behav Res Methods. 2020. [cited 26 Aug 2020]. doi: 10.3758/s13428-020-01390-8
11. Stoll J, Thrun M, Nuthmann A, Einhäuser W. Overt attention in natural scenes: Objects dominate features. Vision Res. 2015;107: 36–48. doi: 10.1016/j.visres.2014.11.006
12. Itti L, Koch C, Niebur E. A model of saliency-based visual attention for rapid scene analysis. IEEE Trans Pattern Anal Mach Intell. 1998;20: 1254–1259. doi: 10.1109/34.730558
13. Pedziwiatr MA, Kümmerer M, Wallis TSA, Bethge M, Teufel C. Meaning maps and saliency models based on deep convolutional neural networks are insensitive to image meaning when predicting human fixations. Cognition. 2021;206: 104465. doi: 10.1016/j.cognition.2020.104465
14. Yan F, Chen C, Xiao P, Qi S, Wang Z, Xiao R. Review of Visual Saliency Prediction: Development Process from Neurobiological Basis to Deep Models. Appl Sci. 2021;12: 309. doi: 10.3390/app12010309
15. Castelhano MS, Mack ML, Henderson JM. Viewing task influences eye movement control during active scene perception. J Vis. 2009;9: 1–15. doi: 10.1167/9.3.6
16. Castelhano MS, Henderson JM. Initial scene representations facilitate eye movement guidance in visual search. J Exp Psychol Hum Percept Perform. 2007;33: 753–763. doi: 10.1037/0096-1523.33.4.753
17. Hayes TR, Henderson JM. Looking for Semantic Similarity: What a Vector-Space Model of Semantics Can Tell Us About Attention in Real-World Scenes. Psychol Sci. 2021;32: 1262–1270. doi: 10.1177/0956797621994768
18. Yarbus AL. Eye movements during perception of complex objects. Boston, MA: Springer; 1967. doi: 10.1007/978-1-4899-5379-7_8
19. Hayhoe MM, Shrivastava A, Mruczek R, Pelz JB. Visual memory and motor planning in a natural task. J Vis. 2003;3: 49–63. doi: 10.1167/3.1.6
20. Johansson RS, Westling G, Bäckström A, Flanagan JR. Eye–Hand Coordination in Object Manipulation. J Neurosci. 2001;21: 6917–6932. doi: 10.1523/JNEUROSCI.21-17-06917.2001
21. Land M, Mennie N, Rusted J. The Roles of Vision and Eye Movements in the Control of Activities of Daily Living. Perception. 1999;28: 1311–1328. doi: 10.1068/p2935
22. Võ ML-H. The meaning and structure of scenes. Vision Res. 2021;181: 10–20. doi: 10.1016/j.visres.2020.11.003
23. Barton SL, Matthis JS, Fajen BR. Control strategies for rapid, visually guided adjustments of the foot during continuous walking. Exp Brain Res. 2019;237: 1673–1690. doi: 10.1007/s00221-019-05538-7
24. Patla A, Vickers J. How far ahead do we look when required to step on specific locations in the travel path during locomotion? Exp Brain Res. 2003;148: 133–138. doi: 10.1007/s00221-002-1246-y
25. Domínguez-Zamora FJ, Marigold DS. Motives driving gaze and walking decisions. Curr Biol. 2021;31: 1632–1642.e4. doi: 10.1016/j.cub.2021.01.069
26. Rothkopf CA, Ballard DH, Hayhoe MM. Task and context determine where you look. J Vis. 2016;7: 16. doi: 10.1167/7.14.16
27. Pennington J, Socher R, Manning C. Glove: Global Vectors for Word Representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Doha, Qatar: Association for Computational Linguistics; 2014. pp. 1532–1543. doi: 10.3115/v1/D14-1162
28. Garbutt S, Matlin A, Hellmuth J, Schenk AK, Johnson JK, Rosen H, et al. Oculomotor function in frontotemporal lobar degeneration, related disorders and Alzheimer’s disease. Brain. 2008;131: 1268–1281. doi: 10.1093/brain/awn047
29. Molitor RJ, Ko PC, Ally BA. Eye Movements in Alzheimer’s Disease. J Alzheimers Dis JAD. 2015;44: 1–12. doi: 10.3233/JAD-141173
30. Pavisic IM, Firth NC, Parsons S, Martinez Rego D, Shakespeare TJ, Yong KXX, et al. Eyetracking Metrics in Young Onset Alzheimer’s Disease: A Window into Cognitive Visual Functions. Front Neurol. 2017;8. doi: 10.3389/fneur.2017.00377
31. Zaccara G, Gangemi PF, Muscas GC, Paganini M, Pallanti S, Parigi A, et al. Smooth-pursuit eye movements: alterations in Alzheimer’s disease. J Neurol Sci. 1992;112: 81–89. doi: 10.1016/0022-510x(92)90136-9
32. Coubard OA. What do we know about eye movements in Alzheimer’s disease? The past 37 years and future directions. Biomark Med. 2016;10: 677–680. doi: 10.2217/bmm-2016-0095
33. Howard PL, Zhang L, Benson V. What Can Eye Movements Tell Us about Subtle Cognitive Processing Differences in Autism? Vision. 2019;3: 22. doi: 10.3390/vision3020022
34. Wang S, Jiang M, Duchesne XM, Laugeson EA, Kennedy DP, Adolphs R, et al. Atypical Visual Saliency in Autism Spectrum Disorder Quantified through Model-Based Eye Tracking. Neuron. 2015;88: 604–616. doi: 10.1016/j.neuron.2015.09.042
35. Walter K, Bex P. Cognitive Load Influences Oculomotor Behavior in Natural Scenes. Sci Rep. 2021;11: 12405. doi: 10.1038/s41598-021-91845-5
36. Sweller J. Cognitive Load During Problem Solving: Effects on Learning. Cogn Sci. 1988;12: 29. doi: 10.1207/s15516709cog1202_4
37. Braver TS, Cohen JD, Nystrom LE, Jonides J, Smith EE, Noll DC. A Parametric Study of Prefrontal Cortex Involvement in Human Working Memory. NeuroImage. 1997;5: 49–62. doi: 10.1006/nimg.1996.0247
38. Carlson S. Distribution of cortical activation during visuospatial n-back tasks as revealed by functional magnetic resonance imaging. Cereb Cortex. 1998;8: 743–752. doi: 10.1093/cercor/8.8.743
39. Jonides J, Schumacher EH, Smith EE, Lauber EJ, Awh E, Minoshima S, et al. Verbal Working Memory Load Affects Regional Brain Activation as Measured by PET. J Cogn Neurosci. 1997;9: 462–475. doi: 10.1162/jocn.1997.9.4.462
40. Manoach DS, Schlaug G, Siewert B, Darby DG, Bly BM, Benfield A, et al. Prefrontal cortex fMRI signal changes are correlated with working memory load. NeuroReport. 1997;8: 545–549. doi: 10.1097/00001756-199701200-00033
41. Perlstein WM, Dixit NK, Carter CS, Noll DC, Cohen JD. Prefrontal cortex dysfunction mediates deficits in working memory and prepotent responding in schizophrenia. Biol Psychiatry. 2003;53: 25–38. doi: 10.1016/s0006-3223(02)01675-x
42. Belke E, Humphreys GW, Watson DG, Meyer AS, Telling AL. Top-down effects of semantic knowledge in visual search are modulated by cognitive but not perceptual load. Percept Psychophys. 2008;70: 1444–1458. doi: 10.3758/PP.70.8.1444
43. Brainard DH. The Psychophysics Toolbox. Spat Vis. 1997;10: 433–436. doi: 10.1163/156856897X00357
44. Cornelissen FW, Peters EM, Palmer J. The Eyelink Toolbox: Eye tracking with MATLAB and the Psychophysics Toolbox. Behav Res Methods Instrum Comput. 2002;34: 613–617. doi: 10.3758/bf03195489
45. Russell BC, Torralba A, Murphy KP, Freeman WT. LabelMe: A database and web-based tool for image annotation. Int J Comput Vis. 2008;77: 157–173. doi: 10.1007/s11263-007-0090-8
46. Walter K. SceneProcessing_SalienceSemantics. In: Retrieved from osf.io/f5dhn. 29 Jul 2021.
47. Zhou B, Lapedriza A, Xiao J, Torralba A, Oliva A. Learning Deep Features for Scene Recognition using Places Database. Adv Neural Inf Process Syst. 2014;27. Available: http://hdl.handle.net/1721.1/96941
48. Granholm E, Asarnow R, Sarkin A, Dykes K. Pupillary responses index cognitive resource limitations. Psychophysiology. 1996;33: 457–461. doi: 10.1111/j.1469-8986.1996.tb01071.x
49. Bylinskii Z, Judd T, Oliva A, Torralba A, Durand F. What do different evaluation metrics tell us about saliency models? ArXiv160403605 Cs. 2017. [cited 6 Oct 2021]. Available: http://arxiv.org/abs/1604.03605

PLoS One. doi: 10.1371/journal.pone.0277691.r001

Decision Letter 0

Avanti Dey

9 Aug 2022

PONE-D-22-06433

Low-Level Factors Increase Gaze-Guidance Under Cognitive Load: A Comparison of Image-Salience and Semantic-Salience Models

PLOS ONE

Dear Dr. Walter,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Can you please address the expert reviewer's comments thoroughly?

Please submit your revised manuscript by Sep 22 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Avanti Dey, PhD

Staff Editor

PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please provide additional details regarding participant consent. In the ethics statement in the Methods and online submission information, please ensure that you have specified what type you obtained (for instance, written or verbal, and if verbal, how it was documented and witnessed). If your study included minors, state whether you obtained consent from parents or guardians. If the need for consent was waived by the ethics committee, please include this information.

3. We note that the grant information you provided in the ‘Funding Information’ and ‘Financial Disclosure’ sections do not match. When you resubmit, please ensure that you provide the correct grant numbers for the awards you received for your study in the ‘Funding Information’ section.

4. Thank you for stating the following in the Acknowledgments Section of your manuscript:

“Supported by NIH R01 EY029713.”

We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

“PB was funded by grant EY029713 from the NIH Research Project Grant Program (https://www.nih.gov/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.”

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

5. Please amend either the title on the online submission form (via Edit Submission) or the title in the manuscript so that they are identical.

6. Please upload a copy of Figure 8, to which you refer in your text on page 15. If the figure is no longer to be included as part of the submission please remove all reference to it within the text.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Summary

This manuscript reports the results of one experiment that was aimed at testing how the contributions of low-level and high-level processes to gaze guidance change with cognitive load. The main hypothesis is that low-level factors should contribute more as cognitive load increases, much like individuals with cognitive impairment exhibited abnormal oculomotor behavior under certain conditions. Subjects viewed a series of images of natural scenes for 10 seconds each with no gaze restrictions (free viewing). Between images, they were presented with four objects and had to choose the object from the image N trials back. The N-back was adaptive in that N increased following two correct responses and decreased following one incorrect response. The authors used two models to generate predictions about fixation locations for each image: a low-level, saliency-based model (GBVS) and a high-level, language-based model (GloVe). The GBVS model outperformed the GloVe model in that it generated predictions of fixations that were both more accurate and (apparently) varied with cognitive load. The authors conclude that low-level saliency is the best predictor of fixations and that the bias to look toward salient low-level image features increases with cognitive load.
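
The adaptive rule summarized above (N increases after two correct responses and decreases after one incorrect response) can be illustrated with a minimal sketch. The function name, the starting value of N = 0, the floor at zero, and the reset of the correct-response count after each increase are assumptions made for illustration and are not confirmed details of the experiment code.

```python
def update_n_back(n, streak, correct, n_min=0):
    """Return the next (n, streak) after one response.

    Two consecutive correct responses raise N by one; a single incorrect
    response lowers N by one, never below n_min (an assumed floor).
    """
    if correct:
        streak += 1
        if streak == 2:
            return n + 1, 0   # step up and restart the count (assumption)
        return n, streak
    return max(n_min, n - 1), 0  # step down and restart the count


# Hypothetical response sequence (True = correct, False = incorrect).
responses = [True, True, True, False, True, True, True, True]

n, streak = 0, 0  # assumed starting level
history = []
for r in responses:
    n, streak = update_n_back(n, streak, r)
    history.append(n)
```

Under these assumptions the level rises only after pairs of correct responses and drops immediately after an error, so N tends to settle near each subject's limit.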

Overall evaluation

My overall evaluation of this paper is mixed. On the one hand, the study explores an interesting hypothesis that has both theoretical and practical significance. Their approach makes clever use of the N-back task, and they employed established models to generate competing predictions. The manuscript was generally well-written and easy to follow. On the other hand, there are several issues that I feel need to be addressed before the manuscript could be considered for publication.

Major/general issues

I felt that the logic of the approach was not explained as clearly as it could have been. Specifically, it was not immediately clear to me why it was necessary or useful to calculate the similarity between each object in an image and the scene label (as described on p. 9). If I understand correctly, the assumption is that if gaze is guided by top-down processes, then observers will fixate objects that are similar to the overall description of a scene (e.g., if the image depicts an office, they should look at the computer or the desk). This is a key part of the logic of the experiment but was never explicitly stated. I would recommend adding a few sentences to the end of the introduction and then restating it on p. 9 to make this point clear.

The main conclusion rests on the (apparently) significant effect of N-back in Figure 5, interpreting a borderline non-significant effect of N-back in Figure 7A as meaningful, and interpreting the non-significant effect in Figures 6 and 7B as no effect of N-back. Overall, the evidence seems rather weak. Looking at the figures, it is hard to see an effect in Figure 5 and 7A but not in Figure 6 and 7B. Can the authors explain this? Wouldn’t it make more sense to base the interpretation on the interaction between N-back and model?

How broadly would the authors like to generalize their conclusion that “as cognitive load increases, gaze guidance relies more on sensory factors (image salience) and less on cognitive factors (semantic salience)” (last sentence of the manuscript, middle of p. 16)? I have two reasons for asking. First, the GloVe model is not the only way to quantify semantic salience. There could be other approaches. For example, perhaps observers look at objects that reduce their uncertainty about the overall meaning of the scene. A model based on this assumption would generate a different set of semantic saliency maps that might better fit the human data. Second, I could also see the effect of cognitive load working in the other direction in some circumstances. For example, when walking at a leisurely pace on a relatively flat hiking trail (low cognitive load), gaze may be guided by low-level image salience (e.g., the motion of an animal scurrying in the woods, a bright yellow flower). If the terrain becomes more challenging and the hiker starts running (high cognitive load), I would think that gaze would be driven less by image salience and more by task-relevance (i.e., a top-down process). Wouldn’t this example suggest the opposite of the main conclusion of this study?

Minor/specific issues

p. 3: The authors explained that they chose to use the GBVS model to generate predictions based on the low-level saliency account. What was the justification for choosing this model rather than one of the other saliency-based models? Would the results have turned out any different if one of the alternative models was used?

p. 3 (bottom): If you’re looking to cite papers that provide evidence of the effects of task on gaze behavior during walking, I would recommend the recent paper by Dominguez-Zamora & Marigold (2021) in Current Biology.

p. 5 (top): Please clarify the differences between this study and Walter & Bex (2021, Sci Rep). The two seem very similar but there is only a single sentence (top of p. 5) that speaks to the difference.

p. 6: At this point, it would be useful to the reader to see some examples of images used in the experiment.

p. 6: What was the justification for the decision to use 33 subjects?

p. 8 (middle): I found the first paragraph of the section on semantic salience to be a bit confusing. The authors refer to a “descriptive label for each scene” and a process for calculating the semantic similarity between the scene label and object label, as if these terms/concepts and their meaning in the context of this study were already introduced. They are eventually unpacked on the next page but the authors might consider letting readers know that so that they understand that further explanation is forthcoming.

p. 9 (middle): It would help to be provided with an example or two of object-scene pairings that received high and low ratings.

p. 9 (bottom): If I understand correctly, the alternative approach from Henderson et al. (2019) is an alternative to generating a map of semantic salience. However, because it appears in the middle of a paragraph about how the authors dealt with the problem of dual-word objects, I initially assumed that it was an alternative approach to dealing with that problem.

p. 10 (bottom): The authors report that as N-back increases, they “observed a significant increase in response time compared to N=0”. However, they did not report a statistical test result or show this result in a figure. (The t-test at the end of the paragraph refers to a different comparison.)

p. 11 (top): Please clarify what it means to “set levels of specificity as 100 steps from 0 to 1”. Specificity of what?

p. 13 (top): The authors’ interpretation of the results depicted in Figure 5 was that “the gaze of subjects who excel at this cognitive load task (those reaching higher N-backs, or those who have high cognitive load capacities), are guided more by low-level image salience features than those who have lower cognitive load capacities.” While this may be correct, I’m not entirely convinced that this is what is shown by Figure 5. I think what they mean is that when subjects reached higher N-backs, their gaze was guided more by low-level image salience features than when they were at lower N-backs. This is not the same as what is written at the top of p. 13.

p. 15 (middle): There is a reference to Figure 8 but no Figure 8 in the manuscript.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2022 Nov 28;17(11):e0277691. doi: 10.1371/journal.pone.0277691.r002

Author response to Decision Letter 0

12 Sep 2022

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

The manuscript and corresponding files have been edited to reflect the style requirements of PLOS ONE.

Additional information concerning participant consent has been added to the Methods section and online submission form.

We did not have access to the ‘Funding Information’ or ‘Financial Disclosure’ sections when attempting to resubmit with revisions on the editorial manager portal – the correct grant number is NIH R01 EY029713.

4. Thank you for stating the following in the Acknowledgments Section of your manuscript:

“Supported by NIH R01 EY029713.”

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

We approve this funding statement.

5. Please amend either the title on the online submission form (via Edit Submission) or the title in the manuscript so that they are identical.

6. Please upload a copy of Figure 8, to which you refer in your text on page 15. If the figure is no longer to be included as part of the submission please remove all reference to it within the text.

The figure referenced on page 15 has been corrected to Figure 7.

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Partly

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

3. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: Yes

4. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

5. Review Comments to the Author

Reviewer #1: Summary

Overall evaluation

Major/general issues

We have added some clarification on the logic of the experiment at the end of the “Present study” section of the introduction and reiterated it within the “Semantic salience” section of the procedure (we have also rearranged this section as per a later comment).

Although the figures are similar, the results of our mixed linear model found a significant increase for GBVS and no significant difference for GloVe. We believe this is due to more consistent trends across subjects for GBVS than for GloVe. This can be seen more clearly in Fig 7, where the individual subject lines trend upwards more consistently in A than in B.

This is a great point which allowed us to think a bit deeper about our results. We believe our results are indicative of the subject attending to the primary task at hand, in this case remembering objects. Here we show that specifically with memory, subjects’ gaze locations were less well-predicted by image salience with increased load (as the task, remembering objects, was semantic based). Because we only have empirical evidence for this task, it would be interesting to have evidence for other tasks, e.g. the hiking example mentioned here. If these results are strictly task relevant, then we predict that observers would fixate task-relevant locations (e.g. secure footholds that may not be correlated with image salience). Although it is a matter for empirical determination, we agree with the reviewer that we would expect to see the opposite effect of cognitive load on fixations for a secondary task in the hiking example, where task-relevant areas would be more image salience based, and as such an increase in load would result in a decrease of gaze in semantic salience related areas. We have included some discussion about this in the discussion section.

Minor/specific issues

We chose the GBVS model due to its availability as an open-source toolbox, its robustness, and its evaluated success in predicting human fixations. Because the majority of salience models utilize most of the same main filtering channels (i.e., color, contrast, intensity), we believe our results would be largely unchanged if a different model were used. We have added this justification for our choice of model in the text.

Thank you for the relevant paper; we have added it as a reference on how task objective impacts gaze guidance.

The study is the same as that performed in Walter & Bex 2021, analyzed differently. The eye movement data obtained in Walter & Bex 2021 are the same data used here; however, the previous study analyzed the oculomotor metrics, while this study analyzes the salience metrics at fixated locations. This has been clarified in the text.

p. 6: At this point, it would be useful to the reader to see some examples of images used in the experiment.

We have directed attention to Fig 1 within the stimuli paragraph, as it contains some examples of images used in the experiment. We did not move this Fig up in the text because it demonstrates the procedure and would be confusing to show before the procedure is explained. However, making note of the Fig within the stimuli section should direct readers to it for examples of images used in the experiment.

p. 6: What was the justification for the decision to use 33 subjects?

This was an exploratory investigation with novel endpoints, and therefore we could not conduct a power analysis to estimate sample size for a target effect size. As a conservative approach, we set a stopping number of 30 before data collection and tested until 30 usable subjects were obtained (the three excluded subjects were identified for exclusion immediately after their participation). This has been clarified in the text.

We have moved the explanation of how descriptive scene labels were obtained to before Table 1, within the same paragraph they are introduced.

p. 9 (middle): It would help to be provided with an example or two of object-scene pairings that received high and low ratings.

We have included an example of a high and a low pairing: “office” and “desk” (0.6319) compared to “office” and “parrot” (0.0673).
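To illustrate how a pairing score of this kind can arise, the sketch below computes a cosine similarity between word vectors, assuming the score is a cosine similarity (a common choice for word-vector models such as GloVe). It is illustrative only: the three-dimensional vectors are invented toy values, not real GloVe embeddings, and the function name `cosine_similarity` is hypothetical, so the output will not reproduce the 0.6319 and 0.0673 values quoted above.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 3-D vectors, invented for illustration (NOT real GloVe embeddings).
toy_vectors = {
    "office": [0.9, 0.8, 0.1],
    "desk":   [0.8, 0.9, 0.2],
    "parrot": [0.1, 0.0, 0.9],
}

high = cosine_similarity(toy_vectors["office"], toy_vectors["desk"])
low = cosine_similarity(toy_vectors["office"], toy_vectors["parrot"])
print(f"office-desk: {high:.3f}, office-parrot: {low:.3f}")
```

With real embeddings, the toy dictionary would be replaced by vectors loaded from a pretrained GloVe file; the comparison of a scene label against each object label would otherwise proceed in the same way.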

We have separated this section into its own paragraph to clarify that these are individual ideas.

Response latency results are reported in more depth in Walter & Bex, 2021; we have added a citation in case readers are interested in exploring those results further. We did not want to re-use figures in another journal for copyright reasons, but felt that a note about response time was important in both papers. Due to the unbalanced sample sizes for each of our N-back groups, we ran conservative t-tests for each N-back and, for simplicity, plotted the significant results against N=0. All N-backs were significant at or below .01 except for N=9 and N=10, due to their low sample sizes. We have made a statistical note in the text and have referred readers to the earlier paper.

p. 11 (top): Please clarify what it means to “set levels of specificity as 100 steps from 0 to 1”. Specificity of what?

The range of specificity levels corresponds to the range of salience values. In order to perform the ROC analysis, we must test the salience value of each pixel across the image with increasing specificity. For example, if the first level of specificity is all salience values between 0 and .01, the next is all salience values between .01 and .02, etc. We have included a note in the text that the specificity is in correspondence with the range of salience values.
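The threshold sweep described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the salience map is random toy data, the function name `auroc_from_salience` is hypothetical, and the hit and false-alarm definitions (fixated pixels versus all pixels at or above each level) follow one common convention for salience evaluation. It uses cumulative thresholds rather than separate bins, which may differ in detail from the procedure in the manuscript.

```python
import numpy as np

def auroc_from_salience(salience, fix_rows, fix_cols, n_steps=100):
    """Area under the ROC curve for a salience map scaled to [0, 1].

    At each threshold, the hit rate is the fraction of fixated pixels whose
    salience is at or above the threshold, and the false-alarm rate is the
    fraction of all pixels at or above it (an assumed convention).
    """
    thresholds = np.linspace(0.0, 1.0, n_steps)
    fixated = salience[fix_rows, fix_cols]
    hits = np.array([(fixated >= t).mean() for t in thresholds])
    false_alarms = np.array([(salience >= t).mean() for t in thresholds])
    # Rates fall as the threshold rises, so reverse them and anchor at (0, 0).
    x = np.concatenate(([0.0], false_alarms[::-1]))
    y = np.concatenate(([0.0], hits[::-1]))
    # Trapezoidal rule for the area under the curve.
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

# Toy data: a random "salience map" and two sets of 50 fixations.
rng = np.random.default_rng(0)
salience = rng.random((60, 80))

# Fixations placed on the 50 most salient pixels.
top = np.argsort(salience, axis=None)[-50:]
top_rows, top_cols = np.unravel_index(top, salience.shape)
auc_salient = auroc_from_salience(salience, top_rows, top_cols)

# Fixations placed at random locations.
rand_rows = rng.integers(0, 60, size=50)
rand_cols = rng.integers(0, 80, size=50)
auc_random = auroc_from_salience(salience, rand_rows, rand_cols)

print(f"salient fixations: {auc_salient:.3f}, random fixations: {auc_random:.3f}")
```

An area near 0.5 indicates that the map predicts fixations no better than chance, while values approaching 1 indicate that fixations fall on the highest-valued pixels.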

We have added a sentence relating directly back to Figure 5 that explains that increased cognitive load results in gaze-guidance through low-level features, and we kept our original statement as an additional conclusion that is less directly related to Figure 5 itself.

p. 15 (middle): There is a reference to Figure 8 but no Figure 8 in the manuscript.

The figure referenced on page 15 was corrected to Figure 7.

Attachment

Submitted filename: SceneProcessing_SalienceSemantics - PLOS_Response.docx

PLoS One. doi: 10.1371/journal.pone.0277691.r003

Decision Letter 1

Marcela de Lourdes Peña Garay

5 Oct 2022

PONE-D-22-06433R1

Low-level factors increase gaze-guidance under cognitive load: a comparison of image-salience and semantic-salience models

PLOS ONE

Dear Dr. Walter,

Please submit your revised manuscript by Nov 19 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

We look forward to receiving your revised manuscript.

Kind regards,

Marcela de Lourdes Peña Garay, Ph.D

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments (if provided):

Please consider rephrasing the comment from Reviewer 1 on page 6.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

**********

6. Review Comments to the Author

Reviewer #1: The authors have addressed all of my comments with one exception. I feel that their characterization of the relation between the present study and Walter & Bex (2021) was a bit vague. On p. 6, they wrote that "In the present study, we utilize the same paradigm...". If I understand correctly, they used the same data. In other words, in the present study, the authors reanalyzed the data that was published in the previous study. The new analyses are sufficiently different from those of the previous study, so I'm not suggesting that this is a problem. My point is simply that it should be clear to the reader that the data are the same.

Other than this one minor point, I think that the paper is suitable for publication. The authors did an excellent job responding to my concerns and the revised version will make an interesting contribution to the literature.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

**********

PLoS One. 2022 Nov 28;17(11):e0277691. doi: 10.1371/journal.pone.0277691.r004

Author response to Decision Letter 1

7 Oct 2022

Journal Requirements:

The manuscript has been checked and all references are current and relevant.

Additional Editor Comments (if provided):

Please consider rephrasing the comment from Reviewer 1 on page 6.

Reviewer 1’s comment has been addressed in the manuscript.

Additionally, after uploading figures to PACE, figures with individual subject lines (fig5-fig7) became blurred by the thinness and lightness of the individual subject traces. We have edited the lines so that they are darker and clearer after being uploaded to PACE.

Reviewers' comments:

It has been emphasized in the introduction of the present study (pg. 5) that the data used in the present analysis are those from Walter & Bex (2021).

Attachment

Submitted filename: SceneProcessing_SalienceSemantics - PLOS_Response2.docx

PLoS One. doi: 10.1371/journal.pone.0277691.r005

Decision Letter 2

Marcela de Lourdes Peña Garay

2 Nov 2022

Low-level factors increase gaze-guidance under cognitive load: a comparison of image-salience and semantic-salience models

PONE-D-22-06433R2

Dear Dr. Walter,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Marcela de Lourdes Peña Garay, Ph.D

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

PLoS One. doi: 10.1371/journal.pone.0277691.r006

Acceptance letter

Marcela de Lourdes Peña Garay

5 Nov 2022

PONE-D-22-06433R2

Low-level factors increase gaze-guidance under cognitive load: a comparison of image-salience and semantic-salience models

Dear Dr. Walter:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Marcela de Lourdes Peña Garay

Academic Editor

PLOS ONE

Associated Data

Supplementary Materials

S1 Table. List of all manually edited object labels from the LabelMe database.

(XLSX)

S2 Table. Top 5 scene labels obtained from PlacesCNN for each image.

(XLS)

Data Availability Statement

The subject data and images used for this study are available at osf.io/f5dhn.

1. Borji A, Sihite DN, Itti L. Objects do not predict fixations better than early saliency: A re-analysis of Einhauser et al.’s data. J Vis. 2013;13: 1–4. doi: 10.1167/13.10.18

2. Harel J, Koch C, Perona P. Graph-Based Visual Saliency. Adv Neural Inf Process Syst. 2007;19: 545–552. doi: 10.7551/mitpress/7503.003.0073

3. Itti L, Koch C. Computational modelling of visual attention. Nat Rev Neurosci. 2001;2: 194–203. doi: 10.1038/35058500

4. Parkhurst D, Law K, Niebur E. Modeling the role of salience in the allocation of overt visual attention. Vision Res. 2002;42: 107–123. doi: 10.1016/s0042-6989(01)00250-4

5. Henderson JM, Hayes TR, Peacock CE, Rehrig G. Meaning and attentional guidance in scenes: A review of the meaning map approach. Vis Switz. 2019;3. doi: 10.3390/vision3020019

6. Hwang AD, Wang H-C, Pomplun M. Semantic guidance of eye movements in real-world scenes. Vision Res. 2011;51: 1192–1205. doi: 10.1016/j.visres.2011.03.010

7. Nyström M, Holmqvist K. Semantic Override of Low-level Features in Image Viewing–Both Initially and Overall. J Eye Mov Res. 2008;2: 11. doi: 10.16910/jemr.2.2.2

8. Onat S, Açık A, Schumann F, König P. The Contributions of Image Content and Behavioral Relevancy to Overt Attention. Antal A, editor. PLoS ONE. 2014;9: e93254. doi: 10.1371/journal.pone.0093254

9. Rider AT, Coutrot A, Pellicano E, Dakin SC, Mareschal I. Semantic content outweighs low-level saliency in determining children’s and adults’ fixation of movies. J Exp Child Psychol. 2018;166: 293–309. doi: 10.1016/j.jecp.2017.09.002

10. Rose D, Bex P. The Linguistic Analysis of Scene Semantics: LASS. Behav Res Methods. 2020. [cited 26 Aug 2020]. doi: 10.3758/s13428-020-01390-8

11. Stoll J, Thrun M, Nuthmann A, Einhäuser W. Overt attention in natural scenes: Objects dominate features. Vision Res. 2015;107: 36–48. doi: 10.1016/j.visres.2014.11.006

12. Itti L, Koch C, Niebur E. A model of saliency-based visual attention for rapid scene analysis. IEEE Trans Pattern Anal Mach Intell. 1998;20: 1254–1259. doi: 10.1109/34.730558

13. Pedziwiatr MA, Kümmerer M, Wallis TSA, Bethge M, Teufel C. Meaning maps and saliency models based on deep convolutional neural networks are insensitive to image meaning when predicting human fixations. Cognition. 2021;206: 104465. doi: 10.1016/j.cognition.2020.104465

14. Yan F, Chen C, Xiao P, Qi S, Wang Z, Xiao R. Review of Visual Saliency Prediction: Development Process from Neurobiological Basis to Deep Models. Appl Sci. 2021;12: 309. doi: 10.3390/app12010309

15. Castelhano MS, Mack ML, Henderson JM. Viewing task influences eye movement control during active scene perception. J Vis. 2009;9: 1–15. doi: 10.1167/9.3.6

16. Castelhano MS, Henderson JM. Initial scene representations facilitate eye movement guidance in visual search. J Exp Psychol Hum Percept Perform. 2007;33: 753–763. doi: 10.1037/0096-1523.33.4.753

17. Hayes TR, Henderson JM. Looking for Semantic Similarity: What a Vector-Space Model of Semantics Can Tell Us About Attention in Real-World Scenes. Psychol Sci. 2021;32: 1262–1270. doi: 10.1177/0956797621994768

18. Yarbus AL. Eye movements during perception of complex objects. Boston, MA: Springer; 1967. doi: 10.1007/978-1-4899-5379-7_8

19. Hayhoe MM, Shrivastava A, Mruczek R, Pelz JB. Visual memory and motor planning in a natural task. J Vis. 2003;3: 49–63. doi: 10.1167/3.1.6

20. Johansson RS, Westling G, Bäckström A, Flanagan JR. Eye–Hand Coordination in Object Manipulation. J Neurosci. 2001;21: 6917–6932. doi: 10.1523/JNEUROSCI.21-17-06917.2001

21. Land M, Mennie N, Rusted J. The Roles of Vision and Eye Movements in the Control of Activities of Daily Living. Perception. 1999;28: 1311–1328. doi: 10.1068/p2935

22. Võ ML-H. The meaning and structure of scenes. Vision Res. 2021;181: 10–20. doi: 10.1016/j.visres.2020.11.003

23. Barton SL, Matthis JS, Fajen BR. Control strategies for rapid, visually guided adjustments of the foot during continuous walking. Exp Brain Res. 2019;237: 1673–1690. doi: 10.1007/s00221-019-05538-7

24. Patla A, Vickers J. How far ahead do we look when required to step on specific locations in the travel path during locomotion? Exp Brain Res. 2003;148: 133–138. doi: 10.1007/s00221-002-1246-y

25. Domínguez-Zamora FJ, Marigold DS. Motives driving gaze and walking decisions. Curr Biol. 2021;31: 1632–1642.e4. doi: 10.1016/j.cub.2021.01.069

26. Rothkopf CA, Ballard DH, Hayhoe MM. Task and context determine where you look. J Vis. 2016;7: 16. doi: 10.1167/7.14.16

27. Pennington J, Socher R, Manning C. Glove: Global Vectors for Word Representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Doha, Qatar: Association for Computational Linguistics; 2014. pp. 1532–1543. doi: 10.3115/v1/D14-1162

28. Garbutt S, Matlin A, Hellmuth J, Schenk AK, Johnson JK, Rosen H, et al. Oculomotor function in frontotemporal lobar degeneration, related disorders and Alzheimer’s disease. Brain. 2008;131: 1268–1281. doi: 10.1093/brain/awn047

29. Molitor RJ, Ko PC, Ally BA. Eye Movements in Alzheimer’s Disease. J Alzheimers Dis JAD. 2015;44: 1–12. doi: 10.3233/JAD-141173

30. Pavisic IM, Firth NC, Parsons S, Martinez Rego D, Shakespeare TJ, Yong KXX, et al. Eyetracking Metrics in Young Onset Alzheimer’s Disease: A Window into Cognitive Visual Functions. Front Neurol. 2017;8. doi: 10.3389/fneur.2017.00377

31. Zaccara G, Gangemi PF, Muscas GC, Paganini M, Pallanti S, Parigi A, et al. Smooth-pursuit eye movements: alterations in Alzheimer’s disease. J Neurol Sci. 1992;112: 81–89. doi: 10.1016/0022-510x(92)90136-9

32. Coubard OA. What do we know about eye movements in Alzheimer’s disease? The past 37 years and future directions. Biomark Med. 2016;10: 677–680. doi: 10.2217/bmm-2016-0095

33. Howard PL, Zhang L, Benson V. What Can Eye Movements Tell Us about Subtle Cognitive Processing Differences in Autism? Vision. 2019;3: 22. doi: 10.3390/vision3020022

34. Wang S, Jiang M, Duchesne XM, Laugeson EA, Kennedy DP, Adolphs R, et al. Atypical Visual Saliency in Autism Spectrum Disorder Quantified through Model-Based Eye Tracking. Neuron. 2015;88: 604–616. doi: 10.1016/j.neuron.2015.09.042

35. Walter K, Bex P. Cognitive Load Influences Oculomotor Behavior in Natural Scenes. Sci Rep. 2021;11: 12405. doi: 10.1038/s41598-021-91845-5

36. Sweller J. Cognitive Load During Problem Solving: Effects on Learning. Cogn Sci. 1988;12: 29. doi: 10.1207/s15516709cog1202_4

37. Braver TS, Cohen JD, Nystrom LE, Jonides J, Smith EE, Noll DC. A Parametric Study of Prefrontal Cortex Involvement in Human Working Memory. NeuroImage. 1997;5: 49–62. doi: 10.1006/nimg.1996.0247

38. Carlson S. Distribution of cortical activation during visuospatial n-back tasks as revealed by functional magnetic resonance imaging. Cereb Cortex. 1998;8: 743–752. doi: 10.1093/cercor/8.8.743

39. Jonides J, Schumacher EH, Smith EE, Lauber EJ, Awh E, Minoshima S, et al. Verbal Working Memory Load Affects Regional Brain Activation as Measured by PET. J Cogn Neurosci. 1997;9: 462–475. doi: 10.1162/jocn.1997.9.4.462

40. Manoach DS, Schlaug G, Siewert B, Darby DG, Bly BM, Benfield A, et al. Prefrontal cortex fMRI signal changes are correlated with working memory load. NeuroReport. 1997;8: 545–549. doi: 10.1097/00001756-199701200-00033

41. Perlstein WM, Dixit NK, Carter CS, Noll DC, Cohen JD. Prefrontal cortex dysfunction mediates deficits in working memory and prepotent responding in schizophrenia. Biol Psychiatry. 2003;53: 25–38. doi: 10.1016/s0006-3223(02)01675-x

42. Belke E, Humphreys GW, Watson DG, Meyer AS, Telling AL. Top-down effects of semantic knowledge in visual search are modulated by cognitive but not perceptual load. Percept Psychophys. 2008;70: 1444–1458. doi: 10.3758/PP.70.8.1444

43. Brainard DH. The Psychophysics Toolbox. Spat Vis. 1997;10: 433–436. doi: 10.1163/156856897X00357

44. Cornelissen FW, Peters EM, Palmer J. The Eyelink Toolbox: Eye tracking with MATLAB and the Psychophysics Toolbox. Behav Res Methods Instrum Comput. 2002;34: 613–617. doi: 10.3758/bf03195489

45. Russell BC, Torralba A, Murphy KP, Freeman WT. LabelMe: A database and web-based tool for image annotation. Int J Comput Vis. 2008;77: 157–173. doi: 10.1007/s11263-007-0090-8

46. Walter K. SceneProcessing_SalienceSemantics. In: Retrieved from osf.io/f5dhn. 29 Jul 2021.

47. Zhou B, Lapedriza A, Xiao J, Torralba A, Oliva A. Learning Deep Features for Scene Recognition using Places Database. Adv Neural Inf Process Syst. 2014;27. Available: http://hdl.handle.net/1721.1/96941

48. Granholm E, Asarnow R, Sarkin A, Dykes K. Pupillary responses index cognitive resource limitations. Psychophysiology. 1996;33: 457–461. doi: 10.1111/j.1469-8986.1996.tb01071.x

49. Bylinskii Z, Judd T, Oliva A, Torralba A, Durand F. What do different evaluation metrics tell us about saliency models? ArXiv160403605 Cs. 2017. [cited 6 Oct 2021]. Available: http://arxiv.org/abs/1604.03605
