EM-COGLOAD: An investigation into age and cognitive load detection using eye tracking and deep learning

Gabriella Miles; Melvyn Smith; Nancy Zook; Wenhao Zhang

doi:10.1016/j.csbj.2024.03.014

2024 Mar 27;24:264–280.


PMCID: PMC11024913 PMID: 38638116

Abstract

Alzheimer’s Disease is the most prevalent neurodegenerative disease, and is a leading cause of disability among the elderly. Eye movement behaviour shows potential as a non-invasive biomarker for Alzheimer’s Disease, with changes detectable at an early stage after initial onset. This paper introduces a new publicly available dataset: EM-COGLOAD (available at https://osf.io/zjtdq/, DOI: 10.17605/OSF.IO/ZJTDQ). A dual-task paradigm was used to induce declines in cognitive performance in 75 healthy adults as they carried out visual tracking tasks. Their eye movements were recorded, and time series classification of the extracted eye movement traces was explored using a range of deep learning techniques. Convolutional neural networks achieved an accuracy of 87.5% when distinguishing between eye movement under low and high cognitive load, and 76% when distinguishing between the oldest and youngest age groups.


Keywords: Time series classification, Eye movement, Deep learning, Cognitive load, Age


Highlights

• EM-COGLOAD dataset released with eye state detection, eye centre localisation, and eye movement classification tasks.
• Deep learning methods (CNNs and encoders) are able to detect the presence of cognitive load using eye movement data.
• Deep learning methods (CNNs) are able to predict age groups from eye movement data.
• Both saccadic and smooth pursuit eye movements indicate the presence of cognitive load.

1. Introduction

Alzheimer’s Disease (AD), first documented by Alois Alzheimer in 1906, is the most prevalent neurodegenerative disease (ND), accounting for 60–75% of dementia cases [1], [2]. AD progressively affects memory, cognitive faculties, and behaviour. The disease is initially asymptomatic; its progression is marked by cognitive and functional decline, with impairment and neuropsychiatric symptoms becoming increasingly severe over time [3].

The diagnosis of AD involves a comprehensive evaluation of the individual: a complex, multi-step process [4] that frequently requires input from several healthcare professionals. The typical route to diagnosis begins with general practitioners (GPs), who are the first point of contact for patients and their families. GPs conduct initial clinical tests and, if cognitive decline is suspected, refer individuals to specialists such as neurologists and psychiatrists [4]. Neurologists perform neurological examinations and neuroimaging, and assist in differential diagnosis - distinguishing AD from Parkinson’s Disease (PD), for example [4]. Psychiatrists carry out more in-depth cognitive or mental status testing and review patient history, often working closely with neurologists [4], [5]. Given the length of this process, and the fact that observable symptoms¹ - such as personality changes - frequently manifest only long after the initial onset of the disease, timely diagnosis is challenging. The Institute of Medicine has highlighted the timely diagnosis of AD (and NDs generally) as an area in need of improvement [6]. In particular, early diagnosis of AD is linked to the potential attenuation of cognitive decline [7], which has a significant impact on both the individual with AD and their family.

The current approach to AD diagnosis is effective but time-consuming, requiring trained professionals to administer various tests aimed at detecting skill or functional loss, which is difficult to detect in the early stages. Applying artificial intelligence (AI) to assess cognitive function through eye movements could offer a user-friendly, passive evaluation method suitable for settings such as GP surgeries, clinics, or homes, with results provided instantly to clinicians in a clear and relatable format. Given the projected increase in global AD cases [9], there is a need for an assessment tool that can be used at an early stage and across a large population.

In recent years, changes in eye movement have continued to show potential as a non-invasive biomarker for NDs, including mild cognitive impairment (MCI) and AD. In addition to changes due to NDs, eye movement characteristics also change across the human lifetime. For example, research has shown that the reaction time (RT) of ballistic eye movements such as saccades typically slows [10], [11], [12], [13], and that destination targeting becomes less accurate [14] with age, even among healthy older adults. Smooth pursuit eye movements (SPEM) also show an increased initiation RT [15], [16] and increased saccade frequency [16] with age. These age-related differences are further exacerbated by the presence of AD or MCI [17] in pro-saccade [18], anti-saccade [18], [19], and SPEM [20], [21] tasks.

While eye movement analysis has been investigated for the purpose of distinguishing between normal cognitive function, MCI, and AD [22], [23], [24], [25], conventional eye tracking methods (typically employing head-mounted eye trackers) often involve a large amount of preprocessing, such as segmenting eye movement time series data into individual saccades or fixations and studying them in isolation. This paper instead focuses on the application of deep learning (DL) techniques to the analysis of long-sequence, non-periodic eye movement data (>30,000 frames per sample), with a specific emphasis on healthy and atypical ageing.

To achieve this, the authors conducted a study which recruited healthy individuals. These individuals self-reported as cognitively unimpaired and had not been diagnosed with any of the exclusionary conditions (detailed in Appendix D). Participants performed a visual tracking task, both on its own and alongside a mental arithmetic task, while their eye movements were recorded with a single desk-mounted camera. They also completed a short cognitive test measuring inhibitory control (the Simon task).

Previous research has shown that both working memory (WM) [26] capacity and cognitive load can have a significant effect on eye movement [27], [28], [29]. As such, two test conditions were created to vary the cognitive load involved in each task: low cognitive load (LCL) and high cognitive load (HCL). In the HCL condition, participants carried out the LCL (visual tracking) task while simultaneously performing the mental arithmetic task. These cognitive load conditions were employed with the aim of (i) creating cognitive variability (using a dual-task approach similar to that undertaken in [30], [31]), and (ii) exploring their effect on eye movement, and whether the proposed machine learning (ML) approach is sensitive enough to detect such variability.

The purpose of this research was to validate the use of DL techniques in detecting eye movement changes resulting from a lack of cognitive resources. Such validation would indicate the potential of DL for the early detection of MCI and AD, which are characterised by an overall reduction in WM [32]. The application of DL techniques in this context may result in more objective (data-driven) assessment and earlier interventions. This is of significant importance given that early diagnosis can influence the prognosis of the disease, with indications that the rate of cognitive decline may be reduced [7]. As such, there is increasing demand for automated tests capable of determining cognitive ability: the research presented in [24] has been developed into a tool for simple and rapid cognitive assessment [33], with the aim of providing differential diagnoses of dementia. This tool offers objective assessments and could be adopted as a cost-effective solution for clinical use. Our approach differs from that described in [24], in which the duration that users spend looking at specific regions of interest is measured and correlated with the cognitive score. In the methods used in this paper, the full eye movement trace is analysed and minimal preprocessing of the data is required, permitting the investigation of subtle and complex patterns in eye movement data which may elude traditional analytical methods.

A head-mounted alternative [34] to the tool presented in [33] aims to diagnose many different NDs, while [35] also employs eye tracking technologies but focuses on the diagnosis of Attention Deficit Hyperactivity Disorder (ADHD) and dyslexia.

Traditional assessments to diagnose dementia can often be stressful for the individual, and can also require travel to specialist clinics. When participants interact with a visual display, rather than actively engaging with questioning or challenging tasks, their stress response may be reduced. The eye movement tests described in this paper are 3–4 minutes long and require minimal instructions. Given the reduced workload of applying and analysing such tests (compared with more traditional methods of assessment), they could be used to obtain continuous data on cognitive function, potentially facilitating early intervention as well as the evaluation of such interventions.

The remaining structure of this paper is as follows. The experimental procedure is described in detail in Section 2.1, while the datasets collected are detailed in Section 2.2. The method used to extract time series data from the dataset is outlined in Section 2.3, with the results of the eye state detection (ESD) and eye centre localisation (ECL) models detailed in Sections 3.1 and 3.2, respectively. Methods for the analysis of this time series data are described in Section 2.4, with corresponding results in Section 3.3. Results for the Simon task are detailed in Section 3.4. Finally, the study and results are discussed in Section 4, with concluding remarks given in Section 5.

2. Material and methods

This section describes the experimental protocol (Section 2.1), the gathered dataset (Section 2.2), the method for extraction of the eye movement time series data (Section 2.3), and the methods for analysing the data to distinguish between cognitive load conditions and age groups (Section 2.4).

2.1. Data capture experiment

The aim of this experiment was to elicit different types of eye movement under varying cognitive loads and to capture images of the participants’ eyes as they watched the designed videos. By increasing the cognitive load associated with a task, we aimed to emulate the challenges faced by those experiencing cognitive decline [30], [31], [36], which is characterised by a reduction in working memory capacity. The images were used to train DL models to extract key features pertaining to eye movement. The broader aim was to investigate the potential for ML/DL techniques to distinguish between cognitive decline in typical and atypical ageing. Section 2.1 details the experimental procedure (Section 2.1.1) and participant demographic information (Section 2.1.2). Further information regarding the specific constituent elements of Video 1 (V1) and Video 2 (V2) is given in Appendix A and Appendix B, respectively.

2.1.1. Experimental procedure

The experimental setup included a computer and associated peripherals (mouse, keyboard, etc.); a forehead-chin rest to minimise participants’ head movement; a 27-inch monitor positioned 57 cm from the participant; and a high-speed camera (FLIR Grasshopper 3) recording at a resolution of 400 × 960 at 160 frames per second, positioned 40 cm from the participant. The experiment took place in a quiet, well-lit laboratory, with additional lighting positioned above the monitor.
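A 57 cm viewing distance is a common choice in vision research because, at that distance, 1 cm on the screen subtends approximately 1° of visual angle. The paper does not state that this was the reason for the chosen distance; the minimal sketch below, with hypothetical function names, simply shows the conversion between on-screen extent and visual angle.

```python
import math

def visual_angle_deg(size_cm: float, distance_cm: float = 57.0) -> float:
    """Visual angle (degrees) subtended by an on-screen extent of size_cm,
    centred on the line of sight and viewed from distance_cm."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

def cm_per_degree(distance_cm: float = 57.0) -> float:
    """On-screen extent (cm) that subtends exactly 1 degree at distance_cm."""
    return 2 * distance_cm * math.tan(math.radians(0.5))
```

With the default 57 cm, both functions return values within about 0.5% of 1, which is why stimulus sizes at this distance can be specified in centimetres and read approximately as degrees.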

Upon arriving at the experiment location, participants reviewed the participant information sheet, which had been made available to them beforehand. All participants then signed a consent form and completed a demographics questionnaire, which gathered data relating to age group, gender, ethnicity, educational attainment, and exclusionary criteria. Participants were made aware of their right to withdraw from the experiment, and were given the opportunity to take as many breaks as necessary between each experimental stage.

The experiment required participants to watch V1 and V2 twice, under two different cognitive loads, and to complete the Simon task [37] (a short test investigating inhibitory control) under two conditions (using two or four colours), further described in Appendix C. The duration of the experiment, including responding to questionnaires, was approximately 60 minutes.

The Simon task is a well-established cognitive task used in psychological experiments to assess cognitive performance, particularly attention and response inhibition [38], [39]. In our experiment, the Simon task results provide additional ground truth relating to age group, as trends of increasing RT and error rate with age have been widely reported for the Simon task [38], [39], [40], [41]. Inclusion of the Simon task therefore supports the validation of the DL models and the interpretation of the results.

In the LCL condition, the cognitive demand was limited to that inherent to the primary visual tracking task. In the HCL condition, cognitive demand was increased by introducing a simultaneous secondary task [30], [31] (counting backwards in sevens).

The experiment was split into five stages: (i) watch V1, LCL condition; (ii) watch V2, LCL condition; (iii) complete the Simon task; (iv) watch V1, HCL condition; (v) watch V2, HCL condition. The videos were designed to elicit different types of eye movement. Prior research has shown differences in saccadic eye movement characteristics between older and younger adults [10], [11], [12], [13], [14], as well as healthy older adults and AD patients [17], [18], [19]. V1 drew inspiration from these studies, focusing solely on saccadic motion. Differences have also been noted between these same groups in SPEM [15], [16], [20], [21]. As such, V2 focused on SPEM and other patterns that required more dynamic eye movements, such as waveform tracking and figure-of-eight tests. Detailed breakdowns of the video contents are given in Appendix A and Appendix B.

The Simon task used visual stimuli, with coloured squares employed as the target stimuli (TS). The test was completed under two conditions: (i) with two colours and (ii) with four colours, where the four-colour condition placed a greater demand on working memory due to the additional rules that participants were required to remember [38]. A full breakdown of the Simon task methodology and gathered data is given in Appendix C. It is important to note that Simon task results can display considerable variability, with some older adults performing much better than others. Factors such as bilingualism, education [42], [43], or lifestyle [44], [45] may play a role in comparatively better performance on the Simon task at a given age, serving to preserve cognitive function.

The Simon task scores are evaluated in both the two- and four-colour conditions, as well as across ages, to gain an understanding of each participant’s cognitive ability. This matters because cognitive ability may affect the accuracy of the trained DL models in determining the age group and cognitive load condition of the individual (i.e. on the eye movement tasks, it may be harder to distinguish an older individual who scores very well on the Simon task from younger individuals).

Additionally, there is evidence that fatigue has an effect on eye movement [46]. To account for variations in fatigue levels, participants were invited to repeat the experiment described in this section at a later date. Because identical levels of fatigue were unlikely to be experienced in both sessions, this permitted data collection under different fatigue conditions, and also introduced intra-person variability to the dataset. As the repeated experiment was identical, a minimum interval of two weeks between the first and repeated experiment was required. This mitigated, as far as possible, the practice effect, which might otherwise have resulted in participants exhibiting improved performance on both the visual tracking and cognitive tasks.

2.1.2. Participant breakdown

The focus of this pilot study was on healthy ageing, and so a comprehensive list of exclusionary criteria for participants was used. This is included in Appendix D. All participants were also required to have normal vision or vision corrected to 20/20.

Overall, 75 participants (50 male, 25 female) took part in the experiment, all of whom successfully completed the entire experiment (excluding the repetition). Participants were split into four distinct age groups: 18–29 (31); 30–44 (13); 45–59 (19); and 60+ (12). In terms of ethnicity, the majority of participants identified as White (53), while the remaining participants identified as Asian (16), Black (3), or preferred not to disclose (3). With respect to education attainment, the majority of participants had obtained a degree - split into undergraduate (21), postgraduate (30), or doctorates (13), with the remaining participants having attained secondary school education (7) or alternative qualifications (2).

Of the 75 participants, 30 (22 male, 8 female) repeated the experiment. Notably, participants who repeated the experiment tended to be younger: 18–29 (17); 30–44 (4); 45–59 (5); and 60+ (4). The majority of repeat participants identified as White (22), while the remaining participants identified as Asian (7) or preferred not to disclose this information (1). The educational attainment of repeat participants was: secondary school (3), undergraduate degree (11), postgraduate degree (13), doctorate (2), or other (1).

2.2. Datasets

The eye movement under differing cognitive loads (EM-COGLOAD) dataset gathered during this experiment is publicly available at [47], and is divided into four constituent parts: (i) labelled images, detailing the location of key eye features; (ii) labelled images, classifying whether the image contains a blink, or an open eye; (iii) the Simon task results; and (iv) the complete eye movement traces. Information linking each participant to a corresponding age group is also provided. The ultimate purpose of this work and developed dataset is to investigate the effect of age and varying cognitive load on eye movement.

The images captured during the experiment make up the majority of files within the dataset, and each image filename denotes the time at which the image was saved. 40,800 images were captured during each viewing of V1 and 33,920 during each viewing of V2, resulting in 149,440 images per participant per experiment, as participants watched each video twice. With 75 participants taking part, this represents a substantial amount of image data. The images are stored in a separate folder for each participant. Images captured whilst watching V1 and V2 under the LCL condition are stored in subfolders 0 and 1, respectively, and those captured whilst watching V1 and V2 under the HCL condition are stored in subfolders 2 and 3, respectively.
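The folder layout and image counts described above can be summarised in a short sketch. The constants come from the text; the path helper and its folder naming are hypothetical (the released dataset may name participant folders differently), and the nominal durations assume every frame was captured at a constant 160 fps.

```python
from pathlib import Path

FPS = 160  # camera frame rate (Section 2.1.1)
FRAMES_PER_VIEWING = {"V1": 40_800, "V2": 33_920}

# Subfolder index -> (video, cognitive load condition), as described above.
SUBFOLDERS = {
    0: ("V1", "LCL"),
    1: ("V2", "LCL"),
    2: ("V1", "HCL"),
    3: ("V2", "HCL"),
}

def images_per_session() -> int:
    """Total images captured for one participant in one experiment."""
    return sum(FRAMES_PER_VIEWING[video] for video, _ in SUBFOLDERS.values())

def nominal_duration_s(video: str) -> float:
    """Recording length implied by the frame count at a constant 160 fps."""
    return FRAMES_PER_VIEWING[video] / FPS

def image_dir(root: str, participant: str, subfolder: int) -> Path:
    """Hypothetical path helper: <root>/<participant>/<subfolder>."""
    return Path(root) / participant / str(subfolder)
```

Summing over the four subfolders reproduces the 149,440 images per participant per experiment stated above.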

To facilitate the training of ML models for ECL, the dataset contains hand-labelled images identifying the pixel coordinates of the pupil centres of the left (l_x, l_y) and right (r_x, r_y) eyes, as illustrated in Fig. 1. Given the frame rate of the camera, consecutive images were likely to be very similar. Thus, during the labelling process, every fourth image was extracted as a candidate, and the structural similarity index measure (SSIM) [48] was then used to compare candidates and select the most dissimilar images for manual labelling, using an experimentally determined threshold. This subset comprises 13,813 images with labelled pupil centres.
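The frame selection step above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it uses a simplified single-window (global) form of SSIM rather than the locally windowed SSIM of [48] (available, for example, as `skimage.metrics.structural_similarity`), and both the greedy rule of comparing each candidate with the most recently selected frame and the 0.9 threshold are hypothetical.

```python
import numpy as np

def global_ssim(a, b, data_range=255.0):
    """Single-window SSIM computed over the whole image. The standard SSIM
    averages this same expression over local windows."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

def select_for_labelling(frames, step=4, threshold=0.9):
    """Hypothetical greedy selection: take every `step`-th frame and keep it
    only if it is sufficiently dissimilar (SSIM < threshold) from the last
    frame kept. Returns the indices of the kept frames."""
    selected, last = [], None
    for idx in range(0, len(frames), step):
        if last is None or global_ssim(frames[idx], last) < threshold:
            selected.append(idx)
            last = frames[idx]
    return selected
```

Raising the threshold keeps more frames (only near-duplicates are discarded); lowering it keeps fewer, more diverse frames.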

Fig. 1 — Example of a labelled image, detailing the location of key eye points, resulting in four features: (l_x, l_y) and (r_x, r_y) for the left and right pupil centres, respectively. Note that left and right are denoted from the perspective of the image viewer, not the participant.

In addition, another subset of 52,506 images was manually annotated with eye state (open or blink), of which 27,033 images contain blinks. There is considerable individual variability in blinking behaviour: many participants did not fully close the eye during a blink, while others closed one eye more than the other, or did not close one eye at all. An example of blinking behaviour in which the eye is not fully closed is shown in Fig. 2. As such, a blink is defined throughout this work as any frame in which the centre of the pupil is not visible; thus the eye can be partially open during a blink.

Fig. 2 — An example of a partial blink cycle - illustrating the ambiguity in defining a blink. The first and last images in the sequence are defined as open eyes, while the interim images are classified as blink.

Blinks were located by manually inspecting thumbnails of every fourth image. Once a blink was identified, images were examined frame by frame to determine when the pupil centre was obscured and when it was revealed, thus defining the start and end points of the blink. All interim images were also classified as blinks. To permit the development of DL models for ESD, a further 25,473 images of open eyes in a range of positions (e.g. looking left, right, up, or down) and degrees of openness (e.g. partially or fully open) were also labelled.

In addition to image data and corresponding labels, both the raw and calculated Simon task results are also included in the dataset and stored within the participant folders in CSV files. The raw data contains three features: (i) pattern, the colour and location (congruent, incongruent, neutral) of the target stimulus (TS); (ii) correctness, whether the participant pressed the correct key in response; and (iii) latency, the RT of a correct response.

The calculated data contain twenty features in total: the participant number and age group, alongside eighteen calculated features, namely the error rate, mean latency, and latency range for congruent, incongruent, and neutral trials under each of the two Simon task conditions (two-colour and four-colour, labelled LCL and HCL, respectively).
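The eighteen calculated features can be derived from the raw trials along the lines of the sketch below. The tuple layout of a raw trial and the function name are hypothetical, and latency range is taken here as the maximum minus the minimum latency of correct responses (an assumption).

```python
import math
from statistics import mean

TRIAL_TYPES = ("congruent", "incongruent", "neutral")

def simon_features(trials):
    """Summarise one Simon task condition.

    trials: iterable of (trial_type, correct, latency_ms); latency is only
    meaningful for correct responses. Returns
    {trial_type: (error_rate, mean_latency, latency_range)}, using NaN when a
    statistic is undefined (e.g. no correct responses for that trial type).
    """
    trials = list(trials)
    features = {}
    for ttype in TRIAL_TYPES:
        rows = [t for t in trials if t[0] == ttype]
        latencies = [lat for _, correct, lat in rows if correct]
        error_rate = 1 - len(latencies) / len(rows) if rows else math.nan
        mean_lat = mean(latencies) if latencies else math.nan
        lat_range = max(latencies) - min(latencies) if latencies else math.nan
        features[ttype] = (error_rate, mean_lat, lat_range)
    return features
```

Applying the function to both conditions gives 3 trial types × 3 statistics × 2 conditions = 18 features.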

Finally, the full eye movement traces for each participant and video are also included. These data are ordered by frame and contain five features for every captured frame: the x and y coordinates of the left and right pupil centres (l_x, l_y, r_x, r_y), and a binary variable indicating whether the frame contains a blink.
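Blink episodes can be recovered from the binary blink variable of a trace, for example to count blinks or measure their durations. The sketch below assumes the blink flag has already been read into a sequence of 0/1 values (the column name and file layout are not specified here), and uses the 160 fps frame rate to convert episode lengths to milliseconds; the function names are hypothetical.

```python
import numpy as np

FPS = 160  # camera frame rate (Section 2.1.1)

def blink_episodes(blink):
    """Return (start, end) frame indices for each run of 1s; end is exclusive."""
    b = np.asarray(blink, dtype=np.int8)
    # Padding with 0 on both sides makes every run produce a +1 and a -1 edge.
    edges = np.diff(np.concatenate(([0], b, [0])))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return list(zip(starts.tolist(), ends.tolist()))

def episode_durations_ms(episodes, fps=FPS):
    """Duration of each (start, end) episode in milliseconds."""
    return [(end - start) * 1000.0 / fps for start, end in episodes]
```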

2.3. Extraction of eye movement trace

To analyse the eye movement of the participants, it was necessary to extract the location of the eye centres in each frame. Due to the apparatus used and the position of the camera, the dataset images are constrained to a narrow field of view focused on the eye region, a subset of which is shown in Fig. 3.

Fig. 3 — Examples of dataset images demonstrating intra-person variability, as well as the overall constrained nature of the images which results in the images broadly being very similar.

While inter-participant variation is quite minimal as a result of the experimental setup, differences arise due to the presence or absence of glasses, as well as eye colour, ethnicity, specularities, highlights, and shadowing. Some participants also appear at more of an angle relative to the camera, which occurred when the participant did not keep their forehead in contact with the forehead bar while watching the video.

To take advantage of the large number of images captured, a DL approach was explored for the purpose of extracting eye centres. DL models, particularly convolutional neural networks (CNNs), have shown a remarkable capability to learn varied tasks from image data [49], and thus form the basis of the following investigations.

The extraction of eye centres necessitated a three-stage process composed of: (i) ESD; (ii) eye region detection; and finally (iii) ECL. Eye region detection was carried out as a precursor step to ECL to increase the resolution of the segmented eye region used as input to the DL models, without changing the overall size of the input image, as in [50]. Transfer learning techniques were employed using models pretrained on ImageNet [51] in each of these tasks. The pretrained models were extended by the addition of fully connected layers.

Training was a two-stage process. In the first stage, the pretrained weights were frozen, and only the added fully-connected layers were trained until the model converged. In the second stage of training, the weights of the entire model were unfrozen, including the pretrained layers. Training was carried out in this manner to prevent large updates to the pretrained model weights during backpropagation, which may result from poor initialisation of the final additional layers. Therefore, during the first stage of training the pretrained weights are used to assist in network initialisation, and in the second stage of training the model is still able to learn features that may be significantly different to the initial classification task it was originally trained on.

The pipeline used to extract the eye movement traces is shown in Fig. 4. It is composed of three stages: (i) ESD to identify blinks and remove these images from the remainder of the pipeline; (ii) individual eye region identification; and (iii) ECL.

The MobileNetV2 model was used for eye state detection, VGG16 was used for eye region identification, and InceptionV3 was used for ECL. All models were implemented using Keras [52] with a Tensorflow [53] backend. Results for a range of models tested are shown in Sections 3.1 and 3.2, for the ESD and ECL tasks, respectively.

During the ESD stage, all images are classified as either containing both eyes open, or a blink. For those identified as eyes open, eye region identification was carried out to locate the two patches within each image containing the eyes, and subsequently ECL to predict the pupil centres was carried out. An example of the extracted eye movement trace is shown in Figure E.12.
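The control flow of the three-stage pipeline can be sketched as below. The three model calls are passed in as hypothetical callables standing in for the trained MobileNetV2, VGG16, and InceptionV3 models. The assumption that the ECL model outputs a centre in normalised crop coordinates, and the use of NaN coordinates for blink frames, are illustrative choices rather than details taken from the paper.

```python
import math
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]       # (x0, y0, width, height) in full-image pixels
Row = Tuple[float, float, float, float, int]  # (l_x, l_y, r_x, r_y, blink)

def crop_to_image(box: Box, u: float, v: float) -> Tuple[float, float]:
    """Map a centre given in normalised crop coordinates (u, v in [0, 1])
    back to full-image pixel coordinates."""
    x0, y0, w, h = box
    return x0 + u * w, y0 + v * h

def extract_trace(frames: Sequence,
                  is_blink: Callable,        # stage (i): eye state detection
                  find_eye_boxes: Callable,  # stage (ii): eye region detection
                  locate_centre: Callable    # stage (iii): eye centre localisation
                  ) -> List[Row]:
    """Run the three-stage pipeline over a sequence of frames."""
    trace: List[Row] = []
    for frame in frames:
        if is_blink(frame):
            # Blink frames skip stages (ii) and (iii); coordinates are undefined.
            trace.append((math.nan, math.nan, math.nan, math.nan, 1))
            continue
        left_box, right_box = find_eye_boxes(frame)
        lx, ly = crop_to_image(left_box, *locate_centre(frame, left_box))
        rx, ry = crop_to_image(right_box, *locate_centre(frame, right_box))
        trace.append((lx, ly, rx, ry, 0))
    return trace
```

Passing the models in as callables keeps the sketch independent of any particular framework and makes it easy to test with stub functions.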

2.4. Determining cognitive load and age group from eye movement time series data using ML techniques

With all eye movement traces extracted according to the pipeline illustrated in Fig. 4, it becomes possible to use them to determine the cognitive load condition. This investigation was approached as a time series classification (TSC) task. The extracted eye movement time series from each video were analysed. The length of the time series is equivalent to the number of frames captured during the video. Thus the length of the V1 (saccadic motion) time series was 40,800 steps, while the length of the V2 (SPEM) time series was 33,920 steps. In each instance all four eye movement features (l_x, l_y, r_x, r_y) were used.

The objective was to classify eye movement traces as belonging to either the LCL or HCL condition. The data was split into a training set (80%) and testing set (20%). The training set was further subdivided into five folds for five-fold cross-validation, using four folds for training and the remaining fold for validation during each of the five iterations. Each participant appeared exclusively in either the training set, or the testing set. Additionally, during cross validation, participants appeared exclusively in either one of the training folds, or the validation fold. Prior to training, the eye movement traces were standardised such that they had a mean of zero and a standard deviation of one. Any sequences that were not of the requisite length were removed.
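The participant-level split and preprocessing described above can be sketched as follows. This is a minimal illustration: splitting at the participant level (so that 80/20 refers to participants), assigning training participants to folds round-robin after shuffling, and z-normalising each trace per feature are our assumptions, and the function names are hypothetical.

```python
import random
import numpy as np

def participant_split(participant_ids, test_frac=0.2, n_folds=5, seed=0):
    """Split unique participants into a held-out test set and n_folds
    cross-validation folds, so that no participant appears in more than one."""
    ids = sorted(set(participant_ids))
    random.Random(seed).shuffle(ids)
    n_test = round(len(ids) * test_frac)
    test_ids = set(ids[:n_test])
    train_ids = ids[n_test:]
    folds = [set(train_ids[k::n_folds]) for k in range(n_folds)]
    return test_ids, folds

def zscore(trace, eps=1e-8):
    """Standardise each feature column (l_x, l_y, r_x, r_y) of one trace."""
    trace = np.asarray(trace, dtype=np.float64)
    return (trace - trace.mean(axis=0)) / (trace.std(axis=0) + eps)

def keep_full_length(traces, expected_len):
    """Drop traces that do not have the requisite number of frames."""
    return [t for t in traces if len(t) == expected_len]
```

In each cross-validation iteration, one fold serves as the validation set and the union of the other four as the training set; because folds are built from participant IDs, all traces from a participant stay on the same side of every split.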

The method for determining age group from the extracted time series data was largely identical to that for cognitive load, with the same two conditions analysed and the same four features and preprocessing steps used. However, the dataset was composed solely of participants from the youngest (18–29) and oldest (60+) age groups, with participants once again appearing exclusively in the training, validation, or testing sets. The objective was to classify participants as belonging to either the 18–29 or the 60+ age group, thus demonstrating the feasibility of estimating age from eye movement using DL techniques by discriminating between these two classes.

Previous research has demonstrated the efficacy of a range of models for the purpose of TSC [54], [55]. The models tested in this instance were fully convolutional networks (FCN), CNN, Inception, and an encoder. The architectures of the models are shown in Appendix F, with key hyperparameter choices also detailed. All models used the Adam optimiser and the binary cross-entropy loss function. Learning rates were determined heuristically through testing on the validation sets, and varied between models.
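Both classification tasks are binary, and the binary cross-entropy loss mentioned above has the standard form below, where N is the number of training sequences in a batch, y_i ∈ {0, 1} is the label of sequence i (for example 0 for LCL and 1 for HCL, or 0 for 18–29 and 1 for 60+; this encoding is our assumption), and p_i is the predicted probability of class 1 from a sigmoid output unit:

```latex
\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N}\Big[\, y_i \log p_i + (1 - y_i)\log(1 - p_i) \Big]
```

The loss is zero only when every p_i equals its label, and grows without bound as a confident prediction approaches the wrong class.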

3. Results

The results for the ESD and ECL models are given in Section 3.1, and Section 3.2, respectively. In Section 3.3 the results of the best performing DL models for determining cognitive load condition and age group from the extracted eye movement time series are detailed. Following this, a brief exploration of the Simon task results using traditional statistical techniques is given in Section 3.4 to assist with result interpretation.

3.1. Eye state detection results

The results of the ESD models on the validation set before and after unfreezing the weights of the core (pretrained) model are shown in Table 1, alongside the final results of the best performing models on the test set. Three variations in the final layer configuration were explored: (i) the output of the core model was flattened and fed into four successive fully-connected layers; (ii) a global average pooling (GAP) layer fed into a single fully-connected layer; and (iii) a GAP layer fed into two fully-connected layers. These configurations are referred to in Table 1 as 0, 1, and 2, respectively.

Table 1.

Eye state detection results. Three different final layer configurations were explored for each model, identified by 0, 1, and 2 in the Layers column. Stage 1 and Stage 2 give the validation accuracy after training stages one and two, respectively. The best layer configuration for each model was evaluated on the test set.

Model	Layers	Stage 1	Stage 2	Test Accuracy
Xception	0	99.79%	99.74%	-
	1	98.81%	99.74%	99.62%
	2	99.53%	99.74%	-
InceptionResnetV2	0	99.81%	99.81%	99.72%
	1	98.46%	99.76%	-
	2	99.63%	99.72%	-
MobileNetV2	0	99.61%	99.74%	-
	1	99.38%	99.80%	99.77%
	2	99.60%	99.77%	-

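Eye centre localisation results by model, eye (L/R), and training stage: percentage of images with error e at or below each threshold.

Model	Eye	Training Stage	e ≤ 0.025	e ≤ 0.05	e ≤ 0.1	e ≤ 0.25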
Xception	L	1	72.7%	95.3%	99.8%	99.9%
	L	2	93.3%	99.9%	100%	100%
	R	1	73.8%	95.6%	99.6%	99.9%
	R	2	92.8%	99.2%	99.9%	99.9%
InceptionResnetV2	L	1	28.4%	76.1%	99.3%	100%
	L	2	97.6%	99.7%	99.9%	100%
	R	1	43.6%	84.2%	99.0%	99.9%
	R	2	97.9%	99.6%	99.9%	99.9%
MobileNetV2	L	1	75.7%	96.3%	100%	100%
	L	2	86.3%	98.4%	100%	100%
	R	1	75.8%	96.9%	99.8%	99.9%
	R	2	87.0%	98.7%	99.8%	99.9%
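The error measure e is not defined in this section. The thresholds match those of the normalised error commonly used to evaluate eye centre localisation, in which the Euclidean localisation error is divided by the ground-truth inter-pupillary distance. Assuming a per-eye version of that measure (our assumption, with hypothetical function names), the table entries would be computed as follows:

```python
import numpy as np

THRESHOLDS = (0.025, 0.05, 0.1, 0.25)

def per_eye_normalised_error(pred, true):
    """pred, true: arrays of shape (N, 4) ordered (l_x, l_y, r_x, r_y).
    Returns (e_left, e_right): per-image Euclidean errors divided by the
    ground-truth inter-pupillary distance."""
    pred = np.asarray(pred, dtype=np.float64)
    true = np.asarray(true, dtype=np.float64)
    ipd = np.linalg.norm(true[:, 0:2] - true[:, 2:4], axis=1)
    e_left = np.linalg.norm(pred[:, 0:2] - true[:, 0:2], axis=1) / ipd
    e_right = np.linalg.norm(pred[:, 2:4] - true[:, 2:4], axis=1) / ipd
    return e_left, e_right

def accuracy_at_thresholds(e, thresholds=THRESHOLDS):
    """Fraction of images whose error is at or below each threshold."""
    e = np.asarray(e)
    return {t: float(np.mean(e <= t)) for t in thresholds}
```

Normalising by the inter-pupillary distance makes the thresholds independent of image scale, so results remain comparable across participants seated at slightly different positions.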

Cognitive load classification (LCL vs HCL) from eye movement time series.

Model	Data	Validation Accuracy	Test Accuracy
CNN	Saccadic (V1)	94.7%	87.5%
	SPEM (V2)	90.1%	86.5%
FCN	Saccadic (V1)	78.2%	-
	SPEM (V2)	70.0%	-
Encoder	Saccadic (V1)	90.1%	82.5%
	SPEM (V2)	86.2%	83.7%
Inception	Saccadic (V1)	81.1%	82.5%
	SPEM (V2)	74.1%	73.0%

CNN architecture (input: V1 trace of 40,800 time steps × 4 features).

Layer	Output Shape
Input	40800 × 4
1D Convolution	40794 × 8
1D Average Pooling	13598 × 8
1D Convolution	13592 × 64
1D Average Pooling	4530 × 64
Flatten	289920
Fully Connected	1
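The output lengths in the table follow from standard 1D convolution and pooling arithmetic. The kernel and pool sizes are not listed in the table; the sketch below assumes 'valid' convolutions with kernel size 7 (which reproduces 40,800 → 40,794) and non-overlapping average pooling of size 3 (which reproduces 40,794 → 13,598). Under these assumptions the final pooled length is 4,530, giving the flattened size 4,530 × 64 = 289,920.

```python
def conv1d_valid_len(n: int, kernel: int, stride: int = 1) -> int:
    """Output length of a 1D convolution with 'valid' (no) padding."""
    return (n - kernel) // stride + 1

def pool1d_len(n: int, pool: int) -> int:
    """Output length of non-overlapping 1D pooling (stride = pool size, no padding)."""
    return (n - pool) // pool + 1

def cnn_shapes(n_steps=40_800, kernel=7, pool=3, filters=(8, 64)):
    """List (layer, length) pairs for conv/pool blocks followed by a flatten."""
    shapes = [("Input", n_steps)]
    n = n_steps
    for f in filters:
        n = conv1d_valid_len(n, kernel)
        shapes.append((f"1D Convolution ({f} filters)", n))
        n = pool1d_len(n, pool)
        shapes.append(("1D Average Pooling", n))
    shapes.append(("Flatten", n * filters[-1]))
    return shapes
```

Calling `cnn_shapes(n_steps=33_920)` gives the corresponding lengths for the V2 (SPEM) traces under the same assumptions.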

Age group classification (18–29 vs 60+) from eye movement time series.

Model	Data	Validation Accuracy	Test Accuracy
CNN	Saccadic (V1)	82.2%	76.2%
	SPEM (V2)	88.3%	76.9%
FCN	Saccadic (V1)	80.9%	-
	SPEM (V2)	73.1%	-
Encoder	Saccadic (V1)	76.1%	71.4%
	SPEM (V2)	73.2%	73.1%
Inception	Saccadic (V1)	46.4%	-
	SPEM (V2)	70.5%	-

Condition	Trial type	Age group
		18-29	30-44	45-59	60+
Two-colour condition	Congruent	417.1 ± 63.8	434.1 ± 73.1	468.9 ± 72.1	494.7 ± 101.7
	Neutral	434.1 ± 74.2	455.0 ± 48.9	476.5 ± 68.4	528.8 ± 87.1
	Incongruent	459.7 ± 67.8	473.3 ± 54.6	525.8 ± 74.7	546.7 ± 75.1
Four-colour condition	Congruent	562.3 ± 66.5	561.5 ± 49.2	593.9 ± 67.0	643.1 ± 95.3
	Neutral	561.4 ± 81.1	568.1 ± 45.7	625.0 ± 67.9	657.2 ± 74.3
	Incongruent	582.5 ± 76.8	592.8 ± 56.0	650.2 ± 71.7	684.0 ± 88.5
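A standard summary of Simon task performance is the congruency (Simon) effect, the incongruent mean minus the congruent mean. The sketch below computes it per age group from the table, assuming the cells are mean ± SD response times in milliseconds; only the means are used.

```python
# Mean values from the table above, assumed to be response times in ms.
AGE_GROUPS = ("18-29", "30-44", "45-59", "60+")
MEANS = {
    ("two-colour", "congruent"): (417.1, 434.1, 468.9, 494.7),
    ("two-colour", "incongruent"): (459.7, 473.3, 525.8, 546.7),
    ("four-colour", "congruent"): (562.3, 561.5, 593.9, 643.1),
    ("four-colour", "incongruent"): (582.5, 592.8, 650.2, 684.0),
}

def simon_effect(condition):
    """Incongruent minus congruent mean for each age group, to 0.1 ms."""
    inc = MEANS[(condition, "incongruent")]
    con = MEANS[(condition, "congruent")]
    return {g: round(i - c, 1) for g, i, c in zip(AGE_GROUPS, inc, con)}

for condition in ("two-colour", "four-colour"):
    print(condition, simon_effect(condition))
```

On these means the effect is 42.6, 39.2, 56.9 and 52.0 ms in the two-colour condition and 20.2, 31.3, 56.3 and 40.9 ms in the four-colour condition, from the youngest to the oldest group.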

Time (ms)	Event (Gap)	Event (Overlap)
0	Fixation point appears	Fixation point appears
500	Fixation point changes colour	Fixation point changes colour
1000	Fixation point disappears	Target stimulus appears
1200	Target stimulus appears	Fixation point disappears
2000	-	Target stimulus disappears
2200	Target stimulus disappears	-
2300	-	End of trial
2500	End of trial	-
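The gap and overlap timelines differ only in the order of fixation offset and target onset. Encoding each schedule as data, as in the sketch below, makes the key intervals explicit; the event names are hypothetical labels, while the times are taken from the table.

```python
# Event times in ms, transcribed from the table above.
GAP = {"fixation_on": 0, "colour_change": 500, "fixation_off": 1000,
       "target_on": 1200, "target_off": 2200, "trial_end": 2500}
OVERLAP = {"fixation_on": 0, "colour_change": 500, "target_on": 1000,
           "fixation_off": 1200, "target_off": 2000, "trial_end": 2300}

def key_intervals(schedule):
    """Return (fixation-to-target interval, target duration, post-target blank).

    A positive first value is a gap (blank screen before the target);
    a negative one is an overlap (fixation point and target both visible).
    """
    interval = schedule["target_on"] - schedule["fixation_off"]
    duration = schedule["target_off"] - schedule["target_on"]
    blank = schedule["trial_end"] - schedule["target_off"]
    return interval, duration, blank

print("gap:", key_intervals(GAP))          # (200, 1000, 300)
print("overlap:", key_intervals(OVERLAP))  # (-200, 1000, 300)
```

Both conditions therefore show the target for 1000 ms and end 300 ms after target offset; they differ only in a 200 ms blank before the target (gap) versus 200 ms of simultaneous fixation point and target (overlap).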

Time (ms)	Event
0	Target stimulus first appears at the screen centre.
500-1500	Target stimulus steps and begins to ramp.
1500	Blank screen is displayed.
1800	End of trial
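The smooth pursuit trial follows a step-ramp design: the target starts at the screen centre, steps, and then moves at constant velocity until the blank screen at 1500 ms. The sketch below generates a horizontal target trajectory for one trial. The step size, ramp velocity, and a step at exactly 500 ms are hypothetical parameters (the table gives only the 500-1500 ms window). Stepping opposite to the ramp direction, so that the target recrosses the centre shortly after onset, is a common way to reduce catch-up saccades at pursuit initiation and is assumed here.

```python
def step_ramp_position(t_ms, onset_ms=500.0, end_ms=1500.0,
                       velocity_deg_s=10.0, step_deg=-2.0):
    """Horizontal target position in degrees from screen centre at t_ms.

    Hypothetical parameters: at onset_ms the target steps by step_deg, then
    moves at velocity_deg_s until end_ms, when the blank screen appears.
    Returns None while the screen is blank.
    """
    if t_ms < onset_ms:
        return 0.0                # stationary at screen centre
    if t_ms >= end_ms:
        return None               # blank screen, no target
    elapsed_s = (t_ms - onset_ms) / 1000.0
    return step_deg + velocity_deg_s * elapsed_s

# Sample the trial every 100 ms up to the 1800 ms trial end.
trace = [(t, step_ramp_position(t)) for t in range(0, 1800, 100)]
print(trace)
```

With these values the target jumps 2 degrees left at onset and, moving right at 10 degrees per second, passes back through the centre 200 ms later.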

Category	Description
Vision Problems	Cataracts, diabetic retinopathy, macular degeneration, glaucoma, decreased visual field, poor night vision, poor colour vision.
Hearing loss	Right/left ear, both ears
Cardiovascular	Stroke, transient ischemic attack (TIA), fainting
Neurological	Brain tumour, dementia, migraine or recurrent headaches, multiple sclerosis, Parkinson’s Disease, Huntington’s Disease, peripheral neuropathy, seizures, vertigo, dizziness, serious head injury (i.e. loss of memory or consciousness)
Psychiatric Illness	Depression, bipolar disorder, anxiety disorder, schizophrenia, attention deficit (hyperactivity) disorder

Layer	Output Shape
Input	(40800, 4)
1D Convolution	(40800, 128)
1D Max Pooling	(20400, 128)
1D Convolution	(20400, 256)
1D Max Pooling	(10200, 256)
1D Convolution	(10200, 512)
Attention	(10200, 256)
Fully Connected	(10200, 64)
Flatten	(652800)
Fully Connected	1
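In the Encoder table the attention layer halves the channel count (512 to 256) while keeping the time axis. One mechanism consistent with that shape, and used in Encoder-style time series models, splits the convolution output into a value half and a gate half, normalises the gates with a softmax, and multiplies the two. The sketch below implements that scheme in plain Python; normalising along the time axis is an assumption (some implementations normalise across channels instead).

```python
import math

def channel_split_attention(x):
    """Gate a (T, 2C) feature map down to (T, C).

    Assumed scheme: channels [0, C) are values and channels [C, 2C) are
    gate logits. Each gate channel is softmax-normalised along the time
    axis and multiplied element-wise with the matching value channel.
    """
    T = len(x)
    C = len(x[0]) // 2
    out = [[0.0] * C for _ in range(T)]
    for c in range(C):
        logits = [x[t][C + c] for t in range(T)]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        for t in range(T):
            out[t][c] = x[t][c] * exps[t] / total
    return out

# Toy input: T = 2 time steps, 2C = 4 channels (the model uses 10200 x 512).
toy = [[2.0, 4.0, 0.0, 0.0],
       [6.0, 8.0, 0.0, 0.0]]
print(channel_split_attention(toy))  # [[1.0, 2.0], [3.0, 4.0]]
```

The following fully-connected layer is then applied at each of the 10200 time steps, and flattening its 64 outputs per step gives the 10200 × 64 = 652,800 features listed in the table.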

Layer or Module	Output Shape
Inception Module	(40800, 128)
Inception Module	(40800, 128)
Inception Module	(40800, 128)
Inception Module	(40800, 128)
Inception Module	(40800, 128)
Inception Module	(40800, 128)
Global Average Pooling	(128)
Fully Connected	1
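In the Inception model every module keeps the full sequence length, and the global average pooling layer then collapses the 40800 time steps to one value per channel before the single output unit. A minimal forward pass for that head is sketched below, assuming the output unit uses a sigmoid for the binary labels (low vs high cognitive load, or youngest vs oldest age group); the weights and the 0.5 decision threshold are illustrative.

```python
import math

def global_average_pool(x):
    """Average a (T, C) feature map over time, giving one value per channel."""
    T = len(x)
    return [sum(row[c] for row in x) / T for c in range(len(x[0]))]

def sigmoid_output(features, weights, bias):
    """One fully-connected unit followed by a sigmoid (probability of class 1)."""
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Toy feature map: T = 3 time steps, C = 2 channels (the model has C = 128).
fmap = [[1.0, 0.0], [2.0, 1.0], [3.0, 2.0]]
pooled = global_average_pool(fmap)  # [2.0, 1.0]
p = sigmoid_output(pooled, weights=[1.0, -1.0], bias=0.0)
label = int(p >= 0.5)  # 0.5 threshold assumed
print(pooled, p, label)
```

Because pooling removes the time axis, the final layer needs only 128 weights and a bias regardless of input length, whereas the CNN and Encoder architectures above flatten to 289,920 and 652,800 features before their output unit.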

Convolutional Blocks	Kernel Sizes
2	(7, 7)¹, (7, 9), (7, 13), (7, 17), (9, 13), (9, 17), (9, 19)
3	(3, 7, 9), (7, 7, 7), (7, 9, 13), (7, 9, 17), (9, 13, 17), (13, 17, 19)
4	(3, 7, 9, 13), (7, 9, 17, 19)
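The kernel-size table lists candidates for CNNs with two, three and four convolutional blocks, 15 configurations in total. An exhaustive search over such a grid can be sketched as follows; `evaluate` is a hypothetical stand-in for training a model and returning a score, and selecting on validation accuracy is an assumption.

```python
# Kernel-size grid transcribed from the table above (footnote marker dropped).
KERNEL_GRID = {
    2: [(7, 7), (7, 9), (7, 13), (7, 17), (9, 13), (9, 17), (9, 19)],
    3: [(3, 7, 9), (7, 7, 7), (7, 9, 13), (7, 9, 17), (9, 13, 17), (13, 17, 19)],
    4: [(3, 7, 9, 13), (7, 9, 17, 19)],
}

def candidates(grid):
    """Yield (n_blocks, kernel_sizes) pairs, one kernel size per block."""
    for n_blocks, options in grid.items():
        for kernels in options:
            if len(kernels) != n_blocks:
                raise ValueError(f"{kernels} does not match {n_blocks} blocks")
            yield n_blocks, kernels

def grid_search(grid, evaluate):
    """Return the candidate with the highest score from `evaluate`.

    `evaluate(n_blocks, kernels)` is a hypothetical callable that would train
    a CNN with these kernels and return its validation accuracy.
    """
    return max(candidates(grid), key=lambda cfg: evaluate(*cfg))

print(sum(1 for _ in candidates(KERNEL_GRID)))  # 15
```

Passing a cheap dummy scorer (for example, one that prefers smaller kernels) checks the search loop before any model is trained.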

Convolutional Blocks	Kernel Sizes
2	(2, 4), (2, 8), (6, 12), (8, 16), (8, 64), (16, 64), (64, 128)
3	(2, 8, 16), (2, 8, 64), (6, 12, 24), (8, 16, 32)
4	(6, 12, 24, 48)
