Author manuscript; available in PMC: 2018 May 23.
Published in final edited form as: IEEE Trans Affect Comput. 2017 May 23;8(2):176–189. doi: 10.1109/TAFFC.2016.2582490

Cognitive Load Measurement in a Virtual Reality-based Driving System for Autism Intervention

Lian Zhang 1, Joshua Wade 2, Dayi Bian 3, Jing Fan 4, Amy Swanson 5, Amy Weitlauf 6, Zachary Warren 7, Nilanjan Sarkar 8
PMCID: PMC5614512  NIHMSID: NIHMS806802  PMID: 28966730

Abstract

Autism Spectrum Disorder (ASD) is a highly prevalent neurodevelopmental disorder with enormous individual and social cost. In this paper, a novel virtual reality (VR)-based driving system was introduced to teach driving skills to adolescents with ASD. This driving system is capable of gathering eye gaze, electroencephalography, and peripheral physiology data in addition to driving performance data. The objective of this paper is to fuse multimodal information to measure cognitive load during driving such that driving tasks can be individualized for optimal skill learning. Individualization of ASD intervention is an important criterion due to the spectrum nature of the disorder. Twenty adolescents with ASD participated in our study and the data collected were used for systematic feature extraction and classification of cognitive loads based on five well-known machine learning methods. Subsequently, three information fusion schemes—feature level fusion, decision level fusion and hybrid level fusion—were explored. Results indicate that multimodal information fusion can be used to measure cognitive load with high accuracy. Such a mechanism is essential since it will allow individualization of driving skill training based on cognitive load, which will facilitate acceptance of this driving system for clinical use and eventual commercialization.

Index Terms: Multi-modal recognition, cognitive models, physiological measures, virtual realities, Intelligent tutoring systems

1 Introduction

Autism spectrum disorder (ASD) is a neurodevelopmental syndrome characterized by deficits in social reciprocity and communication [1]. In the United States, the estimated prevalence of ASD is 1 in 68 [2]. Although some of the core deficits of ASD, including social skills, communication impairment, and repetitive behavior, have been extensively studied in individuals with ASD [3, 4], far fewer studies have focused on meaningful skills related to adaptive adult independence, such as driving. The ability to drive oneself is a necessary skill in most American cities in order to live with minimal supports and maintain employment, two classic hallmarks of adulthood. Individuals with ASD have difficulty achieving these milestones, however, in part due to difficulty in learning and maintaining driving skills (e.g., correctly identifying road hazards) [5]. Because behavioral and educational interventions can positively impact the lives of individuals with ASD [6] yet may be difficult to access within community settings, the clinical application of computer-related technology, especially virtual reality (VR), has been widely studied as an alternative therapy modality. VR technology can be used to create immersive, interactive, and realistic environments for behavioral learning. For the purpose of driver training, particularly for individuals with ASD, a VR-based intervention platform has multiple distinctive advantages, such as precise control of complex stimuli, individualizable treatment, and a structured and safe learning environment [7].

In this context, Reimer et al. and Classen et al. designed a set of driving scenarios for individuals with ASD using driving simulation tools [5, 8]. Teens with ASD were observed to make more driving errors as compared with typically developing (TD) peers [8]. When comparing TD and ASD groups, differences were found regarding gaze patterns and physiological signals, such as heart rate (HR) and skin conductance level (SCL) [5]. In an effort to further explore how best to utilize VR environment for driving training, we have developed a novel VR-based driving system aimed at training driving skills in adolescents with ASD [9].

Training efficiency may be improved by adjusting difficulty levels of the driving tasks. Flow Theory [10] can be useful to provide guidance regarding the design of different difficulty levels. For instance, Channel et al., applying flow theory, verified that different task difficulty levels correspond to different emotions [11]. Another approach is to employ cognitive load theory [12] to design a cognitively intelligent system. Such a system, which can sense, analyze, and respond to a user’s cognitive state, has the potential to improve learning efficiency [13]. For example, Koenig et al. implemented a cognitively intelligent system to maximize the training efficiency in their rehabilitation environments [14]. We will develop the VR-based driving system into a cognitively intelligent system because cognitive load appears to be more appropriate in the context of driving and is commonly used in driving-related applications [15]. This paper explores fusion of multimodal information from a novel VR-based driving system for cognitive load measurement, which is a necessary step before building a cognitively intelligent VR-based driving system.

1.1. Background

Cognitive load is a multidimensional construct representing the working load that is imposed on a learner’s cognitive system when performing a particular task [16]. Cognitive load is believed to be a crucial factor in learning of complex tasks [12], such as driving tasks. The capacity of working memory is limited and it varies from person to person. If a learning task requires too little or too much cognitive capacity, learning may be impeded [17]. Therefore it is important to design learning tasks that provide an appropriate level of cognitive load, which is neither too high nor too low [18].

Cognitive load theory is concerned with efficient usage of people’s limited working memory to acquire knowledge and skills. There are different types of cognitive load, such as intrinsic load and extraneous load [19]. Intrinsic load reflects the natural complexity of learning information and the expertise of a learner. Extraneous load is related to the design of instructions [12]. When task difficulty exceeds a learner’s expertise, additional extraneous load is generated and the required cognitive load exceeds the learner’s working memory capacity. When the learner’s expertise exceeds the task difficulty, the learner wastes time and energy to solve tasks that are too simple and therefore will not benefit from learning. Thus the task difficulty level should match a learner’s expertise in order to enable effective learning [20].

Compared to TD individuals, working memory of individuals with ASD may be different [21]. Individuals with ASD performed significantly worse than TD individuals on tasks related to working memory [22]. Remington et al. reported altered performance of individuals with ASD under different levels of cognitive load [23]. Individuals with ASD also have difficulty in understanding the mental states of their own and others [21]. Therefore, a targeted system that can automatically measure cognitive load of individuals with ASD and then optimize their cognitive load may have the potential to improve their learning efficiency [24].

1.2. Related Research

Real-time measurement of cognitive load in individuals with ASD is critical for a cognitively intelligent system. There are three general ways to measure cognitive load [25]: subjective scales, performance-based measures and physiology-based measures. Subjective scales are inappropriate in a cognitively intelligent system for ASD intervention because: 1) individuals with ASD may have difficulty in accurately reporting their own cognitive load [21], and 2) subjective scales are not real-time measures. We therefore explored measuring real-time cognitive load using information from eye gaze, electroencephalography (EEG), and peripheral physiology modalities along with a task performance modality.

Each of the above-mentioned modalities has been studied with regard to cognitive load measurement. It has been found that eye gaze signals are reflective of a user’s cognitive state [26]. Pupil dilation is known to quickly respond to changes in a person's cognitive workload [26]. EEG signals are sensitive and reliable for continuous memory load measurement [27]. Alpha and theta wavebands of EEG are correlated with task difficulty [28]. Peripheral physiological signals are also important components of cognitive load measurement [29]. Electrocardiogram (ECG), respiration (RSP), and HR were demonstrated to be sensitive to cognitive load in [5, 13]. Performance-based measurement is a typical way to measure cognitive load [12]. In terms of driving studies, performance metrics, such as steering wheel movements, lane-keeping behavior, speed control, and time-to-line crossing, have been found to be related to cognitive load [30].

In order to classify cognitive load using observed information, several well-known machine learning algorithms and different parameter values of these algorithms have been evaluated in TD populations. Hussain et al. tested k-nearest neighbor (KNN) with different k values in measuring cognitive load using face, physiology, and task performance data [31]. Different kernel functions of support vector machine (SVM), including linear kernel [31] and Gaussian kernel [32], were used in cognitive load measurement. Novak et al. analyzed linear discriminant analysis (LDA), diagonal LDA, and stepwise LDA to classify cognitive load [33]. Lin et al. explored backpropagation and radial basis functions to build artificial neural networks (ANN) in cognitive load measurement [34]. One of the most important parameters of building a decision tree is the splitting criterion [35]. Hussain et al. selected cross-entropy as the splitting criterion to build decision trees for cognitive load measurement [31]. However, the studies using machine learning algorithms in measuring cognitive load of individuals with ASD are limited. Lagun et al. showed that SVM can achieve higher classification accuracy than naïve Bayes and logistic regression when measuring cognitive load of individuals with ASD [36].

Fusing multimodal information to measure cognitive load has been explored in different applications. Novak et al. fused physiological and performance information for upper extremity rehabilitation [33]. Steichen et al. used eye gaze together with performance information for cognitive load measurement in visualization systems [37]. Son et al. estimated users’ cognitive workload using two spoken tasks by integrating performance and eye gaze information [30]. However, there is no study to our knowledge that has systematically studied fusing multimodal information to measure cognitive load of individuals with ASD during VR-based driving.

1.3. Current Work

This paper fuses multimodal information collected from a novel VR-based driving system for cognitive load measurement, which is a necessary step before building a cognitively intelligent VR-based driving system. We hypothesize that multimodal information can lead to a more accurate cognitive load measurement than single modality-based measurement approaches. This hypothesis is tested by comparing single modality information to multimodal information in cognitive load measurement with multiple well known machine learning algorithms using data collected during VR-based driving in adolescents with ASD.

The key contribution of this research is to design a cognitive load measurement technique for VR-based driving such that the driving difficulty can be adjusted for each individual based on their cognitive load, which will likely enhance learning. The ground truth of cognitive load used in this paper is based on perceived task difficulty as experienced by the individuals with ASD and is rated by an experienced clinically trained rater. This ground truth, as we have shown, correlates well with the driving performance of users, and thus provides a method to measure cognitive load that overcomes the difficulty associated with self-rating, which is problematic for individuals with ASD. Thus this paper contributes in the following aspects: 1) to analyze eye gaze, EEG, peripheral physiological and performance data in the context of VR-based driving, which is designed to provide a safe and flexible environment to teach driving skills to adolescents with ASD who often have deficits in this regard; 2) to extract useful features from these data that can be used to measure their cognitive load; and 3) to apply several machine learning algorithms for measuring cognitive load of a user as well as explore how multimodal information can be fused at different levels to yield highly accurate cognitive load measurement.

The paper is organized as follows. Section 2 describes our novel VR-based driving system, including system design and experimental setup. Section 3 lists the features extracted from four modalities. Section 4 presents the classification algorithms as well as three data fusion strategies for cognitive load measurement. The results are provided in Section 5 followed by a discussion in Section 6. Finally conclusions of the presented work and future research plans are discussed in Section 7.

2 VR-Based Driving System

2.1 System Design

A VR-based driving system was designed to train and improve the driving skills of adolescents with ASD. The three primary components of the VR-based driving system were: a driving simulator, a data capture module and a rating module, as shown in Fig. 1.

Fig. 1. The framework of the VR-based driving system

Fig. 2 shows the driving simulator. A Logitech G27 steering wheel controller was used to control a virtual agent vehicle in the virtual driving environment. Models in the virtual driving environment, such as traffic lights, stop signs, and vehicles were developed with the modeling tools ESRI CityEngine (www.esri.com/cityengine) and Autodesk Maya (www.autodesk.com/maya). The game development platform Unity3D (www.unity3d.com) was used to implement the system logic. A total of six different difficulty levels, each level consisting of three driving assignments, were developed for the VR-based driving system.

Fig. 2. The driving simulator of the VR-based driving system

These difficulty levels were tested and validated in our previous works [9]. Control parameters (Table 1), such as speed of vehicles, responsiveness of the agent vehicles’ brake and accelerator, and weather conditions, were manipulated to produce a range of difficulties. Table 2 shows the values of these control parameters used in each designed difficulty level.

Table 1.

The control parameters of difficulty level

Label | Description of parameter | Domain
As | Speed of autonomous vehicles | As ∈ [0.85, 1.75]
Aa | Aggressiveness of autonomous vehicles | Aa ∈ [1, 1.5]
Hs | Traffic light alert sound | Hs ∈ {Enabled, Disabled}
Rb | Responsiveness of the brake pedal | Rb ∈ [0.35, 1]
Ra | Responsiveness of the accelerator pedal | Ra ∈ [1, 1.5]
Rs | Responsiveness of the steering wheel | Rs ∈ [1, 3.75]
W | Weather condition | W ∈ {Sunny, Overcast, Rainy}
L | Intensity of light in the environment | L ∈ [0.01, 0.5]
Nv | Number of vehicles at intersections | Nv ∈ {1, 2, …, 5}
Sd | Duration of time to permit driving on sidewalk | Sd ∈ [0.6, 4]

Table 2.

The configuration of the designed difficulty levels

Level As Aa Hs Rb Ra Rs W L Nv Sd
1 0.85 1 Enabled 1 1 1 sunny 0.5 0 to 1 4
2 1 1 Disabled 1 1 1 sunny 0.466 1 to 2 3.35
3 1.35 1 Disabled 1 1 1 overcast 0.409 2 to 3 2.66
4 1.35 1 Disabled 1 1 1 sunny 0.329 2 to 3 1.97
5 1.35 1.35 Disabled 0.675 1.25 2.375 sunny 0.226 3 to 5 1.29
6 1.75 1.5 Disabled 0.35 1.5 3.75 rainy 0.01 3 to 5 0.6

The data capture module recorded a user’s multimodal information while the user was engaged in driving. A Tobii X120 remote eye tracker (www.tobii.com) logged the eye gaze data at 120 Hz. A Biopac MP150 (www.biopac.com) physiological data acquisition system wirelessly sampled multiple peripheral physiological signals, including ECG, electromyography (EMG), RSP, skin temperature (SKT), photoplethysmogram (PPG), and galvanic skin response (GSR). The PPG and GSR signals were measured from toes instead of fingers in order to reduce the motion artifact from driving. The SKT signal was collected from the upper arm. An Emotiv EPOC wireless EEG headset (www.emotiv.com) recorded 14-channel EEG signals. Metrics of the user’s performance were recorded within the virtual driving environment.

We designed a rating mechanism for a rater to observe and rate a user’s affective and cognitive state in real time. A live video with sound, recording of the user’s frontal face and the virtual driving environment, was displayed for the rater. The rater could also view the entire experimental environment via a one-way mirror from an adjacent room. The computer used by the rater was connected to the driving simulator and the data capture module via a local area network (LAN). The data of each system component were labeled according to timestamps of the driving simulator component in order to facilitate offline synchronization.

2.2 Experimental Setup

A total of 20 adolescents with ASD, from 13 to 18 years old, were involved in a series of six experimental sessions. The participants were recruited through an existing university based clinical research registry. Although the study was open to adolescents from both genders, the majority (19 out of 20) were male participants. ASD is much more common in males than in females [38] and we were not able to recruit more female participants. All participants had a clinical diagnosis of ASD from a licensed clinical psychologist. The Social Responsiveness Scale, second edition (SRS-2) was completed for each participant by his/her parent to quantify the severity of his/her ASD symptoms [39]. This study was approved by the Vanderbilt University Institutional Review Board (IRB). Table 3 shows detailed participants’ information.

Table 3.

The Participants’ Information

Gender (% male) | Age (year) | SRS-2 total raw score | SRS-2 score
95% | 15.29 (1.66) | 97.85 (28.35) | 75.45 (10.23)

Each of the participants completed six sessions on different days. Each session lasted approximately one hour. Fig. 3 shows the experimental protocol of a session. The blocks with dashed lines represent experimental steps that are only parts of the first session. At the beginning of the first session, informed consent was obtained. A video tutorial regarding the VR-based driving system was then shown to a participant in the first session. Three researchers set up peripheral physiological and EEG sensors, and calibrated an eye tracker in the sensor application step. Before data recording, all signals were checked by the researchers to make sure all the sensors were placed correctly. Then, baseline data were collected for three minutes for the peripheral physiological and EEG signals in a silent environment. In the first session, after the baseline recording step, the participant took part in driving tasks in a free-form mode. Following this step, three pre-selected driving assignments were carried out. During the driving assignments, the researchers monitored the peripheral physiological and EEG signals in real time to ensure the quality of the recorded data.

Fig. 3. The experimental protocol of a session

The first and the last sessions acted as pre- and post-tests and included the same three driving assignments (i.e., one easy driving assignment and two difficult driving assignments). The pre- and post-tests were included in order to evaluate the system in improving a participant’s driving skills. However, we do not consider performance improvement from the pre-test to the post-test in this paper. Each of the other four sessions was composed of three driving assignments from the same difficulty level, with the driving difficulty increasing from the second to the fifth session.

During the experiment, a rater rated a participant’s affective and cognitive states in real time using the rating mechanism described in section 2.1. The rater had extensive experience working with individuals with ASD at the Treatment and Research Institute for Autism Spectrum Disorders (TRIAD) at Vanderbilt University. The rater had been trained to utilize a rating system across a series of other works regarding human-computer interaction in ASD populations [40]. She was directly supervised by licensed clinical psychologists who specialized in ASD diagnosis and treatment. Five categories of rating were collected: perceived task difficulty level, engagement, enjoyment, boredom, and frustration. However, only the rating of perceived task difficulty is considered in this paper. The rater rated the perceived task difficulty experienced by the participant in a continuous interval from 0 to 9 using 5 as the threshold. Specifically, a task was rated with a value higher than 5 if it was perceived to be hard with larger value indicating higher perceived task difficulty, and vice versa. The continuous rating of perceived task difficulty was later mapped into binary classes offline using 5 as the threshold. That is, if a rating of perceived task difficulty had a value less than five, it was mapped into the low cognitive load class; otherwise, it belonged to the high cognitive load class.
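The offline mapping from continuous ratings to binary classes can be sketched as follows. This is a minimal illustration, assuming the function name `rating_to_class` and the example ratings, neither of which comes from the study.

```python
def rating_to_class(rating: float, threshold: float = 5.0) -> int:
    """Map a continuous perceived-difficulty rating in [0, 9] to a binary class.

    Returns 0 (low cognitive load) for ratings below the threshold and
    1 (high cognitive load) otherwise, following the mapping in the text.
    """
    if not 0.0 <= rating <= 9.0:
        raise ValueError("rating must lie in the interval [0, 9]")
    return 0 if rating < threshold else 1


# Illustrative ratings only; these are not study data.
ratings = [2.5, 4.9, 5.0, 7.3]
labels = [rating_to_class(r) for r in ratings]
```

A rating of exactly 5 falls into the high cognitive load class, since only values less than five are mapped to the low class.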

We did not use the designed task difficulty as ground-truth for cognitive load considering that the cognitive load caused by the same task may vary: 1) from person to person, and 2) at different times for the same person [20]. We therefore utilized the rating of perceived task difficulty by a trained rater for the ground truth of cognitive load. It was assumed that a high rating of perceived task difficulty was indicative of a high cognitive load experienced by the participant [41].

3 Feature extraction

3.1 Eye Gaze Features

The eye tracker signals, recorded by the Tobii X120 eye tracker, were preprocessed in order to remove invalid data and reduce noise. If a continuous run of lost data lasted longer than 1000 ms, the lost data were removed. Such long-duration losses were primarily attributed to the movement of a participant’s head beyond the eye tracker’s detection range. If a continuous run of lost data lasted less than 75 ms, the lost data were filled in by linear interpolation from the neighboring valid data [42]. The 75 ms threshold was selected because it is the minimum closure duration of a blink; any lost eye gaze data lasting less than 75 ms were therefore deemed to be due to noise. The remaining noise in the eye gaze data was then reduced with a median filter.
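The gap-handling rules above can be sketched in a few lines. This is only an illustration under stated assumptions: lost samples are represented as `None`, the median filter window of three samples is a guess (the window size is not given in the text), and all function names are hypothetical. Gaps lasting between 75 ms and 1000 ms are left untouched here because the text does not say how they were handled.

```python
from statistics import median

SAMPLE_MS = 1000.0 / 120.0  # sampling period of the 120 Hz eye tracker, in ms


def find_gaps(samples):
    """Return (start, end) index pairs (end exclusive) of runs of None."""
    gaps, start = [], None
    for i, value in enumerate(samples):
        if value is None and start is None:
            start = i
        elif value is not None and start is not None:
            gaps.append((start, i))
            start = None
    if start is not None:
        gaps.append((start, len(samples)))
    return gaps


def fill_short_gaps(samples, max_gap_ms=75.0):
    """Linearly interpolate gaps shorter than max_gap_ms that have valid neighbours."""
    out = list(samples)
    for start, end in find_gaps(samples):
        duration_ms = (end - start) * SAMPLE_MS
        if duration_ms < max_gap_ms and start > 0 and end < len(samples):
            left, right = samples[start - 1], samples[end]
            span = end - start + 1
            for k in range(start, end):
                out[k] = left + (k - start + 1) / span * (right - left)
    return out


def drop_long_gaps(samples, min_gap_ms=1000.0):
    """Remove gaps longer than min_gap_ms; all other samples keep their order."""
    drop = set()
    for start, end in find_gaps(samples):
        if (end - start) * SAMPLE_MS > min_gap_ms:
            drop.update(range(start, end))
    return [v for i, v in enumerate(samples) if i not in drop]


def median_filter(samples, window=3):
    """Median filter over valid samples; the window size is an assumption."""
    half = window // 2
    out = []
    for i, value in enumerate(samples):
        if value is None:
            out.append(None)
            continue
        neighbours = [v for v in samples[max(0, i - half): i + half + 1] if v is not None]
        out.append(median(neighbours))
    return out


# Illustrative pupil-diameter trace in mm; not study data.
pupil_mm = [3.0, None, None, 3.6, 3.5]
cleaned = median_filter(drop_long_gaps(fill_short_gaps(pupil_mm)))
```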

After preprocessing, we extracted 4 basic eye gaze features guided by previous literature [26, 43]: blink, pupil diameter, fixation and saccade (Table 4). We then extracted 10 secondary eye gaze features from the basic features, i.e. blink rate, fixation rate, Mean and Standard Deviation (M and SD) of blink duration, M and SD of pupil diameter, M and SD of fixation duration, and M and SD of saccade duration.

Table 4.

Basic Eye Gaze Feature

Basic features | Definition
Blink | A rapid closing of the eye with a closure duration between 75 ms and 400 ms
Pupil diameter | The pupil diameter, in mm
Fixation | The eye gaze is maintained at one point with very slow eye movement
Saccade | A quick eye movement between two fixations
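As a hedged sketch of how secondary features such as blink rate and the M and SD of blink duration might be derived from the basic features, the snippet below applies the 75 ms to 400 ms blink definition to a list of eye-closure durations. The inclusive bounds, the use of the sample standard deviation, the 60 s window, and all names are assumptions made for illustration.

```python
from statistics import mean, stdev


def is_blink(closure_ms):
    """Blink definition from Table 4: closure lasting between 75 ms and 400 ms."""
    return 75.0 <= closure_ms <= 400.0


def mean_and_sd(values):
    """Return (M, SD); SD is the sample standard deviation (an assumption)."""
    if not values:
        return 0.0, 0.0
    return mean(values), (stdev(values) if len(values) > 1 else 0.0)


def secondary_blink_features(closures_ms, window_s):
    """Blink rate (per second) and M/SD of blink duration over one window."""
    blinks = [c for c in closures_ms if is_blink(c)]
    m, sd = mean_and_sd(blinks)
    return {
        "blink_rate": len(blinks) / window_s,
        "blink_duration_mean": m,
        "blink_duration_sd": sd,
    }


# Illustrative closure durations in ms over a hypothetical 60 s window.
features = secondary_blink_features([50.0, 100.0, 200.0, 300.0, 900.0], 60.0)
```

The same `mean_and_sd` helper would apply unchanged to pupil diameter, fixation duration, and saccade duration.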

3.2 EEG Features

We recorded EEG signals using the Emotiv EPOC neuroheadset. EEG signals were collected at 128 Hz from 14 channels at locations AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8, and AF4 as defined by the international 10–20 system [44]. The reference sensors were placed at locations P3 and P4. The recorded EEG signals had bandwidth from 0.2 Hz to 45 Hz, covering five frequency bands, which are delta (frequency<4Hz), theta (4Hz<frequency<8Hz), alpha (8Hz<frequency<13Hz), beta (13Hz<frequency<30Hz), and gamma (frequency> 30 Hz). The theta, alpha, beta, and gamma frequency band activities have been reported to be sensitive for measuring cognitive load for ASD populations [45]. The delta band was less informative and was susceptible to movement artifacts during driving. Therefore, we excluded the delta band from feature extraction in this paper.

The raw EEG signals were preprocessed by removing outliers, defined as changes of more than 50 µV between two adjacent data points. Then, a low-pass filter and a high-pass filter were used to remove noise at frequencies above 45 Hz and below 0.2 Hz. After filtering, the data were segmented into 1 s epochs, and epochs with poor contact quality were rejected. Eye blink, eye movement, and muscle movement artifacts were removed with an EOG-EMG artifact correction algorithm [46].

The power spectral density was the feature that best reflected the changes of the EEG activities and therefore was utilized in the previous literature for cognitive load measurement [47]. In this paper, power spectral density variables of theta, alpha, beta, and gamma bands, were extracted from the preprocessed signals in each channel, resulting in a total number of 56 features (14 channels × 4 wave bands = 56 features).
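One way to obtain band-wise spectral power from a 1 s epoch is a plain periodogram, sketched below with the standard library only. The estimator actually used in the study is not specified here, so the periodogram, the half-open band edges (with 45 Hz taken as the upper gamma limit from the recording bandwidth), and all function names are assumptions.

```python
import cmath
import math

FS = 128  # EEG sampling rate in Hz
# Half-open band edges [lo, hi) in Hz; the edge convention is an assumption.
BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30), "gamma": (30, 45)}


def periodogram(epoch, fs=FS):
    """One-sided periodogram of a real signal; returns (freqs, psd) lists."""
    n = len(epoch)
    offset = sum(epoch) / n
    centered = [x - offset for x in epoch]  # remove the DC offset
    freqs, psd = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        # Double every bin except DC and (for even n) Nyquist.
        single = k == 0 or (n % 2 == 0 and k == n // 2)
        freqs.append(k * fs / n)
        psd.append((1.0 if single else 2.0) * abs(coeff) ** 2 / (fs * n))
    return freqs, psd


def band_powers(epoch, fs=FS, bands=BANDS):
    """Integrate the periodogram over each frequency band."""
    freqs, psd = periodogram(epoch, fs)
    df = fs / len(epoch)
    return {name: sum(p for f, p in zip(freqs, psd) if lo <= f < hi) * df
            for name, (lo, hi) in bands.items()}


def epoch_features(channels):
    """Concatenate band powers over channels (14 channels x 4 bands = 56)."""
    features = []
    for channel in channels:
        powers = band_powers(channel)
        features.extend(powers[name] for name in BANDS)
    return features


# Synthetic 10 Hz sine as a stand-in for one channel of one epoch.
epoch = [math.sin(2 * math.pi * 10 * t / FS) for t in range(FS)]
powers = band_powers(epoch)
```

For the synthetic 10 Hz input, essentially all power lands in the alpha band, which is a quick sanity check on the band edges.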

3.3 Peripheral Physiological Features

We used the Biopac MP150 and recorded ECG, EMG, RSP, SKT, PPG, and GSR signals with a 1000 Hz sampling frequency. The EMG signals were recorded from the Corrugator, Zygomaticus, and Trapezius muscles. These peripheral physiological signals were preprocessed offline in three steps: 1) outlier removal; 2) noise reduction with filters; and 3) subsampling. The details about how to analyze the peripheral physiological data can be found in [48]. In the first step, very small and very large outliers in each peripheral physiological signal were removed separately. Then, the noise was reduced using a low-pass filter, a high-pass filter, and a notch filter. The slowly changing signals, SKT, RSP, and GSR, were subsampled to reduce computation. The subsampling equation, used with k = 10, was $x_{\text{subsample}}[n] = x_{\text{initial}}[kn]$. We identified 60 features from peripheral physiological signals as shown in Table 5.
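The subsampling rule for the slowly changing signals amounts to keeping every k-th sample. A minimal sketch, with a hypothetical function name and a synthetic input:

```python
def subsample(signal, k=10):
    """Keep every k-th sample, i.e. x_subsample[n] = x_initial[k * n]."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return signal[::k]


# One second of a synthetic signal sampled at 1000 Hz (not study data).
one_second = list(range(1000))
reduced = subsample(one_second, k=10)
```

With k = 10, a 1000 Hz recording is reduced to an effective rate of 100 Hz.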

Table 5.

The Peripheral Physiological Features and Their Descriptions

Signal | Basic features | Description
PPG | Amplitude of peak values (M and SD) | The amplitude of the detected pulse
PPG | Pulse Transit Time (M and SD) | The width of the detected pulse
GSR | Tonic activity level (M and SD) | Tonic level of electrical conductivity of skin
GSR | Slope of tonic activity | The change of the tonic level per second
GSR | Amplitude of phasic activity (M and SD) | The amplitude of the detected skin conductance response (SCR) peak
GSR | Rate of phasic activity | The number of detected SCR peaks per second
GSR | Rise time (M and SD) | Temporal interval between SCR initiation and SCR peak
GSR | Recovery time (M and SD) | Temporal interval between SCR peak and point of 50% recovery of SCR amplitude
EMG | EMG activity (M and SD) | One of the EMG signals
EMG | Slope of activities | The slow change of one EMG signal per minute
EMG | Burst activities frequency | Number of EMG burst peaks per minute
EMG | The burst activities (M and SD) | The time duration of the EMG burst peak
EMG | Activity frequency (M and SD) | The frequency of one EMG signal
EMG | Amplitude of burst activities | The amplitude of the detected EMG burst peak
RSP | Amplitude (M and SD) | The amplitude of the detected breath peak
RSP | Subband spectral entropy | The spectral entropy in three subbands: 0.003–0.04 Hz, 0.04–0.15 Hz, and 0.15–0.4 Hz
RSP | Minimum and maximum difference | The difference between the minimum and the maximum amplitude of the detected breath peaks
RSP | Peak frequency | The number of detected breath peaks per minute
RSP | Power spectrum density of low power | The power of the low-frequency component (0.04–0.15 Hz)
RSP | Power spectrum density of high power | The power of the high-frequency component (0.15–0.4 Hz)
RSP | The first order difference | The output of the first-order difference equation
RSP | Poincare plot geometry SD1 | The variance corresponding to short-term breathing rate variability
RSP | Poincare plot geometry SD2 (M and SD) | The variance corresponding to long-term breathing rate variability
RSP | Peak valley magnitude (M and SD) | The magnitude between the peak and the valley
RSP | Respiratory rate (M and SD) | The number of breaths per minute
SKT | Temperature (M and SD) | Peripheral temperature in degrees Celsius
SKT | Slope of temperature | The change of the temperature per second

3.4 Performance Features

In the described work, participants’ performance data were recorded through their driving behavior and task performance. The driving behavior included how a participant used the brake and accelerator during driving. The task performance indicated how well a participant completed a task, such as how many times he/she failed in one assignment and the driving score he/she achieved during one driving assignment. All the performance features and their descriptions are listed in Table 6.

Table 6.

Performance Features and Their Description

Features | Description
Brake (M and SD) | The level of using brake. A value between 0 and 1. 0 means no brake. 1 means full brake.
Accelerator (M and SD) | The level of using accelerator. A value between 0 and 1. 0 means no acceleration. 1 means full acceleration.
Failure times | The number of driving failures
Driving score | Number of points achieved during one assignment

4 Classification and Data Fusion Method

4.1 The classification algorithm

We applied five well-known classification algorithms: SVM [49], KNN [50], decision tree [51], discriminant analysis [52], and ANN [53], to classify the cognitive load from recorded data. Because the accuracy of each machine learning algorithm depended on its key parameter [54], we tested each machine learning algorithm with a variety of parameter values. Table 7 summarizes the evaluated classification algorithms and specifies their parameter values used for cognitive load measurement in this paper. Regarding SVM, the value of the C parameter was 1 and the size of the radial basis function was also 1, which are used in this paper because they resulted in high accuracy in our previous work [40]. In terms of ANN, the value of the calculated number of hidden neurons is given by (Nf + Nc) / 2, where Nf is the input feature number and Nc is the output class number.

Table 7.

The List of Classification Algorithms Used to Measure Cognitive Load

Classifier Index | Algorithm | Parameters and their values
1 | SVM | Linear kernel
2 | SVM | Quadratic kernel
3 | SVM | Polynomial kernel of degree 3
4 | SVM | Gaussian radial basis function kernel
5 | KNN | Euclidean distance and k=1
6 | KNN | Euclidean distance and k=3
7 | KNN | Euclidean distance and k=5
8 | KNN | Covariance distance and k=1
9 | KNN | Covariance distance and k=3
10 | KNN | Covariance distance and k=5
11 | KNN | Cosine distance and k=1
12 | KNN | Cosine distance and k=3
13 | KNN | Cosine distance and k=5
14 | Decision Tree | Gini’s diversity index as split criterion
15 | Decision Tree | Deviance as split criterion
16 | Decision Tree | Twoing as split criterion
17 | Discriminant analysis | Linear discriminant analysis
18 | Discriminant analysis | Quadratic discriminant analysis
19 | ANN | Conjugate gradient backpropagation with 10 hidden neurons
20 | ANN | RPROP algorithm with 10 hidden neurons
21 | ANN | Marquardt algorithm with 10 hidden neurons
22 | ANN | Conjugate gradient backpropagation with calculated number of hidden neurons
23 | ANN | RPROP algorithm with calculated number of hidden neurons
24 | ANN | Marquardt algorithm with calculated number of hidden neurons

4.2 Data Fusion Methods

In general, multimodal information can be fused in different levels: feature level fusion, decision level fusion, and hybrid level fusion [55]. Feature level fusion is easy to use but is not robust if information of some modalities is lost. Decision level fusion is a more robust method that combines the sub-decision of each modality [56]. The disadvantage of decision level fusion is its failure to reflect the correlation between features of different modalities [57]. Hybrid level fusion methods seek to combine the advantages of feature level fusion and decision level fusion [55]. However, it is not clear which level of fusion gives the highest accuracy in cognitive load measurement with eye gaze, EEG, peripheral physiological, and performance data in the VR-based driving system. We therefore compared these three fusion levels in fusing multimodal information in cognitive load measurement. The frameworks of the three level fusion techniques are shown in Fig. 4.

Fig. 4. (a) Feature level fusion framework; (b) Decision level fusion framework; and (c) Hybrid level fusion framework

Fig. 4 (a) shows the framework of feature level fusion. The input to the feature level fusion is a feature vector, which is composed of features from eye gaze modality (Eye), EEG modality (EEG), peripheral physiology modality (Phy), and performance modality (Per). In the preprocessing module, each feature of the feature vector is first normalized into a range from 0 to 1. Then, the dimension of the feature vector is reduced with principal component analysis. A classifier takes the feature vector after preprocessing as input and outputs a level of cognitive load (CL).
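The feature level pipeline described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the feature values are made up, the function names are hypothetical, the principal component analysis step is omitted to keep the example short, and a 1-nearest-neighbour rule with Euclidean distance (classifier index 5 in Table 7) stands in for the classifier.

```python
import math

def fit_min_max(rows):
    """Return per-feature minima and maxima computed from the training rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]

def apply_min_max(row, lows, highs):
    """Scale one feature vector into [0, 1]; constant features map to 0."""
    return [(v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, lows, highs)]

def fuse_features(eye, eeg, phy, per):
    """Feature level fusion: concatenate the four modality feature lists."""
    return [*eye, *eeg, *phy, *per]

def predict_1nn(train_x, train_y, query):
    """Label of the nearest training vector (Euclidean distance, k = 1)."""
    distances = [math.dist(x, query) for x in train_x]
    return train_y[distances.index(min(distances))]

# Made-up samples: (Eye, EEG, Phy, Per) feature lists and a binary CL label.
train = [
    (([0.2, 300.0], [4.0, 1.0], [70.0], [9.0]), 0),
    (([0.3, 320.0], [4.5, 1.2], [72.0], [8.0]), 0),
    (([0.8, 500.0], [9.0, 2.5], [95.0], [3.0]), 1),
    (([0.7, 480.0], [8.0, 2.2], [90.0], [4.0]), 1),
]
raw_x = [fuse_features(*modalities) for modalities, _ in train]
train_y = [label for _, label in train]

# Normalization parameters come from the training data only.
lows, highs = fit_min_max(raw_x)
train_x = [apply_min_max(x, lows, highs) for x in raw_x]

query = apply_min_max(
    fuse_features([0.75, 490.0], [8.5, 2.4], [92.0], [3.0]), lows, highs)
predicted_cl = predict_1nn(train_x, train_y, query)
print("predicted cognitive load level:", predicted_cl)
```

Fitting the normalization range on the training samples and reusing it for new samples is a design choice of this sketch; the text does not state how the range was obtained.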

Fig. 4 (b) shows the framework of decision level fusion. Each of the four modalities yields a feature vector. Each feature vector is preprocessed in a preprocessing module as discussed in feature level fusion. Because the dimensions of the feature vectors extracted from the eye gaze and peripheral physiology modalities are small, dimension reduction is not needed for these feature vectors. After preprocessing, each feature vector is input into a classifier, which outputs a level of cognitive load as a sub-decision. The fusion module calculates the final decision based on the weighted average of the four sub-decisions (1). The weighted average, y, is a function of a sub-decision vector, D, and a weight vector, W, (2). The elements of the sub-decision vector are four sub-decisions, D = (d1, d2, d3, d4). Each sub-decision is an output of a binary classifier and therefore its value can be either 0 (meaning a low level of cognitive load) or 1 (meaning a high level of cognitive load). The elements of the weight vector are four weights, W = (w1, w2, w3, w4). Each weight is in the range [0,1] and the sum of all four weights is 1.

$$d_{final} = \begin{cases} 0, & y < 0.5 \\ 1, & y \ge 0.5 \end{cases} \tag{1}$$

$$y = W D^{T} = \sum_{i=1}^{4} w_i d_i \tag{2}$$
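As a minimal sketch of (1) and (2), the fusion rule can be written as a small Python function. The function name and the example weight vector are hypothetical and chosen only for illustration.

```python
def fuse_decisions(weights, decisions, threshold=0.5):
    """Return (y, d_final): the weighted average of (2) and the decision of (1)."""
    if len(weights) != len(decisions):
        raise ValueError("weights and decisions must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    y = sum(w * d for w, d in zip(weights, decisions))
    return y, (1 if y >= threshold else 0)

# Hypothetical weights for the (Eye, EEG, Phy, Per) sub-decisions.
weights = (0.4, 0.2, 0.2, 0.2)

# Eye and Per vote "high", EEG and Phy vote "low".
y, d_final = fuse_decisions(weights, (1, 0, 0, 1))
print(y, d_final)
```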

The final decision depends on the weight vector. The weight vector that produces the highest accuracy of decision level fusion is the optimal weight vector, which is usually found by exhaustive search. For example, Koelstra et al. incremented each weight of a two-dimensional weight vector from 0 to 1 by 0.01 in order to find an optimal weight vector for emotion recognition using EEG and peripheral physiological signals [56]. However, the exhaustive search method is computationally expensive for decision level fusion with a high-dimensional weight vector. For example, a decision level fusion with a four-dimensional weight vector using exhaustive search with 0.01 step width needs to evaluate 10^6 weight vectors in order to find the optimal one. Because the sub-decisions of our decision level fusion were binary data, the search space of weight vectors can be reduced. We present a new approach that finds an optimal weight vector from a small number of weight vectors and thereby reduces the computational load significantly. We prove that the optimal weight vector can be found from this small number of selected weight vectors.

Lemma 1

A small number of weight vectors can yield the optimal one for a decision level fusion with four binary sub-decisions.

Proof

We define a small number of sets of weight vectors that cover the optimal one for the decision level fusion (Step 1). We prove that all weight vectors of each set yield the same final decision (Step 2).

Step 1: The universal set of weight vectors can be presented as (3).

$$U = \left\{ (w_1, w_2, w_3, w_4) \in [0,1]^4 \;\middle|\; \sum_{i=1}^{4} w_i = 1 \right\} \tag{3}$$

First, based on whether wmax (the maximum weight of a weight vector in U (5)) is greater than, equal to, or less than 0.5, the universal set U can be partitioned into three disjoint subsets: O, P, and Q, respectively. If a weight vector is a member of the subset O as defined by (6), the final decision is determined by the sub-decision associated with the maximum weight of the weight vector. This condition is discussed separately in the results section as the single modality classification. If a weight vector is a member of the subset P as defined by (7), the decision level fusion produces a low accuracy because of the possible boundary condition, y = 0.5. We, therefore, excluded the subset P for the decision level fusion in this paper. The subset Q as defined by (8) is a collection of weight vectors, where the weights are each less than 0.5.

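$$U = O \cup P \cup Q \tag{4}$$

$$w_{\max} = \max(w_1, w_2, w_3, w_4) \tag{5}$$

$$O = \{ (w_1, w_2, w_3, w_4) \in U \mid w_{\max} > 0.5 \} \tag{6}$$

$$P = \{ (w_1, w_2, w_3, w_4) \in U \mid w_{\max} = 0.5 \} \tag{7}$$

$$Q = \{ (w_1, w_2, w_3, w_4) \in U \mid w_{\max} < 0.5 \} \tag{8}$$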

Second, based on whether wmax + wmin (the sum of the maximum weight of a weight vector from (5) and the minimum weight of the weight vector from (10)) is greater than, less than, or equal to 0.5, set Q can be partitioned into three disjoint subsets, QA, QB, and QC, respectively, as shown by (9) and (11) to (13). We excluded QC for the decision level fusion in this paper. If a weight vector is a member of QC, the decision level fusion produces a low accuracy due to the possible boundary condition, y = 0.5.

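$$Q = Q_A \cup Q_B \cup Q_C \tag{9}$$

$$w_{\min} = \min(w_1, w_2, w_3, w_4) \tag{10}$$

$$Q_A = \{ (w_1, w_2, w_3, w_4) \in Q \mid w_{\max} + w_{\min} > 0.5 \} \tag{11}$$

$$Q_B = \{ (w_1, w_2, w_3, w_4) \in Q \mid w_{\max} + w_{\min} < 0.5 \} \tag{12}$$

$$Q_C = \{ (w_1, w_2, w_3, w_4) \in Q \mid w_{\max} + w_{\min} = 0.5 \} \tag{13}$$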

Third, the set QA can be further partitioned into four subsets according to the index of the maximum weight of a weight vector in QA, i.e. QA1 = {(w1, w2, w3, w4) ∈ QA | w1 = wmax},…, and QA4 = {(w1, w2, w3, w4) ∈ QA | w4 = wmax}. The maximum weight of a weight vector in QA is unique. This can be shown by the fact that a weight vector with more than one maximum weight would have an invalid sum of elements: w1 + w2 + w3 + w4 ≥ 2wmax + 2wmin > 1. Therefore, these subsets of QA are disjoint sets.

Fourth, the set QB can be further partitioned into four subsets according to the index of the minimum weight of a weight vector in QB, i.e. QB1 = {(w1, w2, w3, w4) ∈ QB | w1 = wmin},…, and QB4 = {(w1, w2, w3, w4) ∈ QB | w4 = wmin}. The minimum weight of a weight vector in QB is unique, because a weight vector with more than one minimum weight would have an invalid sum of elements: w1 + w2 + w3 + w4 ≤ 2wmax + 2wmin < 1. Therefore, the subsets of QB are disjoint sets. These eight disjoint sets, QA1, QA2, …, QB4, were considered in this paper for decision level fusion.
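The partition built in Step 1 can be expressed as a short classification routine. The following Python sketch is illustrative only: the function name and the numerical tolerance used for the boundary sets P and QC are assumptions, not part of the original method.

```python
def classify_weight_vector(w, tol=1e-9):
    """Return the Step 1 subset label of a four-element weight vector."""
    if len(w) != 4 or any(x < -tol or x > 1 + tol for x in w):
        raise ValueError("expected four weights in [0, 1]")
    if abs(sum(w) - 1.0) > tol:
        raise ValueError("weights must sum to 1")
    w_max, w_min = max(w), min(w)
    if abs(w_max - 0.5) <= tol:
        return "P"                      # boundary set, excluded in the paper
    if w_max > 0.5:
        return "O"                      # one modality decides alone
    total = w_max + w_min
    if abs(total - 0.5) <= tol:
        return "QC"                     # boundary set, excluded in the paper
    if total > 0.5:
        return "QA%d" % (w.index(w_max) + 1)   # indexed by the maximum weight
    return "QB%d" % (w.index(w_min) + 1)       # indexed by the minimum weight

for example in [(0.7, 0.1, 0.1, 0.1), (0.4, 0.2, 0.2, 0.2),
                (0.1, 0.3, 0.3, 0.3), (0.3, 0.3, 0.2, 0.2)]:
    print(example, classify_weight_vector(example))
```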

Step 2: We prove that, within each of these eight subsets, the final decision of the decision level fusion is independent of the choice of the weight vector. We prove this assertion using the subset, QA1 (14) as an example.

$$Q_{A1} = \{ (w_1, w_2, w_3, w_4) \in Q \mid w_1 = w_{\max},\; w_{\max} + w_{\min} > 0.5 \} \tag{14}$$

The value of an element of a sub-decision vector is 0 or 1 and, therefore, there are a total of 2^4 = 16 sub-decision vectors. Any sub-decision vector belongs to one of the four cases shown below by (15) to (18).

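$$\text{Case 1: } d_1 = 1,\; \exists k \in \{2, 3, 4\}: d_k = 1 \tag{15}$$

$$\text{Case 2: } d_1 = 1,\; \forall k \in \{2, 3, 4\}: d_k = 0 \tag{16}$$

$$\text{Case 3: } d_1 = 0,\; \exists k \in \{2, 3, 4\}: d_k = 0 \tag{17}$$

$$\text{Case 4: } d_1 = 0,\; \forall k \in \{2, 3, 4\}: d_k = 1 \tag{18}$$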

The final decision associated with weight vectors in QA1 is shown in Table 8. As can be seen, if a weight vector is in QA1, the final decision is dependent on the sub-decision vector, but is independent of the weight vector. Therefore, a weight vector of QA1 can represent all its weight vectors. It follows, then, that we can prove the assertion for any of the 8 subsets.

Table 8.

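Values of the final decision for the sub-decision cases (15) to (18)

W     d        y                                                              dfinal
QA1   Case 1   $y = \sum_{i=1}^{4} w_i d_i \ge w_1 + w_{\min} > 0.5$           1
QA1   Case 2   $y = \sum_{i=1}^{4} w_i d_i = w_1 < 0.5$                        0
QA1   Case 3   $y = \sum_{i=1}^{4} w_i d_i \le 1 - (w_1 + w_{\min}) < 0.5$     0
QA1   Case 4   $y = \sum_{i=1}^{4} w_i d_i = 1 - w_1 > 0.5$                    1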

In conclusion, we defined a small number of subsets that cover the optimal weight vector for the decision level fusion. We proved that all weight vectors of a subset yield the same final decision. Therefore, we can find the optimal weight vector by 1) randomly selecting a weight vector from each of the eight subsets; and 2) computing and comparing the accuracies of the decision level fusion with these eight weight vectors. The weight vector that yields the highest accuracy is the optimal one. Thus, a small number of weight vectors can yield the optimal one for a decision level fusion with four binary sub-decisions, which proves Lemma 1.
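The search procedure that follows from Lemma 1 can be sketched as below. The sketch makes several assumptions that are not in the original text: the representatives are fixed, hand-picked members of the eight subsets rather than random draws, the sub-decisions and labels are made-up toy data, and all function names are hypothetical.

```python
def fuse(weights, decisions):
    """Final decision of (1) for the weighted average of (2)."""
    y = sum(w * d for w, d in zip(weights, decisions))
    return 1 if y >= 0.5 else 0

# One hand-picked member per subset: (0.4, 0.2, 0.2, 0.2)-type vectors lie in QA,
# (0.1, 0.3, 0.3, 0.3)-type vectors lie in QB.
REPRESENTATIVES = {}
for i in range(4):
    qa = [0.2] * 4
    qa[i] = 0.4
    qb = [0.3] * 4
    qb[i] = 0.1
    REPRESENTATIVES["QA%d" % (i + 1)] = tuple(qa)
    REPRESENTATIVES["QB%d" % (i + 1)] = tuple(qb)

def accuracy(weights, sub_decisions, labels):
    """Fraction of samples whose fused decision equals the label."""
    hits = sum(fuse(weights, d) == t for d, t in zip(sub_decisions, labels))
    return hits / len(labels)

def best_representative(sub_decisions, labels):
    """Evaluate the eight representatives and return the most accurate one."""
    scores = {name: accuracy(w, sub_decisions, labels)
              for name, w in REPRESENTATIVES.items()}
    best = max(scores, key=scores.get)
    return best, REPRESENTATIVES[best], scores

# Toy sub-decisions in (Eye, EEG, Phy, Per) order with toy labels.
sub_decisions = [(1, 1, 1, 0), (0, 1, 0, 1), (1, 0, 0, 1), (1, 1, 0, 0),
                 (0, 0, 1, 1), (1, 0, 1, 0), (0, 0, 0, 0), (1, 1, 1, 1)]
labels = [1, 1, 0, 0, 1, 0, 0, 1]

best, best_weights, scores = best_representative(sub_decisions, labels)
print(best, best_weights, scores[best])
```

Only eight accuracy evaluations are needed per classifier combination. The single modality case (subset O) is handled separately, as in the paper.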

Fig. 4 (c) shows one instance of hybrid level fusion. Hybrid level fusion combines the processes of feature level fusion and decision level fusion. The feature fusion module takes multimodal features (Eye and EEG in Fig. 4 (c)) as input and outputs a level of cognitive load as a sub-decision. Each of the other sub-decisions is calculated by inputting the feature vector of one modality into a classifier. The final decision of hybrid level fusion is the weighted average of all sub-decisions.

We calculated the results of hybrid level fusion with two sub-decisions and with three sub-decisions. Hybrid level fusion with one sub-decision is equivalent to feature level fusion, while hybrid level fusion with four sub-decisions is equivalent to decision level fusion. All the possible combinations of different modalities’ features were tested for the feature fusion module of the hybrid level fusion; the results are listed in Section 5.3.
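To make the space of hybrid configurations concrete, the sketch below enumerates every way of grouping the four modalities into two or three sub-decisions. It is an illustrative enumeration with a hypothetical helper function, not the authors' code.

```python
from itertools import combinations

MODALITIES = ("Eye", "EEG", "Phy", "Per")

def partitions(items, k):
    """Yield every split of items into k non-empty, unordered groups."""
    items = list(items)
    if len(items) < k or k < 1:
        return
    if k == 1:
        yield [items]
        return
    first, rest = items[0], items[1:]
    # Choose which other items share a group with the first item, leaving
    # at least k - 1 items for the remaining k - 1 groups.
    for size in range(0, len(rest) - (k - 1) + 1):
        for mates in combinations(rest, size):
            remaining = [x for x in rest if x not in mates]
            for tail in partitions(remaining, k - 1):
                yield [[first, *mates]] + tail

three_way = list(partitions(MODALITIES, 3))
two_way = list(partitions(MODALITIES, 2))
one_plus_three = [p for p in two_way if sorted(len(g) for g in p) == [1, 3]]

print(len(three_way), "groupings with three sub-decisions")
print(len(two_way), "groupings with two sub-decisions,",
      len(one_plus_three), "of which pair one modality with the other three")
```

Table 10 lists six groupings with three sub-decisions, and Table 11 lists the four groupings in which one modality is paired with the fused features of the other three.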

5 Results

Each participant completed six experimental sessions. Each session included 3 driving assignments. A binary cognitive load label (i.e., 0 or 1) was assigned to each driving assignment. Each driving assignment yielded one data sample. A total of 360 data samples were extracted (20 participants × 6 sessions × 3 assignments = 360 samples). However, because of data loss during the experiment, mostly due to the movement of participants, 74 bad data samples were removed after preprocessing. Ultimately, 286 data samples were included in the data analysis.

K-fold cross validation was selected to evaluate classification results. Usually, 5- to 10-fold cross validation is used in the literature to compute classification accuracy. In this paper, 5-fold cross validation was selected so that enough test data were included for validation. We ran the 5-fold cross validation 10 times and averaged their results as the final accuracy in order to make the result more robust.
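The evaluation protocol (5-fold cross validation repeated 10 times and averaged) can be sketched as follows. The function names, the fixed random seed, the toy labels, and the trivial "always high" baseline are assumptions for illustration, and the text does not say whether accuracy was pooled over folds or averaged per fold; this sketch pools within each repetition.

```python
import random

def k_fold_indices(n_samples, k, rng):
    """Shuffle the sample indices and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]

def repeated_cv_accuracy(samples, labels, fit_predict, k=5, repeats=10, seed=0):
    """Mean accuracy over `repeats` runs of k-fold cross validation."""
    rng = random.Random(seed)
    n = len(samples)
    per_repeat = []
    for _ in range(repeats):
        correct = 0
        for test_idx in k_fold_indices(n, k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(n) if i not in held_out]
            predictions = fit_predict(
                [samples[i] for i in train_idx],
                [labels[i] for i in train_idx],
                [samples[i] for i in test_idx],
            )
            correct += sum(p == labels[i] for p, i in zip(predictions, test_idx))
        per_repeat.append(correct / n)
    return sum(per_repeat) / repeats

def always_high(train_x, train_y, test_x):
    """Trivial baseline that predicts high cognitive load for every sample."""
    return [1] * len(test_x)

# Toy data: 20 samples, 12 labelled high (1) and 8 labelled low (0).
labels = [1] * 12 + [0] * 8
samples = list(range(20))
baseline = repeated_cv_accuracy(samples, labels, always_high)
print(round(baseline, 4))
```

Any of the classifiers in Table 7 could be plugged in through the `fit_predict` argument, which takes training samples, training labels, and test samples and returns predicted labels.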

5.1. Analysis of rating of perceived task difficulty

Fig. 5 depicts a histogram of the ratings of perceived task difficulty used for data analysis (M = 5.28, SD = 1.39). As can be seen, a large portion of the ratings of perceived task difficulty lie around 5, which means that a majority of the driving tasks were perceived as being of medium difficulty by the participants. This distribution fits our goal of training driving skills of adolescents with ASD because very easy or very hard tasks are not conducive to training driving skills. 57.34% of all the assignments were labeled as high cognitive load, while 42.66% of the data samples were labeled as low cognitive load. Because the data were nearly balanced, accuracy was used to evaluate the performance of the classification models.

Fig. 5. The histogram of the rating of perceived task difficulty

Performance is an implicit estimation of cognitive load [58]. Performance features, such as reaction time [59] and success frequency [60], were previously utilized to evaluate the ground truth of cognitive load. In this paper, the correlation between ratings of perceived task difficulty and driving scores (a driving score is a performance feature, as shown in Table 6, and represents the success frequency in an assignment) was tested. The statistical analysis method, Spearman rank correlation, was selected because the driving score was an ordinal variable [61]. There was a strong negative correlation between the driving score and the rating of perceived task difficulty, ρ(284)=−0.62, p < 0.01. No correlation between the driving score and the designed difficulty level was found, ρ(284)=0.06, p=0.93, from the experimental data. Because performance is an indicator of cognitive load [58], these correlation results support that the rating of perceived task difficulty was a more reliable ground truth for cognitive load as compared to the designed difficulty level.
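For readers unfamiliar with the statistic, the Spearman rank correlation is the Pearson correlation of the rank-transformed variables, with tied values sharing their average rank. The sketch below uses made-up ratings and driving scores purely for illustration, and the function names are hypothetical. The reported ρ(284) corresponds to n − 2 = 284 degrees of freedom for the 286 samples.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up example: higher perceived difficulty goes with lower driving scores.
ratings = [2, 4, 5, 5, 7, 8]
driving_scores = [9, 8, 6, 7, 3, 1]
rho = spearman_rho(ratings, driving_scores)
print(round(rho, 3))
```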

5.2. Feature level fusion and single modality classification

The first hypothesis of this paper is that by combining multimodal information, the accuracy of cognitive load measurement will increase. The hypothesis is tested by comparing multimodal information to each single modality information in cognitive load measurement with several classifiers. The choice of a classifier is data dependent [49]. Thus, a single classifier may be insufficient to show the impact of different datasets in cognitive load measurement. We selected several commonly used classifiers, shown in Table 7, for cognitive load measurement. All classifiers were used in feature level fusion and each single modality classification. Their accuracies are shown in Table 9 for the purpose of comparison. The best accuracy of feature level fusion and the best accuracy of each single modality classification are shown in bold font type. The average accuracy of feature level fusion and the average accuracy of each single modality classification are shown at the bottom of the table. The best accuracy of feature level fusion, 84.43%, is higher than the best accuracy of each single modality classification. On average, feature level fusion also achieved a higher accuracy compared to each single modality classification. The accuracy of feature level fusion was statistically significantly higher than the accuracy of each single modality classification, i.e. the accuracy of the eye gaze based classification (Z = −4.88, p < .001), the accuracy of the EEG based classification (Z = −1.97, p < .05), the accuracy of the peripheral physiological information based classification (Z = −2.96, p < .05), and the accuracy of the performance based classification (Z = −4.61, p < .001). These results suggest that combining multimodal information can increase the accuracy of cognitive load measurement.

Table 9.

ACCURACIES OF ALL ALGORITHMS/PARAMETERS (%)

Classifier Index Eye EEG Phy Per Fusion
1 58.45 64.10 65.52 69.77 73.66
2 60.15 69.08 67.04 68.66 73.65
3 63.92 72.93 71.21 61.45 78.13
4 73.16 78.18 70.58 65.44 81.53
5 72.83 79.33 78.77 64.11 82.80
6 67.99 76.64 73.00 60.46 78.48
7 62.17 76.46 73.36 59.47 77.94
8 70.94 79.96 79.31 63.75 84.43
9 68.73 76.35 75.76 62.79 79.27
10 61.89 77.29 74.92 63.08 79.34
11 71.27 79.62 78.71 65.19 81.77
12 66.16 77.63 73.90 61.63 79.56
13 63.60 77.52 74.98 60.78 78.44
14 59.17 63.50 56.97 68.39 64.15
15 59.91 62.15 57.99 62.50 62.70
16 58.33 63.62 57.19 61.34 62.21
17 59.79 70.06 62.77 70.53 74.32
18 68.72 63.52 73.86 67.37 80.89
19 66.15 70.71 65.18 59.08 72.21
20 54.47 70.16 66.68 66.29 75.26
21 53.50 63.66 57.62 69.16 67.18
22 64.19 72.18 60.14 66.64 69.79
23 55.53 67.12 58.56 70.68 73.24
24 58.09 65.64 62.08 73.72 69.17
AVG 63.30 71.56 68.17 65.09 75.01
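The best classifier for each column of Table 9 can be read off programmatically. The sketch below copies the 24 rows of the table into a string and reports, for each column, the classifier index (as numbered in Table 7) with the highest accuracy, together with the column mean for comparison with the AVG row. The variable names are illustrative.

```python
TABLE_9 = """
1 58.45 64.10 65.52 69.77 73.66
2 60.15 69.08 67.04 68.66 73.65
3 63.92 72.93 71.21 61.45 78.13
4 73.16 78.18 70.58 65.44 81.53
5 72.83 79.33 78.77 64.11 82.80
6 67.99 76.64 73.00 60.46 78.48
7 62.17 76.46 73.36 59.47 77.94
8 70.94 79.96 79.31 63.75 84.43
9 68.73 76.35 75.76 62.79 79.27
10 61.89 77.29 74.92 63.08 79.34
11 71.27 79.62 78.71 65.19 81.77
12 66.16 77.63 73.90 61.63 79.56
13 63.60 77.52 74.98 60.78 78.44
14 59.17 63.50 56.97 68.39 64.15
15 59.91 62.15 57.99 62.50 62.70
16 58.33 63.62 57.19 61.34 62.21
17 59.79 70.06 62.77 70.53 74.32
18 68.72 63.52 73.86 67.37 80.89
19 66.15 70.71 65.18 59.08 72.21
20 54.47 70.16 66.68 66.29 75.26
21 53.50 63.66 57.62 69.16 67.18
22 64.19 72.18 60.14 66.64 69.79
23 55.53 67.12 58.56 70.68 73.24
24 58.09 65.64 62.08 73.72 69.17
"""
COLUMNS = ("Eye", "EEG", "Phy", "Per", "Fusion")

rows = [line.split() for line in TABLE_9.strip().splitlines()]

best = {}   # column name -> (classifier index, accuracy in %)
means = {}  # column name -> mean accuracy over the 24 classifiers
for col, name in enumerate(COLUMNS, start=1):
    pairs = [(int(r[0]), float(r[col])) for r in rows]
    best[name] = max(pairs, key=lambda pair: pair[1])
    means[name] = sum(acc for _, acc in pairs) / len(pairs)

for name in COLUMNS:
    index, acc = best[name]
    print("%-6s best: classifier %2d (%.2f%%), mean %.2f%%"
          % (name, index, acc, means[name]))
```

The per-modality winners found this way (index 4 for eye gaze, 8 for EEG and physiology, 24 for performance) are the classifier indices that appear for the single-modality sub-decisions in Tables 10 and 11.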

5.3. Decision level fusion and Hybrid level fusion

The final decision of decision level fusion is a weighted average of four sub-decisions as discussed in Section 4.2. All classifiers listed in Table 7 were used for each of the four sub-decisions in the decision level fusion. Each possible weight set (described in section 4.2) was tested for the weighted average in decision level fusion. The highest observed accuracy of decision level fusion was 81.48%.

There are two types of hybrid level fusion for our data: hybrid level fusion with three sub-decisions and hybrid level fusion with two sub-decisions as discussed in Section 4.2. The best accuracies for these two types of hybrid level fusion are shown in Table 10 and Table 11, respectively. In both tables, the column Classifier indicates the Classifier index in Table 7 that gives the best accuracy when classifying cognitive load using the corresponding features. The highest accuracy of hybrid level fusion was 83.42%.

Table 10.

Accuracies of Hybrid Level Fusion with Three Sub-decisions

Sub-decision 1                             Sub-decision 2                     Sub-decision 3                        Accuracy
Features                      Classifier   Features              Classifier   Features                 Classifier
performance                   24           physiological         8            EEG & Eye gaze           4            81.35%
performance                   24           physiological & EEG   8            Eye gaze                 4            80.89%
performance & EEG             18           physiological         8            Eye gaze                 4            73.84%
performance                   24           EEG                   8            physiological & Eye gaze 8            80.57%
performance & physiological   24           EEG                   8            Eye gaze                 4            79.10%
performance & Eye gaze        8/4          physiological         8            EEG                      8            79.46%

Table 11.

Accuracies of Hybrid Level Fusion with Two Sub-decisions

Sub-decision 1               Sub-decision 2                                        Accuracy
Features        Classifier   Features                                 Classifier
EEG             8            performance & Eye gaze & physiological   8            83.00%
Eye gaze        4            performance & EEG & physiological        8            83.42%
physiological   8            performance & Eye gaze & EEG             8            81.52%
performance     24           physiological & EEG & Eye gaze           8            82.86%

6 Discussion

6.1 Feature level fusion and single modality classification

We found that feature level fusion performed better than all single modality classifications in cognitive load measurement, as indicated by the statistical test results, the best accuracies, and the average accuracies. There are several existing studies that use multimodal information to measure cognitive load [32, 33]. We cannot compare the numerical results of our study with the numerical results of these studies because of differences in experimental design, populations, and measured signals. We can, however, compare our study with the existing studies to understand the effect of multimodal information in cognitive load measurement. Son et al. collected three modalities of information (physiological, gaze, and performance information) for cognitive load measurement in a driving simulator [32]. In their study, the best accuracy using the three-modality information was higher than the best accuracy using each single modality for cognitive load measurement. In an adaptive upper extremity rehabilitation task, Novak et al. showed that measuring cognitive load with physiological signals and task performance together can produce higher accuracy than using task performance or physiological signals separately [33]. In a mental arithmetic task, Hussain et al. found that multimodal fusion could increase the accuracy of cognitive load measurement when no affective interference was involved [31]. While these studies were not designed for individuals with ASD, our results are in line with the existing results in cognitive load measurement using multimodal information for TD individuals. To the best of our knowledge, no previous study has fused multimodal information to measure the cognitive load of individuals with ASD.

6.2. Decision level fusion and Hybrid level fusion

We investigated the following research question in this paper: which level of multimodal fusion can give the best accuracy in cognitive load measurement? In order to answer this question, we compared the best accuracies that can be achieved using different levels of fusion, including feature level fusion, decision level fusion, and hybrid level fusion, in cognitive load measurement. Table 12 summarizes the best accuracies of the three multimodal fusion levels and shows that feature level fusion outperforms all other multimodal fusion levels in cognitive load measurement. According to previous literature on multimodal fusion, feature level fusion can achieve higher accuracies than decision level fusion because feature level fusion utilizes the correlation among features from different modalities [55]. In our case, the effect of the correlation among features from different modalities can be seen from the best accuracy of hybrid level fusion with three sub-decisions. The best accuracy of hybrid level fusion with three sub-decisions was achieved when eye gaze and EEG features were combined for one sub-decision, as shown in Table 10. The correlation between eye gaze and EEG signals is significant [62]. The instance of hybrid level fusion utilizing this correlation achieved a higher accuracy than those that did not use this correlation.

Table 12.

Comparison Between Different Levels of Fusion

                Feature level fusion   Decision level fusion   Hybrid level fusion
Best accuracy   84.43%                 81.48%                  83.42%

7 Conclusions and Future Research

7.1. Conclusions

ASD is a highly prevalent neurodevelopmental disorder. A novel VR-based driving system was presented for ASD intervention that could present driving scenarios with variable task difficulties to facilitate individualized learning. The primary contribution of this paper is to systematically present the cognitive load measurements of individuals with ASD based on their eye gaze, EEG, peripheral physiology and performance data collected when they used the VR-based driving system, and to provide multimodal fusion schemes to more accurately measure cognitive load of these users. Feature level, decision level and hybrid level fusions demonstrate how multimodal information can be fused to measure cognitive load with increased accuracy. The model development for cognitive load measurement in this paper is aimed at building a cognitively intelligent VR-based driving system. In the future, the difficulty level of driving tasks will be adjusted in the cognitively intelligent VR-based driving system based on the research findings.

Our study has two distinct strengths that indicate the commercial viability of this system. First, it was tested with the intended target population, i.e., adolescents with ASD. Thus the system was acceptable and engaging to the target users. Second, the users used the driving simulator in a naturalistic way: they moved frequently and used it like a video game. As a result, the data were noisy and occasionally lost. Even then, the cognitive load analysis was robust enough to predict cognitive load with a high accuracy in the presence of lost and noisy data. Thus we believe that this system will be commercially viable once the cognitive load measurement mechanism presented in this paper is integrated with the rest of the system.

7.2. Limitations and Future Research Directions

There are several limitations of this research that need to be addressed in future work.

First, we lost a relatively large quantity of data (20.56% of all the data). The data were lost primarily due to participants’ movements during driving, which was inevitable in a VR-based driving system that aims to train the driving skills of adolescents with ASD in naturalistic conditions. One possible solution is to detect valid data in real time in the cognitively intelligent VR-based driving system. If insufficient data for feedback are detected, the experiment could be extended in order to collect more data.
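The loss figure follows directly from the sample counts reported in Section 5 (360 extracted samples, 74 removed):

$$\frac{74}{360} \approx 0.2056 = 20.56\%, \qquad 360 - 74 = 286.$$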

Second, no multiple-class classification was analyzed in this paper. We attempted binary classification as a starting point because it was simpler and in many cases, sufficient. However, in a more complex system, multi-class classification may yield richer results and should be investigated in the future.

Third, the data fusion method used in this paper was limited. We combined sub-decisions for the final decision using a weighted average method. It is possible to use other methods to combine the sub-decisions, such as majority voting and classification algorithms [55]. We plan to explore different methods to combine sub-decisions in the future.

Finally, predefined, rather than randomized, difficulty levels were used in our experiments. Presenting randomized difficulty levels would be a better strategy for ultimately deciphering and analyzing potential confounds associated with task difficulty level. We chose to present increasing difficulty levels in this initial pilot study in order to match a participant’s expected skill increase with the higher levels of task difficulty. We will implement randomized difficulty levels in the future.

Even with the above-mentioned limitations, we believe that this current work presents significant contributions towards developing cognitively intelligent VR-based driving systems that are robust, accurate, and useful for real-world applications, indicating commercial viability in the near future. This system was explicitly designed for an ASD population that shows challenges with this functional adaptive skill (e.g., driving) and that has historically been found to show systemic, but complex and heterogeneous, impairments in information processing (e.g., working memory and executive functioning challenges, difficulties with social processing). We hypothesize that a multimodal fusion methodology capable of use within/across readily controllable intervention platforms (such as VR) could yield a tool for dramatically improving current modes of treatment. In this capacity the current findings will be used in future work developing a cognitively intelligent VR-based driving system. Its efficiency will be evaluated by comparing it with a system without cognitive load-based feedback.

The generalizability of the training using our VR-based driving system will be evaluated in the real world in the future. A driving simulator, such as our VR-based driving system, is obviously not a perfect substitute for the on-road setting [63]. However, it should be noted that the driving behaviors of people in such simulators are similar to their driving behaviors in real-world driving [64]. The speed patterns of people driving in a driving simulator were found to be similar to their speed patterns when driving in the real world [65]. Traffic risk pattern, in terms of crash history, has also been shown to generalize from the simulator to the real world [66]. The extant literature supports the usefulness of driving simulators. Evaluating the usefulness of training using our VR-based driving system for real world driving, in terms of speed-maintenance and error-reduction, will be carried out in future work.

Acknowledgments

This work was supported in part by the National Institutes of Health Grant 1R01MH091102-01A1, National Science Foundation Grant 0967170 and the Hobbs Society Grant from the Vanderbilt Kennedy Center.

Biographies

Lian Zhang received the M.S. degree from the School of Automation Science and Electrical Engineering, in 2012, from Beihang University, Beijing, China. She is currently a Ph.D. student in the department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN, USA. Her research interests include affective computing, virtual reality, machine learning, and human-computer interaction.

Joshua Wade received the B.S. degree in computer science from Middle Tennessee State University in Murfreesboro, TN, USA in 2013 and the M.S. degree in computer science from Vanderbilt University in Nashville, TN, USA in 2015. His research interests include virtual reality, embedded systems, and machine learning.

Dayi Bian received the B.S. degree in automation from Nanjing University, Nanjing, China and the M.S. degree in instrumental science and technology from Southeast University, Nanjing, China. He is currently working toward the Ph.D. degree in electrical engineering at Vanderbilt University.

Jing Fan received the M.S. degree in electrical engineering, in 2014, from Vanderbilt University, Nashville, TN, USA, where she is currently working toward the Ph.D. degree in electrical engineering. Her research interests include human-robot interaction, robotics, machine learning, and cognitive computing.

Amy R. Swanson received the M.A. degree in social science from University of Chicago, Chicago, IL, USA, in 2006. Currently she is a research analyst at Vanderbilt Kennedy Center’s Treatment and Research Institute for Autism Spectrum Disorders, Nashville, TN, USA.

Dr. Amy Weitlauf is an Assistant Professor of Pediatrics at Vanderbilt University Medical Center. She is a clinical psychologist with expertise in early diagnosis of Autism Spectrum Disorder.

Dr. Zachary Warren received a Ph.D. in Clinical Psychology in 2005 from the University of Miami and is currently an Associate Professor of Pediatrics, Psychiatry, and Special Education at Vanderbilt University. He is Executive Director of the Vanderbilt Kennedy Center’s (VKC) Treatment and Research Institute on Autism Spectrum Disorders (TRIAD), Director of Autism Clinical Services within the Division of Developmental Medicine at Vanderbilt Children’s Hospital, and Director of Autism Research for the VKC and the Department of Pediatrics.

Nilanjan Sarkar (S’92–M’93–SM’04) received Ph.D. in mechanical engineering and applied mechanics from the University of Pennsylvania, Philadelphia, in 1993. After a Postdoctoral Fellowship at Queen’s University, Canada, he joined the University of Hawaii as an Assistant Professor in mechanical engineering. In 2000, Dr. Sarkar joined Vanderbilt University, Nashville, TN, where he is currently a Professor of mechanical engineering and electrical engineering and computer science. His current research interests include human–robot interaction, affective computing, dynamics, and control. Dr. Sarkar is a Fellow of American Society of Mechanical Engineers. He served as an associate editor for the IEEE Transactions on Robotics.

Contributor Information

Lian Zhang, Department of Electrical Engineering and Computer Science Department, Vanderbilt University, Nashville, TN, USA.

Joshua Wade, Department of Electrical Engineering and Computer Science Department, Vanderbilt University, Nashville, TN, USA.

Dayi Bian, Department of Electrical Engineering and Computer Science Department, Vanderbilt University, Nashville, TN, USA.

Jing Fan, Department of Electrical Engineering and Computer Science Department, Vanderbilt University, Nashville, TN, USA.

Amy Swanson, Vanderbilt Kennedy Center, Treatment and Research Institute for Autism Spectrum Disorders, Vanderbilt University, Nashville, TN, USA.

Amy Weitlauf, Department of Pediatrics, Vanderbilt Kennedy Center, Treatment and Research Institute for Autism Spectrum Disorders, Vanderbilt University, Nashville, TN, USA.

Zachary Warren, Department of Pediatrics, Vanderbilt Kennedy Center, Treatment and Research Institute for Autism Spectrum Disorders, Vanderbilt University, Nashville, TN, USA.

Nilanjan Sarkar, Department of Electrical Engineering and Computer Science and Department of Mechanical Engineering, Vanderbilt University, Robotics and Autonomous Systems Laboratory, Vanderbilt University, Department of Mechanical Engineering, Olin Hall Room 101, 2400 Highland Avenue, Nashville, TN, USA. 37212.

